librdesc
Loading...
Searching...
No Matches
Main Page

Introduction

librdesc is a recursive descent parsing library written in C99. Parser logic is generic and can be controlled at runtime without recompiling.

Parsing is driven by rdesc_pump(), a state machine that allows token by token feeding. The engine handles memory allocation errors in pumping and remains in a valid state. See Parser State Machine for details.

librdesc, similar to PEGs, have ordered-choice semantics. Parses grammars using prioritized alternatives where the first matching alternative is selected, providing unlimited lookahead via backtracking.

API Overview

  • Parser (rdesc.h): The engine that consumes tokens. It defines the core parser state machine.
  • Tree macros (cst_macros.h): Macros to access fields of CST.
  • Grammar (grammar.h): Defines the grammar struct for production rules, which usually populated with static constant data using macros.
  • Rule macros (rule_macros.h): A set of macros to facilitate definning librdesc grammars. See Examples for usage.
  • Optional features (util.h): Utilities making librdesc easier to use.

To use librdesc, you typically follow these steps:

  1. Define the identifiers of tokens and nonterminals (e.g. create enums for your language symbols)
  2. Define your grammar and its symbol table. You may use rule_macros.h to create the rules table.
  3. Initialize:
    struct rdesc_grammar grammar;
    struct rdesc parser;
    rdesc_grammar_init_checked(&grammar, ...); // Load your rules.
    rdesc_init(&parser,
    &grammar,
    sizeof(struct seminfo), // Size of seminfo provided by lexer
    token_destroyer_method); // Function pointer for destroying that seminfo
    #define rdesc_grammar_init_checked(grammar, production_count, max_alternative_count, max_alternative_size, production_rules)
    Grammar initialization instrumented with size assertion.
    Definition grammar.h:27
    int rdesc_init(struct rdesc *parser, const struct rdesc_grammar *grammar, size_t seminfo_size, void(*token_destroyer)(uint16_t id, void *seminfo)) _rdesc_wur
    Initializes a new parser.
    Grammar definition.
    Definition grammar.h:50
    Recursive descent parser state.
    Definition rdesc.h:42
  4. Pump Tokens: Feed tokens into the loop until a tree is ready.
    rdesc_start(&parser, NT_STMT); // Set your start symbol.
    enum rdesc_result parser_state;
    do {
    // Acquire next_token from your lexer.
    uint16_t tk = next_tk();
    void *seminfo = next_seminfo();
    parser_state = rdesc_pump(&parser, tk, seminfo);
    // You can retry in case of a memory allocation failure.
    while (parser_state == RDESC_ENOMEM)
    parser_state = rdesc_resume(&parser);
    } while (parser_state == RDESC_CONTINUE);
    if (res == RDESC_READY) {
    struct rdesc_node *cst = rdesc_root(&parser);
    // Process the `cst` (use `cst_macros.h`)
    } else if (res == RDESC_NOMATCH) {
    // Handle syntax errors.
    }
    int rdesc_start(struct rdesc *parser, uint16_t start_symbol) _rdesc_wur
    Sets start symbol for the next match.
    enum rdesc_result rdesc_pump(struct rdesc *parser, uint16_t id, void *seminfo) _rdesc_wur
    Drives the parsing process, the pump.
    enum rdesc_result rdesc_resume(struct rdesc *parser) _rdesc_wur
    Resume parsing without providing a new token.
    struct rdesc_node * rdesc_root(struct rdesc *parser)
    Returns the root of the CST.
    rdesc_result
    Parse operation result codes.
    Definition rdesc.h:30
    @ RDESC_ENOMEM
    Definition rdesc.h:32
    @ RDESC_READY
    Definition rdesc.h:34
    @ RDESC_NOMATCH
    Definition rdesc.h:38
    @ RDESC_CONTINUE
    Definition rdesc.h:36

See Quick Start for a complete example!

Examples

  • boolean_algebra.h: A Boolean Algebra grammar demonstrating backtracking and ambiguity handling.
  • bc.h: A Basic Calculator implementation demonstrating operator precedence and expressions.

Integration

See librdesc for integration instructions, and Building for build options.