librdesc
Loading...
Searching...
No Matches
librdesc

Introduction

librdesc is a standard C99-based, right-recursive, table-driven parsing library. Unlike traditional parsing engines usually rely heavily on the machine's call stack, librdesc implements its own backtracing algorithm over Concrete Sytax Tree (CST) and token stack.

Key Features

  • Table-Driven Parsing: Parser logic is generic and can be controlled at runtime without recompiling via the struct rdesc_cfg table.
  • Pump Mechanism: Parsing is driven by rdesc_pump(), a state machine that allows for token by token feeding.
  • Backtracing Without Call-Stack: Since backtracing is performed via CST, the call stack does not grow. Tokens are be stored in a heap-allocated stack to try non-terminal variants.

API Overview

  1. Parser (rdesc.h): The engine that consumes the CFG and Input. It maintains a cursor (cur) and a root node (root).
  2. CFG (cfg.h): Defines the rules, tokens, and non-terminals. This is usually static constant data populated via macros.
  3. Stack (stack.h): Stores consumed tokens to facilitate backtracking. If a rule fails, the parser pops tokens back to retry an alternative variant.
  4. BNF DSL (bnf_dsl.h): A set of macros to facilitate definning Context free grammars.

Quick Start Guide

To use librdesc, you typically follow these steps:

  1. Define tokens and non-terminals. Create enums for your language symbols.
  2. Define your grammar. You may use bnf_dsl.h macros to create the rules table.
  3. Initialize:
    struct rdesc_cfg cfg;
    rdesc_cfg_init(&cfg, ...); // Load your rules
    struct rdesc parser;
    rdesc_init(&parser, &cfg);
    void rdesc_cfg_init(struct rdesc_cfg *cfg, uint32_t nonterminal_count, uint16_t nonterminal_variant_count, uint16_t nonterminal_body_length, const struct rdesc_cfg_symbol *production_rules)
    Initializes a context-free grammar object.
    Definition: cfg.c:9
    void rdesc_init(struct rdesc *p, const struct rdesc_cfg *cfg)
    Initializes a new parser.
    Definition: rdesc.c:47
    Context-free grammar definition.
    Definition: cfg.h:18
    Right-recursive descent parser.
    Definition: rdesc.h:29
    const struct rdesc_cfg * cfg
    Definition: rdesc.h:31
  4. Pump Tokens: Feed tokens into the loop until a tree is ready.
    rdesc_start(&parser, NT_STMT); // Set your start symbol.
    enum rdesc_result res;
    struct rdesc_node *cst;
    while ((res = rdesc_pump(&parser, &cst, next_token())) == RDESC_CONTINUE) {
    // Acquire next_token from your lexer
    }
    if (res == RDESC_READY) {
    // Process the `cst`
    } else if (res == RDESC_NOMATCH) {
    // Handle syntax error
    }
    enum rdesc_result rdesc_pump(struct rdesc *p, struct rdesc_node **out, struct rdesc_token *incoming_tk)
    Drives the parsing process, The Pump.
    Definition: rdesc.c:107
    void rdesc_start(struct rdesc *p, int start_symbol)
    Sets start symbol for the next match.
    Definition: rdesc.c:57
    rdesc_result
    parsing status
    Definition: rdesc.h:22
    @ RDESC_READY
    Definition: rdesc.h:23
    @ RDESC_NOMATCH
    Definition: rdesc.h:25
    @ RDESC_CONTINUE
    Definition: rdesc.h:24
    A node in the CST.
    Definition: rdesc.h:68

Examples

  • boolean_algebra.h : A Boolean Algebra grammar demonstrating backtracking and ambiguity handling.
  • bc.h : A Basic Calculator implementation demonstrating operator precedence and expressions.

Building

See librdesc for building instructions.