Introduction
librdesc is a standard C99-based, right-recursive, table-driven parsing library. Unlike traditional parsing engines usually rely heavily on the machine's call stack, librdesc implements its own backtracing algorithm over Concrete Sytax Tree (CST) and token stack.
Key Features
- Table-Driven Parsing: Parser logic is generic and can be controlled at runtime without recompiling via the
struct rdesc_cfg table.
- Pump Mechanism: Parsing is driven by
rdesc_pump(), a state machine that allows for token by token feeding.
- Backtracing Without Call-Stack: Since backtracing is performed via CST, the call stack does not grow. Tokens are be stored in a heap-allocated stack to try non-terminal variants.
API Overview
- Parser (
rdesc.h): The engine that consumes the CFG and Input. It maintains a cursor (cur) and a root node (root).
- CFG (
cfg.h): Defines the rules, tokens, and non-terminals. This is usually static constant data populated via macros.
- Stack (
stack.h): Stores consumed tokens to facilitate backtracking. If a rule fails, the parser pops tokens back to retry an alternative variant.
- BNF DSL (
bnf_dsl.h): A set of macros to facilitate definning Context free grammars.
Quick Start Guide
To use librdesc, you typically follow these steps:
- Define tokens and non-terminals. Create enums for your language symbols.
- Define your grammar. You may use
bnf_dsl.h macros to create the rules table.
- Initialize:
void rdesc_cfg_init(struct rdesc_cfg *cfg, uint32_t nonterminal_count, uint16_t nonterminal_variant_count, uint16_t nonterminal_body_length, const struct rdesc_cfg_symbol *production_rules)
Initializes a context-free grammar object.
Definition: cfg.c:9
void rdesc_init(struct rdesc *p, const struct rdesc_cfg *cfg)
Initializes a new parser.
Definition: rdesc.c:47
Context-free grammar definition.
Definition: cfg.h:18
Right-recursive descent parser.
Definition: rdesc.h:29
const struct rdesc_cfg * cfg
Definition: rdesc.h:31
- Pump Tokens: Feed tokens into the loop until a tree is ready.
}
}
enum rdesc_result rdesc_pump(struct rdesc *p, struct rdesc_node **out, struct rdesc_token *incoming_tk)
Drives the parsing process, The Pump.
Definition: rdesc.c:107
void rdesc_start(struct rdesc *p, int start_symbol)
Sets start symbol for the next match.
Definition: rdesc.c:57
rdesc_result
parsing status
Definition: rdesc.h:22
@ RDESC_READY
Definition: rdesc.h:23
@ RDESC_NOMATCH
Definition: rdesc.h:25
@ RDESC_CONTINUE
Definition: rdesc.h:24
A node in the CST.
Definition: rdesc.h:68
Examples
- boolean_algebra.h : A Boolean Algebra grammar demonstrating backtracking and ambiguity handling.
- bc.h : A Basic Calculator implementation demonstrating operator precedence and expressions.
Building
See librdesc for building instructions.