Introduction
librdesc is a recursive descent parsing library written in C99. Parser logic is generic and can be controlled at runtime without recompiling.
Parsing is driven by rdesc_pump(), a state machine that allows token by token feeding. The engine handles memory allocation errors in pumping and remains in a valid state. See Parser State Machine for details.
librdesc, similar to PEGs, have ordered-choice semantics. Parses grammars using prioritized alternatives where the first matching alternative is selected, providing unlimited lookahead via backtracking.
API Overview
- Parser (
rdesc.h): The engine that consumes tokens. It defines the core parser state machine.
- Tree macros (
cst_macros.h): Macros to access fields of CST.
- Grammar (
grammar.h): Defines the grammar struct for production rules, which usually populated with static constant data using macros.
- Rule macros (
rule_macros.h): A set of macros to facilitate definning librdesc grammars. See Examples for usage.
- Optional features (
util.h): Utilities making librdesc easier to use.
To use librdesc, you typically follow these steps:
- Define the identifiers of tokens and nonterminals (e.g. create enums for your language symbols)
- Define your grammar and its symbol table. You may use
rule_macros.h to create the rules table.
- Initialize:
&grammar,
sizeof(struct seminfo),
token_destroyer_method);
#define rdesc_grammar_init_checked(grammar, production_count, max_alternative_count, max_alternative_size, production_rules)
Grammar initialization instrumented with size assertion.
Definition grammar.h:27
int rdesc_init(struct rdesc *parser, const struct rdesc_grammar *grammar, size_t seminfo_size, void(*token_destroyer)(uint16_t id, void *seminfo)) _rdesc_wur
Initializes a new parser.
Grammar definition.
Definition grammar.h:50
Recursive descent parser state.
Definition rdesc.h:42
- Pump Tokens: Feed tokens into the loop until a tree is ready.
do {
uint16_t tk = next_tk();
void *seminfo = next_seminfo();
}
int rdesc_start(struct rdesc *parser, uint16_t start_symbol) _rdesc_wur
Sets start symbol for the next match.
enum rdesc_result rdesc_pump(struct rdesc *parser, uint16_t id, void *seminfo) _rdesc_wur
Drives the parsing process, the pump.
enum rdesc_result rdesc_resume(struct rdesc *parser) _rdesc_wur
Resume parsing without providing a new token.
struct rdesc_node * rdesc_root(struct rdesc *parser)
Returns the root of the CST.
rdesc_result
Parse operation result codes.
Definition rdesc.h:30
@ RDESC_ENOMEM
Definition rdesc.h:32
@ RDESC_READY
Definition rdesc.h:34
@ RDESC_NOMATCH
Definition rdesc.h:38
@ RDESC_CONTINUE
Definition rdesc.h:36
See Quick Start for a complete example!
Examples
boolean_algebra.h: A Boolean Algebra grammar demonstrating backtracking and ambiguity handling.
bc.h: A Basic Calculator implementation demonstrating operator precedence and expressions.
Integration
See librdesc for integration instructions, and Building for build options.