Parsing text using a formal grammar: SWIPL Example

So I think my definition of a “parser” may be much more inclusive than yours. A (text) parser is a program which takes flat text as input and produces a result. At the simple end is a recognizer, which only reports whether the input text is, or is not, a legal “sentence” in the language formally specified by the grammar. Tokenizers (as I view them) produce a flat list of “tokens”. More capable parsers might produce an abstract syntax tree to be used in the next step of processing, e.g., mapping to a Prolog term, or some other form suitable for compiling.
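To make that range concrete, here is a rough sketch using plain DCGs (not pPEG) over a toy language of sums such as 1+22+3. All predicate names are made up for illustration; only digits//1 comes from library(dcg/basics).

```prolog
:- use_module(library(dcg/basics), [digits//1]).

% A natural number: one or more digits, converted to an integer.
nat(N) --> digits([D|Ds]), { number_codes(N, [D|Ds]) }.

% 1. Recognizer: succeeds or fails, builds nothing.
sum_ok --> nat(_), sum_ok_rest.
sum_ok_rest --> "+", !, nat(_), sum_ok_rest.
sum_ok_rest --> [].

% 2. Tokenizer: produces a flat list of tokens.
tokens([T|Ts]) --> token(T), !, tokens(Ts).
tokens([]) --> [].
token(num(N)) --> nat(N).
token(plus) --> "+".

% 3. Parser: produces a (left-associative) syntax tree.
sum(T) --> nat(N), sum_rest(num(N), T).
sum_rest(L, T) --> "+", !, nat(N), sum_rest(add(L, num(N)), T).
sum_rest(T, T) --> [].

% ?- phrase(sum_ok, `1+22+3`).      % true
% ?- phrase(tokens(Ts), `1+22+3`).  % Ts = [num(1), plus, num(22), plus, num(3)]
% ?- phrase(sum(T), `1+22+3`).      % T = add(add(num(1), num(22)), num(3))
```

All three take the same flat text; they differ only in what they hand back, which is why I count all of them as parsers.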

pPEG is a generic parser system that uses a formal grammar specification to map input text to a generic kind of syntax tree, nothing more or less. If I want to produce Prolog terms from a grammar specification of Prolog syntax, I need a back end to define the semantics of the input text. But I could use the same formal grammar to produce, for example, a JSON equivalent of a term as a string, and that would require a different back end.
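As an illustration of the "one grammar, several back ends" idea, the sketch below takes one generic tree and maps it two ways. The tree shape (each node written as Name(Children) or Name("matched text")) and the date example are assumptions made for this sketch, not a statement of pPEG's documented output; the predicate names are hypothetical too.

```prolog
% Assumed generic syntax tree for the input text "2021-03-04".
example_tree(date([year("2021"), month("03"), day("04")])).

% Back end 1: map the tree to a Prolog term with integer fields.
tree_term(date([year(Y), month(M), day(D)]), date(Yn, Mn, Dn)) :-
    number_string(Yn, Y),
    number_string(Mn, M),
    number_string(Dn, D).

% Back end 2: map the same tree to a JSON object held in a string.
% Going through integers drops leading zeros, which JSON numbers do not allow.
tree_json(Tree, JSON) :-
    tree_term(Tree, date(Y, M, D)),
    format(string(JSON), '{"year":~d,"month":~d,"day":~d}', [Y, M, D]).

% ?- example_tree(T), tree_term(T, Term).
% Term = date(2021, 3, 4).
% ?- example_tree(T), tree_json(T, JSON).
% JSON = "{\"year\":2021,\"month\":3,\"day\":4}".
```

The grammar and the tree stay the same in both cases; only the code that gives the tree a meaning changes.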

And the fact that I could possibly implement any pPEG grammar as a DCG (is that what you mean by “directly realized”?) doesn’t make it a “rip-off”. By the same token, DCGs are a “rip-off” of Prolog itself. Furthermore, pPEG has an entirely different programming model, more like the direct execution model of regular expressions IMO. But, yes, I suppose it would be possible to “transpile” a PEG grammar to a DCG, which would then presumably have to be asserted before it could be used. This would be more in line with traditional grammar systems, e.g., ANTLR, which I speculated might be one of the reasons grammars have lagged far behind regular expressions in both usage and direct language support. There are probably many such options I might have chosen but did not. Instead I more closely followed the VM structure of the implementations for other programming languages (JavaScript, Python, …) because I knew they worked.
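For what it's worth, a hand-done "transpile" of two small PEG rules might look like the sketch below. It is not how pPEG works, and the rule and predicate names are invented for the example. The point is that PEG repetition and ordered choice are committed, so the DCG needs if-then-else where an ordinary DCG would leave a choice point. The last predicate shows the extra translate-and-assert step needed to install a rule at run time, using SWI-Prolog's dcg_translate_rule/2.

```prolog
% PEG:  list = item (',' item)*
csv_list --> csv_item, csv_rest.

% PEG repetition is greedy and committed: no choice point per iteration.
csv_rest --> ( ",", csv_item -> csv_rest ; [] ).

% PEG:  item = [a-z]+
csv_item --> az, azs.
azs --> ( az -> azs ; [] ).
az --> [C], { C >= 0'a, C =< 0'z }.

% PEG:  kw = 'if' / 'iff'
% Ordered choice: once 'if' matches, the 'iff' alternative is never tried.
kw_peg --> ( "if" -> [] ; "iff" ).

% The "natural" DCG version backtracks into the second alternative instead.
kw_dcg --> "if" ; "iff".

% Installing a rule at run time: translate it to a clause, then assert it.
install_rule(Rule) :-
    dcg_translate_rule(Rule, Clause),
    assertz(Clause).

% ?- phrase(csv_list, `ab,c,def`).   % true
% ?- phrase(kw_peg, `iff`).          % false
% ?- phrase(kw_dcg, `iff`).          % true
% ?- install_rule((ab --> [0'a, 0'b])), phrase(ab, `ab`).   % true
```

So a faithful translation is possible, but it is not just a matter of rewriting `=` as `-->`.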

Also note that pPEG operates on strings, not lists or arrays of characters, which could have a significant impact on memory usage and possibly even performance (not that I’m overly concerned about this right now):

?- S=`abcdefghijklmnopqrstuvwxyz`, term_size(S,Sz).
S = [97, 98, 99, 100, 101, 102, 103, 104, 105|...],
Sz = 78.

?- S="abcdefghijklmnopqrstuvwxyz", term_size(S,Sz).
S = "abcdefghijklmnopqrstuvwxyz",
Sz = 6.

Expressed as a list, how big would a 1000-line program be?
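One way to put a number on that is to measure a synthetic "program" both ways. The sketch below builds Lines copies of a Width-character line and reports term_size/2 for the code list and for the string. The predicate names and the made-up text are assumptions for illustration, and real source code will differ in line length.

```prolog
% Build a synthetic text: Lines copies of a line of Width 'x' characters,
% each followed by a newline.
synthetic_text(Lines, Width, Text) :-
    length(LineCodes, Width),
    maplist(=(0'x), LineCodes),
    string_codes(Line0, LineCodes),
    string_concat(Line0, "\n", Line),
    length(Copies, Lines),
    maplist(=(Line), Copies),
    atomics_to_string(Copies, Text).

% Report the size in cells of the same text as a code list and as a string.
compare_sizes(Lines, Width, ListCells, StringCells) :-
    synthetic_text(Lines, Width, Text),
    string_codes(Text, Codes),
    term_size(Codes, ListCells),
    term_size(Text, StringCells).

% ?- compare_sizes(1000, 40, ListCells, StringCells).
```

If the ratio from the session above holds (78 cells for 26 codes, i.e. 3 cells per code), 1000 lines of 40 characters plus a newline come to 41,000 codes and about 123,000 cells as a list, which is close to a megabyte with 8-byte cells. The string should need only a little more than one byte per character.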