DCG pack for the ninja build system - Reply 1

Never did finish the grammar.
The main reason was that I could find no documentation of a grammar that worked as needed, so it was a long process of trial and error, slowly building up the grammar.

In the end, since the purpose was to parse PDF files and Ghostscript (https://www.ghostscript.com/) did what I needed, I used that instead. It can pull the details out of a PDF and provide them in a structured format that is easier to parse. Once I knew I could use that method to grab text out of PDFs as needed, I moved on to another project.
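For anyone wanting to try the Ghostscript route, here is a minimal sketch in Python. It is not the exact invocation I used. It assumes the `gs` executable is on the PATH and uses Ghostscript's `txtwrite` device; the helper names `gs_text_command` and `extract_text` are made up for illustration. As I understand the Ghostscript docs, `-dTextFormat=0` emits an XML-like output with position information, while the default `3` gives plain UTF-8 text.

```python
import shutil
import subprocess
from pathlib import Path


def gs_text_command(pdf_path, out_path, text_format=3, gs_exe="gs"):
    """Build the argument list for a Ghostscript text-extraction run.

    Hypothetical helper: only assembles the command, does not run it.
    """
    return [
        gs_exe,
        "-q",                          # suppress startup banner
        "-dNOPAUSE",                   # do not pause between pages
        "-dBATCH",                     # exit when the input is processed
        "-dSAFER",                     # restrict file system access
        "-sDEVICE=txtwrite",           # text extraction device
        f"-dTextFormat={text_format}", # assumption: 0 = XML-like, 3 = UTF-8 text
        f"-sOutputFile={out_path}",
        str(pdf_path),
    ]


def extract_text(pdf_path, out_path, text_format=3, gs_exe="gs"):
    """Run Ghostscript and return the extracted text (hypothetical helper)."""
    if shutil.which(gs_exe) is None:
        raise FileNotFoundError(f"{gs_exe} not found on PATH")
    subprocess.run(
        gs_text_command(pdf_path, out_path, text_format, gs_exe),
        check=True,
    )
    return Path(out_path).read_text(encoding="utf-8", errors="replace")
```

Calling `extract_text("paper.pdf", "paper.txt")` would then leave the text in `paper.txt`, which is far easier to feed to a grammar than the raw PDF.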

But now that LLMs like ChatGPT are regularly being used to read PDFs, I will have to revisit the entire process. At the Open Chat GPT Plugin Store (801 plugins), use PDF as the keyword to find several ChatGPT plugins that can read a PDF so that you can then have a conversation with the PDF. Quite useful for research.

Parsing PostScript is much easier than parsing a PDF. If you dig into a PDF, it looks like an archive file of the directory of a programming project. (ref - even this is quite a bit simplified) There are, in effect, folders for each page, with IIRC each page holding PostScript, resources such as images, metadata, and so on. Most of that is not clearly documented for parsing. Most of the PostScript books are more about how to use PostScript. Adobe really does keep most of it behind closed doors, which is why Ghostscript is so nice to have.

IIRC Adobe did release some internal documents a few years ago that might be useful, but I did not look at them in detail as I had moved on to other projects. I think one had to sign a release agreement to download them.


EDIT

I uploaded the PostScript scanner and its test cases as they stood at the time I stopped working on the code. (here)

It is NOT released as open source, as I will be keeping the copyright, but it will give you some idea of how hard it is to create such code. Remember that the PDF parser is much harder; it is not included because it was in the early stages of development and has more problems than it is useful.
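To give a feel for what a PostScript scanner has to deal with, here is a toy sketch in Python. It is NOT the uploaded scanner and covers only a small subset: comments, literal names, executable names, decimal numbers, nested strings, and the `[ ] { }` delimiters. Hex strings, `<< >>` dictionaries, radix numbers such as `16#FF`, and `//name` are deliberately left out, and string escapes are kept raw rather than decoded. All names here (`scan`, `classify`) are made up for illustration.

```python
import re

WHITESPACE = " \t\r\n\f\0"
DELIMITERS = "()<>[]{}/%"

INT_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def classify(word):
    """Decide whether a bare word is an integer, a real, or a name."""
    if INT_RE.fullmatch(word):
        return ("integer", int(word))
    if REAL_RE.fullmatch(word):
        return ("real", float(word))
    return ("name", word)


def scan(src):
    """Turn PostScript source text into a list of (kind, value) tokens."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in WHITESPACE:
            i += 1
        elif ch == "%":
            # Comment: skip to end of line.
            while i < n and src[i] not in "\r\n":
                i += 1
        elif ch == "(":
            # String: parentheses may nest; backslash escapes are kept raw.
            depth, j, buf = 1, i + 1, []
            while j < n:
                c = src[j]
                if c == "\\" and j + 1 < n:
                    buf.append(src[j:j + 2])
                    j += 2
                    continue
                if c == "(":
                    depth += 1
                elif c == ")":
                    depth -= 1
                    if depth == 0:
                        break
                buf.append(c)
                j += 1
            if depth != 0:
                raise ValueError(f"unterminated string starting at offset {i}")
            tokens.append(("string", "".join(buf)))
            i = j + 1
        elif ch in "[]{}":
            tokens.append(("delim", ch))
            i += 1
        elif ch == "/":
            # Literal name: runs until whitespace or a delimiter.
            j = i + 1
            while j < n and src[j] not in WHITESPACE + DELIMITERS:
                j += 1
            tokens.append(("literal_name", src[i + 1:j]))
            i = j
        else:
            j = i
            while j < n and src[j] not in WHITESPACE + DELIMITERS:
                j += 1
            if j == i:
                # '<', '>' or a stray ')': outside this toy subset.
                raise ValueError(f"unsupported character {ch!r} at offset {i}")
            tokens.append(classify(src[i:j]))
            i = j
    return tokens
```

Even this much takes care to get right, and it says nothing about the interpreter side of PostScript, which is where the real work starts.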