Scaling to billions of facts?

I don’t know about papers. There have been some Prolog systems in the past that could store predicates on a file. IBM-Prolog? If I had to implement something like this, I’d think about the following:

  • Have a key-value store that stores the clauses by (index-)number as key and the QLF representation of the clause as value. This would need some optimization of the QLF code handling for small objects: QLF is currently a bit heavy in setting up the input/output, dealing with a header, etc.
  • Have one or more additional key-value stores that map index keys to lists of clause numbers. That can use the same technology that now implements JIT indexing for clauses.
  • Then, to run a goal:
    • Create a shadow database containing a cache of normal clauses.
    • Determine index to use from the instantiation pattern
    • Lookup the clauses in the appropriate index database
    • Execute these clauses, which means
      • Ensure the next clause is in the cache
      • Execute it.
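The steps above can be sketched in a few lines. This is not Prolog and not an actual implementation; it is a hypothetical Python mock-up using SQLite as the key-value technology, with pickle standing in for the QLF serialization, a first-argument index standing in for JIT indexing, and a plain dict as the shadow cache:

```python
import pickle
import sqlite3

class ClauseStore:
    """Sketch: one table maps clause numbers to a serialized clause
    (pickle stands in for QLF), a second table maps an index key to
    the clause numbers it selects, and a dict caches decoded clauses."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS clauses (num INTEGER PRIMARY KEY, qlf BLOB)")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS arg1_index (key TEXT, num INTEGER)")
        self.cache = {}  # shadow database of "normal" (decoded) clauses

    def assertz(self, num, head_args, body=True):
        clause = (head_args, body)
        self.db.execute("INSERT INTO clauses VALUES (?, ?)",
                        (num, pickle.dumps(clause)))
        # Index on the first argument, as JIT indexing commonly would.
        self.db.execute("INSERT INTO arg1_index VALUES (?, ?)",
                        (str(head_args[0]), num))

    def candidates(self, arg1):
        """Look up clause numbers in the index database chosen from
        the instantiation pattern (here: first argument bound)."""
        rows = self.db.execute(
            "SELECT num FROM arg1_index WHERE key = ?", (str(arg1),))
        return [num for (num,) in rows]

    def fetch(self, num):
        """Ensure the next clause is in the cache before executing it."""
        if num not in self.cache:
            (blob,) = self.db.execute(
                "SELECT qlf FROM clauses WHERE num = ?", (num,)).fetchone()
            self.cache[num] = pickle.loads(blob)
        return self.cache[num]

store = ClauseStore()
store.assertz(1, ("tom", "bob"))   # parent(tom, bob).
store.assertz(2, ("tom", "liz"))   # parent(tom, liz).
store.assertz(3, ("bob", "ann"))   # parent(bob, ann).

nums = store.candidates("tom")     # query parent(tom, X)
solutions = [store.fetch(n)[0][1] for n in nums]
print(solutions)                   # ['bob', 'liz']
```

The second query with the same index key would be served entirely from the cache, which is the point of the shadow database.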

Note that if your data is static you can probably do much better by designing a clever file format and mapping it into memory.
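To make the static-data idea concrete, here is a minimal sketch (in Python, purely illustrative): facts are written as sorted fixed-width binary records, the file is memory-mapped, and a binary search touches only a handful of pages per lookup, so the OS caches exactly the hot part of a multi-gigabyte file:

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<qq")  # fixed-width record: (key, value), two int64s

def write_facts(path, facts):
    """Write the facts sorted on key, one fixed-width record each."""
    with open(path, "wb") as f:
        for key, value in sorted(facts):
            f.write(RECORD.pack(key, value))

def lookup(mm, key):
    """Binary search over the memory-mapped records; only the pages
    the search actually touches are faulted in by the OS."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(mm, mid * RECORD.size)
        if k < key:
            lo = mid + 1
        elif k > key:
            hi = mid
        else:
            return v
    return None

path = os.path.join(tempfile.mkdtemp(), "facts.bin")
write_facts(path, [(7, 70), (3, 30), (11, 110)])
with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    print(lookup(mm, 7))   # 70
    print(lookup(mm, 5))   # None
```

A real format would of course store terms rather than integer pairs, but the access pattern is the interesting part.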

A sensible alternative is probably to distribute the clauses of the predicate over multiple Prolog instances and distribute the query. Depending on the problem that might be quite simple. You will typically lose clause ordering and committing (cuts).
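A toy illustration of that distribution scheme (Python standing in for a set of Prolog instances, which are simulated here as in-process shards): clauses are hash-partitioned on the first argument, a query fans out to every shard, and the merged answer stream comes back with no ordering guarantees, which is exactly why cuts stop working:

```python
from itertools import chain

N_SHARDS = 4  # stand-ins for separate Prolog instances

# Hash-partition the clauses of parent/2 over the shards.
facts = [("tom", "bob"), ("tom", "liz"), ("bob", "ann"), ("bob", "pat")]
shards = [[] for _ in range(N_SHARDS)]
for head in facts:
    shards[hash(head[0]) % N_SHARDS].append(head)

def query(arg1=None):
    """Fan the query out to every shard and merge the answers.
    The merged stream has no guaranteed clause order, and a cut in
    the original predicate could not commit across shard boundaries."""
    def shard_answers(shard):
        return [h for h in shard if arg1 is None or h[0] == arg1]
    return list(chain.from_iterable(shard_answers(s) for s in shards))

print(sorted(query("tom")))  # [('tom', 'bob'), ('tom', 'liz')]
```

Partitioning on the first argument means a query with that argument bound could in fact be routed to a single shard, which is the "quite simple" case; unbound queries must visit them all.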
