Facts vs Local knowledge

meditans · June 13, 2023, 1:37pm

I’m currently working on a Prolog project and have run into a conundrum. While storing a knowledge base as facts is very efficient (thanks to indexing), I’m interested in using ‘local knowledge bases’ stored in a list or ordset for the sake of organization. However, this seems to come at the expense of efficiency. Is there a way to achieve both efficiency and organization in my Prolog program?

brebs · June 13, 2023, 2:39pm

Probably yes. There are other datastructure types, e.g. rbtrees: SWI-Prolog -- Manual

Or assert/1

Or tabling: SWI-Prolog -- Manual

Optimization/efficiency is highly problem-specific - so, what are the details of the problem?

anon419083 · June 13, 2023, 2:56pm

I don’t know exactly what you mean, but relational databases are an interesting and efficient way to store and retrieve information, and Prolog is particularly suited to this kind of job. More recently, I’ve seen that new data structures such as records and dictionaries have been introduced, but I don’t know them very well. It depends on what you want to do, I think.

meditans · June 13, 2023, 5:02pm

Thanks, let me phrase my problem better:

I’ve been working on a program that deals with a list of changing facts. Each fact contains a UUID, some additional metadata, and a compound term whose functor can have different numbers of arguments. Here’s an example:

fact(UUID_1, OtherStuff_1, variant1(Thing1)).
...
fact(UUID_2, OtherStuff_2, variant2(Thing1, Thing2)).
...
fact(UUID_3, OtherStuff_3, variant3(Thing1, Thing2, Thing3)).
...

and my queries are things like:

fact(_, _, variant2(A, B)).

I have many collections of these facts, and I would prefer to encapsulate them
as ord_sets. However, querying these ord_sets using member/2 becomes
inefficient since the first argument is a UUID.

[fact(UUID_1, OtherStuff_1, variant1(Thing1)),
 ...
 fact(UUID_2, OtherStuff_2, variant2(Thing1, Thing2)),
 ...
 fact(UUID_3, OtherStuff_3, variant3(Thing1, Thing2, Thing3)),
 ...
]

I am looking for a way to get the encapsulation of the last example with the efficiency of the first, without manually writing my own indexes or relying on manual argument order. Additionally, I am not interested in using a relational DB since my program has a lot of compound terms, and I hope to achieve deep indexing (terminology I took from the YouTube video “Argument Indexing in Prolog”).

I can however phrase the problem as “monotonic” in the spirit of event sourcing (if a fact is no longer true, I can just invalidate it explicitly). Also, as I do a lot of repeated queries, I think some variant of tabling would be useful. I have discovered that I can table dynamic monotonic predicates, and I am considering this approach.

I would greatly appreciate any insight or recommendations!
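The monotonic-tabling idea mentioned above could look roughly like the following. This is only a sketch: the wrapper predicate variant2/2 is a hypothetical name, and the `as monotonic` declarations follow my reading of the SWI-Prolog monotonic tabling documentation, so they are worth checking against the manual for the version in use.

```prolog
% Assumption: fact/3 is only ever extended with assertz/1.
% Retracting clauses invalidates the dependent tables.
:- dynamic fact/3 as monotonic.
:- table variant2/2 as monotonic.

% variant2(?A, ?B): hypothetical tabled view over all facts whose
% third argument is a variant2/2 term.
variant2(A, B) :-
    fact(_, _, variant2(A, B)).
```

After `assertz(fact(u1, m1, variant2(a, b)))`, a query such as `variant2(X, Y)` is answered from the table, and later assertions are expected to extend the table rather than force the query to be recomputed from scratch.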

peter.ludemann · June 13, 2023, 5:24pm

Ordsets require O(n) for lookup. Rbtrees require O(log n). If you don’t like how rbtrees display, it’s easy to fix that using portray/1.

You didn’t say how many items you’re dealing with. I have some code that has over 50,000 items and lookup is not the most expensive thing (confirmed using profile/2).

Also, you didn’t say what operations you’re doing. Do you need set operations (intersection, etc.) or just lookup?

meditans · June 13, 2023, 5:40pm

Here’s what I don’t understand about the rbtree argument. I hope you can clear up my misconception.

rbtrees are O(log n) if I’m using the right key to walk the rbtree. If I have a term like:

fact(UUID_1, OtherStuff_1, variant1(Thing1)).
fact(UUID_2, OtherStuff_2, variant2(Thing1, Thing2)).
fact(UUID_3, OtherStuff_3, variant3(Thing1, Thing2, Thing3)).

and I want:

fact(_, _, variant2(ParticularThing, X)).

then, as the functor is the same for all the terms, the rbtree is effectively indexed on UUIDs, since they are the first argument, no? If that’s the case, I won’t reach the response I want in O(log n). Am I missing something?

As for the other questions, I don’t have specific numbers, but I hope I could get to 100000 items, and it’s just lookup, no intersection or union.

peter.ludemann · June 13, 2023, 6:39pm

If you have multiple keys, then you’ll need multiple rbtrees (they can all have the same values, just accessed by different keys). That’s no different from ordsets; it’s just that you can use member/2 to look up in an ordset by a non-key, whereas rbtrees would require rb_in/3, rb_visit/2, or rb_map/2.
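A minimal sketch of the "multiple rbtrees over the same values" idea, applied to the fact/3 terms from earlier in the thread. The predicate names build_indexes/3, add_fact/3 and facts_with_variant/3 are hypothetical, as is the choice to key the second tree on the Name/Arity of the third argument.

```prolog
:- use_module(library(rbtrees)).
:- use_module(library(apply)).
:- use_module(library(lists)).

% build_indexes(+Facts, -ByUuid, -ByVariant)
% ByUuid maps UUID -> fact. ByVariant maps the Name/Arity of the
% third argument -> list of facts with that variant.
build_indexes(Facts, ByUuid, ByVariant) :-
    rb_empty(U0),
    rb_empty(V0),
    foldl(add_fact, Facts, U0-V0, ByUuid-ByVariant).

add_fact(Fact, U0-V0, U-V) :-
    Fact = fact(UUID, _, Variant),
    rb_insert(U0, UUID, Fact, U),
    functor(Variant, Name, Arity),
    (   rb_lookup(Name/Arity, Old, V0)
    ->  true
    ;   Old = []
    ),
    rb_insert(V0, Name/Arity, [Fact|Old], V).

% facts_with_variant(+ByVariant, +Pattern, -Fact)
% Pattern must be a nonvar term such as variant2(A, B). One O(log n)
% lookup selects the bucket, then the bucket is scanned linearly.
facts_with_variant(ByVariant, Pattern, Fact) :-
    functor(Pattern, Name, Arity),
    rb_lookup(Name/Arity, Facts, ByVariant),
    member(Fact, Facts),
    Fact = fact(_, _, Pattern).
```

Both trees hold the same fact terms, so the extra cost is the tree structure itself. The scan inside a bucket is still linear, so this only helps when the facts are spread over many variants.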

It’s also annoying to pass around a pair of rbtrees (or multiple pairs). This can be avoided using EDCGs.

For an example of using rbtrees: pykythe/pykythe/pykythe_symtab.pl at 8e9aa1f938a129c1a61c8c1da6b0ed12d4d7b648 · kamahen/pykythe · GitHub … I use EDCGs to pass the symbol table around in pykythe.pl, but it’s moderately complex code, so probably not the best example. (I wrote pykythe_symtab.pl when I wasn’t sure which storage technique I would use; benchmarking showed that for my use case, rbtrees were best, but library(assoc) or even the builtin “dicts” might be better for your use case.)

Boris · June 14, 2023, 4:37am

You should maybe normalize your data first. At that point you might get the JIT indexing to work correctly. The other option is to make your own index but I would first try the obvious solution.

peter.ludemann · June 14, 2023, 4:55am

Thinking about this a bit more …

The persistent predicates project currently allows only a single key, but it shouldn’t be too difficult to modify it to allow multiple keys (similar to just-in-time indexing of clauses).

And a similar approach should work for rbtrees. I’ll try implementing it - and after doing this, I’d do something similar for persistent predicates.

Let’s call the new data structure “mkrbtree” (for “multi-key rbtree”). With each mkrbtree, there would be a list of keys and an associated predicate that defines which key to use. For example, if we’re storing triples node(From,Edge,To), then we might define

data_index(node(From,_,_),  Index, Key), ground(From)    => Index = 1, Key=From.
data_index(node(_,Edge,To), Index, Key), ground(Edge-To) => Index = 2, Key=Edge-To.
data_index(node(_,_,To),    Index, Key), ground(To)      => Index = 3, Key=To.
data_index(node(_,Edge,To), Index, Key), ground(Edge)    => Index = 2, Key=Edge-To.
data_index(node(_,_,_),     Index, _)                    => Index = 0.

which would define 3 indexes: on From; on a combination of Edge and To; and on To.

So, with this data:

[node(n1, road, n2),
 node(n1, train, n3),
 node(n2, road, n3),
 node(n3, road, n4)]

there would be three keyed lists (stored as rbtrees) plus one unkeyed list:

1-[n1-[node(n1,road,n2), node(n1,train,n3)],
   n2-[node(n2,road,n3)],
   n3-[node(n3,road,n4)]]
2-[(road-n2)-[node(n1,road,n2)],
   (train-n3)-[node(n1,train,n3)],
   (road-n3)-[node(n2,road,n3)],
   (road-n4)-[node(n3,road,n4)]]
3-[n2-[node(n1,road,n2)],
   n3-[node(n2,road,n3), node(n1,train,n3)],
   n4-[node(n3,road,n4)]]
0-[node(n1,road,n2), node(n1,train,n3), node(n2,road,n3), node(n3,road,n4)]

mkrb_lookup/3 would call data_index/3 to determine which index to use, then call rb_lookup/3 on the appropriate keyed list; if no key matches, it would fall back to member/2 on the unkeyed list. Similarly, mkrb_insert/4 would add to all the keyed lists (using rb_insert/4) plus the unkeyed list.

This looks like a lot of data duplication, but the nodes would be shared amongst all the lists, so the only extra space would be for the indexes.

Anyway, this is a rough idea of how multiple indexes could be used with key-value data to do lookup as efficiently as clause lookup with indexing. Does this seem reasonable? (Some small details would change, of course; in particular, data_index/3 would be a bit different, to allow using both for lookup and insert, probably using once/1 instead of SSU.)
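As a rough illustration of the idea above, here is one way it could be prototyped with library(rbtrees). This is a sketch under assumptions, not the proposed implementation: the names index_key/3, choose_index/3 and the mk/4 wrapper are hypothetical, the arities differ from the mkrb_lookup/3 and mkrb_insert/4 described above, and a single index_key/3 table stands in for data_index/3 so it can serve both insert and lookup.

```prolog
:- use_module(library(rbtrees)).
:- use_module(library(apply)).
:- use_module(library(lists)).

% index_key(?Index, +Node, -Key): the key of Node under each index.
index_key(1, node(From, _, _),  From).
index_key(2, node(_, Edge, To), Edge-To).
index_key(3, node(_, _, To),    To).

% An mkrbtree is mk(Tree1, Tree2, Tree3, All): one rbtree per index
% plus the unkeyed list of all nodes.
mkrb_empty(mk(T1, T2, T3, [])) :-
    rb_empty(T1), rb_empty(T2), rb_empty(T3).

% mkrb_insert(+Node, +MK0, -MK): Node must be ground.
mkrb_insert(Node, mk(A0, B0, C0, All), mk(A, B, C, [Node|All])) :-
    add_to_index(1, Node, A0, A),
    add_to_index(2, Node, B0, B),
    add_to_index(3, Node, C0, C).

add_to_index(Index, Node, T0, T) :-
    index_key(Index, Node, Key),
    (   rb_lookup(Key, Old, T0) -> true ; Old = [] ),
    rb_insert(T0, Key, [Node|Old], T).

mkrb_from_list(Nodes, MK) :-
    mkrb_empty(MK0),
    foldl(mkrb_insert, Nodes, MK0, MK).

% choose_index(+Pattern, -Index, -Key): the first index whose key is
% fully instantiated in Pattern.
choose_index(Pattern, Index, Key) :-
    index_key(Index, Pattern, Key),
    ground(Key),
    !.

% mkrb_lookup(+MK, ?Pattern): enumerate stored nodes unifying with
% Pattern, using an index when one applies, else scanning all nodes.
mkrb_lookup(mk(A, B, C, All), Pattern) :-
    (   choose_index(Pattern, Index, Key)
    ->  arg(Index, mk(A, B, C), Tree),
        rb_lookup(Key, Candidates, Tree)
    ;   Candidates = All
    ),
    member(Pattern, Candidates).
```

With the four nodes from the example, `mkrb_lookup(MK, node(F, _, n3))` would go through index 3, while `mkrb_lookup(MK, node(_, E, _))` would fall back to the unkeyed list. The node terms are shared between the trees, so only the tree structure is duplicated.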

meditans · June 14, 2023, 8:58am

Great suggestions, thank you! May I ask a follow-up question? Is the argument indexing computational behaviour the same for:

Facts
Dynamic facts
Facts issued with library(persistency)?

swi · June 27, 2023, 11:43am

If your goal is ‘organization’, you could consider organizing facts using modules, e.g. one module for each ‘local knowledge base’ as you called it. If your local knowledge bases are a pre-defined set then you can assign one module to each; but you can also create modules dynamically using assertz/1.

This use of modules is fairly common: the module represents a ‘world’ of knowledge.
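A small sketch of the module-per-knowledge-base approach, using the fact/3 shape from earlier in the thread. The helper names kb_create/1, kb_add/4 and kb_fact/4 and the module names kb_a and kb_b are made up for illustration.

```prolog
% kb_create(+KB): declare fact/3 as dynamic in module KB, so that
% querying a knowledge base with no facts fails instead of raising
% an existence error.
kb_create(KB) :-
    dynamic(KB:fact/3).

% kb_add(+KB, +UUID, +Meta, +Variant): add a fact to one knowledge base.
kb_add(KB, UUID, Meta, Variant) :-
    assertz(KB:fact(UUID, Meta, Variant)).

% kb_fact(+KB, ?UUID, ?Meta, ?Variant): query one knowledge base.
% The clauses are ordinary dynamic facts, so normal clause indexing applies.
kb_fact(KB, UUID, Meta, Variant) :-
    KB:fact(UUID, Meta, Variant).
```

For example, after `kb_create(kb_a)` and `kb_add(kb_a, u1, m, variant2(x, y))`, the query `kb_fact(kb_a, _, _, variant2(A, B))` only sees the facts stored in kb_a.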
