How to express a disjunction as an argument so the predicate can be indexed on that argument

ericzinda · May 25, 2022, 2:04am

Sorry for the long post, it seems like getting the background down here would be helpful:

I’ve got a database of about 3000 triples that represent what I think is a relatively classic “Knowledge Representation and Reasoning” database that describes a world and allows the user to walk around and examine it. It’s an interactive fiction type game. The data is stored in triples like rel(player, isa, person) ala RDF.

The current performance is not great and I also need the knowledge base to be able to be much bigger. So I’m looking for ways to limit the raw data the queries need to chug through since that is one of the limiting factors.

One thing I’ve noticed is that, analogously to rendering graphics in a 3D environment, a lot of the knowledge is invisible depending on the user’s location. By this I mean that if I were to completely remove those “invisible” triples from the database, the queries would all return the same results and the experience would be identical. However, right now these triples are just sitting there being retrieved, running through queries that fail, and being subsequently ignored…and burning performance.

For example:

The player is in the Conservatory and they ask any question about the kitchen. Since that is another room you can’t see, they will all fail.
There is a box of 50 different toys in the bedroom. When the player is in the living room, we shouldn’t have to cycle through all of these toys when they ask “what can I see?”, since they are “out of scope” and nothing you can say or ask about them will succeed.

etc.

To reduce the amount of irrelevant information SWI Prolog has to fish through, I’m experimenting with changing the lowest level data interface predicate I have, rel(Subject, Relationship, Object), to remove some of the obviously useless information based on the player’s location. I do this by:

Adding an extra Shard argument to the raw data predicate I’m using: store(X, Rel, Y, Shard). It is acting like “Sharding” in other data stores. If a piece of information should be everywhere, Shard is _. Otherwise, it is an atom that only matches when the user is in that Shard.
Before running a query the caller calls setShard(Name) where Name is an atom representing the location of the user. (I realize I could be passing an argument through, but that involves a ton of code changes at this point and I’m trying to keep the experiment simple…)

setShard(Name) :-
    b_setval(shard, Name).

getShard(Name) :-
    (   nb_current(shard, Name)
    ->  true
    ;   true
    ).

rel(X, Rel, Y) :-
    getShard(Shard),
    store(X, Rel, Y, Shard).

% Data to be "sharded": store(X, Rel, Y, Shard)
store(toy1, locatedIn, bedroom, bedroom).
store(toy2, locatedIn, bedroom, bedroom).
store(table, locatedIn, livingRoom, livingRoom).

Then you could use it like:

?- setShard(livingRoom), rel(X, locatedIn, livingRoom).
X = table.

My problem is this: Some pieces of information need to show up in a couple locations but I’d like to get the benefits of indexing on store/4. Setting Shard to _ for that data is unnecessarily broad but setting it to a single atom is incorrect (too narrow). I’d like to logically set it to something like this pseudocode:

store(visibleInBoth, locatedIn, once((Shard == livingRoom; Shard == bedroom)) ).

… but I can’t figure out a way to do that using just unification so I can keep the JIT indexing of store/4.

Any ideas?

peter.ludemann · May 25, 2022, 6:01am

Is there a reason you can’t do this (I’m a bit confused by your preiates, so I might have something wrong here):

store(visibleInBoth, locatedIn, livingRoom).
store(visibleInBoth, locatedIn, bedroom).

For removing duplicates, use setof/3:

store_(Visible, Located, Shard) :-
    setof(V-L, store(V, L, Shard), VLs),
    member(Visible-Located, VLs).

brebs · May 25, 2022, 6:47am

Change your data representation to a better one - could write a bit of code to create e.g. can_see_in_room(convervatory, ...) facts using assert/1 at program startup, and then refer to those faster facts.

… or use table/1

jan · May 25, 2022, 9:02am

You could set your shared to a constraint variable, as in

 freeze(Location, memberchk(Location, [x,y])),

You could also duplicate the information. That might be a good solution if that doesn’t blow up the database too much.

At a higher level, can’t you solve the problem by reordering the queries, So, when looking for things in the living room, start with the triples about the living room and then do the others. That is also what happens in the SWI-Prolog SPARQL engine that is part of ClioPatria. This uses statistics on the database to estimate the complexity of executing a basic graph pattern (as SPARQL calls a pattern of triples) and selects the fastest one. This works pretty well.

If you can easily select everything that is relevant you could also consider creating a small database of facts that are relevant to your current state and objectives.

ericzinda · May 25, 2022, 7:08pm

Thanks for all the ideas! These are great.

peter.ludemann:

Is there a reason you can’t do this (I’m a bit confused by your preiates, so I might have something wrong here):
store(visibleInBoth, locatedIn, livingRoom).
store(visibleInBoth, locatedIn, bedroom).

That is an option, but I was worried that it would increase the number of facts too much. I need to test to see if it matters. Thanks @peter.ludemann!

I have been using table/1 liberally, but I can’t use it everywhere (it actually slows things down sometimes since some data is changing as you play the game). I’m now at the point where I’m doing tuning for the places where I can’t use it.

@brebs, can you tell me more about why can_see_in_room(convervatory, ...) is faster? Are you implying the predicate is a higher level predicate? or is there something about matching compound predicate names that is faster? Or?

With this approach wouldn’t the engine effectively run this query (bypassing any index on the Location argument?):

rel(visibleInBoth, locatedIn, Location),
memberchk(Location, [x,y]).

I have done a little of that so far (also based on your suggestion!), but you may be right that a more radical application of that approach is the solution. The trick with this one is that my queries are trees because some quantifier terms (like ‘some’, ‘many’, etc) do interesting things like counting the number of answers to a predicate before executing another. Something like:

% Get some rocks.
some(X, rock(X), get(X))

some/3 will make sure there are more than 2 answers to rock(X) before calling get(X), etc. I’m sure there is a way to change the execution model but it may be pretty radical.

This is an interesting idea: basically just make a copy of the limited data set. Hmm. The only problem I see is that the queries have side effects (since they are sometimes things like “move north” that change state) and I’d have to make sure that gets done in the right place, etc. I think about that one some more.

Thanks everyone!

brebs · May 25, 2022, 8:30pm

Faster because of indexing: SWI-Prolog -- Manual

If the player is in the conservatory, and conservatory is the first argument (i.e. indexed) on the facts you are looking up as applicable to that room, then the non-conservatory facts will be skipped quickly due to indexing.

Topic		Replies	Views
Double counting in findall disjunctive Help!	8	445	April 5, 2022
Storing and Querying Object Tuples with Prolog Help!	2	1046	January 10, 2020
How to write data for fast indexing? Help!	2	640	August 12, 2019
Why the answer contains a `false`? Predicate discussion , how-to	13	2500	May 26, 2022
Forall/2, foreach/2, double negation, relational division and SQL Nice to know database , sql	5	884	June 3, 2022

How to express a disjunction as an argument so the predicate can be indexed on that argument

Related Topics