Facts vs Local knowledge

Thinking about this a bit more …

The persistent predicates project currently allows only a single key but it shouldn’t be too difficult to modify it allow multiple keys (similar to just-in-time indexing of clauses).

And a similar approach should work for rbtrees. I’ll try implementing it - and after doing this, I’d do something similar for persistent predicates.

Let’s call the new data structure “mkrbtree” (for multi-key-rbtree"). With each mkrbtree, there would be a list of keys and an associated predicate that defines which key to use. For example, if we’re storing triples node(From,Edge,To), then we might define

data_index(node(From,_,_),  Index, Key), ground(From)    => Index = 1, Key=From.
data_index(node(_,Edge,To), Index, Key), ground(Edge-To) => Index = 2, Key=Edge-To.
data_index(node(_,_,To),    Index, Key), ground(To)      => Index = 3, Key=To.
data_index(node(_,Edge,To), Index, Key), ground(Edge)    => Index = 2, Key=EdgeTo.
data_index(node(_,_,_),     Index, _)                    => Index = 0. 

which would define 3 indexes: on From; on a combination of Edge and To; and on To.

So, with this data:

[node(n1, road, n2),
 node(n1, train, n3),
 node(n2, road, n3),
 node(n3, road, n4)]  

there would be three keyed lists (stored as rbtrees) plus one unkeyed list:

1-[n1-[node(n1,road,n2), node(n1,train,n2)], 
   n2-[node(n2,road,n3)], 
   n3-[node(n3,road,n4)]]
2-[(road-n2)-[node(n1,road,n2)], 
   (train-n2)-[node(n1,train,n2)], 
   (road-n3)-[node(n2,road,n3), 
   (road-n4)-[node(n3,road,n4)]]
3-[n2-[node(n1,road,n2],
   n3-[node(n2,road,n3),node(n2,train,n2)], 
   n4-[node(n3,road,n4)]]
0-[node(n1,road,n2), node(n1,train,n2), node(n2,road,n3), node(n3,road,n4)] 

mkrb_lookup/3 would call data_index/2 to determine which index to use, then call rb_lookup/3 on the appropriate keyed list; if no key matches, then use member/3 on the unkeyed list. Siimlarly, mkrb_insert/4 would add to all the keyed lists (using rb_insert/4) plus to the unkeyed list.

This looks like a lot of data duplication, but the nodes would be shared amongst all the lists, so the only extra space would be for the indexes.

Anyway, this is a rough idea of how multiple indexes could be used with key-value data to do lookup as efficiently as clause lookup with indexing. Does this seem reasonable? (Some small details would change, of course; in particular, data_index/3 would be a bit different, to allow using both for lookup and insert, probably using once/1 instead of SSU.)