Speeding up "ajdacent" fact lookup

Hello,

Consider a graph, such as found in the semantic library where nodes and links can be ascribed a type.

Suppose in a graph there is a node that has several outgoing links of different types – e.g. my_business_contact, could be one of them, and that these are stored as so:

link(my_business_contact, node_1, node_2).

where node_1 happens to represent me and node_2 a business contact.

Suppose i want to find all of my business contacts – this can be accomplished by a findall/3 as so:

findall(Contact, link(my_business_contact, node1, Contact), Outgoing_Contacts).

I now wonder how lookup and (compound?) indexing in this case works, what is the lookup pattern for forall/3, as it backtracks over link/3, and finds all Outgoing_Contacts.

Suppose the case that:

  • There are many nodes and links in the system.
  • That for each source node (node1) there are typically less than 10 outgoing typed links,such as business contacts, but could be many more
  • That there are less than 10 link types such as business_contacts, close_friends, etc.

An additional case would be to identify all incoming link types such as so:

findall(Contact, link(my_business_contact, Contact, node2), Incoming_Contacts).

In this case, the metrics could be different – a node could have many more incoming link types, but some could have less.

Overall my question is: in what relational structure, a forall/2 lookup + indexing would be most performant.

thanks,

Dan

A link/3 predicate with atomic values should work pretty well. It will probably create a joined hash for the 1st and 2nd argument. The main overhead is likely to be findall/3. In most reasoning you should be able to avoid that and just use

link(my_business_contact, node_1, To),
...

In any case, the way to find out is to work on real data or, if you do not have that yet, generate synthetic data and see what happens when implementing the various algorithms you need.

Thank you.

So, a combined hash essentially means that a hash is calculated based on <link, my_business_contact, node_1> and, findall, then find each binding for To, via a linear scan linked to the joint hashtable “bucket”.

Is this correct?

Btw, it looks like i can’t avoid the findall/3 since i have to do some processing of the set i obtain in addition to processing each individual items.

But, i am curious, why would findall/3 have a significant overhead vs. the usual backtracking based reasoning route for identifying the To item.

Dan

Except that the predicate is not part of the hash. The predicate creates a hash table that combines the first two arguments. As that will be the most selective table it has, a call will try to create a hash from the first two arguments (fails if one is unbound), do a lookup and scan the bucket.

Besides the backtracking you need to do anyway, you need to store the results into some intermediate database that keeps the results while backtracking (i.e., something global) and on completion you need to get them out and put them into a list. As the store is in non-backtrackable memory we must watch for exceptions during findall/3 as we need to reclaim the store in that case. That all is a lot of work. If you need the set all the time and your graph is mostly static, it is probable better to store the set of reachable targets in the relation. That is a lot more admin to update the graph and is less flexible. As a compromise you could cache the sets, either by hand or using tabling.

My intuition says you’re doing something wrong if you need the set for most operations. It isn’t very Prolog like.

1 Like

Yea, i started to wonder about this …