Scaling to billions of facts?

When I first played with NoSQL, specifically Neo4j, I had to use the same restraint I used when learning Prolog: I had to resist the urge to take what I knew about SQL databases and expect something similar from NoSQL. Learning Neo4j by accessing an embedded database through the Java API, instead of through the query language, made it obvious that indexes are only really needed to find the starting node. Once that node is found, everything else is done by following hard-coded links (relationships). If the data is organized as relationships, then a few relationship steps from one starting node can reach most other nodes in a fraction of a second. Walking a relationship is much faster than doing a relational join, because it follows a stored link instead of matching keys across tables.
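To make the "index only for the starting node" point concrete, here is a minimal in-memory sketch in Java. It is not the Neo4j API: `GraphNode`, `ToyGraph`, `link`, `lookup`, and `walk` are hypothetical names chosen for illustration. The map is consulted once to find the start node, and every later step just follows object references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "follow the links", NOT the Neo4j API.
class GraphNode {
    final String name;
    // Direct references to neighbouring nodes: the "hard-coded links".
    final List<GraphNode> neighbours = new ArrayList<>();

    GraphNode(String name) { this.name = name; }
}

class ToyGraph {
    // The only index: used to find a starting node by name.
    private final Map<String, GraphNode> index = new HashMap<>();

    GraphNode node(String name) {
        return index.computeIfAbsent(name, GraphNode::new);
    }

    void link(String from, String to) {
        node(from).neighbours.add(node(to));
    }

    GraphNode lookup(String name) {
        return index.get(name);
    }

    // Walk exactly `steps` hops from `start` by following references only;
    // no index lookups or joins happen during the walk (duplicates are not removed).
    static List<GraphNode> walk(GraphNode start, int steps) {
        List<GraphNode> frontier = List.of(start);
        for (int i = 0; i < steps; i++) {
            List<GraphNode> next = new ArrayList<>();
            for (GraphNode n : frontier) {
                next.addAll(n.neighbours);
            }
            frontier = next;
        }
        return frontier;
    }
}

class IndexFreeDemo {
    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        g.link("org", "deptA");
        g.link("org", "deptB");
        g.link("deptA", "alice");
        g.link("deptB", "bob");
        // One index lookup, then two relationship hops.
        for (GraphNode n : ToyGraph.walk(g.lookup("org"), 2)) {
            System.out.println(n.name); // prints alice, then bob
        }
    }
}
```

The cost of each hop depends only on how many links the current nodes have, not on the total size of the data, which is the intuition behind relationships beating joins.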

While I was amazed at how fast that was, and at the amount of data Neo4j can handle because it is all stored in files, its query language was not as powerful as Prolog. So, as I noted, one of my possible solutions is to reimplement the basics of Prolog so that queries work on the Neo4j data through the Neo4j API. If you look at those example facts, they are all of the form functor(id,value). Each one becomes either a property of the node or a hard-coded relationship to another node. Even if I can't get Prolog queries working right away with the Neo4j API, I can still access large amounts of data quickly using the Java API.
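A rough sketch of how functor(id,value) facts might be split into node properties and relationships. The classification rule is an assumption, not something Neo4j does for you: if the value is itself the id of a node, the fact becomes a relationship; otherwise it becomes a property. `Fact`, `FactNode`, and `FactLoader` are hypothetical, and everything stays in memory rather than going through the Neo4j API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A Prolog-style fact functor(Id, Value), e.g. name(p1, 'Alice') or father(p1, p2).
final class Fact {
    final String functor, id, value;

    Fact(String functor, String id, String value) {
        this.functor = functor;
        this.id = id;
        this.value = value;
    }
}

// Hypothetical node: properties plus named relationships to other nodes.
final class FactNode {
    final String id;
    final Map<String, String> properties = new HashMap<>();
    final Map<String, List<FactNode>> relationships = new HashMap<>();

    FactNode(String id) { this.id = id; }
}

final class FactLoader {
    static Map<String, FactNode> load(List<Fact> facts) {
        Map<String, FactNode> nodes = new HashMap<>();
        // Pass 1: every first argument (Id) becomes a node.
        for (Fact f : facts) {
            nodes.computeIfAbsent(f.id, FactNode::new);
        }
        // Pass 2 (assumed rule): a value naming a known node becomes a
        // relationship; any other value becomes a property on the node.
        for (Fact f : facts) {
            FactNode subject = nodes.get(f.id);
            FactNode target = nodes.get(f.value);
            if (target != null) {
                subject.relationships.computeIfAbsent(f.functor, k -> new ArrayList<>()).add(target);
            } else {
                subject.properties.put(f.functor, f.value);
            }
        }
        return nodes;
    }
}

class FactLoaderDemo {
    public static void main(String[] args) {
        List<Fact> facts = List.of(
                new Fact("name", "p1", "Alice"),
                new Fact("name", "p2", "Bob"),
                new Fact("father", "p1", "p2")); // read here as: p1 is the father of p2
        Map<String, FactNode> nodes = FactLoader.load(facts);
        FactNode p1 = nodes.get("p1");
        System.out.println(p1.properties);                             // {name=Alice}
        System.out.println(p1.relationships.get("father").get(0).id);  // p2
    }
}
```

A real loader would decide per functor (from a schema) rather than by guessing from the value, since a property value could happen to match a node id.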

Here is an example of a real-world website backed by a Neo4j database that stores lots of data and is fast to navigate (Reactome).

I am not getting what you mean by that. If you mean that you have two nodes and can easily go from one to the other via a relationship (father(A,B)), how do you do the reverse, i.e. go from B to A? If that is the question, then, again using Neo4j as an example: you have a node (B) and know a relationship type (father), so ask the node for its relationships of that type; each relationship returned holds a pair of nodes. Don't think of the relationship as father(A,B); think of it as (A <-father-> B). Of the two nodes, the one that is not the current node is the other node. In Neo4j every relationship is stored with a direction (a start node and an end node), and a traversal can ask for outgoing (->), incoming (<-), or both directions, so checking the direction tells you whether you are walking it in reverse. Even though it is stored as -> or <-, the relationship is still there for both nodes to retrieve.
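The "pick the node that is not the current one" trick can be sketched in plain Java. `PNode`, `Rel`, `other`, `isReverseFrom`, and `Traverse.neighbours` are hypothetical names for illustration; in the real Neo4j Java API the comparable calls live on `Relationship` (for example `getOtherNode(Node)`), so check its documentation before relying on this shape. Here each relationship keeps a stored direction (start and end), is registered on both nodes, and can be walked from either side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT the Neo4j API.
final class PNode {
    final String name;
    final List<Rel> rels = new ArrayList<>();

    PNode(String name) { this.name = name; }
}

final class Rel {
    final String type;
    final PNode start; // A in father(A, B)
    final PNode end;   // B in father(A, B)

    Rel(String type, PNode start, PNode end) {
        this.type = type;
        this.start = start;
        this.end = end;
        // Register the relationship on BOTH nodes so either side can find it.
        start.rels.add(this);
        end.rels.add(this);
    }

    // Of the two nodes, return the one that is not the current node.
    PNode other(PNode current) {
        if (current == start) return end;
        if (current == end) return start;
        throw new IllegalArgumentException(current.name + " is not on this relationship");
    }

    // True when walking from `current` goes against the stored direction.
    boolean isReverseFrom(PNode current) {
        return current == end;
    }
}

final class Traverse {
    // Follow every relationship of the given type from `from`, in either direction.
    static List<PNode> neighbours(PNode from, String type) {
        List<PNode> out = new ArrayList<>();
        for (Rel r : from.rels) {
            if (r.type.equals(type)) {
                out.add(r.other(from));
            }
        }
        return out;
    }
}

class ReverseWalkDemo {
    public static void main(String[] args) {
        PNode a = new PNode("A");
        PNode b = new PNode("B");
        new Rel("father", a, b); // stored as A -> B
        // Start from B and follow "father" backwards to A.
        for (Rel r : b.rels) {
            System.out.println(r.other(b).name + " reverse=" + r.isReverseFrom(b)); // A reverse=true
        }
    }
}
```

Because the relationship is registered on both nodes, going from B back to A costs the same as going from A to B; the stored direction only matters if you choose to check it.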

Normally most people access Neo4j through its query language (Cypher), which can do the same, but doing it in code showed me the details that cleared up the confusion.