Scaling to billions of facts?

EricGT · May 21, 2019, 12:35am

When I first played with NoSQL, specifically Neo4j, I had to use the same restraint I needed when learning Prolog: I had to resist the urge to take what I knew about SQL databases and expect something similar from NoSQL. Learning Neo4j by reading a database with embedded Java, instead of via the query language, made it obvious that indexes are only really needed to find the starting node. Once that node is found, everything else is done by following hard-coded links (relationships). If you have an outline of how the data is organized as relationships, then from one starting node you can also reach most other starting nodes in a few relationship steps, in a fraction of a second. Walking a relationship is so much faster than doing a relational join.

While I was amazed at how fast that was, and at the amount of data Neo4j could handle because it is all stored in files, the query language was not as powerful as Prolog. So, as I noted, one of my possible solutions is to reimplement the basics of Prolog so that queries work on the Neo4j DB data through the Neo4j API. If you look at those example facts, they are all of the form functor(id,value). Each one becomes either a property of a node or a relationship to another node, which is a hard-coded link. Even if I can't get Prolog queries working with the Neo4j API right away, I can still access large amounts of data quickly using the Java API.
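To make the functor(id,value) mapping concrete, here is a minimal sketch in plain Java. It is not the Neo4j API: `FactGraph`, `Node`, `Edge`, `addPropertyFact` and `addRelationFact` are hypothetical names made up for illustration, and the `HashMap` only stands in for the index that finds the starting node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a property graph; NOT the Neo4j API.
class FactGraph {
    // One node per fact id; literal values are kept as properties.
    static final class Node {
        final String id;
        final Map<String, String> properties = new HashMap<>();
        final List<Edge> edges = new ArrayList<>();
        Node(String id) { this.id = id; }
    }

    // A typed link between two nodes, registered on both of them.
    static final class Edge {
        final String type;
        final Node start;
        final Node end;
        Edge(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Plays the role of the index: only used to find a starting node.
    private final Map<String, Node> index = new HashMap<>();

    // Look up a node by id, creating it on first use.
    Node node(String id) {
        return index.computeIfAbsent(id, Node::new);
    }

    // functor(id, value) with a literal value becomes a node property.
    void addPropertyFact(String functor, String id, String value) {
        node(id).properties.put(functor, value);
    }

    // functor(id, otherId) where the value names another node becomes a relationship.
    void addRelationFact(String functor, String id, String otherId) {
        Node start = node(id);
        Node end = node(otherId);
        Edge edge = new Edge(functor, start, end);
        start.edges.add(edge);
        end.edges.add(edge);
    }

    public static void main(String[] args) {
        FactGraph g = new FactGraph();
        g.addPropertyFact("name", "p1", "Tom");
        g.addPropertyFact("name", "p2", "Bob");
        g.addRelationFact("father", "p1", "p2");

        // One index lookup for the start node, then only link following.
        Node start = g.node("p1");
        for (Edge e : start.edges) {
            System.out.println(start.properties.get("name")
                    + " -" + e.type + "-> " + e.end.properties.get("name"));
        }
    }
}
```

The point of the sketch is only the split: the map is touched once to find `p1`, and everything after that is pointer chasing through `edges`.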

Here is an example of a real-world web site backed by a Neo4j database that stores lots of data and is fast to navigate: Reactome.

I am not getting what you mean by that. If you mean that you have two nodes and can easily go from one to the other via a relationship (father(A,B)), and are asking how to do the reverse, e.g. go from B to A, then again with Neo4j as an example: you have a node (B) and know a relationship (father), so just call the relationship method that returns its pair of nodes. Don't think of the relationship like father(A,B), think of it like (A <-father-> B). Of the two nodes returned, the one that is not the same as the current node is the other node. In Neo4j each relationship is stored with a direction (a start node and an end node), so checking which end the current node is on tells you whether you are walking it in reverse. Whether it is stored as -> or <-, the relationship is still there for both nodes to retrieve.
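The "other node" idea can be sketched in a few lines of plain Java. This is a hypothetical model, not Neo4j itself: `ReverseWalk`, `Rel`, `other`, `isReverseFrom` and `neighbours` are names invented here. As far as I know, the Neo4j Java API offers comparable calls on `Relationship`, such as `getNodes()`, `getOtherNode(node)`, `getStartNode()` and `getEndNode()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model of directed relationships; NOT the Neo4j API.
class ReverseWalk {
    static final class Node {
        final String name;
        final List<Rel> rels = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static final class Rel {
        final String type;
        final Node start;
        final Node end;

        Rel(String type, Node start, Node end) {
            this.type = type;
            this.start = start;
            this.end = end;
            // Register on both ends so either node can retrieve it.
            start.rels.add(this);
            end.rels.add(this);
        }

        // The node at the opposite end from the one we are standing on.
        Node other(Node current) {
            return current == start ? end : start;
        }

        // True when walking from 'current' goes against the stored direction.
        boolean isReverseFrom(Node current) {
            return current == end;
        }
    }

    // Follow every relationship of the given type from 'current', in either direction.
    static List<Node> neighbours(Node current, String type) {
        List<Node> result = new ArrayList<>();
        for (Rel r : current.rels) {
            if (r.type.equals(type)) {
                result.add(r.other(current));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        Rel father = new Rel("father", a, b); // stored as A -father-> B

        // Standing on B, walk the relationship backwards to reach A.
        System.out.println(neighbours(b, "father").get(0).name);
        System.out.println(father.isReverseFrom(b));
    }
}
```

Going from B to A needs no second fact and no index: B already holds the same relationship object that A does.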

Normally most people access Neo4j via the query language, which can do the same thing, but doing it via code showed me the details that cleared up the confusion.
