Making a large Prolog-based knowledge base for bioinformatics/epidemiology

Hi Sam,

Long time no see.

At one point I was going to add a note here that the OP should talk to you, but then I noticed that you were the OP. That caught me off guard, as I thought you would be the one to go to for info.

A few questions first, so I understand exactly what you mean. I know enough about what you are talking about, on both the Prolog side and the problem-domain side, not to run away in horror, but not enough to read through what you wrote once and have it all make sense.

When you ask how to read, are you asking how Prolog should access the data? It seems that it could be a large amount of data and you are concerned about loading it all into memory; is that correct? If you gave the total size of the data, I could give more specific feedback. Are you aware of quick load files? See: Quick Load Files. If you read the details in my post, you will notice that in those examples gigabytes of data are being accessed, and the data is stored on a USB flash drive that you can pick up at the grocery store checkout lane. :smiley:
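To make the quick load file idea concrete, here is a minimal sketch, assuming SWI-Prolog, where qcompile/1 is the built-in that produces .qlf files. The protein/2 facts and the helpers write_demo_facts/2 and build_qlf/2 are hypothetical names for illustration only; a real knowledge base would write its curated facts instead of generated ones.

```prolog
% Sketch assuming SWI-Prolog. protein/2, write_demo_facts/2 and
% build_qlf/2 are hypothetical stand-ins for a real fact base.

% write_demo_facts(+File, +N): write N generated protein/2 facts to File.
write_demo_facts(File, N) :-
    setup_call_cleanup(
        open(File, write, Out),
        forall(between(1, N, I),
               format(Out, "protein(~q, ~q).~n", [I, len(I)])),
        close(Out)).

% build_qlf(-Src, -Qlf): write the source file to a temporary location,
% then qcompile/1 it. qcompile/1 loads the source as usual and also
% writes a .qlf file with the same base name next to it.
build_qlf(Src, Qlf) :-
    tmp_file(facts, Base),
    file_name_extension(Base, pl, Src),
    write_demo_facts(Src, 10000),
    qcompile(Src),
    file_name_extension(Base, qlf, Qlf).
```

In a later session you would load the .qlf file (for example with load_files/2 on the path returned as Qlf) instead of consulting the source; because the clauses are already compiled, the source text does not have to be parsed again, which is where the savings come from for large fact files.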

Also see: Scaling to billions of facts?

If you go the route of storing the data as facts but want them to be updatable, then you should also look at Solving two consecutive dependent goals from command line, in particular this post.

As you know, DCGs are your friends. Again, if you look at the example in Quick Load Files, you will notice the data is from UniProt. While there is an XML version of that data, I used DCGs to parse the flat-file format, which is much closer to what you would need for parsing the binary format.
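For anyone following along who has not used DCGs on a line-oriented format like the UniProt flat file, here is a minimal sketch, assuming SWI-Prolog and its library(dcg/basics). It is not the code from the Quick Load Files post; line//2, id_line//3 and parse_id_line/4 are hypothetical names, and it only handles the two-letter line code prefix and the ID line (entry name, status, sequence length in AA).

```prolog
% Sketch assuming SWI-Prolog and library(dcg/basics).
% All predicate names here are hypothetical, for illustration only.
:- use_module(library(dcg/basics)).

% line(?Code, -Content)//: a flat-file line is a two-letter line code,
% three spaces, then the line content (returned as a code list).
line(Code, Content) -->
    [C1, C2], "   ",
    { atom_codes(Code, [C1, C2]) },
    remainder(Content).

% id_line(-Name, -Status, -Length)//: content of an ID line, e.g.
% "001R_FRG3G              Reviewed;         256 AA."
id_line(Name, Status, Length) -->
    nonblanks(NameCodes), blanks,
    string_without(`;`, StatusCodes), ";", blanks,
    integer(Length), blanks, "AA.",
    { atom_codes(Name, NameCodes),
      atom_codes(Status, StatusCodes) }.

% parse_id_line(+LineCodes, -Name, -Status, -Length): succeeds only for
% lines whose code is ID, then parses their content with id_line//3.
parse_id_line(LineCodes, Name, Status, Length) :-
    phrase(line('ID', Content), LineCodes),
    phrase(id_line(Name, Status, Length), Content).
```

Calling parse_id_line/4 on the backquoted line `` `ID   001R_FRG3G              Reviewed;         256 AA.` `` should bind Name to '001R_FRG3G', Status to 'Reviewed' and Length to 256, while an AC line simply fails. Splitting the generic line prefix from the per-line-code grammar keeps each DCG small, and adding another line type is just another nonterminal.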

The one thing I was surprised you did not ask about was how to parse PDF and PostScript files for data curation. Did you ever notice my post about using DCGs on PostScript?

Feel free to private message me if that works better for you.

Also it feels like we have been walking along similar paths for the last several months and they just crossed again. :smiley: