SemMedDB and Prolog

EricGT · June 17, 2022, 6:50am

From another post.

For reasoning, how is the data to be accessed?

Prolog facts
via SPARQL with the data stored as RDF
some other means

Also, as I have not worked with SemMedDB, how large are the files? I have not tried to download them and see that I need an account (ref), but if I can get the download files I am not opposed to giving some ideas a try.

jan · June 17, 2022, 7:48am

There are various ways of accessing such repositories.

Using the new RocksDB clauses would quite likely do the job. Access performance will be limited to about 30k-300k lookups/second (depending on data structure, indexes and further enhancements).
Use an external database, either using ODBC or the embedded sqlite or my recently prototyped access to embedded MonetDB. Lookup of single rows is probably slower than above, but if you can make the database do interesting joins that do not perform well in Prolog, the end result may be quite OK. Although setting it up can be hard, the bundled CQL package can translate Prolog conjunctions to SQL joins (and a lot more).
Translate the data to RDF and compile that to an HDT. Then use SWI-Prolog’s HDT add-on to gain access. That scales fine. Triple access rates should be about 500K/second.
Similarly, use TerminusDB. You can use their query language and I think you can also access the triples from Prolog.
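
To illustrate the external-database option, here is a minimal sketch of letting the database perform a join instead of doing it row by row in the client. It uses Python's standard sqlite3 module purely for illustration; the same SQL could be sent from Prolog over ODBC. The table name `predication`, its three columns, the index name, the helper `two_hop`, and all rows are assumptions made up for this example and do not reflect the actual SemMedDB schema.

```python
import sqlite3

# Hypothetical, heavily simplified table: NOT the real SemMedDB layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predication (subject TEXT, predicate TEXT, object TEXT)"
)
conn.executemany(
    "INSERT INTO predication VALUES (?, ?, ?)",
    [
        # Made-up rows, for illustration only.
        ("drug_a", "INHIBITS", "gene_x"),
        ("drug_b", "INHIBITS", "gene_y"),
        ("gene_x", "CAUSES", "disease_1"),
        ("gene_z", "CAUSES", "disease_2"),
    ],
)
# An index on the columns used by the join and the filter.
conn.execute("CREATE INDEX idx_pred_subj ON predication (predicate, subject)")


def two_hop(conn, first, second):
    """Return (a, b, c) rows where a -first-> b and b -second-> c.

    The self-join runs inside SQLite; the client only receives the result.
    """
    sql = """
        SELECT p1.subject, p1.object, p2.object
        FROM predication AS p1
        JOIN predication AS p2 ON p2.subject = p1.object
        WHERE p1.predicate = ? AND p2.predicate = ?
        ORDER BY p1.subject
    """
    return conn.execute(sql, (first, second)).fetchall()


links = two_hop(conn, "INHIBITS", "CAUSES")
print(links)  # [('drug_a', 'gene_x', 'disease_1')]
```

The point of the design is that only the joined rows cross the database boundary, which is where an external database may win even when single-row lookups are slower.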

Rscho314 · June 17, 2022, 8:33am

The complete database is about 35 GB. My last attempt used prolog facts, as I thought that would be easiest.

jan · June 17, 2022, 8:51am

A machine with 1TB of RAM might do the job (depending on the structure and required indexes)

EricGT · June 17, 2022, 12:07pm

Is this available? I am not easily finding it.

Rscho314 · June 17, 2022, 1:16pm

Well, I’ve only got 100 GB, so close but no cigar.

EricGT · July 13, 2022, 10:16am

As noted here, I was able to load the Rich Release Format (RRF) files from UMLS 2022AA Full. The technical details on each RRF file can be found here.

Then I came across SemMedDB Database Details for version 4.2 or higher and noticed that these tables were not in the data loaded. They look very useful for biological information research and remind me of the legal search service LexisNexis.

Do you need that data included?
If so, is this the correct page to access them?

https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemMedDB_download.html
