SemMedDB and Prolog

@Rscho314

From another post.

For reasoning, how is the data to be accessed?

  • Prolog facts
  • via SPARQL with the data stored as RDF
  • some other means

Also, as I have not worked with SemMedDB, how large are the files? I have not tried to download them and see that I need an account (ref), but if I can get the download files I am not opposed to giving some ideas a try.


There are various ways of accessing such repositories.

  • Using the new RocksDB clauses would quite likely do the job. Access performance will be limited to about 30k-300k lookups/second (depending on data structure, indexes and further enhancements).
  • Use an external database, either using ODBC or the embedded SQLite or my recently prototyped access to embedded MonetDB. Lookup of single rows is probably slower than above, but if you can make the database do interesting joins that do not perform well in Prolog, the end result may be quite OK. Although setting it up can be hard, the bundled CQL package can translate Prolog conjunctions to SQL joins (and a lot more).
  • Translate the data to RDF and compile that to an HDT. Then use SWI-Prolog’s HDT add-on to gain access. That scales fine. Triple access rates should be about 500k/second.
  • Similarly, use TerminusDB. You can use their query language and I think you can also access the triples from Prolog.
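
To make the "let the database do the joins" option concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The `predication` table with three columns is a simplified, assumed schema, and the rows are made-up toy data, not real SemMedDB content. It only shows the shape of a two-hop join that would be pushed down to the database rather than run as row-by-row lookups.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE predication ("
        "subject_cui TEXT, predicate TEXT, object_cui TEXT)"
    )
    # Toy rows for illustration only; not taken from SemMedDB.
    conn.executemany(
        "INSERT INTO predication VALUES (?, ?, ?)",
        [
            ("drug_a", "INHIBITS", "gene_x"),
            ("drug_b", "INHIBITS", "gene_y"),
            ("gene_x", "CAUSES", "disease_1"),
            ("gene_z", "CAUSES", "disease_2"),
        ],
    )
    # An index on the join column lets the database avoid a full scan.
    conn.execute("CREATE INDEX idx_subject ON predication (subject_cui)")
    return conn


def two_hop(conn, first_predicate, second_predicate):
    """Find chains A -first-> B -second-> C with a single self-join."""
    sql = """
        SELECT a.subject_cui, a.object_cui, b.object_cui
        FROM predication AS a
        JOIN predication AS b ON a.object_cui = b.subject_cui
        WHERE a.predicate = ? AND b.predicate = ?
        ORDER BY a.subject_cui
    """
    return conn.execute(sql, (first_predicate, second_predicate)).fetchall()
```

Calling `two_hop(build_toy_db(), "INHIBITS", "CAUSES")` returns the chains in which the object of the first predication is the subject of the second. From Prolog, the same query would go through ODBC or the SQLite interface instead.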

The complete database is about 35 GB. My last attempt used Prolog facts, as I thought that would be easiest.
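
For anyone trying the plain-facts route, a rough sketch of the conversion step might look like the following. It is written in Python and assumes each row has already been reduced to a (subject, predicate, object) triple of strings without embedded newlines. The fact name `predication/3` and the helper names are hypothetical, chosen for illustration.

```python
def quote_atom(text):
    """Wrap text in single quotes so Prolog reads it as one quoted atom.

    Backslashes and single quotes are backslash-escaped. Assumes the text
    contains no newlines or other control characters.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_fact(subject, predicate, obj):
    """Render one triple as a predication/3 fact (hypothetical name)."""
    return "predication({}, {}, {}).".format(
        quote_atom(subject), quote_atom(predicate), quote_atom(obj)
    )


def write_facts(rows, path):
    """Write one fact per line to a file that Prolog can consult."""
    with open(path, "w", encoding="utf-8") as out:
        for subject, predicate, obj in rows:
            out.write(to_fact(subject, predicate, obj) + "\n")
```

Quoting every argument keeps names with spaces, capitals, or apostrophes from being read as variables or causing syntax errors. At 35 GB of source data, whether the resulting file still fits in memory is a separate question.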

A machine with 1 TB of RAM might do the job (depending on the structure and required indexes) 🙂

Is this available? I am not easily finding it.

Well, I’ve only got 100 GB, so close but no cigar.

As noted here, I was able to load the Rich Release Format (RRF) files from UMLS 2022AA Full. The technical details on each RRF file can be found here.

Then I came across SemMedDB Database Details for version 4.2 or higher and noticed that these tables were not in the data loaded. They look very useful for doing biology information research and remind me of the legal search service LexisNexis.

Do you need that data included?
If so, is this the correct page to access them?

https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemMedDB_download.html