A new version of bio_db is now available on the pack server.
?- pack_install( pack(bio_db) ).
bio_db is a pack serving high-quality biological data. The data reside in pack(bio_db_repo).
The pack provides data from some of the best resources for biological data, including NCBI, HGNC, EBI and STRING.
If you install bio_db_repo, all data are accessible; otherwise, data tables are downloaded as you need them.
bio_db_repo contains 64 tables taking 270 MB of zipped space.
With Prolog-only access to the data, at most 3.1 GB of disk space is used.
The first time a Prolog table is accessed, it is (a) expanded from the zip form,
and (b) a .qlf is generated automatically that will enable faster future loads.
Data tables are served as Prolog facts that are either loaded in memory via fast-loading qlfs or served from disk by a variety of
database engines (including SQLite, Berkeley DB and RocksDB).
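For readers unfamiliar with .qlf files, here is a minimal sketch of the compile-once, load-fast idea described above, using SWI-Prolog's built-in qcompile/1. This is not bio_db's own loading code: demo_qlf/1, demo_symb_hgnc/2 and the temporary file names are made up for illustration, and the one-fact table only stands in for a real data table.

```prolog
% demo_qlf(-Hgnc)
% Hypothetical demo: write a one-fact table to a temporary .pl file,
% compile it to a .qlf, load the .qlf and query the fact.
demo_qlf(Hgnc) :-
    tmp_file(qlf_demo, Base),
    atom_concat(Base, '.pl', PlFile),
    atom_concat(Base, '.qlf', QlfFile),
    % (a) stand-in for expanding a zipped table: create the source file
    setup_call_cleanup(
        open(PlFile, write, Out),
        format(Out, "demo_symb_hgnc('LMTK3', 19295).~n", []),
        close(Out)),
    % (b) compile the source; this writes the .qlf next to the .pl file
    qcompile(PlFile),
    exists_file(QlfFile),
    % later sessions can load the .qlf directly instead of the source
    load_files(QlfFile, []),
    demo_symb_hgnc('LMTK3', Hgnc),
    % tidy up the temporary files
    delete_file(PlFile),
    delete_file(QlfFile).
```

?- demo_qlf(Hgnc).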
Version 3.1 depends on pack(lib) 2.6, which allows packs to be composed of a hierarchy of “cells”.
Currently bio_db serves human and mouse data.
?- use_module( library(bio_db) ).
?- use_module( library(lib) ).
load the “full” module.
In either case, only hot-swappable code is loaded at this point.
?- map_hgnc_symb_hgnc( 'LMTK3', Hgnc ).
Hgnc = 19295.
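Since the tables are plain Prolog facts, they combine with the usual all-solutions predicates. The sketch below maps a list of symbols in one go and silently drops symbols with no entry. symbols_hgncs/3 and stub_symb_hgnc/2 are hypothetical names, not part of bio_db; the mapping predicate is passed as an argument so the helper can be tried with the stub fact, and with bio_db loaded one would pass map_hgnc_symb_hgnc instead.

```prolog
:- use_module(library(lists)).

% stub_symb_hgnc(?Symb, ?Hgnc)
% Hypothetical stand-in for a bio_db table, holding the one fact
% shown in the query above.
stub_symb_hgnc('LMTK3', 19295).

% symbols_hgncs(:Map, +Symbols, -Pairs)
% Pairs is the list of Symb-Hgnc for every Symb in Symbols that Map
% knows about; symbols without a match are skipped.
:- meta_predicate symbols_hgncs(2, +, -).
symbols_hgncs(Map, Symbols, Pairs) :-
    findall(Symb-Hgnc,
            ( member(Symb, Symbols),
              call(Map, Symb, Hgnc) ),
            Pairs).
```

?- symbols_hgncs( stub_symb_hgnc, ['LMTK3','NOT_A_GENE'], Pairs ).
Pairs = ['LMTK3'-19295].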
?- lib( & bio_db ).
loads the skeleton of the module (prolog/bio_db) but none of the cell files that give access to data predicates.
This is usually how the cell files get access to the core predicates.
?- lib( bio_db(mouse) ).
limits access to mouse data only.
?- lib( bio_db(hs) ).
gives access to human data only.
?- lib( bio_db(hs(hgnc)) ).
gives access to the HGNC datasets for human only.
As of bio_db 2.0, all the code for fetching the data
and converting them to the bio_db formats is publicly available (auxil/build_repo).
On a Linux-like system, a single query downloads all data
from the primary sources and transforms them to the bio_db format
(auxil/build_repo/std_repo.pl ?- std_repo().)
(ICLP 2019 paper): https://arxiv.org/abs/1909.08254