Quick Load Files

Feedback on loading large, stable fact files (about 40 MB to a few GB). Below are the load times for each file using consult/1 on the plain source (.pl) file, and then again on the Quick Load File (.qlf) produced from it by qcompile/1.

Processor: Intel Core i7-5500U CPU @ 2.40 GHz
RAM: 8.00 GB
D drive: 256 GB SanDisk USB 3.0 thumb drive

% -------------------------------------

File Size: 41.0 MB (43,034,398 bytes) Lines: 559077
Example line:

uniProt_identification(entry_name(swiss_prot,"001R","FRG3G"),reviewed,256).

Example usage:

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_identification')).
% 61,499,195 inferences, **12.328 CPU** in 12.469 seconds (99% CPU, 4988528 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_identification').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_identification.qlf')).
% 429 inferences, **0.219 CPU** in 0.337 seconds (65% CPU, 1961 Lips)
true.

*.pl Size: 41.0 MB (43,034,398 bytes)
*.pl GZip Size: 2.89 MB (3,033,871 bytes)
*.qlf Size: 22.3 MB (23,468,300 bytes)
*.qlf GZip Size: 3.52 MB (3,693,834 bytes)
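Because the .qlf has to be regenerated whenever the source file changes, the qcompile-then-consult steps above could be wrapped in a small helper. This is a minimal sketch, assuming SWI-Prolog and that each .pl and its .qlf sit side by side in the same directory. ensure_qlf_loaded/1 and qlf_up_to_date/2 are hypothetical names made up for this example; qcompile/1, consult/1, time_file/2, exists_file/1 and file_name_extension/3 are standard SWI-Prolog built-ins.

```prolog
%% ensure_qlf_loaded(+Base) is det.
%
%  Hypothetical helper (not part of SWI-Prolog). Base is a file path
%  without extension. If Base.qlf exists and is at least as new as
%  Base.pl, load the .qlf; otherwise call qcompile/1, which compiles
%  and loads Base.pl and also writes Base.qlf next to it.
ensure_qlf_loaded(Base) :-
    file_name_extension(Base, pl, Source),
    file_name_extension(Base, qlf, Qlf),
    (   qlf_up_to_date(Source, Qlf)
    ->  consult(Qlf)          % fast path: load the quick load file
    ;   qcompile(Source)      % slow path: load the source, write the .qlf
    ).

%% qlf_up_to_date(+Source, +Qlf) is semidet.
%
%  True if Qlf exists and its modification time is not older than
%  Source. Only the top-level file is checked, not anything it includes.
qlf_up_to_date(Source, Qlf) :-
    exists_file(Qlf),
    time_file(Source, SourceTime),
    time_file(Qlf, QlfTime),
    QlfTime >= SourceTime.

% Example, with the first file above:
% ?- ensure_qlf_loaded('D:/Cellular Information/UniProt/facts/uniProt_fact_identification').
```

The first call pays the full consult cost once (plus writing the .qlf); later calls take the .qlf path until the .pl is edited.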

% -----------------

File Size: 115 MB (120,774,927 bytes) Lines: 1215465
Example line:

uniProt_organism_english_name(entry_name(swiss_prot,"001R","FRG3G"),"FV-3").

Example usage:

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species')).
% 149,729,072 inferences, **28.672 CPU** in 29.530 seconds (97% CPU, 5222158 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species.qlf')).
% 429 inferences, **0.594 CPU** in 0.906 seconds (66% CPU, 723 Lips)
true.

*.pl Size: 115 MB (120,774,927 bytes)
*.pl GZip Size: 11.7 MB (12,324,123 bytes)
*.qlf Size: 71.9 MB (75,438,440 bytes)
*.qlf GZip Size: 13.8 MB (14,534,767 bytes)

% -----------------

File Size: 226 MB (237,750,972 bytes) Lines: 559077
Example line:

uniProt_sequence_data(entry_name(swiss_prot,"001R","FRG3G"),"MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLDAKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNIHYILTDKRVDIQHLEKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDDSFRKIYTDLGWKFTPL").

Example usage:

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data')).
% 64,294,643 inferences, **21.563 CPU** in 22.370 seconds (96% CPU, 2981781 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data.qlf')).
% 429 inferences, **0.922 CPU** in 1.529 seconds (60% CPU, 465 Lips)
true.

*.pl Size: 226 MB (237,750,972 bytes)
*.pl GZip Size: 58.9 MB (61,827,281 bytes)
*.qlf Size: 212 MB (222,645,246 bytes)
*.qlf GZip Size: 62.7 MB (65,802,502 bytes)

% -----------------

File Size: 562 MB (589,457,325 bytes) Lines: 4492023
Example line:

uniProt_feature(entry_name(swiss_prot,"001R","FRG3G"),("CHAIN",1,256,"Putative transcription factor 001R.",,"PRO_0000410512")).

Example usage:

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data')).
% 1,087,048,078 inferences, **191.406 CPU** in 194.518 seconds (98% CPU, 5679272 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data.qlf')).
% 463 inferences, **3.688 CPU** in 6.738 seconds (55% CPU, 126 Lips)
true.

*.pl Size: 562 MB (589,457,325 bytes)
*.pl GZip Size: 40.3 MB (42,274,614 bytes)
*.qlf Size: 535 MB (561,345,143 bytes)
*.qlf GZip Size: 52.4 MB (55,027,409 bytes)

% -----------------

File Size: 2.30 GB (2,472,802,790 bytes) Lines: 25624211
Example line:

uniProt_reference_authors(reference_id(entry_name(swiss_prot,"001R","FRG3G"),1),"Tan W.G.").

Example usage:

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors')).
% 3,126,154,467 inferences, **649.938 CPU** in 671.015 seconds (97% CPU, 4809931 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors.qlf')).
% 429 inferences, **14.438 CPU** in 24.734 seconds (58% CPU, 30 Lips)
true.

*.pl Size: 2.30 GB (2,472,802,790 bytes)
*.pl GZip Size: 170 MB (178,322,169 bytes)
*.qlf Size: 1.34 GB (1,444,645,237 bytes)
*.qlf GZip Size: 211 MB (221,565,080 bytes)

% -----------------
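To make the five measurements easier to compare, here is a small sketch that records the CPU times and byte counts reported above as Prolog facts and derives the ratios from them. The predicate names load_cpu/3, file_bytes/5, speedup/2, qlf_size_ratio/2, gz_ratio/2 and report/0 are hypothetical, chosen for this example; only the numbers come from the measurements above.

```prolog
% load_cpu(Name, ConsultPlCpu, ConsultQlfCpu): CPU seconds from time/1 above.
load_cpu(identification,       12.328,  0.219).
load_cpu(organism_species,     28.672,  0.594).
load_cpu(sequence_data,        21.563,  0.922).
load_cpu(feature_table_data,  191.406,  3.688).
load_cpu(reference_authors,   649.938, 14.438).

% file_bytes(Name, PlBytes, PlGzBytes, QlfBytes, QlfGzBytes): sizes above.
file_bytes(identification,       43034398,   3033871,   23468300,   3693834).
file_bytes(organism_species,    120774927,  12324123,   75438440,  14534767).
file_bytes(sequence_data,       237750972,  61827281,  222645246,  65802502).
file_bytes(feature_table_data,  589457325,  42274614,  561345143,  55027409).
file_bytes(reference_authors,  2472802790, 178322169, 1444645237, 221565080).

% speedup(?Name, -Speedup): consult/1 CPU time for .pl divided by .qlf.
speedup(Name, Speedup) :-
    load_cpu(Name, Pl, Qlf),
    Speedup is Pl / Qlf.

% qlf_size_ratio(?Name, -Ratio): uncompressed .qlf size relative to .pl.
qlf_size_ratio(Name, Ratio) :-
    file_bytes(Name, Pl, _, Qlf, _),
    Ratio is Qlf / Pl.

% gz_ratio(?Name, -Ratio): gzipped .qlf size relative to gzipped .pl.
gz_ratio(Name, Ratio) :-
    file_bytes(Name, _, PlGz, _, QlfGz),
    Ratio is QlfGz / PlGz.

% report: print one line per file, name padded to column 22.
report :-
    forall(load_cpu(Name, _, _),
           ( speedup(Name, S),
             qlf_size_ratio(Name, R),
             gz_ratio(Name, G),
             format("~w~t~22|~1fx faster  qlf/pl ~2f  gz qlf/pl ~2f~n",
                    [Name, S, R, G]) )).
```

Running ?- report. shows CPU speedups from about 23x (sequence data) to about 56x (identification), and for every file the gzipped .qlf is larger than the gzipped .pl.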

The data in these files can be thought of as rows of an SQL table, except that each row also carries its structure (the functor names and nesting). That structure is therefore repeated on every line, e.g. uniProt_identification(entry_name(_,_,_),_,_).
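To make the SQL analogy concrete, here is a hedged sketch of a join across two of the fact files, using the two example lines shown earlier (with straight quotes) as stand-ins for the full data. The compound term entry_name/3 plays the role of a composite key shared by both "tables"; entry_length_name/3 is a hypothetical name used only for illustration.

```prolog
% Stand-in rows copied from the example lines above.
uniProt_identification(entry_name(swiss_prot, "001R", "FRG3G"), reviewed, 256).
uniProt_organism_english_name(entry_name(swiss_prot, "001R", "FRG3G"), "FV-3").

% Roughly: SELECT i.length, o.name
%          FROM identification i JOIN organism_english_name o
%          ON i.entry = o.entry
% The shared variable Entry is the join condition on the whole key.
entry_length_name(Entry, Length, Name) :-
    uniProt_identification(Entry, _Status, Length),
    uniProt_organism_english_name(Entry, Name).

% Example:
% ?- entry_length_name(entry_name(swiss_prot, "001R", "FRG3G"), L, N).
```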


For details of the .qlf format internals, see pl-wic.c in the SWI-Prolog sources.
