Collaborative ontology construction involving Prolog

#1

Hi!

I posted this question at Stack Overflow and got several useful comments (some of them from people who are also here), but I thought that people here might have interesting suggestions as well.

Essentially, I am looking for a recommendation of which tools to use to build this system:

  1. A web interface where end users can add and edit facts, e.g. “smoking causes cancer with probability 0.02”. This should preferably be accessible and not involve writing Prolog or other code (e.g. selecting a subject, relation, and object from three lists). It would also be nice to have some functionality for seeing whether a concept already exists in the knowledge base (e.g. if a user enters a fact about cancer and the concept cancer is already present in the knowledge base, this should be shown, preferably together with related concepts). Maybe some drop-down menus or lists could achieve this.
  2. Functionality for managing user profiles, passwords, privileges, etc.
  3. A web interface where end users can query the facts, e.g. “what are causes of cancer?”. The rules that compute answers to queries would not be something that end users edit (I want to do this instead); they would just see the results in an easy-to-comprehend format.
  4. A language which is Prolog, or as powerful as Prolog, which I (and not end users) can use to create inference rules (which generate answers to queries from users).
  5. If possible, some means for end users to get a graphical representation of the knowledge base.

I am thinking that I could use Pengines for writing and running inference rules, Protégé (an ontology editor that handles OWL 2) for end users to input and edit facts, and JavaScript for the front end interacting with end users. Does this seem like a viable strategy?

I don’t have much experience with web programming, so I am looking for something not too complicated.

Thanks!
/JCR

#2

You could do the whole thing in a SWISH instance. You might need an external OWL reasoner, depending on exactly what you need; from your description it might be just a subset of OWL, which you can typically handle quite well in Prolog. There might be some missing bits and pieces in the current SWISH, but I think not that many, and it gives you many pieces of the puzzle in one coherent system.

#3

Being able to do everything from SWISH would be ideal! I have been looking a bit at SWRL for reasoning with OWL (e.g. there is an SWRL tab for Protégé), but my impression is that Prolog is more powerful for reasoning and inference. So maybe there is no real benefit to using an ontology editor like Protégé and OWL…

#4

For an OWL reasoner see
http://trill-sw.eu

#5

OWL and Prolog differ in their logical view of the world: OWL is ‘open world’, while Prolog is ‘closed world’.
Closed world - unstated things are false.
Open world - unstated things are unknown.

In Prolog, if I say I have brothers Mike and Randy, then we can say I have two brothers.
In OWL, if I say I have brothers Mike and Randy, we don’t know how many brothers I have - I might have more.
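
For instance, here is a minimal closed-world sketch in Prolog (made-up facts): with only these two clauses loaded, Prolog concludes there are exactly two brothers, because anything unstated is taken to be false.

% Closed-world illustration: unstated brothers are assumed false.
brother(annie, mike).
brother(annie, randy).

?- aggregate_all(count, brother(annie, _), N).
N = 2.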

We make this distinction in the real world without realizing it. If I list my brothers on a form, it’s reasonable to expect it’s all my brothers. If I mention my brother Mike to you at a party, it’s not reasonable to assume I have only one brother.

Prolog assumes it has ‘all the facts’.
We can do more powerful reasoning about closed worlds. Prolog is a real computer language (I’ve written a shopping web site in it!). The ability to actually reason with OWL is far more limited.

OWL is weaker, but the open world assumption brings important benefits.
Bob has a semantic web page that mentions I have a brother Mike.

Sue has a web page that mentions I have a brother Randy.

A program that reasons against just Bob’s web page will be wrong if it uses closed world, and correct if it uses open world.

Suppose I ask the program, “Annie had lunch with her brother. Did she have lunch with Mike?”
Well, if the program only knows about Bob’s page, it can conclude in a closed world that it must have been Mike, since he’s the only known brother. Which is incorrect.

In an open world, the program correctly realizes Annie might have more brothers. But is Sue’s page the only one out there? In an open world, we can never figure out how many brothers Annie has, unless someone publishes an explicit ‘Annie has 2 brothers’.

So reasoning in an open world must inherently be weaker. On the other hand, open world reasoning tolerates incomplete and conflicting data better.

Which is why Sir Tim Berners-Lee picked it for the semantic web.

Each has their place. It is no accident that SWI-Prolog is graced with excellent support for RDF.
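
As a tiny illustration of that support, here is a sketch using library(semweb/rdf11); the ex: prefix and the IRIs are made up for the example.

:- use_module(library(semweb/rdf11)).
:- use_module(library(semweb/rdf_db), [rdf_register_prefix/2]).

:- rdf_register_prefix(ex, 'http://example.org/').

% Assert one triple, then enumerate everything that causes cancer.
demo :-
    rdf_assert(ex:smoking, ex:causes, ex:cancer),
    forall(rdf(S, ex:causes, ex:cancer),
           format("~w causes cancer~n", [S])).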

You’re coming late to the party - many people are already publishing collaborative ontologies and data. A good reference book on OWL should point you towards some of them.

You’ll be interested in ClioPatria, a web front end and SPARQL server for triples written in SWI-Prolog.

Please forgive me if I’m misunderstanding.

And yes, this is cool stuff!

#6

Hi JCR, Jan, All

Yes, in SWISH with Bousi~Prolog!

Years ago I implemented a UI for RDF graphs, built from SWI-Prolog’s HTTP library plus dot and SVG, but today another view is possible: more modern, dynamic, colored, and SWISH-based.

Instead of line-style links (aka ‘fur balls’), connections would be made by color, and node values by terms: ground for output, free for input. Node render size can be used to convey structure in a compact and precise way.

Thus, a hypergraph, connected (colored) by RDF+Prolog predicates, with the nodes of the graph (areas, pixels, media, subgraphs…) just renderings of terms - SWISH rules!

Containment and coherent coloring can play an obvious role for value aggregation (structs, objects…), and offer - by panning and zooming - a way to recursively (?) navigate levels. Kind of a 2.5-dimensional map over the extensional/intensional Herbrand base.

Prolog predicates/clauses have a compact representation in such a graph: a functor is a color, and variables are ordered, colored areas, the colors being tags (citations) of their predicate/clause body dataflow - I will explain better; I just want to say that the predicate/clause head ‘picture’ allows a fairly precise - and in any case deterministic - declaration of body goals. But then, opening the predicate/clause in a SWISH editor is a breeze, so the ‘true’ meaning is always available. :slight_smile:

It should be doable to integrate dot-style graphs, where good layouts are readily available, rendered to SVG or PDF.

Node rendering (structured text, tables, etc.) could be via HTML5 or PDF, today easily created with jsPDF.
The layout could be made clean and precise by means of tabling and clpfd:disjoint2.
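
For the non-overlapping part, here is a minimal sketch with made-up box sizes, using disjoint2/1 from library(clpfd), where each rectangle is given as an f(X,Width,Y,Height) term:

:- use_module(library(clpfd)).

% Place two 10x10 boxes on a 20x20 canvas without overlap.
layout(X1, Y1, X2, Y2) :-
    Boxes = [f(X1, 10, Y1, 10), f(X2, 10, Y2, 10)],
    [X1, Y1, X2, Y2] ins 0..10,
    disjoint2(Boxes),
    label([X1, Y1, X2, Y2]).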

Such a representation could be interesting for (SWI-)Prolog itself, to compactly show some program code properties - and then zooming/panning to edit; just a colored and sized shell… kind of a simple IDE, based on reflective SWI-Prolog. Much more ‘mobile first’ friendly than the usual, a bit boring, ‘box of colored strings’.

But then, why? There is never time… or is someone interested in sponsoring? :partying_face:

Other interesting points of JCR’s question are probabilistic databases and NLP for end-user CRUD maintenance.

Bousi~Prolog, just released, seems to be a great candidate to implement probabilistic models, and it offers NLP capabilities. Fernando, Jan, could we try to integrate it into SWISH?

For publishing/user/admin in the wild (security, social features, management, hosting), I myself would steer towards WordPress - just because it’s what I know a bit, and my customers are used to it… WP developers call it semantic publishing - and it’s evolving.

SWISH user management, clearly, is ready to use, and Prolog based…

Bousi~Prolog is ready with WordNet - wow! I will try it ASAP! I want to know whether filtering can help get probabilistic RDF queries solved. Semantic filtering as user interface… dropping lenses over the database picture… sounds fun… again, any sponsor out there? :blush: < music song:sad > I’m a freelancer, standalone freelancer… < /music >

A note about color. As a logic operator, it has some properties I find interesting. In a very simple sense, it can represent both identity and aggregation, under similarity conditions, so it should be efficient as a means of information description and bulk data transmission.

Last, I would suggest taking a look at Attempto; it has a lot to offer.

Ciao, Carlo

#8

Thank you very much for all the useful suggestions! Any tips are very welcome!

#9

First of all, many thanks for your interest in Bousi~Prolog (BPL). Of course, it deals with fuzzy logic programming, which is a different way of handling non-crisp data from probabilistic programming. The system has a (limited) web user interface which can be used by both guest and registered users (this tool is inherited from the one for DES, and has not been announced yet). But I really agree with you that an integration with SWISH would be a good thing.

BPL has been developed in both SWI-Prolog and C. We built an external library for the parts most sensitive to performance (tokenizer, closures, and maximal cliques). I’m not sure whether such an external library can be accessed from SWISH. In fact, I do not know how to program SWISH to integrate a system like BPL, but it seems doable. It would be a mid-term project, for when time permits. Additionally, we should first enhance our compilation to avoid the most general weak unification algorithm when working with a similarity relation (for which only one maximal clique exists), thereby avoiding a run-time overhead of about 10-15%. Maybe we can contact you when we are ready.

#10

Thinking back and forth about this, I think I have a plan :slight_smile:

The idea is to use a Pengine to integrate Prolog (back end) and JavaScript/HTML (front end). Even for the facts part I am thinking about using Prolog and assert/1 (and retract/1). Will this work in a Pengine? Otherwise, would it be a good idea to use an SQL database and access it with SWI-Prolog’s ODBC interface from the Pengine?

I am not sure about the difference between the SWISH and Pengine concepts. Is it correct that SWISH is just a Pengine? If not, which one would be easiest to set up?

I’m also wondering about the limitations of Pengines; e.g. is it possible to load modules (I need clpqr), and can I use maplist and so on?

Best regards, JCR

#11

I think that the SWI-Prolog RDF library would play very well here instead… the same goes for ODBC, etc.

#12

Sorry for the late reply…
I’m sure we are going to achieve the required integration level.

I’ll try it on localhost…

#13

For storage, one should be aware that by default SWISH/Pengines do not allow permanent changes to the system. So assert works inside a container (a temporary module) that is abandoned when the query finishes. To do something permanent you need to provide a library (as part of the server installation) that provides predicates for permanent changes, and define the interface to be safe. The interface can check the operation and even incorporate the logged-in user in its decision.
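
A minimal sketch of such a server-side library; the module and predicate names (kb_store, kb_assert/1, stored_fact/3) are made up. The dynamic store lives in the server module, outside the temporary Pengine module, and the sandbox:safe_primitive/1 clause tells the sandbox that user queries may call the checked interface.

:- module(kb_store, [kb_assert/1]).
:- use_module(library(error)).

:- dynamic stored_fact/3.    % survives the individual Pengine query

% Only well-formed facts pass the check and get stored.
kb_assert(fact(S, R, O)) :-
    must_be(atom, S),
    must_be(atom, R),
    assertz(stored_fact(S, R, O)).

% Declare the interface safe so sandboxed queries may call it.
:- multifile sandbox:safe_primitive/1.
sandbox:safe_primitive(kb_store:kb_assert(_)).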

Some other pieces of the puzzle may be ClioPatria and Thea. SWI-Prolog’s RDF library is most likely your storage of choice.

#14

Thanks CapelliC and Jan!

I really like the idea of RDF, but I think it is very limiting to have only binary relations. In my application I have N-ary relations with a lot of nested terms; I heavily rely on relations between relations, and Prolog is fantastic for this! I am familiar with the pattern where a class (instead of an object property) is used for N-ary relations in RDF, but I think it is a bit counterintuitive (maybe I just don’t understand it)…
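
To illustrate with made-up data: in Prolog, an N-ary fact with nested terms is a single clause, whereas plain RDF triples need an intermediate node (the class-based pattern mentioned above) to say the same thing.

% One N-ary Prolog fact with nested terms…
causes(smoking, cancer, probability(0.02), source(user(jcr))).

% …versus the RDF pattern: an intermediate node carries each argument.
% ex:stmt1 ex:agent ex:smoking .
% ex:stmt1 ex:effect ex:cancer .
% ex:stmt1 ex:probability "0.02" .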

So maybe it is a good idea to stick with something like SQL and ODBC for a Pengine?

I am sorry to be secretive about the details of my application, but I am putting together a research paper and I will update you when I submit it :slight_smile:

Kind regards, JC

#15

Great! Please ask for any details about the BPL implementation, and whether we may be of help in any way.

#16

The advantage of RDF is that you can exchange data with the linked-data community. But indeed, many things are much easier to express directly in Prolog. For data storage I’d first consider library(persistency). This does mean you need to load all data into Prolog at startup, which leads to longer startup times and more memory usage. Up to a couple of million clauses it still works on your Raspberry Pi though, and up to some 100 million should work fine on affordable server hardware. Querying is simple, flexible, and fast.
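
A minimal sketch of library(persistency) for the kind of fact in the opening post; the module name and journal file are assumptions. The persistent/1 declaration generates assert_cause/3 (and retract_cause/3), and db_attach/2 replays the journal at startup.

:- module(kb_persist, [add_cause/3]).
:- use_module(library(persistency)).

:- persistent cause(agent:atom, effect:atom, probability:number).

:- initialization(db_attach('causes.journal', [])).

% Assert the fact and append it to the on-disk journal.
add_cause(Agent, Effect, P) :-
    assert_cause(Agent, Effect, P).

After ?- add_cause(smoking, cancer, 0.02). and a server restart, ?- cause(smoking, cancer, P). still succeeds.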

#17

Does anyone have statistics on the load time of very large files, e.g. in the hundreds of megabytes or larger? On my laptop I consulted one of these files (~500,000 facts) and the load time ran into tens of minutes, not seconds. If several or more of these files are used at the same time, loading could take hours and then fail due to lack of memory.

#18

Do not use consult for that. The compiler is far too general, dealing with source locations, checking for functions, etc. For lots of data, just use a read/assert loop. That is what library(persistency) does (well, it maintains a journal, so you can also retract). If the data doesn’t change you can also compile it using qcompile/1 and load the .qlf file using any of the normal load predicates.
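
A minimal read/assert loop might look like this (load_facts/1 is a made-up name):

% Read terms one by one and assert them, bypassing the compiler.
load_facts(File) :-
    setup_call_cleanup(
        open(File, read, In),
        read_assert(In),
        close(In)).

read_assert(In) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  true
    ;   assertz(Term),
        read_assert(In)
    ).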

500,000 facts should be really easy, but of course it depends on the facts. If they contain large data structures it can get arbitrarily expensive.

Your timings seem ridiculously slow. Possibly one of the above explains it. Otherwise, please share details.

#19

Thanks, Jan.

I don’t want to give a quick, off-the-cuff response, but will gather some stats on my system and the specs of the system and respond later.

#20

Thanks for the suggestion!

#21

Feedback on loading large, stable fact files (100 megabytes to a few gigabytes). Here are some load times using consult/1 on standard .pl files and then again as Quick Load Files (.qlf).

Processor: Intel Core i7-5500U CPU @ 2.40 GHz
RAM: 8.00 GB
D drive: USB 3.0 256 GB SanDisk thumb drive

% -------------------------------------

File Size: 41.0 MB (43,034,398 bytes) Lines: 559077
Example line: uniProt_identification(entry_name(swiss_prot,"001R","FRG3G"),reviewed,256).
GZip Size: 2.89 MB (3,033,871 bytes)

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_identification')).
% 61,499,195 inferences, 12.328 CPU in 12.469 seconds (99% CPU, 4988528 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_identification').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_identification.qlf')).
% 429 inferences, 0.219 CPU in 0.337 seconds (65% CPU, 1961 Lips)
true.

qlf Size: 22.3 MB (23,468,300 bytes)
qlf GZip Size: 3.52 MB (3,693,834 bytes)

% -----------------

File Size: 115 MB (120,774,927 bytes) Lines: 1215465
Example line: uniProt_organism_english_name(entry_name(swiss_prot,"001R","FRG3G"),"FV-3").
GZip Size: 11.7 MB (12,324,123 bytes)

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species')).
% 149,729,072 inferences, 28.672 CPU in 29.530 seconds (97% CPU, 5222158 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_organism_species.qlf')).
% 429 inferences, 0.594 CPU in 0.906 seconds (66% CPU, 723 Lips)
true.

qlf Size: 71.9 MB (75,438,440 bytes)
qlf GZip Size: 13.8 MB (14,534,767 bytes)

% -----------------

File Size: 226 MB (237,750,972 bytes) Lines: 559077
Example line: uniProt_sequence_data(entry_name(swiss_prot,"001R","FRG3G"),"MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLDAKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNIHYILTDKRVDIQHLEKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDDSFRKIYTDLGWKFTPL").
GZip Size: 58.9 MB (61,827,281 bytes)

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data')).
% 64,294,643 inferences, 21.563 CPU in 22.370 seconds (96% CPU, 2981781 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_sequence_data.qlf')).
% 429 inferences, 0.922 CPU in 1.529 seconds (60% CPU, 465 Lips)
true.

qlf Size: 212 MB (222,645,246 bytes)
qlf GZip Size: 62.7 MB (65,802,502 bytes)

% -----------------

File Size: 562 MB (589,457,325 bytes) Lines: 4492023
Example line: uniProt_feature(entry_name(swiss_prot,"001R","FRG3G"),("CHAIN",1,256,"Putative transcription factor 001R.",[],"PRO_0000410512")).
GZip Size: 40.3 MB (42,274,614 bytes)

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data')).
% 1,087,048,078 inferences, 191.406 CPU in 194.518 seconds (98% CPU, 5679272 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_feature_table_data.qlf')).
% 463 inferences, 3.688 CPU in 6.738 seconds (55% CPU, 126 Lips)
true.

qlf Size: 535 MB (561,345,143 bytes)
qlf GZip Size: 52.4 MB (55,027,409 bytes)

% -----------------

File Size: 2.30 GB (2,472,802,790 bytes) Lines: 25624211
Example line: uniProt_reference_authors(reference_id(entry_name(swiss_prot,"001R","FRG3G"),1),"Tan W.G.").
GZip Size: 170 MB (178,322,169 bytes)

?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors')).
% 3,126,154,467 inferences, 649.938 CPU in 671.015 seconds (97% CPU, 4809931 Lips)
true.

?- qcompile('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors').
?- time(consult('D:/Cellular Information/UniProt/facts/uniProt_fact_reference_authors.qlf')).
% 429 inferences, 14.438 CPU in 24.734 seconds (58% CPU, 30 Lips)
true.

qlf Size: 1.34 GB (1,444,645,237 bytes)
qlf GZip Size: 211 MB (221,565,080 bytes)

% -----------------

The data in the files can basically be thought of as rows in an SQL table with the structure added. Thus the structure is redundant for each line, e.g. uniProt_identification(entry_name(_,_,_),_,_).
