Can you please provide advice on big data handling strategy?

While I am a fan of library(persistency), personally I would look at Quick Load Files first.
See related topic: Quick Load Files

The topic of large fact files has also been discussed before, see: Scaling to billions of facts?

You may need library(persistency) to create the first fact file, but once that file exists I would convert the data into a Quick Load File (.qlf).
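As a rough sketch of that conversion, assuming the facts already sit in an ordinary Prolog source file: qcompile/1 writes a .qlf file next to the source, and the qcompile(auto) option of load_files/2 creates or refreshes the .qlf file when needed and loads it otherwise. The predicate names build_qlf/1 and load_facts/1 are made up for illustration.

```prolog
% Hypothetical helper: compile a fact file (e.g. facts.pl) to a
% Quick Load File. qcompile/1 loads the file and writes the .qlf
% next to the source.
build_qlf(Source) :-
    qcompile(Source).

% Hypothetical helper: load the facts, letting SWI-Prolog create the
% .qlf file if it is missing or older than the source.
load_facts(Source) :-
    load_files(Source, [qcompile(auto)]).
```

Typical use would be `?- load_facts('facts.pl').` at the start of each session; only the first call pays the full compilation cost.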

I can’t help with this as I have not done that.

AFAIK any fact you want access to has to be in memory. If the fact is not in memory then Prolog will not know about it. As I use Windows, I would expect Windows virtual memory to cover the case where the facts do not fit in physical RAM, see: How To Manage Virtual Memory (Pagefile) In Windows 10

While I do have some experience parsing very large files into Prolog facts using DCGs, the one thing I know for sure is to make sure your DCG is deterministic. If it is not, it will take quite some time to run through the alternative parses on backtracking, which might not be what you want.
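To illustrate what deterministic means here, a minimal sketch with a made-up grammar that splits a code list into newline-terminated lines. The non-terminals lines//1 and line//1 are hypothetical; the cuts commit to the first parse so no choice points are left behind.

```prolog
% A file is a sequence of lines. The cut commits to each parsed
% line, so the parser never re-splits the input on backtracking.
lines([L|Ls]) --> line(L), !, lines(Ls).
lines([])     --> [].

% A line is every code up to and including the first newline.
% The cut stops a newline from also being tried as ordinary content.
line([])     --> "\n", !.
line([C|Cs]) --> [C], line(Cs).
```

Without the two cuts, `phrase(lines(Ls), Codes)` would have more than one solution for input containing several newlines, and on a very large file that backtracking is where the time goes.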

I can’t.


While these answers are based on my experience, you should wait to see if Jan W. gives an answer. If he has a different answer then both you and I will have learned something. :slightly_smiling_face:


EDIT

If you just want to save the facts, then I would use library(persistency) as used here.

After you have parsed a single entry, then just persist it using something like

add_imports(Import_module,Export_module) :-
    (
        imports(Import_module,Export_module), !
    ;
        assert_imports(Import_module,Export_module)
    ).

where the functor of the fact is imports and the values of the fact are Import_module and Export_module.

The line

imports(Import_module,Export_module), !

checks whether the fact is already persisted; if it is, the cut commits to this branch so the alternative after the ; is not run.

The next statement

assert_imports(Import_module,Export_module)

is executed if the fact is not persisted and just adds an assert to the persistency journal file.
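For imports/2 and assert_imports/2 to exist, the fact has to be declared persistent and a journal file attached first. A minimal sketch of that setup, assuming atom-typed arguments; the helper name attach_imports/1 and the choice of journal file are made up for illustration.

```prolog
:- use_module(library(persistency)).

% Declaring imports/2 as persistent generates assert_imports/2,
% retract_imports/2 and retractall_imports/2.
% The atom argument types are an assumption about the data.
:- persistent
       imports(import_module:atom, export_module:atom).

% Attach the journal file. Facts already recorded in it are loaded,
% and later assert_imports/2 calls are appended to it.
attach_imports(File) :-
    db_attach(File, []).
```

After `?- attach_imports('imports.db').` the add_imports/2 predicate above can be called once per parsed entry.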

HTH


EDIT

atom_codes(Ay, Ey), atom_number(Ay, Ny)

Can this not be replaced with number_codes/2?
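A small sketch of the two variants side by side; both predicate names are made up. One difference to keep in mind: atom_number/2 fails on input that is not a number, while number_codes/2 raises a syntax error, so the direct version below catches that error to keep the same fail-on-bad-input behaviour.

```prolog
% Two-step conversion as in the question: fails quietly on bad input.
codes_to_number_via_atom(Codes, Number) :-
    atom_codes(Atom, Codes),
    atom_number(Atom, Number).

% Direct conversion. number_codes/2 throws a syntax error on bad
% input, so catch/3 turns that into plain failure.
codes_to_number(Codes, Number) :-
    catch(number_codes(Number, Codes),
          error(syntax_error(_), _),
          fail).
```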

The whole list thing had me confused until I remembered that the DCG tutorials do it that way, e.g. Using Definite Clause Grammars in SWI-Prolog or Prolog DCG Primer

I think you might find basics.pl – Various general DCG utilities of more value, especially string_without//2.
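As a hedged illustration of string_without//2 from library(dcg/basics), here is a sketch for a made-up `key=value` record format. The non-terminal key_value//1 and the format itself are assumptions, not something taken from your data.

```prolog
:- use_module(library(dcg/basics)).

% Hypothetical record format: key=value
% string_without//2 greedily takes codes up to, but not including,
% the first code that appears in its first argument.
key_value(Key-Value) -->
    string_without(`=`, KeyCodes),
    "=",
    string_without(`\n`, ValueCodes),
    { atom_codes(Key, KeyCodes),
      atom_codes(Value, ValueCodes)
    }.
```

Because string_without//2 is greedy and leaves no choice points, it fits the advice above about keeping the grammar deterministic.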

Personally I prefer the way Wouter Beek does parsing with difference lists, see dcg.pl