Size limitations on qsave_program?

thefisch · June 24, 2020, 8:58pm

Hi -

I am computing some very large tables (somewhere between 1 million and 100 million entries; I don’t keep exact track of how big they get), which I assert as facts. I then try to save the program state so I can resume my computation with the precomputed tables:

qsave_program('gtestr', [goal(main), stand_alone(true)])

This works well for smaller problems, but SWI-Prolog dies on the problems that produce the very large tables. The stack trace is below. Note that computing the tables works fine, even when they are very large (we have 270GB of physical RAM and the stack limits are set accordingly); it is only the saving that fails.

I’m using swipl 8.1.30 for x86_64-linux, running on a Linux kernel 4.15.0-54-generic, and an Intel Xeon CPU E5-2640 v2 @ 2GHz processor with 16 physical cores (although nothing else is running, and my application is single-threaded).

Any idea what can go wrong?

Thanks,

– Bernd

ERROR: In:
ERROR:   [21] close(<stream>(0x558954b2e090))
ERROR:   [19] qsave:write_zip_state(<zipper>(0x558954b2dad0),runtime,none,[goal(...),...]) at /usr/lib/swipl/library/qsave.pl:168
ERROR:   [18] setup_call_catcher_cleanup(qsave:zip_open_stream(<stream>(0x558954b2d6b0),<zipper>(0x558954b2dad0),[]),qsave:write_zip_state(<zipper>(0x558954b2dad0),runtime,none,...),_7419814,qsave:zip_close(<zipper>(0x558954b2dad0),...)) at /usr/lib/swipl/boot/init.pl:564
ERROR:   [16] qsave:write_state(<stream>(0x558954b2d6b0),runtime,none,[goal(...),...]) at /usr/lib/swipl/library/qsave.pl:155
ERROR:   [15] setup_call_catcher_cleanup(qsave:open('v20.exe',write,<stream>(0x558954b2d6b0),...),qsave:write_state(<stream>(0x558954b2d6b0),runtime,none,...),_7419950,qsave:finalize_state(_7419994,<stream>(0x558954b2d6b0),'v20.exe')) at /usr/lib/swipl/boot/init.pl:564
ERROR:   [14] '<meta-call>'(qsave:(...,...)) <foreign>
ERROR:   [13] setup_call_catcher_cleanup(qsave:open_map(...),qsave:(...,...),_7420078,qsave:close_map) at /usr/lib/swipl/boot/init.pl:564
ERROR:   [11] qsave:qsave_program('v20.exe',gtestr:[...|...]) at /usr/lib/swipl/library/qsave.pl:136
ERROR:    [9] gtestr:main at /app/src/gtestr.pl:67
ERROR:    [8] catch(gtestr:main,error(io_error(write,<stream>(0x558954b2e090)),context(...,'Invalid argument')),'$toplevel':true) at /usr/lib/swipl/boot/init.pl:482
ERROR:    [7] catch_with_backtrace('<garbage_collected>',error(io_error(write,<stream>(0x558954b2e090)),context(...,'Invalid argument')),'<garbage_collected>') at /usr/lib/swipl/boot/init.pl:532
ERROR:
ERROR: Note: some frames are missing due to last-call optimization.
ERROR: Re-run your program in debug mode (:- debug.) to get more detail.

jan · June 25, 2020, 1:05pm

I wouldn’t expect problems with saved states below 2Gb, although I’m not sure whether that applies before or after compression. Part of the saved-state code is really old and happily uses long types. This should be safe on non-Windows systems, where long is 64 bits on 64-bit platforms, but can lead to a 2Gb size limit on Windows, where long is only 32 bits. Other parts may use (unsigned) int.
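
For reference, the arithmetic behind these limits, assuming sizes or offsets are held in the C types mentioned (an illustration of the type ranges only, not an analysis of the actual qsave code):

```latex
\begin{aligned}
\text{signed 32-bit (\texttt{long} on 64-bit Windows):}\quad & 2^{31} - 1 = 2\,147\,483\,647 \text{ bytes} \approx 2\ \text{GiB} \\
\text{unsigned 32-bit (\texttt{unsigned int}):}\quad & 2^{32} - 1 = 4\,294\,967\,295 \text{ bytes} \approx 4\ \text{GiB} \\
\text{signed 64-bit (\texttt{long} on 64-bit Linux):}\quad & 2^{63} - 1 \approx 9.2 \times 10^{18} \text{ bytes}
\end{aligned}
```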

What is the message printed just before the “ERROR: In:” line?

pvheerden · June 26, 2020, 6:27am

Hi -

The missing message line reads ERROR: -g gtestr:main: I/O error in write on stream <stream>(0x56096b817ac0) (Invalid argument).

Thanks,
Phillip

jan · June 26, 2020, 6:57am

Interesting. I guess the way forward is to use a C debugger (e.g., gdb) and put a breakpoint on Sseterr() and possibly Sset_exception(). How to proceed from there depends a little on the OS, your expertise in C debugging, and how easily I can reproduce this. Is it as simple as merely generating a giant dynamic predicate and then creating a saved state?

pvheerden · June 26, 2020, 11:38am

Is it as simple as merely generating a giant dynamic predicate and then creating a saved state?

Yes, this seems to be the case. We were able to induce the same error using this program:

:- module(test, []).

:- dynamic t/3.

fill(0) :-
	!.
fill(X) :-
	!,
	X > 0,
	numlist(0, X, L),
	random_permutation(L, K),
	assert(t(X, L, K)),
	Y is X - 1,
	fill(Y).

main :- writeln(success).

make :-
	fill(40000),
	qsave_program('./test', [goal(main), stand_alone(true)]).

I have not yet had time to set up a debugging environment for this, but thanks for the breakpoint locations!

Regards,
Phillip

jan · June 26, 2020, 1:38pm

Thanks. This reproduces in 21 minutes using 36Gb of memory, thanks to the new development machine with 64Gb sponsored by DataChemists. It stops with a somewhat strangely sized output file of 3.1Gb. I’ll attach a debugger to see whether this is something obvious.

jan · June 26, 2020, 4:11pm

A bit of debugging showed that a flag is required to enable writing zip files over 4Gb. Pushed a patch (498c7dce6b274675b6e0a7b43dc8e9922980b0af) to fix this.

The test now creates a 3.1Gb compressed state representing a 9Gb uncompressed state. It loads in 66 sec on my machine. I wonder what you want to use this for …
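
To check that a restored state really contains the table, the test program's main/0 could report the number of t/3 clauses instead of just printing success. A minimal sketch, assuming the t/3 table from the test program; report_table/0 is a hypothetical name, while predicate_property/2 with number_of_clauses/1 is standard SWI-Prolog:

```prolog
:- dynamic t/3.

% Hypothetical helper: print how many t/3 clauses are loaded.
% If number_of_clauses/1 is not reported (e.g. an empty predicate),
% fall back to 0. ~D prints the integer with digit grouping.
report_table :-
    (   predicate_property(t(_, _, _), number_of_clauses(N))
    ->  true
    ;   N = 0
    ),
    format("t/3 has ~D clauses~n", [N]).
```

For example, passing goal(report_table) instead of goal(main) to qsave_program/2 would print the clause count each time the saved state starts.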

pvheerden · June 26, 2020, 6:42pm

Thank you for the effort, Jan! I will give the patched version a go.

As for what we use this for: @thefisch is the mastermind behind an advanced test-suite generation tool. Before it can generate the tests we specify, it needs to compute some standard properties of the given grammar. This is fast for small grammars but slows down quickly as the grammar grows.

jan · June 27, 2020, 9:55am

Interesting. Not sure how relevant it is, but to avoid the 10 min wait for the table to be filled before it could be saved, I changed the code a little to run in 16 threads, which does the job in 37 sec. That compares with 65 sec to load the giant state. Here is how I did this; run it using ?- fill(40000, 16).

:- dynamic t/3.

rlist(Q) :-
    thread_get_message(Q, M),
    (   M == done
    ->  true
    ;   M = do(I),
        numlist(0, I, L),
        random_permutation(L, K),
        assert(t(I, L, K)),
        rlist(Q)
    ).

fill(N,M) :-
    length(Threads, M),
    message_queue_create(Q),
    maplist(thread_create(rlist(Q)), Threads),
    forall(between(0,N,I),
           thread_send_message(Q, do(I))),
    forall(between(1,M,_),
           thread_send_message(Q, done)),
    maplist(thread_join, Threads).

pvheerden · June 28, 2020, 9:24am

Very interesting indeed! I must say I have not made much use of the multithreading primitives in Prolog, but perhaps it is time that changed…

Regarding the patch: we were able to create and load our massive saved state! Thanks a lot for the effort, this will be a huge boon to our work.

jan · June 28, 2020, 6:33pm

Regarding the concurrent creation, I’ve added concurrent_forall/2,3 to library(thread), so we can create the DB using the code below. That does more or less the same as my previous post. The implementation of concurrent_forall/3 is considerably more complicated, because failure or an exception in one worker of the pool must abandon the whole computation and make the call fail or throw. It was a bit of a challenge.

fill(N,M) :-
    concurrent_forall(between(1, N, I),
                      ( numlist(0, I, L),
                        random_permutation(L, K),
                        assert(t(I, L, K))
                      ),
                      [threads(M)]).

jan · July 1, 2020, 6:50am

I quote myself:
