Multi-pass compilation? (for term expansion)

Is there an “easy” way to do multi-pass compilation? My use case is edcg (extended DCG), which requires a declaration describing how each -->> clause is expanded, so the declarations all need to be at the beginning of the file, to ensure that they’re available before the first use of a predicate.

The best way I can think of to do two passes is to have the -->> expansion save each clause (using assertz or similar) and then process them all when end_of_file is encountered. But I suspect that this will confuse other tools, such as the graphical debugger.
Is there a better/easier way?

Is what you’re trying to do any different from how the Logtalk port of EDCGs does term-expansion? See:
https://github.com/LogtalkDotOrg/logtalk3/blob/fa051f946a5421671674de88142ece2d3dcd833f/library/edcg/edcg.lgt

Followup.

Just for fun. If you have a plain Prolog file, pl.pl, with an EDCG such as:

:- op(1200, xfx, '-->>').

% Declare accumulators
acc_info(adder, X, In, Out, plus(X,In,Out)).

% Declare predicates using these hidden arguments
pred_info(len,0,[adder,dcg]).
pred_info(increment,0,[adder]).

increment -->>
	[1]:adder.

len(Xs,N) :-
	len(0,N,Xs,[]).

len -->>
	[_],
	!,
	increment,
	len.
len -->>
	[].

You can do:

?- {edcg(loader)}.
...
% (0 warnings)
true.

?- logtalk_load('pl.pl', [hook(edcg)]).
% [ /Users/pmoura/Desktop/pl.pl loaded ]
% (0 warnings)
true.

?- len([a,b,a], Len).
Len = 3.

Done properly you can implement multi-pass compilation this way and keep all tools running, but it is relatively hard. Can’t you figure out that the declaration is too late and warn the user?

Another dirty trick I use in the XSB emulation is to expand begin_of_file to pre-read the entire file and collect all its declarations. See library(dialect/xsb/source), xsb_P_directives/1.

What I want to do is move pred_info(len,0,[adder,dcg]) to be right before len -->> . This works with Logtalk, but not with SWI-Prolog. ISTR that Logtalk has two passes over the source, so is this why it works with Logtalk? … I couldn’t see any differences in edcg.lgt compared to edcg.pl beyond avoiding the multifile problem, but maybe I missed something?

The current Logtalk implementation expects the clauses for the pred_info/3, acc_info/5-7, and pass_info/1-2 to occur in the source file before the clauses for -->>/2. An alternative would be to assert temporarily the clauses for -->>/2 and expand them when reaching the entity closing directive (in your case, given a module, that would be the end-of-file). Not worth the trouble for the use cases I found so far but my understanding is that you use EDCGs heavily and it makes sense in your case?

It seems that the trick is to separate the term expansion declarations from the code. If I understand Logtalk’s approach correctly, that’s what is done by logtalk_load('....pl', [hook(edcg)]).

So, I should rename my_module.pl to my_module_impl.pl and wrap it with my_module.pl (this code is not yet written – I’d appreciate feedback before writing it – e.g., should I use include(my_module_impl) rather than use_module?):

:- module(my_module, [...]).

:- use_module(library(edcg)).  % term expansion for my_module_impl.pl

my_module_impl:term_expansion(begin_of_file, Out) :-
    % see library(dialect/xsb) 
    %    user:term_expansion/2, xsb_directives_aux/2.
    % read my_module_impl.pl and assert its 
    % pred_info/3 statements before the code is consulted.
    ...

:- use_module(my_module_impl, [...]).

The hook/1 flag specifies an object, implementing the expanding protocol, that is to be used to expand a source file. A recent blog post on the Logtalk approach to term-expansion is:

My previous suggestion was to change the edcgs module to do something like:

	cleanup :-
		retractall(pred_info(_,_,_)),
		retractall(acc_info(_,_,_,_,_,_,_)),
		retractall(acc_info(_,_,_,_,_)),
		retractall(pass_info(_,_)),
		retractall(pass_info(_)),
		retractall(rule(_, _)).

	term_expansion(begin_of_file, begin_of_file) :-
		cleanup.
	term_expansion((:- module(M,Exports)), [(:- module(M,Exports)), (:- op(1200, xfx, '-->>'))]).

	term_expansion(pred_info(A,B,C), []) :-
		assertz(pred_info(A,B,C)).
	term_expansion(acc_info(A,B,C,D,E,F,G), []) :-
		assertz(acc_info(A,B,C,D,E,F,G)).
	term_expansion(acc_info(A,B,C,D,E), []) :-
		assertz(acc_info(A,B,C,D,E)).
	term_expansion(pass_info(A,B), []) :-
		assertz(pass_info(A,B)).
	term_expansion(pass_info(A), []) :-
		assertz(pass_info(A)).
	term_expansion('-->>'(A, B), []) :-
		assertz(rule(A, B)).

	term_expansion(end_of_file, Terms) :-
		findall(
			Term,
			(retract(rule(A, B)), .... expand rule ...),
			Terms,
			[end_of_file]
		).

Hope this helps.

term_expansion(begin_of_file, ...) doesn’t apply retroactively in a single file (I’ve tried it), so the two-file approach seems to be necessary. This also allows a wide variety of solutions.

If I use the term_expansion(end_of_file, ...) approach, won’t that confuse tools such as the graphical tracer?

You can preserve the right location for the clause by calling source_location/2 while expanding the term and passing '$source_location'(File,Line):Term for the terms generated in the expansion of end_of_file. Next, as normal term expansion will not do its work, you need to block the above term expansion if the Prolog flag xref is set to true, and you probably need hooks into library(prolog_clause) to relate the source term to its compiled form. In short, it can be done but it is quite hard. The next iteration of the source layout management will probably be a lot easier to use, but is still a prototype.
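
A rough sketch of how recording and re-attaching the location might look in SWI-Prolog follows. It is only an illustration under assumptions: saved_rule/4, save_rule/2, flush_rules/1 and expand_rule/3 are hypothetical names, expand_rule/3 is a placeholder for the real EDCG translation, and the library(prolog_clause) hooks mentioned above are not covered.

```prolog
:- op(1200, xfx, '-->>').
:- dynamic saved_rule/4.            % hypothetical: Head, Body, File, Line

% save_rule(+Head, +Body): call while expanding a -->> clause, so the
% position of the clause being read is recorded next to it.
save_rule(Head, Body) :-
    source_location(File, Line),
    assertz(saved_rule(Head, Body, File, Line)).

% flush_rules(-Terms): call while expanding end_of_file. Each expanded
% clause is tagged with the position where its rule was originally read.
flush_rules(Terms) :-
    findall('$source_location'(File, Line):Clause,
            (   retract(saved_rule(Head, Body, File, Line)),
                expand_rule(Head, Body, Clause)
            ),
            Terms,
            [end_of_file]).

% Placeholder for the real EDCG translation (assumption, not edcg's API).
expand_rule(Head, Body, (Head :- Body)).

% Hooks; both are skipped when the cross-referencer is reading the file.
:- multifile user:term_expansion/2.
:- dynamic user:term_expansion/2.

user:term_expansion((Head -->> Body), []) :-
    \+ current_prolog_flag(xref, true),
    save_rule(Head, Body).
user:term_expansion(end_of_file, Terms) :-
    \+ current_prolog_flag(xref, true),
    flush_rules(Terms).
```

Keeping the bookkeeping in save_rule/2 and flush_rules/1 means the hooks stay small and the helpers can be exercised from the toplevel without loading a file.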

I’ve never studied the extended grammar rules, but I still wonder: can’t you simply give an error message if the declaration is too late? Forcing declarations to be at the start of the file isn’t necessarily bad IMO.

A simpler solution is to parse the input file (defining the EDCG) twice. In the first pass, collect and save the clauses that define accumulators, etc. In the second pass, actually load and expand the file using the data collected in the first pass. The expansion rules need, of course, to be modified to access the collected data. This way there would be zero interference with tools such as the graphical tracer. As a proof of concept, I defined:

:- object(edcg_collect,
	implements(expanding)).

	:- public([
		pred_info/3, acc_info/7, acc_info/5, pass_info/2, pass_info/1
	]).
	:- dynamic([
		pred_info/3, acc_info/7, acc_info/5, pass_info/2, pass_info/1
	]).

	cleanup :-
		retractall(pred_info(_,_,_)),
		retractall(acc_info(_,_,_,_,_,_,_)),
		retractall(acc_info(_,_,_,_,_)),
		retractall(pass_info(_,_)),
		retractall(pass_info(_)).

	term_expansion(begin_of_file, _) :-
		cleanup,
		fail.

	term_expansion(pred_info(A,B,C), _) :-
		assertz(pred_info(A,B,C)),
		fail.
	term_expansion(acc_info(A,B,C,D,E,F,G), _) :-
		assertz(acc_info(A,B,C,D,E,F,G)),
		fail.
	term_expansion(acc_info(A,B,C,D,E), _) :-
		assertz(acc_info(A,B,C,D,E)),
		fail.
	term_expansion(pass_info(A,B), _) :-
		assertz(pass_info(A,B)),
		fail.
	term_expansion(pass_info(A), _) :-
		assertz(pass_info(A)),
		fail.

:- end_object.

And then modified the edcg parsing code to do:

	pred_info(A,B,C) :-
		edcg_collect::pred_info(A,B,C).
	acc_info(A,B,C,D,E,F,G) :-
		edcg_collect::acc_info(A,B,C,D,E,F,G).
	acc_info(A,B,C,D,E) :-
		edcg_collect::acc_info(A,B,C,D,E).
	pass_info(A,B) :-
		edcg_collect::pass_info(A,B).
	pass_info(A) :-
		edcg_collect::pass_info(A).

	cleanup :- true.

Parsing twice is then done using:

?- {edcg_collect, edcg(loader)}.
...
% (0 warnings)
true.

?- logtalk_compile('~/Desktop/pl.pl', [hook(edcg_collect)]).
% [ /Users/pmoura/Desktop/pl.pl compiled ]
% (0 warnings)
true.

?- logtalk_load('~/Desktop/pl.pl', [hook(edcg)]).
% [ /Users/pmoura/Desktop/pl.pl loaded ]
% (0 warnings)
true.

?- len([a,b,a], Len).
Len = 3.

As the logtalk_compile/2 predicate compiles to disk, the pl.pl file is only loaded into memory once. It should be possible to adapt this solution to your case. Note that this is in essence similar to what Jan suggested when he wrote "Another dirty trick I use in the XSB emulation is to expand begin_of_file to pre-read the entire file and collect all its declarations."
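
For plain SWI-Prolog, the first (collecting) pass could be sketched roughly as follows. This is a minimal sketch under assumptions: collect_pred_info/1 and collected_pred_info/3 are hypothetical names, only pred_info/3 is collected, op/3 directives found in the file are applied globally so that -->> clauses can be read, and all other directives (including module declarations) are ignored.

```prolog
:- dynamic collected_pred_info/3.   % hypothetical store for first-pass data

% collect_pred_info(+File): first pass. Read every term in File and
% record its pred_info/3 facts; nothing is compiled or loaded here.
collect_pred_info(File) :-
    retractall(collected_pred_info(_, _, _)),
    setup_call_cleanup(
        open(File, read, In),
        collect_terms(In),
        close(In)).

collect_terms(In) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  true
    ;   record_decl(Term),
        collect_terms(In)
    ).

% Skip unbound terms so they cannot match the clauses below.
record_decl(Term) :-
    var(Term),
    !.
% Apply operator directives so later -->> clauses parse (simplification:
% this changes the operator table outside the file being read).
record_decl((:- op(Priority, Type, Name))) :-
    !,
    op(Priority, Type, Name).
record_decl(pred_info(Name, Arity, Accs)) :-
    !,
    assertz(collected_pred_info(Name, Arity, Accs)).
record_decl(_).
```

With the pl.pl file shown earlier in the thread, the query collect_pred_info('pl.pl'), collected_pred_info(len, Arity, Accs) should bind Arity = 0 and Accs = [adder,dcg], wherever the pred_info/3 clause sits in the file. The second pass is then an ordinary load in which the -->> expansion consults collected_pred_info/3 instead of its own pred_info/3 store.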

I want the pred_info declarations with the predicates for the same reason that I want the “%!” comments with the predicates. (It’s trivial to output an error if a pred_info declaration is missing or in the wrong place; in fact, I already do that.)

Extended grammar rules don’t require much study – they allow defining multiple accumulators and the accumulators can use data structures other than lists (e.g., I have 2 separate outputs that use difference lists, a symbol table that uses rbtrees, and a “global” dict that contains information about the file that’s being processed).

The “dirty trick” of reading the file before it’s compiled also allows doing interesting things with “%!” comments (e.g., outputting wrap_predicate/4 directives that check type information and output errors if deterministic predicates fail).

@pmoura – thank you for the proof-of-concept.

You shouldn’t need multi-passing for that. wrap_predicate/4 can be called before and after defining the predicate, so you can easily use library(pldoc/doc_modes) to inspect declared modes and wrap what you want at any time or unwrap it as you want to get rid of the checks.

The fact that you can wrap and unwrap at runtime is the beauty of wrap_predicate/4. Wrapping itself is easily done using source code translation, but that doesn’t provide the dynamic behavior.
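
As a small illustration of wrapping and unwrapping at runtime, here is a sketch using wrap_predicate/4 and unwrap_predicate/2 from library(prolog_wrap). The predicate double/2, the wrapper name int_check and the helpers add_check/0 and remove_check/0 are hypothetical; in practice the check could be derived from the declared modes rather than written by hand.

```prolog
:- use_module(library(prolog_wrap)).
:- use_module(library(error)).

% Hypothetical predicate whose first argument should be an integer.
double(X, Y) :- Y is X * 2.

% Install a run-time check; int_check is an arbitrary wrapper name.
% Wrapped is bound to a goal that calls the original definition.
add_check :-
    wrap_predicate(double(X, _), int_check, Wrapped,
                   (   must_be(integer, X),
                       Wrapped
                   )).

% Remove the check again, restoring the original behaviour.
remove_check :-
    unwrap_predicate(double/2, int_check).
```

After add_check, a call such as double(a, Y) raises a type error from must_be/2 before the body runs; after remove_check, the predicate behaves as originally defined.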

You’re welcome. I found a working alternative to do generic multi-pass expansion using the standard stream_property/2 and set_stream_position/2 predicates. The idea is relatively simple: at begin_of_file, save the stream position and restore it if there’s a next pass in the expansion pipeline. This works well for EDCGs encapsulated in Logtalk objects due to the end_object/0 directive, which is used to go back to the beginning of the file by resetting the stream position. But, with modules, there’s no closing directive, only end_of_file, forcing you to use a marker clause as the last term to know when to reset the stream position (we cannot do it on end_of_file, as the compiler stops reading terms at this point and ignores the stream position reset). All together, it’s still a hack, as all expansions except the last one are required to suppress all file terms after collecting data like pred_info/3, to ensure no duplicated terms are actually seen by the compiler. Too much serendipity required for my taste :stuck_out_tongue:

It shouldn’t. At least basic streams do/should not. This works fine:

rr(File) :-
    open(File, read, In),
    stream_property(In, position(Pos)),
    (   between(1, 4, Cycle),
        set_stream_position(In, Pos),
        format('*** ~d ***~n', [Cycle]),
        copy_stream_data(In, current_output),
        fail
    ;   close(In)
    ).

Possibly the compiler closes the stream, but IMO it shouldn’t. For XSB I’ve chosen to use peek_string/3, so we can also load from non-repositioning streams such as compressed streams, network streams, etc. That too is non-portable :frowning: