Pushed library(intercept)

Following the discussion on error handling, I added a Ciao-compatible library providing intercept/3, send_signal/1 and a few more helpers in this family, some of which are not in the Ciao version. The extensions make it possible to pass arguments to the handler that are not copied and to collect signals in a list. Here is an example:


For example, assume we parse a (large) file using a grammar (see phrase_from_file/3) that has some sort of record structure. What should we do with the recognised records? We can return them in a list, but if the input is large and the records are only to be asserted or written to a file, building that list is a huge overhead. Using this interface we can write:

```prolog
document -->
    record(Record),
    !,
    { send_signal(record(Record)) },
    document.
document -->
    [].
```

Given the above, we can assert all records into the database using the following query:

    ...,
    intercept(phrase_from_file(File, document),
              record(Record),
              assertz(Record)).

Or, we can collect all records in a list using intercept_all/4:

    ...,
    intercept_all(Record,
                  phrase_from_file(File, document), record(Record),
                  Records).
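The snippets above rely on a record//1 nonterminal that is not shown. The sketch below makes the example self-contained under stated assumptions: a hypothetical line-based record format (one `Key=Value` line per record), phrase/2 on a code list instead of phrase_from_file/3, and helper names (collect_records/2, write_record/2, dump_records/2) invented for illustration. It also shows intercept/4, where the extra argument (here an output stream) is handed to the handler without being copied.

```prolog
:- use_module(library(intercept)).
:- use_module(library(dcg/basics)).

% Hypothetical record format: one "Key=Value" line per record.
record(Key-Value) -->
    string_without(`=\n`, KeyCodes), `=`,
    string_without(`\n`, ValueCodes), `\n`,
    { KeyCodes \== [],
      atom_codes(Key, KeyCodes),
      atom_codes(Value, ValueCodes)
    }.

% Same shape as the grammar above: signal each record, then continue.
document -->
    record(Record),
    !,
    { send_signal(record(Record)) },
    document.
document -->
    [].

% Collect all records in a list using intercept_all/4.
collect_records(Text, Records) :-
    string_codes(Text, Codes),
    intercept_all(Record, phrase(document, Codes), record(Record), Records).

% With intercept/4 the handler is called as call(Handler, Out), so the
% stream becomes the last argument of write_record/2 and is not copied.
write_record(Record, Out) :-
    format(Out, "~q.~n", [Record]).

dump_records(Text, Out) :-
    string_codes(Text, Codes),
    intercept(phrase(document, Codes), record(Record),
              write_record(Record), Out).
```

For instance, `collect_records("a=1\nb=2\n", Rs)` should yield the two records `a-'1'` and `b-'2'`, while `dump_records/2` writes each record as a clause to the given stream as soon as it is recognised, without ever building a list.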

Comments are more than welcome. The interface may change based on further discussion.


Hi Jan,

That looks great.

I guess that sending the signal is fully asynchronous…

It’s great for creating, for example, log files, and for ensuring that writing to a slow output device doesn’t slow down processing.

Dan

Rereading, I notice that this is not a publish-subscribe interface…

It’s a dedicated intercept… Could this be made to work so that many subscribers (observers) can listen to the signal and act on it?

Dan

That is done by library(broadcast). This too is synchronous, so if you want it asynchronous you have to listen on a channel and send all events to a thread message queue. Note that this library also integrates with several network layers such as TIPC and UDP.
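A minimal sketch of the asynchronous pattern described above, assuming a hypothetical `log(Msg)` message term and a queue with the (equally hypothetical) alias `log_queue`. The listener only enqueues the term, so broadcast/1 returns quickly, and a separate worker thread does the slow output.

```prolog
:- use_module(library(broadcast)).

% Hypothetical names: the queue alias log_queue and the log/1 message term.
% The listener only enqueues, so broadcast(log(Msg)) returns as soon as the
% term is in the queue; the slow output happens in another thread.
start_async_logging :-
    message_queue_create(_, [alias(log_queue)]),
    listen(log(Msg), thread_send_message(log_queue, log(Msg))).

% Consumer loop, intended to run in its own thread (see thread_create/3).
% It blocks on the queue, writes each log(Text) term to Out and stops when
% it receives the atom done.
log_worker(Out) :-
    thread_get_message(log_queue, Msg),
    (   Msg == done
    ->  true
    ;   Msg = log(Text),
        format(Out, "~w~n", [Text]),
        log_worker(Out)
    ).
```

For example, after `start_async_logging`, `thread_create(log_worker(user_error), Id, [])` starts the consumer; `broadcast(log(hello))` then returns once the term is queued, and `thread_send_message(log_queue, done)` followed by `thread_join(Id, _)` stops the worker.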


SWI-Prolog provides a publish-subscribe broadcast library.

Logtalk provides publish-subscribe via its event-driven programming support. There’s also a dependents library implementing a version of the Smalltalk dependents publish-subscribe mechanism.

Thank you @jan, this is very interesting. To me it seems that this finally resolves the question of disentangling the parsing from the side effect, on the level of the source code at least. I am talking about this question that I had.

I have now rewritten my original client code from this:

```prolog
:- use_module(fasta).
:- use_module(iupac).

main(_) :-
    phrase_from_stream(fasta_revcomp, current_input).

fasta_revcomp -->
    fasta_record(Descr, Seq),
    {   reverse(Seq, Rev),
        maplist(iupac_complement, Rev, RevCompl),
        phrase(generate_fasta_record(Descr, RevCompl), Codes),
        format(current_output, "~s", [Codes])
    },
    !,
    fasta_revcomp.
fasta_revcomp --> [].
```

… to this:

```prolog
:- use_module(library(intercept)).
:- use_module(fasta).
:- use_module(iupac).

main(_) :-
    intercept(phrase_from_stream(fasta, current_input),
              fasta(D, S),
              (   revcompl(S, RS),
                  phrase(generate_fasta_record(D, RS), Codes),
                  format(current_output, "~s", [Codes])
              )).

fasta -->
    fasta_record(Descr, Seq),
    !,
    { send_signal(fasta(Descr, Seq)) },
    fasta.
fasta --> [].

revcompl(Seq, RevCompl) :-
    reverse(Seq, Rev),
    maplist(iupac_complement, Rev, RevCompl).
```

I like it better like this. I timed the two versions on my original ~30 MB, ~20K-record input file and did not see any difference, which is great! If there is any overhead, it is negligible compared to the real processing.

(Note: I changed the DCG for parsing FASTA from fasta//1, which parsed into a compound term fasta(Description, Sequence), to fasta//2, which has two arguments. This seemed to cut about 5% of the running time, but I didn’t do the timing measurements too carefully.)

I have further questions but I first need to try out how this can be used and broken :slight_smile:


Yes! This is the kind of scenario the Ciao developers had in mind, from what I understood from Edison Mera. I’m thinking of applying this to library(csv) and the various RDF parsing libraries as well. The overhead of intercept is significant, but in most practical applications it shouldn’t be in the inner loop. On a tight loop using between/3 as a generator I measured a tenfold slowdown. Some of the overhead can be reduced by moving the search for the intercept handler and the copying of the match to C. Seems this isn’t immediately necessary :slight_smile:
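A rough way to reproduce this kind of measurement is sketched below. It compares a between/3 loop that calls a trivial handler directly with the same loop routing every value through send_signal/1 and intercept/3. The predicate names and the global counter `hits` are illustrative assumptions; the actual ratio depends on the machine and the SWI-Prolog version, so no numbers are claimed here.

```prolog
:- use_module(library(intercept)).

% Baseline: a between/3 generator calling the handler directly.
direct_loop(N) :-
    forall(between(1, N, I), handle(I)).

% Same generator, but each value travels through send_signal/1 and is
% dispatched to the handler by the enclosing intercept/3.
signal_loop(N) :-
    intercept(forall(between(1, N, I), send_signal(item(I))),
              item(X),
              handle(X)).

% Deliberately cheap handler: it only counts calls in the global variable
% hits, so the cost of dispatching the signal dominates.
handle(_) :-
    nb_getval(hits, C0),
    C is C0 + 1,
    nb_setval(hits, C).

% Time both loops; time/1 prints the statistics for each run.
compare_overhead(N) :-
    nb_setval(hits, 0), time(direct_loop(N)),
    nb_setval(hits, 0), time(signal_loop(N)).
```

Running `compare_overhead(1000000)` prints the inferences and CPU time of both loops, which gives a feel for the per-signal cost.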

Forgive my cluelessness here, but how is this not just library(broadcast)?


The most outstanding difference is the scoping. intercept is scoped to a goal, whereas broadcast listeners are globally scoped. There are also differences with respect to the semantics of the called handlers, but these are more arbitrary. A broadcast channel has 0 or more listeners and any number of them is fine. An intercept channel is typically handled by a single handler, and the lack of a matching handler is normally considered an error.
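The scoping difference can be seen in a small sketch (the note/1 signal and the predicate names are hypothetical): a handler installed by intercept_all/4 only sees signals sent while its goal runs, a signal sent outside any intercept raises an exception (the exact error term is not relied on here), and broadcast/1 succeeds whether or not anyone is listening.

```prolog
:- use_module(library(intercept)).
:- use_module(library(broadcast)).

% Hypothetical producer: sends one note/1 signal per element.
emit_all([]).
emit_all([X|Xs]) :-
    send_signal(note(X)),
    emit_all(Xs).

% The handler of intercept_all/4 is only active while its goal runs.
scoped(Notes) :-
    intercept_all(X, emit_all([a, b]), note(X), Notes).

% Outside any intercept/3 there is no handler, so send_signal/1 raises
% an exception; we just catch whatever it is.
unscoped(Error) :-
    catch(emit_all([c]), Error, true).

% broadcast/1 is global and succeeds with or without listeners.
broadcast_without_listeners :-
    broadcast(note(c)).
```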


So, I’m writing a parser for BVH files and wondering what to do with syntax errors. Some users might expect an exception to be thrown; others might want error messages printed. Those who want error messages also want to know about subsequent errors. It depends on whether the BVH file is something they generated themselves or an outside file that may or may not be valid (and even then, BVH is a de facto format, so some oddball variation might need tweaking).

Is it better to use intercept, to print error messages and fail, or something else?

It depends a bit on what is meant by some users. If it means different applications, a simple print_message(warning, …) might do, and applications that want an exception can hook the message to throw one. If it is the same application, the intercept interface could be appropriate: it lets the caller choose between throwing an exception, printing and skipping, silently skipping, or getting a dedicated term in the output to deal with later.
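The intercept option might look as follows. The parser is a stand-in (a hypothetical list of ok(Term) and bad(Raw) items rather than a real BVH grammar), and the signal name syntax_issue/1 is made up; the point is that the parser only reports problems, while each caller picks the policy: throw, print-and-skip, or collect.

```prolog
:- use_module(library(intercept)).

% Stand-in for a real parser: items are ok(Term) or bad(Raw).
% Problems are only reported via send_signal/1; the item is then skipped.
parse_items([], []).
parse_items([ok(T)|Items], [T|Ts]) :-
    !,
    parse_items(Items, Ts).
parse_items([bad(Raw)|Items], Ts) :-
    send_signal(syntax_issue(Raw)),
    parse_items(Items, Ts).

% Policy 1: turn the first problem into an exception.
parse_strict(Items, Ts) :-
    intercept(parse_items(Items, Ts), syntax_issue(Raw),
              throw(error(syntax_error(Raw), _))).

% Policy 2: print a warning and continue.
parse_lenient(Items, Ts) :-
    intercept(parse_items(Items, Ts), syntax_issue(Raw),
              print_message(warning, format("Skipped bad item: ~q", [Raw]))).

% Policy 3: collect the problems for later inspection.
parse_collect(Items, Ts, Problems) :-
    intercept_all(Raw, parse_items(Items, Ts), syntax_issue(Raw), Problems).
```

The parser code stays the same in all three cases; only the surrounding intercept call changes.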


Probably different applications. So you consider throwing from the message definition legitimate?

From message_hook/3, I guess that is fine.
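For the different-applications case, a sketch of the message-based route: the parser prints a warning using a hypothetical message term bvh_syntax_error/1, and an application that prefers exceptions adds a message_hook/3 clause that throws. The error term thrown here is an illustrative choice, not a standard one.

```prolog
% Hypothetical message term bvh_syntax_error/1, printed by the parser.
:- multifile prolog:message//1.
prolog:message(bvh_syntax_error(What)) -->
    [ 'BVH syntax error: ~q'-[What] ].

report_bvh_error(What) :-
    print_message(warning, bvh_syntax_error(What)).

% An application that prefers exceptions adds this hook, turning the
% warning into an exception instead of printing it.
:- multifile user:message_hook/3.
user:message_hook(bvh_syntax_error(What), warning, _Lines) :-
    throw(error(syntax_error(bvh(What)), _)).
```

Without the hook clause, report_bvh_error/1 just prints the warning and succeeds, which suits applications that want to see all errors.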


3 posts were split to a new topic: Didactic usage example

Isn’t intercept a Lisp-like condition system, as implemented in https://www.swi-prolog.org/pack/file_details/condition/prolog/condition.pl, with Restart ignored?

They are surely related. Thanks for pointing that out. The condition package seems to have two modes, though. One applies to child goals; it is broken if Goal is non-deterministic, as the handler is removed only when its choice points are exhausted. The global one looks more related to library(broadcast).