Redis: error unserializing "as prolog" terms

I am storing arbitrary exceptions thrown by user-defined goals into a Redis stream. Storing and retrieving works fine most of the time, except when a blob is involved. I am storing the terms using the as prolog specification, as you can deduce from the \x00T\x00 prefixes below.

From redis-cli here is what the stream contains:

127.0.0.1:6379> xrevrange pl:redisjobs:cgrp:log + - count 3
1) 1) "1666391352021-0"
   2)  1) "consumer"
       2) "\x00T\x00'cgrp:h.local:434164:1'"
       3) "goal"
       4) "\x00T\x00:(plunit_redis_jobs,addone)"
       5) "group"
       6) "\x00T\x00cgrp"
       7) "inid"
       8) "\x00T\x00'1666391352019-0'"
       9) "instream"
      10) "\x00T\x00'pl:rj:in'"
      11) "outid"
      12) "\x00T\x00_9974"
      13) "outstream"
      14) "\x00T\x00'pl:rj:out'"
      15) "status"
      16) "\x00T\x00exception(error(existence_error(matching_rule,:(plunit_redis_jobs,addone(redis{val:1},_8736{consumer:'cgrp:h.local:434164:1',group:cgrp,message:'1666391352019-0',redis:redis_connection(default,<stream>(0x5646d8635100,0x5646d82cc900),0,[address(:(localhost,6379)),group(-(cgrp,'cgrp:h.local:434164:1')),reconnect(true),server(default),start(0),version(3)])},_8722,_8724))),context(:(plunit_redis_jobs,/(addone,4)),_8840)))"
2) 1) "1666391351988-0"
   2)  1) "consumer"
       2) "\x00T\x00'cgrp:h.local:434164:1'"
       3) "goal"
       4) "\x00T\x00:(plunit_redis_jobs,addone)"
       5) "group"
       6) "\x00T\x00cgrp"
       7) "inid"
       8) "\x00T\x00'1666391351987-0'"
       9) "instream"
      10) "\x00T\x00'pl:rj:in'"
      11) "outid"
      12) "\x00T\x00_6820"
      13) "outstream"
      14) "\x00T\x00'pl:rj:out'"
      15) "status"
      16) "\x00T\x00exception(hey_some_exception)"
3) 1) "1666391351956-0"
   2)  1) "consumer"
       2) "\x00T\x00'cgrp:h.local:434164:1'"
       3) "goal"
       4) "\x00T\x00:(plunit_redis_jobs,addone)"
       5) "group"
       6) "\x00T\x00cgrp"
       7) "inid"
       8) "\x00T\x00'1666391351955-0'"
       9) "instream"
      10) "\x00T\x00'pl:rj:in'"
      11) "outid"
      12) "\x00T\x00-(1666391351956,0)"
      13) "outstream"
      14) "\x00T\x00'pl:rj:out'"
      15) "status"
      16) "\x00T\x00exit"

Here is what happens when we try to retrieve some problematic entries:

7 ?- redis(default,xrevrange('pl:redisjobs:cgrp:log',+,-,count,1),R1).
ERROR: Syntax error: Operator expected
ERROR: exception(error(existence_error(matching_rule,:(plunit_redis_jobs,addone(redis{val:1},_8736{consumer:'cgrp:h.local:434164:1',group:cgrp,message:'1666391352019-0',redis:redis_connection(default,
ERROR: ** here **
ERROR: <stream>(0x5646d8635100,0x5646d82cc900),0,[address(:(localhost,6379)),group(-(cgrp,'cgrp:h.local:434164:1')),reconnect(true),server(default),start(0),version(3)])},_8722,_8724))),context(:(plunit_redis_jobs,/(addone,4)),_8840))) . 

Obviously, the problem is that the stream blob is stored as <stream>(0x..., 0x...), whereas I think it should be stored as '<stream>'(0x..., 0x...) so that it can be parsed back.
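To illustrate the difference at the toplevel (a minimal check of my own, not part of the redis code): the quoted functor reads back as an ordinary compound, while the unquoted one trips over the < operator:

```prolog
?- atom_to_term('\'<stream>\'(0x1,0x2)', T, _).
T = '<stream>'(1, 2).

?- atom_to_term('<stream>(0x1,0x2)', T, _).
ERROR: Syntax error: Operator expected
```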

I looked at the source in redis4pl.c and it is properly asking for CVT_WRITE_CANONICAL. Should this not write <stream> surrounded by single quotes?

NOTE: I do want to keep the terms in the redis stream as they were originally, even if it was a blob. Of course we are never going to instantiate the blob again, it is used for other purposes.


While this is not an answer, I am thinking the answer will be more along the lines of how logs are processed with the http package.

http_log.pl

or

term_html.pl which uses a hook


Long story short: this is a common problem for Prolog code that needs to represent blobs, and in searching the Prolog sources, both the *.pl and *.c files, one finds several different ways each author chose to handle it. Two other packages that deal with blobs are pengines and protobuf, but I stopped looking for more after finding the same problem handled in a few different ways.

HTH

@EricGT, thanks for the various illustrations. All in all this is not very satisfactory. It seems blobs are a very useful extension to handle foreign resources in a reliable way. But, by design we cannot read them (if we could, the provided security is gone).

The main problem seems to be with log messages. The provided hooks are an attempt to resolve this. Note that you may want to extract relevant information from the blob and store that. E.g., for a clause you may want to store the predicate it belongs to and/or the source location if it has source information. For a stream (your case), you may want to store some of its properties (input/output, type, file name, position info). The backtrace routine already allows fetching the backtrace using clause references (the default, because it is cheap, leaving the translation to source to the message system) or using source information.
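As a minimal sketch of that idea (blob_log_term/2 is a hypothetical helper of mine, not an existing predicate): map a stream blob to a plain term holding a few of its properties, and fall back to recording just the blob type otherwise:

```prolog
%!  blob_log_term(+Blob, -Term) is det.
%
%   Hypothetical helper: replace a blob by a plain, readable term
%   that records some of its properties for logging.
blob_log_term(Blob, stream_info(Props)) :-
    is_stream(Blob), !,
    findall(Prop,
            ( member(Name, [mode, file_name, alias, position]),
              Prop =.. [Name, _],
              catch(stream_property(Blob, Prop), _, fail)
            ),
            Props).
blob_log_term(Blob, blob(Type)) :-
    blob(Blob, Type).
```

For example, ?- current_output(S), blob_log_term(S, T). would then bind T to something like stream_info([mode(append), alias(user_output), ...]), which serializes and reads back without trouble.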

write_canonical/1,2 should probably raise a domain (type?) error. write_term/3 has options to deal with blobs, as shown.

Better suggestions are welcome.

I think the best, most logical solution is to think about blobs as two terms:

  • one term that represents the living blob and can be dereferenced into the underlying object,
  • one term that represents the blob for display and storage purposes

write_canonical/2 can ask the living blob (which is currently alive and active) for its display/storage term, which is pretty much the kind of term that is displayed today, but with proper quoting, etc.

If this storage term is stored in a log or database, blob/2 would fail on it, because it is only a display term. It can be used for reasoning about the properties of the blob at the time it was stored, but it is not a blob, so you can’t execute code linked with it, run it or do anything that you do with a real blob.

It only contains a snapshot of the properties of the blob (name, pointer addresses, etc) at the time it was stored/displayed.
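Part of this distinction can already be observed with blob/2 today: it succeeds on the living object and fails on any ordinary compound that merely looks like one (a small illustration of my own, not from the proposal):

```prolog
?- current_output(S), blob(S, Type).
Type = stream.

?- blob('<stream>'(1,2), _).   % a mere snapshot term is not a blob
false.
```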

Security for …?

I am thinking that the security is so that one cannot inject executable Prolog code into a blob which is then unknowingly executed when the blob is processed.


I like the idea of two separate concepts but not two Prolog terms.

I am thinking

  • metadata such as properties which should be a Prolog term or something useable by Prolog.
  • data AKA payload, which is not a Prolog term but could be converted into a Prolog term.

The metadata can be used to report about the blob, e.g. in log files, and also to reason about the data via the metadata without accessing the data itself. For example, with binary files whose data values (scalars and structures) can be accessed via offsets, carrying the offsets around in metadata is quite useful. While my code for binary files does not use SWI-Prolog blobs, the concepts are very similar.

That is more or less what we achieve using write_term/2,3 and blobs(portray) with a suitable definition for portray/1, or using the portray_goal(:Goal) option to some specific predicate, no? Or are you looking for a more fundamental option, adding two write functions to the blob definition? I’m a bit hesitant here as it affects compatibility, and I’m not sure how application-context dependent the second write option is.
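For reference, a minimal sketch of that approach, combining the blobs(portray) write option with a user:portray/1 clause that only intercepts non-text blobs (the '$blob' wrapper name and safe_write/1 are my own choices, not an established convention):

```prolog
:- multifile user:portray/1.

%   Intercept non-text blobs (streams, clause references, ...) and
%   print a readable, quotable placeholder instead.
user:portray(Blob) :-
    blob(Blob, Type),
    Type \== text,                 % ordinary atoms are text blobs
    format("'$blob'(~q)", [Type]).

%   Write Term with any embedded blobs portrayed:
safe_write(Term) :-
    write_term(Term, [quoted(true), blobs(portray)]).
```

With this in place, the exception term from the original post would be written with '$blob'(stream) in place of <stream>(0x...,0x...), and the result reads back with read_term/2.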

Security that the pointer encapsulated in the blob (blobs capture arbitrary binary data, but in practice typically a pointer plus some metadata) is reliable and can be made subject to Prolog (atom) garbage collection. By reliable, we mean:

  • It should not be possible to create a blob from Prolog to point at some arbitrary memory.
  • It should not be possible for a Prolog reference for a pointer to remain valid while the underlying object is gone. Related, it should not be possible for the Prolog reference to be valid again and point at some other object.

Before blobs, we encoded pointers as integers. The use case that caused me to introduce blobs was an application that misbehaved because the object pointed to (a stream) was freed and a new stream was allocated at the same location. As a result, completely unexpected data was written to the wrong stream. Using the current stream blobs, that cannot happen. If a stream is closed, the blob is marked as a closed stream and further use in Prolog raises a reliable error. Only after the blob is garbage collected may it be reused, possibly again for a stream. Because it has been garbage collected, we are sure it is not the old Prolog context that will be reusing this stream.


I was thinking about a more fundamental option, but you are right, portray and friends are better and more flexible. In that case, I think that --for blobs-- write_canonical/1 (and CVT_WRITE_CANONICAL) should write the term returned by portray (properly quoted, of course) if we don’t have a more fundamental option. If the user does not want portray, they always have write_term/2,3.