How to send a string + string length on a stream?

I want to write a predicate that takes a stream and a string and writes the string length first as a 64 bit integer followed by the string as UTF8 encoded.

Is this in the right direction? Also: I cannot figure out how to turn a Prolog number into bytes (i.e. turn 2 into binary 0x10). Any tips there?

sendString(Stream, String) :-
    % make sure it is UTF8 encoded
    set_prolog_flag(encoding, utf8),
    % get a set of UTF8 codes from the string
    stringCodes(String, Codes),
    % Count and send the count first
    length(Codes, Length),
    ???? How to convert Length to a binary 64 bit number and send it ????
    % Then send the encoded string
    % assuming this doesn't do anything with Codes except send them raw...
    write_term(Stream, Codes, []),
    flush_output(Stream).

stringCodes(String, Codes) :-
    with_output_to(codes(Codes),
        (
            current_output(Stream),
            % I believe this will use the set_prolog_flag(encoding) to convert
            % to UTF8 bytes?
            write(Stream, String)
        )).

In SWI-Prolog is there an easy way to convert numbers from one base to another using REPL?

?- format("~2r", 0xFF).
11111111
true.

?- format("~2r",2).
10
true.

?- format("~4r",0xF0).
3300
true.

You seem to be doing something similar to library(protobufs).

This code might help you (protobufs.pl, protobufs.c): GitHub - SWI-Prolog/contrib-protobufs: An interface to Google Protocol Buffers (protobuf)

I should have been clearer. I am looking for a way to turn a Prolog term that is a number into a series of bytes that represent that number. I.e.:

?- X = 10000, magic(X, Y).
Y= [39, 16]

where the magic predicate is turning the Prolog number 10000 into a series of bytes that represent that number in binary form.

Yes, but it is much simpler since I literally just want to send a string preceeded by its length. It feels like learning protocol buffers might be overkill…

There’s probably code in the referenced files that does what you want.
See also utf8_codes/3 in library(utf8).

PS: Protobufs aren’t very difficult, particularly as the SWI-Prolog version doesn’t require a protobuf compiler (I think – I’ve only used protobufs with C++ and Python). There’s an example here:

@ericzinda, who is reading the stream on the other side?

Option 1: the reading program is an existing program

  • Use code similar to this from @jamesnvc to convert the int to bytes.

Option 2: the reading program is a prolog program

  • I would suggest simply to use write_canonical/2, but if you want
    more speed you can use fast term io

Option 3: the reading program is a C/C++ program that you control

  • then I would suggest you serialize the string as:
    ** the outpur of write(Len)
    ** put_byte(0) or some such separator
    ** the string as bytes

  • the reason for this suggestion is that different machines represent integers in a different way, some are big endian, some are little endian, so this allows you portability and easy debugging, and easy parsing on the reading side.

  • if you are serializing more than just some strings I would suggest you use msgpack or protobufs like peter suggested.

EDIT: side note, if you are making a distributed app you may want to use library(redis) instead to communicate the different sides.

That’s one of the advantages of protobufs. :wink:

You could also write a foreign predicate that uses the various endian-conversion functions, and then use a C union to fake the result as a string.

1 Like

These are great options, thanks! Option 1 is what I was driving for. But: the other side is a Python program that I control and I ended up doing your option 3 as a “workaround”, but I see what you mean about the endianness, etc. So I’ll take your recommendation and just keep it that way.

Yes, I should bite the bullet and dig into the protocol buffers at some point, but I really am just serializing one string here. I’ll take a look at redis. Sounds interesting.

thanks everyone!

Do you need to use binary format?
If not, you can easily use a string format, such as JSON, which is well supported by both Prolog and Python … and for strings, you don’t even need to to send the length – JSON encoding/decoding takes care of it all.

I actually am using a JSON string but I wanted to go the string length/string route just to give the caller more flexibility on how they deal with the server I’m building. Totally agree that with my particular scenario now I could just use the python JSON library and avoid the count altogether.

FWIW: Here is the code I ended up with (and there is a set_prolog_flag(encoding, utf8) term elsewhere to lock in UTF-8):

reply(Stream, Term) :-
    % Convert term to an actual JSON string
    stringJson(String, Term),
    % Get the length that the string will be once encoded
    % and write into the stream
    stringCodes(String, Codes),
    length(Codes, Length),
    write_term(Stream, Length, []),
    % Terminate with 'a' so we know that's the end of the count
    write_term(Stream, a, []),
    % Now write the actual JSON
    write(Stream, String),
    flush_output(Stream).

stringCodes(String, Codes) :-
    with_output_to(codes(Codes),
        (
            current_output(Stream),
            write(Stream, String)
        )).

stringJson(String, Term) :-
    % Convert Prolog Term to a Prolog JSON term
    term_to_json(Term, Json),
    % Now write to a string
    with_output_to(string(String),
        (
            current_output(Stream),
            json_write(Stream, Json)
        )).

If you want to have the binary representation instead of a variable-length string, you can use protobufs:int64_codes/2 (it’s an unexported predicate, written in C).

protobufs:int64_codes(300, Z), format(string(S), '~c~c~c~c~c~c~c~c~n', Z).
Z = [44, 1, 0, 0, 0, 0, 0, 0],
S = ",\001\\000\\000\\000\\000\\000\\000\\n".

(there’s probably a predicate for outputting codes, but I’m too lazy to search for it.)

Nice! Thanks @peter.ludemann and @j4n_bur53. These were the kind of predicates I was looking for originally.

As mentioned elsewhere, watch out for big-endian / little-endian when converting from wire format to internal form.

I’ve modified library(protobufs) so that float32_codes/2 et al are exported. This should be available in the next SWI-Prolog development version release.

The low-level protobuf wire format – which seems to be close to what you want – is pretty easy to deal with, using int32_codes/2 and similar. If you want more compactness, you can use var_int//1 in protobufs.pl, which isn’t currently exported. (The var_int is used for encoding things like message tags and string lengths.) If you don’t care about compactness, just use the fixed-length predicates.

awesome! thanks @peter.ludemann

I think I’ll also expose var_int//1 and tag_type//2 in the interface (but renamed to protobuf_var_int//1 and protobuf_tag_tpye//2). But I have a few other things to do first, so these won’t be available right away.

For completeness, here’s the code I finally ended up with. @jan had some suggestions on how to clean it up. write_message/2 writes a string to Stream with a leading count:

write_message(Stream, String) :-
    write_string_length(Stream, String),
    write(Stream, String),
    flush_output(Stream).
    

% Terminate with '.\n' so we know that's the end of the count
write_string_length(Stream, String) :-
    stream_property(Stream, encoding(Encoding)),
    string_encoding_length(String, Encoding, Length),
    format(Stream, "~d.\n", [Length]).
    

% converts a string to Codes using Encoding
string_encoding_length(String, Encoding, Length) :-
    setup_call_cleanup(
        open_null_stream(Out),
        (   set_stream(Out, encoding(Encoding)),
            write(Out, String),
            byte_count(Out, Length)
        ),
        close(Out)).

FYI, the protobuf low level functions have some subtle bugs in them for handling numbers, so I’m going to fix the bugs and rename the functions to properly match their behaviors.

If you’re writing the length as a string (format(Stream,'~d.~n'[...]), followed by a string, then you won’t need these, of course.

This might be a simpler version (untested), if your string is valid Unicode:

    phrase(utf8_codes(Codes), String),
    length(Codes, Bytes),
    format(Stream, '~d.~n~s', [Bytes, String]).