How to send a string + string length on a stream?

ericzinda · March 29, 2021, 9:28pm

I want to write a predicate that takes a stream and a string and writes the string length first as a 64 bit integer followed by the string as UTF8 encoded.

Is this in the right direction? Also: I cannot figure out how to turn a Prolog number into bytes (i.e. turn 2 into binary 0x10). Any tips there?

sendString(Stream, String) :-
    % make sure it is UTF8 encoded
    set_prolog_flag(encoding, utf8),
    % get a set of UTF8 codes from the string
    stringCodes(String, Codes),
    % Count and send the count first
    length(Codes, Length),
    ???? How to convert Length to a binary 64 bit number and send it ????
    % Then send the encoded string
    % assuming this doesn't do anything with Codes except send them raw...
    write_term(Stream, Codes, []),
    flush_output(Stream).

stringCodes(String, Codes) :-
    with_output_to(codes(Codes),
        (
            current_output(Stream),
            % I believe this will use the set_prolog_flag(encoding) to convert
            % to UTF8 bytes?
            write(Stream, String)
        )).

EricGT · March 29, 2021, 10:01pm

In SWI-Prolog is there an easy way to convert numbers from one base to another using REPL?

?- format("~2r", 0xFF).
11111111
true.

?- format("~2r",2).
10
true.

?- format("~4r",0xF0).
3300
true.

peter.ludemann · March 29, 2021, 10:11pm

You seem to be doing something similar to library(protobufs).

This code might help you (protobufs.pl, protobufs.c): GitHub - SWI-Prolog/contrib-protobufs: An interface to Google Protocol Buffers (protobuf)

ericzinda · March 30, 2021, 12:39am

I should have been clearer. I am looking for a way to turn a Prolog term that is a number into a series of bytes that represent that number. I.e.:

?- X = 10000, magic(X, Y).
Y= [39, 16]

where the magic predicate is turning the Prolog number 10000 into a series of bytes that represent that number in binary form.

ericzinda · March 30, 2021, 12:42am

Yes, but it is much simpler since I literally just want to send a string preceeded by its length. It feels like learning protocol buffers might be overkill…

peter.ludemann · March 30, 2021, 12:59am

There’s probably code in the referenced files that does what you want.
See also utf8_codes/3 in library(utf8).

PS: Protobufs aren’t very difficult, particularly as the SWI-Prolog version doesn’t require a protobuf compiler (I think – I’ve only used protobufs with C++ and Python). There’s an example here:

github.com

SWI-Prolog/contrib-protobufs/blob/a7bca5bc136fc2f63423afc4c51998225fbc7cca/test_protobufs.pl

/*  Part of SWI-Prolog

    Author:        Jeffrey Rosenwald
    E-mail:        jeffrose@acm.org
    WWW:           http://www.swi-prolog.org
    Copyright (c)  2010-2018, Jeffrey Rosenwald
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

This file has been truncated. show original

swi · March 30, 2021, 1:16am

@ericzinda, who is reading the stream on the other side?

Option 1: the reading program is an existing program

Use code similar to this from @jamesnvc to convert the int to bytes.

Option 2: the reading program is a prolog program

I would suggest simply to use write_canonical/2, but if you want
more speed you can use fast term io

Option 3: the reading program is a C/C++ program that you control

then I would suggest you serialize the string as:
** the outpur of write(Len)
** put_byte(0) or some such separator
** the string as bytes
the reason for this suggestion is that different machines represent integers in a different way, some are big endian, some are little endian, so this allows you portability and easy debugging, and easy parsing on the reading side.
if you are serializing more than just some strings I would suggest you use msgpack or protobufs like peter suggested.

EDIT: side note, if you are making a distributed app you may want to use library(redis) instead to communicate the different sides.

peter.ludemann · March 30, 2021, 1:34am

That’s one of the advantages of protobufs.

You could also write a foreign predicate that uses the various endian-conversion functions, and then use a C union to fake the result as a string.

ericzinda · March 30, 2021, 1:35am

These are great options, thanks! Option 1 is what I was driving for. But: the other side is a Python program that I control and I ended up doing your option 3 as a “workaround”, but I see what you mean about the endianness, etc. So I’ll take your recommendation and just keep it that way.

Yes, I should bite the bullet and dig into the protocol buffers at some point, but I really am just serializing one string here. I’ll take a look at redis. Sounds interesting.

thanks everyone!

peter.ludemann · March 30, 2021, 1:55am

Do you need to use binary format?
If not, you can easily use a string format, such as JSON, which is well supported by both Prolog and Python … and for strings, you don’t even need to to send the length – JSON encoding/decoding takes care of it all.

ericzinda · March 30, 2021, 2:39pm

I actually am using a JSON string but I wanted to go the string length/string route just to give the caller more flexibility on how they deal with the server I’m building. Totally agree that with my particular scenario now I could just use the python JSON library and avoid the count altogether.

FWIW: Here is the code I ended up with (and there is a set_prolog_flag(encoding, utf8) term elsewhere to lock in UTF-8):

reply(Stream, Term) :-
    % Convert term to an actual JSON string
    stringJson(String, Term),
    % Get the length that the string will be once encoded
    % and write into the stream
    stringCodes(String, Codes),
    length(Codes, Length),
    write_term(Stream, Length, []),
    % Terminate with 'a' so we know that's the end of the count
    write_term(Stream, a, []),
    % Now write the actual JSON
    write(Stream, String),
    flush_output(Stream).

stringCodes(String, Codes) :-
    with_output_to(codes(Codes),
        (
            current_output(Stream),
            write(Stream, String)
        )).

stringJson(String, Term) :-
    % Convert Prolog Term to a Prolog JSON term
    term_to_json(Term, Json),
    % Now write to a string
    with_output_to(string(String),
        (
            current_output(Stream),
            json_write(Stream, Json)
        )).

peter.ludemann · March 31, 2021, 4:23pm

If you want to have the binary representation instead of a variable-length string, you can use protobufs:int64_codes/2 (it’s an unexported predicate, written in C).

protobufs:int64_codes(300, Z), format(string(S), '~c~c~c~c~c~c~c~c~n', Z).
Z = [44, 1, 0, 0, 0, 0, 0, 0],
S = ",\001\\000\\000\\000\\000\\000\\000\\n".

(there’s probably a predicate for outputting codes, but I’m too lazy to search for it.)

ericzinda · March 31, 2021, 5:34pm

Nice! Thanks @peter.ludemann and @j4n_bur53. These were the kind of predicates I was looking for originally.

peter.ludemann · March 31, 2021, 8:20pm

As mentioned elsewhere, watch out for big-endian / little-endian when converting from wire format to internal form.

peter.ludemann · April 23, 2021, 5:20pm

I’ve modified library(protobufs) so that float32_codes/2 et al are exported. This should be available in the next SWI-Prolog development version release.

The low-level protobuf wire format – which seems to be close to what you want – is pretty easy to deal with, using int32_codes/2 and similar. If you want more compactness, you can use var_int//1 in protobufs.pl, which isn’t currently exported. (The var_int is used for encoding things like message tags and string lengths.) If you don’t care about compactness, just use the fixed-length predicates.

ericzinda · April 26, 2021, 3:09am

awesome! thanks @peter.ludemann

peter.ludemann · April 26, 2021, 9:28pm

I think I’ll also expose var_int//1 and tag_type//2 in the interface (but renamed to protobuf_var_int//1 and protobuf_tag_tpye//2). But I have a few other things to do first, so these won’t be available right away.

ericzinda · June 23, 2021, 7:20pm

For completeness, here’s the code I finally ended up with. @jan had some suggestions on how to clean it up. write_message/2 writes a string to Stream with a leading count:

write_message(Stream, String) :-
    write_string_length(Stream, String),
    write(Stream, String),
    flush_output(Stream).
    

% Terminate with '.\n' so we know that's the end of the count
write_string_length(Stream, String) :-
    stream_property(Stream, encoding(Encoding)),
    string_encoding_length(String, Encoding, Length),
    format(Stream, "~d.\n", [Length]).
    

% converts a string to Codes using Encoding
string_encoding_length(String, Encoding, Length) :-
    setup_call_cleanup(
        open_null_stream(Out),
        (   set_stream(Out, encoding(Encoding)),
            write(Out, String),
            byte_count(Out, Length)
        ),
        close(Out)).

peter.ludemann · June 23, 2021, 7:42pm

FYI, the protobuf low level functions have some subtle bugs in them for handling numbers, so I’m going to fix the bugs and rename the functions to properly match their behaviors.

If you’re writing the length as a string (format(Stream,'~d.~n'[...]), followed by a string, then you won’t need these, of course.

peter.ludemann · June 23, 2021, 7:48pm

This might be a simpler version (untested), if your string is valid Unicode:

    phrase(utf8_codes(Codes), String),
    length(Codes, Bytes),
    format(Stream, '~d.~n~s', [Bytes, String]).

Topic		Replies	Views
Byte ordering/conversion predicates? Predicate	10	1572	September 5, 2021
Write IEEE 754 floats in binary stream Help!	13	238	December 5, 2023
Read_string/3 counts code points, right? Help!	5	491	January 28, 2022
Read a number of bytes from a stream Help!	2	285	October 2, 2023
term_string('A', "'A'") Help! discussion	18	817	June 22, 2022

How to send a string + string length on a stream?

Related topics