I want to write a predicate that takes a stream and a string and writes the string length first as a 64 bit integer followed by the string as UTF8 encoded.
Is this in the right direction? Also: I cannot figure out how to turn a Prolog number into bytes (i.e. turn 2 into binary 0x10). Any tips there?
sendString(Stream, String) :-
% make sure it is UTF8 encoded
set_prolog_flag(encoding, utf8),
% get a set of UTF8 codes from the string
stringCodes(String, Codes),
% Count and send the count first
length(Codes, Length),
???? How to convert Length to a binary 64 bit number and send it ????
% Then send the encoded string
% assuming this doesn't do anything with Codes except send them raw...
write_term(Stream, Codes, []),
flush_output(Stream).
stringCodes(String, Codes) :-
with_output_to(codes(Codes),
(
current_output(Stream),
% I believe this will use the set_prolog_flag(encoding) to convert
% to UTF8 bytes?
write(Stream, String)
)).
Yes, but it is much simpler since I literally just want to send a string preceeded by its length. It feels like learning protocol buffers might be overkill…
There’s probably code in the referenced files that does what you want.
See also utf8_codes/3 in library(utf8).
PS: Protobufs aren’t very difficult, particularly as the SWI-Prolog version doesn’t require a protobuf compiler (I think – I’ve only used protobufs with C++ and Python). There’s an example here:
@ericzinda, who is reading the stream on the other side?
Option 1: the reading program is an existing program
Use code similar to this from @jamesnvc to convert the int to bytes.
Option 2: the reading program is a prolog program
I would suggest simply to use write_canonical/2, but if you want
more speed you can use fast term io
Option 3: the reading program is a C/C++ program that you control
then I would suggest you serialize the string as:
** the outpur of write(Len)
** put_byte(0) or some such separator
** the string as bytes
the reason for this suggestion is that different machines represent integers in a different way, some are big endian, some are little endian, so this allows you portability and easy debugging, and easy parsing on the reading side.
if you are serializing more than just some strings I would suggest you use msgpack or protobufs like peter suggested.
EDIT: side note, if you are making a distributed app you may want to use library(redis) instead to communicate the different sides.
These are great options, thanks! Option 1 is what I was driving for. But: the other side is a Python program that I control and I ended up doing your option 3 as a “workaround”, but I see what you mean about the endianness, etc. So I’ll take your recommendation and just keep it that way.
Yes, I should bite the bullet and dig into the protocol buffers at some point, but I really am just serializing one string here. I’ll take a look at redis. Sounds interesting.
Do you need to use binary format?
If not, you can easily use a string format, such as JSON, which is well supported by both Prolog and Python … and for strings, you don’t even need to to send the length – JSON encoding/decoding takes care of it all.
I actually am using a JSON string but I wanted to go the string length/string route just to give the caller more flexibility on how they deal with the server I’m building. Totally agree that with my particular scenario now I could just use the python JSON library and avoid the count altogether.
FWIW: Here is the code I ended up with (and there is a set_prolog_flag(encoding, utf8) term elsewhere to lock in UTF-8):
reply(Stream, Term) :-
% Convert term to an actual JSON string
stringJson(String, Term),
% Get the length that the string will be once encoded
% and write into the stream
stringCodes(String, Codes),
length(Codes, Length),
write_term(Stream, Length, []),
% Terminate with 'a' so we know that's the end of the count
write_term(Stream, a, []),
% Now write the actual JSON
write(Stream, String),
flush_output(Stream).
stringCodes(String, Codes) :-
with_output_to(codes(Codes),
(
current_output(Stream),
write(Stream, String)
)).
stringJson(String, Term) :-
% Convert Prolog Term to a Prolog JSON term
term_to_json(Term, Json),
% Now write to a string
with_output_to(string(String),
(
current_output(Stream),
json_write(Stream, Json)
)).
If you want to have the binary representation instead of a variable-length string, you can use protobufs:int64_codes/2 (it’s an unexported predicate, written in C).
protobufs:int64_codes(300, Z), format(string(S), '~c~c~c~c~c~c~c~c~n', Z).
Z = [44, 1, 0, 0, 0, 0, 0, 0],
S = ",\001\\000\\000\\000\\000\\000\\000\\n".
(there’s probably a predicate for outputting codes, but I’m too lazy to search for it.)
I’ve modified library(protobufs) so that float32_codes/2 et al are exported. This should be available in the next SWI-Prolog development version release.
The low-level protobuf wire format – which seems to be close to what you want – is pretty easy to deal with, using int32_codes/2 and similar. If you want more compactness, you can use var_int//1 in protobufs.pl, which isn’t currently exported. (The var_int is used for encoding things like message tags and string lengths.) If you don’t care about compactness, just use the fixed-length predicates.
I think I’ll also expose var_int//1 and tag_type//2 in the interface (but renamed to protobuf_var_int//1 and protobuf_tag_tpye//2). But I have a few other things to do first, so these won’t be available right away.
For completeness, here’s the code I finally ended up with. @jan had some suggestions on how to clean it up. write_message/2 writes a string to Stream with a leading count:
write_message(Stream, String) :-
write_string_length(Stream, String),
write(Stream, String),
flush_output(Stream).
% Terminate with '.\n' so we know that's the end of the count
write_string_length(Stream, String) :-
stream_property(Stream, encoding(Encoding)),
string_encoding_length(String, Encoding, Length),
format(Stream, "~d.\n", [Length]).
% converts a string to Codes using Encoding
string_encoding_length(String, Encoding, Length) :-
setup_call_cleanup(
open_null_stream(Out),
( set_stream(Out, encoding(Encoding)),
write(Out, String),
byte_count(Out, Length)
),
close(Out)).
FYI, the protobuf low level functions have some subtle bugs in them for handling numbers, so I’m going to fix the bugs and rename the functions to properly match their behaviors.
If you’re writing the length as a string (format(Stream,'~d.~n'[...]), followed by a string, then you won’t need these, of course.