Hello friends,
I’m currently investigating how to directly write to streams from rust code. The main reason for this is that at TerminusDB we realized that returning strings from rust into prolog, then having prolog print them, incurs a weird overhead where a utf8 string first gets converted into a wide character format, and then back into a utf8 string. In some of the benchmarks I ran, that was good for about 10% of my total runtime. So it makes sense to print utf8 directly from rust when we can.
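To make the overhead concrete, here is a minimal sketch of the two paths as I understand them. It uses only the Rust standard library and does not call SWI-Prolog at all: `write_via_wide` and `write_direct` are hypothetical names, and the `Vec<u32>` is just a stand-in for whatever wide character buffer Prolog really uses internally.

```rust
use std::io::{self, Write};

/// Models the slow path: decode UTF-8 into a wide (one code point per
/// element) buffer, then encode every code point back to UTF-8 on output.
pub fn write_via_wide<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    // Stand-in for the intermediate wide character string (an assumption).
    let wide: Vec<u32> = s.chars().map(|c| c as u32).collect();
    let mut buf = [0u8; 4];
    for cp in wide {
        // Every element came from a valid char, so the fallback is never hit here.
        let c = char::from_u32(cp).unwrap_or(char::REPLACEMENT_CHARACTER);
        out.write_all(c.encode_utf8(&mut buf).as_bytes())?;
    }
    Ok(())
}

/// Models the fast path: the bytes are already UTF-8, so write them as-is.
pub fn write_direct<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    out.write_all(s.as_bytes())
}
```

Both functions produce the same bytes for the same input; the first one just does an extra allocation plus a decode and re-encode per character, which is the work I'm trying to skip.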
Luckily, SWI-Prolog seems to have a whole range of functions for doing just that. Unfortunately, documentation is a bit lacking. Through trial and error I was able to get something working (got rid of that 10% overhead), but I’m unsure about the memory safety of what I’m doing. In particular, I’m worried about the new functionality I’m adding into swipl-rs for it.
So, I have a couple of questions which can hopefully clarify a lot. In all this I’m mostly talking about output streams, but any information about input streams is very welcome as well.
Getting streams from prolog
For getting a stream out of prolog, there’s both PL_get_stream_handle and PL_get_stream. The difference between the two is that PL_get_stream takes an extra (undocumented) flags argument. Looking somewhat deeper in the code I found a bunch of SH_ flags which seem to go here: SH_ERRORS, SH_ALIAS, SH_UNLOCKED, SH_OUTPUT, SH_INPUT and SH_NOPAIR. These aren’t exported from any of the header files. Unfortunately, I don’t really know what exactly these flags are supposed to do, or how I should use them. What do these flags do, and what does it mean to call the variant without flags?
The stream handle
Having called PL_get_stream, I now have an IOSTREAM* pointer. It is unclear to me though what sort of lifetime this pointer has, and in what sort of contexts it is valid. Can I safely store this pointer and use it later? Can I send it to other threads and do something with it there? And if not, is there some way to get hold of a stream handle which does allow storage and thread moving?
Acquire and release
There are two functions in the SWI-Prolog.h header called PL_acquire_stream and PL_release_stream. What exactly are these for? Do I need to call these whenever I wish to use a stream? Or do they prevent the stream from being freed, like with registering and unregistering atoms? Or do they do something else entirely?
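Whatever the answer turns out to be, on the Rust side I'd probably want to pair the two calls with a guard, so that the release can't be forgotten. Below is a rough sketch of that shape. It assumes the two functions form a strict acquire/release pair, which is exactly the thing I'm asking about, so treat it as a guess. `StreamApi`, `StreamGuard` and `CountingApi` are all hypothetical names, and the trait is a mock standing in for the real C calls so that the sketch runs without SWI-Prolog.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the acquire/release pair (not the real FFI).
pub trait StreamApi {
    fn acquire(&self);
    fn release(&self);
}

/// Acquires on construction and releases on drop, so every acquire is
/// matched by exactly one release, even on early return.
pub struct StreamGuard<'a, A: StreamApi> {
    api: &'a A,
}

impl<'a, A: StreamApi> StreamGuard<'a, A> {
    pub fn new(api: &'a A) -> Self {
        api.acquire();
        StreamGuard { api }
    }
}

impl<'a, A: StreamApi> Drop for StreamGuard<'a, A> {
    fn drop(&mut self) {
        self.api.release();
    }
}

/// Mock implementation that only counts calls, for illustration.
#[derive(Default)]
pub struct CountingApi {
    pub acquired: Cell<u32>,
    pub released: Cell<u32>,
}

impl StreamApi for CountingApi {
    fn acquire(&self) {
        self.acquired.set(self.acquired.get() + 1);
    }
    fn release(&self) {
        self.released.set(self.released.get() + 1);
    }
}
```

If release can also report an error, a plain Drop implementation would swallow it, so the real thing might need an explicit closing method instead. That depends on the answer to the question above.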
Stream errors
I noticed there’s an error system where sometimes, errors are somehow registered with the stream, instead of as a term that is accessible with PL_exception(). Instead, there are the functions Sferror, Sclearerr, Sseterr, and Sset_exception. Here, Sset_exception takes a term, but Sferror returns an int. Is this int a term ref too? Or is it something else? Is there some way to get to an error term or an error message that is currently set on the stream? And when exactly does this error on a stream become a proper prolog exception?
Interpreting the current encoding
A lot of the stream processing revolves around the encoding the stream uses, and that is the reason I started looking into this whole stream handling business in the first place. As I understand it, the IOSTREAM has a field, encoding, which can tell me what encoding is currently being used for the stream. If this is ENC_UTF8, it is obviously UTF-8. However, I noticed that often the stream is in another encoding, namely ENC_ANSI, which the code documentation says is the “default (multibyte) codepage”. In my experiments, when it is set to this I am able to write UTF-8 without any problems. Am I correct to assume that this default (multibyte) codepage on modern Linux is almost always going to be UTF-8? Is there something special that SWI-Prolog does internally to make sure that a particular multi-byte encoding even fits this default?
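As a stopgap, I could check on the Rust side whether the environment at least claims a UTF-8 locale before writing raw UTF-8 to an ENC_ANSI stream. The sketch below is only a heuristic under my own assumptions: it reads the usual POSIX variables (LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset) and looks at the codeset part of the name. It does not ask the C library or SWI-Prolog which codepage is actually active, and all function names are hypothetical.

```rust
/// Pick the value that decides the character type locale:
/// LC_ALL wins over LC_CTYPE, which wins over LANG; empty counts as unset.
pub fn effective_ctype<'a>(
    lc_all: Option<&'a str>,
    lc_ctype: Option<&'a str>,
    lang: Option<&'a str>,
) -> Option<&'a str> {
    for value in [lc_all, lc_ctype, lang].iter().copied().flatten() {
        if !value.is_empty() {
            return Some(value);
        }
    }
    None
}

/// Locale names look like language[_territory][.codeset][@modifier];
/// report whether the codeset part says UTF-8.
pub fn names_utf8(locale: &str) -> bool {
    let without_modifier = locale.split('@').next().unwrap_or(locale);
    match without_modifier.split_once('.') {
        Some((_, codeset)) => {
            let codeset = codeset.to_ascii_lowercase();
            codeset == "utf-8" || codeset == "utf8"
        }
        None => false,
    }
}

/// Heuristic only: does the process environment name a UTF-8 locale?
pub fn env_suggests_utf8() -> bool {
    let get = |key: &str| std::env::var(key).ok();
    let (all, ctype, lang) = (get("LC_ALL"), get("LC_CTYPE"), get("LANG"));
    effective_ctype(all.as_deref(), ctype.as_deref(), lang.as_deref())
        .map_or(false, names_utf8)
}
```

This would only tell me what the environment says, not what Prolog decided to do with it, so the question above still stands.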
Setting the current encoding
Finally, I noticed that there’s a function, Ssetenc, which appears to let you set the stream to another encoding. What does this actually do? In my experiments it seemed like I wasn’t really able to just arbitrarily set this. In particular, in combination with DWIM (which in its corrections will print blobs according to a blob-defined write function which uses a prolog stream, passing in a WCHAR-encoded stream), if I changed this encoding from the default ENC_WCHAR to ENC_UTF8 and wrote my utf8 string directly, it still came out as a garbled mess. So what is this for? When can I actually change the encoding of a stream?
Those are all my questions for now.
All the best,
Matthijs