Prolog streams from foreign code

Hello friends,

I’m currently investigating how to directly write to streams from rust code. The main reason for this is that at TerminusDB we realized that returning strings from rust into prolog, then having prolog print them, incurs a weird overhead where a utf8 string first gets converted into a wide character format, and then back into a utf8 string. In some of the benchmarks I ran, that was good for about 10% of my total runtime. So it makes sense to print utf8 directly from rust when we can.
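To make the double conversion concrete, here is a minimal Rust sketch. It is not SWI-Prolog's actual code path: `roundtrip_via_wide` and `write_direct` are hypothetical names, and the vector of 32-bit code points only stands in for a wide-character buffer (assuming a platform where wide characters are 32 bits, as on Linux).

```rust
use std::io::{self, Write};

/// Illustrative only: what returning a string and printing it amounts to.
/// The UTF-8 input is decoded into 32-bit code points (a stand-in for a
/// wide-character buffer) and then encoded back into UTF-8.
fn roundtrip_via_wide(utf8: &str) -> String {
    let wide: Vec<u32> = utf8.chars().map(|c| c as u32).collect();
    wide.iter().filter_map(|&cp| char::from_u32(cp)).collect()
}

/// The direct path: the UTF-8 bytes are copied to the output unchanged.
fn write_direct<W: Write>(out: &mut W, utf8: &str) -> io::Result<()> {
    out.write_all(utf8.as_bytes())
}
```

Both paths produce the same bytes; the first one simply does two extra passes over the data and allocates an intermediate buffer, which is the kind of work the direct write avoids.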

Luckily, SWI-Prolog seems to have a whole range of functions for doing just that. Unfortunately, documentation is a bit lacking. Through trial and error I was able to get something working (got rid of that 10% overhead), but I’m unsure about the memory safety of what I’m doing. In particular, I’m worried about the new functionality I’m adding into swipl-rs for it.

So, I have a couple of questions which can hopefully clarify a lot. In all this I’m mostly talking about output streams, but any information about input streams is very welcome as well.

Getting streams from prolog

For getting a stream out of prolog, there’s both PL_get_stream_handle and PL_get_stream. The difference between the two is that PL_get_stream takes an extra (undocumented) flags argument. Looking somewhat deeper in the code I found a bunch of SH_ flags which seem to go here: SH_ERRORS, SH_ALIAS, SH_UNLOCKED, SH_OUTPUT, SH_INPUT and SH_NOPAIR. These aren’t exported from any of the header files. Unfortunately, I don’t really know what exactly these flags are supposed to do, or how I should use them. What do these flags do, and what does it mean to call the variant without flags?

The stream handle

Having called PL_get_stream, I now have an IOSTREAM pointer (IOSTREAM *). It is unclear to me, though, what lifetime this pointer has and in which contexts it is valid. Can I safely store this pointer and use it later? Can I send it to other threads and do something with it there? And if not, is there some way to get hold of a stream handle which does allow storage and moving between threads?

Acquire and release

There are two functions in the SWI-Prolog.h header called PL_acquire_stream and PL_release_stream. What exactly are these for? Do I need to call these whenever I wish to use a stream? Or do they prevent the stream from being freed, like with registering and unregistering atoms? Or do they do something else entirely?

Stream errors

I noticed there’s an error system where errors are sometimes registered with the stream itself, rather than as a term that is accessible with PL_exception(). For these, there are the functions Sferror, Sclearerr, Sseterr, and Sset_exception. Here, Sset_exception takes a term, but Sferror returns an int. Is this int a term ref too? Or is it something else? Is there some way to get to an error term or an error message that is currently set on the stream? And when exactly does this error on a stream become a proper prolog exception?

Interpreting the current encoding

A lot of the stream processing revolves around the encoding the stream uses, and that is why I started looking into this whole stream handling business in the first place. As I understand it, the IOSTREAM has a field, encoding, which tells me what encoding is currently being used for the stream. If this is ENC_UTF8, it is obviously utf-8. However, I noticed that often the stream is in another encoding, namely ENC_ANSI, which the code documentation says is the “default (multibyte) codepage”. In my experiments, when it is set to this I am able to write UTF-8 without any problems. Am I correct to assume that this default (multibyte) codepage on modern linux is almost always going to be UTF-8? Is there something special that SWI-Prolog does internally to make sure that a particular multi-byte encoding even fits this default?

Setting the current encoding

Finally, I noticed that there’s a function, Ssetenc, which appears to let you set the stream to another encoding. What does this actually do? In my experiments it seemed like I wasn’t really able to just arbitrarily set this. In particular, in combination with DWIM (which in its corrections will print blobs according to a blob-defined write function which uses a prolog stream, passing in a WCHAR-encoded stream), if I changed this encoding from the default ENC_WCHAR to ENC_UTF8 and wrote my utf8 string directly, it still came out as a garbled mess. So what is this for? When can I actually change the encoding of a stream?

That’s all my questions for now :slight_smile:

All the best,
Matthijs

I was running into problems with the lack of documentation in streams, and started writing some documentation. :wink: But it didn’t meet @jan’s quality standards, and I haven’t got around to rewriting it.

Some of my questions and @jan’s answers are here: (foreign functions) Safe release of resources during cleanup
(streams are a kind of blob, although they have somewhat different interfaces.)

The library(archive) code makes extensive use of streams (and streams that wrap blobs). I’m in the process of cleaning it up (lots of memory leaks, use-after-free, etc.) and have almost got things to the point of being able to submit a PR. If you can wait a few days, there might be some useful things there.

(I’ll let @jan answer your questions … we’ve had some offline discussions, which maybe should have been public)

You are too kind :slight_smile: It would be good to document all that. I can give a few tips (I don’t have much time now).

  • PL_get_stream_handle() is deprecated
  • PL_get_stream() takes the flags SIO_INPUT/SIO_OUTPUT: if one is given, it says you want an input or an output stream, and if the argument is a pair you get the matching one. If neither is given and the stream is a pair, this is an error. SIO_NOERROR tells the function not to raise an exception but to fail silently. The resulting stream is acquired (PL_acquire_stream()). This means you have exclusive ownership over it.
  • PL_release_stream() must be used when done with the stream, but as @peter.ludemann discovered, not after Sclose(): Sclose() invalidates the stream.

(this also answers the next topic)

Most of the functions are modeled after the POSIX FILE interface, with the same name and an S in front of it. Initially I wanted this library to be completely detached from Prolog. That didn’t really work in the end. The error functions set an error condition and errno is used, but with some complicated callbacks that no longer works. So Sset_exception() can be used to set the error flag and associate an exception with it. PL_release_stream() is to be called when done with a stream. If the stream is in an error state, it maps that to a Prolog exception and returns FALSE.
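The acquire, use, release protocol described above can be modeled with a toy in plain Rust. This is only an analogy under loud assumptions: `ToyStream`, `Acquired`, `acquire`, `set_error` and `release` are hypothetical names, a `Mutex` stands in for exclusive ownership, and a `Result` stands in for "release fails and a Prolog exception is raised". None of this calls the SWI-Prolog API.

```rust
use std::sync::{Mutex, MutexGuard};

/// Toy stand-in for a stream: an output buffer plus a pending-error slot.
#[derive(Default)]
struct ToyStream {
    buffer: Vec<u8>,
    error: Option<String>,
}

/// Exclusive access to the stream, held between acquire and release.
struct Acquired<'a> {
    inner: MutexGuard<'a, ToyStream>,
}

/// Analogue of obtaining an acquired stream: blocks until we own it.
fn acquire(stream: &Mutex<ToyStream>) -> Acquired<'_> {
    Acquired { inner: stream.lock().unwrap() }
}

impl Acquired<'_> {
    /// Append bytes to the stream's buffer.
    fn write(&mut self, bytes: &[u8]) {
        self.inner.buffer.extend_from_slice(bytes);
    }

    /// Analogue of setting an error on the stream: the error is only
    /// recorded here, nothing is raised yet.
    fn set_error(&mut self, message: &str) {
        self.inner.error = Some(message.to_string());
    }

    /// Analogue of release: give up ownership and surface a pending error.
    /// Consuming `self` means the handle cannot be used afterwards.
    fn release(mut self) -> Result<(), String> {
        match self.inner.error.take() {
            Some(message) => Err(message),
            None => Ok(()),
        }
    }
}
```

The design point the toy tries to capture is that an error set on the stream stays dormant until release, and that release is the single place where it turns into a failure the caller has to handle.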

Modern Unix systems luckily mostly adopt Unicode and use UTF-8 encoding. MacOS same story, Windows is trying to catch up. If Prolog detects that the encoding is UTF-8 (which it guesses from the locale name) it uses its own encoder and decoder. That is a lot faster than the glibc one.
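As a rough idea of what "guessing from the locale name" can look like, here is a minimal Rust sketch. `locale_is_utf8` is a hypothetical helper, not SWI-Prolog's actual check; it assumes the usual POSIX locale name shape `language_TERRITORY.codeset@modifier`.

```rust
/// Hypothetical helper: guess whether a POSIX-style locale name
/// (e.g. "en_US.UTF-8") selects UTF-8 as its codeset.
fn locale_is_utf8(locale: &str) -> bool {
    // Drop an optional "@modifier" suffix, as in "nl_NL.utf8@euro".
    let without_modifier = locale.split('@').next().unwrap_or("");
    // The codeset, if present, follows the first '.'.
    match without_modifier.split_once('.') {
        Some((_, codeset)) => {
            let codeset = codeset.to_ascii_lowercase();
            codeset == "utf-8" || codeset == "utf8"
        }
        // Names without a codeset ("C", "POSIX", "en_US") are not
        // treated as UTF-8 by this sketch.
        None => false,
    }
}
```

A real check would first have to decide which name to inspect (typically the value of LC_ALL, LC_CTYPE or LANG, in that order); that part is left out here.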

That is a bit too complicated. Ssetenc() simply sets ->encoding and if this is set to octet also clears the text mode. This field is used by Sputcode() and Sgetcode(), etc. Notably if the content of the buffer is not what you set the encoding to, you get garbage.
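The "you get garbage" case can be reproduced with a toy model. In the sketch below, `EncodedBuffer` and its methods are hypothetical stand-ins (not the real IOSTREAM): changing the encoding only changes a label, while the bytes already in the buffer stay exactly as they were written.

```rust
#[derive(Clone, Copy)]
enum Encoding {
    Utf8,
    Latin1,
}

/// Toy model of a stream buffer with an encoding label attached.
struct EncodedBuffer {
    encoding: Encoding,
    buffer: Vec<u8>,
}

impl EncodedBuffer {
    /// Only flips the label; bytes already in the buffer are untouched.
    fn set_encoding(&mut self, encoding: Encoding) {
        self.encoding = encoding;
    }

    /// Encode one code point according to the current label.
    fn put_code(&mut self, c: char) {
        match self.encoding {
            Encoding::Utf8 => {
                let mut tmp = [0u8; 4];
                self.buffer.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            }
            Encoding::Latin1 => {
                // Code points above 255 cannot be represented in Latin-1.
                self.buffer.push(if (c as u32) < 256 { c as u8 } else { b'?' });
            }
        }
    }

    /// Decode the whole buffer according to the current label.
    fn read_all(&self) -> String {
        match self.encoding {
            Encoding::Utf8 => String::from_utf8_lossy(&self.buffer).into_owned(),
            Encoding::Latin1 => self.buffer.iter().map(|&b| b as char).collect(),
        }
    }
}
```

Writing `é` while the label is UTF-8 stores the bytes C3 A9; after relabeling the buffer as Latin-1, the same two bytes decode as `Ã©`. The same mismatch, with a wide-character buffer relabeled as UTF-8, would explain the garbled output described in the question.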

As someone who has implemented a POSIX FILE interface on a non-POSIX machine (MVS and CP/CMS, if you must know), I can assure you that the POSIX FILE interface is, um, a bit less than delightful (errno and perror(), for example), so you can expect some rather ugly bits.

“We are tied down to a language which makes up in obscurity what it lacks in style.”

@matthijs, @peter.ludemann. Considering the recent activity accessing streams from foreign code, I have pushed a first draft documenting handling streams in foreign code. I have also updated the website. See SWI-Prolog -- Manual

Quite likely there are quite a few typos and small glitches :frowning:

Feel free to send PRs, add or ask for further clarifications :slight_smile:
