Prolog streams from foreign code

Hello friends,

I’m currently investigating how to write directly to streams from Rust code. The main reason is that at TerminusDB we realized that returning strings from Rust into Prolog, then having Prolog print them, incurs a weird overhead: a UTF-8 string first gets converted into a wide character format, and then back into a UTF-8 string. In some of the benchmarks I ran, that accounted for about 10% of my total runtime. So it makes sense to print UTF-8 directly from Rust when we can.
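To make the round trip concrete, here is a rough sketch of the two paths in plain Rust. It does not call SWI-Prolog at all; `write_via_wide` and `write_direct` are hypothetical names, and the wide-character step only imitates the shape of the extra work, not the actual conversion code inside Prolog.

```rust
use std::io::{self, Write};

/// Slow path (illustrative): decode UTF-8 into wide characters, then
/// re-encode each one as UTF-8 while writing.
fn write_via_wide<W: Write>(out: &mut W, text: &str) -> io::Result<()> {
    // Step 1: UTF-8 -> one 32-bit code point per character.
    let wide: Vec<u32> = text.chars().map(|c| c as u32).collect();
    // Step 2: wide characters -> UTF-8 again, one character at a time.
    let mut buf = [0u8; 4];
    for cp in wide {
        let c = char::from_u32(cp).expect("code point came from a valid str");
        out.write_all(c.encode_utf8(&mut buf).as_bytes())?;
    }
    Ok(())
}

/// Fast path: the bytes are already UTF-8, so hand them over unchanged.
fn write_direct<W: Write>(out: &mut W, text: &str) -> io::Result<()> {
    out.write_all(text.as_bytes())
}
```

Both functions produce the same bytes; the first simply does an allocation and two conversions to get there.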

Luckily, SWI-Prolog seems to have a whole range of functions for doing just that. Unfortunately, documentation is a bit lacking. Through trial and error I was able to get something working (got rid of that 10% overhead), but I’m unsure about the memory safety of what I’m doing. In particular, I’m worried about the new functionality I’m adding into swipl-rs for it.

So, I have a couple of questions which can hopefully clarify a lot. In all this I’m mostly talking about output streams, but any information about input streams is very welcome as well.

Getting streams from prolog

For getting a stream out of prolog, there’s both PL_get_stream_handle and PL_get_stream. The difference between the two is that PL_get_stream takes an extra (undocumented) flags argument. Looking somewhat deeper in the code I found a bunch of SH_ flags which seem to go here: SH_ERRORS, SH_ALIAS, SH_UNLOCKED, SH_OUTPUT, SH_INPUT and SH_NOPAIR. These aren’t exported from any of the header files. Unfortunately, I don’t really know what exactly these flags are supposed to do, or how I should use them. What do these flags do, and what does it mean to call the variant without flags?

The stream handle

Having called PL_get_stream, I now have an IOSTREAM pointer. It is unclear to me, though, what sort of lifetime this pointer has, and in what sort of contexts it is valid. Can I safely store this pointer and use it later? Can I send it to other threads and do something with it there? And if not, is there some way to get hold of a stream handle which does allow storage and thread moving?

Acquire and release

There are two functions in the SWI-Prolog.h header called PL_acquire_stream and PL_release_stream. What exactly are these for? Do I need to call these whenever I wish to use a stream? Or do they prevent the stream from being freed, like with registering and unregistering atoms? Or do they do something else entirely?

Stream errors

I noticed there’s an error system where sometimes, errors are somehow registered with the stream, instead of as a term that is accessible with PL_exception(). Instead, there are the functions Sferror, Sclearerr, Sseterr, and Sset_exception. Here, Sset_exception takes a term, but Sferror returns an int. Is this int a term ref too? Or is it something else? Is there some way to get to an error term or an error message that is currently set on the stream? And when exactly does this error on a stream become a proper prolog exception?

Interpreting the current encoding

A lot of the stream processing revolves around the encoding the stream is using, and that is why I started looking into this whole stream handling business in the first place. As I understand it, the IOSTREAM has a field, encoding, which tells me what encoding is currently being used for the stream. If this is ENC_UTF8, it is obviously UTF-8. However, I noticed that the stream is often in another encoding, namely ENC_ANSI, which the code documentation says is the “default (multibyte) codepage”. In my experiments, when it is set to this I am able to write UTF-8 without any problems. Am I correct to assume that this default (multibyte) codepage on modern Linux is almost always going to be UTF-8? Is there something special that SWI-Prolog does internally to make sure that a particular multi-byte encoding even fits this default?

Setting the current encoding

Finally, I noticed that there’s a function, Ssetenc, which appears to let you set the stream to another encoding. What does this actually do? In my experiments it seemed like I wasn’t really able to just arbitrarily set this. In particular, in combination with DWIM (which in its corrections will print blobs according to a blob-defined write function which uses a prolog stream, passing in a WCHAR-encoded stream), if I changed this encoding from the default ENC_WCHAR to ENC_UTF8 and wrote my utf8 string directly, it still came out as a garbled mess. So what is this for? When can I actually change the encoding of a stream?

That’s all my questions for now :slight_smile:

All the best,
Matthijs

I was running into problems with the lack of documentation in streams, and started writing some documentation. :wink: But it didn’t meet @jan’s quality standards, and I haven’t got around to rewriting it.

Some of my questions and @jan’s answers are here: (foreign functions) Safe release of resources during cleanup
(Streams are a kind of blob, although they have somewhat different interfaces.)

The library(archive) code makes extensive use of streams (and streams that wrap blobs). I’m in the process of cleaning it up (lots of memory leaks, use-after-free, etc.) and have almost got things to the point of being able to submit a PR. If you can wait a few days, there might be some useful things there.

(I’ll let @jan answer your questions … we’ve had some offline discussions, which maybe should have been public)

You are even kind :slight_smile: It would be good to document all that. I can give a few tips (don’t have much time now).

  • PL_get_stream_handle() is deprecated
  • PL_get_stream() takes the flags SIO_INPUT and SIO_OUTPUT: passing one says you want an input or an output stream. If the argument is a stream pair, you get the matching side; if neither flag is given and the argument is a pair, that is an error. SIO_NOERROR tells the function to fail silently instead of raising an exception. The resulting stream is acquired (as by PL_acquire_stream()), which means you have exclusive ownership of it.
  • PL_release_stream() must be used when done with the stream, but, as @peter.ludemann discovered, not after Sclose(), which invalidates the stream.
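
One way a Rust binding could encode the release rule from the list above is an RAII guard: release on drop, unless the stream was closed. This is only a sketch under assumptions. `RawStream`, `AcquiredStream` and `MockStream` are hypothetical names (not swipl-rs API), the trait stands in for the real `PL_release_stream()` and `Sclose()` calls, and both are simplified to return a `bool`.

```rust
/// Hypothetical stand-in for the two C calls the guard needs. In a real
/// binding these would wrap `PL_release_stream()` and `Sclose()`.
trait RawStream {
    /// Returns false if the stream was in an error state.
    fn release(&mut self) -> bool;
    /// Returns false if closing failed.
    fn close(&mut self) -> bool;
}

/// Represents a stream that has been acquired, e.g. by `PL_get_stream()`.
struct AcquiredStream<'a, S: RawStream> {
    raw: &'a mut S,
    done: bool,
}

impl<'a, S: RawStream> AcquiredStream<'a, S> {
    fn new(raw: &'a mut S) -> Self {
        AcquiredStream { raw, done: false }
    }

    /// Release explicitly, so the caller sees whether an error was pending.
    fn release(mut self) -> bool {
        self.done = true;
        self.raw.release()
    }

    /// Close the stream. Closing invalidates it, so no release follows.
    fn close(mut self) -> bool {
        self.done = true;
        self.raw.close()
    }
}

impl<'a, S: RawStream> Drop for AcquiredStream<'a, S> {
    fn drop(&mut self) {
        if !self.done {
            // The result is lost here; call `release()` to observe errors.
            let _ = self.raw.release();
        }
    }
}

/// Test double that only counts calls.
#[derive(Default)]
struct MockStream {
    releases: u32,
    closes: u32,
}

impl RawStream for MockStream {
    fn release(&mut self) -> bool {
        self.releases += 1;
        true
    }
    fn close(&mut self) -> bool {
        self.closes += 1;
        true
    }
}
```

Because `release` and `close` both consume the guard, the type system rules out releasing after a close; the drop path covers early returns.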

(this also answers the next topic)

Most of the functions are modeled after the POSIX FILE interface, with the same name and an S in front of it. Initially I wanted this library to be completely detached from Prolog. That didn’t really work in the end. The error functions set an error condition and rely on errno, but with some of the more complicated callbacks that no longer works. So Sset_exception() can be used to set the error flag and associate an exception with the stream. PL_release_stream() is to be called when done with a stream; if the stream is in an error state, it maps that to a Prolog exception and returns FALSE.

Modern Unix systems luckily mostly adopt Unicode and use UTF-8 encoding. MacOS same story, Windows is trying to catch up. If Prolog detects that the encoding is UTF-8 (which it guesses from the locale name), it uses its own encoder and decoder. That is a lot faster than the glibc one.
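
For readers who want to do a similar check on the Rust side, a guess from the locale name could look roughly like this. It is an illustration only, not SWI-Prolog’s actual detection logic; `locale_is_utf8` and `locale_from_env` are hypothetical helpers, and the LC_ALL, LC_CTYPE, LANG lookup order is the usual POSIX precedence.

```rust
/// Guess whether a POSIX locale name such as "en_US.UTF-8" selects UTF-8.
/// Looks only at the codeset after the '.', ignoring any "@modifier".
fn locale_is_utf8(locale: &str) -> bool {
    let without_modifier = locale.split('@').next().unwrap_or("");
    match without_modifier.split_once('.') {
        Some((_, codeset)) => {
            let c = codeset.to_ascii_lowercase();
            c == "utf-8" || c == "utf8"
        }
        // No codeset given (e.g. "C" or "POSIX"): assume it is not UTF-8.
        None => false,
    }
}

/// First non-empty value among LC_ALL, LC_CTYPE and LANG, if any.
fn locale_from_env() -> Option<String> {
    ["LC_ALL", "LC_CTYPE", "LANG"]
        .iter()
        .filter_map(|k| std::env::var(k).ok())
        .find(|v| !v.is_empty())
}
```

A caller might combine the two as `locale_from_env().map_or(false, |l| locale_is_utf8(&l))` and fall back to a slower, generic path otherwise.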

That is a bit too complicated. Ssetenc() simply sets ->encoding and, if this is set to octet, also clears text mode. This field is used by Sputcode(), Sgetcode(), etc. Notably, if the content of the buffer is not in the encoding you set, you get garbage.
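
The “garbage” effect is easy to reproduce without any Prolog involved: the same bytes read under the wrong encoding label turn into different characters. A minimal sketch, with `decode_latin1` and `decode_utf8` as hypothetical helpers and ISO-8859-1 chosen purely as an example of a mismatched encoding:

```rust
/// Decode bytes as ISO-8859-1 (Latin-1): every byte is one code point.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode bytes as UTF-8, replacing invalid sequences with U+FFFD.
fn decode_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The UTF-8 encoding of "é" is the two bytes 0xC3 0xA9. `decode_utf8` gives "é" back, while `decode_latin1` yields "Ã©": nothing was converted, only the label changed.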

As someone who has implemented a POSIX FILE interface on a non-POSIX machine (MVS and CP/CMS, if you must know), I can assure you that the POSIX FILE interface is, um, a bit less than delightful (errno and perror(), for example), so you can expect some rather ugly bits.

“We are tied down to a language which makes up in obscurity what it lacks in style.”

@matthijs, @peter.ludemann. Considering the recent activity accessing streams from foreign code, I have pushed a first draft documenting handling streams in foreign code. I have also updated the website. See SWI-Prolog -- Manual

Quite likely there are quite a few typos and small glitches :frowning:

Feel free to send PRs, add or ask for further clarifications :slight_smile:

I finally got around to checking out the documentation and I’m very pleased! It really does answer a lot of my questions about the stream interface. I’m especially interested in defining my own stream types at some point, that seems pretty neat.

One thing I am unsure of now though is what exactly happens when a stream is closed. From my experiments I can see that calling Sclose on a stream will ensure that further PL_get_stream calls on terms containing that stream will return a null-pointer, and prolog predicates that operate on streams will report that the stream doesn’t exist. However, when I just have a raw handle floating around in memory, I have no way of telling that this stream has been closed, and I suspect that keeping on using this handle would lead to icky undefined behavior. Is that right?

Of course, as long as a stream is acquired (which happens automatically on PL_get_stream), that’s not gonna happen, because Sclose will first acquire the stream and therefore block. But I think that means it is impossible to safely move a stream to another thread (as releasing on thread a, then acquiring on thread b, leaves a window open for a racing thread to close the stream, thus invalidating it). I suspect the only way around that is to use Sclosehook and have some registry of streams with metadata, which seems pretty silly.

Another thing: am I correct in assuming that garbage collection might just reap a stream if nothing in Prolog refers to it anymore? In that case, for long-lived streams that need to be stored in foreign memory, wouldn’t it be better to keep streams around as atoms, and only convert those atoms to IOSTREAMs when necessary? Of course, that raises the question of how to do that, as there doesn’t seem to be an exported equivalent of PL_get_stream for atoms. get_stream_handle in pl-file.c is probably what I want there, but it’s unfortunately not exported. I guess I can work around that by first unifying with a temporary term, then calling PL_get_stream on that term, but that’s a bit annoying.

Loads of questions again :). I know you’re probably off on some well-deserved time off though, so no rush in answering.

Thanks,
Matthijs

IOSTREAM is pretty much like FILE: after closing the memory is freed and all access is thus unsafe.

I’d stay away from these hooks. They were added to allow the stream library to be independent from Prolog while keeping the Prolog admin consistent. I doubt that still makes much sense, so maybe this will change at some point.

In older versions, garbage collecting a stream in Prolog had no effect on the IOSTREAM. Recent versions have a flag, agc_close_streams. Right now this flag is false by default, but eventually I’d like to turn it to true, letting the atom garbage collector reap dead streams. My guess is that this would make code that does not use setup_call_cleanup/3 to ensure files are closed, and thus accidentally leaves a file open, a lot safer. It prints a warning, as this is (still) bad practice: for example, we have no sensible way to deal with (I/O) errors that may happen during close in the garbage collector. The context in which the error should have been raised is gone, and in any case lived in a different (unknown and possibly no longer living) thread.

I have added PL_get_stream_from_blob() to avoid this. This is still a bit clumsy. Am I right to assume you want to make a stream available to Rust and occasionally read/write some data from an arbitrary Rust thread? The way that would go now is

  • In Prolog, use is_stream/1 and/or stream_property/2 to validate you have a proper stream.
  • Make a foreign call with the stream handle.
  • Use PL_get_atom() to get the blob handle. Give it a reference using PL_register_atom() and store it.
  • When access is required, use PL_get_stream_from_blob(), read/write and use PL_release_stream().
  • When done, remove the reference by calling PL_unregister_atom(). If you want to close the stream, use PL_get_stream_from_blob() and Sclose() before unregistering.
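
The steps above boil down to "store the atom, never the pointer, and resolve it on every access". The following toy model shows that shape in plain Rust. Everything in it is hypothetical: `FakeProlog` is a stand-in for the runtime, its methods only mimic the C calls named in the comments, and the assumption that PL_get_stream_from_blob() fails for a closed stream is made by analogy with what was reported for PL_get_stream earlier in this thread.

```rust
use std::collections::HashMap;

/// Toy stand-in for the Prolog runtime (hypothetical, illustration only).
#[derive(Default)]
struct FakeProlog {
    open: HashMap<u64, Vec<u8>>, // stream blob atom -> bytes written so far
    refs: HashMap<u64, u32>,     // atom -> number of foreign references
}

impl FakeProlog {
    /// cf. PL_register_atom()
    fn register_atom(&mut self, atom: u64) {
        *self.refs.entry(atom).or_insert(0) += 1;
    }
    /// cf. PL_unregister_atom()
    fn unregister_atom(&mut self, atom: u64) {
        if let Some(n) = self.refs.get_mut(&atom) {
            *n = n.saturating_sub(1);
        }
    }
    /// cf. PL_get_stream_from_blob(); assumed to fail once the stream is closed.
    fn stream_from_blob(&mut self, atom: u64) -> Option<&mut Vec<u8>> {
        self.open.get_mut(&atom)
    }
    /// cf. Sclose()
    fn close(&mut self, atom: u64) {
        self.open.remove(&atom);
    }
}

/// What foreign code stores: the atom, never the raw stream pointer.
struct StoredStream {
    atom: u64,
}

impl StoredStream {
    fn new(rt: &mut FakeProlog, atom: u64) -> Self {
        rt.register_atom(atom);
        StoredStream { atom }
    }

    /// Resolve the atom on every access, so a closed stream shows up as an
    /// error instead of a dangling pointer.
    fn write(&self, rt: &mut FakeProlog, data: &[u8]) -> Result<(), &'static str> {
        let stream = rt
            .stream_from_blob(self.atom)
            .ok_or("stream no longer exists")?;
        stream.extend_from_slice(data);
        Ok(()) // the real API would call PL_release_stream() at this point
    }

    /// Drop the foreign reference when the handle is no longer needed.
    fn dispose(self, rt: &mut FakeProlog) {
        rt.unregister_atom(self.atom);
    }
}
```

Since the stored value is just an integer-like handle, it can be moved between threads freely; only the short acquire, write, release window has to happen on one thread.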