Changes to SWI-cpp.h

We are doing what I hope is a final review of the changes to SWI-cpp.h (it’s now called SWI-cpp2.h because of how extensive the changes have been).

If anyone has any opinions, please speak up! So far, only @jan has had any comments, so we’ve been discussing in either github reviews or by private emails. The latest version of the changes (not yet a PR) is here: GitHub - kamahen/packages-cpp at jw1

Peter, I for one am very grateful for the work you’ve been doing with the C++ api and with pcre2. This has a great effect, not just because of the quality of the code and the detailed test cases that you do, but it shows everyone that the SWI-Prolog community is alive and continuously improving, this has just as much value as the code itself.

For the C++ code the comments I have at the moment are:

  1. In line 326 and 327, should it not say [[deprecated("use as_uint32_t()")]] instead of [[deprecated("use uint32_t()")]]?
  2. In code like this:
PlTermv(PlTerm_atom("instantiation_error"), t))

I would suggest creating static class members for fixed atoms used in the code. The runtime overhead of calling the constructors when the member function is called can be avoided by simply having predefined static class members that are built only once, perhaps they all can be in the PlAtom class.

  1. From what I read in your comments and see in the code PlCheck(int) is used to check both exceptions and failure, and it is implemented like this:
PlCheck(int rc)
{ if ( !rc )
    throw PlFail();
}

I don’t think it is a good idea to use exceptions to handle the failure case, it is probably okay for true exceptions, but throwing an exception for the failure case in a function that is heavily called will slow things down. Exceptions are much slower than the regular code path. See this post for an example: Avoid exception throwing in performance-sensitive code – Daniel Lemire's blog; if you can avoid using exceptions I think it is better.

A (side) note

Performance now days depends more on the smart use of the processor cache than on the algorithms themselves. You could have a linear search that performs faster than any other search algorithm if you use the CPU caches efficiently. This is specially true because of the pre-fetch done in modern CPUs.

Scott Meyers (the well-known C++ expert) gives a great explanation of it in this talk: code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care - YouTube, it is an extremely useful talk that I think every software developer should see.

For this reason, game developers (whose job depends on performance) use Data Oriented Design, rather than using a class to represent an object. They use Structures of Arrays rather than individual objects. It is a simple concept, but it requires a paradigm shift. Very fast C++ and C code can be written when using this kind of design. Of course I don’t think you can redesign the API at this point (may not even be applicable in this case), but I wanted to mention this, perhaps you are aware of it already.

A great talk on Data-oriented-design is given here: Building a Data-Oriented Future - Mike Acton - YouTube, this second talk puts the meat on Meyers talk on how to actually design fast systems using the CPU caches appropriately. Although it is focused on game design it can be applied in many other areas. I really recommend hearing both talks. Even though they are long, I consider them some of the best talks on software development that I have seen; so I would really recommend watching them. Again, thanks for your work and the contributions to SWI-Prolog!

2 Likes

Thanks for reviewing!

This was also one of my points. Over time I’ve also started to like the approach as the large majority of C(++) code interacting with Prolog is supposed to succeed and I’ve seen a lot of code that simply ignored the return code, not only for stuff that must succeed unless there is an error (e.g., the PL_put_() family), but also for stuff that may fail such as the PL_get_() and even the PL_unify_() family. So, I no longer know what is wisdom. My idea of good code is to use the exceptions surely for the resource errors. The PL_get_() family already gets harder as, if an argument can be an atom or integer, we (in C, sorry) write

    if ( PL_get_atom(A1, &a) )
       ...
    else if ( PL_get_integer(A1, &i)
       ...

But the current C++ interface does not allow for that, so we need a A1.is_atom(), pretty much duplicating the work (find the current engine, dereference the term_t to a pointer, follow the optional Prolog reference chain). In the light of your side node, this is all pretty bad from the point of data design :frowning:
Then, the type check warmed the cache for the actual retrieval, so it might not be so bad :slight_smile:

This is surely relevant. I think not so much for this interface though. There are many places in Prolog’s design were this might make a difference. One of them would be rather trivial: improving the structure layout such that members that are often used together are adjacent. I would assume there are tools for that, but I haven’t found them yet. Several linked lists are probably better replaced by arrays. This is not all that trivial though as many of the data structures are designed for lock free concurrent access.

2 Likes

@swi - thank-you for your kind words. I’ll give brief responses to individual items here and a longer response later.

Thank-you for catching my typos - I hope there aren’t more. There have been quite a few changes over time, and my test cases haven’t quite caught up either. :frowning:

The pre-allocaed PlTerm_atom()s is probably a good idea, although I suspect that the performance impact would be small, because throwing a PlInstantiationError() isn’t likely to happen often, and would actually result in a slight slowdown (and slightly more atom space used) for programs that don’t throw that error. There’s a way of allocating these lazily, so perhaps I’ll do that instead. (@jan – I suppose there should be a mutex around allocating/using global atoms – is it possible to expose an interface for that?)

Your comment about throw PlFail() requires a longer response – @jan has given part of the rationale (he and I have had long email discussions about it), but I’d like to add a bit more (I’ll try to write something later today).

As to efficiency … right now, I’m more worried about correctness than performance; there were quite a few things in the original SWI-cpp.h that could lead to subtle bugs (C++ is very difficult to get right, even for those with a lot of C++ experience). If efficiency is important, you should probably use the C interface (although the overhead of using the C++ PREDICATE() is very small, according to my measurements). For situations where failure paths are common, you can still do return false instead of throw PlFail().

As a general comment about efficiency … I once looked into Quintus Prolog performance on PowerPC, using IBM’s “Red Book” for insights – I found that some things ran at quite different speeds compared to what I had expected by following the books calculations, probably due to subtleties in the cache/memory system (PowerPC (and, more recently, ARM/M1) have a different memory model than x86) … in the past I sped up a finite-state automaton engine by about 10x by careful micro-measurements … it’s an interesting exercise, but needs to be redone for each platform and is a very time-consuming task (e.g., the code got very little performance boost by turning on the compiler’s high optimization on x86 (and by using different compilers), but got an immediate 2x boost on SPARC). I suspect that similar attention to detail would be effective on SWI-Prolog’s engine, but that would require many weeks of intense work.

BTW, I have a question for @jan – is there any documentation on how the “lockless” garbage collection and lock-avoidance between threads works in SWI-Prolog? I recently read this paper on the dangers of atomic operations, and it reminded me that things that work on x86 don’t necessarily work on PowerPC or ARM (I met some of the authors when I was at Google - e.g. Mike Burrows was the author of the “Chubby” lock system that is critical to many Google distributed systems).

2 Likes

I’ve done a simple benchmark of return false vs throw PlFail() … and throw PlFail() is about 15x slower, so clearly return false should be used if failure is common (and I’ve made a note about this in the documentation). As a minor note, the overhead of PlCheck() is about 30% when the compiler doesn’t inline it. (The numbers and how I ran the benchmark are in test_cpp.cpp)

There’s essentially zero cost to the try { ... } in the enclosing PREDICATE (that is, there’s essentially no performance advantage in using the C style of predicate declaration compared to the C++ style).

As a matter of programming style, if performance isn’t an issue, I prefer the throw-catch style; but if you prefer the Go style of checking return codes everywhere, I can’t argue with that – and the C++ interface allows it.

As @jan says, there are methods in the C++ interface that possibly should return a bool instead of throwing an exception, thereby requiring that either the method is surrounded by try-catch or that a test is made before using. At one point, I had two versions of many methods but that got a bit confusing because there are some functions in SWI-Prolog.h (ending in _ex()) that create an exception. Perhaps the confusion could be avoided by a suitable naming convention. (See, for example, PlTerm::unify_nil_ex() and PlTerm::unify_nil().

For now, I suppose that for performance, one could use PL_get_atom(A1.C_,&a) for a boolean result and A1.as_atom() if a non-atom should throw an exception. (There are also some in-between cases, such as getting the characters from an “atomic” term; that is, allowing either atom or string.)

One other thought … C the foreign language interface treats failure and errors in a similar way – the difference is in whether an error term has been set or not. So, it seemed natural, when extending this to C++, to use

if (!A1.is_atom()) throw PlFail();

as being in the same style as

if (!A1.is_atom()) throw PlTypeError("atom", A1);

or

... A1.as_atom() ... // throws PlTypeError if !A1.is_atom()

A few random thoughts on cache and performance.

I turned on PGO on 3 different machines:

Intel Core Processor (Broadwell, IBRS)  3.0 GHz, cache: 16384 KB
AMD Ryzen 7 3700C                       2.3 GHz, cache:   512 KB
Intel(R) Core(TM) i5-7500               3.4 GHz, cache:  6144 KB

Looking at the PGO benchmark times (“Generating …/home/boot.prc”), the 2nd and 3rd were roughly the same (!), but the first one was about 4x faster (!!!). I don’t know what to conclude from this (I don’t know enough about the differences of the internals of the 3 CPUs, nor about the speeds of the caches and main memory), but my guess is that it’s a combination of cache size, cache granularity, and pipeline depth. [All three CPUs got about 5-10% performance improvement from PGO]

One of my test foreign code predicates does the equivalent of p(X):-X=0 … for non-PGO, the foreign code version ran about the same speed as the builtin =/2; but with PGO, the foreign code was about 10% faster (no change for the builtin). Again, I don’t know how to interpret this (maybe the PGO suite needs more samples added to it?).

I’m going to do a few benchmarks of some variants of foreign code, and the results might change some of the new C++ API. (I’m hoping that I can simplify some of the API, if it turns out that some variants have essentially the same performance)

Thanks for setting up the test, good to have numbers that show how much slow down is caused by the exception.

This is the kind of problem that using exceptions for regular flow control cause, nonetheless I do like the solution you thought about:

Probably the non-exception functions don’t need to be done for every case, but perhaps only for the most common ones, like the integer/atom case Jan mentioned?

1 Like

Continuing the benchmarking, I tried the various forms of PL_unify_* with atoms and strings:

  • PL_unify_chars()
  • PL_unify_atom(PlAtom(...))
  • PL_unify(PlTerm(...))

These all had about the same performance except for PL_unify_chars(PL_ATOM, ...), which – to my surprise – was about 20% slower than the others. (PL_unify_chars(PL_STRING, ...) was similar to the other.)

[I saw about 10% variance in benchmark times]

So, it seems that we can simplify everything by getting rid of the methods that call PL_unify_chars() and instead have a unify() method that takes either an atom or a string (PlTerm_string). This would avoid some confusing aspects of the API, and would also avoid some problems with function overloading in C++.

For those who are interested, the code is currently here:

I’m also looking into some template programming, possibly using C++ “traits” to simplify the conversions between internal form an external form (Latin1/UTF8/locale) and marking whether an external form is meant to be an atom, string, list-of-codes, list-of-characters. This would possibly look something like A1.unify(PlString<EncUf8,PlAtom>("foo")), meaning that it’s unifying with the UTF8 encoding of an atom. This kind of wrapper imposes almost no runtime overhead, and it allows defining exactly how it’s implemented (in this case. PL_unify_chars(PL_ATOM}REP_UTF8,3,"foo"). (The main reason for doing it this way instead of flags in the calls is that we can associate the encoding information with the string but without converting to the internal form)

So, it seems that performance isn’t a significant item (except for perhaps PL_unify_chars(PL_ATOM,...)), and it looks as if the interface can be simplified somewhat.

This is not the whole store :frowning: The problem with

PREDICATE(unify_foo_atom_2a2, 1)
{ return A1.unify_atom(PlAtom("foo"));
}

Is that the atom remains registered. So we either get some ugly code to unregister it or we need a PlAtom to unregister if it goes out of scope. That is of course possible, but then we must distinguish how we get the atom. If it is created from a string, it is registered (because the asynchronous atom garbage collector could reclaim it before it is even returned). If it is extracted from a term, it is not (because it is protected by the term and register/unregister is expensive if multiple threads happen to do this on the same atom). Doing the registration handling from C++ through the C interface is slower as we need to cross the API boundary more often (notably requires fetching the current Prolog engine).

Another thing that PL_unify_chars() and friends can do (they currently do not) is check that the term is currently unified to an atom and, if so, do a string compare. If the term is instantiated to something that is not an atom it could immediately fail without creating an atom. Although the current implementation is a bit slow, probably due to a few unnecessary indirections as well as some additional functionality (support for different encodings), it can be faster than creating an atom.

So, it seems that unify_chars() is useful, then. :wink:

It’s, of course, possible to do something like this to automatically unregister the atom upon return but it seems that it’s not worth the complication (also, it’s not clear to me - should the atom be unregistered if unification succeeds, if it fails, or for both situations?):

class PlAtom_local : public PlAtom {
  PlAtom_local(const char *s) : PlAtom(s) { }
  ~PlAtom_local() { PL_unregester_atom(C_); }
};

PREDICATE(unify_foo_atom_2a2, 1) {
  PlAtom_local foo("foo");
  return A1.unify_atom(foo);
}

A PlAtom_local is of course fine. Just, it should not be part of the public interface. People are never going to know when to use PlAtom and when PlAtom_local. The current C interface is designed such that one only needs explicit management of the atom reference count when you hold a reference to an atom_t instance outside the scope of a predicate. As that doesn’t happen very often except for using atoms to compare against constants in the program, there are few cases where you need to worry about these things. For atoms using as constants for the module we use

static atom_t ATOM_length;

...
    { atom_t a;
      
      if ( PL_get_atom(t, &a) && a == ATOM_length )
         ...

      return PL_unify_atom(t2, ATOM_length);
....

install_t 
install_mymodule(void)
{ ATOM_length = PL_new_atom("length");
  ...

The C++ interface makes this a little easier as we have global constructors, so we can write this:

static PlAtom ATOM_length("length");

SWI-Prolog 9.1.0 has the (almost) latest version of SWI-cpp2.h, containing additional changes and simplifications since the last comment in this thread. This thread has been quiet because most of the discussion has been between @jan and myself as comments on PRs plus some private email threads.

Hopefully, I’ve captured the important stuff from these discussions in the updated documentation.

The trickiest bit to fully understand the API is probably the exception handling mechanism, which requires understanding how exceptions are handled in foreign predicates. But SWI-cpp2.h is designed such that typical code can be written without this deeper knowledge; either by methods or functions that “do the right thing” (check results and throw a C++ exception if needed) or by using PlCheck() to wrap functions from the C interface.

The documentation can probably do with some reorganization and rewriting, but my guess is that the missing bits can be easily deduced from the code in SWI-cpp2.h and the foreign predicates documentation.
If anyone feels like reviewing the documentation, that would be nice. :wink:
(If you do volunteer to review the documentation, please check with me first to make sure that there are no pending PRs - right now, there’s one, which adds a few small sections.)

3 Likes

Finally we have a proper C++ interface rather than mine, which was a C++ interface by a C programmer :slight_smile:

It was a good start for an amateur C++ programmer. :stuck_out_tongue: And the number of “professional” C++ programmers is pretty small – it’s a rather esoteric skill.

Those of us who’ve had to deal with C++ a lot have also learned to avoid some of its “features” – and exploiting those features is what mostly caused problems with the initial version. (And my experience with C++ didn’t prevent me from getting bitten by some of these problems, fortunately caught by a combination of advanced compiler warnings and Asan.)

And it was an “interesting” exercise in getting the exception models of SWI-Prolog and C++ could (mostly) work cleanly together.

This project also helped improve the documentation of foreign predicates in general (due to my slowness in understanding some of the fundamental design decisions in SWI-Prolog, which @jan was kind enough to explain to me).

4 Likes

Anyway, to finish up the changes, the biggest missing piece is handling string encodings.

Currently, SWI-Prolog supports:

  • Latin1
  • UTF8
  • “locale” (UTF8 on Linux?, UTF16 on Windows?) - please correct me if I’m wrong. (@mgondan?) This uses wcrtomb() for input (@jan - is mbrtowc() used for output?)

PL_get_chars() defaults to Latin1; Sprintf(“%Ws”) defaults to UTF8 (I think). So, those are probably the best defaults.

@jan has proposed an externally defined SWIPL_DEFAULT_TEXT_INPUT_ENCODING (and OUTPUT) that can be defined before the #include <SWI-cpp2.h> to allow different defaults. If it turns out that whatever we choose for the default encodings is wrong, this can be easily “fixed” by each user defining the default appropriately (although it would be nice to avoid this, for consistency across all packages).

Every method or function that inputs or outputs a std::string (or char*) would also have an optional enc parameter that allows specifying the encoding.
(Functions/methods that have std::wstring or wchar_t* don’t need an encoding)

If there are no objections to this, I’ll update SWI-cpp2.h accordingly.

1 Like

Locale can be anything, including the various esoteric multibyte encodings that used to exist. It just means “use the C runtime library primitives to convert from whatever the locale uses to encode characters in bytes to encode them as wchar_t” This is used both for input and output. Then the problem is not yet solved as the encoding of wchar_t is not defined :frowning: In practice we seem to have the choice between USC-4 (which means an array of 32-bit integers that can represent each Unicode code point) or UTF-16 (seen in only on Windows at the OS level; Java also uses UTF-16 for Unicode). The idea behind the general text encoding is “do the default of the OS”. In practice this is moving towards UTF-8, but we still are not there. While the Linux desktop is practically UTF-8 (unless decided otherwise by the admin or user), the default for servers is typically the C locale (ASCII). In Windows one can now opt for UTF-8, but the default is some code page depending on the region and language config.

For PL_get_chars() it is controlled by a bitwise or of flags, using ISO Latin 1 as default, but that is more for backward compatibility. The meaningful choices are typically REP_UTF8 or REP_MB (Locale), depending on what you want to do with the string. I.e., if you need it to access the OS, you must use REP_MB. If you need it to do some text processing in C, use REP_UTF8, so you can process full Unicode. If you know you live in an ASCII world you can use REP_ISO_LATIN_1. As UTF-8 and ISO Latin one are the same for ASCII, all you gain is a scan to see there are no code points > 127.

Agree. I think ISO Latin 1 is generally the wrong default. I propose to use UTF-8, allowing the user to select one of the others. This should be a compile time choice as the choice dictates how to process the string.

2 Likes

yes, please !

Are we all in agreement? – the default encoding for both input and output will be UTF8. (This means that the defaults for C++ and C will be different)

I think I can change the C++ API so that a warning is output if the encoding isn’t specified … do people think that’s a good idea? This won’t be extremely verbose … e.g., in Python, it’s required to add .decode("UTF8") or .encode("UTF8") when translating between strings (internal form) and bytes (external form), and that’s more verbose than .as_string(EncUTF8) – and similar considerations in C++ (except the encoding/decoding is often mistakenly omitted because the compiler and APIs don’t catch such mistakes.

EDIT: It might be that a wrapper for std::string with an encoding is the best way of doing this after all. I have to think about it. (Also, not sure of the best name: class PlString was in the original C++ API; I changed that to PlTerm_string, to be more clear; but it’s probably a bad idea to reuse PlString<encoding>(...) … I suppose PlStr<encoding>(...) would be OK.)

For those who care, I now have a C++ API for blobs. The code isn’t yet in SWI-cpp2.h – If you’re interested, the new API is at the beginning of file rocksdb4pl.cpp in Pull requests · JanWielemaker/rocksdb · GitHub

If you want to see an example of using the API, search for struct dbref. The API has been designed so that most things default in a reasonable way, and the only thing the programmer needs to write are the creation code (search for make_unique<dbref>), the destructor (in struct dbref), optionally write(), compare() save(), load() methods. For this example code, the way of getting a blob from a predicate parameter is a bit more complicated than usual; normally, a the programmer would just use cast_blob_check<dbref>(A1.as_atom()) to get a pointer to the blob.

(The documentation isn’t very good, I’m afraid, which why I suggest looking at the actual use)

1 Like

Hi @peter.ludemann just to confirm, is the latest C++ API for SWI-Prolog ready for use? Is there a particular branch I should check out?