Changes to SWI-cpp.h

@Jan sent me a private note with some suggestions and questions. I’ll paraphrase his comments here. (And a few I’ll put as “code review” comments)

All classes in the end embed (only) a C handle. Doesn’t it make sense
to have the same method for accessing this, e.g. c, so we can do
PlFunctor f;
...
PL_something(f.c(), ...)

I know it is common to provide access functions for members to keep
binary compatibility, but as this is only inline code I doubt that makes
much sense, so f.c and c as a public member would do as well?

The general convention with C++ is for struct to be for POD (plain old data) that might have some convenience functions and class is for everything else; the biggest difference is whether the default visibility is “public” (struct) or “private” (class); and typically fields in a class follow a convention (suffix “_” at Google, prefix “m_” at GNU_, etc.). The C++ wrappers in SWI-cpp.h seem to be half-way in between – they’re POD in the sense that they just wrap a handle, but they provide quite a few “convenience” functions, so I’ve decided that they should be “class”.

As Jan says, there’s no need for getters/setters for binary compatibility; the question is how to access the underlying handle when needed. The compiler will expand PL_something(f.c(), ...) to PL_something(f.functor_, ...) so efficiency isn’t the issue. In general, when someone wants to access a PL_something() in SWI-Prolog.h, we should want to make that available in SWI-cpp.h (as f.something(...)), so the convenience of accessing the wrapped handle shouldn’t be a significant issue.

Perhaps leave the handle private and access it by C()? Making the method name upper-case would also make it easy to find with a regular text editor.

Why PlAtom::reset() and PlTerm::reset()?

I’m following the style of “smart pointers”, by having a function for setting the value to “null” (for atom_t and term_t, the “null” is 0, although I don’t think it’s documented anywhere). I forgot to include PlAtom::reset(atom_t) and PlTerm::reset(term_t). [I’ve also created “null” constants – they’re “private” for now.].

If we access the wrapped handle using C(), then we need a method for setting the value. If we make the wrapped handle public (as in the original SWI-cpp.h, then there’s no need for C() and reset(). I don’t have a strong opinion on this; the atom and term wrappers aren’t really smart pointers, so it could be confusing to follow the smart pointer conventions.

(We could, of course define a dereference operator (*) instead of C(), but I think that would be confusing because atom_t and term_t don’t always act like pointers in the SWI-Prolog.h functions.

some questions about is_valid()

The idea is that a constructor should be written like this:

PlAtom::PlAtom(const char *text) 
    : handle_(PL_new_atom(text)) { ... }

and not like this:

PlAtom::PlAtom(...) {
    handle_ = PL_new_atom(text);
    ....
}

If the system guarantees that PL_new_atom() always succeeds or else terminates the program with a fatal error, then there’s no need for is_valid() in the constructor.

If PL_new_atom() fails due to out-of-memory, does that always give a fatal error? From looking at various bits of foreign function code and swipl internal code, it appears that a “resource error” is a possibility; and that’s why I added the test to the constructors.

However, there’s a second use for is_valid() – some code uses a “null” value (that is: 0) to indicate that the value hasn’t been set. We could create a operator bool() to check for validity, but there is also a “get_bool” defined for terms (which calls PL_get_bool(), which checks whether the term is an atom with the value true or false (and throws an exception if it’s any other value). This is confusing, so I decided to remove the operator bool() and require the check to be explicit.

[BTW, Jan originally wrote PlAtom constructor an initializer list, but some other constructors weren’t that way and I’ve modified them to use an initializer list where possible]

I’m unsure about the unify_*_ex() functions. Seems a lot of stuff that is barely ever needed. Why not PlTerm::must_be_unbound()? You typically only need that for returning blobs, so it could also be part of a C++ blob infrastructure.

When I was applying my changes to users of SWI-cpp.h, I found a number of places where unification was assumed to succeed. However, Jan told me in a private email that a check is always needed because even when unifying with a newly created term (from PL_new_term_ref(), which is encapsulated in the PlTerm_var constructor), there can be failure, typically from out-of-memory.

So, I annotated the regular unification methods with [[nodiscard]] to catch such errors and added the “must succeed” methods that throw an exception if they fail (with the suffix “_ex”). If my understanding is wrong, I could remove the “_ex” methods, and programmers can use write, e.g., (void)t.unify_term(t2) to turn off the warning.

On further thought, I think that “*” for accessing the underlying atom_t, term_t, etc. works quite well – if you think of it as “unwrapping” rather than “dereferencing” (and reset() is part of the convention for smart pointers, so that fits nicely). So, t.atom().get_atom_t() becomes *t.atom().

If nobody objects, I’ll make this change and update the PRs (together with the other things that @jan has commented on.

As for blobs … I intend to work on an API for them. It’ll probably need at least C++-11 and possibly C++-17, but I assume that’s acceptable (rocksdb already requires C++-17 to build librocksdsb.so.

But why? I might be wrong (no C++ user), but I thought smart pointers are there to deal with life time of the object pointed at by means of reference counters, no? The Prolog handles do not serve that role (possibly except for atom_t where you can (un)register the atom. It is rarely needed though and (un)register at constructor/destructor potentially seriously impact performance on highly concurrent applications). You normally never reset these handles, the same way that it is rare for C variables bound to one of these types to be reassigned.

That is a bit of an open issue. As is, PL_new_atom() cannot fail. It could make sense to allow this to fail when creating huge atoms and malloc fails. For small atoms it makes little sense. It is pretty unlikely you can meaningfully recover without the recovery running out of something again. So yes, there is some reason for an is_valid() test, but it must by simply testing for null. Prolog has no means to verify that some other handle is valid or not. Well, it could test that the value might be valid. The value of that is limited though as it may become invalid asynchronously due to atom-GC.

At least GCC’s C warning for unused return is not fooled by casting it to void :slight_smile: Anyway. Unification in Prolog can almost always fail and the foreign predicate should almost always return with failure (after cleaning up if necessary). Failure of PL_unify() has two reasons: logical failure and resource errors. In 99% of the cases the foreign predicate doesn’t care as it needs to do the same thing. The only exception are predicates that create some resource and make it available to Prolog as a blob. In that scenario we better consider it an error if the output unification fails (logically). I’d implement that is part of the (future) blob wrapper though. As is, term.unify() should return true/false for logical results and raise a C++ exception otherwise (by re-throwing PL_exception(0)).

Note that the C style for dealing with the Prolog API is typically like this if no cleanup is required (which is often not the case as Prolog backtracking will do its job if the sequence fails somewhere in the middle).

  ...
  return ( PL_*() && PL_*() && ... );

The C++ version will involve quite a bit more code because the exception and logical failure cases run different code paths.

This looks like C++ level two :slight_smile: I’m usually in favor of shorter notations though :slight_smile:

I have been guilty of overly complicating things and mis-naming. :wink:

So …

I’ll make the wrapped value (atom_t, term_t, etc.) available as “public” - no need for a “getter” (or “*” operator). For now, I’ll call it “C” for each class (so far, PlFunctor, PlAtom, PlTerm).
PlCompound’s fields will remain private. I haven’t made any changes to PlFrame, PlQuery, PlEngine, etc. (yet). (I’ll probably do minor cosmetic changes there, and remove copy constructors, as per one filed Issue.)

Each class will also have a null and methods is_null() and not_null() - as @jan pointed out, is_valid() has implications beyond just testing for null.

For unification, I think it makes sense to introduce a pseudo-exception PlUnificationFailure. This would be an additional “catch” in the code for PREDICATE and would simply return false – the existing foreign function call would notice if there was also a regular exception and handle things appropriately.
This will allow the C-style code of

  return ( PL_f1(t, ...) && PL_f2(t, ...) && ... );

to be done in C++ as

  t.f1(...);
  t.f2(...);

and without a need to check the return values of unification unless it’s needed for the logic of the code (typically, for non-deterministic foreign predicates).
I’ll also add a PlCheck(…) that can be used to wrap a C function (in SWI-Prolog.h) to cause an exit from the function. This would allow writing the C-style code like this, if the desired methods are missing from SWI-cpp.h:

  PlCheck(PL_f1(t.C, ...));
  PlCheck(PL_f2(t.C, ...));

As for blobs: they’ll be a future PR. There are too many changes already in this PR.

2 Likes

I’ve pushed the changes to the code … the documentation has not been updated.

1 Like

It seems that g++ 12 has problems with SWI-cpp.h – discussion is here: Problems compiling Rocksdb
I intend to install g++ 12 and see if I can reproduce the problems … it takes a while to download and build the compiler, so I don’t know when I’ll be able to do this.

EDIT: I could not reproduce the problem after building and installing g++ 12

Weren’t you using Ubuntu 22.04? gcc-12 is available there. Just the default is gcc-11.

I have Ubuntu 20.04, whose default is gcc-9 and which doesn’t have a gcc-12 in a PPA. (An upgrade has been offered, but last time I tried that, I ended up having to a complete re-install, so I’m not switching to 22.04 right now.)
And my Chromebook uses Debian 11 bullseye, whose default is gcc-10, and for which I haven’t figured out how to add an “unauthorized” PPA (such as for swi-prolog).

We are doing what I hope is a final review of the changes to SWI-cpp.h (it’s now called SWI-cpp2.h because of how extensive the changes have been).

If anyone has any opinions, please speak up! So far, only @jan has had any comments, so we’ve been discussing in either github reviews or by private emails. The latest version of the changes (not yet a PR) is here: GitHub - kamahen/packages-cpp at jw1

Peter, I for one am very grateful for the work you’ve been doing with the C++ api and with pcre2. This has a great effect, not just because of the quality of the code and the detailed test cases that you do, but it shows everyone that the SWI-Prolog community is alive and continuously improving, this has just as much value as the code itself.

For the C++ code the comments I have at the moment are:

  1. In line 326 and 327, should it not say [[deprecated("use as_uint32_t()")]] instead of [[deprecated("use uint32_t()")]]?
  2. In code like this:
PlTermv(PlTerm_atom("instantiation_error"), t))

I would suggest creating static class members for fixed atoms used in the code. The runtime overhead of calling the constructors when the member function is called can be avoided by simply having predefined static class members that are built only once, perhaps they all can be in the PlAtom class.

  1. From what I read in your comments and see in the code PlCheck(int) is used to check both exceptions and failure, and it is implemented like this:
PlCheck(int rc)
{ if ( !rc )
    throw PlFail();
}

I don’t think it is a good idea to use exceptions to handle the failure case, it is probably okay for true exceptions, but throwing an exception for the failure case in a function that is heavily called will slow things down. Exceptions are much slower than the regular code path. See this post for an example: Avoid exception throwing in performance-sensitive code – Daniel Lemire's blog; if you can avoid using exceptions I think it is better.

A (side) note

Performance now days depends more on the smart use of the processor cache than on the algorithms themselves. You could have a linear search that performs faster than any other search algorithm if you use the CPU caches efficiently. This is specially true because of the pre-fetch done in modern CPUs.

Scott Meyers (the well-known C++ expert) gives a great explanation of it in this talk: code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care - YouTube, it is an extremely useful talk that I think every software developer should see.

For this reason, game developers (whose job depends on performance) use Data Oriented Design, rather than using a class to represent an object. They use Structures of Arrays rather than individual objects. It is a simple concept, but it requires a paradigm shift. Very fast C++ and C code can be written when using this kind of design. Of course I don’t think you can redesign the API at this point (may not even be applicable in this case), but I wanted to mention this, perhaps you are aware of it already.

A great talk on Data-oriented-design is given here: Building a Data-Oriented Future - Mike Acton - YouTube, this second talk puts the meat on Meyers talk on how to actually design fast systems using the CPU caches appropriately. Although it is focused on game design it can be applied in many other areas. I really recommend hearing both talks. Even though they are long, I consider them some of the best talks on software development that I have seen; so I would really recommend watching them. Again, thanks for your work and the contributions to SWI-Prolog!

2 Likes

Thanks for reviewing!

This was also one of my points. Over time I’ve also started to like the approach as the large majority of C(++) code interacting with Prolog is supposed to succeed and I’ve seen a lot of code that simply ignored the return code, not only for stuff that must succeed unless there is an error (e.g., the PL_put_() family), but also for stuff that may fail such as the PL_get_() and even the PL_unify_() family. So, I no longer know what is wisdom. My idea of good code is to use the exceptions surely for the resource errors. The PL_get_() family already gets harder as, if an argument can be an atom or integer, we (in C, sorry) write

    if ( PL_get_atom(A1, &a) )
       ...
    else if ( PL_get_integer(A1, &i)
       ...

But the current C++ interface does not allow for that, so we need a A1.is_atom(), pretty much duplicating the work (find the current engine, dereference the term_t to a pointer, follow the optional Prolog reference chain). In the light of your side node, this is all pretty bad from the point of data design :frowning:
Then, the type check warmed the cache for the actual retrieval, so it might not be so bad :slight_smile:

This is surely relevant. I think not so much for this interface though. There are many places in Prolog’s design were this might make a difference. One of them would be rather trivial: improving the structure layout such that members that are often used together are adjacent. I would assume there are tools for that, but I haven’t found them yet. Several linked lists are probably better replaced by arrays. This is not all that trivial though as many of the data structures are designed for lock free concurrent access.

2 Likes

@swi - thank-you for your kind words. I’ll give brief responses to individual items here and a longer response later.

Thank-you for catching my typos - I hope there aren’t more. There have been quite a few changes over time, and my test cases haven’t quite caught up either. :frowning:

The pre-allocaed PlTerm_atom()s is probably a good idea, although I suspect that the performance impact would be small, because throwing a PlInstantiationError() isn’t likely to happen often, and would actually result in a slight slowdown (and slightly more atom space used) for programs that don’t throw that error. There’s a way of allocating these lazily, so perhaps I’ll do that instead. (@jan – I suppose there should be a mutex around allocating/using global atoms – is it possible to expose an interface for that?)

Your comment about throw PlFail() requires a longer response – @jan has given part of the rationale (he and I have had long email discussions about it), but I’d like to add a bit more (I’ll try to write something later today).

As to efficiency … right now, I’m more worried about correctness than performance; there were quite a few things in the original SWI-cpp.h that could lead to subtle bugs (C++ is very difficult to get right, even for those with a lot of C++ experience). If efficiency is important, you should probably use the C interface (although the overhead of using the C++ PREDICATE() is very small, according to my measurements). For situations where failure paths are common, you can still do return false instead of throw PlFail().

As a general comment about efficiency … I once looked into Quintus Prolog performance on PowerPC, using IBM’s “Red Book” for insights – I found that some things ran at quite different speeds compared to what I had expected by following the books calculations, probably due to subtleties in the cache/memory system (PowerPC (and, more recently, ARM/M1) have a different memory model than x86) … in the past I sped up a finite-state automaton engine by about 10x by careful micro-measurements … it’s an interesting exercise, but needs to be redone for each platform and is a very time-consuming task (e.g., the code got very little performance boost by turning on the compiler’s high optimization on x86 (and by using different compilers), but got an immediate 2x boost on SPARC). I suspect that similar attention to detail would be effective on SWI-Prolog’s engine, but that would require many weeks of intense work.

BTW, I have a question for @jan – is there any documentation on how the “lockless” garbage collection and lock-avoidance between threads works in SWI-Prolog? I recently read this paper on the dangers of atomic operations, and it reminded me that things that work on x86 don’t necessarily work on PowerPC or ARM (I met some of the authors when I was at Google - e.g. Mike Burrows was the author of the “Chubby” lock system that is critical to many Google distributed systems).

2 Likes

I’ve done a simple benchmark of return false vs throw PlFail() … and throw PlFail() is about 15x slower, so clearly return false should be used if failure is common (and I’ve made a note about this in the documentation). As a minor note, the overhead of PlCheck() is about 30% when the compiler doesn’t inline it. (The numbers and how I ran the benchmark are in test_cpp.cpp)

There’s essentially zero cost to the try { ... } in the enclosing PREDICATE (that is, there’s essentially no performance advantage in using the C style of predicate declaration compared to the C++ style).

As a matter of programming style, if performance isn’t an issue, I prefer the throw-catch style; but if you prefer the Go style of checking return codes everywhere, I can’t argue with that – and the C++ interface allows it.

As @jan says, there are methods in the C++ interface that possibly should return a bool instead of throwing an exception, thereby requiring that either the method is surrounded by try-catch or that a test is made before using. At one point, I had two versions of many methods but that got a bit confusing because there are some functions in SWI-Prolog.h (ending in _ex()) that create an exception. Perhaps the confusion could be avoided by a suitable naming convention. (See, for example, PlTerm::unify_nil_ex() and PlTerm::unify_nil().

For now, I suppose that for performance, one could use PL_get_atom(A1.C_,&a) for a boolean result and A1.as_atom() if a non-atom should throw an exception. (There are also some in-between cases, such as getting the characters from an “atomic” term; that is, allowing either atom or string.)

One other thought … C the foreign language interface treats failure and errors in a similar way – the difference is in whether an error term has been set or not. So, it seemed natural, when extending this to C++, to use

if (!A1.is_atom()) throw PlFail();

as being in the same style as

if (!A1.is_atom()) throw PlTypeError("atom", A1);

or

... A1.as_atom() ... // throws PlTypeError if !A1.is_atom()

A few random thoughts on cache and performance.

I turned on PGO on 3 different machines:

Intel Core Processor (Broadwell, IBRS)  3.0 GHz, cache: 16384 KB
AMD Ryzen 7 3700C                       2.3 GHz, cache:   512 KB
Intel(R) Core(TM) i5-7500               3.4 GHz, cache:  6144 KB

Looking at the PGO benchmark times (“Generating …/home/boot.prc”), the 2nd and 3rd were roughly the same (!), but the first one was about 4x faster (!!!). I don’t know what to conclude from this (I don’t know enough about the differences of the internals of the 3 CPUs, nor about the speeds of the caches and main memory), but my guess is that it’s a combination of cache size, cache granularity, and pipeline depth. [All three CPUs got about 5-10% performance improvement from PGO]

One of my test foreign code predicates does the equivalent of p(X):-X=0 … for non-PGO, the foreign code version ran about the same speed as the builtin =/2; but with PGO, the foreign code was about 10% faster (no change for the builtin). Again, I don’t know how to interpret this (maybe the PGO suite needs more samples added to it?).

I’m going to do a few benchmarks of some variants of foreign code, and the results might change some of the new C++ API. (I’m hoping that I can simplify some of the API, if it turns out that some variants have essentially the same performance)

Thanks for setting up the test, good to have numbers that show how much slow down is caused by the exception.

This is the kind of problem that using exceptions for regular flow control cause, nonetheless I do like the solution you thought about:

Probably the non-exception functions don’t need to be done for every case, but perhaps only for the most common ones, like the integer/atom case Jan mentioned?

1 Like

Continuing the benchmarking, I tried the various forms of PL_unify_* with atoms and strings:

  • PL_unify_chars()
  • PL_unify_atom(PlAtom(...))
  • PL_unify(PlTerm(...))

These all had about the same performance except for PL_unify_chars(PL_ATOM, ...), which – to my surprise – was about 20% slower than the others. (PL_unify_chars(PL_STRING, ...) was similar to the other.)

[I saw about 10% variance in benchmark times]

So, it seems that we can simplify everything by getting rid of the methods that call PL_unify_chars() and instead have a unify() method that takes either an atom or a string (PlTerm_string). This would avoid some confusing aspects of the API, and would also avoid some problems with function overloading in C++.

For those who are interested, the code is currently here:

I’m also looking into some template programming, possibly using C++ “traits” to simplify the conversions between internal form an external form (Latin1/UTF8/locale) and marking whether an external form is meant to be an atom, string, list-of-codes, list-of-characters. This would possibly look something like A1.unify(PlString<EncUf8,PlAtom>("foo")), meaning that it’s unifying with the UTF8 encoding of an atom. This kind of wrapper imposes almost no runtime overhead, and it allows defining exactly how it’s implemented (in this case. PL_unify_chars(PL_ATOM}REP_UTF8,3,"foo"). (The main reason for doing it this way instead of flags in the calls is that we can associate the encoding information with the string but without converting to the internal form)

So, it seems that performance isn’t a significant item (except for perhaps PL_unify_chars(PL_ATOM,...)), and it looks as if the interface can be simplified somewhat.

This is not the whole store :frowning: The problem with

PREDICATE(unify_foo_atom_2a2, 1)
{ return A1.unify_atom(PlAtom("foo"));
}

Is that the atom remains registered. So we either get some ugly code to unregister it or we need a PlAtom to unregister if it goes out of scope. That is of course possible, but then we must distinguish how we get the atom. If it is created from a string, it is registered (because the asynchronous atom garbage collector could reclaim it before it is even returned). If it is extracted from a term, it is not (because it is protected by the term and register/unregister is expensive if multiple threads happen to do this on the same atom). Doing the registration handling from C++ through the C interface is slower as we need to cross the API boundary more often (notably requires fetching the current Prolog engine).

Another thing that PL_unify_chars() and friends can do (they currently do not) is check that the term is currently unified to an atom and, if so, do a string compare. If the term is instantiated to something that is not an atom it could immediately fail without creating an atom. Although the current implementation is a bit slow, probably due to a few unnecessary indirections as well as some additional functionality (support for different encodings), it can be faster than creating an atom.

So, it seems that unify_chars() is useful, then. :wink:

It’s, of course, possible to do something like this to automatically unregister the atom upon return but it seems that it’s not worth the complication (also, it’s not clear to me - should the atom be unregistered if unification succeeds, if it fails, or for both situations?):

class PlAtom_local : public PlAtom {
  PlAtom_local(const char *s) : PlAtom(s) { }
  ~PlAtom_local() { PL_unregester_atom(C_); }
};

PREDICATE(unify_foo_atom_2a2, 1) {
  PlAtom_local foo("foo");
  return A1.unify_atom(foo);
}

A PlAtom_local is of course fine. Just, it should not be part of the public interface. People are never going to know when to use PlAtom and when PlAtom_local. The current C interface is designed such that one only needs explicit management of the atom reference count when you hold a reference to an atom_t instance outside the scope of a predicate. As that doesn’t happen very often except for using atoms to compare against constants in the program, there are few cases where you need to worry about these things. For atoms using as constants for the module we use

static atom_t ATOM_length;

...
    { atom_t a;
      
      if ( PL_get_atom(t, &a) && a == ATOM_length )
         ...

      return PL_unify_atom(t2, ATOM_length);
....

install_t 
install_mymodule(void)
{ ATOM_length = PL_new_atom("length");
  ...

The C++ interface makes this a little easier as we have global constructors, so we can write this:

static PlAtom ATOM_length("length");

SWI-Prolog 9.1.0 has the (almost) latest version of SWI-cpp2.h, containing additional changes and simplifications since the last comment in this thread. This thread has been quiet because most of the discussion has been between @jan and myself as comments on PRs plus some private email threads.

Hopefully, I’ve captured the important stuff from these discussions in the updated documentation.

The trickiest bit to fully understand the API is probably the exception handling mechanism, which requires understanding how exceptions are handled in foreign predicates. But SWI-cpp2.h is designed such that typical code can be written without this deeper knowledge; either by methods or functions that “do the right thing” (check results and throw a C++ exception if needed) or by using PlCheck() to wrap functions from the C interface.

The documentation can probably do with some reorganization and rewriting, but my guess is that the missing bits can be easily deduced from the code in SWI-cpp2.h and the foreign predicates documentation.
If anyone feels like reviewing the documentation, that would be nice. :wink:
(If you do volunteer to review the documentation, please check with me first to make sure that there are no pending PRs - right now, there’s one, which adds a few small sections.)

3 Likes