Changes to SWI-cpp.h

In another thread, the topic of SWI-cpp.h (a C++ wrapper around the C interface for foreign predicates) has come up; and it appears that we need to make some changes.

So, my first question is: who uses SWI-cpp.h? So far, I only know of the pack(rocksdb) and swipl-win.

At a high level, these are my proposed changes (so far):

  1. Remove the use of the assignment operator for unification, replacing by unify() and unify_ex() methods.
    • the signature (returning a bool instead of a PlTerm&) is unexpected and error-prone.
  2. Replace most of the overloaded PlTerm constructors “tagged” constructors. (*)
    • The underlying C interface uses unsigned long for atoms and terms, so the compiler can’t distinguish between creating a term from an atom and creating a term from a uint64_t or size_t.
  3. Remove most of the user-defined conversion operators and replace them by “getTYPE()” methods
    • The implicit conversions have hidden some typos.
    • The conversion operators have the same problem that constructors have, namely the compiler isn’t able to distinguish between term_t, atom_t, etc. and unsigned long or uint64_t. (**)
  4. Remove the == operator and replace by eq() with a tag
    • Again, because the compiler can’t distinguish between term_t and uint64_t.
  5. Some housekeeping changes, such as changing int to bool and removing dangerous copy constructors and assignment operators.
  6. Add a wrapper for atom “blobs”.

In making these changes, I will update pack(rocksdb) and swipl-win. If there is other code that depends on SWI-cpp.h, please tell me about them and I’ll make the necessary changes there.

Of course, if people disagree with what I’m proposing, please speak up! - either here or by filing and issue: Issues · SWI-Prolog/packages-cpp · GitHub
My general approach to C++ is to avoid fancy features as much as possible, and to follow the recommendations of the Google C++ Style Guide (but allowing exceptions, because they greatly simplify SWI-cpp.h).

(*) This is a simpler version of the “tags” that the Boost library uses. An extra argument is added so that the compiler can overload the constructor, for example:
PlTerm::PlTerm(term_t t, const asTermT_tag&)
PlTerm::PlTerm(uint64_t, const asUint64T_tag&)
and then used by:
PlTerm p1(t, asTermT);
PlTerm p2(1234, asIntT);

(**) An alternative would be to edit SWI-Prolog.h, replacing term_t by struct{uintptr_t t} (and similarly for atom_t, functor_t, etc. … this would require extensive editing of the SWI-Prolog source code. It would possibly catch some typos in the code, but it’s unlikely that such typos exist, or are significant. (BTW, in some of my code I accidentally wrote atom_t instead of term_t, but that didn’t break anything; it merely made my code a bit inscrutable.

3 Likes

What would that do? Throw an exception if there is one? I think unify should always do that, so the C++ version is roughly

  rc = PL_unify(t1, t2);
  if ( !rc && (ex=PL_exception(0)) )
     <throw ex as C++ exception>
  return rc;

I use it for the R <-> swipl interface, but I don’t mind if you change some things.

Similar to the PL_XXX_ex() functions … if the action fails, throw an exception. This is for the situations where unify should always work, and that way I can add a WUNIFY to help catch errors.

if ( !unify(t2) )
    throw PlInstantiationError(*this);

If you’ll point me to the code, I can update it as I make my changes. (This will also give me a larger set of test cases.)

My guess:

Exactly, thanks.

I’ve edited my proposal (the first posting in this thread).

  • Removed most of the user-defined conversions, replacing them by “getTYPE()” methods
  • Use “tags” for some constructors (and not using “factory” methods)

The reason is simple: The compiler can’t distinguish between uint64_t and term_t.

It’ll take me a bit of time to propagate these changes through rocksdb4pl.cpp and the swipl-win package … if it turns out to be ugly, I’ll reconsider this design choice.

I doubt there are many such situations. The few exceptions are foreign predicates with a “–” argument (must be uninstantiated at call time). Otherwise arguments may always be instantiated and thus we must check the return value. The only exception is unification to temporary created variables inside the foreing implementation, but these can only fail with an exception.

The first scenario does make sense though, in particular with blobs. Maybe some other name? Like unify_strict_output(Arg, Value). On the other hand, these unifications barely ever use PL_unify() itself and almost exclusively PL_unify_blob(). If you make an abstract blob class anyway, adding a method there to unify with an output argument may be better.

I found some instances of “unify” without a test in swipl_win. From a quick look, they appear to be either foreign predicates with a “-” argument or it had already been determined that the argument was a variable (using PlTerm::type()).

EDIT: I missed the use of BUF_STACK; however, it still seems that std::string is a preferred way of doing things, even if it involves an extra copy, because a std::string can contain nulls.

Thinking about char* vs std::string … I’m a bit confused about when to use PL_STRINGS_MARK and PL_STRINGS_RELEASE … it might be better to remove (or deprecate) the methods that return a char* and instead return a std::string, which would contain a copy of the data. (Are some of the methods obsolete or using parts of SWI-Prolog.h that have been superseded? - the first check-in of SWI-cpp.h was 22 years ago (!))

(Similarly std::wstring, and I’d need to think a bit about encoding because that’s not in the standard C++ library but is in SWI-Prolog (at least, to/from UTF8, which is probably all we care about.)

It is still not correct to ignore the return value. Unification may cause a stack overflow (and I think attributed variables are also considered a variable).

That is hard. From the viewpoint of a simple interface returning std:string() is surely the way to go. The PL_get_chars() functions return a len/char pair and together with the string mark/release functions avoid copying when possible. Even when they copy they avoid malloc()/free() (new/delete) for the first few strings, reusing small preallocated buffers. Only when there are too many strings that are not released or the strings are large it may allocate them. That reduces fragmentation and (notably multi threaded) performance.

One could go for the simple std::string() api, still allowing time critical applications to use the plain C api.

That’s what I was thinking of (actually, both std::string and std::wstring … and it’d probably be useful to have something that translates to/from UTF8.
But first, I need to get the current API working, and remove the ambiguities and pitfalls with size_t, term_t, etc.

1 Like

The implicit conversions and constructors have turned into a bit of a nightmare – when I make one thing work nicely, another thing breaks. (Whac-a-Mole)

I’m going to remove the implicit conversion (I had them marked as “deprecated”, but that didn’t prevent getting errors for ambiguous overloaded calls), and instead have explicit “get_XXX()” methods. This will make the C++ code a bit more verbose, but it’ll avoid some subtle bugs. I’m using swipl-win and rolog to verify that things work (plus I’m adding unit tests).

The main problem is PlTerm, which has an implicit conversion to term_t and also to various kinds of “int”; but other classes have similar problems.

1 Like

I’ve finished the changes to SWI-cpp.h, except for the documentation, a bit of clean-up and some more test cases.
I’m not happy with what I’ve done – it feels too Java-ish – but it’s not horrible (and the old API proved itself to be susceptible to some subtle and nasty bugs). I’ll probably have the PR ready for review in a couple of days.

Here are the changes to SWI-cpp.h:

And here are the changes to users of SWI-ccpp.h. There are some changes I can make so that the API is less verbose, especially with rocksdb, but for now I’m just doing minimal changes.

1 Like

Thanks. I’ve glanced a little through the code and doc changes, also considering Use PL_BLOB_WCHAR instead of test for ==&ucs_atom by kamahen · Pull Request #1016 · SWI-Prolog/swipl-devel · GitHub

Changing the implicit conversions to An.get_XXX() seems a pity, but indeed hard to avoid. The only way I see would be to turn term_t, atom_t, and similar integer handles to Prolog objects into pointers. That was discussed before. It still might be something to consider … I consider the get_Type() notation ugly, but I’m happy to go with it if this is what C++ users want. On the other hand, if I look at std::string we see .c_str(), so why do we not simply use A1.string() and A1.uint64(), etc.?

The need for tags, as in A1.unify(42, AsInt) is clear for all the integer types. Can’t we leave them out for (notably) strings, providing implicit constructors and maybe also conversion for char* and std::string?

For C++ users, please step in and provide feedback. Also on how to move on. As this is seriously incompatible, what do we do? Force people to update their code? Provide two versions of the include file, e.g. keep SWI-cpp.h and add a new file (what name)?

For both this and the above mentioned PR, I think it is not a good idea to (re)define text atom/blob types. It is also not going to work as the ordering of blobs with different types is undefined, but Prolog ordering of atoms does require that all text atoms are compared as a sequence of Unicode code points. As a result the blob API is in part bypassed for atoms. Future versions are not unlikely to replace the current two atom types with a single UTF-8 based type. This most likely simplifies the code base. I’m inclined to include the documentation part of the mentioned PR and skip the code changes.

I think that these could be turned into structs that wrap the uintptr_t, although that might have performance effects (it shouldn’t, but I don’t know how good the various C compiler optimizers are). I tried making this change for term_t, but it looked as if it would be quite a bit of work. If you think that this is worthwhile, then let’s start a separate thread to discuss it.

I ran into a strange implicit conversion error between int and char*, and rather than figure out what was going on, I just added tags to all the C types (but not to C++ types such as PlAtom). Possibly this was too strict – I can try removing the tags for char* and see if it still works. (The weirdest implicit conversion I found was when term_t (or maybe atom_t) was converted to float! … I didn’t figure out how that happened …)

The only reason for not using std::string everywhere is that it involves an extra copy operation. char* is one of those weird left-overs from C that’s used everywhere, so for now, I’d prefer leaving std::string out as the default and instead of c_str() as the standard. (Some of the copying can be avoided by using “move constructors”, but that’s a fairly new feature that I don’t fully understand, so I don’t want to use it – yet).
[Note that std::string::c_str() may also require a copy.]
[There’s also std::wstring, which is defined as basic_string<wchar_t> … that could be used to replace my get_wchar_t()]
[As far as I know, cout << "Hello" << endl passes a char* to the stream and doesn’t use a default constructor for std::string (which would - probably - require a string copy of some kind). I might be wrong, but I don’t know how to verify this. If it matters, I know a few people working on C++ compilers and I could ask one of them.]

In general everything that accepts a char* should also accept a std::string, to avoid copying overheads. See also: abseil / Strings Library

You mean: remove the “get_” prefix? That seems fine, but I suppose we would still need A1.get_type_t(). I can leave the conversion operator in (it’s currently “deprecated”), but that results in the somewhat ungainly static_const<term_t>(A1) … the C-style (term_t)A1 is allowed by is considered bad style; however, I just learned about the “{}” form in C++, so term_t{A1} can be used, and I can un-deprecate the term_t conversion operator … should I also un-deprecate the other conversion operators and get rid of the “get_XXX” forms? (or maybe allow both)

Good to know … I’ll change some of my tests. This will delay my changes to SWI-cpp.h being submitted (I’m adding tests to both the existing test_ffi.c and to the new test_cpp.c).

EDITED: added some clarification about char* and std::string.

I edited my previous post to clarify some things.

Probably I can get rid of the asCharP tag; if not, it should be named asCStr. (Similarly for wchar_t) And the asString tag might also be removed.
I’m not sure if we can make these into implicit conversions or should make them explicit. I am not aware of a tool that shows what conversions are done, and the rules are not trivial, as documented in Implicit conversions - cppreference.com and IBM Docs
So, for now, my inclination is to make the conversion operators explicit in most places.

One other possible improvement is to create a parallel interface to SWI-Prolog.h, using PlAtom, PlTerm, etc. that gives the effect of implicit conversions from PlAtom to atom_t and PlTerm to term_t, etc. For example:

[[nodiscard]] inline bool PL_get_string(PlTerm t, char **s, size_t *len) {
    return PL_get_string(term_t{t}, s, len);
}

However, this would be better done by a method (in this case, a conversion operator, possibly marked as explicit):

void operator std::string() {
  char *s;
  size_t len;
  if ( !PL_get_string(term_t{*this}, &s, &len)
    throw PlResourceError();
  return std::string(s, len);
}

I think that most of the functions in SWI-Prolog.h can be treated this way, but it’s a bit of work. (Some, like this one, have already been done.)

We had this discussion. A struct with a uintptr_t doesn’t work because equality and non-zero testing doesn’t work unmodified in C. Also zero-initializing changes, so this has a lot of impact on all foreign code written for SWI-Prolog. There is also the worry that the ABI on some systems will introduce less efficient parameter passing. This is a no-go.

The only solution I see is a pointer. That could work, especially for term_t which effectively is an offset in the environment stack, so all we have to do is add/subtract the base of the environment stack. fid_t is also an offset into the environment stack. qid_t is a pointer these days. Another one is atom_t. This is a more complex beast as it is a tagged offset into the atom array, so if you have a pointer you must turn it first into an uintptr_t and then do the arithmetic. I’m not against such a change as it improves the internal type safety. If the price in terms of coding is worth it (might be just adjusting macros) and the performance impact is neglectable it might be worth it.

Yes, remote the “get_”. I guess the other one can be A1.type(). Good to see the term_t{A1}. That seems usable. I think it is ok to have functions for all conversions and conversion operators for the unambiguous ones (i.e., the strings and double/float; all the rest are integral as far as I know where the sizes depend on the OS/C-compiler, so they get ambiguous somewhere).

PL_get_string() doesn’t exist? Otherwise I don’t see what you are getting at. The idea of the C++ interface is to

  • make type conversion easier
  • transparently deal with errors in both directions
  • deal with cleanup (not needed a lot anyway)

Other than that one can just as well use the C interface from C++ IMO. I guess from these three, the error handling is the most vital because that is what makes the C interface clumsy.

I was assuming that the struct would need to be “de-structed” for things like testing for zero (that is, the zero test would need to become explicit). Initialization should still work, I think …

I can do that. Unless anyone objects, I’ll leave the conversion operators as deprecated – they’re more verbose than the “get” methods, so probably won’t be used in new code (because I need to make at least some of them explicit).

I think that floats also need to be explicit because of default conversions between floats and integral types. (At least, that’s my best guess why at one point I had a bug that made all my atoms display as floating point numbers).

I think that void* also needs to be explicit – I’ll have to experiment a bit to find out whether it can get misinterpreted as char*.

If we could do implicit conversion from PlTerm to term_t, then this we could do:

PlTerm t; // assume t gets a value somehow
char *s; size_t len;
if (!PL_get_string(t, &s, &len)) ...

instead of

if (!PL_get_string(term_t{t}, &s, &len)) ...

By defining an overloaded PL_get_string(), we can avoid the need for the explicit cast, and there’s no ambiguity.
Of course, in this particular situation, there’s a method that does the call and throws an exception on failure … the reason for the parallel interface to SWI-Prolog.h is to provide convenient interfaces to any that have been missed, either because it’s not obvious what class to put them into (e.g. because there are a mixture of term_t, atom_t, module_t, etc. in the call) or because they’re “rare” and the the SWI-cpp.h maintainer hasn’t created the appropriate method(s) to wrap them.