Confusion with C++ interface and Prolog lists

This is the C++ interface (slightly simplified):

PlTerm_atom::PlTerm_atom(const char *text) { PL_put_atom_chars(unwrap(), text); }

and PL_put_atom_chars() does the equivalent of PL_new_atom(), PL_put_atom(), PL_unregister_atom().

There are also variants for wchar_t*, std::string and std::wstring.

It should be noted that if you do something like:

void some_function(PlTerm t) {
    static PlAtom ATOM_read("read");
    ...
    if ( t.as_atom() == ATOM_read) ...
    ....
}

then ATOM_read is initialized the first time that some_function() is called and t.as_atom() throws an exception if t isn’t an atom.
It’s also possible to put the static PlAtom ATOM_read("read") outside the function, where it’ll be initialized on loading; but under some circumstances (which I haven’t figured out) this crashes, even though the SWI-Prolog runtime is designed to allow an arbitrary order of global initializers.

I don’t think it is as bad as you fear. It is probably not a perfect match though. First of all, while the data referenced by a term_t follows rules that are not very C(++) friendly, term_t itself is much easier. They are scoped to a foreign frame. One is created when Prolog is loaded (PL_initialise()), each foreign predicate invocation involves one (if the predicate is non-deterministic, the “redo” call has its own independent foreign frame, i.e., the foreign frame of the first call is destroyed even if the first call signals non-deterministic success). Finally, users can create a frame using PL_open_foreign_frame()/PL_close_foreign_frame() or, in C++, using PlFrame.

So, I think that using a destructor that calls PL_free_term_ref() is fine unless one would use new PlTerm(), but allocating term_t outside the stacks is a very bad idea anyway.

That is fine as it doesn’t create the atom explicitly. The only gotcha is that this function takes text as being ISO-Latin-1 encoded. That is another story :slight_smile:

That is also fine. The use case that started this is interesting: how to create a Prolog list of atoms from a C++ array of strings without exhausting term_t allocation?

I think it’s a bad idea to add a destructor to the current PlTerm (backwards compatibility); but it should be easy enough to add a variant constructor PlTerm_new that has a destructor that calls PL_free_term_ref(). I’ll have to think a bit about it … and write test cases of course.

The primary use case would be inside a loop. Otherwise, the Prolog frame will take care of things.

I don’t see the backward compatibility issue. If the C++ scope is left, you can’t touch it any more, can you? There could be a problem with PlTerm() that you create from a term_t. Notably, you cannot free the term_t that you get from a predicate argument. As these are created by the wrapper, I’d assume we can avoid that the destructor is called on them. If nothing else works I could make PL_free_term_ref() be a no-op on predicate arguments (right now it aborts with an API error). That is probably a bad idea though.

I think this would make the interface more intuitive and to me, the main issue is whether the slowdown would be noticeable.

I think that if you do PL_free_term_ref() in the destructor for PlTerm, then it needs to be reference counted, and that gets a bit tricky (the code would be similar to what std::shared_ptr<...> does).
The reason is simple: without reference counting, the following code would result in calling PL_free_term_ref() twice on each term_t:

PREDICATE_NONDET(range_cpp, 3)
{ auto t_low = A1, t_high = A2, t_result = A3;
  ...
}

and there are other situations where a PlTerm is assigned or passed as a parameter (that is, parameters to a predicate aren’t the only problem).
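
To make the double-free concrete, here is a minimal sketch that uses mock stand-ins rather than the real API. `mock_term_t`, `mock_new_term_ref()`, `mock_free_term_ref()` and `NaiveTerm` are hypothetical names invented for this illustration; none of them is part of SWI-Prolog or SWI-cpp2.h. It only shows what a freeing destructor combined with the default copy does:

```cpp
#include <cassert>
#include <map>

// Mock stand-ins (assumptions for illustration, not SWI-Prolog code).
using mock_term_t = int;
static int next_handle = 1;
static std::map<mock_term_t, int> free_count;   // handle -> times freed

mock_term_t mock_new_term_ref() { return next_handle++; }
void mock_free_term_ref(mock_term_t t) { ++free_count[t]; }

// A wrapper with a freeing destructor but the default member-wise copy.
struct NaiveTerm
{ mock_term_t h;
  explicit NaiveTerm(mock_term_t t) : h(t) {}
  ~NaiveTerm() { mock_free_term_ref(h); }
};

// Returns how many times the single handle ended up being freed.
int frees_after_copy()
{ mock_term_t raw = mock_new_term_ref();
  { NaiveTerm a(raw);
    NaiveTerm b = a;        // copy: a and b now both "own" raw
    assert(a.h == b.h);
  }                         // both destructors run here
  return free_count[raw];
}
```

Calling `frees_after_copy()` reports two frees for one handle, which is the situation that either reference counting or a non-copyable type would have to rule out.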

It’s probably easier to add a constructor PlTerm_new whose destructor calls PL_free_term_ref() - and that disallows assignment and a few other operations.

I think that the main use case for a PlTerm with a PL_free_term_ref() destructor is when a term_t is created inside a loop, correct?

What does this do? Does this create a PlTerm that has the same term_t as A1? In that case we would indeed have a problem if we destroyed it. Looks indeed that this needs reference counting :frowning:

There are two cases where we need to be careful about term_t handles. One is where we build large terms from C(++). You find such examples in e.g., the sgml parser or the Janus Python interface. Here we map external data to a possibly huge Prolog term and we have helper functions that need to create term_t references to get their work done. As is, this typically uses PL_reset_term_refs() to discard all term references created after the first one in the function. That works fine, but it is a delicate API that is easily misused, leading to weird errors (although my new runtime checks will spot many of them).

The other happens if the control remains in C(++) and the C(++) code makes regular calls to Prolog. We need to make sure that we are not building up handles. Typically we do this by creating a foreign frame, but this too is IMO not a very nice interface. People tend to miss this, eventually running out of stack. Although not common, such code may wish to keep handles to certain terms for a long time, providing something similar to backtrackable global variables in Prolog.
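
The frame idea can be sketched with a mock handle stack. Everything here (`handle_stack`, `mock_new_term_ref()`, `MockFrame`) is a hypothetical stand-in written for this illustration; the real mechanism is PL_open_foreign_frame()/PL_close_foreign_frame() or PlFrame, whose internals are not modelled:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock handle stack; indices stand in for term_t values (an assumption).
static std::vector<int> handle_stack;

std::size_t mock_new_term_ref()
{ handle_stack.push_back(0);
  return handle_stack.size() - 1;
}

// Hypothetical RAII frame: remembers the stack top, restores it on exit.
class MockFrame
{ std::size_t mark_;
public:
  MockFrame() : mark_(handle_stack.size()) {}
  ~MockFrame()
  { assert(handle_stack.size() >= mark_);
    handle_stack.resize(mark_);       // discard handles created in the frame
  }
  MockFrame(const MockFrame&) = delete;
  MockFrame& operator=(const MockFrame&) = delete;
};

// Simulates C++ code that keeps control and "calls Prolog" n times,
// creating two handles per call. Returns the final stack size.
std::size_t stack_size_after_calls(int n, bool use_frame)
{ handle_stack.clear();
  for ( int i = 0; i < n; i++ )
  { if ( use_frame )
    { MockFrame frame;
      mock_new_term_ref();
      mock_new_term_ref();
    } else
    { mock_new_term_ref();
      mock_new_term_ref();
    }
  }
  return handle_stack.size();
}
```

Without the frame, the mock stack grows by two handles per iteration; with it, the size is back to zero after every iteration.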

I had hoped it would be possible to relieve the C++ user of these details at the cost of a few extra CPU cycles … We all know that the C API is efficient, but it is too closely tied to the Prolog machinery and thus requires a deep understanding of the life cycle of Prolog data :frowning:

PlTerm is a struct that contains a term_t (as if it were declared struct { term_t C_; } in C), with additional “methods”. So, auto t_low = A1 is essentially the same as PlTerm t_low=A1, which simply copies the term_t value. In C++, when t_low goes out of scope, it calls the destructor. Currently the destructor does nothing, but @jan has proposed that it call PL_free_term_ref(). To do this correctly would require reference counting, I think.
A variant is PlTerm t_low(A1), which also creates a new object that wraps a term_t. Strictly speaking, both forms are initializations and use the “copy constructor”; the assignment operator is used only when an already constructed object is overwritten (t_low = A1). The net result is the same in all three cases. A copy constructor is also invoked when an object is passed as an argument to a function, or as a return value. (In theory, a copy constructor could be different from the assignment operator, but in practice they’re usually the same.) There is also a “move assignment” and “move constructor” which allow defining an optimized version of copying for certain situations. SWI-cpp2.h doesn’t define any move operations, so they default to the copy operations.

C++ allows overriding the copy operations that are done by assignment and function calls. It’s also possible to define a class that doesn’t allow copying – that’s what I was thinking of with the PlTerm_new constructor that would have a destructor that calls PL_free_term_ref().

C++ has two “smart pointers” that wrap an ordinary C pointer and handle automatic deletion: shared_ptr and unique_ptr (they both override assignment and the “copy constructor”). A shared_ptr has a reference count, so the underlying pointer is deleted only when its reference count goes to zero. A unique_ptr ensures that a pointer has only a single “owner”: it cannot be copied or copy-assigned, and ownership can only be transferred explicitly (by a move, or by release()/reset()), which leaves the source null. For example:

std::unique_ptr<MyStruct> p(new MyStruct);
std::unique_ptr<MyStruct> q;
// Not allowed: q = p;
q.reset(p.release());

is roughly equivalent to:

MyStruct *p = malloc(sizeof(MyStruct));
MyStruct *q;
q = p; p = 0;   // ownership moves from p to q
   ...
free(q);
free(p); // Does nothing because p==0

My proposal of PlTerm_new is that it would have semantics similar to unique_ptr. (Unique_ptr has a few other operations: get(), which is dangerous and accesses the underlying pointer; release(), which explicitly transfers ownership; reset(), which sets the underlying pointer.) It’s relatively straightforward to implement these correctly.
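
As a rough sketch of what unique_ptr-like semantics could look like for a handle, here is a move-only wrapper over a mock handle type. `ScopedTerm`, `mock_term_t`, `mock_new_term_ref()`, `mock_free_term_ref()` and `inspect()` are hypothetical names made up for this example, and the value 0 as “owns nothing” is likewise an assumption; this is not the proposed PlTerm_new implementation:

```cpp
#include <cassert>
#include <utility>

using mock_term_t = int;           // hypothetical stand-in for term_t
static int live_handles = 0;       // handles allocated and not yet freed
static int next_id = 1;

mock_term_t mock_new_term_ref() { ++live_handles; return next_id++; }
void mock_free_term_ref(mock_term_t) { --live_handles; }

class ScopedTerm
{ mock_term_t h_;                  // 0 means "owns nothing" (assumption)
public:
  ScopedTerm() : h_(mock_new_term_ref()) {}
  ~ScopedTerm() { reset(); }

  // Copying is disallowed, so two objects can never free the same handle.
  ScopedTerm(const ScopedTerm&) = delete;
  ScopedTerm& operator=(const ScopedTerm&) = delete;

  // Moving transfers ownership and leaves the source empty.
  ScopedTerm(ScopedTerm&& o) noexcept : h_(o.release()) {}
  ScopedTerm& operator=(ScopedTerm&& o) noexcept
  { if ( this != &o ) reset(o.release());
    return *this;
  }

  mock_term_t get() const { return h_; }   // observe, ownership unchanged
  mock_term_t release() { mock_term_t t = h_; h_ = 0; return t; }
  void reset(mock_term_t t = 0)
  { if ( h_ != 0 ) mock_free_term_ref(h_);
    h_ = t;
  }
};

// A helper can borrow the term by reference; the caller keeps ownership.
mock_term_t inspect(const ScopedTerm& t) { return t.get(); }

// Allocates one handle per iteration; returns the live count afterwards.
int live_after_loop(int n)
{ for ( int i = 0; i < n; i++ )
  { ScopedTerm head;
    ScopedTerm moved(std::move(head));
    assert(head.get() == 0 && moved.get() != 0);
  }
  return live_handles;
}
```

Passing by reference, as in `inspect()`, is how a move-only object is normally handed to helper functions while remaining usable in the caller afterwards.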

Alternatively, we could make PlTerm into a reference-counted term_t. It’s trickier to get everything correct, but it can be done. (The source code for shared_ptr is fairly unreadable, because it’s quite general; but there exist some simplified implementations, such as SRombauts/shared_ptr on GitHub, a minimal shared/unique_ptr implementation to handle cases where boost/std::shared/unique_ptr are not available.)

As an alternative, it might be possible to simply document how to use unique_ptr and shared_ptr to do what we want (both of these can define a custom “deleter”, for example). I’d have to think a bit about that possibility.
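
One way this could look, sketched with a mock handle: a std::shared_ptr whose custom deleter releases the handle once the last copy disappears. `mock_term_t`, `mock_new_term_ref()`, `mock_free_term_ref()` and `make_shared_term()` are hypothetical names for this illustration only, and whether this is a sensible fit for term_t is exactly the open question:

```cpp
#include <cassert>
#include <memory>

using mock_term_t = int;           // hypothetical stand-in for term_t
static int frees = 0;              // number of times a handle was freed
static int next_id = 1;

mock_term_t mock_new_term_ref() { return next_id++; }
void mock_free_term_ref(mock_term_t) { ++frees; }

// Shared ownership: the deleter runs once, when the last copy goes away.
std::shared_ptr<mock_term_t> make_shared_term()
{ return std::shared_ptr<mock_term_t>(
      new mock_term_t(mock_new_term_ref()),
      [](mock_term_t* p) { mock_free_term_ref(*p); delete p; });
}

// Makes three owners of one handle and reports how often it was freed.
int frees_with_shared_copies()
{ frees = 0;
  { auto a = make_shared_term();
    auto b = a;                    // each copy bumps the reference count
    auto c = b;
    assert(a.use_count() == 3);
  }                                // count drops to zero: one free
  return frees;
}
```

Note that this sketch heap-allocates a small object plus a control block per handle, so it would have a cost beyond the reference count itself.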

Thanks for the detailed description. I thought C++ was simple :slight_smile: I think the best user experience would be shared pointer semantics, where we ideally give the term_t that you get from the predicate arguments an extra reference count, so these are never deleted. Possibly these could also use a subtype. Besides not being allowed to delete them, you are also not allowed to call any of the PL_put_*() functions on them or use them for any of the arguments marked “-” in the API functions, such as the 2nd or 3rd argument of PL_unify_list(). The current runtime checking protects against such usage.

It could however be that the performance implication of using shared pointers is too much (I doubt that, but I’ve recently learned again that guessing performance implications is non-trivial). In that case (or if the shared semantics gets too complicated) a PlTerm_new() is probably as good as it gets … If you turn these into unique pointers, can you still pass them to helper functions and use them in the main function after the helpers complete?

This won’t work with existing code that passes PlTerm as value, and not as pointers or references.

For a “unique pointer” semantics, it’s possible to get the underlying pointer (or term_t in this case) and also to pass ownership; but assignment isn’t allowed. (For unique_ptr, these are the get(), release(), reset() methods).

I’ll have to think more about how to do this, without breaking existing code. (And I don’t want to create a SWI-cpp3.h; two are enough.)

Using a hypothetical PlTermScoped whose constructor calls PL_new_term_ref() and whose destructor calls PL_free_term_ref(), this code becomes (untested):

bool
unify_atom_list(PlTerm list, const std::vector<std::string>& array)
{ PlTermScoped tail(list); // calls list.copy_term_ref()
  for( const auto& item : array )
  { PlTermScoped head; // var term
    PlCheckFail(tail.unify_list(head, tail));
    PlCheckFail(head.unify_chars(PL_ATOM, item));
  }
  return tail.unify_nil(); // close the list; this is missing from the C code
}

(I don’t see an obvious way of automatically calling PL_copy_term_ref().)

@jan – does this seem reasonable?

Good question. You are the C++ expert :slight_smile: I had hoped we could get rid of the scoping issues and leave that to the compiler. True, using a hypothetical “scoped term” rather than the C way of having to free it explicitly is a step forward.

Anyone who claims to be a C++ expert (except for Scott Meyers, Bjarne Stroustrup, and a few others), is lying. :slight_smile:
I didn’t start programming in C++ by choice.

I don’t see how that’s possible, given that C++ doesn’t have garbage collection and also has its own stack that is different from Prolog’s stack (or, rather, Prolog’s stack is different from most programming languages, because it allows backtracking). There is a reference-counted pointer in C++ (shared_ptr), but its implementation is somewhat complex; it’s possible to do something similar with PlTerm.

I have an experimental implementation of a scoped PlTerm. I’ll submit a PR soon.

Some questions:

  • Is there a way to test that the Prolog stack doesn’t grow?
  • Do we want a “scoped PlAtom” that can be garbage-collected? If so, what would the C code look like?

[BTW, scoped_ptr is an older terminology in C++, which became unique_ptr, to distinguish it better from the reference-counted shared_ptr]

You can call PL_new_term_ref() from the C(++) test. The numbers should not keep increasing. You can (should) immediately discard the reference using PL_reset_term_refs().
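
A sketch of that test idea, using a mock in place of the real calls. `mock_new_term_ref()`, `mock_reset_term_refs()`, `probe()`, `leaky_step()`, `tidy_step()` and `stack_grows()` are hypothetical names for this illustration; the mock assumes handles are consecutive numbers and that resetting to a handle discards it together with everything created after it:

```cpp
#include <cassert>
#include <cstddef>

static std::size_t top_of_stack = 0;   // number of live mock handles

std::size_t mock_new_term_ref() { return ++top_of_stack; }

// Discards handle t and every handle created after it (mock semantics).
void mock_reset_term_refs(std::size_t t)
{ assert(t >= 1 && t <= top_of_stack);
  top_of_stack = t - 1;
}

// Allocate a handle, note its number, and discard it immediately.
std::size_t probe()
{ std::size_t t = mock_new_term_ref();
  mock_reset_term_refs(t);
  return t;
}

void leaky_step() { mock_new_term_ref(); }   // never discards its handle

void tidy_step()
{ std::size_t first = mock_new_term_ref();
  mock_new_term_ref();
  mock_reset_term_refs(first);               // discards both handles
}

// True if running step() n times leaves the probe number higher.
bool stack_grows(void (*step)(), int n)
{ std::size_t before = probe();
  for ( int i = 0; i < n; i++ ) step();
  return probe() > before;
}
```

In a real test the same shape would apply: probe before and after the code under test and check that the handle number has not increased.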

I don’t think so. Under normal circumstances raw atom handles are typically only created once and reused to compare or unify. They normally are the “symbols” used by the extension. To deal with application data you normally directly unify a term to the atom handle derived from some string or you get the string from a term immediately. Note that if you get an atom handle from a term it is not locked and it doesn’t need to be as it is protected by the Prolog term in which it appears. Atom reference counting only applies if you create an atom from a string.

Here’s an example of where we might want a scoped PlAtom:

A1.unify_term(PlCompound("hello", PlTermv(PlAtom("world"))));

This would (I think) create an atom world that will never be garbage collected.

Instead, we might want:

A1.unify_term(PlCompound("hello", PlTermv(PlAtomScoped("world"))));

PlTerm (and term_t) avoids this problem with PlTerm::unify_chars(), I think; but I don’t see an obvious way to extend this to PlTermv.

Here’s another situation (a bit artificial), where I think the atom can’t be garbage collected, and where PlAtomScoped would solve the problem:

PREDICATE(foo, 2)
{ PlAtom atom_a1(A1.as_atom()); // calls PL_get_atom_ex()
  return A2.unify_atom(atom_a1);
}

Here’s my first cut at a “scoped” PlTerm. There’s no documentation (yet) and, depending on how this discussion goes, I might add PlAtomScoped:

(test_cpp.cpp has some example code added; this code fixes a few errors in unify_atom_list() that @jan and I wrote earlier in this thread.)

I see. The C way is to create a term vector of a given size and use PL_unify_chars() to fill it. You can do that using C++ as well (I guess), but it is less elegant.

This should not be a problem. Assuming PlTerm.as_atom() performs PL_get_atom_ex, the atom does not gain a reference count and all works fine.

It appears that PlTermScoped requires C++17 to support a PlTermScoped::release() method. So far, this has only affected packages/swipl-win; is it likely to be a problem? (The CMakeLists.txt for packages/cpp already specifies C++17, so I presume it’s not a problem.)

I’ve added documentation and some more tests and sample code to the experimental implementation of PlTermScoped … I would appreciate it being reviewed by anyone with C++ experience (despite what @jan says, I am not a C++ expert).

I’ve made an important change: for constructing a PlTermScoped object from a PlTerm object, the constructor calls PL_copy_term_ref() – this avoids some potential error situations, such as:

PREDICATE(foo, 2)
{ PlTermScoped first_arg(A1); // should be PlTerm first_arg(A1);
  ...
}

Without the copy, this would call PL_free_term_ref() on the argument, which would incorrectly decrement the references to it (in this situation, it probably wouldn’t cause a crash because the stack frame would be deleted upon return, but there are other situations where this could cause a crash).

The various PL_put_*() functions should also be disallowed with PlTermScoped.