change_arg/3 seems to be the same as nb_setarg/3 except that the documentation talks about “unsafe heap operations”. No such thing exists for SWI-Prolog. There is also nb_linkarg/3. That is more low-level and might be about the “unsafe heap operations”. I don’t know.
The compatibility layers are typically developed along with projects and are therefore pretty incomplete. Ideally the next project extends the layer and in the end you get something workable. A full compatibility layer that emulates all dark corners of some other system is a lot of work!
In my timings I get about a 10% performance loss for your findall2/3 (both on Linux and Windows). The fun thing is that the timings changed quite a bit. When I added nb_setarg/3 I wrote a version of findall/3 based on it, but it was so much slower that I decided to keep the old one. My version seems slightly more efficient and now shows only a 3% loss. That might make it worthwhile due to its simplicity and the smaller overhead for findall calls that produce few solutions. Here is also one that does not copy if there is only one answer:
Well, in my measurements my version is faster than yours. Of course that depends on the platform. This version was developed in a project that used findall/3 in a scenario where there is frequently only one answer. That is why it is called “find_few”. The call_det/2 hack avoids the need to copy anything if there is only one answer. The drawback is that you get the copy semantics for non-det goals and the non-copy semantics for (semi)det goals. That was not a problem for this project, but in general it is not acceptable.
That we now have an acceptable performance difference is probably due to better compilation of arg/3 and inline unification. Well, the lesson is that this design is now viable. Given how much simpler it is than the actual implementation of findall/3, reconsidering it is a real option. There is nothing wrong with how it is implemented now though, so there is no hurry.
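To make the idea of a findall/3 built on nb_setarg/3 concrete, here is a minimal sketch. It is not the version timed above and not the actual SWI-Prolog implementation; the name findall_nb/3 is made up for illustration. It assumes, as documented, that nb_setarg/3 duplicates the stored value. That makes it simple but, as far as I understand, it re-copies the accumulated list on every solution, so it is only suitable for showing the principle.

```prolog
:- meta_predicate findall_nb(?, 0, -).

% findall_nb(?Templ, :Goal, -List)
% Hypothetical name. Collects the solutions of Goal in a
% non-backtrackable accumulator held in State.
findall_nb(Templ, Goal, List) :-
    State = state([]),
    (   call(Goal),
        arg(1, State, Acc0),
        % nb_setarg/3 stores a duplicate of [Templ|Acc0], so the
        % value survives the fail below.
        nb_setarg(1, State, [Templ|Acc0]),
        fail
    ;   arg(1, State, Acc),
        reverse(Acc, List)      % solutions were pushed in reverse order
    ).
```

A query such as `findall_nb(X, member(X, [a,b,c]), L)` should then behave like the same call to findall/3.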
Works fine for any atomic data. The set and link variants only differ for compound terms. I see it still calls duplicate_term, but this is a no-op for atomic data. Enhanced to avoid this.
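The difference between the set and link variants can be illustrated with a small sketch. It assumes the remark is about nb_setarg/3 and nb_linkarg/3 as mentioned earlier in the thread; store_copy/2 and store_link/2 are made-up helper names.

```prolog
% Hypothetical helpers; only nb_setarg/3, nb_linkarg/3 and arg/3
% are built-ins.

% store_copy(+Value, -Stored): store Value with nb_setarg/3, which
% duplicates it, and read the stored term back.
store_copy(Value, Stored) :-
    Cell = cell(empty),
    nb_setarg(1, Cell, Value),
    arg(1, Cell, Stored).

% store_link(+Value, -Stored): same, but nb_linkarg/3 stores the
% term itself without duplicating it.
store_link(Value, Stored) :-
    Cell = cell(empty),
    nb_linkarg(1, Cell, Value),
    arg(1, Cell, Stored).
```

With a compound term that contains a variable, `V = f(_), store_copy(V, S), S == V` is expected to fail because the copy has a fresh variable, while the same query using store_link/2 is expected to succeed. For an atom both variants give back the identical value.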