PL_atom_chars() vs PL_blob_data()

Removing this question for now, while I try some things out.

The documentation for PL_atom_chars() says to use PL_blob_data() instead, and there’s a general recommendation to use functions that specify or return a length rather than rely on null termination (see “Documentation of PL_put_term_from_chars() and PL_chars_to_term()”, reply #2 by jan).

However, PL_blob_data() (and PL_get_blob()) have a type parameter. What is the correct way to use this type to be sure that I have an atom?
By experimentation (and reading pl-fli.c and pl-atom.c), I’ve determined that blob->name is either “text” or “ucs_text”, and that blob->flags can be tested against PL_BLOB_TEXT and/or PL_BLOB_WCHAR. But is that sufficient to be sure that I have an atom?

For testing this, I created a blob with flags=PL_BLOB_TEXT … PL_is_atom() succeeded and PL_term_type() returned PL_ATOM and not PL_BLOB. This doesn’t seem right; shouldn’t PL_is_atom() also require PL_BLOB_UNIQUE? I’m guessing that nothing would break with two kinds of atoms (there’s already ucs_text), because atoms and blobs all are atom_t and those are unique (but PL_BLOB_UNIQUE needs to be set if we want to test equality using == instead of memcmp()).
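
To make the == versus memcmp() point concrete, here is a toy model of interning. It is only a sketch: toy_entry, toy_intern and toy_same_content are hypothetical names made up for this illustration, and nothing here is SWI-Prolog's actual atom table. With the "unique" flag set, identical content always maps to the same handle, so comparing handles is enough; without it, each call creates a fresh handle and only a content comparison can tell that two entries hold the same bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy intern table; NOT SWI-Prolog's implementation. */
#define TOY_MAX 64

typedef struct {
    char   *bytes;   /* private copy of the content */
    size_t  len;     /* length in bytes, no reliance on '\0' */
    int     unique;  /* plays the role of a "unique" flag */
} toy_entry;

static toy_entry toy_table[TOY_MAX];
static size_t    toy_count = 0;

/* Return a handle (index + 1) for the given bytes, or 0 on failure.
 * With unique != 0, an existing unique entry with identical content
 * is reused; otherwise a new entry is always created. */
static size_t toy_intern(const char *bytes, size_t len, int unique)
{
    if (unique) {
        for (size_t i = 0; i < toy_count; i++) {
            toy_entry *e = &toy_table[i];
            if (e->unique && e->len == len &&
                memcmp(e->bytes, bytes, len) == 0)
                return i + 1;
        }
    }
    if (toy_count == TOY_MAX)
        return 0;
    char *copy = malloc(len + 1);
    if (!copy)
        return 0;
    memcpy(copy, bytes, len);
    copy[len] = '\0';
    toy_table[toy_count] = (toy_entry){ copy, len, unique };
    return ++toy_count;   /* new count equals index + 1 */
}

/* Content comparison, needed when handles are not guaranteed unique. */
static int toy_same_content(size_t a, size_t b)
{
    assert(a >= 1 && a <= toy_count && b >= 1 && b <= toy_count);
    const toy_entry *x = &toy_table[a - 1];
    const toy_entry *y = &toy_table[b - 1];
    return x->len == y->len && memcmp(x->bytes, y->bytes, x->len) == 0;
}
```

In this model, two calls to toy_intern("abc", 3, 1) yield the same handle, while two calls with the flag cleared yield different handles that still compare equal under toy_same_content().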

You’re not asking for an answer at this point, but I’ll provide one anyway, as I’ve been staring at this code for most of yesterday.
PL_atom_nchars and PL_atom_wchars ensure that an atom is an actual atom and not some blob type by comparing its type with ucs_atom, which is defined in pl-fli.c and unfortunately not exported in the foreign language interface (though you can get hold of it by retrieving the type from any known atom). Just checking the type name is, strictly speaking, not enough, because any blob definer can claim their blob is called “ucs_text”. I highly doubt it would lead to issues in practice though; who does that?
You could just use PL_atom_nchars and PL_atom_wchars directly of course, as these do the check for you. They will simply return NULL if the atom_t is not a proper atom.
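
The difference between checking the type name and checking the type's identity can be sketched with a small stand-alone model. Everything below is hypothetical (toy_blob_type, toy_blob and the helper functions are invented for illustration; the real PL_blob_t has more fields), so treat it as a picture of the idea rather than the actual code in pl-fli.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a blob type descriptor. */
typedef struct {
    const char *name;
} toy_blob_type;

/* Hypothetical stand-in for a blob: a type pointer plus data. */
typedef struct {
    const toy_blob_type *type;
    const void          *data;
    size_t               len;
} toy_blob;

/* The "real" wide-text type, private to the system in this model. */
static const toy_blob_type toy_ucs_text = { "ucs_text" };

/* A third-party type that merely reuses the same name. */
static const toy_blob_type toy_impostor = { "ucs_text" };

/* Name check: fooled by any type that calls itself "ucs_text". */
static int is_wide_text_by_name(const toy_blob *b)
{
    return strcmp(b->type->name, "ucs_text") == 0;
}

/* Identity check: only the one true descriptor passes. */
static int is_wide_text_by_identity(const toy_blob *b)
{
    return b->type == &toy_ucs_text;
}

/* If the descriptor is not exported, it can still be recovered
 * from any blob already known to be of the real type. */
static const toy_blob_type *type_of(const toy_blob *known)
{
    return known->type;
}
```

Here a blob whose type is toy_impostor passes the name check but fails the identity check, which is the gap described above.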

PL_get_wchars(term_t, ...) works with both atoms and blobs that have the PL_BLOB_TEXT flag set, using whatever the write function outputs.
I’ve almost got some test code fully working … will update this thread “soon”.

I was talking about PL_atom_wchars, not PL_get_wchars. PL_atom_wchars works directly on the underlying atom, returning a pointer to its data. PL_get_wchars does quite a bit more, actually copying the data and potentially converting the encoding.

PL_atom_wchars does definitely check for ucs_atom type, not just PL_BLOB_TEXT flag.

If you have a term_t that is supposed to be an atom, use one of the PL_get*chars() functions. In relatively rare cases you might want to get the text associated with an atom_t object. PL_atom_chars() and PL_atom_wchars() are fine for that. The blob functions should be used for non-text atoms. Do not use them on text atoms, as it is not impossible (and actually quite likely) that sooner or later the two text atom types will be merged. In that case PL_atom_chars() and PL_atom_wchars() can still do their job, but fetching the blob data might return something unexpected.

It seems that the documentation isn’t quite right … I’ll try to update it (and put together a test case or two). I’ve also been looking at the code that uses the PL_BLOB_WCHAR flag and/or checks for ucs_atom – it appears that things have evolved over time and there might be some latent problems there.

I’d rather not see the wchar aspects of blobs being documented. They are not unlikely to change at some point … So, if they should be documented it should be “internal use” or something like that :-)

I’ll be making a PR soon … the changes mostly switch from checking for &ucs_atom to checking for PL_BLOB_WCHAR. Nothing big, but the changes help me make some other tests that aren’t ready yet.