Ann: SWI-Prolog 8.5.14

Dear SWI-Prolog user,

SWI-Prolog 8.5.14 is ready for download. This release contains
many visible changes whose implementation touched a lot of code.
As a result, some regression is not unlikely. People using the
development version for production purposes should thoroughly
test this version first. Suspect areas are I/O, atom and string
manipulation in Prolog or C, notably when non-ASCII characters
are involved. Regression is more likely on Windows, but may also
affect other platforms.

Highlights:

  • Update character tables to Unicode 14.0.0

  • Distinguish Unicode decimal digits and act on them regardless
    of the script. It is not allowed to mix digits from multiple
    scripts in the same number.

  • Change internal wchar_t text on systems where wchar_t is 2 bytes
    (Windows) to deal with the encoding as UTF-16. This allows for
    the full Unicode range on all platforms. These patches also
    provide UTF-16 I/O on all platforms. The Unicode surrogate pair
    code units (0xD800 … 0xDFFF) are now considered illegal code
    points. Note that these changes may well have caused regression.
    Also, several of the extensions still handle wchar_t on Windows
    as UCS-2 (notably xpce).

  • Added string_bytes/3 to convert between Unicode text and byte
    sequences. Used by updated base64_encoded/3.

  • The JAVA interface now avoids recursion for exchanging terms.
    Contributed by Paul Singleton.

  • Several msys2 portability fixes by @mgondan1.

  • Several fixes to pack_install/1, in part by Peter.Ludemann.

  • Avoid a crash on too deeply nested C->Prolog->C call stacks.
    This is a partial implementation: fully functional on Linux, an
    approximation on macOS and not available on Windows.

  • Added PL_scan_options() to the foreign API to simplify processing
    option lists and make their processing consistent. The new API
    is a slight generalization of an old internal API.

    Enjoy — Jan
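
For readers who want to see what the UTF-16 highlight means in practice, here is a minimal, self-contained C sketch of the encoding rules involved. It is not taken from the SWI-Prolog sources; the function names is_valid_code_point(), utf16_encode() and utf16_decode() are made up for this illustration. It shows which codes count as valid (0 to 0x10FFFF, excluding the surrogate range 0xD800 to 0xDFFF) and how a code point above 0xFFFF is split into two 16-bit units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper (not SWI-Prolog API): a valid character code is
   0..0x10FFFF, excluding the UTF-16 surrogate range 0xD800..0xDFFF. */
bool is_valid_code_point(uint32_t c)
{
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

/* Encode c as UTF-16 into out[]. Returns the number of 16-bit units
   written (1 or 2), or 0 if c is not a valid code point. */
int utf16_encode(uint32_t c, uint16_t out[2])
{
    if (!is_valid_code_point(c))
        return 0;
    if (c < 0x10000) {
        out[0] = (uint16_t)c;
        return 1;
    }
    c -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (c >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (c & 0x3FF));  /* low (trail) surrogate */
    return 2;
}

/* Decode one code point from in[0..len). Stores it in *c and returns the
   number of units consumed, or 0 on empty input or an unpaired surrogate. */
int utf16_decode(const uint16_t *in, int len, uint32_t *c)
{
    if (len < 1)
        return 0;
    if (in[0] < 0xD800 || in[0] > 0xDFFF) {     /* plain 16-bit code */
        *c = in[0];
        return 1;
    }
    if (in[0] <= 0xDBFF && len >= 2 && in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
        uint32_t hi = (uint32_t)(in[0] - 0xD800);
        uint32_t lo = (uint32_t)(in[1] - 0xDC00);
        *c = 0x10000 + ((hi << 10) | lo);
        return 2;
    }
    return 0;
}
```

With these definitions, utf16_encode(0x1F600, u) writes the pair 0xD83D, 0xDE00, and utf16_decode() on that pair yields 0x1F600 again. Code that treats each 16-bit unit as a character (UCS-2) sees two illegal code points instead.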

SWI-Prolog Changelog since version 8.5.13

  • ENHANCED: base64_encoded/3: added option encoding(Encoding) and
    bootstrap base64/2 and base64url/2 from this predicate. base64url/2
    now uses UTF-8 encoding (MODIFIED).

  • ADDED: string_bytes/3, get the bytes for representing a (Unicode)
    string in a given encoding.

  • FIXED: Avoid C-stack overflow in recursive C->Prolog->C calls by
    demanding a minimum of 100Kbytes stack before calling Prolog.

  • FIXED: pack_install/1: if the pack is already installed at an older
    version, upgrade it.

  • ENHANCED: Make git probe silent on all possible errors.

  • FIXED: GIT URLs must be a valid absolute URL to begin with

  • FIXED: pack_install/1: version comparison for already installed
    versions.

  • ADDED: PL_scan_options() public API to deal with option lists.

  • ADDED: pack_install/1: test whether a URL is a GIT URL using
    git_remote_branches/2.

  • MODIFIED: Fixed various internal recoding issues to ENC_WCHAR.
    These changes also ensure that canonical text as used for atoms and
    strings contains only valid Unicode code points. As a result, passing
    invalid strings to Prolog using the foreign API may result in a failure.

  • DOC: Unicode and UTF-16 issues.

  • MODIFIED: Be consistent about valid character codes. These are the
    Unicode code points 0…U+10FFFF, while the range reserved for UTF-16
    surrogate pairs is excluded (U+D800…U+DFFF).

  • DOC: Base64 encoding issues.

  • PORT: add_package_path/1 also under Windows. This change allows for
    a GNU-style directory structure also under Windows.

  • DOC: Rename section label for the statistics section of the manual to
    avoid a clash with the library documentation, hiding statistics/2 docs.

  • TEST: Avoid surrogates for all encodings

There are a number of visible changes:

  • UTF: Fixed length handling in setenv/2.

  • PORT: Replace most wint_t by int for character classification purposes
    because Windows wint_t is 2 bytes, so we cannot classify anything
    > 0xffff.

  • UTF: atom_concat/3 can now handle UTF-16 sequences.

  • UTF: Make PL_cmp_text() and PL_unify_text_range() deal with UTF-16
    strings.

  • CLEANUP: sub_atom/5: more consistent typing and better reuse of
    primitives. This patch fixes handling atoms longer than 2G code points

  • FIXED: Reading terms from Unicode symbol sequences.

  • ADDED: Use UTF-16 for canonical text on Windows. This is a first
    step that implements some of the basic handling for creating and
    writing atoms with code points > U+FFFF

  • MODIFIED: Official encoding names for UTF-16. Now also allows
    aliases for the IANA names for specifying the encoding (UTF-8,
    UTF-16BE, UTF-16LE).

  • ENHANCED: Allow reading and writing UTF-16 files.

  • PORT: MSYS2, add %MINGW_PREFIX%/bin to dll search

  • MODIFIED: PL_get_char() now returns a domain or representation error if
    the code is outside the Unicode range (domain) or cannot be represented
    by the system (representation).

  • FIXED: built-in option list processing should raise a type error if
    the list is cyclic.

  • FIXED: incr_invalidate_calls/1: succeed if no tabling happened in
    the calling thread yet (and thus there is no variant table).

  • ADDED: Allow floats from other scripts.

  • DOC: various Unicode related updates, including handling of non-ASCII
    decimal number characters.

  • MODIFIED: Make the Prolog parser parse decimal numbers in other
    scripts to integers.

  • ADDED: char_type/2 type decimal.

  • MODIFIED: Updated character classification for read/1 and friends to
    be based on Unicode 14.0.0

  • MODIFIED: Updated to Unicode 14.0.0 (from 6.0.0)

  • CLEANUP: library(unicode/unicode_data) to avoid conflict with table/1.
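
The decimal-digit entries above (digits act the same regardless of script, but scripts may not be mixed within one number) can be sketched in plain C as follows. This is only an illustration of the rule, not the parser's actual code: the table covers just four scripts, the names decimal_weight() and parse_decimal() are invented for this example, and there is no overflow checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Code point of the digit zero for a few scripts. Illustrative subset
   only; the real Unicode tables contain many more decimal digit ranges. */
static const uint32_t zero_digits[] = {
    0x0030,  /* ASCII 0-9 */
    0x0660,  /* Arabic-Indic */
    0x06F0,  /* Extended Arabic-Indic */
    0x0966,  /* Devanagari */
};

/* If c is a decimal digit of a known script, store its weight (0..9)
   in *weight and return the script's index in zero_digits[]; else -1. */
int decimal_weight(uint32_t c, int *weight)
{
    size_t n = sizeof zero_digits / sizeof zero_digits[0];

    for (size_t i = 0; i < n; i++) {
        if (c >= zero_digits[i] && c <= zero_digits[i] + 9) {
            *weight = (int)(c - zero_digits[i]);
            return (int)i;
        }
    }
    return -1;
}

/* Parse n code points as a non-negative decimal integer. Returns 0 on
   success and -1 if the text is empty, contains a non-digit, or mixes
   digits from different scripts. No overflow checking (sketch only). */
int parse_decimal(const uint32_t *text, size_t n, long *value)
{
    int script = -1;
    long v = 0;

    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int w;
        int s = decimal_weight(text[i], &w);

        if (s < 0)
            return -1;          /* not a decimal digit we know */
        if (script < 0)
            script = s;         /* first digit fixes the script */
        else if (s != script)
            return -1;          /* mixed scripts are rejected */
        v = v * 10 + w;
    }
    *value = v;
    return 0;
}
```

Under these assumptions the Arabic-Indic sequence U+0661 U+0662 U+0663 parses to 123, while U+0031 followed by U+0662 is rejected because it mixes ASCII and Arabic-Indic digits.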

Package clib

  • ADDED: library(sched), providing a start at accessing the OS scheduling
    primitives.

  • FIXED: read_line_to_codes/2: avoid line ending with \r if the \n is
    found just after a flush. This patch also includes a rewrite of this
    predicate and read_stream_to_codes/3 to use UTF-8 as intermediate
    representation rather than wchar_t, avoiding the need for UTF-16
    surrogate pairs.

Package cpp

  • ADDED: PlTerm casts for bool and uint32_t

Package jpl

  • ENHANCED: Non-recursive Term.getTerm reimplementation. Replaces the
    original recursive implementation of Term.getTerm() etc. (which runs
    out of JVM stack for e.g. lists of more than a few thousand members)
    with a depth-unlimited non-recursive version (see Term.getLoop)
    and adds a couple of JUnit tests.

  • ENHANCED: Non-recursive Term.put reimplementation. Replaces the original
    recursive implementation of Term.put() etc. (which runs out of JVM
    stack for e.g. lists of more than a few thousand members) with a
    depth-unlimited non-recursive version (see Term.putLoop) and adds a
    couple of JUnit tests.

  • TEST: Cleanup and enhancements.
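
The general technique behind the two JPL entries above, replacing recursion with a loop over an explicit work stack, can be sketched in C. This is not the JPL code (which is Java and lives in Term.getLoop and Term.putLoop); the node type and count_nodes() are hypothetical and only show why depth no longer depends on the native call stack.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical term node for this sketch: up to two children. */
typedef struct node {
    struct node *left, *right;
} node;

/* Count all nodes reachable from root without recursion. Pending nodes
   are kept on a heap-allocated stack that grows on demand, so nesting
   depth is limited by available memory, not by the C stack.
   Returns -1 if the work stack cannot be allocated. */
long count_nodes(const node *root)
{
    size_t cap = 16, top = 0;
    const node **stack = malloc(cap * sizeof *stack);
    long count = 0;

    if (!stack)
        return -1;
    if (root)
        stack[top++] = root;

    while (top > 0) {
        const node *n = stack[--top];

        count++;
        /* Each visit pushes at most two children; make room for both. */
        if (top + 2 > cap) {
            const node **tmp = realloc(stack, 2 * cap * sizeof *stack);

            if (!tmp) {
                free(stack);
                return -1;
            }
            stack = tmp;
            cap *= 2;
        }
        if (n->right)
            stack[top++] = n->right;
        if (n->left)
            stack[top++] = n->left;
    }
    free(stack);
    return count;
}
```

A chain of 100,000 nodes linked through right, shaped like a long list, is counted by this loop using a constant amount of C stack, which is the situation the recursive versions could not handle.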
