Ann: SWI-Prolog 8.5.14

Dear SWI-Prolog user,

SWI-Prolog 8.5.14 is ready for download. This release contains
many visible changes whose implementation touched a lot of code.
As a result, some regression is not unlikely. People using the
development version for production purposes should thoroughly
test this version first. Suspect areas are I/O, atom and string
manipulation in Prolog or C, notably when non-ASCII characters
are involved. Regression is more likely on Windows, but may also
affect other platforms.

Highlights:

  • Update character tables to Unicode 14.0.0

  • Distinguish Unicode decimal digits and act on them regardless
    of the script. It is not allowed to mix digits from multiple
    scripts in the same number.

  • Change internal wchar_t text on systems where wchar_t is 2 bytes
    (Windows) to deal with the encoding as UTF-16. This allows for
    the full Unicode range on all platforms. These patches also
    provide UTF-16 I/O on all platforms. The Unicode surrogate pair
    code units (0xD800 .. 0xDFFF) are now considered illegal code
    points. Note that these changes may well have caused regression.
    Also, several of the extensions still handle wchar_t on Windows
    as UCS-2 (notably xpce).

  • Added string_bytes/3 to convert between Unicode text and byte
    sequences. Used by updated base64_encoded/3.

  • The JAVA interface now avoids recursion for exchanging terms.
    Contributed by Paul Singleton.

  • Fixes for several msys2 portability issues by @mgondan1.

  • Several fixes to pack_install/1, in part by Peter.Ludemann.

  • Avoid a crash on too deeply nested C->Prolog->C call stacks.
    Partial implementation (fully functional on Linux, approximation
    on MacOS and not on Windows).

  • Added PL_scan_options() to the foreign API to simplify processing
    option lists and make their processing consistent. The new API
    is a slight generalization of an old internal API.

    Enjoy — Jan
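For readers less familiar with UTF-16, here is a small illustration of the surrogate pair arithmetic and the "valid code point" rule mentioned in the highlights. This is a Python sketch of the standard UTF-16 scheme, not code from SWI-Prolog; the function names are made up for this example.

```python
# Illustration only: standard UTF-16 arithmetic, not SWI-Prolog source code.
# All function names below are hypothetical.

MAX_CODE_POINT = 0x10FFFF
SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF


def is_valid_code_point(cp):
    """True for 0..0x10FFFF, excluding the surrogate range 0xD800..0xDFFF."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    return not (SURROGATE_FIRST <= cp <= SURROGATE_LAST)


def to_utf16_units(cp):
    """Encode one code point as a list of one or two 16-bit code units."""
    if not is_valid_code_point(cp):
        raise ValueError(f"illegal code point: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                      # fits in a single code unit
    v = cp - 0x10000                     # 20 bits left to distribute
    return [0xD800 | (v >> 10),          # high (leading) surrogate
            0xDC00 | (v & 0x3FF)]        # low (trailing) surrogate


def from_utf16_pair(high, low):
    """Combine a high and a low surrogate into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

A code point above U+FFFF, such as U+1F600, needs two code units in this scheme, which is why text handled as UCS-2 cannot represent it.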

SWI-Prolog Changelog since version 8.5.13

  • ENHANCED: base64_encoded/3: added option encoding(Encoding) and
    bootstrap base64/2 and base64url/2 from this predicate. base64url/2
    now uses UTF-8 encoding (MODIFIED).

  • ADDED: string_bytes/3, get the bytes for representing a (Unicode)
    string in a given encoding.

  • FIXED: Avoid C-stack overflow in recursive C->Prolog->C calls by
    demanding a minimum of 100Kbytes stack before calling Prolog.

  • FIXED: pack_install/1: if the pack is already installed at an older
    version, upgrade it.

  • ENHANCED: Make git probe silent on all possible errors.

  • FIXED: GIT URLs must be a valid absolute URL to begin with

  • FIXED: pack_install/1: version comparison for already installed
    versions.

  • ADDED: PL_scan_options() public API to deal with option lists.

  • ADDED: pack_install/1 now tests whether a URL is a GIT URL using
    git_remote_branches/2.

  • MODIFIED: Fixed various internal recoding issues to ENC_WCHAR.
    These changes also ensure that canonical text as used for atoms and
    strings contains only valid Unicode code points. As a result, passing
    invalid strings to Prolog using the foreign API may result in a failure.

  • DOC: Unicode and UTF-16 issues.

  • MODIFIED: Be consistent about valid character codes. These are the
    Unicode code points 0..U+10FFFF, while the range reserved for UTF-16
    surrogate pairs is excluded (U+D800..U+DFFF).

  • DOC: Base64 encoding issues.

  • PORT: add_package_path/1 also under Windows. This change allows for
    a GNU-style directory structure also under Windows.

  • DOC: Rename section label for the statistics section of the manual to
    avoid a clash with the library documentation, hiding statistics/2 docs.

  • TEST: Avoid surrogates for all encodings
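To make the string_bytes/3 and base64_encoded/3 entries above more concrete: the point is that base64 works on bytes, so Unicode text must first be turned into bytes using some encoding, and the result depends on which one. Below is a rough Python analogue. The two functions only borrow the predicate names for readability; they are stand-ins written for this note and say nothing about the argument order or defaults of the Prolog predicates.

```python
import base64

# Python stand-ins for illustration; not the Prolog predicates themselves.


def string_bytes(text, encoding):
    """Return the bytes (as a list of ints) representing text in encoding."""
    return list(text.encode(encoding))


def base64_encoded(text, encoding):
    """Encode text to bytes using encoding, then base64-encode the bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


# The same one-character string gives different bytes per encoding,
# and therefore a different base64 result.
print(string_bytes("\u00e9", "utf-8"))       # bytes for U+00E9 in UTF-8
print(string_bytes("\u00e9", "utf-16-le"))   # bytes for U+00E9 in UTF-16LE
print(base64_encoded("\u00e9", "utf-8"))
print(base64_encoded("\u00e9", "latin-1"))
```

This is why an explicit encoding option matters: two sides that agree on base64 but disagree on the text encoding will not round-trip non-ASCII text.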

There are a number of visible changes:

  • UTF: Fixed length handling in setenv/2.

  • PORT: Replace most wint_t by int for character classification purposes
    because Windows wint_t is 2 bytes, so we cannot classify anything
    > 0xffff.

  • UTF: atom_concat/3 can now handle UTF-16 sequences.

  • UTF: Make PL_cmp_text() and PL_unify_text_range() deal with UTF-16
    strings.

  • CLEANUP: sub_atom/5: more consistent typing and better reuse of
    primitives. This patch fixes handling atoms longer than 2G code points

  • FIXED: Reading terms from Unicode symbol sequences.

  • ADDED: Use UTF-16 for canonical text on Windows. This is a first
    step that implements some of the basic handling for creating and
    writing atoms with code points > U+FFFF

  • MODIFIED: Official encoding names for UTF-16. Now also allows
    aliases for the IANA names for specifying the encoding (UTF-8,
    UTF-16BE, UTF-16LE).

  • ENHANCED: Allow reading and writing UTF-16 files.

  • PORT: MSYS2, add %MINGW_PREFIX%/bin to dll search

  • MODIFIED: PL_get_char() now returns a domain or representation error if
    the code is outside the Unicode range (domain) or cannot be represented
    by the system (representation).

  • FIXED: built-in option list processing should raise a type error if
    the list is cyclic.

  • FIXED: incr_invalidate_calls/1: succeed if no tabling happened in
    the calling thread yet (and thus there is no variant table).

  • ADDED: Allow floats from other scripts.

  • DOC: various Unicode related updates, including handling of non-ASCII
    decimal number characters.

  • MODIFIED: Make the Prolog parser parse decimal numbers in other
    scripts to integers.

  • ADDED: char_type/2 type decimal.

  • MODIFIED: Updated character classification for read/1 and friends to
    be based on Unicode 14.0.0

  • MODIFIED: Updated to Unicode 14.0.0 (from 6.0.0)

  • CLEANUP: library(unicode/unicode_data) to avoid conflict with table/1.
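As an illustration of the decimal digit entries above (digits of any script are accepted, but one number may not mix scripts), here is a Python sketch. It relies on an approximation that is my own assumption, not necessarily how SWI-Prolog does it: Unicode decimal digits are encoded in contiguous runs of ten, so two digits are treated as "same script" when they share the same zero code point. The function name is hypothetical.

```python
import unicodedata

# Sketch under an assumption: "same script" is approximated by "same
# contiguous block of ten decimal digits". Not taken from SWI-Prolog.


def parse_decimal(text):
    """Parse a non-negative integer written with decimal digits of one script."""
    if not text:
        raise ValueError("empty number")
    value = 0
    zero = None                      # code point of digit zero for this number
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        block_zero = ord(ch) - digit  # zero of the block this digit is in
        if zero is None:
            zero = block_zero
        elif block_zero != zero:
            raise ValueError("digits from multiple scripts in one number")
        value = value * 10 + digit
    return value
```

With this sketch, ASCII "123" and the Arabic-Indic digits U+0661 U+0662 U+0663 both parse to the same integer, while a string that combines ASCII "1" with U+0662 is rejected.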

Package clib

  • ADDED: library(sched), providing a start at accessing the OS scheduling
    primitives.

  • FIXED: read_line_to_codes/2: avoid line ending with \r if the \n is
    found just after a flush. This patch also includes a rewrite of this
    predicate and read_stream_to_codes/3 to use UTF-8 as intermediate
    representation rather than wchar_t, avoiding the need for UTF-16
    surrogate pairs.
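The read_line_to_codes/2 entry above is about a \r\n pair that gets split across two reads. I have not looked at the actual patch, so take the following Python sketch as an assumption about the general shape of the problem only: lines are collected as raw bytes, the \r is stripped only once the \n has actually been seen, and the UTF-8 bytes are decoded per complete line. The function name and the chunked input are hypothetical.

```python
# Hypothetical sketch of chunk-boundary-safe line splitting; this is not
# the clib implementation.


def split_lines(chunks):
    """Yield decoded lines from an iterable of UTF-8 byte chunks.

    Handles "\r\n" even if "\r" ends one chunk and "\n" starts the next,
    and multi-byte UTF-8 sequences that are split across chunks.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        while True:
            i = pending.find(b"\n")
            if i < 0:
                break                    # no complete line yet, read more
            line, pending = pending[:i], pending[i + 1:]
            if line.endswith(b"\r"):
                line = line[:-1]         # drop the CR of a CR LF pair
            yield line.decode("utf-8")   # decode only complete lines
    if pending:
        yield pending.decode("utf-8")    # final line without newline
```

Keeping the intermediate data as bytes means no 16-bit units are involved at any point, so there is nothing for a surrogate pair to be split across.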

Package cpp

  • ADDED: PlTerm casts for bool and uint32_t

Package jpl

  • ENHANCED: Non-recursive Term.getTerm reimplementation. Replaces the
    original recursive implementation of Term.getTerm() etc. (which runs
    out of JVM stack for e.g. lists of more than a few thousand members)
    with a depth-unlimited non-recursive version (see Term.getLoop)
    and adds a couple of JUnit tests.

  • ENHANCED: Non-recursive Term.put reimplementation. Replaces the original
    recursive implementation of Term.put() etc. (which runs out of JVM
    stack for e.g. lists of more than a few thousand members) with a
    depth-unlimited non-recursive version (see Term.putLoop) and adds a
    couple of JUnit tests.

  • TEST: Cleanup and enhancements.
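For those wondering why recursion is a problem for term exchange: a long list is a deeply nested term (each cell holds the rest of the list), so a naive recursive walk uses one stack frame per element. The general fix is to replace the call stack with an explicit work list. Here is a small Python sketch of that technique. It is not the JPL code; the tuple representation and both function names are invented for the example.

```python
# Generic illustration of replacing recursion with an explicit stack.
# Terms are modelled as nested tuples (functor, arg1, ..., argN); atoms and
# numbers are plain Python values. This representation is an assumption.


def make_list(n):
    """Build a list 1..n as nested cons cells: ("cons", Head, Tail)."""
    term = "[]"
    for i in range(n, 0, -1):
        term = ("cons", i, term)
    return term


def count_nodes(term):
    """Count all nodes of a term without using recursion."""
    count = 0
    stack = [term]                   # explicit work list replaces call stack
    while stack:
        node = stack.pop()
        count += 1
        if isinstance(node, tuple):
            stack.extend(node[1:])   # schedule the arguments, skip functor
    return count


# A list of 100000 elements nests 100000 levels deep; the explicit stack
# keeps the depth of the Python call stack constant.
print(count_nodes(make_list(100000)))
```

The memory for the work list lives on the heap, so the limit becomes available memory rather than the (much smaller) thread stack.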
