Ann: SWI-Prolog 10.1.8

Dear SWI-Prolog user,

SWI-Prolog 10.1.8 is ready for download. This is a stabilising
release on top of the large Unicode overhaul shipped in 10.1.7. The
main themes are tightening up the Unicode source syntax (bracket-pair
atoms, solo characters, write/quote behaviour), rejecting surrogate
code points at every Prolog API surface, a new UTS #39 confusable
identifier linter, a Unicode character-name lookup library, and a
macOS packaging overhaul that ships a swipl.framework and a signed,
notarised split-layout .pkg installer.

Highlights

  • Surrogate code points are now consistently rejected by every
    Prolog API and stream decoder.
  • New library(unicode_security) (UTS #39 / UAX #24) and
    list_confusable_identifiers/{0,1} in library(check) for
    spotting mixed-script and look-alike identifiers.
  • New library(uniname) with unicode_name/2 for Unicode
    character-name lookup, generation and enumeration.
  • macOS runtime is now a standalone swipl.framework; ninja pkg
    produces a signed and notarised .pkg installer with the
    split /Applications + /Library/Frameworks layout.

Unicode source syntax — follow-ups

  • ENHANCED: Unicode bracket pairs (Ps/Pe) now behave
    consistently with {}. An empty pair (optionally with layout in
    between) reads as the atom '<open><close>', the same name acts
    as a functor when followed by (, and '⟨⟩'(X) writes as ⟨X⟩.
    The bare bracket-pair atom is also written unquoted.
  • ADDED: pattern_syntax and prolog_solo categories in
    char_type/2 and code_type/2. pattern_syntax exposes UAX #31
    Pattern_Syntax membership; prolog_solo exposes the kernel’s own
    “stands as a token on its own” flag.
  • ADDED: write_term/2 option pattern_syntax_solo. When set,
    single-character atoms outside the immutable UAX #31 Pattern_Syntax
    set are quoted, so atoms like '€', '·', '🎉' round-trip
    safely across future Unicode versions. write_canonical/1 enables
    the option by default; write/1 and writeq/1 are unchanged.
  • FIXED: solo characters are no longer needlessly quoted by
    writeq/1 ($needs_quotes/1 re-uses the kernel’s own quote
    decision; the underlying unquoted_atom helper was mis-routing
    UCS atoms through the byte-oriented branch).
  • FIXED: compare/3 on wide atoms now orders by code point
    rather than by wchar_t unit. On Windows (16-bit wchar_t) a
    supplementary-plane atom like '\U0001D11E' is stored as a
    surrogate pair, so it was previously ordered below BMP atoms
    in U+DC00..U+FFFF, which is wrong in code point order.
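
The compare/3 fix above comes down to the difference between comparing
code points and comparing UTF-16 code units. The sketch below is not
SWI-Prolog code: it is a small Python illustration, and the helper names
(utf16_units, code_points, order) are made up for the example. It only
shows why a 16-bit wchar_t comparison puts a supplementary-plane
character below a high BMP character.

```python
# Illustration only: Python strings are sequences of code points, so the
# UTF-16 view has to be built explicitly.

def utf16_units(text):
    """Return the UTF-16 code units of text, as a 16-bit wchar_t array would hold them."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def code_points(text):
    """Return the Unicode code points of text."""
    return [ord(ch) for ch in text]


def order(a, b, key):
    """Three-way comparison in the style of compare/3: '<', '=' or '>'."""
    ka, kb = key(a), key(b)
    return "<" if ka < kb else ">" if ka > kb else "="


clef = "\U0001D11E"      # supplementary plane; UTF-16 stores it as D834 DD1E
replacement = "\uFFFD"   # a BMP character above the surrogate range

print(order(clef, replacement, code_points))   # '>'
print(order(clef, replacement, utf16_units))   # '<'
```

The lead surrogate 0xD834 is numerically smaller than 0xFFFD, so the
unit-wise comparison gives the opposite answer from the code point
comparison.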

Surrogate code points are now rejected everywhere

Well-formed Unicode text never contains an isolated surrogate code
point (U+D800..U+DFFF). Until this release, several Prolog API surfaces
still accepted them, leaking invalid Unicode into atoms, strings and
streams.

  • FIXED: UTF-8 and UTF-16 stream decoders (_PL__utf8_code_point,
    Sgetcode’s inline UTF-8 branch, and get_utf16) now substitute
    U+FFFD with the usual SIO_WARN diagnostic on a surrogate
    sequence, matching the convention for other decode errors.
  • FIXED: atom_codes/2, put_code/{1,2}, put_char/{1,2},
    put/{1,2} and format/{2,3} ~c now reject surrogates with
    type_error(character_code, Code) (or format_argument_type for
    ~c) before reaching the stream layer.
  • FIXED: Sputcode() itself now calls reperror() on a
    surrogate, so foreign callers cannot emit invalid UTF-8 or
    raw-wchar_t bytes via Sputcode(0xD800, s).
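
The two policies described above (substitute U+FFFD when decoding,
refuse outright when writing) can be sketched in a few lines. This is a
Python illustration with hypothetical helper names, not the kernel
code; ValueError merely stands in for the Prolog type_error mentioned
in the list.

```python
REPLACEMENT_CHARACTER = 0xFFFD


def is_surrogate(code):
    """True for code points in the surrogate block U+D800..U+DFFF."""
    return 0xD800 <= code <= 0xDFFF


def check_output_code(code):
    """Output side: refuse a surrogate before it reaches the encoder."""
    if is_surrogate(code):
        raise ValueError(f"surrogate code point U+{code:04X} is not a character code")
    return code


def sanitize_decoded(codes):
    """Input side: replace each decoded surrogate with U+FFFD."""
    return [REPLACEMENT_CHARACTER if is_surrogate(c) else c for c in codes]


print([hex(c) for c in sanitize_decoded([0x41, 0xD800, 0x1D11E])])
# ['0x41', '0xfffd', '0x1d11e']
```

Supplementary-plane code points such as 0x1D11E pass through
unchanged; only values inside the surrogate block are affected.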

New: identifier security and Unicode names

  • ADDED: library(unicode_security) (in packages/utf8proc)
    implements UTS #39 and UAX #24 over generated tables:
    unicode_script/2, unicode_script_extensions/2,
    unicode_identifier_status/2, unicode_identifier_type/2,
    unicode_skeleton/2, unicode_confusable/{2,3},
    unicode_resolved_scripts/2 and unicode_restriction_level/2.
    UCD source files are no longer vendored; the regen-uts39 target
    re-runs etc/gen_uts39.pl from a locally fetched UCD copy.
  • CHANGE: unicode_script/2, unicode_identifier_type/2,
    unicode_script_extensions/2 and unicode_identifier_status/2
    are now plain semidet: they fail on code points with no entry in
    the table rather than absorbing the missing case with a default
    (common, [], restricted). Callers that want a default can
    add their own fall-through clause.
  • ADDED: list_confusable_identifiers/{0,1} in library(check),
    autoloaded when library(unicode_security) is available and
    registered as a check:checker/2. Walks every clause in the
    selected modules (default [user]) and warns on
    • mixed-script identifiers (worse than single_script), and
    • confusable identifier collisions (distinct identifiers with
      the same UTS #39 skeleton).
  • ADDED: library(uniname) exporting unicode_name/2. The
    backing C plugin uses a compact 360 KB table (ICU / GNU libunistring
    layout) and supports forward (+,-) and reverse (-,+) lookup as
    semidet, plus full enumeration (-,-) via a stateful foreign
    iterator (>100k solutions in roughly 50 ms).
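
To make "distinct identifiers with the same UTS #39 skeleton" concrete,
here is a deliberately simplified Python sketch. It is not how
library(unicode_security) is implemented: the PROTOTYPES table is a
hypothetical four-entry stand-in for the real confusables data, and the
function names are invented for the example.

```python
import unicodedata
from collections import defaultdict

# Hypothetical excerpt: a handful of look-alike mappings chosen for the
# example. The real UTS #39 data (confusables.txt) has thousands of entries.
PROTOTYPES = {
    "\u0430": "a",   # CYRILLIC SMALL LETTER A
    "\u0435": "e",   # CYRILLIC SMALL LETTER IE
    "\u043E": "o",   # CYRILLIC SMALL LETTER O
    "\u03BF": "o",   # GREEK SMALL LETTER OMICRON
}


def skeleton(identifier):
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", identifier)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable_collisions(identifiers):
    """Group distinct identifiers by skeleton; keep groups with more than one member."""
    groups = defaultdict(set)
    for name in identifiers:
        groups[skeleton(name)].add(name)
    return {skel: sorted(names) for skel, names in groups.items() if len(names) > 1}


# The second name uses CYRILLIC SMALL LETTER A in place of both Latin a's.
names = ["paypal", "p\u0430yp\u0430l", "total"]
print(confusable_collisions(names))
```

Both spellings reduce to the skeleton "paypal", so they are reported
as one collision group, while "total" has no look-alike partner and is
left out.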

macOS packaging

  • ADDED: BUILD_MACOS_FRAMEWORK installs libswipl as
    swipl.framework (Versions/A/swipl), so third-party apps can
    link -framework swipl. A new findHomeFromFramework() uses
    dladdr() on the framework binary to locate the Prolog home
    without environment variables or a swipl.home file.
  • ADDED: BUILD_MACOS_BUNDLE arranges the split layout the
    installer ships: swipl-win.app in /Applications,
    swipl.framework in /Library/Frameworks, with install rpaths
    that resolve the framework both relative to the app and at the
    absolute system path.
  • ADDED: ninja pkg produces a single Developer-ID-signed and
    Apple-notarised .pkg (pkgbuild + productbuild flow with
    welcome / license / conclusion pages and a left-pane logo).
    MacPorts/Homebrew dylibs are bundled into the framework with
    rewritten @rpath/<basename> references and a hardened-runtime
    signature. Universal builds merge the arm64 and x86_64 trees,
    name the pkg fat, and drive signing, notarisation and stapling
    in a single run.

xpce

  • ADDED: library(pce_symbol_picker) — a non-modal singleton
    (and modal pick_symbol/1) to browse Unicode blocks and curated
    code ranges, type the picked symbol into the focused window, and
    remember recents. Supports user-defined code_range/3 lists,
    matching pairs, and a filter that searches either block names or
    Unicode character names.
  • ADDED: PceEmacs / Epilog insert_symbol command bound to
    C-x 8 RET and C-x 8 s. Opens the symbol picker targeting the
    invoking editor or terminal.
  • ADDED: PceEmacs normalize_region / normalize_buffer
    M-x commands to apply a Unicode normalisation form (nfc, nfd,
    nfkc, nfkd) to the region or whole buffer (conditional on
    library(unicode)).
  • ADDED: window ->pdf prints the full bounding box of a window
    rather than only <-area; display_manager <->focus_message
    fires a message on keyboard-focus changes; text_item gains an
    optional clear_image clickable icon.
  • ENHANCED: text <-pointed gains a round argument selecting
    caret-style rounding (snap to nearest gap) versus exact
    hit-testing, and now uses the Pango layout for hit-testing rather
    than summing per-character widths — clicks land on the right
    glyph for proportional fonts and font-fallback runs (emoji,
    Greek, math).
  • ENHANCED: SDL backend Windows font fallback chains add Thai /
    Lao (Leelawadee UI) and Yi syllables (Microsoft Yi Baiti); new
    display methods query and override SDL’s on-screen-keyboard policy.
  • FIXED: ws_discard_input() no longer reads a stale fd
    after a socket-based dispatch hook. Replaced the persistent
    dispatch_fd cache with a small console registry populated at
    init time (stdin) and by Epilog pty creation; ws_dispatch()
    watches an fd only for the duration of one call, and discard
    uses tcflush() / FlushConsoleInputBuffer() rather than
    read().
  • FIXED: Timer and Frame callbacks now hold a code reference
    around SDL_PushEvent, eliminating a use-after-free when the
    object is destroyed before the queued event is drained.
  • FIXED: graphical ->pdf negates the page offset
    (previously a graphical at a non-zero position produced a blank
    page); xpce printf %c handles non-ASCII code points via Put()
    rather than snprintf’s byte-collapsing path.
  • FIXED: man class hierarchy icons; thread monitor icons;
    PceEmacs class menu (raised a type error).
  • MODIFIED: auto_copy class variable is now also defined on
    terminal_image.
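
For readers unsure which of the four normalisation forms offered by
normalize_region / normalize_buffer to pick, the following Python
sketch shows what each form does to two short samples. It uses Python's
standard unicodedata module purely as an illustration; normalize_text
is a hypothetical helper, and nothing here reflects how PceEmacs
implements the commands.

```python
import unicodedata


def normalize_text(form, text):
    """Apply a normalisation form given by its lower-case name (nfc, nfd, nfkc, nfkd)."""
    return unicodedata.normalize(form.upper(), text)


samples = {
    "e + combining acute": "e\u0301",
    "fi ligature": "\uFB01",
}

for label, text in samples.items():
    for form in ("nfc", "nfd", "nfkc", "nfkd"):
        codes = " ".join(f"U+{ord(ch):04X}" for ch in normalize_text(form, text))
        print(f"{label:20} {form:5} {codes}")
```

The canonical forms (nfc, nfd) only compose or decompose, so the
ligature U+FB01 survives them; the compatibility forms (nfkc, nfkd)
additionally replace it with the two letters "fi".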

libedit

  • ADDED: Ctrl+Left / Ctrl+Right (xterm ESC[1;5D / ESC[1;5C)
    are bound to word motion (ed-prev-word / em-next-word, and the
    matching vi-mode bindings). The xpce terminal emits the same
    sequences for the Ctrl modifier on the cursor keys.

C API and C++ binding

  • MODIFIED: PL_predicate() now takes a UTF-8 string.
    Identifier names are program objects rather than externally
    encoded data.
  • C++ binding (packages/cpp):
    • PlModule, PlPredicate, PlFunctor and the
      PlCompound(functor, args) convenience constructors now
      interpret their functor-name argument as UTF-8 (consistent
      with PL_predicate()) and no longer take a PlEncoding
      argument. Text-parsing PlCompound(text[,enc]) constructors
      still take PlEncoding.
    • PlTerm::unify_atom(const std::string&) gains a
      PlEncoding parameter for symmetry with the const char*
      overload.
    • pl2cpp.plx documents the PlEncoding enum, the
      ENC_INPUT / ENC_OUTPUT defaults and the trailing-encoding
      constructor / method forms.

Other

  • FIXED: Reset cached JIT index decisions when a predicate’s
    supervisor changes. A “not indexable” verdict made on a transient
    clause shape (e.g. while an autoload triggered through
    trapUndefined() re-entered Prolog) could otherwise stick and
    disable JIT indexing for the eventual stable clause set.
  • FIXED: engine_destroy/1 no longer self-deadlocks; destroying
    a running engine from another thread no longer crashes.
  • FIXED: Confusable detection now handles zero-arity compounds.
  • MODIFIED: The code walker does not track below call/1. This
    avoids false positives for undefined-predicate detection when
    using call(Goal).
  • FIXED (packages/clib): bsd-crypt.c no longer
    unconditionally #includes crypt.h, so the fallback compiles
    on systems that ship neither libc crypt() (glibc ≥ 2.39) nor
    libxcrypt (crypt.h is now guarded by HAVE_CRYPT_H).
  • FIXED (packages/http): the proxy test suite reserves its
    unused port by holding a bound, unlistened socket, so concurrent
    ctest jobs cannot grab the port and turn an expected
    ECONNREFUSED into a successful connect.
  • FIXED (packages/ltx2htm): \verb/\verbatim bodies round-trip
    as UTF-8 (issue #8), so literal non-ASCII content no longer renders
    double-encoded in HTML output.
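
The port-reservation trick used by the proxy test suite can be shown
in isolation. The Python sketch below is an illustration only (the test
suite itself is Prolog, and the helper names are hypothetical): a
socket that is bound but never put into listening state keeps the port
away from other processes, while connection attempts are still
refused. The exact error codes are those commonly seen on Linux and
macOS and are an assumption for other platforms.

```python
import errno
import socket


def reserve_port(host="127.0.0.1"):
    """Bind a TCP socket to an OS-chosen free port without calling listen()."""
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind((host, 0))
    return holder, holder.getsockname()[1]


def try_bind(host, port):
    """Return None if another socket can bind the port, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as other:
        try:
            other.bind((host, port))
            return None
        except OSError as exc:
            return exc.errno


def try_connect(host, port):
    """Return None on a successful connect, else the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.settimeout(2.0)
        try:
            client.connect((host, port))
            return None
        except OSError as exc:
            return exc.errno


holder, port = reserve_port()
print("second bind blocked:", try_bind("127.0.0.1", port) == errno.EADDRINUSE)
print("connect refused:", try_connect("127.0.0.1", port) == errno.ECONNREFUSED)
holder.close()
```

Keeping the holder socket open for the duration of the test is what
closes the race: simply picking a free port and releasing it again
would let a concurrent job bind and listen on it.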

Documentation pipeline

  • The PDF manual builds without the utf8proc package: lualatex
    handles the Unicode content directly.

    Enjoy — Jan

Thanks for that, Jan. OK if I close out my pull request, as it’s superseded by this?