Dear SWI-Prolog user,
SWI-Prolog 10.1.8 is ready for download. This is a stabilising
release on top of the large Unicode overhaul shipped in 10.1.7. The
main themes are tightening up the Unicode source syntax (bracket-pair
atoms, solo characters, write/quote behaviour), rejecting surrogate
code points at every Prolog API surface, a new UTS #39 confusable
identifier linter, a Unicode character-name lookup library, and a
macOS packaging overhaul that ships a swipl.framework and a signed,
notarised split-layout .pkg installer.
Highlights
- Surrogate code points are now consistently rejected by every
Prolog API and stream decoder.
- New library(unicode_security) (UTS #39 / UAX #24) and
list_confusable_identifiers/{0,1} in library(check) for
spotting mixed-script and look-alike identifiers.
- New library(uniname) with unicode_name/2 for Unicode
character-name lookup, generation and enumeration.
- macOS runtime is now a standalone swipl.framework; ninja pkg
produces a signed and notarised .pkg installer with the
split /Applications + /Library/Frameworks layout.
Unicode source syntax — follow-ups
- ENHANCED: Unicode bracket pairs (Ps/Pe) now behave
consistently with {}. An empty pair (optionally with layout in
between) reads as the atom '<open><close>', the same name acts
as a functor when followed by (, and '⟨⟩'(X) writes as ⟨X⟩.
The bare bracket-pair atom is also written unquoted.
- ADDED: pattern_syntax and prolog_solo categories in
char_type/2 and code_type/2. pattern_syntax exposes UAX #31
Pattern_Syntax membership; prolog_solo exposes the kernel’s own
“stands as a token on its own” flag.
- ADDED: write_term/2 option pattern_syntax_solo. When set,
single-character atoms outside the immutable UAX #31 Pattern_Syntax
set are quoted, so atoms like '€', '·', '🎉' round-trip
safely across future Unicode versions. write_canonical/1 enables
the option by default; write/1 and writeq/1 are unchanged.
- FIXED: solo characters are no longer needlessly quoted by
writeq/1 ($needs_quotes/1 re-uses the kernel’s own quote
decision; the underlying unquoted_atom helper was mis-routing
UCS atoms through the byte-oriented branch).
- FIXED: compare/3 on wide atoms now orders by code point
rather than by wchar_t unit. On Windows (16-bit wchar_t) a
supplementary-plane atom like '\U0001D11E' is stored as a
surrogate pair and previously wrongly ordered below BMP atoms in
U+DC00..U+FFFF.
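
The difference between ordering by 16-bit unit and ordering by code point can be sketched in plain C. This is an illustration only, not the kernel's implementation; the function names are hypothetical and the input is assumed to be well-formed UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s starting at *i and advance *i.
   A valid surrogate pair is combined; anything else is returned as-is. */
static uint32_t next_code_point(const uint16_t *s, size_t len, size_t *i)
{
    uint32_t c = s[(*i)++];

    if (c >= 0xD800 && c <= 0xDBFF && *i < len) {
        uint32_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;
            return 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return c;
}

/* Naive ordering: compares raw 16-bit units. */
int compare_by_unit(const uint16_t *a, size_t na,
                    const uint16_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;

    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return (na > nb) - (na < nb);
}

/* Ordering by decoded code point. */
int compare_by_code_point(const uint16_t *a, size_t na,
                          const uint16_t *b, size_t nb)
{
    size_t i = 0, j = 0;

    assert(a != NULL && b != NULL);
    while (i < na && j < nb) {
        uint32_t ca = next_code_point(a, na, &i);
        uint32_t cb = next_code_point(b, nb, &j);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (i < na) return 1;   /* a has code points left, so a is longer */
    if (j < nb) return -1;  /* b has code points left, so b is longer */
    return 0;
}
```

With U+1D11E stored as the pair D834 DD1E and U+FF21 stored as the single unit FF21, compare_by_unit() puts the supplementary-plane character first, while compare_by_code_point() puts it last.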
Surrogate code points are now rejected everywhere
A well-formed Unicode text never contains an isolated surrogate code
point. Several Prolog API surfaces still accepted them, leaking
invalid Unicode into atoms, strings and streams.
- FIXED: UTF-8 and UTF-16 stream decoders (_PL__utf8_code_point,
Sgetcode’s inline UTF-8 branch, and get_utf16) now substitute
U+FFFD with the usual SIO_WARN diagnostic on a surrogate
sequence, matching the convention for other decode errors.
- FIXED: atom_codes/2, put_code/{1,2}, put_char/{1,2},
put/{1,2} and format/{2,3} ~c now reject surrogates with
type_error(character_code, Code) (or format_argument_type for
~c) before reaching the stream layer.
- FIXED: Sputcode() itself now calls reperror() on a
surrogate, so foreign callers cannot emit invalid UTF-8 or
raw-wchar_t bytes via Sputcode(0xD800, s).
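
To make the decoder-side rule concrete, here is a minimal stand-alone sketch of a UTF-8 decoder that yields U+FFFD for an encoded surrogate. It is not the SWI-Prolog code: is_surrogate() and decode_utf8() are hypothetical names, and the number of bytes consumed on an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* True for U+D800..U+DFFF, which are never valid scalar values. */
int is_surrogate(uint32_t code)
{
    return code >= 0xD800 && code <= 0xDFFF;
}

/* Decode one code point from s[0..len). Stores it in *out and returns the
   number of bytes consumed (0 only when len is 0). Malformed, overlong,
   out-of-range and surrogate sequences all yield U+FFFD. */
size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *out)
{
    uint32_t c, min;
    size_t need, i;

    assert(s != NULL && out != NULL);
    *out = REPLACEMENT_CHAR;
    if (len == 0)
        return 0;

    c = s[0];
    if (c < 0x80) {
        *out = c;
        return 1;
    } else if ((c & 0xE0) == 0xC0) {
        need = 1; min = 0x80;    c &= 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
        need = 2; min = 0x800;   c &= 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
        need = 3; min = 0x10000; c &= 0x07;
    } else {
        return 1;                /* stray continuation or invalid lead byte */
    }

    for (i = 1; i <= need; i++) {
        if (i >= len || (s[i] & 0xC0) != 0x80)
            return i;            /* truncated or bad continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c >= min && c <= 0x10FFFF && !is_surrogate(c))
        *out = c;                /* otherwise *out stays U+FFFD */
    return need + 1;
}
```

The byte sequence ED A0 80 would encode U+D800, so the sketch consumes all three bytes and reports a single U+FFFD. On the output side, the same is_surrogate() test is all that is needed to refuse a code before it is encoded.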
New: identifier security and Unicode names
- ADDED: library(unicode_security) (in packages/utf8proc)
implements UTS #39 and UAX #24 over generated tables:
unicode_script/2, unicode_script_extensions/2,
unicode_identifier_status/2, unicode_identifier_type/2,
unicode_skeleton/2, unicode_confusable/{2,3},
unicode_resolved_scripts/2 and unicode_restriction_level/2.
UCD source files are no longer vendored; the regen-uts39 target
re-runs etc/gen_uts39.pl from a locally fetched UCD copy.
- CHANGE: unicode_script/2, unicode_identifier_type/2,
unicode_script_extensions/2 and unicode_identifier_status/2
are now plain semidet: they fail on code points with no entry in
the table rather than absorbing the missing case with a default
(common, [], restricted). Callers that want a default can
add their own fall-through clause.
- ADDED: list_confusable_identifiers/{0,1} in library(check),
autoloaded when library(unicode_security) is available and
registered as a check:checker/2. Walks every clause in the
selected modules (default [user]) and warns on
  - mixed-script identifiers (worse than single_script), and
  - confusable identifier collisions (distinct identifiers with
    the same UTS #39 skeleton).
- ADDED: library(uniname) exporting unicode_name/2. The
backing C plugin uses a compact 360 KB table (ICU / GNU libunistring
layout) and supports forward (+,-) and reverse (-,+) lookup as
semidet, plus full enumeration (-,-) via a stateful foreign
iterator (>100k solutions in roughly 50 ms).
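
The idea behind a confusable collision (distinct identifiers with the same skeleton) can be sketched with a toy table. This is not how library(unicode_security) is implemented: the five-entry table below is a tiny hand-picked subset of the UTS #39 confusables data, the real skeleton also applies NFD before and after mapping and allows one-to-many mappings, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy prototype table: a few Cyrillic letters and the Latin letters
   they are confusable with. The real data set has thousands of entries. */
static const struct { uint32_t from, to; } prototypes[] = {
    {0x0430, 'a'}, {0x0435, 'e'}, {0x043E, 'o'}, {0x0440, 'p'}, {0x0441, 'c'},
};

/* Map a code point to its prototype, or to itself if it has none. */
static uint32_t prototype(uint32_t c)
{
    for (size_t i = 0; i < sizeof prototypes / sizeof prototypes[0]; i++)
        if (prototypes[i].from == c)
            return prototypes[i].to;
    return c;
}

/* Return 1 if a and b are different identifiers with the same toy
   skeleton, 0 otherwise. Identifiers are arrays of code points. */
int confusable(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
    int differ = 0;

    assert(a != NULL && b != NULL);
    if (na != nb)
        return 0;   /* toy mappings are 1:1, so lengths must match */
    for (size_t i = 0; i < na; i++) {
        if (prototype(a[i]) != prototype(b[i]))
            return 0;
        if (a[i] != b[i])
            differ = 1;
    }
    return differ;
}
```

Under these assumptions, the Latin identifier paypal and a variant spelled with Cyrillic U+0440 and U+0430 in place of p and a are reported as confusable, while an identifier compared with itself is not.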
macOS packaging
- ADDED: BUILD_MACOS_FRAMEWORK installs libswipl as
swipl.framework (Versions/A/swipl), so third-party apps can
link with -framework swipl. A new findHomeFromFramework() uses
dladdr() on the framework binary to locate the Prolog home
without environment variables or a swipl.home file.
- ADDED: BUILD_MACOS_BUNDLE arranges the split layout the
installer ships: swipl-win.app in /Applications,
swipl.framework in /Library/Frameworks, with install rpaths
that resolve the framework both relative to the app and at the
absolute system path.
- ADDED: ninja pkg produces a single Developer-ID-signed and
Apple-notarised .pkg (pkgbuild + productbuild flow with
welcome / license / conclusion pages and a left-pane logo).
MacPorts/Homebrew dylibs are bundled into the framework with
rewritten @rpath/<basename> references and a hardened-runtime
signature. Universal builds merge the arm64 and x86_64 trees,
name the pkg fat, and drive signing, notarisation and stapling
in a single run.
xpce
- ADDED: library(pce_symbol_picker): a non-modal singleton
(and modal pick_symbol/1) to browse Unicode blocks and curated
code ranges, type the picked symbol into the focused window, and
remember recents. Supports user-defined code_range/3 lists,
matching pairs, and a filter that searches either block names or
Unicode character names.
- ADDED: PceEmacs / Epilog insert_symbol command bound to
C-x 8 RET and C-x 8 s. Opens the symbol picker targeting the
invoking editor or terminal.
- ADDED: PceEmacs normalize_region / normalize_buffer
M-x commands to apply a Unicode normalisation form (nfc, nfd,
nfkc, nfkd) to the region or whole buffer (conditional on
library(unicode)).
- ADDED: window ->pdf prints the full bounding box of a window
rather than only <-area; display_manager <->focus_message
fires a message on keyboard-focus changes; text_item gains an
optional clear_image clickable icon.
- ENHANCED: text <-pointed gains a round argument selecting
caret-style rounding (snap to nearest gap) versus exact
hit-testing, and now uses the Pango layout for hit-testing rather
than summing per-character widths, so clicks land on the right
glyph for proportional fonts and font-fallback runs (emoji,
Greek, math).
- ENHANCED: SDL backend Windows font fallback chains add Thai /
Lao (Leelawadee UI) and Yi syllables (Microsoft Yi Baiti); display
methods to query and override SDL’s on-screen-keyboard policy.
- FIXED: ws_discard_input() no longer reads a stale fd
after a socket-based dispatch hook. Replaced the persistent
dispatch_fd cache with a small console registry populated at
init time (stdin) and by Epilog pty creation; ws_dispatch()
watches an fd only for the duration of one call, and discard
uses tcflush() / FlushConsoleInputBuffer() rather than
read().
- FIXED: Timer and Frame callbacks now hold a code reference
around SDL_PushEvent, eliminating a use-after-free when the
object is destroyed before the queued event is drained.
- FIXED: graphical ->pdf negates the page offset
(previously a graphical at a non-zero position produced a blank
page); xpce printf %c handles non-ASCII code points via Put()
rather than snprintf’s byte-collapsing path.
- FIXED: man class hierarchy icons; thread monitor icons;
PceEmacs class menu (raised a type error).
- MODIFIED: auto_copy class variable is now also defined on
terminal_image.
libedit
- ADDED: Ctrl+Left / Ctrl+Right (xterm ESC[1;5D / ESC[1;5C)
are bound to word motion (ed-prev-word/em-next-word, and the
matching vi-mode bindings). The xpce terminal emits the same
sequences for the Ctrl modifier on the cursor keys.
C API and C++ binding
- MODIFIED: PL_predicate() now takes a UTF-8 string.
Identifier names are program objects rather than externally
encoded data.
- C++ binding (packages/cpp): PlModule, PlPredicate, PlFunctor and the
PlCompound(functor, args) convenience constructors now
interpret their functor-name argument as UTF-8 (consistent
with PL_predicate()) and no longer take a PlEncoding
argument. Text-parsing PlCompound(text[,enc]) constructors
still take PlEncoding. PlTerm::unify_atom(const std::string&) gains a
PlEncoding parameter for symmetry with the const char*
overload. pl2cpp.plx documents the PlEncoding enum, the
ENC_INPUT / ENC_OUTPUT defaults and the trailing-encoding
constructor / method forms.
Other
- FIXED: Reset cached JIT index decisions when a predicate’s
supervisor changes. A “not indexable” verdict made on a transient
clause shape (e.g. while an autoload triggered through
trapUndefined() re-entered Prolog) could otherwise stick and
disable JIT indexing for the eventual stable clause set.
- FIXED: engine_destroy/1 no longer self-deadlocks; destroying
a running engine from another thread no longer crashes.
- FIXED: Confusable detection now handles zero-arity compounds.
- MODIFIED: The code walker does not track below call/1. This
avoids false positives for undefined-predicate detection when
using call(Goal).
- FIXED (packages/clib): bsd-crypt.c no longer
unconditionally #includes crypt.h, so the fallback compiles
on systems that ship neither libc crypt() (glibc ≥ 2.39) nor
libxcrypt (crypt.h is now guarded by HAVE_CRYPT_H).
- FIXED (packages/http): the proxy test suite reserves its
unused port by holding a bound, unlistened socket, so concurrent
ctest jobs cannot grab the port and turn an expected
ECONNREFUSED into a successful connect.
- FIXED (packages/ltx2htm): \verb / \verbatim bodies round-trip
as UTF-8 (issue #8), so literal non-ASCII content no longer renders
double-encoded in HTML output.
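
The port-reservation trick used by the proxy test-suite fix can be illustrated outside Prolog. The sketch below assumes POSIX sockets and IPv4 loopback; reserve_unused_port() and try_connect() are hypothetical names, not part of packages/http.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Bind a TCP socket to a kernel-chosen loopback port without calling
   listen(). Returns the socket, which must stay open to hold the port,
   or -1 on failure. The chosen port is stored in *port. */
int reserve_unused_port(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd;

    assert(port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Try to connect to 127.0.0.1:port. Returns 0 on success, otherwise
   the errno value of the failed call. */
int try_connect(unsigned short port)
{
    struct sockaddr_in addr;
    int rc = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return errno;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        rc = errno;                     /* capture before close() can change it */
    close(fd);
    return rc;
}
```

While the socket returned by reserve_unused_port() stays open, no other process can bind the port, and because nothing listens on it a connection attempt is expected to be refused rather than accepted.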
Documentation pipeline
- The PDF manual builds without the utf8proc package: lualatex
handles the Unicode content directly.
Enjoy,
Jan