Write to socket -- does it strip quotes?

grossdan · February 22, 2020, 8:40pm

Hello,

I am writing to a socket like so: write(OStream, “[\“param\”]”)

On the receiving end, at the server, it looks like I am receiving [param] with the quotes stripped.

I am wondering if write/2 to a socket strips the quotes, or whether this happens at the server. I don’t have access to the server internals.

btw, a write/1, shows the quotes.

any thoughts are much appreciated,

Dan
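
For reference, a minimal sketch of the difference between write and writeq, assuming SWI-Prolog 7 or later with default flags. The helper show/1 is a hypothetical name used only for illustration; with_output_to/2 stands in for the socket so the output can be inspected locally.

```prolog
% Assumes SWI-Prolog 7+ with the default flags (double_quotes = string).
% show/1 is a hypothetical helper; with_output_to/2 captures the text that
% write/1 and writeq/1 would otherwise send to the current output stream.
show(Term) :-
    with_output_to(string(Plain),  write(Term)),
    with_output_to(string(Quoted), writeq(Term)),
    format("write:  ~w~nwriteq: ~w~n", [Plain, Quoted]).

% ?- show(["param"]).
% write:  [param]
% writeq: ["param"]
```

If the term really is a list containing a string, write/2 drops the quotes before anything reaches the server. On a stream the quoted form would be writeq(OStream, Term) or format(OStream, "~q", [Term]), followed by flush_output(OStream).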

grossdan · February 22, 2020, 9:34pm

thank you,

interestingly, with this:

t :-
writeq(“param”).

one gets:

[112,97,114,97,109]
true.

is there a way to get a string … instead of list of chars

pmoura · February 22, 2020, 9:38pm

Check the double_quotes flag documentation.
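
As a rough illustration of what the flag does, here is a small file sketch, assuming SWI-Prolog 7 or later. The predicate names as_codes/1, as_chars/1, as_atom/1 and as_string/1 are made up for the example. The same source text "param" is read as a different term depending on the flag value in effect when the clause is read.

```prolog
% Each directive changes how the reader treats "..." in the clauses after it.
:- set_prolog_flag(double_quotes, codes).
as_codes("param").      % stored as the list [112,97,114,97,109]

:- set_prolog_flag(double_quotes, chars).
as_chars("param").      % stored as the list [p,a,r,a,m]

:- set_prolog_flag(double_quotes, atom).
as_atom("param").       % stored as the atom param

:- set_prolog_flag(double_quotes, string).
as_string("param").     % stored as a string object (the SWI-Prolog 7 default)

% ?- as_codes(X), is_list(X), as_string(Y), string(Y).
% succeeds: X is a code list, Y is a string.
```

Getting a code list back from writeq("param") therefore suggests that double_quotes is set to codes wherever that clause was loaded.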

Boris · February 22, 2020, 10:08pm

You should probably figure out what data type you are working with (this is what Paulo is hinting at!). A string? A code list? Something else? Try:

?- string("param").

and

?- is_list("param").

It would be also nice if you formatted your code as code here in your posts. On top of the other confusion, I am seeing “smart quotes”, the thing that you get when you type

``something''

in LaTeX. While you certainly mean this:

EricGT · February 22, 2020, 10:15pm

Normally I would agree with that.

In this case though I find the only time I use set_prolog_flag(double_quotes,X). is when writing DCGs, and I learned the hard way that the best way to avoid the problems of the double quotes biting you is to put the DCG code in a module with the set_prolog_flag/2 in the same module so that the scope of the double quotes is limited to just the module with the DCGs.

That smells of a lint rule.

pmoura · February 22, 2020, 11:11pm

The double_quotes and the unknown flags are two of the original Prolog sins.

Boris · February 23, 2020, 6:11am

So what do you recommend to do with the double_quotes flag? I prefer to leave my global settings to their defaults. Should I be doing something else?

@EricGT how exactly do you use this flag in a module with DCGs? What problems do you run into?

I admit that code lists and char lists and strings confuse me. I know I must be missing the big picture…

(?) Code lists are not distinguishable from a list of integers, until you run into an element that could not be a character code.

(?) They are real lists (compound terms and so on) but I don’t understand what it is that you do with them, from the client code, that makes it necessary to know that it is a list. A DCG is a really great interface to a “string of characters” already.

(?) What is the deal with lists of chars? Again, any non-char element in a list “breaks” it.

(?) In what situations should my code purposefully use a list of chars?

jan · February 23, 2020, 8:56am

There has been an endless fight on this I’d rather not repeat. Some of the results you find in the docs. I’d say:

Leave it at the default string if it doesn’t pose problems. This mainly implies that if you really want a list of character codes you can write this as `hello`. Strings work fine as non-terminals in DCGs, so even there the default is typically fine.
If you have some legacy code assuming “…” is a list of character codes/chars everywhere, put :- set_prolog_flag(double_quotes, codes). just above the code. The flag is scoped to the end of the file. This is also good advice for code that needs to be portable and is not portable without this flag.
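
A minimal sketch of the first point, assuming SWI-Prolog 7 or later with default flags. The grammar greeting//0 and name//0 are hypothetical. Double-quoted literals are used inside the DCG bodies, and the input is written with back quotes so that it is a code list.

```prolog
% Default flags assumed: "..." is a string, `...` is a list of character codes.
% String literals in a DCG body are matched against the code list being parsed.
greeting --> "hello", " ", name.
name --> "world".

% ?- phrase(greeting, `hello world`).
% true.
%
% ?- X = `hi`, X == [104, 105].
% succeeds: back-quoted text is a plain list of code points.
```

Note that phrase/2 still expects a list as input, so the text to parse is passed as `hello world` here and not as a string.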

grossdan · February 23, 2020, 9:02am

My apologies for my ignorance, but what actually are codes vs. strings vs. atoms vs. lists of chars as (ASCII) numbers?

I couldn’t find anything in the manual beside explanations that mention codes but not what codes are.

Dan

jan · February 23, 2020, 9:37am

A code is short for a code point and is an integer representation for a character. In ISO the mapping is undefined. In SWI-Prolog all code points are defined by Unicode. A char represents a character as an atom of length 1. I guess you know what an atom is. String can mean a lot. It is a built-in type in SWI-Prolog for a thing that represents a sequence of characters.
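
To make the four representations concrete, here is a small sketch, assuming SWI-Prolog 7 or later. The predicate text_views/1 is a hypothetical helper; the conversion predicates it calls are standard built-ins.

```prolog
% text_views/1 prints the same text as a code list, a char list and a string.
text_views(Atom) :-
    atom_codes(Atom, Codes),        % list of integer code points
    atom_chars(Atom, Chars),        % list of one-character atoms
    atom_string(Atom, String),      % SWI-Prolog string object
    forall(member(Label-Value, [codes-Codes, chars-Chars, string-String]),
           format("~w: ~q~n", [Label, Value])).

% ?- text_views(param).
% codes: [112,97,114,97,109]
% chars: [p,a,r,a,m]
% string: "param"
%
% Going the other way, from a code list to a string:
% ?- string_codes(S, [112,97,114,97,109]), S == "param".
% true.
```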

Boris · February 23, 2020, 10:08am

Didn’t mean to reignite anything, sorry. I am missing context but this is unavoidable. I re-read the linked section of the docs. So basically, my take-away is that:

The defaults are fine
The double_quotes flag is useful for backwards compatibility or porting from/to other Prologs; keep it local to modules
I can stop thinking about lists of chars as a representation of text (right?)
I should try and avoid explicitly dealing with code lists, as long as I can use the library predicates. A notable exception is literal code lists that I need for unit tests (?)
For defining DCGs, I do not need code list literals.

EricGT · February 23, 2020, 10:08am

Thanks, I thought it was scoped to the end of the module.

jan · February 23, 2020, 10:19am

Sorry, you are right. Each module has a setting for this flag and a change is thus effective for that module only.

EricGT · February 23, 2020, 11:19am

To understand my reasoning behind what I do requires a bit of a history lesson.

When I started expanding my use of Prolog, going from a few short exercises of less than a hundred lines to code that would work on a real-world problem with multiple files, I turned to SWI-Prolog and to asking questions on StackOverflow. It turns out that using StackOverflow examples to learn DCGs for parsing with SWI-Prolog will give you lots of headaches, because SWI-Prolog uses double quotes differently.

The string type and its double quoted syntax

Many of the DCG parsing examples on StackOverflow, in blogs, etc. don’t mention this and thus parse differently. If you take that code as is and try it on SWI-Prolog, it sometimes works and sometimes fails.

So after many months I came to realize that to ensure that my DCGs were parsing as I expected I would add

:- set_prolog_flag(double_quotes, <something>).
:- set_prolog_flag(back_quotes, <something different>).

in all the code: code I created, code copied from StackOverflow or other places, and answers I would give on StackOverflow, even if it was not really changing the meaning.

Then as time went on I started using test cases with

:- begin_tests(abc).

:- end_tests(abc).

and that started a second set of problems because I did not know that

:- begin_tests(abc).

:- end_tests(abc).

are actually creating a new module and the flags are scoped to a module.

So now any time I write a DCG for use by others I follow these steps.

Create a module using module/2
Set Prolog flags for double_quotes and back_quotes at the top of the module.
Add the DCGs
Create unit test section with begin_tests/1 and end_tests/1
Set Prolog flags for double_quotes and back_quotes in the test section at the top of the section.
Add the tests.
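
The steps above can be sketched as a single file. This is only an illustration, assuming SWI-Prolog with library(plunit). The module name, the grammar greeting//0, the test names and the chosen flag values (codes for both flags) are assumptions made for the example, not the values used in the original code.

```prolog
:- module(greeting, [greeting//0]).
:- use_module(library(plunit)).

% Steps 1-2: flags set at the top of the module (values are illustrative).
:- set_prolog_flag(double_quotes, codes).
:- set_prolog_flag(back_quotes, codes).

% Step 3: the DCG.
greeting --> "hello", " ", "world".

% Steps 4-6: the test unit is a module of its own, so the flags are set again.
:- begin_tests(greeting).
:- set_prolog_flag(double_quotes, codes).
:- set_prolog_flag(back_quotes, codes).

test(accepts_greeting) :-
    phrase(greeting, "hello world").
test(rejects_other, [fail]) :-
    phrase(greeting, "goodbye").

:- end_tests(greeting).

% ?- run_tests.
```

Without the second pair of directives, "hello world" inside the test unit would be read as a string under the default flag, and phrase/2 would be handed a string instead of a code list.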

Example: How to use DCG in Prolog. See the variation with module/2

Another place where you have to be concerned about all of this, though I have yet to reason it all out, is when you are developing code at the top level and using assert/N to add the predicates. How does all of this work, since you are probably not using modules?

To avoid that problem I write all of my code the traditional way with files, an editor and then consult/make to load the files.

Related Q&A

What type is a single quoted string?
Prolog DCG set_prolog_flag double_quotes source code directive location matters; documentation?

With regards to the Prolog flags double_quotes and back_quotes, none now.

There are many other problems in the area related to parsing with DCGs, but I could write a few chapters on that.

peter.ludemann · February 24, 2020, 12:20am

And just to add more confusion, code point representations aren’t necessarily unique.
For example, try this query (depending on your terminal, you may see how InNfc and InNfd aren’t the same):

?- In = "İzmir",  % https://en.wikipedia.org/wiki/%C4%B0zmir
   string_codes(In, Codes),
   unicode_nfc( In, InNfc),  string_codes(InNfc,  CodesNfc),
   unicode_nfkc(In, InNfkc), string_codes(InNfkc, CodesNfkc),
   unicode_nfd( In, InNfd),  string_codes(InNfd,  CodesNfd),
   unicode_nfkd(In, InNfkd), string_codes(InNfkd, CodesNfkd).

In = "İzmir",
Codes = CodesNfc, CodesNfc = CodesNfkc, CodesNfkc = [304, 122, 109, 105, 114],
InNfc = InNfkc, InNfkc = 'İzmir',
InNfd = InNfkd, InNfkd = 'İzmir',
CodesNfd = CodesNfkd, CodesNfkd = [73, 775, 122, 109, 105, 114].

This comes from the following test case in Python, which illustrates even more Unicode madness by applying the lower operation (I was too lazy to do the equivalent in the Prolog example):

import unicodedata
base_s = 'İzmir'
for form, s in [('(unnormalized)', base_s)] + [
        (form, unicodedata.normalize(form, base_s))
        for form in ('NFC', 'NFKC', 'NFD', 'NFKD')]:
    print(form, [c.lower() for c in s])
    print(form, [c for c in s.lower()])

j4n_bur53 · February 24, 2020, 12:41pm

You can represent both side by side:

SWI-Prolog (threaded, 64 bits, version 8.1.22)

?- atom_codes(X, [304, 122, 109, 105, 114]), atom_codes(X, L).
X = 'İzmir',
L = [304, 122, 109, 105, 114].

?- atom_codes(X, [73, 775, 122, 109, 105, 114]), atom_codes(X, L).
X = 'İzmir',
L = [73, 775, 122, 109, 105, 114].

The ISO core standard doesn’t require that the glyph you see corresponds one-to-one to the character code or some character codes inside an atom. Some Prolog systems tried to build some normalization into their atoms, but gave up:

"An additional restriction is that the sequence of characters that makes up a quoted token must be in Normal Form C (NFC). This is currently (SICStus Prolog 4.0.3) not enforced. A future release may enforce this restriction or perform this normalization automatically. "

https://sicstus.sics.se/sicstus/docs/4.1.0/html/sicstus/ref_002dsyn_002dsyn_002dtok.html

I guess the motivation might be some web thingy. But I am not sure whether it's charmod, as SICStus Prolog says; it's rather the comparison operator and user data: Normalization FAQ.

Edit 24.02.2020:
I am afraid, will prevent on the beach for Jan W… Does SWI-Prolog have a comparison operator that does normalization on the fly? Would there be a demand for such a feature?

?- atom_codes(X, [73, 775, 122, 109, 105, 114]), 
    atom_codes(Y, [304, 122, 109, 105, 114]), compare(C, X, Y).
X = 'İzmir',
Y = 'İzmir',
C =  (<).

On the other hand, in my system I can do this since the latest release (it is still a little experimental).
I get this for free from Java collator classes.

?- atom_codes(X, [73, 775, 122, 109, 105, 114]), 
    atom_codes(Y, [304, 122, 109, 105, 114]), compare(C, X, Y).
X = 'İzmir',
Y = 'İzmir',
C = <

?- atom_codes(X, [73, 775, 122, 109, 105, 114]), 
   atom_codes(Y, [304, 122, 109, 105, 114]), compare(C, X, Y, [type(collator)]).
X = 'İzmir',
Y = 'İzmir',
C = =

jan · February 24, 2020, 1:42pm

SWI-Prolog is more or less at the same square as SICStus. Unicode data is simply considered a sequence of code points and no normalization is done. Normalization, case folding, diacritic removal and more are provided by means of library(unicode): Unicode string handling.

The whole Unicode data manipulation is extremely complex. The utf8proc library does only the most common stuff and is unfortunately poorly maintained (last time I checked). There is no standard OS support (that would be great) and last time I looked at it, comprehensive Unicode libraries were bigger than SWI-Prolog itself. That is IMO a bit too much for something that is not that useful for many users.

If you want to program using identifiers from a language where Unicode normalization matters, you will need an editor or other tools that can do the normalization for you, as without any normalization you may often wonder why terms do not unify, predicates are undefined, etc.
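
As a sketch of how library(unicode) could be used for a normalizing comparison, assuming the utf8proc-based package is installed: nfc_equal/2 is a hypothetical helper, not an existing predicate. It normalizes both sides to NFC with unicode_nfc/2 and compares the results.

```prolog
:- use_module(library(unicode)).

% nfc_equal/2 (hypothetical): true if A and B are identical after
% canonical composition (NFC). No normalization happens in compare/3 itself.
nfc_equal(A, B) :-
    unicode_nfc(A, NA),
    unicode_nfc(B, NB),
    NA == NB.

% ?- atom_codes(X, [73,775,122,109,105,114]),
%    atom_codes(Y, [304,122,109,105,114]),
%    X \== Y,
%    nfc_equal(X, Y).
% succeeds: the atoms differ as code point sequences but agree after NFC.
```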

EricGT · February 24, 2020, 1:52pm

Just want to make sure I get this correctly.

My understanding, outside of Prolog or any language or editor, is that Unicode data manipulation is easy when dealing with Unicode code points, e.g. U+10BF0 or any other code point. The problem comes when the code points are not used internally and an encoding such as UTF-8 or UTF-16 is used for the representation instead. One of the most common problems with using these encodings internally is that the width of a code point is not constant, so locating a code point requires starting at the beginning of the representation and progressing forward.

Please correct any of this if it is wrong, or expresses an invalid view.
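
The variable width can be checked with a short sketch, assuming SWI-Prolog's library(utf8) and its utf8_codes//1 non-terminal. The helper utf8_width/2 is a hypothetical name. It only illustrates the encoding-width point and says nothing about normalization.

```prolog
:- use_module(library(utf8)).

% utf8_width/2 (hypothetical): N is the number of bytes UTF-8 needs
% to encode the single code point Code.
utf8_width(Code, N) :-
    once(phrase(utf8_codes([Code]), Bytes)),
    length(Bytes, N).

% ?- maplist(utf8_width, [0'a, 0x130, 0x20AC, 0x10BF0], Widths),
%    Widths == [1, 2, 3, 4].
% true.
```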

pmoura · February 24, 2020, 2:18pm

The Logtalk distribution includes a unicode_data library that can be regarded as what we would get if the Unicode standard had chosen Prolog for representing code points and their properties:

https://logtalk.org/manuals/libraries/unicode_data.html

Size is indeed an issue. This library is close to 9MB.

jan · February 24, 2020, 2:55pm

Yeah. SWI-Prolog provides library(unicode/unicode_data) which can handle the official Unicode consortium data files. This is mainly intended to generate derived tables with the stuff some application needs. For example, it is used to produce the C tables for the character classification routines required by the Prolog parser. You have to download the tables separately.

Good news is that utf8proc seems to be picked up by Julia and is again actively supported!
