Phrase_from_file vs phrase

remy.s · September 12, 2020, 12:44pm

In my prolog program, I have the following declaration:

:- set_prolog_flag(double_quotes, chars).
:- use_module(library(dcg/basics)).
dcg_date(Date) -->
	digit(Y1), digit(Y2), digit(Y3), digit(Y4)
	, [Sep]
	, digit(M1), digit(M2)
	, [Sep]
	, digit(D1), digit(D2)
	, {
		member(Sep, [/, -])
		, atom_chars(D, [Y1, Y2, Y3, Y4, -, M1, M2, -, D1, D2])
		, parse_time(D, Date)
	}
	.

A query such as

phrase(dcg_date(Date), “2020-09-11”).

yields the expected answer: Date=1599782400.0.; all well.

Now, I create a file containing the exact same text and

phrase_from_file(dcg_date(Date), ‘test.txt’).

yields false.
The same happens with

phrase_from_file(dcg_date(Date), ‘Repos/prolog/test.txt’, [encoding(utf8)]).

When tracing this prolog code, it appears that my query fails on the line "member(Sep, [/, _]) because when using phrase_from_file, Sep is a Code, where I would expect a Char.

I know, I could convert my DCG to use Codes instead of Chars, but you will have to admit that Chars are more readable, hence my question: is there a way to make phrase_from_file use chars instead of codes?

EricGT · September 12, 2020, 1:33pm

Yes they are. I do many DCGs but do them all with codes. While I don’t use a cheat sheet because I have so many examples of working DCGs in one directory perhaps you need a template and some standard conversions on a cheat sheet.

Also I hate to say it but when I started with DCGs I lived in library(dcg/basics) but now have come to avoid it. (Sorry) I found that in using it I was having to reach for append/3 to get me out of some situations. Now I start with open list/difference list at the lowest level and then convert the open list to closed list when done or as needed.

While it is nice to see the chars, having an ASCII table open helps.

In short if you are looking for someone to say it is OK to switch to code instead of chars then yes, that is what I did.

peter.ludemann · September 12, 2020, 2:44pm

I’m surprised this worked.

Try this:

{ member(Sep, `/-`) }

or replace each use of [Sep] with sep(Sep) defined like this:

sep('-') --> `-`.
sep('/') --> '/'.

Note that this leaves some choicepoints around (as does your original code; you might want to use memberchk/2 instead, or add a cut to the definition of sep//1).

In “original” Prolog, there were no strings and the notation "abc" was used instead of SWI-Prolog’s current back-tick notation; people typically used portray/1 to detect lists that looked like strings and print them in a readable form. There’s a discussion here: SWI-Prolog -- library(strings): String utilities

peter.ludemann · September 12, 2020, 3:04pm

PS: My personal style is to put {...} around single goals, so I’d write:

	, { member(Sep, `/-`) }
	, { atom_chars(D, [Y1, Y2, Y3, Y4, -, M1, M2, -, D1, D2]) }
	, { parse_time(D, Date) }

Also, putting commas at the beginnings of lines, while slightly more convenient for adding items later, is as uncommon as putting semi-colons at the beginnings of lines in C.

remy.s · September 12, 2020, 3:43pm

Could you please expand on this approach?

EricGT · September 12, 2020, 5:52pm

Here is digits//1 from library(dcg/basics)

digits//1 (Source at GitHub)

digits([H|T]) -->
	digit(H), !,
	digits(T).
digits([]) -->
	[].

digit(C) -->
	[C],
	{ code_type(C, digit)
	}.

Here is the typical usage of digits//1.

Details

File name: example_01.pl

digits([H|T]) -->
	digit(H), !,
	digits(T).
digits([]) -->
	[].

digit(C) -->
	[C],
	{ code_type(C, digit)
	}.

:- begin_tests(digits).

:- set_prolog_flag(double_quotes, codes).

case(01,success,  "0",  0).
case(02,success,"123",123).

case(01,fail,"a").

case(01,error,"").

test(01,[forall(case(_,success,Digits,Number))]) :-
    DCG = digits(Digits),
    phrase(DCG,Codes,[]),
    number_codes(Number,Codes).

test(02,[fail,forall(case(_,fail,Digits))]) :-
    DCG = digits(Digits),
    phrase(DCG,Codes,[]),
    number_codes(_,Codes).

test(03,[forall(case(_,error,Digits)), error(syntax_error(illegal_number),_)]) :-
    DCG = digits(Digits),
    phrase(DCG,Codes,[]),
    number_codes(_,Codes).

:- end_tests(digits).

Example run:

Welcome to SWI-Prolog (threaded, 64 bits, version 8.3.7)

?- working_directory(_,'C:/Users/Groot/Documents').
true.

?- ['example_01'].
true.

?- run_tests.
% PL-Unit: digits .... done
% All 4 tests passed
true.

Here is the way I do digits//2 using an open list.

Details

File name: example_01.pl

digits(T0,T) -->
	digit(T0,T1), !,
	digits(T1,T).
digits(T,T) --> [].

digit([C|T],T) -->
	[C],
	{ code_type(C, digit) }.

:- begin_tests(digits).

:- set_prolog_flag(double_quotes, codes).

case(01,success,  "0",  0).
case(02,success,"123",123).

case(01,fail,"a").

case(01,error,"").

test(01,[forall(case(_,success,Digits,Number))]) :-
    DCG = digits(Digits,T),
    phrase(DCG,Codes,[]),
    T = [],
    number_codes(Number,Codes).

test(02,[fail,forall(case(_,fail,Digits))]) :-
    DCG = digits(Digits,T),
    phrase(DCG,Codes,[]),
    T = [],
    number_codes(_,Codes).

test(03,[forall(case(_,error,Digits)), error(syntax_error(illegal_number),_)]) :-
    DCG = digits(Digits,T),
    phrase(DCG,Codes,[]),
    T = [],
    number_codes(_,Codes).

:- end_tests(digits).

Example run:

Welcome to SWI-Prolog (threaded, 64 bits, version 8.3.7)

?- working_directory(_,'C:/Users/Groot/Documents').
true.

?- ['example_02'].
true.

?- run_tests.
% PL-Unit: digits .... done
% All 4 tests passed
true.

Notice that the code is very similar, but that when digits//2 is done it is still an open list. The open list is converted to a closed list with T = [].

Here are some more difference list DCGs that use digits//2

'digit+'(T0,T) -->
    digit(T0,T1), !,
    'digit*'(T1,T).

'digit*'(T0,T) -->
    digit(T0,T1), !,
    'digit*'(T1,T).
'digit*'(T,T) --> [].

‘digit+’//2 is for one or more digits.
‘digit*’//2 is for zero or more digits.

If you are thinking regular expressions after reading the description of those two DCGs you are correct. Even though I don’t like using RE for parsing, those patterns are so common in parsing that they show up in RE and parsing in general.

So what has this bought me. At first sight this looks like nothing gained but one of the first things you need when writing large parser is that the DCG works as a recognizer. This means that the DCG can read the input without any errors. A recognizer does not return and AST, codes, or anything. A recognizer is just a predicate that returns true or false.

Here is an example recognizer I use for recognizing ABNF files.

recognize(Abnf_path) :-
    setup_call_cleanup(
        open(Abnf_path,read,Abnf_stream),
        (
            set_stream(Abnf_stream, newline(posix)),
            read_stream_to_codes(Abnf_stream,Codes),
            DCG = rulelist(_,_),
            phrase(DCG,Codes,[])
        ),
        close(Abnf_stream)
    ).

Once I have the DCG working as a recognizer the next step I typically take is to get the DCG to reach a fixed point. This means the input will get converted to some format then the format will be processed again. Usually after one or two iterations the output is at a fixed point. At the lowest level the fixed point takes apart the input code and builds a list of output code from scratch. The input codes should be identical to the output codes. Sounds easy, but try doing this with code containing white space or that doesn’t have a well designed BNF, or that has a BNF but the BNF was designed for LL or LR parsers which don’t always make for working nicely with DCGs.

Here is an example fixed point check I use with ABNF files.

fixed_point_codes(Abnf_path) :-
    setup_call_cleanup(
        open(Abnf_path,read,Abnf_stream),
        (
            set_stream(Abnf_stream, newline(posix)),
            read_stream_to_codes(Abnf_stream,Codes_1),
            DCG1 = rulelist(Codes_2,[]),
            phrase(DCG1,Codes_1,[])
        ),
        close(Abnf_stream)
    ),
    assertion( Codes_1 == Codes_2 ).

Notice that the check Codes_1 == Codes_2. The output must be reconstructed exactly the same as the input.

Next I start to build the parsers that will return things such as AST, the output that preserves the input with whitespace, etc.

One of the first predicates that I create are recognizers for common non-terminals, e.g.

digits -->
    digits(_,[]).

Notice that it does not return anything. It just recongizes that the input is a valid input of digits. It is based on the difference list DCG, e.g. digits//2 and uses [] to convert the difference list to a closed list.

Here is an example that converted digit//2 into a term for use in an AST

digit(dec(digits(String))) -->
    digit(Codes,[]),
    { string_codes(String,Codes) }, !.

I know this has only given you some small examples but keeps eluding to much larger things. I haven’t posted the much larger examples yet because I ran into a pattern with rfc for parsing URLs and HTTP request that are based on ABNF using other ABNF. While the code runs correctly, I did not like the tricks I had to use to get the code to work correctly while still being in multiple modules matching the different rfc. The tricks were not for the DCG but for the calling of the DCG in modules which started the discussion on Prolog Many Worlds.

So the point of all of this is that if you start writing lots of parsers using DCGs and keep refactoring the code down and do more and more advanced things with parsing I think you will find that starting at the lowest layer with the DCGs returning differnece list instead of closed list will gain you some major benefits in code reuse, understanding of how to make the DCG efficient, and reaching fixed points.

Other post related to this:
Learned the hard way
Wiki discussion on DCGs.

EDIT

Here is a larger code example I found laying around of some code I started to parse JSON files. I did not test this but it shouldn’t take much to make it usable if it is not already.

Example JSON parser using difference list

/*

   Copyright (c) 2020 Eric G. Taucher

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   1. Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.
   2. Redistributions in binary form must reproduce the above copyright notice,
      this list of conditions and the following disclaimer in the documentation
      and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
   ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
   WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
   ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
   (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
   LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
   ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

:- module('JSON DCG parser',[
      array/4,
      % character/4,
      % characters/4,
      digit/4,
      'digit*'/4,
      'digit+'/4,
      element/4,
      % element_first/4,
      % element_rest/4,
      % element_separator/4,
      % escape_code/4,
      % exponent_indicator/4,
      false/4,
      fixed_point_codes/1,
      % hex_digit/4,
      % hex_digits/5,
      json/4,
      member/4,
      % member_first/4,
      % member_rest/3,
      % member_rest/4,
      % member_separator/4,
      % name_value_separator/4,
      null/4,
      number/4,
      % number_exponent/4,
      % number_fraction/4,
      % number_whole/4,
      object/4,
      % optional_negative_sign/4,
      % optional_sign/4,
      recognize/1,
      % start_digit/4,
      string/4,
      true/4,
      % value/4,
      ws//2,
      'ws*'//2,
      'ws+'//2
   ]).

/** <module> JSON DCG parser
These are basic DCG which return difference list.
Difference list are easily converted to standard list (closed list) by unifying the tail with [].

These predicates are not intended to be used directly but instead via predicates in modules that
rely on these predicates, e.g.
modules:
JSON_format_preserving
JSON_structure_preserving

@author Eric G Taucher
@license Simplified BSD License
*/

% -----------------------------------------------------------------------------

% Based on JSON BNF at https://www.json.org/json-en.html

% -----------------------------------------------------------------------------

:- set_prolog_flag(double_quotes,codes).

% -----------------------------------------------------------------------------

json(T0,T) -->
   element(T0,T).

value(T0,T) --> object(T0,T).
value(T0,T) --> array(T0,T).
value(T0,T) --> string(T0,T).
value(T0,T) --> number(T0,T).
value(T0,T) --> true(T0,T).
value(T0,T) --> false(T0,T).
value(T0,T) --> null(T0,T).

object([0'{|T1],T) -->
   "{",
   (
      'ws*'(T1,[0'}|T])
   ;
      'member_first'(T1,[0'}|T])
   ).

member_first(T0,T) -->
   member(T0,T1),
   member_rest(T1,T).

member_rest(T0,T) -->
   member_separator(T0,T1),
   member(T1,T2),
   member_rest(T2,T).
member_rest([]) --> [].

member_separator([0',|T],T) --> ",".

member(T0,T) -->
   'ws*'(T0,T1),
   string(T1,T2),
   'ws*'(T2,T3),
   name_value_separator(T3,T4),
   element(T4,T).

name_value_separator([0':|T],T) --> ":".

array([0'[|T0],T) -->
   "[",
   (
      'ws*'(T0,[0']|T])
   ;
      element_first(T0,[0']|T])
   ),
   "]".

element_first(T0,T) -->
   element(T0,T1),
   element_rest(T1,T).

element_rest(T0,T) -->
   element_separator(T0,T1),
   element(T1,T2),
   element_rest(T2,T).
element_rest(T,T) --> [].

element_separator([0',|T],T) --> ",".

element(T0,T) -->
   'ws*'(T0,T1),
   value(T1,T2),
   'ws*'(T2,T).

string([0'\"|T0],T) -->
   characters(T0,T).

characters(T0,T) -->
   character(T0,T1), !,
   characters(T1,T).
characters([0'"|T],T) --> "\"".

character([Code|T],T) -->
   [Code],
   {
       Code \= 0'"
   ;
       Code \= 0'\
   ;
       \+ ( Code >= 0x00, Code =< 0x0F )
   ;
       Code \= 0x7F
   }.
character(T0,T) --> escape_code(T0,T).
character([0'\\,0'u|T0],T) -->
   [0'\\,0'u],
   hex_digits(4,T0,T).

escape_code([0'\\,0'"|T],T) --> "\"".
escape_code([0'\\,0'\\|T],T) --> "\\".
escape_code([0'\\,0'/|T],T) --> "\\/".
escape_code([0'\\,0'\b|T],T) --> "\\\b".
escape_code([0'\\,0'\f|T],T) --> "\\\f".
escape_code([0'\\,0'\n|T],T) --> "\\\n".
escape_code([0'\\,0'\r|T],T) --> "\\\r".
escape_code([0'\\,0'\t|T],T) --> "\\\t".

hex_digits(0,T,T) --> [].
hex_digits(N0,T0,T) -->
   hex_digit(T0,T1),
   { N is N0 - 1 },
   hex_digits(N,T1,T).

hex_digit([Code|T],T) -->
   [Code],
   { Code >= 0'0, Code =< 0'9 }.
hex_digit([Code|T],T) -->
   [Code],
   { Code >= 0'a, Code =< 0'f }.
hex_digit([Code|T],T) -->
   [Code],
   { Code >= 0'A, Code =< 0'F }.

number(T0,T) -->
   optional_negative_sign(T0,T1),
   number_whole(T1,T2),
   number_fraction(T2,T3),
   number_exponent(T3,T).

optional_negative_sign([0'-|T],T) --> "-".
optional_negative_sign(T,T) --> [].

number_whole([0'0|T],T) --> "0".
number_whole(T0,T) -->
   start_digit(T0,T1),
   'digit*'(T1,T).

number_fraction([0'.|T0],T) -->
   ".",
   'digit+'(T0,T).
number_fraction(T,T) --> [].

number_exponent(T0,T) -->
   exponent_indicator(T0,T1),
   optional_sign(T1,T2),
   'digit+'(T2,T).
number_exponent(T,T) --> [].

exponent_indicator([0'e|T],T) --> "e".
exponent_indicator([0'E|T],T) --> "E".

optional_sign([0'-|T],T) --> "-".
optional_sign([0'+|T],T) --> "+".
optional_sign(T,T) --> [].

start_digit([C|T],T) -->
   [C],
   { between(0'1,0'9,C) }.

'digit+'(T0,T) -->
   digit(T0,T1),
   'digit+'(T1,T).

'digit*'(T0,T) -->
   digit(T0,T1),
   'digit*'(T1,T).
'digit*'(T,T) --> [].

digit([C|T],T) -->
   [C],
   { between(0'0,0'9,C) }.

true([0't,0'r,0'u,0'e|T],T) -->
   "true".

false([0'f,0'a,0'l,0's,0'e|T],T) -->
   "false".

null([0'n,0'u,0'l,0'l|T],T) -->
   "null".

'ws+'(T0,T) -->
   ws(T0,T1), !,
   'ws*'(T1,T).
'ws*'(T0,T) -->
   ws(T0,T1), !,
   'ws*'(T1,T).
'ws*'(T,T) --> [].

ws([0'\s|T],T) --> " ".
ws([0'\n|T],T) --> "\n".
ws([0'\r|T],T) --> "\r".
ws([0'\t|T],T) --> "\t".

% -----------------------------------------------------------------------------

%! recognize(+Path) is semidet.
%
% Recognizes a grammar. Silently fails.
recognize(Path) :-
   setup_call_cleanup(
       open(Path,read,Stream),
       (
           set_stream(Stream, newline(posix)),
           read_stream_to_codes(Stream,Codes),
           DCG = json(_,_),
           phrase(DCG,Codes,[])
       ),
       close(Stream)
   ).

% ?- recognize('xyz.json').

% A parser at this level would be pointless as all it would return is the input
% as a list of character codes which is equivalent to string_codes/2.

%! fixed_point_codes(+Path) is semidet.
%
% Compares the input codes to the output codes which should be identical.
% If they are not the predicate silently fails which indicates a problem with
% the ABNF file or the Prolog Code. <br>
% No specific errors are returned.
fixed_point_codes(Path) :-
   setup_call_cleanup(
       open(Path,read,Stream),
       (
           set_stream(Stream, newline(posix)),
           read_stream_to_codes(Stream,Codes_1),
           DCG1 = json(Codes_2,[]),
           phrase(DCG1,Codes_1,[])
       ),
       close(Stream)
   ),
   assertion( Codes_1 == Codes_2).

% ?- fixed_point_codes('xyz.json').

% -----------------------------------------------------------------------------

SWI-Prolog also has a JSON parser in library(http/json)

GitHub source code

jan · September 13, 2020, 7:59am

For almost any reason I’d write sep//0 and use this instead of [Sep].

sep --> "/".
sep --> "-".

This is more readable, fails earlier and is thus faster and doesn’t depend on representation details. If both seps must be the same, use this and share
the variable.

sep(/) --> "/".
sep(-) --> "-".

jan · September 13, 2020, 8:49am

As is, no. I’m not really sure we want it either. Playing around with flags such as double quotes is mostly asking for trouble. Traditionally Prolog systems from the Edinburgh/Quintus tradition used codes. Some others (I think) in the time of the ISO standardization some others used chars, so we ended up with both options and atom_codes/2 (which was the Quintus atom_chars/2) and atom_chars/2 which now produces one-character atoms.

There are some advantages of chars, notably that you can easily read them. There are some advantages to codes, notably that you can do some arithmetic on them, such as deciding that DigitWeight is Ditgit - 0'0. For many purposes codes are cheaper as we do not need atom-gc on them. In the old days this wasn’t a big issue as there were 256 chars, but now we have 0x10ffff chars and we do not wish to make them all locked built-in atoms.

For writing code it doesn’t matter too much whether chars or codes are used. It just matters in the debugger and when the system prints an answer at the toplevel. For this reason there is library(portray_text), notably portray_text/1. This tries to interpret lists of codes as strings, so we get

1 ?- portray_text(true).
true.

2 ?- A = `hello`.
A = "hello".

(actually we should respect the double quotes flag and render as hello, will change that).

jan · September 13, 2020, 9:49am

Pushed a fix for that. Also added a multifile hook portray_text:is_text_code/1 that can be used to extend the set of integers that are most likely code points (the default only uses ASCII non-control characters). Note that the library already allows defining the minimum length to try and interpret a list as codes and allow writing long strings as start…end. Also partial lists are written in the non-compliant syntax

 `hello world|Tail`

P.s. Note that the portable way to create such a string is

 phrase("hello world", PartialList, Tail)

With library(apply_macros) loaded this is expanded simply to

 PartialList = [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100|Tail]

EricGT · September 13, 2020, 10:45am

You know, I am sure I knew this was possible because I read it in the documentation but actually seeing it makes it stick in my mind. Now I will have to go back and look at my code to see where this makes sense.

remy.s · September 13, 2020, 11:25am

Actually I was considering something like your suggestion. But then opted in favour of [Sep] because I did not want to introduce just one smallish rule. In fact, my dcg’s will end-up with a lot “business related” rules, so this separator is a very low-level implementation detail.

remy.s · September 13, 2020, 11:29am

At the origin of my post was the frustration that my DCG was working differently when using phrase vs. phrase_from_file.
But this conversation led me to a better understanding of this matter, and I acknowledge that I should go with the codes rather than the chars. I will adapt my DCGs accordingly.
Thanks you so much for your insights.

remy.s · September 13, 2020, 11:47am

By the way, I looked up the code of phrase_from_file/3 in module pure_input.
As I understand the code, the actual reading of the file is done in a clause attr_unify_hook_ndebug/2 attached as a handler to an attributed var.
The relevant code there is

attr_unify_hook_ndebug(State, Value) :-
   State = lazy_input(Stream, _PrevPos, Pos, Read),
   (   var(Read)
   ->  fill_buffer(Stream),
       read_pending_codes(Stream, NewList, Tail),
       (   Tail == []
       ->  nb_setarg(4, State, []),
           Value = []
       ;   stream_to_lazy_list(Stream, Pos, Tail),
           nb_linkarg(4, State, NewList),
           Value = NewList
       )
   ;   Value = Read
   )

Besides read_pending_codes/3 used in this clause, prolog has a read_pending_chars/3
So changing the behaviour of phrase_from_file, should one consider to do so, would be to pass an additional option from phrase_from_file/3 down the line to the attribute hook.

EricGT · September 13, 2020, 11:49am

Please do not do that.

remy.s · September 13, 2020, 12:02pm

What is your argumentation for not doing this?
My argumentation for doing it, would be:

a user of module pure_input should be given a choice, because both methods for reading a file(codes vs chars) do exist in the surrounding environment
obviously, any change must not influence existing user code; hence an additional option, with a default value to codes
implementation should be done in a way to not hit the performance when reading codes

remy.s · September 13, 2020, 12:29pm

EricGT:

Here is an example recognizer I use for recognizing ABNF files.

recognize(Abnf_path) :-
    setup_call_cleanup(
        open(Abnf_path,read,Abnf_stream),
        (
            set_stream(Abnf_stream, newline(posix)),
            read_stream_to_codes(Abnf_stream,Codes),
            DCG = rulelist(_,_),
            phrase(DCG,Codes,[])
        ),
        close(Abnf_stream)
    ).

I studied your code, and this has been of great inspiration to me.
But may I ask: why are you using the combination “read_stream_to_codes(…), phrase(…)” rather than phrase_from_file?

EricGT · September 13, 2020, 12:40pm

Good question.

In one place I read that all JSON line ends have to have both the CR and LF. Without newline(posix) IIRC the CR is dropped and thus the JSON is considered invalid. See: Preserving CR with LF when converting text files to character codes

However in looking at the JSON spec, it seems that the CR is not required so I will have to see if my parser is following the correct spec.

But you are correct that in most places I would simply use phrase_from_file/2,3 and you could use it in your code.

In an eariler version of this post I noted optimizations as the reason for not using phrase_from_file/2,3 but have since rewrote that part; so leaving this part as it is still of use with regards to DCGs in general.

If you read my example JSON parser (post) and notes you will see that my example code has some optimizations that I should not have done during development per my rules of development.

object([0'{|T1],T) -->
   "{",
   (
      'ws*'(T1,[0'}|T])
   ;
      'member_first'(T1,[0'}|T])
   ).

Boris · September 13, 2020, 12:48pm

Just curious, for what kind of JSON parsing did you need this code? How is the JSON support that comes with SWI-Prolog not fitting your use case? Is it something about how the JSON maps to Prolog terms?

EricGT · September 13, 2020, 1:19pm

When I wrote the parser it was to better understand the intricacies of what is valid JSON for use with Cytoscape.js (ref), but then I found JSONLint (ref).

Now that I know the JSON needed for Cytoscape.js, the JSON should not pass a simple JSONLint (syntax), but pass a more complex set of rules (semantics).

While JSON provides a syntactic framework for data interchange, unambiguous data interchange also requires agreement between producer and consumer on the semantics of a specific use of the JSON syntax. (Spec)

Specifically for Cytoscape.js this implies that there should be a JSON recognizer (semantics) for the elements, style and layout. (ref) But the JSON recognizer will not recognize the syntax of JSON, but the semantics of each JSON file.

SWI-Prolog can check the syntax of JSON, but AFAIK there is no predicate to check the semantics. Such a check would probably need something like a DTD. I did find JSON Schema but have not looked at it beyond finding it. I would not be surprised if Jan W. tells me how it is done, but it was not apparent to me when I searched. While I could ask, I find that taking the time to do the search and understand the predicates I found when searching makes it much easier to understand the responses from others when asking, e.g. Is there a way to go from HTML to the Prolog representation of the HTML? before asking I was researching quasiquotations. (ref) I just did not see the connection.

No, as noted it is about the difference between syntax versus semantics and the need for something like a DTD for the semantics.

For something similar see the ini files used with ODBC connections. ini file type specifies the syntax but not the semantics. (ref)

peter.ludemann · September 13, 2020, 4:18pm

Shouldn’t this be

sep(/) --> `/`.

unless the double_quotes flag is set?

Topic		Replies	Views
What's the idiomatic way of developing DCGs? Help!	8	657	December 17, 2020
Simple example of phrase_from_file/2? Help!	13	944	March 29, 2021
Double_quotes flag and DCGs Discussion dcg	7	618	July 22, 2023
Using phrase_from_file Help!	9	1219	February 19, 2020
Most efficient DCG for text parsing? Algorithm	4	1866	July 5, 2021

Phrase_from_file vs phrase

Related topics