`w` write mode not working?

Sorry if I’m missing something obvious but write mode doesn’t seem to be working for me here: I have a very small txt file which I’m reading with

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [L],
    file_lines(Ls).    

test(File,Content) :-
  phrase_from_file(file_lines(Content), File).

I’m getting an abbreviated list of codes in the repl which I used to be able to expand with w but now I’m getting

46 ?- test('file.txt',Content).
Content = [102, 111, 111, 10, 98, 97, 114, 10, 104|…] [write]
Content = [102, 111, 111, 10, 98, 97, 114, 10, 104|…] [write]
Content = [102, 111, 111, 10, 98, 97, 114, 10, 104|…] .

Is there a different command for this now or a flag or anything? I recently upgraded from 9.2.X (I think) to 10.0.2. I tried checking the changelog but don’t believe I’m seeing anything on this.

See Effefct of "write" option at the top-level - #2 by jan

+ works, thanks @jan. Do you want me to delete this post?

@jan can I ask one more quick question? I keep getting an "arguments are not sufficiently instantiated" error for this and can’t tell what I’m doing wrong

file_lines([]) --> [].
file_lines([L|Ls]) -->
    { string_codes(Char,[L]) },
    [Char],
    file_lines(Ls).    

test(File,Content) :-
  phrase_from_file(file_lines(Content), File).

I’m trying to get a list of characters rather than the character codes above.

You should swap two lines:

file_lines([L|Ls]) -->
    [Char],
    { string_codes(Char,[L]) },
    file_lines(Ls).

Like this?

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [Char],
    { string_codes(Char,[L]) },
    file_lines(Ls).    

test(File,Content) :-
  phrase_from_file(file_lines(Content), File).

I’m getting false

131 ?- trace, test('file.txt',Content).
   Call: (13) test('file.txt', _7680) ? creep
^  Call: (14) pure_input:phrase_from_file(file_lines(_7680), 'file.txt') ? creep
   Call: (19) open('file.txt', read, _10326, []) ? creep
   Exit: (19) open('file.txt', read, <stream>(0x600001124300), []) ? creep
   Call: (21) file_lines(_7680, _12174{pure_input = ...}, []) ? creep
   Call: (22) _12174{pure_input = ...}=[] ? creep
   Fail: (22) _12174{pure_input = ...}=[] ? creep
   Redo: (21) file_lines(_7680, _12174{pure_input = ...}, []) ? creep
   Call: (22) string_codes(102, [_15942]) ? creep
   Fail: (22) string_codes(102, [_15942]) ? creep
   Fail: (21) file_lines(_7680, _12174{pure_input = ...}, []) ? creep
   Call: (18) close(<stream>(0x600001124300)) ? creep
   Exit: (18) close(<stream>(0x600001124300)) ? creep
^  Fail: (14) pure_input:phrase_from_file(user:file_lines(_7680), 'file.txt') ? creep
   Fail: (13) test('file.txt', _7680) ? creep
false.

By the way, what does the caret ^ indicate in the trace output? I couldn’t find that in the docs.

I think that should be


file_lines([]) --> [].
file_lines([Char|Ls]) -->
    [Code],
    { string_codes(Char, [Code]) },
    file_lines(Ls). 

because in a DCG [Code] will unify Code with a single code and so you want to turn that into a string (i.e. in CapelliC’s code, Char and L should be swapped).

The reason you got an instantiation error and CapelliC suggested you change the order of the lines is that, in your original code, you’re calling string_codes/2 with both arguments uninstantiated, which raises an error; therefore you want to match the code first, then convert to a string.
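
To make the ordering issue concrete, here is a small sketch that can be loaded without any input file. The predicate names are hypothetical, chosen only for this illustration, and the error term is what I would expect SWI-Prolog to raise for the message quoted above.

```prolog
% Hypothetical helper: succeeds only if string_codes/2 raises an
% instantiation error when neither the string nor the code is known.
raises_instantiation_error :-
    catch(( string_codes(_String, [_Code]), Raised = false ),
          error(instantiation_error, _),
          Raised = true),
    Raised == true.

% Hypothetical helper: once the code is known, the conversion works.
code_to_string(Code, String) :-
    string_codes(String, [Code]).
```

Calling code_to_string(0'f, S) gives a one-character string, which is what the DCG does once [Code] has consumed a code from the input.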

Thanks, yep that works, but it’s definitely counterintuitive.

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [L],
    file_lines(Ls).

L seems to unify with a code

109 ?- test('file.txt',Content).
Content = [102, 111, 111, 10, 98, 97, 114, 10, 104|…] .

So I thought my goal is to take the head, where Content = [L|_], L = 102 and pass it to the second argument as [L], since

110 ?- string_codes(Char,[102]).
Char = "f".

I’ll go with your solution but I’m a little confused about why this works.

Ah, sorry, I may have confused things by changing variable names.

Your original code:

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [L],
    file_lines(Ls).

But we want L to be a single-character string instead:

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [Char], % changed from L to Char
    { string_codes(L, [Char]) }, % added this line
    file_lines(Ls).

There are two differences here from what you had above:

  1. The DCG matches [Char] before calling string_codes/2, so we don’t get the instantiation error (string_codes/2 needs at least one of its arguments to be instantiated).
  2. The order of the arguments to string_codes/2 - the first argument is the string, the second is the list of codes.

To clarify, the L in head is bound by [L] matching the next code in the stream; at the beginning of the DCG body, L is still unbound; does that make sense?
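
A quick way to experiment with this without a file is to run the same grammar over an in-memory code list with phrase/2. This is only a sketch; code_strings//1 and demo/1 are hypothetical names so they do not clash with the predicates above.

```prolog
% Same shape as the corrected file_lines//1, under a different name.
code_strings([]) --> [].
code_strings([L|Ls]) -->
    [Code],                        % match the next code first
    { string_codes(L, [Code]) },   % then turn it into a one-character string
    code_strings(Ls).

% Run the grammar over the codes of "foo" instead of a file.
demo(Strings) :-
    string_codes("foo", Codes),
    phrase(code_strings(Strings), Codes).

% ?- demo(S).
% S = ["f", "o", "o"]
```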

Even if we get this right though, the file is translated into a list of strings, each holding one character. I doubt that is the intent of the OP. What is the intent though? The name suggests a list of strings, one for each line?

Sorry @jamesnvc not completely sure I’m following.

With

 file_lines([]) --> [].
 file_lines([L|Ls]) -->
     [L],
     file_lines(Ls).

I’m seeing

105 ?- test('file.txt',Content), Content = [L|_].
Content = [102, 111, 111, 10, 98, 97, 114, 10, 104|…],
L = 102 .

Again, what I was attempting to do was wrap L in a list, as [102], then convert it to a char by

106 ?-  string_codes(Char,[102]).
Char = "f".

That was the thought behind

file_lines([]) --> [].
file_lines([L|Ls]) -->
    [Char],
    { string_codes(Char,[L]) },
    file_lines(Ls). 

My reasoning was based on the signature string_codes(?String, ?Codes): file_lines is unifying with the codes, so the head unifies with a single code. I wanted to wrap that code in a list and pass it as the second argument of string_codes/2 to get a string back.

In your solution we’re passing 102 into the first argument which is expecting a string but it still works?

@jan yes, sorry about the deceptively named predicates. I’m working my way to a different behavior than what we have here. Ultimately I’m trying to get to a list of lines from a file akin to python’s readlines().

I think this is where the misunderstanding lies: in the code above, [Char] is what unifies the next code in the file with the variable Char; at that point, L has not been unified with anything. In your previous code block, where you have file_lines([L|Ls]) --> [L], … it’s the second occurrence of L, in the body of the DCG, that actually does the parsing and thereby binds the same L in the head.
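
It may help to see roughly what a DCG rule turns into. The following is a hand-written approximation, not the exact output of the SWI-Prolog translator; the name file_lines_expanded/3 is hypothetical. The two extra arguments are the input before and after the rule has run.

```prolog
% Hand-written equivalent of
%   file_lines([L|Ls]) --> [Code], { string_codes(L, [Code]) }, file_lines(Ls).
file_lines_expanded([], S, S).
file_lines_expanded([L|Ls], S0, S) :-
    S0 = [Code|S1],              % [Code]: take one code off the input
    string_codes(L, [Code]),     % only at this point does L get a value
    file_lines_expanded(Ls, S1, S).
```

When the clause is entered with an unbound first argument, the head only says "the result is a list with head L and tail Ls". L stays a fresh variable until the body gives it a value.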

Oh I see what you’re saying. I thought variables in file_lines([L|Ls]) were unified first. If everything after --> gets unified first then this makes sense. Thanks.

And one last thing:

This basically gets me what I want

lines([]) --> [].
lines([Str|Lines]) -->
   line(Line),
   { atomic_list_concat(Line,Str) },
   lines(Lines).

line(["\n"]) -->
   [Code],
   { string_codes("\n", [Code]) }.
line([Char|Rest]) -->
   [Code],
   { string_codes(Char, [Code]) },
   line(Rest).

readlines(File,Content) :-
  phrase_from_file(lines(Content),File),
  !.

For a file like file.txt with

foo
bar
hello
world

I get

?- readlines('file.txt',Content).
Content = ['foo\n', 'bar\n', 'hello\n', 'world\n'].

but then I run this on a 26.7 MB CSV data set (more than 303k rows) and get “Stack limit (1.0Gb) exceeded” after about 7 seconds. The analogous Python readlines() function handles the same file with no issue:

import time
def readlines_time():
	start = time.perf_counter()
	file = open("health.csv", "r")
	lines = file.readlines()
	file.close()
	print(lines)
	end = time.perf_counter()
	elapsed = end - start
	print(f'Time taken: {elapsed:.6f} seconds')

It runs in 1.365822 seconds cold.

How do I optimize this?

E.g.

lines([]) --> [].
lines([String|Strings]) -->
   line(Codes),
   {string_codes(String,Codes)},
   !, lines(Strings).

line([]) --> "\n".
line([Code|Rest]) -->
   [Code],
   line(Rest).

readlines(File,Strings) :-
  phrase_from_file(lines(Strings),File).

that yields

112 ?- time(readlines(large,Ls)).
% 16,691,295 inferences, 1.750 CPU in 1.791 seconds (98% CPU, 9537883 Lips)

on a 16 MB file.

But… are you aware of library(csv)?

At least that gets rid of the choicepoint per line, which avoids the stack overflow. Better yet is to make line//1 deterministic though by using

 line([]) --> "\n", !.
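
Put together with the grammar above, that could look like the sketch below. It assumes every line, including the last one, ends in a newline; a final line without one makes the parse fail.

```prolog
lines([]) --> [].
lines([String|Strings]) -->
    line(Codes),
    { string_codes(String, Codes) },
    lines(Strings).

% The cut commits to "end of line" as soon as a newline is seen, so no
% choicepoint is left behind for each line.
line([]) --> "\n", !.
line([Code|Rest]) -->
    [Code],
    line(Rest).
```

The same grammar can be tried on an in-memory code list with phrase/2 before pointing phrase_from_file/2 at a large file.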

P.S. The quickest (albeit a bit ugly) is probably

read_lines(File, Lines) :-
    read_file_to_string(File, String, []),
    split_string(String, "\n", "", Lines).

See also read_line_to_string/2. But if CSV is the target, library(csv) is probably the best answer. It is fairly efficient and deals with various subtle details in handling various CSV dialects and edge cases.
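
For the read_line_to_string/2 route, a minimal sketch could look like this. stream_lines/2 and file_line_strings/2 are hypothetical names; the lines come back without their terminators.

```prolog
:- use_module(library(readutil)).

% stream_lines(+Stream, -Lines): Lines are the remaining lines of Stream
% as strings, without the line terminators.
stream_lines(Stream, Lines) :-
    read_line_to_string(Stream, Line),
    (   Line == end_of_file
    ->  Lines = []
    ;   Lines = [Line|Rest],
        stream_lines(Stream, Rest)
    ).

% file_line_strings(+File, -Lines): open File, read all lines, and make
% sure the stream is closed even if reading fails or raises.
file_line_strings(File, Lines) :-
    setup_call_cleanup(
        open(File, read, Stream),
        stream_lines(Stream, Lines),
        close(Stream)).
```

Unlike the split_string/4 version, this does not produce a trailing empty string when the file ends with a newline.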

They are unified first but they don’t necessarily become ground first. For example, in X=Y, the X and Y are unified, but they don’t have any value. In X=Y,Y=1, because X and Y are unified, Y=1 results in X having the value 1 also.
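
The same point as a tiny predicate that can be loaded and run; the name aliased_then_bound/1 is hypothetical and exists only for this illustration.

```prolog
aliased_then_bound(X) :-
    X = Y,        % X and Y are unified, but neither has a value yet
    var(X),       % succeeds: X is still an unbound variable
    Y = 1,        % giving Y a value ...
    X == 1.       % ... gives X the same value
```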

Yes, please use library(csv) as suggested by Jan if you parse a CSV file. It also has a backtracking predicate, csv_read_file_row/3, as well as csv_read_row/3. You can use those if you don’t need the full content of the file in memory. You can see the example under repeat/0 to see how to process a file with csv_read_row/3.
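
As a rough sketch of row-at-a-time processing with csv_read_row/3, here is a recursive loop rather than the repeat/0 pattern from the manual. count_rows/2 is a hypothetical name, and counting stands in for whatever per-row work is needed.

```prolog
:- use_module(library(csv)).

% count_rows(+Stream, -Count): read CSV rows one at a time from Stream,
% so the whole file never has to be held in memory.
count_rows(Stream, Count) :-
    csv_options(Options, []),          % compile default CSV options once
    count_rows_(Stream, Options, 0, Count).

count_rows_(Stream, Options, N0, N) :-
    csv_read_row(Stream, Row, Options),
    (   Row == end_of_file
    ->  N = N0
    ;   N1 is N0 + 1,                  % replace with real per-row processing
        count_rows_(Stream, Options, N1, N)
    ).
```

To use it on a file, open the file with open/3 and pass the stream, ideally inside setup_call_cleanup/3 so the stream is always closed.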

If you have a well-defined task it would be easier to provide a good solution.

Thanks all, really appreciate it. For sure, I’ll check out the csv library but the goal here is to put together a generic utilities library for myself for my work to parse my clients’ data, some of which tends to be structured but not of a well known specification like csv or json. I chose csv in this case at random for testing on a large data set.