Io_error with 'Success' context in Pengines

I’m using: swipl:8.4.2 (Docker, Linux host, k8s cluster)

I’m getting an odd error when passing certain addresses as src_url to Pengines. It’s an io_error with 'Success' as the context. It seems to happen randomly, about 50% chance for me, with the other requests succeeding as normal. It only happens with HTTP URLs on the local network. It seems that HTTPS URLs on the public internet are fine.

error(io_error(read,'<stream>(0x562a20ed6000)'),context(system:read_string/3,'Success'))

I’m calling Pengines like such:

pengine_rpc('http://pengines-service:4242/', member(X,[1,2]), [src_url('http://foo-service.bar-dev.svc.cluster.local:8000/internal/script/')]).

However, requests using a public HTTPS URL work properly. Attaching to the toplevel and running http_open/3 had no trouble obtaining the script with either HTTP or HTTPS so I’m led to believe this might be a Pengines or threading-specific thing. My pengines setup is a minimal, mostly default setup that looks like this.

I tried enabling a bunch of debugging stuff to get a log, and this is what I found:

% [Thread http@4242] New HTTP connection from ip(127,0,0,6)
% [Thread http@4242] Waiting for connection
% [Thread httpd@4242_2] Opened connection from ip(127,0,0,6)
% [Thread httpd@4242_2] First line: POST /pengine/create?format=prolog HTTP/1.1
% [Thread httpd@4242_2] Header =
host: pengines-service:4242
user-agent: Go-http-client/1.1
content-length: 598
content-type: application/json; charset=utf-8
accept-encoding: gzip
x-forwarded-proto: http
x-request-id: 62b42b45-2175-4561-90a0-b2e7cbbca38f
x-envoy-attempt-count: 1
x-forwarded-client-cert: By=spiffe://cluster.local/ns/stream-dev/sa/default;Hash=b11c61e4809804d88820e751dfa9e748722cd16c697ced90f5addac7ddd9bd15;Subject="";URI=spiffe://cluster.local/ns/bar-dev/sa/default
x-b3-traceid: 5b588c39a8db8f697d98554f664c2572
x-b3-spanid: 0bcd87679e8f2804
x-b3-parentspanid: 7d98554f664c2572
x-b3-sampled: 0



% [Thread httpd@4242_2] Field: [host('pengines-service'),port(4242),user_agent('Go-http-client/1.1'),content_length(598),content_type('application/json; charset=utf-8'),accept_encoding(gzip),x_forwarded_proto(http),x_request_id('62b42b45-2175-4561-90a0-b2e7cbbca38f'),x_envoy_attempt_count('1'),x_forwarded_client_cert('By=spiffe://cluster.local/ns/bar-dev/sa/default;Hash=b11c61e4809804d88820e751dfa9e748722cd16c697ced90f5addac7ddd9bd15;Subject="";URI=spiffe://cluster.local/ns/bar-dev/sa/default'),x_b3_traceid('5b588c39a8db8f697d98554f664c2572'),x_b3_spanid('0bcd87679e8f2804'),x_b3_parentspanid('7d98554f664c2572'),x_b3_sampled('0')]
% [Thread httpd@4242_2] [234] post /pengine/create ...
% [Thread 13] http_open: Connecting to 'foo-service.bar-dev.svc.cluster.local':8000 ...
% [Thread 13] 	ok <stream>(0x562a20fb8f00) ---> <stream>(0x562a20ed6600)
% [Thread 13] > GET /internal/script/ HTTP/1.1
% [Thread 13] > Host: foo-service.bar-dev.svc.cluster.local:8000
% [Thread 13] > User-Agent: SWI-Prolog
% [Thread 13] > Connection: close
% [Thread 13] HTTP/1.1 200 OK
% [Thread 13] content-type: application/x-prolog; charset=utf-8
% [Thread 13] etag: ACME_1650984427546036306
% [Thread 13] last-modified: Tue, 26 Apr 2022 23:47:07 GMT
% [Thread 13] date: Tue, 26 Apr 2022 14:53:29 GMT
% [Thread 13] x-envoy-upstream-service-time: 1
% [Thread 13] server: envoy
% [Thread 13] connection: close
% [Thread 13] transfer-encoding: chunked
% [Thread 13] Field: [content_type('application/x-prolog; charset=utf-8')]
% [Thread 13] Field: [etag('ACME_1650984427546036306')]
% [Thread 13] Field: [last_modified('Tue, 26 Apr 2022 23:47:07 GMT')]
% [Thread 13] Field: [date('Tue, 26 Apr 2022 14:53:29 GMT')]
% [Thread 13] Field: [x_envoy_upstream_service_time('1')]
% [Thread 13] Field: [server(envoy)]
% [Thread 13] Field: [connection(close)]
% [Thread 13] Field: [transfer_encoding(chunked)]
[Thread 13]  T [24] Call: read_string(<stream>(0x562a20ed6000), _626, _628)
% [Thread 13] Reply to <message_queue>(0x562a210533b0): error('b5b507ab-cca7-4492-b320-f9f3d20bd6a3',error(io_error(read,'<stream>(0x562a20ed6000)'),context(system:read_string/3,'Success')))
% [Thread 13] Sending error('b5b507ab-cca7-4492-b320-f9f3d20bd6a3',error(io_error(read,'<stream>(0x562a20ed6000)'),context(system:read_string/3,'Success'))), timout: 300
% [Thread 13] Reply to <message_queue>(0x562a210533b0): destroy('b5b507ab-cca7-4492-b320-f9f3d20bd6a3')
% [Thread 11] Got error('b5b507ab-cca7-4492-b320-f9f3d20bd6a3',error(io_error(read,'<stream>(0x562a20ed6000)'),context(system:read_string/3,'Success'))) from <message_queue>(0x562a210533b0)
% [Thread 13] Sending destroy('b5b507ab-cca7-4492-b320-f9f3d20bd6a3'), timout: 300
% [Thread 11] DESTROY? error('b5b507ab-cca7-4492-b320-f9f3d20bd6a3',error(io_error(read,'<stream>(0x562a20ed6000)'),context(system:read_string/3,'Success')))
% [Thread 11] Field: [cache_control('no-cache, no-store, must-revalidate'),pragma('no-cache'),expires('0'),content_type('text/x-prolog; charset=UTF-8')]
% [Thread 11] [234] 200 OK (0.000 seconds; 142 bytes)
% [Thread httpd@4242_3] Closing connection from ip(127,0,0,6)

I can see custom headers parsed properly such as the ETag, so it doesn’t look like a network issue.

I’m quite perplexed by this error. I think that 'Success' corresponds to C’s strerror(0) but I can’t imagine why that would be getting returned. Race condition? I noticed this change recently in swipl-devel, could it possibly be related?

I’m having trouble reproducing this easily locally so I am still searching for a minimal reproduction case. Any ideas for extra debug flags or traces I could add to help isolate this?

Apologies for the double post. I noticed the original problematic endpoint (/internal/script/ in the logs) wasn’t setting a Content-Length header, so I tried setting that properly and now it seems like it has no trouble with either the local network HTTP or public network HTTPS URLs. Perhaps this is only an issue for HTTP without Content-Length set?

BTW, Content-Length of this particular response is 41577. I would be happy to provide further details to help isolate and reproduce.

Maybe the sandboxing of your Pengines is using default configurations??? read_string/3 is not considered safe by default.

?- safe_call(read_string(_,_,_)).
ERROR: No permission to call sandboxed `read_string(_39748,_39750,_39752)'
?- safe_primitive(read_string(_,_,_)).
Correct to: "sandbox:safe_primitive(read_string(_,_,_))"? yes
false.

?- 

I’m not calling read_string/3 directly, it’s (presumably) called by pengines’s internals for handling the src_url option. Unfortunately I am not at liberty to share the actual program loaded by it, but I am working on reproducing it from scratch.

Also, wouldn’t that lead to a sandbox error instead of io_error with ‘Success’?

I did try disabling the sandbox. Here is the full Pengines server program I used to produce the logs in the first post:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/json)).
:- use_module(library(http/json_convert)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_stream)).
:- use_module(library(pengines)).

:- debug(log).
:- debug(http(request)).
:- debug(http(error)).
:- debug(redirect_log).
:- debug(request_json_log).
:- debug(http(post_request)).
:- debug(http_session).
:- debug(http_path).
:- debug(http(header)).
:- debug(http(send_request)).
:- debug(http(open)).
:- debug(http(get)).
:- debug(http(post)).
:- debug(http(connection)).
:- debug(pengine(_)).
:- trace(read_string/3). % it happens without this too, just a fruitless debug attempt

:- set_setting(pengines:debug_info, true).

server(Port) :- http_server(http_dispatch, [port(Port)]).

pengines:write_result(prolog, Event, _) :-
		format('Content-type: text/x-prolog; charset=UTF-8~n~n'),
		write_term(Event,
							 [ quoted(true),
								 quote_non_ascii(true),
								 character_escapes_unicode(false),
								 ignore_ops(true),
								 fullstop(true),
								 blobs(portray),
								 portray_goal(pengines:portray_blob),
								 nl(true)
							 ]).

pengines:authentication_hook(_, _, User) :-
	User = nobody.

pengines:not_sandboxed(_, _).

:- server(4242).

Adding Content-Length to the response for the src_url alleviated the problem. I haven’t been able to reproduce locally, this is likely tied deeply to OS things, and I’ve only seen it happen deep in a Kubernetes cluster.

read_string/3 is used only when pengine application has setting(Application:debug_info,true)
The setting(Application:debug_info, false) does not use read_string/3.
You could try spy(pengines:pengine_src_url). and that would show in debugger what is going on.

.

1 Like

Debugging Pengines is not always easy. The normal debug/3 interferes with Pengines I/O redirection. On Linux you can hack around that using

?- debug(topic > '/proc/self/fd/2').

On Windows you probably need to redirect the topic to a file. Haven’t followed this closely (I’m traveling). Errors in content length indicate output is coming from not expected sources and gets mixed into the HTTP stream.

1 Like