http_open & JSON response & encoding

I’m using: SWI-Prolog (threaded, 64 bits, version 8.0.3)

Using http_open I would like to download JSON data from an external website, but I cannot get the response data decoded correctly.

My code is

:- use_module(library(http/json)).
:- use_module(library(http/http_open)).

http_open("http://www.viaggiatreno.it/viaggiatrenonew/resteasy/viaggiatreno/andamentoTreno/S01520/10829", In, [request_header('Accept'='application/json'),request_header('Accept-Charset'='utf-8')]), json_read_dict(In, Data).

?- current_prolog_flag(encoding, Encoding).
Encoding = utf8.

but I get strange characters like “mit einer Verzögerung von 10 Min.”, “avec un retard de 10 min.”, “con un retraso de 10 min.”, “cu o întârziere de 10 min.”, “10 å\210\206\ã\201\®é\201\205\延”, “误ç\202\¹ 10å\210\206\é\222\237”|…]

But Chrome shows the text correctly, and so does curl:

curl "http://www.viaggiatreno.it/viaggiatrenonew/resteasy/viaggiatreno/andamentoTreno/S01520/10829"

The output is correct:

"compRitardo":["ritardo 10 min.","delay 10 min.","Verspätung 10 Min.","retard de 10 min.","retraso de 10 min.","întârziere 10 min.","遅延 10 分","误点 10分钟","опоздание на 10 минут"],"compRitardoAndamento":["con un ritardo di 10 min.","10 minutes late","mit einer Verzögerung von 10 Min.","avec un retard de 10 min.","con un retraso de 10 min.","cu o întârziere de 10 min.","10 分の遅延","误点 10分钟","с опозданием в 10 минут"]

My linux locale is

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Can anyone help me?

The JSON read predicates do nothing about the stream encoding, and http_open/3 always returns a binary stream. So the solution is to set the stream encoding yourself by calling set_stream(In, encoding(utf8)) before reading.
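For reference, here is a minimal sketch of that fix applied to code like the one in the question. The predicate names read_json_utf8/2 and fetch_json/2 are made up for illustration; only http_open/3, set_stream/2, json_read_dict/2 and setup_call_cleanup/3 are library predicates. It assumes the server really does send UTF-8.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(http/json)).

%!  read_json_utf8(+In, -Data) is det.
%   Hypothetical helper: decode the bytes on In as UTF-8, then parse
%   one JSON document from In into a dict.
read_json_utf8(In, Data) :-
    set_stream(In, encoding(utf8)),
    json_read_dict(In, Data).

%!  fetch_json(+URL, -Data) is det.
%   Hypothetical helper: open URL, read the JSON reply as UTF-8 and
%   close the stream whether or not reading succeeds.
fetch_json(URL, Data) :-
    setup_call_cleanup(
        http_open(URL, In, [request_header('Accept'='application/json')]),
        read_json_utf8(In, Data),
        close(In)).
```

With this loaded, a query such as `?- fetch_json("http://www.viaggiatreno.it/viaggiatrenonew/resteasy/viaggiatreno/andamentoTreno/S01520/10829", Data).` should return the strings undamaged, provided the URL still responds.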

That isn’t very satisfactory. Unfortunately, HTTP servers are notorious for sending misleading encoding, charset information, content type, etc.

Possibly the JSON read predicates should switch to UTF-8 if the current encoding is binary? After all, the JSON standard says documents are in UTF-8, AFAIK. Something similar is done for the XML parser, which uses UTF-8 unless the current stream has a specific encoding.
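As a user-level sketch of that idea (not the library's actual implementation), a wrapper can switch to UTF-8 only when the stream still reports the binary encoding `octet`, and otherwise respect whatever encoding the caller already set. The names utf8_if_binary/1 and json_read_dict_utf8/2 are hypothetical.

```prolog
:- use_module(library(http/json)).

%!  utf8_if_binary(+In) is det.
%   If In has no text encoding yet (a binary stream reports `octet`),
%   treat it as UTF-8; otherwise leave the caller's encoding untouched.
utf8_if_binary(In) :-
    (   stream_property(In, encoding(octet))
    ->  set_stream(In, encoding(utf8))
    ;   true
    ).

%!  json_read_dict_utf8(+In, -Data) is det.
%   Hypothetical wrapper around json_read_dict/2 that applies the
%   UTF-8 default before parsing.
json_read_dict_utf8(In, Data) :-
    utf8_if_binary(In),
    json_read_dict(In, Data).
```

The point of the conditional is that a stream someone has deliberately opened as, say, ISO Latin 1 is not silently overridden.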

That makes sense to me. It might be nice to output a warning if the encoding hasn’t been set properly. (Also, isn’t Content-Type supposed to be application/json?)

Pushed a change that uses UTF-8 when the stream is binary. Warnings seem a bit dubious: they can easily lead to a great many messages you do not want and that might be hard to avoid.

I agree in general, but a warning can be useful when developing. If it is emitted through a call to debug/3, then it is optional.
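Something along these lines is what I have in mind, using debug/3 from library(debug). This is only a sketch: the predicate name note_binary_json_stream/1 and the topic json(encoding) are invented here and are not part of the JSON library.

```prolog
:- use_module(library(debug)).

%!  note_binary_json_stream(+In) is det.
%   Emit a debug message when In is still a binary stream. Nothing is
%   printed unless the (hypothetical) topic json(encoding) is enabled.
note_binary_json_stream(In) :-
    (   stream_property(In, encoding(octet))
    ->  debug(json(encoding),
              'JSON input ~p is binary; assuming UTF-8', [In])
    ;   true
    ).
```

A developer who wants the message runs `?- debug(json(encoding)).` and turns it off again with `?- nodebug(json(encoding)).`; everyone else sees nothing.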