Issue getting http parameter from application/x-www-form-urlencoded; charset=UTF-8

Hi,

I’m not sure if it is a bug or a feature: when I use

 http_parameters(Request,
    [
       result(Result, [length<4096,optional(true)])
    ])

to extract a parameter from a request encoded as application/x-www-form-urlencoded; charset=UTF-8

[user(submeto),
protocol(http),
peer(ip(127,0,0,1)),
pool(client('httpd@8080',http_unix_daemon:http_dispatch,...,
method(post),
request_uri('/.../submeto.pl'),
path('/.../submeto.pl'),
http_version(1-1),
host(localhost),
port(8080),
authorization('Basic ...'),
user_agent('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:36.0) Gecko/20100101 Firefox/36.0'),
content_type('application/x-www-form-urlencoded; charset=UTF-8'),
x_forwarded_for('...'),
x_forwarded_host('...'),
x_forwarded_server('...'),
content_length(138),
connection('Keep-Alive')]

I get the parameter in latin1: Å\x9D\oveto instead of ŝoveto.

As a workaround I applied:

atom_codes(Result,Utf8),
phrase(utf8_codes(RCode), Utf8),
atom_codes(Res2,RCode),

which works as long as I know that a specific parameter is posted in UTF-8. I am not sure whether a proper fix belongs in http_parameters, or whether the client should send some additional information so that the decoding comes out right.
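For reference, the workaround can be wrapped in a small predicate. This is only a sketch: the name latin1_atom_to_utf8/2 is made up for illustration, and it assumes every character code in the incoming atom is in the range 0..255, i.e. that each code is really one byte of the UTF-8 encoding. utf8_codes//1 comes from library(utf8).

```prolog
:- use_module(library(utf8)).

% latin1_atom_to_utf8(+Misdecoded, -Fixed)
%
% Hypothetical helper, not part of any library.
% Treats the character codes of Misdecoded as raw UTF-8 bytes
% and decodes them into the intended Unicode atom.
latin1_atom_to_utf8(Misdecoded, Fixed) :-
    atom_codes(Misdecoded, Bytes),        % one code per byte (assumed 0..255)
    phrase(utf8_codes(Codes), Bytes),     % UTF-8 bytes -> Unicode code points
    atom_codes(Fixed, Codes).
```

With the value from above, ?- latin1_atom_to_utf8('Å\x9D\oveto', X). gives X = ŝoveto. Note that applying it to a value that was already decoded correctly would not be safe, so it only makes sense while the mis-decoding is known to happen.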

Kind regards,
Wolfram.

Not sure what is happening. I tried to replicate it. First I had the ChatGPT “SWI-Prolog Assistant” write a test server for this, using the prompt

Can you produce a minimal http server that serves a page with a form using a single text input field and handles the form input in another handler?

It came up with the code below. I tried entering your ŝoveto and it works fine. I also added a print statement that confirms that the string is received correctly, and used ?- debug(http(_)). to verify that the request uses Content-Type: application/x-www-form-urlencoded. As far as I can tell, the standard says that x-www-form-urlencoded must be decoded the same way as %-encoded URLs (decode %XX and consider the result UTF-8). It seems there should not be a chartype specifier.

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/html_write)).

% Define HTTP handlers
:- http_handler(root(.), form_page, []).
:- http_handler(root(handle), handle_form, []).

% Start the server
server(Port) :-
    http_server(http_dispatch, [port(Port)]).

% Handler for the form page
form_page(_Request) :-
    reply_html_page(
        title('Input Form'),
        form([action('/handle'), method(post)],
             [ p([], [label([for(name)], 'Enter your name:'),
                      input([name(name), type(text)])]),
               p([], input([type(submit), value('Submit')]))
             ])).

% Handler for processing the form input
handle_form(Request) :-
    http_parameters(Request,
                    [ name(Name, [string]) ]),
    reply_html_page(
        title('Form Result'),
        [ h1('Form Submitted'),
          p(['You entered: ', Name])
        ]).
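The decoding rule mentioned above (decode %XX, then interpret the bytes as UTF-8) can be checked at the toplevel without a server. A minimal sketch, assuming uri_encoded/3 from library(uri) is an acceptable stand-in for what the form decoder does; decode_form_value/2 is a made-up name for illustration.

```prolog
:- use_module(library(uri)).

% decode_form_value(+Encoded, -Value)
%
% Hypothetical helper, not part of any library.
% Percent-decodes Encoded as a query value; the %XX bytes are
% interpreted as UTF-8 by uri_encoded/3.
decode_form_value(Encoded, Value) :-
    uri_encoded(query_value, Value, Encoded).
```

For example, ?- decode_form_value('%C5%9Doveto', V). gives V = ŝoveto, since C5 9D is the UTF-8 encoding of ŝ. Getting Å\x9D\oveto instead would suggest the same two bytes were each taken as a separate Latin-1 character.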


> It seems there should not be a chartype specifier.

Hmm, maybe it’s the client then. I don’t see this issue with multipart/form-data. I see that there is a debug topic ‘post’, but I cannot inspect the original HTTP body that way. The daemon is running in a Docker container. I think I could read the HTTP stream from the request before calling http_parameters to see how it is encoded. At least the workaround serves its purpose in the meantime.