"at" in www_form_encode, uri_encoded

Dear developers,

I am playing around with the web service libraries of Prolog, and I noticed one little inconsistency between www_form_encode/2 and uri_encoded/3 (note that www_form_encode/2 is deprecated and uri_encoded/3 is recommended instead). www_form_encode/2 translates the at sign @ to %40 (which is welcome in my use case), whereas none of the variants of uri_encoded/3 does so:

?- www_form_encode('someone@gmx.de', M).
M = 'someone%40gmx.de'.

?- uri_encoded(query_value, 'someone@gmx.de', M).
M = 'someone@gmx.de'.

?- uri_encoded(fragment, 'someone@gmx.de', M).
M = 'someone@gmx.de'.

?- uri_encoded(path, 'someone@gmx.de', M).
M = 'someone@gmx.de'.

?- uri_encoded(segment, 'someone@gmx.de', M).
M = 'someone@gmx.de'.

I am wondering if the at sign has been intentionally skipped.

Please note also that www_form_encode produces lowercase hex codes, whereas uri_encoded produces upper case.

?- www_form_encode('/', M).
M = '%2f'.

?- uri_encoded(segment, '/', M).
M = '%2F'.

Thank you for your consideration.

Matthias

This seems indeed dubious/wrong. Quoting has always been a hard topic, but it seems the debate is settled by now. JavaScript uses {encode,decode}URI and {encode,decode}URIComponent with clearly defined semantics. I think we should provide the same functionality. uri_encoded/3 could be hacked to add uri and component as additional classes. Next we can probably add uri_encoded/2 and uri_component_encoded/2. Would that make sense?

The overall story is, in my understanding:

  • If you assembled a URI, but it may contain characters that are not allowed, such as spaces or non-ASCII characters, use encodeURI() to turn it into a valid URI. To me, this sounds a bit dubious, because the parts from which you have assembled the initial URI may contain URI special characters, and then you do not end up with the desired result. So, what is actually the use case for this function?
  • If you have some string and you want to use it as a component of a URI (typically a path segment or query parameter value), you use encodeURIComponent(). That makes sense. The SWI-Prolog library, in contrast, tries to escape as little as possible, depending on the component you want to put the value in.

Comments welcome. Especially the first point is a bit puzzling to me.
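
To make the second bullet concrete, here is a minimal sketch using uri_query_components/2 from library(uri), which encodes each value as a query component. The predicate name query_round_trip/3 and the parameter names are made up for illustration.

```prolog
:- use_module(library(uri)).

%!  query_round_trip(+Pairs, -Query, -Decoded) is det.
%
%   Hypothetical demo predicate. Encodes a list of Name=Value pairs
%   into a query string and decodes that string again. Delimiters
%   such as "&" and "=" inside a value must be escaped on the way
%   out, otherwise decoding would split the value in the wrong place.
query_round_trip(Pairs, Query, Decoded) :-
    uri_query_components(Query, Pairs),     % compose: Pairs -> Query
    uri_query_components(Query, Decoded).   % decompose: Query -> Decoded
```

A query such as `?- query_round_trip([to='someone@gmx.de', note='a&b=c'], Q, D).` should bind D back to the original pairs, which is the property that per-component encoding is meant to guarantee.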


Dear Jan,

The use case is a bit special. I wanted to implement a so-called tool provider for an e-learning system (Moodle) using the “Basic LTI” specification. The “Basic Launch request” is an HTTP POST request with a lot of parameters (including the email address of a learner, if available). This request is signed (HMAC-SHA1). To check the signature, the tool provider needs to sign the request as well, and then compare the result to the provided signature. As a part of this procedure, all the POST parameters need to be encoded (using uppercase hex letters). Therefore, I cannot use www_form_encode/2. [For me, the quickest solution is an extra flag, e.g., www_form_encode/3 that enables uppercase letters…]

Thank you for your consideration.

With best wishes,

Matthias

If you have very specific encoding requirements, I would write my own encoder (stealing some code from the libraries). If this encoding has some recognized name, we can add it to uri_encoded/3, as this is quite easily extended. I don’t like moodle in the code though, so it needs to be some (W3C) standard.

Dear Jan,

Ah now I get it. The at sign is actually allowed in query_value, so there’s no need to translate it to %40. All right, I guess I have to continue using the self-written function.

One thing that could actually be implemented easily in uri_encoded/3 is something like uri_encoded(all, Value, Encoded) that only keeps “-._~” and the alphanumeric characters. It is less messy/specific than “moodle”, but transparent enough not to confuse people. And it would keep compatibility with the deprecated www_form_encode/2, except for the uppercase letters in the percent-tokens.
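
For reference, this is roughly what such an encoder could look like as a standalone predicate. It is only a sketch: encode_unreserved/2 and its helpers are names I made up, it is not part of library(uri), and I assume the value should be encoded as UTF-8 bytes.

```prolog
:- use_module(library(utf8)).    % utf8_codes//1
:- use_module(library(lists)).   % memberchk/2

%!  encode_unreserved(+Value, -Encoded) is det.
%
%   Hypothetical helper (not part of library(uri)). Percent-encodes
%   every UTF-8 byte of Value except ASCII letters, digits and the
%   characters "-._~". The hex digits in the escapes are uppercase.
encode_unreserved(Value, Encoded) :-
    atom_codes(Value, Codes),
    phrase(utf8_codes(Codes), Bytes),   % Unicode code points -> UTF-8 bytes
    encode_bytes(Bytes, EncodedCodes),
    atom_codes(Encoded, EncodedCodes).

encode_bytes([], []).
encode_bytes([B|Bs], Out) :-
    (   unreserved(B)
    ->  Out = [B|Rest]
    ;   Hi is B >> 4,                   % high nibble
        Lo is B /\ 15,                  % low nibble
        hex_digit(Hi, H),
        hex_digit(Lo, L),
        Out = [37, H, L|Rest]           % 37 is the character code of "%"
    ),
    encode_bytes(Bs, Rest).

% Map 0..15 to the code of an uppercase hex digit.
hex_digit(N, Code) :-
    (   N < 10
    ->  Code is 0'0 + N
    ;   Code is 0'A + N - 10
    ).

% ASCII letters, digits and "-._~" are kept as they are.
unreserved(C) :- between(0'a, 0'z, C), !.
unreserved(C) :- between(0'A, 0'Z, C), !.
unreserved(C) :- between(0'0, 0'9, C), !.
unreserved(C) :- atom_codes('-._~', Extra), memberchk(C, Extra).
```

With this, `?- encode_unreserved('someone@gmx.de', E).` gives `E = 'someone%40gmx.de'`, and `'/'` becomes `'%2F'` with an uppercase hex digit.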

Thank you for your consideration.

Best wishes,

Matthias

That might actually be the same as what JavaScript encodeURIComponent does. If I understand this correctly, it encodes a string such that you can use it for any component of a URI without changing the way the URI is disassembled into its components.

If anyone here really understands this stuff, please jump in. Would be great to resolve this once and forever :)

It does indeed seem like encodeURI is incomplete by design; the MDN page for it explicitly warns that it’s not sufficient for assembling GET/POST requests on its own.

The use case for it seems to be solely encoding non-ASCII characters when all the other bits of the URI are known to be okay.


This is the use case (I am describing it because you both asked, but skip it if you don’t have time, it’s not important): I am trying to implement a so-called LTI tool provider. LTI is an interface that is used to offer foreign courses and tasks to e-learning environments such as Moodle, Canvas and so on. The e-learning environment (= tool consumer) sends an HTTP POST request to a so-called tool provider, which is basically a web service. The request has a signature, which is built from a string that consists of the term “POST”, then “&”, then the URL of the web service, then “&” again, then the encoded (!) POST parameters in alphabetical order (separated by “&”).

In other words, to verify the signature, I need to extract the POST parameters from the HTTP request, sort them, URI-encode them, connect them with the & sign as described, URI-encode the resulting string a second time (!), add the POST and URL, and then compute the signature.

You find an example in Section B.5. The parameter with the “at”-sign is the email,

user@school.edu

Step 1: %-encode the form parameters, create Name=Value pairs,

…&lis_person_contact_email_primary=user%40school.edu&…

Step 2: encode the whole string a second time

…%26lis_person_contact_email_primary%3Duser%2540school.edu%26

You can see that the percent itself is now quoted; this is because the %-encoding was done twice. Terrible, but that’s how this LTI standard works…
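
The two encoding steps can be sketched in Prolog as follows. All predicate names here are made up. The sketch also percent-encodes the URL, as the OAuth 1.0 signature base string (RFC 5849) does; I assume LTI follows that rule. The encoder keeps only ASCII letters, digits and “-._~”.

```prolog
:- use_module(library(utf8)).    % utf8_codes//1
:- use_module(library(lists)).   % memberchk/2
:- use_module(library(apply)).   % maplist/3

%!  signature_base_string(+Method, +URL, +Params, -Base) is det.
%
%   Hypothetical helper. Params is a list of Name=Value pairs.
signature_base_string(Method, URL, Params, Base) :-
    maplist(encode_pair, Params, Encoded),        % step 1: encode names/values
    msort(Encoded, Sorted),                       % sort by name, then value
    maplist(pair_atom, Sorted, Parts),
    atomic_list_concat(Parts, '&', ParamString),
    encode_unreserved(URL, EncURL),               % assumption: URL is encoded too
    encode_unreserved(ParamString, EncParams),    % step 2: encode a second time
    atomic_list_concat([Method, EncURL, EncParams], '&', Base).

encode_pair(Name=Value, EName-EValue) :-
    encode_unreserved(Name, EName),
    encode_unreserved(Value, EValue).

pair_atom(Name-Value, Atom) :-
    atomic_list_concat([Name, Value], '=', Atom).

% Percent-encode all UTF-8 bytes except ASCII letters, digits, "-._~".
encode_unreserved(Value, Encoded) :-
    atom_codes(Value, Codes),
    phrase(utf8_codes(Codes), Bytes),
    encode_bytes(Bytes, EncodedCodes),
    atom_codes(Encoded, EncodedCodes).

encode_bytes([], []).
encode_bytes([B|Bs], Out) :-
    (   unreserved(B)
    ->  Out = [B|Rest]
    ;   Hi is B >> 4,
        Lo is B /\ 15,
        hex_digit(Hi, H),
        hex_digit(Lo, L),
        Out = [37, H, L|Rest]                     % 37 is the code of "%"
    ),
    encode_bytes(Bs, Rest).

hex_digit(N, Code) :-
    (   N < 10
    ->  Code is 0'0 + N
    ;   Code is 0'A + N - 10
    ).

unreserved(C) :- between(0'a, 0'z, C), !.
unreserved(C) :- between(0'A, 0'Z, C), !.
unreserved(C) :- between(0'0, 0'9, C), !.
unreserved(C) :- atom_codes('-._~', Extra), memberchk(C, Extra).
```

For a single parameter, `?- signature_base_string('POST', 'http://example.com/launch', [lis_person_contact_email_primary='user@school.edu'], B).` yields a base string ending in `lis_person_contact_email_primary%3Duser%2540school.edu`, matching step 2 above (the URL is a placeholder). The result would then be fed to HMAC-SHA1, for example via hmac_sha/4 from library(sha).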