Multipart_open does not advance the stream's pointer

The code in multipart.c looks ahead using the stream’s buffer, but if the boundary is within the buffer, it never calls S__fillbuf() (or Sfread() or Sread()) and so the bufp never gets advanced. This leads to the following quirk:

?- open('/tmp/data.bin', read, S), stream_property(S, position(P0)), multipart_open(S, M, [boundary('------------------------76f42a85822912c2')]), read_line_to_codes(M, Codes),  stream_property(S, position(P1)).
S = <stream>(0x600001094000),
P0 = P1, P1 = '$stream_position'(0, 1, 0, 0),
M = <stream>(0x600001094100),
Codes = [...].

In particular, note that P0 = P1, even though I’ve read several hundred bytes.

Is this expected behaviour, that after reading from the (child) stream we get the same value for the position on the (parent) stream before and after the read?

Hi Matt, long time no see :slight_smile: multipart_read() calls S__fillbuf() and updated bufp at line 192. Nothing however updates the parent stream’s position data. I guess that is wrong. Not sure how much value there is in this. Note that it is a binary stream, so basically only the byte position is relevant. I think we could update that with the update of bufp?

Are you dealing with multipart documents that are embedded into other streams?

It calls S__fillbuf() but only if there’s not enough data in the stream buffer, which in my case, there is :frowning:

My use case is possibly a bit weird - and the underlying issue may have actually been fixed in a later version of Prolog. The gist is (according to my notes from 2 years ago on the ticket) that I’m dealing with HTTP post requests in keep-alive connections. In the case where the handler reads only one part of the multipart document (for example, if the same content is represented in both text and HTML), processes it, and succeeds, the socket is left with the rest of the multipart document pending on it, meaning the thread reading the next request receives a lot of unexpected stuff (the remainder of the previous request). To fix this, I wrapped the multipart handler in a predicate which gets the current stream position at the start of the handler, gets the stream position at the end, looks at the content-length (it’s more complex with chunked, but you get the idea) and then discards the unread part of the body. Obviously this doesn’t work if the stream pointer hasn’t been updated as the stream has been processed.

It might be as simple as moving stream->bufp after successfully consuming the data in the stream…

Sounds like a valid use case. I pushed FIXED: Multipart reader to update stream position data of parent stream. · SWI-Prolog/packages-http@c87298b · GitHub. This may address the issue. I don’t think it is it bufp that is the problem as that is not used by getting the position data. Surely the commit should improve on the current state :slight_smile:

If this fails, can you share the data file such that I can reproduce the behaviour?

That was quick!
I think this is right - the only caveat is that when I read from the parent stream, I still see the 0x10 at the end of the boundary (the 0x13 is consumed though).

This still works OK for my test case (I think; I’m doing testing now), but I thought I should mention it anyway.

As for the test case, the base64 encoding of the file I’m loading in my test case (winraw.txt) is

LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0yY2IxMGVjOTAyODE3MDU0DQpDb250ZW50LURpc3Bvc2l0aW9uOiBmb3JtLWRhdGE7IG5hbWU9ImZpbGUiOyBmaWxlbmFtZT0idGVzdF9hdHRhY2hlbWVudC50eHQiDQpDb250ZW50LVR5cGU6IHRleHQvcGxhaW4NCg0KMTIzNDUKYXNkZmcNCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tMmNiMTBlYzkwMjgxNzA1NA0KQ29udGVudC1EaXNwb3NpdGlvbjogZm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9InJlYWRtZS50eHQiDQpDb250ZW50LVR5cGU6IHRleHQvcGxhaW4NCg0KMTIzNDUKNjc4OTANCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tMmNiMTBlYzkwMjgxNzA1NC0tDQo=

(This is probably easier than trying to work out how to upload a text file with the correct line endings in it)

Then I run

open('/tmp/winraw.txt', read, S), stream_property(S, position(P0)), multipart_open(S, M, [boundary('------------------------2cb10ec902817054')]), read_line_to_codes(M, Codes),  stream_property(S, position(P1))

Also, I’ve found the very original ticket that lead me down this rabbit hole :slight_smile: (some text removed)

A pentest vulnerability report provisioned by … highlighted an issue with HTTP request smuggling …:

In the SWI-Prolog webserver it is the responsibility of the HTTP handler to consume bytes from the HTTP request’s body. This leaves the possibility that the request body is not fully consumed. For example:

  printf "%s\r\n"
    "GET /path1 HTTP/1.1"
    "Host: localhost"
    "Content-Length: 26"
    ""
    "GET /path2"
    "Host:localhost"
    ""
  | nc localhost 8080

Here a single HTTP request is sent to … with the request’s body containing a (smuggled) request. This results in … returning two HTTP responses:

  HTTP/1.1 200 OK
  Date: Tue, 01 Nov 2022 12:47:59 GMT
  Connection: Keep-Alive
  Content-Type: text/html; charset=UTF-8
  Content-Length: 2289

  <!DOCTYPE html>
  ...
  </html>HTTP/1.1 302 Moved Temporary

  Date: Tue, 01 Nov 2022 12:47:59 GMT
  Location: ....
  Connection: close
  Content-Length: 363
  Content-Type: text/html; charset=UTF-8

  <!DOCTYPE html>
  ...
  </html>

Not sure whether we must read the \n. That would be something for the multipart parser though, not the position logic. That seems fine:

test :-
    size_file('winraw.txt', Bytes),
    open('winraw.txt', read, S, [type(binary)]),
    stream_property(S, position(P0)),
    pp(P0),
    multipart_open(S, M, [boundary('------------------------2cb10ec902817054')]),
    read_line_to_codes(M, Codes),
    format('Got \"~s\"~n', [Codes]),
    stream_property(S, position(P1)),
    pp(P1),
    stream_position_data(byte_count, P1, Here),
    Left is Bytes-Here,
    copy_stream_data(S, current_output, Left),
    (   at_end_of_stream(S)
    ->  writeln(ok)
    ;   read_string(S, Rest, Str),
        format('Left ~D bytes: ~s~n', [Rest, Str])
    ).

Running this gives:

?- test.
P0 = '$stream_position'(0,1,0,0)
Got "Content-Disposition: form-data; name="file"; filename="test_attachement.txt""
P1 = '$stream_position'(206,1,0,206)

Content-Disposition: form-data; name="file"; filename="readme.txt"
Content-Type: text/plain

12345
67890
--------------------------2cb10ec902817054--
ok
true.

I.e., we can read the remainder of the body based on the Content-length.

Right. It is up to the handler to read the request. If you use http_read_data/4 you should be safe. If all goes right this predicate should have read all data. If there is an error, http_wrapper.pl forces the connection to close except for some exceptions that are not really errors, such as switching protocols or reply with a static file.

I think it would be worthwhile to see whether we can handle this more robustly in http_wrapper.pl, i.e., always validate that all data is read if the connection is a keep-alive connection. If there is a lot of data in this case, should we consider closing the connection instead? That is the policy that is implemented by http_open/3: when using a keep-alive connection: if the client has not read all data and it is not much, we read the remainder, else we close the connection.

P.s. The new ChatGPT assistent was of great help. See ChatGPT SWI-Prolog assistent. You need to be careful though as its first answer was completely wrong, claiming http_wrapper.pl was dealing with all of this … After pointing at the non-existence of claimed predicates and asking it to re-read the source it came with a quite adequate answer!

Pushed some changes to HTTP that causes the wrapper to deal with this. This still requires the fix to the multipart parser. In fact, without this fix the wrapper will try to read the body again, causing it to hang.

Added some simple tests with POST handlers that perform a partial read of the
request.

Awesome. Thanks!

I had a go with your LLM. You’re right - the first time I asked it a question it just made up nonsense. But then I told it to think again, and it came up with a response that was not only correct, it was kind of insightful. I learned something today :slight_smile: