Does SWI-Prolog http_open/3 have some cache control options?

Let's say a proxy is involved somewhere. How about cache control?

Now I am comparing requests between different Prolog systems.
In one Prolog system the request has the headers below, which is possibly
the wrong approach for accessing a REST API, but anyway:

cache-control: no-cache
pragma: no-cache

In http_open/3 of SWI-Prolog these are missing by default. Can I add such
headers to my request, or is there a nocache/1 option that does it automatically?
My prompt engineered POST payload is such that it asks ChatGPT to randomly
generate something. So an identical POST cannot be cached, since each POST
might give a different generative AI result, and proxies sometimes cache? Although
here I am totally mistaken. This would rather be a problem for GET.

Since RFC 2616 says POST is anyway not cached? But I don’t see cache
control for GET either. Was only looking at library(http/http_open). Don’t know
yet what library(http/http_client) says.

Use the request_header(Name = Value) option of http_open/3 to add those headers. The headers(-List) option only returns the reply headers.
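A minimal sketch of adding both headers to a request, assuming the request_header(Name = Value) option as documented for library(http/http_open). The names nocache_options/1 and open_nocache/3 are hypothetical helpers, not part of the library:

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(lists)).

% nocache_options(-Options): request headers asking caches along the
% way not to answer from a stored copy without revalidation.
% Hypothetical helper name.
nocache_options([ request_header('Cache-Control' = 'no-cache'),
                  request_header('Pragma' = 'no-cache')
                ]).

% open_nocache(+URL, -Stream, +Options): hypothetical wrapper that
% prepends the no-cache headers to the caller's http_open/3 options.
open_nocache(URL, Stream, Options) :-
    nocache_options(NoCache),
    append(NoCache, Options, AllOptions),
    http_open(URL, Stream, AllOptions).
```

Since http_open/3 adds these header lines without interpreting them, the same pattern should work for any other request header.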

It seems that in server mode, i.e. not as a client as when using http_open/3,
there are some options controlling cache handling. I find:

cache(+Boolean)
If true (default), handle If-modified-since and send modification time.
https://www.swi-prolog.org/pldoc/doc_for?object=http_reply_file/3

Is there a possibility to issue an if-modified-since request via http_open/3, or
is this server feature only for other clients?

It's a little tricky: how would http_open/3 notify the application that called
it about the outcome? Using status_code(-Code)?

if-modified-since is just another HTTP header passed by the client, same as cache-control.

http_open/3 is only used by an HTTP client, not an HTTP server.

SWI-Prolog acting as an HTTP server would respond with e.g. http_reply/4, to include a response status code. This could happen behind the scenes, via a wrapper such as http_reply_file/3 which we call.

Well, it changes the behaviour of a GET:

The If-Modified-Since request HTTP header makes the request conditional: the server sends back the requested resource, with a 200 status, only if it has been last modified after the given date. If the resource has not been modified since, the response is a 304 without any body;
If-Modified-Since - HTTP | MDN

How do you receive the 304 code via http_open/3 was my question.

See “status_code (-Code)” option, and also:

throws
error(existence_error(url, Id),Context) is raised if the HTTP result code is not in the range 200…299

… at SWI-Prolog -- The HTTP client libraries (http-clients)

The 304 code is part of Context.
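A rough sketch of a conditional GET that avoids the exception path by passing status_code(-Code). It assumes that http_open/3 reports a 304 through status_code/1 instead of raising when that option is present (worth verifying against your SWI-Prolog version), and that http_timestamp/2 from library(http/http_header) formats the date. The names fetch_if_modified/3 and conditional_reply/3 are hypothetical:

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(http/http_header)).   % http_timestamp/2

% fetch_if_modified(+URL, +Stamp, -Result): Stamp is a time stamp as
% returned by get_time/1. Result is not_modified, modified(Body) or
% status(Code). Hypothetical helper name.
fetch_if_modified(URL, Stamp, Result) :-
    http_timestamp(Stamp, Since),            % RFC 1123 date text
    setup_call_cleanup(
        http_open(URL, In,
                  [ request_header('If-Modified-Since' = Since),
                    status_code(Code)
                  ]),
        conditional_reply(Code, In, Result),
        close(In)).

% conditional_reply(+Code, +In, -Result): map the status code to an
% outcome; the body is only read for a 200 reply.
conditional_reply(304, _, not_modified) :- !.
conditional_reply(200, In, modified(Body)) :- !,
    read_string(In, _, Body).
conditional_reply(Code, _, status(Code)).
```

Without status_code/1 the caller would have to catch the existence_error and dig the code out of Context, which is why the option is used here.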

Bottom line: http_open/3 doesn’t implement caching. You need to add that on top of it, or possibly extend the library with hooks that allow plugging in a caching mechanism. You can implement it on top as http_open/3 does allow passing additional request headers and getting reply headers (either individually or all of them) as well as the status code.

Caching and cache control are two different things. I guess there is a misunderstanding. Cache control addresses the end-point that the client is accessing, and need not be implemented client side; it's usually implemented server side. Although browsers also have a client side cache, which they call “memory” cache, and a client side cache which resides on the file system.

Cache control could be abstracted so that the end-user doesn’t need to know the HTTP protocol version specific details. Currently the headers that I see in my HTTP client are from different HTTP protocol versions. Namely:

HTTP 1.0:
pragma: no-cache

HTTP 1.1:
cache-control: no-cache

Also HTTP 1.1 has more than only if-modified-since. There are like a dozen further options. I assume that an intelligent client would somehow abstract all options. And when it has received an ETag in the past for a particular resource, it could use if-none-match with this ETag. So that the end-user gets a little convenience. I guess this is also the gist of a REST client.

Then there is an already existing “memory” cache. Namely the Prolog texts that are consulted are anyway loaded into memory by SWI-Prolog. This can be seen as cache and is used as such in the reload modified function. It could give also a little bit different twist to WASM, where loading from a server first loads into an artificial file system, and then consults from there.

And then there is cache busting, where you give a resource a new name each time there is a new version. How would a Prolog system deal with that? Does the server also rewrite all use_module/1 occurrences? Does some Prolog system have such tooling?

The easiest here would possibly be to see that packs are shipped with versions encoded somewhere. Then the individual Prolog texts and other resources inside a pack can keep their names, and there is no need for name swizzling at the lower granularity level.

Except for disabling caching the other options rely on something the client remembers from a previous interaction, so you need to do some more work on the client side.
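As an illustration of that client-side work, here is a sketch that remembers an ETag per URL and sends it back as if-none-match. It assumes the header(Name, -Value) option yields the empty atom when the reply has no such field, and that a 304 is reported via status_code/1. The names known_etag/2, validator_options/2, remember_etag/2 and fetch_with_etag/2 are all hypothetical:

```prolog
:- use_module(library(http/http_open)).

:- dynamic known_etag/2.     % known_etag(URL, ETag): what the client remembers

% validator_options(+URL, -Options): add If-None-Match only if an
% ETag was remembered from a previous interaction.
validator_options(URL, [request_header('If-None-Match' = ETag)]) :-
    known_etag(URL, ETag),
    !.
validator_options(_, []).

% remember_etag(+URL, +ETag): store the ETag; '' means the reply had none.
remember_etag(_, '') :- !.
remember_etag(URL, ETag) :-
    retractall(known_etag(URL, _)),
    assertz(known_etag(URL, ETag)).

% fetch_with_etag(+URL, -Result): Result is not_modified,
% modified(Body) or status(Code).
fetch_with_etag(URL, Result) :-
    validator_options(URL, Validators),
    setup_call_cleanup(
        http_open(URL, In,
                  [ status_code(Code), header(etag, ETag) | Validators ]),
        etag_reply(Code, In, URL, ETag, Result),
        close(In)).

etag_reply(304, _, _, _, not_modified) :- !.
etag_reply(200, In, URL, ETag, modified(Body)) :- !,
    read_string(In, _, Body),
    remember_etag(URL, ETag).
etag_reply(Code, _, _, _, status(Code)).
```

The first call for a URL goes out without a validator; later calls reuse whatever ETag the server handed out last.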

If I recall correctly, the WASM compile-over-http does if-modified-since on a make/0. Unfortunately the raw content of github doesn’t send a modified header, so we can’t use that. Would be nice, as it means you can update the git repo, run ?- make. and have an updated version :slight_smile: Possibly github raw content does implement some other cache control mechanism?

GitHub itself does throttle programmatic access, either to raw or to non-raw.
It prefers to be accessed only by either the Git protocol or, when HTTP,
by its own web GUI. So this is a dead end. At least I noticed that
the last time I tried. If you want to access GitHub content via HTTP you
have to enable the GitHub.io option for your GitHub content. This will then
build a website for you, which you can access via a GitHub.io URL.

GitHub.io sends both Date and ETag. You can use either if-modified-since
or if-none-match. In the past I used the Java code below, where a remembered
last modified date can be null, indicating if-modified-since should not be used,
and a remembered ETag can be null, again indicating if-none-match should
not be used. But I just noticed that's the location in my code where I use cache
control. But it's abstracted; ultimately the UrlConnection decides the headers
for me in case MASK_OPEN_CACH is not set, and therefore no caching is desired:

        if (if_modified) {  /* conditional GET request */
            fopts.setIfModifiedSince(getLastModified());
            fopts.setIfNoneMatch(getETag());
            fopts.setFlags(fopts.getFlags() & ~OpenOpts.MASK_OPEN_CACH);
        } else {
            fopts.setFlags(fopts.getFlags() | OpenOpts.MASK_OPEN_CACH);
        }

I think I use cache control in the above, since I want the if modified response
from the originating server, not some intermediary. So the request should ripple
through all intermediate stages, private cache, server cache, managed cache, etc…

Edit 31.05.2023
The above code is an expression of some distrust of the intermediaries and
puts pressure on the originating server. The only helping hand to reduce the
pressure are the if-modified-since and if-none-match headers, which allow
the originating server to refrain from producing the content body if there was
no change. But this usage of these headers is totally debatable. Also I am
currently trying to abandon the above code and do something else, for yet
some other reasons. The problem is, and I saw this yesterday, after I had already
started this thread, that the SWI-Prolog reload modified is pretty smart. For example
if you do ensure_loaded/1 of a file A, and this file A has two
includes A1 and A2 via include/1:

A /* ensure loaded */
+---- A1 /* included */
+---- A2 /* included */

It does reconsult A if either A1 or A2 did change. Cool! Not sure whether
a conditional HTTP GET on A is useful here, since it will not show that A1 or A2
is modified. And to check A1 and A2 one could use HTTP HEAD?
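A sketch of what such a HEAD check could look like, assuming the method(head) and header(Name, -Value) options of http_open/3 and that a missing Last-Modified field is returned as the empty atom. The names last_modified/2, changed/2 and changed_urls/2 are hypothetical:

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(lists)).

% last_modified(+URL, -Text): Last-Modified header text via a HEAD
% request, or '' if the server does not send one.
last_modified(URL, Text) :-
    http_open(URL, In, [method(head), header(last_modified, Text)]),
    close(In).                    % no body to read after a HEAD

% changed(+Old, +New): treat an unknown date ('') as changed, since
% nothing can be concluded from it.
changed(_, '') :- !.
changed(Old, New) :- Old \== New.

% changed_urls(+Remembered, -Changed): Remembered is a list of
% URL-LastModifiedText pairs, e.g. for A, A1 and A2.
changed_urls(Remembered, Changed) :-
    findall(URL,
            ( member(URL-Old, Remembered),
              last_modified(URL, New),
              changed(Old, New)
            ),
            Changed).
```

This costs one round trip per included file, so A would have to be reconsulted whenever changed_urls/2 returns a non-empty list for A, A1 and A2 together.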
