GitHub itself throttles programmatic access, whether to raw or to non-raw URLs.
It prefers to be accessed either via the Git protocol or, when over HTTP,
through its own web GUI. So this is a dead end, or at least that is what I
noticed the last time I tried. If you want to access GitHub content via HTTP, you
have to enable the GitHub.io option (GitHub Pages) for your GitHub content. This
will then build a website for you, which you can access via a GitHub.io URL.
GitHub.io sends both a Last-Modified date and an ETag. You can use either
if-modified-since or if-none-match. In the past I used the Java code below, where
a remembered last modified date can be null, indicating that if-modified-since
should not be used, and a remembered ETag can be null, again indicating that
if-none-match should not be used. I just noticed that this is also the place in
my code where I use cache control. It is abstracted, though: ultimately the
UrlConnection decides the headers for me when MASK_OPEN_CACH is not set, which
means that no caching is desired:
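    if (if_modified) { /* conditional GET request */
        /* a null value means the corresponding header is not sent */
        fopts.setIfModifiedSince(getLastModified());
        fopts.setIfNoneMatch(getETag());
        /* clear the cache flag: no caching desired */
        fopts.setFlags(fopts.getFlags() & ~OpenOpts.MASK_OPEN_CACH);
    } else {
        /* set the cache flag: caching is allowed */
        fopts.setFlags(fopts.getFlags() | OpenOpts.MASK_OPEN_CACH);
    }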
I think I use cache control in the above because I want the if-modified response
to come from the originating server, not from some intermediary. So the request
should ripple through all intermediate stages: private cache, server cache,
managed cache, etc.
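For readers who do not have the OpenOpts abstraction, the same idea can be sketched with plain java.net.HttpURLConnection. This is only an illustration under assumptions, not the code used above: the class name ConditionalGet and the methods prepare and notModified are made up, and sending "Cache-Control: no-cache" is one plausible way to ask intermediaries to revalidate with the originating server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the original code base.
class ConditionalGet {

    /**
     * Prepare (but do not send) a GET request.
     * lastModified: remembered Last-Modified in millis since the epoch, or null.
     * etag: remembered ETag value including its quotes, or null.
     */
    static HttpURLConnection prepare(String url, Long lastModified, String etag)
            throws IOException {
        // openConnection() only creates the object, nothing is sent yet.
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("GET");
        boolean conditional = false;
        if (lastModified != null) {
            con.setIfModifiedSince(lastModified); // becomes If-Modified-Since
            conditional = true;
        }
        if (etag != null) {
            con.setRequestProperty("If-None-Match", etag);
            conditional = true;
        }
        if (conditional) {
            // Assumption: skip the local cache and ask caches on the way
            // to revalidate with the originating server.
            con.setUseCaches(false);
            con.setRequestProperty("Cache-Control", "no-cache");
        }
        return con;
    }

    /** Sends the request if needed; true if the answer is 304 Not Modified. */
    static boolean notModified(HttpURLConnection con) throws IOException {
        return con.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

A caller would pass the remembered validators to prepare, call notModified, and only read the body from con.getInputStream() when the result is false.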
Edit 31.05.2023
The above code expresses some distrust of the intermediaries and
puts pressure on the originating server. The only things that reduce this
pressure are the if-modified-since and if-none-match headers, which allow
the originating server to refrain from producing the content body if there was
no change. But this usage of these headers is certainly debatable. Also, I am
currently trying to abandon the above code and do something else, for yet
other reasons. The problem is, and I only saw this yesterday, after I had already
started this thread, that SWI-Prolog's reloading of modified files is pretty
smart. For example, if you load a file A via ensure_loaded/1, and this file A
has two includes A1 and A2 via include/1:
A /* ensure loaded */
+---- A1 /* included */
+---- A2 /* included */
It reconsults A if either A1 or A2 changed. Cool! I am not sure whether
a conditional HTTP GET on A is useful here, since it will not show that A1 or A2
was modified. And to check A1 and A2, one could perhaps use HTTP HEAD?
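The HEAD idea could look roughly like the sketch below. It is untested against SWI-Prolog and makes several assumptions: the class IncludeCheck, its nested Validators class and the methods head, changed and anyChanged are hypothetical names, and ETags are compared by plain string equality, which ignores the weak/strong distinction.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;

// Hypothetical helper: decide whether A must be reloaded by probing A, A1, A2.
class IncludeCheck {

    /** Validators of one resource; etag may be null, lastModified may be 0. */
    static final class Validators {
        final String etag;
        final long lastModified;
        Validators(String etag, long lastModified) {
            this.etag = etag;
            this.lastModified = lastModified;
        }
    }

    /** Fetch the current validators with a HEAD request (no body transferred). */
    static Validators head(String url) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("HEAD");
        try {
            con.getResponseCode(); // sends the request
            return new Validators(con.getHeaderField("ETag"), con.getLastModified());
        } finally {
            con.disconnect();
        }
    }

    /** Compare remembered and current validators; prefer the ETag if both have one. */
    static boolean changed(Validators remembered, Validators current) {
        if (remembered.etag != null && current.etag != null) {
            return !remembered.etag.equals(current.etag);
        }
        if (remembered.lastModified != 0 && current.lastModified != 0) {
            return remembered.lastModified != current.lastModified;
        }
        return true; // nothing comparable: assume the resource changed
    }

    /** True if any of the remembered URLs (A, A1, A2, ...) has changed. */
    static boolean anyChanged(Map<String, Validators> remembered) throws IOException {
        for (Map.Entry<String, Validators> e : remembered.entrySet()) {
            if (changed(e.getValue(), head(e.getKey()))) {
                return true;
            }
        }
        return false;
    }
}
```

This costs one HEAD request per file, so for A with includes A1 and A2 it means three round trips before deciding whether to reconsult A.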