Rate limiting library?

Hi, is there a library I can use for handling rate limiting similar to bottleneck in Node or limiter in Python?

Not that I am aware of. What exactly would it need to do? It shouldn’t be particularly hard to create such a beast. A quick look at these libraries suggest they limit the number of outstanding async requests, right?

SWI-Prolog doesn’t have async behavior in this sense. It has preemptively scheduled threads as alternative. If you use threads for doing the async requests, library(thread_pool) allows for creating a pool with certain properties. Next, you can use thread_create_in_pool/4 as replacement for thread_create/3 to create a thread with certain properties and limit the number of such threads that can be active at any point in time. If you try to create more threads than the size of the pool the pool policies decides what happens, i.e., wait for a thread to complete or raise a resource error.

The library was developed for the HTTP server, where we can associate expensive handlers with a thread pool to ensure not too many of them run concurrently. See http_handler/3

Thanks @jan , I need to simulate the following properties from the bottleneck or limiter libs. Basically I need to ensure I am making a max of 100 requests/minute whether single or multi threaded. Any further guidance would be very much appreciated!

  • bottleneck
const limiter = new Bottleneck({
	"reservoir": 100,
	// How many jobs can be executed before the limiter stops executing jobs. If reservoir reaches 0, no jobs will be executed until it is no longer 0. New jobs will still be queued up.

	"reservoirRefreshAmount": 100,
	// The value to set reservoir to when reservoirRefreshInterval is in use.

	"reservoirRefreshInterval": 60_000
	// Every reservoirRefreshInterval milliseconds, the reservoir value will be automatically updated to the value of reservoirRefreshAmount. The reservoirRefreshInterval value should be a multiple of 250 (5000 for Clustering).
});
  • limiter
@Limiter(rate=36, capacity=3600, consume=2)
// rate is the token replenishment rate per second. Tokens are automatically added every second.
// consume is the amount of tokens consumed from the token bucket upon successfully taking tokens from the bucket.
// capacity is the total amount of tokens the token bucket can hold. Token replenishment stops when this capacity is reached.

I have no experience with these APIs. They sound a bit similar to the thread pool library, but using an different algorithm to decide when a new task may be started. The thread pool library mostly limits concurrency to avoid overloading a server.

Here, it seems you can have some global variable that you raise to 100 every minute and one or more threads that take the current value and remaining time in the minute to insert a sleep and then execute a job. A global variable in the classic Prolog way is a dynamic predicate that you update using asserta/1 and retract/1. For timing you have get_time/1 to get the wall time and sleep/1 to sleep. Alternatively you can use thread_get_message/3 to wait for a message on a queue that never gets there and use the deadline option to wakeup at a specified time.

How exactly to do that depends on whether the individual jobs take more or less than the max rate. If they take too much time you’ll need enough threads to complete jobs fast enough.

You can make a nice fancy implementation by using an engine to master the scheduling rather than the assert/retract approach.

With my SWI-Prolog Solutions hat, we can offer building that for you commercially. Otherwise you’ll need to do some work or hope someone else will do it :slight_smile:

edit another nice approach is to have a message queue and a thread that will simply insert messages into the queue at a rate of 100/minute. Then you have enough “worker” threads that get a message from the queue, do their job and repeat this.

1 Like

Thank you, this is very helpful. Can you link any guides or examples on implementing the message queue solution? Maybe I can implement it myself… How difficult would this be?

My advice is to use a front-end reverse proxy server that can do both rate limiting and load balancing (and handle DOS attacks, etc.) and (if it makes sense) caching. You don’t want this kind of stuff complicating your server.

Rate limiting can’t be easily done with thread pool limits … for example, you’d want to return a 503 with Retry-after status code (or possibly 429) when overloaded.

I don’t have any specific advice for choosing such servers (and you’ll probably get them “for free” if you’re using a cloud service) but a quick Google search showed that Apache has "how-to"s for rate limiting and reverse proxying, and there are many other choices (e.g., nginx).

Thanks @peter.ludemann but just to clarify, this would be client-side rate limiting. I’m running api calls and hitting the rate limiting on the target endpoint so what I’m trying to do is rate limit my client to a certain number of max requests per minute, not defensively rate limit requests to my own server.

My apologies in misunderstanding what you were asking – I didn’t follow your links.

I’m wondering … what does your backend do if you exceed the rate limit? If it returns some kind of “rate limit exceeded” message, there are classic exponential back-off algorithms that should be easy to implement with sleep/1 and get_time/1.
If communication with your backend is via HTTP, there are rate limiters such as sphinx, which returns a header with the number of remaining requests before you hit the limit.

No worries. It’s not about what my backend or client do; it’s that the api endpoint I’m hitting will give me 429s (http error for “too many requests”) if I exceed their rate limit.

Yes,

sleep/1 and get_time/1.

This is what I had in mind. It doesn’t have to be anything nearly as fancy as the node and python libraries I referenced.

Really as long as I’m limiting my requests to max 100/minute I think I should be good.

So is the strategy

  1. Set my requests to be made in a “loop” (recursion, findall, whatever),
  2. but at the outset get the current time,
  3. Then on each iteration check current time again,
  4. compare against the time I started from
  5. if still within a minute, continue, else sleep thread?

Something like this? (with different intervals, of course):

sleeper :-
    get_time(T0),
    sleeper(#{last_time: T0, interval:5.0}).

sleeper(Time0) :-
    get_time(Tnow0),
    stamp_date_time(Tnow0, DateTimeNow0, 'UTC'),
    writeln(hello:Time0.interval:DateTimeNow0),
    get_time(Tnow),
    SleepTime is Time0.last_time - Tnow + Time0.interval,
    sleep(SleepTime),
    (   received_rate_limit_response
    ->  writeln('Rate limited'),
        Interval2 is Time0.interval * 1.5
    ;   Interval2 is max(1.0, Time0.interval *0.75)
    ),
    sleeper(#{last_time:Tnow, interval:Interval2}).

received_rate_limit_response :-
    random(R),
    R =< 0.1.

Honestly, I think even this is more complicated than I need. Think I am just going to take the diff of get_time(Time) and get_time(Time0) and then if Time_diff < 60 (or maybe less for padding) we can continue, else sleep.

But this has been extremely helpful, thank you both!!

@jan let me ask you this: How would you handle rate limiting with concurrency?

If I’m doing something like this

process_api_list(_,[],[]).
process_api_list(Count,[Json|Json_rest],[Data_im_interested_in|Data_rest]) :-
	api_call(get_thing,Json,Json_out),
    Count0 is Count+1,
	Data_im_interested_in = Json_out.thing,
	process_api_list(Count0,Json_rest,Data_rest).

my_api_calls_init(Result) :-
	api_call(get_list,List_of_stuff),
	process_api_list(1,List_of_stuff,Result).

and getting a count to track how many calls I’m making so I can sleep the thread based on elapsed time or number of calls made. This is currently being run in serial.

What if the quantity of “things” I need to pull is quite large and I would like to run these requests in parallel? How would I do that in an immutable environment where I need to synchronize these counts between threads to abide by the rate limit? In a mutable lang like python or node I would imagine you could have a global mutable variable called api_call_count = 0, spawn parallel threads to run the calls and each thread can access this variable to increment and then when it hits the limit you can sleep.

How would you do this in prolog? Would I need to do something with global variables? Thread communication?

Global variables and threads points at SWI-Prolog -- Flags

Will read, thank you!

They make a special point to say that “The predicate flag/3 is the oldest way to store…”. Is that just a fun fact or is the implication that there’s something outdated about it and there’s a more modern approach?

Its a non-standard way to have (thread-safe) global variables that have a simple atomic value. The clasical way is to use a dynamic predicate and asserta/1 combined with retract/1. This is a bit naughty though as the combination is not atomic and thus you need to protect the update and access using with_mutex/2.

The simplest and most elegant route is still to use a message queue. Than you have something like

setup(Rate, Queue) :-
     message_queue_create(Q, []),
     thread_create(feed(Rate, Q), []).

feed(Rate, Q) :-
   Sleep is 1/Rate,
   repeat,
       thread_send_message(Q, token),
       sleep(Sleep),
       fail.

Now, each of the threads participating in the rate limit can simply do

 thread_get_message(Q, token),
 <do work>

in a loop.

You can (and probably should) make many variations, e.g., using a term that describes the work to do in the token.