In the previous post, I wrote about parallelism, and towards the end, mentioned Ray, the ultimate python package for parallelism.
I played around with it today, and wanted to give my take as well as describe the new challenge that I am currently stuck on.
Ray is just so much fun. I am so glad I found it.
Want to define a function that can execute in the background?
@ray.remote def add(x, y): return x + y await add.remote(1, 2)
Just how sick is this? Seriously? That function could also execute in a distributed manner, on some other machine entirely if you wanted to.
How about Actors? That must be more challenging to write, right? … WRONG.
@ray.remote class Actor: def __init__(self): self.state = 0 def increment(self, x): self.state += x return self.state actor = Actor.remote() await actor.increment.remote(1) print(await actor.increment.remote(1)) # Output: 2
Sateful actors, with a few simple lines of code. Just love it.
So far, so good. Where is the catch?
I haven’t been able to find a built-in approach to dispatch work to a pool of workers and await each result separately.
I mean, there is
ray.util.ActorPool, however, its API is very limited. As far as I can tell, the main APIs are
ActorPool.map makes sense if you have a bunch of data to process, but
ActorPool.submit is more for one-off tasks.
Given the above,
ActorPool.map definitely won’t do for my usecase which involves submitting work that is coming from an API request.
ActorPool.submit sounded interesting, however, I have no idea why they chose to make it not return anything. That means, once you submit a task, you have to go search somewhere else for the result :ugh:.
I will now try to conjure up a solution as I go.
At first, I noticed that
ActorPool had one more cool API up its sleeve.
ActorPool.pop_idle. That would work for me, if I can just try to pop an idle actor, do my work on it, and return the result. Unfortunately, if there isn’t an idle actor at the time of the call, it will return
None. I wanted a blocking solution instead.
After reading the source for
ActorPool, I realized the usecase I want to use the actors for is completely different. For example, I would like to keep the
… After a bit of pondering, I realized I could just use a blocking queue to achieve the desired result. Pop an available actor, use it, then put it back. New API requests that come in will block until an actor is available in the queue.
Of course, I will prepare my own abstraction around the queue of actors to provide a simple API similar to
How far will this streak go?