TL;DR
Use Python's completely undocumented ThreadPool to parallelize non-bytecode (read: I/O-bound) work on shared memory! Like so:
from multiprocessing.pool import ThreadPool
pool = ThreadPool()
pool.map(io_heavy_work, data_in_shared_memory)
In contrast to multiprocessing.Pool, where each process gets a copy of the second argument of map and the main process is responsible for collecting the results map returns, the io_heavy_work function here may modify the data instances it receives, since this is threading with shared memory.
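To make the TL;DR concrete, here is a minimal, self-contained sketch of the same pattern; the fetch-a-page body of io_heavy_work and the example.com URL are made-up placeholders, not part of the snippet above. ThreadPool exposes the same interface as multiprocessing.Pool, including the context-manager protocol:

from multiprocessing.pool import ThreadPool
import urllib.request

def io_heavy_work(item):
    # Hypothetical I/O-bound task: fetch a page and store the result
    # on the shared dict, in place.
    with urllib.request.urlopen(item["url"]) as resp:
        item["body"] = resp.read()

data_in_shared_memory = [{"url": "https://example.com"}]

with ThreadPool(8) as pool:   # 8 worker threads; tune to your workload
    pool.map(io_heavy_work, data_in_shared_memory)

print(len(data_in_shared_memory[0]["body"]))  # the mutation is visible here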
Tutorial
There are two main reasons for writing multi-threaded code:
- high-performance shared-memory programs (including games),
- programs that spend a lot of time waiting for many different I/O tasks.
In Python, the GIL restricts bytecode execution to a single thread at a time, so the first use case is hopeless and only the second one remains. The second still works well, because CPython releases the GIL while a thread blocks on I/O, so waiting threads genuinely overlap.
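A quick way to see the I/O case in action (a sketch of mine, not from the original post): time.sleep() releases the GIL while blocking, so it is a decent stand-in for a network wait.

import time
from multiprocessing.pool import ThreadPool

def fake_io(_):
    time.sleep(1)  # stands in for a blocking network call

start = time.time()
with ThreadPool(10) as pool:
    pool.map(fake_io, range(10))
print(f"10 one-second waits took {time.time() - start:.1f}s")  # ~1s, not ~10s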
I recently had exactly such a workload, specifically a web-forum crawler. I prototyped all the crawling code in a single-threaded fashion, something like this:
class Subject:
    def __init__(self, url):
        self.url = url

    def update(self):
        # Look for new replies and store them somewhere
        ...

s = Subject(url)
while True:
    s.update()
Now imagine you've got a lot of urls, so many, in fact, that running each in its own thread doesn't make sense. In addition, update spends most of its time waiting for replies from the server. This is a perfect illustration of the second case I mentioned earlier.
If Python had a parallel for, we could simply do something like this:
subjects = [Subject(url) for url in all_my_urls]
while True:
    parallel for s in subjects:
        s.update()
Unfortunately, this doesn't exist. Instead, people are invariably pointed to multiprocessing's Pool for such operations:
import multiprocessing

pool = multiprocessing.Pool()
subjects = [Subject(url) for url in all_my_urls]
while True:
    pool.map(lambda s: s.update(), subjects)
Well, beautiful! Except that this won't work. The reason is that every process gets its own, separate copy of subjects, so the objects in the main process are never updated. (With a lambda it actually fails even earlier: multiprocessing has to pickle the mapped function to ship it to the workers, and lambdas can't be pickled.) The usual way of solving this is to completely restructure your code so that the workers return their results and the main process collects them, which works, but costs a nontrivial amount of effort and obscures the code unnecessarily.
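To see the copy semantics in isolation, here is a tiny sketch (the Counter class and bump function are invented for illustration):

import multiprocessing

class Counter:
    def __init__(self):
        self.n = 0

def bump(c):          # must be a top-level function so it can be pickled
    c.n += 1          # mutates the worker's private copy, not the original
    return c.n

if __name__ == "__main__":
    counters = [Counter() for _ in range(4)]
    with multiprocessing.Pool() as pool:
        print(pool.map(bump, counters))   # [1, 1, 1, 1]
    print([c.n for c in counters])        # still [0, 0, 0, 0]

The return values make it back to the main process, but the mutations don't.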
If only we had threads (and thus shared memory) instead of processes, this would just work.
Well, it turns out that Python comes with a thread pool! It's just completely undocumented, because nobody has bothered to document it so far. Check this out:
from multiprocessing.pool import ThreadPool

pool = ThreadPool()
subjects = [Subject(url) for url in all_my_urls]
while True:
    # The lambda is fine here: nothing is pickled between threads.
    pool.map(lambda s: s.update(), subjects)
This now actually works, because all threads share the very same subjects list and thus mutate the same Subject instances.
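As a counterpart to the Pool sketch above, the same invented Counter example behaves as hoped with ThreadPool; and since ThreadPool inherits Pool's interface, it supports the same context-manager protocol for shutdown:

from multiprocessing.pool import ThreadPool

class Counter:
    def __init__(self):
        self.n = 0

def bump(c):
    c.n += 1          # every thread mutates the one shared object
    return c.n

counters = [Counter() for _ in range(4)]
with ThreadPool() as pool:            # shuts the pool down on exit
    print(pool.map(bump, counters))   # [1, 1, 1, 1]
print([c.n for c in counters])        # now [1, 1, 1, 1] -- the mutations stick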
Beautiful, isn't it?