21
loading...
This website collects cookies to deliver better user experience
map
function, you can also use the trick to pass multiple parameters to Executor.map
from the concurrent.futures
and multiprocessing.Pool
.The Problem With the map()
Function
1.1. Solution 1 - Mapping Multiple Arguments with itertools.starmap()
1.2. Solution 2 - Using functools.partial
to “Freeze” the Arguments
1.3. Solution 3 - Mapping Multiple Arguments by "Repeating" Them
Problem 2: Passing Multiple Parameters to multiprocessing Pool.map
2.1. Using pool.starmap
2.2. Using partial()
3.1. Using partial()
3.2. Using repeat()
sum_four
that takes 4 arguments and returns their sum.>>> def sum_four(a, b, c, d):
return a + b + c + d
>>> a, b, c = 1, 2, 3
>>> sum_four(a=a, b=b, c=c, d=1)
7
>>> sum_four(a=a, b=b, c=c, d=2)
8
>>> sum_four(a=a, b=b, c=c, d=3)
9
>>> sum_four(a=a, b=b, c=c, d=4)
10
map
, because you like functional programming, or maybe because you come from a language that encourages this paradigm. d
varies, we could store all potential values for d
we want to test in a list like this all_d_values = [1, 2, 3, 4]
. map
function and it takes only one element, what can you do?map
function but use itertools.starmap
instead. This function will take a function as arguments and an iterable of tuples. Then, starmap
will iterate over each tuple t
and call the function by unpacking the arguments, like this for t in tuples: function(*t)
.>>> import itertools
>>> all_d_values = [1, 2, 3, 4]
>>> items = ((a, b, c, d) for d in all_d_values)
>>> list(items)
[(1, 2, 3, 1), (1, 2, 3, 2), (1, 2, 3, 3), (1, 2, 3, 4)]
>>> list(itertools.starmap(sum_four, items))
[7, 8, 9, 10]
items
as a generator, this way we only hold in memory the element we’ll be processing.partial()
will "freeze" some portion of a function’s arguments and/or keywords resulting in a new function with a simplified signature.>>> import functools
>>> partial_sum_four = functools.partial(sum_four, a, b, c)
>>> partial_sum_four(3)
9
>>> list(map(partial_sum_four, all_d_values))
[7, 8, 9, 10]
itertools.repeat()
. map()
's signature, it accepts a function and multiple iterables, map(function, iterable, ...)
. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted.
a
, b
and c
infitnite iterables by using itertools.repeat()
. As soon as all_d_values
is exhausted, which is the shortest iterable, map()
will stop.>>> import itertools
>>> list(map(sum_four, itertools.repeat(a), itertools.repeat(b), itertools.repeat(c), all_d_values))
[7, 8, 9, 10]
repeat()
is roughly equivalent to:>>> list(map(sum_four, [1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], all_d_values))
[7, 8, 9, 10]
repeat
produces the elements on the go. In fact, it returns a repeatobject
, not list
[ref] .map()
. The only difference is that we need to pass multiple arguments to the multiprocessing's pool map.sum_four
in parallel using processes. Pool.map
only accepts one iterable. This means we cannot use repeat()
here. Let's see the alternatives.Pool
class from multiprocessing
module implements a starmap
function that works the same way as its counterpart from the itertools
module.>>> from multiprocessing import Pool
>>> import itertools
>>> def sum_four(a, b, c, d):
return a + b + c + d
>>> a, b, c = 1, 2, 3
>>> all_d_values = [1, 2, 3, 4]
>>> items = [(a, b, c, d) for d in all_d_values]
>>> items
[(1, 2, 3, 1), (1, 2, 3, 2), (1, 2, 3, 3), (1, 2, 3, 4)]
>>> with Pool(processes=4) as pool:
res = pool.starmap(sum_four, items)
>>> res
[7, 8, 9, 10]
partial
function.>>> import functools
>>> partial_sum_four = functools.partial(sum_four, a, b, c)
>>> with Pool(processes=4) as pool:
res = pool.map(partial_sum_four, all_d_values)
>>> res
[7, 8, 9, 10]
concurrent.futures
module provides a high-level interface called Executor
to run callables asynchronously. ThreadPoolExecutor
and a ProcessPoolExecutor
. multiprocessing.Pool
, a Executor
does not have a startmap()
function. However, its map()
implementation supports multiple iterables, which allow us to use repeat()
. Another difference is that Executor.map
returns a generator, not a list.partial
we use the map
method from ProcessPoolExecutor
like a regular map function. Since they both share the same interface, you can do the same interchangeably with a ThreadPoolExecutor
>>> from concurrent.futures import ProcessPoolExecutor
>>> import functools
>>> def sum_four(a, b, c, d):
return a + b + c + d
>>> a, b, c = 1, 2, 3
>>> all_d_values = [1, 2, 3, 4]
>>> partial_sum_four = functools.partial(sum_four, a, b, c)
>>> with ProcessPoolExecutor(max_workers=4) as pool:
res = list(pool.map(partial_sum_four, all_d_values))
>>> res
[7, 8, 9, 10]
itertools.repeat
to get the job done like the previous solutions.>>> from concurrent.futures import ProcessPoolExecutor
>>> from itertools import repeat
>>> def sum_four(a, b, c, d):
return a + b + c + d
>>> a, b, c = 1, 2, 3
>>> all_d_values = [1, 2, 3, 4]
>>> with ProcessPoolExecutor(max_workers=4) as pool:
res = list(pool.map(sum_four, repeat(a), repeat(b), repeat(c), all_d_values))
>>> res
[7, 8, 9, 10]
map()
function makes Python feel like a functional programming language. map()
is available not only as a built-in function but also as methods in the multiprocessing
and concurrent.futures
module. In this article, I showed what I do to map functions that take several arguments.