Distributed execution

unified_map.univariate.distributed.dask(function, argument_list)

Apply a univariate function to a list of arguments in a distributed fashion.

Uses Dask’s delayed() function to build a task graph and its compute() function, executed against the connected cluster, to calculate the results.
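
The underlying pattern can be sketched as follows. This is an illustration of the delayed()/compute() mechanism, not the library’s actual source, and the helper name distributed_map_with_dask is made up for the example:

import dask
from dask import delayed

def distributed_map_with_dask(function, argument_list):
    # Wrap each call as a lazy task; nothing is executed yet.
    tasks = [delayed(function)(arg) for arg in argument_list]
    # compute() submits the task graph to the connected scheduler
    # and blocks until all results are available.
    results = dask.compute(*tasks)
    return list(results)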

Parameters:
  • function – A callable object that accepts one argument
  • argument_list – An iterable object of input arguments
Returns:

List of output results

Raises:

ConnectionError – If no connection to a Dask scheduler was established.

Example

>>> def square(x):
...     return x**2
...
>>> dask(square, [1, 2, 3, 4, 5])
[1, 4, 9, 16, 25]

Note

Requires that a connection to a scheduler has been established; see Cluster setup and Dask cluster setup.
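
A minimal usage sketch, assuming a Dask scheduler is reachable at tcp://127.0.0.1:8786 and that creating a dask.distributed.Client is enough to register the connection. The address is a placeholder; the library’s own connection helpers are described in the Cluster setup and Dask cluster setup sections:

from dask.distributed import Client
from unified_map.univariate import distributed

# Connect to a running scheduler; the address below is an assumed placeholder.
client = Client('tcp://127.0.0.1:8786')

def square(x):
    return x**2

print(distributed.dask(square, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]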

unified_map.univariate.distributed.spark(function, argument_list)

Apply a univariate function to a list of arguments in a distributed fashion.

Uses the map() and collect() methods provided by an Apache Spark resilient distributed dataset (RDD).
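
An illustrative sketch of that pattern (not the library’s source; the helper name and the sc handle are assumptions for the example):

def distributed_map_with_spark(sc, function, argument_list):
    # Distribute the inputs across the cluster as an RDD.
    rdd = sc.parallelize(list(argument_list))
    # map() applies the function to each element; collect() gathers
    # the results back to the driver as a Python list.
    return rdd.map(function).collect()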

Parameters:
  • function – A callable object that accepts one argument
  • argument_list – An iterable object of input arguments
Returns:

List of output results

Raises:

ConnectionError – If no connection (“context”) to a Spark scheduler (“master”) was established.

Example

>>> def square(x):
...     return x**2
...
>>> spark(square, [1, 2, 3, 4, 5])
[1, 4, 9, 16, 25]

Note

Requires that a connection to a scheduler has been established; see Cluster setup and Spark cluster setup.
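
A minimal usage sketch, assuming a Spark master is reachable at spark://127.0.0.1:7077 and that creating a SparkContext is enough to register the connection. The master URL and application name are placeholders; the library’s own connection helpers are described in the Cluster setup and Spark cluster setup sections:

from pyspark import SparkContext
from unified_map.univariate import distributed

# Connect to a running master; the URL below is an assumed placeholder.
sc = SparkContext(master='spark://127.0.0.1:7077', appName='unified_map_demo')

def square(x):
    return x**2

print(distributed.spark(square, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]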
