sklearn.utils.parallel_backend#
- sklearn.utils.parallel_backend(backend, n_jobs=-1, inner_max_num_threads=None, **backend_params)[source]#
Change the default backend used by Parallel inside a with block.
Warning

It is advised to use the parallel_config context manager instead, which allows more fine-grained control over the backend configuration.

If backend is a string, it must match a previously registered implementation using the register_parallel_backend function.

By default the following backends are available:
‘loky’: single-host, process-based parallelism (used by default),
‘threading’: single-host, thread-based parallelism,
‘multiprocessing’: legacy single-host, process-based parallelism.
‘loky’ is recommended to run functions that manipulate Python objects. ‘threading’ is a low-overhead alternative that is most efficient for functions that release the Global Interpreter Lock: e.g. I/O-bound code or CPU-bound code in a few calls to native code that explicitly releases the GIL. Note that on some rare systems (such as Pyodide), multiprocessing and loky may not be available, in which case joblib defaults to threading.
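A minimal sketch of the ‘threading’ backend on an I/O-bound workload, where the GIL is released during the wait so threads can overlap (the fetch function is a stand-in for real I/O):

```python
import time

from joblib import Parallel, delayed, parallel_backend


def fetch(i):
    # Simulated I/O-bound work: time.sleep releases the GIL,
    # so worker threads can overlap these waits.
    time.sleep(0.01)
    return i * 2


# 'threading' avoids process start-up and serialization overhead,
# which suits many short, GIL-releasing tasks like this one.
with parallel_backend("threading", n_jobs=4):
    results = Parallel()(delayed(fetch)(i) for i in range(8))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

For CPU-bound pure-Python functions, which hold the GIL, ‘loky’ remains the better choice.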
You can also use the Dask joblib backend to distribute work across machines. This works well with scikit-learn estimators that have the n_jobs parameter, for example:

>>> import joblib
>>> from sklearn.model_selection import GridSearchCV
>>> from dask.distributed import Client, LocalCluster

>>> # create a local Dask cluster
>>> cluster = LocalCluster()
>>> client = Client(cluster)
>>> grid_search = GridSearchCV(estimator, param_grid, n_jobs=-1)
>>> with joblib.parallel_backend("dask", scatter=[X, y]):
...     grid_search.fit(X, y)
It is also possible to use the distributed ‘ray’ backend for distributing the workload to a cluster of nodes. To use the ‘ray’ joblib backend, add the following lines:

>>> from ray.util.joblib import register_ray
>>> register_ray()
>>> with parallel_backend("ray"):
...     print(Parallel()(delayed(neg)(i + 1) for i in range(5)))
[-1, -2, -3, -4, -5]
Alternatively the backend can be passed directly as an instance.
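For illustration, a minimal sketch of passing a backend instance rather than a registered name string. It assumes the ThreadingBackend class re-exported by the joblib.parallel module; other backend classes would be passed the same way:

```python
from joblib import Parallel, delayed, parallel_backend
from joblib.parallel import ThreadingBackend  # backend class, assumed available here


def square(x):
    return x * x


# Hand parallel_backend a backend *instance* instead of a name string.
with parallel_backend(ThreadingBackend()):
    squares = Parallel()(delayed(square)(i) for i in range(4))

print(squares)  # [0, 1, 4, 9]
```

Passing an instance is useful when the backend needs constructor arguments that the string form cannot express.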
By default all available workers will be used (n_jobs=-1) unless the caller passes an explicit value for the n_jobs parameter.

This is an alternative to passing a backend='backend_name' argument to the Parallel class constructor. It is particularly useful when calling into library code that uses joblib internally but does not expose the backend argument in its own API.

>>> from operator import neg
>>> with parallel_backend('threading'):
...     print(Parallel()(delayed(neg)(i + 1) for i in range(5)))
...
[-1, -2, -3, -4, -5]
Joblib also tries to limit oversubscription by limiting the number of threads usable in some third-party library threadpools like OpenBLAS, MKL or OpenMP. The default limit in each worker is set to max(cpu_count() // effective_n_jobs, 1), but this limit can be overridden with the inner_max_num_threads argument, which will be used to set this limit in the child processes.

New in version 0.10.
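As a sketch of inner_max_num_threads, the snippet below caps each loky worker's native threadpools at one thread; it assumes joblib propagates the cap to child processes through the standard threadpool environment variables (OMP_NUM_THREADS and friends), which the worker then reads back:

```python
from joblib import Parallel, delayed, parallel_backend


def worker_thread_limit(_):
    # Read back the limit joblib applied in this child process via
    # the OMP_NUM_THREADS environment variable (an assumption about
    # how the cap is propagated).
    import os
    return os.environ.get("OMP_NUM_THREADS")


# Cap native threadpools (OpenBLAS, MKL, OpenMP) at one thread per
# worker, so 2 workers x N BLAS threads cannot oversubscribe the CPUs.
with parallel_backend("loky", n_jobs=2, inner_max_num_threads=1):
    limits = Parallel()(delayed(worker_thread_limit)(i) for i in range(2))

print(limits)  # each worker reports the capped limit
```

Without the cap, each worker would default to max(cpu_count() // effective_n_jobs, 1) threads, as described above.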
See also
joblib.parallel_config
context manager to change the backend configuration.