`sklearn.preprocessing`.MultiLabelBinarizer#

class sklearn.preprocessing.MultiLabelBinarizer(*, classes=None, sparse_output=False)[source]#

Transform between iterable of iterables and a multilabel format.

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

Parameters:

classesarray-like of shape (n_classes,), default=None: Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).
sparse_outputbool, default=False: Set to True if output binary array is desired in CSR sparse format.

Attributes:

classes_ndarray of shape (n_classes,): A copy of the classes parameter when provided. Otherwise it corresponds to the sorted set of classes found when fitting.

See also

OneHotEncoder: Encode categorical features using a one-hot aka one-of-K scheme.

Examples

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])

>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

A common mistake is to pass in a list, which leads to the following issue:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
    'y'], dtype=object)

To correct this, the list of labels should be passed in as:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)

Methods

`fit`(y)	Fit the label sets binarizer, storing classes_.
`fit_transform`(y)	Fit the label sets binarizer and transform the given label sets.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(yt)	Transform the given indicator matrix into label sets.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(y)	Transform the given label sets.

fit(y)[source]#

Fit the label sets binarizer, storing classes_.

Parameters:

yiterable of iterables: A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns:

selfobject: Fitted estimator.

fit_transform(y)[source]#

Fit the label sets binarizer and transform the given label sets.

Parameters:

yiterable of iterables: A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns:

y_indicator{ndarray, sparse matrix} of shape (n_samples, n_classes): A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise. Sparse matrix will be of CSR format.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

inverse_transform(yt)[source]#

Transform the given indicator matrix into label sets.

Parameters:

yt{ndarray, sparse matrix} of shape (n_samples, n_classes): A matrix containing only 1s ands 0s.

Returns:

ylist of tuples: The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform{“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

New in version 1.4: "polars" option was added.

Returns:

selfestimator instance: Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(y)[source]#

Transform the given label sets.

Parameters:

yiterable of iterables: A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns:

y_indicatorarray or CSR matrix, shape (n_samples, n_classes): A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.

sklearn.preprocessing.MultiLabelBinarizer#

`sklearn.preprocessing`.MultiLabelBinarizer#