.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/multiclass/plot_multiclass_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_multiclass_plot_multiclass_overview.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_multiclass_plot_multiclass_overview.py:


===============================================
Overview of multiclass training meta-estimators
===============================================

In this example, we discuss the problem of classification when the target
variable is composed of more than two classes. This is called multiclass
classification.

In scikit-learn, all estimators support multiclass classification out of the
box: the most sensible strategy was implemented for the end-user. The
:mod:`sklearn.multiclass` module implements various strategies that one can use
for experimenting or developing third-party estimators that only support binary
classification.

:mod:`sklearn.multiclass` includes OvO/OvR strategies used to train a
multiclass classifier by fitting a set of binary classifiers (the
:class:`~sklearn.multiclass.OneVsOneClassifier` and
:class:`~sklearn.multiclass.OneVsRestClassifier` meta-estimators). This example
will review them.

.. GENERATED FROM PYTHON SOURCE LINES 24-30

The Yeast UCI dataset
---------------------

In this example, we use a UCI dataset [1]_, generally referred as the Yeast
dataset. We use the :func:`sklearn.datasets.fetch_openml` function to load
the dataset from OpenML.

.. GENERATED FROM PYTHON SOURCE LINES 30-34

.. code-block:: Python

    from sklearn.datasets import fetch_openml

    X, y = fetch_openml(data_id=181, as_frame=True, return_X_y=True)


.. GENERATED FROM PYTHON SOURCE LINES 35-37

To know the type of data science problem we are dealing with, we can check
the target for which we want to build a predictive model.

.. GENERATED FROM PYTHON SOURCE LINES 37-39

.. code-block:: Python

    y.value_counts().sort_index()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    class_protein_localization
    CYT    463
    ERL      5
    EXC     35
    ME1     44
    ME2     51
    ME3    163
    MIT    244
    NUC    429
    POX     20
    VAC     30
    Name: count, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 40-71

We see that the target is discrete and composed of 10 classes. We therefore
deal with a multiclass classification problem.

Strategies comparison
---------------------

In the following experiment, we use a
:class:`~sklearn.tree.DecisionTreeClassifier` and a
:class:`~sklearn.model_selection.RepeatedStratifiedKFold` cross-validation
with 3 splits and 5 repetitions.

We compare the following strategies:

* :class:~sklearn.tree.DecisionTreeClassifier can handle multiclass
  classification without needing any special adjustments. It works by breaking
  down the training data into smaller subsets and focusing on the most common
  class in each subset. By repeating this process, the model can accurately
  classify input data into multiple different classes.
* :class:`~sklearn.multiclass.OneVsOneClassifier` trains a set of binary
  classifiers where each classifier is trained to distinguish between
  two classes.
* :class:`~sklearn.multiclass.OneVsRestClassifier`: trains a set of binary
  classifiers where each classifier is trained to distinguish between
  one class and the rest of the classes.
* :class:`~sklearn.multiclass.OutputCodeClassifier`: trains a set of binary
  classifiers where each classifier is trained to distinguish between
  a set of classes from the rest of the classes. The set of classes is
  defined by a codebook, which is randomly generated in scikit-learn. This
  method exposes a parameter `code_size` to control the size of the codebook.
  We set it above one since we are not interested in compressing the class
  representation.

.. GENERATED FROM PYTHON SOURCE LINES 71-93

.. code-block:: Python

    import pandas as pd

    from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
    from sklearn.multiclass import (
        OneVsOneClassifier,
        OneVsRestClassifier,
        OutputCodeClassifier,
    )
    from sklearn.tree import DecisionTreeClassifier

    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)

    tree = DecisionTreeClassifier(random_state=0)
    ovo_tree = OneVsOneClassifier(tree)
    ovr_tree = OneVsRestClassifier(tree)
    ecoc = OutputCodeClassifier(tree, code_size=2)

    cv_results_tree = cross_validate(tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)


.. GENERATED FROM PYTHON SOURCE LINES 94-96

We can now compare the statistical performance of the different strategies.
We plot the score distribution of the different strategies.

.. GENERATED FROM PYTHON SOURCE LINES 96-113

.. code-block:: Python

    from matplotlib import pyplot as plt

    scores = pd.DataFrame(
        {
            "DecisionTreeClassifier": cv_results_tree["test_score"],
            "OneVsOneClassifier": cv_results_ovo["test_score"],
            "OneVsRestClassifier": cv_results_ovr["test_score"],
            "OutputCodeClassifier": cv_results_ecoc["test_score"],
        }
    )
    ax = scores.plot.kde(legend=True)
    ax.set_xlabel("Accuracy score")
    ax.set_xlim([0, 0.7])
    _ = ax.set_title(
        "Density of the accuracy scores for the different multiclass strategies"
    )


.. image-sg:: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_001.png
   :alt: Density of the accuracy scores for the different multiclass strategies
   :srcset: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 114-130

At a first glance, we can see that the built-in strategy of the decision
tree classifier is working quite well. One-vs-one and the error-correcting
output code strategies are working even better. However, the
one-vs-rest strategy is not working as well as the other strategies.

Indeed, these results reproduce something reported in the literature
as in [2]_. However, the story is not as simple as it seems.

The importance of hyperparameters search
----------------------------------------

It was later shown in [3]_ that the multiclass strategies would show similar
scores if the hyperparameters of the base classifiers are first optimized.

Here we try to reproduce such result by at least optimizing the depth of the
base decision tree.

.. GENERATED FROM PYTHON SOURCE LINES 130-160

.. code-block:: Python

    from sklearn.model_selection import GridSearchCV

    param_grid = {"max_depth": [3, 5, 8]}
    tree_optimized = GridSearchCV(tree, param_grid=param_grid, cv=3)
    ovo_tree = OneVsOneClassifier(tree_optimized)
    ovr_tree = OneVsRestClassifier(tree_optimized)
    ecoc = OutputCodeClassifier(tree_optimized, code_size=2)

    cv_results_tree = cross_validate(tree_optimized, X, y, cv=cv, n_jobs=2)
    cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
    cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)

    scores = pd.DataFrame(
        {
            "DecisionTreeClassifier": cv_results_tree["test_score"],
            "OneVsOneClassifier": cv_results_ovo["test_score"],
            "OneVsRestClassifier": cv_results_ovr["test_score"],
            "OutputCodeClassifier": cv_results_ecoc["test_score"],
        }
    )
    ax = scores.plot.kde(legend=True)
    ax.set_xlabel("Accuracy score")
    ax.set_xlim([0, 0.7])
    _ = ax.set_title(
        "Density of the accuracy scores for the different multiclass strategies"
    )

    plt.show()


.. image-sg:: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_002.png
   :alt: Density of the accuracy scores for the different multiclass strategies
   :srcset: /auto_examples/multiclass/images/sphx_glr_plot_multiclass_overview_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 161-202

We can see that once the hyperparameters are optimized, all multiclass
strategies have similar performance as discussed in [3]_.

Conclusion
----------

We can get some intuition behind those results.

First, the reason for which one-vs-one and error-correcting output code are
outperforming the tree when the hyperparameters are not optimized relies on
fact that they ensemble a larger number of classifiers. The ensembling
improves the generalization performance. This is a bit similar why a bagging
classifier generally performs better than a single decision tree if no care
is taken to optimize the hyperparameters.

Then, we see the importance of optimizing the hyperparameters. Indeed, it
should be regularly explored when developing predictive models even if
techniques such as ensembling help at reducing this impact.

Finally, it is important to recall that the estimators in scikit-learn
are developed with a specific strategy to handle multiclass classification
out of the box. So for these estimators, it means that there is no need to
use different strategies. These strategies are mainly useful for third-party
estimators supporting only binary classification. In all cases, we also show
that the hyperparameters should be optimized.

References
----------

  .. [1] https://archive.ics.uci.edu/ml/datasets/Yeast

  .. [2] `"Reducing multiclass to binary: A unifying approach for margin classifiers."
     Allwein, Erin L., Robert E. Schapire, and Yoram Singer.
     Journal of machine learning research 1
     Dec (2000): 113-141.
     <https://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf>`_.

  .. [3] `"In defense of one-vs-all classification."
     Journal of Machine Learning Research 5
     Jan (2004): 101-141.
     <https://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf>`_.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 20.152 seconds)


.. _sphx_glr_download_auto_examples_multiclass_plot_multiclass_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/multiclass/plot_multiclass_overview.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/multiclass/plot_multiclass_overview.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_multiclass_overview.ipynb <plot_multiclass_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_multiclass_overview.py <plot_multiclass_overview.py>`


.. include:: plot_multiclass_overview.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_