.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_successive_halving_iterations.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_successive_halving_iterations.py:


Successive Halving Iterations
=============================

This example illustrates how a successive halving search
(:class:`~sklearn.model_selection.HalvingGridSearchCV` and
:class:`~sklearn.model_selection.HalvingRandomSearchCV`)
iteratively chooses the best parameter combination out of
multiple candidates.

.. GENERATED FROM PYTHON SOURCE LINES 12-23

.. code-block:: Python


    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy.stats import randint

    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.experimental import enable_halving_search_cv  # noqa
    from sklearn.model_selection import HalvingRandomSearchCV

.. GENERATED FROM PYTHON SOURCE LINES 24-26

We first define the parameter space and train a
:class:`~sklearn.model_selection.HalvingRandomSearchCV` instance.

.. GENERATED FROM PYTHON SOURCE LINES 26-46

.. code-block:: Python


    rng = np.random.RandomState(0)

    X, y = datasets.make_classification(n_samples=400, n_features=12, random_state=rng)

    clf = RandomForestClassifier(n_estimators=20, random_state=rng)

    param_dist = {
        "max_depth": [3, None],
        "max_features": randint(1, 6),
        "min_samples_split": randint(2, 11),
        "bootstrap": [True, False],
        "criterion": ["gini", "entropy"],
    }

    rsh = HalvingRandomSearchCV(
        estimator=clf, param_distributions=param_dist, factor=2, random_state=rng
    )
    rsh.fit(X, y)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    HalvingRandomSearchCV(estimator=RandomForestClassifier(n_estimators=20,
                                                               random_state=RandomState(MT19937) at 0x7F76A0C36440),
                              factor=2,
                              param_distributions={'bootstrap': [True, False],
                                                   'criterion': ['gini', 'entropy'],
                                                   'max_depth': [3, None],
                                                   'max_features': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x7f76a0d3ca00>,
                                                   'min_samples_split': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x7f76cc708310>},
                              random_state=RandomState(MT19937) at 0x7F76A0C36440)



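The iterative selection described at the top of this example can be sketched
in a few lines of plain Python. This is a simplified illustration, not
scikit-learn's implementation: `successive_halving` and `noisy_score` are
hypothetical helpers, the scores are random stand-ins for cross-validated
scores, and the loop simply stops once at most `factor` candidates remain
(scikit-learn instead fixes the number of iterations up front from the
resource budget).

.. code-block:: Python

    import math
    import random


    def successive_halving(candidates, score_fn, min_resources, factor=2):
        """Simplified successive halving (hypothetical helper).

        ``score_fn(candidate, n_resources)`` stands in for the cross-validated
        score of ``candidate`` trained on ``n_resources`` samples. Returns the
        winning candidate and the (n_resources, n_candidates) history.
        """
        if factor <= 1:
            raise ValueError("factor must be greater than 1")
        candidates = list(candidates)
        n_resources = min_resources
        history = []
        while True:
            # Evaluate every surviving candidate with the current budget.
            scores = {c: score_fn(c, n_resources) for c in candidates}
            history.append((n_resources, len(candidates)))
            if len(candidates) <= factor:
                # Last iteration: the best score among the survivors wins.
                return max(candidates, key=scores.get), history
            # Keep the best ceil(n / factor) candidates ...
            n_keep = math.ceil(len(candidates) / factor)
            candidates = sorted(candidates, key=scores.get, reverse=True)[:n_keep]
            # ... and give each survivor `factor` times more samples.
            n_resources *= factor


    # Hypothetical "true" quality of 8 candidates; the measured score gets
    # less noisy as the number of training samples grows.
    rng = random.Random(0)
    quality = {c: rng.random() for c in range(8)}


    def noisy_score(candidate, n_resources):
        return quality[candidate] + rng.gauss(0, 1 / math.sqrt(n_resources))


    best, history = successive_halving(range(8), noisy_score, min_resources=50)
    print(history)  # [(50, 8), (100, 4), (200, 2)]
    print("best candidate:", best)

Because candidates are eliminated on cheap, small-sample scores, a good
candidate can occasionally be dropped early by noise; in exchange, most of the
budget is spent on the few most promising candidates.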

.. GENERATED FROM PYTHON SOURCE LINES 47-49

We can now use the `cv_results_` attribute of the search estimator to inspect
and plot the evolution of the search.

.. GENERATED FROM PYTHON SOURCE LINES 49-71

.. code-block:: Python


    results = pd.DataFrame(rsh.cv_results_)
    results["params_str"] = results.params.apply(str)
    results.drop_duplicates(subset=("params_str", "iter"), inplace=True)
    mean_scores = results.pivot(
        index="iter", columns="params_str", values="mean_test_score"
    )
    ax = mean_scores.plot(legend=False, alpha=0.6)

    labels = [
        f"iter={i}\nn_samples={rsh.n_resources_[i]}\nn_candidates={rsh.n_candidates_[i]}"
        for i in range(rsh.n_iterations_)
    ]

    ax.set_xticks(range(rsh.n_iterations_))
    ax.set_xticklabels(labels, rotation=45, multialignment="left")
    ax.set_title("Scores of candidates over iterations")
    ax.set_ylabel("mean test score", fontsize=15)
    ax.set_xlabel("iterations", fontsize=15)
    plt.tight_layout()
    plt.show()

.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png
   :alt: Scores of candidates over iterations
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 72-86

Number of candidates and amount of resource at each iteration
-------------------------------------------------------------

At the first iteration, a small amount of resources is used. The resource here
is the number of samples that the estimators are trained on. All candidates
are evaluated.

At the second iteration, only the best half of the candidates is evaluated
(half, because this search uses `factor=2`). The number of allocated resources
is doubled: candidates are evaluated on twice as many samples.

This process is repeated until the last iteration, where only 2 candidates are
left. The best candidate is the one with the best score at the last
iteration.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 4.327 seconds)
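The `n_samples` and `n_candidates` values in the x-axis labels follow a simple
schedule. The sketch below reproduces that arithmetic under stated
assumptions: the search is assumed to start with 20 candidates evaluated on 20
samples (check `rsh.n_candidates_[0]` and `rsh.n_resources_[0]` for the values
of an actual run), the 400 available samples come from `make_classification`
above, and `halving_schedule` and `floor_log` are hypothetical helpers, not
scikit-learn functions. The number of iterations is taken as the smaller of
what the sample budget allows and what the candidate pool requires, which is
intended to mirror the fitted search's `n_possible_iterations_` and
`n_required_iterations_` attributes; options such as aggressive elimination
are ignored.

.. code-block:: Python

    import math


    def floor_log(n, base):
        """Largest integer k such that base**k <= n (no float rounding)."""
        k = 0
        while base ** (k + 1) <= n:
            k += 1
        return k


    def halving_schedule(n_candidates, min_resources, max_resources, factor=2):
        """Return the (n_resources, n_candidates) pair of each iteration.

        Hypothetical helper; assumes an integer ``factor`` >= 2.
        """
        # Number of times the budget can be multiplied by `factor` ...
        n_possible = 1 + floor_log(max_resources // min_resources, factor)
        # ... and number of times the candidate pool can be divided by it.
        n_required = 1 + floor_log(n_candidates, factor)
        schedule = []
        for itr in range(min(n_possible, n_required)):
            n_resources = min(min_resources * factor**itr, max_resources)
            schedule.append((n_resources, n_candidates))
            n_candidates = math.ceil(n_candidates / factor)
        return schedule


    # Assumed start: 20 candidates on 20 samples, out of 400 samples in total.
    print(halving_schedule(20, 20, 400))
    # [(20, 20), (40, 10), (80, 5), (160, 3), (320, 2)]

With `factor=3` the same budget yields only three iterations,
`[(20, 20), (60, 7), (180, 3)]`: fewer rounds, with a harsher cut at each one.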