.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_isolation_forest.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_ensemble_plot_isolation_forest.py>`
        to download the full example code or to run this example in your
        browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_isolation_forest.py:

=======================
IsolationForest example
=======================

An example using :class:`~sklearn.ensemble.IsolationForest` for anomaly
detection.

The :ref:`isolation_forest` is an ensemble of "Isolation Trees" that "isolate"
observations by recursive random partitioning, which can be represented by a
tree structure. The number of splits required to isolate a sample is lower for
outliers and higher for inliers.

In the present example we demonstrate two ways to visualize the decision
boundary of an Isolation Forest trained on a toy dataset.

.. GENERATED FROM PYTHON SOURCE LINES 20-32

Data generation
---------------

We generate two clusters (each one containing `n_samples`) by randomly
sampling the standard normal distribution as returned by
:func:`numpy.random.randn`. One of them is spherical and the other one is
slightly deformed.

For consistency with the :class:`~sklearn.ensemble.IsolationForest` notation,
the inliers (i.e. the Gaussian clusters) are assigned a ground-truth label `1`
whereas the outliers (created with :func:`numpy.random.uniform`) are assigned
the label `-1`.

.. GENERATED FROM PYTHON SOURCE LINES 32-51
.. code-block:: Python

    import numpy as np

    from sklearn.model_selection import train_test_split

    n_samples, n_outliers = 120, 40
    rng = np.random.RandomState(0)
    covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
    cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])  # general
    cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])  # spherical
    outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))

    X = np.concatenate([cluster_1, cluster_2, outliers])
    y = np.concatenate(
        [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)]
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

.. GENERATED FROM PYTHON SOURCE LINES 52-53

We can visualize the resulting clusters:

.. GENERATED FROM PYTHON SOURCE LINES 53-63

.. code-block:: Python

    import matplotlib.pyplot as plt

    scatter = plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    handles, labels = scatter.legend_elements()
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    plt.title("Gaussian inliers with \nuniformly distributed outliers")
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png
   :alt: Gaussian inliers with uniformly distributed outliers
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 64-66

Training of the model
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 66-72

.. code-block:: Python

    from sklearn.ensemble import IsolationForest

    clf = IsolationForest(max_samples=100, random_state=0)
    clf.fit(X_train)

.. raw:: html
    <pre>IsolationForest(max_samples=100, random_state=0)</pre>


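As a quick, optional sanity check (not part of the original example), the
fitted detector can be scored on the held-out split. The snippet below is a
self-contained sketch that reproduces the data generation and training above;
note that :meth:`~sklearn.ensemble.IsolationForest.predict` returns `1` for
inliers and `-1` for outliers, matching the ground-truth encoding of `y`.

```python
import numpy as np

from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Reproduce the toy dataset from the "Data generation" section.
n_samples, n_outliers = 120, 40
rng = np.random.RandomState(0)
covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])
cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])
outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))

X = np.concatenate([cluster_1, cluster_2, outliers])
y = np.concatenate(
    [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)]
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

clf = IsolationForest(max_samples=100, random_state=0).fit(X_train)

# predict() returns +1 (inlier) / -1 (outlier), so it can be compared
# directly against the ground-truth labels.
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.2f}")

# decision_function() gives the continuous normality score; predict()
# flags a sample as an outlier exactly when this score is negative.
scores = clf.decision_function(X_test)
assert np.array_equal(scores >= 0, y_pred == 1)
```

The exact accuracy depends on the random splits, but the relationship between
`decision_function` and `predict` holds for any fitted
:class:`~sklearn.ensemble.IsolationForest`.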
.. GENERATED FROM PYTHON SOURCE LINES 73-80

Plot discrete decision boundary
-------------------------------

We use the class :class:`~sklearn.inspection.DecisionBoundaryDisplay` to
visualize a discrete decision boundary. The background color represents
whether a sample in that given area is predicted to be an outlier or not. The
scatter plot displays the true labels.

.. GENERATED FROM PYTHON SOURCE LINES 80-97

.. code-block:: Python

    import matplotlib.pyplot as plt

    from sklearn.inspection import DecisionBoundaryDisplay

    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict",
        alpha=0.5,
    )
    disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    disp.ax_.set_title("Binary decision boundary \nof IsolationForest")
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png
   :alt: Binary decision boundary of IsolationForest
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 98-111

Plot path length decision boundary
----------------------------------

By setting `response_method="decision_function"`, the background of the
:class:`~sklearn.inspection.DecisionBoundaryDisplay` represents the measure
of normality of an observation. Such a score is given by the path length
averaged over a forest of random trees, which itself is given by the depth of
the leaf (or equivalently the number of splits) required to isolate a given
sample.

When a forest of random trees collectively produces short path lengths for
isolating some particular samples, those samples are highly likely to be
anomalies and the measure of normality is close to `0`. Similarly, large path
lengths correspond to values close to `1` and are more likely to be inliers.

.. GENERATED FROM PYTHON SOURCE LINES 111-124
.. code-block:: Python

    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="decision_function",
        alpha=0.5,
    )
    disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    disp.ax_.set_title("Path length decision boundary \nof IsolationForest")
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    plt.colorbar(disp.ax_.collections[1])
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png
   :alt: Path length decision boundary of IsolationForest
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.473 seconds)

.. _sphx_glr_download_auto_examples_ensemble_plot_isolation_forest.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_isolation_forest.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/ensemble/plot_isolation_forest.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_isolation_forest.ipynb <plot_isolation_forest.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_isolation_forest.py <plot_isolation_forest.py>`

.. include:: plot_isolation_forest.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_