.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/datasets/plot_iris_dataset.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code or to run this example in your browser via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_datasets_plot_iris_dataset.py:


================
The Iris Dataset
================

This dataset consists of the petal and sepal lengths and widths of 3 different
types of irises (Setosa, Versicolor, and Virginica), stored in a 150x4
numpy.ndarray.

The rows are the samples and the columns are:
Sepal Length, Sepal Width, Petal Length and Petal Width.

The plot below uses the first two features.
See `here `_ for more
information on this dataset.

.. GENERATED FROM PYTHON SOURCE LINES 17-22

.. code-block:: Python


    # Code source: Gaël Varoquaux
    # Modified for documentation by Jaques Grobler
    # License: BSD 3 clause


.. GENERATED FROM PYTHON SOURCE LINES 23-25

Loading the iris dataset
------------------------

.. GENERATED FROM PYTHON SOURCE LINES 25-30

.. code-block:: Python


    from sklearn import datasets

    iris = datasets.load_iris()


.. GENERATED FROM PYTHON SOURCE LINES 31-33

Scatter Plot of the Iris dataset
--------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 33-42

.. code-block:: Python


    import matplotlib.pyplot as plt

    _, ax = plt.subplots()
    scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
    ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
    _ = ax.legend(
        scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
    )


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
   :alt: plot iris dataset
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_001.png
   :class: sphx-glr-single-img


..
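   Supplementary example: per-class sepal means

The two sepal measurements plotted above can also be summarized numerically.
The following is a minimal sketch, not part of the original example, that
computes the mean sepal length and width for each class. The variable names
``sepal`` and ``class_means`` are chosen here for illustration only.

.. code-block:: Python

    from sklearn import datasets

    iris = datasets.load_iris()

    # Columns 0 and 1 hold sepal length and sepal width (see iris.feature_names).
    sepal = iris.data[:, :2]

    # Mean sepal length and width per class, keyed by the class name.
    class_means = {
        str(name): sepal[iris.target == label].mean(axis=0)
        for label, name in enumerate(iris.target_names)
    }

    for name, (length, width) in class_means.items():
        print(f"{name:<12} sepal length: {length:.2f} cm, sepal width: {width:.2f} cm")

On this dataset, Setosa should come out with the smallest mean sepal length
and the largest mean sepal width of the three classes.

..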
   GENERATED FROM PYTHON SOURCE LINES 43-50

Each point in the scatter plot represents one of the 150 iris flowers in
the dataset, with the color indicating its type
(Setosa, Versicolor, or Virginica).
A pattern is already visible for the Setosa type, which is
easily identifiable by its short and wide sepal. Considering only these 2
dimensions, sepal width and length, there is still overlap between
the Versicolor and Virginica types.

.. GENERATED FROM PYTHON SOURCE LINES 52-57

Plot a PCA representation
-------------------------
Let's apply a Principal Component Analysis (PCA) to the iris dataset
and then plot the irises across the first three PCA dimensions.
This will allow us to better differentiate between the three types!

.. GENERATED FROM PYTHON SOURCE LINES 57-85

.. code-block:: Python


    # unused but required import for doing 3d projections with matplotlib < 3.2
    import mpl_toolkits.mplot3d  # noqa: F401

    from sklearn.decomposition import PCA

    fig = plt.figure(1, figsize=(8, 6))
    ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)

    X_reduced = PCA(n_components=3).fit_transform(iris.data)
    ax.scatter(
        X_reduced[:, 0],
        X_reduced[:, 1],
        X_reduced[:, 2],
        c=iris.target,
        s=40,
    )

    ax.set_title("First three PCA dimensions")
    ax.set_xlabel("1st Eigenvector")
    ax.xaxis.set_ticklabels([])
    ax.set_ylabel("2nd Eigenvector")
    ax.yaxis.set_ticklabels([])
    ax.set_zlabel("3rd Eigenvector")
    ax.zaxis.set_ticklabels([])

    plt.show()


.. image-sg:: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_002.png
   :alt: First three PCA dimensions
   :srcset: /auto_examples/datasets/images/sphx_glr_plot_iris_dataset_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 86-90

PCA creates 3 new features, each a linear combination of the
4 original features. The combinations are chosen so that each successive
feature captures as much of the remaining variance as possible. With this
transformation, the species can largely be told apart using only the first
new feature (i.e., the projection onto the first eigenvector).


..
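   Supplementary example: explained variance of the PCA

How much each new feature contributes can be checked on the fitted estimator.
The following is a minimal sketch, not part of the original example, that
refits the same 3-component PCA and reads its ``explained_variance_ratio_``
and ``components_`` attributes. The variable names ``pca`` and ``ratios`` are
chosen here for illustration only.

.. code-block:: Python

    from sklearn import datasets
    from sklearn.decomposition import PCA

    iris = datasets.load_iris()

    # Fit the same 3-component PCA as in the plot above.
    pca = PCA(n_components=3).fit(iris.data)

    # Fraction of the total variance captured by each principal component.
    ratios = pca.explained_variance_ratio_
    for i, ratio in enumerate(ratios, start=1):
        print(f"component {i}: {ratio:.1%} of the variance")

    # Each row of components_ holds the weights of the 4 original features
    # in one new feature, i.e. the linear combination that defines it.
    print(pca.components_.shape)  # (3, 4)

The first component should account for most of the variance, which is
consistent with the classes separating mainly along that axis.

..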
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.211 seconds)


.. _sphx_glr_download_auto_examples_datasets_plot_iris_dataset.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/datasets/plot_iris_dataset.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/?path=auto_examples/datasets/plot_iris_dataset.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_iris_dataset.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_iris_dataset.py `


.. include:: plot_iris_dataset.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_