scikit-learn

Core Developer

This is a collection of my open source contributions to scikit-learn, a Python module for machine learning. Its code base is maintained on GitHub, with over 2500 contributors.

I am a core developer of scikit-learn and a member of its documentation team. I have contributed 88 merged pull requests, ranking #43 among its contributors.

Note that throughout this post, when I say a bug existed in scikit-learn a.b.c, I do not take backporting into account. For instance, "a bug existed in scikit-learn 1.3.1" and "fixed in scikit-learn 1.3.2" only imply that the bug was fixed after the release of scikit-learn 1.3.1. They do not guarantee that one would still see the bug with scikit-learn 1.3.1 today, since the fix for scikit-learn 1.3.2 may have been backported to earlier versions, especially 1.3.x.

Code Contributions

Items in each section are sorted in reverse chronological order by the time of merge.

Cluster

FIX AffinityPropagation assigning multiple clusters for equal points #28121

Cross Decomposition

ENH ravel prediction of PLSRegression when fitted on 1d y #26602

Datasets

FIX dump svmlight when data is read-only #28111
ENH make_sparse_spd_matrix use sparse memory layout #27438

Decomposition

FIX KernelPCA inverse transform when gamma is not given #26337

Ensemble

FIX Remove spurious feature names warning in IsolationForest #25931

Feature Selection

FIX mutual_info_regression when X is of integer dtype #26748
FIX SequentialFeatureSelector throws IndexError when cv is a generator #25973

Metrics

ENH PrecisionRecallDisplay add option to plot chance level #26019
ENH RocCurveDisplay add option to plot chance level #25987

Neighbors

FIX KNeighborsClassifier raise when all neighbors of some sample have zero weights #26410

Preprocessing

FIX PowerTransformer raise when "box-cox" has nan column #26400

Tree

FIX export_text and export_graphviz accepts feature and class names as array-like #26289

Utilities

FIX improve error message in check_array when getting a Series and expecting a 2D container #28090

Maintenance Contributions

Items are sorted in reverse chronological order by the time of merge.

MAINT fix update_environments_and_lock_files for non-posix systems #28133
MNT Work-around sphinx-gallery UnicodeDecodeError in recommender system #27969
CLN avoid nested conftests #27954
TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse #27723
TST Extend tests for scipy.sparse/*array in sklearn/manifold/tests/test_spectral_embedding #27240
FIX make dataset fetchers accept os.PathLike for data_home #27468
TST Extend tests for scipy.sparse/*array in sklearn/neighbors/tests/test_neighbors #27250
TST Extend tests for scipy.sparse/*array in sklearn/impute/tests/test_common #27277
TST Extend tests for scipy.sparse/*array in sklearn/feature_extraction/tests/test_text #27219
TST Extend tests for scipy.sparse/*array in sklearn/ensemble/tests/test_forest #27216
TST Extend tests for scipy.sparse/*array in sklearn/ensemble/tests/test_gradient_boosting #27217
TST Extend tests for scipy.sparse/*array in sklearn/ensemble/tests/test_iforest #27218
TST Extend tests for scipy.sparse/*array in sklearn/tree/tests/test_tree #27261
TST Extend tests for scipy.sparse/*array in sklearn/preprocessing/tests/test_data #27253
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_base #27225
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_coordinate_descent #27226
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_sparse_coordinate_descent #27237
TST Extend tests for scipy.sparse/*array in sklearn/feature_selection/tests/test_variance_threshold #27222
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_ridge #27235
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_quantile #27228
TST Extend tests for scipy.sparse/*array in sklearn/linear_model/tests/test_ransac #27233
TST Extend tests for scipy.sparse/*array in sklearn/preprocessing/tests/test_function_transformer #27254
TST Extend tests for scipy.sparse/*array in sklearn/neural_network/tests/test_rbm #27252
TST Extend tests for scipy.sparse/*array in sklearn/metrics/cluster/tests/test_unsupervised #27241
TST Extend tests for scipy.sparse/*array in sklearn/model_selection/tests/test_split #27246
TST Extend tests for scipy.sparse/*array in sklearn/utils/tests/test_extmath #27262
TST Extend tests for scipy.sparse/*array in sklearn/utils/tests/test_testing #27276
TST Extend tests for scipy.sparse/*array in sklearn/utils/tests/test_multiclass #27274
CLN v1.4.rst entries are not sorted #26759
MAINT Parameters validation for sklearn.utils.gen_even_slices #26682
MAINT Parameters validation for sklearn.linear_model.ridge_regression #26250
MAINT Parameters validation for sklearn.metrics.pairwise_distances_chunked #26125
MAINT Parameters validation for sklearn.metrics.pairwise_distances_argmin #26124
MAINT Parameters validation for sklearn.metrics.pairwise.manhattan_distances #26122
MAINT Parameters validation for sklearn.model_selection.learning_curve #26227
MAINT Parameters validation for sklearn.model_selection.validation_curve #26229
MAINT Parameters validation for sklearn.model_selection.permutation_test_score #26230
MAINT Parameters validation for sklearn.datasets.fetch_species_distributions #26161
MAINT Parameters validation for sklearn.datasets.load_breast_cancer #26165
MAINT Parameters validation for sklearn.datasets.load_diabetes #26166
MAINT Parameters validation for sklearn.datasets.fetch_rcv1 #26126
MAINT Parameters validation for sklearn.metrics.pairwise.sigmoid_kernel #26072
MAINT Parameters validation for sklearn.metrics.pairwise.rbf_kernel #26071
MAINT Parameters validation for sklearn.metrics.pairwise.polynomial_kernel #26070
MAINT Parameters validation for sklearn.metrics.pairwise.paired_cosine_distances #26075
MAINT Parameters validation for sklearn.metrics.pairwise.paired_manhattan_distances #26074
MAINT Parameters validation for sklearn.metrics.pairwise.paired_euclidean_distances #26073
MAINT Parameters validation for sklearn.metrics.pairwise.cosine_distances #26046
MAINT Parameters validation for sklearn.metrics.pairwise.linear_kernel #26049
MAINT Parameters validation for sklearn.metrics.pairwise.laplacian_kernel #26048
MAINT Parameters validation for sklearn.metrics.pairwise.haversine_distances #26047
MAINT Parameters validation for sklearn.preprocessing.scale #26036
MAINT Parameters validation for sklearn.tree.export_graphviz #26034

Documentation Contributions

Items are sorted in reverse chronological order by the time of merge.

DOC restructure changelog (in particular for switching to pydata-sphinx-theme) #28255
DOC solve some sphinx errors when updating to pydata-sphinx-theme #28134
DOC make up for errors in #26410 #28128
DOC fix the confusing ordering of whats_new/v1.5.rst #28120
DOC fix wrong indentations in the documentation that lead to undesired blockquotes #28107
DOC update doc build sphinx link to by matching regex in lock file #27970
DOC minor fixes of splitter docstrings (from #26423) #27790
DOC fix return type of make_sparse_spd_matrix #27472
DOC show usage of __ in Pipeline and FeatureUnion #26661
DOC search link to sphinx version #26610
DOC fix SplineTransformer include_bias docstring #26018
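
For readers unfamiliar with the `__` syntax mentioned in #26661 above, here is a minimal sketch of how it addresses parameters of nested estimators in Pipeline and FeatureUnion. The step names ("features", "pca", "scale", "clf") and the chosen values are my own assumptions for illustration and are not taken from the documentation change itself.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Step names are arbitrary labels chosen for this example.
pca = PCA(n_components=1)
union = FeatureUnion([("pca", pca), ("scale", StandardScaler())])
pipe = Pipeline([("features", union), ("clf", LogisticRegression())])

# "<step>__<parameter>" reaches into a step; chaining "__" reaches deeper.
pipe.set_params(clf__C=10.0, features__pca__n_components=2)

print(pipe.named_steps["clf"].C)  # parameter set on the final classifier
print(pca.n_components)           # parameter set two levels down
```

The same names can be used as keys of a parameter grid, which is the main reason the double-underscore convention is worth knowing.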