sklearn.neighbors
.RadiusNeighborsTransformer#
- class sklearn.neighbors.RadiusNeighborsTransformer(*, mode='distance', radius=1.0, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=None)[source]#
Transform X into a (weighted) graph of neighbors nearer than a radius.
The transformed data is a sparse graph as returned by
radius_neighbors_graph
.Read more in the User Guide.
New in version 0.22.
- Parameters:
- mode{‘distance’, ‘connectivity’}, default=’distance’
Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, and ‘distance’ will return the distances between neighbors according to the given metric.
- radiusfloat, default=1.0
Radius of neighborhood in the transformed sparse graph.
- algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use
BallTree
‘kd_tree’ will use
KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to
fit
method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- leaf_sizeint, default=30
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
- metricstr or callable, default=’minkowski’
Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in
distance_metrics
for valid metric values.If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.
Distance matrices are not supported.
- pfloat, default=2
Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. This parameter is expected to be positive.
- metric_paramsdict, default=None
Additional keyword arguments for the metric function.
- n_jobsint, default=None
The number of parallel jobs to run for neighbors search. If
-1
, then the number of jobs is set to the number of CPU cores.
- Attributes:
- effective_metric_str or callable
The distance metric used. It will be same as the
metric
parameter or a synonym of it, e.g. ‘euclidean’ if themetric
parameter set to ‘minkowski’ andp
parameter set to 2.- effective_metric_params_dict
Additional keyword arguments for the metric function. For most metrics will be same with
metric_params
parameter, but may also contain thep
parameter value if theeffective_metric_
attribute is set to ‘minkowski’.- n_features_in_int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ndarray of shape (
n_features_in_
,) Names of features seen during fit. Defined only when
X
has feature names that are all strings.New in version 1.0.
- n_samples_fit_int
Number of samples in the fitted data.
See also
kneighbors_graph
Compute the weighted graph of k-neighbors for points in X.
KNeighborsTransformer
Transform X into a weighted graph of k nearest neighbors.
Examples
>>> import numpy as np >>> from sklearn.datasets import load_wine >>> from sklearn.cluster import DBSCAN >>> from sklearn.neighbors import RadiusNeighborsTransformer >>> from sklearn.pipeline import make_pipeline >>> X, _ = load_wine(return_X_y=True) >>> estimator = make_pipeline( ... RadiusNeighborsTransformer(radius=42.0, mode='distance'), ... DBSCAN(eps=25.0, metric='precomputed')) >>> X_clustered = estimator.fit_predict(X) >>> clusters, counts = np.unique(X_clustered, return_counts=True) >>> print(counts) [ 29 15 111 11 12]
Methods
fit
(X[, y])Fit the radius neighbors transformer from the training dataset.
fit_transform
(X[, y])Fit to data, then transform it.
get_feature_names_out
([input_features])Get output feature names for transformation.
Get metadata routing of this object.
get_params
([deep])Get parameters for this estimator.
radius_neighbors
([X, radius, ...])Find the neighbors within a given radius of a point or points.
radius_neighbors_graph
([X, radius, mode, ...])Compute the (weighted) graph of Neighbors for points in X.
set_output
(*[, transform])Set output container.
set_params
(**params)Set the parameters of this estimator.
transform
(X)Compute the (weighted) graph of Neighbors for points in X.
- fit(X, y=None)[source]#
Fit the radius neighbors transformer from the training dataset.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples) if metric=’precomputed’
Training data.
- yIgnored
Not used, present for API consistency by convention.
- Returns:
- selfRadiusNeighborsTransformer
The fitted radius neighbors transformer.
- fit_transform(X, y=None)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Training set.
- yIgnored
Not used, present for API consistency by convention.
- Returns:
- Xtsparse matrix of shape (n_samples, n_samples)
Xt[i, j] is assigned the weight of edge that connects i to j. Only the neighbors have an explicit value. The diagonal is always explicit. The matrix is of CSR format.
- get_feature_names_out(input_features=None)[source]#
Get output feature names for transformation.
The feature names out will prefixed by the lowercased class name. For example, if the transformer outputs 3 features, then the feature names out are:
["class_name0", "class_name1", "class_name2"]
.- Parameters:
- input_featuresarray-like of str or None, default=None
Only used to validate feature names with the names seen in
fit
.
- Returns:
- feature_names_outndarray of str objects
Transformed feature names.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- radius_neighbors(X=None, radius=None, return_distance=True, sort_results=False)[source]#
Find the neighbors within a given radius of a point or points.
Return the indices and distances of each point from the dataset lying in a ball with size
radius
around the points of the query array. Points lying on the boundary are included in the results.The result points are not necessarily sorted by distance to their query point.
- Parameters:
- X{array-like, sparse matrix} of (n_samples, n_features), default=None
The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
- radiusfloat, default=None
Limiting distance of neighbors to return. The default is the value passed to the constructor.
- return_distancebool, default=True
Whether or not to return the distances.
- sort_resultsbool, default=False
If True, the distances and indices will be sorted by increasing distances before being returned. If False, the results may not be sorted. If
return_distance=False
, settingsort_results=True
will result in an error.New in version 0.22.
- Returns:
- neigh_distndarray of shape (n_samples,) of arrays
Array representing the distances to each point, only present if
return_distance=True
. The distance values are computed according to themetric
constructor parameter.- neigh_indndarray of shape (n_samples,) of arrays
An array of arrays of indices of the approximate nearest points from the population matrix that lie within a ball of size
radius
around the query points.
Notes
Because the number of neighbors of each point is not necessarily equal, the results for multiple query points cannot be fit in a standard data array. For efficiency,
radius_neighbors
returns arrays of objects, where each object is a 1D array of indices or distances.Examples
In the following example, we construct a NeighborsClassifier class from an array representing our data set and ask who’s the closest point to [1, 1, 1]:
>>> import numpy as np >>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]] >>> from sklearn.neighbors import NearestNeighbors >>> neigh = NearestNeighbors(radius=1.6) >>> neigh.fit(samples) NearestNeighbors(radius=1.6) >>> rng = neigh.radius_neighbors([[1., 1., 1.]]) >>> print(np.asarray(rng[0][0])) [1.5 0.5] >>> print(np.asarray(rng[1][0])) [1 2]
The first array returned contains the distances to all points which are closer than 1.6, while the second array returned contains their indices. In general, multiple points can be queried at the same time.
- radius_neighbors_graph(X=None, radius=None, mode='connectivity', sort_results=False)[source]#
Compute the (weighted) graph of Neighbors for points in X.
Neighborhoods are restricted the points at a distance lower than radius.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features), default=None
The query point or points. If not provided, neighbors of each indexed point are returned. In this case, the query point is not considered its own neighbor.
- radiusfloat, default=None
Radius of neighborhoods. The default is the value passed to the constructor.
- mode{‘connectivity’, ‘distance’}, default=’connectivity’
Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, in ‘distance’ the edges are distances between points, type of distance depends on the selected metric parameter in NearestNeighbors class.
- sort_resultsbool, default=False
If True, in each row of the result, the non-zero entries will be sorted by increasing distances. If False, the non-zero entries may not be sorted. Only used with mode=’distance’.
New in version 0.22.
- Returns:
- Asparse-matrix of shape (n_queries, n_samples_fit)
n_samples_fit
is the number of samples in the fitted data.A[i, j]
gives the weight of the edge connectingi
toj
. The matrix is of CSR format.
See also
kneighbors_graph
Compute the (weighted) graph of k-Neighbors for points in X.
Examples
>>> X = [[0], [3], [1]] >>> from sklearn.neighbors import NearestNeighbors >>> neigh = NearestNeighbors(radius=1.5) >>> neigh.fit(X) NearestNeighbors(radius=1.5) >>> A = neigh.radius_neighbors_graph(X) >>> A.toarray() array([[1., 0., 1.], [0., 1., 0.], [1., 0., 1.]])
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”}, default=None
Configure output of
transform
andfit_transform
."default"
: Default output format of a transformer"pandas"
: DataFrame output"polars"
: Polars outputNone
: Transform configuration is unchanged
New in version 1.4:
"polars"
option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- transform(X)[source]#
Compute the (weighted) graph of Neighbors for points in X.
- Parameters:
- Xarray-like of shape (n_samples_transform, n_features)
Sample data.
- Returns:
- Xtsparse matrix of shape (n_samples_transform, n_samples_fit)
Xt[i, j] is assigned the weight of edge that connects i to j. Only the neighbors have an explicit value. The diagonal is always explicit. The matrix is of CSR format.