sigma_clip¶
-
astropy.stats.
sigma_clip
(data, sigma=3, sigma_lower=None, sigma_upper=None, iters=5, cenfunc=np.ma.median, stdfunc=np.std, axis=None, copy=True)[source] [edit on github]¶ Perform sigma-clipping on the provided data.
The data will be iterated over, each time rejecting points that are discrepant by more than a specified number of standard deviations from a center value. If the data contains invalid values (NaNs or infs), they are automatically masked before performing the sigma clipping.
Note
scipy.stats.sigmaclip provides a subset of the functionality in this function.
Parameters: data : array-like
The data to be sigma clipped.
sigma : float, optional
The number of standard deviations to use for both the lower and upper clipping limit. These limits are overridden by
sigma_lower
andsigma_upper
, if input. Defaults to 3.sigma_lower : float or
None
, optionalsigma_upper : float or
None
, optionaliters : int or
None
, optionalThe number of iterations to perform sigma clipping, or
None
to clip until convergence is achieved (i.e., continue until the last iteration clips nothing). Defaults to 5.cenfunc : callable, optional
The function used to compute the center for the clipping. Must be a callable that takes in a masked array and outputs the central value. Defaults to the median (
numpy.ma.median
).stdfunc : callable, optional
The function used to compute the standard deviation about the center. Must be a callable that takes in a masked array and outputs a width estimator. Masked (rejected) pixels are those where:
deviation < (-sigma_lower * stdfunc(deviation)) deviation > (sigma_upper * stdfunc(deviation))
where:
deviation = data - cenfunc(data [,axis=int])
Defaults to the standard deviation (
numpy.std
).axis : int or
None
, optionalcopy : bool, optional
Returns: filtered_data :
numpy.ma.MaskedArray
A masked array with the same shape as
data
input, where the points rejected by the algorithm have been masked.Notes
The routine works by calculating:
deviation = data - cenfunc(data [,axis=int])
and then setting a mask for points outside the range:
deviation < (-sigma_lower * stdfunc(deviation)) deviation > (sigma_upper * stdfunc(deviation))
It will iterate a given number of times, or until no further data are rejected.
Most numpy functions deal well with masked arrays, but if one would like to have an array with just the good (or bad) values, one can use:
good_only = filtered_data.data[~filtered_data.mask] bad_only = filtered_data.data[filtered_data.mask]
However, for multidimensional data, this flattens the array, which may not be what one wants (especially if filtering was done along an axis).
Examples
This example generates random variates from a Gaussian distribution and returns a masked array in which all points that are more than 2 sample standard deviations from the median are masked:
>>> from astropy.stats import sigma_clip >>> from numpy.random import randn >>> randvar = randn(10000) >>> filtered_data = sigma_clip(randvar, sigma=2, iters=5)
This example sigma clips on a similar distribution, but uses 3 sigma relative to the sample mean, clips until convergence, and does not copy the data:
>>> from astropy.stats import sigma_clip >>> from numpy.random import randn >>> from numpy import mean >>> randvar = randn(10000) >>> filtered_data = sigma_clip(randvar, sigma=3, iters=None, ... cenfunc=mean, copy=False)
This example sigma clips along one axis on a similar distribution (with bad points inserted):
>>> from astropy.stats import sigma_clip >>> from numpy.random import normal >>> from numpy import arange, diag, ones >>> data = arange(5) + normal(0., 0.05, (5, 5)) + diag(ones(5)) >>> filtered_data = sigma_clip(data, sigma=2.3, axis=0)
Note that along the other axis, no points would be masked, as the variance is higher.