Detects outliers using the Mahalanobis distance. In statistics, the Mahalanobis distance accounts for the correlations between variables, which makes it possible to identify and analyze patterns in multivariate data. It is a useful way of determining the similarity of an unknown sample set to a known one. It differs from the Euclidean distance in that it takes the correlations of the data set into account and does not depend on the scale of measurement.
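For reference, the standard textbook definition of the distance is given below. The symbols are not taken from this page: x is a data point, μ is the mean vector of the data set, and Σ is its covariance matrix. How SMILE.OUTLIERS estimates μ and Σ internally is not stated here.

```latex
D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \, \Sigma^{-1} \, (\mathbf{x} - \boldsymbol{\mu})}
```

When Σ is the identity matrix, the expression reduces to the ordinary Euclidean distance from x to μ. The inverse covariance term is what removes the dependence on scale and correlation.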
std_deviation_X – Standard deviation threshold for the Mahalanobis distance; a data point that exceeds it is considered an outlier. Integer (for example, 3).
columns – Dataset columns or custom calculations.
Example: SMILE.OUTLIERS(3, sum([No of customers]), sum([Gross Sales])) used as a calculation for the Color field of the Scatterplot visualization.
Input data
- Numeric variables
- Without missing values
- Size of input data is not limited
Result
- Column of integer values 0 or 1, where 1 is an outlier and 0 is an inlier.
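The input and output described above can be illustrated with a minimal Python sketch. It is not the product's implementation. The function name `mahalanobis_outlier_flags` and the sample data are hypothetical, and the sketch assumes that the threshold is compared directly with each point's Mahalanobis distance, computed from the sample mean and sample covariance of the same data.

```python
import numpy as np

def mahalanobis_outlier_flags(data, threshold=3.0):
    """Return 1 for rows whose Mahalanobis distance exceeds threshold, else 0.

    Assumption: the threshold is applied directly to the distance.
    """
    x = np.asarray(data, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array: rows are points, columns are variables")
    if np.isnan(x).any():
        raise ValueError("missing values are not supported")

    centered = x - x.mean(axis=0)
    # Sample covariance of the columns; atleast_2d covers the single-column case.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # The pseudo-inverse keeps the sketch usable if the covariance is singular.
    cov_inv = np.linalg.pinv(cov)

    # Squared distance per row: centered[i] @ cov_inv @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    distances = np.sqrt(np.maximum(d2, 0.0))
    return (distances > threshold).astype(int)

# Hypothetical data: sales roughly proportional to the number of customers.
rng = np.random.default_rng(0)
customers = rng.normal(100, 10, size=200)
sales = 50 * customers + rng.normal(0, 100, size=200)
points = np.column_stack([customers, sales])
points[0] = [100, 9000]  # typical customer count, sales far off the trend

flags = mahalanobis_outlier_flags(points, threshold=3)
```

Here `flags` holds one 0 or 1 per input row. The first row is flagged even though its customer count is typical, because its combination of values breaks the correlation between the two columns.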
Key usage points
- It is a multivariate outlier detection method, so multiple variables are allowed.
- Only numeric (continuous) variables are allowed.
- It is inappropriate for ordinal data.
- The method relies on the sample covariance matrix, which is itself sensitive to outliers.
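The last point can be seen in a small Python sketch. The helper `mahalanobis` and the sample values are hypothetical and only illustrate the effect; they do not describe how the built-in function is implemented. The same extreme point is measured twice: once against a covariance matrix that includes it, and once against a covariance matrix estimated from the remaining points only.

```python
import numpy as np

def mahalanobis(point, reference):
    """Distance from point to the mean of reference, using reference's sample covariance."""
    ref = np.asarray(reference, dtype=float)
    diff = np.asarray(point, dtype=float) - ref.mean(axis=0)
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(ref, rowvar=False)))
    return float(np.sqrt(diff @ cov_inv @ diff))

# Seven ordinary points (hypothetical values) and one extreme point.
clean = np.array(
    [[9, 45], [10, 52], [11, 54], [10, 49], [9, 46], [11, 56], [10, 50]],
    dtype=float,
)
extreme = np.array([30.0, 50.0])
contaminated = np.vstack([clean, extreme])

# Covariance estimated WITH the extreme point: it inflates the variance,
# so its own distance shrinks.
d_with = mahalanobis(extreme, contaminated)

# Covariance estimated WITHOUT the extreme point: the distance is much larger.
d_without = mahalanobis(extreme, clean)
```

In this sample, `d_with` stays below a threshold of 3 while `d_without` is far above it, so the extreme point would be missed when it is allowed to influence the covariance estimate. The effect is strongest on small data sets.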
For the whole list of algorithms, see Data science built-in algorithms.