# SMILE.OUTLIERS(std_deviation_X, columns)

Detects outliers using the Mahalanobis distance. In statistics, the Mahalanobis distance measures how far a data point lies from the center of a data set while accounting for the correlations between variables. It is a useful way of determining how similar an unknown sample is to a known one. Unlike the Euclidean distance, it takes the correlations of the data set into account and does not depend on the scale of measurement.
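The scale independence can be checked with a short Python sketch. It is an illustration only, not the product's implementation: the helper name `mahalanobis_distances` and the sample data are hypothetical, and distances are assumed to be measured from the sample mean using the sample covariance matrix.

```python
import numpy as np

def mahalanobis_distances(data):
    """Mahalanobis distance of each row from the sample mean (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Sample covariance of the columns; atleast_2d covers the single-column case.
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(data, rowvar=False)))
    # d_i^2 = (x_i - mean)^T * inv_cov * (x_i - mean), computed row by row.
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))

# Made-up data: number of customers and gross sales per row.
customers = np.arange(30.0)
sales = 2000.0 * customers + np.where(customers % 2 == 0, 500.0, -500.0)
data = np.column_stack([customers, sales])

# Express sales in thousands instead of single units.
rescaled = data * np.array([1.0, 0.001])

d_original = mahalanobis_distances(data)
d_rescaled = mahalanobis_distances(rescaled)
euclid_original = np.linalg.norm(data - data.mean(axis=0), axis=1)
euclid_rescaled = np.linalg.norm(rescaled - rescaled.mean(axis=0), axis=1)

# Mahalanobis distances are unchanged by the rescaling; Euclidean ones are not.
print(np.allclose(d_original, d_rescaled))
print(np.allclose(euclid_original, euclid_rescaled))
```

Changing the unit of one column changes every Euclidean distance, while the Mahalanobis distances stay the same because the covariance matrix absorbs the rescaling.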

###### Parameters

std_deviation_X – Standard deviation threshold of the Mahalanobis distance, beyond which a data point is considered an outlier; integer (for example, 3).

columns – Dataset columns or custom calculations.

Example: SMILE.OUTLIERS(3, sum([No of customers]), sum([Gross Sales])) used as a calculation for the Color field of the Scatterplot visualization.

###### Input data

- Numeric variables
- Without missing values
- Size of input data is not limited

###### Result

- Column of integer values 0 or 1, where 1 marks an outlier and 0 marks an inlier.
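The shape of the result can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: the function name `flag_outliers` is hypothetical, distances are assumed to be measured from the sample mean using the sample covariance matrix, and a row is assumed to be flagged when its distance exceeds the threshold. The exact rule used by SMILE.OUTLIERS is not stated on this page.

```python
import numpy as np

def flag_outliers(threshold, *columns):
    """Return 1 for rows flagged as outliers and 0 for inliers (hypothetical helper)."""
    data = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(data, rowvar=False)))
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    distances = np.sqrt(np.maximum(squared, 0.0))
    # Assumption: a row is an outlier when its distance exceeds the threshold.
    return (distances > threshold).astype(int)

# Made-up data: 30 rows close to the line sales = 2 * customers,
# followed by one row that lies far away from that line.
customers = list(range(30)) + [15]
sales = [2 * c + (1 if c % 2 == 0 else -1) for c in range(30)] + [100]

flags = flag_outliers(3, customers, sales)
print(flags.tolist())
```

In this data only the last row is flagged: its number of customers is unremarkable and its sales value alone is not extreme on a different scale, but the combination breaks the pattern that the other rows follow.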

###### Key usage points

- It is a multivariate outlier detection method, so multiple variables are allowed.
- Only numeric (continuous) variables are allowed.
- It is inappropriate for ordinal data.
- The sample covariance matrix is calculated from the same data, so the method is itself sensitive to outliers.
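The sensitivity to outliers can be illustrated with a small Python sketch. It assumes, as an illustration only, that distances are measured from the sample mean using the sample covariance matrix and compared directly with the threshold; the helper name and data are hypothetical. Under that assumption, an extreme row inflates the covariance estimate, and with n rows no distance can exceed (n - 1) / sqrt(n).

```python
import math
import numpy as np

def mahalanobis_distances(data):
    """Mahalanobis distance of each row from the sample mean (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(data, rowvar=False)))
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))

# Made-up data: nine rows near the line y = 2 * x and one extreme row (5, 1000).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 5]
y = [3, 3, 7, 7, 11, 11, 15, 15, 19, 1000]
distances = mahalanobis_distances(np.column_stack([x, y]))

n = len(x)
# Largest distance any row can reach when mean and covariance come from
# the same n rows; for n = 10 this is about 2.85, below a threshold of 3.
upper_bound = (n - 1) / math.sqrt(n)

print(int(np.argmax(distances)))  # index of the most distant row
print(distances.max() <= upper_bound + 1e-9)
```

The extreme row is the most distant one, yet it stays below a threshold of 3 because it dominates the covariance estimate. With small data sets, a lower threshold or more rows may be needed before such a point is flagged.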

For the full list of algorithms, see Data science built-in algorithms.
