DCPY.QUANTILETRANSFORM(n_quantiles, output_distribution, column)
Quantile Transformer applies a non-parametric, non-linear transformation based on the quantile function to map the data to a uniform or normal distribution. It tends to spread out the most frequent values and reduces the impact of large outliers. Because the transformation is non-linear, it may distort linear correlations between variables measured on the same scale, but it makes variables measured on different scales more comparable. It also distorts distances within and across variables.
Parameters
- n_quantiles – Number of quantiles to compute, integer (default 1000).
- output_distribution – Desired output distribution (default uniform):
  - uniform – Uniform distribution
  - normal – Normal distribution
- column – Dataset column or custom calculation.
Example: DCPY.QUANTILETRANSFORM(1000, 'uniform', [Discount])
Input data
- Numeric column.
- Rows that contain missing values are dropped before calculations.
Result
- Numeric column of transformed values with the same length as the input column.
- Missing values remain at the same indices as in the input column.
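The behavior described above (quantiles estimated from the non-missing values, each value mapped through them, missing values left in place) can be illustrated with a small pure-Python sketch. This is not the implementation behind DCPY.QUANTILETRANSFORM. The function name `quantile_transform`, the helper names, the use of `None` or NaN as the missing-value marker, the naive handling of ties, and the clipping bound `eps` are all assumptions made for illustration.

```python
from bisect import bisect_right
from math import isnan
from statistics import NormalDist


def _is_missing(v):
    # Assumption: None or NaN marks a missing value.
    return v is None or (isinstance(v, float) and isnan(v))


def _quantile(sorted_vals, p):
    # Quantile at probability p, interpolating linearly between order statistics.
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def _interp(x, xs, ys):
    # Piecewise-linear interpolation over non-decreasing xs, clamped at both ends.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def quantile_transform(values, n_quantiles=1000, output_distribution="uniform"):
    """Hypothetical sketch: map values to a uniform or normal distribution."""
    if output_distribution not in ("uniform", "normal"):
        raise ValueError("output_distribution must be 'uniform' or 'normal'")
    if n_quantiles < 1:
        raise ValueError("n_quantiles must be a positive integer")

    # Missing values are excluded from the quantile estimation.
    observed = sorted(v for v in values if not _is_missing(v))
    if not observed:
        return list(values)

    # Never use more quantiles than there are observed values.
    n_q = min(n_quantiles, len(observed))
    probs = [i / (n_q - 1) for i in range(n_q)] if n_q > 1 else [0.0]
    landmarks = [_quantile(observed, p) for p in probs]

    eps = 1e-7  # assumed bound that keeps the normal quantile function finite
    std_normal = NormalDist()
    result = []
    for v in values:
        if _is_missing(v):
            result.append(v)  # missing values keep their original index
            continue
        p = _interp(v, landmarks, probs)
        if output_distribution == "normal":
            p = std_normal.inv_cdf(min(max(p, eps), 1 - eps))
        result.append(p)
    return result


discounts = [1, 2, None, 3, 4, 1000]
print(quantile_transform(discounts, 1000, "uniform"))
# [0.0, 0.25, None, 0.5, 0.75, 1.0]
```

In this sketch the outlier 1000 lands at 1.0, only 0.25 above the value 4, which shows both the reduced impact of outliers and the distortion of distances mentioned earlier. The missing value stays at index 2, and the output has the same length as the input.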
Key usage points
- Use it when you need to directly compare two or more variables that were originally measured on different scales.
- Use it when your data contains large outliers.
For the full list of algorithms, see Data science built-in algorithms.