Calculates the Pearson correlation coefficient between selected columns to assess the linear relationship of two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable. Pearson’s correlation coefficient is a measure based on the actual data values, and thus, it is sensitive to outliers.
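The sensitivity to outliers can be illustrated outside the product with a minimal Python sketch. The `pearson` helper and the sample data below are hypothetical and are not part of MLLIB.PEARSONCOR. They apply the textbook definition of the coefficient to show how one extreme point can change the result.

```python
from math import sqrt

def pearson(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: y is exactly 2 * x, a perfectly linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson(x, y)

# The same data with a single extreme point appended.
r_outlier = pearson(x + [6], y + [-40])

print(f"without outlier: {r_clean:.2f}")
print(f"with outlier:    {r_outlier:.2f}")
```

In this made-up data set, one added point is enough to turn a perfect positive correlation into a negative one, which is why it is worth inspecting the data for outliers before interpreting the coefficient.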
- Two numeric variables
- The size of input data is not limited
- No missing values
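The requirements above do not say how missing values should be removed. One possible approach, sketched in Python under the assumption that the data is prepared outside the product, is to keep only the rows where both variables are present. The `complete_pairs` helper and the sample values are hypothetical.

```python
def complete_pairs(xs, ys):
    """Keep only the positions where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [x for x, _ in pairs], [y for _, y in pairs]

# Hypothetical columns with gaps in different rows.
sales = [120.0, None, 95.5, 210.0]
customers = [12, 7, None, 20]

clean_sales, clean_customers = complete_pairs(sales, customers)
print(clean_sales, clean_customers)
```

Dropping whole rows keeps the two columns aligned, which matters because the coefficient is computed over paired observations.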
Example: MLLIB.PEARSONCOR(sum([Gross Sales]), sum([No of Customers]))
The correlation coefficient measures the strength of the linear relationship between two variables and ranges from -1 to 1. A value of -1 indicates a perfect negative correlation, a value of 1 indicates a perfect positive correlation, and a value of 0 indicates no linear relationship between the two variables.
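For reference, the standard textbook definition of the sample Pearson coefficient is given below. This page does not describe how MLLIB.PEARSONCOR computes the value internally, so the formula is meant as background. The symbols are introduced here for illustration: $x_i$ and $y_i$ are the paired values of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of pairs.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to deviate from their means in the same direction and negative when they deviate in opposite directions. The denominator scales the result into the range from -1 to 1.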
In the Scatterplot widget, add a calculation with the expression MLLIB.PEARSONCOR(sum([Gross Sales]), sum([No of Customers])) and set it as a dimension. Using the dataset manager, drag the calculation into the Color field. Because the function returns a single value, only one color is used. The coefficient value is shown in the legend and in the tooltip for each point of the visualization. In this example, the coefficient of 0.93 indicates a strong positive correlation.
For the whole list of algorithms, see Data science built-in algorithms.