Covariance and Pearson Correlation
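Covariance is a measure of how two variables vary together. On a scatter plot of the two variables, it is positive when the points tend to rise together and negative when one variable tends to fall as the other rises. The Pearson correlation normalizes the covariance by the standard deviations of the two variables, giving a dimensionless value between -1 and 1: \[\text{Pearson correlation} = \rho = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y}\]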
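As a rough illustration of the formula above, the following sketch computes the covariance from its definition and then divides by the two standard deviations. The arrays x and y are made-up values chosen only for this example, and the population form (dividing by n) is an assumption made here so that the covariance and the standard deviations use the same normalization.

```python
import numpy as np

# Hypothetical sample data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Covariance: mean of the products of the deviations from each mean
# (population form, dividing by n)
cov_xy = np.mean((x - np.mean(x)) * (y - np.mean(y)))

# Pearson correlation: covariance divided by the product of the
# standard deviations. np.std defaults to the population form
# (ddof=0), which matches the normalization used for cov_xy.
rho = cov_xy / (np.std(x) * np.std(y))

print("covariance:", cov_xy)
print("Pearson correlation:", rho)
```

Because the normalization cancels in the ratio, the same correlation results from the sample form (dividing by n - 1), as long as the covariance and both standard deviations use it consistently.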
Computing the covariance
The covariance may be computed using the NumPy function np.cov(). For example, if we have two sets of data x and y, np.cov(x, y) returns a 2D array where entries [0,1] and [1,0] are the covariances. Entry [0,0] is the variance of the data in x, and entry [1,1] is the variance of the data in y. This 2D output array is called the covariance matrix, since it organizes the variances and the covariances.
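A minimal sketch of reading the entries of the covariance matrix is shown here. The arrays x and y are made-up values used only for illustration, not the iris measurements. Note that np.cov() normalizes by n - 1 by default, so its diagonal matches np.var() only when ddof=1 is passed.

```python
import numpy as np

# Hypothetical sample data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# 2x2 covariance matrix of x and y
covariance_matrix = np.cov(x, y)
print(covariance_matrix)

# Off-diagonal entries [0,1] and [1,0] both hold the covariance of x and y
cov_xy = covariance_matrix[0, 1]

# Diagonal entries hold the variances of x and y
var_x = covariance_matrix[0, 0]
var_y = covariance_matrix[1, 1]

# np.cov divides by n - 1 by default, so compare against ddof=1
print(np.isclose(var_x, np.var(x, ddof=1)))
print(np.isclose(var_y, np.var(y, ddof=1)))
```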
To remind you how the I. versicolor petal length and width are related, we include the scatter plot you generated in a previous exercise.