Covariance and Pearson Correlation

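Covariance is a measure of how two variables vary together. On a scatter plot of the two variables, the covariance is positive when x and y tend to lie above (or below) their respective means at the same time, and negative when one tends to be above its mean while the other is below. The Pearson correlation coefficient divides the covariance by the product of the two standard deviations, which gives a dimensionless value between -1 and 1:

\[\rho = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y}\]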
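As a minimal sketch, assuming NumPy is available, the formula can be computed directly from its definition and checked against NumPy's built-in np.corrcoef(). The helper name pearson_r and the arrays x and y are hypothetical toy values chosen for illustration, not data from the exercise.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample covariance with the n - 1 denominator (np.cov's default convention)
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    # Sample standard deviations using the matching n - 1 denominator
    sigma_x = x.std(ddof=1)
    sigma_y = y.std(ddof=1)
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = pearson_r(x, y)
print(r)                         # approximately 0.7746
print(np.corrcoef(x, y)[0, 1])   # NumPy's built-in Pearson correlation, for comparison
```

Because the same denominator appears in the covariance and in both standard deviations, it cancels in the ratio, so the result does not depend on whether n or n - 1 is used, as long as the choice is consistent.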

Computing the covariance

The covariance may be computed using the NumPy function np.cov(). For example, given two sets of data x and y, np.cov(x, y) returns a 2D array in which entries [0,1] and [1,0] are both the covariance of x and y (the two entries are equal). Entry [0,0] is the variance of the data in x, and entry [1,1] is the variance of the data in y. This 2D output array is called the covariance matrix, since it organizes the variances (each variable's covariance with itself) together with the covariance between the two variables.
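A short sketch of reading the covariance matrix, assuming NumPy is installed; x and y are again hypothetical toy arrays standing in for two paired measurements such as petal length and width.

```python
import numpy as np

# Hypothetical paired measurements, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

cov_matrix = np.cov(x, y)
print(cov_matrix)

cov_xy = cov_matrix[0, 1]  # covariance of x and y (same value as cov_matrix[1, 0])
var_x = cov_matrix[0, 0]   # variance of x
var_y = cov_matrix[1, 1]   # variance of y

# np.cov divides by n - 1 by default, so the diagonal matches np.var with ddof=1
print(np.isclose(var_x, np.var(x, ddof=1)))            # True
print(np.isclose(cov_matrix[0, 1], cov_matrix[1, 0]))  # True
```

Dividing cov_xy by np.sqrt(var_x * var_y) gives the Pearson correlation from the formula above, since the square roots of the diagonal entries are the standard deviations.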
To remind you how the I. versicolor petal length and width are related, we include the scatter plot you generated in a previous exercise.