Understanding Different Correlation Methods with Examples in Python

Aqeel Ahmed
4 min read · Jun 4, 2024

Hello Everyone!

Welcome back to my blog! I was recently working with some data and had to try several correlation methods, so I thought, why not share the fundamentals of correlation and its different types with the Medium community? So here I am :)! We will first describe what correlation means and then move on to its different types.

What is Correlation?

Correlation is a statistical measure of the strength and direction of the relationship between two variables. It quantifies the extent to which a change in one variable is associated with a change in the other. The correlation coefficient, represented by the symbol r, is a number between -1 and 1.

Here’s what the values indicate:

  • A correlation of 1 implies a perfect positive correlation, meaning as one variable increases, the other variable also increases proportionally.
  • A correlation of -1 implies a perfect negative correlation, meaning as one variable increases, the other decreases proportionally.
  • A correlation of 0 means there is no linear relationship between the variables.
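To make the three cases concrete, here is a minimal sketch that assumes r refers to the Pearson coefficient and uses np.corrcoef to compute it. The toy arrays are made up for illustration. Note the last case: y = x² depends on x completely, yet r is about 0 because the relationship is not linear.

```python
import numpy as np

# Symmetric grid around zero (illustrative data, not from the article)
x = np.linspace(-1, 1, 101)

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is r between the two inputs
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]   # exact increasing line
r_neg = np.corrcoef(x, -3 * x)[0, 1]      # exact decreasing line
r_zero = np.corrcoef(x, x ** 2)[0, 1]     # dependent, but not linearly

print("Perfect positive:", r_pos)
print("Perfect negative:", r_neg)
print("No linear relationship:", r_zero)
```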

Cross-Correlation

Cross-correlation is a measure of similarity between two signals as a function of a time-lag applied to one of the signals. It measures how two signals correlate with each other at different time shifts. Cross-correlation is often used in signal processing, time series analysis, and pattern recognition.

Mathematics Behind Cross-Correlation
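
For real-valued discrete signals x[n] and y[n], one common way to write the cross-correlation at lag k is shown below. This is the convention np.correlate(x, y) follows; other texts shift y instead of x, which only flips the sign of the lag. The symbol R_xy is a label chosen here for illustration.

```latex
R_{xy}[k] = \sum_{n} x[n + k] \, y[n]
```

A large value of R_xy[k] means that x shifted by k samples lines up well with y.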

Example in Python with Synthetic Data

Let’s generate synthetic data for two signals x[n] and y[n] and compute their cross-correlation using Python:

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data for signals x and y
np.random.seed(0)
x = np.random.randn(100) # Signal x
y = np.zeros(150) # Signal y (padded with zeros)
y[20:70] = np.sin(np.linspace(0, 2*np.pi, 50)) # Insert a sinusoidal waveform into signal y

# Compute cross-correlation
cross_corr = np.correlate(x, y, mode='full')

# Plot signals x and y
plt.figure(figsize=(10, 4))
plt.subplot(3, 1, 1)
plt.plot(x)
plt.title('Signal x')
plt.subplot(3, 1, 2)
plt.plot(y)
plt.title('Signal y')
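plt.subplot(3, 1, 3)
# mode='full' returns one value per lag, from -(len(y) - 1) to len(x) - 1
lags = np.arange(-(len(y) - 1), len(x))
plt.plot(lags, cross_corr)
plt.title('Cross-correlation')
plt.xlabel('Time Lag')
plt.ylabel('Cross-correlation Value')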
plt.tight_layout()
plt.show()
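
A typical use of cross-correlation is estimating the delay between two signals. The sketch below is an illustrative setup, not part of the example above: y is built as a copy of x delayed by a known number of samples (the variable names and the delay of 15 are assumptions), and the lag with the largest cross-correlation recovers that delay.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200)           # Reference signal (white noise)

delay = 15                             # Known delay in samples (illustrative)
# Build y[n] = x[n - delay], zero-padded at the start and trimmed to len(x)
y = np.concatenate([np.zeros(delay), x])[:len(x)]

# Correlate y against x so that a positive lag means "y lags behind x"
cc = np.correlate(y, x, mode='full')
lags = np.arange(-(len(x) - 1), len(y))

estimated_delay = int(lags[np.argmax(cc)])
print("Estimated delay:", estimated_delay)
```

White noise works well here because it is nearly uncorrelated with shifted copies of itself, so the peak at the true delay stands out clearly.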

Autocorrelation

Autocorrelation measures the linear relationship between a time series and a lagged (delayed) copy of itself. It is commonly used in time series analysis to identify repeating patterns and seasonality in data.

Mathematics Behind Autocorrelation:
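
One common form of the normalized autocorrelation of a real-valued series x[n] at lag τ is given below. This is a sketch of the definition that matches the normalization by the zero-lag value used in the Python example in this section; other definitions subtract the mean first or divide by the number of overlapping samples.

```latex
\rho_{xx}[\tau] = \frac{\sum_{n} x[n] \, x[n + \tau]}{\sum_{n} x[n]^{2}}
```

With this normalization the value at τ = 0 is always 1.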

The autocorrelation function measures the similarity between the original time series and a lagged version of itself at different time lags τ. The value at lag 0 is always the maximum, since the series matches itself perfectly. A peak at a nonzero lag τ suggests a repeating pattern in the data with a period of τ.
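
To see how a peak reveals a period, here is a minimal sketch with a noise-free sine wave whose period is known in advance. The period of 20 samples and the simple local-maximum search are choices made for this illustration.

```python
import numpy as np

period = 20                               # Known period in samples (illustrative)
n = np.arange(200)
x = np.sin(2 * np.pi * n / period)        # Periodic time series

# Normalized autocorrelation, same recipe as np.correlate(x, x) / np.dot(x, x)
acf = np.correlate(x, x, mode='full') / np.dot(x, x)
acf_pos = acf[len(x) - 1:]                # Keep lags 0, 1, ..., N-1

# First local maximum after lag 0: larger than both of its neighbours
is_peak = (acf_pos[1:-1] > acf_pos[:-2]) & (acf_pos[1:-1] > acf_pos[2:])
first_peak_lag = int(np.argmax(is_peak)) + 1

print("First autocorrelation peak at lag:", first_peak_lag)
```

On real, noisy data the peaks are less clean, so a smoothing step or a dedicated peak finder is usually needed.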

Example in Python with Synthetic Data

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data for a time series
np.random.seed(0)
N = 100 # Length of the time series
x = np.random.randn(N) # Time series data

# Compute autocorrelation function
autocorr = np.correlate(x, x, mode='full') / np.dot(x, x) # Normalized autocorrelation

# Plot the autocorrelation function
lags = np.arange(-N + 1, N) # Lag values
plt.stem(lags, autocorr)
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation Function')
plt.grid(True)
plt.show()

Spearman Rank Correlation

Spearman rank correlation coefficient, denoted as ρ (rho), is a non-parametric measure of the strength and direction of association between two ranked variables. It assesses how well the relationship between two variables can be described using a monotonic function.
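
The word "monotonic" is what separates Spearman from Pearson. As a small sketch, the data below follow an exponential curve (an arbitrary choice for illustration): Y always increases with X, so Spearman's ρ is 1, while Pearson's r is lower because the relationship is not a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0, 5, 50)
y = np.exp(x)                    # Strictly increasing, but far from linear

rho, _ = spearmanr(x, y)         # Rank-based: only the ordering matters
r, _ = pearsonr(x, y)            # Measures linear association

print("Spearman rho:", rho)
print("Pearson r:", r)
```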

Mathematics Behind Spearman Rank Correlation

The Spearman rank correlation coefficient is computed as the Pearson correlation coefficient between the ranks of the two variables. Let X and Y be two variables. The steps to compute Spearman rank correlation are as follows:

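  1. Rank the values of each variable separately, assigning each observation a rank from 1 to n, where n is the number of observations. Tied values are typically given the average of the ranks they span.
  2. Calculate the difference d between the two ranks of each observation.
  3. Compute the Spearman rank correlation coefficient ρ as the Pearson correlation coefficient between the ranks. When there are no ties, this simplifies to ρ = 1 - 6 Σ d² / (n(n² - 1)), which is where the rank differences from step 2 are used.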

The formula for computing ρ is the same as that of Pearson correlation coefficient, but the data used are the ranks of the observations.
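
The steps above can be checked by hand. The sketch below ranks the data with scipy.stats.rankdata, computes ρ both as the Pearson correlation of the ranks and with the rank-difference shortcut 1 - 6 Σ d² / (n(n² - 1)), which is valid only without ties, and compares both with spearmanr. The sample size and random data are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

rng = np.random.default_rng(0)
X = rng.random(30)
Y = 2 * X + rng.standard_normal(30)   # Continuous data, so ties are not expected

# Step 1: rank each variable separately (ranks run from 1 to n)
rank_x = rankdata(X)
rank_y = rankdata(Y)

# Step 3: Pearson correlation computed on the ranks
rho_from_ranks, _ = pearsonr(rank_x, rank_y)

# Steps 2 and 3 via the rank-difference shortcut (no ties assumed)
d = rank_x - rank_y
n = len(X)
rho_from_d = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Reference value from SciPy
rho_scipy, _ = spearmanr(X, Y)

print(rho_from_ranks, rho_from_d, rho_scipy)
```

All three values should agree up to floating-point rounding.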

Example in Python with Synthetic Data

Let’s generate synthetic data for two variables X and Y and compute their Spearman rank correlation coefficient using Python:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

# Generate synthetic data for variables X and Y
np.random.seed(0)
X = np.random.rand(100) # Variable X
Y = 2 * X + np.random.randn(100) # Variable Y: linear in X plus Gaussian noise

# Compute Spearman rank correlation coefficient
rho, p_value = spearmanr(X, Y)

print("Spearman rank correlation coefficient:", rho)

# Plot the data
plt.scatter(X, Y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot of X vs Y')
plt.grid(True)
plt.show()
