  • Sebastien

Create a Correlation Matrix with Python & Pandas

Updated: Dec 11, 2020

1. Let me first define the example I chose for this purpose:

Arbitrarily, I decided to look at the correlations between 14 assets trading on CME/Globex over the last 4 hours of the trading week on a 5min timeframe, that is to say the last 48 candles only. I used the close as the reference point for all of them: since all of these assets stop trading at the same time (22:00 UTC), the candles are already aligned in time, so I don't have to clean and realign the data. For example, I excluded the agricultural commodities, which close earlier (~19:20); otherwise I would have had empty data between 19:20 and 22:00 and the result would not be meaningful for the chosen window.


2. The chosen assets:

YMZ0, ZNZ0, 6AZ0, 6BZ0, 6CZ0, 6EZ0, 6JZ0, ESZ0, NQZ0, RTYZ0, GCZ0, SIZ0, CLZ0, NGZ0


3. The method used:

I exported the last 48 x 5min candles for each asset from TradingView (as a CSV file);


I opened each file with Excel to filter/select the relevant data (last 48 rows of 5min close), i.e. the last 240min of trading before the weekly market close.
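
As a side note, the Excel step could probably be done in pandas as well. Here is a minimal sketch, assuming each exported CSV has a column named close (that column name, the last_closes helper and the file names in the comment are my assumptions, so check them against your own export):

```python
import io

import pandas as pd


def last_closes(csv_source, n=48, close_col="close"):
    """Return the last `n` closing prices of one exported CSV as a list.

    `close_col` is an assumed column name; adjust it to match the export.
    """
    candles = pd.read_csv(csv_source)
    return candles[close_col].tail(n).tolist()


# Tiny in-memory stand-in for an exported file, using the YMZ0 closes
# quoted in this post. A real export has more rows and more columns.
sample_csv = io.StringIO("close\n28223\n28232\n28208\n28221\n")
closes = last_closes(sample_csv, n=3)
print(closes)  # [28232, 28208, 28221]

# With real files (hypothetical names), the dictionary could be built as:
# data = {symbol: last_closes(f"{symbol}.csv") for symbol in ["YMZ0", "ESZ0"]}
```

The tail(n) call keeps the last n rows, which corresponds to the "last 48 rows" filter done by hand in Excel.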


A- I collected the raw data in this format (to keep it readable, I am only showing the first few values for each asset here...):


data = {
    'YMZ0': [28223,28232,28208,28221...],
    'ZNZ0': [138.546875,138.546875,138.53125,138.53125...],
    '6AZ0': [0.7272,0.7272,0.7274,0.7275...],
    '6BZ0': [1.3157,1.3157,1.316,1.3162...],
    '6CZ0': [0.76745,0.7674,0.7674,0.76755...],
    '6EZ0': [1.18905,1.1892,1.18905,1.18925...],
    '6JZ0': [0.009687,0.0096855,0.009687,0.009685...],
    'ESZ0': [3501.75,3502.5,3501.5,3504.25...],
    'NQZ0': [12056,12057,12063.5,12081.5...],
    'RTYZ0': [1648.7,1646.9,1645.2,1646.9...],
    'GCZ0': [1953,1953,1952.7,1952.9...],
    'SIZ0': [25.65,25.64,25.63,25.645...],
    'CLZ0': [37.33,37.36,37.28...],
    'NGZ0': [2.901,2.901,2.89,2.887...],
}

B- I created a DataFrame in order to capture the above dataset in Python:


import pandas as pd  # needed for pd.DataFrame
df = pd.DataFrame(data,columns=['YMZ0','ZNZ0','6AZ0','6BZ0','6CZ0','6EZ0','6JZ0','ESZ0','NQZ0','RTYZ0','GCZ0','SIZ0','CLZ0','NGZ0'])
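
One thing worth checking at this step is that every list has the same number of values, since pandas refuses to build a DataFrame from lists of different lengths. The small sketch below uses only the first values quoted above for three of the assets, so it is an illustration and not the full 48-candle dataset:

```python
import pandas as pd

# First four closes quoted in this post for three of the 14 assets.
data = {
    "YMZ0": [28223, 28232, 28208, 28221],
    "ESZ0": [3501.75, 3502.5, 3501.5, 3504.25],
    "NQZ0": [12056, 12057, 12063.5, 12081.5],
}

# The dict keys become the column names, in insertion order.
df = pd.DataFrame(data)
print(df)

# A list with a different number of values makes the constructor fail.
uneven = dict(data, CLZ0=[37.33, 37.36, 37.28])
raised = False
try:
    pd.DataFrame(uneven)
except ValueError as exc:
    raised = True
    print("Cannot build the DataFrame:", exc)
```

Since the dict keys are already in the order I want, the columns= argument should be optional here; I keep it in my own call to make the column order explicit.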


C- Printing df shows the following DataFrame:



D- I created the correlation matrix with the Pandas DataFrame method corr(), i.e. corrMatrix = df.corr(), which is printed as:
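
To make clearer what df.corr() returns: by default it computes the Pearson correlation coefficient for every pair of columns. The sketch below recomputes one coefficient by hand and compares it with the matrix entry. It again uses only the first four closes quoted above for three assets, so the numbers it prints are not the results of the full 48-candle study:

```python
import numpy as np
import pandas as pd

# First four closes quoted in this post for three of the 14 assets.
df = pd.DataFrame({
    "YMZ0": [28223, 28232, 28208, 28221],
    "ESZ0": [3501.75, 3502.5, 3501.5, 3504.25],
    "NQZ0": [12056, 12057, 12063.5, 12081.5],
})

corrMatrix = df.corr()  # Pearson correlation is the default method
print(corrMatrix.round(3))

# Pearson coefficient of ESZ0 vs NQZ0, computed by hand:
# sum of products of deviations from the mean, divided by the
# square root of the product of the sums of squared deviations.
x = df["ESZ0"] - df["ESZ0"].mean()
y = df["NQZ0"] - df["NQZ0"].mean()
manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(np.isclose(manual, corrMatrix.loc["ESZ0", "NQZ0"]))
```

The matrix is symmetric with 1.0 on the diagonal, because each asset is perfectly correlated with itself.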




E- I imported 2 additional packages (seaborn, aliased as sn, and matplotlib) to get a more appealing visual representation of the correlation matrix, and used the following function:

sn.heatmap(corrMatrix, annot=True)

F- This final step plots the correlation matrix as a heatmap (in a plain script, a call to plt.show() is needed to display the figure):
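
For reference, here is a self-contained sketch of steps D to F in one place. The three-asset sample, the vmin/vmax/cmap settings, the title and the output file name are my own choices for illustration; only sn.heatmap(corrMatrix, annot=True) comes from the steps above:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn

# First four closes quoted in this post for three of the 14 assets.
df = pd.DataFrame({
    "YMZ0": [28223, 28232, 28208, 28221],
    "ESZ0": [3501.75, 3502.5, 3501.5, 3504.25],
    "NQZ0": [12056, 12057, 12063.5, 12081.5],
})
corrMatrix = df.corr()

fig, ax = plt.subplots(figsize=(6, 5))
# annot=True writes each coefficient inside its cell; fixing the colour
# scale to [-1, 1] keeps colours comparable between different runs.
sn.heatmap(corrMatrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation of 5min closes (sample)")
fig.tight_layout()

# Save to a file (hypothetical name); use plt.show() instead to open a window.
fig.savefig("correlation_heatmap.png")
```

Saving to a PNG works whether or not a display is available, which is why I use it in this sketch.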


Voila! I hope you'll find this example insightful. Do not hesitate to send me your comments or questions...
