IB Quant Blog


Why Machine Learning Funds Fail

By Quantpedia

An interesting insight into the problems associated with attempts to implement machine learning in trading:

Authors: de Prado
Title: The 7 Reasons Most Machine Learning Funds Fail
Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3031282

Abstract:

The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures.

Notable quotations from the academic research paper:


• “Over the past 20 years, I have seen many new faces arrive to the financial industry, only to leave shortly after.
• The rate of failure is particularly high in machine learning (ML).
• In my experience, the reasons boil down to 7 common errors:
1. The Sisyphus paradigm
2. Integer differentiation
3. Inefficient sampling
4. Wrong labeling
5. Weighting of non-IID samples
6. Cross-validation leakage
7. Backtest overfitting

Pitfall #1:
The complexities involved in developing a true investment strategy are overwhelming.  Even if the firm provides you with shared services in those areas, you are like a worker at a BMW factory who has been asked to build the entire car alone, by using all the workshops around you. It takes almost as much effort to produce one true investment strategy as to produce a hundred. Every successful quantitative firm I am aware of applies the meta-strategy paradigm. Your firm must set up a research factory where tasks of the assembly line are clearly divided into subtasks, where quality is independently measured and monitored for each subtask, where the role of each quant is to specialize in a particular subtask, to become the best there is at it, while having a holistic view of the entire process.
Pitfall #2:
In order to perform inferential analyses, researchers need to work with invariant processes, such as returns on prices (or changes in log-prices), changes in yield, changes in volatility. These operations make the series stationary, at the expense of removing all memory from the original series. Memory is the basis for the model’s predictive power. The dilemma is that returns are stationary but memory-less, while prices have memory but are non-stationary.
Pitfall #3:
Information does not arrive to the market at a constant entropy rate. Sampling data in chronological intervals means that the informational content of the individual observations is far from constant. A better approach is to sample observations as a subordinated process of the amount of information exchanged: Trade bars. Volume bars. Dollar bars. Volatility or runs bars. Order imbalance bars. Entropy bars.
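As an illustration of one of the schemes listed above, here is a minimal sketch of dollar bars (the function name, column names, and threshold are illustrative assumptions, not from the paper): a bar is emitted each time the cumulative traded dollar value crosses a threshold, so busy periods yield more bars than quiet ones.

```python
# Minimal dollar-bar sampler: emit a bar whenever cumulative price*volume
# crosses a dollar threshold. Column names are illustrative assumptions.
import pandas as pd

def dollar_bars(trades: pd.DataFrame, dollar_threshold: float) -> pd.DataFrame:
    """trades needs 'price' and 'volume' columns; returns OHLC-style bars."""
    bars, bucket = [], []
    dollars = 0.0
    for _, row in trades.iterrows():
        bucket.append(row["price"])
        dollars += row["price"] * row["volume"]
        if dollars >= dollar_threshold:
            bars.append({"open": bucket[0], "high": max(bucket),
                         "low": min(bucket), "close": bucket[-1]})
            bucket, dollars = [], 0.0
    return pd.DataFrame(bars)
```

Volume bars and trade bars follow the same pattern, accumulating shares or trade counts instead of dollar value.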
Pitfall #4:
Virtually all ML papers in finance label observations using the fixed-time horizon method. There are several reasons to avoid such labeling approaches: Time bars do not exhibit good statistical properties and the same threshold τ is applied regardless of the observed volatility. There are a couple of better alternatives, but even these improvements miss a key flaw of the fixed-time horizon method: the path followed by prices.
Pitfall #5:
Most non-financial ML researchers can assume that observations are drawn from IID processes. For example, you can obtain blood samples from a large number of patients, and measure their cholesterol. Of course, various underlying common factors will shift the mean and standard deviation of the cholesterol distribution, but the samples are still independent: There is one observation per subject. Suppose you take those blood samples, and someone in your laboratory spills blood from each tube to the following 9 tubes to their right. Now you need to determine the features predictive of high cholesterol (diet, exercise, age, etc.), without knowing for sure the cholesterol level of each patient. That is the equivalent challenge that we face in financial ML.

1. Labels are decided by outcomes.
2. Outcomes are decided over multiple observations.
3. Because labels overlap in time, we cannot be certain about what observed features caused an effect.

Pitfall #6:
One reason k-fold CV fails in finance is because observations cannot be assumed to be drawn from an IID process. Leakage takes place when the training set contains information that also appears in the testing set. In the presence of irrelevant features, leakage leads to false discoveries. One way to reduce leakage is to purge from the training set all observations whose labels overlapped in time with those labels included in the testing set. I call this process purging.
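The purging step described above can be sketched as follows, assuming each label is represented by an illustrative (start, end) time interval (the representation and function name are assumptions, not the paper's code):

```python
# Minimal purging sketch: drop any training observation whose label interval
# overlaps in time with a test label interval.
def purge_train(train_spans, test_spans):
    """Each span is a (start, end) tuple; returns training spans that do not
    overlap any test span in time."""
    def overlaps(a, b):
        # two closed intervals overlap iff each starts before the other ends
        return a[0] <= b[1] and b[0] <= a[1]
    return [tr for tr in train_spans
            if not any(overlaps(tr, te) for te in test_spans)]
```

Training observations that share time with the test set are removed, so no label information can leak across the fold boundary.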
Pitfall #7:
Backtest overfitting due to data dredging. A solution is the Deflated Sharpe Ratio: it computes the probability that the Sharpe ratio (SR) is statistically significant, after controlling for the inflationary effect of multiple trials, data dredging, non-normal returns, and shorter sample lengths."
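As a hedged sketch of the idea, the probabilistic Sharpe ratio at the core of the Deflated Sharpe Ratio can be written as follows; a full DSR would also raise the benchmark SR* to reflect the number of trials, which is not shown here, and the function name is an illustrative assumption:

```python
# Probabilistic Sharpe ratio sketch: probability that the observed SR exceeds
# a benchmark SR*, adjusting for sample length, skewness, and kurtosis.
from math import sqrt
from statistics import NormalDist

def probabilistic_sharpe(sr, sr_star, n_obs, skew=0.0, kurt=3.0):
    """sr: observed Sharpe ratio; sr_star: benchmark SR; n_obs: sample length;
    kurt is raw (not excess) kurtosis, so 3.0 corresponds to normal returns."""
    denom = sqrt(1 - skew * sr + ((kurt - 1) / 4.0) * sr ** 2)
    z = (sr - sr_star) * sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)
```

Shorter samples and fatter-tailed returns shrink the test statistic, so an impressive in-sample SR becomes less convincing exactly as the quotation describes.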

---------------------------------------
To learn more about this paper, view the full article on Quantpedia website:
https://www.quantpedia.com/Blog/Details/why-machine-learning-funds-fail

About Quantpedia

Quantpedia’s mission is to process financial academic research into a more user-friendly form to help anyone who seeks new quantitative trading strategy ideas. The Quantpedia team consists of members with strong financial and mathematical backgrounds (former quantitative portfolio managers and founders of Quantconferences.com) combined with members with outstanding IT and technical knowledge. Learn more about Quantpedia here: https://quantpedia.com

This article is from Quantpedia and is being posted with Quantpedia’s permission. The views expressed in this article are solely those of the author and/or Quantpedia and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.


5 Questions For Wesley Gray, AlphaArchitect.com

Momentum investing – betting on the persistence of price trends in the short to medium term – has captured the crowd’s attention in recent years. Consider, for instance, the strong growth in ETF assets in the niche. The first fund launched a bit more than five years ago; today, there are dozens of momentum ETFs, collectively holding nearly $15 billion in assets, according to etfdb.com. That’s still a small piece of the total ETF pie, but the strategy’s allure could keep growth bubbling for years to come. What should investors expect? Does the rising interest in momentum raise concerns about the strategy’s expected return? For some insight, The Capital Spectator asked Wesley Gray at Alpha Architect, a wealth manager near Philadelphia. Gray, who previously worked as a finance professor, is an obvious source for discussing momentum. In addition to managing variations of momentum-based portfolios for clients, he and his team have written extensively about the strategy at AlphaArchitect.com, a popular investing blog. Gray is also the co-author of Quantitative Momentum: A Practitioner’s Guide to Building a Momentum-Based Stock Selection System.

Why does momentum persist? It’s been identified in the literature for decades and traders have been using it for much longer in one form or another. Most return anomalies are arbitraged away or turn out to be data-mining illusions. Momentum seems to be different. Why?

This is a debate that still rages in academic circles, but it boils down to a mix of fundamental risk and mispricing that is tough to arbitrage away. Fundamental risk is easy to understand: higher risk generally earns higher returns in a competitive equilibrium. Mispricing is a bit trickier. If the mispricing is easy to exploit – i.e., you can generate 2-plus Sharpe ratio strategies by exploiting momentum – one can be sure the highly leveraged computer geeks at fast-moving hedge funds and proprietary trading shops will take care of the mispricing.
But what if trying to exploit momentum mispricing is akin to eating a hand grenade on occasion? Well, it turns out that strategies designed to “arbitrage” momentum profits away can be incredibly volatile and suffer huge drawdowns – not exactly the low-risk, easy-to-leverage trading strategies that the 200-IQ types look forward to exploiting. Long story short, sometimes even the best evidence-based active investment strategies can create a formidable challenge to investors seeking to exploit them. It’s a kind of quid pro quo: in order to access the potential gain, you must be willing to accept the potential pain. Could momentum be the most epic data-mining result in all of finance? Sure. Could it vanish in the future? Possible. However, if we believe that momentum stocks are 1) naturally riskier and 2) driven by systematic mispricing that is costly to “arbitrage,” we can expect momentum investing to work in the future.1

There’s been strong growth in momentum-focused strategies and investment products in recent years. Is there a capacity limit for the strategy? If so, are we near that limit?

Jack Vogel, one of my business partners, recently published a long piece called “Factor Investing and Trading Costs,” which addresses this question in great detail. The short answer: yes, the capacity on momentum strategies is limited. Some folks argue it’s anywhere from $5 billion to $300 billion-plus in capacity. On the question of “are we near the limit,” I’d guess that we are still a ways off, based on a few things. First, most so-called momentum funds are closet indexers, so their actual momentum exposure is fairly limited even with a large amount of assets under management. Also, David Blitz [Robeco Asset Management] highlights that the ETF market as a whole hasn’t taken a dramatic momentum bet. At some point momentum, or any strategy for that matter, could suffer from too many dollars chasing too few returns.
That said, given the relatively poor performance of momentum over the past decade, I’m not convinced there are huge swaths of short-term-performance-chasing investors looking to dive into stock momentum strategies – I think most [performance-chasing] investors have turned to things like cryptocurrency speculation.

You’ve previously noted that institutional investors have only dipped their toes into momentum. That’s surprising, given the strategy’s encouraging historical record. What accounts for the reluctance among the investment behemoths to dive in deeper?

There are almost certainly some large institutional investors implementing uber-sophisticated momentum strategies at scale. However, I’ve spoken to chief investment officers at several multi-billion-dollar endowments who weren’t even familiar with the term and/or the strategy. This was really surprising when I engaged in one of these conversations, but then I quickly remembered that not every CIO is buried in academic finance research. Many CIOs are tried-and-true fundamental investors and their philosophies revolve around the “value investing” ethos. So, even in this day and age, when systematic strategies are en vogue in the ETF space, many in the institutional space are still enamored with human stock pickers as opposed to fairly simple systematic investment approaches. I’m not exactly sure why this is the case, but my guess is that there is a potential agency problem at play: the consultants and internal investment staffs wouldn’t have a job if the pension/endowment bought a handful of index or factor funds and called it a day.

Are momentum strategies sufficiently robust to stand on their own? Or is it advisable to pair them with other strategies, such as value investing and/or a plain-vanilla market-indexing portfolio?

Depends who you ask. If you ask a value investor they will say, “buy value,” and never touch momentum, and vice versa for a momentum/technical type.
The answer is you should probably do both, because value and momentum are excellent diversifiers. AQR Capital Management published an excellent paper [“Value and Momentum Everywhere”] on the subject.

Why do so many investors punt on momentum strategies?

We wrote a piece, “Evidence-based investing requires less religion and more reason,” where we discuss the fundamental and technical religions in the marketplace. We think a lot of the “anti-momentum” sentiment is driven by a religious-like approach to investing. But why? Taking a step back, the mission for long-term active investors is to beat the market. Active investors should focus on the scientific method to address a basic question: what works? Warren Buffett obviously showed that value investing, irrespective of technical considerations, can work. But George Soros and Paul Tudor Jones also showed that technical analysis can work just as well. An ever-growing body of academic research formalizes the evidence that fundamental strategies (e.g., value and quality) and technical strategies (e.g., momentum and trend-following) both seem to work. Many dogmatic investors, however, looking to confirm what they already believe, selectively adopt the research evidence that fits their investing religion. In contrast, an evidence-based investor will conclude that fundamental and technical analysis strategies can work because they are two sides of the same coin. They are cousins because they share the common objective of exploiting the poor decisions of market participants influenced by biased decision-making. As Andrew Lo, an influential and forward-looking financial economist at MIT, correctly observes about the debate between fundamental and technical traders, “In the end we all have the same goal, which is to forecast uncertain market prices. We should be able to learn from each other.”

What’s the biggest risk with momentum investing generally? Is there some aspect of risk that’s unique to momentum?

Volatility.
For example, our public momentum indexes (see the data on our indexes here) are highly focused and concentrated long-only momentum strategies. These strategies are expected to have around 25% volatility versus 15% for the generic stock market. That’s intense! You’ll almost certainly experience violent portfolio pain so that you’ll wish you had never heard of momentum investing. But, of course, this intense volatility arguably comes with a reasonable chance of earning excess returns. One can apply trend-following overlays and other risk management strategies to try to ease the momentum pain, but the harsh reality is that volatility will always exist for well-constructed momentum strategies. There are some other risks associated with long/short momentum strategies, which are related to dynamically shifting beta. If one is going down that path they should certainly read “Momentum Crashes,” by Kent Daniel and Tobias Moskowitz.

1 For defining momentum, Gray notes: “Momentum can refer to trend-following strategies, also called ‘time series’ momentum, but let’s discuss the classic ‘momentum factor’ in academic finance research. This momentum is a relative strength, or ‘cross-sectional,’ momentum (described here). Quick example to highlight the difference: Consider stocks A and B. A is down 10% and B is down 20% over the past 12 months. A trend-following, or time series momentum, strategy would not buy either of these stocks; however, a cross-sectional momentum strategy would buy A and short/avoid B, because A is relatively stronger than B, despite having poor absolute momentum.”

CapitalSpectator.com is a finance/investment/economics blog that’s edited by James Picerno. The site’s focus is macroeconomics, the business cycle and portfolio strategy (with an emphasis on asset allocation and related analytics).
Picerno is the author of Dynamic Asset Allocation: Modern Portfolio Theory Updated for the Smart Investor (Bloomberg Press, 2010) and Nowcasting The Business Cycle: A Practical Guide For Spotting Business Cycle Peaks (Beta Publishing, 2014). In addition, Picerno publishes The US Business Cycle Risk Report, a weekly newsletter that quantitatively evaluates US recession risk in real time. Picerno is also working on a new book about using R for portfolio analytics. The publication date is expected in mid-2018.

This article is from CapitalSpectator.com and is being posted with CapitalSpectator.com’s permission.

Interpreting and Visualizing Autocorrelation

By Jithin J and Karthik Ravindra, Byte Academy

Analyzing time series data needs special attention.
Here, we would like to explore working with time series data and identify the effect of autocorrelation in order to come up with a more practical approach to working with linear regression models. When using data to estimate some value, say equity prices, autocorrelation is a common feature. It is defined as the situation in which the error terms of the linear regression model are correlated. So, if one error term is positive (or negative), and this fact causes the next error term to also be positive (or negative), we say that the model suffers from autocorrelation. It is a serious problem, as it violates the common assumption that the error term is stochastic and non-deterministic. Maintaining a stochastic error term is important for preserving the integrity of a linear regression; otherwise the model risks biased estimates.

Let’s take an example of some financial data during a stock market crash. The crash on day one increases the likelihood of observing a downward trend for the next few days, perhaps even weeks. If the model suffers from autocorrelation and is used for extrapolation, it will estimate a similar stock market crash in the future as well. Therefore, we must first be able to identify the presence of this trend.

To prepare this article, we decided to pick a financial data set. After quick research we decided to work on the Shiller P/E ratio and estimate the movement of the S&P monthly closing price. The data was taken from: http://www.multpl.com/shiller-pe/table?f=m.

Domain Knowledge

The Shiller P/E is a valuation measure usually applied to the US S&P 500 equity market. It is defined as price divided by the average of ten years of earnings (moving average), adjusted for inflation. As such, it is principally used to assess likely future returns from equities over timescales of 10 to 20 years, with higher than average values implying lower than average long-term annual average returns.
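The detection step mentioned above can be sketched with the Durbin-Watson statistic on regression residuals: values near 2 suggest no autocorrelation, while values well below 2 suggest positive autocorrelation. The simulated AR(1) errors and their parameters here are illustrative assumptions, not the article's data.

```python
# Detecting autocorrelation in residuals with the Durbin-Watson statistic.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)

# AR(1) errors: each error pulls the next one in the same direction,
# mimicking the crash example above.
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.8 * e[t - 1] + rng.normal()

dw_autocorr = durbin_watson(e)                    # well below 2
dw_white = durbin_watson(rng.normal(size=500))    # near 2 for iid errors
```

In practice one would run this on the residuals of the fitted regression rather than on simulated errors.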
Webscraping

We start by scraping the Shiller P/E ratio and S&P closing prices from http://www.multpl.com/shiller-pe/table?f=m. If you are interested in the webscraping, the Python code is here: https://github.com/jithinjkumar. Once the data has been extracted we store it in pandas DataFrames, with a time series index column and the S&P closing price and Shiller ratio as columns. Once the data is stored, we need to clean and prepare it for analysis.

Data Preparation and Data Cleaning Using the Pandas Library: Creating a Time Series

So we have the Shiller ratio data and S&P closing price in two different data frames; now let’s perform a lookup to get the Shiller P/E ratio for each month into the closing-price data frame. We have 1769 entries and 4 columns. SandP_Date and sh_Date are date columns, so we can easily drop one of them, and we need to check for null values. sh_Ratio has 120 null values; we can safely drop these rows from our dataset, as they account for less than 7% of total rows. Now we create a time series, for which the S&P Date column needs to be formatted correctly so that we can assign the correct data type to each column. Now our DataFrame is in time series format and ready for further analysis. Stay tuned for the next post in this series, in which we will discuss Time Series Analysis.

-------------------------------------------------------

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations. Byte Academy is based in New York, USA. It offers coding education, classes in FinTech, Blockchain, DataSci, Python + Quant.

This article is from Byte Academy and is being posted with Byte Academy’s permission.
R Tip of the Month: Correlation Over Time

In my earlier post from March 2018, I introduced the rollapply function, which executes a function on a rolling-window basis. While this function is very useful, it needs a little modification for users to apply other general operations. Originally, I faced this issue when I tried to compute the correlation matrix across different asset returns on a rolling window. For the demonstration, let's consider the returns for all sector ETFs excluding real estate:

library(quantmod)
v <- c("XLE","XLU","XLK","XLB","XLP","XLY","XLI","XLV","XLF")
t1 <- "1990-01-01"
P.list <- lapply(v, function(x) get(getSymbols(x, from = t1)))
P.list <- lapply(P.list, function(x) x[,6])
P <- Reduce(merge, P.list)
names(P) <- v
R <- na.omit(P/lag(P) - 1)

By default, rollapply executes the given function on each time series separately and returns a time series object. For instance,

tail(rollapply(R,25,mean))

returns the 25-day moving average for each series separately.
On the other hand, if I try to compute the moving correlation instead, I get the following:

tail(rollapply(R,25,cor))

which computes the correlation of each ETF with itself rather than with the other ETFs, as it treats each time series separately. As a remedy, add the by.column = F argument to the rollapply function. In this case, the function returns a time series xts object, however, with 9 × 9 = 81 columns, where each column corresponds to a pairwise correlation between the 9 sector ETFs, rather than a square matrix.

COR <- rollapply(R,25,cor,by.column = F)
dim(COR)
class(COR)

What’s left to be done is to stack these vectors back into a correlation matrix, one for each time period. To do so, I will refer to the plyr package. The plyr package allows users to take an array (a), a data frame (d), or a list (l), execute a given function over the given object, and output the results in any of those formats. For our case, I will input the time series COR object as an array and output it as a list, where each element in the list corresponds to the moving correlation matrix.

library(plyr)
COR.list <- alply(COR,1,function(x) matrix(x,nrow = ncol(R), byrow = T ))

The second argument in alply specifies the margin, where 1 indicates that the given function should be executed over the rows, while 2 states that it should be executed over the columns instead. The third argument, which takes a function, stacks each row of the COR object into a square matrix. As a result, we have:

round(COR.list[[25]],2)

which is identical to the correlation matrix computed over the first 25 days in the data:

round(cor(R[1:25,]),2)

Finally, one can either keep the rolling correlation matrices in a list or transform them back into a time series using certain computations (e.g., construct portfolio weights and compute the out-of-sample return as a time series). As a final demonstration, I will show how one can stack the list into a time series of the average correlation across sectors over time.

# compute the average of the upper-triangle elements of each correlation matrix
COR.mean <- sapply(COR.list, function(x) mean(x[upper.tri(x)]) )
summary(COR.mean)

To get back to a time series object, the following trick should serve well:

library(lubridate)
names(COR.mean) <- date(COR)
COR.mean <- as.xts(COR.mean)
plot(COR.mean)

Note that in order to transform a numerical vector into a time series, I label the values with the corresponding dates and then convert the object to xts; lubridate is an extremely useful package for handling date formats.

Visit Majeed’s GitHub – IBKR-R corner for more info.

Learn more about Majeed’s research in R during his presentation at the R/Finance 2018 Conference in Chicago on June 1, 2018: http://www.rinfinance.com/

Majeed Simaan, Ph.D. in Finance, is well versed in research areas related to banking, asset pricing, and financial modeling. His research interests revolve around banking and risk management, with emphasis on asset allocation and pricing. He has been involved in a number of projects that apply state-of-the-art empirical research tools in the areas of financial networks (interconnectedness), machine learning, and textual analysis. His research has been published in the International Review of Economics and Finance and the Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence. Majeed also pursued graduate training in the area of Mathematical Finance at the London School of Economics (LSE). He has a strong quantitative background in both computing and statistical learning. He holds both a BA and an MA in Statistics from the University of Haifa with specialization in actuarial science.

This article is from Majeed Simaan and is being posted with Majeed Simaan’s permission.


K-Means Clustering For Pair Selection In Python - Overview

In this series, we will cover what K-Means clustering is, how it can be used for solving the age-old problem of pair selection for Statistical Arbitrage, and the advantage of using K-Means for pair selection compared to using a brute force method. We will also create a Statistical Arbitrage strategy using K-Means for pair selection and implement the elbow technique to determine the value of K.

Let’s get started!

Part I – Life Without K-Means

To gain an understanding of why we may want to use K-Means to solve the problem of pair selection, we will attempt to implement a Statistical Arbitrage strategy as if there were no K-Means. That is, we will develop a brute force solution to our pair selection problem and then apply that solution within our Statistical Arbitrage strategy.

Let’s take a moment to think about why K-Means could be used for trading. What’s the benefit of using K-Means to form subgroups of possible pairs? Couldn’t we just come up with the pairs ourselves?

This is a great question, and one you may undoubtedly have wondered about. To better understand the strength of using a technique like K-Means for Statistical Arbitrage, we’ll walk through trading a Statistical Arbitrage strategy as if there were no K-Means. I’ll be your ghost of trading past, so to speak.

First, let’s identify the key components of any Statistical Arbitrage trading strategy.

1. We must identify assets that have a tradable relationship
2. We must calculate the Z-Score of the spread of these assets, as well as the hedge ratio for position sizing
3. We generate buy and sell decisions when the Z-Score exceeds some upper or lower bound
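Steps 2 and 3 above can be sketched as follows; the hedge-ratio estimate (a simple OLS slope), the ±2 entry threshold, and the function name are all illustrative assumptions rather than the article's final implementation:

```python
# Z-score the spread between two price series and emit signals when the
# z-score crosses illustrative +/-2 bounds.
import numpy as np

def zscore_signals(price_a, price_b, entry=2.0):
    a = np.asarray(price_a, dtype=float)
    b = np.asarray(price_b, dtype=float)
    hedge = np.polyfit(b, a, 1)[0]     # OLS slope of A on B as the hedge ratio
    spread = a - hedge * b
    z = (spread - spread.mean()) / spread.std()
    # short the spread when it is rich, long when it is cheap
    signal = np.where(z > entry, -1, np.where(z < -entry, 1, 0))
    return z, signal
```

A rolling mean and standard deviation would normally replace the full-sample ones to avoid lookahead; this static version just shows the mechanics.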

To begin, we need some pairs to trade. But we can’t trade Statistical Arbitrage without knowing whether or not the pairs we select are cointegrated. Cointegration means that some linear combination of our two assets’ prices is stationary. Even if the two assets each move randomly, we can count on the relationship between them to stay stable, or at least most of the time.

Traditionally, when solving the problem of pair selection, in a world with no K-Means, we must find pairs by brute force or trial and error. This was usually done by grouping stocks together that were merely in the same sector or industry. The idea was that if these stocks were of companies in similar industries, thus having similarities in their operations, their stocks should move similarly as well. But, as we shall see, this is not necessarily the case.

The first step is to think of some pairs of stocks that should yield a trading relationship. We'll use stocks in the S&P 500, but this process could be applied to any stocks within any index. How about Walmart and Target? They are both retailers and direct competitors. Surely they should be cointegrated, and thus would allow us to trade them in a Statistical Arbitrage strategy.

Let’s begin by importing the necessary libraries as well as the data that we will need. We will use 2014-2016 as our analysis period.

#importing necessary libraries

#data analysis/manipulation
import numpy as np
import pandas as pd

#plotting
import matplotlib.pyplot as plt

#importing pandas datareader to get our data
import pandas_datareader as pdr

#importing the Augmented Dickey-Fuller Test to check for cointegration
from statsmodels.tsa.stattools import adfuller

Now that we have our libraries, let’s get our data.

#setting start and end dates
start='2014-01-01'
end='2016-01-01'
#importing Walmart and Target using pandas datareader
wmt=pdr.get_data_yahoo('WMT',start,end)
tgt=pdr.get_data_yahoo('TGT',start,end)

Before testing our two stocks for cointegration, let’s take a look at their performance over the period. We’ll create a plot of Walmart and Target.

#Creating a figure to plot on
plt.figure(figsize=(10,8))

#Creating WMT and TGT plots
plt.plot(wmt["Close"],label='Walmart')

plt.plot(tgt['Close'],label='Target')
plt.title('Walmart and Target Over 2014-2016')

plt.legend(loc=0)
plt.show()

In the above plot, we can see a slight correlation at the beginning of 2014. But this doesn’t really give us a clear idea of the relationship between Walmart and Target. To get a definitive idea of the relationship between the two stocks, we’ll create a correlation heat-map.

To begin creating our correlation heatmap, we must first place the Walmart* and Target* prices in the same dataframe. Let's create a new dataframe for our stocks.

#initializing newDF as a pandas dataframe
newDF=pd.DataFrame()
#adding WMT closing prices as a column to the newDF
newDF['WMT']=wmt['Close']
#adding TGT closing prices as a column to the newDF
newDF['TGT']=tgt['Close']

Now that we have created a new dataframe to hold our Walmart* and Target* stock prices, let’s take a look at it.

newDF.head()

We can see that we have the prices of both our stocks in one place.

In the next post, we will create a correlation heatmap of the stocks and run some ADF tests.

------------------------------------------------------------

*Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

If you want to learn more about K-Means Clustering for Pair Selection in Python, or to download the code, visit QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.



Disclosure

The material (including articles and commentary) provided on the IBKR Quant Blog is offered for informational purposes only. The posted material is not a recommendation by Interactive Brokers (IB) that you or your clients should contract for the services of, or invest with, any independent advisor, hedge fund or other entity that posts on the IBKR Quant Blog, nor is it a recommendation by IB that you invest with any advisor or hedge fund. The advisors, hedge funds and other analysts who post on the IBKR Quant Blog are independent of IB, and IB makes no representations or warranties concerning the past or future performance of these advisors, hedge funds or other persons/entities, or the accuracy of the information they provide. Interactive Brokers does not conduct a "suitability review" to ensure that trading with any advisor, hedge fund or other person/entity is suitable for you.

Securities or other financial products mentioned on the IB Quant Blog are not suitable for all investors. The material posted does not take into account your particular investment objectives, financial situation or needs, and does not constitute a recommendation of any security, financial product or strategy. Before making any investment or trade, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice. Past performance is no guarantee of future performance.