Jul 15 2010

Equity Portfolio Cluster Analysis

I am working on a little tool to help identify clusters in an equity portfolio. Ideally, I want to identify highly correlated ‘pockets’ of holdings, such that the holdings in each cluster do not provide much diversification from each other. Mentally, I imagine that a PCA of the correlation matrix of log differences for an individual cluster would have one eigenvector, representing a linear shift of all its holdings, that explains the majority of the variance.

My first attempts have had limited success…

1) Perform a PCA decomposition of the correlation matrix, then regress each column of the correlation matrix on the eigenvectors (scaled by their eigenvalues). After each regression, I find the coefficient with the largest absolute value (for the time being, I ignored ‘significance’). This coefficient identifies that holding’s associated cluster. The problem, of course, is that each stock ends up associated with an eigenvector that contributes very little to the overall explained variance.

2) Similar to above, but I didn’t scale the eigenvectors. This ends up with all the holdings being associated with 1 vector, which on inspection was basically the ‘market’ vector (i.e. the linear shift of all holdings).

3) Use PCA to identify the number of eigenvectors required to explain 95% of the variance, and use hierarchical centroid clustering (using the correlation matrix rows as my ‘points’ in n-space). The issue here is that for a larger portfolio (say, 50 stocks), the eigenvectors fall off very precipitously, and I end up with 30-40 eigenvectors that explain ~1.5% of the variance — and therefore I end up with ~30-40 clusters.

4) Skip the PCA, and just use hierarchical centroid clustering, using the rows of the correlation matrix as my points in n-space, with a ‘maximum distance’ criterion that does not allow clustering if points are ‘too far’ apart. However, without any way to decide what this ‘maximum distance’ should be, this didn’t feel very good.

5) I realized that the problem with using my correlation matrix rows as my points in n-space was that as the number of holdings increased, the correlation between two individual holdings mattered less and less. i.e. if I had two holdings that had similar correlations with every other holding, and a high correlation with each other, I want them clustered. But if they have a low correlation with each other (which typically implies, if their other correlations are ‘similar’, that the other correlations are low), I do NOT want them clustered. However, using the correlation matrix, as the number of holdings increases, each dimension matters less and less, so the dimension identifying their low correlation with each other does not come into play.

To fix this, I mapped my correlation matrix into n-space (using optimization) such that the cosine between two holdings’ vectors in n-space was as close as possible to their correlation. This fixed the problems associated with the last issue, but I still don’t know how to identify how many clusters to select. Furthermore, the optimization seems to be quite slow (though this is sort of a low priority problem).
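One possible shortcut for the mapping step, offered as a sketch rather than the optimization described above: when the correlation matrix is positive definite, a Cholesky factorization C = L Lᵀ yields the vectors directly. Row i of L has unit length (its squared norm is C_ii = 1) and the dot product of rows i and j is C_ij, so the cosines match the correlations exactly. The sketch is in plain Python; the function names and the 3-stock matrix are illustrative assumptions.

```python
import math

def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical correlation matrix for three holdings.
corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

# Row i of the factor is the n-space vector for holding i.
vectors = cholesky(corr)

for i in range(len(corr)):
    for j in range(len(corr)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 6), corr[i][j])
```

A sample correlation matrix estimated from fewer observations than holdings is singular, in which case this factorization fails and an eigen-decomposition or the original optimization would still be needed.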

In summary…

Given a portfolio of equity holdings, how would you recommend I go about identifying a) how many clusters exist in the portfolio and b) what those clusters are?

These are the problems I am stuck on…


May 29 2010

Momentum Heat Map, Part III

Now if we check our prediction error, where returns are sorted relative to their variance over the period the returns were generated, we see more stability.

However, at the end of the day, we are only working with ~30 assets. So let’s look at the 6-1-1 region (which seems relatively stable). We are effectively looking at an error of ~(265/30) ≈ 8.8 per asset, i.e. the mean prediction error in index for each asset, at each step, is about 8.8 spots. That … isn’t very good. In fact, that is nearly 30% of the assets…

Perhaps if instead of predicting the exact index, we looked at deciles?
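A minimal sketch of what a decile-based error could look like, in plain Python. It assumes each asset's rank is mapped to one of ten equal-sized buckets and that the error is the total number of buckets moved; the function names and the 20-asset example are hypothetical.

```python
def ranks(values):
    """Return each item's 0-based rank (0 = smallest value)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def rank_error(past_returns, future_returns):
    """Total number of rank positions the assets moved."""
    pairs = zip(ranks(past_returns), ranks(future_returns))
    return sum(abs(p - f) for p, f in pairs)

def decile_error(past_returns, future_returns):
    """Total number of decile buckets the assets moved."""
    n = len(past_returns)
    past_deciles = [r * 10 // n for r in ranks(past_returns)]
    future_deciles = [r * 10 // n for r in ranks(future_returns)]
    return sum(abs(p - f) for p, f in zip(past_deciles, future_deciles))

# Hypothetical example: 20 assets whose future order swaps each neighbouring pair.
past = [0.01 * i for i in range(20)]
future = past[:]
for i in range(0, 20, 2):
    future[i], future[i + 1] = future[i + 1], future[i]

print(rank_error(past, future))    # every asset moved one place
print(decile_error(past, future))  # but no asset left its decile
```

Small reshuffles inside a bucket stop counting as errors, which is the point of coarsening the prediction target.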


May 25 2010

Momentum Heat Map Part Deux

Do I spy, with my little eye, stability at 12-1 and 6-1?


May 20 2010

Exploring the n-m predict p momentum model

There is quite a bit written about the 6-1 momentum model, which ranks investments by their previous 6 month return, minus the previous 1 month return, and holds for 1 month.

The idea behind the method is that longer-run average returns should persist in the short run, and that the forecast improves once short-term reversionary factors are taken into account.

To explore this concept and see exactly how stable the 6-1 model is, I explored 30 diverse ETFs (equities, commodities, currencies, bonds, et cetera) over a period of 400 days, changing my momentum, reversion, and prediction periods.

To begin, I transformed adjusted closing prices into cumulative log returns.
I calculate the prediction returns as follows (in MATLAB syntax):

pastReturns = (cumulativeReturns(today, :) - cumulativeReturns(today-n, :)) - (cumulativeReturns(today, :) - cumulativeReturns(today-m, :));

and the future returns as

futureReturns = (cumulativeReturns(today+p, :) - cumulativeReturns(today, :));

I then sort both sets of returns and calculate the difference between their sorted order to get the total estimation error

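[~, idx] = sort(pastReturns);                 % idx(k): asset in k-th place by past return
[~, futureIdx] = sort(futureReturns);         % futureIdx(k): asset in k-th place by future return
pastRank(idx) = 1:numel(idx);                 % invert each permutation to get every asset's rank
futureRank(futureIdx) = 1:numel(futureIdx);
rankError = sum(abs(pastRank - futureRank));  % total number of places the assets moved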

I do this, looping through n, m, and p. To generate the total error for a given (n,m,p) tuple, I loop through all available days, compute the prediction error for that day, and average the errors.
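The averaging step for one (n, m, p) tuple can be sketched as follows. This is an illustrative Python version, not the original MATLAB: it assumes `cumulative[t][a]` holds the cumulative log return of asset `a` on day `t`, scores each day by how far every asset's rank moved, and uses made-up function names and toy data.

```python
def ranks(values):
    """Return each item's 0-based rank (0 = smallest value)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def rank_error(past_returns, future_returns):
    """Total number of rank positions the assets moved."""
    pairs = zip(ranks(past_returns), ranks(future_returns))
    return sum(abs(p - f) for p, f in pairs)

def mean_rank_error(cumulative, n, m, p):
    """Average daily rank error for momentum n, reversion m, holding period p."""
    errors = []
    # Only use days with n (and m) days of history and p days of future data.
    for today in range(max(n, m), len(cumulative) - p):
        now = cumulative[today]
        past = [
            (c - c_n) - (c - c_m)
            for c, c_n, c_m in zip(now, cumulative[today - n], cumulative[today - m])
        ]
        future = [later - c for later, c in zip(cumulative[today + p], now)]
        errors.append(rank_error(past, future))
    if not errors:
        raise ValueError("not enough data for these look-back and holding periods")
    return sum(errors) / len(errors)

# Toy data: three assets with constant daily log returns, so past ranking
# always matches future ranking and the error is zero.
cumulative = [[slope * day for slope in (0.001, 0.002, 0.003)] for day in range(20)]
print(mean_rank_error(cumulative, 6, 1, 1))
```

Looping this function over a grid of (n, m, p) values would produce the error surface that the heat maps visualize.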

The results are visualized as follows (please note, axes must be scaled by a factor of 5 and that the data has been interpolated to fill in missing spots):

To me, there are two very intriguing things about this visualization.

First, there seems to be a region of stability (or am I just seeing what I want to see?) in the 90-110 day momentum, 25-50 day reversion, and 20-30 day prediction area, indicating that there may be some validity to the 6-1 momentum method. However, it is not the ONLY region of stability, bringing into question its validity.

Secondly, along the line where the momentum look-back period equals the reversion look-back period (i.e. when there is NO prediction data and all stocks are expected to perform equally well), we get the minimum error. However, keeping a consistently equal balance in the portfolio would incur incredibly high turn-over, and I suspect that the benefit of this error decrease would be outweighed by trading costs & slippage.

So while 6-1 may out-perform buy and hold, it seems that a portfolio constantly rebalanced to equal weights may be worth exploring further.


Jan 6 2010

Some thoughts on Back Testing

Back testing is fairly common when analyzing the profitability of a strategy, but there are many other things to be considered besides returns.  Much of this list came from perusing Nuclear Phynance (particularly, FDAXHunter’s input).

  • Length of the Period Tested: Over what time-frame did you run the test? How long was that time-frame?  What market conditions persisted over it?  The goal should be to run the strategy on as diverse a time-frame as possible, to help you discover what market factors play a critical role in the success or failure of your method.
  • Out of Sample Tests & Locations: Much like above, you want to have a fairly significant and diverse set of out of sample data to test on.
  • Average Trade / Win / Loss: What does the average trade of the system look like?  Are you normally profitable, or do you normally lose money, with a couple of fat-tail trades providing the profitability?
  • Volatility & Skewness of P&L Stream: Is your profitability stable?  Do you have a fat loss tail?  How skewed positive are you?
  • Maximum Consecutive Losers / Winners: This is very important for when the system goes live.  Are five bad trades in a row a reason to pull the system?  Ten?  What is normal for the system?  When should we start getting concerned?
  • Maximum Draw Down & Time: If we implemented the system, what sort of draw-downs would we have to stomach, and over how long would we have to stomach them?  I don’t care much about a 1300% return over 5 years if for 4 of them, I faced an 80% draw-down.  You would probably pull the plug long before the fifth year came around.
  • Average Draw Down & Time: What does the average draw down look like?  Is it stable?
  • Percent of Winners Removed Until Neutral: What percentage of our best trades do we have to remove before the system breaks?  Is it only a few?  Is our success based on a few large winners, or do we have a stable set of success?
  • Histogram of P&L: What does the P&L look like, historically?  This is a visualization of the skew and volatility from above.
  • Shape of Equity Curve: Are we talking about a long, smooth curve?  A curve with lots of jumps?  How much interim volatility between new highs?
  • Optimal Parameter Location: Are the parameters we used in a stable location, or are they at a pin-point?  If they are at a pin-point, the success of the model is most likely the result of data-mining, instead of a true edge.  Instead, we would like to see that our parameters are in a plateau — the model remains stable for moderate changes in our parameter values.
  • Performance in Other Markets: How does the model perform on securities it wasn’t designed to trade?  Does it succeed in similar securities?  Does it fail in securities in which the edge shouldn’t exist?
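
A few of the measurements in this list are easy to compute from an equity curve or a list of per-trade P&L. The following is a rough Python sketch under stated assumptions: draw-down is measured as a fraction of the running peak, draw-down time as the number of steps spent below that peak, and ‘winners removed until neutral’ as the share of winning trades, taken from the best downward, that must be dropped before total P&L is no longer positive. All names and numbers are hypothetical.

```python
def max_drawdown(equity):
    """Return (deepest draw-down as a fraction of the peak, longest run of steps below a peak)."""
    peak = equity[0]
    worst = 0.0
    longest = 0
    underwater = 0
    for value in equity:
        if value >= peak:
            peak = value
            underwater = 0
        else:
            underwater += 1
            worst = max(worst, (peak - value) / peak)
            longest = max(longest, underwater)
    return worst, longest

def max_consecutive_losers(trade_pnl):
    """Longest streak of trades with negative P&L."""
    longest = current = 0
    for pnl in trade_pnl:
        current = current + 1 if pnl < 0 else 0
        longest = max(longest, current)
    return longest

def winners_removed_until_neutral(trade_pnl):
    """Fraction of winning trades (best first) removed before total P&L <= 0."""
    winners = sorted((pnl for pnl in trade_pnl if pnl > 0), reverse=True)
    total = sum(trade_pnl)
    removed = 0
    for pnl in winners:
        if total <= 0:
            break
        total -= pnl
        removed += 1
    return removed / len(winners) if winners else 0.0

# Hypothetical equity curve and trade list.
equity = [100, 120, 90, 110, 130, 125]
trades = [5, -2, -1, -3, 4, -1, 20, -2]

print(max_drawdown(equity))
print(max_consecutive_losers(trades))
print(winners_removed_until_neutral(trades))
```

In the toy trade list, removing the single best trade is enough to erase the profit, which is the kind of dependence on a few large winners the list warns about.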

All of these things should be considered along-side a simple profitability analysis, or else you will end up with a ‘successful back-test of three years, blow up in three days’ scenario.


Nov 2 2009

The “e-ratio” system measurement

Always on the look-out for interesting system measurements, I stumbled across the “e-ratio,” described here. The concept is to graph your normalized (volatility-adjusted) MFE (maximum favorable excursion) to MAE (maximum adverse excursion) ratio for different n (time-steps) since a trade was taken. A value above 1 allows you to identify a positive edge. It is most useful when making tweaks to a system, to help identify when a filter or signal adds ‘edge’ over all time-steps.
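
A rough sketch of how such a ratio might be computed for one horizon, in Python. It rests on several assumptions that are not taken from the linked description: long entries only, closing prices only, a volatility figure supplied for every bar in price units, and the ratio taken as average normalized MFE over average normalized MAE. The function name and the price series are made up.

```python
def e_ratio(prices, entries, volatility, horizon):
    """Average volatility-normalized MFE divided by average normalized MAE."""
    mfe_sum = 0.0
    mae_sum = 0.0
    used = 0
    for entry in entries:
        window = prices[entry + 1 : entry + 1 + horizon]
        if len(window) < horizon:
            continue  # skip entries without a full look-ahead window
        entry_price = prices[entry]
        # Excursions are floored at zero: the move in favour and the move against.
        favorable = max(0.0, max(window) - entry_price)
        adverse = max(0.0, entry_price - min(window))
        mfe_sum += favorable / volatility[entry]
        mae_sum += adverse / volatility[entry]
        used += 1
    if used == 0:
        raise ValueError("no entry has a full window for this horizon")
    if mae_sum == 0.0:
        return float("inf")
    # The 1/used factors of the two averages cancel.
    return mfe_sum / mae_sum

# Hypothetical closes, two long entries, and a flat volatility of 2 price units.
prices = [100, 102, 101, 105, 103, 99, 104, 108, 107, 110]
entries = [0, 4]
volatility = [2.0] * len(prices)

for horizon in (1, 2, 3):
    print(horizon, e_ratio(prices, entries, volatility, horizon))
```

Plotting the result against the horizon gives the kind of curve the measurement is meant to be read from.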


Jul 27 2009

Why Traditional Mutual Fund Managers Can’t Win

As an intern at an independent consultancy firm, I had the privilege to interview dozens of money managers. They were all intelligent and well-intentioned individuals. After all, the better their clients did, the better they did. However, when all was said and done, I came to one conclusion: it is not surprising that most actively managed funds underperform passive investing in the benchmark.

Firstly, fees automatically put an active manager at a disadvantage. It is like starting a foot-race several meters behind. This not only has a very real and tangible effect on returns, but may also have a psychological effect on managers. Not only do they have to out-perform the market, but they have to out-perform the market and fees.

Secondly, style boxes. Morningstar may have been well intentioned in inventing them, but style boxes are the absolute worst thing to happen to investing. Ever. While they allow clients to discover talent in a style that they are looking for exposure to, they lock managers into specific styles. As we all know, markets and economies move in cycles, and not all time periods are good for all styles. However, as investors paying fees, we demand that our managers not only retain their exposure to their style (that is what we are paying them for, after all), but also stay invested 100% of the time. This is where we, as investors, shoot ourselves in the foot. We are effectively handcuffing our managers. With these restrictions, we are saying that we as investors know more than the managers do. Truthfully, who knows better than the manager whether it is a good time to be employing their style strategy? Perhaps it just isn’t a good time to be buying small-cap value? But with our restrictions, we force our managers to be in 100% of the time.

Next, as managers get more assets under management, they run into liquidity issues. It takes longer for them to build and get out of positions. It also limits the companies they can invest in. A pure equity investment manager with several billion under management will never truly add alpha by picking companies, because he either has to purchase hundreds of different equities, or purchase large cap stocks. Either way, his bets become strictly macro-economic. This further limits the time-frame the manager can invest in. Eventually, most traditional mutual fund managers just end up with an S&P 500 or Dow Jones look-alike.

With these factors stacked against traditional mutual fund managers, it is no wonder that they cannot out-perform passive investing. As investors, we effectively guarantee that result by the restrictions we place on them.
