Jul 15 2010

Equity Portfolio Cluster Analysis

I am working on a little tool to help identify clusters in an equity portfolio. Ideally, I want to identify highly correlated ‘pockets’ of holdings, such that the holdings in each cluster do not provide much diversification from each other. Mentally, I imagine that a PCA of the correlation matrix of log differences for an individual cluster would have one eigenvector, representing a linear shift, that explains the majority of the variance.

My first attempts have had limited success…

1) Perform a PCA decomposition of the correlation matrix, then regress each column of the correlation matrix on the eigenvectors (scaled by their eigenvalues). After each regression, I find the coefficient with the largest absolute value (for the time being, I ignored ‘significance’). This coefficient identifies that holding’s associated cluster. The problem, of course, is that each stock ends up associated with an eigenvector that contributes very little to the overall explained variance.

2) Similar to above, but I didn’t scale the eigenvectors. This ends up with all the holdings being associated with 1 vector, which on inspection was basically the ‘market’ vector (i.e. the linear shift of all holdings).

3) Use PCA to identify the number of eigenvectors required to explain 95% of the variance, and use hierarchical centroid clustering (using the correlation matrix rows as my ‘points’ in n-space). The issue here is that for a larger portfolio (say, 50 stocks), the eigenvectors fall off very precipitously, and I end up with 30-40 eigenvectors that explain ~1.5% of the variance — and therefore I end up with ~30-40 clusters.

4) Skip the PCA, and just use hierarchical centroid clustering, using the rows of the correlation matrix as my points in n-space, with a ‘maximum distance’ criterion, not allowing clustering if points are ‘too far’ apart. However, without any way to decide what this ‘maximum distance’ should be, this didn’t feel very good.

5) I realized that the problem with using my correlation matrix rows as my points in n-space was that as the number of holdings increased, the correlation between two individual holdings mattered less and less. i.e. if I had two holdings that had similar correlations with every other holding, and a high correlation with each other, I want them clustered. But if they have a low correlation with each other (which typically implies, if their other correlations are ‘similar’, that the other correlations are low), I do NOT want them clustered. However, using the correlation matrix, as the number of holdings increases, each dimension matters less and less, so the dimension identifying their low correlation with each other does not come into play.

To fix this, I mapped my correlation matrix into n-space (using optimization) such that the cosine between two holdings’ vectors in n-space was as close as possible to their correlation. This fixed the problems associated with the last issue, but I still don’t know how to identify how many clusters to select. Furthermore, the optimization seems to be quite slow (though this is a low-priority problem).
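One observation on the cosine mapping: if the vectors are unit length and their cosine matches the correlation ρ exactly, the Euclidean distance between two holdings is sqrt(2(1 − ρ)). Linkage methods that only need pairwise distances can therefore work straight from the correlation matrix (centroid linkage still needs coordinates). A minimal sketch under that assumption; the helper names `correlation_distance` and `threshold_clusters` and the toy matrix are hypothetical, and the fixed cut-off is simply the ‘maximum distance’ idea from attempt 4:

```ruby
# Assumption: each holding is a unit vector and the cosine between two
# vectors equals their correlation rho, so |u - v|^2 = 2 * (1 - rho).
def correlation_distance(rho)
  Math.sqrt([2.0 * (1.0 - rho), 0.0].max) # clamp tiny negative round-off
end

# Hypothetical helper: merge holdings whose pairwise distance is at most
# max_distance (single linkage with a fixed cut), using union-find.
# corr is an n x n array of arrays; returns arrays of holding indices.
def threshold_clusters(corr, max_distance)
  n = corr.size
  parent = (0...n).to_a
  find = lambda do |i|
    i = parent[i] while parent[i] != i
    i
  end
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      next if correlation_distance(corr[i][j]) > max_distance
      parent[find.call(i)] = find.call(j)
    end
  end
  (0...n).group_by { |i| find.call(i) }.values
end

# Toy correlation matrix (made-up numbers): holdings 0 and 1 move together.
corr = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0]]
clusters = threshold_clusters(corr, 0.5)
```

This does not settle how to choose `max_distance` or how many clusters to keep, which is still the open question.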

In summary…

Given a portfolio of equity holdings, how would you recommend I go about identifying a) how many clusters exist in the portfolio and b) what those clusters are?

These are the problems I am stuck on…


Jun 16 2010

Weighted Random Array

I had an array that I was trying to do weighted random pulls from. There are a few basic algorithms for this online, but my issue was that they were not efficient for very, very large tables — most were O(n) for each time you wanted to pull an element. So I wrote my own wrapper class that would handle the task for me using Alias Tables.

class WeightedRandomArray < Array
  def initialize(other_array, weights)
    # keep the elements in the underlying Array as well
    super(other_array)

    total_weights = weights.inject(0.0) { |t, e| t + e }
    proportions = weights.map { |e| e / total_weights }

    # scale so the weights sum to n-1: the table gets n-1 entries of mass 1
    n = other_array.size
    elements = other_array.zip(proportions).map { |a, w| [a, w * (n - 1)] }
    elements.sort! { |e1, e2| e1[1] <=> e2[1] }

    # construct an alias table for faster access; each entry is
    # [probability of first element, first element, second element]
    @table = []

    while elements.size > 2
      # the lightest element fills p of this entry; the heaviest fills the rest
      p = elements[0][1]
      elements[-1][1] -= (1.0 - p)
      @table << [p, elements[0][0], elements[-1][0]]
      elements.shift
      elements.sort! { |a, b| a[1] <=> b[1] }
    end

    # the last two elements share the final entry
    p = elements[0][1]
    @table << [p, elements[0][0], elements[-1][0]]
  end

  def random_element
    # pick an entry uniformly, then choose between its two elements
    entry = @table[(Kernel.rand * @table.size).floor]
    Kernel.rand < entry[0] ? entry[1] : entry[2]
  end
end

So now the work is all up-front — and while it is considerable, so long as you will be doing enough random draws, the O(1) time to get a random entry should trump…
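For comparison, the O(n)-per-draw approach mentioned above is usually some variant of a cumulative-weight scan. A minimal sketch (the name `linear_weighted_pick` is mine, not from any library, and the random number is a parameter only so the function can be checked deterministically):

```ruby
# Hypothetical helper: linear-scan weighted pick, O(n) per draw.
# r is a uniform random number in [0, 1).
def linear_weighted_pick(items, weights, r = Kernel.rand)
  target = r * weights.inject(0.0) { |t, e| t + e }
  running = 0.0
  items.each_with_index do |item, i|
    running += weights[i]
    return item if target < running
  end
  items.last # guard against floating-point round-off
end

pick = linear_weighted_pick([:a, :b, :c], [1, 1, 2], 0.3)
```

Every draw walks the weights from the start, which is exactly the cost the alias table avoids.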


Jun 8 2010

Passing around Ruby blocks

I have some code that I am bundling together that required me to tackle a rather strange problem. Basically, I had one function that took a block, and another function that wrapped around that function. It looks something like this:

def inner(*args)
  yield args[0], args[1] * args[2]
end

def wrapper
  inner(3, 4, 5)
end

The question is, how do I pass a block to wrapper and have it get passed to inner? Google wasn’t much help here. My solution looks like:

def call_block
  yield 4, 5, 6
end

def wrap_block(&blk)
  call_block { |*args| blk.call(*args) }
end

wrap_block { |x, y, z| puts x + y + z }

Would love to see a smarter solution…
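One more direct option: Ruby lets a method forward the block it received by passing it along with `&`. A minimal sketch using the `inner`/`wrapper` pair from above (the `result` variable is only there for illustration):

```ruby
def inner(*args)
  yield args[0], args[1] * args[2]
end

# Capture the caller's block as blk and hand it straight to inner.
def wrapper(&blk)
  inner(3, 4, 5, &blk)
end

# The block receives 3 and 4 * 5.
result = wrapper { |a, b| a + b }
```

This avoids wrapping the block in a second block just to call it.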


May 29 2010

Momentum Heat Map, Part III

Now if we check our prediction error, where returns are sorted relative to their variance over the period the returns were generated, we see more stability.

However, at the end of the day, we are only working with ~30 assets. So let’s look at the 6-1-1 region (which seems relatively stable). We are effectively looking at an error of ~(265/30)≈8.8 per asset, i.e. the mean prediction error in index for each asset, at each step, is about 8.8 spots. That … isn’t very good. In fact, that is nearly 30% of the assets…

Perhaps if instead of predicting the exact index, we looked at deciles?


May 25 2010

Momentum Heat Map Part Deux

Do I spy, with my little eye, stability at 12-1 and 6-1?


May 20 2010

Exploring the n-m predict p momentum model

There is quite a bit written about the 6-1 momentum model, which ranks investments by their previous 6-month return, minus the previous 1-month return, and holds for 1 month.

The idea behind the method is that average returns over the long run should persist in the short run, and that the forecast improves once short-term reversion is taken into account.

To explore this concept and see exactly how stable the 6-1 model is, I explored 30 diverse ETFs (equities, commodities, currencies, bonds, et cetera) over a period of 400 days, changing my momentum, reversion, and prediction periods.

To begin, I transformed adjusted closing prices into cumulative log returns.
I calculated the prediction returns as follows (in Matlab syntax):

pastReturns = (cumulativeReturns(today, :) - cumulativeReturns(today-n, :)) - (cumulativeReturns(today, :) - cumulativeReturns(today-m, :));

and the future returns as

futureReturns = (cumulativeReturns(today+p, :) - cumulativeReturns(today, :));

I then sort both sets of returns and calculate the difference between their sorted order to get the total estimation error

[v, idx] = sort(pastReturns);
[v, futureIdx] = sort(futureReturns);
predictionError = sum(abs(idx - futureIdx));

I do this, looping through n, m, and p. To generate the total error for a given (n,m,p) tuple, I loop through all available days, compute the prediction error for that day, and average the errors.
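The loop for a single (n, m, p) tuple can be sketched in Ruby as follows. Two assumptions of mine, not something the Matlab above does: the daily error is measured on per-asset ranks (the second output of Matlab's `sort` lists which asset sits at each sorted position, so it has to be inverted to get each asset's rank), and the cumulative returns arrive as an array of days, each an array of assets. All names and the toy data are hypothetical.

```ruby
# Hypothetical helper: rank of each asset (0 = lowest return), obtained by
# inverting the sort permutation.
def ranks(returns)
  order = (0...returns.size).sort_by { |i| returns[i] }
  result = Array.new(returns.size)
  order.each_with_index { |asset, position| result[asset] = position }
  result
end

# Total absolute rank displacement between predicted and realized ordering.
def rank_error(past, future)
  ranks(past).zip(ranks(future)).inject(0) { |t, (a, b)| t + (a - b).abs }
end

# cumulative[t][a] is the cumulative log return of asset a on day t.
# Returns the mean daily rank error for one (n, m, p) tuple, or nil when
# there are not enough days.
def average_error(cumulative, n, m, p)
  days = ([n, m].max..(cumulative.size - 1 - p)).to_a
  return nil if days.empty?
  assets = (0...cumulative[0].size)
  total = days.inject(0.0) do |sum, today|
    # (c[t] - c[t-n]) - (c[t] - c[t-m]) reduces to c[t-m] - c[t-n]
    past = assets.map { |a| cumulative[today - m][a] - cumulative[today - n][a] }
    future = assets.map { |a| cumulative[today + p][a] - cumulative[today][a] }
    sum + rank_error(past, future)
  end
  total / days.size
end

# Toy data: 4 days, 2 assets (made-up numbers, not from the post).
toy = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.3], [0.6, 0.4]]
toy_error = average_error(toy, 2, 1, 1)
```

Running this over a grid of n, m, and p values would give one averaged error per tuple, which is the quantity being plotted.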

The results are visualized as follows (please note that the axes must be scaled by a factor of 5 and that the data has been interpolated to fill in missing spots):

To me, there are two very intriguing things about this visualization.

First, there seems to be a region of stability (or am I just seeing what I want to see?) in the 90-110 day momentum, 25-50 day reversion, and 20-30 day prediction area, indicating that there may be some validity to the 6-1 momentum method. However, it is not the ONLY region of stability, bringing into question its validity.

Secondly, along the line where the momentum look-back period equals the reversion look-back period (i.e. when there is NO prediction data and all stocks are expected to perform equally well), we get the minimum error. However, keeping a consistently equal balance in the portfolio would incur incredibly high turnover, and I suspect that the benefit of this error decrease would be more than offset by trading costs and slippage.

So while 6-1 may out-perform buy and hold, it seems that a constantly rebalanced, equally weighted portfolio may be worth exploring further.


May 19 2010

Immutable Ruby?

Got bored and hacked around.  Found this sort of interesting…
class Object
  def self.new( *args, &blk )
    o = allocate
    o.instance_eval{initialize( *args, &blk )}
    o.freeze
    o
  end
end
Forcing objects to be frozen after initialization?  Almost sounds … functional.
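A narrower variant of the same idea, scoped to a single class instead of all of Object, is to override `new` through a module. This is only a sketch; `FrozenAfterInit` and `Point` are hypothetical names:

```ruby
# Hypothetical module: freeze instances right after initialize has run.
module FrozenAfterInit
  def new(*args, &blk)
    super.freeze
  end
end

class Point
  extend FrozenAfterInit
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Mutating a frozen instance raises an error.
  def move!(dx)
    @x += dx
  end
end

point = Point.new(1, 2)
raised = begin
  point.move!(1)
  nil
rescue => e
  e
end
```

Classes that do not extend the module keep the usual mutable behaviour.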

Apr 1 2010

Long time, no post

So it has been well over a month since my last post, and that is just far too long.  Finals snuck around the corner, then it was spring break, and then getting back into the swing of school.  But now that I am back, I thought I would jump back on the train.

Today, I thought I would provide a couple thoughts on some tools I have been trying out.  I use Matlab fairly extensively in school, but since I work for myself, I have to develop my strategies in alternative products to avoid paying Matlab’s exorbitant licensing fees.  If you’ve read some of my past posts, you will also know that I have been trying to find some sort of alternative to my Ruby addiction, preferably something statically typed and with strong functional roots.

But I am also looking for something that allows me to do easy numerical computing and statistics work.  This means a good matrix interfacing library, as well as a strong set of statistics (and preferably, econometrics) packages.

My first stop was the “Matlab” clones and cousins: Octave, SciLab, R, SciPy and Sage.  My initial thoughts were:

  • Octave: I would really like to have a GUI.  And a lot of Matlab functions I want are missing (particularly, textscan).
  • SciLab: Not enough to differentiate itself from Octave.  Why bother learning the differences?
  • R: I absolutely despise R’s semantics — but it is a brilliant statistics package.
  • Sage: Too Mathematica-like for me.
  • SciPy (with NumPy and IPython): Not a bad alternative, if I could get the damn thing to compile on Mac OS X 10.6

After continued searching, I found QtOctave, which solved my Octave GUI problem.  For now, I have stuck with that.

I also have been playing around with Haskell, doing as much homework in it as possible.  Being more of an ‘academic’ language, it has a lot of scientific packages available.  Unfortunately, it doesn’t really have a uniform set of interfaces, which I think is the largest bonus of Matlab.  Can Haskell do everything Matlab can?  Probably.  But can it do it as easily and without as much glue code?  Not at all.  I really do think that Haskell is a fantastic language and would like to work on a project where I can unify several projects into one easy-to-use package, but my Haskell capabilities are just not there yet.  This all makes the learning curve twice as frustrating as that of any of the Matlab clones.

Another tool I re-stumbled upon, which I like, is Weka.  Instead of having to code up all my classification algorithms to perform analysis, I can quickly load up a file into Weka and have it perform the tests.  Saves me a whole lot of time.  I highly recommend checking it out for any quick machine learning tasks you have.

That is pretty much it, for now.  I am currently working on installing RLaBPlus.  I will provide my thoughts in a few days.


Feb 17 2010

Papers, Papers, Papers

These might be of interest to some folks reading this blog…


Feb 17 2010

Another great Haskell Resource

Another great and free Haskell resource can be found here.
