Machine Learning & SciKit Learn

I made the point to someone the other day that technology and coding are getting easier and easier. I don’t think I would have been able to perform ‘machine learning’ five years ago, but with the resources available today (Python, scikit-learn, and pages upon pages of Stack Overflow) even someone like me can fit a model and build ML algorithms.

Machine Learning is also ridiculously “easy”. It’s literally 4 lines of code. It takes far longer to get the data into the right format than it does to fit a model and make predictions.
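To make the "4 lines" claim concrete, here is a minimal sketch of what those lines might look like with scikit-learn's KNeighborsClassifier. The data is synthetic and made up for illustration (it is not the VXX data set), and the variable names and the choice of 5 neighbors are my own assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption, not the post's VXX data):
# two feature columns and a +1 / -1 label for each row.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Simple chronological-style split: first 150 rows train, last 50 test.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# The "four lines": import (above), construct, fit, predict.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = (predictions == y_test).mean()
print(f"out-of-sample accuracy: {accuracy:.2f}")
```

Everything above the model lines is data preparation, which is the point: even in a toy example the setup takes more code than the modelling.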

With inspiration from a post on the TradingWithPython blog, I analyzed a K Nearest Neighbors model for trading the VXX. It uses just two ‘features’, the term structure and the volatility premium, and tries to predict the direction of the 1-day forward return of VXX. The model basically plots the features against each other, with a third variable, the output (long or short), attached to each point. When fed new data, it plots the new point, finds its nearest neighbors, and decides which output it most resembles based on factors like proximity and count. This is hopefully easy to see with 2 features but can be impossible to visualize with many more!
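The nearest-neighbor idea described above can be sketched in plain Python without any library. This is only an illustration of the mechanism, not the code behind the results in this post: the function name `knn_predict`, the feature values, and the labels are all hypothetical, and it uses a simple majority vote (count) without weighting by distance.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every stored observation.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest observations and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up (term structure, volatility premium) pairs, purely illustrative.
points = [(-0.10, 0.05), (-0.08, 0.04), (-0.09, 0.06),
          (0.05, -0.03), (0.07, -0.02), (0.06, -0.04)]
labels = ["short", "short", "short", "long", "long", "long"]

# A new observation that sits next to the "short" cluster.
signal = knn_predict(points, labels, (-0.07, 0.05), k=3)
print(signal)
```

In scikit-learn, the same vote can be weighted by proximity by passing `weights="distance"` to `KNeighborsClassifier`.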

The Volatility Premium and Term Structure are given by market prices. Find the corresponding location on the plot, figure out whether it’s closer to black or white, and output a signal.

Results are not bad. I compared it to another system I’ve written about and to a benchmark of constantly shorting VXX, and it solidly outperforms both.

Benchmark: Short & Hold VXX

Despite these interesting initial results, I’m still not convinced ML is anything other than one huge optimization technique. If I beat the data enough, I’m sure I’ll find something with an awesome equity curve. I can run thousands of parameter combinations in seconds. I can run every ML model there is in a few hours, and I’m sure that by chance alone I will come up with something that looks good. Perhaps this is a good approach for selecting a portfolio of strategies and applying some logic to construction? A momentum-based, survival-of-the-fittest approach might be of interest. But perhaps ML is just another passing quant craze like Portfolio Insurance or Monte Carlo Simulation?
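The "by chance alone" worry is easy to demonstrate. Here is a rough sketch, under assumptions of my own (pure-noise returns, 1,000 coin-flip strategies, one year of daily data), showing that the best of many random strategies can look impressive even when there is nothing to predict. None of these numbers come from the VXX tests above.

```python
import random

random.seed(42)

n_days = 250         # roughly one trading year
n_strategies = 1000  # number of random "parameter sets" to try

# Pure-noise daily returns: by construction there is nothing to predict.
returns = [random.gauss(0.0, 0.01) for _ in range(n_days)]

def total_return(signals, rets):
    """Simple (uncompounded) sum of daily returns for +1 / -1 signals."""
    return sum(s * r for s, r in zip(signals, rets))

# Each "strategy" is just a random sequence of long/short calls.
results = [
    total_return([random.choice((1, -1)) for _ in range(n_days)], returns)
    for _ in range(n_strategies)
]

best = max(results)
average = sum(results) / n_strategies
print(f"average strategy: {average:+.2%}, best strategy: {best:+.2%}")
```

The average strategy earns roughly nothing, as it should, while the best one shows a large gain that is entirely luck. Picking the winner after the fact is exactly the trap.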

Speaking of Monte Carlo, @modestpropsal had an excellent link to a post on Relativity Media. A Hollywood exec purported to have developed an algorithm based on Monte Carlo simulations that would predict box office success. It’s an awesome article on the power of buzzwords and salesmanship. Needless to say, it didn’t work out too well, and it involved some of the #brightest hedge funds on Wall Street.

I am fascinated by the power of a good sales pitch. Please suggest any good posts, blogs, books, etc. on the subject! Thanks!

Code is available on my Github page. Enjoy!