SVM Trend Strategy on Nikkei 225 Mini Futures: Exploring the Potential of Supervised Learning


Support Vector Machines (SVMs) are among the most popular supervised learning techniques for classification and regression, largely because of how readily they capture non-linear patterns. They work by finding an optimal threshold, known as a decision boundary or hyperplane, that separates observations into classes. When new data is presented to the SVM, it determines which side of the boundary the data falls on and makes a prediction accordingly. According to Bishop (2006), a key advantage of SVMs is that they provide a straightforward way to classify data with non-linear relationships.

In recent years, academic research into applying SVMs to financial markets has yielded promising results. Tay and Cao (2001) applied SVMs to predicting returns on five futures contracts listed on the Chicago Mercantile Exchange (CME) and found that they outperformed a multilayer back-propagation neural network model. In addition, in a study investigating the forecasting power of SVMs on S&P CNX Nifty returns, Kumar and Thenmozhi (2008) found that they outperformed neural network, random forest, and linear ARIMA models.

This post explores an SVM-based trading system on Nikkei 225 Mini Futures, a popular Asian index futures contract. First, 20 indicators are chosen as predictors of forward returns on the Nikkei 225. PCA is applied to reduce the dimensionality of these features. Next, an SVM is fitted on the training data to forecast the direction of returns over the testing dataset, and the predictive accuracy of the model is assessed. Finally, a backtest is conducted to evaluate the performance of the resulting trading strategy.

Data and Methodology


Our study uses four years of indicator data for JPX Nikkei 225 front-month mini futures, obtained from Bloomberg. This comprises daily values for 20 indicators over the period 17th September 2013 to 15th September 2017. These input variables are summarized below.

Fig 1 – Table of Input Variables

Training and Testing Datasets

For the classification problem, we split the data in a 3:1 ratio: three years (17th September 2013 to 19th September 2016) for training and one year (20th September 2016 to 15th September 2017) for testing.
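As a sketch, this chronological split can be expressed with pandas. The frame below uses random placeholder values in place of the 20 Bloomberg indicators, and `split_by_date` is a hypothetical helper, not part of the study's code:

```python
import numpy as np
import pandas as pd

def split_by_date(df, train_end, test_start):
    """Split a DatetimeIndex'ed frame into chronological train/test sets."""
    train = df.loc[:train_end]   # inclusive of train_end
    test = df.loc[test_start:]
    return train, test

# Random values standing in for the 20 daily indicators
rng = np.random.default_rng(0)
dates = pd.bdate_range("2013-09-17", "2017-09-15")
df = pd.DataFrame(rng.normal(size=(len(dates), 20)),
                  index=dates,
                  columns=[f"x{i}" for i in range(20)])

# Three years for training, the final year for testing
train, test = split_by_date(df, "2016-09-19", "2016-09-20")
```

Splitting by date rather than at random preserves the time ordering, which matters for a forecasting problem.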

Principal Component Analysis

Principal Component Analysis (PCA) performs a linear mapping of data into a low-dimensional space in such a way that the variance of data in the low dimensional space is maximized.


Firstly, the data has 20 dimensions. Training an SVM on this many features can incur heavy computational costs and take considerably longer.

Secondly, the features are likely to be correlated; for example, it would not be surprising if all the volatility indicators were strongly correlated with one another. A sufficiently complex model could, in principle, transform highly correlated features into weakly correlated ones, but complex models are also prone to overfitting. Reducing the dimensionality of the inputs through PCA therefore helps to limit overfitting.

How PCA works

Firstly, PCA computes the correlation matrix of the features in the training set. Secondly, it computes the eigenvalues of this matrix. Thirdly, it computes the eigenvectors corresponding to the largest eigenvalues; these eigenvectors are called feature vectors. Lastly, it maps the original features to new features by multiplying the original data by the feature vectors.
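These four steps can be sketched directly in NumPy. `pca_features` is a hypothetical helper written for illustration, and the input is random placeholder data:

```python
import numpy as np

def pca_features(X, k):
    """Project standardized data onto the top-k eigenvectors of its
    correlation matrix, mirroring the four PCA steps described above."""
    # Step 1: correlation matrix of the features
    C = np.corrcoef(X, rowvar=False)
    # Step 2: eigenvalues/eigenvectors (symmetric matrix, so eigh applies)
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 3: keep the eigenvectors of the k largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]                 # the "feature vectors"
    # Step 4: map the (standardized) data into the new space
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ W

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 20))   # stand-in for 250 days of 20 indicators
components = pca_features(X, 6)
```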

PCA is applied to reduce the dimensionality of the input data. The principal components that constitute more than 80% of the variance in the data are then selected as features for training the SVM model.
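In practice the 80% variance rule might be applied with scikit-learn as below. The data here is a random placeholder, so the number of components it selects will differ from what real indicator data would give:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(250, 20))            # placeholder for the 20 indicators

# Standardize so each indicator contributes on the same scale
Z = StandardScaler().fit_transform(X)

# Fit a full PCA and find the smallest component count covering 80% variance
pca = PCA().fit(Z)
cum = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum, 0.80) + 1)

# Re-fit keeping only those components as SVM features
features = PCA(n_components=n_keep).fit_transform(Z)
```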

Support Vector Machines

SVMs classify data into different classes using a hyperplane: data on the same side of the hyperplane are assigned to the same class. Here we consider a simple case from Thome (2012):

Fig 2 – Simple SVM Linear Classification Example

In this figure, the points represent the actual data, with the X and Y axes representing two of its features; blue circles and yellow triangles denote the two classes. In this case, the hyperplane is a line that separates the data into two sides. The objective of the SVM is to find hyperplane H3, because H3 lies farthest from both clusters of data, unlike H1 and H2. Since the data is assumed to be clustered, staying farther from the existing observations gives a higher chance of classifying unseen data correctly.

In the example above, the hyperplane is linear because the data is linearly separable. Real-world data may be non-linearly distributed, so non-linear decision boundaries are needed to solve such classification problems. The figure below from Kim (2013) requires a circular boundary, which a polynomial kernel can capture.

Fig 3 – SVM Polynomial Classification Example

Computational Results

Using PCA, it was determined that just 6 principal components could explain 81.8% of the variance in the 20 inputs.

Fig 4 – Principal Component Analysis of Input Variables – Cumulative Proportion of Variance by Number of Components

Next, the SVM was calibrated on the training data. The predictor variables are the principal components generated by PCA, and the target variable is the sign of the next trading day's return: if the next day's return is positive, the target is labelled 1; if negative, −1.
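The labelling step can be sketched as follows. `make_targets` is an illustrative helper; note that a flat day (zero return) is labelled −1 here, an assumption the post does not spell out:

```python
import numpy as np

def make_targets(prices):
    """Label each day +1 if the next day's return is positive, else -1."""
    rets = np.diff(prices) / prices[:-1]   # next-day simple returns
    return np.where(rets > 0, 1, -1)       # zero return falls into -1

prices = np.array([100.0, 101.0, 100.5, 100.5, 102.0])
y = make_targets(prices)                   # -> [ 1, -1, -1,  1]
```

The last price has no next-day return, so the label array is one element shorter than the price series.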

Given the multivariate, non-linearly separable nature of the data, we used a polynomial kernel for our SVM model.
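A polynomial-kernel SVM of this kind can be fitted with scikit-learn's `SVC`. The training data below is synthetic, standing in for the PCA features and next-day return signs; the kernel degree is an assumed default, as the post does not state it:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in: 200 days of 6 PCA features, labels from a non-linear rule
rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 6))
y_train = np.where(X_train[:, 0] + X_train[:, 1] ** 2 > 1, 1, -1)

# Polynomial kernel; probability=True enables class-probability estimates
# of the kind shaded in Fig 5
clf = SVC(kernel="poly", degree=3, probability=True, random_state=0)
clf.fit(X_train, y_train)
```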

The chart below shows the SVM fitted on the first two principal components, which account for 58.8% of the variance in the data. The intensity of the colour indicates the probability that a record belongs to the corresponding class, and the light-coloured band between the red and blue regions marks a clear boundary between the two classes.

Fig 5 – SVM Classification Plot – First 2 Components

The fitted SVM was then applied to the testing data. This yielded the following confusion matrix:

Fig 6 – SVM Classification – Confusion Matrix of Prediction Results

The SVM model predicted the sign of the next day's return correctly 59.59% of the time. Having established the model's out-of-sample accuracy, we next develop a trading system based on these daily predictions over the same one-year timeframe.
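The confusion matrix and accuracy can be computed with scikit-learn. The predicted and actual signs below are illustrative placeholders, not the study's results:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical actual vs. predicted signs over a short test window
y_true = np.array([1, 1, -1, 1, -1, -1, 1, -1])
y_pred = np.array([1, -1, -1, 1, 1, -1, 1, -1])

# Rows = actual class, columns = predicted class, ordered [1, -1]
cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
acc = accuracy_score(y_true, y_pred)   # fraction of correct sign calls
```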

Strategy Development and Backtest

Using the SVM predictions on the testing dataset, which spans the one-year period from 20th September 2016 to 15th September 2017, trade signals are generated from the predicted sign of the next day's return: a prediction of 1 opens a long position and a prediction of −1 opens a short position. If the next day's predicted sign is unchanged from the previous day's, the existing position is held.
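This signal-to-position logic can be sketched as follows. `backtest` is a hypothetical helper, the signals and returns are illustrative, and transaction costs are ignored:

```python
import numpy as np

def backtest(signals, returns):
    """Daily P&L of holding the predicted sign: +1 = long, -1 = short.
    A sign flip closes the old position and opens the opposite one;
    an unchanged sign means the position is simply held."""
    signals = np.asarray(signals, dtype=float)
    returns = np.asarray(returns, dtype=float)
    pnl = signals * returns                            # per-day strategy return
    trades = 1 + int(np.sum(signals[1:] != signals[:-1]))  # initial entry + flips
    return pnl.cumsum(), trades

sigs = [1, 1, -1, -1, 1]                 # predicted signs
rets = [0.01, -0.005, -0.02, 0.01, 0.015]  # realized next-day returns
equity, n_trades = backtest(sigs, rets)  # 3 trades: long, flip short, flip long
```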

Fig 7 – Representative SVM Strategy Backtest

Over the one-year testing period, the strategy posted a return of 37.4% with a Sharpe ratio of 0.141. The largest daily profit was 6.3%, while the largest daily loss was 5.7%. A total of 70 trades were conducted, comprising 35 longs and 35 shorts, with an average holding period of 5.2 days. Based on the P&L of these trades, the win rate was 63.7% and the expectancy was 49bps.

This trading system is for demonstration only and should not be used for live trading.


In summary, we found that an SVM does present an edge for trading Nikkei 225 mini futures on a daily rebalanced basis. For live trading, re-optimizing the SVM daily might further improve the strategy's accuracy. In the meantime, feel free to test out our code for your own research!


  • Bishop, C. M., Pattern Recognition and Machine Learning, Springer, 2006
  • Tay, F. E. H. & Cao, L., Application of Support Vector Machines in Financial Time Series Forecasting, Omega 29 (4), 2001
  • Kumar, M. & Thenmozhi, M., Forecasting Stock Index Movement: A Comparison of Support Vector Machines and Random Forest, IIMB Management Review 21, 2008
  • Thome, A. C. G., SVM Classifiers – Concepts and Applications to Character Recognition, ISBN 978-953-51-0823-8, 2012
  • Kim, E., Everything You Wanted to Know about the Kernel Trick, Web, 2013