# Comparing Supervised Learning Methods for Hang Seng Index Futures Long/Short Strategy: A Continuation of Machine Learning Studies on Asian Index Futures Markets

**Motivation**

Trend prediction in financial markets is a complex task because prices are inherently noisy, non-stationary, and deterministically chaotic. In recent years, studies have applied various machine learning algorithms to the problem. Leung, Chen, and Daouk (2001) used a Probabilistic Neural Network to forecast the direction of index returns on the Taiwan Stock Exchange Index and found that it produced higher returns than strategies guided by random walk forecasts. Fernández Rodríguez, Sosvilla Rivero, and Andrada Félix (2002) predicted exchange rates in the European Monetary System with a K-Nearest Neighbors (KNN) approach and concluded that KNN was a useful tool for predicting daily exchange rate data.

Delving deeper into machine learning techniques, we compare the performance of popular supervised learning algorithms on a trend strategy for Hang Seng Index futures.

Building upon our last post, which used Support Vector Machines (SVM) as a supervised learning method for a trend strategy on Nikkei 225 mini futures, we compare further supervised learning methods (Neural Networks, Random Forest, Naïve Bayes, and K-nearest neighbors) alongside SVM on a similar long/short classification problem. As part of a broad effort to explore the feasibility of machine learning based strategies across Asian markets, we based this paper on HKEX Hang Seng Index futures, another immensely popular Asian equity index futures contract.

First, we followed the feature engineering approach of the previous post, using the same 20 indicators and applying PCA to reduce dimensionality. Next, we trained the five supervised learning methods as classifiers on the training dataset, used them to forecast the direction of returns over the testing dataset, and compared their predictive accuracy. Each method is described in its own section to give readers a better understanding of how it works. Finally, we conducted backtesting to evaluate performance across methods.

Among the methods studied, we found SVM to have the most robust edge in trading Hang Seng Index futures, although the Neural Network also yielded promising results.

**Data and Methodology**

*Data*

Our study uses four years’ worth of indicator data for HKEX Hang Seng Index front-month futures, obtained from Bloomberg. The dataset comprises daily values of 20 indicators over the four-year period from 17th September 2013 to 15th September 2017. The features extracted from the raw data are the same as those in the previous paper. As a recap, the input variables are summarized below:

**Fig 1 – Table of Input Variables**

*Training and Testing Datasets*

For this classification problem, we again split the data in a 3:1 ratio, using three years of data (17th September 2013 to 19th September 2016) for training (classification) and one year of data (20th September 2016 to 15th September 2017) for testing.

*Principal Component Analysis*

Our study applies Principal Component Analysis (PCA) to the features to reduce dimensionality. The cumulative proportion variance graph below shows the top 6 principal components can explain 80% of the variance and they are used to train the models.
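As a minimal sketch of how the number of components could be chosen from the cumulative variance curve, the example below uses scikit-learn's `PCA` on synthetic data. The synthetic matrix, the standardization step, and the helper name `fit_pca_for_variance` are assumptions for illustration; the original indicator data and preprocessing are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for 20 daily indicators: 6 latent factors plus noise.
latent = rng.normal(size=(1000, 6))
mixing = rng.normal(size=(6, 20))
X = latent @ mixing + 0.5 * rng.normal(size=(1000, 20))
X_train, X_test = X[:750], X[750:]  # chronological 3:1 split


def fit_pca_for_variance(X_train, target=0.80):
    """Fit PCA on training data only and keep the fewest components whose
    cumulative explained-variance ratio reaches `target`."""
    scaler = StandardScaler().fit(X_train)  # standardizing is an assumption
    X_std = scaler.transform(X_train)
    cumulative = np.cumsum(PCA().fit(X_std).explained_variance_ratio_)
    # First index where the cumulative ratio is >= target, as a count.
    n_components = int(np.searchsorted(cumulative, target) + 1)
    pca = PCA(n_components=n_components).fit(X_std)
    return scaler, pca, cumulative


scaler, pca, cumulative = fit_pca_for_variance(X_train)
X_train_pc = pca.transform(scaler.transform(X_train))
# The test set is projected with the scaler and PCA fitted on training data.
X_test_pc = pca.transform(scaler.transform(X_test))
print(pca.n_components_, cumulative[pca.n_components_ - 1])
```

Fitting the scaler and PCA on the training window only, then applying them to the test window, avoids leaking test-period information into the features.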

**Fig 2 – Principal Component Analysis of Input Variables – Cumulative Proportion of Variance by Number of Components**

*Neural Networks*

As deep learning becomes more and more popular (in 2014, deep learning algorithms beat humans in image recognition competitions), we apply a neural network, the building block of deep learning, to predict the trend of the futures price. Loosely modelled after the human brain, neural networks can capture associations in large amounts of noisy data. A general diagram by Karpathy is shown below:

**Fig 3 – Neural Network Example**

The circles (neurons) in the input layer receive the engineered features and pass them on. The neurons in the two hidden layers each transform the data from the previous layer into new values. The neuron in the output layer processes those values and generates the output. In general, the number of hidden layers and the number of neurons in each layer are determined by the complexity of the problem, and researchers usually spend considerable time tuning these parameters to obtain the best performance on the testing data. A common rule of thumb is to decrease the number of neurons from one hidden layer to the next. The transformation (activation) function in the output layer can be logistic, tanh, or another form. In the other layers, the functions take a linear form, although nonlinear hidden activations are the more common choice in practice because stacked linear layers collapse into a single linear map. Our results from fitting a neural network can be found in the computational results section.
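To make the layer-by-layer transformation concrete, here is a minimal NumPy sketch of a forward pass through a small feed-forward network. The weights are random rather than trained, and the layer sizes (6 inputs, one hidden layer of 14 neurons), the tanh hidden activation, and the 0.5 decision threshold are assumptions chosen for illustration, not the configuration of the fitted model.

```python
import numpy as np


def logistic(z):
    """Logistic (sigmoid) activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers, hidden_act=np.tanh, output_act=logistic):
    """Propagate inputs through a list of (weights, bias) pairs.

    Every layer except the last applies `hidden_act`; the last applies
    `output_act`. Each row of `x` is one observation.
    """
    a = x
    for W, b in layers[:-1]:
        a = hidden_act(a @ W + b)
    W_out, b_out = layers[-1]
    return output_act(a @ W_out + b_out)


rng = np.random.default_rng(0)
n_inputs, n_hidden = 6, 14  # illustrative sizes, not a tuned configuration
layers = [
    (rng.normal(scale=0.5, size=(n_inputs, n_hidden)), np.zeros(n_hidden)),
    (rng.normal(scale=0.5, size=(n_hidden, 1)), np.zeros(1)),
]

x = rng.normal(size=(5, n_inputs))   # five observations of six features
p_up = forward(x, layers).ravel()    # output interpreted as P(return > 0)
signal = np.where(p_up > 0.5, 1, -1)  # map probability to a long/short label
print(p_up, signal)
```

With a logistic output, the single output neuron can be read as a probability, which is why a threshold is needed to turn it into a +1 or -1 class.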

*Random Forest*

Random forest is a robust machine learning algorithm that extends decision tree models by combining the output of many decision trees. For each tree, it randomly selects a subset of the training data and trains a decision tree on that subset, as illustrated in the following figure from TutorialKart (2017). After training, it predicts the class of a testing record from the averaged output (for classification, effectively a vote) of all the decision trees.
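The following sketch shows the two parameters discussed in this post, the number of trees and the per-tree sample size, using scikit-learn's `RandomForestClassifier`. The data is synthetic and the parameter values are assumptions for illustration, not tuned settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for six principal components and a +1/-1 target.
X = rng.normal(size=(400, 6))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=400) > 0, 1, -1)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

forest = RandomForestClassifier(
    n_estimators=200,   # number of decision trees
    max_samples=0.7,    # each tree sees a bootstrap sample of 70% of the rows
    bootstrap=True,
    random_state=0,
)
forest.fit(X_train, y_train)

# scikit-learn averages the per-tree class probabilities, then predicts
# the class with the highest averaged probability.
proba = forest.predict_proba(X_test)
pred = forest.predict(X_test)
accuracy = (pred == y_test).mean()
print(f"accuracy on held-out synthetic data: {accuracy:.3f}")
```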

**Fig 4 – Random Forest Example**

*Naïve Bayes*

Naïve Bayes is a simple probabilistic classifier based on Bayes’ theorem, used for classification problems in supervised learning. It makes the strong ("naïve") assumption that the input features are independent of one another given the class. From the training data, Naïve Bayes estimates the probability of each class and the conditional probability of each feature’s value given a class. The algorithm then combines these to compute the conditional probability of each class given a testing record and predicts the most probable class.
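The steps above can be sketched from scratch in NumPy. This version assumes each feature is Gaussian within a class (the Gaussian variant of Naïve Bayes); the function names and the synthetic data are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps the variance strictly positive.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_proba(X, model):
    """Posterior P(class | x) under the feature-independence assumption."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]          # shape (n, classes, d)
    # Independence lets the per-feature Gaussian log-densities be summed.
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + diff ** 2 / variances[None]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=400)
X = rng.normal(size=(400, 6))
X[:, 0] += 1.0 * y                    # only the first feature is informative

model = fit_gaussian_nb(X[:300], y[:300])
proba = predict_proba(X[300:], model)
pred = model[0][proba.argmax(axis=1)]
accuracy = (pred == y[300:]).mean()
print(f"accuracy on held-out synthetic data: {accuracy:.3f}")
```

If the real features are far from Gaussian within each class, the estimated likelihoods are poor, which is one way this classifier can underperform.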

*K-nearest Neighbors*

KNN is a non-parametric algorithm for both classification and regression problems. The distance between any two records is measured with the Euclidean distance, the Hamming distance, or another distance measure defined on the features. KNN simply stores the training set. Then, for each test case, it finds the test case’s K nearest neighbors and their classes. Finally, it uses majority voting to determine the test case’s class. If two classes have an equal number of neighbors, KNN randomly selects one of them for the test case. For example, in the figure below, if KNN checks the red pentagon’s 3 nearest neighbors, it will classify the pentagon as blue. If KNN checks the pentagon’s 5 nearest neighbors, the pentagon is classified as green.
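A minimal from-scratch sketch of the voting step, with hypothetical points arranged so that the prediction flips between K = 3 and K = 5, as in the example above. Euclidean distance is assumed, and ties (which cannot occur here with odd K and two classes) would go to the class encountered first rather than being drawn at random.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Classify `x` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training points around a query at the origin.
X_train = np.array([
    [1.0, 0.0],    # blue,  distance 1.0
    [0.0, -1.1],   # blue,  distance 1.1
    [-1.2, 0.0],   # green, distance 1.2
    [0.0, 1.3],    # green, distance 1.3
    [1.0, 1.0],    # green, distance ~1.41
    [3.0, 3.0],    # blue,  far away
    [-3.0, 2.0],   # blue,  far away
])
y_train = ["blue", "blue", "green", "green", "green", "blue", "blue"]
query = np.array([0.0, 0.0])

label_k3 = knn_predict(X_train, y_train, query, k=3)  # 2 blue vs 1 green
label_k5 = knn_predict(X_train, y_train, query, k=5)  # 2 blue vs 3 green
print(label_k3, label_k5)
```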

**Fig 5 – K-Nearest Neighbors Example**

*Support Vector Machines*

SVMs can be used to solve both classification and regression problems. To obtain good performance, researchers usually try different kernel functions to generate the best hyperplane that separates data from different classes in classification problems. To learn more about SVM, please refer to our previous paper – “SVM Trend Strategy on Nikkei 225 Mini Futures”.
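As a rough sketch of trying different kernel functions, the example below fits scikit-learn's `SVC` with several kernels and compares accuracy on a validation slice taken from the end of the training window. The synthetic data, the validation split, and the parameter values (`degree=2`, `C=1.0`) are assumptions for illustration, not the tuning procedure used in this study.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in with a nonlinear class boundary.
X = rng.normal(size=(400, 6))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

# Chronological split: fit on the first part, validate on the next part.
X_fit, y_fit = X[:250], y[:250]
X_val, y_val = X[250:], y[250:]

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # `degree` is only used by the polynomial kernel.
    model = SVC(kernel=kernel, degree=2, gamma="scale", C=1.0)
    model.fit(X_fit, y_fit)
    scores[kernel] = model.score(X_val, y_val)   # validation accuracy

best_kernel = max(scores, key=scores.get)
print(scores, best_kernel)
```

Selecting the kernel on a validation slice rather than on the test period keeps the reported test accuracy an out-of-sample figure.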

*Classification Approach*

Across the various approaches, the data output from PCA is used as the predictor variables for the classification problem, and the target variable is the sign of the return on the next trading day. In the training data, if the next day’s return is positive, we set the target value to 1. If it is negative, the target value is -1.
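The labelling rule can be sketched as follows. The helper name `next_day_labels` and the price series are hypothetical, simple (not log) returns are assumed, and a return of exactly zero is assigned to -1 here, which is an assumption since the text does not specify that case.

```python
import numpy as np


def next_day_labels(close):
    """Return next-day simple returns and their +1/-1 class labels.

    returns[t] is the return from day t to day t+1, so the features observed
    on day t are paired with labels[t]. The final day has no label.
    """
    close = np.asarray(close, dtype=float)
    returns = close[1:] / close[:-1] - 1.0
    labels = np.where(returns > 0, 1, -1)   # zero return -> -1 (assumption)
    return returns, labels


close = [100.0, 101.0, 100.5, 100.5, 102.0]   # hypothetical settlement prices
returns, labels = next_day_labels(close)
print(labels)
```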

**Computational Results**

*Neural Network*

In our experiment, we examined different combinations of the number of layers and the number of neurons in each layer. We found that the neural network’s performance was unstable on the test dataset: with the same configuration, the prediction accuracy fluctuated within a 7% range. A representative accuracy was 59.677%, obtained with 1 hidden layer of 14 neurons. We also noticed that the logistic function usually performed better than the tanh function in the output layer. More details are shown in the confusion matrix.

**Fig 6 – Confusion Matrix for Neural Networks Approach**

*Random Forest*

Because training is very fast (random forest training does not involve complicated mathematical computations), our study tested various combinations of two parameters: the number of decision trees and the sample size. The best accuracy we obtained was 58.5%.

**Fig 7 – Confusion Matrix for Random Forest Approach**

*Naïve Bayes*

Based on the confusion matrix, accuracy was 48%. One possible reason is that the features generated by PCA do not follow a Gaussian distribution.

**Fig 8 – Confusion Matrix for Naïve Bayes Approach**

*KNN*

Tuning the KNN model was simple because only the number of neighbors was adjusted. In our study, 20 neighbors were selected. Accuracy was 54%.

**Fig 9 – Confusion Matrix for KNN Approach**

*SVM*

After tuning, our study found that a polynomial kernel function outperformed other kernel types. The accuracy was 59.68%.

**Fig 10 – Confusion Matrix for SVM Approach**

Having compared the accuracy of the various supervised learning methods over the one-year test period, we now develop a trading system based on these predictions to show the potential of applying them, on a daily basis, to a long/short strategy over the same timeframe.

**Strategy Development and Backtest**

Using each method’s predictions on the testing dataset, which spans the one-year period from 20th September 2016 to 15th September 2017, trade signals were generated from the predicted sign of the next day’s return. If the sign of the return is predicted to be 1, the trading system enters a long position. If it is predicted to be -1, the system enters a short position. On the next day, if the predicted sign is the same as on the previous day, we hold the existing position. Backtesting results are shown below for each method.
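The signal rule can be sketched as a small backtest. This is a simplified illustration under stated assumptions, not the backtesting engine used for the figures below: it ignores transaction costs and margin, compounds daily returns, counts a new trade each time the signal flips, and uses 252 trading days per year with a zero risk-free rate for the annualized Sharpe ratio. The function name and the input series are hypothetical.

```python
import numpy as np


def backtest(signals, next_day_returns, periods_per_year=252):
    """Evaluate an always-in-the-market long/short rule.

    signals[t] is +1 (long) or -1 (short), decided on day t and applied to
    next_day_returns[t], the return realized from day t to day t+1.
    """
    signals = np.asarray(signals)
    r = np.asarray(next_day_returns, dtype=float)
    daily = signals * r                          # daily strategy returns
    # One initial entry, plus one new trade each time the signal flips.
    n_trades = 1 + int(np.count_nonzero(np.diff(signals)))
    sharpe_daily = daily.mean() / daily.std(ddof=1)
    return {
        "daily": daily,
        "total_return": float(np.prod(1.0 + daily) - 1.0),
        "n_trades": n_trades,
        "avg_holding_days": len(signals) / n_trades,
        "sharpe_daily": float(sharpe_daily),
        "sharpe_annualized": float(sharpe_daily * np.sqrt(periods_per_year)),
    }


signals = [1, 1, -1, -1, 1]                       # hypothetical predictions
returns = [0.01, -0.005, -0.02, 0.01, 0.015]      # hypothetical daily returns
result = backtest(signals, returns)
print(result["total_return"], result["n_trades"], result["sharpe_annualized"])
```

A Sharpe ratio computed from daily returns is much smaller than its annualized counterpart, so it helps to state which of the two is being reported.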

*Neural Network*

Over the one-year testing period, the strategy posted a return of 38.8% with a Sharpe ratio of 0.176. The largest daily profit was 2.3% while the largest daily loss was 2.0%. A total of 115 trades were conducted, comprising 58 longs and 57 shorts, with an average holding period of 3.1 days. Based on the P&L of the trades, the win rate was 60.0% and expectancy was 30bps.

**Fig 11 – Representative Neural Network Strategy Backtest**

*Random Forest*

Over the one-year testing period, the strategy posted a return of 37.77% with a Sharpe ratio of 0.172. The largest daily profit was 2.3% while the largest daily loss was 2.0%. A total of 115 trades were conducted, comprising 58 longs and 57 shorts, with an average holding period of 3.2 days. Based on the P&L of the trades, the win rate was 60.0% and expectancy was 27bps.

**Fig 12 – Representative Random Forest Strategy Backtest**

*Naïve Bayes*

Over the one-year testing period, the strategy posted a return of -3.94% with a Sharpe ratio of -0.017. The largest daily profit was 2.3% while the largest daily loss was 2.3%. A total of 77 trades were conducted, comprising 39 longs and 38 shorts, with an average holding period of 4.6 days. Based on the P&L of the trades, the win rate was 44.2% and expectancy was 7.6bps.

**Fig 13 – Representative Naïve Bayes Strategy Backtest**

*KNN*

Over the one-year testing period, the strategy posted a return of 22.1% with a Sharpe ratio of 0.108. The largest daily profit was 2.3% while the largest daily loss was 2.0%. A total of 96 trades were conducted, comprising 48 longs and 48 shorts, with an average holding period of 3.7 days. Based on the P&L of the trades, the win rate was 60.4% and expectancy was 18bps.

**Fig 14 – Representative KNN Strategy Backtest**

*SVM*

Over the one-year testing period, the strategy posted a return of 42.2% with a Sharpe ratio of 0.189. The largest daily profit was 2.3% while the largest daily loss was 2.0%. A total of 77 trades were conducted, comprising 39 longs and 38 shorts, with an average holding period of 4.5 days. Based on the P&L of the trades, the win rate was 61.0% and expectancy was 44bps.

**Fig 15 – Representative SVM Strategy Backtest**

**Conclusion**

In summary, we found that SVM had the strongest edge in predicting long/short signals for trading Hang Seng Index futures, although Neural Networks showed promising results. To achieve better performance, we aim to explore ensemble methods for long/short trend trading strategies in future work.

### References

- Chen, An-Sing and Daouk, Hazem and Leung, Mark T., Application of Neural Networks to an Emerging Financial Market: Forecasting and Trading the Taiwan Stock Index (July 2001). Available at SSRN: https://ssrn.com/abstract=237038 or http://dx.doi.org/10.2139/ssrn.237038
- Fernández Rodríguez, Fernando and Sosvilla Rivero, Simón and Andrada Félix, Julián, Nearest-Neighbour Predictions in Foreign Exchange Markets (January 2002). FEDEA Working Paper No. 2002-05. Available at SSRN: https://ssrn.com/abstract=300404 or http://dx.doi.org/10.2139/ssrn.300404
- Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, http://cs231n.github.io/neural-networks-1/
- TutorialKart, Random Forest in Machine Learning, 2017 https://www.tutorialkart.com/machine-learning/random-forest/#
- Jürgen Cox, Classification parameter optimization, 2015, http://www.coxdocs.org/doku.php?id=perseus:user:activities:matrixprocessing:learning:classificationparameteroptimization

Are the Sharpe ratios not being annualized? A 33% return with a drawdown of 5% should have a far higher Sharpe ratio than 1.

Thanks for pointing that out. Figures will be corrected.

Great article. It would also be good to see how Reinforcement Learning algos ( Q learning, for example) compare to NN, SVM.