Answers From The Book: Information Content in the Limit Order Book for Crude Oil Futures (WTI)


Order book imbalance strategies have been a major source of alpha in automated market making. Tick-by-tick observations carry important information about general market sentiment and direction, and high frequency trading firms (HFTs) have been very efficient at trading on this information at very low latencies.

In their recent paper on HFT strategies, Goldstein, Kwan and Philip analyzed six months of full order book and trade data for stocks in the S&P/ASX 100 and showed that order book imbalances are strong predictors of future prices. They found that HFT firms increase price efficiency by trading in the same direction as the imbalance, but that they could also use order book information to detect institutional buying or selling pressure. Their results also show that HFT firms trade more aggressively during periods of high market volatility and are more successful at picking off stale orders from institutional and retail investors.

This post aims to expand on the work by Goldstein et al. by investigating the predictive significance of order book imbalance for price changes, short-term intraday volatility, and realized spreads in the NYMEX WTI Crude Oil futures market, and to assess its implications for execution.

In addition to finding strong evidence that WTI futures prices move in the direction of the order imbalance, we found a more effective return definition for the predictive relationship between order imbalance and price changes. We also discovered an interesting causal relationship between order book imbalance and realized spreads.

Data and Metric Definitions


Our study uses one week's worth of tick data for WTI futures, obtained from Bloomberg, covering the period from 7th to 14th July 2017. It comprises order book and trade data with one-second timestamps, showing prices and volumes for every trade and for every change to the best bid and ask prices:

Fig 1 – Tick Data for NYMEX WTI Crude Oil Futures


Order Book Imbalance

Order book imbalance (OBI) is defined as the difference between the volume available at the best bid and ask prices, as a proportion of the total volume available at the best bid and ask prices. This is calculated at each point in time by:

$OBI_{t} = \frac{VolBestBid_{t}-VolBestAsk_{t}}{VolBestBid_{t}+VolBestAsk_{t}}$

where VolBestBid and VolBestAsk are the quote volumes available at the top bid and ask levels respectively. This yields a normalized indicator of the shape of the order book, bounded between -1 (all top-of-book volume on the ask) and +1 (all on the bid). We obtain the following distribution of OBI values for our dataset:


Fig 2 – Histogram and Descriptive Statistics for Distribution of Order Book Imbalance Values
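A minimal Python sketch of the OBI formula, assuming top-of-book volumes are available as plain numbers for each quote update. The function names `order_book_imbalance` and `obi_series` are hypothetical and not part of any data vendor's API.

```python
from typing import List, Sequence


def order_book_imbalance(vol_best_bid: float, vol_best_ask: float) -> float:
    """Top-of-book imbalance in [-1, 1]; +1 means all resting size is on the bid."""
    total = vol_best_bid + vol_best_ask
    if total <= 0:
        raise ValueError("need positive volume on at least one side of the book")
    return (vol_best_bid - vol_best_ask) / total


def obi_series(bid_vols: Sequence[float], ask_vols: Sequence[float]) -> List[float]:
    """Apply the OBI formula to each quote update (one bid/ask volume pair per tick)."""
    if len(bid_vols) != len(ask_vols):
        raise ValueError("bid and ask volume series must be aligned")
    return [order_book_imbalance(b, a) for b, a in zip(bid_vols, ask_vols)]


# Hypothetical top-of-book volumes in contracts, not taken from the study's dataset.
print(obi_series([30, 10, 25], [10, 30, 25]))  # [0.5, -0.5, 0.0]
```

Raising on an empty book is a design choice for the sketch; a production pipeline might instead skip such ticks.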


Mid Price

The mid price is defined as the average of the current best bid and ask prices being quoted for the security:

$Mid\:Price_{t} = \frac{BestBidPrice_{t}+BestAskPrice_{t}}{2}$
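As a sketch, the mid-price series can be computed from aligned best bid and ask series as follows. The helper names and the sample quotes are hypothetical. Averaging requires summing the two quotes; halving their difference would instead give minus half the bid-ask spread.

```python
from typing import List, Sequence


def mid_price(best_bid: float, best_ask: float) -> float:
    """Average of the best bid and best ask quotes."""
    return (best_bid + best_ask) / 2.0


def mid_price_series(bids: Sequence[float], asks: Sequence[float]) -> List[float]:
    """Mid-price at each quote update, given aligned best bid/ask series."""
    if len(bids) != len(asks):
        raise ValueError("bid and ask series must be aligned")
    return [mid_price(b, a) for b, a in zip(bids, asks)]


# Hypothetical WTI quotes in USD/bbl.
mids = mid_price_series([45.50, 45.51], [45.51, 45.53])
```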


Short Term Volatility

In their paper, Goldstein et al. calculate volatility as the difference between the log of the highest ask price and the log of the lowest bid price within each interval. We use their measure to calculate volatility within short intraday time frames:

$Volatility_{t,t+1} = \log(\max(BestAskPrice_{t,t+1}))-\log(\min(BestBidPrice_{t,t+1}))$
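The high-low measure can be sketched by bucketing quotes into fixed-length intervals and tracking the highest ask and lowest bid in each. This assumes timestamps are expressed in seconds from a common origin; the interval length and the function name `interval_volatility` are illustrative choices, not taken from the study.

```python
import math
from typing import Dict, Iterable, Tuple


def interval_volatility(
    quotes: Iterable[Tuple[float, float, float]], interval_s: float
) -> Dict[int, float]:
    """High-low range volatility per fixed-length interval.

    quotes: (timestamp_in_seconds, best_bid, best_ask) tuples in any order.
    Returns {interval_index: log(max best ask) - log(min best bid)} for each
    interval that contains at least one quote.
    """
    max_ask: Dict[int, float] = {}
    min_bid: Dict[int, float] = {}
    for ts, bid, ask in quotes:
        k = int(ts // interval_s)  # e.g. k = 0 covers [0, interval_s)
        max_ask[k] = max(max_ask.get(k, ask), ask)
        min_bid[k] = min(min_bid.get(k, bid), bid)
    return {k: math.log(max_ask[k]) - math.log(min_bid[k]) for k in sorted(max_ask)}


# Hypothetical quotes bucketed into 5-minute (300 s) intervals.
quotes = [(0, 45.50, 45.52), (100, 45.48, 45.51), (400, 45.60, 45.62)]
vols = interval_volatility(quotes, interval_s=300)
```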


Realized Spread

In market microstructure, the realized half spread is the difference between the transaction price and the mid-price prevailing some interval after the trade. Because the benchmark is a later mid-price, this measure accounts for the price impact of the trade in trading costs. It is normalized by the mid-price at the time of the trade:

$Realized\:Spread_{t,t+1} = \frac{Executed\:Price_{t}-Mid\:Price_{t+1}}{Mid\:Price_{t}}$
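A trade-indexed version of the formula, with the horizon measured in trades (for example the 10-trade definition), might look like the sketch below. It assumes `mids[i]` is the mid-price prevailing just before trade `i`; the function name is hypothetical. Many studies additionally multiply by the trade direction (+1 for buyer-initiated, -1 for seller-initiated trades); the sketch follows the formula exactly as written. A time-based horizon such as 5 seconds would need timestamp alignment, which is not shown.

```python
from typing import List, Sequence


def realized_spreads(
    exec_prices: Sequence[float], mids: Sequence[float], horizon: int = 10
) -> List[float]:
    """(ExecutedPrice_t - MidPrice_{t+h}) / MidPrice_t for each trade with a full horizon.

    mids[i] is assumed to be the mid-price just before trade i, so mids[i + horizon]
    is the mid `horizon` trades later. Trades too close to the end of the sample
    to have a future mid are dropped.
    """
    if len(exec_prices) != len(mids):
        raise ValueError("exec_prices and mids must be aligned trade by trade")
    return [
        (exec_prices[i] - mids[i + horizon]) / mids[i]
        for i in range(len(exec_prices) - horizon)
    ]
```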


Empirical Results


In investigating the statistical properties of order book imbalance relative to the other metrics we have defined, we first focus on analyzing the information content of order book imbalance on WTI crude oil futures by using Goldstein et al.’s restricted regression model, which uses the book imbalance for the best bid and offer:

$Return=\beta _{0}+\beta _{1} OBI _{TopLevel}+\epsilon$

They calculated return as the difference between the log of the bid-ask midpoint 10 trades in the future and the log of the midpoint just prior to the trade. OBI is the order book imbalance immediately before the trade, computed with the equation from the previous section. Besides applying their return definition to a range of future trade horizons, we also explore how well OBI predicts midpoint returns over a range of time intervals, as well as the next successive midpoint price change.
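The two trade-based return definitions can be sketched as below, assuming `mids[i]` is the midpoint just before trade `i`. The "next successive midpoint change" is interpreted here as the log return to the first later midpoint that differs from the current one; that interpretation and the function names are assumptions.

```python
import math
from typing import List, Optional, Sequence


def forward_log_returns(mids: Sequence[float], horizon: int = 10) -> List[float]:
    """log(mid `horizon` trades ahead) - log(mid just before the trade)."""
    return [math.log(mids[i + horizon]) - math.log(mids[i]) for i in range(len(mids) - horizon)]


def next_mid_change_returns(mids: Sequence[float]) -> List[Optional[float]]:
    """Log return to the next midpoint that differs from the current one (None if none)."""
    n = len(mids)
    nxt: List[Optional[int]] = [None] * n
    # Backward pass: if the next mid is unchanged, inherit its "next change" index.
    for i in range(n - 2, -1, -1):
        nxt[i] = i + 1 if mids[i + 1] != mids[i] else nxt[i + 1]
    return [
        None if j is None else math.log(mids[j]) - math.log(mids[i])
        for i, j in enumerate(nxt)
    ]
```

The backward pass keeps the next-change lookup linear in the number of trades, which matters for a full week of ticks.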

We then extend this regression analysis to test the predictive power of order imbalance for realized spreads and short-term volatility, both of which are relevant to price impact and execution. In the volatility model, we use average OBI values over the previous interval to smooth out noise. We expect the order imbalance of the limit order book to contain information about future realized spreads and short-term volatility. This expectation follows Foucault et al. (2007), who developed a theoretical limit order market in which traders hold different views of future volatility based on their private information. Because limit orders have option-like features, traders need to price them using their private volatility information, and by submitting limit orders into the book they also disseminate that volatility information into the marketplace.

$Realized\:Spread=\beta _{0}+\beta _{1} OBI _{TopLevel}+\epsilon$

$Volatility=\beta _{0}+\beta _{1} Average\:OBI _{TopLevel}+\epsilon$
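All three specifications are single-regressor linear models, so one ordinary least squares routine covers them. The sketch below returns the intercept, slope, R-square and slope t-statistic; the function name and the usage variable names are hypothetical. For the volatility model, the regressor would be the per-interval average OBI lagged by one interval, as described above.

```python
import math
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit y = b0 + b1 * x + e by ordinary least squares.

    Returns (b0, b1, r_squared, t_stat_b1). With the thousands of observations a
    tick dataset provides, |t| > 1.96 roughly corresponds to significance at 5%.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 aligned observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - rss / syy
    se_b1 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return b0, b1, r_squared, t_stat


# Usage with hypothetical, aligned series (one observation per trade):
# b0, b1, r2, t = simple_ols(obi_before_trade, next_mid_change_return)
```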


Price Changes

Fig 3 – Regression Analysis of Order Book Imbalance vs. Various Mid Price Return Definitions


Although all regressions were significant at the 5% level, their R-squares show that most of them add little information about future price movements. The model that uses the next midpoint change as its return definition, however, stands out as the best predictor. In their study, Goldstein et al. split their data by day and stock before regressing OBI against the change in mid-price 10 trades in the future, obtaining mean and median R-squares of 12.02% and 10.96% respectively. We obtained an R-square of 10.9%, which is surprisingly close to their results. Overall, this indicates that the shape of the order book contains statistically significant information about future price movements for WTI futures.


Realized Spreads

Fig 4 – Regression Analysis of Order Book Imbalance vs. Various Realized Spread Definitions


Using order book imbalance to predict realized spreads gave poor results. Regressions on time-based definitions of realized spreads were insignificant across different short-term intervals (results for OBI vs. 5-second realized spreads are shown in the table). Using imbalance to predict realized spreads over 10 future trades did yield significant results, but the R-square was small.

We wanted to find out whether OBI and realized spreads carry any information about each other at all. Granger causality tests on OBI values and realized spreads over intervals of 10 trades suggest that, rather than OBI being a significant predictor of realized spreads, the reverse is true.

Fig 5 – Granger Causality Analysis of Realized Spreads over 10 Trades vs. Successive Order Imbalance Values
Fig 6 – Granger Causality Analysis of Order Imbalance Values vs. Successive Realized Spreads over 10 Trades
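A Granger causality test compares a model that predicts one series from its own past with a model that also uses the other series' past. The Python sketch below implements the simplest version, with a single lag and least squares written out by hand so it needs no third-party packages. The lag order of 1 is an assumption, since the lag length used in the tests above is not stated, and the function names are hypothetical. Libraries such as statsmodels (`statsmodels.tsa.stattools.grangercausalitytests`) run the same comparison over several lags, and a p-value for this statistic can be obtained with `scipy.stats.f.sf(f_stat, 1, n - 3)`.

```python
from typing import List, Sequence


def _ols_rss(X: List[List[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of y regressed on the columns of X (normal equations)."""
    k = len(X[0])
    # Augmented system [X'X | X'y], one row per coefficient.
    a = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting reduces X'X to a diagonal.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def granger_f_stat(y: Sequence[float], x: Sequence[float]) -> float:
    """Lag-1 Granger causality F-statistic for the hypothesis 'x helps predict y'.

    Restricted:   y_t = a + b * y_{t-1}
    Unrestricted: y_t = a + b * y_{t-1} + c * x_{t-1}
    Under the null (c = 0) the statistic is F-distributed with 1 and n - 3
    degrees of freedom, where n is the number of usable observations.
    """
    if len(x) != len(y) or len(y) < 5:
        raise ValueError("need aligned series with at least 5 observations")
    target = list(y[1:])
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    n = len(target)
    return (rss_r - rss_u) / (rss_u / (n - 3))


# Hypothetical usage: does the previous 10-trade realized spread help predict OBI?
# f_stat = granger_f_stat(obi_values, realized_spreads_10)
```

With large samples, the 5% critical value of an F(1, n - 3) distribution is about 3.84.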


Fitting the model the other way round gave a highly significant regression with an R-square of 11.5%. In addition, realized spreads and lagged imbalance values had a correlation of 33.9%. This suggests that the size of the previous realized spread helps predict the subsequent imbalance (causality here is in the Granger sense of predictive precedence). A possible economic explanation is that when realized spreads are large, orders cannot be refreshed fast enough to rebalance the book, which leaves a liquidity imbalance.

Fig 7 – Regression Analysis of Realized Spreads over 10 Future Trades vs. Lagged Order Imbalance Values
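The R-square of 11.5% and the correlation of 33.9% reported for this relationship are mutually consistent: in a regression with a single explanatory variable, the R-square equals the squared correlation coefficient,

$$R^{2} = \rho^{2} = 0.339^{2} \approx 0.115 = 11.5\%$$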


Short Term Volatility

Fig 8 – Regression Analysis of Short Term Volatility over Different Intervals vs. Average Order Imbalance Values


Results suggest that order imbalance contains little information about future short-term volatility. Fitting the regression on standard deviations of mid prices over the different intervals did not yield better results either, and in this case Granger causality tests did not support the opposite relationship.

On the other hand, we found the relationship between 30-minute volatility and trade volume over the same interval to be highly significant, with a correlation of 85.4%.

Fig 9 – Regression Analysis of 30 Minute Volatility vs. Trade Volumes
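A sketch of the volatility and volume comparison: trade sizes are summed into 30-minute (1800-second) buckets and correlated with per-interval volatility over the intervals where both measures exist. It assumes trades arrive as (timestamp in seconds, size) pairs and that volatility has already been computed per interval with the same bucket indices, for example with the high-low measure; all function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def bucket_volume(
    trades: Iterable[Tuple[float, float]], interval_s: float = 1800.0
) -> Dict[int, float]:
    """Total traded size per interval; 1800 s = 30 minutes. trades: (timestamp_s, size)."""
    totals: Dict[int, float] = {}
    for ts, size in trades:
        k = int(ts // interval_s)
        totals[k] = totals.get(k, 0.0) + size
    return totals


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def volatility_volume_correlation(
    volatility: Dict[int, float], volume: Dict[int, float]
) -> float:
    """Correlate per-interval volatility with per-interval volume over shared intervals."""
    keys = sorted(set(volatility) & set(volume))
    if len(keys) < 3:
        raise ValueError("need at least 3 intervals with both measures")
    return pearson([volatility[k] for k in keys], [volume[k] for k in keys])
```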



Our study reinforces the earlier findings of Goldstein et al. that order book imbalances are strong predictors of future prices, in particular of the direction of the next price change. This could have economic implications for strategies based on order book shape. In addition, we find evidence that realized spreads contain information about future imbalances. However, we acknowledge that in the complex realm of market microstructure, simple linear models may not be optimal for dealing with the significant noise in tick data. In future work, we plan to explore these relationships with more advanced techniques for handling noisy tick data, such as Lévy processes and Kalman filters.



  • Goldstein, M. A., Kwan, A., Philip, R., 2017. High-Frequency Trading Strategies (May 23, 2017). Available at SSRN:
  • Foucault, T., Moinas, S., Theissen, E., 2007. Does Anonymity Matter in Electronic Limit Order Markets? Review of Financial Studies 20, 1707-1747.