\documentclass{article}
\usepackage[affil-it]{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
\usepackage{url}
\usepackage{hyperref}
\hypersetup{colorlinks=false,pdfborder={0 0 0}}
\usepackage{etoolbox}
\usepackage{placeins}% provides \FloatBarrier, which is used before the bibliography
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[ngerman,english]{babel}
\begin{document}
\title{MDPI LaTeX template}
\author{David Allen}
\affil{University of Sydney}
\date{\today}
\maketitle
\selectlanguage{english}
\begin{abstract}
The direction of price movements is analysed under an ordered probit framework, recognising the importance of accounting for discreteness in price changes. By extending the work of \cite{hausman1992ordered} and \cite{yang2012predicting}, this paper focuses on improving the forecast performance of the model while adopting a more practical perspective through enhanced flexibility. This is achieved by extending the existing framework to generate short-term, multi-period-ahead forecasts for better decision making, whilst considering the serial dependence structure. This approach enhances the flexibility and adaptability of the model to future price changes, particularly targeting risk minimisation. Empirical evidence is provided, based on seven stocks listed on the Australian Securities Exchange (ASX). The prediction success varies between 78 and 91 per cent for in-sample and out-of-sample forecasts for both the short term and long term. KEY WORDS: ordered probit; stock prices; auto-regressive; multi-step ahead forecasts%
\end{abstract}%
\section{Introduction}
There has been significant growth in market micro-structure research, which is concerned with the study of the underlying process that translates the latent demands of investors into transaction prices and volumes (Madhavan, 2000). The study of the time series properties of security prices has been central to market micro-structure research for many years. Madhavan (2000) asserts that frictions and departures from symmetric information do affect the trading process. Furthermore, insights into future price trends provide additional information useful in strategy formulation. According to financial economic theory, asset returns cannot be easily predicted by employing statistical or other techniques and incorporating publicly available information. Nevertheless, recent literature bears evidence of successful forecasting of asset return signs; see, for example, Breen et al. (1989), Leung et al. (2000), White (2000), Pesaran et al. (2004) and Cheung et al. (2005). Even under mean independence, it is statistically possible to have sign and volatility dependence in asset returns (Christoffersen et al., 2006).
The knowledge of the future direction of the stock price movement provides valuable guidance in developing profitable trading strategies. However, there is no clear consensus on the stochastic behaviour of prices or on the major factors determining price changes. In this context, theories of information asymmetry, which state that private information deduced from trading causes market price fluctuations (see Kyle, 1985), became important propositions. Consequently, many market attributes have been employed as substitutes for information in the study of security price behaviour. Price changes occur in discrete increments, which are denoted in multiples of ticks. It is well recognised today that failing to treat the price process as a discrete series could adversely affect prediction results. The modelling of discrete transaction prices was initiated by Gottlieb et al. (1985). Generalisations and variations of such a modelling framework can be found in Ball (1988), Glosten et al. (1988), Harris (1990), Dravid et al. (1991) and Hasbrouck (1999). Most earlier studies treated price change as a continuous variable, primarily focusing on the unconditional distribution, and ignored the timing of transactions, which is irregular and random. The \selectlanguage{english}``ordered probit model'', which was initially proposed by Aitchison et al. (1957), is a useful model for discrete dependent variables that can take only a finite number of values with a natural ordering. Gurland et al. (1960) developed it further, and it was later introduced into the social sciences by McKelvey et al. (1975); it subsequently became an analytical tool for studying security price dynamics in market micro-structure research. The model can be used to quantify the effects of various factors on stock price movements, whilst accounting for discreteness in price changes and the irregular spacing of trades.
% Further, it enables the simultaneous estimation of the conditional mean and conditional variance of transaction prices, under the framework. \\
In an ordered probit analysis of the conditional distribution of price changes, Hausman et al. (1992) recognised the importance of accounting for discreteness, especially in intraday price movements. In such fine samples, the extent of price change is limited to a few distinct values, which may not be well approximated by a continuous state space. Their paper investigated the impact of several explanatory variables in capturing the transaction price changes. In particular, the effects on the conditional distribution of price changes of the clock-time effect (measured as the duration between two consecutive trades), the bid-ask spread, trade size and market-wide or systematic movements in prices based on a market index were modelled under this framework. In a more recent study, Yang et al. (2012) extended the existing empirical literature on the impact of market attributes on price dynamics, utilising an ordered probit model. Their study explored the price impact of variables such as market depth and trade imbalance (also referred to as order imbalance in quote driven markets), in addition to trade size, trade indicator, bid-ask spread and duration, which were found to be significant in other similar studies. The model estimated by Yang et al. (2012) was able to forecast the direction of price change in about 72\% of cases, on average.
The in-sample and out-of-sample forecasts provided by the authors were based on the observed values of the regressors in the forecast horizon. However, in generating out-of-sample forecasts beyond one step ahead, incorporating observed values for the regressors is of limited practical use, as they are not observed a priori. Developing multi-step ahead forecasts, at least for a few transactions ahead, is much more beneficial from a practical perspective, for effective decision making. However, such forecasting evidence under this framework is seemingly absent in the literature. Therefore, in addressing this shortcoming, this paper introduces a forecasting mechanism to generate forecasts beyond the one-step ahead level. Towards this end, disaggregated forecasts are generated first, for each of the explanatory variables for the period concerned. In order to generate forecasts for the regressors included, the serial dependence structure of each of the variables is investigated and appropriate forecasting models are fitted. Sign forecasts are subsequently generated from the estimated coefficients of the ordered probit model and those predicted regressor values, rather than observed values. These prediction results are compared with those of the existing literature. %This resulted in several Autoregressive Moving Average (ARMA) and Autoregressive Fractionally Integrated Moving Average (ARFIMA) type models. On the other hand, in the case of trade-indicator, which is a categorical variable with more than two categories, did not warrant the fitting of above mentioned type of models. Therefore, taking a different approach, a multinomial logistic regression is employed to capture the behaviour of this variable.This aggregated forecasting exercise is more valuable in predicting price movements in real life, as it studies the behaviour of each explanatory variable and improve the forecasting process by capturing the impact of serial correlation on those variables. 
Through the introduction of dynamic variables into the forecasting system, the predictive capability of this approach is investigated through a study based on the stocks of seven major companies listed on the Australian Securities Exchange (ASX).
In summary, the primary motivation of this paper is to introduce a method to enhance the flexibility and adaptability of the ordered probit model to generate multi-step ahead forecasts of stock price changes. Identifying and estimating appropriate univariate models for forecasting each explanatory variable, taking their serial dependence structure into account, towards this endeavour, is the second motivation. The third motivation is to improve on the results of Yang et al. (2012) in model estimation and forecast accuracy, by reducing noise in the data used and suitably formulating variables. %Therefore, this exercise is concernedbased on the same stocks and almost the same independent variables that were employed by ~\cite{yang2012predicting}. We were able to achieve more than 88 per cent rate of accuracy, on average, in the out of-sample forecasts of the direction of price changes using observed regressor values. In addition, more than 91 per cent of in-sample estimates,on average, correctly predicted the direction of price change. This is in comparison to the 72 per cent achieved by ~\cite{yang2012predicting}. It is between 78-80 per cent when predictied regressor values were incorporated.
The remainder of the paper is organised as follows. Section 2 provides a review of the ordered probit model, while Section 3 describes the data and the variables used in the analysis. That section also reports the summary statistics for each variable for the chosen stocks and introduces the relevant models for estimation and forecasting of durations, residuals and regressors. The empirical evidence is reported in Section 4, including model estimation and diagnostics. The results of the forecasting exercise, both in-sample and out-of-sample, are presented in Section 5 and, finally, the concluding remarks are provided in Section 6.
\section{A review of the ordered probit model}
In a sequence of transaction prices $P_{t_0}, P_{t_1}, P_{t_2}, \ldots, P_{t_T}$ occurring at times $t_0, t_1, t_2, \ldots, t_T$, the resulting price changes multiplied by 100 are represented as integer multiples of a tick and denoted by $Y_1, Y_2, \ldots, Y_T$, where $Y_{k}\equiv \{P_{t_k} -P_{t_{k-1}}\} \times 100$. The ordered probit model analyses discrete dependent variables with responses that are ordinal but not continuous. Underlying the indexing in such models is a latent continuous metric, and the thresholds partition the real line into a series of regions corresponding to the ordinal categories. The unobserved latent continuous variable $Y^*$ is thus related to the observed discrete variable $Y$. It is assumed that the conditional mean of $Y^*$ is a linear combination of observed explanatory variables $X$, and that the disturbance term has a Normal distribution.
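As an illustrative example with hypothetical prices (not taken from the data), suppose that prices are quoted in dollars and that two consecutive trades occur at $P_{t_{k-1}} = 35.10$ and $P_{t_k} = 35.12$. The definition above then gives
\begin{equation*}
Y_k = \{35.12 - 35.10\} \times 100 = 2,
\end{equation*}
that is, an upward move of two cents. A trade at an unchanged price gives $Y_k = 0$, and a one-cent fall gives $Y_k = -1$.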
The ordered probit specification takes the following form:
\begin{equation}\label{eq:1}
Y_k^* = X'_k\beta + \varepsilon_k, \quad \text{where} \quad \varepsilon_k|X_k \sim \text{i.n.i.d. } N(0,\sigma_k^2),
\end{equation}
where i.n.i.d.\ denotes that the errors are independently but not identically distributed, $X_k$ is a $q \times 1$ vector of predetermined explanatory variables that govern the conditional mean of $Y_k^*$, and $\beta$ is a $q \times 1$ vector of parameters to be estimated. Here, the subscript $k$ denotes the transaction time. The observed price change $Y_k$ is related to the latent continuous variable $Y_k^*$ according to the following scheme:
\begin{equation}\label{eq:2}
Y_k = \begin{cases}
s_1 & \text{if } Y_k^* \in A_1,\\
s_2 & \text{if } Y_k^* \in A_2,\\
\vdots & \vdots\\
s_m & \text{if } Y_k^* \in A_m,
\end{cases}
\end{equation}
where the sets $A_j$ comprise non-overlapping ranges of values that partition the continuous state space of $Y_k^*$, and the $s_j$ are the corresponding discrete values, called states, that make up the state space of $Y_k$. Let the $s_j$ be the price changes in ticks, $-2, -1, 0, 1, \ldots$, and suppose that the sets $A_j$ are defined by threshold values $\alpha_j$ as follows:
\begin{equation}\label{eq:3}
\begin{aligned}
A_1 &\equiv (-\infty, \alpha_1],\\
A_2 &\equiv (\alpha_1, \alpha_2],\\
&\;\;\vdots\\
A_j &\equiv (\alpha_{j-1}, \alpha_j],\\
&\;\;\vdots\\
A_m &\equiv (\alpha_{m-1}, \infty).
\end{aligned}
\end{equation}
The number of states, $m$, is kept finite to avoid an explosion in the number of unknown parameters, although in reality price changes could take any value in cents. As per~\cite{hausman1992ordered}, the only requirement in this framework is the conditional independence of the $\varepsilon_k$'s, where all the serial dependence would be captured by the regressors. Further, there are no restrictions on the temporal dependence of the $X_k$'s. The conditional distribution of $Y_k$, conditioned upon $X_k$, depends on the partition boundaries and the distributional assumption on $\varepsilon_k$. The conditional distribution in the case of Gaussian $\varepsilon_k$ is
\begin{align}
P(Y_k=s_i|X_k) &= P(X'_k\beta+\varepsilon_k\in A_i|X_k) \nonumber\\
&=\begin{cases}
P(X'_k\beta+\varepsilon_k \leq \alpha_1|X_k) & \text{if } i=1,\\[4pt]
P(\alpha_{i-1} < X'_k\beta+\varepsilon_k \leq \alpha_i|X_k) & \text{if } 1< i < m,\\[4pt]
P(\alpha_{m-1} < X'_k\beta+\varepsilon_k |X_k) & \text{if } i=m,
\end{cases} \label{eq:4}\\
&=\begin{cases}
\Phi\left(\frac{\alpha_1 - X'_k\beta}{\sigma_k}\right) & \text{if } i=1,\\[4pt]
\Phi\left(\frac{\alpha_i - X'_k\beta}{\sigma_k}\right) - \Phi\left(\frac{\alpha_{i-1} - X'_k\beta}{\sigma_k}\right) & \text{if } 1< i < m,\\[4pt]
1 - \Phi\left(\frac{\alpha_{m-1} - X'_k\beta}{\sigma_k}\right) & \text{if } i=m,
\end{cases} \label{eq:5}
\end{align}
where $\Phi(\cdot)$ denotes the standard Normal cumulative distribution function. Since the distance between the conditional mean $X'_k \beta$ and the partition boundaries determines the probability of any observed price change, the probabilities of attaining each state, given the conditional mean, can be changed by shifting the partition boundaries appropriately. The explanatory variables capture the marginal effects of various economic factors that influence the likelihood of a given state as opposed to another. Therefore, the ordered probit model determines the empirical relation between the unobservable continuous state space and the observed discrete state space as a function of the explanatory variables, $X_k$, by estimating all the system parameters, including the $\beta$ coefficients, the conditional variance $\sigma_k^2$ and the partition boundaries $\alpha$, from the data itself.
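
To illustrate (\ref{eq:5}), consider a hypothetical three-state case ($m=3$) with states for a price fall, no change and a price rise. The values $\alpha_1=-0.5$, $\alpha_2=0.5$, $X'_k\beta=0.2$ and $\sigma_k=1$ are assumed purely for illustration and are not estimates from this study. Then
\begin{align*}
P(Y_k=s_1|X_k) &= \Phi(-0.7) \approx 0.242,\\
P(Y_k=s_2|X_k) &= \Phi(0.3)-\Phi(-0.7) \approx 0.618 - 0.242 = 0.376,\\
P(Y_k=s_3|X_k) &= 1-\Phi(0.3) \approx 0.382,
\end{align*}
which sum to one. Raising the conditional mean to $X'_k\beta=0.5$ with the boundaries held fixed gives probabilities of approximately $0.159$, $0.341$ and $0.500$, showing how a larger conditional mean shifts probability mass towards the price-rise state.
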
Let $U_{ik}$ be an indicator variable, which takes the value 1 if the realisation of the $k$th observation, $Y_k$, is the $i$th state $s_i$, and 0 otherwise. The log likelihood function $L$ for the price changes $Y=[Y_1, Y_2, \ldots, Y_T]$, conditional on the regressors $X= [X_1, X_2, \ldots, X_T]$, takes the following form:
\begin{multline}\label{eq:6}
L(Y|X) =\sum_{k=1}^T \left\{U_{1k}\cdot\log\Phi \left(\frac{\alpha_1 - X_k^\prime \beta}{\sigma_k}\right) \right.\\
+\sum_{i=2}^{m-1}U_{ik}\cdot\log\left[\Phi \left(\frac{\alpha_i - X_k^\prime\beta}{\sigma_k}\right) -\Phi \left(\frac{\alpha_{i-1} - X_k^\prime \beta}{\sigma_k}\right)\right] \\
\left.+ U_{mk}\cdot\log \left[1 - \Phi \left(\frac{\alpha_{m-1} - X_k^\prime \beta}{\sigma_k}\right)\right]\right\}
\end{multline}
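Since exactly one of the indicators $U_{1k},\ldots,U_{mk}$ equals one for each transaction $k$, the log likelihood in (\ref{eq:6}) can equivalently be written as a sum of the log probabilities of the realised states. Writing $i(k)$ for the index of the state observed at transaction $k$ (a notation introduced here only for this remark),
\begin{equation*}
L(Y|X)=\sum_{k=1}^{T}\log P\left(Y_k=s_{i(k)}\,|\,X_k\right),
\end{equation*}
with the probabilities given by (\ref{eq:5}). For instance, an observation that falls in a state with fitted probability $0.5$ contributes $\log 0.5 \approx -0.693$ to $L$, whereas one that falls in a state with fitted probability $0.1$ contributes $\log 0.1 \approx -2.303$.
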
%\end{eqnarray}
%\begin{eqnarray*}
%L (Z | X) & = & \sum_{k = 1}^{n} \left \{ Y _{1k} \log \Phi \left ( \frac{\alpha_1 - X _{k}^{\prime} \beta }{ \sigma_k } \right ) \right. \\
%& & + xxxx \\
%& &\left. + xxxxx \right \}
%\end{eqnarray*}
\noindent\cite{hausman1992ordered} reparameterise the conditional variance $\sigma_k^2$ in terms of the time between trades and the lagged spread.
%The Section \ref{sec:data} considers the time series data in modelling.
\section{Data and variables}\label{sec:data}
\subsection{Data description}
The data for this analysis were obtained from the Securities Industry Research Centre of Asia-Pacific (SIRCA) in Australia. The dataset consists of tick-by-tick trades, time stamped to the nearest millisecond, and other information pertaining to trades and quotes for the chosen stocks listed on the Australian Securities Exchange (ASX). This study is based on a sample of stock prices collected during a three-month period from 16 January 2014 to 15 April 2014. Stocks that were not subject to any significant structural change, representing seven major industry sectors, are included in the sample. The selected stocks are Australian Gas Light Company (AGL), BHP Billiton (BHP), Commonwealth Bank (CBA), News Corporation (NCP), Telstra (TLS), Wesfarmers (WES) and Woodside Petroleum (WPL) from the Utilities, Materials, Financials, Consumer Discretionary, Telecommunication Services, Consumer Staples and Energy sectors, respectively. All seven stocks are included in the study by~\cite{yang2012predicting}, and comprise both liquid and less liquid assets, to minimise sample selection biases. However, the sampling period and the sample size differ between the two studies. Two stocks are not included in this paper due to the absence of transactions during the study period. Intraday price changes extracted from tick-by-tick trade data form the basic time series under consideration. Overnight price changes are excluded as their properties differ significantly from those of intraday price changes (see~\cite{amihud1987trading,stoll1990stock}). The trading hours of the ASX are from 10.00 am to 4.00 pm. Because opening and closing trades may contaminate the trading process~\citep{engle1998autoregressive}, trades during the first 30 minutes after opening and the final 30 minutes prior to closing are disregarded.
The following information is collected for each transaction of each stock: trade data comprising date, time, transaction price and trade size; quote data such as bid price and ask price; market depth data comprising the volume at the highest bid price (best bid) and the volume at the lowest ask price (best ask); and the market index (ASX200). High-frequency data (HFD) generally contain erroneous transactions and outliers that do not correspond to plausible market activity. This is mainly attributed to the high velocity of transactions~\citep{falkenberry2002high}. Among others,~\cite{hansen2006realized},~\cite{brownlees2006financial} and~\cite{barndorff2009realized} have paid special attention to the importance of data cleaning. A rigorous cleaning procedure is used here to obtain a reliable data series for the analysis, mainly in accordance with the procedure outlined in~\cite{barndorff2009realized}. To generate a time series at unique time points, in instances of simultaneous multiple trades (quotes), the median transaction price (bid/ask price) of those trades (quotes) is used. Correspondingly, the cumulative volume of those trades (quotes) is taken as the trade volume (bid/ask volume).
In the ordered probit model, the dependent variable $Y_k$ is the price change between the $(k-1)$th and $k$th trades multiplied by 100. This records $Y_k$ in cents, which is equivalent to ticks, as the tick size on the ASX for stocks with prices of the chosen magnitude is 1 cent. In this analysis, several explanatory variables are included to measure their association with the direction of price movement, following~\cite{yang2012predicting}. Bid and ask quotes are reported as and when quotes are updated, which necessitates the matching of quotes to transaction prices. Each transaction price is matched to the quote reported immediately prior to that transaction. Aggregate volumes at the best bid and best ask prices, together with the ASX200 index representing the market, are matched in the same fashion. The bid-ask spread, $Sprd_{k-1}$, is given in cents, while $LBAV_{k-1}$ and $LBBV_{k-1}$ denote the natural logarithm of the number of shares at the best ask and best bid prices, respectively. $LVol_{k-1}$ gives the natural logarithm of the $(k-1)$th trade size. The conditional duration, $\psi_{k-1}$, and the standardised transaction duration, $\epsilon_{k-1}$, are estimates derived by fitting an autoregressive conditional duration model, ACD(1,1), to diurnally adjusted duration data. A brief description of the model introduced by~\cite{engle1998autoregressive} is presented in the Appendix. The initial record of each day is disregarded as it is linked to the previous day's prices and results in negative durations. $TI_{k-1}$ denotes the trade indicator of the $(k-1)$th trade, which classifies a trade as buyer-initiated, seller-initiated or other. Trade imbalance, $TIB_{k-1}$, based on the preceding 30 trades that occurred on the same day~\citep{yang2012predicting} (YP hereafter), is calculated as follows:
\begin{equation}\label{eq:9}
TIB_{k-1}=\frac{\sum_{j=1}^{30}\left(TI_{(k-1)-j}\times Vol_{(k-1)-j}\right)}{\sum_{j=1}^{30} Vol_{(k-1)-j}}
\end{equation}
\noindent The first 30 observations of trade imbalance (TIB) on each day are set to zero, as TIB would otherwise depend on the previous day's trades for these transactions.
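
As a small numerical illustration of (\ref{eq:9}), suppose for brevity that the window contained three trades rather than 30, and assume that the trade indicator is coded as $+1$ for a buyer-initiated trade, $-1$ for a seller-initiated trade and $0$ otherwise (this coding is an assumption made here for illustration). With hypothetical volumes of 100, 200 and 700 shares and indicators $+1$, $-1$ and $+1$,
\begin{equation*}
TIB=\frac{(+1)(100)+(-1)(200)+(+1)(700)}{100+200+700}=\frac{600}{1000}=0.6,
\end{equation*}
indicating net buying pressure. Under this coding, $TIB$ lies between $-1$ (all volume seller-initiated) and $+1$ (all volume buyer-initiated).
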
\noindent The market index return $RIndx_{k-1}$, prevailing immediately prior to transaction $k$, is computed as given below:
\begin{equation}\label{eq:10}
RIndx_{k-1}=\ln(INDX_{k-1})-\ln(INDX_{k-2})
\end{equation}
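For example, with hypothetical index levels $INDX_{k-2}=5400.0$ and $INDX_{k-1}=5405.4$ (values chosen only for illustration), (\ref{eq:10}) gives
\begin{equation*}
RIndx_{k-1}=\ln(5405.4)-\ln(5400.0)=\ln(1.001)\approx 0.0010,
\end{equation*}
that is, an index return of roughly 0.1 per cent between the two matched observations.
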
The sampling period and the use and categorisation of certain variables in this analysis differ from YP. The ASX200 is applied here instead of specific sector indexes, as the impact of the performance of the overall economy tends to be more significant on stock price behaviour than that of a specific sector. In addition, the reference point for grouping the price changes is the `one tick' threshold rather than the `zero' change. This provides a more meaningful classification of the groups, as the categorisation of price change is based on a range of values rather than a fixed value for a certain group.
%\subsection{Data cleaning procedure}
%The detection and removal of incorrect and inconsistent observations in the trade and quote data from the ASX was based on the following procedure. Some steps are common to both trade and quote data while others are specific to one type.\\
%
%\begin{enumerate}
%\item [\bf All\ data]
%\item [C1.] Delete records with zero transaction, bid or ask prices
%
%\item[\bf Quote data]
%
%\item [Q1.] Instances of multiple quotes for a given time stamp, replace with the median bid and median ask price
%\item [Q2.] Delete records with negative spreads
%\item [Q3.] Delete records, if the median spread exceeds 50 times the median spread for a given day
%\item [Q4.] Delete records if the mid-quote deviated by more than 10 mean absolute deviations from a rolling centered median of 50 observations (25 observation before and 25 observations after, excluding the observation under consideration).
%
%\item[\bf Trade data]
%\item[T1.] In instances of multiple transactions for a given time stamp, replace with the median transaction price.
%\item[T2.] Delete records if the transaction price deviated by more than 10 mean absolute deviations from a rolling centered median of 50 observations (25 observation before and 25 observations after, excluding the observation under consideration).
%\item [T3.] Delete records if the transaction price is more than ask plus bid-ask spread or is below the bid minus bid-ask spread.
%\end{enumerate}
\section{Results and Discussion}
\section{Conclusions}
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plain}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}