4.1 The proposed test
The nonparametric testing method we suggest is not new, but it is not well known among applied researchers (or at least not as well known as it should be), and we have not seen it promoted or used in this particular context before. The method is called the permutation test.
Under assumption (i), the joint distribution of the data is invariant to permutation (or reordering) of the observations. The combined sample size is
\(n +m\). Let
\(X_t {:}{=}AR_{T_0 + t}\) for
\(t = 1, \ldots , n\) and
\(X_t = AR_{T_1 + t - n}\) for
\(t = n+1, \ldots , n+m\), so that
$$\begin{aligned} \{X_1, \ldots , X_n, X_{n+1}, \ldots , X_{n+m}\} = \{AR_{T_0+1}, \ldots , AR_{T_0+n}, AR_{T_1+1}, \ldots , AR_{T_2}\}~. \end{aligned}$$
Next, let
\(r{:}{=}\{r_1, \ldots , r_{n+m}\}\) be a permutation (or re-ordering) of the set of integers
\(\{1, \ldots , n+m\}\). Note that
\((n+m)!\) distinct such permutations exist, where for an integer
d,
$$\begin{aligned} d! {:}{=}d \cdot (d-1) \cdot \ldots \cdot 2 \cdot 1~. \end{aligned}$$
(In words, one says “d factorial”.) As an example, there are
\(3! = 6\) distinct permutations of the set
\(\{1, 2, 3\}\), given by
$$\begin{aligned} \{1, 2, 3\}, \{1, 3, 2\}, \{2, 1, 3\}, \{2, 3, 1\}, \{3, 1, 2\}, \{3, 2, 1\}~. \end{aligned}$$
(Note that the original ordering counts as one of the possible permutations.)
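The enumeration above is easy to reproduce programmatically; the following minimal Python snippet (our own illustration, using the standard library) lists all distinct permutations of the set, with the original ordering appearing first:

```python
from itertools import permutations

# All distinct orderings of the set {1, 2, 3}; the identity
# ordering (1, 2, 3) counts as one of the 3! = 6 permutations.
perms = list(permutations([1, 2, 3]))
print(len(perms))  # 6
print(perms)
```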
For a given permutation r, the corresponding permutation of the \(\{X_i\}\) is then implied as \(X_i^* {:}{=}X_{r_i}\), which in turn defines the corresponding permutation of the abnormal returns as \(AR_{T_0+t}^* {:}{=}X_t^*\), for \(t = 1,\ldots , n\), and \(AR_{T_1+t-n}^* {:}{=}X_t^*\), for \(t = n+1, \ldots , n+m\). The point is that under assumption (i) the joint distribution of the permuted abnormal returns is the same as the joint distribution of the original abnormal returns: i.i.d. according to a distribution with mean zero and (unknown) variance \(\sigma ^2 > 0\).
In a nutshell, the permutation test, in its ‘ideal’ version, then works as follows. First, set up the test statistic
T in a way such that large values ‘indicate’ the alternative hypothesis, that is,
$$\begin{aligned} T&{:}{=}t_{CAR} \quad \text {for testing problem } (2.2)~,\\ T&{:}{=}- t_{CAR} \quad \text {for testing problem } (2.3)~, \text { and} \\ T&{:}{=}|t_{CAR}| \quad \text {for testing problem } (2.4)~. \end{aligned}$$
Second, for any permutation
r, denote the value of the test statistic computed from the permuted data
\(\{AR_{T_0+1}^*, \ldots , AR_{T_0+n}^*, AR_{T_1+1}^*, \ldots , AR_{T_2}^*\}\) by
\(T_r^*\). Third, compute the
p-value as
$$\begin{aligned} \hat{p} {:}{=}\frac{\#\{T_r^{*} \ge T\}}{(n+m)!}~; \end{aligned}$$
(4.1)
that is, the
p-value is given by the fraction of test statistics (stemming from all distinct permutations of the data) that are as large as, or larger than, the value of the test statistic computed from the observed data. This algorithm is called ‘ideal’, since the
p-value according to formula (
4.1) cannot be computed in practice unless the combined sample size
\(n+m\) is very small, which is not the case in our intended applications; for example, for
\(n+m=100\), one obtains
\((n+m)! = 100! \approx 9.33 \cdot 10^{157}\).
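The magnitude quoted above can be verified with a one-line computation (standard library only):

```python
import math

# Number of distinct permutations of the n + m = 100 combined observations.
n_perms = math.factorial(100)
print(f"{n_perms:.2e}")  # 9.33e+157
```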
Therefore, a ‘feasible’
p-value is based on a manageable number
B of permutations that are selected in a suitable way from the universe of all
\((n+m)!\) distinct permutations. The ‘feasible’
p-value is then computed as
$$\begin{aligned} \hat{p} {:}{=}\frac{\#\{T_r^{*} \ge T\}}{B}~. \end{aligned}$$
(4.2)
In doing so, it is customary to make the ‘identity permutation’ one of the selected
B permutations, for which then
\(T_r^* = T\), and draw the remaining
\(B-1\) permutations at random from the universe of all distinct permutations. In this case, the smallest possible
p-value is 1/
B, namely if all the test statistics
\(T_r^*\) based on the
\(B-1\) randomly drawn permutations are smaller than
T. It is recommended to choose
B as large as possible in practice, depending on one’s computational power, but at least
\(B \ge 10,000\).
Last but not least, how does one draw a permutation of the numbers \(\{1,\ldots , n+m\}\) at random? Of course, the exact command depends on one’s software but the key term is “drawing without replacement” instead of “drawing with replacement”. The mental image is that there is an urn with balls labeled from 1 to \(n+m\). Then one draws one ball at a time, at random, without replacement, which results in a random permutation. If one draws with replacement instead, in general some numbers will appear more than once whereas other numbers will not appear at all, and so the resulting sequence is not a permutation.
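As an illustration, in Python (our choice of language here) drawing without replacement yields a permutation, whereas drawing with replacement generally does not:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility
n_plus_m = 10
labels = list(range(1, n_plus_m + 1))  # balls labeled 1 to n + m

# Without replacement: every label appears exactly once -> a permutation.
perm = rng.sample(labels, k=n_plus_m)
print(sorted(perm) == labels)  # True

# With replacement: duplicates are possible, so in general not a permutation.
draw = [rng.choice(labels) for _ in range(n_plus_m)]
```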
For completeness, we can now ‘summarize’ the permutation-test method of constructing a p-value by means of the following algorithm.
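As a concrete sketch of the feasible procedure, consider the following Python code (our own illustration: the function names are ours, the identity permutation is counted among the B permutations, and the simplified studentization inside `t_car` stands in for the paper’s exact definition of the test statistic):

```python
import random
import statistics

def t_car(x, n, m):
    """Simplified stand-in for the t_CAR statistic: the sum of the m
    event-window values divided by sqrt(m) times the sample standard
    deviation of the n estimation-window values. (Illustrative only;
    the exact studentization should follow the definition in the text.)"""
    sigma_hat = statistics.stdev(x[:n])
    return sum(x[n:]) / (m ** 0.5 * sigma_hat)

def permutation_p_value(data, n, m, B=10_000, seed=0):
    """Feasible permutation p-value for the two-sided problem, with
    T := |t_CAR| and the identity permutation counted as one of the B."""
    rng = random.Random(seed)
    T = abs(t_car(data, n, m))
    count = 1  # the identity permutation, for which T_r* = T
    for _ in range(B - 1):
        perm = rng.sample(data, k=len(data))  # draw without replacement
        if abs(t_car(perm, n, m)) >= T:
            count += 1
    return count / B
```

By construction the returned p-value lies in [1/B, 1], matching the smallest possible value 1/B discussed above.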
By the general results on permutation testing of Lehmann and Romano (
2022, Section 17.2.1), the resulting
p-value (
4.2) is exact (or ‘perfect’) in finite samples; that is, for any
\(0< \alpha < 1\),
$$\begin{aligned} \text {Prob} \bigl (\hat{p} \le \alpha \bigr ) = \alpha \end{aligned}$$
under assumption (i), namely that the data
\(\{AR_{T_0+1}, \ldots , AR_{T_0+n}, AR_{T_1+1}, \ldots , AR_{T_2}\}\) are i.i.d. according to a distribution with mean zero and (unknown) variance
\(\sigma ^2 > 0\).
In a sense, the permutation test uses a ‘data-based’ null distribution to derive the p-value, namely the empirical distribution of the B test statistics \(\{T_{r_1}^*, \ldots , T_{r_B}^*\}\) computed from permuted data; this is a discrete, nonparametric distribution. On the other hand, the t-test uses the \(t_{n-K}\) distribution as the null distribution to derive the p-value; this is a continuous, parametric distribution. Whereas the former null distribution is always valid by the result stated in the previous paragraph, the latter null distribution is only valid when the abnormal returns follow a normal distribution, which is generally not the case in practice.
It might be instructive to study how the two respective null distributions behave asymptotically, as the size of the estimation window,
n, tends to infinity whereas the size of the event window,
m, remains fixed. To this end let
\(X_1, X_2, \ldots , X_m\) be random variables that are i.i.d. according to the distribution with mean zero and variance
\(\sigma ^2\) that all abnormal returns follow under the null. Then the asymptotic null distribution for the permutation test, if in addition the number of permutations,
B, tends to infinity, is given by the distribution of the random variable
$$\begin{aligned} \frac{\sum _{i=1}^m X_i}{\sqrt{m} \sigma }~, \end{aligned}$$
which is a distribution with mean zero and variance one.
On the other hand, the asymptotic null distribution for the
t-test is given by
N(0, 1), that is, by the standard normal distribution. Therefore, both asymptotic null distributions have mean zero and variance one but only the former is always valid; the latter, again, is only valid when the abnormal returns follow a normal distribution, which is generally not the case in practice.
4.4 Nonrobustness to event-induced increase in variance
We deem it prudent to point out that all three tests that have been discussed — t-test, permutation test, and SQ test — are not robust to an event-induced increase in variance.
To illustrate, consider as an example an event window of a single day (that is,
\(m=1\)), in which case the two-sided testing problem specializes to
$$\begin{aligned} H_0: {\mathbb {E}}(AR) = 0 \quad \text {vs.} \quad H_1: {\mathbb {E}}(AR) \ne 0~. \end{aligned}$$
Further, for simplicity, assume that abnormal returns are normally distributed. Our assumption under
\(H_0\) then specializes to the assumption that the data
\(\{AR_{T_0+1}, \ldots , AR_{T_0+n}, AR_{T_1+1}\}\) are i.i.d. according to
\(N(0, \sigma ^2)\). If instead
\( AR_{T_1+1}\) is distributed according to
\(N(0, \tilde{\sigma }^2)\) with
\(\tilde{\sigma }^2 > \sigma ^2\), the probability of rejecting
\(H_0\) is not controlled at the nominal level
\(\alpha \). In fact, the rejection probability of all three tests tends to one as
\(\tilde{\sigma }^2\) tends to infinity.
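This inflation of the rejection probability can be illustrated with a small Monte Carlo sketch (entirely our own setup: m = 1, a crude permutation test with T the absolute value of the event-day observation, and parameter values chosen for speed rather than realism):

```python
import random

def rejects(rng, sigma_event, n=50, B=100, alpha=0.10):
    """One simulated data set for m = 1: n estimation-window abnormal
    returns from N(0, 1) plus one event-day return from N(0, sigma_event^2).
    The permutation test uses T := |event-day observation| and rejects
    at level alpha when the permutation p-value is at most alpha."""
    data = [rng.gauss(0, 1) for _ in range(n)] + [rng.gauss(0, sigma_event)]
    T = abs(data[-1])
    count = 1  # identity permutation, for which T_r* = T
    for _ in range(B - 1):
        perm = rng.sample(data, k=len(data))
        if abs(perm[-1]) >= T:
            count += 1
    return count / B <= alpha

rng = random.Random(0)
reps = 200
# Null holds with equal variances: rejection rate should be close to alpha.
rate_null = sum(rejects(rng, sigma_event=1.0) for _ in range(reps)) / reps
# Event-induced variance increase: rejection rate is inflated well above alpha.
rate_inflated = sum(rejects(rng, sigma_event=5.0) for _ in range(reps)) / reps
print(rate_null, rate_inflated)
```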
Therefore, to be allowed to interpret a rejection of \(H_0\) as evidence for \({\mathbb {E}}(AR) \ne 0\), one must assume that the event, if it has any effect, only changes the mean of \(AR_{T_1+1}\) but not its variance, such that \(AR_{T_1+1}\) is distributed according to \(N(\gamma , \sigma ^2)\) with \(\gamma \ne 0\).
This reasoning carries over to multi-day event windows (that is, \(m > 1\)) and abnormal returns that are not normally distributed: Any effect of the event should only ‘shift’ the distributions of the abnormal returns (up or down) but leave the shape of the distribution otherwise unchanged, and thus in particular leave the variance unchanged.
The nonrobustness of the three tests to an event-induced increase in variance is undesirable but also impossible to fix, in the sense that it is not possible to statistically disentangle a change of the mean of the abnormal returns in the event window from a simultaneous change of the variance, at least when the event window is short, in the extreme as short as a single day.
4.5 Extension to testing CAAR
As stated before, in most event studies there are several firms under study and the interest is in testing CAAR, of which AAR is a special case (when the event window is of size
\(m=1\)). If the number of firms is ‘sufficiently’ large, one can use parametric test statistics; as a rule of thumb,
\(N \ge 30\) firms can be considered sufficient. For a smaller number of firms, one can use nonparametric test statistics; as a rule of thumb,
\(N \ge 10\) firms can be considered sufficient.
But there might be applications when the number of firms is in the single digits and as small as \(N= 2\). In such cases, even nonparametric test statistics are generally not viable. On the other hand, one can extend the permutation test for testing CAR outlined above to such applications. Once one prescribes how to permute the joint data comprising all the firms, the way the test is carried out is similar to testing CAR, and thus the details are left to the reader. In prescribing how to permute the joint data, we shall consider two settings.
In the first setting, there is no overlap between the ‘combined’ windows (estimation window together with event window) of the various firms. In this setting, one would permute ‘independently’ with respect to firms; in other words, one would permute the firm-specific data one firm at a time, using independently drawn permutations.
In the second setting, there is a common estimation window together with a common event window for all firms. In this setting, one would always apply the same permutation to all the firms together in order to preserve any (potential) across-firm dependence structure.
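In this second setting, the implementation detail is simply that one draws a single permutation of the common combined window and applies it to every firm’s series; a minimal sketch (the array layout and function name are our own choices):

```python
import random

def permute_jointly(firm_data, rng):
    """firm_data: list of equal-length lists, one per firm, covering the
    common combined window. Applies the SAME randomly drawn permutation
    to every firm, preserving any across-firm dependence structure."""
    length = len(firm_data[0])
    r = rng.sample(range(length), k=length)  # one permutation for all firms
    return [[series[i] for i in r] for series in firm_data]

rng = random.Random(0)
firm_a = [1.0, 2.0, 3.0, 4.0]
firm_b = [10.0, 20.0, 30.0, 40.0]
perm_a, perm_b = permute_jointly([firm_a, firm_b], rng)
# Same-day pairs stay aligned: wherever 2.0 lands in perm_a, 20.0 sits
# at the same position in perm_b.
```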