1 Introduction
In [1], Charles Stein introduced a powerful new method for bounding the approximation error in the central limit theorem and other normal approximations. Let \(X_{1},X_{2},\ldots ,X_{n}\) be independent random variables with finite third absolute moments, standardized so that \(\mathbb{E}X_{i} = 0\), \(1\leq i \leq n\), and \(\operatorname{Var}(\sum_{i=1}^{n}X_{i}) = 1\). Stein's method yields a characteristic-function-free proof of the Berry–Esseen theorem, i.e., that there exists an absolute constant C such that
$$ \sup_{x\in \mathbb{R}} \bigl\vert P(W\leq x) - \Phi (x) \bigr\vert \leq C\gamma , $$
(1.1)
where \(W=\sum_{i=1}^{n}X_{i}\), \(\gamma = \sum_{i=1}^{n}\mathbb{E}|X_{i}|^{3}\), and Φ is the standard normal distribution function. See Theorem 3.6 of [2] for a proof with the absolute constant \(C=9.4\).
Nonuniform versions of the Berry–Esseen theorem, which are more informative than (1.1), have also been obtained by Stein's method. For example, Chen and Shao [3] improve on the earlier bound of
$$ \bigl\vert P(W \leq x) - \Phi (x) \bigr\vert \leq \frac{C\gamma}{1 + \vert x \vert ^{3}} $$
from [4] and show that a similar result holds if we only assume the existence of second moments of the \(X_{i}\).
Local limit theorems, which quantify the accuracy of a normal approximation for the point probabilities \(P(S=k)\), \(k\in \mathbb{Z}\), when S is a sum of integer-valued random variables, have seldom been considered using Stein's method. Suppose now that \(X_{1},X_{2},\ldots ,X_{n}\) are integer-valued random variables and set \(S = \sum_{i=1}^{n}X_{i}\). Formally, S is said to satisfy the local limit theorem if the quantity
$$ \triangle = \sup_{k\in \mathbb{Z}}\biggl\vert P(S = k) - \frac{1}{\sigma \sqrt{2\pi}}\exp \biggl\{ - \frac{(k-\mu )^{2}}{2\sigma ^{2}} \biggr\} \biggr\vert $$
(1.2)
satisfies \(\triangle = o(1/\sigma )\), where \(\mu = \mathbb{E}S\) and \(\sigma ^{2} = \operatorname{Var}(S)\). Here and throughout, we suppress the obvious dependence on n of quantities such as S, μ, and \(\sigma ^{2}\).
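Before turning to this requirement, a quick numerical illustration (not part of the formal development) may help: for a binomial sum, the quantity in (1.2) can be computed exactly. The sketch below, which assumes SciPy is available and uses the arbitrary choice \(p=0.3\), also reports \(\triangle \sigma ^{2}\), anticipating the \(O(1/\sigma ^{2})\) rate discussed below.

```python
# Illustrative sketch only: compute the quantity (1.2) exactly when
# S ~ Binomial(n, p), comparing each point probability P(S = k) with the
# normal density at k.  Assumes SciPy is installed; p = 0.3 is arbitrary.
import math
from scipy.stats import binom

def local_error(n: int, p: float) -> float:
    """sup over k of |P(S = k) - normal density at k| for S ~ Binomial(n, p)."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    dens = lambda k: math.exp(-(k - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return max(abs(binom.pmf(k, n, p) - dens(k)) for k in range(n + 1))

for n in (10, 100, 1000):
    delta, sigma2 = local_error(n, 0.3), n * 0.3 * 0.7
    print(n, delta, delta * sigma2)  # delta * sigma^2 stays bounded as n grows
```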
To understand the requirement that \(\triangle = o(1/\sigma )\), observe that (1.1) gives a bound on the difference between the distribution functions of \(W=(S-\mu )/\sigma \) and \(Z\sim N(0,1)\) that is proportional to \(\sigma ^{-3}\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{3}\), where \(\mu _{i}=\mathbb{E} X_{i}\). In the typical situation where \(\sum_{i=1}^{n}\mathbb{E}|X_{i}-\mu _{i}|^{3} = O(n)\) and \(\sigma ^{-2} = O(n^{-1})\), the Berry–Esseen bound is \(O(1/\sigma )\), and it can also be shown that \(\triangle = O(1/\sigma )\) in this case. Thus the requirement that \(\triangle = o(1/\sigma )\) in the local limit theorem ensures more refined information than is immediately available from the Berry–Esseen bound.
A historical overview of local limit theorems can be found in [5], where it is noted that they predate central limit theorems. The earliest such result, the de Moivre–Laplace theorem [6, 7], establishes the local limit theorem for sums of identically distributed Bernoulli random variables, i.e., where S is binomially distributed. In this case the definition of △ in (1.2) is modified by taking the supremum over \(k\in [0,n]\cap \mathbb{Z}\). The de Moivre–Laplace theorem is also considered in [8], where it is shown that \(\triangle \leq 0.516/\sigma ^{2}\). Chapter 7 of [9] shows that, under mild conditions, when S is a sum of independent integer-valued random variables, an approximation error of \(\triangle = O(1/\sigma ^{2})\) is optimal. Siripraparat and Neammanee [10] establish the optimal error \(\triangle = O(1/\sigma ^{2})\) for independent but not necessarily identically distributed Bernoulli random variables and give explicit constants in their bound, and in [11] they generalize this work to sums of arbitrary independent integer-valued random variables. The proofs of these results typically involve Fourier analysis of characteristic functions.
Although Barbour et al. [12] use Stein's method to prove local limit theorems, they consider a rather more general setup that is not restricted to local approximation of sums of integer-valued random variables. Consequently, the bounds they obtain are more complicated than one would expect for sums, and when applied to the particular case of sums of independent integer-valued random variables, they do not yield the optimal rate \(\triangle = O(1/\sigma ^{2})\), although the authors suggest how their methods can be adapted to attain it in this case. Fang [13] uses Stein's method to bound the total-variation distance between an integer-valued random variable and a discretized normal distribution, although bounds in the local metric are not considered. Barbour and Choi [14] consider approximating the distribution of sums of integer-valued random variables by a translated Poisson distribution. They obtain nonuniform bounds in the total-variation metric that are roughly analogous to the classical results of [4] and [3] for the central limit theorem, although they do not consider local limit theorems.
All of the above studies involving local limit theorems consider only uniform bounds, so that some information regarding the quality of the normal approximation of \(P(S=k)\) for a specific fixed k is lost. In this paper, we prove a nonuniform local limit theorem when \(X_{1},X_{2},\ldots ,X_{n}\) are independent but not necessarily identically distributed Bernoulli random variables. In this case, \(S=\sum_{i=1}^{n}X_{i}\) is said to be a Poisson binomial random variable and follows a Poisson binomial distribution. Poisson binomial random variables, introduced in [15], have been used in a wide range of contexts, including finance [16], reliability analysis [17], and machine learning [18, 19]. A survey of the Poisson binomial distribution may be found in [20].
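Before stating our main result, we note that the Poisson binomial pmf can be computed exactly by iterated convolution, so bounds of this type are easy to examine numerically. The sketch below (illustrative only; the \(p_{i}\) are arbitrary choices) compares the exact point probabilities with the normal density in (1.2) and prints an error weighted by \(\sigma ^{2} e^{|k-\mu |/\sigma}\), in the spirit of a nonuniform bound.

```python
# Illustrative sketch only: exact Poisson binomial pmf by iterated convolution,
# compared with the normal density as in (1.2).  The printed weighted error is
# in the spirit of a nonuniform bound of the form C e^{-|k-mu|/sigma}/sigma^2;
# the heterogeneous p_i below are arbitrary.
import math

def poisson_binomial_pmf(ps):
    """pmf of S = X_1 + ... + X_n with independent X_i ~ Bernoulli(p_i)."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += (1 - p) * q   # X_i = 0
            new[k + 1] += p * q     # X_i = 1
        pmf = new
    return pmf

ps = [0.1 + 0.8 * i / 99 for i in range(100)]
pmf = poisson_binomial_pmf(ps)
mu = sum(ps)
sigma = math.sqrt(sum(p * (1 - p) for p in ps))
dens = lambda k: math.exp(-(k - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
print(max(abs(pmf[k] - dens(k)) * sigma ** 2 * math.exp(abs(k - mu) / sigma)
          for k in range(len(pmf))))
```

We now state our main result.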
Our proof of Theorem 1.1 uses a combination of the zero-bias transformation of [21], the concentration-inequality approach of [3], and some ideas from Chap. 7 of [2]. Our proof may also be modified to give the classical uniform local limit theorem with an explicit constant; although we do not pursue this direction here, we intend to do so in a future note.
The remainder of the paper is structured as follows. Section 2 reviews the background on Stein's method and the zero-bias framework required for the rest of the paper. Section 3 gives some auxiliary technical results needed for the proof of Theorem 1.1 in Sect. 4. The Appendix gives the proof of a lemma stated in Sect. 3.
2 Stein’s method
The starting point of Stein's method is the following characterization of the standard normal distribution. If the random variable W has a standard normal distribution, then
$$ \mathbb{E} f'(W) - \mathbb{E} Wf(W) = 0 $$
(2.1)
for all absolutely continuous functions \(f:\mathbb{R}\to \mathbb{R}\) with \(\mathbb{E} |f'(W) | < \infty \). Conversely, if (2.1) holds for all bounded, continuous, and piecewise continuously differentiable functions f with \(\mathbb{E} |f'(Z) | < \infty \), where \(Z\sim N(0,1)\), then W has a standard normal distribution.
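This characterization is easy to probe by simulation. The following sketch (illustrative only; \(f=\tanh \) is an arbitrary bounded, absolutely continuous test function) checks the forward direction of (2.1):

```python
# Illustrative Monte Carlo check of (2.1): for W standard normal and the
# bounded, absolutely continuous test function f = tanh (an arbitrary choice),
# E f'(W) - E W f(W) should vanish up to simulation error.
import math, random

random.seed(0)
ws = [random.gauss(0.0, 1.0) for _ in range(10 ** 6)]
f, f_prime = math.tanh, lambda w: 1.0 / math.cosh(w) ** 2
print(sum(f_prime(w) - w * f(w) for w in ws) / len(ws))  # approximately 0
```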
Intuitively, if W is approximately standard normal, then for \(Z\sim N(0,1)\), \(\mathbb{E}h(W) -\mathbb{E}h(Z)\) should be close to zero for h in a sufficiently large class of test functions. Also, if W is in some sense close to Z, then from (2.1), \(\mathbb{E}f'(W)-\mathbb{E}Wf(W)\) should be close to zero. These two observations lead to consideration of the ordinary differential equation
$$ f'(w) - wf(w) = h(w) - \mathbb{E}h(Z), $$
(2.2)
known as the Stein equation, which may be solved for f by the method of integrating factors. For a given choice of h, with f the solution of the Stein equation (2.2), we see that bounding \(\mathbb{E}h(W) - \mathbb{E}h(Z)\) is equivalent to bounding \(\mathbb{E}\{f'(W) - Wf(W)\}\). The latter expectation often turns out to be easier to bound than the former, particularly when W is a sum of random variables.
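For later reference, carrying out this integrating-factor computation (the resulting formula is standard; see, e.g., [2]) shows that the unique bounded solution of (2.2) for bounded h is
$$ f_{h}(w) = e^{w^{2}/2} \int _{-\infty}^{w} e^{-t^{2}/2}\bigl\{ h(t)-\mathbb{E}h(Z)\bigr\} \,dt = -e^{w^{2}/2} \int _{w}^{\infty} e^{-t^{2}/2}\bigl\{ h(t)-\mathbb{E}h(Z)\bigr\} \,dt, $$
the two expressions being equal because \(\int _{-\infty}^{\infty} e^{-t^{2}/2}\{h(t)-\mathbb{E}h(Z)\}\,dt = 0\).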
To analyze the approximation error in the central limit theorem when W is a sum or sample mean, fix \(x\in \mathbb{R}\) and take \(h(w) = \mathbf{1}\{w\leq x\}\). Then, replacing w by W and taking expectations, the right-hand side of (2.2) becomes \(P(W\leq x) - \Phi (x)\), which is the desired difference to be analyzed. The class of test functions to be used for our problem is identified in Sect. 3.1.
We will use the zero-bias framework of [21], which defines a new random variable \(W^{*}\) on the same space as W to assess the proximity of W to a normal random variable. When \(W\sim N(0, \sigma ^{2}_{ W})\), the characterizing equation analogous to (2.1) is \(\mathbb{E}Wf(W) = \sigma ^{2}_{ W}\mathbb{E}f'(W)\). This motivates the following definition: for a mean-zero random variable W with \(\operatorname{Var}(W)=\sigma ^{2}_{ W}\), we say that \(W^{*}\) has the W-zero-biased distribution if \(\mathbb{E}Wf(W) = \sigma ^{2}_{ W}\mathbb{E}f'(W^{*})\) for all absolutely continuous f for which these expectations exist.
Goldstein and Reinert [21] prove the existence and uniqueness of \(W^{*}\). Regarding the zero-bias transformation as the mapping \(W\to W^{*}\), a random variable with the \(N(0, \sigma ^{2}_{ W})\) distribution is seen to be the unique fixed point of this transformation. If W is in some sense close to \(W^{*}\), then we expect W to be approximately normally distributed. Indeed, a key step in proving the nonuniform bound for S in Theorem 1.1 is showing that an analogous approximation holds for \(W^{*}\) when W is a sum of appropriately centered and scaled Bernoulli random variables.
An important example for our problem is when X is Bernoulli with \(P(X=1)=1-P(X=0)=p\). Since \(\mathbb{E}X = p \neq 0\), the zero-biased variable \(X^{*}\) does not exist; however, we may calculate \((X-p)^{*}\) as follows. Letting \(Y=X-p\), which has variance \(\sigma ^{2}_{ Y}=p(1-p)\), we have
$$\begin{aligned} \mathbb{E}Yf(Y) &= \mathbb{E}\bigl[(X-p)f(X-p)\bigr] = p(1-p)f(1-p)-(1-p)pf(-p) \\ & = \sigma ^{2}_{ Y}\bigl[f(1-p) - f(-p)\bigr] =\sigma ^{2}_{ Y} \int _{-p}^{1-p}f'(u)\,du = \sigma ^{2}_{ Y}\mathbb{E}f'(U), \end{aligned}$$
where U is uniformly distributed on \([-p, 1-p]\), and thus \((X-p)^{*}\overset{d}{=}U[-p, 1-p]\).
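This identity can be verified directly. The snippet below (illustrative only; \(p=0.3\) and \(f(y)=y^{3}\) are arbitrary choices) checks \(\mathbb{E}Yf(Y) = \sigma ^{2}_{ Y}\mathbb{E}f'(U)\) in exact form:

```python
# Illustrative exact check of E[Y f(Y)] = sigma_Y^2 E[f'(U)] for Y = X - p,
# X ~ Bernoulli(p), U ~ U[-p, 1-p]; here p = 0.3 and f(y) = y^3 are arbitrary.
p = 0.3
sigma2 = p * (1 - p)
f = lambda y: y ** 3
# E[Y f(Y)]: Y equals 1-p with probability p and -p with probability 1-p.
lhs = p * (1 - p) * f(1 - p) + (1 - p) * (-p) * f(-p)
# E[f'(U)] = integral of 3u^2 over [-p, 1-p], an interval of length one.
rhs = sigma2 * ((1 - p) ** 3 + p ** 3)
assert abs(lhs - rhs) < 1e-12
```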
A useful and easily verified property of the zero-bias transformation is that if \(\mathbb{E}X=0\) and \(a\neq 0\), then \((aX)^{*}=aX^{*}\) [2, p. 29]. Note now, for later use, that if \(X \sim \text{Bernoulli}(p)\), then for \(\sigma > 0\),
$$ \biggl(\frac{X-p}{\sigma} \biggr)^{*} \sim U \biggl[ \frac{-p}{\sigma}, \frac{1-p}{\sigma} \biggr], $$
(2.3)
where \(U[-p/\sigma , (1-p)/\sigma ]\) denotes the uniform distribution on the interval \([-p/\sigma , (1-p)/\sigma ]\).
The following fundamental result from [21] shows how \(W^{*}\) may be obtained when W is a sum of independent zero-mean random variables.
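In brief, the construction is as follows: if \(W=\sum_{i=1}^{n}\xi _{i}\) with the \(\xi _{i}\) independent, \(\mathbb{E}\xi _{i}=0\), and \(\operatorname{Var}(W)=1\), then choosing an index I independently of the \(\xi _{i}\) with \(P(I=i)=\operatorname{Var}(\xi _{i})\) and replacing \(\xi _{I}\) by an independent draw from its zero-biased distribution produces a random variable with the W-zero-biased distribution. The following sampler is an illustrative sketch of this construction for our Bernoulli setting, using (2.3):

```python
# Illustrative sampler for W* when W = sum_i (X_i - p_i)/sigma with
# independent X_i ~ Bernoulli(p_i): pick an index I with probability
# proportional to Var(X_i) = p_i(1 - p_i) and replace the I-th summand by an
# independent draw from its zero-biased distribution, which by (2.3) is
# uniform on [-p_I/sigma, (1 - p_I)/sigma].
import math, random

def sample_w_star(ps):
    sigma = math.sqrt(sum(p * (1 - p) for p in ps))
    xs = [1.0 if random.random() < p else 0.0 for p in ps]
    w = sum((x - p) / sigma for x, p in zip(xs, ps))
    i = random.choices(range(len(ps)), weights=[p * (1 - p) for p in ps])[0]
    u = random.uniform(-ps[i] / sigma, (1 - ps[i]) / sigma)
    return w - (xs[i] - ps[i]) / sigma + u
```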
4 Proof of Theorem 1.1
Before proceeding to the proof of Theorem 1.1, we observe that with our choice of test functions the result of Theorem 3.1 may be written as
$$ \biggl\vert P \biggl(x-\frac{1}{\sigma} < W^{*} \leq x \biggr) - P \biggl( x- \frac{1}{\sigma} < Z \leq x \biggr) \biggr\vert \leq \frac{Ce^{- \vert x \vert }}{\sigma ^{2}}. $$
(4.1)
Specializing to values of x of the form \(x = \frac{k-\mu}{\sigma}\), \(k \in [0,n]\cap \mathbb{Z}\), (4.1) becomes
$$ \bigl\vert P \bigl(k-1 < \sigma W^{*} + \mu \leq k \bigr) - P (k-1 < \sigma Z + \mu \leq k ) \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}. $$
(4.2)
We define the integer-valued random variables \(Z_{\mu ,\sigma ^{2}}\) and \(W^{*}_{\mu ,\sigma ^{2}}\), which are discretizations of \(\sigma Z + \mu \) and \(\sigma W^{*} + \mu \), respectively, as
$$\begin{aligned} &P(Z_{\mu ,\sigma ^{2}} = k)= P \biggl(\frac{k-\mu -1}{\sigma} < Z \leq \frac{k-\mu}{\sigma} \biggr), \quad k\in \mathbb{Z}, \end{aligned}$$
(4.3)
$$\begin{aligned} &P\bigl(W^{*}_{\mu ,\sigma ^{2}} = k\bigr)= P \biggl( \frac{k-\mu -1}{\sigma} < W^{*} \leq \frac{k-\mu}{\sigma} \biggr), \quad k\in \mathbb{Z}. \end{aligned}$$
(4.4)
The result of Theorem 3.1, specialized to \(x=(k-\mu )/\sigma \), \(k \in [0,n]\cap \mathbb{Z}\), may then be written as
$$ \bigl\vert P(Z_{\mu ,\sigma ^{2}} = k) - P\bigl(W^{*}_{\mu ,\sigma ^{2}} = k\bigr) \bigr\vert \leq \frac{Ce^{- \vert \frac{k-\mu}{\sigma} \vert }}{\sigma ^{2}}. $$
(4.5)
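For concreteness, the point probabilities in (4.3) are simply differences of values of Φ; an illustrative helper:

```python
# Illustrative only: the pmf (4.3) of the discretized normal Z_{mu, sigma^2},
# written as a difference of values of the standard normal cdf Phi.
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def discretized_normal_pmf(k, mu, sigma):
    return Phi((k - mu) / sigma) - Phi((k - mu - 1) / sigma)
```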
In Sect. 3.1, our main reason for working with the normalized sums W, rather than with the raw sums S, was to allow us to use the zero-bias framework of [21]. It is also more straightforward to derive properties of the solution of the simple Stein equation (2.2) than of its general form in (3.1). We have now translated the results of Theorem 3.1, which concern the centered random variables W and Z, into statements about the uncentered random variables \(Z_{\mu ,\sigma ^{2}}\) and \(W^{*}_{\mu ,\sigma ^{2}}\) in (4.5).
For the remainder of this section, we work directly with the raw uncentered sums S rather than with W. For fixed k, we define the test function \(g_{k}\) by \(g_{k}(s) = \mathbf{1}\{s = k\}\), so that \(\mathbb{E}g_{k}(S) = P(S = k)\) for \(k\in \mathbb{Z}\).
We now give the proof of Theorem
1.1.