2024 | Book

Handbook of Scan Statistics

Editors: Joseph Glaz, Markos V. Koutras

Publisher: Springer New York

About this book

Scan statistics, one of the most active research areas in applied probability and statistics, has seen tremendous growth during the last 25 years. Google Scholar lists about 3,500 hits to references of articles on scan statistics since the year 2020, resulting in over 850 hits to articles per year. This is mainly due to the extensive and diverse areas of science and technology where scan statistics have been employed, including: atmospheric and climate sciences, business, computer science, criminology, ecology, epidemiology, finance, genetics and genomics, geographic sciences, medical and health sciences, nutrition, pharmaceutical sciences, physics, quality control and reliability, social networks and veterinary science.

This volume of the Handbook of Scan Statistics is a collection of forty chapters, authored by leading experts in the field, that outlines the research and the breadth of applications of scan statistics to the numerous areas of science and technology listed above. These chapters present an overview of the theory, methods and computational techniques related to research in the area of scan statistics and outline future developments. The volume contains extensive references to research articles, books and relevant computer software.

Handbook of Scan Statistics is an excellent reference for researchers and graduate students in applied probability and statistics, as well as for scientists in research areas where scan statistics are used. This volume may also be used as a textbook for a graduate level course on scan statistics.

Table of Contents

Frontmatter
1. Research on Probability Models for Cluster of Points Before the Year 1960

Scan statistics describe a large number of events or objects clustered close in time or space. A few special cases of scan statistics – long success runs and the sample range – have been widely applied and their distributions known for hundreds of years. Most of the theory on the probability of scan statistics was developed after 1960. Prior to 1960, researchers in various fields used scan statistics and were either limited to the special cases or used rough distributional approximations. In some cases, their likelihood analysis did not fully take into account the overlapping scanning nature of how they selected clusters. For the continuous case, the times when (or locations where) events can occur can be anywhere within an interval (or region). For the discrete case, the events can occur anywhere on a grid, a special case being a sequence of Bernoulli trials. We separate the history for the discrete and continuous cases because they grew out of different applications.

Joseph Naus
2. Adaptive Likelihood Ratio Scans for the Detection of Space-Time Clusters

This work presents a methodology to detect space-time clusters, based on adaptive likelihood ratios (ALRs), which preserves the martingale structure of the regular likelihood ratio. Monte Carlo simulations are not required to validate the procedure’s statistical significance, because the upper limit for the false alarm rate of the proposed method depends only on the quantity of evaluated cluster candidates, thus allowing the construction of a fast computational algorithm. The quantity of evaluated clusters is also significantly reduced, by using another adaptive scheme to prune many unpromising clusters, further increasing the computational speed. Performance is evaluated through simulations to measure the average detection delay and the probability of correct cluster detection. Applications for thyroid cancer in New Mexico and hanseniasis in children in the Brazilian Amazon are shown.

Max S. de Lima, Luiz H. Duczmal
3. Adjusted Inference for the Spatial Scan Statistic

A modification is proposed to the usual inference test of Kulldorff’s spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new modified inference question is answered: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of known size, taking into account only those most likely clusters of the same size found under the null hypothesis? A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.

Alexandre C. L. Almeida, Anderson R. Duarte, Luiz H. Duczmal, Fernando L. P. Oliveira, Ricardo H. C. Takahashi, Ivair R. Silva
4. Approximating the Distribution of the Multiple Scan Statistic

In this paper, we review a number of bounds and approximations for the distribution of the multiple scan statistic defined on a sequence of binary trials. Using a simulation study, we proceed to the assessment of the accuracy of the approximations.

Markos V. Koutras, F. S. Milienos
5. Approximations for Discrete Scan Statistics on i.i.d and Markov-Dependent Bernoulli Trials

In this short note, we examine some approximations for the distribution of the discrete scan statistic defined on i.i.d. and Markov-dependent Bernoulli trials. The approximations are developed using the finite Markov chain imbedding technique of Fu and Koutras (J Am Stat Assoc 89(427):1050–1058, 1994) and the methods in Fu and Johnson (Adv Appl Probab 41(1):292–308, 2009) and Koutras and Milienos (J Stat Plann Inference 142(6):1464–1479, 2012). The approximations perform well for the cases considered and, in most cases, outperform the commonly used product approximation developed in Chen and Glaz (Approximations for the distribution and the moments of discrete scan statistics. In: Glaz J, Balakrishnan N (eds) Scan statistics and applications. Statistics for industry and technology. Birkhäuser, Boston, pp 27–66, 1999).

Brad C. Johnson
6. Bayesian Scan Statistics

In this chapter we describe Bayesian scan statistics, a class of methods which build both on the prior literature on scan statistics and on Bayesian approaches to cluster detection and modeling. We first compare and contrast the Bayesian scan to the traditional, frequentist hypothesis testing approach to scan statistics and summarize the advantages and disadvantages of each approach. We then focus on three different Bayesian scan statistic approaches: the Bayesian variable window scan statistic, the multivariate Bayesian scan statistic and extensions, and scan statistic approaches based on Bayesian networks. We describe each of these approaches in detail and compare these to related Bayesian scan methods and to the wider literature on Bayesian cluster detection and modeling. Finally, we discuss several promising areas for future work in Bayesian scan statistics, including multiple cluster detection, nonparametric Bayesian approaches, extension of Bayesian spatial scan to nonspatial datasets, and computationally efficient methods for model learning and detection.

Daniel B. Neill
7. Calibrating the Scan Statistic with Size-Dependent Critical Values: Heuristics, Methodology, and Computation

It is known that the scan statistic with variable window size favors the detection of signals with small spatial extent and there is a corresponding loss of power for signals with large spatial extent. Recent results have shown that this loss is not inevitable: Using critical values that depend on the size of the window allows optimal detection for all signal sizes simultaneously, so there is no substantial price to pay for not knowing the correct window size and for scanning with a variable window size. This chapter gives a review of the heuristics and methodology for such size-dependent critical values, their applications to various settings including the multivariate case, and recent results about fast algorithms for computing scan statistics.

Guenther Walther
8. Demimartingale Approaches for Scan Statistics

Scan statistics are defined as random variables enumerating the moving windows in a sequence of binary outcome trials which contain a prescribed number of successes. The main objective of this contribution is to serve as a self-contained source of some recent results concerning both the simple and the multiple scan statistic. These results are innovative in the sense that they seem to be the first ones on scan statistics that were derived by means of demimartingale techniques. The demimartingale approach motivated also some classification questions for stochastic processes associated with scan statistics. These types of questions and some past results on scan statistics that can be regarded as relevant to the demimartingale approach are also discussed here. In order to illustrate how our results can be implemented in practice, our presentation is enriched with several numerical exhibitions.

Markos V. Koutras, Demetrios P. Lyberopoulos
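The window-counting definition in the abstract above can be illustrated with a minimal sketch. It assumes that "a prescribed number of successes" means at least k successes in a window of length w, and that the simple scan statistic is the largest window count; the function names are hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence


def window_sums(trials: Sequence[int], w: int) -> List[int]:
    """Successes in each of the len(trials) - w + 1 moving windows of length w."""
    if not 1 <= w <= len(trials):
        raise ValueError("window length must satisfy 1 <= w <= len(trials)")
    current = sum(trials[:w])
    sums = [current]
    for i in range(w, len(trials)):
        # Slide the window one step: add the entering trial, drop the leaving one.
        current += trials[i] - trials[i - w]
        sums.append(current)
    return sums


def scan_statistic(trials: Sequence[int], w: int) -> int:
    """Simple scan statistic (assumed form): the largest moving-window count."""
    return max(window_sums(trials, w))


def multiple_scan_statistic(trials: Sequence[int], w: int, k: int) -> int:
    """Assumed form: number of (overlapping) windows with at least k successes."""
    return sum(s >= k for s in window_sums(trials, w))


trials = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
print(scan_statistic(trials, 3), multiple_scan_statistic(trials, 3, 2))
```

Overlapping windows are counted separately here; a non-overlapping counting scheme would give a different statistic.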
9. Designing Distributed Sensor Detection Systems Using the Scan Statistic

The scan statistic has been successfully used to detect anomalies in georeferenced data, making it a suitable candidate for fusing measurements in distributed sensor systems to detect emitters in a region. In such systems, sensor nodes are spread in the region and periodically collect measurements from different locations, and a fusion center combines the measurements to decide whether the region contains an emitter or not. The value of the scan statistic to fuse the sensor measurements has been reported by researchers in the past; however, when designing a system using the scan statistic, a system designer encounters several questions specific to distributed sensor systems: How to design the scan statistic to satisfy a maximum probability of false alarm? How can the designer ensure the detection performance of the system? How to build clusters or tune the scan statistic to improve the detection performance? In this paper, we review some of the recent results with the aim of answering these questions.

Benedito J. B. Fonseca Jr.
10. Discrete Scan Statistics for Higher-Order Markovian Sequences

In this chapter we review methods for computing probabilities of the discrete scan statistic. Most of the presented results are for independent trials, as results for higher-order Markovian sequences are scarce. Results from three papers on exact computation of probabilities in Markovian sequences are given, two of which are for binary Markov chains, the third allowing multistate higher-order Markovian trials. Whereas exact computation of the complete distribution of the statistic is limited to relatively small values of the scanning window w, larger window sizes can be handled in the case of individual p-values and extreme values of the scan statistic. Approximations and bounds on probabilities for the statistic have been developed for still larger values of w. Product-type and Poisson/compound Poisson approximations are considered here, as well as Bonferroni- and product-type bounds that give a feel for the accuracy of approximations. The final section includes numerical comparisons of exact and approximate methods to evaluate the accuracy of the approximations and possible areas of future study.

Donald E. K. Martin
11. Discrete Scan Statistics Generated by Dependent Trials and Their Applications in Reliability

The chapter is concerned with the discrete scan statistic based on a sequence of dependent binary trials. In particular, the existing results are reviewed for the distribution of the discrete scan statistic based on a sequence of exchangeable binary trials. The results are discussed in the context of the reliability of the linear consecutive-k-within-m-out-of-n:F system, and a new exact formula for the reliability of the linear consecutive-2-within-m-out-of-n:F system that consists of arbitrarily dependent components is presented.

Serkan Eryilmaz, Femin Yalcin
12. Generating Function Methods for Run and Scan Statistics

Runs and pattern statistics have found successful applications in various fields. Many classical results of distributions of runs were obtained by combinatorial methods. As the patterns under study become complicated, the combinatorial complexity involved may become challenging, especially when dealing with multistate or multiset systems. Several unified methods have been devised to overcome the combinatorial difficulties. One of them is the finite Markov chain imbedding approach. Here we use a systematic approach that is inspired by methods in statistical physics. In this approach the study of run and pattern distributions is decoupled into two easy independent steps. In the first step, elements of each object (usually represented by its generating function) are considered in isolation without regard to elements of the other objects. In the second step, formulas in matrix or explicit forms combine the results from the first step into a whole multi-object system with potential nearest neighbor interactions. By considering only one kind of object each time in the first step, the complexity arising from the simultaneous interactions of elements from multiple objects is avoided. In essence the method builds up a higher level generating function for the whole system by using the lower level generating functions from individual objects. By dealing with generating functions in each step, the method usually obtains results that are more general than those obtained by other methods. Examples of different complexities and flavors for run- and pattern-related distributions will be used to illustrate the method.

Yong Kong
13. Health Monitoring Techniques Using Scan Statistics

Scan statistics appeared in the statistics literature about half a century ago, and since then many papers suggesting either extensions and modifications or applications into various research fields have been published. Scan statistics are mainly used to detect clusters of events in time or space. In the last two decades several researchers have proposed techniques or systems for the surveillance of public health or other healthcare processes. In this paper, we shall present a systematic review of health monitoring techniques which exploit scan statistics in order to set up early warning systems detecting potential threats for public health.

Sotiris Bersimis, Athanasios Sachlas, Markos V. Koutras
14. Martingale Methods

We survey the ways that martingales and the method of gambling teams can be used to obtain otherwise hard-to-get information for the moments and distributions of waiting times for the occurrence of simple or compound patterns in an independent or a Markov sequence. We also survey how such methods can be used to provide moments and distribution approximations for a variety of scan statistics, including variable length scan statistics. Each of the general problems considered here is accompanied by one or more concrete examples that illustrate the computational tractability of the methods.

Vladimir Pozdnyakov, J. Michael Steele
15. Nearest Neighbors of Multivariate Runs

We investigate the joint distributions of the number of nearest neighbor contacts between different objects in the context of runs-related statistics in multiple object systems. These distributions reveal spatial or temporal relations between runs that could not be answered by traditional run statistics, where the spatial or temporal relations between runs are ignored. To obtain the distributions of nearest neighbor contacts, we generalized the generating function approach we developed previously for run statistics. Explicit distributions and moments of the distributions were obtained. These generating functions also lead directly to the asymptotic distributions based on the singularity perturbation theory. Two kinds of nearest neighbor contacts are discussed. For each case the distributions for conditional and unconditional models are derived. By considering the nearest neighbor contacts in the context of runs and scans distributions, our study adds a new dimension to the existing knowledge in the field, opening up opportunities for further research to explore its full implications and potential applications.

Yong Kong
16. New Frontiers for Scan Statistics: Network, Trajectory, and Text Data

In this chapter we survey the new theoretical developments and the use of scan statistics in data represented as graphs, trajectories, and text. These types of data are becoming common in the new massive digital data world. Large social networks are represented by complex graphs. We have records of the paths of moving objects, such as people who log their travel routes generating GPS trajectories. Large quantities of text are continuously generated by news wire services and social networks. There is a large interest in developing algorithms with strong statistical basis for detecting anomalies in these types of data. We review the use of the scan statistics in these situations. Additionally, we identify three main opportunities and challenges from the big data times for scan statistics: we need to deal with new stochastic data structures; we need much higher computational efficiency than we have now; and we need models that can deal with the variability that appears in the large samples now collected.

Renato M. Assunção, Roberto C. S. N. P. Souza, Marcos O. Prates
17. On Scan Statistics Through the Finite Markov Chain Imbedding Approach

This chapter provides a short review of the finite Markov chain imbedding approach for studying the distributions of discrete scan statistics, multiple window scan statistics, and continuous scan statistics under a Poisson arrival process. Applications to hypothesis testing for various alternatives are also provided to illustrate the versatility of the approach.

W. Y. Wendy Lou, James C. Fu
18. On the Exact Distributions of Pattern Statistics for a Sequence of Binary Trials: A Combinatorial Approach

Consider a sequence of exchangeable or Markov-dependent binary (zero-one) trials. A sequence of independent and identically distributed binary trials is covered as a particular case of both the prementioned ones. For counting/waiting time pattern statistics defined on such model sequences, we point out how their exact probability distributions can be established using enumerative combinatorics. The expressions for the distributions contain probabilities depending on the internal structure of the model sequence and combinatorial numbers denoting set cardinalities. The latter numbers depend on the considered pattern statistics and the number of ones, for an exchangeable sequence, as well as the number of runs of ones, for a Markov-dependent sequence. These numbers become concrete when certain patterns and enumerative schemes are studied on the model sequences. Exact distributions for statistics connected to patterns of limited length, as well as to certain runs and scans, are provided using proper combinatorial numbers and exemplify the approach.

Frosso S. Makri, Zaharias M. Psillakis
19. Poisson Approximations for the Number of kl-Scans

Consider a lecture class with a population of N students. Suppose we keep track of the order of students called upon to answer a question. Each student on the roster has l friends before his/her name and l friends after his/her name; these may be considered to be students who are lexicographically close. A kl-match occurs when two students, who are in each other’s list of 2l friends or are themselves, are called upon within the k previous questions. A large number of such occurrences might indicate that the professor is not selecting students at random. Let X_n denote the number of kl-matches within the first n questions asked by the professor, where each student has a full window of 2l + 1 friends and a full window of k previous questions. This scenario builds on the paper of Burkhardt et al. (Stat Probab Lett 21:1–8, 1994) about the distribution of k-matches. The distribution of X_n, in the uniform case, is approximated by a Poisson random variable if lk^2 = o(N). In the nonuniform but i.i.d. case, the distribution is also approximately Poisson. There is a relation of this problem to two-dimensional scan statistics, where one is counting numbers of events that are close in time and space.

Anant Godbole, Katherine Grzesik, Heather Shappell
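The kl-match count described in the abstract above can be sketched as follows. This is an illustrative reading, not the authors' code: students are assumed to be identified by their roster positions (integers), two students count as "friends or themselves" when their positions differ by at most l, and the boundary conditions ("full windows") mentioned in the abstract are ignored. The function name is hypothetical.

```python
from typing import Sequence


def count_kl_matches(calls: Sequence[int], k: int, l: int) -> int:
    """Count pairs of questions at most k apart whose students are within l roster positions."""
    matches = 0
    for i in range(len(calls)):
        # Compare question i with each of the (up to) k previous questions.
        for j in range(max(0, i - k), i):
            if abs(calls[i] - calls[j]) <= l:
                matches += 1
    return matches


# Roster positions of the students called on five consecutive questions.
calls = [5, 6, 20, 5, 7]
print(count_kl_matches(calls, k=2, l=1))
```

Under uniform random selection from a large roster such matches are rare events, which is the intuition behind a Poisson approximation for their count.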
20. Run and Scan Rules in Statistical Process Monitoring

In this paper, we provide an overview of the use of run and scan rules in statistical process monitoring. Although we focus on control charts, supplemented with various stopping rules based on run and scan statistics, several other monitoring procedures that incorporate run and scan statistics are reviewed as well. Rules based on the notion of scans have been incorporated in the traditional Shewhart charts in order to improve their performance and at the same time preserve their simplicity. In our presentation we review the major types of run and scan rules currently available in the literature of control charts and highlight how they are implemented in practice. A unified framework for studying the characteristics of run- and scan-based control charts by exploiting a Markov chain approach is also provided. We end up with some concluding remarks and some directions for future research in the area under review.

Sotiris Bersimis, Markos V. Koutras, Athanasios C. Rakitzis
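To make the idea of a scan rule on a Shewhart chart concrete, here is a minimal sketch of a k-of-m rule: the chart signals when a point falls beyond the action limit, or when at least k of the last m points fall beyond a warning limit on the same side of the center line. The default limits (2 and 3 standard deviations), the 2-of-3 default, and the function name are illustrative assumptions rather than details from the chapter.

```python
from collections import deque
from typing import Optional, Sequence


def first_signal(values: Sequence[float], center: float, sigma: float,
                 k: int = 2, m: int = 3,
                 warning: float = 2.0, action: float = 3.0) -> Optional[int]:
    """Index of the first out-of-control signal, or None if the chart never signals."""
    recent = deque(maxlen=m)  # side of the warning zone hit by each of the last m points
    for t, x in enumerate(values):
        z = (x - center) / sigma
        if abs(z) > action:
            return t  # classical Shewhart rule: one point beyond the action limit
        recent.append(1 if z > warning else -1 if z < -warning else 0)
        # Scan rule: at least k of the last m points beyond the same warning limit.
        if recent.count(1) >= k or recent.count(-1) >= k:
            return t
    return None


print(first_signal([0.1, 2.5, -0.3, 2.2, 0.0], center=0.0, sigma=1.0))
```

Setting m = k turns the scan rule into a pure run rule (k consecutive points beyond the warning limit).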
21. Scan Statistics Applications in Genomics

The area of scan statistics encompasses a broad class of methods for detecting clusters of events. These methods have been applied to identify genomic regions containing nonrandom clusters of restriction sites, genetic markers, or specific word patterns since the early 1990s. Typically, the positions of these sites are modeled as independent and identically distributed random points on the unit interval or events in Bernoulli or Poisson processes. In the majority of the applications, the scan statistics are defined either as the maximum count of points contained in a scanning window of fixed length or the minimum aggregated spacing between a fixed number of consecutive points. In some applications, the underlying models are generalized to the two-dimensional unit square or graphs representing gene networks where the scan statistics are maximum likelihood ratios. Statistical significance can be evaluated by p-value calculations using asymptotic and other analytical approximations or permutation-based procedures. This chapter includes reviews of the DNA sequence analysis studies to identify clusters of the GATC tetranucleotide on E. coli DNA and palindromes in herpesvirus genomes. Some recent applications in finding clusters of chromosomal translocation breakpoints, viral DNA integration sites, and DNA variants associated with copy number variations are also presented. Identification of these clusters will help design better gene therapies and elucidate the mechanisms of diseases like leukemia, schizophrenia, and autism. As different types of genomics data are accumulating rapidly, the methodology of scan statistics is expected to play increasingly important roles in biomedical research.

Ming-Ying Leung
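The two definitions mentioned in the abstract above (the maximum count of points in a fixed-length scanning window, and the minimum aggregated spacing between a fixed number of consecutive points) can be sketched for points on an interval as follows. The function names are hypothetical, the window is taken to be closed, and the spacing statistic is written here as the smallest span covering m consecutive ordered points, which is one common convention and an assumption on our part.

```python
from typing import Sequence


def max_points_in_window(points: Sequence[float], w: float) -> int:
    """Largest number of points covered by any closed window of length w."""
    xs = sorted(points)
    best = 0
    left = 0
    for right, x in enumerate(xs):
        # Shrink from the left until the window [xs[left], x] has length <= w.
        while x - xs[left] > w:
            left += 1
        best = max(best, right - left + 1)
    return best


def min_span(points: Sequence[float], m: int) -> float:
    """Smallest distance between the first and last of m consecutive ordered points."""
    xs = sorted(points)
    if not 2 <= m <= len(xs):
        raise ValueError("m must satisfy 2 <= m <= number of points")
    return min(xs[i + m - 1] - xs[i] for i in range(len(xs) - m + 1))


sites = [0.05, 0.10, 0.12, 0.50, 0.90]
print(max_points_in_window(sites, 0.1), min_span(sites, 3))
```

The two statistics are dual: some window of length w contains at least m points exactly when the smallest span of m consecutive points is at most w.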
22. Scan Statistics for Detecting a Local Change in Mean for Normal Data

In this article, we review the approximations and inequalities that have been derived in the scientific literature for fixed-, multiple-, and variable-window-length scan statistics, for detecting a local change in the population mean, for one-dimensional normal data. We assume that the variance of the underlying distribution is known and remains unchanged. Monitoring processes based on a fixed-window scan statistic via fixed and sequential sampling schemes are discussed as well. In the context of sequential sampling schemes for the monitoring process, we discuss a repeated significance test and evaluate its properties. The implementations of two multiple-window-length scan statistics are based on the minimum p-value statistic and the generalized likelihood ratio test statistic, respectively. The implementation of the variable-window scan statistic is based on the generalized likelihood ratio test statistic. Simulation algorithms and numerical results are presented to evaluate the performance of the multiple and variable-window-type scan statistics and compare them with fixed-window scan statistics.

Jie Chen, Joseph Glaz
23. Scan Statistics for Detecting a Local Change in Model Parameters for Normal Data

In this chapter, we review the testing procedures that have been investigated in the scientific literature for detecting a local change in the parameters of a normal distribution, for one- and two-dimensional data. In Chen and Glaz (2021), testing procedures for detecting a local change in the population mean for one-dimensional normal data have been reviewed, when the population variance is known. In this chapter we assume that both population mean and population variance are unknown. We also consider the case when a local change in the population mean and population variance occurs simultaneously. When the size of the local region where the change of the parameters has occurred is unknown, we consider testing procedures based on the minimum p-value statistic and the generalized likelihood ratio type statistic. Simulation algorithms and numerical results are presented to evaluate the accuracy of the specified significance level and the power of the test statistics discussed in this chapter. When the size of the local region where the change of the parameters has occurred is unknown, numerical results are presented to compare the power of the test statistics based on fixed and multiple window scan statistics.

Jie Chen, Joseph Glaz
24. Scan Statistics for Integer-Valued Random Variables: Conditional Case

In this chapter, we review approximations and inequalities for the distribution of conditional scan statistics for a sequence of independent and identically distributed nonnegative integer-valued random variables, modeled by a one-parameter natural exponential family of distributions, when the total sum of the random variables is known. The distribution of conditional scan statistics is based on the joint distribution of moving sums of components of a random vector from a related multivariate discrete distribution. In most cases the exact distribution of conditional scan statistics is unknown. Therefore, accurate approximations and inequalities for their distributions are of great importance. In this chapter, we present accurate product-type and Poisson-type approximations and Bonferroni-type inequalities for the tail probabilities and expected size of conditional scan statistics for the binomial, Poisson, and negative binomial models. We also discuss the extension of the conditional scan statistics to a conditional multiple occurrence scan statistic. Numerical results are presented to evaluate the accuracy of the approximations and inequalities discussed in this chapter.

Jie Chen, Joseph Glaz
25. Scan Statistics on Graphs and Networks

This article summarizes modern research on scan statistics on graphs and networks. These statistics arise naturally in the scanning of time and space looking for clusters of anomalous entities or events. We review theories and methodologies of constructing scan statistics for both static and dynamic graphs, in both purely spatial and spatiotemporal frameworks. Computation of graph-structured scan statistics is challenging, and usually leads to NP-hard problems. We also review several popular convex approximation algorithms for computing scan statistics in this article.

Panpan Zhang, Joseph Glaz
26. Scan Statistics Viewed as Maximum of 1-Dependent Random Variables

A method of approximating the distribution function of the partial maximum sequence generated by a 1-dependent stationary sequence can be applied to estimate the distribution function of one or multidimensional scan statistics. The method, which provides error bounds for the approximations, was investigated and evaluated in several papers.

George Haiman, Cristian Preda
27. ScanZID: Spatial Scan Statistics with Zero Inflation and Dispersion

The spatial scan statistic is one of the most important methods to detect and monitor spatial disease clusters. Usually it is assumed that disease cases follow a Poisson, binomial, Bernoulli, or negative binomial distribution. In practice, however, case count datasets frequently present zero inflation and/or dispersion (underdispersion or overdispersion), resulting in the violation of those commonly used models, thus increasing type I error occurrence. This paper describes the spatial scan statistic with zero inflation and dispersion (ScanZID) to accommodate simultaneously the excess of zeroes and dispersion. The null and alternative model parameters are estimated by the expectation-maximization (EM) algorithm, and the p-value is obtained through the fast double bootstrap test. An application is presented for hanseniasis data in the Brazilian Amazon.

Max S. de Lima, Luiz H. Duczmal, José C. Neto, Letícia P. Pinto, Márcio A. C. Ferreira, Vanessa A. de Lima
28. Shocks, Scans, and Reliability Systems

This chapter summarizes the close connection between one of the widely studied shock models known as δ-shock model and runs/scans. Under discrete time setting, i.e., when the shocks occur according to a binomial process, the linkage between the lifetime of the system under the shock model and the waiting time for the first scan is presented. Such a useful connection may create a new perspective to study the reliability properties of the system under the δ-shock model.

Serkan Eryilmaz
29. Spacing Methods and Their Applications to Scan Statistics

Scan statistics can be used in many areas of science to test for uniformity. In this chapter a review of spacing methods for the scan statistic in the continuous conditional case is presented. This method is based on repeated use of a basic recursion to break up the joint distribution of linear combinations of spacings into simpler components which are easily evaluated. The final answers produced by this approach are piecewise polynomials whose coefficients are computed exactly. These expressions can be stored and later used to rapidly compute numerical answers which are accurate to any required degree of precision.

Chien-Tai Lin
30. Spatial Cluster Detection Through a Dynamic Programming Approach

This chapter reviews a dynamic programming scan approach to the detection and inference of arbitrarily shaped spatial clusters in aggregated geographical area maps, which is formulated here as a classic knapsack problem. A polynomial algorithm based on constrained dynamic programming is proposed, the spatial clusters detection dynamic scan. It minimizes a bi-objective vector function, finding a collection of Pareto optimal solutions. The dynamic programming algorithm is adapted to consider geographical proximity between areas, thus allowing a disconnected subset of aggregated areas to be included in the efficient solutions set. It is shown that the collection of efficient solutions generated by this approach contains all the solutions maximizing the spatial scan statistic. The plurality of the efficient solutions set is potentially useful to analyze variations of the most likely cluster and to investigate covariates.

Gladston J. P. Moreira, Luís Paquete, Luiz H. Duczmal, David Menotti, Ricardo H. C. Takahashi
31. Spatial Cluster Estimation and Visualization Using Item Response Theory

In recent years Kulldorff’s circular scan statistic has become the most popular tool for detecting spatial clusters. However, the limitation imposed by the window shape may not be appropriate for detecting the true cluster. To work around this problem we usually use complex tools that allow the detection of clusters of arbitrary shape, but at the expense of an increase in computational effort. In this chapter we describe a methodology that assists the detection of unconnected and arbitrarily shaped clusters and that provides a measure of uncertainty in the design of such clusters.

André L. F. Cançado, Antonio E. Gomes, Cibele Q. da-Silva, Fernando L. P. Oliveira, Luiz H. Duczmal
32. Spatial Scan Statistics for Functional Data

In this chapter, we review spatial scan statistic methods for functional data. Such data are defined by spatial locations within which longitudinal observations are made. Several spatial scan statistic methods are presented in the context of univariate and multivariate functional data. Compared to univariate and multivariate spatial scan statistics, these methods have the advantage of considering the whole data while overcoming the problems of high dimensionality and multicollinearity. These methods are illustrated through an application to the detection of environmental black spots of multiple air pollution in France.

Mohamed-Salem Ahmed, Camille Frévent, Michaël Génin
33. The Role of Scan Statistics in High-Energy Astrophysics

Astronomers are often faced with the problem of recognizing activity periods of celestial sources. Once an activity period is found during the operations of the observing telescope, quasi-simultaneous multiwavelength monitoring of the celestial source activity is often performed through alert dissemination and target-of-opportunity (ToO) programs. In some other cases, astronomers are interested in studying the physical characteristics of several activity periods of a single celestial source or of a class of sources of interest. The current generation of x- and gamma-ray telescopes usually provides the arrival time of collected photons; some extremely fast optical instrumentation is also able to record photon-by-photon events, and for a couple of years now fast optical sensitive planes have been available at the loci of 1–4 m class telescopes to the astronomical community on a regular basis during the biannual cycles of observations. In spite of this, scan statistics have not been applied to the problem of recognizing activity periods of celestial sources. In this chapter, we address the astrophysical problem in a general way, and we identify some fields in which scan statistics will improve the current knowledge of astrophysical sources.

Luigi Pacciani
34. The Scan Statistic for Multidimensional Data and Social Media Applications

This chapter reviews the literature on the scan statistic with particular emphasis on its application. The chapter considers the application of the scan statistic to social media data, where it explores changes in the sentiments of sadness, fear, anger, joy, and love. It compares 2017 data to 2016 data for each geographical location around the world. The temporal scan statistic is used to flag periods within 2017 with sentiments significantly different from the average over the whole of 2016. This was carried out first within Australia and then, in less detail, in other pockets around the world.

Ross Sparks, Cecile Paris
35. The Spatial Structure of Housing Prices in Madrid: Evidence from Spatio-temporal Scan Statistics

We apply spatio-temporal scan statistics to the distributions of asking price per square meter for various segments of the housing market (attics, houses, flats of various sizes) in the city of Madrid for 5 years during the period 2008–2019. Our application shows how spatio-temporal scan statistics can be useful for assessing the dynamics of urban spatial structure analyzed through the lens of housing prices. The focus on the post-2008 period and the computation of prospective clusters allow us to detect the winners and the losers of the 2008 real estate crisis and to uncover new trends during the post-crisis period. We show that the economic crisis in Madrid has had a strong impact on the housing market, with increased polarization between the center and the periphery and between the northern and southern areas of Madrid, and with some heterogeneity depending on the neighborhood, on the market segment, and on the urban policies undertaken.

Coro Chasco, Julie Le Gallo, Fernando A. López
36. Two-Dimensional Discrete Scan Statistics with Arbitrary Window Shape

The definition of the two-dimensional discrete scan statistic with rectangular window shape is extended to a more general framework. In particular, this approach allows the introduction of different shapes for the scanning window (discretized rectangle, polygon, circle, ellipse, or annulus). We provide an approximation for the distribution of the scan statistic and illustrate its accuracy by conducting a numerical comparison study. The power of the test based on the scan statistic is also evaluated by simulation.

Alexandru Amărioarei, Michaël Génin, Cristian Preda
37. Variable Window Scan Statistics for Poisson Processes

We present methods for fast online anomaly detection using scan statistics. Scan statistics have long been used to detect statistically significant bursts of events. We extend the scan statistic framework to handle many practical issues that occur in applications: dealing with an unknown background rate of events; allowing for slow natural changes in background frequency; the reverse problem of finding an unusual lack of events; and setting the test parameters to maximize power. We demonstrate the utility of these improvements on real and synthetic data sets, with comparison to other methods.

Ryan Turner, Steven Bottone
38. Variable Window Scan Statistics: Alternatives to Generalized Likelihood Ratio Tests

Classical variable window scan statistics are based on likelihood ratios derived from parametric models. However, these likelihood ratios do not give equal chances to all potential clusters. I introduce alternatives that do not suffer from the same problem and describe their properties. I apply these methods to a classical epidemiological data set.

Lionel Cucala
39. Waiting for Scans Containing Two Successes

In this chapter, we review results pertaining to the distribution of waiting times for the occurrence(s) of scans of type 2∕r in sequences of binary trials. Our review covers the geometric distribution of order 2∕r, the negative binomial distributions of order 2∕r, and their generalizations. Exact and asymptotic results are presented and illustrated through numerical examples. In the case of the geometric distribution of order 2∕r, a new closed, exact formula is established. Several applications of the reviewed waiting time distributions in various scientific areas are discussed in some detail.

Markos V. Koutras, Spiros D. Dafnis
40. Wilcoxon Rank Sum Scan Statistics for Continuous Data with Outliers

In this chapter, we investigate the performance of several Wilcoxon rank sum scan statistics in detecting a local change in population mean, in the presence of outliers, for one- and two-dimensional data, generated by a continuous distribution. The detection problem is formulated via testing of hypotheses and implemented via simulation using a nonparametric bootstrap approach. The performance of the Wilcoxon rank sum scan statistics discussed in this chapter is evaluated via simulation based on the accuracy of achieving the specified significance level and the power against selected alternatives. The selected alternative hypotheses are based on probability models for the observed data, probability models for the outliers, and their location in the data and selected parameters indicating the local change in the population mean. Directions for future research are discussed as well in this chapter.

Qianzhu Wu, Joseph Glaz
Metadata
Title
Handbook of Scan Statistics
Editors
Joseph Glaz
Markos V. Koutras
Copyright Year
2024
Publisher
Springer New York
Electronic ISBN
978-1-4614-8033-4
Print ISBN
978-1-4614-8032-7
DOI
https://doi.org/10.1007/978-1-4614-8033-4
