Development of steady-state analysis and cluster testing methods for quality control and surveillance problems
Student thesis: Doctoral Thesis
This dissertation addresses several problems in quality control and surveillance. In quality control, the control chart is one of the basic tools used to determine the state of statistical control over a manufacturing or business process. For example, cumulative sum (CUSUM) control charts and exponentially weighted moving average (EWMA) control charts are typically used to monitor for small changes such as step shifts in a process mean. In many surveillance problems, cluster testing and detection are important. For example, cluster testing and detection methods can be used to identify emergent outbreaks and determine the demographic factors that influence diseases. In this study, we focus on the steady-state analysis of control charts using the integral equation method and on the development of a cluster testing method using minimal internal distance. The main contributions of this research are threefold.
First, this research analyzes the conditional steady-state performance of EWMA control charts under step shifts in a process mean and variance. Both one-sided and two-sided charts are studied. EWMA control charts were originally developed to detect changes in the process mean. At present, EWMA control charts are also used to detect changes in variance. Both the run length distribution and the average run length (ARL) have become popular performance measures for schemes with this objective and have been well studied in zero-state situations. In the present work, integral (recurrence) equations are established to solve these problems. A fast and accurate algorithm based on the numerical approximation of the integral (recurrence) equations is proposed for computing the steady-state run length distribution and ARL of the EWMA chart. In particular, the piecewise collocation method is used to deal with the non-smooth integration kernel function. The proposed method is compared with numerical simulation and Markov-chain-based approaches. The collocation method can produce accurate approximations in both steady-state run length distribution and ARL calculations.
Second, this research investigates the probability distribution of the CUSUM charting statistic based on a recurrence relationship, in parallel to the conventional study of the distribution of the first time to signal. The probability distribution of the CUSUM statistic not only provides the statistical significance of observations against the null hypothesis of being in control but also facilitates the analysis of the CUSUM chart in the steady-state scenario. Both the conditional case (CUSUM chart without restarting) and the cyclical case (CUSUM chart with restarting) are considered. It is shown that the distribution of the CUSUM statistic, both with and without restarting, approaches a stationary distribution that is independent of the initial value. It is also shown that the null steady-state distribution of the unbounded CUSUM chart investigated by Grigg and Spiegelhalter (2008) is a special case of the results obtained here, arising as the control limit approaches infinity.
Finally, this research proposes a cluster testing and detection method based on minimal internal distance. In a simple setting, the cluster testing problem can be formulated as testing the null hypothesis of uniformity against a non-random clustering alternative. The scan statistic is known to solve this problem both in testing and in detecting clusters. Density-based clustering methods in unsupervised machine learning can detect clusters based on density changes. In this research we propose another approach, which is inspired by recent results on minimum spanning trees in graph theory. The minimal internal distance is used as a new statistic to perform hypothesis testing. By using minimal internal distance, the scan window shape problem of the scan statistic is avoided, and the detected cluster can be arbitrarily shaped. Based on a simple calculation of the probability distribution of minimal internal distance, the proposed testing method can run automatically and does not require any parameter setting. The method is illustrated on simulated benchmark datasets and compared with well-known methods such as the spatial scan statistic, k-means, and DBSCAN. Furthermore, the method is applied to the disease surveillance problem. Disease surveillance is essential in the study of the spread of diseases. An important task in disease surveillance is identifying disease clusters, which are areas with an unusually high incidence of disease. In this research, the disease surveillance problem is formulated as a cluster testing problem. Simulated data and real lung cancer data from New Mexico are analyzed with the proposed method, and the results are compared with those of the popular spatial scan statistic.
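For readers unfamiliar with the two charting statistics analyzed in the first two contributions, the following is a minimal sketch of their standard textbook recursions and of the run length (the first time a statistic exceeds its control limit). It is not the dissertation's integral-equation or collocation algorithm. The function names, the smoothing constant `lam`, the reference value `k`, the control limits, and the data are all illustrative assumptions.

```python
from typing import List, Optional


def ewma_path(xs: List[float], lam: float, z0: float = 0.0) -> List[float]:
    """Textbook EWMA recursion: z_t = (1 - lam) * z_{t-1} + lam * x_t."""
    z, path = z0, []
    for x in xs:
        z = (1.0 - lam) * z + lam * x
        path.append(z)
    return path


def cusum_path(xs: List[float], k: float, c0: float = 0.0) -> List[float]:
    """Textbook upper one-sided CUSUM: c_t = max(0, c_{t-1} + x_t - k)."""
    c, path = c0, []
    for x in xs:
        c = max(0.0, c + x - k)
        path.append(c)
    return path


def first_signal(path: List[float], limit: float) -> Optional[int]:
    """1-based index of the first statistic above the limit, or None if no signal."""
    for t, value in enumerate(path, start=1):
        if value > limit:
            return t
    return None


# Assumed toy data: five in-control observations, then a step shift of +1 in the mean.
observations = [0.0] * 5 + [1.0] * 10
ewma = ewma_path(observations, lam=0.2)   # lam is an assumed smoothing constant
cusum = cusum_path(observations, k=0.5)   # k is an assumed reference value
print(first_signal(ewma, limit=0.5), first_signal(cusum, limit=2.0))
```

In this toy run the shift occurs after the charts have already been running, so the detection delay depends on the value of each statistic at the change point. That dependence is what distinguishes steady-state from zero-state performance measures.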
- Quality control, Charts, diagrams, etc., Cluster analysis