Testing Several Hypotheses on Rank-Based Statistics and Bayes Factor

Student thesis: Doctoral Thesis

Abstract

Testing independence between random vectors x and y and testing homogeneity of two random samples are essential tasks in statistics. In the univariate case, the rank of each observation in the sample can be computed straightforwardly and is robust to outliers and extreme values, so many classical and effective tests are built on rank statistics, such as the Spearman rank correlation test and the Wilcoxon-Mann-Whitney test. In practice, however, we often face multivariate or even high-dimensional data. It is therefore worth studying how to extend rank-based methods to the multivariate case and build more robust and effective tests, which is the focus of Chapters 2-3 of this thesis. Hypothesis testing for model parameters is another important topic in statistics. In recent years, high-dimensional data have appeared frequently in fields such as ecology, agriculture, medicine and finance, bringing difficulties and new challenges to the classical approaches to parameter testing. In biomedicine in particular, the linear mixed model is often used to analyze longitudinal data because of the intra-group correlation structure of the data. Chapter 4 of this thesis is therefore mainly concerned with testing fixed effects in high-dimensional linear mixed models.

The main idea of Chapter 2 is that observations that are close in x tend to be close in y if the vectors x and y are dependent. We construct minimal spanning trees in both the x and y spaces, and for each edge in each minimal spanning tree, a corresponding rank is calculated based on the other random vector. Several symmetric independence tests for x and y are then constructed from these ranks; the symmetry avoids the contradictory conclusions that can arise from an arbitrary choice between x and y. The exact distributions of the test statistics are investigated for small sample sizes, and the asymptotic properties of the statistics are also studied. Since the proposed statistics are not distribution-free under the null hypothesis, a permutation method is used to obtain their p-values. Numerical analysis shows that the proposed methods are more efficient than existing methods.
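The minimal-spanning-tree idea can be illustrated with a minimal Python sketch. It is not the statistic defined in the thesis: the function names, the particular rank definition (the rank of one endpoint among the other endpoint's neighbours in the other space), the summation of ranks, and the default of 200 permutations are all assumptions chosen for illustration. Small values of the statistic are taken to indicate dependence, and the p-value is obtained by permuting the pairing between x and y.

```python
import math
import random


def dist_matrix(points):
    """Euclidean distances between all pairs of points (tuples of floats)."""
    return [[math.dist(p, q) for q in points] for p in points]


def mst_edges(d):
    """Prim's algorithm on a dense distance matrix; returns n - 1 edges (i, j)."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges


def rank_of(d, i, j):
    """Rank of point j among the neighbours of point i, ordered by distance in d."""
    return sum(1 for k in range(len(d)) if k != i and d[i][k] <= d[i][j])


def edge_rank_sum(d_tree, d_other):
    """Sum, over MST edges built in one space, of endpoint ranks in the other space."""
    return sum(rank_of(d_other, i, j) + rank_of(d_other, j, i)
               for i, j in mst_edges(d_tree))


def symmetric_stat(dx, dy):
    """Hypothetical symmetric statistic: x and y play interchangeable roles."""
    return edge_rank_sum(dx, dy) + edge_rank_sum(dy, dx)


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value; small statistics are treated as evidence of dependence."""
    rng = random.Random(seed)
    dx, dy = dist_matrix(x), dist_matrix(y)
    observed = symmetric_stat(dx, dy)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # break the pairing between x and y
        dy_perm = [[dy[perm[a]][perm[b]] for b in range(n)] for a in range(n)]
        if symmetric_stat(dx, dy_perm) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(x, y)` with two equal-length lists of coordinate tuples returns the estimated p-value. Because the statistic adds the x-tree and y-tree contributions, swapping the two arguments leaves it unchanged, which mirrors the symmetry property described above.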

In Chapter 3, the rank of a sample point is redefined based on the distances between points, and the Wilcoxon-Mann-Whitney test is generalized to the multivariate case. The Wilcoxon-Mann-Whitney test is designed to test the homogeneity of two random samples in the univariate case. It is very powerful for detecting location shifts, yet it may lose power completely when there are scale differences. The generalized test, in spirit, compares the distribution functions of the two random samples. We derive the limiting distributions of the proposed statistic under the null and alternative hypotheses. Since its asymptotic null distribution is that of a weighted sum of infinitely many chi-square variables, we use the bootstrap resampling method to determine the p-value of the test. Finally, the finite-sample performance of the new rank statistic is studied through numerical simulation and real data analysis.
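The contrast between location shifts and scale differences in the univariate Wilcoxon-Mann-Whitney test can be seen in a small Python sketch. The toy samples and the function name `mann_whitney_u` are illustrative assumptions, not data or code from the thesis. The U statistic counts the pairs in which an observation from the first sample is smaller than one from the second; under the null hypothesis its mean is half the number of pairs.

```python
def mann_whitney_u(x, y):
    """Count pairs (x_i, y_j) with x_i < y_j, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


x = [-2.0, -1.0, 1.0, 2.0]
shifted = [v + 5.0 for v in x]   # same shape, location moved by 5
scaled = [v * 10.0 for v in x]   # same centre, spread multiplied by 10

null_mean = len(x) * len(x) / 2  # expected U when both samples share a distribution

u_shift = mann_whitney_u(x, shifted)  # every shifted value exceeds every x value
u_scale = mann_whitney_u(x, scaled)   # symmetric spread change leaves U at its null mean
```

In this toy case the shifted sample drives U to its maximum, while the rescaled sample yields exactly the null mean, so a test based on U alone cannot distinguish the two distributions.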

In Chapter 4, we propose a Bayes factor based method for testing the fixed effects in the high-dimensional linear mixed model. Hypothesis testing on high-dimensional fixed effects is indispensable for investigating the utility of the predictors for the response. In this setting, conventional frequentist methods designed for fixed dimensions fail completely because the dimension is larger than the sample size. Based on the Bayes factor, a novel statistic is proposed for testing high-dimensional fixed effects. By transforming the linear mixed model, we build a bridge between the Bayes factor and the mimic likelihood ratio statistic of the modified model, and provide a convincing justification for testing high-dimensional fixed effects through the mimic likelihood ratio statistic. The proposed statistic can be represented as the ratio of two quadratic forms constructed from the random effects and random noises. In contrast to existing results for quadratic forms based on independent and identically distributed random sequences, we establish asymptotic normality for quadratic forms constructed from independent but not identically distributed random sequences. This theoretical result is of independent interest. To put the test procedure into practice, a one-step iteration method is developed to determine the critical value. The power function under local alternatives is derived under mild conditions. In numerical experiments, we demonstrate that the proposed method has higher power than the existing method and show its practical utility.

In Chapter 5, the research of this thesis is summarized, the main conclusions are given, and some issues worthy of further study are discussed.
Date of Award: 17 Apr 2023
Original language: English
Awarding Institution: City University of Hong Kong
Supervisor: Wangli XU (External Supervisor) & Heng LIAN (Supervisor)
