Modifying the Classical F Test for Microarray Experiments

Microarray data has a high dimensional data structure that makes statistical inference drawn from this type of data challenging. Since current statistical methods are generally for “small p and large n”, these methods can be insufficient to draw valid conclusions for microarray data. Nevertheless, some of these methods, such as ANOVA (F test), are still widely used. One of the assumptions of the classical F test is that populations (genes) are assumed to be independent. This assumption is obviously violated in microarray experiments because gene-gene interactions can naturally occur. In this paper, we use an effective “column” size idea to take correlations among genes into account to modify the classical F test. We consider various magnitudes of correlation among genes in Monte Carlo simulation studies. We compare the proposed test (F -MOD) with the classical F test and multivariate Hotelling’s T2 test through validity and power analyses. We also demonstrate the proposed test with real type 2 diabetes mellitus gene expression data, which was obtained from the Gene Expression Omnibus (GEO) database with accession number GSE25724. Abstract


Modifying the Classical F Test for Microarray Experiments
Bourget G * Department of Mathematics, California State University, Fullerton, CA, USA A common characteristic of high dimensional data is that it has high dimension (p), and relatively small sample size (n).This kind of data structure is called "large p and small n".Besides having high dimensional data, microarray data also have correlation structure [9].Most of the current methods either ignore high dimensional data structure or fail to efficiently take correlations among genes into account.Multivariate analysis can take correlations among genes into account by analyzing genes jointly.Consequently, multivariate analysis methods have recently being used in microrray data [10,11].However, these methods are not straightforward, and most importantly ignore the multidimensional structure of the gene expression data.
Hotelling's T 2 test is one of the multivariate analysis methods that takes correlations among genes into account to identify differentially expressed genes.It has been applied in genome association studies [12], microarray process control [13], and data control charts [14].However, Hotelling's T 2 test does not take high dimensional data structure into account.For example, in a comparison of two groups, this test requires an explicit condition on data dimension and sample size: for fixed p, p < n 1 + n 2 − 1, where p is the number of genes, n 1 is the sample size of the group 1, and n 2 is the sample size of the group 2. Lu, et al. [15] presented a new T 2 statistic for analyzing microarray data.They used first a multiple forward search algorithm to select a subset of feature vectors in a high-dimensional microarray dataset to reduce the dimension (i.e., p) to satisfy the restriction p < n 1 + n 2 − 1, and then they implemented the Hotelling's T 2 test.
Volume 1 | Issue 1 Moreover, as an alternative test to Hotelling's T 2 , Chen, et al. [16] proposed a two-sample test for the means of high-dimensional data.
In this paper, we present a different approach proposed in Lu, et al. [15].Our approach is more general and practical than that of in [15], and moreover does not implement Hotelling's T 2 test but the simple classical F test.The proposed modified F test is denoted by F -MOD.We use an effective sample size idea to take correlation among genes into account [17][18][19].The effective sample size formula was originally proposed by Clifford, et al. [20], and was improved for small sample sizes by Dutilleul, et al. (1993) [21].Also, the same effective sample size formula was used in modified F tests to assess multiple correlation between one spatial process and several others [22], and to assess correlation between two time series [23].We implement the same effective sample size formula described in [21] to compute effective column size not effective sample size.Henceforth, we introduce a new nomenclature term "effective column size".To adopt the formula in [21], we consider the same structure of the design matrix (1) in the Methods section.
A s ingle multivariate observation is the collection of measurements on p different variables (genes) taken from the same trial (array).If n observations have been obtained, the entire data set can be represented in an n × p matrix Methods An another statistical technique for finding significant genes in a set of microarray experiments is Significance Analysis of Microarray (SAM) proposed by Tusher, et al. [24].The SAM uses repeated permutations of the data to determine if the expression of any genes are significantly related to the response.It uses a set of gene-specific t tests.Since, the classical F , Hotellings T 2 , and F -MOD tests use global F tests and not individual t tests as in SAM, we do not consider the SAM as one of the methods to be compared in this paper.Also, the goal of SAM is to handle gene-specific fluctuations by considering a statistic based on the ratio of change in gene expression to standard deviation in the data for that gene.However, in this paper, our goal is to handle gene-gene interactions and not in gene-specific fluctuations, which are two different problems to tackle.

The row vector
represents the jth multivariate observation.The matrix X represents p genes each having n observations.Now, consider a microarray experiment of n 1 and n 2 samples from populations 1 and 2, respectively.For example, population 1 can represent the disease group, while population 2 can represent the healthy group.Suppose that the expression levels of p genes are measured and matrix representations of populations 1 and 2 are defined in (1) as X and Y.The observations on p variables can be arranged as follows: The remainder of the paper is organized as follows.In the Methods section, we describe Hotelling's T 2 , classical F, and F -MOD tests, and in the Results section we outline Monte Carlo simulation studies, present its findings, and analyze gene expression data of type 2 diabetes mellitus.Finally, we draw conclusions in the Discussion section.

Comparing Mean Vectors from Two Populations
(1) Our goal in this paper is to only make inferences about the differences of the vector mean of the populations.That is, we want to know if µ 1 = µ 2 , or equivalently if µ 1 − µ 2 = 0.However, one further can investigate which means are different if the hypothesis of µ 1 − µ 2 = 0 is concluded.We need to make some assumptions to provide answers to these questions.The assumptions are: 1.The sample is a random sample of n 1 from a p-variate population with mean vector µ 1 and covariance matrix Σ 1 .
2. The sample is a random sample of n 2 from a p-variate population with mean vector µ 2 and covariance matrix Σ 2 .
3. The samples are independent of the samples .
For large samples, these assumptions are enough to make an inference about µ 1 − µ 2 .However, when the sample sizes n 1 and n 2 are small we need to have the following assumptions as well.
1.Both populations are multivariate normal, and The null (H 0 ) and alternative (H a ) hypotheses we are interested are: where µ 1 = (µ 11 , µ 12 , . . ., µ 1p )ʹ is the vector mean expression level of population 1, and µ 2 = (µ 21 , µ 22 , . . ., µ 2p )ʹ is the vector mean expression level of population 2. The null and alternative hypotheses can also be rewritten as or equivalently Note that, we test the mean expression of p genes all together not the individual mean expressions in ( 2) -( 4).That is, we consider a global test not an individual test.
We consider a microarray experiment composing of n 1 samples from population 1 and n 2 samples from population 2. Let X ij be the expression level for gene j of sample i from population 1, and Y kj be the expression level for gene j of sample k from population 2. The expression level vectors for sample i from population 1 can be expressed as X i = (X i1 , . . ., X ip )ʹ.The mean expression level of gene j in population 1 is defined as

Hotelling's T 2 Test
Then, the mean expression level vector for p genes for population 1 is given by ( 5) We can similarly define these expressions for population 2. The pooled variance-covariance matrix of p genes for populations 1 and 2 can be written as where S X and S Y are the sample variance covariance matrices of populations 1 and 2. Note that correlation among genesare taken into account through sample variance covariance matrices.The Hotelling's T 2 test [25] is defined as (6) (7) By Central Limit Theorem, (8) has classical F distribution with p degrees of freedom for the numerator and n 1 + n 2 − p -1 degrees of freedom for the denominator.This test requires that the degrees of freedoms are positive, that is, it forces the condition p < n 1 + n 2 − 1.However, this restriction makes it almost impossible to implement Hotelling's T 2 test in microarray experiments.
The classical F test compares the means of the columns of X, and assumes that these columns are independent (univariate case).In microarray experiment, we want to compare the differences of the p means of X and Y. Since we want to compare multivariate (Hotelling's T 2 ) and univariate (classical F) methods, we adopt the data structure from the multivariate to univariate case by considering the observations as the differences of the data matrices X and Y.That is, we compute X ij − Y ij , and apply the univariate F test on these observations.The F test is defined as where B = n -1 (I -n -1 J), J is the n × n matrix of ones, and I is the identity matrix.
In this paper, we use equation (11) defined in [21] to compute effective column size to identify differentially expressed genes in microarray data.We considered the following steps for F -MOD test in the simulation runs: first, we computed the effective column size, p, as in equation (11).
The estimated covariance matrices Σ X and Σ Y were computed using the raw data of X and Y, respectively.Second, we replaced p by p in the degrees of freedoms of the classical F test defined in (9).Finally, we computed the p-value of the global F test in (9) with p − 1 and p(n − 1) degrees of freedoms for the numerator and denominator degrees of freedoms, respectively.Note that, the sample size is n 1 = n 2 = n.(9) where MST is the mean square for treatments (genes), and MSE is the mean square for errors.The F obs in (9) follows an F distribution with p − 1 degrees of freedom for the numerator and p(n − 1) degrees of freedom for the denominator, where n 1 = n 2 = n.
When the assumptions are not satisfied by sample data, there are two general remedies: (1) to transform the data so that the assumptions are satisfied, or (2) to develop a modified inferential method in which the assumptions are relaxed at the estimation stage, or deviations from the assumptions are taken into account at the testing stage.

F -MOD Test
In linear models, the autocorrelation of errors has an impact on the inefficiency of slope estimators and the invalidity of significance levels.When regressors have fixed structure, the only source of autocorrelation comes from errors.However, when regressors also have random structures, their autocorrelations along with correlations of errors have an impact on estimation and testing [17][18][19]26,27].Since the autocovariances of stochastic processes bias the variance of sample correlation coefficients [28], the incorporation of effective sample size into modified t-tests were proposed [20,21].The effective sample size n in [20] was defined as (10) where Σ X and Σ Y were the estimated covariance matrices of X and Y, respectively.Dutilleul (1993) proposed an improved effective sample size for small sample sizes [21].However, the effective sample sizes prosed in [20] and [21] behave similarly for large sample sizes.The effective sample size in [21] was defined as ˆ( 11) ˆˆˆŴ e generated two multivariate normal distributions: MVN(µ 1 , Σ 1 ) and MVN(µ 2 , Σ 2 ), each with dimension p (genes).The variance covariance matrices are defined as where We can similarly define Σ (-ρ) by replacing ρ by (−ρ) in (12).

Results and Discussion
The matrices Σρ and Σ (−ρ) have dimensions g × g, and the matrices Σ 1 = Σ 2 have dimensions p × p.The constant term l is cancelled out in the computation of the effective column size in (11), hence, it has no effect on the effective column size.However, this term is considered to generate the data matrices X and Y with covariance matrices defined in (12).
Actually, the simulation setup has sound basis in methodologies used in analyzing real microarray data.It is common knowledge that genes are networked together in pathways.Although, it is true that weak connections between groups may exist, independence between groups is a reasonable assumption.Also, within each group, genes are either positively or negatively correlated, and due to their relative distance in the regulatory pathway, the further apart two genes, the less correlation between them.These are exactly the reasons why we considered the structures of Σ 1 and Σ 2 defined in (12) for microarray data.
We assumed that both populations have equal sample sizes (i.e., n 1 = n 2 ), and there are 10 matrices on the diagonals of Σ 1 and Σ 2 .For example, if p = 100 then there are 10 matrices on the diagonal of Σ 1 and Σ 2 with 10 genes in each matrix (i.e., g = 10).To assess the effects of correlation among genes, we took ρ = 0, 0.1, 0.2, . . ., 0.9 as various magnitudes of correlations.We also set the variances of each gene at 0.01 (i.e., σ 2 = 0.01).Even though the value of σ 2 is needed to generate X and Y, it has no effect on the computation of the effective column size.Two different significance levels, α = 0.01 and 0.05, were used in validity and power analyses.
The null hypothesis in validity analysis was set to µ 1 = µ 2 |= (0,0,0,.....,0)' (p × p) whereas in power analysis µ 1 ≠ µ 2 with µ 1 = (0,0,0,.....,0)' (p × 1) and More precisely, the first 2% of the means of the genes were set to 0.5, and the rest were set to 0 in µ 2 .If 0.02 * p was not an integer value, then we used ceiling function in R that takes a single numeric argument a and returns a numeric value containing the smallest integers not less than the corresponding elements of a.
The simulation program was written and run in R, which is a free software environment for statistical computing and graphics.We ran 10,000 data sets to test the null hypothesis.We computed empirical significance levels (p-values) and powers of the tests to draw conclusions about the testing procedures.
Lu, et al. [15], Chen, et al. [16], and SAM [24] methods were not compared in the simulation.The SAM handles gene-specific fluctuations by considering a statistic based on the ratio of change in gene expression to standard deviation in the data for that gene.However, in this paper, our goal is to handle gene-gene interactions and not gene-specific fluctuations.Also, Lu, et al. [15] modified the degrees of freedom in Hotellings T 2 test but F -MOD modified the degrees of freedom of the classical F test.Moreover, the method of Chen, et al. [16] was not compared because they proposed a two-sample test, and we used a test that modified the global F-test.
The strict definition of a testing procedure to be valid at a significance level α is that if the actual p-value, which is the probability of rejecting the null hypothesis when in fact the null hypothesis is true, is less than or equal to α.To take variability among generated data into account in simulation runs, one may consider the upper limit of the approximate 95% confidence interval for the actual p-value.Under binomial distribution model, for α and m simulation runs, the approximate 95% confidence interval is α ± 2√α(1 − α)/m.In simulation runs, we took α = 0.01 and 0.05, and m = 10, 000.The upper limits are

Validity and Power Analysis
_________ Therefore, we assessed the validity of the testing procedures based on the strict definition of the validity and the variability associated with the data generation.That is, the validity conditions are p-value ≤ 0.012 when α = 0.01, and p-value ≤ 0.054 when α = 0.05 in Tables 1 and 2.
In Table 1, we investigated the validit y of the tests at α = 0.01 and 0.05 when p < n 1 + n 2 − 1.We need this restriction to perform the Hotelling's T 2 test, but not the other two tests.Table 1 showed that the classical F test suffered lack of validity when correlations among genes were between mild and strong.The Hotelling's T 2 test is known to be not well-defined when p is much greater than n because the variance-covariance matrices Σ 1 and Σ 2 become singular.As a result, Hotelling's T 2 test becomes unstable.This phenomena was ascertained in Table 1 when p > 60.Therefore, we suggest not to use Hotelling's T 2 test when p > 60.In contrast, the proposed F -MOD test always provided valid tests for any ρ, except only in two cases (p = 50 when α = 0.05 and α = 0.01), which might be solely due to variation among data.
We studied the validity of F and F -MOD tests without the restriction p < n 1 + n 2 − 1 in where p is the number of columns (e.g., the number of genes) and n is the number of sample size (e.g., the number of individuals.)Table 3 provided power analysis at α = 0.01 and 0.05 when p < n 1 + n 2 − 1.Since F test suffered lack of validity when ρ > 0.2, we did not analyze the power values in the table; these values were provided only for completeness of the Table.Hence, the power of F test should be ignored when ρ > 0.2.While Hotelling's T 2 test provided better power when correlations among genes were not too strong, the power decreased as correlations among genes got stronger.The Hotelling's T 2 test actually became powerless as p increased.This is not an unusual observation because it is known that even when p ≤ n, the Hotelling's T 2 test perform poorly if p is nearly as large as n.The performance of the Hotelling's T 2 test under p, n → ∞ with p/n → 1 -Є was studied in [29], which they showed that the asymptotic power of the test suffered for small values of Є > 0. A number of improvements to give better power on the Hotelling's T 2 test in high-dimensional data have been proposed in [16,[29][30][31].It was interesting to observe that Hotelling's T 2 test was more powerful when α = 0.05 than when α = 0.01.Its powers were more than 88.5% when α = 0.05, but not more than 35.4% when α = 0.01.In contrast, the F -MOD always provided powers at 100%.We did not provide a table for power analysis when the restriction p < n 1 + n 2 − 1 was because held because it provided similar results to those in Table 3. 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 F 0.05 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 F-MOD 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Hotelling's 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 F 0.01 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 F-MOD 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Hotelling's Table 3: Power analysis with restriction p < n 1 + n 2 − 1, where p is the number of columns (e.g., the number of genes) and n is the number of sample size (e.g., the number of individuals.)Table 4 shows average effective column sizes computed from (11) when 10,000 simulation runs were performed.The effective column sizes decreased as correlations among genes got stronger.As expected, when genes are independent (i.e., ρ = 0) the effective column size was the same as the original number of genes (p).used the gene expressions of type 2 diabetes from the data base Gene Expression Omnibus (GEO) with accession number GSE25724 [32] (data was not collected by us).The normalized gene expression data of p = 22, 283 genes was obtained from six type 2 diabetic human islets (population 1, n 1 = 6) and seven non-diabetic human islet (population 2, n 2 = 7).In over all design, human islets were isolated from the pancreas of organ donors by collagenase digestion followed by density gradient purification, then hand-picked and cultured two days in M199 culture medium.The platform GPL96 [HG-U133A]) by Affymetrix was used.

Real Data: Type 2 Diabetes Mellitus
The programming codes to analyze gene expression data were written in R software.The dimensions of the matrices X and Y were 6 × 22, 283, and 7 × 22, 283, respectively.Since F -MOD test required the differences of the observations from two populations, six non-diabetic patients were chosen to have equal sample sizes for both populations (n 1 = n 2 = 6).That is, the dimension of the difference matrix was 6 × 22, 283.The data structure was high dimensional (p = 22, 283 genes, and n = 6 observations), which caused memory exhaustion in R.However, we used built-in functions such as "as.big.matrix" to do matrix operations and "bigcor" to compute correlation and covariance matrices of size 22, 283 × 22, 283.The effective column size in (11) was easily computed using the as.big.matrixfunction to multiply two or four matrices of sizes 22, 283 × 22, 283.
Before analyzing the data, we verified that the assumptions of the fixed one-way ANOVA were satisfied: (1) our data did not violate the assumption of normal distribution, because fixed one-way ANOVA is considered a robust test against the normality assumption.(2) the equality of variances were not violated because it is well known that when the error variances are unequal, the F test for equality of means with the fixed one-way ANOVA model is only slightly affected if all factor level sample sizes are equal or do not differ greatly.In real data, the sample size was six in each gene, hence this assumption was not violated.However, 3) the independence of the populations were violated.To show dependency, we computed the correlation matrices for both populations.The correlation matrix has entries of correlations for pairwise genes.The number of pairwise genes for 22,283 genes is ( 22,283 ) = 2.48254903 × 10 8 .We counted the pairwise correlations that are more than 0.5, 0.7, and 0.9 in absolute values.The result is shown in Table 5.We concluded that genes were correlated in both populations, and hence the classical F test was not performed.The Hotelling's T 2 was also not performed because 22, 283 < 6 + 6 − 1.Therefore, we only considered F -MOD test to analyze the data./ 2 In the simulation study, we were only interested in the hypotheses defined in (2) or (3).That is, if there was a difference in the vector means of the populations.In the data analysis we proceeded one step further to identify differentially expressed genes if the null hypothesis in (2) or (3) was rejected.The statistic in ( 9) was F obs = 5.609043, and the effective column size in (11) was computed as p = 9.424243.Since p-value= 4.13 × 10 -5 was smaller than the significance levels α = 0.01 or α = 0.05, we rejected the null hypothesis, and concluded that 22,283 genes were differentially expressed together.We then run t tests for each genes with the adjusted degree of freedoms p(n 1 − 1) with and without Bonferroni corrections at α = 0.01 and α = 0.05 significance levels.Below, we only presented the number of significant genes without the Bonferroni corrections but provided the list of significant genes with the Bonferroni corrections in Tables 6-9.With or without Bonferroni corrections, we then compared these significant genes with significant genes listed at the GeneCards database.GeneCards is a searchable, integrated database of human genes that provides comprehensive, updated, and user-friendly information on all known and predicted human genes (http://www.genecards.org).The search is automatically extracted from more than 100 carefully selected web sources, and uses standard nomenclature and approved gene symbols.Moreover, it presents a rich subset of data for each gene by providing links to the original sources for further examination.Its use is free for academic non-profit institutions.We identified 1083 significant genes related to type 2 diabetes by searching the keywords "type 2 diabetes mellitus".After Bonferroni correction, there were 674 significant genes at α = 0.01/22283 = 4.49 × 10 -7 significance level in which 52 were matched with GeneCards database (Tables 6 and 7).Without Bonferroni correction at α = 0.05 significance level, there were 7116 significant genes in which 554 of them were matched with the GeneCards (results were not shown).With Bonferroni correction at α = 0.05/22, 283 = 2.24 × 10 -6 , there were 901 significant genes in which 73 of them were matched with the GeneCards data (Tables 8 and 9).
We used PANTHER classification system, which is a comprehensive, curated database of protein families, trees, subfamilies and functions [33,34], for the significant genes identified in Tables 6-9.The tool is available at http://pantherdb.org.The results are presented in Tables 10-12.The main goals of PANTHER are to make accurate inference of genes and protein functions over large sequence databases.PANTHER extrapolates phylogenetic trees to represent gene family evolution.It also identifies subfamilies and protein class.In Tables 10-12, we presented families/subfamilies and protein class for each gene.The significant genes were grouped in the following protein classes: peptide hormones and protein hormones (have an effect on the endocrine system of animals and humans); DNA-binding proteins (can incorporate domains as the zinc finger, the helix-turn-helix, and the leucine zipper that facilitate binding to nucleic acid); acetyltransferase or transacetylase (is a type of transferase enzyme that transfers an acetyl group); carbohydrate kinase domain also known as CARKD; chemokines (are a family of small cytokines, or signaling proteins secreted by cells); hydrolase (is an enzyme that catalyzes the hydrolysis of a chemical bond); dehydrogenase also called DHO (is an enzyme belonging to the group of oxidoreductases that oxidizes a substrate by a reduction reaction that transfers one or more hydrides (H-) to an electron acceptor); peroxidases (are a large family of enzymes); and reductase (is an enzyme that catalyzes a reduction reaction).Microarray data has a high dimensional data structure that makes statistical inference from this type of data challenging.The most widely used statistical methods for finding differentially expressed genes from microarray data are univariate.While univariate methods do not take correlations among genes into account, gene-gene interactions shouldn't be ignored in testing procedures.Multivariate statistical methods can overcome this deficiency of univariate methods by taking gene-gene interactions into account through variance-covariance matrices.However, these methods are sometimes not straightforward, and moreover ignore the multidimensional structure of the gene expression data.

Conclusion
The Hotelling's T 2 test is one of the multivariate analysis methods that takes correlations among genes into account but requires the restriction p < n 1 + n 2 − 1, when two populations are considered with sample sizes of n 1 and n 2 .In microarray experiments, it is almost impossible to satisfy this condition because p is always larger than n 1 and n 2 .That means Hotelling's T 2 suffers to handle curse of dimensionality.One solution is to apply Principal Component Analysis (PCA), or some other methods to satisfy the restriction before implementing the Hotelling's T 2 test.However, even this condition is satisfied, this test still suffers lack of powers when p, n → ∞ with p/n → 1 − Є for small values of Є > 0.
In the Real Data section, we analyzed gene expressions of type 2 diabetes [32].There were 117,610,455 pairwise genes that had correlations in absolute value more than 0.5 in the non-diabetic group, and 107,977,419 pairwise genes that had correlations in absolute value more than0.5 in the diabetic group.We concluded that the assumptions of independence were violated in both groups, and hence the classical F test was not performed.We also did not implement Hotelling's T 2 test because the restriction 22, 283 < 6 + 6 − 1 did not hold.Since F -MOD takes correlations among genes into account, we analyzed the data only using F -MOD test with and without Bonferroni corrections.For example, we identified 901 significant genes in which 73 of them matched with the GeneCards data at α = 0.05/22, 283 = 2.24 × 10 -6 .
In this paper, we consider F -MOD test that used the novel idea of effective column size concept in microarray data.The test provides valid testings and 100% powers for any ρ.More-over, the computation of F -MOD can easily be performed in R using built-in functions such as "as.big.matrix" and "bigcor" without exhausting the memory in R. To adopt the data structure from the multivariate case to the univariate case, the differences of the data matrices X and Y were considered as observations.If the null hypothesis in (2) is rejected, then we suggest testing to identify differentially expressed genes H 0 : µ 1i = µ 2i versus Ha : µ 1i ≠ µ 2i (i = 1, 2, . . ., p) using the classical t-test with p(n 1 − 1) degree of freedoms with Bonferroni correction.Here, µ 1i is the mean expression of gene i from population 1, and µ 2i is the mean expression of gene i from population 2.

Ŵe
suggest for researchers to consider the F -MOD test with a multiple test adjustment correction, such as Boferroni correction, instead of the classical F test if the assumption of independence is in question.Hotelling's T 2 is the second competitive test to F -MOD.However, the restriction p < n 1 + n 2 − 1 does not hold in microarray data, and renders this test inapplicable.We believe that the use of effective column size in microarray experiment will be a novel approach that will help practitioners to choose an easy, effective, and powerful testing procedure instead of a complicated or a procedure with restrictions, such as Hotelling's T 2 test.
In future work, it is interesting to investigate the performance of a test that modifies Hotelling's T 2 test by taking into account the effective column size concept in the degrees of freedoms.
We would like to thank the referees for their valuable comments that helped improve the quality of the article.

Table 2
. Since F MOD performed very well up to p = 80, we ran simulations for p = 100 and 200 to better understand the performance of the test for larger number of genes.Both tests performed similarly as in Table1.That is, F test was only valid when correlation among genes did not exist or the magnitudes of the correlations were very weak.The F -MOD test always provided valid testings, except in one case.p =

Table 1 :
Validity analysis with restriction p < n 1 + n 2 − 1, where p is the number of columns (e.g., the number of genes) and n is the number of sample size (e.g., the number of individuals.)

Table 2 :
Validity analysis without restriction

Table 4 :
Size Effective column size p for p when n 1 = n 2 = n Ŵe

Table 5 :
The number of pairwise correlations from the correlation matrices for non-diabetic and diabetic groups

Table 6 :
The significant genes of Type 2 Diabetes Mellitus at α = 0.01/22283 = 4.49 × 10 -7 when genes are matched with GeneCards data base The second column shows the name of the genes from UniGene bank.The third column shows the Entrez Gene Database UID number.The fourth column shows the p-values adjusted by Bonferroni correction.The last column shows the title of the gene represented by the probe set.In column three, * and ** symbols are replaced for AFFX-HUMGAPDH/M33197_M_at and AFFX-HUMGAPDH/M33197_5_at, respectively.enerepresented by the probe set.In column three, ?symbol is replaced for AFFX-HUMGAPDH/M33197_5_at respectively

Table 9 :
Table 8 continuesThere were 4215 significant genes at α = 0.01 significance level (without Bonferroni correction) in which 297 of them were matched with GeneCards database (results were not shown).

Table 10 :
Functional classification of the genes in Tables 6-9 by PANTHER