1388 GeneChip data (25k, 46 series) were downloaded from TAIR (Microarray data from AtGenExpress).
Some data, such as a large series of time-course experiments under a single biological condition, are biologically redundant or biased.
Since these biases may lead to incorrect conclusions, we corrected for these possible redundancies and biases based on Pearson's correlation coefficients (PCCs) between samples.
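- First, the PCC between sample S1 and sample S2 ($R_{s1,s2}$) was calculated:
$$R_{s1,s2} = \frac{\sum_{g}\left(RE_{g,s1}-\overline{RE}_{s1}\right)\left(RE_{g,s2}-\overline{RE}_{s2}\right)}{\sqrt{\sum_{g}\left(RE_{g,s1}-\overline{RE}_{s1}\right)^{2}\,\sum_{g}\left(RE_{g,s2}-\overline{RE}_{s2}\right)^{2}}}$$
where $RE_{g,s}$ is the relative expression of gene G in sample S; $\overline{RE}_{s1} = \frac{1}{N}\sum_{g} RE_{g,s1}$ is the average relative expression value for all genes in sample S1; $\overline{RE}_{s2} = \frac{1}{N}\sum_{g} RE_{g,s2}$ is the average relative expression value for all genes in sample S2; and $N$ is the number of genes.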
- For the pairwise sample redundancy ($J_{s1,s2}$) between sample S1 and sample S2, we applied a cut-off threshold C to $R_{s1,s2}$ (equation not recovered).
We used C = 0.4, a roughly optimized value.
- The sample redundancy $J_{s1}$ for sample S1 is the sum of the pairwise sample redundancies between sample S1 and every sample, including sample S1 itself: $J_{s1} = \sum_{s} J_{s1,s}$.
- The weight of sample S1 is the inverse of the square root of the sample redundancy $J_{s1}$. This procedure is analogous to the calculation of the standard error from the standard deviation: if sample S1 is replicated 4 times with no experimental error, the reliability of the data for sample S1 doubles.
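The weighting steps listed above can be sketched in Python as follows. This is a minimal illustration, not the database's actual implementation: the function names are hypothetical, and the formula that turns a sample PCC into a pairwise redundancy is an assumption (zero at or below the cut-off, rising linearly to 1 at a PCC of 1), since only the cut-off value of 0.4 is stated in the text.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_redundancy(r, cutoff=0.4):
    """ASSUMED form, not taken from the text: 0 at or below the cut-off,
    rising linearly to 1 when the sample PCC r equals 1."""
    return max(0.0, (r - cutoff) / (1.0 - cutoff))


def sample_weights(samples, cutoff=0.4):
    """Return one weight per sample, 1 / sqrt(J_s).

    `samples` is a list of per-sample expression vectors,
    each holding one relative expression value per gene.
    """
    weights = []
    for s1 in samples:
        # Sum over all samples, including s1 itself (the self term is 1).
        j_s1 = sum(pairwise_redundancy(pearson(s1, s2), cutoff) for s2 in samples)
        weights.append(1.0 / math.sqrt(j_s1))
    return weights
```

Under this assumption, four identical replicates each receive a weight of 0.5, so together they count as 2, twice a single sample, which matches the replication example above.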
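The weighted PCC ($COR_{g1,g2}$) was calculated between gene G1 and gene G2:
$$COR_{g1,g2} = \frac{\sum_{s} W_{s}\left(RE_{g1,s}-\overline{RE}_{g1}\right)\left(RE_{g2,s}-\overline{RE}_{g2}\right)}{\sqrt{\sum_{s} W_{s}\left(RE_{g1,s}-\overline{RE}_{g1}\right)^{2}\,\sum_{s} W_{s}\left(RE_{g2,s}-\overline{RE}_{g2}\right)^{2}}}$$
where $RE_{g,s}$ is the relative expression of gene G in sample S; $W_{s}$ is the weight of sample S; $\overline{RE}_{g1} = \sum_{s} W_{s} RE_{g1,s} / \sum_{s} W_{s}$ is the weighted average relative expression value of gene G1; and $\overline{RE}_{g2} = \sum_{s} W_{s} RE_{g2,s} / \sum_{s} W_{s}$ is the weighted average relative expression value of gene G2.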
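As an illustration of the weighted PCC between two genes, the sketch below assumes the standard weighted form, in which each sample's weight multiplies its term in the means, variances, and covariance. The function name `weighted_pcc` and its argument layout are hypothetical and are not taken from the database's own code.

```python
import math


def weighted_pcc(x, y, w):
    """Weighted Pearson correlation between two genes.

    x, y: relative expression of gene G1 and gene G2, one value per sample.
    w:    non-negative weight of each sample (assumed to enter linearly).
    """
    total = sum(w)
    # Weighted average expression of each gene across samples.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (normalisation cancels in the ratio).
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)
```

With equal weights this reduces to the ordinary PCC, and a sample with weight 0 has no influence on the result.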
Since coexpression version c4.0, this PCC is no longer used directly to select coexpressed genes. Instead, it is transformed to the Mutual Rank (MR) (see Rank and Mutual Rank).