CanGEM
 
Friday, January 6, 2017
1.5. How the gene copy numbers are calculated
1. General

All array comparative genomic hybridization (CGH) data stored in the database goes through a predefined analysis pipeline. Because a predetermined pipeline can never be optimal for every possible case, the provided gene copy numbers are meant as summary statistics and as an aid in locating data in the database. More detailed analysis should always be performed on the raw data, which can be downloaded to your own computer.

The analysis algorithm is outlined below.

First, the raw data files are loaded into R using the limma package. Outliers are filtered out, and the data is normalized with normalizeWithinArrays (loess) and normalizeBetweenArrays. The log ratios are then linked with the physical coordinates of the microarray probes, which have been obtained with a MegaBlast analysis.

If the patient under study is not male, all measurements for the Y chromosome are removed. In addition, if the reference sample is not sex matched (e.g. a male reference used with a female sample), measurements for the X chromosome are removed as well.
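The sex chromosome rule can be sketched in a few lines of Python. This is only an illustration and not the code from the cangem R package. The function names, the "chromosome" key, and the sex labels are hypothetical. Treating an unknown sex as "not male" and as "not sex matched" is an assumption, as the text above does not say how missing information is handled.

```python
def chromosomes_to_drop(sample_sex, reference_sex):
    """Return the set of sex chromosomes whose measurements are removed.

    Sexes are given as "male", "female", or None for unknown (assumed labels).
    """
    drop = set()
    # Y chromosome measurements are kept only for male patients.
    if sample_sex != "male":
        drop.add("Y")
    # Assumption: a pair counts as sex matched only if both sexes
    # are known and equal.
    sex_matched = sample_sex is not None and sample_sex == reference_sex
    if not sex_matched:
        drop.add("X")
    return drop


def filter_probes(probes, sample_sex, reference_sex):
    """Keep only probes on chromosomes that are not dropped.

    Each probe is a dict with a "chromosome" key (hypothetical layout).
    """
    drop = chromosomes_to_drop(sample_sex, reference_sex)
    return [p for p in probes if p["chromosome"] not in drop]


probes = [
    {"chromosome": "1", "log_ratio": 0.02},
    {"chromosome": "X", "log_ratio": -0.71},
    {"chromosome": "Y", "log_ratio": -1.30},
]
# Female sample with a male reference: both X and Y are removed.
kept = filter_probes(probes, "female", "male")
```

With a female sample and a female reference, only the Y chromosome measurements would be removed.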

Next, the data is processed with the ACE algorithm of CGH Explorer, which will convert the log ratios to discrete copy numbers of 0, 1, or -1, representing normal copy number, a gain, or a loss, respectively.

Finally, the copy number levels of the microarray probes are stretched out to cover every gene from the Ensembl 45 release. This is done one gene at a time as follows. If there are microarray probes between the start and end positions of the gene, they are used to calculate the copy number. If there are no overlapping probes, the last probe preceding the gene and the first probe following it are used instead. If all of the probes used have the same copy number, the gene gets that copy number; otherwise it gets a copy number of 0 (representing normal).

For example, a gene with two overlapping probes that both have a copy number of -1 will get the copy number -1. Another gene with no overlapping probes, but with a preceding probe with the value -1 and a following probe with the value 1, will get the copy number 0.
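The per-gene rule and the two examples above can be sketched in Python as follows. This is a minimal illustration, not the implementation from the cangem R package. The function name and arguments are hypothetical, each probe is reduced to a single position on the gene's chromosome, and a gene with a flanking probe on only one side (or with no probes at all) is handled by assumption, since the text does not cover those cases.

```python
from bisect import bisect_left, bisect_right


def gene_copy_number(gene_start, gene_end, positions, calls):
    """Assign a copy number (-1, 0, or 1) to one gene.

    positions: sorted probe positions on the gene's chromosome.
    calls: the discrete copy number of each probe, in the same order.
    """
    lo = bisect_left(positions, gene_start)   # first probe at or after the start
    hi = bisect_right(positions, gene_end)    # first probe after the end
    if lo < hi:
        # Probes overlap the gene: use them.
        used = calls[lo:hi]
    else:
        # No overlapping probes: use the nearest probe on each side.
        # Assumption: if only one side has a probe, use that one alone.
        used = []
        if lo > 0:
            used.append(calls[lo - 1])
        if hi < len(positions):
            used.append(calls[hi])
    # The gene gets the shared value if all used probes agree.
    if used and all(c == used[0] for c in used):
        return used[0]
    # Disagreement (or no probes at all, by assumption): normal.
    return 0


positions = [100, 200, 300, 900]
calls = [-1, -1, -1, 1]

# Two overlapping probes, both -1.
print(gene_copy_number(150, 320, positions, calls))  # prints -1
# No overlapping probes; preceding probe is -1, following probe is 1.
print(gene_copy_number(400, 800, positions, calls))  # prints 0
```

The binary search is a convenience for sorted probe positions. A plain scan over the probes would give the same result.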

The complete analysis algorithms, written in R, are available for download in the cangem package.

2007-07-08 20:45:26 by Ilari Scheinin
2007-08-16 19:23:04 by Ilari Scheinin

GUID: {BEB1116F-6296-410D-9859-E30DF7F8A663}