DNA Copy-Number Segments

The Copy_Number_segments table contains one row per copy-number segment per TCGA aliquot. Each TCGA aliquot is uniquely represented by a TCGA barcode of length 24, eg TCGA-04-1517-01A-01D-0533-01. (For more information on how TCGA barcodes were created and how to “read” a TCGA barcode, click on the preceding link.)

Platform

DNA Copy-Number data was generated for the TCGA project using the Affymetrix GenomeWide Human SNP 6.0 Array.

Pipeline

DNA Copy-Number data was generated for the TCGA project at the Broad Genome Characterization Center. A DESCRIPTION.txt file is included with each data archive at the DCC describing the algorithms, methods, and protocols used to produce the Level-1, Level-2, and Level-3 data.

ETL Details

Each Level-3 data archive contains 4 output files per sample assayed: two based on the hg18 reference, and two based on the hg19 reference. The BigQuery table is populated only with the files ending with nocnv\_hg19.seg.txt. The num_probes and segment_mean fields in the raw files are sometimes represented using Exponential Scientific Notation (eg 8.7E+07) and were interpreted as integer or floating-point values respectively.

The mapping between TCGA aliquot barcodes and Level-3 data files was obtained from the SDRF file.


Have feedback or corrections? You can file an issue here or email us at feedback@isb-cgc.org.