Reference Data¶
ISB-CGC Hosted Reference Data¶
To facilitate working with the TCGA data tables that the ISB-CGC hosts in BigQuery, we have also created additional reference data tables. Other reference data is hosted by Google Genomics, and suggestions for more are welcome at feedback@isb-cgc.org.
Platform Reference Data¶
Some reference data is necessary in order to work with data generated by specific platforms such as the Illumina DNA Methylation array or the Affymetrix Genome-Wide Human SNP Array 6.0. This section provides links to existing sources of information elsewhere on the web and describes additional resources that are hosted by the ISB-CGC. If there are additional platform reference sources that you would like to see hosted in BigQuery tables, please let us know at feedback@isb-cgc.org.
DNA Methylation Platform¶
Most of the DNA Methylation data produced by the TCGA project was obtained using the Illumina Infinium HumanMethylation450 (aka 450k) BeadChip array. Some of the earlier tumor types were assayed on the older, 27k array.
Although additional details can be found at the Illumina webpage, we have uploaded the platform
annotation information into the BigQuery table isb-cgc:platform_reference.methylation_annotation.
Each CpG locus is uniquely identified as described in this technical note, and this unique identifier can be used to look up and cross-reference data between the TCGA DNA methylation data table and the platform annotation table.
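The cross-referencing idea can be sketched in plain Python before writing any BigQuery SQL. This is a minimal illustration only: the field names (`probe_id`, `chromosome`, `position`, `beta_value`, `sample`) and all values are made-up placeholders, not the actual schema or contents of either table, so check the real column names in the BigQuery web UI first.

```python
# Toy rows standing in for the two BigQuery tables.
# Every field name and value here is hypothetical.
annotation_rows = [
    {"probe_id": "cg_example_1", "chromosome": "1", "position": 1000},
    {"probe_id": "cg_example_2", "chromosome": "2", "position": 2000},
]
methylation_rows = [
    {"sample": "sample_A", "probe_id": "cg_example_1", "beta_value": 0.82},
    {"sample": "sample_A", "probe_id": "cg_example_3", "beta_value": 0.10},
    {"sample": "sample_B", "probe_id": "cg_example_2", "beta_value": 0.45},
]


def annotate(measurements, annotations, key="probe_id"):
    """Attach annotation fields to each measurement that shares the key."""
    # Index the annotation rows by the unique CpG identifier.
    lookup = {row[key]: row for row in annotations}
    joined = []
    for measurement in measurements:
        annotation = lookup.get(measurement[key])
        if annotation is None:
            continue  # inner-join behaviour: drop probes with no annotation
        joined.append({**annotation, **measurement})
    return joined


annotated = annotate(methylation_rows, annotation_rows)
```

In BigQuery the same operation would be a `JOIN` between the two tables on the CpG identifier column.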
Genome Reference Data¶
This section covers reference data that describes or annotates the human (or other) genome(s).
Reference data hosted by the ISB-CGC in BigQuery tables are available in the isb-cgc:genome_reference
dataset.
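Table names on this page are written in the `project:dataset.table` form used by BigQuery legacy SQL, where a table is referenced as `[project:dataset.table]`. Standard SQL instead expects a backtick-quoted `project.dataset.table`. The helper below is a small convenience sketch for converting between the two; the function name `to_standard_sql` is our own and is not part of any BigQuery library.

```python
def to_standard_sql(legacy_ref):
    """Convert 'project:dataset.table' (optionally wrapped in [ ]) to a
    backtick-quoted standard SQL table name."""
    ref = legacy_ref.strip().strip("[]")
    project, sep, rest = ref.partition(":")
    if not sep or "." not in rest:
        raise ValueError(
            f"expected project:dataset.table, got {legacy_ref!r}"
        )
    dataset, table = rest.split(".", 1)
    return f"`{project}.{dataset}.{table}`"


mirtarbase = to_standard_sql("isb-cgc:genome_reference.miRTarBase")
annotation = to_standard_sql(
    "[isb-cgc:platform_reference.methylation_annotation]"
)
```

The backticks matter here because the project name `isb-cgc` contains a hyphen.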
GENCODE¶
Release 19, the final build of the GENCODE geneset mapped to GRCh37, has been uploaded as a BigQuery table
called GENCODE_r19. This table can be used to find the genomic coordinates for a gene of interest, in
combination with queries against molecular tables such as the TCGA copy-number data.
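As a rough sketch of that workflow, the snippet below builds a standard SQL query string for looking up a gene's coordinates, plus a small helper for testing whether a copy-number segment overlaps the gene. The column names (`seqname`, `start`, `end`, `strand`, `feature`, `gene_name`) are an assumption based on the usual GTF field names, so verify them against the table schema before use. The function names are ours, and actually running the query requires a BigQuery client, which is not shown.

```python
GENCODE_TABLE = "`isb-cgc.genome_reference.GENCODE_r19`"


def gene_coordinates_query(table=GENCODE_TABLE):
    """Return a standard SQL query with a named parameter @gene_name.

    Column names are ASSUMED to follow GTF conventions; check the schema.
    `end` is backtick-quoted because END is a reserved word in standard SQL.
    """
    return (
        "SELECT seqname, `start`, `end`, strand\n"
        f"FROM {table}\n"
        "WHERE feature = 'gene' AND gene_name = @gene_name"
    )


def overlaps(seg_start, seg_end, gene_start, gene_end):
    """True if two closed intervals on the same chromosome share a base."""
    return seg_start <= gene_end and gene_start <= seg_end


query = gene_coordinates_query()

# Toy coordinates, not real gene or segment positions.
segment_hits_gene = overlaps(100, 200, 150, 300)
segment_misses_gene = overlaps(100, 200, 201, 300)
```

Using a named parameter such as `@gene_name` keeps the gene symbol out of the SQL text itself, so the same query string can be reused for different genes.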
miRBase¶
The human portion of version 20 of the miRBase database has been uploaded as a BigQuery table
called miRBase_v20. This table can be used to map between MIMAT accession IDs, miR names, and
mature miR names. The miR sequence can also be retrieved from this table.
miRTarBase¶
The recently updated miRTarBase database (release 6.1) is available as the BigQuery table isb-cgc:genome_reference.miRTarBase.
Other Reference Data Sources¶
Google Genomics maintains a list of publicly available datasets, including Reference Genomes, the Illumina Platinum Genomes, and information about the Tute Genomics Annotation table.