OpenPedCan-analysis

Data file descriptions

This document describes all data files associated with this project. For each file, it lists the file type, the origin, and a short description.

current release (v14)

| File name | File Type | Origin | File Description |
|---|---|---|---|
| histologies-base.tsv | Data file | Cohort-specific data files and databases | Clinical and sequencing metadata for each biospecimen |
| histologies.tsv | Modified data file | molecular-subtyping-integrate | histologies-base.tsv plus molecular_subtype, cancer_group, integrated_diagnosis, and harmonized_diagnosis |
| intersect_cds_lancet_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v39.primary_assembly.annotation.gtf.gz CDS with Lancet, Strelka2, and Mutect2 regions |
| intersect_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v39.primary_assembly.annotation.gtf.gz CDS with Strelka2 and Mutect2 regions called |
| efo-mondo-map.tsv | Reference mapping file | Manual collation | Mapping of EFO and MONDO codes to cancer groups |
| efo-mondo-map-prefill.tsv | Modified reference mapping file | Analysis file generated in molecular-subtyping-integrate | Mapping of EFO and MONDO codes to cancer groups |
| ensg-hugo-pmtl-mapping.tsv | Reference mapping file | Manual curation of PMTLv1.1 by FNL; RNA-Seq pipeline GTF mapping | File which maps Hugo Symbols to ENSEMBL gene IDs and each ENSG to the RMTL curated by FNL |
| *.bed | Reference file | Manual collation | BED files used for variant calling and for TMB calculation |
| uberon-map-gtex-group.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx broad groups |
| uberon-map-gtex-subgroup.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx subgroups |
| methyl-beta-values.rds | Processed data file | methylation beta values | Methylation beta values |
| methyl-m-values.rds | Processed data file | methylation m values | Methylation m values |
| rna-isoform-expression-rsem-tpm.rds | Processed data file | RNA isoform TPM files | RNA isoform TPM files |
| fusion-dgd.tsv | Processed data file | DGD merged fusion results | DGD merged fusion results |
| fusion-arriba.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - Arriba TSV, annotated with FusionAnnotator |
| fusion-starfusion.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - STARFusion TSV |
| fusion-annoFuse.tsv.gz | Processed data file | AnnoFuse QC filtered fusion file; Workflow | Filter out normal and non-expressed fusions |
| fusion_summary_embryonal_foi.tsv | Analysis file | fusion-summary | Summary file for presence of embryonal tumor fusions of interest |
| fusion_summary_ependymoma_foi.tsv | Analysis file | fusion-summary | Summary file for presence of ependymal tumor fusions of interest |
| fusion_summary_ewings_foi.tsv | Analysis file | fusion-summary | Summary file for presence of Ewing’s sarcoma fusions of interest |
| fusion_summary_lgg_hgg_foi.tsv | Analysis file | fusion-summary | Summary file for presence of LGG and HGG fusions of interest |
| fusion-putative-oncogenic.tsv | Analysis file | fusion_filtering | Filtered and prioritized fusions |
| gene-counts-rsem-expected_count-collapsed.rds | Analysis file | PBTA+GMKF+TARGET collapse-rnaseq | Gene expression - RSEM expected_count for each sample collapsed to gene symbol (gene-level) |
| gene-expression-rsem-tpm-collapsed.rds | Analysis file | PBTA+GMKF+TARGET collapse-rnaseq | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
| tcga_gene-counts-rsem-expected_count-collapsed.rds | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM counts for each sample collapsed to gene symbol (gene-level) |
| tcga_gene-expression-rsem-tpm-collapsed.rds | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
| gtex_gene-expression-rsem-tpm-collapsed.rds | Modified reference file | GTEx v8 release lifted to GENCODE v39 | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
| gtex_gene-counts-rsem-expected_count-collapsed.rds | Modified reference file | GTEx v8 release lifted to GENCODE v39 | Gene expression - RSEM counts for each sample collapsed to gene symbol (gene-level) |
| WGS.hg38.lancet.300bp_padded.bed | Reference Target/Baits File | SNV and INDEL calling | WGS.hg38.lancet.unpadded.bed file with each region padded by 300 bp |
| WGS.hg38.lancet.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 WGS regions created using UTR, exome, and start/stop codon features of the GENCODE 31 reference, augmented with PASS variant calls from Strelka2 and Mutect2 |
| WGS.hg38.mutect2.vardict.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 BROAD Institute interval calling list (restricted to Chr1-22,X,Y,M and non-N regions) used for Mutect2 and VarDict variant callers |
| WGS.hg38.strelka2.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 BROAD Institute interval calling list (restricted to Chr1-22,X,Y,M) used for Strelka2 variant caller |
| WGS.hg38.vardict.100bp_padded.bed | Reference Regions File | SNV and INDEL calling | WGS.hg38.mutect2.vardict.unpadded.bed with each region padded by 100 bp, used for VarDict variant caller |
| snv-consensus-plus-hotspots.maf.tsv.gz | Analysis file | Kids First somatic workflow consensus calls | Consensus (2 of 4) MAF + 1/4 hotspots |
| snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz | Analysis file | Kids First Tumor Only workflow | Mutect2 tumor only with additional filters to remove t_alt_count <5 |
| cnv-cnvkit.seg.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - CNVkit SEG file |
| cnv-consensus.seg.gz | Analysis file | [copy_number_consensus_call](https://github.com/d3b-center/OpenPedCan-analysis/tree/dev/analyses/copy_number_consensus_call) | Somatic Copy Number Variant - WGS samples only |
| cnvkit_with_status.tsv<br>consensus_seg_with_status.tsv | Analysis files | copy_number_consensus_call | CNVkit calls for WXS or CNV consensus calls for WGS with gain/loss status |
| cnv-consensus-gistic.gz | Analysis file | run-gistic | GISTIC results - WGS samples only |
| cnv-controlfreec.tsv.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - TSV file that is a merge of ControlFreeC *_CNVs files |
| cnv-controlfreec-tumor-only.tsv.gz | Processed data file | Copy number variant calling Workflow - tumor only | Somatic Copy Number Variant - TSV file that is a merge of ControlFreeC *_CNVs files |
| cnv-gatk.seg.gz | Processed data file | Copy number variant calling | Somatic Copy Number Variant - TSV SEG file produced by GATK CNV |
| consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; all chromosomes |
| consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only_x_and_y.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; sex chromosomes only |
| consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only_autosomes.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; autosomes only |
| snv-mutation-tmb-all.tsv | Processed data file | tmb-calculation | TSV file with sample names and their tumor mutation burden counting all variants |
| snv-mutation-tmb-coding.tsv | Processed data file | tmb-calculation | TSV file with sample names and their tumor mutation burden counting all variants in coding regions only |
| sv-manta.tsv.gz | Processed data file | Structural variant calling; Workflow | Somatic Structural Variant - Manta output, annotated with AnnotSV (WGS samples only) |
| splice-events-rmats.tsv.gz | Processed data | Kids First splice variant workflow; Workflow | rMATS single sample workflow |
| cptac-protein-imputed-phospho-expression-log2-ratio.tsv.gz | Processed data | CPTAC pediatric brain tumor phospho-proteomics expression | Imputed phospho-protein expression, log2 TMT ratio |
| cptac-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | CPTAC pediatric brain tumor protein expression | Imputed whole cell protein expression, total abundance |
| cptac-protein-imputed-prot-expression-log2-ratio.tsv.gz | Processed data | CPTAC pediatric brain tumor protein expression | Imputed whole cell protein expression, log2 TMT ratio |
| gbm-protein-imputed-phospho-expression-abundance.tsv.gz | Processed data | CPTAC adult GBM brain tumor phospho-proteomics expression | Imputed phospho-protein expression, total abundance |
| gbm-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | CPTAC adult GBM brain tumor protein expression | Imputed whole cell protein expression, total abundance |
| hope-protein-imputed-phospho-expression-abundance.tsv.gz | Processed data | Adolescent and Young Adult (AYA) brain tumor phospho-proteomics expression (Project HOPE) | Imputed phospho-protein expression, total abundance |
| hope-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | Adolescent and Young Adult (AYA) brain tumor protein expression (Project HOPE) | Imputed whole cell protein expression, total abundance |
| rna-dna-qc-stats.tsv | Reference QC file | Quality control metrics for WGS, WXS, DNA panel, and RNA-Seq samples | Used to filter samples for data release |
| mirna-expression-counts.rds | Processed data | miRNA expression counts | Generated from HTG-Seq |
| independent-specimens.methyl.primary.tsv<br>independent-specimens.methyl.relapse.tsv<br>independent-specimens.rnaseq.primary.eachcohort.tsv<br>independent-specimens.rnaseq.primary.tsv<br>independent-specimens.rnaseq.relapse-pre-release.tsv<br>independent-specimens.rnaseq.relapse.eachcohort.tsv<br>independent-specimens.rnaseq.relapse.tsv<br>independent-specimens.rnaseq.primary-plus-pre-release.tsv<br>independent-specimens.rnaseqpanel.primary-plus.pre-release.tsv<br>independent-specimens.rnaseqpanel.primary-plus.tsv<br>independent-specimens.rnaseqpanel.primary.eachcohort.tsv<br>independent-specimens.rnaseqpanel.primary.tsv<br>independent-specimens.rnaseqpanel.relapse.eachcohort.tsv<br>independent-specimens.rnaseqpanel.relapse.tsv<br>independent-specimens.wgs.primary-plus.eachcohort.tsv<br>independent-specimens.wgs.primary-plus.tsv<br>independent-specimens.wgs.primary.eachcohort.tsv<br>independent-specimens.wgs.primary.tsv<br>independent-specimens.wgs.relapse.eachcohort.tsv<br>independent-specimens.wgs.relapse.tsv<br>independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.primary-plus.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.primary-plus.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.primary.eachcohort.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.primary.eachcohort.tsv<br>independent-specimens.wgswxspanel.primary.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.primary.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.primary.tsv<br>independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.relapse.eachcohort.tsv<br>independent-specimens.wgswxspanel.relapse.prefer.wgs.tsv<br>independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.relapse.tsv<br>independent-specimens.methyl.primary-plus.eachcohort.tsv<br>independent-specimens.methyl.primary.eachcohort.tsv<br>independent-specimens.methyl.relapse.eachcohort.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of DNA, RNA, or methylation samples of all sequencing methods, from primary, primary-plus, or relapse tumors within each or across all cohorts |
| independent-specimens.rnaseq.primary-plus-pre-release.tsv<br>independent-specimens.rnaseq.primary-pre-release.tsv<br>independent-specimens.rnaseq.relapse-pre-release.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of RNA samples of all sequencing methods, from primary, primary-plus, or relapse tumors across all cohorts, for the purposes of running fusion_filtering pre-release |
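
Several of the BED files above are described as an unpadded file "with each region padded by" 100 or 300 bp. The following is a minimal sketch of what such padding could look like. It is not the project's actual procedure: the function names `pad_regions` and `parse_bed_line` and the optional `chrom_sizes` argument are hypothetical, and whether overlapping padded regions are merged afterwards is not stated in this document, so the sketch leaves them unmerged.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (chrom, start, end) using BED's 0-based, half-open coordinates
Region = Tuple[str, int, int]


def parse_bed_line(line: str) -> Region:
    """Read the first three tab-separated BED columns from one line."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def pad_regions(
    regions: Iterable[Region],
    pad: int,
    chrom_sizes: Optional[Dict[str, int]] = None,
) -> Iterator[Region]:
    """Extend each region by `pad` bp on both sides (hypothetical helper).

    Starts are clamped at 0. Ends are clamped only when a chromosome
    length is supplied in `chrom_sizes` (an assumed, optional input).
    """
    for chrom, start, end in regions:
        new_start = max(0, start - pad)
        new_end = end + pad
        if chrom_sizes is not None and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        yield chrom, new_start, new_end


# Made-up example intervals, not taken from any project file
example = ["chr1\t100\t500\n", "chr1\t1000\t1200\n"]
padded = list(pad_regions((parse_bed_line(l) for l in example), pad=300))
print(padded)  # [('chr1', 0, 800), ('chr1', 700, 1500)]
```

The two example regions overlap after padding (0-800 and 700-1500), which illustrates why a merge step may or may not be wanted depending on how the padded file is used downstream.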