This directory contains various analysis modules in the OpenPedCan project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules.
This is in service of documenting interdependent analyses.
Note that nearly all modules use the harmonized clinical data file (histologies.tsv) even when it is not explicitly included in the table below.
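Because the table records which files each module reads and writes, module interdependencies can be derived from it mechanically. The following Python sketch is illustrative only and is not part of the repository. The `MODULES` dictionary hand-transcribes five rows of the table and matches files by basename (so `breakpoint-data/union_of_breaks_densities.tsv` and `analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv` count as the same file). The hypothetical helpers `upstream_modules` and `run_order` then use Kahn's topological sort to list each producer before its consumers.

```python
# Hypothetical, hand-transcribed subset of the module table (file basenames only).
MODULES = {
    "chromosomal-instability": {
        "inputs": {"histologies.tsv", "sv-manta.tsv.gz", "cnv-cnvkit.seg.gz"},
        "outputs": {"union_of_breaks_densities.tsv"},
    },
    "fusion-summary": {
        "inputs": {"histologies.tsv", "fusion-putative-oncogenic.tsv",
                   "fusion-arriba.tsv.gz", "fusion-starfusion.tsv.gz"},
        "outputs": {"fusion_summary_embryonal_foi.tsv",
                    "fusion_summary_ependymoma_foi.tsv",
                    "fusion_summary_ewings_foi.tsv"},
    },
    "gene-set-enrichment-analysis": {
        "inputs": {"gene-expression-rsem-tpm-collapsed.rds", "histologies.tsv"},
        "outputs": {"gsva_scores.tsv"},
    },
    "molecular-subtyping-EPN": {
        "inputs": {"histologies-base.tsv", "gene-expression-rsem-tpm-collapsed.rds",
                   "union_of_breaks_densities.tsv",
                   "fusion_summary_ependymoma_foi.tsv", "gsva_scores.tsv"},
        "outputs": {"EPN_all_data_withsubgroup.tsv"},
    },
    "molecular-subtyping-EWS": {
        "inputs": {"histologies-base.tsv", "fusion_summary_ewings_foi.tsv"},
        "outputs": {"EWS_samples.tsv"},
    },
}


def upstream_modules(modules):
    """Map each module to the set of other modules whose outputs it reads."""
    producer = {f: name for name, spec in modules.items() for f in spec["outputs"]}
    return {
        name: {producer[f] for f in spec["inputs"]
               if f in producer and producer[f] != name}
        for name, spec in modules.items()
    }


def run_order(modules):
    """Kahn's topological sort: each module appears after the modules it depends on."""
    pending = upstream_modules(modules)  # fresh sets, safe to mutate
    order = []
    ready = sorted(name for name, deps in pending.items() if not deps)
    while ready:
        current = ready.pop(0)
        order.append(current)
        for name in sorted(pending):
            deps = pending[name]
            if current in deps:
                deps.discard(current)
                if not deps:  # last dependency satisfied
                    ready.append(name)
        ready.sort()  # deterministic, alphabetical tie-breaking
    if len(order) != len(pending):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    for module in run_order(MODULES):
        print(module)
```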
| Module | Input Files | Brief Description | Produces files for data release? | Output Files Consumed by Other Analyses | Adapted for OPC? | Run Platform | Action Plan |
|---|---|---|---|---|---|---|---|
| chromosomal-instability | histologies.tsv<br>sv-manta.tsv.gz<br>cnv-cnvkit.seg.gz | Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | No | breakpoint-data/union_of_breaks_densities.tsv | No | N/A | Will Adapt for OT |
| chromothripsis | sv-manta.tsv.gz<br>cnv-consensus.seg.gz<br>independent-specimens.wgs.primary-plus.tsv | Chromothripsis analysis per #1007 | No | N/A | No | N/A | N/A |
| cnv-chrom-plot | cnv-consensus-gistic.zip<br>cnv-consensus.seg | Plots genome-wide visualizations relating to copy number results | No | N/A | No | N/A | N/A |
| cnv-frequencies (DEPRECATED) | histologies.tsv<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz<br>independent-specimens.wgswxspanel.primary.eachcohort.tsv<br>independent-specimens.wgswxspanel.relapse.eachcohort.tsv<br>independent-specimens.wgswxspanel.primary.tsv<br>independent-specimens.wgswxspanel.relapse.tsv | Annotates the CNV table with mutation frequencies | No | results/gene-level-cnv-consensus-annotated-mut-freq.jsonl.gz<br>results/gene-level-cnv-consensus-annotated-mut-freq.tsv.gz | Yes | GitHub | N/A |
| collapse-rnaseq (DEPRECATED) | gene-expression-rsem-tpm.rds<br>gencode.v39.primary_assembly.annotation.gtf.gz | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated | Yes | results/gene-expression-rsem-fpkm-collapsed.rds (included in data download; too large for tracking via GitHub) | Yes | CAVATICA | N/A |
| comparative-RNASeq-analysis (DEPRECATED) | gene-expression-rsem-tpm.rds<br>histologies.tsv<br>mend-qc-manifest.tsv<br>mend-qc-results.tar.gz | In progress; will produce expression outlier profiles per #229 | No | N/A | No | N/A | N/A |
| compare-gistic (DEPRECATED) | cnv-consensus-gistic.zip<br>analyses/run-gistic/results/cnv-consensus-hgat-gistic.zip<br>analyses/run-gistic/results/cnv-consensus-lgat-gistic.zip<br>analyses/run-gistic/results/cnv-consensus-medulloblastoma-gistic.zip | Compares the GISTIC results of the entire cohort with those of three individual histologies (LGAT, HGAT, and medulloblastoma) #547 | No | N/A | No | N/A | N/A |
| copy_number_consensus_call | cnv-cnvkit.seg.gz<br>cnv-controlfreec.tsv.gz<br>sv-manta.tsv.gz | Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | Yes | results/cnv_consensus.tsv<br>results/uncalled_samples.tsv<br>results/cnv-consensus.seg.gz (included in data download)<br>ref/cnv_excluded_regions.bed<br>ref/cnv_callable.bed | Yes | CAVATICA | N/A |
| [create-subset-files](https://github.com/d3b-center/OpenPedCan-analysis/blob/dev/analyses/create-subset-files) | All files | This module contains the code to create the subset files used in continuous integration | No | All subset files for continuous integration | No | N/A | Will set up for OT ticket in |
| [data-pre-release-qc](https://github.com/d3b-center/OpenPedCan-analysis/blob/dev/analyses/data-pre-release-qc) | histologies-base.tsv<br>gene-counts-rsem-expected_count-collapsed.rds<br>gene-expression-rsem-tpm-collapsed.rds<br>tcga-gene-counts-rsem-expected_count-collapsed.rds<br>tcga-gene-expression-rsem-tpm-collapsed.rds<br>cnv-cnvkit.seg.gz<br>cnvkit_with_status.tsv<br>consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz<br>snv-mutation-tmb-all.tsv<br>fusion_summary_embryonal_foi.tsv<br>fusion_summary_ependymoma_foi.tsv<br>fusion_summary_lgg_hgg_foi.tsv<br>fusion_summary_ewings_foi.tsv<br>biospecimen_id_to_bed_map.txt | Performs QC checks on pre-release data files; the checks must pass before hand-off from the BIXU Engineering team to the OpenPedCan team | Yes | data-pre-release-qc.nb.html | No | N/A | N/A |
| [efo-mondo-mapping](https://github.com/d3b-center/OpenPedCan-analysis/blob/dev/analyses/efo-mondo-mapping) (**DEPRECATED**) | histologies.tsv<br>efo-mondo-map.tsv | Contains a file with EFO, MONDO, and NCIT codes for every cancer_group in histologies.tsv and runs a QC script to catch any missed cancer_group | Yes | efo-mondo-mapping.tsv | Yes | N/A | Yes |
| [filter-mtp-tables](https://github.com/d3b-center/OpenPedCan-analysis/blob/dev/analyses/filter-mtp-tables) (**DEPRECATED**) | gencode.v39.primary_assembly.annotation.gtf.gz<br>PMTL_v1.1.tsv<br>histologies.tsv<br>gene-level-snv-consensus-annotated-mut-freq.tsv.gz<br>snv-consensus-plus-hotspots.maf.tsv.gz<br>variant-level-snv-consensus-annotated-mut-freq.tsv.gz<br>gene-level-cnv-consensus-annotated-mut-freq.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz<br>putative-oncogene-fusion-freq.tsv.gz<br>fusion-putative-oncogenic.tsv<br>putative-oncogene-fused-gene-freq.tsv.gz<br>long_n_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz<br>long_n_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz | Removes Ensembl (ENSG) gene identifiers that are not in GENCODE v39 and Ensembl package 104 from the OpenPedCan mutation frequency tables, including the SNV, CNV, fusion, and TPM expression tables | No | All files in the module results directory | Yes | N/A | Yes |
| [focal-cn-file-preparation](https://github.com/d3b-center/OpenPedCan-analysis/blob/dev/analyses/focal-cn-file-preparation) | cnv-cnvkit.seg.gz<br>cnv-controlfreec.tsv.gz<br>gene-expression-rsem-tpm-collapsed.rds<br>cnv-consensus.seg.gz | Maps copy number variant caller segments to gene identifiers; will be updated to account for changes that affect entire cytobands and chromosome arms [#186](https://github.com/d3b-center/OpenPedCan-analysis/issues/186) | Yes | cnvkit_annotated_cn_wxs_autosomes.tsv.gz<br>cnvkit_annotated_cn_wxs_x_and_y.tsv.gz<br>consensus_seg_annotated_cn_autosomes.tsv.gz<br>consensus_seg_annotated_cn_x_and_y.tsv.gz<br>consensus_seg_most_focal_cn_status.tsv.gz<br>consensus_seg_recurrent_focal_cn_units.tsv<br>consensus_seg_with_ucsc_cytoband_status.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz (included in data download)<br>consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz (included in data download) | Yes | CAVATICA | N/A |
| fusion_filtering | fusion-arriba.tsv.gz<br>fusion-starfusion.tsv.gz<br>independent-specimens.rnaseq.primary.tsv<br>independent-specimens.rnaseq.relapse.tsv | Standardizes, filters, and prioritizes fusion calls | Yes | results/fusion-putative-oncogenic.tsv (included in data download)<br>results/fusion-recurrent-fusion-bycancergroup.tsv<br>results/fusion-recurrent-fusion-bysample.tsv<br>results/fusion-recurrently-fused-genes-bycancergroup.tsv<br>results/fusion-recurrently-fused-genes-bysample.tsv | Yes | GitHub | N/A |
| fusion-frequencies (DEPRECATED) | histologies.tsv<br>fusion-putative-oncogenic.tsv<br>fusion-dgd.tsv.gz<br>independent-specimens.rnaseqpanel.primary.tsv<br>independent-specimens.rnaseqpanel.relapse.tsv<br>independent-specimens.rnaseqpanel.primary.eachcohort.tsv<br>independent-specimens.rnaseqpanel.relapse.eachcohort.tsv | Gathers counts and frequencies of fusions per cancer_group and cohort | N/A | results/putative-oncogene-fused-gene-freq.jsonl.gz<br>results/putative-oncogene-fused-gene-freq.tsv.gz<br>results/putative-oncogene-fusion-freq.jsonl.gz<br>results/putative-oncogene-fusion-freq.tsv.gz | Yes | GitHub | N/A |
| fusion-summary | histologies.tsv<br>fusion-putative-oncogenic.tsv<br>fusion-arriba.tsv.gz<br>fusion-starfusion.tsv.gz | Generates summary tables from fusion files (#398; #623) | Yes | results/fusion_summary_embryonal_foi.tsv<br>results/fusion_summary_ependymoma_foi.tsv<br>results/fusion_summary_ewings_foi.tsv | Yes | GitHub | N/A |
| gene_match (DEPRECATED) | GTF file sources: GENCODE v28, GENCODE v38<br>open_ped_can_v7_ensg-hugo-rmtl-mapping.tsv | Reads GTF files and parses their attributes to extract gene symbols together with their Ensembl gene IDs | Yes | ensg-hugo-pmtl-mapping.tsv | Yes | GitHub | N/A |
| gene-set-enrichment-analysis | gene-expression-rsem-tpm-collapsed.rds<br>histologies.tsv | Updated gene set enrichment analysis with appropriate RNA-seq expression data | No | results/gsva_scores.tsv (combined file for all RNA library types) | Yes | GitHub | Move to CAVATICA |
| hotspots-detection (DEPRECATED) | snv-strelka2.vep.maf.gz<br>snv-mutect2.vep.maf.gz<br>snv-vardict.vep.maf.gz<br>snv-lancet.vep.maf.gz | Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-caller workflow | No | snv-hotspots-mutation.maf.tsv.gz | No | CAVATICA | N/A |
| immune-deconv | gene-expression-rsem-tpm-collapsed.rds<br>data/histologies.tsv | Immune/stroma characterization across PBTA; part of #15 | No | xcell_output.rds<br>quantiseq_output.rds | No | N/A | N/A |
| independent-samples | histologies.tsv | Generates independent specimen lists for WGS/WXS/panel and RNA-seq samples | Yes | results/independent-specimens.wgswxspanel.primary.tsv<br>results/independent-specimens.wgswxspanel.relapse.tsv<br>results/independent-specimens.wgswxspanel.primary.eachcohort.tsv<br>results/independent-specimens.wgswxspanel.relapse.eachcohort.tsv<br>results/independent-specimens.wgswxspanel.primary.prefer.wxs.tsv<br>results/independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv<br>results/independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv<br>results/independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv<br>results/independent-specimens.rnaseq.primary.tsv<br>results/independent-specimens.rnaseq.relapse.tsv<br>results/independent-specimens.rnaseq.primary.eachcohort.tsv<br>results/independent-specimens.rnaseq.relapse.eachcohort.tsv<br>(all included in data download) | Yes | GitHub | N/A |
| interaction-plots | independent-specimens.wgs.primary-plus.tsv<br>snv-consensus-mutation.maf.tsv.gz | Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types, e.g., fusions | No | N/A | No | N/A | N/A |
| long-format-table-utils (DEPRECATED) | ensg-hugo-rmtl-mapping.tsv<br>analyses/fusion_filtering/references/genelistreference.txt<br>efo-mondo-map.tsv<br>uberon-map-gtex-group.tsv<br>uberon-map-gtex-subgroup.tsv | Functions and scripts for handling long-format tables | No | annotator/annotation-data/ensg-gene-full-name-refseq-protein.tsv<br>annotator/annotation-data/oncokb-cancer-gene-list.tsv | Yes | GitHub | N/A |
| methylation-preprocessing (DEPRECATED) | TARGET_Normal_MethylationArray_20160812.sdrf.txt<br>TARGET_NBL_MethylationArray_20160812.sdrf.1.txt<br>TARGET_NBL_MethylationArray_20160812.sdrf.2.txt<br>TARGET_CCSK_MethylationArray_20160819.sdrf.txt<br>TARGET_OS_MethylationArray_20161103.sdrf.txt<br>TARGET_WT_MethylationArray_20160831.sdrf.txt<br>TARGET_AML_MethylationArray_20160812_450k.sdrf.1.txt<br>TARGET_AML_MethylationArray_20160812_450k.sdrf.2.txt<br>TARGET_AML_MethylationArray_20160812_27k.sdrf.1.txt<br>TARGET_AML_MethylationArray_20160812_27k.sdrf.2.txt<br>TARGET_AML_MethylationArray_20160812_27k.sdrf.3.txt<br>manifest_methylation_CBTN_20220410.1.csv<br>manifest_methylation_CBTN_20220410.2.csv<br>manifest_methylation_CBTN_20220410.3.csv<br>manifest_methylation_CBTN_20220410.4.csv | Preprocesses probe hybridization intensity values of selected methylated and unmethylated cytosine (CpG) loci in the Pediatric Open Targets (OpenPedCan-analysis) raw DNA methylation array datasets into usable methylation measurements | No | N/A | Yes | CAVATICA | N/A |
| methylation-summary (DEPRECATED) | infinium.gencode.v39.probe.annotations.tsv.gz<br>independent-specimens.rnaseqpanel.eachcohort.tsv<br>independent-specimens.methyl.eachcohort.tsv<br>gene-expression-rsem-tpm-collapsed.rds<br>rna-isoform-expression-rsem-tpm.rds<br>methyl-beta-values.rds<br>efo-mondo-map.tsv<br>histologies.tsv | Summarizes preprocessed Illumina Infinium Human Methylation array measurements produced by the OpenPedCan methylation-preprocessing module, together with Illumina Infinium methylation array CpG probe coordinates | No | N/A | No | AWS | N/A |
| molecular-subtyping-ATRT | histologies-base.tsv | Molecular subtyping of ATRT samples | No | N/A | GitHub | N/A | |
| molecular-subtyping-CRANIO | histologies-base.tsv<br>snv-consensus-plus-hotspots.maf.tsv.gz | Molecular subtyping of craniopharyngioma samples #810 | No | results/CRANIO_molecular_subtype.tsv | No | N/A | Prepare for scaling |
| molecular-subtyping-EPN | histologies-base.tsv<br>gene-expression-rsem-tpm-collapsed.rds<br>analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv<br>analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv<br>analyses/gene-set-enrichment-analysis/results/gsva_scores.tsv | Molecular subtyping of ependymoma tumors | No | results/EPN_all_data_withsubgroup.tsv | No | N/A | Will Adapt for OT |
| molecular-subtyping-EWS | histologies-base.tsv<br>analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv | Reclassification of tumors based on the presence of defining fusions for Ewing sarcoma per #623 | No | results/EWS_samples.tsv | No | N/A | Will Adapt for OT |
| molecular-subtyping-HGG | histologies-base.tsv<br>snv-consensus-plus-hotspots.maf.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz<br>fusion-putative-oncogenic.tsv<br>cnv-consensus-gistic.zip<br>gene-expression-rsem-tpm-collapsed.rds<br>tp53_altered_status.tsv | Molecular subtyping of high-grade glioma samples #249 | No | results/HGG_molecular_subtype.tsv | Yes | GitHub | N/A |
| molecular-subtyping-LGAT | histologies-base.tsv<br>snv-consensus-plus-hotspots.maf.tsv.gz<br>fusion-putative-oncogenic.tsv<br>analyses/fusion_filtering/results/fusion-recurrently-fused-genes-bysample.tsv | Molecular subtyping of low-grade astrocytic tumor samples #631 | No | results/lgat_subtyping.tsv | Yes | GitHub | N/A |
| molecular-subtyping-MB | histologies-base.tsv<br>gene-expression-rsem-tpm-collapsed.rds | Molecular classification of medulloblastoma subtypes; part of #116 | No | results/MB_molecular_subtype.tsv | Yes | GitHub | N/A |
| molecular-subtyping-SHH-tp53 | histologies<br>snv-consensus-plus-hotspots.maf.tsv.gz | Deprecated; identifies the SHH-classified medulloblastoma samples that have TP53 mutations #247 | No | N/A | No | N/A | N/A |
| molecular-subtyping-chordoma | analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz<br>gene-expression-rsem-fpkm-collapsed.stranded.rds | Identifies poorly differentiated chordoma samples per #250 | No | N/A | No | N/A | Will Adapt for OT |
| molecular-subtyping-embryonal | histologies-base.tsv<br>analyses/fusion-summary/fusion_summary_embryonal_foi.tsv<br>sv-manta.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz<br>analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz<br>analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x_and_y.tsv.gz<br>gene-expression-rsem-tpm-collapsed.rds | Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 | No | results/embryonal_tumor_molecular_subtypes.tsv | No | N/A | Will Adapt for OT |
| molecular-subtyping-integrate | histologies-base.tsv<br>results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv | Adds molecular subtype information to the base histologies file | No | results/histologies.tsv | Yes | GitHub | N/A |
| molecular-subtyping-NBL | histologies-base.tsv<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz<br>cnv-cnvkit.seg.gz<br>cnv-controlfreec.tsv.gz<br>gene-expression-rsem-tpm-collapsed.rds<br>analyses/molecular-subtyping-NBL/input/gmkf_patient_clinical_mycn_status.tsv<br>analyses/molecular-subtyping-NBL/input/target_patient_clinical_mycn_status.tsv | Molecular subtyping of NBL tumors #417 | No | results/NBL_MYCN_Subtype.tsv<br>results/Alteration_Table.tsv<br>results/Subtypes_Based_On_Cutoff.tsv<br>results/QC_table.tsv | Yes | EC2 | N/A |
| molecular-subtyping-neurocytoma | histologies-base.tsv | Molecular subtyping of neurocytoma samples #805 | No | results/neurocytoma_subtyping.tsv | No | N/A | Will Adapt for OT |
| molecular-subtyping-pathology | analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv<br>analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv<br>analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv<br>analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv<br>analyses/molecular-subtyping-EWS/results/EWS_samples.tsv<br>analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv<br>analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv<br>analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv | Compiles output from the other molecular subtyping modules and incorporates pathology feedback #645 | No | choroid_plexus_papilloma_subtypes.tsv<br>cns-lymphoma-subtypes.tsv<br>compiled_molecular_subtypes.tsv<br>compiled_molecular_subtypes_and_report_info.tsv<br>compiled_molecular_subtypes_with_clinical_feedback_and_report_info.tsv<br>compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv<br>cranio_adam_subtypes.tsv<br>glialneuronal_tumor_subtypes.tsv<br>juvenile-xanthogranuloma-subtypes.tsv<br>lgat-pathology-free-text-subtypes.tsv<br>meningioma_subtypes.tsv | Yes | GitHub | N/A |
| molecular-subtyping-PB | histologies-base.tsv | Molecular subtyping of pineoblastoma samples PR #476 | No | results/pineo-molecular-subtypes.tsv | Yes | GitHub | N/A |
| mtp-annotations (DEPRECATED) | scratch/mtp-json/targets/<br>scratch/mtp-json/diseases/ | Transforms the Open Targets Platform Target (core annotations for targets) and Disease/Phenotype (core annotations for diseases and phenotypes) tables into mapping files used to filter MTP-designated tables and OPC data release files for plotting API development | No | N/A | local | N/A | N/A |
| mtp-tables-qc-checks (DEPRECATED) | gene-level-cnv-consensus-annotated-mut-freq.tsv.gz<br>gene-level-snv-consensus-annotated-mut-freq.tsv.gz<br>gene-variant-snv-consensus-annotated-mut-freq.tsv.gz<br>putative-oncogene-fused-gene-freq.tsv.gz<br>putative-oncogene-fusion-freq.tsv.gz<br>long_n_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz<br>long_n_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz | Performs summary and QC checks comparing the current and previous OPC mutation frequency tables | No | N/A | No | N/A | N/A |
| mutational-signatures | snv-consensus-plus-hotspots.maf.tsv.gz | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | No | N/A | No | N/A | N/A |
| mutect2-vs-strelka2 (DEPRECATED) | snv-mutect2.vep.maf.gz<br>snv-strelka2.vep.maf.gz | Deprecated; comparison of only two SNV callers, subsumed by snv-callers | No | N/A | No | N/A | N/A |
| oncoprint-landscape | snv-consensus-plus-hotspots.maf.tsv.gz<br>fusion-putative-oncogenic.tsv<br>analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz<br>independent-specimens.\* | Combines mutation, copy number, and fusion data into an OncoPrint plot #6; will need to be updated as all data types are refined | No | N/A | No | N/A | N/A |
| pedcbio-cnv-prepare | consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz<br>consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz | Generates annotated CNV files, similar to SEG files, for PedCBio upload that include all samples with neutral CNV calls | Yes | Upload to PedCBio S3 bucket for ingestion | GitHub | N/A | N/A |
| pedcbio-sample-name | histologies.tsv<br>input/cbtn_cbio_sample.csv<br>input/dgd_cbio_sample.csv<br>input/oligo_nation_cbio_sample.csv<br>input/x01_fy16_nbl_maris_cbio_sample.csv | Creates PedCBio sample names: when multiple DNA or RNA specimens are associated with the same sample, no existing column distinguishes between the aliquots while still tying DNA and RNA together | Yes | Upload to PedCBio S3 bucket for ingestion | GitHub | N/A | N/A |
| pedot-table-column-display-order-name | analyses/snv-frequencies/results/gene-level-snv-consensus-annotated-mut-freq.tsv<br>analyses/snv-frequencies/results/variant-level-snv-consensus-annotated-mut-freq.tsv.gz<br>analyses/cnv-frequencies/results/gene-level-cnv-consensus-annotated-mut-freq.tsv.gz<br>analyses/fusion-frequencies/results/putative-oncogene-fused-gene-freq.tsv.gz<br>analyses/fusion-frequencies/results/putative-oncogene-fusion-freq.tsv.gz<br>analyses/rna-seq-expression-summary-stats/results/long_n_tpm_mean_sd_quantile_gene_wise_zscore.tsv.gz<br>analyses/rna-seq-expression-summary-stats/results/long_n_tpm_mean_sd_quantile_group_wise_zscore.tsv.gz | Generates and validates an Excel spreadsheet of table display orders and names for the Pediatric Open Targets (PedOT) website | No | Upload to FNL BOX | Yes | GitHub | N/A |
| rna-seq-composition (DEPRECATED) | gene-expression-rsem-tpm.rds<br>histologies.tsv<br>mend-qc-results.tar.gz<br>mend-qc-manifest.tsv<br>star-log-manifest.tsv<br>star-log-final.tar.gz | Analyzes the fraction of read types that comprise each RNA-seq sample; flags samples with unusual composition | No | N/A | No | N/A | N/A |
| rnaseq-batch-correct | gene-counts-rsem-expected_count-collapsed.rds<br>histologies.tsv<br>hk_genes_normals.rds<br>[positive_control_genes].rds | RUVSeq-DESeq2 batch-corrected DGE analysis | Yes | N/A | Yes | GitHub | N/A |
| rna-seq-expression-summary-stats (DEPRECATED) | gene-expression-rsem-tpm-collapsed.rds<br>histologies.tsv | Calculates TPM summary statistics within each cancer group and cohort #51 | No | Upload to FNL Box | Yes | GitHub | N/A |
| run-gistic | histologies.tsv<br>cnv-consensus.seg.gz | Runs GISTIC 2.0 on SEG files | Yes | cnv-consensus-gistic.zip (included in data download) | Yes | GitHub | Move to CAVATICA |
| sample-distribution-analysis (DEPRECATED) | histologies.tsv | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | No | N/A | No | N/A | N/A |
| sex-prediction-from-RNASeq (DEPRECATED) | gene-expression-kallisto.stranded.rds<br>histologies.tsv | Predicts genetic sex using RNA-seq data #84 | No | N/A | No | N/A | N/A |
| snv-frequencies (DEPRECATED) | histologies.tsv<br>snv-consensus-plus-hotspots.maf.tsv.gz<br>snv-dgd.maf.tsv.gz<br>independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.primary.prefer.wxs.tsv<br>independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv | Annotates the SNV table with mutation frequencies | No | results/gene-level-snv-consensus-annotated-mut-freq.jsonl.gz<br>results/gene-level-snv-consensus-annotated-mut-freq.tsv.gz<br>variant-level-snv-consensus-annotated-mut-freq.jsonl.gz<br>variant-level-snv-consensus-annotated-mut-freq.tsv.gz | Yes | GitHub | N/A |
| survival-analysis | TBD | In progress; will eventually contain functions for various types of survival analysis #18 | No | N/A | No | N/A | N/A |
| telomerase-activity-prediction | gene-expression-rsem-tpm-collapsed.rds<br>gene-counts-rsem-expected_count-collapsed.rds | Quantifies telomerase activity across pediatric brain tumors; part of #148 | No | results/TelomeraseScores_PTBAPolya_counts<br>results/TelomeraseScores_PTBAPolya_FPKM.txt<br>results/TelomeraseScores_PTBAStranded_counts.txt<br>results/TelomeraseScores_PTBAStranded_FPKM.txt | No | N/A | N/A |
| tmb-calculation | gencode.v27.primary_assembly.annotation.bed<br>intersect_strelka_mutect2_vardict_WGS.bed<br>snv-consensus-plus-hotspots.maf.tsv.gz<br>biospecimen_id_to_bed_map.tsv<br>histologies-base.tsv<br>hg38_strelka.bed<br>wgs_canonical_calling_regions.hg38.bed<br>gencode.v39.primary_assembly.annotation.gtf.gz | Calculates tumor mutation burden (TMB); adapted from the snv-callers module of OpenPBTA-analysis, but uses the consensus SNV calls from 2 of the 4 callers (Mutect2, Strelka2, Lancet, and VarDict) | Yes | snv-mutation-tmb-all.tsv<br>snv-mutation-tmb-coding.tsv | Yes | GitHub | N/A |
| tmb-compare (DEPRECATED) | snv-consensus-mutation-tmb-coding.tsv | Compares PBTA tumor mutation burden to adult TCGA data. The D3B TMB calculations (TMB_d3b_code) and their comparison notebook (compare-tmb-calculations.Rmd) are deprecated | No | N/A | No | N/A | N/A |
| tp53_nf1_score | snv-consensus-plus-hotspots.maf.tsv<br>gene-expression-rsem-tpm-collapsed.rds<br>consensus_wgs_plus_cnvkit_wxs.tsv.gz | Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 | No | TP53_NF1_snv_alteration.tsv<br>gene-expression-rsem-tpm-collapsed_classifier_scores.tsv<br>loss_overlap_domains_tp53.tsv<br>poly-A_TP53.png<br>stranded_TP53.png<br>sv_overlap_tp53.tsv<br>tp53_altered_status.tsv | Yes | GitHub | N/A |
| transcriptomic-dimension-reduction | gene-expression-rsem-tpm.rds<br>gene-expression-kallisto.rds | Dimension reduction and visualization of RNA-seq data; part of #9 | No | N/A | No | N/A | N/A |
| tcga-capture-kit-investigation (DEPRECATED) | snv-lancet.vep.maf.gz<br>snv-mutect2.vep.maf.gz<br>snv-strelka2.vep.maf.gz<br>tcga-snv-lancet.vep.maf.gz<br>tcga-snv-mutect2.vep.maf.gz<br>tcga-snv-strelka2.vep.maf.gz<br>histologies.tsv<br>tcga-manifest.tsv<br>WGS.hg38.lancet.unpadded.bed<br>WGS.hg38.strelka2.unpadded.bed<br>WGS.hg38.mutect2.vardict.unpadded.bed | Investigation of the TMB discrepancy between PBTA and TCGA data | No | results/*.bed | No | GitHub | N/A |
| tumor-gtex-plots (DEPRECATED) | gene-expression-rsem-tpm-collapsed.rds<br>histologies.tsv | In progress #38; tumor vs. normal and tumor-only expression plots | No | results/pan_cancer_plots_cancer_group_level.{tsv, jsonl.gz}<br>results/pan_cancer_plots_cohort_cancer_group_level.{tsv, jsonl.gz}<br>results/tumor_normal_gtex_plots_cancer_group_level.{tsv, jsonl.gz}<br>results/tumor_normal_gtex_plots_cohort_cancer_group_level.{tsv, jsonl.gz}<br>results/metadata.tsv<br>plots/\*.png | Yes | GitHub | N/A |
| tumor-normal-differential-expression (DEPRECATED) | histologies.tsv<br>gene-counts-rsem-expected_count-collapsed.rds<br>independent-specimens.rnaseq.primary.tsv<br>independent-specimens.rnaseq.primary.eachcohort.tsv<br>gene-expression-rsem-tpm-collapsed.rds<br>ensg-hugo-pmtl-mapping.tsv<br>efo-mondo-map.tsv<br>uberon-map-gtex-subgroup.tsv | Takes the histologies file and RNA-seq expression matrices as input and performs differential expression analysis for every combination of GTEx normal subgroup and tumor cancer histology | No | N/A | Yes | HPC; CAVATICA users can create an application for personal analysis using the scripts provided in the module | N/A |
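The tmb-calculation row describes TMB as a count of consensus SNV calls over a calling region. As a rough illustration only, and not the module's actual code, the following Python sketch assumes TMB is the number of variants supported by at least two of the four callers that fall inside a target BED region, divided by that region's merged size in megabases. The function names and the in-memory `(chrom, position, callers)` tuples are hypothetical; the module itself reads MAF and BED files.

```python
def merged_size_bp(intervals):
    """Total bp covered by BED intervals (0-based, half-open), merging overlaps."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current span
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def in_regions(chrom, pos, intervals):
    """True if a 1-based MAF position falls inside any 0-based half-open BED interval."""
    return any(c == chrom and start <= pos - 1 < end for c, start, end in intervals)


def tmb_per_mb(variants, intervals, min_callers=2):
    """Mutations per megabase among variants supported by >= min_callers callers.

    variants: iterable of (chrom, 1-based position, set of caller names) -- hypothetical format.
    intervals: iterable of (chrom, start, end) BED regions.
    """
    intervals = list(intervals)
    size_mb = merged_size_bp(intervals) / 1_000_000
    if size_mb == 0:
        raise ValueError("empty target region")
    count = sum(
        1
        for chrom, pos, callers in variants
        if len(callers) >= min_callers and in_regions(chrom, pos, intervals)
    )
    return count / size_mb


if __name__ == "__main__":
    regions = [("chr1", 0, 1_000_000), ("chr2", 0, 1_000_000)]  # 2 Mb in total
    calls = [
        ("chr1", 100, {"mutect2", "strelka2"}),            # counted
        ("chr1", 200, {"lancet"}),                         # single caller: dropped
        ("chr2", 500, {"vardict", "lancet", "strelka2"}),  # counted
        ("chr3", 10, {"mutect2", "lancet"}),               # outside regions: dropped
    ]
    print(tmb_per_mb(calls, regions))  # prints 1.0
```

Merging overlapping intervals before measuring the region keeps duplicated BED entries from inflating the denominator, and the `pos - 1` comparison reconciles the 1-based MAF coordinates with the 0-based, half-open BED convention.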