Cell-of-origin analysis for "Histone H3.3G34-mutant interneuron progenitors co-opt PDGFRA for gliomagenesis" (Chen*, Deshmukh*, Jessa*, Hadjadj*, et al, Cell, 2020)
This repository contains the code & data for the bulk analysis included in the G34R/V HGG manuscript (Chen, Deshmukh, Jessa, Hadjadj, et al, Cell, 2020), covering the analyses performed by our lab.
This repository is meant to enhance the STAR Methods section by providing code for the custom analyses in the manuscript and the exact R dependencies, in order to improve reproducibility for the main results. However, it is not a fully executable workflow. The code is permanently archived on Zenodo at the doi 10.5281/zenodo.7086446.
NOTE: if viewing on GitHub, only code is visible, as inputs, data, and outputs are generally not tracked in git.
Contents:
- reference_datasets: data, code, and processed outputs for scRNAseq normal brain reference datasets, with one sub-directory per publication. For external publications, the process of obtaining or deriving cluster markers used in later analysis is recorded here.
- bulk_transcriptome_epigenome: data, code, figures and output for bulk RNA-seq and ChIP-seq analysis of patient tumors and tumor-derived cell lines.
  - functions.R and ssgsea.R
- singlecell_normal: data, code, figures and output for analysis of the scRNAseq normal brain data
  - functions.R
- renv: directory maintained by the R package renv, containing the isolated project-specific library
- include: shared templates, Rmd/HTML headers/footers, and R functions used throughout the analysis

This section contains a pointer from each figure in the paper to the section (§) where it’s generated in the code. For each figure panel, I provide a partial path to the RMD/MD/HTML files within this repository/directory, and then the section in the rendered HTML which specifically produces that panel. As described below in the section on reproducibility, the source data for the figure is typically saved alongside the figure itself.
- bulk_transcriptome_epigenome/02-GSEA..., § 6.1.1 Forebrain reference
- bulk_transcriptome_epigenome/02-GSEA..., § 6.1.2 Striatal SVZ
- bulk_transcriptome_epigenome/01-bulk_RNAseq_pipeline..., § 4.3.2 Lineage specific TFs
- bulk_transcriptome_epigenome/03-ChIPseq..., § 4.2.1 DGE
- singlecell_normal/analysis/01-interneuron_pseudotime...
- bulk_transcriptome_epigenome/02-GSEA..., § 6.2 GSEA enrichment plots
- bulk_transcriptome_epigenome/02-GSEA..., § 6.4 Confirmation of signal by direct expression of gene programs
- bulk_transcriptome_epigenome/02-GSEA..., § 6.1.3 Adult V-SVZ
- bulk_transcriptome_epigenome/01-bulk_RNAseq_pipeline..., § 4.3.2 Lineage specific TFs
- bulk_transcriptome_epigenome/analysis/04-isogenic_cell_lines...
- bulk_transcriptome_epigenome/analysis/04-isogenic_cell_lines..., § 4.5.4 Visualize results
- singlecell_normal/analysis/02-gene_bubbleplots...
- bulk_transcriptome_epigenome/01-bulk_RNAseq_pipeline..., § 4.4.1 G34 mutants
- singlecell_normal/analysis/02-gene_bubbleplots..., § 4.2 Human
- singlecell_normal/analysis/03-astrocyte_interneuron_coexpression..., § 5.3 Developing mouse forebrain and 5.4 H3G34R/V tumors
- singlecell_normal/analysis/03-astrocyte_interneuron_coexpression..., § 5.2 Human fetal telencephalon

rr template & helpers

This repository uses the rr template, which contains a set of R markdown templates to help ensure reproducibility. This template also provides a set of helper functions (located in rr_helpers.R and prefixed by rr_ in the function name) to help encourage documentation.
The R library for this project is managed with the package renv, which:
- keeps an isolated, project-specific library in the renv folder,
- records the package versions in a lockfile, renv.lock, which can be used to reproduce the R package environment elsewhere

The R version used is 3.5.1. The analysis also makes use of our in-house package for scRNAseq visualization, cytobox.
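As a rough sketch of how the lockfile might be used to rebuild the package library on another machine: the wrapper name `restore_project_library` is hypothetical and not part of this repository, while `renv::restore()` and its `lockfile` and `prompt` arguments come from renv itself. It assumes renv is installed and that the working directory is the repository root.

```r
# Hypothetical wrapper (not part of this repository): rebuild the project
# library from the lockfile. Assumes the renv package is installed.
restore_project_library <- function(lockfile = "renv.lock") {
  if (!file.exists(lockfile)) {
    stop("Lockfile not found: ", lockfile)
  }
  # The analysis used R 3.5.1; warn on a mismatch, since package versions
  # pinned in the lockfile may not install cleanly on other R versions.
  current <- as.character(getRversion())
  if (current != "3.5.1") {
    warning("Running R ", current, "; the analysis used R 3.5.1")
  }
  # Install the package versions recorded in the lockfile, without prompting.
  renv::restore(lockfile = lockfile, prompt = FALSE)
}

# Usage, from the repository root:
# restore_project_library("renv.lock")
```

In-house packages such as cytobox may not be installable from a public repository, so a restore on another machine may need them to be provided separately.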
Each markdown/HTML file has a “Reproducibility report” at the bottom, indicating when the document was last rendered, the most recent git commit when it was rendered, the seed, and the R session info.
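The kind of information listed in such a report can be gathered with base R alone. The sketch below is illustrative only and is not the rr template's implementation; the function name `reproducibility_report` is hypothetical, and the seed is passed in explicitly on the assumption that the document set it earlier with `set.seed()`.

```r
# Hypothetical helper (not the rr template's code): collect the render time,
# the current git commit, the seed, and the R session info.
reproducibility_report <- function(seed) {
  # Ask git for the short hash of the current commit; fall back to NA if git
  # is unavailable or the directory is not a git repository.
  commit <- tryCatch(
    suppressWarnings(
      system2("git", c("rev-parse", "--short", "HEAD"),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  list(
    rendered_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    git_commit  = if (length(commit) == 1) commit[[1]] else NA_character_,
    seed        = seed,
    session     = sessionInfo()
  )
}

# Usage, e.g. in the last chunk of an R Markdown document:
# set.seed(100)  # the seed value here is an arbitrary example
# str(reproducibility_report(seed = 100), max.level = 1)
```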
For most figures, the source data underlying the plot is saved alongside the figure in the respective figures directory. If so, a message is displayed in the markdown/HTML files underneath the chunk which produces the plot, giving the path for the figures/source data within this project directory, e.g. [figure/source data @ G34-gliomas/bulk_transcriptome_epigenome/figures/01/gsx2_pdgfra_correlation…]
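The pattern of writing a figure and its source data side by side could look roughly like the following. This is a sketch under assumptions, not the repository's own helper: the function name `save_with_source`, the `.source_data.tsv` suffix, and the use of the base `pdf()` device are all choices made for illustration.

```r
# Hypothetical helper: save a plot and the data behind it under one prefix,
# then print a message giving the shared path.
save_with_source <- function(data, draw, prefix) {
  dir.create(dirname(prefix), recursive = TRUE, showWarnings = FALSE)
  fig_path <- paste0(prefix, ".pdf")
  src_path <- paste0(prefix, ".source_data.tsv")  # assumed naming scheme

  # Draw the figure into a PDF device, closing the device even on error.
  grDevices::pdf(fig_path)
  tryCatch(draw(data), finally = grDevices::dev.off())

  # Write the plotted data as a tab-separated file next to the figure.
  utils::write.table(data, src_path, sep = "\t",
                     quote = FALSE, row.names = FALSE)

  message("[figure/source data @ ", prefix, "...]")
  invisible(c(figure = fig_path, source_data = src_path))
}

# Usage with toy data (values are made up for illustration):
# df <- data.frame(x = 1:3, y = c(2, 4, 6))
# save_with_source(df, function(d) plot(d$x, d$y), "figures/example_scatter")
```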
For most text file & R object outputs, there is a text file saved next to the object with the extension .desc, with a very brief one-line description of what’s contained in the file. E.g. for the output file bulk_transcriptome_epigenome/output/02/fgsea_df.tsv, there is an associated description file bulk_transcriptome_epigenome/output/02/fgsea_df.desc
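The naming convention above (same path, extension swapped for .desc) is simple to work with programmatically. The helpers below are a minimal sketch; the names `desc_path`, `write_desc`, and `read_desc` are hypothetical and do not come from this repository.

```r
# Hypothetical helpers for the .desc convention described above.

# Map an output file to its description file by swapping the extension.
desc_path <- function(path) {
  paste0(tools::file_path_sans_ext(path), ".desc")
}

# Write a one-line description next to an output file.
write_desc <- function(path, description) {
  stopifnot(length(description) == 1)
  writeLines(description, desc_path(path))
  invisible(desc_path(path))
}

# Read the one-line description associated with an output file.
read_desc <- function(path) {
  readLines(desc_path(path), n = 1)
}

# Usage:
# desc_path("bulk_transcriptome_epigenome/output/02/fgsea_df.tsv")
# read_desc("bulk_transcriptome_epigenome/output/02/fgsea_df.tsv")
```

Deriving the description path from the output path, rather than storing it separately, keeps the two files paired as long as they share a directory and base name.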
This directory is tracked with git and has an associated GitHub repository in the Kleinman lab account at https://github.com/fungenomics/G34-gliomas.
The following are tracked / available on GitHub:
- .Rmd files, containing the code, and .md files, containing code and outputs
- Figures in png format
- Outputs (tsv/Rda/Rds), if they’re small
- desc files for outputs
- The lockfile for the renv package

The following are not tracked / available on GitHub:
- Figures in pdf format, and figure source data

Selin Jessa (selin.jessa at mail.mcgill.ca)