HGG-oncohistones

Analysis for "K27M in canonical and noncanonical H3 variants occurs in distinct oligodendroglial cell lineages in brain midline gliomas" (Jessa et al, Nature Genetics, 2022)

Source code on GitHub: fungenomics/HGG-oncohistones

HGG-oncohistones analysis code

Contents of this repository

Codebase overview

Codebase structure

Brief explanation of the directory structure:

Materials for the manuscript

Code to reproduce key analyses

Code to reproduce the analyses is saved in code and R-4/code. (See "R and R package versions" below for why two different R versions are used.) Where these analyses depend on inputs from pipelines, I’ve tried to note within the R Markdown documents where the corresponding scripts/pipelines are located.

This table contains pointers to code for the key analyses associated with each figure. The links in the Analysis column lead to rendered HTMLs.

| Figure | Analysis | Path |
|---|---|---|
| Fig 1 | Oncoprints summarizing tumor and cell line cohort | ./code/00-oncoprints.Rmd |
| Ext Fig 1 | Summary figures for extended mouse brain scRNAseq atlas | ./code/05-mouse_atlas.Rmd |
| Fig 1, Ext Fig 2 | cNMF analysis of variable gene programs | ./R-4/code/01-cNMF_programs.Rmd |
| Fig 1 | Cell type identity in tumors with automated consensus projections | ./R-4/code/02-consensus_projections.Rmd |
| Ext Fig 2 | Analysis of human fetal thalamus and hindbrain scRNAseq data | ./code/01A-human_thalamus.{Rmd,html} and ./code/01B-human_hindbrain.{Rmd,html} |
| Ext Fig 2 | Validation of cell type projections using human thalamic fetal brain reference | ./R-4/code/02-consensus_projections.Rmd |
| Ext Fig 3 | Characterization of malignant ependymal cells | ./code/01C-ependymal_cells.Rmd |
| Fig 2, 4 | Scatterplots for RNAseq/K27ac/K27me3 between H3K27M tumor subtypes | ./code/02-bulk_comparisons.Rmd |
| Fig 2, Ext Fig 4 | Systematic HOX analysis/quantification | ./code/03A-HOX.Rmd |
| Fig 3 | Analysis of thalamic patterning | ./code/04-thalamus.Rmd |
| Fig 4-6, Ext Fig 4-6 | Analysis of dorsal-ventral patterning and NKX6-1/PAX3 activation | ./code/03B-NKX61_PAX3.Rmd |
| Fig 6 | Analysis of ACVR1 cell lines | ./code/06-ACVR1.Rmd |
| Fig 7, 8, Ext Fig 7 | Analysis of histone marks in tumors & cell lines | ./code/07A-histone_marks.Rmd |
| Ext Fig 8 | Comparison of tumor epigenomes to scChIP of normal cell types | ./R-4/code/03A-celltype_epigenomic_similarity.Rmd |
| Fig 8, Ext Fig 9 | Heatmaps of H3K27me2/3 in CRISPR experiments | ./code/07D-deeptools_*.sh and ./code/07E-deeptools_*.sh |

Palettes & custom plotting utilities

Most color palettes (e.g. for tumor groups, genotypes, locations, cell types, HOX genes, etc) and ggplot2 theme elements (theme_min(), no_legend(), rotate_x(), etc) are defined in include/style.R.

Tables

Supplementary tables (included with the manuscript) and processed data tables (on Zenodo) were assembled from the following input/output/figure source data files. (Only tables produced with the code included here are listed below.)

| Supplementary table | Path |
|---|---|
| 6 | ./output/05/TABLE_mouse_sample_info.tsv |
| 7 | ./output/05/TABLE_mouse_cluster_info.tsv |
| 8 | ./R-4/output/02/TABLE_cNMF_programs_per_sample.tsv |
| 9 | ./R-4/output/02/cNMF_metaprogram_signatures.malignant_filt.tsv |
| 10 | ./R-4/output/02/TABLE_reference_cnmf_program_overlaps.tsv |
| 11 | ./output/01A/TABLE_thalamus_QC.tsv and ./output/01B/TABLE_hindbrain_QC.tsv |
| 12 | ./output/01A/info_clusters3.tsv and ./output/01B/info_clusters3.tsv |
| 13 | ./output/03A/TABLE_HOX_expression_per_transcript.tsv |
| 14 | ./output/03A/TABLE_HOX_H3K27ac_H3K27me3_per_transcript.tsv |
| 16 | ./figures/03B/enhancer_diff-1.source_data.tsv |

| Processed data table | Path |
|---|---|
| 1a | ./output/02/TABLE_bulk_counts.tsv |
| 1b | ./output/02/TABLE_dge_H3.1_vs_H3.3.tsv |
| 1c | ./output/02/TABLE_dge_thal_vs_pons.tsv |
| 2a | ./output/07A/TABLE_K27me3_CGIs.tsv |
| 2b | ./output/07A/TABLE_K27me2_100kb_bins.tsv |
| 3a | ./output/02/TABLE_promoter_H3K27ac_H3K27me3_per_sample.tsv |
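
The tables listed here are .tsv files, so they can be inspected with base R once the outputs exist locally. The following is a minimal sketch rather than code from this repository: the helper name `read_table_tsv` is hypothetical, and it assumes that the repository root is the working directory and that each file has a header row.

```r
# Hypothetical helper (not part of this repository): read one of the
# tab-separated tables listed above into a data frame.
read_table_tsv <- function(path) {
  if (!file.exists(path)) {
    stop("Table not found: ", path, call. = FALSE)
  }
  # Assumes a header row; keep column names and strings exactly as stored
  read.delim(path, header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
}

# Example: a path taken from the table above, read only if present locally
bulk_counts_path <- file.path("output", "02", "TABLE_bulk_counts.tsv")
if (file.exists(bulk_counts_path)) {
  bulk_counts <- read_table_tsv(bulk_counts_path)
  print(dim(bulk_counts))
}
```

Failing early on a missing file makes it obvious when an upstream analysis has not yet been run, since the tables are outputs of the R Markdown documents.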

Processed single-cell data

This section describes the scripts used to preprocess the single-cell data from this project: sn/scRNAseq, scATACseq, and scMultiome (joint RNA & ATAC in the same cells). This document refers to both snRNAseq and scRNAseq as ‘scRNAseq’. Please see the sample metadata for the technology used to profile each sample, and the Methods section of the manuscript for more details on the single-cell profiling.

Single-cell RNAseq data

The pipeline for scRNAseq processing, applied per sample, is summarized in this schematic. In general, scripts contain the code to run an analysis, and config files contain the parameters or settings specific to a given iteration of that analysis.

Following Cellranger, the scRNAseq samples have all been processed with the lab’s preprocessing workflow (./code/scripts/scRNAseq_preprocessing.Rmd). Each sample is then subject to several downstream analyses as described in the schematic above, with the associated scripts indicated.

Single-cell ATACseq data

The pipeline for scATACseq processing applied per-sample is summarized in this schematic:

Following Cellranger, preprocessing of the scATAC data is done with a script that builds off the scRNAseq workflow, at ./R-4/code/scripts/preprocessing_scATAC.Rmd. This workflow is run in the scATAC pipeline at ./R-4/data/scATACseq/pipeline_10X_ATAC, with one folder per sample. Each sample is then subject to several downstream analyses as described in the schematic above, run in that sample’s folder, with the associated scripts.

Single-cell Multiome data

The pipeline for scMultiome processing applied per-sample is summarized in this schematic:

Following Cellranger, preprocessing of the scMultiome data is done with a script that builds off the scRNAseq workflow, at ./R-4/code/scripts/preprocessing_scMultiome.Rmd. This workflow is run in the scMultiome pipeline at ./R-4/data/scMultiome/pipeline_10X_Multiome, with one folder per sample. Each sample is then subject to several downstream analyses as described in the schematic above, run in that sample’s folder, with the associated scripts.

Cell annotations matching the paper

For scRNAseq, scATACseq and scMultiome samples, the cell metadata provided with the paper contains several columns matching the analyses used in the paper:

The cell annotations/metadata are included in the processed data deposition on Zenodo and on GEO (GSE210568).

Data integration

As described in the Methods, we used the harmony package for integration of single-cell datasets.

Human fetal brain scRNAseq data

Human fetal brain data for the hindbrain and thalamus were obtained from two studies: Eze et al, Nature Neuroscience, 2021, and Bhaduri et al, Nature, 2021.

Data availability

Notes for reproducibility

rr template & helpers

This repository uses the rr template, which contains a set of R Markdown templates to help me ensure reproducibility. The template also provides a set of helper functions (located in rr_helpers.R, with names prefixed by rr_) that encourage documentation.

R and R package versions

The R libraries for this project are managed with the package renv. The R versions used are 3.6.1 and 4.1.2, and renv manages one library for each R version.

The renv package:

  1. maintains two isolated project-specific libraries, in the renv folder (for R 3.6.1) and the R-4/renv folder (for R 4.1.2); the libraries themselves are not on GitHub
  2. stores packages according to version
  3. records the R, Bioconductor, and package versions in the files renv.lock and R-4/renv.lock, which can be used to reproduce the R package environment.
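
As an illustration of how the lockfiles could be used to rebuild a library, here is a minimal sketch. It is not code from this repository: the function names are hypothetical, and it assumes that renv is installed, that the session is started from the repository root, and that R-4 is set up as its own renv project (consistent with the R-4/renv folder and R-4/renv.lock described above).

```r
# Hypothetical helper: choose the renv project folder for an R version.
# Layout as described above: repository root for R 3.6.1, R-4 for R 4.1.2.
renv_project_for <- function(r_version = getRversion()) {
  if (as.numeric_version(r_version) >= "4.0.0") "R-4" else "."
}

# Hypothetical wrapper: restore the packages recorded in the matching renv.lock.
restore_project_library <- function(r_version = getRversion()) {
  project <- renv_project_for(r_version)
  lockfile <- file.path(project, "renv.lock")
  if (!file.exists(lockfile)) {
    stop("No lockfile at ", lockfile, "; run from the repository root", call. = FALSE)
  }
  # renv::restore() reinstalls the package versions recorded in the lockfile
  renv::restore(project = project, prompt = FALSE)
}
```

Calling `restore_project_library()` from an R 3.6.1 session would then target the root project, and from an R 4.1.2 session the R-4 project. Nothing is installed until that function is called.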

Two different R versions are used because certain analyses involving 10X Multiome data require versions of Seurat/Signac that depend on R 4 or later.

R Markdown

Each markdown/HTML file has a “Reproducibility report” at the bottom, indicating when the document was last rendered, the most recent git commit at the time of rendering, the random seed, and the R session info.
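
To make the contents of such a report concrete, the following base R sketch gathers the same four pieces of information. It is an illustration only, not the rr helper implementation: the function name `reproducibility_report` and the seed value are hypothetical, and the commit lookup assumes that git is on the PATH and that the code runs inside a git repository (otherwise the commit is recorded as NA).

```r
# Hypothetical sketch of the fields in a reproducibility report; the rr_
# helpers used in this repository are not reproduced here.
reproducibility_report <- function(seed) {
  # Most recent git commit, or NA if git or a repository is unavailable
  commit <- tryCatch(
    suppressWarnings(
      system2("git", c("rev-parse", "HEAD"), stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  if (length(commit) != 1 || !is.null(attr(commit, "status"))) {
    commit <- NA_character_
  }
  list(
    rendered = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),  # time of rendering
    commit   = commit,
    seed     = seed,
    session  = sessionInfo()  # R version and attached package versions
  )
}

seed <- 100  # arbitrary value for illustration
set.seed(seed)
report <- reproducibility_report(seed)
str(report[c("rendered", "commit", "seed")])
```

Recording the commit alongside the session info ties each rendered HTML to both the code state and the package environment that produced it.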

Testing

Lightweight testing is performed in certain cases (e.g. validating metadata) using the ensurer package, combined with the testrmd testing framework for R Markdown documents. Certain reusable ensurer contracts (reusable tests) are stored in ./code/functions/testing.R.

GitHub / version control

The following are tracked / available on GitHub:

The following are not tracked / available on GitHub:

Citation

If you use or modify code provided here, please cite this work as follows:

Selin Jessa, Steven Hébert, Samantha Worme, Hussein Lakkis, Maud Hulswit, Srinidhi Varadharajan, Nisha Kabir, and Claudia L. Kleinman. (2022). HGG-oncohistones analysis code. Zenodo. https://doi.org/10.5281/zenodo.6647837

Acknowledgements