NB-FOXR2 project [source]

1 Configuration
2 Overview
3 Libraries
4 Load data
- 4.1 Load mouse model single-cell data
  - 4.1.1 Individual samples
  - 4.1.2 Joint object
5 Load reference information
6 QC
7 Load cell type annotation
- 7.1 Jessa forebrain
- 7.2 Yao forebrain
8 Distinguish tumor / normal cells
- 8.1 By FOXR2 expression
- 8.2 By inferCNV
9 Inspection per sample: dashboard
10 Combined annotations used in paper
11 Plot gene expression
- 11.1 Neuronal and glial programs
12 Save single-nuclei metadata
13 Reproducibility

1 Configuration

Configuration of project directory & analysis outputs:

Show full config

source(here("rr_helpers.R"))

# Set up outputs
message("Document index: ", doc_id)

## Document index: 07.1

# Specify where to save outputs
out        <- here("output", doc_id); dir.create(out, recursive = TRUE)
figout     <- here("figures", doc_id, "/")
cache      <- paste0("/project/kleinman/bhavyaa.chandarana/cache/NB-FOXR2/public/", doc_id, "/")

The root directory of this project is:

## /project/kleinman/bhavyaa.chandarana/from_hydra/2023-05-NB-FOXR2/public

Outputs and figures will be saved at these paths, relative to project root:

## public/output/07.1

## public/figures/07.1//

Setting a random seed:

set.seed(100)

2 Overview

In this document, we analyze murine models generated through in utero electroporation of constructs with Foxr2, or Foxr2 and p53 LOF, into the mouse ganglionic eminences.

Here, we perform QC and cell type annotation of single-nuclei samples sequenced with 10X Multiome (RNA & ATAC).

Cell type annotation was performed with CoRAL (v3.0.0, https://github.com/fungenomics/CoRAL/releases/tag/v3.0.0).

3 Libraries

library(here) 
library(magrittr)
library(tidyr)
library(dplyr)
library(readr)
library(readxl)
library(stringr)
library(glue)
library(purrr)
library(ggplot2)
library(cowplot)
library(Seurat)
library(Signac)
library(scDblFinder) 
library(ComplexHeatmap)
library(tibble)
library(ape)
library(dendextend)
library(patchwork)
library(ggalluvial)
library(viridis)
library(MetBrewer)

source(here("include/style.R"))
source(here("code/functions/scRNAseq.R"))
source(here("code/functions/RNAseq.R"))
source(here("code/functions/ssGSEA.R"))

ggplot2::theme_set(theme_min())

square_theme <- theme(aspect.ratio = 1)
large_text <- theme(text = element_text(size = 15))

4 Load data

4.1 Load mouse model single-cell data

We have two genotypes, Foxr2+p53LOF (n=1) and Foxr2 alone (n=2). Each sample comes from a separate mouse, hence these are biological, not technical replicates.

We will load the individual samples (n=3) as separate objects in a list, as well as a joined object of all samples, without any batch correction or integration ("naive").

4.1.1 Individual samples

Load single-cell Seurat objects.

mouse_samples <- c("Foxr2_p53_r1" = "AN24377",
                   "Foxr2_r1"     = "M7238",
                   "Foxr2_r2"     = "AN22476")

# load all three objects into a list
seurat_mouse <- map(mouse_samples, ~ get(load(here(glue("data/singlecell/pipeline_scMultiome_mm/{.x}/seurat.Rda")))))
names(seurat_mouse) <- names(mouse_samples)

# For now, we will not use the ATAC, so let's make a "light" version of the objects
seurat_mouse <- map(seurat_mouse,
                    function(seurat){
                        
                            seurat[['ATAC']] <- NULL
                            seurat[['peaks']] <- NULL
                            seurat[['promoters']] <- NULL
                        
                        return(seurat)
                    })

Print number of cells per sample:

# print number of cells per sample
map(seurat_mouse, ncol)

## $Foxr2_p53_r1
## [1] 8332
## 
## $Foxr2_r1
## [1] 7528
## 
## $Foxr2_r2
## [1] 1042

# total number of cells in the analysis
map(seurat_mouse, ncol) %>% unname() %>% as.numeric() %>% sum()

## [1] 16902

4.1.2 Joint object

Load joint object (without batch correction or integration) used in figures.

load(here("data/singlecell/integrations/mouse_naive_join/output/seurat_joint.Rda"))
seurat_joint$Joint_cluster <- Idents(seurat_joint)

# For now, we will not use the ATAC, so let's make a "light" version of the object
seurat_joint[['ATAC']] <- NULL
seurat_joint[['peaks']] <- NULL
seurat_joint[['promoters']] <- NULL

# Sanity check: # cells in joint sample = total across individual samples
map(seurat_mouse, ncol) %>% unname() %>% as.numeric() %>% sum() == ncol(seurat_joint)

## [1] TRUE

5 Load reference information

Here, we load cell type categories and color palettes related to the two murine brain atlases used as normal reference for cell type annotation:

developing murine forebrain atlas: Jessa et al Nature Genetics 2022
adult murine brain atlas: Yao et al Nature 2023

Load color palettes and cell type ontology for references used in projection.

5.1 General

palette_mm_sample <- c("Foxr2_p53_r1" = "#2E86AB",
                       "Foxr2_r1" = "#A23B72",
                       "Foxr2_r2" = "#F18F01")

# General broad palette
palette_broad_type <- c("Neuron" = "blue4",
                        "Glia" = "olivedrab3",
                        "Other" = "grey70",
                        "Unresolved" = "grey90",
                        "No consensus" = "grey90")

5.2 Jessa forebrain

Load the cell type ontology (grouping into higher cell classes) and color palette used in the Jessa 2022 publication.

# Jessa forebrain 
load(here("data/singlecell/references_normal/Jessa_NatGenet_2022/ontology_l2_palette.Rda"))

ontology_jessa <- ontology_jessa %>% select(Label, Label_broad = Label_broad_L3) %>% 
    tibble::add_row("Label" = "No consensus", "Label_broad" = "No consensus")
palette_jessa_l3 <- c(palette_jessa_l3, "No consensus" = "gray90", "Unresolved" = "gray90", "Mixed progenitors" = "yellow")

# Write ontology to file for CoRAL input
write_csv(ontology_jessa %>% 
              dplyr::rename("labels" = "Label", "level3" = "Label_broad"),
          file = here("data/singlecell/references_normal/Jessa_NatGenet_2022/ontology.csv"))

Set mapping from Label_broad (level3) to broader cell classes (Neuron/Glia/Other):

jessa_broad <- ontology_jessa %>% 
    mutate(broad_label_Jessa = 
               
               case_when(Label_broad %in% c("MGE inhibitory neurons",
                                            "Excitatory IPC",
                                            "Cortical excitatory neurons",
                                            "Cortical inhibitory neurons",
                                            "Inhibitory IPC",
                                            "Cajal-Retzius neurons",
                                            "Other neurons",
                                            "Thalamic neurons",
                                            "Striatal spiny neurons")  ~ "Neuron",
                         
                         Label_broad %in% c("Astrocytes",
                                            "Ependymal",
                                            "OPC",
                                            "Oligodendrocytes",
                                            "Gliogenic progenitors")        ~ "Glia",
                         
                         Label_broad %in% c("Ventral RGC",
                                            "Dorsal RGC",
                                            "Cortical hem",
                                            "Rostral telencephalic midline",
                                            "Meninges",
                                            "Erythrocytes",
                                            "Endothelial",
                                            "Thalamic precursors",
                                            "Pericytes",
                                            "Immune",
                                            "Mixed progenitors",
                                            "Split cluster containing erythrocytes & neurons") ~ "Other",
                         
                         Label_broad %in% c("No consensus",
                                            "Unresolved")             ~ "No consensus",
                         
                         TRUE                                         ~ as.character(NA)
                         
                         )
            )

5.2.1 TABLE: Jessa reference ontology

Here, we export a supplementary table for the cell type ontology used in the Jessa atlas:

Original cluster label
Label used in figures
Broad label

rr_write_tsv(df = jessa_broad, path = glue("{out}/jessa_reference_labels.tsv"),
             desc = "Levels of labels used for Jessa 2022 atlas")

## ...writing description of jessa_reference_labels.tsv to public/output/07.1/jessa_reference_labels.desc

5.3 Yao forebrain

Guide to acronyms used in cell class labels:

IT-ET = intratelencephalic, extratelencephalic
NP-CT-L6b = Near-projecting corticothalamic layer 6b neurons
OB = Olfactory bulb
CR = Cajal-Retzius (excitatory)
DG = Dentate gyrus (hippocampus)
IMN = Immature neurons
CTX = Cerebral cortex (pallium)
CGE = Caudal ganglionic eminence
MGE = Medial ganglionic eminence
CNU = Cerebral nuclei (subpallium)
LGE = Lateral ganglionic eminence
LSX = Lateral septal complex
HYa = Anterior hypothalamus
HY = Hypothalamus
TH = Thalamus
Astro = Astrocytes
Epen = Ependymal
OPC = Oligodendrocyte precursor cells
Oligo = Oligodendrocytes
OEC = Olfactory ensheathing cells (a type of cortical glia)

Load the class and subclass level cell type ontology from the Yao atlas publication, and define a color palette for plotting cell types:

ontology_yao <- read_csv(here("data/singlecell/references_normal/Yao_Nature_2023__AllenBrainAtlas_mouse_brain_10X/ontology.csv"))

## Rows: 116 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): labels, class_label
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

palette_yao2023_class <- c(
  "01 IT-ET Glut" = "#8B0000",      
  "02 NP-CT-L6b Glut" = "#800000",  
  "03 OB-CR Glut" = "#A52A2A",      
  "04 DG-IMN Glut" = "#c9110e",     
  "05 OB-IMN GABA" = "#4682B4",
  "06 CTX-CGE GABA" = "#6495ED",
  "07 CTX-MGE GABA" = "#135ca0",
  "08 CNU-MGE GABA" = "#1d72c2",
  "09 CNU-LGE GABA" = "#3e50c2",
  "10 LSX GABA" = "#B0E0E6",
  "11 CNU-HYa GABA" = "#87CEFA",
  "12 HY GABA" = "#AFEEEE",
  "13 CNU-HYa Glut" = "#ff2600",
  "14 HY Glut" = "#FF8C00",
  "18 TH Glut" = "#FF4500",
  "30 Astro-Epen" = "#00a385",
  "31 OPC-Oligo" = "#57a607",
  "32 OEC" = "#800080",
  "33 Vascular" = "#8f1857",
  "34 Immune" = "#000000",
  "Unresolved" = "gray90",
  "No consensus" = "gray90"
)

Set mapping from these cell types to broader cell classes (Neuron/Glia/Other):

yao_broad <- data.frame(Label = names(palette_yao2023_class))

yao_broad <- yao_broad %>% 
    
    mutate(broad_label_Yao = 
               case_when(Label %in% c("01 IT-ET Glut",
                                      "02 NP-CT-L6b Glut",
                                      "03 OB-CR Glut",
                                      "04 DG-IMN Glut",
                                      "05 OB-IMN GABA",
                                      "06 CTX-CGE GABA",
                                      "07 CTX-MGE GABA",
                                      "08 CNU-MGE GABA",
                                      "09 CNU-LGE GABA",
                                      "10 LSX GABA",
                                      "11 CNU-HYa GABA",
                                      "12 HY GABA",
                                      "13 CNU-HYa Glut",
                                      "14 HY Glut",
                                      "18 TH Glut")    ~ "Neuron",
                         
                         Label %in% c("30 Astro-Epen",
                                      "31 OPC-Oligo",
                                      "32 OEC")        ~ "Glia",
                         
                         Label %in% c("33 Vascular",
                                      "34 Immune")     ~ "Other",
                         
                         Label %in% c("No consensus",
                                      "Unresolved")    ~ "No consensus",
                         
                         TRUE                          ~ as.character(NA)
                         
                         )
               )

Identify the GE-derived neuron types within Yao atlas labels.

palette_ge_neurons <- c("GE-derived neurons" = "#135ca0",  # Matches blue of MGE neurons
                        "Other neurons" = "#6224b3",       # Violet
                        "Other" = "grey80")

yao_broad <- yao_broad %>% 
    
    mutate(GE_neuron_Yao = 
               case_when(Label %in% c("06 CTX-CGE GABA",
                                      "07 CTX-MGE GABA",
                                      "08 CNU-MGE GABA",
                                      "09 CNU-LGE GABA")    ~ "GE-derived neurons",
                         
                         Label %in% c("01 IT-ET Glut",
                                      "02 NP-CT-L6b Glut",
                                      "03 OB-CR Glut",
                                      "04 DG-IMN Glut",
                                      "05 OB-IMN GABA",
                                      "10 LSX GABA",
                                      "11 CNU-HYa GABA",
                                      "12 HY GABA",
                                      "13 CNU-HYa Glut",
                                      "14 HY Glut",
                                      "18 TH Glut")        ~ "Other neurons",
                         
                         Label %in% c("30 Astro-Epen",
                                      "31 OPC-Oligo",
                                      "32 OEC",
                                      "33 Vascular",
                                      "34 Immune",
                                      "No consensus",
                                      "Unresolved")         ~ "Other",
                         
                         TRUE                               ~ as.character(NA)
                         
                         )
               )

5.3.1 TABLE: Yao reference ontology

Here, we export a supplementary table for the cell type ontology used in Yao 2023 atlas:

Original cluster label
Label used in figures
Broad label
GE vs. Other neuron label

yao_onto <- ontology_yao %>% 
    left_join(., yao_broad, by = c("class_label" = "Label"))

rr_write_tsv(df = yao_onto, path = glue("{out}/yao_reference_labels.tsv"),
             desc = "Levels of labels used for Yao 2023 atlas")

## ...writing description of yao_reference_labels.tsv to public/output/07.1/yao_reference_labels.desc

6 QC

6.0.1 TABLE: Mouse singlecell QC

Export a supplementary table for mouse single-cell RNA-seq QC.

# get sample prep info ---------------------------------------------------------
omega_sc <- read_xlsx(here("data/metadata/20240404-Omegatable-singlecell.xlsx")) %>%
    filter(Aliases %in% names(seurat_mouse)) %>%
    select(ID = Aliases,
           Original_ID = Sample,
           Path,
           Protocol,
           Publication,
           Kit = `kit version`,
           Method_of_dissociation = `Method of dissociation`,
           Starting_material = `Starting material`,
           Starting_material_amount = `tissue (mg)`,
           Targeted_N_cells = `cell/nuclei target`,
           Sequencing_platform = `Instrument`)

## New names:
## • `library yield (nM)` -> `library yield (nM)...36`
## • `copies/ul` -> `copies/ul...37`
## • `copies/ul` -> `copies/ul...38`
## • `library yield (nM)` -> `library yield (nM)...39`

# get bioinformatics QC --------------------------------------------------------
mouse_seurat_qc <- map_dfr(omega_sc$Original_ID,
                           ~ data.table::fread(here(glue("data/singlecell/pipeline_scMultiome_mm/{.x}/preprocessing/output/seurat_metrics.tsv")),
                                               data.table = FALSE) %>%
                               tibble::add_column(Original_ID = .x, .before = 1)) %>%
    {left_join(omega_sc, ., by = "Original_ID")}

# get Cellranger QC ------------------------------------------------------------
cellranger_qc <- map_dfr(mouse_seurat_qc$Path,
                          ~ data.table::fread(file.path("/project/kleinman/singlecell_pipeline/data/",
                                                        .x,
                                                        "cellranger-arc_count_manavConstruct/summary.csv"), data.table = FALSE) %>%
                              mutate_all(as.character) %>%
                              dplyr::rename(Original_ID = `Sample ID`))

# get processing params --------------------------------------------------------
config_params <- imap_dfr(seurat_mouse, ~ .x@misc$params$config %>%
                              # drop the parameter which indicates the genes plot during preprocessing
                              discard_at("genes") %>%
                              data.frame() %>%
                              mutate(Sample = .y))

config_params2 <- config_params %>%
    select(ID = Sample,
           Min_cells = min_cells,
           N_principal_components_RNA = rna_pcs_keep,
           N_principal_components_ATAC = atac_pcs_keep,
           Clustering_resolution = clustering_resolution,
           Seed = seed)

# combine ----------------------------------------------------------------------
TABLE_mouse_multiome_qc <- mouse_seurat_qc %>%
    left_join(cellranger_qc, by = "Original_ID") %>%
    left_join(config_params2, by = "ID") %>%
    select(
        # Sample info
        ID, Original_ID, Sequencing_platform,
        # QC
        Genome,
        Cellranger_version = `Pipeline version`,
        N_cells_estimated = `Estimated number of cells`,
        N_cells_after_filtering = N_cells_after,
        GEX_Reads_mapped_to_genome = `GEX Reads mapped confidently to genome`,
        GEX_Reads_mapped_to_transcriptome = `GEX Reads mapped confidently to transcriptome`,
        GEX_Mean_mitochondrial_content_after_filtering = percent.mito_mean.postQC,
        GEX_Mean_UMIs_after_filtering = nCount_RNA_mean.postQC,
        GEX_Mean_N_genes_after_filtering = nFeature_RNA_mean.postQC,
        GEX_Max_mito_threshold = percent.mito_max.threshold,
        GEX_Min_N_genes_threshold = nFeature_RNA_min.threshold,
        GEX_Max_N_genes_threshold = nFeature_RNA_max.threshold,
        GEX_Min_UMIs_threshold = nCount_RNA_min.threshold,
        GEX_Max_UMIs_threshold = nCount_RNA_max.threshold,
        ATAC_Fraction_transposition_events_in_peaks_in_cells = `ATAC Fraction of transposition events in peaks in cells`,
        ATAC_Fraction_mapped_confidently = `ATAC Confidently mapped read pairs`,
        ATAC_Fraction_fragments_in_peaks = `ATAC Fraction of high-quality fragments overlapping peaks`,
        Min_cells, Clustering_resolution, N_principal_components_RNA, N_principal_components_ATAC, Seed,
        ATAC_Median_fragments_per_cell = `ATAC Median high-quality fragments per cell`,
        ATAC_Mean_nucleosome_signal_after_filtering = nucleosome_signal_mean.postQC,
        ATAC_Mean_peak_region_fragments_after_filtering = atac_peak_region_fragments_mean.postQC,
        ATAC_Mean_TSS_enrichment_after_filtering = TSS.enrichment_mean.postQC,
        ATAC_Min_peak_region_fragments_threshold = atac_peak_region_fragments_min.threshold,
        ATAC_Max_peak_region_fragments_threshold = atac_peak_region_fragments_max.threshold,
        ATAC_Max_nucleosome_signal_threshold = nucleosome_signal_max.threshold,
        ATAC_Min_TSS_enrichment_threshold = TSS.enrichment_min.threshold,
        ATAC_Max_TSS_enrichment_threshold = TSS.enrichment_max.threshold) %>%
    arrange(ID)

rr_write_tsv(TABLE_mouse_multiome_qc,
             glue("{out}/TABLE_mouse_multiome_QC.tsv"),
             "Summary of sample info and QC for mouse model multiome samples")

## ...writing description of TABLE_mouse_multiome_QC.tsv to public/output/07.1/TABLE_mouse_multiome_QC.desc

6.0.2 Sample QC overviews

In this section, let's examine several QC stats across each sample. Per cluster stats will be explored in the "Inspection per sample" section.

# extract per-cell metadata from each sample, containing qc
p1 <- imap_dfr(seurat_mouse, ~ .x@meta.data %>% mutate(Sample = .y)) %>%
    select(Sample, nCount_RNA, nFeature_RNA, percent.mito, percent.ribo,
           nCount_peaks, nFeature_peaks, nucleosome_signal, TSS.enrichment) %>%
    gather(stat, value, 2:ncol(.)) %>%
    mutate(stat = factor(stat, levels = c("nCount_RNA", "nFeature_RNA", "percent.mito", "percent.ribo",
                                          "nCount_peaks", "nFeature_peaks", "nucleosome_signal", "TSS.enrichment"))) %>%
    # plot
    ggplot(aes(x = Sample, y = value)) +
    geom_violin(aes(fill = Sample), scale = "width") +
    facet_wrap(~ stat, scales = "free_x", nrow = 1) +
    coord_flip() +
    rotate_x() +
    no_legend() +
    xlab(NULL) + theme(axis.text.y = element_blank())

# calculate number of cells
p2 <- imap_dfr(seurat_mouse, ~ .x@meta.data %>% mutate(Sample = .y)) %>%
    group_by(Sample) %>%
    count() %>%
    # plot
    ggplot(aes(x = Sample, y = n)) +
    geom_col(aes(fill = Sample), width = 0.5) +
    rotate_x() +
    coord_flip() +
    no_legend() 

plot_grid(p2, p1, rel_widths = c(0.15, 0.85), align = "h", axis = "tb")

In particular, we note that one Foxr2-alone replicate has very few cells compared to the other two samples.

6.0.3 Call doublets

We will use scDblFinder to identify putative doublets by comparing real cells to simulated doublets. Since this is non-deterministic, we run once, and save the results.

Here, we're following Chapter 8.3 in the Orchestrating Single Cell Analysis Advanced book. In particular this uses the "griffiths" method for thresholding, which is based on "cluster-wise number of median absolute deviation in doublet score", according to the documentation. This method is based on the analysis implemented by Jonathan Griffiths in Pijuan-Sala et al, Nature, 2019 - see section Doublet removal in the Methods.

call_doublets <- function(seurat) {

    # compute doublet density per cell
    scores <- scDblFinder::computeDoubletDensity(seurat@assays$RNA@counts)

    # apply thresholding to get a call of singlet or doublet per cell
    calls <- scDblFinder::doubletThresholding(data.frame(score = scores,
                                                         cluster = Idents(seurat)),
                                              method = "griffiths",
                                              returnType = "call")
    df <- data.frame("Doublet_score_log1p" = log1p(scores),
                     "Doublet_call" = calls)
    rownames(df) <- colnames(seurat)

    return(df)

}

# loop over objects and call doublets
doublet_scores <- map(seurat_mouse, call_doublets)
names(doublet_scores) <- names(seurat_mouse)

save(doublet_scores, file = glue("{out}/doublet_scores.Rda"))

Add the computed doublet scores to each object:

load(glue("{out}/doublet_scores.Rda"))

# if there are fewer than 20 cells in any cluster after removing doublets, then
# remove the cluster
doublet_scores <- map2(seurat_mouse, doublet_scores, function(seurat, scores) {

    scores$Cluster <- Idents(seurat)
    singlets_per_cluster <- scores %>% filter(Doublet_call == "singlet") %>% pull(Cluster) %>% table()
    clusters_drop <- names(singlets_per_cluster[singlets_per_cluster < 20])
    scores$Doublet_call <- as.character(scores$Doublet_call)
    scores <- scores %>%
        mutate(Doublet_call = ifelse(Cluster %in% clusters_drop & Doublet_call == "singlet", "drop", Doublet_call)) %>%
        select(-Cluster)  %>%
        mutate(Doublet_call = factor(Doublet_call))
    return(scores)

})

seurat_mouse <- map2(seurat_mouse, doublet_scores, ~ AddMetaData(.x, .y))

# count singlets/doublets
map(doublet_scores, ~ table(.x$Doublet_call))

## $Foxr2_p53_r1
## 
## doublet    drop singlet 
##    1393       5    6934 
## 
## $Foxr2_r1
## 
## doublet    drop singlet 
##    1171       9    6348 
## 
## $Foxr2_r2
## 
## doublet    drop singlet 
##     189      10     843

Add to joint object:

joint_doublet <- imap(seurat_mouse, ~ .x@meta.data %>% 
                       rownames_to_column("Cell") %>% 
                      select(Cell,
                             Doublet_call,
                             Doublet_score_log1p) %>% 
                       mutate(Sample = .y)) %>% 
    Reduce(rbind, .) %>% 
    dplyr::rename("gex_barcode" = "Cell") %>% 
    left_join(seurat_joint@meta.data, ., by = c("gex_barcode", "Sample")) %>% 
    select(Doublet_call, Doublet_score_log1p) %>% 
    mutate(Doublet_call = factor(Doublet_call))

rownames(joint_doublet) <- rownames(seurat_joint@meta.data)

seurat_joint <- AddMetaData(seurat_joint, joint_doublet)

# count singlets/doublets
table(seurat_joint$Doublet_call)

## 
## doublet    drop singlet 
##    2753      24   14125

Export cell ids marked as doublet per sample:

walk(names(seurat_mouse),
    function(id){
        
        seurat_mouse[[id]]@meta.data %>% 
            as.data.frame() %>% 
            select(Doublet_call, gex_barcode) %>% 
            filter(Doublet_call == "doublet") %>% 
            .$gex_barcode %>% 
            as.data.frame() %>% 
            write_tsv(glue("{out}/{id}_doublet_barcodes.tsv"),col_names = F)
        
})

Plot doublet scores and calls per sample:

imap(seurat_mouse, ~ umap(.x,
                          title = .y,
                          color_by = "Doublet_score_log1p",
                          order_by = "Doublet_score_log1p",
                          color_by_type = "continuous",
                          colors = rdbu,
                          point_size = 0.3) +
         square_theme) %>% 
    {plot_grid(plotlist = ., nrow = 1)}

imap(seurat_mouse, ~ umap(.x,
                          title = .y,
                          color_by = "Doublet_call",
                          order_by = "Doublet_call",
                          colors = c("doublet" = "red", "singlet" = "lightblue", "drop" = "black"),
                          point_size = 0.3) +
         square_theme) %>% 
    {plot_grid(plotlist = ., nrow = 1)}

plot_doublet_vln <- function(seurat, title) {
    
    FetchData(seurat, c("seurat_clusters", "Doublet_score_log1p", "Doublet_call")) %>% 
        ggplot(aes(x = seurat_clusters, y = Doublet_score_log1p)) +
        geom_violin(scale = "width", fill = "gray90") +
        geom_jitter(aes(color = Doublet_call), size = 0.1, width = 0.2, alpha = 0.5) +
        scale_color_manual(values = c("doublet" = "red", "singlet" = "lightblue", "drop" = "black")) +
        ggtitle(title) + xlab("Cluster")
    
}

imap(seurat_mouse, ~ plot_doublet_vln(.x, title = .y) +
         theme(legend.position = "bottom")) %>% 
    {plot_grid(plotlist = ., nrow = 1, rel_widths = c(0.4, 0.4, 0.2))}

7 Load cell type annotation

To annotate cells at the single-cell level, we performed automated annotation using CoRAL (v3.0.0, https://github.com/fungenomics/CoRAL/releases/tag/v3.0.0).

Tools run through CoRAL:

Correlation
SciBet
SVMlinear
singleCellNet

7.1 Jessa forebrain

Here, we use a custom function get_consensus_labels (in code/functions/scRNAseq.R) to produce a consensus using broader class labels.

# Calculate majority vote consensus outside of CoRAL
seurat_mouse <- imap(seurat_mouse, function(seurat, alias) {

    labels <- read_tsv(here(glue("data/singlecell/CoRAL/output/mm_model_Jessaforebrain/{alias}/JessaForebrain/majority/Prediction_Summary_base.tsv"))) %>%
        get_consensus_labels(tools_to_include = c("Correlation", "SciBet", "SVMlinear", "singleCellNet"),
                             remove_timepoint = FALSE,
                             high_level = TRUE,
                             ontology = ontology_jessa,
                             suffix = "braindex")
    
    labels$Consensus_broad_braindex <- factor(labels$Consensus_broad_braindex, levels = names(palette_jessa_l3))
    labels <- labels %>% tibble::column_to_rownames("Cell")
    seurat <- AddMetaData(seurat, labels)

})

## Rows: 8332 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA
## Rows: 7528 Columns: 14── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA
## Rows: 1042 Columns: 14── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA

Summarize these labels further to broad annotation (Neuron/Glia/Other) as per ontology in section 5.2:

seurat_mouse <- map(seurat_mouse,
                    function(seurat){
                        
                        x <- seurat@meta.data %>% 
                            select("Consensus_broad_braindex") %>% 
                            rownames_to_column("Cell") %>% 
                            left_join(., jessa_broad %>% select(Label_broad, broad_label_Jessa) %>% distinct(),
                                      by = c("Consensus_broad_braindex" = "Label_broad")) %>% 
                            select(-Consensus_broad_braindex) %>% 
                            column_to_rownames("Cell")
                        
                        x$broad_label_Jessa <- factor(x$broad_label_Jessa, levels = rev(names(palette_broad_type)))
                        
                        seurat <- AddMetaData(seurat, x)
                    
                        return(seurat)    
                        
                    }
)

Transfer cell type annotation columns to joint object:

# Bind together individual sample metadata
# Join it to the right of the seurat_joint metadata
# (preserving the cell order of the joint metadata)
joint_anno <- imap(seurat_mouse, ~ .x@meta.data %>% 
                       rownames_to_column("Cell") %>% 
                      select(Cell, 
                             Consensus_broad_braindex, 
                             broad_label_Jessa) %>% 
                       mutate(Sample = .y)) %>% 
    Reduce(rbind, .) %>% 
    dplyr::rename("gex_barcode" = "Cell") %>% 
    left_join(seurat_joint@meta.data, ., by = c("gex_barcode", "Sample")) %>% 
    select(Consensus_broad_braindex, 
             broad_label_Jessa)

rownames(joint_anno) <- rownames(seurat_joint@meta.data)

joint_anno$Consensus_broad_braindex <- factor(joint_anno$Consensus_broad_braindex, levels = names(palette_jessa_l3))
joint_anno$broad_label_Jessa <- factor(joint_anno$broad_label_Jessa, levels = names(palette_broad_type))

seurat_joint <- AddMetaData(seurat_joint, joint_anno)

7.2 Yao forebrain

Here, we use a custom function get_consensus_labels (in code/functions/scRNAseq.R) to produce a consensus using broader class labels.

ontology_yao <- ontology_yao %>% 
    dplyr::rename("Label" = "labels") %>% 
    dplyr::rename("Label_broad" = "class_label")

# get majority consensus annotations
seurat_mouse <- imap(seurat_mouse, function(seurat, alias) {

    labels <- read_tsv(here(glue("data/singlecell/CoRAL/output/mm_model_Yaoforebrain/{alias}/YaoForebrain_subclass/majority/Prediction_Summary_base.tsv"))) %>%
        get_consensus_labels(tools_to_include = c("Correlation", "SciBet", "SVMlinear", "singleCellNet"),
                             remove_timepoint = FALSE,
                             high_level = TRUE,
                             ontology = ontology_yao,
                             suffix = "Yao")

    labels$Consensus_broad_Yao <- factor(labels$Consensus_broad_Yao, levels = names(palette_yao2023_class))
    labels <- labels %>% tibble::column_to_rownames("Cell")
    seurat <- AddMetaData(seurat, labels)

})

## Rows: 8332 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA
## Rows: 7528 Columns: 14── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA
## Rows: 1042 Columns: 14── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (14): cellname, SingleR, scClassify, SciBet, singleCellNet, SVMlinear, C...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## [1] "SciBet"        "singleCellNet" "SVMlinear"     "Correlation"

## @ Proportion of unresolved cells: NA
## @ Proportion of uncertain cells (broad): NA

Summarize these labels further to broad annotation (Neuron/Glia/Other) as per ontology in section 5.3:

seurat_mouse <- map(seurat_mouse,
                    function(seurat){
                        
                        x <- seurat@meta.data %>% 
                            select("Consensus_broad_Yao") %>% 
                            rownames_to_column("Cell") %>% 
                            left_join(., yao_broad %>% select(Label, broad_label_Yao) %>% distinct(),
                                      by = c("Consensus_broad_Yao" = "Label")) %>% 
                            select(-Consensus_broad_Yao) %>% 
                            column_to_rownames("Cell")
                        
                        x$broad_label_Yao <- factor(x$broad_label_Yao, levels = rev(names(palette_broad_type)))
                        
                        seurat <- AddMetaData(seurat, x)

                        return(seurat)    
                        
                    }
)

Transfer cell type annotation columns to joint object:

# Bind together individual sample metadata
# Join it to the right of the seurat_joint metadata
# (preserving the cell order of the joint metadata)
joint_anno <- imap(seurat_mouse, ~ .x@meta.data %>% 
                       rownames_to_column("Cell") %>% 
                      select(Cell, 
                             Consensus_broad_Yao, 
                             broad_label_Yao) %>% 
                       mutate(Sample = .y)) %>% 
    Reduce(rbind, .) %>% 
    dplyr::rename("gex_barcode" = "Cell") %>% 
    left_join(seurat_joint@meta.data, ., by = c("gex_barcode", "Sample")) %>% 
    select(Consensus_broad_Yao, 
             broad_label_Yao)

rownames(joint_anno) <- rownames(seurat_joint@meta.data)

joint_anno$Consensus_broad_Yao <- factor(joint_anno$Consensus_broad_Yao, levels = names(palette_yao2023_class))
joint_anno$broad_label_Yao <- factor(joint_anno$broad_label_Yao, levels = names(palette_broad_type))

seurat_joint <- AddMetaData(seurat_joint, joint_anno)

8 Distinguish tumor / normal cells

In order to distinguish tumor and normal cells, we first generate a reference of normal cells based on cell type annotation as Immune, or equivalently low FOXR2 expression (section 8.1).

We then run inferCNV to detect CNVs within this reference, running this set of cells as both the reference and the query at the same time, and remove cells with visible CNV (section 8.2.1).

Finally, we combine this reference with cells from wild-type adult mouse cortex Multiome profiling (Khazaei et al. Cell 2023) to call tumor vs. normal cells in the full mouse samples (section 8.2.2) and add this annotation to Seurat objects (section 8.2.3).

8.1 By FOXR2 expression

In these models, Foxr2 is randomly incorporated into the genome, and therefore its expression level may differ between samples.

To distinguish tumor from normal brain cells, the procedure is as follows:

filter out putative doublets by setting an (arbitrary) hard threshold on the doublet score computed using scDblFinder::computeDoubletDensities()
recompute the neighbour graph on singlets
subcluster within clusters using Seurat::FindSubcluster (NOTE: this method does not recompute variable genes, PCA, and neighbours, but uses the existing neighbor graph), and simply finds subclusters within each existing cluster
use an unambiguously normal cell type (immune cells) as the baseline distribution for Foxr2 expression
compare each subcluster to the immune cells - if Foxr2 is significantly higher in the other cluster (based on a Wilcoxon rank-sum test), call it a tumor cluster, using a p-value threshold of 0.05.

NOTE: by default, Seurat::FindMarkers() uses a pseudocount of 1 when computing average log2 fold-change between clusters, which is how non-zero fold-changes are produced even for samples where Foxr2 is 0 in the baseline population.

#' distinguish tumor and normal cells based on differential expression of Foxr2
#' between subclusters and immune cells
#' 
#' @param seurat Seurat object, expects columns "Doublet_call" and "Consensus_broad_braindex"
#' in the meta.data slot
#' @param out_prefix string, prefix for the filenames of the output files,
#' e.g. a sample ID
#' @param baseline string, name of population to use as baseline (should be a
#' valid category of the "Consensus_broad_braindex" column)
call_tumor_normal <- function(seurat, out_prefix, baseline = "Immune") {
    
    # check the required cols are present
    if (!("Doublet_call" %in% colnames(seurat@meta.data))) stop(
        "The column 'Doublet_call' is required to be present in the object metadata")
    if (!("Consensus_broad_braindex" %in% colnames(seurat@meta.data))) stop(
        "The column 'Consensus_broad_braindex' is required to be present in the object metadata")
    
    message("@ data wrangling...")
    # make a tmp object since we will remove some dataslots for the purposes
    # of simplifying the subsetting based on the RNA only
    seurat_lite <- seurat
    
    # 0. remove chromatin data -------------------------------------------------
    seurat_lite[['ATAC']] <- NULL
    seurat_lite[['peaks']] <- NULL
    seurat_lite[['promoters']] <- NULL
    
    # 1. filter out putative doublets ------------------------------------------
    seurat_subset <- subset(seurat_lite, subset = Doublet_call == "singlet")
    
    # 2. recluster within each cluster  -----------------------------------------
    # calculate neighbour graph for singlets
    seurat_subset <- FindNeighbors(seurat_subset,
                                   reduction = "pca",
                                   dims      = 1:20,
                                   verbose   = FALSE,
                                   nn.eps    = 0.5)
    
    Idents(seurat_subset) <- "seurat_clusters"
    idents <- levels(Idents(seurat_subset))
    
    # recluster within each cluster
    # the notation "i_j" will mean subcluster j of cluster i
    message("@ re-clustering...")
    for (i in seq_along(idents)) {
        
        seurat_subset <- FindSubCluster(seurat_subset,
                                        cluster = idents[i],
                                        graph.name = "RNA_snn",
                                        subcluster.name = glue("Subcluster_{idents[i]}"),
                                        resolution = 0.2,
                                        algorithm = 1)
        
    }
    
    # establish a column "DGE_cluster", which will contain the cell groupings
    # to use for the differential expression testing
    # (get the new subclustered idents, unless the cell is an immune cell)
    seurat_subset@meta.data$DGE_cluster <- as.character(seurat_subset@meta.data$seurat_clusters)
    
    # loop over clusters and replace by subclustered ident
    for (i in seq_along(idents)) {
        
        seurat_subset@meta.data[seurat_subset@meta.data$DGE_cluster == idents[i], 
        ]$DGE_cluster <- seurat_subset@meta.data[seurat_subset@meta.data$DGE_cluster == idents[i], ][[glue("Subcluster_{idents[i]}")]]
        
    }
    
    # if it's the baseline pop (based on the braindex labels),
    # just use that as the label
    seurat_subset@meta.data[seurat_subset@meta.data$Consensus_broad_braindex == baseline, ]$DGE_cluster <- baseline
    seurat_subset$DGE_cluster <- factor(seurat_subset$DGE_cluster)
    Idents(seurat_subset) <- "DGE_cluster"
    
    # get a colour palette for the clusters
    pal <- get_gg_colors(length(levels(Idents(seurat_subset)))) %>% 
        # randomize the order so that close-by clusters are less likely to have similar colors
        # (make them easier to see in the UMAP space)
        sample()
    names(pal) <- levels(Idents(seurat_subset))
    pal[baseline] <- "black"
    
    # show the subclusters in the UMAP, just for reference
    print(umap(seurat_subset, color_by = "DGE_cluster", label = TRUE, point_size = 0.1,
               legend = TRUE, colors = pal, title = "Subclusters", label_size = 2))
    
    # get the detection rate per cluster
    det_rate <- FetchData(seurat_subset, c("Foxr2", "DGE_cluster")) %>%
        mutate(DGE_cluster = factor(DGE_cluster, levels = names(pal))) %>% 
        group_by(DGE_cluster) %>%
        summarize(Foxr2_detection = sum(Foxr2 > 0)/n())
    
    # discard clusters with fewer than 10 cells
    counts_per_cluster <- seurat_subset@meta.data %>% group_by(DGE_cluster) %>% count()
    clusters_keep <- counts_per_cluster %>% filter(n >= 10) %>% pull(DGE_cluster) %>% as.character()
    
    # get a nice order for the subclusters
    nice_order <- seurat_subset@meta.data %>%
        distinct(seurat_clusters, DGE_cluster) %>%
        arrange(seurat_clusters) %>%
        pull(DGE_cluster) %>%
        as.character() %>%
        unique()
    nice_order <- c(baseline, setdiff(nice_order, baseline))
    
    # get the average expression per cluster
    mean_expr <- AverageExpression(seurat_subset, features = "Foxr2", assays = "RNA") %>% .$RNA
    mean_expr_df <- data.frame("Subcluster" = unname(colnames(mean_expr)),
                               "Mean_expression" = as.numeric(mean_expr)) %>% 
        mutate(Subcluster = factor(Subcluster, levels = nice_order))
    
    seurat_subset <- subset(seurat_subset, idents = clusters_keep)
    message("@@ dropping ", counts_per_cluster %>% filter(n < 10) %>% pull(n) %>% sum(), " cells due to small subclusters...")
    
    # 4. compare Foxr2 expression in each subcluster
    # to the immune population as a baseline -----------------------------------
    message("@ differential expression...")
    non_immune <- setdiff(levels(Idents(seurat_subset)), baseline)
    foxr2_tests <- map_dfr(non_immune, ~ FindMarkers(seurat_subset, ident.1 = .x, ident.2 = baseline,
                                                     features = "Foxr2",
                                                     min.pct = 0,
                                                     logfc.threshold = 0,
                                                     verbose = FALSE) %>% 
                               mutate(Subcluster = .x)) %>% 
        set_rownames(NULL)
    
    # get clusters where foxr2 is not detected
    no_det_clusters <- det_rate %>% filter(Foxr2_detection == 0) %>% pull(DGE_cluster) %>% as.character()
    tumor_clusters <- foxr2_tests %>% filter(p_val < 0.05 & avg_log2FC > 0) %>% pull(Subcluster)
    # message("@@ putative tumor subclusters: ", tumor_clusters)
    
    # tidy for plotting
    tests_df <- foxr2_tests %>%
        select(Subcluster, avg_log2FC, p_val) %>% 
        tibble::add_row(Subcluster = baseline, avg_log2FC = 0, p_val = 1) %>% 
        mutate(Subcluster = factor(Subcluster, levels = nice_order)) %>% 
        complete(Subcluster, fill = list(avg_log2FC = 0, p_val = 1)) %>% 
        mutate(Tumor = ifelse(Subcluster %in% tumor_clusters, TRUE, FALSE)) %>% 
        mutate(log10p = -log10(p_val))
    
    # 5. put the results back in the subsetted seurat object -------------------
    seurat_subset@meta.data$Tumor_normal_FOXR2_call <- "Normal"
    seurat_subset@meta.data[seurat_subset@meta.data$DGE_cluster %in% tumor_clusters, ]$Tumor_normal_FOXR2_call <- "Tumor"
    tumor_cells <- WhichCells(seurat_subset, expression = Tumor_normal_FOXR2_call == "Tumor")
    
    # add the final call back in the original object
    seurat@meta.data$Tumor_normal_FOXR2_call <- "Normal"
    seurat@meta.data[seurat@meta.data$Doublet_call != "singlet", ]$Tumor_normal_FOXR2_call <- "Doublet"
    seurat@meta.data[tumor_cells, ]$Tumor_normal_FOXR2_call <- "Tumor"
    
    # add the subcluster in the original object
    seurat@meta.data$Subcluster <- as.character(seurat@meta.data$Doublet_call)
    if (any(seurat@meta.data$Subcluster == "drop")) seurat@meta.data[seurat@meta.data$Subcluster == "drop",
    ]$Subcluster <- "doublet"
    seurat@meta.data[colnames(seurat_subset), ]$Subcluster <- as.character(seurat_subset@meta.data$DGE_cluster)
    # if there's any singlets remaining, it's because they were tiny subclusters
    if (any(seurat@meta.data$Subcluster == "singlet")) seurat@meta.data[seurat@meta.data$Subcluster == "singlet", 
    ]$Subcluster <- "smallcluster"
    
    # new idents
    # unique(seurat@meta.data$Subcluster)
    
    # save outputs for troubleshooting later
    out_diag <- glue("{out}/{out_prefix}_tumornormal_diagnostics.tsv")
    message("@ saving diagnostic stats to ", out_diag)
    diagnostics <- counts_per_cluster %>% 
        dplyr::rename(Subcluster = "DGE_cluster") %>% 
        left_join(det_rate, by = c("Subcluster" = "DGE_cluster")) %>%
        left_join(mean_expr_df) %>% 
        left_join(tests_df) %>% 
        left_join(tibble::enframe(pal, "Subcluster", "Color"))
    write_tsv(diagnostics, out_diag)
    
    out_meta <- glue("{out}/{out_prefix}_tumornormal_seurat_metadata.Rds")
    message("@ saving cell metadata to ", out_meta)
    # just in case
    seurat@meta.data$Tumor_normal_FOXR2_call <- factor(seurat@meta.data$Tumor_normal_FOXR2_call)
    seurat$Consensus_broad_braindex <- factor(seurat$Consensus_broad_braindex, levels = names(palette_jessa_l3))
    seurat_meta <- seurat@meta.data
    # save in seurat misc
    seurat@misc$tumor_normal_diagnostics <- diagnostics
    saveRDS(seurat_meta, file = out_meta)
    
    message("@ done.")
    return(seurat)
    
}

#' plot several plots for diagnostics of this tumor/normal calling method:
#' 1. number of cells per subcluster
#' 2. foxr2 detection rate per subcluster
#' 3. mean foxr2 expression per subcluster
#' 4. -log10(p-val) for the DGE on foxr2 between each subcluster and baseline cells
#' 5. log2fold change for the DGE on foxr2 between each subcluster and baseline calls
#' 
#' @param diagnostics data frame, as produced by reading in <out_prefix>_tumornormal_diagnostics.tsv,
#' produced by call_tumor_normal()
plot_diagnostics <- function(diagnostics) {
    
    pal <- diagnostics %>% select(Subcluster, Color) %>% tibble::deframe()
    
    p1 <- diagnostics %>% 
        ggplot(aes(x = Subcluster, y = n)) +
        geom_col(aes(fill = Subcluster)) +
        geom_hline(yintercept = 10) +
        scale_fill_manual(values = pal) +
        theme_min2() +
        theme(panel.grid.major.y = element_line(color = "gray90")) +
        no_legend() +
        coord_flip() +
        ggtitle("# cells per subcluster") + xlab("Subcluster")
    
    p2 <- diagnostics %>% 
        ggplot(aes(x = Subcluster, y = Foxr2_detection)) +
        geom_col(aes(fill = Subcluster)) +
        scale_fill_manual(values = pal) +
        theme_min2() +
        theme(panel.grid.major.y = element_line(color = "gray90")) +
        no_legend() +
        coord_flip() +
        ggtitle("Foxr2 detection rate \n(pct1) per subcluster") + xlab(NULL)
    
    p3 <- diagnostics %>% 
        ggplot(aes(x = Subcluster, y = Mean_expression)) +
        geom_col(aes(fill = Subcluster)) +
        scale_fill_manual(values = pal) +
        theme_min2() +
        theme(panel.grid.major.y = element_line(color = "gray90")) +
        no_legend() +
        coord_flip() +
        ggtitle("Mean Foxr2 expression \nper subcluster") + xlab(NULL)
    
    p4 <- diagnostics %>% 
        ggplot(aes(x = Subcluster, y = log10p)) +
        geom_col(aes(fill = Subcluster, alpha = Tumor)) +
        scale_fill_manual(values = pal) +
        scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.3)) +
        # show adjusted p-value = 0.05 threshold
        geom_hline(yintercept = -log10(0.05)) +
        coord_flip() +
        theme_min2() +
        theme(panel.grid.major.y = element_line(color = "gray90")) +
        no_legend() +
        ggtitle("-log10(p-val) for Foxr2\n(clusters w/ <10 cells or \n0 Foxr2 det: dropped)") + xlab(NULL)
    
    p5 <- diagnostics %>% 
        ggplot(aes(x = Subcluster, y = avg_log2FC)) +
        # highlight clusters which will be called tumor clusters
        geom_col(aes(fill = Subcluster, alpha = Tumor)) +
        scale_fill_manual(values = pal) +
        scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.3)) +
        coord_flip() +
        theme_min2() +
        theme(panel.grid.major.y = element_line(color = "gray90")) +
        no_legend() +
        ggtitle("log2FC for Foxr2 \nlog2(subcluster +1/ Immune+1)") + xlab(NULL)
    
    plot_grid(p1, p2, p3, p4, p5, nrow = 1, align = "h", axis = "tb")
    
}

#' UMAP plots of the sample separated into tumor / normal cells
plot_tumor_normal <- function(seurat, alias) {
    
    p1 <- umap(seurat,
               color_by = "Tumor_normal_FOXR2_call",
               label = TRUE,
               colors = palette_tumor_normal_mm,
               title = glue("{alias} - tumor"),
               cells = names(which(seurat$Tumor_normal_FOXR2_call == "Tumor")),
               show_all_cells = FALSE, constrain_scale = TRUE,
               legend = FALSE)
    p2 <- umap(seurat,
               color_by = "Tumor_normal_FOXR2_call",
               label = TRUE,
               colors = palette_tumor_normal_mm,
               title = glue("{alias} - normal"),
               cells = names(which(seurat$Tumor_normal_FOXR2_call == "Normal")),
               show_all_cells = FALSE,
               constrain_scale = TRUE,
               legend = FALSE)
    
    plot_grid(p1, p2, nrow = 1)
    
}

To avoid re-running this time and memory intensive function, add in the pre-computed Tumor-normal calls into each Seurat object from saved metadata:

load(glue("{out}/seurat_mouse_metadata.Rda"))

seurat_mouse <- map2(seurat_mouse, seurat_mouse_metadata,
    function(seurat, meta){
        
        x <- meta %>% 
            column_to_rownames("Cell") %>% 
            select(Tumor_normal_FOXR2_call)
        
        seurat <- AddMetaData(seurat, x)
        return(seurat)
    })

8.1.1 Foxr2_p53_r1

seurat_mouse$Foxr2_p53_r1 <- call_tumor_normal(seurat = seurat_mouse$Foxr2_p53_r1,
                                               out_prefix = "Foxr2_p53_r1",
                                               baseline = "Immune")

plot_diagnostics(read_tsv(here(glue("{out}/Foxr2_p53_r1_tumornormal_diagnostics.tsv"))))

## Rows: 42 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Subcluster, Color
## dbl (6): n, Foxr2_detection, Mean_expression, avg_log2FC, p_val, log10p
## lgl (1): Tumor
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

8.1.2 Foxr2_r1

seurat_mouse$Foxr2_r1 <- call_tumor_normal(seurat = seurat_mouse$Foxr2_r1,
                                           out_prefix = "Foxr2_r1",
                                           baseline = "Immune")

plot_diagnostics(read_tsv(here(glue("{out}/Foxr2_r1_tumornormal_diagnostics.tsv"))))

## Rows: 34 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Subcluster, Color
## dbl (6): n, Foxr2_detection, Mean_expression, avg_log2FC, p_val, log10p
## lgl (1): Tumor
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

8.1.3 Foxr2_r2

seurat_mouse$Foxr2_r2 <- call_tumor_normal(seurat = seurat_mouse$Foxr2_r2,
                                           out_prefix = "Foxr2_r2",
                                           baseline = "Immune")

plot_diagnostics(read_tsv(here(glue("{out}/Foxr2_r2_tumornormal_diagnostics.tsv"))))

## Rows: 8 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Subcluster, Color
## dbl (6): n, Foxr2_detection, Mean_expression, avg_log2FC, p_val, log10p
## lgl (1): Tumor
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

8.1.4 Export reference

Export inferCNV reference based on Tumor_normal_FOXR2_call (FOXR2 expression) column with doublets removed.

For inferCNV, we will use cells labeled as Normal by this technique.

inferCNV_ref_no_doublets <- map(seurat_mouse, 
    ~ .x@meta.data %>% as.data.frame() %>% 
        select(Tumor_normal_FOXR2_call) %>% 
        filter(Tumor_normal_FOXR2_call != "Doublet") %>% 
        mutate(Malignant_status = case_when(Tumor_normal_FOXR2_call == "Tumor" ~ "Malignant",
                                            Tumor_normal_FOXR2_call == "Normal" ~ "Normal")) %>% 
        select(-Tumor_normal_FOXR2_call)
    )

imap(inferCNV_ref_no_doublets, 
     ~ data.table::fwrite(x = .x, sep = "\t", row.names = T,
                          file = glue("{out}/foxr2_negative_cells_no_doublets_sample_{.y}.tsv")))

## $Foxr2_p53_r1
## NULL
## 
## $Foxr2_r1
## NULL
## 
## $Foxr2_r2
## NULL

8.2 By inferCNV

8.2.1 Clean reference

We will use the Foxr2 negative cells identified in the previous section, concatenated with wildtype P7 cortex (scMultiome sample "6" from Khazaei et al. Cell 2023) to avoid cell type specific & artefactual results.

We run inferCNV on this set of cells, as both reference and query at the same time, and remove clusters of cells with CNV-like signals.

Code for extracting cells adapted from:

code/scripts/per_sample_script_examples/infer_cnv_call.Rmd

# Here's our adapted function from dendextend::get_subdendrograms with the bug fix
# https://rdrr.io/cran/dendextend/src/R/get_subdendrograms.R
get_subdendrograms2 <- function(dend, k, ...) {
    clusters <- cutree(dend, k, ...)
    dend_list <- lapply(unique(clusters), function(cluster.id) {
        # !! Added names(clusters)[] here
        find_dendrogram(dend, names(clusters)[which(clusters == cluster.id)])
    })
    class(dend_list) <- "dendlist"
    dend_list
}

#' This function loads the inferCNV analysis, and plots
#' the dendrogram and four subtrees when cutting with k = input value
plot_subtrees <- function(dir_path, k, palette = NULL) {
    
    dendrogram_path <- glue("{dir_path}/infercnv.observations_dendrogram.txt")
    
    # Read in the file from inferCNV
    dend <- ape::read.tree(file = dendrogram_path)
    
    # Conver to hclust
    hclust_dend <- as.hclust(dend)
    if(!is.null(palette)){
        dend1 <- color_branches(hclust_dend, k = k, col = palette)
    } else{
        dend1 <- color_branches(hclust_dend, k = k)
    }
    
    m <- cbind(c(1, 1), c(2, 3), c(4, 5))
    layout(m)
    par(mar = c(4, 4, 0, 0))
    
    # Plot the whole one
    plot(dend1, leaflab = "none", horiz = TRUE)
    
    # Plot the subtrees
    dend_list <- get_subdendrograms2(dend1, k)
    
    sapply(dend_list, plot, leaflab = "none", horiz = TRUE)
    
    return(list("dend" = dend1,
                "subdendrograms" = dend_list))
    
}

Show inferCNV results:

If we split the clustering into two subtrees, the bottom one contains the malignant cells with CNV. This tree is subdendrogram #1 when doing clustering with \(k = 2\), and is colored pink below.

Saving the barcodes of the remaining cells (top / turquoise subtree, subdendrogram #2) to use as a cleaned reference of normal cells.

subtrees <- plot_subtrees(dir_path = here("data/singlecell/mm_inferCNV/normVSnorm_InferCNV/inferCNV/window101_exp0.1_refmanav_foxr2_normal_cleaned_6_WT_P7_cortex_chrFixed_HMMnone_denoise"),
                          k = 2)

plot(subtrees$subdendrograms[[2]], leaflab = "none", horiz = TRUE)

norm_barcodes_ref <- c(labels(subtrees$subdendrograms[[2]]))

write_tsv(x = norm_barcodes_ref %>% as.data.frame(),
          file = glue("{out}/barcodes_normVSnorm_top_tree.tsv"))

8.2.2 Get malignant cells per sample

We ran inferCNV individually in each sample using this cleaned reference.

Loading the inferCNV results and calling tumor/normal cells based on the heatmaps.

8.2.2.1 Foxr2_p53_r1 (AN24377)

# Set palette for inferCNV clusters
inferCNV_cluster_palette <- MetBrewer::met.brewer("Juarez", n = 7) %>% 
    setNames(c("C1", "C2", "C3", "C4", "C5", "C6", "C7"))

subtrees <- plot_subtrees(dir_path = here("data/singlecell/mm_inferCNV/AN24377/inferCNV/window101_exp0.1_refmanav_foxr2_normal_cleaned_6_WT_P7_cortex_chrFixed_HMMnone_denoise/"),
              k = 3, palette = inferCNV_cluster_palette[1:3])

plot(subtrees$subdendrograms[[1]], leaflab = "none", horiz = TRUE)

After cutting the tree at \(k = 3\), cells in subtrees #2 (middle) is normal. Others (subtrees #1 and #3) are Malignant.

Extracting the malignant cell barcodes and saving them.

mal_barcodes <- c()
inferCNV_subtrees <- c()

mm_sample <- "Foxr2_p53_r1"

mal_barcodes[[mm_sample]] <- c(labels(subtrees$subdendrograms[[1]]) %>% 
                                   gsub(pattern = "_.*$", 
                                        replacement = ""),
                               labels(subtrees$subdendrograms[[3]]) %>% 
                                   gsub(pattern = "_.*$", 
                                        replacement = ""))

inferCNV_subtrees[[mm_sample]] <- seurat_mouse[[mm_sample]]@meta.data %>%
    tibble::rownames_to_column("Cell") %>% 
    select(Cell) %>% 
    mutate(inferCNV_cluster = case_when(Cell %in% (labels(subtrees$subdendrograms[[1]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""))  ~ "C1",
                                        Cell %in% (labels(subtrees$subdendrograms[[2]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C2",
                                        Cell %in% (labels(subtrees$subdendrograms[[3]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C3"))

write_tsv(x = mal_barcodes[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/barcodes_malignant_{mm_sample}.tsv"))

write_tsv(x = inferCNV_subtrees[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/inferCNV_subtrees_{mm_sample}.tsv"))

8.2.2.2 Foxr2_r1 (M7238)

subtrees <- plot_subtrees(dir_path = here("data/singlecell/mm_inferCNV/M7238/inferCNV/window101_exp0.1_refmanav_foxr2_normal_cleaned_6_WT_P7_cortex_chrFixed_HMMnone_denoise/"),
              k = 6, palette = inferCNV_cluster_palette[1:6])

After cutting the tree at \(k = 6\), the bottom two (subtrees #1 and #2) are normal and rest are malignant.

Extracting the malignant cell barcodes and saving them.

mm_sample <- "Foxr2_r1"

mal_barcodes[[mm_sample]] <- c(labels(subtrees$subdendrograms[[3]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""),
                               labels(subtrees$subdendrograms[[4]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""),
                               labels(subtrees$subdendrograms[[5]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""),
                               labels(subtrees$subdendrograms[[6]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""))

inferCNV_subtrees[[mm_sample]] <- seurat_mouse[[mm_sample]]@meta.data %>%
    tibble::rownames_to_column("Cell") %>% 
    select(Cell) %>% 
    mutate(inferCNV_cluster = case_when(Cell %in% (labels(subtrees$subdendrograms[[1]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C1",
                                        Cell %in% (labels(subtrees$subdendrograms[[2]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C2",
                                        Cell %in% (labels(subtrees$subdendrograms[[3]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C3",
                                        Cell %in% (labels(subtrees$subdendrograms[[4]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C4",
                                        Cell %in% (labels(subtrees$subdendrograms[[5]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C5",
                                        Cell %in% (labels(subtrees$subdendrograms[[6]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C6"))

write_tsv(x = mal_barcodes[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/barcodes_malignant_{mm_sample}.tsv"))

write_tsv(x = inferCNV_subtrees[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/inferCNV_subtrees_{mm_sample}.tsv"))

8.2.2.3 Foxr2_r2 (AN22476)

subtrees <- plot_subtrees(dir_path = here("data/singlecell/mm_inferCNV/AN22476/inferCNV/window101_exp0.1_refmanav_foxr2_normal_cleaned_6_WT_P7_cortex_chrFixed_HMMnone_denoise/"),
              k = 4, palette = inferCNV_cluster_palette[1:4])

After cutting the tree at \(k = 4\), the second tree from the bottom (subtree #2) is normal. The rest are malignant

Extracting the malignant cell barcodes and saving them.

mm_sample <- "Foxr2_r2"

mal_barcodes[[mm_sample]] <- c(labels(subtrees$subdendrograms[[1]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""),
                               labels(subtrees$subdendrograms[[3]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""),
                               labels(subtrees$subdendrograms[[4]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = ""))

inferCNV_subtrees[[mm_sample]] <- seurat_mouse[[mm_sample]]@meta.data %>%
    tibble::rownames_to_column("Cell") %>% 
    select(Cell) %>% 
    mutate(inferCNV_cluster = case_when(Cell %in% (labels(subtrees$subdendrograms[[1]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C1",
                                        Cell %in% (labels(subtrees$subdendrograms[[2]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C2",
                                        Cell %in% (labels(subtrees$subdendrograms[[3]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C3",
                                        Cell %in% (labels(subtrees$subdendrograms[[4]]) %>% 
                                                       gsub(pattern = "_.*$", 
                                                            replacement = "")) ~ "C4"))

write_tsv(x = mal_barcodes[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/barcodes_malignant_{mm_sample}.tsv"))

write_tsv(x = inferCNV_subtrees[[mm_sample]] %>% as.data.frame(),
          file = glue("{out}/inferCNV_subtrees_{mm_sample}.tsv"))

8.2.3 Add label to Seurat objects

Add this information to a new column of the Seurat objects.

Individual sample objects:

save(mal_barcodes, file = glue("{out}/inferCNV_call_malignant_barcodes.Rda"))
save(inferCNV_subtrees, file = glue("{out}/inferCNV_clusters.Rda"))

seurat_mouse <- imap(seurat_mouse, function(seurat, id) {

    mal_inferCNV_call <- mal_barcodes[[id]]
    inferCNV_clusters <- inferCNV_subtrees[[id]]

    x <- seurat@meta.data %>%
        as.data.frame() %>%
        tibble::rownames_to_column("Cell") %>% 
        select(Cell, Doublet_call) %>% 
        mutate(inferCNV_call = case_when(Cell %in% mal_inferCNV_call ~ "Tumor",
                                        TRUE ~ "Normal")) %>%
        mutate(Malignant_status = case_when(Doublet_call != "singlet" ~ "Doublet",
                                        Cell %in% mal_inferCNV_call ~ "Tumor",
                                        TRUE ~ "Normal")) %>% 
        select(-Doublet_call)

    x <- left_join(x, inferCNV_clusters, by = "Cell") %>% 
        column_to_rownames("Cell")

    x$inferCNV_call <- factor(x$inferCNV_call)
    x$Malignant_status <- factor(x$Malignant_status)
    x$inferCNV_cluster <- factor(x$inferCNV_cluster)
    
    seurat <- AddMetaData(seurat, x)

    return(seurat)

})

Joint object:

joint_mal_norm <- imap(seurat_mouse, ~ .x@meta.data %>% 
                       rownames_to_column("Cell") %>% 
                       mutate(inferCNV_cluster = glue("{inferCNV_cluster}_{.y}")) %>% 
                      select(Cell, 
                             inferCNV_call, 
                             Malignant_status,
                             inferCNV_cluster) %>% 
                       mutate(Sample = .y)) %>% 
    Reduce(rbind, .) %>% 
    dplyr::rename("gex_barcode" = "Cell") %>% 
    left_join(seurat_joint@meta.data, ., by = c("gex_barcode", "Sample")) %>% 
    select(inferCNV_call, 
             Malignant_status,
             inferCNV_cluster)

rownames(joint_mal_norm) <- rownames(seurat_joint@meta.data)

seurat_joint <- AddMetaData(seurat_joint, joint_mal_norm)

Also add metadata for malignant cells per each sample:

palette_mal_per_sample <- c("Foxr2_p53_r1_malignant" = "#3341ff",
                            "Foxr2_r1_malignant" = "#741f75",
                            "Foxr2_r2_malignant" = "#ab592d",
                            "Non-malignant" = "gray90")

mal_per_sample <- seurat_joint@meta.data %>% 
    mutate(mal_per_sample = case_when(
        Malignant_status != "Tumor" ~ "Non-malignant",
        Sample == "Foxr2_p53_r1" ~ "Foxr2_p53_r1_malignant",
        Sample == "Foxr2_r1" ~ "Foxr2_r1_malignant",
        Sample == "Foxr2_r2" ~ "Foxr2_r2_malignant"
    )) %>% 
        select(mal_per_sample)

mal_per_sample$mal_per_sample <- factor(mal_per_sample$mal_per_sample,
                                        levels = names(palette_mal_per_sample))

seurat_joint <- AddMetaData(seurat_joint, mal_per_sample)

9 Inspection per sample: dashboard

For each sample, let's define a function to compare the QC of each cluster, alongside the projections, canonical markers for the major cell types, and the amount of Foxr2+ cells. This will help us compare all the information together to understand which clusters are derived from the tumor, along with their cell identities. We will exclude doublets from this analysis.

# Set up metadata variables for Foxr2 gene expression and detection
seurat_mouse <- map(seurat_mouse, 
                    function(seurat){
                        
                        x <- FetchData(seurat, vars = c("Foxr2"))%>%
                                mutate(Foxr2_expression = as.numeric(Foxr2)) %>%
                                mutate(Foxr2_positive = Foxr2 > 0) %>%
                                select(-Foxr2) %>%
                                mutate(Foxr2_positive = as.character(Foxr2_positive))
                                
                        seurat <- AddMetaData(seurat, x)
                        
                        return(seurat)
                        
                    })

annotation_dashboard_umap <- function(seurat, alias) {

    # ----------------------------------------------------------------------------
    
     # UMAP colored by expression cluster (Seurat)
    u1 <- umap(seurat, label = TRUE, legend = FALSE, point_size = 0.3,
               title = glue("Seurat clusters"),
               rasterize = TRUE)  +
         square_theme +
         large_text
    
    # UMAP colored by inferCNV cluster
    u2 <- umap(seurat,
               color_by = "inferCNV_cluster",
               legend = T,
               colors = inferCNV_cluster_palette,
               point_size = 0.3,
               hide_ticks = TRUE,
               label = FALSE,
               title = glue("{alias} inferCNV clusters, N={ncol(seurat)} cells"),
               rasterize = TRUE) + 
        mod_legend_col(title = NULL) +
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP colored by inferCNV tumor/normal call
    u3 <-  umap(seurat,
                color_by = "inferCNV_call",
                order_by = "inferCNV_call",
                title = "inferCNV call",
                colors = palette_tumor_normal_mm,
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE,) + 
        mod_legend_col(title = NULL) +
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
     # UMAP colored by FOXR2 detection
    u4 <-  umap(seurat,
                color_by = "Foxr2_positive",
                color_by_type = "discrete",
                order_by = "Foxr2_expression",
                title = "Foxr2 expression > 0",
                colors = c("TRUE" = "red",
                           "FALSE" = "grey80"),
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE) + 
        mod_legend_col(title = NULL) +
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP colored by broad annotation (Jessa Forebrain)
    u5 <- umap(seurat,
               color_by = "broad_label_Jessa",
               colors = palette_broad_type, 
               point_size = 0.3,
               hide_ticks = TRUE,
               label = FALSE, legend = T,
               title = "Broad annotation Jessa",
               rasterize = TRUE) + 
        mod_legend_col(title = NULL) +
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP of cell type annotation: Jessa
    u6 <- umap(seurat,
               color_by = "Consensus_broad_braindex",
               legend = FALSE,
               colors = palette_jessa_l3, 
               point_size = 0.3,
               hide_ticks = TRUE,
               label = FALSE,
               title = "Jessa forebrain label",
               rasterize = TRUE) +
         square_theme +
         large_text
    
     # UMAP colored by broad annotation (Yao Forebrain)
    u7 <- umap(seurat,
               color_by = "broad_label_Yao",
               colors = palette_broad_type, 
               point_size = 0.3,
               hide_ticks = TRUE,
               label = FALSE, legend = T,
               title = "Broad annotation Yao",
               rasterize = TRUE) + 
        mod_legend_col(title = NULL) +
        theme(legend.position = "bottom") +
         square_theme +
         large_text

    # UMAP of cell type annotation: yao
    u8 <- umap(seurat,
               color_by = "Consensus_broad_Yao",
               legend = FALSE,
               colors = palette_yao2023_class, 
               point_size = 0.3,
               hide_ticks = TRUE,
               label = FALSE,
               title = "Yao forebrain label",
               rasterize = TRUE) +
         square_theme +
         large_text
    
    # UMAP colored by mitochondrial %
    u9 <-  umap(seurat,
                color_by = "percent.mito",
                color_by_type = "continuous",
                title = "Mitochondrial %",
                colors = purples,
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE) + 
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP colored by ribosomal %
    u10 <-  umap(seurat,
                color_by = "percent.ribo",
                color_by_type = "continuous",
                title = "Ribosomal %",
                colors = purples,
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE) + 
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP colored by number of UMIs 
    u11 <-  umap(seurat,
                color_by = "nCount_RNA",
                color_by_type = "continuous",
                title = "nCount_RNA",
                colors = purples,
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE) + 
        theme(legend.position = "bottom") +
         square_theme +
         large_text
    
    # UMAP colored by number of genes
    u12 <-  umap(seurat,
                color_by = "nFeature_RNA",
                color_by_type = "continuous",
                title = "nFeature_RNA",
                colors = purples,
                point_size = 0.3,
                label = FALSE, legend = T,
                rasterize = TRUE) + 
        theme(legend.position = "bottom") +
         square_theme +
         large_text

    # ----------------------------------------------------------------------------
    # Assemble the figure
    # NOTE: legend is being generated in the next chunk
    plot_grid(
        plot_grid(
              u1, u2, u3, u4, 
              u5, u6, u7, u8,
              u9, u10, u11, u12,
                  ncol = 4, align = "hv", axis = "tb"),
        plot_grid(leg_labelled1, leg_labelled2),
        rel_heights = c(0.9, 0.1), ncol = 1)
        

}

Get cell types color legends.

leg_labelled1 <- cowplot::get_legend(umap(seurat_mouse$Foxr2_p53_r1,
           color_by = "Consensus_broad_braindex",
           alpha = 0.5,
           point_size = 0.5,
           colors = palette_jessa_l3,
           label = FALSE,
           legend = TRUE,
           title = "Projected cell type") +
        # increase size of symbols in the legend
        mod_legend_col(ncol = 2, title = NULL) +
        theme(legend.position = "bottom"))

leg_labelled2 <- cowplot::get_legend(umap(seurat_mouse$Foxr2_r1,
           color_by = "Consensus_broad_Yao",
           alpha = 0.5,
           point_size = 0.5,
           colors = palette_yao2023_class,
           label = FALSE,
           legend = TRUE,
           title = "Projected cell type") +
        # increase size of symbols in the legend
        mod_legend_col(ncol = 2, title = NULL) +
        theme(legend.position = "bottom"))

# Row_label: Seurat object column used as the rows of bar plots

annotation_dashboard_bar_vln <- function(seurat, alias, row_label) {

    seurat_col <- seurat@meta.data %>%
        select(row_label, gex_barcode)
    colnames(seurat_col) <- c("Label", "gex_barcode")
    seurat <- AddMetaData(seurat, seurat_col)

    # plot QC metrics as violin plots, colour QC-flagged clusters as red
    p1 <- seurat@meta.data %>% 
        select("nCount_RNA", "nFeature_RNA", "percent.mito", "Doublet_score_log1p", "Label") %>%
        mutate(Label = factor(Label, levels = sort(unique(seurat@meta.data$Label)))) %>%
        pivot_longer(cols = c(nCount_RNA, nFeature_RNA, percent.mito, Doublet_score_log1p),
                     names_to = "stat", values_to = "value") %>%
        ggplot(aes(x = Label, y = value)) +
        geom_violin(fill = "gray50", scale = "width") +
        facet_wrap(~ stat, scales = "free_x", nrow = 1) +
        theme_min2() +
        coord_flip() +
        no_legend() +
        xlab(NULL)

    # ----------------------------------------------------------------------------
    # plot proportion of GFP+ cells per cluster as a barplot
    p2 <- FetchData(seurat, c("GFP")) %>%
        tibble::rownames_to_column("gex_barcode") %>% 
        left_join(., seurat@meta.data %>% 
                    select("Label", "gex_barcode")) %>% 
        mutate(GFP_binary = ifelse(GFP > 0, "+", "-")) %>%
        ggplot(aes(x = Label)) +
        geom_bar(aes(fill = GFP_binary), position = "fill") +
        scale_fill_manual(values = c("+" = "green", "-" = "gray90")) +
        ggtitle("GFP+") +
        geom_hline(yintercept = c(0.05, 0.1)) +
        theme_min2() +
        theme(axis.text.x = element_text(size = 6),
              title = element_text(size = 8)) +
        ylab("% cells") +
        coord_flip() +
        no_legend()

    # ----------------------------------------------------------------------------
    # plot proportion of Foxr2+ cells per cluster as a barplot
    foxr2_df <- seurat@meta.data %>% select("Foxr2_expression", "Label") %>%
        mutate(Foxr2_binary = ifelse(Foxr2_expression > 0, "+", "-"))

    p3 <- foxr2_df %>%
        ggplot(aes(x = Label)) +
        geom_bar(aes(fill = Foxr2_binary), position = "fill") +
        scale_fill_manual(values = c("+" = "red", "-" = "gray90")) +
        ggtitle("Foxr2+") +
        geom_hline(yintercept = c(0.02, 0.05)) +
        theme_min2() +
        theme(axis.text.y = element_blank(),
              title = element_text(size = 8)) +
        ylab("% cells") + xlab(NULL) +
        coord_flip(ylim = c(0, 0.5)) +
        no_legend()

    # ----------------------------------------------------------------------------
    # plot tumor_normal call
    p4 <- seurat@meta.data %>% select("inferCNV_call", "Label") %>%
        ggplot(aes(x = Label)) +
        geom_bar(aes(fill = inferCNV_call), position = "fill") +
        scale_fill_manual(values = palette_tumor_normal_mm) +
        ggtitle("inferCNV_call") +
        theme_min2() +
        theme(axis.text.y = element_blank(),
              title = element_text(size = 8)) +
        ylab("# cells") + xlab(NULL) +
        coord_flip() +
        no_legend()

    # ----------------------------------------------------------------------------

    # barplot of projections from Jessa forebrain - broad
    p5 <- seurat@meta.data %>%
        select(Cluster = Label, broad_label_Jessa) %>%
        ggplot(aes(x = Cluster)) +
        geom_bar(aes(fill = broad_label_Jessa), position = "fill") +
        scale_fill_manual(values = palette_broad_type) +
        ggtitle("Broad annotation Jessa") +
        theme_min2() +
        theme(axis.text.y = element_blank(),
              title = element_text(size = 8)) +
        coord_flip() +
        ylab("% cells") + xlab(NULL) +
        no_legend()
    
    # barplot of projections from Jessa forebrain
    p6 <- seurat@meta.data %>%
        select(Cluster = Label, Consensus_broad_braindex) %>%
        ggplot(aes(x = Cluster)) +
        geom_bar(aes(fill = Consensus_broad_braindex), position = "fill") +
        scale_fill_manual(values = palette_jessa_l3) +
        ggtitle("Jessa forebrain label") +
        theme_min2() +
        theme(axis.text.y = element_blank(),
              title = element_text(size = 8)) +
        coord_flip() +
        ylab("% cells") + xlab(NULL) +
        no_legend()
    
    # barplot of projections from Yao forebrain
    p7 <- seurat@meta.data %>%
        select(Cluster = Label, Consensus_broad_Yao) %>%
        ggplot(aes(x = Cluster)) +
        geom_bar(aes(fill = Consensus_broad_Yao), position = "fill") +
        scale_fill_manual(values = palette_yao2023_class) +
        ggtitle("Yao forebrain label") +
        theme_min2() +
        theme(axis.text.y = element_blank(),
              title = element_text(size = 8)) +
        coord_flip() +
        ylab("% cells") + xlab(NULL) +
        no_legend()
    
  
    # --------------------------------------------------------
    # plot violin plots of a few key genes for cell type identification
    genes <- c("Hexb", "Ly86", # immune
                "Pdgfra", "Cspg4", "Olig2", # opc
                "Mbp", # oligo
                "Rbfox3","Gad1", "Gad2", # neuronal
                "Drd2", # Striatal spiny
                "Nkx2-1", "Lhx6", "Dlx5", "Dlx6", #fingerprint
                "Aqp4", "Slc1a3" # astrocytic
                      )
    
    p8 <- FetchData(seurat, genes) %>% 
         tibble::rownames_to_column(var = "gex_barcode") %>% 
         left_join(., seurat@meta.data %>% select(c("gex_barcode", "Label"))) %>% 
         pivot_longer(cols = genes,
                      names_to = "gene", values_to = "exp") %>% 
         arrange(factor(gene, levels = genes)) %>% 
         ggplot(aes(x = Label, y = exp)) +
            geom_violin() +
            facet_wrap(~ gene,nrow = 1) + coord_flip()
    
    # Assemble the figure
    plot_grid(
        plot_grid(p1, p2, p3, p4, 
                 p5, p6, p7, p8, 
                 nrow = 1, align = "hv", axis = "tb",
                 rel_widths = c(0.2, rep(0.4/6, 6), 0.4)),
        plot_grid(leg_labelled1, leg_labelled2, 
                 nrow=1, align = "hv", axis = "tb"),
             nrow = 2, align = "hv", axis = "tb",
             rel_heights = c(0.8,0.2))
    


}

9.1 Foxr2_p53_r1

sample <- "Foxr2_p53_r1"
subset_seurat <- subset(seurat_mouse[[sample]],
                                 subset = Malignant_status != "Doublet")

annotation_dashboard_umap(subset_seurat, 
                          sample)

annotation_dashboard_bar_vln(subset_seurat, 
                            sample, row_label = "seurat_clusters")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

annotation_dashboard_bar_vln(subset_seurat, 
                             sample, row_label = "inferCNV_cluster")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

9.2 Foxr2_r1

sample <- "Foxr2_r1"
subset_seurat <- subset(seurat_mouse[[sample]],
                                 subset = Malignant_status != "Doublet")

annotation_dashboard_umap(subset_seurat, 
                          sample)

annotation_dashboard_bar_vln(subset_seurat, 
                             sample, row_label = "seurat_clusters")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

annotation_dashboard_bar_vln(subset_seurat, 
                             sample, row_label = "inferCNV_cluster")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

9.3 Foxr2_r2

sample <- "Foxr2_r2"
subset_seurat <- subset(seurat_mouse[[sample]],
                                 subset = Malignant_status != "Doublet")

annotation_dashboard_umap(subset_seurat, 
                          sample)

annotation_dashboard_bar_vln(subset_seurat, 
                             sample, row_label = "seurat_clusters")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

annotation_dashboard_bar_vln(subset_seurat, 
                             sample, row_label = "inferCNV_cluster")

## Joining with `by = join_by(gex_barcode)`
## Joining with `by = join_by(gex_barcode)`

10 Combined annotations used in paper

Using all this information we can assign final labels as follows:

Exclude doublets
For remaining cells, use cell type projection from the Jessa atlas
For cells called as neurons by Jessa atlas, relabel with cell type projection from Yao atlas
Label as "No consensus" any cells that are labeled as neurons in Jessa and non-neurons in Yao

This way, we combine both references based on their strengths in sampling cell types.

10.1 Set labels

Perform re-labeling combining both atlases, and set up matching color palettes.

Details for each new label column:

Cell_label column:

For cells not labeled "singlet" in doublet identification, assign label NA
For cells labeled "Neuron" in Jessa forebrain atlas broad label and not labeled "Neuron" in Yao atlas broad label, assign label "No consensus"
For remaining cells labeled "Neuron" in Jessa forebrain atlas broad label, assign predicted label from Yao atlas
For cells not labeled "Neuron" in Jessa forebrain atlas broad label, assign predicted label from Jessa forebrain atlas

broad_label column:

Perform steps as above for Cell_label, but use broad label rather than predicted label for each atlas in step 3 and 4.

broad_label_mal_norm column:

Perform steps as above for broad_label, but after it is complete, relabel any cells not labeled malignant as "Non-malignant".

broad_label_mal_norm_ge column:

Perform steps as above for broad_label_mal_norm, but after it is complete, relabel cells with Cell_label in a list of GE-derived neuron types as "GE-derived neurons", and cells in a list of non-GE derived neuron types as "Other neurons". (Only perform relabeling for malignant cells.)

### Set palettes
palette_cell_type <- c("Other/mixed neurons" = "#e3bfcd", 
                       palette_jessa_l3[which(!names(palette_jessa_l3) %in% c("Unresolved", "No consensus"))], 
                       palette_yao2023_class[which(!names(palette_yao2023_class) %in% c("Unresolved", "No consensus"))],
                       "Unresolved" = "gray90",
                       "No consensus" = "gray90")

palette_broad_mal_norm <- c(palette_broad_type[which(!names(palette_broad_type) == "Unresolved")],
                            "Non-malignant" = "gray90")


palette_broad_mal_norm_ge <- c("GE-derived neurons" = "#135ca0",  # Matches blue of MGE neurons
                            "Other neurons" = "#6224b3", # Violet
                            palette_broad_mal_norm[which(!names(palette_broad_type) == "Neuron")])

### Aggregate labels
seurat_mouse <- (map(seurat_mouse,
     function(seurat){
         
         
         x <- seurat@meta.data %>% 
             mutate(Cell_label = case_when(
                Doublet_call != "singlet"     ~ NA,
                broad_label_Jessa != "Neuron" ~ Consensus_broad_braindex,
                broad_label_Jessa == "Neuron" & broad_label_Yao == "Neuron" ~ Consensus_broad_Yao,
                TRUE ~ "No consensus"
             )) %>% 
             mutate(broad_label = case_when(
                 Doublet_call != "singlet"     ~ NA,
                 broad_label_Jessa != "Neuron" ~ broad_label_Jessa,
                 broad_label_Jessa == "Neuron" & broad_label_Yao == "Neuron" ~ "Neuron",
                 TRUE ~ "No consensus"
             )) %>% 
             mutate(broad_label_mal_norm = case_when(
                 Malignant_status != "Tumor"     ~ "Non-malignant",
                 TRUE ~ broad_label
             )) %>% 
             mutate(ge_neuron_broad = case_when(
                 Cell_label %in% 
                     filter(yao_broad, GE_neuron_Yao == "GE-derived neurons")$Label ~ "GE-derived neurons",
                 Cell_label %in% 
                     filter(yao_broad, GE_neuron_Yao == "Other neurons")$Label ~ "Other neurons",
                 TRUE ~ "Other"
             )) %>% 
             mutate(broad_label_mal_norm_ge = case_when(
                 Malignant_status != "Tumor"     ~ "Non-malignant",
                 Cell_label %in% filter(yao_broad, GE_neuron_Yao == "GE-derived neurons")$Label ~ "GE-derived neurons",
                 Cell_label %in% filter(yao_broad, GE_neuron_Yao == "Other neurons")$Label ~ "Other neurons",
                 TRUE ~ as.character(broad_label_mal_norm)
             )) %>% 
             select(Cell_label, broad_label, broad_label_mal_norm, ge_neuron_broad, broad_label_mal_norm_ge)
         
                        
        x$Cell_label <- factor(x$Cell_label, levels = names(palette_cell_type))
        x$broad_label <- factor(x$broad_label, levels = rev(names(palette_broad_type)))
        x$broad_label_mal_norm <- factor(x$broad_label_mal_norm, levels = rev(names(palette_broad_mal_norm)))
        x$ge_neuron_broad <- factor(x$ge_neuron_broad, levels = rev(names(palette_ge_neurons)))
        x$broad_label_mal_norm_ge <- factor(x$broad_label_mal_norm_ge, levels = rev(names(palette_broad_mal_norm_ge)))
        
        seurat <- AddMetaData(seurat, x)

        return(seurat)   
         
     }
))

Add this metadata to the joint object.

joint_comb_label <- imap(seurat_mouse, ~ .x@meta.data %>% 
                       rownames_to_column("Cell") %>% 
                      select(Cell, 
                             Cell_label, 
                             broad_label,
                             broad_label_mal_norm,
                             ge_neuron_broad,
                             broad_label_mal_norm_ge) %>% 
                       mutate(Sample = .y)) %>% 
    Reduce(rbind, .) %>% 
    dplyr::rename("gex_barcode" = "Cell") %>% 
    left_join(seurat_joint@meta.data, ., by = c("gex_barcode", "Sample")) %>% 
    select(Cell_label, 
         broad_label,
         broad_label_mal_norm,
         ge_neuron_broad,
         broad_label_mal_norm_ge)

rownames(joint_comb_label) <- rownames(seurat_joint@meta.data)

seurat_joint <- AddMetaData(seurat_joint, joint_comb_label)

10.2 Plot Joint UMAPs for publication

Define helper functions for plotting assignments in the UMAP space, and quantifying cell labels.

Bar plots only contain cell types with > 2% cells in the sample.

plot_celltype <- function(seurat, alias) {

    #seurat$Cell_label <- factor(seurat$Cell_label, levels = names(palette_cell_type))

    # tumor cells
    p1 <- umap(seurat,
               color_by = "Cell_label",
               label = FALSE,
               colors = palette_cell_type,
               point_size = 1,
               title = glue("{alias} - tumor"),
               cells = Seurat::WhichCells(seurat, expression = Malignant_status == "Tumor"),
               show_all_cells = FALSE, constrain_scale = TRUE, rasterize = TRUE,
               legend = F) +
        mod_legend_col() +
         square_theme +
         large_text

    # normal cells
    p2 <- umap(seurat,
               color_by = "Cell_label",
               label = FALSE,
               colors = palette_cell_type,
               point_size = 1,
               title = glue("{alias} - normal"),
               cells = Seurat::WhichCells(seurat, expression = Malignant_status == "Normal"),
               show_all_cells = FALSE,
               constrain_scale = TRUE,
               legend = F) +
        mod_legend_col() +
         square_theme +
         large_text

    # cell type counts
    counts <- seurat@meta.data %>%
        filter(!(Cell_label %in% c("doublet"))) %>%
        group_by(Malignant_status, Cell_label) %>%
        filter(Cell_label != "No consensus") %>% 
        count()

    # tumor cells
    p3 <- counts %>%
        #mutate(Malignant_status = factor(Malignant_status, levels = c("Tumor", "Normal"))) %>% 
        filter(Malignant_status == "Tumor") %>% 
        filter(n > 0.02*ncol(seurat)) %>% 
        ggplot(aes(x = reorder(Cell_label, n), y = n)) +
        geom_col(aes(fill = Cell_label)) +
        geom_text(aes(label = n), hjust = -0.5) +
        scale_fill_manual(values = palette_cell_type) +
        #facet_wrap(~ Malignant_status, scales = "free_x") +
        xlab("Cell_label") +
        rotate_x() +
        no_legend() +
        coord_flip() +
        ylim(c(0, max(filter(counts, Malignant_status == "Tumor") %>% .$n)*1.2)) 
    
    # normal cells
    p4 <- counts %>%
        #mutate(Malignant_status = factor(Malignant_status, levels = c("Tumor", "Normal"))) %>% 
        filter(Malignant_status == "Normal") %>% 
        filter(n > 0.02*ncol(seurat)) %>% 
        ggplot(aes(x = reorder(Cell_label, n), y = n)) +
        geom_col(aes(fill = Cell_label)) +
        geom_text(aes(label = n), hjust = -1) +
        scale_fill_manual(values = palette_cell_type) +
        #facet_wrap(~ Malignant_status, scales = "free_x") +
        xlab("Cell_label") +
        rotate_x() +
        no_legend() +
        coord_flip() +
        ylim(c(0, max(filter(counts, Malignant_status == "Normal") %>% .$n)*1.2)) 

    plot_grid(
        plot_grid(p1, p3, ncol = 1, align = "v", axis = "lr", rel_heights = c(0.7,0.3)),
        plot_grid(p2, p4, ncol = 1, align = "v", axis = "lr", rel_heights = c(0.7,0.3)),
        ncol = 2, align = "h", axis = "tb")

}

plot_by_label_bar <- function(seurat, id, min_cells, column, palette) {

    # Filter out cell types by min_cells
    types_keep <- table(seurat[[column]])
    types_keep <- names(types_keep)[types_keep > min_cells]

    # Set order of cell types based on palette, remove "No consensus" cells
    order <- names(palette)
    order <- setdiff(order, "No consensus")

    # Count # cell per cell type
    df <- seurat@meta.data %>%
        select(all_of(c(column))) %>%
        rename(Label = column) %>% 
        filter(Label %in% order) %>% 
        group_by(Label) %>% count()
    
    # Create bar plot
    (p <- df %>%
        mutate(Label = factor(Label, levels = rev(order))) %>%
        ggplot(aes(x = Label, y = n)) +
          geom_col(aes(fill = Label), width = 0.5) +
          geom_text(aes(label = n), hjust = -0.5) +
          scale_fill_manual(values = palette) +
          theme_min2() +
          coord_flip() +
          no_legend() +
          ggtitle(id) +
          scale_x_discrete(order, drop = FALSE) +
          xlab(column) + ylab("# cells") +
          theme(axis.title.y=element_blank()) +
          rotate_x() +
        ylim(c(0, max(df %>% .$n)*1.2)))

}

plot_celltype(subset(seurat_joint, subset = broad_label != "No consensus" & !is.na(broad_label)), "Joint")

Save joint object with no Non-consensus cells or doublets for plotting

seurat_joint_for_plot <- subset(seurat_joint, subset = broad_label != "No consensus" & !is.na(broad_label))

Plot malignant cells from each sample, and print the number of cells per category.

umap(seurat_joint_for_plot,
            color_by = "mal_per_sample",
            colors = palette_mal_per_sample,
            alpha = 0.5, label = FALSE, point_size = 0.8, legend = TRUE,
            # rasterize points
            rasterize = TRUE,
            title = "Malignant cells per sample") +
    theme(legend.position = "bottom") +
    mod_legend_col(title = NULL) +
    square_theme +
    large_text

table(seurat_joint_for_plot$mal_per_sample)

## 
## Foxr2_p53_r1_malignant     Foxr2_r1_malignant     Foxr2_r2_malignant 
##                   2952                   3904                    529 
##          Non-malignant 
##                   5173

Remove all cells labeled "No consensus". Plot broad label with label "non-malignant" for remaining non-malignant cells.

umap(seurat_joint_for_plot,
                        color_by = "broad_label_mal_norm",
                        order_by = "broad_label_mal_norm",
                        colors = palette_broad_mal_norm,
                        alpha = 0.5, label = FALSE, point_size = 0.8, legend = TRUE,
                        # rasterize points
                        rasterize = TRUE,
                        title = "Joint broad label + non-malignant") +
         theme(legend.position = "bottom") +
         mod_legend_col(title = NULL) +
         square_theme +
         large_text

column <- "broad_label_mal_norm"

plot_by_label_bar(seurat_joint_for_plot,
                  id = NULL,
                                  min_cells = 0,
                                  palette = palette_broad_mal_norm,
                                  column = column) +
         large_text

10.3 Count GE neurons

Here, we check how many of the malignant neurons in the joint object of mouse models map to GE-derived neuron types.

table(seurat_joint_for_plot$broad_label_mal_norm_ge)

## 
##      Non-malignant       No consensus              Other               Glia 
##               5173                  0                149               2464 
##      Other neurons GE-derived neurons 
##               2445               2327

n_neurons <- seurat_joint_for_plot@meta.data %>% 
    filter(broad_label_mal_norm_ge %in% c("GE-derived neurons", "Other neurons")) %>% 
    nrow()

n_ge_neurons <- seurat_joint_for_plot@meta.data %>% 
    filter(broad_label_mal_norm_ge %in% c("GE-derived neurons")) %>% 
    nrow()

# Percentage of malignant neurons which are GE-derived cell types
n_ge_neurons/n_neurons

## [1] 0.4876362

11 Plot gene expression

11.1 Neuronal and glial programs

Set list of genes to plot:

check_genes <- c("Snap25", "Stmn2", "Rbfox3", # Pan-neuronal
                 "Nxph1", # Interneuron
                 "Bcas1", # OPC
                 "Mag", "Mog", "Mobp", "Mbp") # Oligodendrocytes

Violin of malignant neurons vs. glia in each sample:

imap(seurat_mouse,
    
    function(seurat, id){
        
        violin_grid(seurat = subset(seurat, 
                            subset = Malignant_status == "Tumor" & broad_label %in% c("Glia", "Neuron")),
            genes = check_genes,
            group_col = "broad_label", group_order = c("Glia", "Neuron"),
            colours = palette_broad_type[which(names(palette_broad_type) %in% c("Glia", "Neuron"))]
        ) + ggtitle(glue("Malignant glia vs. neurons in sample {id}")) + large_text
        
    }
)

## $Foxr2_p53_r1

## 
## $Foxr2_r1

## 
## $Foxr2_r2

12 Save single-nuclei metadata

Key columns produced:

Malignant_status: values in "Tumor", "Normal", or "Doublet"
Cell_label: final cell type label, separates doublets and assigns cell types based on automated projections from both atlases

seurat_mouse_metadata <- map(seurat_mouse, 
                             function(seurat) {
                                 meta <- seurat@meta.data %>% 
                                     as.data.frame() %>% 
                                     rownames_to_column("Cell")
                                 rownames(meta) <- NULL
                                 return(meta)
                            })
save(seurat_mouse_metadata, file = glue("{out}/seurat_mouse_metadata.Rda"))


seurat_mouse_metadata_joint <- seurat_joint@meta.data %>% 
                                     as.data.frame() %>% 
                                     rownames_to_column("Cell")
rownames(seurat_mouse_metadata_joint) <- NULL
save(seurat_mouse_metadata_joint, file = glue("{out}/seurat_mouse_metadata_joint.Rda"))

These can be added back to the objects as follows:

load(glue("{out}/seurat_mouse_metadata.Rda"))
seurat_mouse <- map2(seurat_mouse, seurat_mouse_metadata,
                     function(seurat, meta) {
                         
                         seurat@meta.data <- meta %>% 
                             column_to_rownames("Cell")
                         
                         return(seurat)
                     })

load(glue("{out}/seurat_mouse_metadata_joint.Rda"))
seurat_joint@meta.data <- seurat_mouse_metadata_joint %>%
                             column_to_rownames("Cell")

13 Reproducibility

This document was last rendered on:

## 2024-11-05 11:09:55

The git repository and last commit:

## Local:    main /project/kleinman/bhavyaa.chandarana/from_hydra/2023-05-NB-FOXR2/public
## Remote:   main @ origin (https://github.com/fungenomics/NB-FOXR2.git)
## Head:     [72d1c5c] 2024-11-04: Add DOI badge

The random seed was set with set.seed(100)

The R session info:

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Rocky Linux 8.10 (Green Obsidian)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/Toronto
##  date     2024-11-05
##  pandoc   1.19.2.1 @ /cvmfs/soft.computecanada.ca/gentoo/2020/usr/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  ! package              * version  date (UTC) lib source
##  P abind                  1.4-5    2016-07-21 [?] CRAN (R 4.1.2)
##  P ape                  * 5.7-1    2023-03-13 [?] RSPM (R 4.1.2)
##  P beachmat               2.10.0   2021-10-26 [?] Bioconductor
##  P beeswarm               0.4.0    2021-06-01 [?] CRAN (R 4.1.2)
##  P Biobase                2.54.0   2021-10-26 [?] Bioconductor
##  P BiocGenerics           0.40.0   2021-10-26 [?] Bioconductor
##  P BiocManager            1.30.15  2021-05-11 [?] CRAN (R 4.1.2)
##  P BiocNeighbors          1.12.0   2021-10-26 [?] Bioconductor
##  P BiocParallel           1.28.3   2021-12-09 [?] Bioconductor
##  P BiocSingular           1.10.0   2021-10-26 [?] Bioconductor
##  P Biostrings             2.62.0   2021-10-26 [?] Bioconductor
##  P bit                    4.0.4    2020-08-04 [?] CRAN (R 4.1.2)
##  P bit64                  4.0.5    2020-08-30 [?] CRAN (R 4.1.2)
##  P bitops                 1.0-7    2021-04-24 [?] CRAN (R 4.1.2)
##  P bluster                1.4.0    2021-10-26 [?] Bioconductor
##  P bslib                  0.3.1    2021-10-06 [?] CRAN (R 4.1.2)
##  P cachem                 1.0.6    2021-08-19 [?] CRAN (R 4.1.2)
##  P Cairo                  1.6-0    2022-07-05 [?] CRAN (R 4.1.2)
##  P callr                  3.7.6    2024-03-25 [?] RSPM
##  P cellranger             1.1.0    2016-07-27 [?] CRAN (R 4.1.2)
##  P circlize               0.4.15   2022-05-10 [?] CRAN (R 4.1.2)
##  P cli                    3.6.1    2023-03-23 [?] RSPM (R 4.1.2)
##  P clue                   0.3-64   2023-01-31 [?] CRAN (R 4.1.2)
##  P cluster                2.1.2    2021-04-17 [?] CRAN (R 4.1.2)
##  P codetools              0.2-18   2020-11-04 [?] CRAN (R 4.1.2)
##  P colorspace             2.0-2    2021-06-24 [?] CRAN (R 4.1.2)
##  P ComplexHeatmap       * 2.10.0   2021-10-26 [?] Bioconductor
##  P cowplot              * 1.1.1    2020-12-30 [?] CRAN (R 4.1.2)
##  P crayon                 1.4.2    2021-10-29 [?] CRAN (R 4.1.2)
##  P data.table             1.14.2   2021-09-27 [?] CRAN (R 4.1.2)
##  P DelayedArray           0.20.0   2021-10-26 [?] Bioconductor
##  P DelayedMatrixStats     1.16.0   2021-10-26 [?] Bioconductor
##  P deldir                 1.0-6    2021-10-23 [?] CRAN (R 4.1.2)
##  P dendextend           * 1.17.1   2023-03-25 [?] CRAN (R 4.1.2)
##  P devtools               2.4.5    2022-10-11 [?] CRAN (R 4.1.2)
##  P digest                 0.6.35   2024-03-11 [?] CRAN (R 4.1.2)
##  P docopt                 0.7.1    2020-06-24 [?] CRAN (R 4.1.2)
##  P doParallel             1.0.16   2020-10-16 [?] CRAN (R 4.1.2)
##  P dplyr                * 1.1.1    2023-03-22 [?] CRAN (R 4.1.2)
##  P dqrng                  0.3.0    2021-05-01 [?] CRAN (R 4.1.2)
##  P edgeR                  3.36.0   2021-10-26 [?] Bioconductor
##  P ellipsis               0.3.2    2021-04-29 [?] CRAN (R 4.1.2)
##  P evaluate               0.23     2023-11-01 [?] CRAN (R 4.1.2)
##  P fansi                  1.0.2    2022-01-14 [?] CRAN (R 4.1.2)
##  P farver                 2.1.0    2021-02-28 [?] CRAN (R 4.1.2)
##  P fastmap                1.1.0    2021-01-25 [?] CRAN (R 4.1.2)
##  P fastmatch              1.1-3    2021-07-23 [?] CRAN (R 4.1.2)
##  P fitdistrplus           1.1-6    2021-09-28 [?] CRAN (R 4.1.2)
##  P foreach                1.5.1    2020-10-15 [?] CRAN (R 4.1.2)
##  P fs                     1.5.2    2021-12-08 [?] CRAN (R 4.1.2)
##  P future                 1.25.0   2022-04-24 [?] CRAN (R 4.1.2)
##  P future.apply           1.8.1    2021-08-10 [?] CRAN (R 4.1.2)
##  P generics               0.1.3    2022-07-05 [?] CRAN (R 4.1.2)
##  P GenomeInfoDb           1.30.1   2022-01-30 [?] Bioconductor
##  P GenomeInfoDbData       1.2.4    2023-11-28 [?] Bioconductor
##  P GenomicRanges          1.46.1   2021-11-18 [?] Bioconductor
##  P GetoptLong             1.0.5    2020-12-15 [?] CRAN (R 4.1.2)
##  P ggalluvial           * 0.12.5   2023-02-22 [?] CRAN (R 4.1.2)
##  P ggbeeswarm             0.7.1    2022-12-16 [?] CRAN (R 4.1.2)
##  P ggforce                0.3.3    2021-03-05 [?] CRAN (R 4.1.2)
##  P ggplot2              * 3.4.2    2023-04-03 [?] CRAN (R 4.1.2)
##  P ggrastr                1.0.1    2021-12-08 [?] CRAN (R 4.1.2)
##  P ggrepel                0.9.1    2021-01-15 [?] CRAN (R 4.1.2)
##  P ggridges               0.5.3    2021-01-08 [?] CRAN (R 4.1.2)
##  P ggseqlogo              0.1      2017-07-25 [?] CRAN (R 4.1.2)
##  P git2r                  0.29.0   2021-11-22 [?] CRAN (R 4.1.2)
##  P GlobalOptions          0.1.2    2020-06-10 [?] CRAN (R 4.1.2)
##  P globals                0.14.0   2020-11-22 [?] CRAN (R 4.1.2)
##  P glue                 * 1.6.2    2022-02-24 [?] CRAN (R 4.1.2)
##  P goftest                1.2-3    2021-10-07 [?] CRAN (R 4.1.2)
##  P gridExtra              2.3      2017-09-09 [?] CRAN (R 4.1.2)
##  P gtable                 0.3.0    2019-03-25 [?] CRAN (R 4.1.2)
##  P here                 * 1.0.1    2020-12-13 [?] CRAN (R 4.1.2)
##  P highr                  0.9      2021-04-16 [?] CRAN (R 4.1.2)
##  P hms                    1.1.1    2021-09-26 [?] CRAN (R 4.1.2)
##  P htmltools              0.5.2    2021-08-25 [?] CRAN (R 4.1.2)
##  P htmlwidgets            1.5.4    2021-09-08 [?] CRAN (R 4.1.2)
##  P httpuv                 1.6.5    2022-01-05 [?] CRAN (R 4.1.2)
##  P httr                   1.4.2    2020-07-20 [?] CRAN (R 4.1.2)
##  P ica                    1.0-2    2018-05-24 [?] CRAN (R 4.1.2)
##  P igraph                 2.0.3    2024-03-13 [?] CRAN (R 4.1.2)
##  P IRanges                2.28.0   2021-10-26 [?] Bioconductor
##  P irlba                  2.3.5    2021-12-06 [?] CRAN (R 4.1.2)
##  P iterators              1.0.13   2020-10-15 [?] CRAN (R 4.1.2)
##  P jquerylib              0.1.4    2021-04-26 [?] CRAN (R 4.1.2)
##  P jsonlite               1.8.8    2023-12-04 [?] CRAN (R 4.1.2)
##  P KernSmooth             2.23-20  2021-05-03 [?] CRAN (R 4.1.2)
##  P knitr                  1.37     2021-12-16 [?] CRAN (R 4.1.2)
##  P labeling               0.4.2    2020-10-20 [?] CRAN (R 4.1.2)
##  P later                  1.3.0    2021-08-18 [?] CRAN (R 4.1.2)
##  P lattice                0.20-45  2021-09-22 [?] CRAN (R 4.1.2)
##  P lazyeval               0.2.2    2019-03-15 [?] CRAN (R 4.1.2)
##  P leiden                 0.3.9    2021-07-27 [?] CRAN (R 4.1.2)
##  P lifecycle              1.0.3    2022-10-07 [?] CRAN (R 4.1.2)
##  P limma                  3.50.3   2022-04-07 [?] Bioconductor
##  P listenv                0.8.0    2019-12-05 [?] CRAN (R 4.1.2)
##  P lmtest                 0.9-39   2021-11-07 [?] CRAN (R 4.1.2)
##  P locfit                 1.5-9.4  2020-03-25 [?] CRAN (R 4.1.2)
##  P lsa                    0.73.2   2020-05-04 [?] CRAN (R 4.1.2)
##  P magrittr             * 2.0.3    2022-03-30 [?] CRAN (R 4.1.2)
##  P MASS                   7.3-54   2021-05-03 [?] CRAN (R 4.1.2)
##  P Matrix                 1.3-4    2021-06-01 [?] CRAN (R 4.1.2)
##  P MatrixGenerics         1.6.0    2021-10-26 [?] Bioconductor
##  P matrixStats            0.61.0   2021-09-17 [?] CRAN (R 4.1.2)
##  P memoise                2.0.1    2021-11-26 [?] CRAN (R 4.1.2)
##  P metapod                1.2.0    2021-10-26 [?] Bioconductor
##  P MetBrewer            * 0.2.0    2022-03-21 [?] CRAN (R 4.1.2)
##  P mgcv                   1.8-38   2021-10-06 [?] CRAN (R 4.1.2)
##  P mime                   0.12     2021-09-28 [?] CRAN (R 4.1.2)
##  P miniUI                 0.1.1.1  2018-05-18 [?] CRAN (R 4.1.2)
##  P munsell                0.5.0    2018-06-12 [?] CRAN (R 4.1.2)
##  P nlme                   3.1-153  2021-09-07 [?] CRAN (R 4.1.2)
##  P parallelly             1.30.0   2021-12-17 [?] CRAN (R 4.1.2)
##  P patchwork            * 1.1.1    2020-12-17 [?] CRAN (R 4.1.2)
##  P pbapply              * 1.5-0    2021-09-16 [?] CRAN (R 4.1.2)
##  P pillar                 1.9.0    2023-03-22 [?] RSPM (R 4.1.2)
##  P pkgbuild               1.4.2    2023-06-26 [?] CRAN (R 4.1.2)
##  P pkgconfig              2.0.3    2019-09-22 [?] CRAN (R 4.1.2)
##  P pkgload                1.3.3    2023-09-22 [?] CRAN (R 4.1.2)
##  P plotly                 4.10.0   2021-10-09 [?] CRAN (R 4.1.2)
##  P plyr                   1.8.6    2020-03-03 [?] CRAN (R 4.1.2)
##  P png                    0.1-7    2013-12-03 [?] CRAN (R 4.1.2)
##  P polyclip               1.10-0   2019-03-14 [?] CRAN (R 4.1.2)
##  P prettyunits            1.1.1    2020-01-24 [?] CRAN (R 4.1.2)
##  P processx               3.8.4    2024-03-16 [?] RSPM
##  P profvis                0.3.8    2023-05-02 [?] CRAN (R 4.1.2)
##  P promises               1.2.0.1  2021-02-11 [?] CRAN (R 4.1.2)
##  P ps                     1.7.6    2024-01-18 [?] RSPM
##  P purrr                * 1.0.1    2023-01-10 [?] CRAN (R 4.1.2)
##  P qlcMatrix              0.9.7    2018-04-20 [?] CRAN (R 4.1.2)
##  P R6                     2.5.1    2021-08-19 [?] CRAN (R 4.1.2)
##  P RANN                   2.6.1    2019-01-08 [?] CRAN (R 4.1.2)
##  P RColorBrewer         * 1.1-2    2014-12-07 [?] CRAN (R 4.1.2)
##  P Rcpp                   1.0.8    2022-01-13 [?] CRAN (R 4.1.2)
##  P RcppAnnoy              0.0.19   2021-07-30 [?] CRAN (R 4.1.2)
##  P RcppRoll               0.3.0    2018-06-05 [?] CRAN (R 4.1.2)
##  P RCurl                  1.98-1.5 2021-09-17 [?] CRAN (R 4.1.2)
##  P readr                * 2.1.1    2021-11-30 [?] CRAN (R 4.1.2)
##  P readxl               * 1.3.1    2019-03-13 [?] CRAN (R 4.1.2)
##  P remotes                2.4.2.1  2023-07-18 [?] CRAN (R 4.1.2)
##  P renv                   1.0.3    2023-09-19 [?] CRAN (R 4.1.2)
##  P reshape2               1.4.4    2020-04-09 [?] CRAN (R 4.1.2)
##  P reticulate             1.23     2022-01-14 [?] CRAN (R 4.1.2)
##  P rjson                  0.2.21   2022-01-09 [?] CRAN (R 4.1.2)
##  P rlang                  1.1.3    2024-01-10 [?] CRAN (R 4.1.2)
##  P rmarkdown              2.11     2021-09-14 [?] CRAN (R 4.1.2)
##  P ROCR                   1.0-11   2020-05-02 [?] CRAN (R 4.1.2)
##  P rpart                  4.1-15   2019-04-12 [?] CRAN (R 4.1.2)
##  P rprojroot              2.0.2    2020-11-15 [?] CRAN (R 4.1.2)
##  P Rsamtools              2.10.0   2021-10-26 [?] Bioconductor
##  P rsvd                   1.0.5    2021-04-16 [?] RSPM (R 4.1.2)
##  P Rtsne                  0.15     2018-11-10 [?] CRAN (R 4.1.2)
##  P S4Vectors              0.32.4   2022-03-24 [?] Bioconductor
##  P sass                   0.4.0    2021-05-12 [?] CRAN (R 4.1.2)
##  P ScaledMatrix           1.2.0    2021-10-26 [?] Bioconductor
##  P scales                 1.2.1    2022-08-20 [?] CRAN (R 4.1.2)
##  P scater                 1.22.0   2021-10-26 [?] Bioconductor
##  P scattermore            0.7      2020-11-24 [?] CRAN (R 4.1.2)
##  P scDblFinder          * 1.8.0    2021-10-26 [?] Bioconductor
##  P scran                  1.22.1   2021-11-14 [?] Bioconductor
##  P sctransform            0.3.3    2022-01-13 [?] CRAN (R 4.1.2)
##  P scuttle                1.4.0    2021-10-26 [?] Bioconductor
##  P sessioninfo            1.2.2    2021-12-06 [?] CRAN (R 4.1.2)
##  P Seurat               * 4.0.0    2021-01-30 [?] CRAN (R 4.1.2)
##  P SeuratObject         * 4.0.4    2021-11-23 [?] CRAN (R 4.1.2)
##  P shape                  1.4.6    2021-05-19 [?] CRAN (R 4.1.2)
##  P shiny                  1.7.1    2021-10-02 [?] CRAN (R 4.1.2)
##  P Signac               * 1.3.0    2021-07-12 [?] CRAN (R 4.1.2)
##  P SingleCellExperiment   1.16.0   2021-10-26 [?] Bioconductor
##  P slam                   0.1-50   2022-01-08 [?] CRAN (R 4.1.2)
##  P SnowballC              0.7.0    2020-04-01 [?] CRAN (R 4.1.2)
##  P sparseMatrixStats      1.6.0    2021-10-26 [?] Bioconductor
##  P sparsesvd              0.2      2019-07-15 [?] CRAN (R 4.1.2)
##  P spatstat               1.64-1   2020-05-12 [?] CRAN (R 4.1.2)
##  P spatstat.data          2.1-2    2021-12-17 [?] CRAN (R 4.1.2)
##  P spatstat.utils         2.3-0    2021-12-12 [?] CRAN (R 4.1.2)
##  P statmod                1.4.36   2021-05-10 [?] CRAN (R 4.1.2)
##  P stringi                1.7.6    2021-11-29 [?] CRAN (R 4.1.2)
##  P stringr              * 1.5.0    2022-12-02 [?] CRAN (R 4.1.2)
##  P SummarizedExperiment   1.24.0   2021-10-26 [?] Bioconductor
##  P survival               3.2-13   2021-08-24 [?] CRAN (R 4.1.2)
##  P tensor                 1.5      2012-05-05 [?] CRAN (R 4.1.2)
##  P tibble               * 3.2.1    2023-03-20 [?] RSPM (R 4.1.2)
##  P tidyr                * 1.3.0    2023-01-24 [?] CRAN (R 4.1.2)
##  P tidyselect             1.2.0    2022-10-10 [?] CRAN (R 4.1.2)
##  P tweenr                 1.0.2    2021-03-23 [?] CRAN (R 4.1.2)
##  P tzdb                   0.3.0    2022-03-28 [?] CRAN (R 4.1.2)
##  P urlchecker             1.0.1    2021-11-30 [?] CRAN (R 4.1.2)
##  P usethis                2.2.2    2023-07-06 [?] CRAN (R 4.1.2)
##  P utf8                   1.2.2    2021-07-24 [?] CRAN (R 4.1.2)
##  P uwot                   0.1.11   2021-12-02 [?] CRAN (R 4.1.2)
##  P vctrs                  0.6.5    2023-12-01 [?] CRAN (R 4.1.2)
##  P vipor                  0.4.5    2017-03-22 [?] CRAN (R 4.1.2)
##  P viridis              * 0.5.1    2018-03-29 [?] RSPM (R 4.1.2)
##  P viridisLite          * 0.3.0    2018-02-01 [?] CRAN (R 4.1.2)
##  P vroom                  1.5.7    2021-11-30 [?] CRAN (R 4.1.2)
##  P withr                  2.5.0    2022-03-03 [?] CRAN (R 4.1.2)
##  P xfun                   0.29     2021-12-14 [?] CRAN (R 4.1.2)
##  P xgboost                1.7.5.1  2023-03-30 [?] CRAN (R 4.1.2)
##  P xtable                 1.8-4    2019-04-21 [?] CRAN (R 4.1.2)
##  P XVector                0.34.0   2021-10-26 [?] Bioconductor
##  P yaml                   2.2.1    2020-02-01 [?] CRAN (R 4.1.2)
##  P zlibbioc               1.40.0   2021-10-26 [?] Bioconductor
##  P zoo                    1.8-9    2021-03-09 [?] CRAN (R 4.1.2)
## 
##  [1] /project/kleinman/bhavyaa.chandarana/from_hydra/2023-05-NB-FOXR2/public/renv/library/R-4.1/x86_64-pc-linux-gnu
##  [2] /home/kleinman/bhavyaa.chandarana/.cache/R/renv/sandbox/R-4.1/x86_64-pc-linux-gnu/145cef2c
## 
##  P ── Loaded and on-disk path mismatch.
## 
## ──────────────────────────────────────────────────────────────────────────────

A project of the Kleinman Lab at McGill University, using the rr reproducible research template.

07.1 - Murine models of NB-FOXR2: cell annotation

Selin Jessa [selin.jessa@mail.mcgill.ca] and Bhavyaa Chandarana [bhavyaa.chandarana@mail.mcgill.ca]

05 November, 2024

1 Configuration

2 Overview

3 Libraries

4 Load data

4.1 Load mouse model single-cell data

4.1.1 Individual samples

4.1.2 Joint object

5 Load reference information

5.1 General

5.2 Jessa forebrain

5.2.1 TABLE: Jessa reference ontology

5.3 Yao forebrain

5.3.1 TABLE: Yao reference ontology

6 QC

6.0.1 TABLE: Mouse singlecell QC

6.0.2 Sample QC overviews

6.0.3 Call doublets

7 Load cell type annotation

7.1 Jessa forebrain

7.2 Yao forebrain

8 Distinguish tumor / normal cells

8.1 By FOXR2 expression

8.1.1 Foxr2_p53_r1

8.1.2 Foxr2_r1

8.1.3 Foxr2_r2

8.1.4 Export reference

8.2 By inferCNV

8.2.1 Clean reference

8.2.2 Get malignant cells per sample

8.2.2.1 Foxr2_p53_r1 (AN24377)

8.2.2.2 Foxr2_r1 (M7238)

8.2.2.3 Foxr2_r2 (AN22476)

8.2.3 Add label to Seurat objects

9 Inspection per sample: dashboard

9.1 Foxr2_p53_r1

9.2 Foxr2_r1

9.3 Foxr2_r2

10 Combined annotations used in paper

10.1 Set labels

10.2 Plot Joint UMAPs for publication

10.3 Count GE neurons

11 Plot gene expression

11.1 Neuronal and glial programs

12 Save single-nuclei metadata

13 Reproducibility