1 Working with Seurat

1.1 Load the library

Seurat is an R package designed for the analysis of single-cell RNA-seq data.

library(Seurat)
## Loading required package: methods

1.2 Import the data

Single-cell RNA-seq data are represented as a count matrix in which each row is a gene, each column is a single cell, and each entry is a raw count (number of UMIs). We first load the text file, then create a “Seurat object”, the data structure that Seurat works with.

raw_counts <- read.delim(file = "data/data_day1/GSM2494785_dge_mel_rep3.txt")
raw_counts[1:3,1:5]
##            GENE TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA
## 1         128up            0            1            1            2
## 2 14-3-3epsilon          426          371          438          380
## 3    14-3-3zeta           64           54           58           58
print(dim(raw_counts))
## [1] 17026  1586

Here we have the expression of 17 026 genes in 1585 cells: the table has 1586 columns, but the first one (GENE) holds the gene names.

To work with Seurat, you need to create a Seurat object. Here, we create it from our data frame. Before doing so, we modify the table raw_counts so that the GENE column becomes the row names and only numeric count columns remain.

rownames(raw_counts) <- raw_counts$GENE
raw_counts$GENE <- NULL

While creating the Seurat object, we can apply a first filter: min.features = 200 removes cells in which fewer than 200 genes are detected (under-sequenced cells or debris), and min.cells = 2 removes genes that are detected in fewer than 2 cells.

mydata <- CreateSeuratObject(raw_counts, min.cells = 2, min.features = 200)
mydata
## An object of class Seurat 
## 12511 features across 1579 samples within 1 assay 
## Active assay: RNA (12511 features)
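
To make the two thresholds concrete, here is a minimal base-R sketch of the same filtering logic on a toy matrix. The names toy, min_features and min_cells are illustrative, the thresholds are lowered to 2 so that the toy example stays small, and the order of the two steps (cells first, then genes) reflects our understanding of how Seurat applies them rather than something shown above.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns); values are made up
toy <- matrix(c(5, 0, 0,
                3, 2, 0,
                0, 0, 0,
                1, 4, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

min_features <- 2  # keep cells with at least this many detected genes
min_cells    <- 2  # keep genes detected in at least this many cells

# Step 1: count detected genes per cell and drop poorly covered cells
genes_per_cell <- colSums(toy > 0)
kept_cells <- toy[, genes_per_cell >= min_features, drop = FALSE]

# Step 2: among the remaining cells, count cells per gene and drop rare genes
cells_per_gene <- rowSums(kept_cells > 0)
filtered <- kept_cells[cells_per_gene >= min_cells, , drop = FALSE]

filtered
```

In this toy case cell3 is removed because it contains no detected gene, and gene1 and gene3 are removed because they are detected in fewer than 2 of the remaining cells.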

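Alternative: if you have a directory produced by CellRanger, you can read it with the function Read10X. This function takes as argument the name of the folder containing the output of CellRanger (matrix.mtx, genes.tsv, barcodes.tsv; recent CellRanger versions write features.tsv.gz instead of genes.tsv) and returns a count matrix, which you then pass to CreateSeuratObject.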
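
As a sketch of this alternative, the two steps can be wrapped in a small helper. The function name load_cellranger, its argument names and the folder path in the usage line are hypothetical; we also assume a gene-expression-only run, since Read10X may return a list of matrices when several data types are present.

```r
# Hypothetical helper: from a CellRanger output folder to a filtered Seurat object
load_cellranger <- function(data_dir, min_cells = 2, min_features = 200) {
  # Read10X() returns a sparse count matrix (genes x cells), not a Seurat object
  counts <- Seurat::Read10X(data.dir = data_dir)
  # Same filtering thresholds as for the text file above
  Seurat::CreateSeuratObject(counts = counts,
                             min.cells = min_cells,
                             min.features = min_features)
}

# Usage (placeholder path, replace it with your own CellRanger output folder):
# mydata_10x <- load_cellranger("path/to/cellranger_output")
```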

1.3 Explore the Seurat data structure

A Seurat object is not the easiest structure to work with, but with a bit of practice you will come to appreciate its potential.

In Seurat, data are organised in different compartments (slots), which themselves contain further compartments, which can in turn contain sub-compartments, and so on.

Each compartment can be used to store:

  • data from multiple modalities, such as RNA-seq, spatial transcriptomics, ATAC-seq… For our session today, we will only focus on scRNA-seq data (slot assays, sub-slot RNA)
  • general information about your cells, e.g. the total number of UMIs per cell (slot meta.data)
  • results of analyses, such as PCA components or clustering results

slotNames(mydata)
##  [1] "assays"       "meta.data"    "active.assay" "active.ident" "graphs"      
##  [6] "neighbors"    "reductions"   "project.name" "misc"         "version"     
## [11] "commands"     "tools"

You navigate this hierarchy with the @ operator, which accesses a slot of an object, and the $ operator, which accesses a named element of a list.

slotNames(mydata@assays$RNA)
## [1] "counts"        "data"          "scale.data"    "key"          
## [5] "var.features"  "meta.features" "misc"
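
The difference between @ and $ is easier to see on a small object that we build ourselves. The sketch below defines two toy S4 classes, ToyAssay and ToyObject, which are hypothetical and much simpler than the real Seurat classes; they only mimic the nesting "object, slot, list element, slot".

```r
# Toy S4 classes (not part of Seurat) that mimic the nested layout
setClass("ToyAssay", representation(counts = "matrix", key = "character"))
setClass("ToyObject", representation(assays = "list", project.name = "character"))

# Made-up counts: 2 genes x 3 cells, filled column by column
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("cell1", "cell2", "cell3")))

toy <- new("ToyObject",
           assays = list(RNA = new("ToyAssay", counts = toy_counts, key = "rna_")),
           project.name = "demo")

slotNames(toy)                  # slots of the top-level object
toy_rna <- toy@assays$RNA       # "@" opens the slot, "$" picks the list element RNA
slotNames(toy_rna)              # slots of the nested assay
toy@assays$RNA@counts["geneB", "cell3"]   # chain both operators down to one value
```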

The slots of the RNA assay store:

  • counts: raw UMI counts (the data we imported)
  • data: normalized count matrix (identical to counts until normalization has been run)
  • scale.data: normalized and scaled data (usually for PCA analyses)
  • var.features: the list of genes that contribute strongly to cell-to-cell variation (see section 3.1 on highly variable genes).

You can retrieve any of these matrices with the GetAssayData function, which gives the same result as accessing the slot directly (commented line).

# mydata@assays$RNA@counts[1:3,1:5]
GetAssayData(mydata, slot="counts")[1:3,1:5]
## 3 x 5 sparse Matrix of class "dgCMatrix"
##               TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA ACTAGACCAAGT
## 128up                    .            1            1            2            4
## 14-3-3epsilon          426          371          438          380          358
## 14-3-3zeta              64           54           58           58           35

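In Seurat, count data are stored as a “dgCMatrix”, a sparse-matrix class from the Matrix package that keeps only the non-zero entries in memory, which is an efficient way to store a matrix with a lot of zeros. The dots in the output above stand for zeros.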
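
To get a feeling for the gain, the following sketch builds a toy matrix that is mostly zeros and compares its memory footprint in dense and sparse form. The dimensions, the Poisson rate and the variable names are arbitrary choices for illustration and are not taken from our dataset.

```r
library(Matrix)  # sparse matrix classes, installed alongside R as a recommended package

set.seed(42)
n_genes <- 2000
n_cells <- 500

# Toy counts: with lambda = 0.05, roughly 95% of the entries are zero
dense <- matrix(as.numeric(rpois(n_genes * n_cells, lambda = 0.05)),
                nrow = n_genes, ncol = n_cells)

# Convert to a sparse matrix: only the non-zero entries are stored
sparse <- Matrix(dense, sparse = TRUE)
class(sparse)

# Compare the memory used by the two representations
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")
```

The sparser the matrix, the larger the saving, which is why this format suits single-cell count data where most genes are not detected in most cells.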