1 Working with Seurat
1.1 Load the library
Seurat is an R package designed for the analysis of single-cell RNA-seq data.
library(Seurat)
## Loading required package: methods
1.2 Import the data
Single-cell RNA-seq data are presented as a matrix in which each row represents a gene, each column represents a single cell, and each entry is a raw count (UMI). We first load the text file, then create a “Seurat object”, the data structure Seurat works with.
# Read the tab-separated count table (genes in rows, cells in columns)
raw_counts <- read.csv2(file = "data/data_day1/GSM2494785_dge_mel_rep3.txt", sep = "\t")
raw_counts[1:3,1:5]
##            GENE TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA
## 1         128up            0            1            1            2
## 2 14-3-3epsilon          426          371          438          380
## 3    14-3-3zeta           64           54           58           58
dim(raw_counts)
## [1] 17026 1586
Here we have the expression of 17,026 genes in 1,586 cells.
To work with Seurat, you need to create a Seurat object, which we build here from our data frame. We first modify the table raw_counts so that the field GENE becomes the row names.
While creating the Seurat object, we can perform a first filtering: we exclude cells that contain fewer than 200 detected genes (undersequenced cells or debris) and genes that are expressed in no more than 2 cells.
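The two steps described above (moving GENE to the row names, then filtering while building the object) can be sketched as follows. This is a minimal sketch and not necessarily the exact call used in this session: the toy table stands in for raw_counts so that the code runs without the data file, the object name seurat_obj and the project label are made up, and min.cells = 3 is an assumption derived from the "no more than 2 cells" rule.

```r
library(Seurat)
library(Matrix)

# Toy stand-in for raw_counts: 250 genes x 6 cells, with a GENE column
# like the table read from the text file (all names here are made up).
counts <- matrix(1, nrow = 250, ncol = 6,
                 dimnames = list(paste0("gene", 1:250), paste0("cell", 1:6)))
counts["gene1", 3:6] <- 0     # gene1 is detected in only 2 cells
counts[51:250, "cell6"] <- 0  # cell6 has fewer than 200 detected genes
raw_counts <- data.frame(GENE = rownames(counts), counts, row.names = NULL)

# Use the GENE column as row names, then drop it from the table.
rownames(raw_counts) <- raw_counts$GENE
raw_counts$GENE <- NULL

# Convert to a sparse matrix and build the object, filtering as we go:
# min.features = 200 keeps cells with at least 200 detected genes,
# min.cells = 3 keeps genes detected in at least 3 cells (assumed value).
seurat_obj <- CreateSeuratObject(
  counts = Matrix(as.matrix(raw_counts), sparse = TRUE),
  project = "toy_example",
  min.cells = 3,
  min.features = 200
)
seurat_obj
```

On this toy table the two filters should remove cell6 and gene1, leaving 249 features across 5 cells. The summary printed below comes from the real dataset.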
## An object of class Seurat
## 12511 features across 1579 samples within 1 assay
## Active assay: RNA (12511 features)
Alternative:
If you have a directory produced by CellRanger, you can read its count matrix with the function Read10X and pass the result to CreateSeuratObject. Read10X takes as argument the name of the folder containing the output of CellRanger (matrix.mtx, genes.tsv, barcodes.tsv).
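As an illustration of this alternative, the sketch below first writes a tiny, made-up dataset in the three-file layout listed above, so that it runs without real CellRanger output, and then reads it back. The folder name, gene names, barcodes and object names are all hypothetical. With real data you would skip the writing step and give the path of your CellRanger folder as data.dir.

```r
library(Seurat)
library(Matrix)

# Write a tiny, made-up dataset in the CellRanger (version 2) layout so that
# the example runs anywhere. With real data, skip this part.
toy_dir <- file.path(tempdir(), "toy_cellranger")
dir.create(toy_dir, showWarnings = FALSE)
toy <- Matrix(c(0, 3, 1,
                2, 0, 5), nrow = 3, sparse = TRUE)  # 3 genes x 2 cells
writeMM(toy, file.path(toy_dir, "matrix.mtx"))
write.table(data.frame(id = c("ID1", "ID2", "ID3"),
                       name = c("geneA", "geneB", "geneC")),
            file.path(toy_dir, "genes.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
writeLines(c("AAAC-1", "AAAG-1"), file.path(toy_dir, "barcodes.tsv"))

# Read10X() returns a sparse count matrix (genes x cells) ...
counts_10x <- Read10X(data.dir = toy_dir)
# ... which CreateSeuratObject() turns into a Seurat object.
seurat_10x <- CreateSeuratObject(counts = counts_10x, project = "toy_10x")
dim(counts_10x)
```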
1.3 Explore the Seurat data structure
A Seurat object is not the easiest structure to work with, but with a bit of practice you will learn to appreciate its potential.
In Seurat, data are organised in different compartments (slots), which themselves contain several compartments, which can in turn contain sub-compartments, etc.
Each compartment can be used to store:
- data from multiple modalities, such as RNA-seq, spatial transcriptomics, ATAC-seq… For our session today, we will only focus on scRNA-seq data (slot assays, sub-slot RNA)
- general results regarding your data, e.g. the total number of UMIs per cell (slot meta.data)
- results of analyses: PCA components or clustering results
## [1] "assays" "meta.data" "active.assay" "active.ident" "graphs"
## [6] "neighbors" "reductions" "project.name" "misc" "version"
## [11] "commands" "tools"
You navigate through this hierarchy using the @ and $ signs: @ accesses a slot of the object, and $ accesses a named element inside a slot.
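To make this navigation concrete, here is a small sketch on a made-up object (3 genes, 2 cells). The names toy_counts and toy_obj are hypothetical. The exact slot names you see depend on your Seurat version: the listings in this section correspond to the assay structure of Seurat v3/v4, and Seurat v5 organises the assay differently.

```r
library(Seurat)
library(Matrix)

# Tiny made-up count matrix, just to have an object to inspect.
toy_counts <- Matrix(c(0, 3, 1, 2, 0, 5), nrow = 3, sparse = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2")))
toy_obj <- CreateSeuratObject(counts = toy_counts)

slotNames(toy_obj)     # top-level compartments of the object
names(toy_obj@assays)  # modalities stored in the "assays" slot

# @ reaches a slot of the object; $ then picks a named element inside it.
rna_assay <- toy_obj@assays$RNA
slotNames(rna_assay)   # compartments of the RNA assay

# meta.data is a data frame with one row per cell.
head(toy_obj@meta.data)
# Shortcut: $ applied to the Seurat object itself returns a meta.data column.
toy_obj$nCount_RNA
```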
## [1] "counts" "data" "scale.data" "key"
## [5] "var.features" "meta.features" "misc"
In the slots associated with the RNA assay, you can store:
- counts: raw UMI counts (the data we imported)
- data: filtered/normalized count matrix
- scale.data: normalized and scaled data (usually for PCA analyses)
- var.features: a list of genes that contribute strongly to cell-to-cell variation (see section 3.1 on highly variable genes)
You can access the data directly with the GetAssayData function.
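A minimal sketch of such a call, on a made-up object (toy_counts and toy_obj are hypothetical names). The argument that selects the matrix is called slot in Seurat v3/v4 and layer in Seurat v5, so the sketch picks one or the other based on the installed version.

```r
library(Seurat)
library(Matrix)

# Tiny made-up count matrix (3 genes x 2 cells).
toy_counts <- Matrix(c(0, 3, 1, 2, 0, 5), nrow = 3, sparse = TRUE,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2")))
toy_obj <- CreateSeuratObject(counts = toy_counts)

# Retrieve the raw counts of the RNA assay, using the argument name that
# matches the installed Seurat version.
if (packageVersion("Seurat") >= "5.0.0") {
  raw <- GetAssayData(toy_obj, assay = "RNA", layer = "counts")
} else {
  raw <- GetAssayData(toy_obj, assay = "RNA", slot = "counts")
}
raw[1:3, 1:2]  # first genes and cells; zeros are printed as "."
```

The output shown below was obtained in the same way on the real dataset, for the first 3 genes and 5 cells.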
## 3 x 5 sparse Matrix of class "dgCMatrix"
##               TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA ACTAGACCAAGT
## 128up                    .            1            1            2            4
## 14-3-3epsilon          426          371          438          380          358
## 14-3-3zeta              64           54           58           58           35
In Seurat, data are stored as “dgCMatrix”, a sparse matrix class from the Matrix package. It keeps only the non-zero values in memory (zeros are printed as dots), which is an efficient way to store an array with a lot of zeros.
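To see why this matters, the sketch below compares the memory used by an ordinary (dense) matrix and by its sparse version. It only needs the Matrix package. The matrix size and the share of non-zero values are arbitrary choices for illustration, not properties of our dataset.

```r
library(Matrix)

# A mostly-zero matrix (2000 genes x 500 cells) stored the ordinary, dense way.
dense <- matrix(0, nrow = 2000, ncol = 500)
dense[seq(1, length(dense), by = 97)] <- 5  # roughly 1% non-zero entries

# The same values stored as a sparse matrix.
sparse_mat <- Matrix(dense, sparse = TRUE)
class(sparse_mat)

# Only the non-zero values (slot x), their row indices (slot i) and the
# column pointers (slot p) are kept, so the object is far smaller in memory.
nnzero(sparse_mat)
length(sparse_mat@x)
format(object.size(dense), units = "MB")
format(object.size(sparse_mat), units = "MB")
```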