1 Working with Seurat
1.1 Load the library
Seurat is an R package designed for the analysis of single-cell RNA-seq data.
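The package has to be attached before any of its functions can be used. A minimal sketch, assuming Seurat is already installed (for example from CRAN with install.packages("Seurat")):

```r
# Attach Seurat so that its functions (CreateSeuratObject, Read10X, ...) are available
library(Seurat)
```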
## Loading required package: methods
1.2 Import the data
Single-cell RNA-seq data are presented as a matrix in which each row represents a gene, each column represents a single cell, and each entry is a raw count (UMI). We first load the text file, then create a “Seurat object”, the data structure Seurat works with.
# Read the tab-separated count table (one row per gene, one column per cell)
raw_counts <- read.csv2(file = "data/data_day1/GSM2494785_dge_mel_rep3.txt", sep = "\t")
# Show the first 3 genes and the first 5 columns
raw_counts[1:3, 1:5]
##            GENE TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA
## 1         128up            0            1            1            2
## 2 14-3-3epsilon          426          371          438          380
## 3    14-3-3zeta           64           54           58           58
## [1] 17026 1586
The two numbers above are the dimensions of the table: we have the expression of 17,026 genes in 1,586 cells.
To work with Seurat, you need to create a Seurat object, which we build here from our data frame. We first modify the table raw_counts so that the field GENE is used as the row names.
While creating the Seurat object, we can perform a first filtering: we exclude cells that contain fewer than 200 detected genes (undersequenced cells or debris) and genes that are expressed in no more than 2 cells.
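The steps described above can be sketched on a small simulated table that has the same layout as raw_counts (a GENE column followed by one column per cell). The names toy_counts, count_matrix and toy_obj are hypothetical, and the thresholds min.features = 200 and min.cells = 3 are an assumed reading of the text (keep cells with at least 200 detected genes, drop genes detected in 2 cells or fewer); with the tutorial's data, raw_counts would take the place of toy_counts.

```r
library(Seurat)

# Simulated counts: 300 genes x 20 cells (values are made up)
set.seed(1)
mat <- matrix(rpois(300 * 20, lambda = 3), nrow = 300,
              dimnames = list(NULL, paste0("cell", 1:20)))
mat[, 1] <- 0; mat[2:50, 1] <- 1    # cell1: only 49 detected genes
mat[1, ] <- 0; mat[1, 2:3] <- 1     # gene1: detected in 2 cells only
toy_counts <- data.frame(GENE = paste0("gene", 1:300), mat)

# Use the GENE field as row names, then keep only the count columns
rownames(toy_counts) <- toy_counts$GENE
count_matrix <- as.matrix(toy_counts[, colnames(toy_counts) != "GENE"])

# Create the object and filter at the same time:
#   min.features: minimum number of detected genes for a cell to be kept
#   min.cells:    minimum number of cells in which a gene must be detected
toy_obj <- CreateSeuratObject(counts = count_matrix,
                              min.cells = 3,
                              min.features = 200)

# Printing the object gives a short summary (features = genes, samples = cells)
toy_obj
dim(toy_obj)
```

In this toy table, cell1 and gene1 are the only entries built to fail the thresholds, so they are the ones the filtering is expected to remove. With the tutorial's data, printing the resulting object gives a summary of the following form.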
## An object of class Seurat
## 12511 features across 1579 samples within 1 assay
## Active assay: RNA (12511 features)
Alternative: if you have a directory produced by CellRanger, you can read the count matrix with the function Read10X and build your Seurat object from it. This function takes as argument the name of the folder containing the output of CellRanger (matrix.mtx, genes.tsv, barcodes.tsv).
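As an illustration of this alternative, the sketch below writes a tiny folder containing the three files named above into a temporary directory and reads it back. The folder name toy_cellranger, the gene and barcode names, and the object names are all hypothetical; with real data, only the path given to data.dir would change.

```r
library(Seurat)
library(Matrix)

# Build a toy folder with the three expected files
tenx_dir <- file.path(tempdir(), "toy_cellranger")
dir.create(tenx_dir, showWarnings = FALSE)

# 3 genes x 4 cells, stored in Matrix Market format
toy <- Matrix(c(0, 3, 1,  2, 0, 0,  0, 5, 4,  1, 0, 2), nrow = 3, sparse = TRUE)
writeMM(toy, file.path(tenx_dir, "matrix.mtx"))

# genes.tsv: gene identifier and gene name, tab-separated, no header
write.table(data.frame(id = paste0("ID", 1:3), name = paste0("gene", 1:3)),
            file.path(tenx_dir, "genes.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# barcodes.tsv: one cell barcode per line
writeLines(paste0("cell", 1:4), file.path(tenx_dir, "barcodes.tsv"))

# Read10X returns a sparse count matrix (genes in rows, cells in columns)
counts_10x <- Read10X(data.dir = tenx_dir)

# The matrix is then turned into a Seurat object as before
sobj_10x <- CreateSeuratObject(counts = counts_10x)
```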
1.3 Explore the Seurat data structure
A Seurat object is not the easiest structure to work with, but with a bit of practice you will learn to appreciate its potential.
In Seurat, data are organised in different compartments (slots), which themselves contain several compartments, which can in turn contain sub-compartments, etc.
Each compartment can be used to store:
- data from multiple modalities, such as RNAseq, spatial transcriptomics, ATAC-seq… For our session today, we will only focus on scRNAseq data (slot assays, sub-slot RNA)
- general results regarding your data, e.g. the total number of UMI expressed (slot meta.data)
- results of analyses: PCA components or clustering results
## [1] "assays" "meta.data" "active.assay" "active.ident" "graphs"
## [6] "neighbors" "reductions" "project.name" "misc" "version"
## [11] "commands" "tools"
You navigate through this hierarchy using the @ and $ signs.
## [1] "counts" "data" "scale.data" "key"
## [5] "var.features" "meta.features" "misc"
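Listings like the two above come from calls of the following kind. This is a minimal sketch on a toy object so that it runs on its own; toy_counts, toy_obj and rna_assay are hypothetical names, and the exact slot names returned depend on the installed Seurat version.

```r
library(Seurat)
library(Matrix)

# Toy count matrix (50 genes x 10 cells); names and values are made up
set.seed(42)
toy_counts <- Matrix(
  matrix(rpois(50 * 10, lambda = 1), nrow = 50,
         dimnames = list(paste0("gene", 1:50), paste0("cell", 1:10))),
  sparse = TRUE
)
toy_obj <- CreateSeuratObject(counts = toy_counts)

# Top-level slots of the Seurat object
slotNames(toy_obj)

# "@" selects a slot, "$" selects a named element inside it:
# here the RNA assay stored in the assays slot
rna_assay <- toy_obj@assays$RNA
slotNames(rna_assay)

# meta.data is a data frame with one row per cell;
# nCount_RNA holds the total number of UMI per cell
head(toy_obj@meta.data)
toy_obj@meta.data$nCount_RNA[1:3]
```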
In the slots associated with the RNA assay, you can store:
- counts: raw UMI counts (the data we imported)
- data: filtered/normalized count matrix
- scale.data: normalized and scaled data (usually for PCA analyses)
- var.features: a list of genes that contribute strongly to cell-to-cell variation (see section 3.1 on highly variable genes).
You can access the data directly with the GetAssayData function.
## 3 x 5 sparse Matrix of class "dgCMatrix"
##               TCCCTAAAGTAN TTTAAGCTCTTN AGAGAGAATACA GCCCGTGGAGCA ACTAGACCAAGT
## 128up                    .            1            1            2            4
## 14-3-3epsilon          426          371          438          380          358
## 14-3-3zeta              64           54           58           58           35
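Output of this form can be obtained with a call like the one sketched below, shown on a toy object (toy_counts, toy_obj and raw_umi are hypothetical names). The argument slot = "counts" matches the assay structure listed earlier in this section; more recent Seurat releases name this argument layer instead, so the call may need adapting to the installed version.

```r
library(Seurat)
library(Matrix)

# Toy object: 6 genes x 5 cells, values are made up
set.seed(7)
toy_counts <- Matrix(
  matrix(rpois(6 * 5, lambda = 1), nrow = 6,
         dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5))),
  sparse = TRUE
)
toy_obj <- CreateSeuratObject(counts = toy_counts)

# Extract the raw UMI counts of the active assay
raw_umi <- GetAssayData(object = toy_obj, slot = "counts")

# First 3 genes, first 5 cells
raw_umi[1:3, 1:5]
```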
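In Seurat, data are stored as a “dgCMatrix”, a sparse-matrix class from the Matrix package that keeps only the non-zero entries in memory, which is an efficient way to store an array with a lot of zeros. When such a matrix is printed, zeros are shown as dots.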
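The benefit of sparse storage can be checked without Seurat. The sketch below, which uses only the Matrix package and made-up values, builds a matrix in which about 1% of the entries are non-zero and compares the memory used by the dense and sparse versions; the names dense and sparse are hypothetical.

```r
library(Matrix)

# Dense 1000 x 500 matrix filled with zeros
set.seed(1)
dense <- matrix(0, nrow = 1000, ncol = 500)

# Put strictly positive counts in 5000 randomly chosen positions (1% of entries)
nonzero <- sample(length(dense), size = 5000)
dense[nonzero] <- rpois(5000, lambda = 3) + 1

# Convert to a sparse matrix: only the non-zero entries are stored
sparse <- Matrix(dense, sparse = TRUE)
class(sparse)

# Compare the memory footprint of the two representations
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")
```

Single-cell count tables typically contain a large share of zeros, which is why this representation suits them.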