ondisc: Algorithms and data structures for large single-cell expression matrices
Source:R/ondisc.R
ondisc-package.Rd
Single-cell datasets are growing in size, posing challenges as well as opportunities for genomics researchers. `ondisc` is an R package that facilitates analysis of large-scale single-cell data, either out-of-core on a laptop or distributed across tens to hundreds of processors on a cluster or cloud. In both settings, `ondisc` requires only a few gigabytes of memory, even if the input data are tens of gigabytes in size. `ondisc` is oriented mainly toward single-cell CRISPR screen analysis, but it can also be used for single-cell differential expression and single-cell co-expression analyses. `ondisc` is powered by several new, efficient algorithms for manipulating and querying large, sparse expression matrices.
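To make the out-of-core idea concrete, here is a minimal base-R sketch that stores a count matrix on disk and reads back one feature (row) at a time, so that memory use scales with a single row rather than with the whole matrix. The row-major binary layout and the helper `read_row()` are hypothetical and chosen only for illustration; they are not the `.odm` format or the algorithms that `ondisc` uses.

```r
# Illustrative only: a toy row-major binary file, NOT the ondisc .odm format.
set.seed(1)
n_features <- 5L
n_cells <- 1000L
backing_file <- tempfile(fileext = ".bin")

# Write the matrix to disk one feature (row) at a time, so the full
# matrix never has to exist in memory.
con <- file(backing_file, open = "wb")
row_sums_expected <- numeric(n_features)
for (i in seq_len(n_features)) {
  counts <- rpois(n_cells, lambda = 0.2) # mostly-zero integer counts
  row_sums_expected[i] <- sum(counts)
  writeBin(counts, con, size = 4L)
}
close(con)

# Hypothetical helper: read row i by seeking to its byte offset
# (each value occupies 4 bytes, each row holds n_cells values).
read_row <- function(path, i, n_cells) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (i - 1) * n_cells * 4, origin = "start")
  readBin(con, what = "integer", n = n_cells, size = 4L)
}

# Compute a per-feature statistic while holding only one row in memory.
row_sums <- vapply(
  seq_len(n_features),
  function(i) as.numeric(sum(read_row(backing_file, i, n_cells))),
  numeric(1)
)
```

A dense layout like this one wastes space on zeros; a format designed for sparse expression data would store only the nonzero entries, which this sketch does not attempt.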
Author
Maintainer: Timothy Barry tbarry@hsph.harvard.edu (ORCID)
Authors:
Eugene Katsevich ekatsevi@wharton.upenn.edu [thesis advisor]
Other contributors:
Songcheng Dai [contributor]
Yixuan Qiu [contributor]
Examples
# initialize odm objects from Cell Ranger output; also, compute the cellwise covariates
library(sceptredata)
directories_to_load <- paste0(
  system.file("extdata", package = "sceptredata"),
  "/highmoi_example/gem_group_", c(1, 2)
)
directory_to_write <- tempdir()
out_list <- create_odm_from_cellranger(
  directories_to_load = directories_to_load,
  directory_to_write = directory_to_write
)
#> Round 1/2 processing of the input files.
#> Processing file 1 of 2.
#> Processing file 2 of 2.
#> Round 2/2 processing of the input files.
#> Processing file 1 of 2. Computing cellwise covariates. Writing to disk.
#> Processing file 2 of 2. Computing cellwise covariates. Writing to disk.
# extract the odm corresponding to the gene modality
gene_odm <- out_list$gene
gene_odm
#> An object of class odm with the following attributes:
#> • 526 features
#> • 45919 cells
#> • Backing file: /var/folders/7v/5sqjgh8j28lgf8qx3gbtq1h00000gp/T//Rtmp8pBZJd/gene.odm
# obtain dimension information
dim(gene_odm)
#> [1] 526 45919
nrow(gene_odm)
#> [1] 526
ncol(gene_odm)
#> [1] 45919
# obtain rownames (i.e., the feature IDs)
rownames(gene_odm) |> head()
#> [1] "ENSG00000069275" "ENSG00000117222" "ENSG00000117266" "ENSG00000117280"
#> [5] "ENSG00000133059" "ENSG00000133065"
# extract row into memory, first by integer and then by string
expression_vector_1 <- gene_odm[10,]
expression_vector_2 <- gene_odm["ENSG00000135046",]
# delete the gene_odm object
rm(gene_odm)
# reinitialize the gene_odm object
gene_odm <- initialize_odm_from_backing_file(
  paste0(tempdir(), "/gene.odm")
)
gene_odm
#> An object of class odm with the following attributes:
#> • 526 features
#> • 45919 cells
#> • Backing file: /var/folders/7v/5sqjgh8j28lgf8qx3gbtq1h00000gp/T//Rtmp8pBZJd/gene.odm
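The row-extraction pattern shown above can be extended to a loop that summarizes every feature while holding only one row in memory at a time. The sketch below is a hedged illustration: `summarize_features()` is a hypothetical helper, not part of `ondisc`, and it is demonstrated on a small in-memory matrix as a stand-in. It relies only on `nrow()`, `rownames()`, and `x[i, ]`, so it may also work on an `odm` object such as `gene_odm`, assuming that `gene_odm[i, ]` returns a plain numeric vector.

```r
# Hypothetical helper: per-feature summaries computed one row at a time.
summarize_features <- function(x) {
  n <- nrow(x)
  n_nonzero <- integer(n)
  mean_expression <- numeric(n)
  for (i in seq_len(n)) {
    expression_vector <- x[i, ] # only this row is held in memory
    n_nonzero[i] <- sum(expression_vector > 0)
    mean_expression[i] <- mean(expression_vector)
  }
  data.frame(
    feature_id = rownames(x),
    n_nonzero = n_nonzero,
    mean_expression = mean_expression
  )
}

# Stand-in for an odm: a tiny matrix with feature IDs as row names.
toy <- matrix(
  c(0, 2, 0, 1,
    3, 0, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), NULL)
)
toy_summary <- summarize_features(toy)
toy_summary
```

Looping in R over every feature trades speed for a small memory footprint; for a handful of features of interest, extracting them individually as in the examples above is simpler.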