Single-cell datasets are growing in size, posing challenges as well as opportunities for genomics researchers. `ondisc` is an R package that facilitates analysis of large-scale single-cell data out-of-core on a laptop or distributed across tens to hundreds of processors on a cluster or cloud. In both settings, `ondisc` requires only a few gigabytes of memory, even if the input data are tens of gigabytes in size. `ondisc` is oriented mainly toward single-cell CRISPR screen analysis, but it can also be used for single-cell differential expression and single-cell co-expression analyses. `ondisc` is powered by several new, efficient algorithms for manipulating and querying large, sparse expression matrices.
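
The out-of-core idea described above can be illustrated with a minimal base R sketch. It does not use the `ondisc` API and is not the package's algorithm: `counts`, `get_row`, and `summarize_features` are hypothetical names, and the small in-memory matrix merely stands in for a backing file on disk. The point is only that per-feature summaries need one row in memory at a time.

```r
# Stand-in for an on-disk matrix: a small features-by-cells count matrix.
# (Hypothetical data; a real backing file would live on disk.)
set.seed(1)
counts <- matrix(
  rpois(5 * 8, lambda = 0.5),
  nrow = 5, ncol = 8,
  dimnames = list(paste0("feature_", 1:5), NULL)
)

# Hypothetical accessor that returns a single row, mimicking a read of one
# feature from disk.
get_row <- function(i) counts[i, ]

# Stream over the features, holding only one row in memory per iteration.
summarize_features <- function(get_row, feature_ids) {
  n <- length(feature_ids)
  out <- data.frame(
    feature = feature_ids,
    mean_count = numeric(n),
    frac_nonzero = numeric(n)
  )
  for (i in seq_len(n)) {
    x <- get_row(i)
    out$mean_count[i] <- mean(x)        # average count across cells
    out$frac_nonzero[i] <- mean(x > 0)  # fraction of cells expressing the feature
  }
  out
}

feature_summary <- summarize_features(get_row, rownames(counts))
```

Peak memory in this pattern scales with the number of cells rather than with the full matrix, which is why a row-wise accessor suits analyses that proceed one feature at a time.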

Author

Maintainer: Timothy Barry tbarry@hsph.harvard.edu (ORCID)

Authors:

Other contributors:

  • Songcheng Dai [contributor]

  • Yixuan Qiu [contributor]

Examples

# initialize odm objects from Cell Ranger output; also, compute the cellwise covariates
library(ondisc)
library(sceptredata)
directories_to_load <- paste0(
  system.file("extdata", package = "sceptredata"),
  "/highmoi_example/gem_group_", c(1, 2)
)
directory_to_write <- tempdir()
out_list <- create_odm_from_cellranger(
  directories_to_load = directories_to_load,
  directory_to_write = directory_to_write
)
#> Round 1/2 processing of the input files.
#> 	Processing file 1 of 2.
#> 	Processing file 2 of 2.
#> Round 2/2 processing of the input files.
#> 	Processing file 1 of 2. Computing cellwise covariates. Writing to disk.
#> 	Processing file 2 of 2. Computing cellwise covariates. Writing to disk.

# extract the odm corresponding to the gene modality
gene_odm <- out_list$gene
gene_odm
#> An object of class odm with the following attributes:
#> 	• 526 features
#> 	• 45919 cells
#> 	• Backing file: /var/folders/7v/5sqjgh8j28lgf8qx3gbtq1h00000gp/T//Rtmp8pBZJd/gene.odm

# obtain dimension information
dim(gene_odm)
#> [1]   526 45919
nrow(gene_odm)
#> [1] 526
ncol(gene_odm)
#> [1] 45919

# obtain rownames (i.e., the feature IDs)
rownames(gene_odm) |> head()
#> [1] "ENSG00000069275" "ENSG00000117222" "ENSG00000117266" "ENSG00000117280"
#> [5] "ENSG00000133059" "ENSG00000133065"

# extract row into memory, first by integer and then by string
expression_vector_1 <- gene_odm[10, ]
expression_vector_2 <- gene_odm["ENSG00000135046", ]
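
# a minimal sketch, assuming each extracted row is an ordinary numeric vector
# with one entry per cell (an assumption, not confirmed above): summarize a
# single feature without loading the full matrix into memory
length(expression_vector_1) == ncol(gene_odm)
frac_nonzero <- mean(expression_vector_2 > 0) # fraction of cells with a nonzero count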

# delete the gene_odm object
rm(gene_odm)

# reinitialize the gene_odm object
gene_odm <- initialize_odm_from_backing_file(
  paste0(tempdir(), "/gene.odm")
)
gene_odm
#> An object of class odm with the following attributes:
#> 	• 526 features
#> 	• 45919 cells
#> 	• Backing file: /var/folders/7v/5sqjgh8j28lgf8qx3gbtq1h00000gp/T//Rtmp8pBZJd/gene.odm