3  Assign gRNAs

The third step of the pipeline is to assign gRNAs to cells. This step entails using the gRNA UMI counts to determine which cells contain which gRNAs. sceptre provides three gRNA-to-cell assignment methods:

We begin by loading the sceptre and sceptredata packages.

3.1 Initialize the CRISPRi and CRISPRko sceptre_objects

We use the high-MOI CRISPRi and low-MOI CRISPRko data as running examples. We call import_data() and set_analysis_parameters() on the low-MOI CRISPRko data to initialize a sceptre_object called sceptre_object_lowmoi.

# low-MOI CRISPRko data setup
# 1. import data
data(lowmoi_example_data)
sceptre_object_lowmoi <- import_data(
  response_matrix = lowmoi_example_data$response_matrix,
  grna_matrix = lowmoi_example_data$grna_matrix,
  extra_covariates = lowmoi_example_data$extra_covariates,
  grna_target_data_frame = lowmoi_example_data$grna_target_data_frame,
  moi = "low"
)

# 2. set analysis parameters
positive_control_pairs <- construct_positive_control_pairs(sceptre_object_lowmoi)
discovery_pairs <- construct_trans_pairs(
  sceptre_object = sceptre_object_lowmoi,
  positive_control_pairs = positive_control_pairs,
  pairs_to_exclude = "pc_pairs"
)
sceptre_object_lowmoi <- set_analysis_parameters(
  sceptre_object = sceptre_object_lowmoi,
  discovery_pairs = discovery_pairs,
  positive_control_pairs = positive_control_pairs
)

We do the same for the high-MOI CRISPRi data, creating sceptre_object_highmoi.

# high-MOI CRISPRi setup
# 1. import data
data(highmoi_example_data); data(grna_target_data_frame_highmoi)
sceptre_object_highmoi <- import_data(
  response_matrix = highmoi_example_data$response_matrix,
  grna_matrix = highmoi_example_data$grna_matrix,
  grna_target_data_frame = grna_target_data_frame_highmoi,
  moi = "high",
  extra_covariates = highmoi_example_data$extra_covariates,
  response_names = highmoi_example_data$gene_names
)
# 2. set analysis parameters
positive_control_pairs <- construct_positive_control_pairs(sceptre_object_highmoi)
discovery_pairs <- construct_cis_pairs(sceptre_object_highmoi,
  positive_control_pairs = positive_control_pairs,
  distance_threshold = 5e6
)
sceptre_object_highmoi <- set_analysis_parameters(
  sceptre_object = sceptre_object_highmoi,
  discovery_pairs = discovery_pairs,
  positive_control_pairs = positive_control_pairs,
  side = "left"
)

We are now ready to carry out the gRNA assignment step on both datasets.

3.2 Visualize the gRNA count distributions

A helpful first step in assigning gRNAs to cells is to visualize the gRNA UMI count distributions. As discussed in Section 3 of The whole game, we can use the function plot_grna_count_distributions() to plot the empirical UMI count distribution of one or more gRNAs. plot_grna_count_distributions() takes several arguments: sceptre_object (required), n_grnas_to_plot (optional), grnas_to_plot (optional), and threshold (optional). n_grnas_to_plot is an integer specifying the number of randomly selected gRNAs to plot. grnas_to_plot is a character vector specifying (by name) one or more specific gRNAs to plot. Finally, threshold is an integer specifying the location at which to draw a dotted vertical line. We call plot_grna_count_distributions() on the low-MOI CRISPRko data, plotting nine random gRNAs.

plot_grna_count_distributions(
  sceptre_object = sceptre_object_lowmoi,
  n_grnas_to_plot = 9
)

Histograms of the gRNA count distributions for the low-MOI CRISPRko data.

We see that the gRNA count distributions exhibit generally bimodal behavior. As discussed in The whole game, this bimodality is due to the phenomenon of background contamination: gRNA reads sometimes map to cells that do not contain the corresponding gRNA. sceptre provides three methods for assigning gRNAs to cells, all of which account for background contamination.

3.3 Assign gRNAs to cells

We assign gRNAs to cells to cells by calling the function assign_grnas(). assign_grnas() takes the arguments sceptre_object (required), method (optional), print_progress (optional), and parallel (optional). method is the gRNA assignment method and should be set to "mixture", "maximum", or "thresholding". print_progress (default TRUE) and parallel (default FALSE) are logical values specifying whether to print progress updates and run the function in parallel, respectively. Below, we describe the three gRNA assignment strategies in detail.

3.3.1 Mixture method

The first gRNA assignment strategy is the “mixture” method, which involves assigning gRNAs to cells using a mixture model (Barry, Roeder, and Katsevich 2024). The mixture method is the default method for high-MOI screens and is an optional method for low-MOI screens. The method works as follows. First, we fit a latent variable Poisson GLM to the data, regressing the gRNA UMI count vector onto the (latent) gRNA indicator vector and cell-specific covariate matrix. (A given entry of the gRNA indicator vector is defined to be “1” if the gRNA is present in the corresponding cell and “0” otherwise.) We fit the latent variable Poisson GLM using a novel variant of the EM algorithm. The fitted model yields the probability that each cell contains the gRNA; we threshold these probabilities to assign the gRNA to cells. An advantage of the GLM-based mixture modeling approach is that it accounts for cell-specific covariates, such as sequencing depth and batch. For example, cells that are sequenced deeply (and thus have a large value for grna_n_umis) generally exhibit higher levels of ambient background contamination; sceptre controls for heterogeneity across cells due to sequencing depth and other factors.

We use the mixture assignment method to assign gRNAs to cells on both the high-MOI CRISPRi data and low-MOI CRISPRko data, saving the resulting outputs as sceptre_object_lowmoi_mixture and sceptre_object_highmoi_mixture, respectively.

# high-MOI CRISPRi data
sceptre_object_highmoi_mixture <- assign_grnas(
  sceptre_object = sceptre_object_highmoi,
  method = "mixture", parallel = TRUE
)
# low-MOI CRISPRko data
sceptre_object_lowmoi_mixture <- assign_grnas(
  sceptre_object = sceptre_object_lowmoi,
  method = "mixture", parallel = TRUE
)

We can pass optional, method-specific parameters to assign_grnas() to control precisely how the assignment method is deployed. The relevant parameters for the mixture method include formula_object, n_em_rep, n_nonzero_cells_cutoff, backup_threshold, and probability_threshold. formula_object is a formula object that specifies how sceptre is to adjust for the cell-specific covariates in the latent-variable gRNA count model. The default formula object is constructed by summing over all covariates and then log-transforming the count-based covariates. It is sometimes helpful to exclude covariates from the formula object so as to improve the speed of the gRNA assignment step. For example, we define a reduced formula object formula_object_reduced that includes only grna_n_nonzero and grna_n_umis, the two most important covariates for assigning gRNAs to cells.

formula_object_reduced <- formula(~ log(grna_n_nonzero) + log(grna_n_umis))

n_em_rep is the number of times to run the EM algorithm using random starting estimates (default: n_em_rep = 5). Setting n_em_rep to a larger integer can improve the accuracy of the gRNA assignments at the cost of increasing compute. n_nonzero_cells_cutoff is the minimum number of cells that must exhibit nonzero expression of the gRNA to attempt fitting a mixture model to the gRNA count distribution (default: n_nonzero_cells_cutoff = 10). If the gRNA is expressed in fewer than n_nonzero_cells_cutoff cells, the gRNA instead is assigned to cells via the thresholding method, where the threshold used is backup_threshold (default: backup_threshold = 5). (See Section 3.3.3 for a detailed discussion of the thresholding method.) Finally, cells whose estimated probability of having received a gRNA exceeds probability_threshold are called as containing the gRNA (default: probability_threshold = 0.8). In practice probability_threshold typically does not impact the results considerably, but it is helpful to vary this parameter to ensure stability of the gRNA assignments to this choice.

To illustrate use of these method-specific parameters, we again call assign_grnas() to assign gRNAs to cells on the high-MOI CRISPRi data, this time setting formula_object to formula_object_reduced.

sceptre_object_highmoi_mixture <- assign_grnas(
  sceptre_object = sceptre_object_highmoi,
  formula_object = formula_object_reduced,
  parallel = TRUE
)

3.3.2 Maximum method

The second gRNA assignment strategy is the “maximum” method. The maximum method is the default assignment method for low-MOI screens and is not available as an option for high-MOI screens. The maximum method assigns the gRNA that accounts for the greatest number of UMIs in a given cell to that cell. We apply the maximum method to the low-MOI CRISPRko data below.

sceptre_object_lowmoi_maximum <- assign_grnas(
  sceptre_object = sceptre_object_lowmoi, 
  method = "maximum"
)

An advantage of the maximum method relative to the mixture method is that the former is model-free (and thus robust to misspecification of the Poisson GLM). However, when using the maximum method, one must take care to identify cells that contain zero gRNAs or multiple gRNAs (so that these cells can be removed as part of cellwise quality control). In this context the maximum method allows for two optional arguments: umi_fraction_threshold (default: umi_fraction_threshold = 0.8) and min_grna_n_umis_threshold (default: min_grna_n_umis_threshold = 5). Cells for which the maximally expressed gRNA constitutes fewer than umi_fraction_threshold of the UMIs in that cell are flagged as containing multiple gRNAs. For example, if the maximally expressed gRNA in a given cell makes up 0.64 (or 64\(\%\)) of the UMIs in that cell, and if umi_fraction_threshold = 0.8, then that cell is flagged as containing multiple gRNAs. Next, cells for which the total gRNA UMI count (i.e., n_grna_umis) is less than min_grna_n_umis_threshold are flagged as containing zero gRNAs. Cells containing multiple gRNAs or zero gRNAs are removed as part of low-MOI quality control step, as discussed in Chapter 4.

3.3.3 Thresholding method

The third method for assigning gRNAs to cells is the "thresholding" method; this method is available in both low- and high-MOI settings. The thresholding method assigns a gRNA to a cell if the UMI count of the gRNA in the cell is greater than or equal to some integer threshold (by default 5). We apply the "thresholding" method to both CRISPRi and CRISPRko datasets as follows.

# high-MOI CRISPRi data
sceptre_object_highmoi_thresholding <- assign_grnas(
  sceptre_object = sceptre_object_highmoi,
  method = "thresholding"
)
# low-MOI CRISPRko data
sceptre_object_lowmoi_thresholding <- assign_grnas(
  sceptre_object = sceptre_object_lowmoi,
  method = "thresholding"
)

The thresholding method allows for one optional argument, namely the integer threshold threshold. One can set threshold to 1 to assign any gRNA expressed in a given cell to that cell. We assign gRNAs to cells on the CRISPRi data using a threshold of 1 below.

sceptre_object_highmoi_thresholding <- assign_grnas(
  sceptre_object = sceptre_object_highmoi,
  method = "thresholding",
  threshold = 1
)

Note that setting threshold to 1 causes background contamination to be ignored.

3.4 Plotting the gRNA assignments

We can call plot() on the sceptre_object to render a plot summarizing the outcome of the gRNA-to-cell assignment step. plot() takes the arguments sceptre_object (required) n_grnas_to_plot (optional), grnas_to_plot (optional), and return_indiv_plots (optional). n_grnas_to_plot is the number of (randomly selected) gRNAs to plot; grnas_to_plot is a vector of names of one or more gRNAs to plot; and return_indiv_plots is a logical value indicating whether to return the constituent panels of the plot individually (TRUE) or combined into a single figure (FALSE; default).

We visualize the outcome of the gRNA-to-cell assignment step on the low-MOI CRISPRko data based on the maximum assignment strategy. We specifically plot the gRNAs “IFNGR2g2”, “STAT1g2”, and “STAT5Ag4”. (These gRNAs were selected arbitrarily for the purpose of illustrating the grnas_to_plot argument.)

grnas_to_plot <- c("IFNGR2g2", "STAT1g2", "STAT5Ag4")
plot(sceptre_object_lowmoi_maximum, grnas_to_plot = grnas_to_plot)

Maximum assignment method on the low-MOI CRISPRko data.

Section 3 of The whole game describes how to interpret this plot. Briefly, the top panel displays the gRNA-to-cell assignments of several individual gRNAS, and the bottom left panel shows the number of cells per gRNA. Typically, the bottom right panel plots the number of gRNAs per cell, but the bottom right panel is not meaningful under the maximum assignment strategy and is thus omitted.

It is informative to compare the output of multiple gRNA assignment methods and check for consistency across methods. To this end we render a visualization of the gRNA-to-cell assignments on the same dataset, this time based on the mixture method.

plot(sceptre_object_lowmoi_mixture, grnas_to_plot = grnas_to_plot)

Mixture assignment method on the low-MOI CRISPRko data.

Encouragingly, the gRNA assignments appear to be highly concordant across the maximum and mixture methods. Suppose that we ultimately choose to use the mixture strategy to assign gRNAs to cells on the low-MOI CRISPRko dataset. We update sceptre_object_lowmoi as follows.

sceptre_object_lowmoi <- sceptre_object_lowmoi_mixture

3.5 Returning and printing the gRNA assignments

We can obtain the gRNA-to-cell assignments by calling get_grna_assignments() on the sceptre_object.

grna_assignment_matrix <- get_grna_assignments(
  sceptre_object = sceptre_object_lowmoi
)

grna_assignment_matrix is a sparse, logical matrix of gRNA assignments; gRNA IDs are in the rows and cells are in the columns. A given entry of the matrix is set to TRUE if the given gRNA was assigned to the given cell (and FALSE otherwise). We can determine the cells to which a given gRNA was assigned by indexing grna_assignment_matrix by row; conversely, we can determine the gRNAs that were assigned to a given cell by indexing grna_assignment_matrix by column. By default, get_grna_assignments() returns the gRNA-to-cell assignment matrix corresponding to the original gRNA expression matrix. We can set apply_cellwise_qc to TRUE to instead return the gRNA-to-cell assignment matrix corresponding to the gRNA expression matrix that has had cellwise quality control called on it. (The function run_qc() first must be called on the sceptre_object for the latter option to work.)

Finally, we can call print() on sceptre_object_lowmoi to print a summary tracking the progress of the analysis. The field “gRNA-to-cell assignment information” presents information about the gRNA assignment step, including the selected assignment method, the mean number of cells per gRNA, and the mean number of gRNAs per cell.

print(sceptre_object_lowmoi)
An object of class sceptre_object.

Attributes of the data:
    • 20729 cells
    • 299 responses
    • Low multiplicity-of-infection 
    • 101 targeting gRNAs (distributed across 26 targets) 
    • 9 non-targeting gRNAs 
    • 6 covariates (bio_rep, grna_n_nonzero, grna_n_umis, response_n_nonzero, response_n_umis, response_p_mito)

Analysis status:
    ✓ import_data()
    ✓ set_analysis_parameters()
    ✓ assign_grnas()
    ✗ run_qc()
    ✗ run_calibration_check()
    ✗ run_power_check()
    ✗ run_discovery_analysis()

Analysis parameters: 
    • Discovery pairs: data frame with 7765 pairs
    • Positive control pairs: data frame with 9 pairs
    • Sidedness of test: both
    • Control group: non-targeting cells
    • Resampling mechanism: permutations
    • gRNA integration strategy: union
    • Resampling approximation: skew normal
    • Multiple testing adjustment: BH at level 0.1
    • N nonzero treatment cells threshold: not specified
    • N nonzero control cells threshold: not specified
    • Formula object: log(response_n_nonzero) + log(response_n_umis) + bio_rep

gRNA-to-cell assignment information:
    • Assignment method: mixture
    • Mean N cells per gRNA: 205.07
    • Mean N gRNAs per cell (MOI): 1.09 
    • gRNA assignment formula object: log(response_n_nonzero) + log(response_n_umis) + log(grna_n_nonzero) + log(grna_n_umis) + bio_rep