constructNull takes the target data as the input and returns the corresponding synthetic null data.

Usage

constructNull(
  mat,
  family = "nb",
  formula = NULL,
  extraInfo = NULL,
  nCores = 1,
  parallelization = "mcmapply",
  fastVersion = TRUE,
  ifSparse = FALSE,
  corrCut = 0.1,
  BPPARAM = NULL
)

Arguments

mat

An expression matrix (gene by cell). It can be a regular dense matrix or a sparseMatrix.

family

A string or a vector of strings specifying the distribution of your data. Must be one of 'nb', 'binomial', 'poisson', 'zip', 'zinb' or 'gaussian', which represent the negative binomial, binomial, Poisson, zero-inflated Poisson, zero-inflated negative binomial, and Gaussian distributions, respectively. For UMI-count data, 'nb' is the usual choice. Default is 'nb'.

formula

A string giving the formula for the mu parameter. It defines the relationship between gene expression in the synthetic null data and the extra covariates. Default is NULL (cell type case). For example, if your input is spatial data with X, Y coordinates, the formula can be "s(X, Y, bs = 'gp', k = 4)".

extraInfo

A data frame of the extra covariates used in formula. For example, the 2D spatial coordinates. Default is NULL.
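
As a rough sketch of how formula and extraInfo fit together in the spatial case, the snippet below builds a covariate data frame with columns X and Y and the matching formula string. The toy counts, the coordinate values, and the assumption that rows of extraInfo line up with the columns (cells) of mat are illustrative and not taken from the package documentation. The call to constructNull is guarded so the snippet runs even when the package is not loaded.

```r
set.seed(1)

# Toy gene-by-cell count matrix (hypothetical data): 20 genes, 50 cells.
n_genes <- 20
n_cells <- 50
mat <- matrix(
  rpois(n_genes * n_cells, lambda = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Extra covariates: one row per cell, with column names matching the formula.
# Assumption: rows of extraInfo correspond to the columns (cells) of mat.
extraInfo <- data.frame(
  X = runif(n_cells),
  Y = runif(n_cells),
  row.names = colnames(mat)
)

# Formula for the mu parameter, passed as a string.
spatial_formula <- "s(X, Y, bs = 'gp', k = 4)"

# Only call constructNull if it is available in the current session.
if (exists("constructNull", mode = "function")) {
  nullData <- constructNull(
    mat = mat,
    family = "nb",
    formula = spatial_formula,
    extraInfo = extraInfo
  )
}
```

The column names in extraInfo (X and Y here) need to match the variable names used inside the formula string.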

nCores

An integer. The number of cores to use for parallel processing. Default is 1.

parallelization

A string indicating the parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which correspond to the parallelization functions in the packages parallel, BiocParallel, and pbmcapply, respectively. The default value is 'mcmapply'.

fastVersion

A logical value. If TRUE, the fast approximation is used. Default is TRUE.

ifSparse

A logical value. Whether to use sparse correlation estimation, which is intended for high-dimensional data (gene number much larger than cell number). Default is FALSE.

corrCut

A numeric value. The cutoff for non-zero proportions in genes used in modelling correlation. Default is 0.1.
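
To make the quantity behind corrCut concrete, the base R sketch below computes each gene's non-zero proportion on a toy matrix and the share of genes that exceed a cutoff. It assumes that genes whose non-zero proportion is above the cutoff are the ones kept for correlation modelling. The source does not state the direction of the comparison, and this is not the package's internal code.

```r
# Toy gene-by-cell matrix (hypothetical data): 4 genes, 10 cells.
mat <- rbind(
  geneA = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # 0 of 10 cells non-zero
  geneB = c(1, 0, 0, 2, 0, 0, 0, 0, 0, 0),  # 2 of 10 cells non-zero
  geneC = c(3, 1, 0, 0, 2, 0, 0, 5, 0, 1),  # 5 of 10 cells non-zero
  geneD = c(2, 4, 1, 3, 1, 2, 6, 1, 2, 3)   # 10 of 10 cells non-zero
)

corrCut <- 0.1

# Proportion of cells with a non-zero count, per gene.
nonZeroProp <- rowMeans(mat != 0)

# Assumption: genes above the cutoff are retained.
keep <- nonZeroProp > corrCut

# Share of genes that pass the cutoff.
mean(keep)
```

Under this reading, a message such as "95% of genes are used in correlation modelling" would report the share of genes passing the cutoff.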

BPPARAM

A MulticoreParam object or NULL. When parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When parallelization = 'bpmapply', this parameter must be a MulticoreParam object from the package 'BiocParallel'. The default value is NULL.
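
The pairing rule between parallelization and BPPARAM can be sketched as a small check. The helper checkParallelArgs below is hypothetical and not part of the package; it only encodes the rule as described above. The BiocParallel portion is guarded so the snippet still runs when that package is not installed.

```r
# Hypothetical helper (not part of the package): encodes the documented rule
# that BPPARAM must be NULL unless parallelization is 'bpmapply'.
checkParallelArgs <- function(parallelization = "mcmapply", BPPARAM = NULL) {
  parallelization <- match.arg(
    parallelization,
    c("mcmapply", "bpmapply", "pbmcmapply")
  )
  if (parallelization == "bpmapply") {
    if (!inherits(BPPARAM, "MulticoreParam")) {
      stop("With 'bpmapply', BPPARAM must be a MulticoreParam object.")
    }
  } else if (!is.null(BPPARAM)) {
    stop("With 'mcmapply' or 'pbmcmapply', BPPARAM must be NULL.")
  }
  invisible(TRUE)
}

# Default combination: passes.
checkParallelArgs("mcmapply", NULL)

# BiocParallel combination, only if the package is installed.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  bpparam <- BiocParallel::MulticoreParam(workers = 2)
  checkParallelArgs("bpmapply", bpparam)
}
```

In this sketch, an object like bpparam would be passed as BPPARAM together with parallelization = 'bpmapply'.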

Value

The expression matrix of the synthetic null data.

Details

This function constructs the synthetic null data based on the target data (real data). The input is an expression matrix (gene by cell); the user should specify a distribution, which is usually negative binomial for a count matrix.

Examples

data(exampleCounts)
nullData <- constructNull(mat = exampleCounts)
#> 95% of genes are used in correlation modelling.