Skip to contents

simu_new generates new simulated data based on fitted marginal and copula models.

Usage

simu_new(
  sce,
  assay_use = "counts",
  mean_mat,
  sigma_mat,
  zero_mat,
  quantile_mat = NULL,
  copula_list,
  n_cores,
  fastmvn = FALSE,
  family_use,
  nonnegative = TRUE,
  nonzerovar = FALSE,
  input_data,
  new_covariate,
  important_feature = "all",
  parallelization = "mcmapply",
  BPPARAM = NULL,
  filtered_gene
)

Arguments

sce

A SingleCellExperiment object.

assay_use

A string which indicates the assay you will use in the sce. Default is 'counts'.

mean_mat

A cell by feature matrix of the mean parameter.

sigma_mat

A cell by feature matrix of the sigma parameter.

zero_mat

A cell by feature matrix of the zero-inflation parameter.

quantile_mat

A cell by feature matrix of the multivariate quantile.

copula_list

A list of copulas for generating the multivariate quantile matrix. If provided, the quantile_mat must be NULL.

n_cores

An integer. The number of cores to use.

fastmvn

An logical variable. If TRUE, the sampling of multivariate Gaussian is done by mvnfast, otherwise by mvtnorm. Default is FALSE.

family_use

A string of the marginal distribution. Must be one of 'poisson', "binomial", 'nb', 'zip', 'zinb' or 'gaussian'.

nonnegative

A logical variable. If TRUE, values < 0 in the synthetic data will be converted to 0. Default is TRUE (since the expression matrix is nonnegative).

nonzerovar

A logical variable. If TRUE, for any gene with zero variance, a cell will be replaced with 1. This is designed for avoiding potential errors, for example, PCA.

input_data

A input count matrix.

new_covariate

A data.frame which contains covariates of targeted simulated data from construct_data.

important_feature

important_feature A string or vector which indicates whether a gene will be used in correlation estimation or not. If this is a string, then this string must be either "all" (using all genes) or "auto", which indicates that the genes will be automatically selected based on the proportion of zero expression across cells for each gene. Gene with zero proportion greater than 0.8 will be excluded form gene-gene correlation estimation. If this is a vector, then this should be a logical vector with length equal to the number of genes in sce. TRUE in the logical vector means the corresponding gene will be included in gene-gene correlation estimation and FALSE in the logical vector means the corresponding gene will be excluded from the gene-gene correlation estimation. The default value for is "all".

parallelization

A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which corresponds to the parallelization function in the package parallel,BiocParallel, and pbmcapply respectively. The default value is 'mcmapply'.

BPPARAM

A MulticoreParam object or NULL. When the parameter parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When the parameter parallelization = 'bpmapply', this parameter must be one of the MulticoreParam object offered by the package 'BiocParallel. The default value is NULL.

filtered_gene

A vector or NULL which contains genes that are excluded in the marginal and copula fitting steps because these genes only express in less than two cells. This can be obtain from construct_data

Value

A feature by cell matrix of the new simulated count (expression) matrix or sparse matrix.

Details

The function takes the new covariate (if use) from construct_data, parameter matrices from extract_para and multivariate Unifs from fit_copula.

Examples

  data(example_sce)
  my_data <- construct_data(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  corr_by = "1"
  )
  my_marginal <- fit_marginal(
  data = my_data,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 1,
  usebam = FALSE
  )
  my_copula <- fit_copula(
  sce = example_sce,
  assay_use = "counts",
  marginal_list = my_marginal,
  family_use = c(rep("nb", 5), rep("zip", 5)),
  copula = "vine",
  n_cores = 1,
  input_data = my_data$dat
  )
#> Convert Residuals to Uniform
#> Converting End
#> Copula group 1 starts
#> Vine Copula Estimation Starts
#> Time difference of 0.1432807 secs
#> Vine Copula Estimation Ends
  my_para <- extract_para(
    sce = example_sce,
    marginal_list = my_marginal,
    n_cores = 1,
    family_use = c(rep("nb", 5), rep("zip", 5)),
    new_covariate = my_data$new_covariate,
    data = my_data$dat
  )
  my_newcount <- simu_new(
  sce = example_sce,
  mean_mat = my_para$mean_mat,
  sigma_mat = my_para$sigma_mat,
  zero_mat = my_para$zero_mat,
  quantile_mat = NULL,
  copula_list = my_copula$copula_list,
  n_cores = 1,
  family_use = c(rep("nb", 5), rep("zip", 5)),
  input_data = my_data$dat,
  new_covariate = my_data$new_covariate,
  important_feature = my_copula$important_feature,
  filtered_gene = my_data$filtered_gene
  )
#> Use Copula to sample a multivariate quantile matrix
#> Sample Copula group 1 starts