simu_new
Generate new simulated data based on fitted marginal and copula models.
Usage
simu_new(
sce,
assay_use = "counts",
mean_mat,
sigma_mat,
zero_mat,
quantile_mat = NULL,
copula_list,
n_cores,
fastmvn = FALSE,
family_use,
nonnegative = TRUE,
nonzerovar = FALSE,
input_data,
new_covariate,
important_feature = "all",
parallelization = "mcmapply",
BPPARAM = NULL,
filtered_gene
)
Arguments
- sce
A SingleCellExperiment object.
- assay_use
A string which indicates the assay you will use in the sce. Default is 'counts'.
- mean_mat
A cell by feature matrix of the mean parameter.
- sigma_mat
A cell by feature matrix of the sigma parameter.
- zero_mat
A cell by feature matrix of the zero-inflation parameter.
- quantile_mat
A cell-by-feature matrix of multivariate quantiles. Default is NULL.
- copula_list
A list of copulas for generating the multivariate quantile matrix. If provided, quantile_mat must be NULL.
- n_cores
An integer. The number of cores to use.
- fastmvn
A logical variable. If TRUE, the multivariate Gaussian is sampled with mvnfast; otherwise it is sampled with mvtnorm. Default is FALSE.
- family_use
A string, or a vector of strings (one per gene), specifying the marginal distribution. Each value must be one of 'poisson', 'binomial', 'nb', 'zip', 'zinb' or 'gaussian'.
- nonnegative
A logical variable. If TRUE, values < 0 in the synthetic data will be converted to 0. Default is TRUE (since the expression matrix is nonnegative).
- nonzerovar
A logical variable. If TRUE, for any gene with zero variance, the value in one cell is replaced with 1. This is designed to avoid potential errors in downstream analyses such as PCA. Default is FALSE.
- input_data
An input count matrix.
- new_covariate
A data.frame containing the covariates of the target simulated data, obtained from construct_data.
- important_feature
A string or vector indicating whether each gene is used in correlation estimation. If a string, it must be either "all" (use all genes) or "auto", in which case genes are selected automatically based on the proportion of cells with zero expression: genes whose zero proportion is greater than 0.8 are excluded from gene-gene correlation estimation. If a vector, it must be a logical vector with length equal to the number of genes in sce; TRUE means the corresponding gene is included in gene-gene correlation estimation, and FALSE means it is excluded. The default value is "all".
- parallelization
A string indicating the parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which correspond to the parallelization functions in the packages parallel, BiocParallel, and pbmcapply, respectively. The default value is 'mcmapply'.
- BPPARAM
A MulticoreParam object or NULL. When parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When parallelization = 'bpmapply', this parameter must be one of the MulticoreParam objects offered by the package BiocParallel. The default value is NULL.
- filtered_gene
A vector or NULL containing the genes excluded from the marginal and copula fitting steps because they are expressed in fewer than two cells. This can be obtained from construct_data.
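To make the documented rules for important_feature = "auto", nonnegative and nonzerovar concrete, the R sketch below implements each rule as a small standalone function on a cell-by-gene matrix. The function names are hypothetical and are not part of scDesign3, and replacing the value in the first cell for nonzerovar is an assumption, since the documentation does not say which cell is used.

```r
# Hypothetical helpers (NOT scDesign3 functions) that mirror the documented
# behaviour of three simu_new() arguments on a cell-by-gene matrix.

# important_feature = "auto": TRUE for genes whose proportion of zero
# counts across cells is at most 0.8; genes above 0.8 are excluded.
select_features_auto <- function(counts, cutoff = 0.8) {
  zero_prop <- colMeans(counts == 0)
  zero_prop <= cutoff
}

# nonnegative = TRUE: convert negative synthetic values to 0.
make_nonnegative <- function(x) {
  x[x < 0] <- 0
  x
}

# nonzerovar = TRUE: for each gene (column) whose values are all identical,
# replace the value in one cell with 1. Using the first cell is an
# assumption; for the common all-zero gene this makes the variance positive.
fix_zero_variance <- function(x) {
  for (j in seq_len(ncol(x))) {
    if (length(unique(x[, j])) == 1) {
      x[1, j] <- 1
    }
  }
  x
}
```

In simu_new() the behaviour is selected through the arguments themselves; these helpers only spell out the documented thresholds.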
Details
The function takes the new covariates (if used) from construct_data, the parameter matrices from extract_para, and the multivariate uniform quantiles from fit_copula.
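The pipeline in the Details can be illustrated with a minimal R sketch, which is a hypothetical illustration and not the scDesign3 source. It samples a Gaussian copula, converts it to a cell-by-feature matrix of uniform quantiles (the role played by quantile_mat), and passes each gene's column through the inverse CDF of its marginal using mean_mat, sigma_mat and zero_mat. Several assumptions apply: a Gaussian copula with a known correlation matrix replaces the vine copula used in the example; the 'nb' size parameter is taken to be 1 / sigma; zero_mat is treated as the structural-zero probability for 'zip'; and only 'poisson', 'nb' and 'zip' are covered. The function name simulate_from_copula is hypothetical.

```r
# Hypothetical sketch of copula-based simulation (NOT simu_new()'s code).
# Assumptions: Gaussian copula with correlation 'corr' (gene-by-gene),
# nb size = 1 / sigma, zero_mat = structural-zero probability for 'zip'.
simulate_from_copula <- function(mean_mat, sigma_mat, zero_mat, corr, family_use) {
  n_cell <- nrow(mean_mat)
  n_gene <- ncol(mean_mat)
  # Rows of N(0, I) times the upper Cholesky factor R (corr = t(R) %*% R)
  # give rows distributed as N(0, corr).
  z <- matrix(stats::rnorm(n_cell * n_gene), n_cell, n_gene) %*% chol(corr)
  # Probability integral transform: cell-by-feature matrix of uniforms.
  quantile_mat <- stats::pnorm(z)
  family_use <- rep_len(family_use, n_gene)
  out <- matrix(0, n_cell, n_gene)
  for (j in seq_len(n_gene)) {
    u <- quantile_mat[, j]
    mu <- mean_mat[, j]
    out[, j] <- switch(family_use[j],
      poisson = stats::qpois(u, lambda = mu),
      nb = stats::qnbinom(u, size = 1 / sigma_mat[, j], mu = mu),
      zip = {
        # ZIP CDF is p0 + (1 - p0) * Poisson CDF, so u <= p0 maps to 0
        # and larger u maps to a rescaled Poisson quantile.
        p0 <- zero_mat[, j]
        stats::qpois(pmax(u - p0, 0) / (1 - p0), lambda = mu)
      },
      stop("family not covered by this sketch: ", family_use[j])
    )
  }
  out
}
```

Because every gene uses the same row of correlated uniforms, the marginal distributions are preserved while the gene-gene dependence comes from the copula.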
Examples
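# Load the example SingleCellExperiment object
data(example_sce)
# Step 1: construct the input data (counts and covariates)
my_data <- construct_data(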
sce = example_sce,
assay_use = "counts",
celltype = "cell_type",
pseudotime = "pseudotime",
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
# Step 2: fit a marginal model for each gene
my_marginal <- fit_marginal(
data = my_data,
mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
sigma_formula = "1",
family_use = "nb",
n_cores = 1,
usebam = FALSE
)
# Step 3: fit a vine copula to capture gene-gene correlation
my_copula <- fit_copula(
sce = example_sce,
assay_use = "counts",
marginal_list = my_marginal,
family_use = c(rep("nb", 5), rep("zip", 5)),
copula = "vine",
n_cores = 1,
input_data = my_data$dat
)
#> Convert Residuals to Uniform
#> Converting End
#> Copula group 1 starts
#> Vine Copula Estimation Starts
#> Time difference of 0.1432807 secs
#> Vine Copula Estimation Ends
# Step 4: extract the cell-by-feature mean, sigma and zero-inflation matrices
my_para <- extract_para(
sce = example_sce,
marginal_list = my_marginal,
n_cores = 1,
family_use = c(rep("nb", 5), rep("zip", 5)),
new_covariate = my_data$new_covariate,
data = my_data$dat
)
# Step 5: simulate new data from the fitted marginals and copula
my_newcount <- simu_new(
sce = example_sce,
mean_mat = my_para$mean_mat,
sigma_mat = my_para$sigma_mat,
zero_mat = my_para$zero_mat,
quantile_mat = NULL,
copula_list = my_copula$copula_list,
n_cores = 1,
family_use = c(rep("nb", 5), rep("zip", 5)),
input_data = my_data$dat,
new_covariate = my_data$new_covariate,
important_feature = my_copula$important_feature,
filtered_gene = my_data$filtered_gene
)
#> Use Copula to sample a multivariate quantile matrix
#> Sample Copula group 1 starts