scdesign3
scdesign3 takes the input data, fits the model, and simulates new data.
Usage
scdesign3(
  sce,
  assay_use = "counts",
  celltype,
  pseudotime = NULL,
  spatial = NULL,
  other_covariates,
  ncell = dim(sce)[2],
  mu_formula,
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 2,
  usebam = FALSE,
  edf_flexible = FALSE,
  corr_formula,
  empirical_quantile = FALSE,
  copula = "gaussian",
  if_sparse = FALSE,
  fastmvn = FALSE,
  DT = TRUE,
  pseudo_obs = FALSE,
  family_set = c("gauss", "indep"),
  important_feature = "all",
  nonnegative = TRUE,
  nonzerovar = FALSE,
  return_model = FALSE,
  simplify = FALSE,
  parallelization = "mcmapply",
  BPPARAM = NULL,
  trace = FALSE
)
Arguments
- sce
A SingleCellExperiment object.
- assay_use
A string which indicates the assay you will use in the sce. Default is 'counts'.
- celltype
A string giving the name of the cell type variable in the colData of the sce (for example, 'cell_type'). No default is set in the usage above.
- pseudotime
A string or a string vector giving the names of the pseudotime variables, one per lineage if there are multiple lineages. Default is NULL.
- spatial
A length two string vector of the names of spatial coordinates. Default is NULL.
- other_covariates
A string or a string vector of the other covariates you want to include in the data.
- ncell
The number of cells you want to simulate. Default is dim(sce)[2] (the same number as the input data).
- mu_formula
A string of the mu parameter formula.
- sigma_formula
A string of the sigma parameter formula.
- family_use
A string of the marginal distribution. Must be one of 'poisson', 'nb', 'zip', 'zinb' or 'gaussian'.
- n_cores
An integer. The number of cores to use.
- usebam
A logical variable. If TRUE, use bam for acceleration in marginal fitting. Default is FALSE.
- edf_flexible
A logical variable. Used to accelerate fitting of spatial models when k is large in 'mu_formula'. Default is FALSE.
- corr_formula
A string of the correlation structure.
- empirical_quantile
A logical variable. Use with caution: if TRUE, the copula is not fitted and the empirical CDF values of the original data are used instead, which makes the simulated data fixed (no randomness). Only works if ncell is the same as in your original data. Default is FALSE.
- copula
A string of the copula choice. Must be one of 'gaussian' or 'vine'. Default is 'gaussian'. Note that the vine copula may model high-dimensional data better, but can be very slow when there are more than 1000 features.
- if_sparse
A logical variable. Only works for the Gaussian copula (copula = "gaussian"). If TRUE, a thresholding strategy is used to make the correlation matrix sparse. Default is FALSE.
- fastmvn
A logical variable. If TRUE, the sampling of the multivariate Gaussian is done by mvnfast, otherwise by mvtnorm. Default is FALSE. It only matters for the Gaussian copula.
- DT
A logical variable. If TRUE, perform the distributional transformation to make the discrete data 'continuous'. This is useful for discrete distributions (e.g., Poisson, NB). Default is TRUE. Note that for continuous data (e.g., Gaussian), DT does not make sense and should be set to FALSE.
- pseudo_obs
A logical variable. If TRUE, use the empirical quantiles instead of the theoretical quantiles for fitting the copula. Default is FALSE.
- family_set
A string or a string vector of the bivariate copula families. Default is c("gauss", "indep"). For more information, see the package rvinecopulib.
- important_feature
A numeric value or a logical vector indicating which genes are used in correlation estimation. If this is a numeric value, genes whose zero proportion is greater than this value are excluded from gene-gene correlation estimation. If this is a vector, it should be a logical vector with length equal to the number of genes in sce: TRUE means the corresponding gene is included in gene-gene correlation estimation, and FALSE means it is excluded. The default value is "all" (a special string which means no filtering).
- nonnegative
A logical variable. If TRUE, values < 0 in the synthetic data will be converted to 0. Default is TRUE (since the expression matrix is nonnegative).
- nonzerovar
A logical variable. If TRUE, for any gene with zero variance, the value in one cell will be replaced with 1. This is designed to avoid potential errors in downstream steps, for example PCA. Default is FALSE.
- return_model
A logical variable. If TRUE, the marginal models and copula models will be returned. Default is FALSE.
- simplify
A logical variable. If TRUE, the fitted regression models keep only the components essential for predict; otherwise the fitted models can be very large. Default is FALSE.
- parallelization
A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which correspond to the parallelization functions in the packages parallel, BiocParallel, and pbmcapply, respectively. The default value is 'mcmapply'.
- BPPARAM
A MulticoreParam object or NULL. When parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When parallelization = 'bpmapply', this parameter must be a MulticoreParam object from the package 'BiocParallel'. The default value is NULL.
- trace
A logical variable. If TRUE, the warning/error log and runtime for gam/gamlss will be returned, FALSE otherwise. Default is FALSE.
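The DT argument above refers to the distributional transformation. The following base R sketch illustrates the general idea on a single simulated gene; it is not scDesign3's internal code, and the negative binomial parameters (mu, size) are hypothetical values chosen only for illustration.

```r
set.seed(1)

# Hypothetical NB marginal for one gene: mean 5, size (inverse dispersion) 2.
mu <- 5
size <- 2
y <- rnbinom(1000, size = size, mu = mu)

# Without the transform, F(y) takes one value per distinct count (many ties).
u_plain <- pnbinom(y, size = size, mu = mu)

# Distributional transform: draw uniformly between F(y - 1) and F(y).
v <- runif(length(y))
lower <- pnbinom(y - 1, size = size, mu = mu)  # equals 0 when y == 0
upper <- pnbinom(y, size = size, mu = mu)
u_dt <- lower + v * (upper - lower)

# Map to the Gaussian scale, the scale on which a Gaussian copula operates.
z <- qnorm(u_dt)

length(unique(u_plain))  # one value per distinct count
length(unique(u_dt))     # ties are broken by the random draw
```

For continuous marginals such as the Gaussian, the CDF values are already continuous, which is why the argument description recommends DT = FALSE in that case.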
Value
A list with the components:
new_count
A matrix of the new simulated count (expression) matrix.
new_covariate
A data.frame of the new covariate matrix.
model_aic
The model AIC.
marginal_list
A list of marginal regression models if return_model = TRUE.
corr_list
A list of correlation models (conditional copulas) if return_model = TRUE.
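As a rough illustration of how the returned list might be used, the sketch below runs scdesign3 on the bundled example_sce with return_model = TRUE and wraps the simulated counts and covariates in a new SingleCellExperiment. The object names sim and simu_sce are illustrative, and the sketch assumes the column names of new_count line up with the row names of new_covariate.

```r
library(scDesign3)
library(SingleCellExperiment)

data(example_sce)

# ncell is left at its default, so the simulated data has as many cells
# as the input data.
sim <- scdesign3(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  corr_formula = "pseudotime",
  copula = "gaussian",
  return_model = TRUE
)

names(sim)               # components listed under Value
dim(sim$new_count)       # genes by simulated cells
head(sim$new_covariate)  # covariates of the simulated cells

# Wrap the simulated data for downstream use (assumed usage pattern).
simu_sce <- SingleCellExperiment(
  assays = list(counts = sim$new_count),
  colData = sim$new_covariate
)
```

With return_model = FALSE (the default), only the simulated data and AIC components are expected to be populated, which keeps the returned object smaller.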
Examples
data(example_sce)
my_simu <- scdesign3(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 2,
  usebam = FALSE,
  edf_flexible = FALSE,
  corr_formula = "pseudotime",
  copula = "gaussian",
  if_sparse = TRUE,
  DT = TRUE,
  pseudo_obs = FALSE,
  ncell = 1000,
  return_model = FALSE
)
#> Input Data Construction Start
#> fitting ...
#>   |======================================================================| 100%
#> Input Data Construction End
#> Start Marginal Fitting
#> Marginal Fitting End
#> Start Copula Fitting
#> Convert Residuals to Multivariate Gaussian
#> Converting End
#> Copula group 1 starts
#> Copula group 2 starts
#> Copula group 3 starts
#> Copula group 4 starts
#> Copula Fitting End
#> Start Parameter Extraction
#> Parameter
#> Extraction End
#> Start Generate New Data
#> Use Copula to sample a multivariate quantile matrix
#> Sample Copula group 1 starts
#> Sample Copula group 2 starts
#> Sample Copula group 3 starts
#> Sample Copula group 4 starts
#> New Data Generating End
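The example above uses the default 'mcmapply' backend. Following the parallelization and BPPARAM argument descriptions, a variant using BiocParallel might look like the sketch below. The names bp and my_simu_bp are illustrative, and the worker count is an arbitrary choice; MulticoreParam relies on forking, so this is assumed to run on a Unix-like system.

```r
library(scDesign3)
library(BiocParallel)

data(example_sce)

# A MulticoreParam object, as required when parallelization = "bpmapply".
bp <- MulticoreParam(workers = 2)

my_simu_bp <- scdesign3(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 2,
  corr_formula = "pseudotime",
  copula = "gaussian",
  ncell = 1000,
  parallelization = "bpmapply",
  BPPARAM = bp
)

# Simulated count matrix: genes by simulated cells.
dim(my_simu_bp$new_count)
```

For 'mcmapply' or 'pbmcmapply', BPPARAM should be left as NULL, as stated in the argument description.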