fit_copula
fits the copula model.
Usage
fit_copula(
sce,
assay_use,
input_data,
empirical_quantile = FALSE,
marginal_list,
family_use,
copula = "gaussian",
DT = TRUE,
pseudo_obs = FALSE,
epsilon = 1e-06,
family_set = c("gaussian", "indep"),
important_feature = "all",
if_sparse = FALSE,
n_cores,
parallelization = "mcmapply",
BPPARAM = NULL
)
Arguments
- sce
A
SingleCellExperiment
object.- assay_use
A string which indicates the assay you will use in the sce. Default is 'counts'.
- input_data
The input data, which is one of the output from
construct_data
.- empirical_quantile
Please only use it if you clearly know what will happen! A logic variable. If TRUE, DO NOT fit the copula and use the EMPIRICAL CDF values of the original data; it will make the simulated data fixed (no randomness). Default is FALSE. Only works if ncell is the same as your original data.
- marginal_list
A list of fitted regression models from
fit_marginal
.- family_use
A string or a vector of strings of the marginal distribution. Must be one of 'poisson', 'nb', 'zip', 'zinb' or 'gaussian'.
- copula
A string of the copula choice. Must be one of 'gaussian' or 'vine'. Default is 'gaussian'. Note that vine copula may have better modeling of high-dimensions, but can be very slow when features are >1000.
- DT
A logic variable. If TRUE, perform the distributional transformation to make the discrete data 'continuous'. This is useful for discrete distributions (e.g., Poisson, NB). Default is TRUE. Note that for continuous data (e.g., Gaussian), DT does not make sense and should be set as FALSE.
- pseudo_obs
A logic variable. If TRUE, use the empirical quantiles instead of theoretical quantiles for fitting copula. Default is FALSE.
- epsilon
A numeric variable for preventing the transformed quantiles to collapse to 0 or 1.
- family_set
A string or a string vector of the bivariate copula families. Default is c("gaussian", "indep").
- important_feature
A numeric value or vector which indicates whether a gene will be used in correlation estimation or not. If this is a numeric value, then gene with zero proportion greater than this value will be excluded form gene-gene correlation estimation. If this is a vector, then this should be a logical vector with length equal to the number of genes in
sce
.TRUE
in the logical vector means the corresponding gene will be included in gene-gene correlation estimation andFALSE
in the logical vector means the corresponding gene will be excluded from the gene-gene correlation estimation. The default value for is "all" (a special string which means no filtering).- if_sparse
A logic variable. Only works for Gaussian copula (
family_set = "gaussian"
). If TRUE, a thresholding strategy will make the corr matrix sparse.- n_cores
An integer. The number of cores to use.
- parallelization
A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which corresponds to the parallelization function in the package
parallel
,BiocParallel
, andpbmcapply
respectively. The default value is 'mcmapply'.- BPPARAM
A
MulticoreParam
object or NULL. When the parameter parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When the parameter parallelization = 'bpmapply', this parameter must be one of theMulticoreParam
object offered by the package 'BiocParallel. The default value is NULL.
Value
A list with the components:
new_mvu
A matrix of the new multivariate uniform distribution from the copula.
copula_list
A list of the fitted copula model. If using Gaussian copula, a list of correlation matrices; if vine, a list of vine objects.
model_aic
A vector of the marginal AIC and the copula AIC.
model_bic
A vector of the marginal BIC and the copula BIC.
Details
This function takes the result from fit_marginal
as the input and
and fit the copula model on the residuals.
Examples
data(example_sce)
my_data <- construct_data(
sce = example_sce,
assay_use = "counts",
celltype = "cell_type",
pseudotime = "pseudotime",
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
my_marginal <- fit_marginal(
data = my_data,
mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
sigma_formula = "1",
family_use = "nb",
n_cores = 1,
usebam = FALSE
)
my_copula <- fit_copula(
sce = example_sce,
assay_use = "counts",
marginal_list = my_marginal,
family_use = c(rep("nb", 5), rep("zip", 5)),
copula = "vine",
n_cores = 1,
input_data = my_data$dat
)
#> Convert Residuals to Uniform
#> Converting End
#> Copula group 1 starts
#> Vine Copula Estimation Starts
#> Time difference of 0.1842949 secs
#> Vine Copula Estimation Ends