Fit the copula model — fit_copula • scDesign3

fit_copula fits the copula model.

Usage

fit_copula(
  sce,
  assay_use,
  input_data,
  empirical_quantile = FALSE,
  marginal_list,
  family_use,
  copula = "gaussian",
  DT = TRUE,
  pseudo_obs = FALSE,
  epsilon = 1e-06,
  family_set = c("gaussian", "indep"),
  important_feature = "all",
  if_sparse = FALSE,
  n_cores,
  parallelization = "mcmapply",
  BPPARAM = NULL
)

Arguments

sce: A SingleCellExperiment object.
assay_use: A string which indicates the assay you will use in the sce. Default is 'counts'.
input_data: The input data, which is one of the output from construct_data.
empirical_quantile: Please only use it if you clearly know what will happen! A logic variable. If TRUE, DO NOT fit the copula and use the EMPIRICAL CDF values of the original data; it will make the simulated data fixed (no randomness). Default is FALSE. Only works if ncell is the same as your original data.
marginal_list: A list of fitted regression models from fit_marginal.
family_use: A string or a vector of strings of the marginal distribution. Must be one of 'poisson', 'nb', 'zip', 'zinb' or 'gaussian'.
copula: A string of the copula choice. Must be one of 'gaussian' or 'vine'. Default is 'gaussian'. Note that vine copula may have better modeling of high-dimensions, but can be very slow when features are >1000.
DT: A logic variable. If TRUE, perform the distributional transformation to make the discrete data 'continuous'. This is useful for discrete distributions (e.g., Poisson, NB). Default is TRUE. Note that for continuous data (e.g., Gaussian), DT does not make sense and should be set as FALSE.
pseudo_obs: A logic variable. If TRUE, use the empirical quantiles instead of theoretical quantiles for fitting copula. Default is FALSE.
epsilon: A numeric variable for preventing the transformed quantiles to collapse to 0 or 1.
family_set: A string or a string vector of the bivariate copula families. Default is c("gaussian", "indep").
important_feature: A numeric value or vector which indicates whether a gene will be used in correlation estimation or not. If this is a numeric value, then gene with zero proportion greater than this value will be excluded form gene-gene correlation estimation. If this is a vector, then this should be a logical vector with length equal to the number of genes in sce. TRUE in the logical vector means the corresponding gene will be included in gene-gene correlation estimation and FALSE in the logical vector means the corresponding gene will be excluded from the gene-gene correlation estimation. The default value for is "all" (a special string which means no filtering).
if_sparse: A logic variable. Only works for Gaussian copula (family_set = "gaussian"). If TRUE, a thresholding strategy will make the corr matrix sparse.
n_cores: An integer. The number of cores to use.
parallelization: A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which corresponds to the parallelization function in the package parallel,BiocParallel, and pbmcapply respectively. The default value is 'mcmapply'.
BPPARAM: A MulticoreParam object or NULL. When the parameter parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When the parameter parallelization = 'bpmapply', this parameter must be one of the MulticoreParam object offered by the package 'BiocParallel. The default value is NULL.

Value

A list with the components:

new_mvu: A matrix of the new multivariate uniform distribution from the copula.
copula_list: A list of the fitted copula model. If using Gaussian copula, a list of correlation matrices; if vine, a list of vine objects.
model_aic: A vector of the marginal AIC and the copula AIC.
model_bic: A vector of the marginal BIC and the copula BIC.

Details

This function takes the result from fit_marginal as the input and and fit the copula model on the residuals.

Examples

  data(example_sce)
  my_data <- construct_data(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  corr_by = "1"
  )
  my_marginal <- fit_marginal(
  data = my_data,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 1,
  usebam = FALSE
  )
  my_copula <- fit_copula(
  sce = example_sce,
  assay_use = "counts",
  marginal_list = my_marginal,
  family_use = c(rep("nb", 5), rep("zip", 5)),
  copula = "vine",
  n_cores = 1,
  input_data = my_data$dat
  )
#> Convert Residuals to Uniform
#> Converting End
#> Copula group 1 starts
#> Vine Copula Estimation Starts
#> Time difference of 0.1842949 secs
#> Vine Copula Estimation Ends