fit_marginal
Fits the per-feature regression models.
Usage
fit_marginal(
  data,
  predictor = "gene",
  mu_formula,
  sigma_formula,
  family_use,
  n_cores,
  usebam = FALSE,
  edf_flexible = FALSE,
  parallelization = "mcmapply",
  BPPARAM = NULL,
  trace = FALSE,
  simplify = FALSE,
  filter_cells = FALSE
)
Arguments
- data
An object from construct_data.
- predictor
A string naming the predictor for the gam/gamlss model. This is only a label. Default is "gene".
- mu_formula
A string of the mu parameter formula. It follows the formula format of bam. Note: if the formula has multiple smoothers (s()), which we do not recommend, put the one with the largest k (the most complex one) first.
- sigma_formula
A string of the sigma parameter formula.
- family_use
A string or a vector of strings specifying the marginal distribution. Each must be one of 'binomial', 'poisson', 'nb', 'zip', 'zinb' or 'gaussian', which represent the binomial, Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, and Gaussian distributions respectively.
- n_cores
An integer. The number of cores to use.
- usebam
A logical value. If TRUE, bam is used for acceleration. Default is FALSE.
- edf_flexible
A logical value. If TRUE, a simpler model is used to accelerate the marginal fitting, with a mild loss of accuracy: the fitted regression model uses the relationship between the Gini coefficient and the effective degrees of freedom, fitted on a randomly selected gene set. Default is FALSE.
- parallelization
A string indicating the parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which correspond to the parallelization functions in the packages parallel, BiocParallel, and pbmcapply respectively. The default value is 'mcmapply'.
- BPPARAM
A MulticoreParam object or NULL. When parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When parallelization = 'bpmapply', this parameter must be a MulticoreParam object from the package 'BiocParallel'. The default value is NULL.
- trace
A logical value. If TRUE, the warning/error log and runtime for gam/gamlss will be returned. Default is FALSE.
- simplify
A logical value. If TRUE, the fitted regression model keeps only the components essential for predict. Default is FALSE.
- filter_cells
A logical value. If TRUE and all covariates used for fitting the GAM/GAMLSS model are categorical, then for each fitted gene the code checks each unique combination of categories and removes the cells in a combination whose expression of that gene is all zero. Default is FALSE.
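The constraint between parallelization and BPPARAM described above can be expressed as a small pre-flight check. This is only a sketch in base R: check_parallel_args is a hypothetical helper, not a function exported by the package, and the package's own validation may differ.

```r
# Hypothetical helper (not part of the package): checks that `parallelization`
# and `BPPARAM` are combined the way the argument descriptions require.
check_parallel_args <- function(parallelization = "mcmapply", BPPARAM = NULL) {
  allowed <- c("mcmapply", "bpmapply", "pbmcmapply")
  if (!is.character(parallelization) || length(parallelization) != 1L ||
      !parallelization %in% allowed) {
    stop("parallelization must be one of: ", paste(allowed, collapse = ", "))
  }
  if (parallelization == "bpmapply") {
    # Documented rule: a MulticoreParam object is required with 'bpmapply'.
    if (!inherits(BPPARAM, "MulticoreParam")) {
      stop("BPPARAM must be a MulticoreParam object when parallelization = 'bpmapply'")
    }
  } else if (!is.null(BPPARAM)) {
    # Documented rule: BPPARAM must be NULL for 'mcmapply' and 'pbmcmapply'.
    stop("BPPARAM must be NULL unless parallelization = 'bpmapply'")
  }
  invisible(TRUE)
}
```

Calling check_parallel_args("mcmapply", NULL) passes silently, while check_parallel_args("bpmapply", NULL) stops with an error before any model fitting would start.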
Details
The function takes the result from construct_data as the input
and fits the regression models for each feature based on the user's specification.
Examples
data(example_sce)
my_data <- construct_data(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  corr_by = "1"
)
my_marginal <- fit_marginal(
  data = my_data,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 1,
  usebam = FALSE
)
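As a variant of the example above, the following sketch selects the BiocParallel backend described under parallelization and BPPARAM. It assumes that the scDesign3 and BiocParallel packages are installed and is skipped otherwise. The worker count of 2 and the names backend, n_workers, bp_param and my_marginal_bp are illustrative choices, not package defaults.

```r
# Sketch only: rerun the example with BiocParallel as the parallel backend.
backend <- "bpmapply"
n_workers <- 2L  # illustrative value, not a recommendation

# Assumption: both packages are installed; otherwise the block does nothing.
if (requireNamespace("scDesign3", quietly = TRUE) &&
    requireNamespace("BiocParallel", quietly = TRUE)) {
  library(scDesign3)
  data(example_sce)

  my_data <- construct_data(
    sce = example_sce,
    assay_use = "counts",
    celltype = "cell_type",
    pseudotime = "pseudotime",
    spatial = NULL,
    other_covariates = NULL,
    corr_by = "1"
  )

  # With 'bpmapply', BPPARAM must be a MulticoreParam object rather than NULL.
  bp_param <- BiocParallel::MulticoreParam(workers = n_workers)

  my_marginal_bp <- fit_marginal(
    data = my_data,
    mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
    sigma_formula = "1",
    family_use = "nb",
    n_cores = n_workers,
    usebam = FALSE,
    parallelization = backend,
    BPPARAM = bp_param
  )
}
```

With the default parallelization = "mcmapply", BPPARAM stays NULL and only n_cores controls the number of cores, as in the original example.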