Skip to contents

fit_marginal fits the per-feature regression models.

Usage

fit_marginal(
  data,
  predictor = "gene",
  mu_formula,
  sigma_formula,
  family_use,
  n_cores,
  usebam = FALSE,
  edf_flexible = FALSE,
  parallelization = "mcmapply",
  BPPARAM = NULL,
  trace = FALSE,
  simplify = FALSE,
  filter_cells = FALSE
)

Arguments

data

An object from construct_data.

predictor

A string of the predictor for the gam/gamlss model. Default is "gene". This is just a name.

mu_formula

A string of the mu parameter formula. It follows the format of formula in bam. Note: if the formula has multiple smoothers (s()) (we do not recommend this), please put the one with largest k (most complex one) as the first one.

sigma_formula

A string of the sigma parameter formula

family_use

A string or a vector of strings of the marginal distribution. Must be one of 'binomial', 'poisson', 'nb', 'zip', 'zinb' or 'gaussian', which represent 'poisson distribution', 'negative binomial distribution', 'zero-inflated poisson distribution', 'zero-inflated negative binomial distribution', and 'gaussian distribution' respectively.

n_cores

An integer. The number of cores to use.

usebam

A logic variable. If use bam for acceleration.

edf_flexible

A logic variable. It uses simpler model to accelerate the marginal fitting with a mild loss of accuracy. If TRUE, the fitted regression model will use the fitted relationship between Gini coefficient and the effective degrees of freedom on a random selected gene sets. Default is FALSE.

parallelization

A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which corresponds to the parallelization function in the package parallel,BiocParallel, and pbmcapply respectively. The default value is 'mcmapply'.

BPPARAM

A MulticoreParam object or NULL. When the parameter parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When the parameter parallelization = 'bpmapply', this parameter must be one of the MulticoreParam object offered by the package 'BiocParallel. The default value is NULL.

trace

A logic variable. If TRUE, the warning/error log and runtime for gam/gamlss will be returned. will be returned, FALSE otherwise. Default is FALSE.

simplify

A logic variable. If TRUE, the fitted regression model will only keep the essential contains for predict. Default is FALSE.

filter_cell

A logic variable. If TRUE, when all covariates used for fitting the GAM/GAMLSS model are categorical, the code will check each unique combination of categories and remove cells in that category if it has all zero gene expression for each fitted gene.

Value

A list of fitted regression models. The length is equal to the total feature number.

Details

The function takes the result from construct_data as the input, and fit the regression models for each feature based on users' specification.

Examples

  data(example_sce)
  my_data <- construct_data(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  corr_by = "1"
  )
  my_marginal <- fit_marginal(
  data = my_data,
  mu_formula = "s(pseudotime, bs = 'cr', k = 10)",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 1,
  usebam = FALSE
  )