This function runs the NCLUSION method.
run_nclusion(datafilename1, KMax, alpha1, gamma1,
seed, elbo_ep,
dataset,
outdir, logger=nothing,
num_iter=150, save_metrics=false)
datafilename1: Absolute path to the preprocessed AnnData object containing scRNA-seq expression values (cells x genes). AnnData.obs_names must contain the unique cell IDs, AnnData.var_names the unique gene identifiers, AnnData.X the expression matrix, and AnnData.obs["condition"] the condition label for each cell. Optional: if the dataset has been pre-annotated, AnnData.obs["cell_type"] may hold these annotations. For more detailed information on the required input data structure, see Get started.
KMax: Maximum number of clusters NCLUSION will be initialized with
alpha1: Second-level concentration parameter (smaller values give fewer clusters)
gamma1: Top-level concentration parameter (smaller values give fewer clusters)
seed: Random seed
elbo_ep: Tolerance on the change in ELBO values between iterations, used to determine whether convergence has been reached
dataset: Name of the dataset being analyzed (for output naming purposes)
outdir: Absolute path to where NCLUSION's output will be written
logger: Logging object that writes the algorithm's progress to stdout [Default: nothing]
num_iter: Maximum number of iterations the variational inference algorithm will run before stopping if convergence is not reached [Default: 150]
save_metrics: Boolean that indicates whether NCLUSION should calculate label-dependent clustering metrics (Note: requires reference labels to be provided in AnnData.obs["cell_type"] of the input data object) [Default: false]
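The input requirements above can be checked before calling NCLUSION. The following is a minimal Python sketch, not part of NCLUSION itself: check_nclusion_input is a hypothetical helper name, and it only assumes that the object exposes obs_names, var_names, X, and obs as an AnnData object does. The toy object is a stand-in used for demonstration, so the sketch runs without the anndata package installed.

```python
from types import SimpleNamespace


def check_nclusion_input(adata, save_metrics=False):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper (not provided by NCLUSION). It mirrors the input
    requirements described in the argument list above.
    """
    problems = []
    obs_names = list(adata.obs_names)
    var_names = list(adata.var_names)

    # Cell IDs and gene identifiers must be unique.
    if len(set(obs_names)) != len(obs_names):
        problems.append("obs_names contains duplicate cell IDs")
    if len(set(var_names)) != len(var_names):
        problems.append("var_names contains duplicate gene identifiers")

    # The expression matrix must be laid out as cells x genes.
    if tuple(adata.X.shape) != (len(obs_names), len(var_names)):
        problems.append("X shape does not match (cells, genes)")

    # A condition label is required for every run.
    if "condition" not in adata.obs:
        problems.append('obs["condition"] is missing')

    # Reference labels are only needed for label-dependent metrics.
    if save_metrics and "cell_type" not in adata.obs:
        problems.append('obs["cell_type"] is required when save_metrics is true')

    return problems


# Stand-in object for demonstration only; a real AnnData object
# exposes the same attribute names.
toy = SimpleNamespace(
    obs_names=["cell_1", "cell_2", "cell_3"],
    var_names=["gene_a", "gene_b"],
    X=SimpleNamespace(shape=(3, 2)),
    obs={"condition": ["ctrl", "ctrl", "treated"]},
)
problems = check_nclusion_input(toy, save_metrics=True)
print(problems)
```

With a real dataset, the object returned by reading the preprocessed file could be passed in place of toy; the membership tests on obs then check column names.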
Dictionary of all estimated parameters, including posterior inclusion probabilities (PIPs; yjk_) and cluster probabilities (rtik_)
The following output files are automatically saved to
outputs/EXPERIMENT_nclusion_{DATA-NAME}/DATASET_{DATA-NAME}_{NUM-GENES}HVGs-{NUM-CELLS}N/{YEAR_MONTH_DATE_TIME}/
which is automatically generated in the specified output directory, outdir, after running NCLUSION.
NOTE: {DATA-NAME} corresponds to the dataset input variable. {NUM-GENES} corresponds to the number of genes included in the input AnnData object. {NUM-CELLS} corresponds to the number of cells included in the input AnnData object. {YEAR_MONTH_DATE_TIME} corresponds to the timestamp at which the outputs were generated (calculated automatically by NCLUSION).
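To locate results programmatically, the naming template above can be turned into a path. The following Python sketch is illustrative only: run_parent_dir and list_run_dirs are hypothetical helper names, the example values are made up, and the exact format of the time-stamped folder name is not assumed, so the code simply lists whatever run folders exist.

```python
from pathlib import Path


def run_parent_dir(outdir, dataset, num_genes, num_cells):
    """Build the directory that holds the time-stamped run folders.

    Follows the template described above:
    outputs/EXPERIMENT_nclusion_{DATA-NAME}/DATASET_{DATA-NAME}_{NUM-GENES}HVGs-{NUM-CELLS}N/
    """
    return (
        Path(outdir)
        / "outputs"
        / f"EXPERIMENT_nclusion_{dataset}"
        / f"DATASET_{dataset}_{num_genes}HVGs-{num_cells}N"
    )


def list_run_dirs(outdir, dataset, num_genes, num_cells):
    """Return the existing time-stamped run folders, sorted by name.

    Returns an empty list if no run has been written yet.
    """
    parent = run_parent_dir(outdir, dataset, num_genes, num_cells)
    if not parent.is_dir():
        return []
    return sorted(p for p in parent.iterdir() if p.is_dir())


# Example values are placeholders, not taken from a real run.
parent = run_parent_dir("/tmp/nclusion_out", "pbmc", 5000, 2700)
print(parent)
```

Listing the folders rather than reconstructing the timestamp avoids guessing how NCLUSION formats it.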
AnnData.obs['cell_type'] layer of the input data. Otherwise they remain empty: