Reference

This function runs the NCLUSION method.

run_nclusion(datafilename1, KMax, alpha1, gamma1,
    seed, elbo_ep,
    dataset,
    outdir, logger=nothing,
    num_iter=150, save_metrics=false)

Arguments

datafilename1: Absolute path to a preprocessed AnnData object containing scRNA-seq expression values (cells x genes). AnnData.obs_names must contain the unique cell IDs, AnnData.var_names the unique gene identifiers, AnnData.X the expression matrix, and AnnData.obs["condition"] the condition label for each cell. Optional: if the dataset has been pre-annotated, AnnData.obs["cell_type"] may hold these annotations. For more detail on the required input data structure, see Get started.
KMax: Maximum number of clusters NCLUSION will be initialized with
alpha1: Second-level concentration parameter (smaller values give fewer clusters)
gamma1: Top-level concentration parameter (smaller values give fewer clusters)
seed: Random seed
elbo_ep: Minimum tolerance for the change in ELBO values between iterations. This determines whether or not convergence is reached.
dataset: Name of the data set being analyzed (for output naming purposes)
outdir: Absolute path to where NCLUSION's output will be written
logger: Logging object that writes the algorithm's progress to stdout [Default: nothing]
num_iter: Maximum number of iterations the variational inference algorithm will continue to run before stopping if convergence is not reached [Default: 150]
save_metrics: Boolean that indicates whether or not NCLUSION should calculate label-dependent clustering metrics (Note: Requires reference labels to be provided in the AnnData.obs["cell_type"] of the input data object) [Default: false]
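
The way elbo_ep and num_iter interact can be illustrated with a small stopping rule. This is only a sketch: run_until_converged and toy_elbo are hypothetical names, not NCLUSION functions, and the absolute-difference criterion is an assumption (the package may measure the change in ELBO differently).

```julia
# Hypothetical helper, not part of NCLUSION: shows how a tolerance (elbo_ep)
# and an iteration cap (num_iter) jointly decide when inference stops.
function run_until_converged(elbo_step; elbo_ep=1e-3, num_iter=150)
    prev = -Inf
    for iter in 1:num_iter
        elbo = elbo_step(iter)
        # Assumed criterion: absolute change in ELBO falls below the tolerance.
        if abs(elbo - prev) < elbo_ep
            return (converged=true, iterations=iter, elbo=elbo)
        end
        prev = elbo
    end
    # Iteration cap reached without meeting the tolerance.
    return (converged=false, iterations=num_iter, elbo=prev)
end

# Toy ELBO sequence that rises towards 0; it stands in for a real update step.
toy_elbo(iter) = -1.0 / iter^2

result = run_until_converged(toy_elbo; elbo_ep=1e-3, num_iter=150)
println(result)
```

With a tighter elbo_ep the loop runs longer, and with a small num_iter it may stop before the tolerance is met, in which case the run should not be treated as converged.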

Value

Dictionary of all estimated parameters, including PIPs (yjk_) and cluster probabilities (rtik_)

Saved Output Files

The following output files are saved to outputs/EXPERIMENT_nclusion_{DATA-NAME}/DATASET_{DATA-NAME}_{NUM-GENES}HVGs-{NUM-CELLS}N/{YEAR_MONTH_DATE_TIME}/, which NCLUSION creates automatically inside the specified output directory, outdir, when it runs.

NOTE: {DATA-NAME} corresponds to the dataset input variable. {NUM-GENES} corresponds to the number of genes included in the input AnnData object. {NUM-CELLS} corresponds to the number of cells included in the input AnnData object. {YEAR-MONTH-DATE-TIME} corresponds to the timestamp at which the outputs were generated (calculated automatically by NCLUSION).
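
To make the directory template concrete, the sketch below assembles the path from its placeholders. expected_output_dir is a hypothetical helper, not an NCLUSION function, and the dataset name, gene count, and cell count are made-up values. The exact timestamp format is not specified here, so it is passed in as an opaque string.

```julia
# Hypothetical helper (not an NCLUSION function): builds the documented
# output directory from its placeholders.
function expected_output_dir(outdir, dataset, num_genes, num_cells, timestamp)
    return joinpath(
        outdir,
        "outputs",
        "EXPERIMENT_nclusion_$(dataset)",
        "DATASET_$(dataset)_$(num_genes)HVGs-$(num_cells)N",
        timestamp,  # format chosen by NCLUSION; treated as opaque here
    )
end

# Made-up example values; "TIMESTAMP" stands in for the generated timestamp.
dir = expected_output_dir("/tmp/results", "pbmc", 5000, 2700, "TIMESTAMP")
println(dir)
```

Because the timestamp is generated at run time, a practical way to find the latest results is to list the DATASET_... directory and pick the most recent subdirectory.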

{DATA-NAME}_{NUM-GENES}HVGs-{NUM-CELLS}N_nclusion-{YEAR-MONTH-DATE-TIME}.csv
- .csv file that contains the NCLUSION clustering assignments, where each row displays relevant metadata and clustering results for each cell.
- The 'condition' column displays the cell's experimental condition labels (all cells in this study have the same condition label).
- The 'cell_id' column displays the cell's unique identifier label.
- The 'inferred_label' column displays the cell's NCLUSION cluster assignment.
NOTE: the following columns are only filled if alternative cell-type annotations are provided in the AnnData.obs['cell_type'] layer of the input data. Otherwise they remain empty:
- The 'cell_type' column displays the cell's alternative cell-type annotation obtained from a different study.
- The 'called_label' column displays the numerical equivalent of the 'cell_type' label, mapped automatically by NCLUSION.
{NUM-GENES}G-{YEAR-MONTH-DATE-TIME}-pips.csv
- .csv file that contains the Posterior Inclusion Probabilities (PIPs) of genes across clusters identified by NCLUSION. PIPs summarize the evidence that a gene is associated with driving the identity of a phenotypic cluster. For a particular cluster, a higher PIP indicates that a gene is more significant to the formation of that cluster.
_QuickSummary_{YEAR-MONTH-DATE-TIME}.txt
- .txt file that contains the input parameters that were used to run NCLUSION.
output.jld2
- .jld2 file that contains all model parameters.
{YEAR-MONTH-DATE-TIME}-Nk.csv
- .csv file that contains the estimated number of cells in each cluster identified by NCLUSION.
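
As a quick check on the clustering assignments file, the sketch below tallies cells per 'inferred_label' using only the Julia standard library. The rows are made up, the column order is an assumption (the column is located by name in the header), and cluster_sizes is a hypothetical helper rather than part of NCLUSION. The parsing is deliberately naive and assumes no quoted fields that contain commas.

```julia
# Made-up rows that use the documented column names; a real file is written
# by NCLUSION and may order its columns differently.
sample_csv = """
cell_id,condition,inferred_label,cell_type,called_label
AAAC-1,control,1,,
AAAG-1,control,2,,
AACT-1,control,1,,
AAGG-1,control,3,,
"""

# Count cells per cluster by looking up 'inferred_label' in the header row.
function cluster_sizes(io)
    header = split(readline(io), ',')
    col = findfirst(==("inferred_label"), header)
    col === nothing && error("no 'inferred_label' column found")
    counts = Dict{String,Int}()
    for line in eachline(io)
        isempty(strip(line)) && continue  # skip blank lines
        label = String(split(line, ',')[col])
        counts[label] = get(counts, label, 0) + 1
    end
    return counts
end

sizes = cluster_sizes(IOBuffer(sample_csv))
for label in sort(collect(keys(sizes)))
    println("cluster ", label, ": ", sizes[label], " cells")
end
```

For a file on disk, open(cluster_sizes, path) applies the same function. These hard-assignment tallies can be compared against the Nk file, though the two need not match exactly if Nk holds model-based estimates.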
