tools

sofa.utils.utils.calc_rmse(X, X_pred)

Calculate the root mean squared error between X and X_pred.

Parameters
  • X_pred (numpy.array) – Predicted X.

  • X (numpy.array) – Input X.

Returns

The root mean squared error between X and X_pred.

Return type

float
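As an illustration of the quantity described above, the following is a minimal NumPy sketch of a root mean squared error over all entries of X. The helper name `rmse_sketch` is hypothetical and this is not the library's implementation; in particular, how `calc_rmse` treats missing values is not stated here and is not modelled.

```python
import numpy as np

def rmse_sketch(X, X_pred):
    """Root mean squared error over all entries (illustrative helper, not part of sofa)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    # Square the element-wise residuals, average them, then take the square root.
    return float(np.sqrt(np.mean((X - X_pred) ** 2)))

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_pred = np.array([[1.0, 2.0], [3.0, 6.0]])
rmse = rmse_sketch(X, X_pred)
```

Here only one of the four entries is off (by 2), so the mean squared error is 1 and the RMSE is 1.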

sofa.utils.utils.calc_var_explained(X_pred, X)

Calculate the coefficient of determination (R²) for X and X_pred.

Parameters
  • X_pred (numpy.array) – Predicted X.

  • X (numpy.array) – Input X.

Returns

R² value for X and X_pred.

Return type

float
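For orientation, a common definition of R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition with the per-feature mean as the baseline; the helper name `r2_sketch` is hypothetical, and the exact formula used by `calc_var_explained` may differ.

```python
import numpy as np

def r2_sketch(X_pred, X):
    """R^2 = 1 - SS_res / SS_tot (illustrative helper, not part of sofa)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    ss_res = np.sum((X - X_pred) ** 2)
    # Assumption: the baseline is the per-feature (column) mean of X.
    ss_tot = np.sum((X - X.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

X = np.array([[1.0, 0.0], [3.0, 4.0]])
perfect = r2_sketch(X, X)                                  # prediction equals the input
baseline = r2_sketch(np.tile(X.mean(axis=0), (2, 1)), X)   # prediction equals the column means
```

Under this definition a perfect prediction gives 1, and predicting the column means gives 0.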

sofa.utils.utils.calc_var_explained_(X_pred, X)

Calculate the fraction of variance of each view that is explained by each factor.

Parameters
  • X_pred (numpy.array) – Predicted X.

  • X (numpy.array) – Input X.

Returns

Array containing the fraction of variance of each view that is explained by each factor.

Return type

numpy.array

sofa.utils.utils.get_ad(data: pandas.core.frame.DataFrame, llh: str = 'gaussian', select_hvg: bool = False, log: bool = False, scale: bool = False, scaling_factor: float = 0.1) → anndata._core.anndata.AnnData

Convert a pandas DataFrame to an AnnData object.

Parameters
  • data (pandas.DataFrame) – The input data to be converted to an AnnData object.

  • name (str) – The name of the variable.

  • llh (str, optional) – The likelihood of the data. It should be “gaussian”, “bernoulli” or “categorical”. Default is “gaussian”.

  • select_hvg (bool, optional) – Whether to select highly variable features.

  • log (bool, optional) – Whether to log-transform the data.

  • scale (bool, optional) – Whether to center and scale the data.

  • scaling_factor (float, optional) – The scaling factor applied to the likelihood for this view. It is a crucial hyperparameter. Values that are too high can lead to overfitting (factors don’t explain the variance of Xmdata, but perfectly predict Ymdata). Values that are too low can lead to the model ignoring the covariate guidance. In practice, a value of 0.1 is a good starting point. Default is 0.1.

Returns

The converted AnnData object.

Return type

AnnData

sofa.utils.utils.get_factors(model: sofa.models.SOFA.SOFA) → pandas.core.frame.DataFrame

Get the factors of the model.

Parameters

model (SOFA) – The trained SOFA model.

Returns

DataFrame containing the factors of the model.

Return type

pandas.DataFrame

sofa.utils.utils.get_gsea_enrichment(gene_list, db, background)

Get gene set enrichment analysis results based on a gene_list using gseapy.

Parameters
  • gene_list (list) – List of strings containing gene names.

  • db (list) – List of strings containing database names to be used for enrichment analysis.

  • background (list) – List of strings containing gene names to be used as background.

Returns

Enrichr object containing the results of the enrichment analysis.

Return type

Enrichr object

sofa.utils.utils.get_guide_error(model)

Calculate the prediction error for each guiding variable Y of the model: the root mean squared error for continuous Y, the binary cross-entropy for binary Y, or the categorical cross-entropy for categorical Y.

Parameters

model (SOFA) – The trained SOFA model.

Returns

Dictionary containing the root mean squared error for continuous Y, the binary cross-entropy for binary Y, or the categorical cross-entropy for categorical Y of the model.

Return type

dict
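The three error measures named above can be sketched in NumPy as follows. This is an illustration under standard textbook definitions, not the library's code: the helper names, the dictionary keys (`age`, `treated`, `subtype`), and the clipping constant `eps` are all assumptions made for the example.

```python
import numpy as np

def rmse(y, y_pred):
    """Root mean squared error for a continuous target."""
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; p holds predicted probabilities of class 1."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    """Mean categorical cross-entropy; rows of probs sum to 1."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

# One entry per (hypothetical) guiding variable, chosen by its type.
errors = {
    "age": rmse(np.array([30.0, 50.0]), np.array([33.0, 46.0])),
    "treated": binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5])),
    "subtype": categorical_cross_entropy(np.eye(2), np.full((2, 2), 0.5)),
}
```

With uninformative predictions of 0.5, both cross-entropy values equal ln 2, which is a useful reference point when reading the returned dictionary.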

sofa.utils.utils.get_loadings(model: sofa.models.SOFA.SOFA, view: str) → pandas.core.frame.DataFrame

Get the loadings of the model for a specific view.

Parameters
  • model (SOFA) – The trained SOFA model.

  • view (str) – Name of the view to get the loadings for.

Returns

DataFrame containing the loadings of the model for the specified view.

Return type

pandas.DataFrame

sofa.utils.utils.get_rmse(model)

Calculate the root mean squared error of the model.

Parameters

model (SOFA) – The trained SOFA model.

Returns

The root mean squared error of X of the model for each view.

Return type

dict

sofa.utils.utils.get_top_loadings(model, view, factor, sign='+', top_n=100)

Get the top_n loadings of the model for a specific view.

Parameters
  • model (SOFA) – The trained SOFA model.

  • view (str) – Name of the view to get the loadings for.

  • factor (int) – Index of the factor to get the top loadings for. Should be between 1 and the total number of factors.

  • sign (str) – Sign of the loadings to get. Default is “+”.

  • top_n (int) – Number of top loadings to get. Default is 100.

Returns

DataFrame containing the top_n loadings of the model for the specified view.

Return type

pandas.DataFrame
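To make the selection logic concrete, here is a small pandas sketch of picking the top loadings of one factor by sign. It assumes a loadings table with one row per factor and one column per feature, and that `sign="-"` selects the most negative loadings; both are assumptions, as are the helper name `top_loadings_sketch` and the toy gene names. For brevity the sketch returns a Series rather than a DataFrame.

```python
import pandas as pd

def top_loadings_sketch(loadings, factor, sign="+", top_n=100):
    """Select the strongest loadings of one factor (illustrative helper, not part of sofa)."""
    row = loadings.iloc[factor - 1]  # factor is 1-based, as in the parameter description
    if sign == "+":
        # Largest positive loadings first.
        return row[row > 0].sort_values(ascending=False).head(top_n)
    # Most negative loadings first.
    return row[row < 0].sort_values(ascending=True).head(top_n)

# Toy loadings table: rows are factors, columns are features (assumed layout).
loadings = pd.DataFrame(
    [[0.9, -0.2, 0.4, -0.7], [0.1, 0.3, -0.5, 0.0]],
    index=["Factor_1", "Factor_2"],
    columns=["geneA", "geneB", "geneC", "geneD"],
)
top_pos = top_loadings_sketch(loadings, factor=1, sign="+", top_n=2)
top_neg = top_loadings_sketch(loadings, factor=1, sign="-", top_n=1)
```

For the first factor, the two strongest positive features are geneA and geneC, and the strongest negative feature is geneD.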

sofa.utils.utils.get_var_explained_per_view_factor(model: sofa.models.SOFA.SOFA)

Calculate the fraction of variance of each view that is explained by each factor.

Parameters

model (SOFA) – The trained SOFA model.

Returns

Array containing the fraction of variance of each view that is explained by each factor.

Return type

numpy.array
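One way to think about variance explained per factor is to reconstruct a view from a single factor at a time and compute R² for that partial reconstruction. The sketch below does this for one view, assuming a factor matrix `Z` (samples × factors) and a loading matrix `W` (factors × features); these names, the helper `var_explained_per_factor`, and the R² definition are assumptions for illustration and may not match the library's computation.

```python
import numpy as np

def var_explained_per_factor(X, Z, W):
    """R^2 of each single-factor reconstruction of one view (illustrative helper)."""
    ss_tot = np.sum((X - X.mean(axis=0)) ** 2)
    r2 = np.empty(Z.shape[1])
    for k in range(Z.shape[1]):
        # Rank-1 reconstruction of X using only factor k.
        X_pred_k = np.outer(Z[:, k], W[k, :])
        r2[k] = 1.0 - np.sum((X - X_pred_k) ** 2) / ss_tot
    return r2

# Toy view generated exactly by two non-overlapping factors.
Z = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
W = np.array([[2.0, 0.0], [0.0, 1.0]])
X = Z @ W
r2 = var_explained_per_factor(X, Z, W)
```

In this toy case the first factor accounts for 80% of the variance and the second for 20%. Repeating the computation for every view would give a views × factors array like the one described above.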

sofa.utils.utils.load_model(file_prefix)

Load a saved model from disk. The function requires both an h5mu file and a save file to load the model.

Parameters

file_prefix (str) – Filename prefix of the saved h5mu and save files.

Returns

The loaded SOFA model.

Return type

SOFA

sofa.utils.utils.save_model(model, file_prefix)

Save a model to disk as an h5mu file and a save file. Model hyperparameters, input data and predictions are saved in the h5mu file, and the model parameters are saved in the save file. Both files are needed to load a model and continue training.

Parameters
  • model (SOFA) – The trained SOFA model.

  • file_prefix (str) – Filename prefix to save the model as h5mu and save files.

Returns

Filenames of the saved h5mu and save files.

Return type

tuple(str,str)