tools

sofa.utils.utils.calc_rmse(X, X_pred)

Calculate the root mean squared error between X and X_pred.

Parameters

X_pred (numpy.array) – Predicted X.
X (numpy.array) – Input X.

Returns

The root mean squared error between X and X_pred.

Return type

float
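
As an illustration of the quantity described above, the following is a minimal NumPy sketch of a root mean squared error. It is not the library's implementation: the helper name `rmse` is hypothetical, and it assumes dense arrays of equal shape with no missing values.

```python
import numpy as np


def rmse(X, X_pred):
    """Root mean squared error over all entries (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    # Mean of the squared element-wise differences, then the square root.
    return float(np.sqrt(np.mean((X - X_pred) ** 2)))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_pred = np.array([[1.0, 2.0], [3.0, 6.0]])
print(rmse(X, X_pred))  # 1.0
```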

sofa.utils.utils.calc_var_explained(X_pred, X)

Calculate R² (the fraction of variance explained) for X and X_pred.

Parameters

X_pred (numpy.array) – Predicted X.
X (numpy.array) – Input X.

Returns

R² value for X and X_pred.

Return type

float
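
One common way to compute an R² value is sketched below. This is an assumption about the definition, not a copy of the library code: the helper name `r2` is hypothetical, and the total sum of squares is taken around the overall mean of X, which may differ from what the package does (for example, if the data are assumed to be centered already).

```python
import numpy as np


def r2(X_pred, X):
    """R^2 = 1 - SS_res / SS_tot over all entries (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    ss_res = np.sum((X - X_pred) ** 2)    # residual sum of squares
    ss_tot = np.sum((X - X.mean()) ** 2)  # total sum of squares (assumed: around the overall mean)
    return float(1.0 - ss_res / ss_tot)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(r2(X, X))                            # perfect prediction
print(r2(np.full_like(X, X.mean()), X))    # predicting the mean everywhere
```

A perfect prediction gives an R² of 1, while predicting the overall mean for every entry gives 0.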

sofa.utils.utils.calc_var_explained_(X_pred, X)

Calculate the fraction of variance of each view that is explained by each factor.

Parameters

X_pred (numpy.array) – Predicted X.
X (numpy.array) – Input X.

Returns

Array containing the fraction of variance of each view that is explained by each factor.

Return type

numpy.array

sofa.utils.utils.get_ad(data: pandas.core.frame.DataFrame, llh: str = 'gaussian', select_hvg: bool = False, log: bool = False, scale: bool = False, scaling_factor: float = 0.1) → anndata._core.anndata.AnnData

Convert a pandas DataFrame to an AnnData object.

Parameters

data (pandas.DataFrame) – The input data to be converted to an AnnData object.
name (str) – The name of the variable.
llh (str, optional) – The likelihood of the data. Must be “gaussian”, “bernoulli” or “categorical”. Default is “gaussian”.
select_hvg (bool, optional) – Whether to select highly variable features. Default is False.
log (bool, optional) – Whether to log-transform the data. Default is False.
scale (bool, optional) – Whether to center and scale the data. Default is False.
scaling_factor (float, optional) – Factor by which the likelihood of this view is scaled. This is a crucial hyperparameter: values that are too high can lead to overfitting (the factors no longer explain variance in Xmdata but perfectly predict Ymdata), while values that are too low can lead to the model ignoring the covariate guidance. In practice, 0.1 is a good starting point. Default is 0.1.

Returns

The converted AnnData object.

Return type

anndata.AnnData

sofa.utils.utils.get_factors(model: sofa.models.SOFA.SOFA) → pandas.core.frame.DataFrame

Get the factors of the model.

Parameters: model (SOFA) – The trained SOFA model.
Returns: DataFrame containing the factors of the model.
Return type: pd.DataFrame

sofa.utils.utils.get_gsea_enrichment(gene_list, db, background)

Get gene set enrichment analysis results based on a gene_list using gseapy.

Parameters

gene_list (list) – List of strings containing gene names.
db (list) – List of strings containing database names to be used for enrichment analysis.
background (list) – List of strings containing gene names to be used as background.

Returns

Enrichr object containing the results of the enrichment analysis.

Return type

Enrichr object

sofa.utils.utils.get_guide_error(model)

Calculate the guide error of the model: the root mean squared error for continuous Y, the binary cross-entropy for binary Y, or the categorical cross-entropy for categorical Y.

Parameters: model (SOFA) – The trained SOFA model.
Returns: Dictionary containing the root mean squared error (continuous Y), binary cross-entropy (binary Y) or categorical cross-entropy (categorical Y) of the model.
Return type: dict
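
The three error measures named above can be sketched in NumPy as follows. This is only an illustration of the metrics, not the library's code: all helper names are hypothetical, the dispatch reuses the likelihood names “gaussian”, “bernoulli” and “categorical” from get_ad, and the categorical case assumes integer class labels and one row of class probabilities per sample.

```python
import numpy as np


def rmse(y, y_pred):
    """Root mean squared error for continuous targets."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; p holds predicted probabilities of class 1."""
    y = np.asarray(y, float)
    p = np.clip(np.asarray(p, float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def categorical_cross_entropy(y, P, eps=1e-12):
    """Mean categorical cross-entropy; y holds integer labels, P class probabilities."""
    y = np.asarray(y, int)
    P = np.clip(np.asarray(P, float), eps, 1.0)
    # Pick the predicted probability of the true class for every sample.
    return float(-np.mean(np.log(P[np.arange(len(y)), y])))


def guide_error(y, y_pred, llh):
    """Choose the error measure from the likelihood name (hypothetical dispatch)."""
    metrics = {
        "gaussian": rmse,
        "bernoulli": binary_cross_entropy,
        "categorical": categorical_cross_entropy,
    }
    return metrics[llh](y, y_pred)


print(guide_error([0.0, 2.0], [0.0, 0.0], "gaussian"))
print(guide_error([0, 1], [0.5, 0.5], "bernoulli"))
print(guide_error([0, 3], np.full((2, 4), 0.25), "categorical"))
```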

sofa.utils.utils.get_loadings(model: sofa.models.SOFA.SOFA, view: str) → pandas.core.frame.DataFrame

Get the loadings of the model for a specific view.

Parameters

model (SOFA) – The trained SOFA model.
view (str) – Name of the view to get the loadings for.

Returns

DataFrame containing the loadings of the model for the specified view.

Return type

pd.DataFrame

sofa.utils.utils.get_rmse(model)

Calculate the root mean squared error of the model.

Parameters: model (SOFA) – The trained SOFA model.
Returns: The root mean squared error of X of the model for each view.
Return type: dict

sofa.utils.utils.get_top_loadings(model, view, factor, sign='+', top_n=100)

Get the top_n loadings of the model for a specific view.

Parameters

model (SOFA) – The trained SOFA model.
view (str) – Name of the view to get the loadings for.
factor (int) – Index of the factor to get the top loadings for. Should be between 1 and the total number of factors.
sign (str) – Sign of the loadings to get. Default is “+”.
top_n (int) – Number of top loadings to get. Default is 100.

Returns

DataFrame containing the top_n loadings of the model for the specified view.

Return type

pandas.DataFrame
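
The selection step can be pictured with a small pandas sketch. It is hypothetical and makes two assumptions that the text above does not confirm: the loadings of one factor are available as a Series indexed by feature name, and sign “+” keeps only positive loadings while “-” keeps only negative ones.

```python
import pandas as pd


def top_loadings(weights, sign="+", top_n=100):
    """Return the top_n strongest loadings of the requested sign (hypothetical helper)."""
    if sign == "+":
        # Largest positive loadings first.
        return weights[weights > 0].nlargest(top_n)
    if sign == "-":
        # Most negative loadings first.
        return weights[weights < 0].nsmallest(top_n)
    raise ValueError("sign must be '+' or '-'")


# Toy loadings of a single factor; the feature names are made up.
weights = pd.Series({"A": 0.9, "B": -0.7, "C": 0.1, "D": -0.2, "E": 0.5})
print(top_loadings(weights, sign="+", top_n=2))
print(top_loadings(weights, sign="-", top_n=2))
```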

sofa.utils.utils.get_var_explained_per_view_factor(model: sofa.models.SOFA.SOFA)

Calculate the fraction of variance of each view that is explained by each factor.

Parameters: model (SOFA) – The trained SOFA model.
Returns: Array containing the fraction of variance of each view that is explained by each factor.
Return type: numpy.array
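
To make the shape of the result concrete, here is a minimal sketch of a per-view, per-factor variance-explained computation. It rests on assumptions not stated above: each view is reconstructed as the product of a shared factor matrix Z (samples × factors) and a view-specific loading matrix W (factors × features), the contribution of factor k is the rank-one product of column k of Z and row k of W, and the total sum of squares is taken around the overall mean of each view. The function name, argument names and view names are hypothetical.

```python
import numpy as np


def var_explained_per_view_factor(views, Z, loadings):
    """Return an array of shape (n_views, n_factors) with one R^2 per view and factor."""
    Z = np.asarray(Z, dtype=float)
    n_factors = Z.shape[1]
    out = np.zeros((len(views), n_factors))
    for i, name in enumerate(views):
        X = np.asarray(views[name], dtype=float)
        W = np.asarray(loadings[name], dtype=float)
        ss_tot = np.sum((X - X.mean()) ** 2)
        for k in range(n_factors):
            # Reconstruction of the view from factor k alone (assumed rank-one form).
            X_k = np.outer(Z[:, k], W[k, :])
            out[i, k] = 1.0 - np.sum((X - X_k) ** 2) / ss_tot
    return out


# Toy example with two factors and two made-up views.
Z = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
loadings = {
    "rna": np.array([[2.0, -2.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]),
    "protein": np.array([[1.0, -1.0], [0.0, 0.0]]),
}
views = {name: Z @ W for name, W in loadings.items()}
print(var_explained_per_view_factor(views, Z, loadings))
```

In this toy setup the first factor accounts for most of the variance in the first view and all of it in the second, so each row of the result sums to 1.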

sofa.utils.utils.load_model(file_prefix)

Load a saved model from disk. The function requires both an h5mu file and a save file to load the model.

Parameters: file_prefix (str) – Filename prefix of the h5mu and save files to load.
Returns: The loaded SOFA model.
Return type: SOFA

sofa.utils.utils.save_model(model, file_prefix)

Save a model to disk as an h5mu file and a save file. Model hyperparameters, input data and predictions are stored in the h5mu file, and the model parameters are stored in the save file. Both files are needed to load a model and continue training.

Parameters

model (SOFA) – The trained SOFA model.
file_prefix (str) – Filename prefix to save the model as h5mu and save files.

Returns

Filenames of the saved h5mu and save files.

Return type

tuple(str,str)