tools
- sofa.utils.utils.calc_rmse(X, X_pred)
Calculate the root mean squared error between X and X_pred.
- Parameters
X (numpy.array) – Input X.
X_pred (numpy.array) – Predicted X.
- Returns
The root mean squared error between X and X_pred.
- Return type
float
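As a rough illustration of the quantity calc_rmse returns, here is a minimal NumPy sketch. The helper name `rmse` is hypothetical and this is not the library's implementation. The sketch assumes the error is averaged over every entry of the matrix, and it skips missing values in X via `np.nanmean`. Both of these choices are assumptions.

```python
import numpy as np


def rmse(X: np.ndarray, X_pred: np.ndarray) -> float:
    """Root mean squared error over all entries (hypothetical helper, not sofa's code)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    if X.shape != X_pred.shape:
        raise ValueError(f"shape mismatch: {X.shape} vs {X_pred.shape}")
    # Assumption: NaNs in X (missing observations) are ignored in the mean.
    return float(np.sqrt(np.nanmean((X - X_pred) ** 2)))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_pred = np.array([[1.0, 2.0], [3.0, 6.0]])
print(rmse(X, X_pred))  # sqrt((0 + 0 + 0 + 4) / 4) = 1.0
```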
- sofa.utils.utils.calc_var_explained(X_pred, X)
Calculate the coefficient of determination (R²) of X_pred with respect to X.
- Parameters
X_pred (numpy.array) – Predicted X.
X (numpy.array) – Input X.
- Returns
R2 value for X and X_pred.
- Return type
float
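The textbook form of R² is 1 − SS_res / SS_tot. The sketch below implements it with the same argument order as calc_var_explained (X_pred first). The helper name `r2` is hypothetical. The sketch measures the total sum of squares around the grand mean of X, and that choice is an assumption. The library may instead center per feature, or it may assume the data are already centered and use the raw sum of squares of X.

```python
import numpy as np


def r2(X_pred: np.ndarray, X: np.ndarray) -> float:
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper, not sofa's code)."""
    X = np.asarray(X, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    ss_res = np.sum((X - X_pred) ** 2)
    # Assumption: total sum of squares around the grand mean of X.
    ss_tot = np.sum((X - X.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# mean = 2.5, SS_tot = 5, SS_res = 1, so R^2 = 1 - 1/5 = 0.8
print(r2([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0]))
```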
- sofa.utils.utils.calc_var_explained_(X_pred, X)
Calculate the fraction of variance of each view that is explained by each factor.
- Parameters
X_pred (numpy.array) – Predicted X.
X (numpy.array) – Input X.
- Returns
Array containing the fraction of variance of each view that is explained by each factor.
- Return type
numpy.array
- sofa.utils.utils.get_ad(data: pandas.core.frame.DataFrame, llh: str = 'gaussian', select_hvg: bool = False, log: bool = False, scale: bool = False, scaling_factor: float = 0.1) → anndata._core.anndata.AnnData
Convert a pandas DataFrame to an AnnData object.
- Parameters
data (pandas.DataFrame) – The input data to be converted to an AnnData object.
name (str) – The name of the variable.
llh (str, optional) – The likelihood of the data. It should be “gaussian”, “bernoulli” or “categorical”. Default is “gaussian”.
select_hvg (bool, optional) – Whether to select highly variable features. Default is False.
log (bool, optional) – Whether to log-transform the data. Default is False.
scale (bool, optional) – Whether to center and scale the data. Default is False.
scaling_factor (float, optional) – The factor by which the likelihood of this view is scaled. It is a crucial hyperparameter. If the value is too high, the model can overfit: the factors no longer explain the variance of Xmdata but perfectly predict Ymdata. If the value is too low, the model can ignore the covariate guidance. In practice, 0.1 is a good starting point. Default is 0.1.
- Returns
adata (AnnData) – The converted AnnData object.
- Return type
anndata.AnnData
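The select_hvg, log and scale options describe a common preprocessing chain. The following pandas sketch shows one plausible version of that chain. The function name `preprocess_view` and the `n_top` parameter are hypothetical. The order of the steps (log, then feature selection, then scaling) is an assumption, and so is selecting the features by raw variance. The real function may use a dedicated highly-variable-gene method instead. The sketch does not cover `llh` or `scaling_factor`, which configure the model rather than the data.

```python
import numpy as np
import pandas as pd


def preprocess_view(data: pd.DataFrame, select_hvg: bool = False, log: bool = False,
                    scale: bool = False, n_top: int = 2000) -> pd.DataFrame:
    """Hypothetical preprocessing sketch; not sofa's get_ad."""
    out = data.astype(float)
    if log:
        # log(1 + x); assumes non-negative input such as counts.
        out = np.log1p(out)
    if select_hvg:
        # Assumption: keep the n_top features (columns) with the highest variance.
        top = out.var(axis=0).nlargest(min(n_top, out.shape[1])).index
        out = out[top]
    if scale:
        # Center each feature and scale it to unit variance; constant features are only centered.
        std = out.std(axis=0, ddof=0).replace(0, 1.0)
        out = (out - out.mean(axis=0)) / std
    return out


df = pd.DataFrame({"a": [0, 1, 2, 3], "b": [5, 5, 5, 5], "c": [0, 10, 0, 10]})
print(preprocess_view(df, select_hvg=True, scale=True, n_top=2))
```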
- sofa.utils.utils.get_factors(model: sofa.models.SOFA.SOFA) → pandas.core.frame.DataFrame
Get the factors of the model.
- Parameters
model (SOFA) – The trained SOFA model.
- Returns
DataFrame containing the factors of the model.
- Return type
pd.DataFrame
- sofa.utils.utils.get_gsea_enrichment(gene_list, db, background)
Get gene set enrichment analysis results based on a gene_list using gseapy.
- Parameters
gene_list (list) – List of strings containing gene names.
db (list) – List of strings containing database names to be used for enrichment analysis.
background (list) – List of strings containing gene names to be used as background.
- Returns
Enrichr object containing the results of the enrichment analysis.
- Return type
Enrichr object
- sofa.utils.utils.get_guide_error(model)
Calculate the guide error of the model for each Y: the root mean squared error for continuous Y, the binary cross-entropy for binary Y, or the categorical cross-entropy for categorical Y.
- Parameters
model (SOFA) – The trained SOFA model.
- Returns
Dictionary containing the guide error for each Y: the root mean squared error for continuous Y, the binary cross-entropy for binary Y, or the categorical cross-entropy for categorical Y.
- Return type
dict
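The three error measures named above can be written in a few lines of NumPy. The sketch below is illustrative only. The helper `guide_error` and the guide names in the usage example are hypothetical. The likelihood strings mirror the `llh` values accepted by get_ad. It also assumes a particular input format: categorical targets are integer class labels, and their predictions are per-class probabilities. Probabilities are clipped so that the logarithm stays finite.

```python
import numpy as np

EPS = 1e-12  # keeps log() away from log(0)


def guide_error(y_true, y_pred, llh: str) -> float:
    """Hypothetical per-guide error; not sofa's get_guide_error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if llh == "gaussian":
        # Root mean squared error for continuous targets.
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    if llh == "bernoulli":
        # Binary cross-entropy; y_pred holds P(y = 1).
        p = np.clip(y_pred, EPS, 1.0 - EPS)
        return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
    if llh == "categorical":
        # Categorical cross-entropy; y_true holds class indices,
        # y_pred has shape (n_samples, n_classes).
        labels = y_true.astype(int)
        p = np.clip(y_pred[np.arange(labels.size), labels], EPS, 1.0)
        return float(-np.mean(np.log(p)))
    raise ValueError(f"unknown likelihood: {llh!r}")


# One entry per guide, as in the returned dict (guide names are made up).
errors = {
    "age": guide_error([30.0, 40.0], [32.0, 38.0], "gaussian"),
    "treated": guide_error([1, 0], [0.9, 0.2], "bernoulli"),
    "subtype": guide_error([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], "categorical"),
}
print(errors)
```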
- sofa.utils.utils.get_loadings(model: sofa.models.SOFA.SOFA, view: str) → pandas.core.frame.DataFrame
Get the loadings of the model for a specific view.
- Parameters
model (SOFA) – The trained SOFA model.
view (str) – Name of the view to get the loadings for.
- Returns
DataFrame containing the loadings of the model for the specified view.
- Return type
pd.DataFrame
- sofa.utils.utils.get_rmse(model)
Calculate the root mean squared error of the model.
- Parameters
model (SOFA) – The trained SOFA model.
- Returns
The root mean squared error of X for each view.
- Return type
dict
- sofa.utils.utils.get_top_loadings(model, view, factor, sign='+', top_n=100)
Get the top_n loadings of the model for a specific view.
- Parameters
model (SOFA) – The trained SOFA model.
view (str) – Name of the view to get the loadings for.
factor (int) – Index of the factor to get the top loadings for. Should be between 1 and the total number of factors.
sign (str) – Sign of the loadings to get. Default is “+”.
top_n (int) – Number of top loadings to get. Default is 100.
- Returns
DataFrame containing the top_n loadings of the model for the specified view.
- Return type
pandas.DataFrame
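One plausible way to carry out the selection get_top_loadings describes is shown below, working on a loadings table in pandas. The helper `top_loadings` is hypothetical, and it returns a Series for simplicity. Three further points are assumptions. Rows of the table are factors and columns are features. The factor index is 1-based, as documented above. A sign of "-" means "most negative".

```python
import pandas as pd


def top_loadings(loadings: pd.DataFrame, factor: int, sign: str = "+",
                 top_n: int = 100) -> pd.Series:
    """Hypothetical sketch; not sofa's get_top_loadings."""
    if not 1 <= factor <= loadings.shape[0]:
        raise ValueError(f"factor must be between 1 and {loadings.shape[0]}")
    row = loadings.iloc[factor - 1]  # 1-based factor index -> 0-based row
    if sign == "+":
        return row.nlargest(top_n)   # largest positive loadings first
    if sign == "-":
        return row.nsmallest(top_n)  # most negative loadings first
    raise ValueError("sign must be '+' or '-'")


loadings = pd.DataFrame([[0.1, -0.9, 0.5], [0.3, 0.2, -0.4]],
                        columns=["g1", "g2", "g3"])
print(top_loadings(loadings, factor=1, sign="+", top_n=2))
```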
- sofa.utils.utils.get_var_explained_per_view_factor(model: sofa.models.SOFA.SOFA)
Calculate the fraction of variance of each view that is explained by each factor.
- Parameters
model (SOFA) – The trained SOFA model.
- Returns
Array containing the fraction of variance of each view that is explained by each factor.
- Return type
numpy.array
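In a linear factor model, each view is approximated as X_m ≈ Z W_m. The variance a single factor k explains in view m can then be measured as the R² of the rank-one reconstruction from that factor alone. The sketch below computes the resulting views × factors array from hypothetical `Z` (samples × factors) and `W` (view → factors × features) matrices. It is a simplified illustration, not the library's code. It assumes each view is centered, so the total sum of squares is simply the sum of X_m squared.

```python
import numpy as np


def var_explained_per_view_factor(Z: np.ndarray, W: dict, X: dict) -> np.ndarray:
    """Hypothetical sketch: R^2 of each single-factor reconstruction, per view."""
    views = list(W)
    n_factors = Z.shape[1]
    r2 = np.zeros((len(views), n_factors))
    for i, v in enumerate(views):
        ss_tot = np.sum(X[v] ** 2)  # assumes X[v] is centered per feature
        for k in range(n_factors):
            X_k = np.outer(Z[:, k], W[v][k])  # reconstruction from factor k only
            r2[i, k] = 1.0 - np.sum((X[v] - X_k) ** 2) / ss_tot
    return r2


Z = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
W = {"rna": np.array([[1.0, 1.0], [0.0, 0.0]]), "prot": np.array([[0.0], [2.0]])}
X = {v: Z @ W[v] for v in W}  # noise-free data, so each view is explained by one factor
print(var_explained_per_view_factor(Z, W, X))
```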
- sofa.utils.utils.load_model(file_prefix)
Load a saved model from disk. Both the h5mu file and the save file are required.
- Parameters
file_prefix (str) – Filename prefix of the saved h5mu and save files.
- Returns
The loaded SOFA model.
- Return type
SOFA
- sofa.utils.utils.save_model(model, file_prefix)
Save a model to disk as an h5mu file and a save file. Model hyperparameters, input data and predictions are stored in the h5mu file, and the model parameters are stored in the save file. Both files are needed to load the model and continue training.
- Parameters
model (SOFA) – The trained SOFA model.
file_prefix (str) – Filename prefix to save the model as h5mu and save files.
- Returns
Filenames of the saved h5mu and save files.
- Return type
tuple(str,str)