{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "b473046a-3199-4b77-be18-bd3a74d2dfaa", "metadata": {}, "outputs": [], "source": [ "# Install SOFA + dependencies\n", "!pip install --quiet biosofa" ] }, { "cell_type": "markdown", "id": "df209001-70db-4c52-a47e-750b7f79c483", "metadata": { "tags": [] }, "source": [ "The input tcga_gyn_data.h5mu needs to be download from zenodo: https://zenodo.org/records/14761127" ] }, { "cell_type": "code", "execution_count": 2, "id": "86dd3368-ce4c-4564-b929-b17113420f83", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/capraz/hubershare/anaconda3/envs/base_copy/lib/python3.8/site-packages/torch/onnx/_internal/_beartype.py:30: UserWarning: module 'beartype.roar' has no attribute 'BeartypeDecorHintPep585DeprecationWarning'\n", " warnings.warn(f\"{e}\")\n", "/home/capraz/hubershare/anaconda3/envs/base_copy/lib/python3.8/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n", " @numba.jit()\n", "/home/capraz/hubershare/anaconda3/envs/base_copy/lib/python3.8/site-packages/umap/distances.py:1071: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. 
See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n", " @numba.jit()\n", "/home/capraz/hubershare/anaconda3/envs/base_copy/lib/python3.8/site-packages/umap/distances.py:1086: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n", " @numba.jit()\n", "/home/capraz/hubershare/anaconda3/envs/base_copy/lib/python3.8/site-packages/umap/umap_.py:660: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n", " @numba.jit()\n" ] } ], "source": [ "import sofa\n", "import torch\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.decomposition import PCA\n", "import seaborn as sns\n", "import matplotlib\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "from sklearn.linear_model import Lasso, LinearRegression\n", "from sklearn.model_selection import cross_val_predict, cross_val_score\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.metrics import mean_squared_error, make_scorer\n", "from muon import MuData\n", "from sklearn.manifold import TSNE\n", "import matplotlib.patches as mpatches\n", "import scanpy as sc\n", "import anndata as ad\n", "from anndata import AnnData\n", "import pickle\n", "import muon\n", "from lifelines import CoxPHFitter, KaplanMeierFitter\n", "from 
lifelines.statistics import logrank_test\n", "import statsmodels\n", "from matplotlib import colors as mp_colors\n", "from muon import MuData\n", "import muon as mu\n", "from decimal import Decimal\n", "from seaborn import axes_style\n", "from mofapy2.run.entry_point import entry_point\n", "import seaborn.objects as so\n", "from mpl_toolkits.axes_grid1.axes_divider import make_axes_locatable, make_axes_area_auto_adjustable\n", "from matplotlib.ticker import FormatStrFormatter" ] }, { "cell_type": "markdown", "id": "3b3e9f4a-b57a-480f-9c9b-0cb9f61d9b59", "metadata": {}, "source": [ "# Analysis of TCGA data\n", "\n", "## Introduction\n", "\n", "In this notebook we will explore how `SOFA` can be used to analyze multi-omics data from the TCGA [[1]](#1). \n", "Here we give a brief introduction to what the SOFA model does and what it can be used for. For a more \n", "detailed description please refer to our preprint: https://doi.org/10.1101/2024.10.10.617527 \n", "\n", "\n", "### The SOFA model\n", "Given a set of real-valued data\n", "matrices (also called views) containing multi-omic measurements from overlapping samples,\n", "along with sample-level guiding variables that capture additional properties such as batches\n", "or mutational profiles, SOFA extracts an interpretable lower-dimensional data representation,\n", "consisting of a shared factor matrix and modality-specific loading matrices. The goal of these \n", "factors is to explain the major axes of variation in the data. SOFA explicitly assigns a subset of factors \n", "to explain both the multi-omics data and the guiding\n", "variables (guided factors), while preserving another subset of factors exclusively\n", "for explaining the multi-omics data (unguided factors). 
Importantly, this feature allows the\n", "analyst to discern variation that is driven by known sources from novel, unexplained sources\n", "of variability.\n", "\n", "#### Interpretation of the factors (Z)\n", "Analogous to the interpretation of factors in PCA, SOFA factors ordinate samples along a\n", "zero-centered axis, where samples with opposing signs exhibit contrasting phenotypes along\n", "the inferred axis of variation, and the absolute value of the factor indicates the strength of the\n", "phenotype. Importantly, SOFA partitions the factors of the low-rank decomposition into\n", "guided and unguided factors: the guided factors are linked to specific guiding variables,\n", "while the unguided factors capture global, yet unexplained, sources of variability in the data. \n", "The factor values can be used in downstream analysis tasks related to the samples, such as clustering \n", "or survival analysis. The factor values are called Z in SOFA.\n", "\n", "#### Interpretation of the loading weights (W)\n", "SOFA’s loading weights indicate the importance of each feature for its respective factor,\n", "thereby enabling the interpretation of SOFA factors. Loading weights close to zero indicate\n", "that a feature has little to no importance for the respective factor, while large magnitudes\n", "suggest strong relevance. The sign of the loading weight aligns with its corresponding factor,\n", "meaning that positive loading weights indicate higher feature levels in samples with positive\n", "factor values, and negative loading weights indicate higher feature levels in samples with\n", "negative factor values. The top loading weights can be inspected directly or used in downstream analyses such as gene set \n", "enrichment analysis. The loading weights are called W in SOFA.\n", "\n", "#### Supported data\n", "SOFA expects a set of matrices containing omics measurements with matching and aligned samples and different features. 
\n", "Currently SOFA only supports Gaussian likelihoods, for the multi-omics data. \n", "Data should therefore be appropriately normalized according to\n", "its omics modality. Additionally, data should be centered and scaled.\n", "\n", "\n", "For the guiding variables SOFA supports Gaussian, Bernoulli and Categorical likelihoods. Guiding variables\n", "can therefore be continuous, binary or categorical. Guiding variables should be vectors with matching samples with \n", "the multi-omics data.\n", "\n", "In SOFA the multi-omics data is denoted as X and the guiding variables as Y.\n", "\n", "\n", "### The TCGA data set\n", "The pan-gynecologic cancer multi-omic data set of the cancer genome atlas (TCGA) project[[1]](#1), consists of measurements from the transcriptome (mRNA), proteome, methylome, and miRNA of 2599 samples from five different cancers. Additionally, the study includes data about mutations, metadata, and clinical endpoints progression free interval (PFI) and overall survival (OS). We used SOFA to infer 12 latent factors and guided the first 5 factors with the 5 cancer type labels form gynecologic cancers. We hypothesized that the remaining unguided factors will capture cancer type independent variation.\n", "\n", "\n", "### References\n", "[1] \n", "Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013)." ] }, { "cell_type": "markdown", "id": "b9563068-eb16-4145-8e6f-0819f125d791", "metadata": {}, "source": [ "## Read data and set hyperparameters" ] }, { "cell_type": "code", "execution_count": 3, "id": "8918d464-2460-4d79-94dd-48916c6e92b4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
MuData object with n_obs × n_vars = 2599 × 8926\n",
" obs:\t'admin.disease_code', 'Unnamed: 0', 'type', 'age_at_initial_pathologic_diagnosis', 'gender', 'race', 'ajcc_pathologic_tumor_stage', 'clinical_stage', 'histological_type', 'histological_grade', 'initial_pathologic_dx_year', 'menopause_status', 'birth_days_to', 'vital_status', 'tumor_status', 'last_contact_days_to', 'death_days_to', 'cause_of_death', 'new_tumor_event_type', 'new_tumor_event_site', 'new_tumor_event_site_other', 'new_tumor_event_dx_days_to', 'treatment_outcome_first_course', 'margin_status', 'residual_tumor', 'OS', 'OS.time', 'DSS', 'DSS.time', 'DFI', 'DFI.time', 'PFI', 'PFI.time', 'Redaction'\n",
" 10 modalities\n",
" RNA:\t2599 x 4436\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Protein:\t2599 x 183\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Methylation:\t2599 x 3436\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" miRNA:\t2599 x 680\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Mutations:\t2599 x 186\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" brca:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" cesc:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ov:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ucec:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ucs:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'"
],
"text/plain": [
"MuData object with n_obs × n_vars = 2599 × 8926\n",
" obs:\t'admin.disease_code', 'Unnamed: 0', 'type', 'age_at_initial_pathologic_diagnosis', 'gender', 'race', 'ajcc_pathologic_tumor_stage', 'clinical_stage', 'histological_type', 'histological_grade', 'initial_pathologic_dx_year', 'menopause_status', 'birth_days_to', 'vital_status', 'tumor_status', 'last_contact_days_to', 'death_days_to', 'cause_of_death', 'new_tumor_event_type', 'new_tumor_event_site', 'new_tumor_event_site_other', 'new_tumor_event_dx_days_to', 'treatment_outcome_first_course', 'margin_status', 'residual_tumor', 'OS', 'OS.time', 'DSS', 'DSS.time', 'DFI', 'DFI.time', 'PFI', 'PFI.time', 'Redaction'\n",
" 10 modalities\n",
" RNA:\t2599 x 4436\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Protein:\t2599 x 183\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Methylation:\t2599 x 3436\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" miRNA:\t2599 x 680\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Mutations:\t2599 x 186\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" brca:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" cesc:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ov:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ucec:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" ucs:\t2599 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First we read the preprocessed data as a single MuData object\n",
"mdata = mu.read(\"data/tcga/tcga_gyn_data.h5mu\")\n",
"mdata"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b430fadb-66cf-4d58-96ef-cfe0ad4fb100",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# We create the MuData object Xmdata, which contains the multi-omics data:\n",
"Xmdata = MuData({\"RNA\":mdata[\"RNA\"], \"Protein\":mdata[\"Protein\"], \"Methylation\":mdata[\"Methylation\"], \"miRNA\":mdata[\"miRNA\"]})\n",
"# We create the MuData objectYmdata, which contains the guiding variables:\n",
"Ymdata = MuData({\"BRCA\":mdata[\"brca\"], \"CESC\": mdata[\"cesc\"], \"OV\": mdata[\"ov\"], \"UCEC\":mdata[\"ucec\"], \"UCS\":mdata[\"ucs\"]})"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "15b70394-f7b6-4afe-ab8c-18300a5e9661",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]], dtype=torch.float64)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We set the number of factors to infer\n",
"num_factors = 12\n",
"# Use obs as metadata of the cell lines\n",
"metadata = mdata.obs\n",
"# In order to relate factors to guiding variables we need to provide a design matrix (guiding variables x number of factors) \n",
"# indicating which factor is guided by which guiding variable.\n",
"# Here we just indicate that the first 5 factors are each guided by a different guiding variable:\n",
"design = np.zeros((len(Ymdata.mod), num_factors))\n",
"for i in range(len(Ymdata.mod)):\n",
" design[i,i] = 1\n",
" \n",
"# convert to torch tensor to make it usable by SOFA\n",
"design = torch.tensor(design)\n",
"design"
]
},
{
"cell_type": "markdown",
"id": "722697d1-5b85-4aab-bdb2-d41b97e19359",
"metadata": {},
"source": [
"## Fit the `SOFA` model"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "502876e2-440a-4473-bfd0-54d3654419a4",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Current Elbo 1.40E+07 | Delta: 11086: 100%|██████████| 6000/6000 [13:03<00:00, 7.66it/s] \n",
"Current Elbo 1.39E+07 | Delta: 5385: 100%|██████████| 3000/3000 [12:36<00:00, 3.97it/s] \n",
"Current Elbo 1.39E+07 | Delta: 13180: 100%|██████████| 6000/6000 [25:08<00:00, 3.98it/s] \n",
"Current Elbo 1.38E+07 | Delta: 1659: 100%|██████████| 3000/3000 [12:33<00:00, 3.98it/s] \n",
"Current Elbo 1.39E+07 | Delta: 5138: 100%|██████████| 6000/6000 [25:13<00:00, 3.97it/s] \n",
"Current Elbo 1.38E+07 | Delta: -2798: 100%|██████████| 3000/3000 [12:33<00:00, 3.98it/s]\n",
"Current Elbo 1.38E+07 | Delta: 17196: 100%|██████████| 6000/6000 [25:07<00:00, 3.98it/s] \n",
"Current Elbo 1.38E+07 | Delta: -4738: 100%|██████████| 3000/3000 [12:35<00:00, 3.97it/s] \n"
]
}
],
"source": [
"model = sofa.SOFA(Xmdata = Xmdata, # the input multi-omics data \n",
" num_factors=num_factors, # number of factors to infer\n",
" Ymdata = Ymdata, # the input guiding variables\n",
" design = design, # design matrix relating factors to guiding variables\n",
" device='cuda', # set device to \"cuda\" to enable computation on the GPU, if you don't have a GPU available set it to \"cpu\"\n",
" seed=42) # set seed to get the same results every time we run it\n",
"# train SOFA with learning rate of 0.01 for 3000 steps\n",
"model.fit(n_steps=6000, lr=0.01, predict=False)\n",
"# decrease learning rate to 0.005 and continue training\n",
"model.fit(n_steps=3000, lr=0.005)\n",
"models.append(model)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d69f52bb-2fc5-4c84-853b-8b99969b7e93",
"metadata": {},
"outputs": [],
"source": [
"# if we would like to save the fitted model we can save it using:\n",
"#sofa.tl.save_model(model,\"models/model_name\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "15d96e01-4ad2-43fd-b72f-26524be550ee",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# load fitted model to exactly reproduce manuscript figures\n",
"# to load the model use:\n",
"model = sofa.tl.load_model(\"models/tcga_gyn_model\")"
]
},
{
"cell_type": "markdown",
"id": "bd1038c0-22fa-4bc1-a753-fb012f0161da",
"metadata": {},
"source": [
"## Downstream analysis\n",
"\n",
"\n",
"### Convergence\n",
"\n",
"We will first assess whether the ELBO loss of SOFA has converged by plotting it over training steps"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8d3cf291-098d-4131-be6f-a03b7e2c7628",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'ELBO')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
| | Factor_1 (brca) | Factor_2 (cesc) | Factor_3 (ov) | Factor_4 (ucec) | Factor_5 (ucs) | Factor_6 | Factor_7 | Factor_8 | Factor_9 | Factor_10 | Factor_11 | Factor_12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA-A1-A0SB | -0.068559 | -0.956759 | 0.395876 | 0.353804 | -0.449537 | -0.244394 | -0.489684 | 0.234003 | 0.364860 | -0.349728 | -0.021216 | 0.583005 |
| TCGA-A1-A0SD | -0.791113 | -0.176194 | 0.162773 | 0.077882 | 0.022851 | 0.090326 | 0.110678 | 0.036941 | 0.281851 | -0.163649 | -0.056130 | 0.102763 |
| TCGA-A1-A0SE | -0.649877 | -0.516037 | 0.241078 | 0.019738 | 0.355230 | 0.079890 | -0.057455 | 0.151067 | 0.365583 | -0.202970 | -0.646631 | 0.212070 |
| TCGA-A1-A0SF | -0.436990 | -0.397665 | 0.262103 | -0.131475 | 0.457887 | 0.311861 | 0.133225 | 0.461349 | 0.154909 | -0.215308 | 0.080673 | 0.108758 |
| TCGA-A1-A0SG | -0.796272 | -0.228161 | 0.087272 | -0.005371 | 0.616239 | 0.122077 | 0.041294 | 0.088474 | 0.196174 | -0.137173 | -0.251898 | -0.354120 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TCGA-NF-A5CP | 0.384871 | 0.149592 | 0.140378 | 0.200661 | -1.616578 | 0.166475 | 0.342098 | -0.267612 | 0.145394 | 0.229709 | 0.012862 | -0.198812 |
| TCGA-NG-A4VU | 0.511174 | -0.193516 | 0.299988 | 0.069122 | -2.975001 | -1.160977 | 0.002463 | 0.582547 | 0.403265 | 0.781878 | -0.013149 | -0.131417 |
| TCGA-NG-A4VW | 0.668518 | 0.125276 | 0.162740 | 0.090213 | -1.621962 | 0.385436 | -0.587095 | 0.180678 | 0.314375 | 0.189267 | -0.081897 | 0.192156 |
| TCGA-QM-A5NM | 1.058689 | 0.071231 | 0.241933 | -0.284919 | -1.533058 | 0.279359 | -0.279391 | 0.174577 | 0.358607 | 0.190392 | 0.053573 | -0.531947 |
| TCGA-QN-A5NN | 0.386474 | -0.215088 | 0.212471 | -0.068811 | -1.736867 | -0.674749 | 0.190650 | -0.281425 | 0.337099 | 0.273190 | 0.074353 | -0.624181 |

2599 rows × 12 columns

| | IGF2 | DLK1 | CYP17A1 | APOE | SLPI | CYP11B1 | STAR | H19 | GNAS | GAPDH | ... | LRRN4CL | SLC25A1 | ARID5B | RHOQ | PDE6G | KIAA0664 | FAM134A | PEX6 | NARF | TRAF7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Factor_1 (brca) | 0.256977 | 0.464854 | 0.425984 | 0.109858 | 1.112575 | 0.103872 | 0.529434 | 0.286809 | 0.498709 | 1.013466 | ... | 0.367414 | 0.587168 | 0.027022 | 0.064519 | 0.255864 | 0.737403 | -0.468108 | 0.264154 | 0.999687 | 0.338431 |
| Factor_2 (cesc) | -0.371340 | -0.627032 | -0.278327 | -0.231412 | -0.320379 | 0.021412 | -0.323378 | -0.197620 | -0.858876 | -0.654136 | ... | -0.292036 | -0.237705 | -0.305694 | -0.874484 | -0.383729 | -0.336187 | -0.245804 | -0.461135 | -0.451547 | -0.009286 |
| Factor_3 (ov) | -0.558795 | -0.102920 | -1.380738 | -0.295772 | -0.245436 | 0.169791 | -0.371026 | -0.326948 | 0.071627 | -0.148008 | ... | -0.026116 | -0.034537 | -0.406462 | -0.257028 | -0.613797 | -0.093484 | -0.939213 | -0.158356 | 0.479365 | -0.332378 |
| Factor_4 (ucec) | 0.321659 | 0.274981 | 0.148637 | -0.832970 | 0.020647 | 0.033767 | -0.202398 | 0.065250 | -1.134238 | -1.619760 | ... | 0.295630 | -1.465490 | 1.537956 | 0.259835 | -0.707334 | 0.781751 | -0.384141 | -0.071286 | -0.721724 | 0.367421 |
| Factor_5 (ucs) | -0.246544 | -0.396687 | -0.053422 | -0.028221 | 0.059327 | -0.059312 | 0.274254 | -0.084729 | -0.194032 | -0.108764 | ... | 0.092816 | 0.252733 | 0.035070 | -0.412078 | -0.065257 | 0.324985 | 0.109216 | -0.071969 | 0.008588 | 0.196552 |
| Factor_6 | 0.056331 | -0.018716 | 0.026955 | -0.245057 | 0.252097 | -0.084610 | -0.308168 | 0.120581 | 0.316507 | -0.160259 | ... | 0.106847 | 0.229631 | 0.029902 | 0.188749 | -0.011532 | -0.216373 | 0.151458 | -0.031476 | 0.122704 | -0.108069 |
| Factor_7 | 0.944083 | 0.190422 | -0.057485 | 1.765003 | 0.045593 | 0.235488 | -0.119139 | 0.720282 | 0.056930 | -0.210789 | ... | 1.772369 | 0.157764 | 0.819517 | 0.625311 | 1.294285 | -0.240169 | -0.199078 | -0.321732 | 0.018755 | -0.211537 |
| Factor_8 | 0.567729 | 0.866502 | 0.309059 | 0.263794 | -0.752690 | 0.049291 | 0.931249 | -0.032493 | -0.359230 | -0.975684 | ... | 0.737753 | -0.696783 | 0.189181 | -0.056764 | 0.683843 | -0.067248 | 0.573768 | 0.351589 | -0.339139 | -0.256080 |
| Factor_9 | 1.361504 | 0.462694 | 0.173420 | -0.463739 | -0.196467 | -0.239038 | -0.203452 | 0.735064 | 0.491511 | -0.076739 | ... | 1.247741 | 0.358347 | 0.286029 | 0.187339 | -0.728862 | -0.415017 | 0.249448 | 0.128513 | -0.140165 | 0.117316 |
| Factor_10 | 0.280253 | 0.628980 | 0.000314 | 0.263023 | -0.880207 | -0.004985 | 0.116316 | -0.213680 | 0.927243 | 0.325137 | ... | -0.442285 | 0.154566 | -0.581967 | 0.341046 | 0.377159 | -0.124212 | -0.156052 | -0.333031 | 0.964307 | 0.109611 |
| Factor_11 | -0.017417 | -0.049315 | 0.036915 | 0.014929 | -0.023290 | 0.029108 | -0.020272 | -0.001592 | 0.028506 | -0.010991 | ... | -0.017803 | -0.123792 | 0.022235 | -0.062988 | 0.032762 | -0.021049 | -0.052274 | 0.012072 | 0.063581 | -0.015236 |
| Factor_12 | -0.046363 | -0.025982 | -0.078152 | -0.990845 | 0.123614 | -0.020091 | 0.017208 | -0.359655 | -0.125432 | -0.198495 | ... | 0.023031 | -0.938130 | 0.821455 | 0.410206 | -0.300631 | -0.820548 | -0.382904 | -0.754267 | -0.709637 | -0.756123 |

12 rows × 4436 columns
\n", "" ], "text/plain": [ " IGF2 DLK1 CYP17A1 APOE SLPI CYP11B1 \\\n", "Factor_1 (brca) 0.256977 0.464854 0.425984 0.109858 1.112575 0.103872 \n", "Factor_2 (cesc) -0.371340 -0.627032 -0.278327 -0.231412 -0.320379 0.021412 \n", "Factor_3 (ov) -0.558795 -0.102920 -1.380738 -0.295772 -0.245436 0.169791 \n", "Factor_4 (ucec) 0.321659 0.274981 0.148637 -0.832970 0.020647 0.033767 \n", "Factor_5 (ucs) -0.246544 -0.396687 -0.053422 -0.028221 0.059327 -0.059312 \n", "Factor_6 0.056331 -0.018716 0.026955 -0.245057 0.252097 -0.084610 \n", "Factor_7 0.944083 0.190422 -0.057485 1.765003 0.045593 0.235488 \n", "Factor_8 0.567729 0.866502 0.309059 0.263794 -0.752690 0.049291 \n", "Factor_9 1.361504 0.462694 0.173420 -0.463739 -0.196467 -0.239038 \n", "Factor_10 0.280253 0.628980 0.000314 0.263023 -0.880207 -0.004985 \n", "Factor_11 -0.017417 -0.049315 0.036915 0.014929 -0.023290 0.029108 \n", "Factor_12 -0.046363 -0.025982 -0.078152 -0.990845 0.123614 -0.020091 \n", "\n", " STAR H19 GNAS GAPDH ... LRRN4CL \\\n", "Factor_1 (brca) 0.529434 0.286809 0.498709 1.013466 ... 0.367414 \n", "Factor_2 (cesc) -0.323378 -0.197620 -0.858876 -0.654136 ... -0.292036 \n", "Factor_3 (ov) -0.371026 -0.326948 0.071627 -0.148008 ... -0.026116 \n", "Factor_4 (ucec) -0.202398 0.065250 -1.134238 -1.619760 ... 0.295630 \n", "Factor_5 (ucs) 0.274254 -0.084729 -0.194032 -0.108764 ... 0.092816 \n", "Factor_6 -0.308168 0.120581 0.316507 -0.160259 ... 0.106847 \n", "Factor_7 -0.119139 0.720282 0.056930 -0.210789 ... 1.772369 \n", "Factor_8 0.931249 -0.032493 -0.359230 -0.975684 ... 0.737753 \n", "Factor_9 -0.203452 0.735064 0.491511 -0.076739 ... 1.247741 \n", "Factor_10 0.116316 -0.213680 0.927243 0.325137 ... -0.442285 \n", "Factor_11 -0.020272 -0.001592 0.028506 -0.010991 ... -0.017803 \n", "Factor_12 0.017208 -0.359655 -0.125432 -0.198495 ... 
0.023031 \n", "\n", " SLC25A1 ARID5B RHOQ PDE6G KIAA0664 FAM134A \\\n", "Factor_1 (brca) 0.587168 0.027022 0.064519 0.255864 0.737403 -0.468108 \n", "Factor_2 (cesc) -0.237705 -0.305694 -0.874484 -0.383729 -0.336187 -0.245804 \n", "Factor_3 (ov) -0.034537 -0.406462 -0.257028 -0.613797 -0.093484 -0.939213 \n", "Factor_4 (ucec) -1.465490 1.537956 0.259835 -0.707334 0.781751 -0.384141 \n", "Factor_5 (ucs) 0.252733 0.035070 -0.412078 -0.065257 0.324985 0.109216 \n", "Factor_6 0.229631 0.029902 0.188749 -0.011532 -0.216373 0.151458 \n", "Factor_7 0.157764 0.819517 0.625311 1.294285 -0.240169 -0.199078 \n", "Factor_8 -0.696783 0.189181 -0.056764 0.683843 -0.067248 0.573768 \n", "Factor_9 0.358347 0.286029 0.187339 -0.728862 -0.415017 0.249448 \n", "Factor_10 0.154566 -0.581967 0.341046 0.377159 -0.124212 -0.156052 \n", "Factor_11 -0.123792 0.022235 -0.062988 0.032762 -0.021049 -0.052274 \n", "Factor_12 -0.938130 0.821455 0.410206 -0.300631 -0.820548 -0.382904 \n", "\n", " PEX6 NARF TRAF7 \n", "Factor_1 (brca) 0.264154 0.999687 0.338431 \n", "Factor_2 (cesc) -0.461135 -0.451547 -0.009286 \n", "Factor_3 (ov) -0.158356 0.479365 -0.332378 \n", "Factor_4 (ucec) -0.071286 -0.721724 0.367421 \n", "Factor_5 (ucs) -0.071969 0.008588 0.196552 \n", "Factor_6 -0.031476 0.122704 -0.108069 \n", "Factor_7 -0.321732 0.018755 -0.211537 \n", "Factor_8 0.351589 -0.339139 -0.256080 \n", "Factor_9 0.128513 -0.140165 0.117316 \n", "Factor_10 -0.333031 0.964307 0.109611 \n", "Factor_11 0.012072 0.063581 -0.015236 \n", "Factor_12 -0.754267 -0.709637 -0.756123 \n", "\n", "[12 rows x 4436 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specify the view of which we want to retrieve the loadings\n", "W = sofa.tl.get_loadings(model, view=\"rna\")\n", "W" ] }, { "cell_type": "code", "execution_count": 30, "id": "37b1b42b-fdea-4402-95ad-91229fd4eabf", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "