{
"cells": [
{
"cell_type": "markdown",
"id": "539f7263-8136-4654-a771-01790adabeea",
"metadata": {
"tags": []
},
"source": [
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "0d991acc-8907-420b-8cdc-5536f44856f2",
"metadata": {
"tags": []
},
"source": [
"Before you start, make sure to set the runtime type to GPU in Colab."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "af1bb4b1-102b-4764-9b1d-711773db9e36",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Install SOFA + dependencies\n",
"!pip install --quiet biosofa"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "96e9ae47-b714-4804-93db-82c7f917dbf2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-11-07 10:03:30-- https://zenodo.org/records/14044221/files/pancan_depmap.h5mu?download=1\n",
"Resolving zenodo.org (zenodo.org)... 188.184.98.238, 188.184.103.159, 188.185.79.172, ...\n",
"Connecting to zenodo.org (zenodo.org)|188.184.98.238|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 79551896 (76M) [application/octet-stream]\n",
"Saving to: ‘pancan_depmap.h5mu?download=1’\n",
"\n",
"pancan_depmap.h5mu? 100%[===================>] 75.87M 534KB/s in 1m 55s \n",
"\n",
"2024-11-07 10:05:26 (677 KB/s) - ‘pancan_depmap.h5mu?download=1’ saved [79551896/79551896]\n",
"\n"
]
}
],
"source": [
"!wget https://zenodo.org/records/14044221/files/pancan_depmap.h5mu?download=1\n",
"!mv pancan_depmap.h5mu?download=1 pancan_depmap.h5mu"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1039b922-7673-4ae5-bd80-0536c5a22b62",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"import pandas as pd\n",
"import sofa\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import matplotlib\n",
"from muon import MuData\n",
"import muon as mu\n",
"from sklearn.manifold import TSNE\n",
"import scanpy as sc\n",
"import anndata as ad\n",
"from anndata import AnnData\n",
"import torch"
]
},
{
"cell_type": "markdown",
"id": "48916839-4f05-4457-ad47-b9f588e704aa",
"metadata": {
"tags": []
},
"source": [
"# Analysis of DepMap data\n",
"\n",
"## Introduction\n",
"\n",
"In this notebook, we explore how `SOFA` can be used to analyze multi-omics data from the DepMap project [[1,2,3,4,5]](#1,#2,#3,#4,#5). \n",
"Here we give a brief introduction to what the SOFA model does and what it can be used for. For a more \n",
"detailed description, please refer to our preprint: https://doi.org/10.1101/2024.10.10.617527 \n",
"\n",
"\n",
"### The SOFA model\n",
"Given a set of real-valued data\n",
"matrices (also called views) containing multi-omic measurements from overlapping samples,\n",
"along with sample-level guiding variables that capture additional properties such as batches\n",
"or mutational profiles, SOFA extracts an interpretable lower-dimensional data representation,\n",
"consisting of a shared factor matrix and modality-specific loading matrices. The goal of these \n",
"factors is to explain the major axes of variation in the data. SOFA explicitly assigns a subset of factors \n",
"to explain both the multi-omics data and the guiding\n",
"variables (guided factors), while preserving another subset of factors exclusively\n",
"for explaining the multi-omics data (unguided factors). Importantly, this partition allows the\n",
"analyst to distinguish variation driven by known sources from novel, unexplained sources\n",
"of variability.\n",
"\n",
"#### Interpretation of the factors (Z)\n",
"Analogous to the interpretation of factors in PCA, SOFA factors ordinate samples along a\n",
"zero-centered axis, where samples with opposing signs exhibit contrasting phenotypes along\n",
"the inferred axis of variation, and the absolute value of the factor indicates the strength of the\n",
"phenotype. Importantly, SOFA partitions the factors of the low-rank decomposition into\n",
"guided and unguided factors: the guided factors are linked to specific guiding variables,\n",
"while the unguided factors capture global, yet unexplained, sources of variability in the data. \n",
"The factor values can be used in downstream analysis tasks related to the samples, such as clustering \n",
"or survival analysis. The factor values are called Z in SOFA.\n",
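"\n",
"The following minimal sketch shows how the sign and magnitude of a factor can be read, and how factor values can serve as features for clustering. It uses a simulated stand-in for `Z` (samples x factors) rather than the output of a fitted SOFA model. Using scikit-learn's `KMeans` is our choice for the illustration and is not part of SOFA:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Simulated stand-in for a factor matrix Z (samples x factors); not SOFA output.\n",
"rng = np.random.default_rng(0)\n",
"n = 50\n",
"Z = rng.normal(scale=0.3, size=(2 * n, 3))\n",
"Z[:n, 0] += 2.0  # first n samples lie at the positive pole of factor 0\n",
"Z[n:, 0] -= 2.0  # last n samples lie at the negative pole of factor 0\n",
"\n",
"# Sign: which side of the axis a sample lies on; absolute value: strength\n",
"pole = np.sign(Z[:, 0])\n",
"strength = np.abs(Z[:, 0])\n",
"strongest = np.argsort(-strength)[:5]  # samples with the most extreme factor 0 values\n",
"\n",
"# Use the factor values as low-dimensional features for clustering\n",
"labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)\n",
"```\n",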
"\n",
"#### Interpretation of the loading weights (W)\n",
"SOFA’s loading weights indicate the importance of each feature for its respective factor,\n",
"thereby enabling the interpretation of SOFA factors. Loading weights close to zero indicate\n",
"that a feature has little to no importance for the respective factor, while large magnitudes\n",
"suggest strong relevance. The sign of the loading weight aligns with its corresponding factor,\n",
"meaning that positive loading weights indicate higher feature levels in samples with positive\n",
"factor values, and negative loading weights indicate higher feature levels in samples with\n",
"negative factor values. The top loading weights can simply be inspected or used in downstream analyses such as gene set \n",
"enrichment analysis. The loading weights are called W in SOFA.\n",
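"\n",
"The sign correspondence between factors and loading weights can be checked on simulated data. The sketch below assumes the usual factor-model form `X ≈ Z W` (samples x features) with made-up `Z` and `W`. It illustrates how loadings are interpreted and is not SOFA's inference procedure:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"n_samples = 200\n",
"Z = rng.normal(size=(n_samples, 2))  # simulated factor values (samples x factors)\n",
"W = np.array([[1.5, -1.5, 0.0],      # simulated loading weights (factors x features)\n",
"              [0.0,  0.0, 1.0]])\n",
"X = Z @ W + 0.1 * rng.normal(size=(n_samples, 3))  # data = factors times loadings plus noise\n",
"\n",
"# The correlation of each feature with factor 0 follows the sign of its loading\n",
"corr = np.array([np.corrcoef(Z[:, 0], X[:, j])[0, 1] for j in range(3)])\n",
"# Rank features by the absolute value of their loading on factor 0\n",
"ranking = np.argsort(-np.abs(W[0]))\n",
"```\n",
"\n",
"Feature 0 (positive loading) is higher in samples with positive factor values, feature 1 (negative loading) is higher in samples with negative factor values, and feature 2 (zero loading) is unrelated to factor 0.\n",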
"\n",
"#### Supported data\n",
"SOFA expects a set of matrices containing omics measurements for the same, aligned samples but different features. \n",
"Currently, SOFA only supports Gaussian likelihoods for the multi-omics data. \n",
"The data should therefore be appropriately normalized according to\n",
"its omics modality. Additionally, the data should be centered and scaled.\n",
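"\n",
"If your own data are not yet centered and scaled, the minimal pandas sketch below z-scores each feature (column) of a samples x features DataFrame. The helper name `center_and_scale` is hypothetical and not part of SOFA. Any modality-specific normalization (e.g. a log transformation) is assumed to have been applied beforehand:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def center_and_scale(df: pd.DataFrame) -> pd.DataFrame:\n",
"    # Hypothetical helper: z-score every feature (column) of a samples x features table.\n",
"    # pandas skips missing values when computing mean and std; NaNs stay NaN.\n",
"    mean = df.mean(axis=0)\n",
"    std = df.std(axis=0, ddof=0).replace(0.0, 1.0)  # constant features become 0 instead of dividing by 0\n",
"    return (df - mean) / std\n",
"\n",
"toy = pd.DataFrame({\"g1\": [1.0, 2.0, 3.0, 4.0],\n",
"                    \"g2\": [10.0, 10.0, 10.0, 10.0],\n",
"                    \"g3\": [0.5, np.nan, 1.5, 2.5]})\n",
"scaled = center_and_scale(toy)\n",
"```\n",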
"\n",
"\n",
"For the guiding variables, SOFA supports Gaussian, Bernoulli and categorical likelihoods. Guiding variables\n",
"can therefore be continuous, binary or categorical. Each guiding variable should be a vector whose samples \n",
"match those of the multi-omics data.\n",
"\n",
"In SOFA, the multi-omics data are denoted X and the guiding variables Y.\n",
"\n",
"\n",
"### The DepMap data set\n",
"The DepMap project aims to identify cancer vulnerabilities and drug targets across a diverse range of cancer types. The data set includes multi-omics data, encompassing transcriptomics[[4]](#4), proteomics[[1]](#1) and methylation[[5]](#5), as well as drug response profiles for 627 drugs[[5]](#5) and CRISPR-Cas9 gene essentiality scores[[2,3]](#2,#3) for 17485 genes, across 949 cancer cell lines from 26 different tissues. We will fit a SOFA model with 20 factors while accounting for known or potential drivers of variation: growth rate, microsatellite instability (MSI) status, BRAF, TP53 and PIK3CA mutation status, and hematopoietic lineage. We will then use the essentiality scores to test for associations between the factors and gene essentiality.\n",
"We will first load the preprocessed data in `MuData` format, fit a SOFA model and perform various downstream analyses. \n",
"\n",
"\n",
"\n",
"[1] \n",
"Gonçalves, E. et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell 40, 835–849.e8 (2022).\\\n",
"[2] \n",
"Boehm, J. S. et al. Cancer research needs a better map. Nature 589, 514–516 (2021).\\\n",
"[3] \n",
"Pacini, C. et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat. Commun. 12, 1661 (2021).\\\n",
"[4] \n",
"Garcia-Alonso, L. et al. Transcription factor activities enhance markers of drug sensitivity in cancer. Cancer Res. 78, 769–780 (2018).\\\n",
"[5] \n",
"Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016)."
]
},
{
"cell_type": "markdown",
"id": "8eb95739-7f76-4a0b-b806-860152361226",
"metadata": {
"tags": []
},
"source": [
"## Read data and set hyperparameters"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9b3b41ce-966a-4726-a63c-aaf209c58813",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"MuData object with n_obs × n_vars = 778 × 23503\n",
" obs:\t'DepMap_ID', 'cell_line_name', 'stripped_cell_line_name', 'CCLE_Name', 'alias', 'COSMICID', 'sex', 'source', 'RRID', 'WTSI_Master_Cell_ID', 'sample_collection_site', 'primary_or_metastasis', 'primary_disease', 'Subtype', 'age', 'Sanger_Model_ID', 'depmap_public_comments', 'lineage', 'lineage_subtype', 'lineage_sub_subtype', 'lineage_molecular_subtype', 'default_growth_pattern', 'model_manipulation', 'model_manipulation_details', 'patient_id', 'parent_depmap_id', 'Cellosaurus_NCIt_disease', 'Cellosaurus_NCIt_id', 'Cellosaurus_issues', 'model_id', 'Project_Identifier', 'Cell_line', 'Source', 'Identifier', 'Gender', 'Tissue_type', 'Cancer_type', 'Cancer_subtype', 'Haem_lineage', 'BROAD_ID', 'CCLE_ID', 'ploidy', 'mutational_burden', 'msi_status', 'growth_properties', 'growth', 'size', 'media', 'replicates_correlation', 'number_of_proteins', 'EMT', 'Proteasome', 'TranslationInitiation', 'CopyNumberInstability', 'GeneExpressionCorrelation', 'CopyNumberAttenuation', 'crispr_source', 'hema/lymph'\n",
" 12 modalities\n",
" RNA:\t778 x 2000\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Protein:\t778 x 2000\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Methylation:\t778 x 2000\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Drug response:\t778 x 627\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" CRISPR scores:\t778 x 16258\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Mutations:\t778 x 612\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Growth:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" MSI:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" BRAF:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" TP53:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" PIK3CA:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" Hema:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'"
],
"text/plain": [
"MuData object with n_obs × n_vars = 778 × 23503\n",
" obs:\t'DepMap_ID', 'cell_line_name', 'stripped_cell_line_name', 'CCLE_Name', 'alias', 'COSMICID', 'sex', 'source', 'RRID', 'WTSI_Master_Cell_ID', 'sample_collection_site', 'primary_or_metastasis', 'primary_disease', 'Subtype', 'age', 'Sanger_Model_ID', 'depmap_public_comments', 'lineage', 'lineage_subtype', 'lineage_sub_subtype', 'lineage_molecular_subtype', 'default_growth_pattern', 'model_manipulation', 'model_manipulation_details', 'patient_id', 'parent_depmap_id', 'Cellosaurus_NCIt_disease', 'Cellosaurus_NCIt_id', 'Cellosaurus_issues', 'model_id', 'Project_Identifier', 'Cell_line', 'Source', 'Identifier', 'Gender', 'Tissue_type', 'Cancer_type', 'Cancer_subtype', 'Haem_lineage', 'BROAD_ID', 'CCLE_ID', 'ploidy', 'mutational_burden', 'msi_status', 'growth_properties', 'growth', 'size', 'media', 'replicates_correlation', 'number_of_proteins', 'EMT', 'Proteasome', 'TranslationInitiation', 'CopyNumberInstability', 'GeneExpressionCorrelation', 'CopyNumberAttenuation', 'crispr_source', 'hema/lymph'\n",
" 12 modalities\n",
" RNA:\t778 x 2000\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Protein:\t778 x 2000\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Methylation:\t778 x 2000\n",
" uns:\t'llh', 'log1p'\n",
" obsm:\t'mask'\n",
" Drug response:\t778 x 627\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" CRISPR scores:\t778 x 16258\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Mutations:\t778 x 612\n",
" uns:\t'llh'\n",
" obsm:\t'mask'\n",
" Growth:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" MSI:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" BRAF:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" TP53:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" PIK3CA:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'\n",
" Hema:\t778 x 1\n",
" uns:\t'llh', 'scaling_factor'\n",
" obsm:\t'mask'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First we read the preprocessed data as a single MuData object\n",
"mdata = mu.read(\"pancan_depmap.h5mu\")\n",
"mdata"
]
},
{
"cell_type": "markdown",
"id": "2d6550fe-d5fc-4a56-8a54-ed3647c5bd2e",
"metadata": {},
"source": [
"The `mdata` object contains 12 data modalities, and its `obs` slot contains metadata on the cell lines. \n",
"We will use RNA, Protein, Methylation and Drug response as the multi-omics data input for `SOFA`. \n",
"As guiding variables, we will use the modalities Growth (the growth rate of the cell lines), MSI (microsatellite instability status), BRAF (whether the cell line is mutated in BRAF), TP53 \n",
"(whether the cell line is mutated in TP53), PIK3CA (whether the cell line is mutated in PIK3CA) and Hema (whether the cell line is from the \n",
"hematopoietic lineage). The modalities Mutations and CRISPR scores will be used in the downstream analysis to test for significant associations with the factors."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84c4b6bb-8317-4dc3-b7e8-49b6b1e33e01",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# We create the MuData object Xmdata, which contains the multi-omics data:\n",
"Xmdata = MuData({\"RNA\":mdata[\"RNA\"], \"Protein\":mdata[\"Protein\"], \"Methylation\":mdata[\"Methylation\"], \"Drug response\":mdata[\"Drug response\"]})\n",
"# We create the MuData object Ymdata, which contains the guiding variables:\n",
"Ymdata = MuData({\"Growth\":mdata[\"Growth\"], \"MSI\": mdata[\"MSI\"], \"BRAF\": mdata[\"BRAF\"], \"TP53\":mdata[\"TP53\"], \"PIK3CA\":mdata[\"PIK3CA\"], \"Hema\": mdata[\"Hema\"]})"
]
},
{
"cell_type": "markdown",
"id": "6da7062f-3f08-4206-a6bd-478ab9b2ea45",
"metadata": {},
"source": [
"### (Optional for this tutorial) Preparing the input data for SOFA yourself\n",
"We assume that you have a `pandas.DataFrame` for each of the data modalities."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6572b8d9-73de-46be-a1f2-98141a1ac027",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Extract dataframes from the `MuData` object\n",
"rna_df = mdata[\"RNA\"].to_df()\n",
"prot_df = mdata[\"Protein\"].to_df()\n",
"meth_df = mdata[\"Methylation\"].to_df()\n",
"drug_df = mdata[\"Drug response\"].to_df()"
]
},
{
"cell_type": "markdown",
"id": "e149922a-f51e-4e72-8240-50c0b506f753",
"metadata": {},
"source": [
"Then we can use the `sofa.tl.get_ad()` function to convert each DataFrame into an `AnnData` object."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9f00602d-915f-4e71-9a60-e7a311d509b6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"rna_ad = sofa.tl.get_ad(rna_df, llh = \"gaussian\") # currently only the Gaussian likelihood is supported for the omics data\n",
"prot_ad = sofa.tl.get_ad(prot_df, llh = \"gaussian\")\n",
"meth_ad = sofa.tl.get_ad(meth_df, llh = \"gaussian\")\n",
"drug_ad = sofa.tl.get_ad(drug_df, llh = \"gaussian\")\n",
"# Finally, as before, wrap all the `AnnData` objects in a single `MuData` object.\n",
"Xmdata = MuData({\"RNA\":rna_ad, \"Protein\":prot_ad, \"Methylation\":meth_ad, \"Drug response\":drug_ad})"
]
},
{
"cell_type": "markdown",
"id": "178662e2-a76f-4f01-bdc3-3fb7fab8f222",
"metadata": {},
"source": [
"We proceed analogously for the guiding variables:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "cee40548-3f54-48bf-ba35-e75fc0e7616a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"growth_df = mdata[\"Growth\"].to_df()\n",
"msi_df = mdata[\"MSI\"].to_df()\n",
"braf_df = mdata[\"BRAF\"].to_df()\n",
"tp53_df = mdata[\"TP53\"].to_df()\n",
"pik3ca_df = mdata[\"PIK3CA\"].to_df()\n",
"hema_df = mdata[\"Hema\"].to_df()"
]
},
{
"cell_type": "markdown",
"id": "eb408df0-9033-4b28-b3db-905f6a5dbe83",
"metadata": {},
"source": [
"Again, we use the `sofa.tl.get_ad()` function to create an `AnnData` object for each guiding variable.\n",
"We need to specify an appropriate likelihood for each guiding variable: `gaussian` for the continuous growth rate and \n",
"`bernoulli` for the remaining binary variables. SOFA also supports the `categorical` likelihood.\n",
"Additionally, we need to set a scaling factor for each guiding variable, which determines the strength of the supervision \n",
"in the fitting process. Too high values lead to guided factors that do not explain any variance \n",
"of the multi-omics data, while too low values lead to factors that are not associated with their guiding variables.\n",
"We recommend the default of 0.1; here, we use a lower value of 0.01 for the continuous growth rate."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "c0ed3381-9784-4b39-aa40-2cd4c2d16776",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"growth_ad = sofa.tl.get_ad(growth_df, llh = \"gaussian\", scaling_factor = 0.01)\n",
"msi_ad = sofa.tl.get_ad(msi_df, llh = \"bernoulli\", scaling_factor = 0.1)\n",
"braf_ad = sofa.tl.get_ad(braf_df, llh = \"bernoulli\", scaling_factor = 0.1)\n",
"tp53_ad = sofa.tl.get_ad(tp53_df, llh = \"bernoulli\", scaling_factor = 0.1)\n",
"pik3ca_ad = sofa.tl.get_ad(pik3ca_df, llh = \"bernoulli\", scaling_factor = 0.1)\n",
"hema_ad = sofa.tl.get_ad(hema_df, llh = \"bernoulli\", scaling_factor = 0.1)\n",
"\n",
"# Finally, as before, wrap all the `AnnData` objects in a single `MuData` object.\n",
"Ymdata = MuData({\"Growth\":growth_ad, \"MSI\": msi_ad, \"BRAF\": braf_ad, \"TP53\": tp53_ad, \"PIK3CA\": pik3ca_ad, \"Hema\": hema_ad})"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "3c321a3d-3bd0-4b51-8fd1-b9d2780cc840",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.],\n",
" [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.],\n",
" [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.],\n",
" [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.],\n",
" [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.],\n",
" [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0.]], dtype=torch.float64)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We set the number of factors to infer\n",
"num_factors = 20\n",
"# Use obs as metadata of the cell lines\n",
"metadata = mdata.obs\n",
"# To relate factors to guiding variables, we need to provide a design matrix (guiding variables x number of factors)\n",
"# indicating which factor is guided by which guiding variable.\n",
"# Here we just indicate that the first 6 factors are each guided by a different guiding variable:\n",
"design = np.zeros((len(Ymdata.mod), num_factors))\n",
"for i in range(len(Ymdata.mod)):\n",
"    design[i, i] = 1\n",
"\n",
"# convert to torch tensor to make it usable by SOFA\n",
"design = torch.tensor(design)\n",
"design"
]
},
{
"cell_type": "markdown",
"id": "0f111725-7a24-4b32-94d0-5f133fae5d77",
"metadata": {
"tags": []
},
"source": [
"## Fit the `SOFA` model"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "4fdcecf1-88bb-4f61-a235-2fcb581481ce",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model = sofa.SOFA(Xmdata = Xmdata,  # the input multi-omics data\n",
"                  num_factors=num_factors,  # number of factors to infer\n",
"                  Ymdata = Ymdata,  # the input guiding variables\n",
"                  design = design,  # design matrix relating factors to guiding variables\n",
"                  device='cuda',  # run on the GPU; set to \"cpu\" if no GPU is available\n",
"                  seed=42)  # fix the random seed so that repeated runs give the same results"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "3b1c2f5c-5a15-4710-865b-adc99181b81e",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Current Elbo 3.69E+06 | Delta: 494761: 100%|██████████| 3000/3000 [05:01<00:00, 9.94it/s] \n",
"Current Elbo 2.61E+06 | Delta: -65210: 100%|██████████| 3000/3000 [04:43<00:00, 10.58it/s] \n"
]
}
],
"source": [
"# train SOFA with a learning rate of 0.01 for 3000 steps\n",
"model.fit(n_steps=3000, lr=0.01)\n",
"# decrease the learning rate to 0.005 and continue training for another 3000 steps\n",
"model.fit(n_steps=3000, lr=0.005)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8eca0ee3-88f3-49d9-a2bd-6563a429410c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Save the fitted model so that it can be reloaded later without refitting:\n",
"sofa.tl.save_model(model, \"depmap_example_model\")\n",
"\n",
"# Load a previously saved model:\n",
"model = sofa.tl.load_model(\"depmap_example_model\")"
]
},
{
"cell_type": "markdown",
"id": "9bb9324a-d093-4995-84ff-b9745e18c2a6",
"metadata": {
"tags": []
},
"source": [
"## Downstream analysis\n",
"\n",
"\n",
"### Convergence\n",
"\n",
"We first assess whether the ELBO loss of SOFA has converged by plotting it over the training steps."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f95de90d-80dd-44d9-9ce2-28d059cc613c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'ELBO')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"
\n", " | Factor_0 (Growth) | \n", "Factor_1 (MSI) | \n", "Factor_2 (BRAF) | \n", "Factor_3 (TP53) | \n", "Factor_4 (PIK3CA) | \n", "Factor_5 (Hema) | \n", "Factor_6 | \n", "Factor_7 | \n", "Factor_8 | \n", "Factor_9 | \n", "Factor_10 | \n", "Factor_11 | \n", "Factor_12 | \n", "Factor_13 | \n", "Factor_14 | \n", "Factor_15 | \n", "Factor_16 | \n", "Factor_17 | \n", "Factor_18 | \n", "Factor_19 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "-0.984563 | \n", "0.094903 | \n", "1.867936 | \n", "0.991292 | \n", "-0.005510 | \n", "0.174301 | \n", "-0.149569 | \n", "-0.101285 | \n", "0.118306 | \n", "0.228873 | \n", "-0.208032 | \n", "-0.263638 | \n", "-0.600392 | \n", "-0.382208 | \n", "0.140103 | \n", "0.344184 | \n", "-0.815846 | \n", "0.459298 | \n", "0.719330 | \n", "-0.313031 | \n", "
1 | \n", "-0.674658 | \n", "0.031586 | \n", "-0.789824 | \n", "-0.407347 | \n", "1.124087 | \n", "0.458733 | \n", "-0.060520 | \n", "-0.040364 | \n", "-0.503078 | \n", "-0.399277 | \n", "0.166812 | \n", "-0.032913 | \n", "0.652486 | \n", "-0.614164 | \n", "0.154161 | \n", "0.377786 | \n", "0.100994 | \n", "-1.004648 | \n", "-0.408009 | \n", "0.504229 | \n", "
2 | \n", "-0.361459 | \n", "-0.191929 | \n", "-0.307024 | \n", "0.383602 | \n", "0.018404 | \n", "0.897071 | \n", "0.294184 | \n", "-0.372404 | \n", "0.148475 | \n", "0.785472 | \n", "-0.204483 | \n", "0.098536 | \n", "0.097677 | \n", "0.411425 | \n", "-0.391176 | \n", "0.374827 | \n", "0.386192 | \n", "-0.456484 | \n", "0.346030 | \n", "0.550844 | \n", "
3 | \n", "-0.605062 | \n", "-0.200432 | \n", "-0.299046 | \n", "0.202695 | \n", "0.221496 | \n", "0.108696 | \n", "-0.091975 | \n", "0.026284 | \n", "0.158653 | \n", "0.096715 | \n", "0.113269 | \n", "-0.056898 | \n", "-0.152790 | \n", "0.209436 | \n", "0.354010 | \n", "0.336211 | \n", "0.376640 | \n", "-0.477210 | \n", "-0.379058 | \n", "0.468571 | \n", "
4 | \n", "0.212342 | \n", "-0.120816 | \n", "-0.212216 | \n", "0.690939 | \n", "0.186770 | \n", "0.897576 | \n", "0.123370 | \n", "0.123256 | \n", "-0.111156 | \n", "0.608043 | \n", "-0.065553 | \n", "0.236904 | \n", "-0.198532 | \n", "-0.240049 | \n", "-0.649414 | \n", "0.371791 | \n", "0.641309 | \n", "0.105965 | \n", "0.207381 | \n", "0.464103 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
773 | \n", "-0.030411 | \n", "-0.103553 | \n", "-0.316009 | \n", "0.223481 | \n", "0.244809 | \n", "0.518574 | \n", "0.131121 | \n", "-0.026001 | \n", "-0.492095 | \n", "-0.288541 | \n", "0.167538 | \n", "0.161157 | \n", "0.659408 | \n", "-0.443171 | \n", "0.323169 | \n", "-0.498860 | \n", "-0.499177 | \n", "-1.300491 | \n", "0.662859 | \n", "0.273217 | \n", "
774 | \n", "0.561793 | \n", "-0.662839 | \n", "0.209959 | \n", "-0.977655 | \n", "0.237925 | \n", "0.574140 | \n", "-0.069846 | \n", "0.001591 | \n", "-1.813954 | \n", "0.000474 | \n", "-0.044735 | \n", "0.296505 | \n", "-0.099778 | \n", "0.061658 | \n", "0.567360 | \n", "0.424479 | \n", "-0.512040 | \n", "1.120325 | \n", "-0.449671 | \n", "-1.028471 | \n", "
775 | \n", "0.208128 | \n", "-0.140388 | \n", "0.508802 | \n", "-0.937757 | \n", "-0.153248 | \n", "-2.141505 | \n", "-0.047474 | \n", "0.235640 | \n", "0.024192 | \n", "-0.118403 | \n", "-0.061317 | \n", "-0.530468 | \n", "0.032437 | \n", "-0.095084 | \n", "-0.306878 | \n", "-0.375194 | \n", "0.274054 | \n", "0.427416 | \n", "-1.491817 | \n", "0.362090 | \n", "
776 | \n", "1.427114 | \n", "-1.299472 | \n", "-0.234395 | \n", "-0.894223 | \n", "0.246734 | \n", "0.601628 | \n", "0.543358 | \n", "0.153857 | \n", "-1.782104 | \n", "0.350788 | \n", "0.240557 | \n", "0.425656 | \n", "-0.485319 | \n", "0.379114 | \n", "-0.334070 | \n", "-0.889224 | \n", "0.031068 | \n", "0.376890 | \n", "-0.444826 | \n", "-0.905731 | \n", "
777 | \n", "-0.233796 | \n", "0.198935 | \n", "2.749278 | \n", "1.159026 | \n", "-0.625919 | \n", "0.637330 | \n", "-0.659517 | \n", "1.177139 | \n", "0.177545 | \n", "0.033086 | \n", "-0.106065 | \n", "-0.191898 | \n", "-0.288880 | \n", "-1.048721 | \n", "-0.279230 | \n", "0.333567 | \n", "0.657307 | \n", "0.193587 | \n", "2.300986 | \n", "-0.597988 | \n", "
778 rows × 20 columns
\n", "symbol | \n", "A2M | \n", "AADACL2 | \n", "AADACL3 | \n", "AARD | \n", "ABCB10P1 | \n", "ABI3BP | \n", "ACAN | \n", "ACKR3 | \n", "ACP3 | \n", "ACTA2 | \n", "... | \n", "XBP1 | \n", "XCL2 | \n", "ZNF683 | \n", "ZNF723 | \n", "ZNRF4 | \n", "ZP2 | \n", "ZP4 | \n", "ZPLD1 | \n", "ZSCAN10 | \n", "ZSWIM5P3 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "0.015295 | \n", "0.071497 | \n", "0.035568 | \n", "0.089748 | \n", "-0.044069 | \n", "0.073249 | \n", "0.057849 | \n", "0.014150 | \n", "-0.007009 | \n", "-0.008777 | \n", "... | \n", "-0.029756 | \n", "0.015113 | \n", "0.042406 | \n", "0.026360 | \n", "0.010476 | \n", "-0.007051 | \n", "-0.005228 | \n", "-0.024504 | \n", "0.009037 | \n", "0.015518 | \n", "
1 | \n", "-0.034314 | \n", "-0.000051 | \n", "0.023871 | \n", "-0.097714 | \n", "0.070291 | \n", "-0.047070 | \n", "0.091474 | \n", "-0.057706 | \n", "-0.268044 | \n", "-0.026221 | \n", "... | \n", "-0.037763 | \n", "0.029878 | \n", "0.045231 | \n", "-0.062290 | \n", "-0.131012 | \n", "-0.080212 | \n", "-0.024728 | \n", "-0.048543 | \n", "0.021252 | \n", "-0.004852 | \n", "
2 | \n", "-0.321572 | \n", "0.045532 | \n", "0.004737 | \n", "-0.070684 | \n", "-0.024462 | \n", "-0.011495 | \n", "-0.150803 | \n", "0.258310 | \n", "-0.023645 | \n", "0.054582 | \n", "... | \n", "0.071099 | \n", "0.111338 | \n", "0.020180 | \n", "-0.049511 | \n", "-0.059965 | \n", "0.063550 | \n", "-0.019318 | \n", "0.126384 | \n", "0.001582 | \n", "0.015054 | \n", "
3 | \n", "-0.009988 | \n", "-0.003648 | \n", "0.012213 | \n", "0.121117 | \n", "0.031789 | \n", "-0.031690 | \n", "-0.033959 | \n", "0.002975 | \n", "-0.057448 | \n", "-0.136400 | \n", "... | \n", "-0.068956 | \n", "0.031779 | \n", "0.245507 | \n", "0.027935 | \n", "0.038758 | \n", "-0.057646 | \n", "-0.054516 | \n", "-0.055807 | \n", "-0.005945 | \n", "-0.003330 | \n", "
4 | \n", "-0.104526 | \n", "0.066071 | \n", "0.052031 | \n", "0.007025 | \n", "0.083235 | \n", "0.038949 | \n", "-0.075813 | \n", "0.105265 | \n", "0.076275 | \n", "0.093963 | \n", "... | \n", "-0.027402 | \n", "0.041936 | \n", "0.013638 | \n", "0.020756 | \n", "0.009367 | \n", "0.003728 | \n", "0.026330 | \n", "0.006912 | \n", "0.076745 | \n", "-0.143186 | \n", "
5 | \n", "0.102369 | \n", "0.060284 | \n", "0.038335 | \n", "0.013700 | \n", "-0.111408 | \n", "0.278752 | \n", "0.066290 | \n", "0.134587 | \n", "-0.168386 | \n", "0.098990 | \n", "... | \n", "-0.064056 | \n", "-0.045495 | \n", "-0.201629 | \n", "-0.056866 | \n", "-0.007317 | \n", "-0.084082 | \n", "-0.051703 | \n", "0.098773 | \n", "-0.122135 | \n", "-0.010227 | \n", "
6 | \n", "-0.177834 | \n", "-0.042088 | \n", "0.000444 | \n", "0.010341 | \n", "0.108204 | \n", "0.005978 | \n", "0.027057 | \n", "-0.006823 | \n", "-0.105048 | \n", "-0.166809 | \n", "... | \n", "0.147754 | \n", "-0.021945 | \n", "0.199206 | \n", "0.040381 | \n", "-0.033136 | \n", "-0.052512 | \n", "0.018241 | \n", "-0.003154 | \n", "0.250065 | \n", "-0.082886 | \n", "
7 | \n", "0.064315 | \n", "-0.179968 | \n", "0.041989 | \n", "-0.258878 | \n", "-0.158774 | \n", "0.052421 | \n", "0.052402 | \n", "-0.184161 | \n", "-0.064313 | \n", "-0.016940 | \n", "... | \n", "0.120286 | \n", "-0.008686 | \n", "-0.231400 | \n", "0.272920 | \n", "-0.021050 | \n", "0.109255 | \n", "0.118702 | \n", "0.021676 | \n", "-0.181603 | \n", "-0.013686 | \n", "
8 | \n", "0.022922 | \n", "0.312208 | \n", "0.010867 | \n", "-0.191499 | \n", "0.106837 | \n", "-0.229320 | \n", "-0.008715 | \n", "-0.210733 | \n", "0.037657 | \n", "0.062167 | \n", "... | \n", "0.085694 | \n", "0.040236 | \n", "-0.080379 | \n", "-0.042302 | \n", "0.103474 | \n", "-0.319116 | \n", "0.086957 | \n", "-0.002122 | \n", "0.176039 | \n", "-0.005275 | \n", "
9 | \n", "-0.307194 | \n", "0.043982 | \n", "0.100535 | \n", "-0.267381 | \n", "-0.050292 | \n", "-0.010615 | \n", "-0.169009 | \n", "0.112214 | \n", "0.375080 | \n", "-0.178900 | \n", "... | \n", "0.012742 | \n", "-0.063244 | \n", "-0.010274 | \n", "-0.093946 | \n", "-0.064835 | \n", "0.015574 | \n", "0.066980 | \n", "0.022103 | \n", "-0.124436 | \n", "-0.030850 | \n", "
10 | \n", "-0.460474 | \n", "-0.161775 | \n", "0.248694 | \n", "0.026441 | \n", "0.160128 | \n", "0.045803 | \n", "-0.233451 | \n", "-0.090797 | \n", "0.112407 | \n", "0.233218 | \n", "... | \n", "0.098966 | \n", "-0.276042 | \n", "-0.014679 | \n", "0.299511 | \n", "0.042258 | \n", "-0.020505 | \n", "-0.006881 | \n", "-0.029953 | \n", "0.251095 | \n", "0.169541 | \n", "
11 | \n", "-0.011533 | \n", "0.112840 | \n", "0.052935 | \n", "-0.015447 | \n", "-0.496742 | \n", "-0.003645 | \n", "0.026647 | \n", "0.325467 | \n", "-0.001202 | \n", "0.038805 | \n", "... | \n", "0.125546 | \n", "-0.109403 | \n", "-0.441826 | \n", "-0.083174 | \n", "-0.185597 | \n", "-0.014198 | \n", "0.024846 | \n", "-0.064795 | \n", "0.087614 | \n", "0.030194 | \n", "
12 | \n", "0.245613 | \n", "0.296580 | \n", "-0.038640 | \n", "-0.305640 | \n", "-0.124200 | \n", "0.271039 | \n", "0.256376 | \n", "0.541047 | \n", "0.342560 | \n", "0.444382 | \n", "... | \n", "0.560416 | \n", "-0.040864 | \n", "-0.018886 | \n", "-0.035057 | \n", "0.008567 | \n", "0.115618 | \n", "0.088392 | \n", "-0.240834 | \n", "-0.047551 | \n", "-0.004341 | \n", "
13 | \n", "-0.660592 | \n", "0.019101 | \n", "-0.120316 | \n", "-0.134759 | \n", "-0.079046 | \n", "-0.341223 | \n", "-0.456594 | \n", "-0.103917 | \n", "0.044514 | \n", "-0.499875 | \n", "... | \n", "0.023534 | \n", "0.128351 | \n", "-0.001446 | \n", "-0.276361 | \n", "0.020640 | \n", "0.088788 | \n", "-0.147279 | \n", "-0.029944 | \n", "0.029581 | \n", "0.008861 | \n", "
14 | \n", "-0.085970 | \n", "-0.005224 | \n", "-0.109311 | \n", "-0.124855 | \n", "0.053398 | \n", "0.029236 | \n", "-0.034264 | \n", "-0.209597 | \n", "0.003543 | \n", "-0.030162 | \n", "... | \n", "0.104430 | \n", "-0.231496 | \n", "-0.059287 | \n", "-0.001135 | \n", "0.046376 | \n", "0.015712 | \n", "-0.034782 | \n", "-0.005570 | \n", "0.020778 | \n", "-0.011761 | \n", "
15 | \n", "-0.010765 | \n", "-0.039279 | \n", "-0.028720 | \n", "0.005197 | \n", "-0.021794 | \n", "-0.020378 | \n", "0.005175 | \n", "0.006982 | \n", "0.017629 | \n", "-0.019821 | \n", "... | \n", "-0.011246 | \n", "0.008254 | \n", "0.046750 | \n", "0.060618 | \n", "0.022018 | \n", "-0.094701 | \n", "0.021379 | \n", "-0.037993 | \n", "-0.040373 | \n", "-0.096038 | \n", "
16 | \n", "-0.102186 | \n", "0.118045 | \n", "0.007831 | \n", "0.033710 | \n", "0.026295 | \n", "-0.103503 | \n", "-0.186066 | \n", "-0.038786 | \n", "0.027735 | \n", "-0.093989 | \n", "... | \n", "-0.233836 | \n", "0.137448 | \n", "0.046462 | \n", "-0.017949 | \n", "-0.011706 | \n", "-0.010605 | \n", "0.010610 | \n", "-0.056985 | \n", "0.004476 | \n", "0.000662 | \n", "
17 | \n", "-0.133406 | \n", "0.200971 | \n", "0.115494 | \n", "0.210792 | \n", "0.246593 | \n", "-0.616548 | \n", "-0.265424 | \n", "0.108277 | \n", "0.383489 | \n", "-0.634510 | \n", "... | \n", "0.005781 | \n", "0.341722 | \n", "-0.029150 | \n", "0.164946 | \n", "-0.003980 | \n", "0.146180 | \n", "0.165554 | \n", "-0.024582 | \n", "-0.000005 | \n", "-0.012273 | \n", "
18 | \n", "-0.031818 | \n", "0.005847 | \n", "0.032746 | \n", "-0.134567 | \n", "0.327935 | \n", "0.030983 | \n", "-0.019491 | \n", "0.083426 | \n", "0.073085 | \n", "-0.001174 | \n", "... | \n", "-0.062192 | \n", "0.260967 | \n", "-0.008846 | \n", "0.142483 | \n", "-0.016763 | \n", "-0.008566 | \n", "-0.008396 | \n", "-0.053864 | \n", "0.054478 | \n", "0.011180 | \n", "
19 | \n", "-0.600847 | \n", "0.053716 | \n", "0.075986 | \n", "-0.049103 | \n", "0.048417 | \n", "0.555588 | \n", "-0.332565 | \n", "0.418213 | \n", "-0.372007 | \n", "0.248540 | \n", "... | \n", "-0.052842 | \n", "0.318683 | \n", "-0.009427 | \n", "0.426128 | \n", "-0.065958 | \n", "-0.097275 | \n", "0.261584 | \n", "0.066420 | \n", "-0.060836 | \n", "0.083081 | \n", "
20 rows × 2000 columns
\n", "