{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Human interactome\n", "\n", "This Jupyter notebook provides an example of using the Python package [gravis](https://pypi.org/project/gravis). The .ipynb file can be found [here](https://github.com/robert-haas/gravis/tree/master/examples).\n", "\n", "It visualizes **protein-protein interactions (PPIs)** taken from the **Human Reference Interactome (HuRI)** and from HuRI combined with other systematic screening efforts at CCSB (**HI-union**).\n", "\n", "\n", "## References\n", "\n", "- [Center for Cancer Systems Biology (CCSB)](https://www.dana-farber.org/research/departments-centers-and-labs/integrative-research-centers/center-for-cancer-systems-biology/)\n", "    - [The Human Reference Protein Interactome Mapping Project](http://www.interactome-atlas.org)\n", "        - [Download](http://www.interactome-atlas.org/download)\n", "            - [HuRI.tsv](http://www.interactome-atlas.org/data/HuRI.tsv) with 52569 interactions (Ensembl gene IDs)\n", "            - [HI-union.tsv](http://www.interactome-atlas.org/data/HI-union.tsv) with 64006 interactions\n", "        - [Preprint paper](https://www.biorxiv.org/content/10.1101/605451v2)\n", "            - \"The dataset, versioned HI-III-19 (Human Interactome obtained from screening Space III, published in 2019), contains 52,569 verified PPIs involving 8,275 proteins (Supplementary Table 6). Given its systematic nature, completeness and scale, we consider HI-III-19 to be the first draft of the Human Reference Interactome (**HuRI**).\"\n", "            - \"Combining HuRI with all previously published systematic screening efforts at CCSB yields 64,006 binary PPIs involving 9,094 proteins (**HI-union**)\"\n", "- [EMBL-EBI](https://www.ebi.ac.uk)\n", "    - [Ensembl](http://www.ensembl.org): Ensembl is a genome browser for vertebrate genomes\n", "        - [About](http://www.ensembl.org/info/about/index.html): In order to improve consistency between the data provided by different genome browsers, Ensembl has entered into an agreement with UCSC and NCBI with regard to sequence identifiers\n", "- [UniProt](https://www.uniprot.org)\n", "    - [About](https://www.uniprot.org/help/about): The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "import os\n", "\n", "import gravis as gv\n", "import networkx as nx" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Load protein-protein interaction (PPI) data\n", "\n", "The data is given as a table of (source, target) pairs, which is a simple edge list. Note that ``HI-union-minimal.tsv`` is a reduced version of ``HI-union.tsv`` that contains only the first two columns."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def load_csv_data(filepath, delimiter=','):\n", "    # Read the whole file into a list of rows (one list of column values per row)\n", "    with open(filepath, newline='') as csv_file:\n", "        csv_reader = csv.reader(csv_file, delimiter=delimiter)\n", "        data = list(csv_reader)\n", "    return data\n", "\n", "\n", "filepath = os.path.join('data', 'HuRI.tsv')\n", "data_huri = load_csv_data(filepath, delimiter='\\t')\n", "\n", "filepath = os.path.join('data', 'HI-union-minimal.tsv')\n", "data_hi_union = load_csv_data(filepath, delimiter='\\t')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Create PPI network as NetworkX graph" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def construct_graph(data, name):\n", "    # Build an undirected graph from the (source, target) edge list\n", "    graph = nx.Graph()\n", "    for source, target in data:\n", "        graph.add_edge(source, target)\n", "\n", "    # Hover text templates, one per identifier type\n", "    uniprot_template = (\n", "        'Degree: {degree}<br>'\n", "        'Uniprot: {id}')\n", "    ensembl_template = (\n", "        'Degree: {degree}<br>'\n", "        'Ensembl: {id}<br>'\n", "        'NCBI Gene: {id}')\n", "\n", "    for node_id in graph.nodes:\n", "        node = graph.nodes[node_id]\n", "        # Ensembl gene IDs start with 'ENS'; all other IDs are treated as UniProt accessions\n", "        template = ensembl_template if node_id.lower().startswith('ens') else uniprot_template\n", "        node['hover'] = template.format(id=node_id, degree=graph.degree[node_id])\n", "        node['click'] = '$hover'\n", "    print('Protein-protein interaction network \"{}\"'.format(name))\n", "    print('- Number of nodes:', len(graph.nodes))\n", "    print('- Number of edges:', len(graph.edges))\n", "    print()\n", "    return graph\n", "\n", "\n", "graph_huri = construct_graph(data_huri, 'HuRI')\n", "graph_hi_union = construct_graph(data_hi_union, 'HI-union')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Plot filtered versions of the large graph\n", "\n", "### Filter 1: Egocentric network (= neighborhood of a chosen node)\n", "\n", "- GSU library: [Ego network](https://research.library.gsu.edu/c.php?g=916490&p=6612505)\n", "- ScienceDirect topic: [Egocentric network](https://www.sciencedirect.com/topics/computer-science/egocentric-network)\n", "\n", "Focus on one actor (\"ego\") and show all edges to its direct neighbors (\"alters\") as well as the edges between them.\n", "\n", "The ego chosen here is the MYC gene, with [ENSG00000136997](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000136997;r=8:127735434-127742951) as the [Ensembl identifier](https://m.ensembl.org/info/genome/stable_ids/index.html) of the gene and [P01106](https://www.uniprot.org/uniprot/P01106) as the [UniProt identifier](https://www.uniprot.org/help/accession_numbers) of the protein encoded by the gene."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def list_edges_containing_a_node(data, beginning_node_id):\n", "    # Print each edge once if at least one endpoint ID starts with the given prefix\n", "    for source, target in data:\n", "        if source.startswith(beginning_node_id) or target.startswith(beginning_node_id):\n", "            print(' ', source, target)\n", "\n", "\n", "gene_id = 'ENSG00000136997'\n", "print('Edges containing the gene id \"{}\" in HuRI database'.format(gene_id))\n", "list_edges_containing_a_node(data_huri, gene_id)\n", "\n", "print()\n", "\n", "protein_id = 'P01106'\n", "print('Edges containing the protein id \"{}\" in HI-union database'.format(protein_id))\n", "list_edges_containing_a_node(data_hi_union, protein_id)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_ego_graph(graph, ego_node_id, radius=1):\n", "    ego_graph = nx.ego_graph(graph, ego_node_id, radius=radius)\n", "    ego_node = ego_graph.nodes[ego_node_id]\n", "    ego_node['color'] = 'red'\n", "    ego_node['label_color'] = 'red'\n", "    # Distance from the ego node to every node within the radius, computed once\n", "    distances = nx.single_source_shortest_path_length(graph, ego_node_id, cutoff=radius)\n", "    # Number of nodes already placed at each distance level\n", "    pos_counter = {i: 0 for i in range(radius+1)}\n", "    for node_id in ego_graph.nodes:\n", "        node = ego_graph.nodes[node_id]\n", "        distance = distances[node_id]\n", "        # One row per distance level, nodes spread out along the x axis\n", "        node['x'] = pos_counter[distance] * 40 - 1000\n", "        node['y'] = distance * 120 - 150\n", "        # Node size grows with the degree in the full graph\n", "        node['size'] = 10 + graph.degree[node_id] / 10\n", "        pos_counter[distance] += 1\n", "        if distance == 1:\n", "            node['color'] = 'blue'\n", "        elif distance == 2:\n", "            node['color'] = 'green'\n", "    print('Egocentric graph')\n", "    print('- Number of nodes:', len(ego_graph.nodes))\n", "    print('- Number of edges:', len(ego_graph.edges))\n", "    return ego_graph" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 1) HuRI data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Examples of proteins in HuRI data (Ensembl identifiers)\n", "# Myc: ENSG00000136997\n", "# Max: ENSG00000125952\n", "ego_graph = create_ego_graph(graph_huri, 'ENSG00000136997', radius=2)\n", "\n", "gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 2) HI-union data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Examples of proteins in HI-union data (UniProt identifiers)\n", "# Myc: 'P01106-1'\n", "# Max: 'P61244-1'\n", "ego_graph = create_ego_graph(graph_hi_union, 'P01106-1', radius=2)\n", "\n", "gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Filter 2: Only well-connected nodes with degree >= n\n", "\n", "Show only proteins that interact with at least n other proteins. Proteins left without any interaction partner after this filtering are removed as well." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_high_degree_graph(graph, n):\n", "    filtered_graph = graph.copy()\n", "\n", "    # Step 1: remove nodes whose degree in the full graph is below n\n", "    to_remove = [node for node, degree in graph.degree() if degree < n]\n", "    filtered_graph.remove_nodes_from(to_remove)\n", "\n", "    # Step 2: remove nodes that became isolated after step 1\n", "    to_remove = [node for node, degree in filtered_graph.degree() if degree < 1]\n", "    filtered_graph.remove_nodes_from(to_remove)\n", "\n", "    print('Filtered graph containing only nodes of degree >= {}'.format(n))\n", "    print('- Number of nodes:', len(filtered_graph.nodes))\n", "    print('- Number of edges:', len(filtered_graph.edges))\n", "    return filtered_graph" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 1) HuRI data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "graph = create_high_degree_graph(graph_huri, n=150)\n", "\n", "gv.d3(graph, node_label_size_factor=0.5)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### 2) HI-union data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "graph = create_high_degree_graph(graph_hi_union, n=175)\n", "\n", "gv.d3(graph, node_label_size_factor=0.5)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }