{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Human interactome\n",
"\n",
"This Jupyter notebook provides an example of using the Python package [gravis](https://pypi.org/project/gravis). The .ipynb file can be found [here](https://github.com/robert-haas/gravis/tree/master/examples).\n",
"\n",
"It visualizes **protein-protein interactions (PPi)** taken from the **Human Reference Interactome (HuRI)** and HuRI combined with other systematic screening efforts at CCSB (**HI-union**).\n",
"\n",
"\n",
"## References\n",
"\n",
"- [Center for Cancer Systems Biology (CCSB)](https://www.dana-farber.org/research/departments-centers-and-labs/integrative-research-centers/center-for-cancer-systems-biology/)\n",
" - [The Human Reference Protein Interactome Mapping Project](http://www.interactome-atlas.org)\n",
" - [Download](http://www.interactome-atlas.org/download)\n",
" - [HuRI.tsv](http://www.interactome-atlas.org/data/HuRI.tsv) with 52569 interactions (Ensembl gene IDs)\n",
" - [HI-union.tsv](http://www.interactome-atlas.org/data/HI-union.tsv) with 64006 interactions\n",
" - [Preprint paper](https://www.biorxiv.org/content/10.1101/605451v2)\n",
" - \"The dataset, versioned HI-III-19 (Human Interactome obtained from screening Space III, published in 2019), contains 52,569 verified PPIs involving 8,275 proteins (Supplementary Table 6). Given its systematic nature, completeness and scale, we consider HI-III-19 to be the first draft of the Human Reference Interactome (**HuRI**).\"\n",
" - \"Combining HuRI with all previously published systematic screening efforts at CCSB yields 64,006 binary PPIs involving 9,094 proteins (**HI-union**)\"\n",
"- [EMBL-EBI](https://www.ebi.ac.uk)\n",
" - [Ensembl](http://www.ensembl.org): Ensembl is a genome browser for vertebrate genomes\n",
" - [About](http://www.ensembl.org/info/about/index.html): In order to improve consistency between the data provided by different genome browsers, Ensembl has entered into an agreement with UCSC and NCBI with regard to sequence identifiers \n",
"- [UniProt](https://www.uniprot.org)\n",
" - [About](https://www.uniprot.org/help/about): The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import os\n",
"\n",
"import gravis as gv\n",
"import networkx as nx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load protein-protein interaction (PPi) data\n",
"\n",
"The data is given as table that contains (source, target) pairs, which is a simple edge list. Note that ``HI-union-minimal.tsv`` is a reduced version of ``HI-union.tsv`` that contains only the first two columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def load_csv_data(filepath, delimiter=','):\n",
" with open(filepath) as csv_file:\n",
" csv_reader = csv.reader(csv_file, delimiter=delimiter)\n",
" data = list(csv_reader)\n",
" return data\n",
"\n",
"\n",
"filepath = os.path.join('data', 'HuRI.tsv')\n",
"data_huri = load_csv_data(filepath, delimiter='\\t')\n",
"\n",
"filepath = os.path.join('data', 'HI-union-minimal.tsv')\n",
"data_hi_union = load_csv_data(filepath, delimiter='\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create PPi network as NetworkX graph"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def construct_graph(data, name):\n",
" graph = nx.Graph()\n",
" for source, target in data:\n",
" graph.add_edge(source, target)\n",
"\n",
" uniprot_template = (\n",
" 'Degree: {degree}
'\n",
" 'Uniprot: {id}')\n",
" ensembl_template = (\n",
" 'Degree: {degree}
'\n",
" 'Ensembl: {id}
'\n",
" 'NCBI Gene: {id}')\n",
"\n",
" for node_id in graph.nodes:\n",
" node = graph.nodes[node_id]\n",
" template = ensembl_template if node_id.lower().startswith('ens') else uniprot_template\n",
" node['hover'] = template.format(id=node_id, degree=graph.degree[node_id])\n",
" node['click'] = '$hover'\n",
" print('Protein-protein interaction network \"{}\"'.format(name))\n",
" print('- Number of nodes:', len(graph.nodes))\n",
" print('- Number of edges:', len(graph.edges))\n",
" print()\n",
" return graph\n",
"\n",
"\n",
"graph_huri = construct_graph(data_huri, 'HuRI')\n",
"graph_hi_union = construct_graph(data_hi_union, 'HI-union')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot filtered versions of the large graph\n",
"\n",
"### Filter 1: Egocentric network (=neighborhood of a chosen node)\n",
"\n",
"- GSU library: [Ego network](https://research.library.gsu.edu/c.php?g=916490&p=6612505)\n",
"- Science direct topic: [Egocentric network](https://www.sciencedirect.com/topics/computer-science/egocentric-network)\n",
"\n",
"Focus on an actor (\"ego\") and show all edges to his direct neighbors (\"alters\") and between them.\n",
"\n",
"Chosen here is the MYC gene, with [ENSG00000136997](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000136997;r=8:127735434-127742951) as [Ensembl identifier](https://m.ensembl.org/info/genome/stable_ids/index.html) of the gene and [P01106](https://www.uniprot.org/uniprot/P01106) as [Uniprot identifier](https://www.uniprot.org/help/accession_numbers) of the protein transcribed from the gene."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def list_edges_containing_a_node(data, beginning_node_id):\n",
" for source, target in data:\n",
" for node_id in [source, target]:\n",
" if node_id.startswith(beginning_node_id):\n",
" print(' ', source, target)\n",
"\n",
"gene_id = 'ENSG00000136997'\n",
"print('Edges containing the gene id \"{}\" in HuRI database'.format(gene_id))\n",
"list_edges_containing_a_node(data_huri, gene_id)\n",
"\n",
"print()\n",
"\n",
"protein_id = 'P01106'\n",
"print('Edges containing the protein id \"{}\" in HI-union database'.format(protein_id))\n",
"list_edges_containing_a_node(data_hi_union, protein_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_ego_graph(graph, ego_node_id, radius=1):\n",
" ego_graph = nx.ego_graph(graph, ego_node_id, radius=radius)\n",
" ego_node = ego_graph.nodes[ego_node_id]\n",
" ego_node['color'] = 'red'\n",
" ego_node['label_color'] = 'red'\n",
" pos_counter = {i: 0 for i in range(radius+1)}\n",
" for node_id in ego_graph.nodes:\n",
" node = ego_graph.nodes[node_id]\n",
" distance = len(nx.shortest_path(graph, ego_node_id, node_id)) - 1\n",
" node['x'] = pos_counter[distance] * 40 - 1000\n",
" node['y'] = distance * 120 - 150\n",
" node['size'] = 10 + graph.degree[node_id] / 10\n",
" pos_counter[distance] += 1\n",
" if distance == 1:\n",
" node['color'] = 'blue'\n",
" elif distance == 2:\n",
" node['color'] = 'green'\n",
" print('Egocentric graph')\n",
" print('- Number of nodes:', len(ego_graph.nodes))\n",
" print('- Number of edges:', len(ego_graph.edges))\n",
" return ego_graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1) HuRI data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Examples of proteins in HuRI data (ensembl identifiers)\n",
"# Myc: ENSG00000136997\n",
"# Max: ENSG00000125952\n",
"ego_graph = create_ego_graph(graph_huri, 'ENSG00000136997', radius=2)\n",
"\n",
"gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2) HI-union data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Examples of proteins in HI-union data (uniprot identifiers)\n",
"# Myc: 'P01106-1'\n",
"# Max: 'P61244-1'\n",
"ego_graph = create_ego_graph(graph_hi_union, 'P01106-1', radius=2)\n",
"\n",
"gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filter 2: Only well-connected nodes with degree >= n \n",
"\n",
"Show only proteins that have interactions with at least n other proteins."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_high_degree_graph(graph, n):\n",
" filtered_graph = graph.copy()\n",
"\n",
" # Step 1\n",
" to_remove = [node for node, degree in graph.degree() if degree < n]\n",
" filtered_graph.remove_nodes_from(to_remove)\n",
"\n",
" # Step 2\n",
" to_remove = [node for node, degree in filtered_graph.degree() if degree < 1]\n",
" filtered_graph.remove_nodes_from(to_remove)\n",
" \n",
" print('Filtered graph containing only nodes of degree >= {}'.format(n))\n",
" print('- Number of nodes:', len(filtered_graph.nodes))\n",
" print('- Number of edges:', len(filtered_graph.edges))\n",
" return filtered_graph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1) HuRI data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"graph = create_high_degree_graph(graph_huri, n=150)\n",
"\n",
"gv.d3(graph, node_label_size_factor=0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2) HI-union data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"graph = create_high_degree_graph(graph_hi_union, n=175)\n",
"\n",
"gv.d3(graph, node_label_size_factor=0.5)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}