Human interactome¶
This Jupyter notebook provides an example of using the Python package gravis. The .ipynb file can be found here.
It visualizes protein-protein interactions (PPi) taken from the Human Reference Interactome (HuRI) and HuRI combined with other systematic screening efforts at CCSB (HI-union).
References¶
Center for Cancer Systems Biology (CCSB)
The Human Reference Protein Interactome Mapping Project
HuRI.tsv with 52569 interactions (Ensembl gene IDs)
HI-union.tsv with 64006 interactions
“The dataset, versioned HI-III-19 (Human Interactome obtained from screening Space III, published in 2019), contains 52,569 verified PPIs involving 8,275 proteins (Supplementary Table 6). Given its systematic nature, completeness and scale, we consider HI-III-19 to be the first draft of the Human Reference Interactome (HuRI).”
“Combining HuRI with all previously published systematic screening efforts at CCSB yields 64,006 binary PPIs involving 9,094 proteins (HI-union)”
About: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.
[1]:
import csv
import os
import gravis as gv
import networkx as nx
Load protein-protein interaction (PPi) data¶
The data is given as a table of (source, target) pairs, i.e. a simple edge list. Note that HI-union-minimal.tsv is a reduced version of HI-union.tsv that contains only the first two columns.
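As a rough illustration of how such a reduced file could be derived, the sketch below keeps only the first two fields of each row. The helper name `keep_first_two_columns` and the in-memory sample with made-up extra columns are assumptions for demonstration, not part of the original data preparation.

```python
import csv
import io


def keep_first_two_columns(rows):
    """Yield only the first two fields (source, target) of each row."""
    for row in rows:
        if len(row) >= 2:  # skip blank or malformed lines
            yield row[:2]


# Hypothetical in-memory stand-in for a wider TSV file; the extra columns are made up.
wide_tsv = "A\tB\textra1\textra2\nB\tC\textra3\textra4\n"
reader = csv.reader(io.StringIO(wide_tsv), delimiter="\t")
edge_list = list(keep_first_two_columns(reader))
print(edge_list)
```

Each resulting row can then be unpacked as a (source, target) pair, which is the shape the graph construction code in this notebook expects.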
[2]:
def load_csv_data(filepath, delimiter=','):
    """Read a delimited text file and return its rows as a list of lists."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(filepath, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=delimiter)
        data = list(csv_reader)
    return data

filepath = os.path.join('data', 'HuRI.tsv')
data_huri = load_csv_data(filepath, delimiter='\t')
filepath = os.path.join('data', 'HI-union-minimal.tsv')
data_hi_union = load_csv_data(filepath, delimiter='\t')
Create PPi network as NetworkX graph¶
[3]:
def construct_graph(data, name):
    # Build an undirected graph from the (source, target) edge list
    graph = nx.Graph()
    for source, target in data:
        graph.add_edge(source, target)

    # HTML templates for the text shown when hovering over or clicking on a node
    uniprot_template = (
        'Degree: {degree}<br>'
        'Uniprot: <a href="https://www.uniprot.org/uniprot/{id}" target="_blank">{id}</a>')
    ensembl_template = (
        'Degree: {degree}<br>'
        'Ensembl: <a href="https://www.ensembl.org/Homo_sapiens/Gene/Summary?g={id}" target="_blank">{id}</a><br>'
        'NCBI Gene: <a href="https://www.ncbi.nlm.nih.gov/gene/?term={id}" target="_blank">{id}</a>')

    # Choose the template by identifier type: Ensembl ids start with "ENS"
    for node_id in graph.nodes:
        node = graph.nodes[node_id]
        template = ensembl_template if node_id.lower().startswith('ens') else uniprot_template
        node['hover'] = template.format(id=node_id, degree=graph.degree[node_id])
        node['click'] = '$hover'

    print('Protein-protein interaction network "{}"'.format(name))
    print('- Number of nodes:', len(graph.nodes))
    print('- Number of edges:', len(graph.edges))
    print()
    return graph

graph_huri = construct_graph(data_huri, 'HuRI')
graph_hi_union = construct_graph(data_hi_union, 'HI-union')
Protein-protein interaction network "HuRI"
- Number of nodes: 8272
- Number of edges: 52548
Protein-protein interaction network "HI-union"
- Number of nodes: 9573
- Number of edges: 65330
Plot filtered versions of the large graph¶
Filter 1: Egocentric network (=neighborhood of a chosen node)¶
GSU library: Ego network
Science direct topic: Egocentric network
Focus on an actor (“ego”) and show all edges to its direct neighbors (“alters”) and between them.
Chosen here is the MYC gene, with ENSG00000136997 as the Ensembl identifier of the gene and P01106 as the UniProt identifier of the protein encoded by the gene.
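To make the definition concrete, here is a minimal sketch on a made-up toy graph (the node names are placeholders, not proteins). It uses `nx.ego_graph`, which returns the subgraph induced by all nodes within `radius` hops of the chosen node, so edges between alters are included as well.

```python
import networkx as nx

toy = nx.Graph()
toy.add_edges_from([
    ("ego", "a"), ("ego", "b"),  # ego to its alters
    ("a", "b"),                  # edge between two alters
    ("b", "c"), ("c", "d"),      # nodes farther away from the ego
])

ego_1 = nx.ego_graph(toy, "ego", radius=1)  # ego plus direct neighbors
ego_2 = nx.ego_graph(toy, "ego", radius=2)  # also neighbors of neighbors

print(sorted(ego_1.nodes), ego_1.number_of_edges())
print(sorted(ego_2.nodes), ego_2.number_of_edges())
```

With `radius=1` the result contains the ego, both alters and the three edges among them; with `radius=2` node `c` joins, while `d` stays outside.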
[4]:
def list_edges_containing_a_node(data, beginning_node_id):
    # Print every row in which the source or the target id starts with the given prefix.
    # Prefix matching also finds isoform ids such as "P01106-1" when searching for "P01106".
    for source, target in data:
        for node_id in [source, target]:
            if node_id.startswith(beginning_node_id):
                print('  ', source, target)

gene_id = 'ENSG00000136997'
print('Edges containing the gene id "{}" in HuRI database'.format(gene_id))
list_edges_containing_a_node(data_huri, gene_id)
print()

protein_id = 'P01106'
print('Edges containing the protein id "{}" in HI-union database'.format(protein_id))
list_edges_containing_a_node(data_hi_union, protein_id)
Edges containing the gene id "ENSG00000136997" in HuRI database
ENSG00000004487 ENSG00000136997
ENSG00000125952 ENSG00000136997
Edges containing the protein id "P01106" in HI-union database
P61244-1 P01106-1
P61244-1 P01106-1
P61244-1 P01106-1
O60341-1 P01106-1
P01106-1 P61244-1
P01106-1 P61244-1
O60341-1 P01106-1
O60341-1 P01106-1
P61244-1 P01106-1
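The listing above shows the same pair several times and in both orders. Since `nx.Graph` is undirected and stores at most one edge per node pair, such rows collapse into a single edge when the graph is built. The following sketch counts unordered pairs for a few rows copied from the listing; the helper name `count_undirected_pairs` is an assumption introduced here for illustration.

```python
from collections import Counter

import networkx as nx


def count_undirected_pairs(rows):
    """Count how often each unordered (source, target) pair occurs."""
    return Counter(frozenset((source, target)) for source, target in rows)


# A few rows copied from the listing above
rows = [
    ("P61244-1", "P01106-1"),
    ("O60341-1", "P01106-1"),
    ("P01106-1", "P61244-1"),
    ("O60341-1", "P01106-1"),
]
pair_counts = count_undirected_pairs(rows)

toy_graph = nx.Graph()
toy_graph.add_edges_from(rows)  # repeated and reversed rows map to the same edge

print(len(rows), "rows,", len(pair_counts), "distinct pairs,",
      toy_graph.number_of_edges(), "edges")
```

Running the same count on a full edge list is one way to check how many rows are repeats before comparing row counts with edge counts.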
[5]:
def create_ego_graph(graph, ego_node_id, radius=1):
    # Subgraph of all nodes within `radius` hops of the ego node
    ego_graph = nx.ego_graph(graph, ego_node_id, radius=radius)

    # Highlight the ego node
    ego_node = ego_graph.nodes[ego_node_id]
    ego_node['color'] = 'red'
    ego_node['label_color'] = 'red'

    # Layout: one row per distance from the ego, nodes placed left to right
    pos_counter = {i: 0 for i in range(radius+1)}
    for node_id in ego_graph.nodes:
        node = ego_graph.nodes[node_id]
        distance = len(nx.shortest_path(graph, ego_node_id, node_id)) - 1
        node['x'] = pos_counter[distance] * 40 - 1000
        node['y'] = distance * 120 - 150
        # Node size grows with the degree in the full graph
        node['size'] = 10 + graph.degree[node_id] / 10
        pos_counter[distance] += 1
        if distance == 1:
            node['color'] = 'blue'
        elif distance == 2:
            node['color'] = 'green'

    print('Egocentric graph')
    print('- Number of nodes:', len(ego_graph.nodes))
    print('- Number of edges:', len(ego_graph.edges))
    return ego_graph
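The function above determines each node's distance by running a separate shortest-path search per node. As a possible alternative, not what the notebook itself does, all distances up to the radius can be obtained from a single breadth-first search with `nx.single_source_shortest_path_length` and its `cutoff` argument. The helper names `distances_from_ego` and `layered_positions` are hypothetical, and the constant offsets used in the notebook's layout are left out for brevity.

```python
import networkx as nx


def distances_from_ego(graph, ego_node_id, radius=1):
    """Return {node_id: hop distance} for all nodes within `radius` of the ego."""
    return nx.single_source_shortest_path_length(graph, ego_node_id, cutoff=radius)


def layered_positions(distances, x_step=40, y_step=120):
    """Place nodes in rows: one row per distance, left to right in visiting order."""
    next_column = {}
    positions = {}
    for node_id, distance in distances.items():
        column = next_column.get(distance, 0)
        positions[node_id] = (column * x_step, distance * y_step)
        next_column[distance] = column + 1
    return positions


toy = nx.path_graph(5)  # 0 - 1 - 2 - 3 - 4
distances = distances_from_ego(toy, 2, radius=1)
positions = layered_positions(distances)
print(dict(distances))
print(positions)
```

The ego node ends up alone in the top row, and its two neighbors share the second row at different x positions.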
1) HuRI data¶
[6]:
# Examples of proteins in HuRI data (ensembl identifiers)
# Myc: ENSG00000136997
# Max: ENSG00000125952
ego_graph = create_ego_graph(graph_huri, 'ENSG00000136997', radius=2)
gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)
Egocentric graph
- Number of nodes: 53
- Number of edges: 119
2) HI-union data¶
[7]:
# Examples of proteins in HI-union data (uniprot identifiers)
# Myc: 'P01106-1'
# Max: 'P61244-1'
ego_graph = create_ego_graph(graph_hi_union, 'P01106-1', radius=2)
gv.d3(ego_graph, zoom_factor=0.33, graph_height=250, node_label_rotation=35)
Egocentric graph
- Number of nodes: 60
- Number of edges: 159
Filter 2: Only well-connected nodes with degree >= n¶
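Show only proteins that interact with at least n other proteins in the full network. Proteins that are left without any interaction partner after this filtering are removed as well.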
[8]:
def create_high_degree_graph(graph, n):
    filtered_graph = graph.copy()

    # Step 1: remove all nodes whose degree in the original graph is below n
    to_remove = [node for node, degree in graph.degree() if degree < n]
    filtered_graph.remove_nodes_from(to_remove)

    # Step 2: remove nodes that have no edges left after step 1
    to_remove = [node for node, degree in filtered_graph.degree() if degree < 1]
    filtered_graph.remove_nodes_from(to_remove)

    print('Filtered graph containing only nodes of degree >= {}'.format(n))
    print('- Number of nodes:', len(filtered_graph.nodes))
    print('- Number of edges:', len(filtered_graph.edges))
    return filtered_graph
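The two steps can be followed on a small made-up graph. The sketch below is an alternative formulation, assuming `Graph.subgraph` and `nx.isolates` in place of the explicit removal lists; the function name `high_degree_subgraph` and the node names are hypothetical.

```python
import networkx as nx


def high_degree_subgraph(graph, n):
    """Keep nodes with degree >= n in `graph`, then drop nodes left without edges."""
    keep = [node for node, degree in graph.degree() if degree >= n]
    filtered = graph.subgraph(keep).copy()  # copy so the result can be modified
    filtered.remove_nodes_from(list(nx.isolates(filtered)))
    return filtered


toy = nx.Graph()
toy.add_edges_from([
    ("hub1", "a"), ("hub1", "b"), ("hub1", "hub2"),
    ("hub2", "c"), ("hub2", "d"),
    ("hub3", "e"), ("hub3", "f"), ("hub3", "g"),
])
result = high_degree_subgraph(toy, n=3)
print(sorted(result.nodes), result.number_of_edges())
```

All three hubs have degree 3 and pass the first step, but `hub3` only touches low-degree nodes, so it is isolated after filtering and removed in the second step. Only `hub1`, `hub2` and the edge between them remain.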
1) HuRI data¶
[9]:
graph = create_high_degree_graph(graph_huri, n=150)
gv.d3(graph, node_label_size_factor=0.5)
Filtered graph containing only nodes of degree >= 150
- Number of nodes: 44
- Number of edges: 168
2) HI-union data¶
[10]:
graph = create_high_degree_graph(graph_hi_union, n=175)
gv.d3(graph, node_label_size_factor=0.5)
Filtered graph containing only nodes of degree >= 175
- Number of nodes: 56
- Number of edges: 333