# Actor-Movie relations from Wikidata

This Jupyter notebook provides an example of using the Python package [gravis](https://pypi.org/project/gravis). The .ipynb file can be found [here](https://github.com/robert-haas/gravis/tree/master/examples).

It shows how a **network of actors and movies** can be visualized as a bipartite graph, i.e. a graph with two types of nodes (here actor nodes and movie nodes, visually distinguished by color) in which edges only connect nodes of different types. The data is fetched from **Wikidata** with **SPARQL** (a query language for RDF data) and describes the relations between actors and the movies they appeared in. Wikidata's coverage is incomplete, so many entries are missing.
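
To make the bipartite property concrete, here is a minimal sketch in plain Python. The edge list and the helper name `is_bipartite_by_type` are made up for illustration and are not part of gravis, NetworkX, or the Wikidata result. It only checks that every edge connects one movie node and one actor node.

```python
# Toy data (hypothetical, for illustration only): each edge links a movie to an actor.
edges = [
    ('Movie A', 'Actor: X'),
    ('Movie A', 'Actor: Y'),
    ('Movie B', 'Actor: Y'),
]

# Node type lookup, mirroring the two node types used in this notebook.
node_type = {}
for movie, actor in edges:
    node_type[movie] = 'Movie'
    node_type[actor] = 'Actor'


def is_bipartite_by_type(edges, node_type):
    """Return True if every edge connects two nodes of different types."""
    return all(node_type[u] != node_type[v] for u, v in edges)


print(is_bipartite_by_type(edges, node_type))
```

Adding an edge between two movies (or two actors) would make the check fail, which is exactly what the graph construction in this notebook avoids.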

## References

- [Wikidata](https://www.wikidata.org)
  - [Glossary](https://www.wikidata.org/wiki/Wikidata:Glossary)
  - [SPARQL tutorial](https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial)
  - [Query examples 1](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries)
  - [Query examples 2](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)
  - [Example: Characters portrayed by most actors](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Characters_portrayed_by_most_actors)
  - [List of properties](https://www.wikidata.org/wiki/Wikidata:List_of_properties)

- Other
  - [Tutorial: Where do Mayors Come From: Querying Wikidata with Python and SPARQL](https://janakiev.com/blog/wikidata-mayors/)
  - [Your First SPARQL Query](https://docs.data.world/tutorials/sparql/Your_First_Sparql_Query.html)

- Used here
  - Property P18: [image](https://www.wikidata.org/wiki/Property:P18)
  - Property P161: [cast member](https://www.wikidata.org/wiki/Property:P161)
  - Property P453: [character role](https://www.wikidata.org/wiki/Property:P453)

In [None]:
import random
import string

import gravis as gv
import networkx as nx
import requests

## Data generation: Fetch data from Wikidata with a SPARQL query

Goal: Fetch data about actors and movies from Wikidata in order to create a bipartite network of actor-movie relations.

In [None]:
def fetch_data(num_tries):
    url = 'https://query.wikidata.org/sparql'
    query = """
    SELECT ?filmLabel ?actorLabel ?characterLabel ?filmImage ?actorImage
    WHERE {
      ?film p:P161 [
        ps:P161 ?actor;
        pq:P453 ?character
      ].
      # Both patterns share one OPTIONAL group, so the images are only bound
      # when the film and the actor each have one.
      OPTIONAL{
        ?film wdt:P18 ?filmImage.    # film / has image / filmImage
        ?actor wdt:P18 ?actorImage.  # actor / has image / actorImage
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 100000
    """
    for i in range(num_tries):
        try:
            print('Try number {}'.format(i+1))
            # Use a fresh random User-Agent string for each attempt
            random_string = ''.join(random.choice(string.ascii_letters) for _ in range(20))
            headers = {'User-Agent': random_string}
            params = {'format': 'json', 'query': query}
            response = requests.get(url, headers=headers, params=params)
            data = response.json()
            break
        except Exception as exc:
            # Network errors and non-JSON responses (e.g. timeouts) end up here
            print('Attempt failed:', exc)
    else:
        # The loop finished without a successful break
        raise ValueError('Data fetching failed.')
    return data


data = fetch_data(num_tries=5)
print('Number of items:', len(data['results']['bindings']))
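
The query service returns results in the SPARQL JSON results format: each entry of `results.bindings` maps a variable name to a dictionary with a `type` and a `value`, and a variable that was only matched inside an `OPTIONAL` block can be absent from an entry. The sketch below shows one way to read such entries defensively. The response is hand-written mock data, not real Wikidata output, and the helper `binding_value` is a hypothetical name introduced here.

```python
# Hand-written mock response in the SPARQL JSON results format (not real Wikidata output).
mock_data = {
    'head': {'vars': ['filmLabel', 'actorLabel', 'characterLabel', 'actorImage']},
    'results': {
        'bindings': [
            {
                'filmLabel': {'type': 'literal', 'value': 'Movie A'},
                'actorLabel': {'type': 'literal', 'value': 'X'},
                'characterLabel': {'type': 'literal', 'value': 'Hero'},
                'actorImage': {'type': 'uri', 'value': 'http://example.org/x.jpg'},
            },
            {
                # No image was matched here, so the optional variable is missing.
                'filmLabel': {'type': 'literal', 'value': 'Movie B'},
                'actorLabel': {'type': 'literal', 'value': 'Y'},
                'characterLabel': {'type': 'literal', 'value': 'Villain'},
            },
        ]
    },
}


def binding_value(item, name, default=None):
    """Hypothetical helper: read one variable, tolerating missing optional variables."""
    entry = item.get(name)
    return default if entry is None else entry['value']


rows = [
    (
        binding_value(item, 'filmLabel'),
        binding_value(item, 'actorLabel'),
        binding_value(item, 'actorImage'),
    )
    for item in mock_data['results']['bindings']
]
print(rows)
```

The graph construction in this notebook only reads the label variables, which the query always binds, so direct indexing such as `item['filmLabel']['value']` is sufficient there.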

## Create a bipartite graph of actors and movies

In [None]:
graph = nx.Graph()

for item in data['results']['bindings']:
    movie = item['filmLabel']['value']
    # The prefix keeps actor names from colliding with identical movie titles
    actor = 'Actor: ' + item['actorLabel']['value']
    character = item['characterLabel']['value']  # fetched, but not used in the graph

    # Node type 1: Movie (red)
    graph.add_node(movie)
    node = graph.nodes[movie]
    node['type'] = 'Movie'
    node['color'] = 'red'
    node['label_color'] = 'red'

    # Node type 2: Actor (black)
    graph.add_node(actor)
    node = graph.nodes[actor]
    node['type'] = 'Actor'
    node['color'] = 'black'

    # Edge between different node types
    graph.add_edge(movie, actor)

print('Number of nodes:', len(graph.nodes))
print('Number of edges:', len(graph.edges))

In [None]:
def add_properties(graph):
    # Scale the node size with the degree so that well-connected nodes appear larger
    for node, degree in graph.degree():
        graph.nodes[node]['size'] = 10.0 + degree / 10.0

add_properties(graph)

## Plot filtered versions of the large graph

### Filter 1: Egocentric network (=neighborhood of a selected node)

- GSU library: [Ego network](https://research.library.gsu.edu/c.php?g=916490&p=6612505)
- Science direct topic: [Egocentric network](https://www.sciencedirect.com/topics/computer-science/egocentric-network)

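Focus on one actor (the "ego") and show the nodes around it (the "alters") together with all edges among them. Because the graph is bipartite, an actor's direct neighbors are movies, so a radius of 2 is used below to also include the co-actors in those movies.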
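
As a rough illustration of which nodes an ego network with a given radius contains, the sketch below collects all nodes within a number of hops from the ego using breadth-first search. The adjacency dictionary is toy data and `nodes_within_radius` is a hypothetical helper written for this example. The notebook itself relies on `nx.ego_graph`, which additionally returns the edges among the selected nodes.

```python
from collections import deque

# Toy bipartite adjacency (hypothetical data): movies link to actors and vice versa.
adjacency = {
    'Actor: X': {'Movie A'},
    'Actor: Y': {'Movie A', 'Movie B'},
    'Actor: Z': {'Movie B'},
    'Movie A': {'Actor: X', 'Actor: Y'},
    'Movie B': {'Actor: Y', 'Actor: Z'},
}


def nodes_within_radius(adjacency, ego, radius):
    """Return the set of nodes reachable from ego in at most `radius` hops."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the radius
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


print(sorted(nodes_within_radius(adjacency, 'Actor: X', radius=1)))
print(sorted(nodes_within_radius(adjacency, 'Actor: X', radius=2)))
```

In this toy graph, a radius of 1 reaches only the ego's movie, while a radius of 2 also reaches the co-actor who shares that movie.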

In [None]:
ego = 'Actor: Anthony Hopkins'

ego_graph = nx.ego_graph(graph, ego, radius=2)
ego_graph.nodes[ego]['shape'] = 'rectangle'
ego_graph.nodes[ego]['color'] = 'green'
ego_graph.nodes[ego]['label_color'] = 'green'

print('Number of nodes:', len(ego_graph.nodes))
print('Number of edges:', len(ego_graph.edges))

In [None]:
gv.d3(ego_graph, node_hover_neighborhood=True, zoom_factor=0.3, node_label_size_factor=0.5)

### Filter 2: Only well-connected nodes with degree >= n 

Show only actors who appear in at least n movies, together with every movie that features at least one such actor.
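
The two filtering steps can be sketched on a small example with plain Python. The edge list is made-up toy data and the variable names are chosen for this illustration only. It assumes each (movie, actor) pair occurs once, as is the case for the edges of an undirected `nx.Graph`.

```python
from collections import Counter

# Toy edge list (hypothetical data): unique (movie, actor) pairs.
edges = [
    ('Movie A', 'Actor: X'),
    ('Movie B', 'Actor: X'),
    ('Movie B', 'Actor: Y'),
    ('Movie C', 'Actor: Y'),
    ('Movie C', 'Actor: Z'),
    ('Movie D', 'Actor: W'),
]
n = 2

# Step 1: keep only actors that appear in at least n movies.
movies_per_actor = Counter(actor for _, actor in edges)
kept_actors = {actor for actor, count in movies_per_actor.items() if count >= n}

# Step 2: keep only edges to the remaining actors; movies without such an edge drop out.
kept_edges = [(movie, actor) for movie, actor in edges if actor in kept_actors]
kept_movies = {movie for movie, _ in kept_edges}

print(sorted(kept_actors))
print(sorted(kept_movies))
```

Here the actors with a single movie are removed in step 1, and the movie whose only actor was removed disappears in step 2.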

In [None]:
n = 10
filtered_graph = graph.copy()

# Step 1: Remove actors that appear in fewer than n movies
to_remove = [node for node, degree in graph.degree()
             if (degree < n and graph.nodes[node]['type'] == 'Actor')]
filtered_graph.remove_nodes_from(to_remove)

# Step 2: Remove movies that are left without any remaining actor
to_remove = [node for node, degree in filtered_graph.degree() if degree < 1]
filtered_graph.remove_nodes_from(to_remove)

print('Number of nodes:', len(filtered_graph.nodes))
print('Number of edges:', len(filtered_graph.edges))

In [None]:
# Use a precalculated layout: Fruchterman-Reingold
layout = nx.spring_layout(filtered_graph, iterations=60, scale=5000)
for node_id, (x, y) in layout.items():
    node = filtered_graph.nodes[node_id]
    node['x'] = x
    node['y'] = y

In [None]:
# Plot it with vis.js as a raster image on a canvas (less load on the browser than an SVG image from d3.js)
gv.vis(filtered_graph, node_hover_neighborhood=True, layout_algorithm_active=False, large_graph_threshold=10e10)