{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Actor-Movie relations from Wikidata\n",
    "\n",
    "This Jupyter notebook provides an example of using the Python package [gravis](https://pypi.org/project/gravis). The .ipynb file can be found [here](https://github.com/robert-haas/gravis/tree/master/examples).\n",
    "\n",
    "It shows how a **network of actors and movies** can be visualized as bipartite graph (=a graph with two types of nodes, where actor nodes and movie nodes, visually distinguished by color). The data is fetched from **Wikidata** with **SPARQL** (a data query language) and describes the relations between actors and movies they participated in (many entries are missing).\n",
    "\n",
    "## References\n",
    "\n",
    "- [Wikidata](https://www.wikidata.org)\n",
    "    - [Glossary](https://www.wikidata.org/wiki/Wikidata:Glossary)\n",
    "    - [SPARQL tutorial](https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial)\n",
    "    - [Query examples 1](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries)\n",
    "    - [Query examples 2](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples)\n",
    "        - [Example: Characters portrayed by most actors](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Characters_portrayed_by_most_actors)\n",
    "    - [List of properties](https://www.wikidata.org/wiki/Wikidata:List_of_properties)\n",
    "\n",
    "- Other\n",
    "    - [Tutorial: Where do Mayors Come From: Querying Wikidata with Python and SPARQL](https://janakiev.com/blog/wikidata-mayors/)\n",
    "    - [Your First SPARQL Query](https://docs.data.world/tutorials/sparql/Your_First_Sparql_Query.html)\n",
    "\n",
    "- Used here\n",
    "    - Property P18: [image](https://www.wikidata.org/wiki/Property:P18)\n",
    "    - Property P161: [cast member](https://www.wikidata.org/wiki/Property:P161)\n",
    "    - Property P453: [character role](https://www.wikidata.org/wiki/Property:P453)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import string\n",
    "\n",
    "import gravis as gv\n",
    "import networkx as nx\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data generation: Fetch data from Wikidata with a SPARQL query\n",
    "\n",
    "Goal: Fetch data about actors and movies from Wikidata in order to create a bipartite network  of actor-movie relations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fetch_data(num_tries):\n",
    "    url = 'https://query.wikidata.org/sparql'\n",
    "    query = \"\"\"\n",
    "        SELECT ?filmLabel ?actorLabel ?characterLabel ?actorImage ?movieImage ?characterImage\n",
    "        WHERE {\n",
    "          ?film p:P161 [\n",
    "            ps:P161 ?actor;\n",
    "            pq:P453 ?character\n",
    "          ].\n",
    "          OPTIONAL{\n",
    "            ?film wdt:P18 ?filmImage.    # film / has image / filmImage\n",
    "            ?actor wdt:P18 ?actorImage.  # actor / has image / actorImage\n",
    "          }\n",
    "          SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n",
    "        }\n",
    "        LIMIT 100000\n",
    "    \"\"\"\n",
    "    for i in range(num_tries):\n",
    "        try:\n",
    "            print('Try number {}'.format(i+1))\n",
    "            random_string = ''.join(random.choice(string.ascii_letters) for i in range(20))\n",
    "            headers = {'User-Agent': random_string}\n",
    "            params = {'format': 'json', 'query': query}\n",
    "            response = requests.get(url, headers=headers, params=params)\n",
    "            print(response.text)\n",
    "            data = response.json()\n",
    "            break\n",
    "        except Exception:\n",
    "            pass\n",
    "    else:\n",
    "        raise ValueError('Data fetching failed.')\n",
    "    return data\n",
    "\n",
    "\n",
    "data = fetch_data(num_tries=5)\n",
    "print('Number of items:', len(data['results']['bindings']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a bipartite graph of actors and movies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "graph = nx.Graph()\n",
    "\n",
    "for item in data['results']['bindings']:\n",
    "    movie = item['filmLabel']['value']\n",
    "    actor = 'Actor: ' +item['actorLabel']['value']\n",
    "    character = item['characterLabel']['value']\n",
    "    \n",
    "    # Node type 1: Movie (red)\n",
    "    graph.add_node(movie)\n",
    "    node = graph.nodes[movie]\n",
    "    node['type'] = 'Movie'\n",
    "    node['color'] = 'red'\n",
    "    node['label_color'] = 'red'\n",
    "    \n",
    "    # Node type 2: Actor (black)\n",
    "    graph.add_node(actor)\n",
    "    node = graph.nodes[actor]\n",
    "    node['type'] = 'Actor'\n",
    "    node['color'] = 'black'\n",
    "    \n",
    "    # Edge between different node types\n",
    "    graph.add_edge(movie, actor)\n",
    "\n",
    "print('Number of nodes:', len(graph.nodes))\n",
    "print('Number of edges:', len(graph.edges))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_properties(graph):\n",
    "    for node, degree in graph.degree():\n",
    "        graph.nodes[node]['size'] = 10.0 + degree / 10.0\n",
    "\n",
    "add_properties(graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot filtered versions of the large graph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Filter 1: Egocentric network (=neighborhood of a selected node)\n",
    "\n",
    "- GSU library: [Ego network](https://research.library.gsu.edu/c.php?g=916490&p=6612505)\n",
    "- Science direct topic: [Egocentric network](https://www.sciencedirect.com/topics/computer-science/egocentric-network)\n",
    "\n",
    "Focus on an actor (\"ego\") and show all edges to his direct neighbors (\"alters\") and between them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ego = 'Actor: Anthony Hopkins'\n",
    "\n",
    "ego_graph = nx.ego_graph(graph, ego, radius=2)\n",
    "ego_graph.nodes[ego]['shape'] = 'rectangle'\n",
    "ego_graph.nodes[ego]['color'] = 'green'\n",
    "ego_graph.nodes[ego]['label_color'] = 'green'\n",
    "\n",
    "print('Number of nodes:', len(ego_graph.nodes))\n",
    "print('Number of edges:', len(ego_graph.edges))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gv.d3(ego_graph, node_hover_neighborhood=True, zoom_factor=0.3, node_label_size_factor=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Filter 2: Only well-connected nodes with degree >= n \n",
    "\n",
    "Show only actors that play in at least n movies and each movie with at least one such actor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 10\n",
    "filtered_graph = graph.copy()\n",
    "\n",
    "# Step 1\n",
    "to_remove = [node for node, degree in graph.degree()\n",
    "             if (degree < n and graph.nodes[node]['type'] == 'Actor')]\n",
    "filtered_graph.remove_nodes_from(to_remove)\n",
    "\n",
    "# Step 2\n",
    "to_remove = [node for node, degree in filtered_graph.degree() if degree < 1]\n",
    "filtered_graph.remove_nodes_from(to_remove)\n",
    "\n",
    "print('Number of nodes:', len(filtered_graph.nodes))\n",
    "print('Number of edges:', len(filtered_graph.edges))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use a precalculated layout: Fruchterman-Reingold\n",
    "layout = nx.spring_layout(filtered_graph, iterations=60, scale=5000)\n",
    "for node_id, (x, y) in layout.items():\n",
    "    node = filtered_graph.nodes[node_id]\n",
    "    node['x'] = x\n",
    "    node['y'] = y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot it with vis.js as raster image on a canvas (less load on browser than d3.js SVG image)\n",
    "gv.vis(filtered_graph, node_hover_neighborhood=True, layout_algorithm_active=False, large_graph_threshold=10e10)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}