{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neighboring country graph\n", "\n", "This Jupyter notebook provides an example of using the Python package [gravis](https://pypi.org/project/gravis). The .ipynb file can be found [here](https://github.com/robert-haas/gravis/tree/master/examples).\n", "\n", "It uses geographical data extracted from a Wikipedia page to represent **countries and their neighboring countries** as an undirected graph, where each country is a node and each edge is a neighborhood relation. Additionally, data from Gapminder is used to show **national indicators and statistics** such as GDP, life expectancy or child mortality as node properties (e.g. size, color, shape).\n", "\n", "## References\n", "\n", "- Wikipedia: [List of countries and territories by land and maritime borders](https://en.wikipedia.org/wiki/List_of_countries_and_territories_by_land_and_maritime_borders)\n", "- Gapminder: [Bulk data](https://www.gapminder.org/data/) = all indicators displayed in Gapminder World" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "\n", "import gravis as gv\n", "import networkx as nx\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data fetching\n", "\n", "### Load countries and their neighboring countries\n", "\n", "The data was previously fetched from a Wikipedia page with another notebook in the data directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filepath = os.path.join('data', 'neighboring_countries.json')\n", "with open(filepath) as f:\n", "    country_data = json.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load national indicators and statistics\n", "\n", "The data was previously downloaded by hand from the Gapminder website." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filepath = os.path.join('data', 'country_statistics_gapminder.csv')\n", "df = pd.read_csv(filepath)\n", "df = df.fillna(0)\n", "\n", "indicator_name = list(df.columns)\n", "indicator_data = df.values.T.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map country names from the Gapminder dataset onto those in the Wikipedia dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gapminder_to_wikipeda_name = {\n", "    \"Cote d'Ivoire\": \"Côte d'Ivoire\",\n", "    \"Congo, Dem. Rep.\": \"Democratic Republic of the Congo\",\n", "    \"Congo, Rep.\": \"Republic of the Congo\",\n", "    \"Gambia\": \"The Gambia\",\n", "    \"Holy See\": \"Vatican City\",\n", "    \"Kyrgyz Republic\": \"Kyrgyzstan\",\n", "    \"Lao\": \"Laos\",\n", "    \"Micronesia, Fed. Sts.\": \"Federated States of Micronesia\",\n", "    \"Sao Tome and Principe\": \"São Tomé and Príncipe\",\n", "    \"Slovak Republic\": \"Slovakia\",\n", "    \"St. Kitts and Nevis\": \"Saint Kitts and Nevis\",\n", "    \"St. Lucia\": \"Saint Lucia\",\n", "    \"St. 
Vincent and the Grenadines\": \"Saint Vincent and the Grenadines\",\n", "    \"Swaziland\": \"Eswatini (Swaziland)\",\n", "    \"Timor-Leste\": \"East Timor\",\n", "}\n", "# Comments on others:\n", "# Kyrgyzstan: \"Kyrgyz Republic\" is the official name according to wiki\n", "# Macedonia: Seems to be a part of Greece according to wiki, not a country itself but a region\n", "# Abkhazia: self-declared sovereign state in the South Caucasus, Georgian–Abkhazian conflict\n", "# Adélie Land: a claimed territory on the continent of Antarctica, France\n", "# American Samoa: an unincorporated territory of the United States, southeast of Samoa\n", "# Akrotiri and Dhekelia: a British Overseas Territory on the island of Cyprus\n", "# Anguilla: a British overseas territory in the Caribbean\n", "# Guam is the largest region of Micronesia\n", "\n", "indicator_data[0] = [gapminder_to_wikipeda_name.get(c, c) for c in indicator_data[0]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "graph = nx.Graph()\n", "known_edges = set()\n", "for source, targets in country_data.items():\n", "    for target in targets:\n", "        if (target, source) not in known_edges:\n", "            known_edges.add((source, target))\n", "            graph.add_edge(source, target)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate graph properties" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def detect_communities(graph, num_communities):\n", "    community_generator = nx.algorithms.community.girvan_newman(graph)\n", "    # Each step of the generator yields a partition with one more community than the previous one\n", "    for i in range(num_communities-2):\n", "        communities = next(community_generator)\n", "    return communities\n", "\n", "\n", "def assign_node_color_by_community(graph, communities, colors=None):\n", "    if colors is None:\n", "        colors = [\"blue\", \"orange\", \"green\", \"red\", \"darkviolet\",\n", "                  \"brown\", \"pink\", \"gray\", 
\"yellowgreen\", \"lightblue\", \"cyan\"]\n", "    for community_number, community in enumerate(communities):\n", "        for member in community:\n", "            graph.nodes[member][\"color\"] = colors[community_number % len(colors)]\n", "    return graph\n", "\n", "\n", "def assign_node_position_by_community(graph, communities):\n", "    x_shift = -1000\n", "    y_shift = -600\n", "    for community_number, community in enumerate(communities):\n", "        sorted_community_members = sorted(list(community), key=lambda name: graph.nodes[name][\"degree\"])\n", "        for member_number, member in enumerate(sorted_community_members):\n", "            graph.nodes[member][\"x\"] = x_shift + member_number * 50\n", "            graph.nodes[member][\"y\"] = y_shift + community_number * 100\n", "    return graph\n", "\n", "\n", "def assign_node_size_by_degree(graph):\n", "    for node_id in graph.nodes:\n", "        graph.nodes[node_id][\"degree\"] = 5 + graph.degree[node_id]\n", "    return graph\n", "\n", "\n", "def assign_edge_size_by_centrality(graph):\n", "    edge_centralities = nx.algorithms.centrality.edge_betweenness_centrality(graph)\n", "    #edge_centralities = nx.algorithms.centrality.edge_current_flow_betweenness_centrality(graph)\n", "    for edge_id, centrality_value in edge_centralities.items():\n", "        graph.edges[edge_id][\"centrality\"] = 0.2 + centrality_value * 40\n", "    return graph\n", "\n", "\n", "communities = detect_communities(graph, 12)\n", "graph = assign_node_size_by_degree(graph)\n", "graph = assign_edge_size_by_centrality(graph)\n", "graph = assign_node_color_by_community(graph, communities)\n", "graph = assign_node_position_by_community(graph, communities)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add node properties from Gapminder data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "countries = indicator_data[0]\n", "\n", "for i, indicator in enumerate(indicator_name):\n", "    if i == 0:\n", "        continue\n", "    for country, number in zip(countries, 
indicator_data[i]):\n", "        if country in ['Macedonia, FYR']:\n", "            continue\n", "        graph.nodes[country][indicator] = number" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot country graph\n", "\n", "Notes on how you can interact with the plot:\n", "\n", "- The node positions are initially fixed (the y value is determined by the community, the x value by the node's degree rank within its community). Individual nodes can be released by dragging them. All nodes can be released in the `Nodes` menu with the `Release fixed nodes` button.\n", "- The node sizes are initially determined by the degree of each node, i.e. the number of edges it has. The `Data selection` menu has a drop-down menu called `Node size` where, for example, `Life expectancy` can be chosen so that the node size reflects the life expectancy of each country. Furthermore, the sizes can be normalized so that, rather than using the raw value, each size is scaled to lie between a given minimum and maximum for better visual appearance." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "node_size_data_sources = [\n", "    'degree',\n", "    'Median age [Years] 2020',\n", "    'Babies per woman 2018',\n", "    'Child mortality [0-5 year olds dying per 1000 born] 2018',\n", "]\n", "\n", "for source in node_size_data_sources:\n", "    print()\n", "    print(source)\n", "    fig = gv.d3(\n", "        graph, zoom_factor=0.33, use_centering_force=False,\n", "        node_hover_neighborhood=True, node_label_rotation=15,\n", "        node_size_data_source=source, use_node_size_normalization=True, node_size_normalization_max=50,\n", "        edge_size_data_source='centrality')\n", "    fig.display(inline=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }