CKG¶

This notebook explores the biomedical knowledge graph provided by the project Clinical Knowledge Graph (CKG): Publication (2023), Website, Code, Data

The source file of this notebook is ckg.ipynb and can be found in the repository awesome-biomedical-knowledge-graphs that also contains information about similar projects.

Table of contents¶

  1. Setup
  2. Data download
  3. Data extraction
  4. Data import
  5. Data inspection
  6. Schema discovery
  7. Knowledge graph reconstruction
  8. Subgraph exploration

1. Setup¶

This section prepares the environment for the following exploratory data analysis.

a) Import packages¶

From the Python standard library.

In [1]:
import json
import os

From the Python Package Index (PyPI).

In [2]:
import dask.dataframe as dd
import gravis as gv
import igraph as ig
In [3]:
# A Docker installation is required to load and convert the Neo4j database inside a suitable container
import docker

From a local Python module named shared_bmkg.py. The functions in it are used in several similar notebooks to reduce code repetition and to improve readability.

In [4]:
import shared_bmkg

b) Create data directories¶

The raw data provided by the project and the transformed data generated throughout this notebook are stored in separate directories. If the notebook is run more than once, the downloaded data is reused instead of fetching it again, but all data transformations are rerun.

In [5]:
project_name = "ckg"
download_dir = os.path.join(project_name, "downloads")
results_dir = os.path.join(project_name, "results")

shared_bmkg.create_dir(download_dir)
shared_bmkg.create_dir(results_dir)
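
The reuse behaviour described above depends on directory creation being idempotent. The source of `shared_bmkg.create_dir` is not shown in this notebook, so the following is only a minimal sketch of how such a helper could behave; the name `ensure_dir` is hypothetical and not part of `shared_bmkg`.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and missing parents) if absent; return True if it was newly created."""
    already_there = os.path.isdir(path)
    # exist_ok=True makes repeated calls harmless, so existing downloads stay untouched
    os.makedirs(path, exist_ok=True)
    return not already_there


# Demo in a throwaway location: the second call finds the directory and changes nothing
demo_dir = os.path.join(tempfile.mkdtemp(), "ckg", "downloads")
print(ensure_dir(demo_dir))
print(ensure_dir(demo_dir))
```

Because the second call is a no-op, rerunning the notebook would leave previously downloaded files in place.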

2. Data download¶

This section fetches the data published by the project on Mendeley Data. The latest available version at the time of creating this notebook was used: Version 3 (2021-08-17).

All files provided by the project¶

  • ckg_latest_4.2.3.dump: Neo4j graph database with nodes, edges and annotations.
  • data.zip: Data about experimental studies presented in the publication.

Files needed to create the knowledge graph¶

  • ckg_latest_4.2.3.dump contains all information required for reconstructing the knowledge graph.
In [6]:
download_specification = [
    ("ckg_latest_4.2.3.dump", "https://data.mendeley.com/public-files/datasets/mrcf7f4tc2/files/ffaab45e-e15c-412d-b63b-5df681a2e303/file_downloaded", "eebe895e2e9f9fc39f3663fbcea032d5"),
    ("data.zip", "https://data.mendeley.com/public-files/datasets/mrcf7f4tc2/files/69de0ef6-6e71-4d8e-8fbc-b933b9fc4dce/file_downloaded", "1bbd8be31efcd559244b26f474d9ad59"),

    # APOC library for Neo4j to enable exports of the knowledge graph in CSV format
    ("apoc-4.4.0.24-all.jar", "https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.4.0.24/apoc-4.4.0.24-all.jar", "7c6a702322b0aaf663c25f378cd3494d"),
]

for filename, url, md5 in download_specification:
    filepath = os.path.join(download_dir, filename)
    shared_bmkg.fetch_file(url, filepath)
    shared_bmkg.validate_file(filepath, md5)
    print()
Found a full local copy of "ckg/downloads/ckg_latest_4.2.3.dump".
MD5 checksum is correct.

Found a full local copy of "ckg/downloads/data.zip".
MD5 checksum is correct.

Found a full local copy of "ckg/downloads/apoc-4.4.0.24-all.jar".
MD5 checksum is correct.
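
The checksum step above can be illustrated with the standard library alone. This is a sketch under the assumption that validation means comparing a file's MD5 hex digest with the expected string; the actual implementation of `shared_bmkg.validate_file` is not shown here, and the names `md5_of_file` and `has_expected_md5` are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(filepath, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_expected_md5(filepath, expected_md5):
    """Compare the file's digest with the expected hex string, ignoring letter case."""
    return md5_of_file(filepath) == expected_md5.lower()


# Demo with a small temporary file instead of the multi-gigabyte dump
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"hello")
print(has_expected_md5(demo_path, hashlib.md5(b"hello").hexdigest()))
```

Reading in chunks matters for files as large as the database dump, which would otherwise have to fit into memory at once.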

3. Data extraction¶

This section extracts the raw data from the Neo4j database into two CSV files. It uses a Docker container with a specific Neo4j and APOC installation and configuration that can load the database dump file and export its contents. Finding a configuration that works with this particular dump file took some trial and error.

In [7]:
def check_docker_installation():
    try:
        client = docker.from_env()
        
        # Pull the hello-world image
        client.images.pull('hello-world')
        
        # Run the hello-world container
        container = client.containers.run('hello-world', detach=True)
        container.wait()  # Block until the container has exited so its logs are complete
        output = container.logs()
        
        # Check the output for the expected message
        if b"Hello from Docker!" in output:
            print("Docker is installed and running properly. The hello-world container output is correct.")
            success = True
        else:
            print("Docker is installed but the hello-world container output is incorrect.")
            success = False
        
        # Clean-up: Remove the container and image
        container.remove()
        client.images.remove('hello-world')
        return success
    except (docker.errors.DockerException, docker.errors.ImageNotFound, docker.errors.ContainerError) as e:
        print(f"Docker is not installed or not running properly: {e}")
        return False


check_docker_installation()
Docker is installed and running properly. The hello-world container output is correct.
Out[7]:
True
In [8]:
def run_docker_command(container, command, verbose):
    # Report the command
    if verbose:
        print('Running command:', command)

    # Run the command
    exit_code, output = container.exec_run(command)

    # Report the result
    if verbose:
        print(' Worked!' if exit_code == 0 else f' Failed! Exit code: {exit_code}')
        try:
            output = output.decode()
            output = output.strip()
        except Exception:
            pass
        if output:
            print(' Output:', output)
        print()


def convert_neo4j_dump_to_csv_files(download_dir, results_dir, filename_db, filename_apoc, verbose=False):
    # Install APOC and load the export functions: https://neo4j.com/docs/apoc/5/installation
    # Use the export functions: https://neo4j.com/labs/apoc/4.4/overview/apoc.export

    # Construct source filepaths
    filepath_db_src = os.path.join(download_dir, filename_db)
    filepath_apoc_src = os.path.join(download_dir, filename_apoc)

    # Create a temporary destination directory
    dirname_dst = "tempdir_for_docker"
    dirpath_dst = os.path.abspath(os.path.join(download_dir, dirname_dst))
    os.makedirs(dirpath_dst, exist_ok=True)

    # Create the destination filepaths
    filepath_db_dst = os.path.join(dirpath_dst, "neo4j.dump")
    filepath_apoc_dst = os.path.join(dirpath_dst, filename_apoc)

    # Move files from source to destination directory
    os.rename(filepath_db_src, filepath_db_dst)
    os.rename(filepath_apoc_src, filepath_apoc_dst)

    # Fetch and run the Neo4j v4.4 Docker container
    client = docker.from_env()
    if verbose:
        print('Starting Docker container:', end=' ')
    container = client.containers.run(
        "neo4j:4.4",
        detach=True,
        volumes={dirpath_dst: {'bind': '/mnt', 'mode': 'rw'}},
        working_dir='/mnt',
        entrypoint='tail -f /dev/null',
    )
    if verbose:
        print(container)
        print()

    # Convert the .dump file to a .csv file
    try:
        commands = [
            # Configure Neo4j for the import
            '''sh -c "echo 'dbms.allow_upgrade=true' >> /var/lib/neo4j/conf/neo4j.conf"''',
            '''sh -c "echo 'dbms.security.procedures.allowlist=apoc.*' >> /var/lib/neo4j/conf/neo4j.conf"''',
            '''sh -c "echo 'dbms.security.procedures.unrestricted=apoc.*' >> /var/lib/neo4j/conf/neo4j.conf"''',
            '''sh -c "echo 'apoc.export.file.enabled=true' >> /var/lib/neo4j/conf/neo4j.conf"''',
            '''sh -c "neo4j-admin set-initial-password correcthorsebatterystaple"''',
            # Make APOC extension available to Neo4j
            f'''sh -c "cp /mnt/{filename_apoc} /var/lib/neo4j/plugins"''',
            # Import the .dump file
            f'''sh -c "neo4j-admin load --from=/mnt/neo4j.dump --database=neo4j --force"''',
            # Start the database
            '''sh -c "rm -rf /var/lib/neo4j/logs && neo4j start"''',
            # Wait so it is ready
            '''sh -c "sleep 60"''',
            # Export a .csv file for nodes
            '''cypher-shell -u neo4j -p correcthorsebatterystaple -d neo4j "CALL apoc.export.csv.query('MATCH (n) RETURN id(n) as id, labels(n)[0] as type, properties(n) as properties', 'nodes.csv', {})"''',
            # Export a .csv file for edges
            '''cypher-shell -u neo4j -p correcthorsebatterystaple -d neo4j "CALL apoc.export.csv.query('MATCH ()-[r]->() RETURN id(startNode(r)) as source_id, id(endNode(r)) as target_id, type(r) as type, properties(r) as properties', 'edges.csv', {})"''',
            # Move the .csv files to the mounted directory to make them available outside the container
            '''sh -c "mv /var/lib/neo4j/import/nodes.csv /mnt"''',
            '''sh -c "mv /var/lib/neo4j/import/edges.csv /mnt"''',
        ]
        for cmd in commands:
            run_docker_command(container, cmd, verbose)
    except Exception as e:
        print(e)
    finally:
        if container:
            if verbose:
                print('Stopping and removing the container')
            container.stop()
            container.remove()

        # Move the original files from temporary destination to source directory
        os.rename(filepath_db_dst, filepath_db_src)
        os.rename(filepath_apoc_dst, filepath_apoc_src)

        # Move the new .csv files from temporary destination to results directory
        os.rename(os.path.join(dirpath_dst, "nodes.csv"), os.path.join(results_dir, "nodes.csv"))
        os.rename(os.path.join(dirpath_dst, "edges.csv"), os.path.join(results_dir, "edges.csv"))

        # Delete the temporary destination directory
        os.rmdir(dirpath_dst)
    return container
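
The command list above waits a fixed 60 seconds for Neo4j to start. One alternative is to poll until the database answers. The following is a minimal sketch of such a polling loop, not the approach used in this notebook; `wait_until` and `fake_probe` are hypothetical names, and the demo uses a stand-in probe so it runs without Docker.

```python
import time


def wait_until(probe, timeout=120.0, interval=2.0):
    """Call `probe()` repeatedly until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# In the notebook's setting, `probe` could wrap a cheap query, e.g. (untested assumption):
#   lambda: container.exec_run('cypher-shell -u neo4j -p <password> -d neo4j "RETURN 1"')[0] == 0

# Stand-in probe that only succeeds on the third attempt
attempts = {"n": 0}


def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3


print(wait_until(fake_probe, timeout=5.0, interval=0.01))
```

A polling loop returns as soon as the service is ready and reports a timeout explicitly, whereas a fixed sleep can be either too short or needlessly long.
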
In [9]:
%%time

filepath_nodes = os.path.join(results_dir, "nodes.csv")
filepath_edges = os.path.join(results_dir, "edges.csv")

if not (os.path.exists(filepath_nodes) and os.path.exists(filepath_edges)):
    print("Starting the conversion of the .dump file to two .csv files.")
    # Takes about 45 minutes to create nodes.csv and edges.csv
    container = convert_neo4j_dump_to_csv_files(
        download_dir,
        results_dir,
        "ckg_latest_4.2.3.dump",
        "apoc-4.4.0.24-all.jar",
        verbose=True,
    )
else:
    print(f"Found existing files {filepath_nodes} and {filepath_edges}. Using these instead of performing another conversion.")
Starting the conversion of the .dump file to two .csv files.
Starting Docker container: <Container: 8e4fa0a8be75>

Running command: sh -c "echo 'dbms.allow_upgrade=true' >> /var/lib/neo4j/conf/neo4j.conf"
 Worked!

Running command: sh -c "echo 'dbms.security.procedures.allowlist=apoc.*' >> /var/lib/neo4j/conf/neo4j.conf"
 Worked!

Running command: sh -c "echo 'dbms.security.procedures.unrestricted=apoc.*' >> /var/lib/neo4j/conf/neo4j.conf"
 Worked!

Running command: sh -c "echo 'apoc.export.file.enabled=true' >> /var/lib/neo4j/conf/neo4j.conf"
 Worked!

Running command: sh -c "neo4j-admin set-initial-password correcthorsebatterystaple"
 Worked!
 Output: Selecting JVM - Version:11.0.22+7, Name:OpenJDK 64-Bit Server VM, Vendor:Eclipse Adoptium
Changed password for user 'neo4j'. IMPORTANT: this change will only take effect if performed before the database is started for the first time.

Running command: sh -c "cp /mnt/apoc-4.4.0.24-all.jar /var/lib/neo4j/plugins"
 Worked!

Running command: sh -c "neo4j-admin load --from=/mnt/neo4j.dump --database=neo4j --force"
 Worked!
 Output: Selecting JVM - Version:11.0.22+7, Name:OpenJDK 64-Bit Server VM, Vendor:Eclipse Adoptium

Files: 1/185, data:  0.0%
...
Files: 10/185, data:  1.0%
...
Files: 14/185, data: 25.0%
...
Files: 22/185, data: 50.0%
...
Files: 26/185, data: 54.9%
Files: 26/185, data: 55.0%
Files: 26/185, data: 55.0%
Files: 26/185, data: 55.1%
Files: 26/185, data: 55.2%
Files: 26/185, data: 55.2%
Files: 26/185, data: 55.2%
Files: 26/185, data: 55.3%
Files: 26/185, data: 55.3%
Files: 26/185, data: 55.4%
Files: 26/185, data: 55.4%
Files: 26/185, data: 55.5%
Files: 26/185, data: 55.5%
Files: 26/185, data: 55.6%
Files: 26/185, data: 55.6%
Files: 26/185, data: 55.7%
Files: 26/185, data: 55.8%
Files: 26/185, data: 55.8%
Files: 26/185, data: 55.9%
Files: 26/185, data: 55.9%
Files: 26/185, data: 56.0%
Files: 26/185, data: 56.0%
Files: 26/185, data: 56.1%
Files: 26/185, data: 56.2%
Files: 26/185, data: 56.2%
Files: 26/185, data: 56.3%
Files: 26/185, data: 56.3%
Files: 26/185, data: 56.4%
Files: 26/185, data: 56.5%
Files: 26/185, data: 56.5%
Files: 26/185, data: 56.6%
Files: 26/185, data: 56.6%
Files: 26/185, data: 56.7%
Files: 26/185, data: 56.7%
Files: 26/185, data: 56.8%
Files: 26/185, data: 56.9%
Files: 26/185, data: 56.9%
Files: 26/185, data: 57.0%
Files: 26/185, data: 57.0%
Files: 26/185, data: 57.1%
Files: 26/185, data: 57.2%
Files: 26/185, data: 57.2%
Files: 26/185, data: 57.3%
Files: 26/185, data: 57.3%
Files: 26/185, data: 57.4%
Files: 26/185, data: 57.4%
Files: 26/185, data: 57.5%
Files: 26/185, data: 57.6%
Files: 26/185, data: 57.6%
Files: 26/185, data: 57.7%
Files: 26/185, data: 57.8%
Files: 26/185, data: 57.8%
Files: 26/185, data: 57.9%
Files: 26/185, data: 57.9%
Files: 26/185, data: 58.0%
Files: 26/185, data: 58.0%
Files: 26/185, data: 58.1%
Files: 26/185, data: 58.2%
Files: 26/185, data: 58.2%
Files: 26/185, data: 58.3%
Files: 26/185, data: 58.3%
Files: 26/185, data: 58.4%
Files: 26/185, data: 58.4%
Files: 26/185, data: 58.5%
Files: 26/185, data: 58.6%
Files: 26/185, data: 58.6%
Files: 26/185, data: 58.7%
Files: 26/185, data: 58.7%
Files: 26/185, data: 58.8%
Files: 26/185, data: 58.8%
Files: 26/185, data: 58.9%
Files: 26/185, data: 59.0%
Files: 26/185, data: 59.0%
Files: 26/185, data: 59.1%
Files: 26/185, data: 59.1%
Files: 26/185, data: 59.2%
Files: 26/185, data: 59.2%
Files: 26/185, data: 59.3%
Files: 26/185, data: 59.3%
Files: 26/185, data: 59.4%
Files: 26/185, data: 59.5%
Files: 26/185, data: 59.5%
Files: 26/185, data: 59.6%
Files: 26/185, data: 59.6%
Files: 26/185, data: 59.7%
Files: 26/185, data: 59.7%
Files: 26/185, data: 59.8%
Files: 26/185, data: 59.8%
Files: 26/185, data: 59.9%
Files: 26/185, data: 59.9%
Files: 26/185, data: 60.0%
Files: 26/185, data: 60.1%
Files: 26/185, data: 60.1%
Files: 26/185, data: 60.2%
Files: 26/185, data: 60.2%
Files: 26/185, data: 60.3%
Files: 26/185, data: 60.3%
Files: 26/185, data: 60.4%
Files: 26/185, data: 60.4%
Files: 26/185, data: 60.5%
Files: 26/185, data: 60.5%
Files: 26/185, data: 60.5%
Files: 26/185, data: 60.6%
Files: 26/185, data: 60.6%
Files: 26/185, data: 60.7%
Files: 26/185, data: 60.8%
Files: 26/185, data: 60.8%
Files: 26/185, data: 60.9%
Files: 26/185, data: 60.9%
Files: 26/185, data: 61.0%
Files: 26/185, data: 61.0%
Files: 26/185, data: 61.1%
Files: 26/185, data: 61.1%
Files: 26/185, data: 61.2%
Files: 26/185, data: 61.2%
Files: 26/185, data: 61.2%
Files: 26/185, data: 61.3%
Files: 26/185, data: 61.3%
Files: 26/185, data: 61.4%
Files: 26/185, data: 61.4%
Files: 26/185, data: 61.4%
Files: 26/185, data: 61.5%
Files: 26/185, data: 61.5%
Files: 26/185, data: 61.5%
Files: 26/185, data: 61.6%
Files: 26/185, data: 61.6%
Files: 26/185, data: 61.6%
Files: 26/185, data: 61.7%
Files: 26/185, data: 61.7%
Files: 26/185, data: 61.8%
Files: 26/185, data: 61.8%
Files: 26/185, data: 61.8%
Files: 26/185, data: 61.9%
Files: 26/185, data: 61.9%
Files: 26/185, data: 61.9%
Files: 26/185, data: 62.0%
Files: 26/185, data: 62.0%
Files: 26/185, data: 62.1%
Files: 26/185, data: 62.1%
Files: 26/185, data: 62.2%
Files: 26/185, data: 62.2%
Files: 26/185, data: 62.3%
Files: 26/185, data: 62.3%
Files: 26/185, data: 62.4%
Files: 26/185, data: 62.4%
Files: 26/185, data: 62.5%
Files: 26/185, data: 62.5%
Files: 26/185, data: 62.5%
Files: 26/185, data: 62.6%
Files: 26/185, data: 62.7%
Files: 26/185, data: 62.7%
Files: 26/185, data: 62.8%
Files: 26/185, data: 62.8%
Files: 26/185, data: 62.8%
Files: 26/185, data: 62.9%
Files: 26/185, data: 62.9%
Files: 26/185, data: 63.0%
Files: 26/185, data: 63.0%
Files: 26/185, data: 63.1%
Files: 26/185, data: 63.1%
Files: 26/185, data: 63.1%
Files: 26/185, data: 63.2%
Files: 26/185, data: 63.2%
Files: 26/185, data: 63.3%
Files: 26/185, data: 63.3%
Files: 26/185, data: 63.3%
Files: 26/185, data: 63.4%
Files: 26/185, data: 63.4%
Files: 26/185, data: 63.4%
Files: 26/185, data: 63.5%
Files: 26/185, data: 63.5%
Files: 26/185, data: 63.5%
Files: 26/185, data: 63.6%
Files: 26/185, data: 63.6%
Files: 26/185, data: 63.7%
Files: 26/185, data: 63.7%
Files: 26/185, data: 63.7%
Files: 26/185, data: 63.8%
Files: 26/185, data: 63.8%
Files: 26/185, data: 63.8%
Files: 26/185, data: 63.9%
Files: 26/185, data: 63.9%
Files: 26/185, data: 64.0%
Files: 26/185, data: 64.0%
Files: 26/185, data: 64.1%
Files: 26/185, data: 64.1%
Files: 26/185, data: 64.2%
Files: 26/185, data: 64.2%
Files: 26/185, data: 64.3%
Files: 26/185, data: 64.3%
Files: 26/185, data: 64.4%
Files: 26/185, data: 64.4%
Files: 26/185, data: 64.5%
Files: 26/185, data: 64.5%
Files: 26/185, data: 64.6%
Files: 26/185, data: 64.7%
Files: 26/185, data: 64.7%
Files: 26/185, data: 64.7%
Files: 26/185, data: 64.8%
Files: 26/185, data: 64.8%
Files: 26/185, data: 64.9%
Files: 26/185, data: 64.9%
Files: 26/185, data: 65.0%
Files: 26/185, data: 65.0%
Files: 26/185, data: 65.1%
Files: 26/185, data: 65.1%
Files: 26/185, data: 65.2%
Files: 26/185, data: 65.2%
Files: 26/185, data: 65.2%
Files: 26/185, data: 65.3%
Files: 26/185, data: 65.3%
Files: 26/185, data: 65.3%
Files: 26/185, data: 65.4%
Files: 26/185, data: 65.4%
Files: 26/185, data: 65.4%
Files: 26/185, data: 65.5%
Files: 26/185, data: 65.5%
Files: 26/185, data: 65.5%
Files: 26/185, data: 65.6%
Files: 26/185, data: 65.6%
Files: 26/185, data: 65.6%
Files: 26/185, data: 65.7%
Files: 26/185, data: 65.7%
Files: 26/185, data: 65.8%
Files: 26/185, data: 65.8%
Files: 26/185, data: 65.8%
Files: 26/185, data: 65.9%
Files: 26/185, data: 65.9%
Files: 26/185, data: 66.0%
Files: 26/185, data: 66.0%
Files: 26/185, data: 66.1%
Files: 26/185, data: 66.1%
Files: 26/185, data: 66.2%
Files: 26/185, data: 66.2%
Files: 26/185, data: 66.3%
Files: 26/185, data: 66.4%
Files: 26/185, data: 66.4%
Files: 26/185, data: 66.5%
Files: 26/185, data: 66.5%
Files: 26/185, data: 66.5%
Files: 26/185, data: 66.6%
Files: 26/185, data: 66.6%
Files: 26/185, data: 66.7%
Files: 26/185, data: 66.7%
Files: 26/185, data: 66.8%
Files: 26/185, data: 66.8%
Files: 26/185, data: 66.8%
Files: 26/185, data: 66.9%
Files: 26/185, data: 66.9%
Files: 26/185, data: 67.0%
Files: 26/185, data: 67.0%
Files: 26/185, data: 67.0%
Files: 26/185, data: 67.1%
Files: 26/185, data: 67.1%
Files: 26/185, data: 67.1%
Files: 26/185, data: 67.2%
Files: 26/185, data: 67.2%
Files: 26/185, data: 67.2%
Files: 26/185, data: 67.3%
Files: 26/185, data: 67.3%
Files: 26/185, data: 67.3%
Files: 26/185, data: 67.4%
Files: 26/185, data: 67.4%
Files: 26/185, data: 67.4%
Files: 26/185, data: 67.5%
Files: 26/185, data: 67.5%
Files: 26/185, data: 67.5%
Files: 26/185, data: 67.6%
Files: 26/185, data: 67.6%
Files: 26/185, data: 67.6%
Files: 26/185, data: 67.7%
Files: 26/185, data: 67.7%
Files: 26/185, data: 67.8%
Files: 26/185, data: 67.8%
Files: 26/185, data: 67.9%
Files: 26/185, data: 67.9%
Files: 26/185, data: 68.0%
Files: 26/185, data: 68.0%
Files: 26/185, data: 68.1%
Files: 26/185, data: 68.1%
Files: 26/185, data: 68.1%
Files: 26/185, data: 68.2%
Files: 26/185, data: 68.2%
Files: 26/185, data: 68.3%
Files: 26/185, data: 68.3%
Files: 26/185, data: 68.4%
Files: 26/185, data: 68.4%
Files: 26/185, data: 68.5%
Files: 26/185, data: 68.5%
Files: 26/185, data: 68.6%
Files: 26/185, data: 68.6%
Files: 26/185, data: 68.7%
Files: 26/185, data: 68.7%
Files: 26/185, data: 68.7%
Files: 26/185, data: 68.8%
Files: 26/185, data: 68.8%
Files: 26/185, data: 68.9%
Files: 26/185, data: 68.9%
Files: 26/185, data: 69.0%
Files: 26/185, data: 69.0%
Files: 26/185, data: 69.0%
Files: 26/185, data: 69.1%
Files: 26/185, data: 69.1%
Files: 26/185, data: 69.1%
Files: 26/185, data: 69.2%
Files: 26/185, data: 69.2%
Files: 26/185, data: 69.2%
Files: 26/185, data: 69.3%
Files: 26/185, data: 69.3%
Files: 26/185, data: 69.3%
Files: 26/185, data: 69.3%
Files: 26/185, data: 69.4%
Files: 26/185, data: 69.4%
Files: 26/185, data: 69.4%
Files: 26/185, data: 69.5%
Files: 26/185, data: 69.5%
Files: 26/185, data: 69.6%
Files: 26/185, data: 69.6%
Files: 26/185, data: 69.6%
Files: 26/185, data: 69.7%
Files: 26/185, data: 69.7%
Files: 26/185, data: 69.8%
Files: 26/185, data: 69.8%
Files: 26/185, data: 69.9%
Files: 26/185, data: 69.9%
Files: 26/185, data: 69.9%
Files: 26/185, data: 70.0%
Files: 26/185, data: 70.1%
Files: 26/185, data: 70.1%
Files: 26/185, data: 70.2%
Files: 26/185, data: 70.2%
Files: 26/185, data: 70.3%
Files: 26/185, data: 70.3%
Files: 26/185, data: 70.4%
Files: 26/185, data: 70.4%
Files: 26/185, data: 70.4%
Files: 26/185, data: 70.5%
Files: 26/185, data: 70.5%
Files: 26/185, data: 70.6%
Files: 26/185, data: 70.6%
Files: 26/185, data: 70.6%
Files: 26/185, data: 70.7%
Files: 26/185, data: 70.7%
Files: 26/185, data: 70.7%
Files: 26/185, data: 70.8%
Files: 26/185, data: 70.8%
Files: 26/185, data: 70.9%
Files: 26/185, data: 70.9%
Files: 26/185, data: 70.9%
Files: 26/185, data: 70.9%
Files: 26/185, data: 71.0%
Files: 26/185, data: 71.0%
Files: 26/185, data: 71.1%
Files: 26/185, data: 71.1%
Files: 26/185, data: 71.1%
Files: 26/185, data: 71.2%
Files: 26/185, data: 71.2%
Files: 26/185, data: 71.2%
Files: 26/185, data: 71.3%
Files: 26/185, data: 71.3%
Files: 26/185, data: 71.3%
Files: 26/185, data: 71.4%
Files: 26/185, data: 71.4%
Files: 26/185, data: 71.5%
Files: 26/185, data: 71.5%
Files: 26/185, data: 71.6%
Files: 26/185, data: 71.6%
Files: 26/185, data: 71.7%
Files: 26/185, data: 71.8%
Files: 26/185, data: 71.8%
Files: 26/185, data: 71.9%
Files: 26/185, data: 71.9%
Files: 26/185, data: 72.0%
Files: 26/185, data: 72.0%
Files: 26/185, data: 72.1%
Files: 26/185, data: 72.1%
Files: 26/185, data: 72.2%
Files: 26/185, data: 72.2%
Files: 26/185, data: 72.3%
Files: 26/185, data: 72.4%
Files: 26/185, data: 72.4%
Files: 26/185, data: 72.5%
Files: 26/185, data: 72.5%
Files: 26/185, data: 72.6%
Files: 26/185, data: 72.6%
Files: 26/185, data: 72.7%
Files: 26/185, data: 72.7%
Files: 26/185, data: 72.8%
Files: 26/185, data: 72.9%
Files: 26/185, data: 72.9%
Files: 26/185, data: 73.0%
Files: 26/185, data: 73.0%
Files: 26/185, data: 73.1%
Files: 26/185, data: 73.2%
Files: 26/185, data: 73.2%
Files: 26/185, data: 73.3%
Files: 26/185, data: 73.3%
Files: 26/185, data: 73.4%
Files: 26/185, data: 73.4%
Files: 26/185, data: 73.5%
Files: 26/185, data: 73.6%
Files: 26/185, data: 73.6%
Files: 26/185, data: 73.7%
Files: 26/185, data: 73.7%
Files: 26/185, data: 73.8%
Files: 26/185, data: 73.8%
Files: 26/185, data: 73.9%
Files: 26/185, data: 74.0%
Files: 26/185, data: 74.0%
Files: 26/185, data: 74.0%
Files: 26/185, data: 74.1%
Files: 26/185, data: 74.1%
Files: 26/185, data: 74.1%
Files: 26/185, data: 74.2%
Files: 26/185, data: 74.3%
Files: 26/185, data: 74.3%
Files: 26/185, data: 74.4%
Files: 26/185, data: 74.4%
Files: 26/185, data: 74.5%
Files: 26/185, data: 74.5%
Files: 26/185, data: 74.6%
Files: 26/185, data: 74.6%
Files: 26/185, data: 74.7%
Files: 26/185, data: 74.7%
Files: 26/185, data: 74.8%
Files: 26/185, data: 74.8%
Files: 26/185, data: 74.9%
Files: 26/185, data: 74.9%
Files: 26/185, data: 75.0%
Files: 26/185, data: 75.1%
Files: 26/185, data: 75.1%
Files: 26/185, data: 75.2%
Files: 26/185, data: 75.2%
Files: 26/185, data: 75.3%
Files: 26/185, data: 75.3%
Files: 26/185, data: 75.4%
Files: 26/185, data: 75.4%
Files: 26/185, data: 75.5%
Files: 26/185, data: 75.5%
Files: 26/185, data: 75.6%
Files: 26/185, data: 75.6%
Files: 26/185, data: 75.7%
Files: 26/185, data: 75.7%
Files: 26/185, data: 75.8%
Files: 26/185, data: 75.8%
Files: 26/185, data: 75.9%
Files: 26/185, data: 76.0%
Files: 26/185, data: 76.0%
Files: 26/185, data: 76.0%
Files: 26/185, data: 76.1%
Files: 26/185, data: 76.1%
Files: 26/185, data: 76.1%
Files: 26/185, data: 76.2%
Files: 26/185, data: 76.2%
Files: 26/185, data: 76.3%
Files: 26/185, data: 76.3%
Files: 26/185, data: 76.4%
Files: 26/185, data: 76.4%
Files: 26/185, data: 76.5%
Files: 26/185, data: 76.6%
Files: 26/185, data: 76.6%
Files: 26/185, data: 76.7%
Files: 26/185, data: 76.7%
Files: 26/185, data: 76.8%
Files: 26/185, data: 76.9%
Files: 26/185, data: 76.9%
Files: 26/185, data: 77.0%
Files: 26/185, data: 77.0%
Files: 26/185, data: 77.1%
Files: 26/185, data: 77.1%
Files: 26/185, data: 77.2%
Files: 26/185, data: 77.3%
Files: 26/185, data: 77.3%
Files: 26/185, data: 77.4%
Files: 26/185, data: 77.4%
Files: 26/185, data: 77.5%
Files: 26/185, data: 77.5%
Files: 26/185, data: 77.6%
Files: 26/185, data: 77.7%
Files: 26/185, data: 77.7%
Files: 26/185, data: 77.8%
Files: 26/185, data: 77.8%
Files: 26/185, data: 77.9%
Files: 26/185, data: 77.9%
Files: 26/185, data: 78.0%
Files: 26/185, data: 78.0%
Files: 26/185, data: 78.1%
Files: 26/185, data: 78.1%
Files: 26/185, data: 78.2%
Files: 26/185, data: 78.3%
Files: 26/185, data: 78.3%
Files: 26/185, data: 78.4%
Files: 26/185, data: 78.4%
Files: 26/185, data: 78.5%
Files: 26/185, data: 78.5%
Files: 26/185, data: 78.6%
Files: 26/185, data: 78.7%
Files: 26/185, data: 78.7%
Files: 26/185, data: 78.7%
Files: 26/185, data: 78.8%
Files: 26/185, data: 78.9%
Files: 26/185, data: 78.9%
Files: 26/185, data: 79.0%
Files: 26/185, data: 79.0%
Files: 26/185, data: 79.1%
Files: 26/185, data: 79.1%
Files: 26/185, data: 79.2%
Files: 26/185, data: 79.2%
Files: 26/185, data: 79.3%
Files: 26/185, data: 79.4%
Files: 26/185, data: 79.4%
Files: 26/185, data: 79.5%
Files: 26/185, data: 79.5%
Files: 26/185, data: 79.6%
Files: 26/185, data: 79.6%
Files: 26/185, data: 79.7%
Files: 26/185, data: 79.7%
Files: 26/185, data: 79.8%
Files: 26/185, data: 79.8%
Files: 26/185, data: 79.9%
Files: 26/185, data: 79.9%
Files: 26/185, data: 80.0%
Files: 26/185, data: 80.0%
Files: 26/185, data: 80.1%
Files: 26/185, data: 80.1%
Files: 26/185, data: 80.2%
Files: 26/185, data: 80.2%
Files: 26/185, data: 80.3%
Files: 26/185, data: 80.3%
Files: 26/185, data: 80.4%
Files: 26/185, data: 80.5%
Files: 26/185, data: 80.5%
Files: 26/185, data: 80.6%
Files: 26/185, data: 80.6%
Files: 26/185, data: 80.7%
Files: 26/185, data: 80.7%
Files: 26/185, data: 80.8%
Files: 26/185, data: 80.8%
Files: 26/185, data: 80.9%
Files: 26/185, data: 81.0%
Files: 26/185, data: 81.0%
Files: 26/185, data: 81.1%
Files: 26/185, data: 81.1%
Files: 26/185, data: 81.2%
Files: 26/185, data: 81.3%
Files: 26/185, data: 81.3%
Files: 26/185, data: 81.4%
Files: 26/185, data: 81.4%
Files: 26/185, data: 81.5%
Files: 26/185, data: 81.5%
Files: 26/185, data: 81.6%
Files: 26/185, data: 81.7%
Files: 26/185, data: 81.7%
Files: 26/185, data: 81.8%
Files: 26/185, data: 81.8%
Files: 26/185, data: 81.9%
Files: 26/185, data: 82.0%
Files: 26/185, data: 82.0%
Files: 26/185, data: 82.1%
Files: 26/185, data: 82.1%
Files: 26/185, data: 82.2%
Files: 26/185, data: 82.3%
Files: 26/185, data: 82.3%
Files: 26/185, data: 82.4%
Files: 26/185, data: 82.4%
Files: 26/185, data: 82.5%
Files: 26/185, data: 82.5%
Files: 26/185, data: 82.6%
Files: 26/185, data: 82.7%
Files: 26/185, data: 82.7%
Files: 26/185, data: 82.8%
Files: 26/185, data: 82.8%
Files: 26/185, data: 82.9%
Files: 26/185, data: 83.0%
Files: 26/185, data: 83.0%
Files: 26/185, data: 83.1%
Files: 26/185, data: 83.1%
Files: 26/185, data: 83.2%
Files: 26/185, data: 83.2%
Files: 26/185, data: 83.3%
Files: 26/185, data: 83.3%
Files: 26/185, data: 83.4%
Files: 26/185, data: 83.4%
Files: 26/185, data: 83.5%
Files: 26/185, data: 83.6%
Files: 26/185, data: 83.6%
Files: 26/185, data: 83.7%
Files: 26/185, data: 83.7%
Files: 26/185, data: 83.8%
Files: 26/185, data: 83.8%
Files: 26/185, data: 83.9%
Files: 26/185, data: 83.9%
Files: 26/185, data: 84.0%
Files: 26/185, data: 84.0%
Files: 26/185, data: 84.1%
Files: 26/185, data: 84.2%
Files: 26/185, data: 84.2%
Files: 26/185, data: 84.3%
Files: 26/185, data: 84.3%
Files: 26/185, data: 84.4%
Files: 26/185, data: 84.5%
Files: 26/185, data: 84.5%
Files: 26/185, data: 84.6%
Files: 26/185, data: 84.6%
Files: 26/185, data: 84.7%
Files: 26/185, data: 84.7%
Files: 26/185, data: 84.8%
Files: 26/185, data: 84.8%
Files: 26/185, data: 84.9%
Files: 26/185, data: 84.9%
Files: 26/185, data: 85.0%
Files: 26/185, data: 85.0%
Files: 26/185, data: 85.1%
Files: 26/185, data: 85.2%
Files: 26/185, data: 85.2%
Files: 26/185, data: 85.2%
Files: 26/185, data: 85.3%
Files: 26/185, data: 85.3%
Files: 26/185, data: 85.4%
Files: 26/185, data: 85.5%
Files: 26/185, data: 85.5%
Files: 26/185, data: 85.5%
Files: 26/185, data: 85.6%
Files: 26/185, data: 85.7%
Files: 26/185, data: 85.7%
Files: 26/185, data: 85.8%
Files: 26/185, data: 85.8%
Files: 26/185, data: 85.9%
Files: 26/185, data: 85.9%
Files: 26/185, data: 86.0%
Files: 26/185, data: 86.0%
Files: 26/185, data: 86.1%
Files: 26/185, data: 86.1%
Files: 26/185, data: 86.2%
Files: 26/185, data: 86.2%
Files: 26/185, data: 86.3%
Files: 26/185, data: 86.3%
Files: 26/185, data: 86.4%
Files: 26/185, data: 86.4%
Files: 26/185, data: 86.5%
Files: 26/185, data: 86.5%
Files: 26/185, data: 86.6%
Files: 26/185, data: 86.6%
Files: 26/185, data: 86.7%
Files: 26/185, data: 86.7%
Files: 26/185, data: 86.8%
Files: 26/185, data: 86.8%
Files: 26/185, data: 86.9%
Files: 26/185, data: 86.9%
Files: 26/185, data: 87.0%
Files: 26/185, data: 87.0%
Files: 26/185, data: 87.1%
Files: 26/185, data: 87.1%
Files: 26/185, data: 87.2%
Files: 26/185, data: 87.2%
Files: 26/185, data: 87.3%
Files: 26/185, data: 87.3%
Files: 26/185, data: 87.4%
Files: 26/185, data: 87.4%
Files: 26/185, data: 87.5%
Files: 26/185, data: 87.5%
Files: 26/185, data: 87.6%
Files: 26/185, data: 87.7%
Files: 26/185, data: 87.7%
Files: 26/185, data: 87.8%
Files: 26/185, data: 87.8%
Files: 26/185, data: 87.9%
Files: 26/185, data: 87.9%
Files: 26/185, data: 88.0%
Files: 26/185, data: 88.0%
Files: 26/185, data: 88.1%
Files: 26/185, data: 88.1%
Files: 26/185, data: 88.2%
Files: 26/185, data: 88.2%
Files: 27/185, data: 88.2%
Files: 27/185, data: 88.3%
Files: 28/185, data: 88.3%
Files: 29/185, data: 88.3%
Files: 30/185, data: 88.3%
Files: 31/185, data: 88.3%
Files: 32/185, data: 88.3%
Files: 33/185, data: 88.3%
Files: 34/185, data: 88.3%
Files: 35/185, data: 88.3%
Files: 36/185, data: 88.3%
Files: 37/185, data: 88.3%
Files: 38/185, data: 88.3%
Files: 39/185, data: 88.3%
Files: 40/185, data: 88.3%
Files: 41/185, data: 88.3%
Files: 42/185, data: 88.3%
Files: 43/185, data: 88.3%
Files: 44/185, data: 88.3%
Files: 45/185, data: 88.3%
Files: 46/185, data: 88.3%
Files: 47/185, data: 88.3%
Files: 48/185, data: 88.3%
Files: 49/185, data: 88.3%
Files: 50/185, data: 88.3%
Files: 51/185, data: 88.3%
Files: 52/185, data: 88.3%
Files: 53/185, data: 88.3%
Files: 54/185, data: 88.3%
Files: 55/185, data: 88.3%
Files: 56/185, data: 88.3%
Files: 57/185, data: 88.3%
Files: 58/185, data: 88.3%
Files: 59/185, data: 88.3%
Files: 60/185, data: 88.3%
Files: 61/185, data: 88.3%
Files: 62/185, data: 88.3%
Files: 63/185, data: 88.3%
Files: 64/185, data: 88.3%
Files: 65/185, data: 88.3%
Files: 66/185, data: 88.3%
Files: 67/185, data: 88.3%
Files: 68/185, data: 88.3%
Files: 69/185, data: 88.3%
Files: 70/185, data: 88.3%
Files: 71/185, data: 88.3%
Files: 72/185, data: 88.3%
Files: 73/185, data: 88.3%
Files: 74/185, data: 88.3%
Files: 74/185, data: 88.3%
Files: 75/185, data: 88.3%
Files: 76/185, data: 88.3%
Files: 77/185, data: 88.3%
Files: 78/185, data: 88.3%
Files: 79/185, data: 88.3%
Files: 80/185, data: 88.3%
Files: 81/185, data: 88.3%
Files: 82/185, data: 88.3%
Files: 83/185, data: 88.3%
Files: 84/185, data: 88.3%
Files: 85/185, data: 88.3%
Files: 86/185, data: 88.3%
Files: 87/185, data: 88.3%
Files: 88/185, data: 88.3%
Files: 89/185, data: 88.3%
Files: 90/185, data: 88.3%
Files: 91/185, data: 88.3%
Files: 92/185, data: 88.3%
Files: 93/185, data: 88.3%
Files: 94/185, data: 88.3%
Files: 95/185, data: 88.3%
Files: 96/185, data: 88.3%
Files: 97/185, data: 88.3%
Files: 98/185, data: 88.3%
Files: 99/185, data: 88.3%
Files: 100/185, data: 88.3%
Files: 101/185, data: 88.3%
Files: 102/185, data: 88.3%
Files: 103/185, data: 88.3%
Files: 104/185, data: 88.3%
Files: 105/185, data: 88.3%
Files: 106/185, data: 88.3%
Files: 107/185, data: 88.3%
Files: 108/185, data: 88.3%
Files: 109/185, data: 88.3%
Files: 110/185, data: 88.3%
Files: 111/185, data: 88.3%
Files: 112/185, data: 88.3%
Files: 113/185, data: 88.3%
Files: 114/185, data: 88.3%
Files: 115/185, data: 88.3%
Files: 116/185, data: 88.3%
Files: 117/185, data: 88.3%
Files: 118/185, data: 88.3%
Files: 119/185, data: 88.3%
Files: 120/185, data: 88.3%
Files: 121/185, data: 88.3%
Files: 122/185, data: 88.3%
Files: 123/185, data: 88.3%
Files: 124/185, data: 88.3%
Files: 125/185, data: 88.3%
Files: 126/185, data: 88.3%
Files: 127/185, data: 88.3%
Files: 128/185, data: 88.3%
Files: 129/185, data: 88.3%
Files: 130/185, data: 88.3%
Files: 131/185, data: 88.3%
Files: 132/185, data: 88.3%
Files: 133/185, data: 88.3%
Files: 134/185, data: 88.3%
Files: 135/185, data: 88.3%
Files: 135/185, data: 88.3%
Files: 136/185, data: 88.3%
Files: 137/185, data: 88.3%
Files: 138/185, data: 88.3%
Files: 139/185, data: 88.3%
Files: 140/185, data: 88.3%
Files: 141/185, data: 88.3%
Files: 142/185, data: 88.3%
Files: 143/185, data: 88.3%
Files: 144/185, data: 88.3%
Files: 145/185, data: 88.3%
Files: 146/185, data: 88.3%
Files: 147/185, data: 88.4%
Files: 147/185, data: 88.4%
Files: 148/185, data: 88.4%
Files: 148/185, data: 88.5%
Files: 149/185, data: 88.5%
Files: 149/185, data: 88.5%
Files: 150/185, data: 88.5%
Files: 151/185, data: 88.6%
Files: 152/185, data: 88.6%
Files: 152/185, data: 88.6%
Files: 152/185, data: 88.7%
Files: 152/185, data: 88.7%
Files: 152/185, data: 88.8%
Files: 153/185, data: 88.8%
Files: 153/185, data: 88.9%
Files: 153/185, data: 88.9%
Files: 153/185, data: 88.9%
Files: 154/185, data: 88.9%
Files: 155/185, data: 88.9%
Files: 156/185, data: 88.9%
Files: 157/185, data: 89.0%
Files: 158/185, data: 89.0%
Files: 159/185, data: 89.0%
Files: 160/185, data: 89.0%
Files: 160/185, data: 89.1%
Files: 160/185, data: 89.1%
Files: 160/185, data: 89.2%
Files: 160/185, data: 89.3%
Files: 160/185, data: 89.4%
Files: 160/185, data: 89.4%
Files: 160/185, data: 89.5%
Files: 160/185, data: 89.6%
Files: 160/185, data: 89.6%
Files: 160/185, data: 89.7%
Files: 160/185, data: 89.8%
Files: 160/185, data: 89.8%
Files: 160/185, data: 89.9%
Files: 160/185, data: 90.0%
Files: 160/185, data: 90.0%
Files: 160/185, data: 90.1%
Files: 160/185, data: 90.2%
Files: 160/185, data: 90.2%
Files: 160/185, data: 90.3%
Files: 160/185, data: 90.4%
Files: 160/185, data: 90.4%
Files: 160/185, data: 90.5%
Files: 160/185, data: 90.6%
Files: 160/185, data: 90.6%
Files: 160/185, data: 90.7%
Files: 160/185, data: 90.8%
Files: 160/185, data: 90.9%
Files: 160/185, data: 90.9%
Files: 160/185, data: 91.0%
Files: 160/185, data: 91.1%
Files: 160/185, data: 91.1%
Files: 160/185, data: 91.2%
Files: 160/185, data: 91.3%
Files: 160/185, data: 91.3%
Files: 160/185, data: 91.4%
Files: 160/185, data: 91.5%
Files: 160/185, data: 91.5%
Files: 160/185, data: 91.6%
Files: 160/185, data: 91.7%
Files: 160/185, data: 91.8%
Files: 160/185, data: 91.8%
Files: 160/185, data: 91.8%
Files: 161/185, data: 91.9%
Files: 161/185, data: 92.0%
Files: 161/185, data: 92.0%
Files: 161/185, data: 92.1%
Files: 161/185, data: 92.2%
Files: 161/185, data: 92.2%
Files: 161/185, data: 92.3%
Files: 161/185, data: 92.4%
Files: 161/185, data: 92.4%
Files: 161/185, data: 92.5%
Files: 161/185, data: 92.6%
Files: 161/185, data: 92.6%
Files: 161/185, data: 92.7%
Files: 161/185, data: 92.8%
Files: 161/185, data: 92.8%
Files: 161/185, data: 92.9%
Files: 161/185, data: 93.0%
Files: 161/185, data: 93.0%
Files: 161/185, data: 93.1%
Files: 161/185, data: 93.2%
Files: 161/185, data: 93.3%
Files: 161/185, data: 93.3%
Files: 161/185, data: 93.4%
Files: 161/185, data: 93.4%
Files: 161/185, data: 93.5%
Files: 161/185, data: 93.6%
Files: 161/185, data: 93.6%
Files: 161/185, data: 93.7%
Files: 161/185, data: 93.8%
Files: 161/185, data: 93.8%
Files: 161/185, data: 93.9%
Files: 161/185, data: 94.0%
Files: 161/185, data: 94.0%
Files: 161/185, data: 94.1%
Files: 161/185, data: 94.2%
Files: 161/185, data: 94.2%
Files: 161/185, data: 94.3%
Files: 161/185, data: 94.4%
Files: 161/185, data: 94.4%
Files: 161/185, data: 94.5%
Files: 161/185, data: 94.6%
Files: 161/185, data: 94.6%
Files: 161/185, data: 94.7%
Files: 161/185, data: 94.8%
Files: 161/185, data: 94.8%
Files: 161/185, data: 94.9%
Files: 162/185, data: 94.9%
Files: 162/185, data: 95.0%
Files: 162/185, data: 95.0%
Files: 162/185, data: 95.1%
Files: 162/185, data: 95.1%
Files: 162/185, data: 95.2%
Files: 162/185, data: 95.2%
Files: 162/185, data: 95.3%
Files: 162/185, data: 95.4%
Files: 162/185, data: 95.4%
Files: 162/185, data: 95.5%
Files: 162/185, data: 95.5%
Files: 162/185, data: 95.6%
Files: 162/185, data: 95.7%
Files: 162/185, data: 95.7%
Files: 162/185, data: 95.8%
Files: 162/185, data: 95.9%
Files: 162/185, data: 95.9%
Files: 162/185, data: 96.0%
Files: 162/185, data: 96.0%
Files: 162/185, data: 96.1%
Files: 162/185, data: 96.2%
Files: 162/185, data: 96.2%
Files: 162/185, data: 96.3%
Files: 162/185, data: 96.4%
Files: 162/185, data: 96.4%
Files: 162/185, data: 96.5%
Files: 162/185, data: 96.6%
Files: 162/185, data: 96.7%
Files: 162/185, data: 96.7%
Files: 162/185, data: 96.8%
Files: 162/185, data: 96.9%
Files: 162/185, data: 97.0%
Files: 162/185, data: 97.0%
Files: 162/185, data: 97.1%
Files: 163/185, data: 97.1%
Files: 163/185, data: 97.1%
Files: 164/185, data: 97.2%
Files: 164/185, data: 97.2%
Files: 164/185, data: 97.3%
Files: 164/185, data: 97.3%
Files: 164/185, data: 97.4%
Files: 164/185, data: 97.5%
Files: 165/185, data: 97.5%
Files: 166/185, data: 97.5%
Files: 166/185, data: 97.5%
Files: 167/185, data: 97.5%
Files: 168/185, data: 97.5%
Files: 169/185, data: 97.5%
Files: 170/185, data: 97.5%
Files: 171/185, data: 97.5%
Files: 172/185, data: 97.5%
Files: 173/185, data: 97.5%
Files: 174/185, data: 97.5%
Files: 175/185, data: 97.5%
Files: 176/185, data: 97.5%
Files: 176/185, data: 97.5%
Files: 177/185, data: 97.5%
Files: 178/185, data: 97.5%
Files: 179/185, data: 97.5%
Files: 180/185, data: 97.5%
Files: 181/185, data: 97.5%
Files: 182/185, data: 97.5%
Files: 183/185, data: 97.5%
Files: 184/185, data: 97.6%
Files: 184/185, data: 97.6%
Files: 184/185, data: 97.7%
Files: 184/185, data: 97.8%
Files: 184/185, data: 97.9%
Files: 184/185, data: 97.9%
Files: 184/185, data: 98.0%
Files: 184/185, data: 98.1%
Files: 184/185, data: 98.2%
Files: 184/185, data: 98.2%
Files: 184/185, data: 98.3%
Files: 184/185, data: 98.4%
Files: 184/185, data: 98.4%
Files: 184/185, data: 98.5%
Files: 184/185, data: 98.6%
Files: 184/185, data: 98.7%
Files: 184/185, data: 98.7%
Files: 184/185, data: 98.8%
Files: 184/185, data: 98.9%
Files: 184/185, data: 98.9%
Files: 185/185, data: 99.0%
Files: 185/185, data: 99.0%
Files: 185/185, data: 99.1%
Files: 185/185, data: 99.2%
Files: 185/185, data: 99.3%
Files: 185/185, data: 99.3%
Files: 185/185, data: 99.4%
Files: 185/185, data: 99.5%
Files: 185/185, data: 99.6%
Files: 185/185, data: 99.6%
Files: 185/185, data: 99.7%
Files: 185/185, data: 99.8%
Files: 185/185, data: 99.9%
Files: 185/185, data: 99.9%
Files: 185/185, data: 100.0%
Done: 185 files, 17.37GiB processed.
The loaded database is not on the latest format (current:AF4.1.a, latest:AF4.3.0). Set dbms.allow_upgrade=true to enable migration.

Running command: sh -c "rm -rf /var/lib/neo4j/logs && neo4j start"
 Worked!
 Output: Directories in use:
home:         /var/lib/neo4j
config:       /var/lib/neo4j/conf
logs:         /var/lib/neo4j/logs
plugins:      /var/lib/neo4j/plugins
import:       /var/lib/neo4j/import
data:         /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
licenses:     /var/lib/neo4j/licenses
run:          /var/lib/neo4j/run
Starting Neo4j.
Started neo4j (pid:202). It is available at http://localhost:7474
There may be a short delay until the server is ready.

Running command: sh -c "sleep 60"
 Worked!

Running command: cypher-shell -u neo4j -p correcthorsebatterystaple -d neo4j "CALL apoc.export.csv.query('MATCH (n) RETURN id(n) as id, labels(n)[0] as type, properties(n) as properties', 'nodes.csv', {})"
 Worked!
 Output: file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
"nodes.csv", "statement: cols(3)", "csv", 0, 0, 43629126, 471084, 14543042, 20000, 728, TRUE, NULL

Running command: cypher-shell -u neo4j -p correcthorsebatterystaple -d neo4j "CALL apoc.export.csv.query('MATCH ()-[r]->() RETURN id(startNode(r)) as source_id, id(endNode(r)) as target_id, type(r) as type, properties(r) as properties', 'edges.csv', {})"
 Worked!
 Output: file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
"edges.csv", "statement: cols(4)", "csv", 0, 0, 753064932, 2519837, 188266233, 20000, 9414, TRUE, NULL

Running command: sh -c "mv /var/lib/neo4j/import/nodes.csv /mnt"
 Worked!

Running command: sh -c "mv /var/lib/neo4j/import/edges.csv /mnt"
 Worked!

Stopping and removing the container
CPU times: user 658 ms, sys: 306 ms, total: 964 ms
Wall time: 56min 50s
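
The two export steps in the log above follow the same pattern: a Cypher query is wrapped in `apoc.export.csv.query` and passed to `cypher-shell`. The following is a minimal sketch of how such a command line could be assembled in Python. The helper `build_export_command` is hypothetical and introduced only for illustration; it is not necessarily how `shared_bmkg.py` builds its commands. The credentials and database name are the ones visible in the log.

```python
def build_export_command(query, filename, user="neo4j",
                         password="correcthorsebatterystaple", database="neo4j"):
    """Assemble a cypher-shell call that exports a Cypher query result to CSV via APOC."""
    # The query is embedded in single quotes inside the Cypher statement,
    # so it must not contain single quotes itself.
    if "'" in query:
        raise ValueError("query must not contain single quotes")
    statement = f"CALL apoc.export.csv.query('{query}', '{filename}', {{}})"
    return f'cypher-shell -u {user} -p {password} -d {database} "{statement}"'


# Queries copied from the log output above
nodes_query = "MATCH (n) RETURN id(n) as id, labels(n)[0] as type, properties(n) as properties"
edges_query = ("MATCH ()-[r]->() RETURN id(startNode(r)) as source_id, "
               "id(endNode(r)) as target_id, type(r) as type, properties(r) as properties")

nodes_cmd = build_export_command(nodes_query, "nodes.csv")
edges_cmd = build_export_command(edges_query, "edges.csv")
print(nodes_cmd)
print(edges_cmd)
```

Note that `labels(n)[0]` keeps only the first label of each node, so a node carrying several labels would be exported with just one type.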

4. Data import¶

This section loads the extracted data files that are required for reconstructing the knowledge graph.

In [10]:
def read_csv_file(filepath):
    # Returns a lazy Dask dataframe (not Pandas); every column is read as a string.
    # Dask opens the file itself, so no separate open() call is needed.
    df = dd.read_csv(filepath, dtype=str)
    return df
In [11]:
%%time

df_nodes = read_csv_file(os.path.join(results_dir, "nodes.csv"))
df_edges = read_csv_file(os.path.join(results_dir, "edges.csv"))
CPU times: user 105 ms, sys: 177 ms, total: 282 ms
Wall time: 855 ms
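
Because Dask reads the files lazily, a cheap sanity check of the column names can be worthwhile before starting any long-running computation. The following sketch compares the header row of a CSV file against the column aliases used in the export queries (`id`, `type`, `properties` for nodes; `source_id`, `target_id`, `type`, `properties` for edges). The helper `read_header` and the in-memory sample lines are assumptions made for illustration; the real files may quote their headers differently.

```python
import csv
import io

# Column names taken from the RETURN aliases of the two export queries
EXPECTED_NODE_COLUMNS = ["id", "type", "properties"]
EXPECTED_EDGE_COLUMNS = ["source_id", "target_id", "type", "properties"]


def read_header(file_obj):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(file_obj))


# In-memory stand-ins for the first line of nodes.csv and edges.csv (assumed layout)
nodes_sample = io.StringIO('"id","type","properties"\n')
edges_sample = io.StringIO('"source_id","target_id","type","properties"\n')

nodes_header = read_header(nodes_sample)
edges_header = read_header(edges_sample)
print(nodes_header == EXPECTED_NODE_COLUMNS)
print(edges_header == EXPECTED_EDGE_COLUMNS)
```

With the real files, the same helper could be applied to a handle obtained from `open(os.path.join(results_dir, "nodes.csv"), newline="")`.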

5. Data inspection¶

This section attempts to reproduce some of the published numbers by inspecting the raw data and then prints a few example records.

The publication mentions following statistics about the knowledge graph contents:

  • Close to 20 million nodes having 36 node types
  • More than 200 million edges having 47 edge types

a) Number of nodes and edges¶

In [12]:
%%time

num_nodes = len(df_nodes)
num_edges = len(df_edges)

print(f"{num_nodes:,} nodes")
print(f"{num_edges:,} edges")
print()
14,543,042 nodes
188,266,233 edges

CPU times: user 11min 5s, sys: 1min 55s, total: 13min 1s
Wall time: 6min 54s

Interpretation:

  • Inspecting the raw data resulted in 14,543,042 nodes, while the publication mentioned close to 20,000,000 nodes.
  • Inspecting the raw data resulted in 188,266,233 edges, while the publication mentioned more than 200,000,000 edges.
  • Both differences could have a variety of reasons, e.g. the authors may have changed the process that generates the data from Version 1 to Version 3 after the publication, there may be an error in the reported numbers, or the data extraction via Neo4j import and export inside a Docker container may have a defect.

b) Types of nodes and edges¶

In [13]:
%%time

nt_column = "type"
nt_counts = df_nodes.groupby(nt_column).size().compute()
nt_counts = nt_counts.sort_values(ascending=False)

print(len(nt_counts), "node types, sorted by their frequency:")
for key, val in nt_counts.items():
    print(f"- {key}: {val}")
print()
32 node types, sorted by their frequency:
- Known_variant: 10630108
- Publication: 1791712
- Peptide: 1001105
- Transcript: 280910
- Protein: 228725
- Clinically_relevant_variant: 190334
- Metabolite: 114222
- Pathway: 51219
- Protein_structure: 49317
- Gene: 42571
- Biological_process: 28642
- Modified_protein: 21407
- Amino_acid_sequence: 20614
- Functional_region: 16169
- Phenotype: 15872
- Molecular_function: 11169
- Disease: 10791
- Experimental_factor: 9883
- GWAS_study: 8713
- Tissue: 5897
- Cellular_component: 4176
- Experiment: 2829
- Complex: 2700
- Modification: 1978
- Food: 992
- Units: 442
- Analytical_sample: 172
- Biological_sample: 170
- Subject: 169
- Chromosome: 25
- Project: 7
- User: 2

CPU times: user 1min 52s, sys: 28.9 s, total: 2min 21s
Wall time: 36.4 s
In [14]:
%%time

et_column = "type"
et_counts = df_edges.groupby(et_column).size().compute()
et_counts = et_counts.sort_values(ascending=False)

print(len(et_counts), "edge types, sorted by their frequency:")
for key, val in et_counts.items():
    print(f"- {key}: {val}")
print()
39 edge types, sorted by their frequency:
- MENTIONED_IN_PUBLICATION: 111109238
- VARIANT_FOUND_IN_PROTEIN: 26807293
- ASSOCIATED_WITH: 16707629
- VARIANT_FOUND_IN_GENE: 10638935
- VARIANT_FOUND_IN_CHROMOSOME: 10630108
- BELONGS_TO_PROTEIN: 3629058
- COMPILED_INTERACTS_WITH: 1956612
- DETECTED_IN_PATHOLOGY_SAMPLE: 1697248
- ANNOTATED_IN_PATHWAY: 1203809
- ACTS_ON: 988705
- HAS_QUANTIFIED_PROTEIN: 797651
- TRANSLATED_INTO: 374294
- CURATED_INTERACTS_WITH: 299188
- LOCATED_IN: 295912
- TRANSCRIBED_INTO: 258487
- HAS_QUANTIFIED_MODIFIED_PROTEIN: 224478
- FOUND_IN_PROTEIN: 204244
- HAS_STRUCTURE: 195640
- HAS_PARENT: 128349
- HAS_MODIFIED_SITE: 21421
- HAS_MODIFICATION: 21407
- HAS_SEQUENCE: 20614
- VARIANT_FOUND_IN_GWAS: 16128
- IS_SUBUNIT_OF: 10968
- CURATED_AFFECTS_INTERACTION_WITH: 10873
- STUDIES_TRAIT: 9250
- PUBLISHED_IN: 3939
- MAPS_TO: 2289
- IS_SUBSTRATE_OF: 994
- IS_BIOMARKER_OF_DISEASE: 515
- IS_QCMARKER_IN_TISSUE: 249
- SPLITTED_INTO: 172
- BELONGS_TO_SUBJECT: 170
- HAS_ENROLLED: 169
- VARIANT_IS_CLINICALLY_RELEVANT: 169
- STUDIES_TISSUE: 7
- IS_RESPONSIBLE: 7
- STUDIES_DISEASE: 7
- PARTICIPATES_IN: 7

CPU times: user 5min 54s, sys: 49 s, total: 6min 43s
Wall time: 2min 1s
In [15]:
# Correctness checks

# 1) Do the counts of different node types add up to the total number of nodes?
sum_node_types = nt_counts.sum()
assert sum_node_types == num_nodes, f"Node counts differ: {sum_node_types} != {num_nodes}"
print(f"{sum_node_types:,} = {num_nodes:,} nodes")

# 2) Do the counts of different edge types add up to the total number of edges?
sum_edge_types = et_counts.sum()
assert sum_edge_types == num_edges, f"Edge counts differ: {sum_edge_types} != {num_edges}"
print(f"{sum_edge_types:,} = {num_edges:,} edges")
14,543,042 = 14,543,042 nodes
188,266,233 = 188,266,233 edges

Interpretation:

  • Inspecting the raw data resulted in 32 node types, while the publication mentions 36 node types, which is 4 more.
  • Inspecting the raw data resulted in 39 edge types, while the publication mentions 47 edge types, which is 8 more.
  • Both differences could have a variety of reasons, probably the same as the differences observed in node and edge counts.
  • Looking at relative frequencies of node and edge types suggests that the dataset is rather unbalanced.
    • The most frequent node type is "Known_variant" with 10,630,108 instances, while the least frequent node type is "User" with only 2 instances, a difference of almost 7 orders of magnitude.
    • The most frequent edge type is "MENTIONED_IN_PUBLICATION" with 111,109,238 instances, while the least frequent edge types have only 7 instances each, a difference of about 7 orders of magnitude.
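
The imbalance can be quantified as the base-10 logarithm of the ratio between the largest and the smallest count. The following is a minimal sketch that uses only the counts printed above; the helper name `orders_of_magnitude` is introduced here for illustration and is not part of the notebook or of shared_bmkg.

```python
import math

# Counts copied from the printed node type and edge type frequencies above
most_frequent_node, least_frequent_node = 10_630_108, 2   # Known_variant vs. User
most_frequent_edge, least_frequent_edge = 111_109_238, 7  # MENTIONED_IN_PUBLICATION vs. e.g. STUDIES_TISSUE


def orders_of_magnitude(largest, smallest):
    """Return log10 of the ratio between two positive counts."""
    return math.log10(largest / smallest)


node_imbalance = orders_of_magnitude(most_frequent_node, least_frequent_node)
edge_imbalance = orders_of_magnitude(most_frequent_edge, least_frequent_edge)

print(f"Node types: {node_imbalance:.1f} orders of magnitude")
print(f"Edge types: {edge_imbalance:.1f} orders of magnitude")
```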

c) Example entries¶

This section prints some example entries of the raw data. It gives an impression of the format chosen by the authors, which differs greatly between projects due to a lack of a broadly accepted standard for biomedical knowledge graphs.

In [16]:
def report_first_n_items(data, n):
    return data.head(n)
In [17]:
def report_last_n_items(data, n):
    return data.tail(n)

Nodes together with node annotations¶

In [18]:
report_first_n_items(df_nodes, 2)
Out[18]:
id type properties
0 0 Disease {"synonyms":["angiosarcoma","hemangiosarcoma",...
1 1 Disease {"synonyms":["pterygium","surfer's eye","UMLS_...
In [19]:
report_last_n_items(df_nodes, 2)
Out[19]:
id type properties
127611 14880085 Pathway {"organism":"9606","name":"De Novo Triacylglyc...
127612 14880086 Pathway {"organism":"9606","name":"De Novo Triacylglyc...

Edges together with edge annotations¶

In [20]:
report_first_n_items(df_edges, 2)
Out[20]:
source_id target_id type properties
0 0 6993 HAS_PARENT {}
1 0 14512111 MENTIONED_IN_PUBLICATION {}
In [21]:
report_last_n_items(df_edges, 2)
Out[21]:
source_id target_id type properties
665269 14828866 81596 STUDIES_TRAIT {"source":"GWAS Catalog"}
665270 14828867 87979 STUDIES_TRAIT {"source":"GWAS Catalog"}
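
The properties column holds a JSON string, so a record becomes fully usable only after that string is decoded. The sketch below shows this with the standard library on a hypothetical two-line excerpt in the layout of nodes.csv. The property values are invented, and it is an assumption that the exported file uses standard CSV quoting with doubled quotation marks.

```python
import csv
import io
import json

# Hypothetical excerpt in the layout of nodes.csv (header plus one record);
# the property values are invented for illustration
sample = 'id,type,properties\n0,Disease,"{""name"":""example disease"",""synonyms"":[""a"",""b""]}"\n'

records = []
for row in csv.DictReader(io.StringIO(sample)):
    # Decode the JSON string in the properties column into a Python dict
    row["properties"] = json.loads(row["properties"])
    records.append(row)

print(records[0]["type"], records[0]["properties"]["synonyms"])
```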

6. Schema discovery¶

This section analyzes the structure of the knowledge graph by determining which types of nodes are connected by which types of edges. To construct this overview, it is necessary to iterate over the entire data once. The result is a condensed representation of all entities and relations, which is known as data model or schema in the context of graph databases.

In [22]:
node_type_to_color = {
    "Metabolite": "green",

    "Gene": "blue",
    "Transcript": "blue",
    "Protein": "blue",
    "Modified_protein": "blue",
    "Peptide": "blue",

    "Disease": "red",
    "Pathway": "red",
    "Biological_process": "red",
}
In [23]:
%%time

node_id_to_type = {row.id: row.type for row in df_nodes.itertuples()}
CPU times: user 2min 35s, sys: 26.5 s, total: 3min 1s
Wall time: 3min 1s
In [24]:
%%time

unique_triples = set()
for row in df_edges.itertuples():
    s = node_id_to_type[row.source_id]
    p = row.type
    o = node_id_to_type[row.target_id]
    triple = (s, p, o)
    unique_triples.add(triple)
CPU times: user 17min 51s, sys: 45 s, total: 18min 35s
Wall time: 18min 36s
In [25]:
gs = ig.Graph(directed=True)
unique_nodes = set()
for s, p, o in unique_triples:
    for node in (s, o):
        if node not in unique_nodes:
            unique_nodes.add(node)
            
            node_size = int(nt_counts[node])
            node_color = node_type_to_color.get(node, '')
            node_hover = f"{node}\n\n{nt_counts[node]} nodes of this type are contained in the knowledge graph."
            gs.add_vertex(node, size=node_size, color=node_color, label_color=node_color, hover=node_hover)

    edge_size = int(et_counts[p])
    edge_color = node_type_to_color.get(s, '')
    edge_hover = f"{p}\n\n{et_counts[p]} edges of this type are contained in the knowledge graph."
    gs.add_edge(s, o, size=edge_size, color=edge_color, hover=edge_hover, label=p, label_color="gray", label_size=5)

gs.vcount(), gs.ecount()
Out[25]:
(31, 65)
In [26]:
fig = gv.d3(
    gs,
    show_node_label=True,
    node_label_data_source="name",

    show_edge_label=True,
    edge_label_data_source="label",
    edge_curvature=0.2,

    use_node_size_normalization=True,
    node_size_normalization_min=10,
    node_size_normalization_max=50,
    node_drag_fix=True,
    node_hover_neighborhood=True,
    
    use_edge_size_normalization=True,
    edge_size_normalization_max=3,

    many_body_force_strength=-3000,
    zoom_factor=0.3,
)
fig
Out[26]:
[Interactive gravis visualization of the schema graph]
In [27]:
# Export the schema visualization
schema_filepath = os.path.join(results_dir, f"{project_name}_schema.html")
fig.export_html(schema_filepath, overwrite=True)

Interpretation:

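  • Each node in the schema corresponds to one of the node types in the data. The schema contains 31 of the 32 node types, because it is derived from the edges and one node type does not take part in any edge.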
    • Node size represents the number of instances, i.e. how often that node type is present in the knowledge graph. The exact number can also be seen when hovering over a node. The large differences indicate again that the dataset is rather unbalanced.
    • Node color represents particular node types. The coloring scheme is based on a deliberately simple RGB palette with the same meaning across multiple notebooks to enable some visual comparison. The idea behind it is to highlight an interplay of certain entities, namely that drugs (or small molecules in general) can bind to proteins (or gene products in general) and thereby alter diseases (or involved pathways).
      • green = drugs & other small molecules (e.g. toxins)
      • blue = genes & gene products (e.g. proteins or RNAs)
      • red = diseases & related concepts (e.g. pathways)
      • black = all other types of entities
  • Each edge in the schema stands for one of the 39 edge types in the data. It is possible that the same edge type appears between different nodes.
    • Edge size represents the number of instances, i.e. how often that edge type is present in the knowledge graph.
    • Edge color is identical to the color of the source node, again to highlight the interplay between drugs, targets and diseases.
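
The schema graph built above has 31 vertices, although 32 node types were counted earlier. This follows from how the schema is derived: only node types that occur as source or target of at least one edge can appear in it. The sketch below reproduces this effect on invented toy data; the type name "Isolated_type" is hypothetical and does not identify the type that is missing in CKG.

```python
# Toy data: node ids mapped to node types, and edges as (source_id, target_id, edge_type)
node_id_to_type = {"1": "Protein", "2": "Disease", "3": "Protein", "4": "Isolated_type"}
edges = [
    ("1", "2", "ASSOCIATED_WITH"),
    ("3", "2", "ASSOCIATED_WITH"),
    ("1", "3", "CURATED_INTERACTS_WITH"),
]

# Collapse all edges into unique (source type, edge type, target type) triples
unique_triples = {(node_id_to_type[s], p, node_id_to_type[t]) for s, t, p in edges}

# Node types that appear in the schema are those found on either side of a triple
types_in_schema = {s for s, _, _ in unique_triples} | {o for _, _, o in unique_triples}

# Node types that exist in the data but have no edges are absent from the schema
missing_types = set(node_id_to_type.values()) - types_in_schema

print(sorted(unique_triples))
print(sorted(missing_types))
```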

7. Knowledge graph reconstruction¶

Usually, this section first converts the raw data to an intermediate format used in several notebooks, and then reconstructs the knowledge graph from the standardized data with shared code. However, in the case of CKG, the data conversion was already done in section 3 that extracted the nodes and edges from the provided Neo4j database. Further, only a part of the knowledge graph will be reconstructed for memory reasons.

  • The intermediate form of the data is created as two simple Python lists, one for nodes and the other for edges, which can be exported to two CSV files.
  • The knowledge graph is built as a graph object from the Python package igraph, which can be exported to a GraphML file.

a) Convert the data into a standardized format¶

Transform the raw data to a standardized format that is compatible with most biomedical knowledge graphs in order to enable shared downstream processing:

  • Each node is represented by three items: id (str), type (str), properties (dict)
  • Each edge is represented by four items: source_id (str), target_id (str), type (str), properties (dict)

This format was initially inspired by a straightforward way in which the content of a Neo4j graph database can be exported to two CSV files, one for all nodes and the other for all edges. This is an effect of the property graph model used in Neo4j and many other graph databases, which also appears to be general enough to fully capture the majority of biomedical knowledge graphs described in scientific literature, despite the large variety of formats they are shared in.

A second motivation was that each line represents a single node or edge, and that no entry is connected to any sections at other locations, such as property descriptions at the beginning of a GraphML file. This structural simplicity makes it very easy to load just a subset of nodes and edges by picking a subset of lines, or to skip the loading of properties if they are not required for a task simply by ignoring a single column.

Finally, this format also allows the data to be loaded directly into popular SQL databases like SQLite, MySQL or PostgreSQL with built-in CSV functions (CSV in SQLite, CSV in MySQL, CSV in PostgreSQL). Further, the JSON string in the property column can be accessed directly by built-in JSON functions (JSON in SQLite, JSON in MySQL, JSON in PostgreSQL), which enables sophisticated queries that access or modify specific properties within the JSON data.
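
As a rough illustration of the SQL route, the following sketch stores a few hypothetical nodes in an in-memory SQLite database and reads one property with `json_extract`. It assumes that the SQLite build bundled with Python includes the JSON functions, which is common but not guaranteed. For brevity the rows are inserted from Python instead of being loaded with SQLite's CSV import, and the table layout is an assumption chosen to mirror the three node items.

```python
import json
import sqlite3

# Hypothetical nodes in the standardized format: (id, type, properties)
nodes = [
    ("n1", "Protein", {"name": "ABL1"}),
    ("n2", "Disease", {"name": "chronic myeloid leukemia"}),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, properties TEXT)")
con.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(nid, ntype, json.dumps(props)) for nid, ntype, props in nodes],
)

# json_extract reads a single key out of the JSON string stored in the column
rows = con.execute(
    "SELECT id, json_extract(properties, '$.name') FROM nodes WHERE type = ? ORDER BY id",
    ("Protein",),
).fetchall()
con.close()

print(rows)
```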

Subset of node ids¶

Caution: Only a part of the knowledge graph is reconstructed here, because the entire dataset is too large to load completely into an igraph graph object in memory on an average laptop. To select this part, first a subset of interesting nodes is identified (associated with the Imatinib/CML story that follows later), then all edges are collected that contain these nodes as source or target, and finally all nodes are collected that appear on either side of these edges.

In [28]:
%%time

substrings = ["bcr", "abl1", "imatinib", "chronic myeloid leukemia"]
column = "properties"
mask = df_nodes[column].str.contains(substrings[0], case=False, na=False)
for substring in substrings[1:]:
    mask |= df_nodes[column].str.contains(substring, case=False, na=False)

df_nodes_filtered = df_nodes[mask].compute()
node_ids = set(df_nodes_filtered["id"])
CPU times: user 11min 37s, sys: 45.3 s, total: 12min 22s
Wall time: 10min 40s

Edges¶

In [29]:
%%time

mask = df_edges["source_id"].isin(node_ids) | df_edges["target_id"].isin(node_ids)
edges = df_edges[mask].compute().values.tolist()
edges = [(sid, tid, etype, json.loads(eproperties)) for sid, tid, etype, eproperties in edges]
CPU times: user 9min 1s, sys: 57.9 s, total: 9min 59s
Wall time: 5min 51s

Nodes¶

In [30]:
source_ids = [sid for sid, tid, etype, eproperties in edges]
target_ids = [tid for sid, tid, etype, eproperties in edges]
node_ids_in_edges = set(source_ids + target_ids)
In [31]:
%%time

mask = df_nodes["id"].isin(node_ids_in_edges)
nodes = df_nodes[mask].compute().values.tolist()
nodes = [(nid, ntype, json.loads(nproperties)) for nid, ntype, nproperties in nodes]
CPU times: user 3min 4s, sys: 29.9 s, total: 3min 34s
Wall time: 1min 52s
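
The three selection steps used above (seed nodes, edges touching a seed, nodes at either end of those edges) can be traced on a toy example in plain Python. All ids, names and edges below are invented. The sketch also shows a side effect of this strategy: a node that is kept only as a neighbor of a seed can lose some of its own edges.

```python
import json

# Toy data in the standardized format; all ids and names are invented
nodes = [
    ("1", "Metabolite", {"name": "Imatinib"}),
    ("2", "Pathway", {"name": "Some pathway"}),
    ("3", "Protein", {"name": "Unrelated protein"}),
    ("4", "Disease", {"name": "Unrelated disease"}),
]
edges = [
    ("1", "2", "ANNOTATED_IN_PATHWAY", {}),
    ("3", "2", "ANNOTATED_IN_PATHWAY", {}),
    ("3", "4", "ASSOCIATED_WITH", {}),
]
substrings = ["imatinib"]

# Step 1: seed nodes whose serialized properties contain a substring (case-insensitive)
seed_ids = {
    nid for nid, _, props in nodes
    if any(s in json.dumps(props).lower() for s in substrings)
}

# Step 2: edges that have a seed node as source or target
subset_edges = [e for e in edges if e[0] in seed_ids or e[1] in seed_ids]

# Step 3: all nodes that appear on either side of the selected edges
kept_ids = {e[0] for e in subset_edges} | {e[1] for e in subset_edges}
subset_nodes = [n for n in nodes if n[0] in kept_ids]

print(sorted(seed_ids), len(subset_edges), sorted(kept_ids))
```

In this toy case node "2" is kept, but its edge from node "3" is dropped because neither end of that edge is a seed.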

b) Export the standardized data to two CSV files¶

Both the id and type items are simple strings, while the properties item is a collection of key-value pairs represented by a Python dictionary, which the export function converts internally to a single JSON string. This means each node is fully represented by three strings, and each edge by four strings due to having a source id and target id.

In [32]:
nodes_csv_filepath = shared_bmkg.export_nodes_as_csv(nodes, results_dir, project_name + "_subset")
In [33]:
edges_csv_filepath = shared_bmkg.export_edges_as_csv(edges, results_dir, project_name + "_subset")
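
The internals of `shared_bmkg.export_nodes_as_csv` are not shown in this notebook. As a hedged sketch of what such an export could look like, the hypothetical helper `nodes_to_csv_text` below serializes the properties dictionary with `json.dumps` and lets the `csv` module take care of quoting. It writes to a string instead of a file to stay self-contained.

```python
import csv
import io
import json


def nodes_to_csv_text(nodes):
    """Serialize (id, type, properties) tuples to CSV text with a JSON properties column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "type", "properties"])
    for node_id, node_type, properties in nodes:
        # The dict becomes one JSON string, so each node is three strings per line
        writer.writerow([node_id, node_type, json.dumps(properties)])
    return buffer.getvalue()


text = nodes_to_csv_text([("1", "Metabolite", {"name": "Imatinib"})])
print(text)
```

Edges would work the same way with four columns instead of three.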

c) Use the standardized data to build a graph¶

Reconstruct a part of the knowledge graph in the form of a Graph object from the package igraph. This kind of graph object supports directed multi-edges, i.e. an edge has a source and a target node, and two nodes can be connected by more than one edge. It also supports node and edge properties. These features are necessary and sufficient to represent almost any biomedical knowledge graph found in academic literature.

In [34]:
g = shared_bmkg.create_graph(nodes, edges)
shared_bmkg.report_graph_stats(g)
Directed multigraph with 72062 nodes, 286596 edges and a density of 5.519e-05.
In [35]:
# Correctness checks

# 1) Does the reconstructed graph contain the same number of nodes as the raw data?
num_nodes_in_data = len(nodes)
num_nodes_in_graph = g.vcount()
assert num_nodes_in_graph == num_nodes_in_data, f"Node counts differ: {num_nodes_in_graph} != {num_nodes_in_data}"
print(f"{num_nodes_in_graph:,} = {num_nodes_in_data:,}")

# 2) Does the reconstructed graph contain the same number of (unique) edges as the raw data?
num_edges_in_data = len(edges)
num_edges_in_graph = g.ecount()
assert num_edges_in_graph == num_edges_in_data, f"Edge counts differ: {num_edges_in_graph} != {num_edges_in_data}"
print(f"{num_edges_in_graph:,} = {num_edges_in_data:,}")
72,062 = 72,062
286,596 = 286,596

Select the largest weakly connected component of the graph in order to get rid of some small disconnected subgraphs.

In [36]:
g = g.connected_components(mode="weak").giant()
shared_bmkg.report_graph_stats(g)
Directed multigraph with 71655 nodes, 286196 edges and a density of 5.574e-05.
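
The density values printed by `shared_bmkg.report_graph_stats` appear to be consistent with the number of edges divided by the squared number of nodes, i.e. a directed density in which self-loops count as possible edges. This is an inference from the printed numbers in this notebook (including the 2-node and 3-node subgraphs reported further below), not a statement about the function's source code. The helper `density_with_loops` is introduced here only for this check.

```python
def density_with_loops(num_nodes, num_edges):
    """Directed density when self-loops count as possible edges: m / n^2."""
    return num_edges / num_nodes**2


# (nodes, edges) pairs copied from graph reports printed in this notebook
reported = [(72062, 286596), (71655, 286196), (2, 1), (3, 2)]
densities = {(n, m): density_with_loops(n, m) for n, m in reported}

for (n, m), d in densities.items():
    print(f"{n} nodes, {m} edges -> density {d:.4g}")
```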

d) Export the graph to a GraphML file¶

Export the graph with all nodes, edges and properties as a single GraphML file.

In [37]:
%%time

g_graphml_filepath = shared_bmkg.export_graph_as_graphml(g, results_dir, project_name)
CPU times: user 25.4 s, sys: 1.71 s, total: 27.1 s
Wall time: 26.8 s

8. Subgraph exploration¶

This section explores small subgraphs of the knowledge graph in two ways: first by inspecting the direct neighborhood of a selected node, and second by finding shortest paths between two chosen nodes.

As a simple case study, the goal is to identify some nodes in the knowledge graph that are associated with the success story of the drug Imatinib, which was one of the first targeted therapies against cancer. Detailed background information can for example be found in an article by the National Cancer Institute and in a talk by Brian Druker who played a major role in the development of this paradigm-changing drug. To give a simplified summary, the following biological entities and relationships are involved:

  • Mutation: In a bone marrow stem cell, a translocation between chromosomes 9 and 22 produces the so-called Philadelphia chromosome, which is visible under a microscope and was named after the city in which it was discovered.
  • Gene: It turned out that this particular rearrangement of DNA fuses the BCR gene on chromosome 22 to the ABL1 gene on chromosome 9, resulting in a new fusion gene known as BCR-ABL1.
  • Disease: BCR-ABL1 acts as an oncogene, because it expresses a protein that is a defective tyrosine kinase in a permanent "on" state, which leads to uncontrolled growth of certain white blood cells and their precursors, thereby driving the disease Chronic Myelogenous Leukemia (CML).
  • Drug: Imatinib (Gleevec) was the first demonstration that a potent and selective Bcr-Abl tyrosine-kinase inhibitor (TKI) is possible and that such a targeted inhibition of an oncoprotein halts the uncontrolled growth of leukemia cells with BCR-ABL1, while having significantly less effect on other cells in the body compared to conventional chemotherapies used in cancer. This revolutionized the treatment of CML and drastically improved the five-year survival rate of patients from less than 20% to over 90%, as well as their quality of life.

In reality the story is a bit more complex, for example because there are other genes involved in disease progression, there are many closely related forms of leukemia, BCR-ABL1 also plays a role in other forms of cancer, there are several drugs available as treatment options today, all of them bind to more than one target and with different affinities, and their individual binding profiles are relevant to their particular therapeutic effects. Inspecting the knowledge graph will focus on highlighting some entities of the simplified story, but the surrounding elements will also indicate some of the complexity encountered in reality. Some simple forms of reasoning on the knowledge graph will demonstrate its potential for discovering new patterns and hypotheses.

a) Search for interesting nodes¶

In [38]:
# Drug: Imatinib
shared_bmkg.list_nodes_matching_substring(g, "imatinib", "_name")
id          type          _name                               
==============================================================
14709403    Metabolite    Imatinib                            
14757287    Metabolite    N-desmethylimatinib                 
14829069    Pathway       Imatinib-resistant KIT mutants      
14829750    Pathway       Imatinib-resistant PDGFR mutants    
14868156    Pathway       Imatinib Inhibition of BCR-ABL      
In [39]:
# Gene: ABL1
shared_bmkg.list_nodes_matching_substring(g, "abl1", "_name")  # Key "name" had to be replaced with "_name" to prevent an issue with igraph
id        type                   _name                                                          
================================================================================================
153256    Transcript             CDK5 and ABL1 enzyme substrate 1 isoform 3                     
156725    Transcript             CDK5 and ABL1 enzyme substrate 2                               
170294    Transcript             CDK5 and ABL1 enzyme substrate 1 isoform 1                     
201527    Transcript             CDK5 and ABL1 enzyme substrate 1 isoform 2                     
248136    Transcript             tyrosine-protein kinase ABL1 isoform b                         
2501      Disease                B-lymphoblastic leukemia/lymphoma with BCR-ABL1                
2508      Disease                B-lymphoblastic leukemia/lymphoma, BCR-ABL1–like             
270788    Transcript             tyrosine-protein kinase ABL1 isoform a                         
415423    Protein                ABL1                                                           
415687    Protein                ABL1                                                           
415936    Protein                ABL1                                                           
482984    Protein                CABL1                                                          
501506    Protein                BCR-ABL1                                                       
502824    Protein                ABL1                                                           
507519    Protein                ABL1                                                           
516183    Protein                ABL1                                                           
521954    Protein                ABL1                                                           
540406    Protein                ABL1                                                           
551827    Protein                ABL1                                                           
559528    Protein                ABL1                                                           
573628    Protein                ABL1                                                           
574840    Protein                BCR-ABL1                                                       
593626    Protein                ABL1                                                           
603785    Protein                BCR-ABL1                                                       
610827    Protein                BCR-ABL1 e19a2                                                 
626905    Protein                ABL1                                                           
89607     Experimental_factor    Blast Phase Chronic Myelogenous Leukemia, BCR-ABL1 Positive    
In [40]:
# Disease: Myeloid Leukemia - to find Chronic Myeloid Leukemia (CML)
shared_bmkg.list_nodes_matching_substring(g, "myeloid leukemia", "_name")
id       type                   _name                                 
======================================================================
10265    Disease                chronic myeloid leukemia              
10326    Disease                myeloid leukemia                      
10343    Disease                subacute myeloid leukemia             
10440    Disease                acute myeloid leukemia                
11530    Tissue                 myeloid leukemia cell line            
11846    Tissue                 myeloid leukemia cell                 
12333    Tissue                 chronic myeloid leukemia cell         
12334    Tissue                 acute myeloid leukemia cell           
12672    Tissue                 acute myeloid leukemia cell line      
1294     Disease                atypical chronic myeloid leukemia     
13369    Tissue                 chronic myeloid leukemia cell line    
1855     Disease                childhood acute myeloid leukemia      
90958    Experimental_factor    accelerated phase myeloid leukemia    

b) Explore the neighborhood of a chosen node¶

In [41]:
# Neighborhood of drug Imatinib
source = "14709403"
subgraph = shared_bmkg.get_egocentric_subgraph(g, source)

# Export
filename = f"{project_name}_neighbors_imatinib"
shared_bmkg.export_graph_as_graphml(subgraph, results_dir, filename)
shared_bmkg.export_nodes_as_csv(nodes, results_dir, filename, subgraph)
shared_bmkg.export_edges_as_csv(edges, results_dir, filename, subgraph)

# Report
shared_bmkg.report_graph_stats(subgraph)
shared_bmkg.visualize_graph(subgraph, node_type_to_color, source=source)
Directed multigraph with 2 nodes, 1 edges and a density of 0.25.
Out[41]:
[Interactive graph visualization of the Imatinib neighborhood subgraph]

Interpretation:

  • The drug Imatinib (green node at the center) is connected only to one other node, a pathway named "Imatinib Inhibition of BCR-ABL".
  • Caution: This small neighborhood is most likely a result of the filtering step that had to be performed in order to get to a reasonably sized graph that fits into memory. It does not reflect all information about Imatinib in CKG!
In [42]:
# Neighborhood of protein ABL1
source = "415687"
subgraph = shared_bmkg.get_egocentric_subgraph(g, source)

# Export
filename = f"{project_name}_neighbors_abl1"
shared_bmkg.export_graph_as_graphml(subgraph, results_dir, filename)
shared_bmkg.export_nodes_as_csv(nodes, results_dir, filename, subgraph)
shared_bmkg.export_edges_as_csv(edges, results_dir, filename, subgraph)

# Report
shared_bmkg.report_graph_stats(subgraph)
#shared_bmkg.visualize_graph(subgraph, node_type_to_color, source=source)
Directed multigraph with 10418 nodes, 28461 edges and a density of 0.0002622.

Interpretation:

  • CKG contains too many nodes connected to the protein ABL1 to plot this neighborhood and analyze it visually.
  • Caution: This might not even be the entire neighborhood due to the filtering step that had to be performed in order to get to a reasonably sized graph that fits into memory.
In [43]:
# Neighborhood of disease CML
source = "10265"
subgraph = shared_bmkg.get_egocentric_subgraph(g, source)

# Export
filename = f"{project_name}_neighbors_cml"
shared_bmkg.export_graph_as_graphml(subgraph, results_dir, filename)
shared_bmkg.export_nodes_as_csv(nodes, results_dir, filename, subgraph)
shared_bmkg.export_edges_as_csv(edges, results_dir, filename, subgraph)

# Report
shared_bmkg.report_graph_stats(subgraph)
Directed multigraph with 15612 nodes, 21425 edges and a density of 8.79e-05.

Interpretation:

  • CKG contains too many nodes connected to the disease CML to plot this neighborhood and analyze it visually.
  • Caution: This might not be the entire neighborhood due to the filtering step that had to be performed in order to get to a reasonably sized graph that fits into memory.
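
How `shared_bmkg.get_egocentric_subgraph` selects nodes and edges is not shown in this notebook. One plausible reading, used in the sketch below purely as an assumption, is that the subgraph consists of the chosen node, all nodes one hop away regardless of edge direction, and every edge between these nodes. The function name `egocentric_subgraph` and the toy edges are invented for illustration.

```python
# Toy edges as (source, target, edge_type); all names are invented
edges = [
    ("drug", "pathway", "ANNOTATED_IN_PATHWAY"),
    ("protein", "pathway", "ANNOTATED_IN_PATHWAY"),
    ("protein", "disease", "ASSOCIATED_WITH"),
]


def egocentric_subgraph(edges, center):
    """Return the center, its direct neighbors, and all edges among these nodes."""
    members = {center}
    for source, target, _ in edges:
        # Edge direction is ignored when collecting neighbors (an assumption)
        if source == center:
            members.add(target)
        elif target == center:
            members.add(source)
    induced_edges = [e for e in edges if e[0] in members and e[1] in members]
    return members, induced_edges


drug_nodes, drug_edges = egocentric_subgraph(edges, "drug")
pathway_nodes, pathway_edges = egocentric_subgraph(edges, "pathway")
print(sorted(drug_nodes), len(drug_edges))
print(sorted(pathway_nodes), len(pathway_edges))
```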

c) Find shortest paths between two chosen nodes¶

In [44]:
# Paths from transcript "tyrosine-protein kinase ABL1 isoform a" to disease AML
source = "270788"
target = "10440"
subgraph = shared_bmkg.get_paths_subgraph(g, source, target)

# Report
shared_bmkg.report_graph_stats(subgraph)
shared_bmkg.visualize_graph(subgraph, node_type_to_color, source, target)
Directed multigraph with 3 nodes, 2 edges and a density of 0.2222.
Out[44]:
[Interactive graph visualization of the paths subgraph between the ABL1 transcript and AML]

Interpretation:

  • The transcript "tyrosine-protein kinase ABL1 isoform a" (blue node on the left) leads via a single path to the disease AML (red node on the right). It is connected via a "TRANSLATED_INTO" relation to the protein ABL1, which in turn is linked by an "ASSOCIATED_WITH" relation to the disease AML.
  • Caution: These might not be all paths due to the filtering step that had to be performed in order to get to a reasonably sized graph that fits into memory.
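
The path search itself happens inside `shared_bmkg.get_paths_subgraph`, whose implementation is not shown here. As a minimal sketch of the underlying idea, the breadth-first search below finds one shortest path along directed edges in a toy graph that mirrors the two relations described above. Whether the shared function follows edge directions and how it handles several equally short paths are open assumptions; the function `shortest_path` and the extra "unrelated disease" edge are invented for illustration.

```python
from collections import deque

# Toy edges as (source, target, edge_type), mirroring the path described above
edges = [
    ("ABL1 isoform a (transcript)", "ABL1 (protein)", "TRANSLATED_INTO"),
    ("ABL1 (protein)", "AML (disease)", "ASSOCIATED_WITH"),
    ("ABL1 (protein)", "unrelated disease", "ASSOCIATED_WITH"),
]


def shortest_path(edges, source, target):
    """Breadth-first search along directed edges; returns a list of hops or None."""
    outgoing = {}
    for s, t, etype in edges:
        outgoing.setdefault(s, []).append((t, etype))

    previous = {source: None}  # node -> (parent, edge_type) used to reach it
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, etype in outgoing.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, etype)
                queue.append(neighbor)

    if target not in previous:
        return None

    # Walk back from the target to the source and reverse the collected hops
    hops = []
    node = target
    while previous[node] is not None:
        parent, etype = previous[node]
        hops.append((parent, etype, node))
        node = parent
    return hops[::-1]


path = shortest_path(edges, "ABL1 isoform a (transcript)", "AML (disease)")
for s, etype, t in path:
    print(f"{s} -[{etype}]-> {t}")
```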