kgw

kgw - Knowledge Graph Workflows

A Python package for downloading, converting, and analyzing a selection of knowledge graphs.

Currently, five projects from the biomedical domain are covered. More projects from the same or other domains may be added in the future. Contributions are welcome and encouraged!

Subpackages

Functions

run(workflow[, num_workers, verbose])

Execute all tasks in the provided workflow according to their dependencies.

Package Contents

kgw.run(workflow, num_workers=None, verbose=True)

Execute all tasks in the provided workflow according to their dependencies.

This function uses the package Luigi [1] to build a dependency graph of all tasks defined in the workflow and execute them in parallel with multiple worker processes.
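The idea of ordering tasks by their dependencies can be illustrated without Luigi. The following is a minimal sketch that uses Python's standard-library graphlib module instead; the task names (download, convert_csv, analyze) are hypothetical, and the code does not reflect how kgw or Luigi are implemented internally.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "download": set(),
    "convert_csv": {"download"},
    "analyze": {"convert_csv"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A real workflow engine additionally runs independent tasks in parallel and tracks which outputs already exist; this sketch only shows the ordering step.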

Parameters:
  • workflow (Project, or list of Project) – Specification of a workflow in the form of one or more project objects. A project object provides several methods that can be called to add specific tasks. For example, calling the method to_csv() stores a task in the object that represents the conversion of the project’s knowledge graph into CSV format. The workflow engine automatically detects these tasks, inspects their dependencies, and schedules all necessary steps in the correct order.

  • num_workers (int, optional, default=4*cpu_count) – The number of worker processes used to run tasks in parallel. If not specified (None), it defaults to 4 times the number of CPU cores available on the machine.

  • verbose (bool, optional, default=True) – If True, a log of tasks and a summary of results are written to stdout. If False, no text is printed.
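The documented default for num_workers could be resolved roughly as follows. This is a sketch under assumptions: resolve_num_workers is a hypothetical helper, not a function of kgw, and the fallback to a single core when the CPU count is unknown is a choice made here for illustration.

```python
import os


def resolve_num_workers(num_workers=None):
    """Hypothetical helper mirroring the documented default of 4 * CPU cores."""
    if num_workers is None:
        # os.cpu_count() may return None if the count cannot be determined,
        # so fall back to 1 core (an assumption of this sketch).
        return 4 * (os.cpu_count() or 1)
    return num_workers


print(resolve_num_workers())   # depends on the machine
print(resolve_num_workers(2))  # an explicit value is passed through unchanged
```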

Returns:

success (bool) – True if all tasks were successfully scheduled and executed, otherwise False. A failed run can be resumed from an intermediate state without re-running previously completed tasks.
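Because the return value signals success and completed tasks are not repeated, a caller might simply retry a failed run. The sketch below assumes a callable with the same calling convention as kgw.run; run_until_success and fake_run are hypothetical names introduced for illustration, and fake_run is only a stand-in so the example runs without the package installed.

```python
def run_until_success(run_fn, workflow, max_attempts=3, **kwargs):
    """Call run_fn(workflow, **kwargs) until it returns True.

    run_fn is assumed to behave like kgw.run: it returns a bool and skips
    tasks that were already completed in an earlier attempt.
    Returns the number of attempts used, or None if all attempts failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_fn(workflow, **kwargs):
            return attempt
    return None


# Stand-in for kgw.run that fails on its first call (demonstration only).
calls = []


def fake_run(workflow, num_workers=None, verbose=True):
    calls.append(workflow)
    return len(calls) >= 2


attempts = run_until_success(fake_run, "placeholder-workflow", verbose=False)
print(attempts)
```

With the real package, run_fn would be kgw.run and workflow a project object or a list of project objects.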

Raises:
  • TypeError – Raised if workflow is not a project object or list of such objects, or if num_workers is not an integer, or if verbose is not a boolean.

  • ValueError – Raised if workflow is an empty list.
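The documented error conditions can be summarized as a small validation routine. This is a sketch, not the package's actual code: Project is a stand-in class defined here, validate_run_args is a hypothetical helper, and rejecting bool values for num_workers is an assumption of this example.

```python
class Project:
    """Stand-in for a kgw project class; the real class is not shown here."""


def validate_run_args(workflow, num_workers=None, verbose=True):
    """Check arguments against the documented rules and return a list of projects."""
    if isinstance(workflow, list):
        if not workflow:
            raise ValueError("workflow must not be an empty list")
        projects = workflow
    else:
        projects = [workflow]
    if not all(isinstance(p, Project) for p in projects):
        raise TypeError("workflow must be a project object or a list of them")
    # bool is a subclass of int in Python, so it is excluded explicitly
    # (an assumption of this sketch).
    if num_workers is not None and (
        isinstance(num_workers, bool) or not isinstance(num_workers, int)
    ):
        raise TypeError("num_workers must be an integer")
    if not isinstance(verbose, bool):
        raise TypeError("verbose must be a boolean")
    return projects


print(len(validate_run_args([Project(), Project()], num_workers=8)))
```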

References