kgw.biomedicine

Knowledge graph projects from the domain of biomedicine.

Classes

`Ckg`	Clinical Knowledge Graph (CKG).
`Hald`	Human Aging and Longevity Dataset (HALD).
`MonarchKg`	Monarch Knowledge Graph (MonarchKG).
`Oregano`	Oregano Knowledge Graph.
`PrimeKg`	Precision Medicine Knowledge Graph (PrimeKG).

Package Contents

class kgw.biomedicine.Ckg(version, workdir)

Clinical Knowledge Graph (CKG).

References

Publication: https://doi.org/10.1038/s41587-021-01145-6
Website: https://ckg.readthedocs.io
Code: https://github.com/MannLabs/CKG
Data: https://doi.org/10.17632/mrcf7f4tc2

__init__(version, workdir)

Initialize a project instance so that tasks can be defined on it.

Parameters:

version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.

Raises:

ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.

Notes

This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object and then passing it to the function run() that builds and executes a corresponding workflow.
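The declare-then-run pattern described in this note can be illustrated with a minimal sketch. The names ToyProject and toy_run are hypothetical stand-ins and not part of this package; the sketch only shows that calling task methods records work, while a separate function carries it out.

```python
class ToyProject:
    """Hypothetical stand-in for a project class; it only records tasks."""

    def __init__(self):
        self.tasks = []  # names of requested tasks, in call order

    def to_sqlite(self):
        self.tasks.append("sqlite")

    def to_csv(self):
        self.tasks.append("csv")


# Output file names taken from the method descriptions on this page.
OUTPUTS = {
    "sqlite": ["kg.sqlite"],
    "csv": ["kg_nodes.csv", "kg_edges.csv"],
}


def toy_run(project):
    """Hypothetical stand-in for run(): process the recorded tasks."""
    produced = []
    for task in project.tasks:
        produced.extend(OUTPUTS[task])
    return produced


project = ToyProject()
project.to_sqlite()
project.to_csv()
# Up to this point nothing was generated; the calls only declared work.
produced = toy_run(project)
print(produced)
```

Separating declaration from execution lets a workflow engine see all requested outputs before it starts, which is one plausible reason for this design.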

classmethod get_versions()

Fetch all currently available versions from the data repository of the project.

to_sqlite()

Convert the knowledge graph to a file-based SQLite database.

Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for every project. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
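As a rough illustration of what a unified node and edge representation in SQLite could look like, the sketch below builds a small in-memory database. The table and column names (nodes, edges, id, source_id, target_id, type, properties) and all rows are assumptions made for this example; inspect the generated kg.sqlite for the schema the package actually uses.

```python
import sqlite3

# In-memory database for illustration; the real output is the file kg.sqlite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema, not confirmed by this page.
    CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, properties TEXT);
    CREATE TABLE edges (source_id TEXT, target_id TEXT, type TEXT, properties TEXT);
""")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        ("n1", "Gene", '{"name": "example gene"}'),
        ("n2", "Disease", '{"name": "example disease"}'),
    ],
)
conn.execute(
    "INSERT INTO edges VALUES (?, ?, ?, ?)",
    ("n1", "n2", "associated_with", "{}"),
)
conn.commit()

# Resolve the node types at both ends of every edge.
rows = conn.execute("""
    SELECT s.type, e.type, t.type
    FROM edges AS e
    JOIN nodes AS s ON s.id = e.source_id
    JOIN nodes AS t ON t.id = e.target_id
""").fetchall()
print(rows)
```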

to_statistics()

Determine some statistical properties of the knowledge graph.

Generates the output file statistics.json. This is a JSON file that contains data about basic statistics of the elements in the knowledge graph, such as node, edge and type counts.
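The kind of counts mentioned here can be sketched in a few lines. The key names in the resulting dictionary and the sample data are assumptions for illustration; the actual layout of statistics.json may differ.

```python
import json
from collections import Counter

# Made-up sample graph: (node id, node type) and (source, target, edge type).
nodes = [("n1", "Gene"), ("n2", "Gene"), ("n3", "Disease")]
edges = [("n1", "n3", "associated_with"), ("n2", "n3", "associated_with")]

stats = {
    "num_nodes": len(nodes),
    "num_edges": len(edges),
    "node_types": dict(Counter(node_type for _, node_type in nodes)),
    "edge_types": dict(Counter(edge_type for _, _, edge_type in edges)),
}

# Serialize the statistics the way a JSON output file would store them.
stats_json = json.dumps(stats, indent=2, sort_keys=True)
print(stats_json)
```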

to_schema()

Determine the schema of the knowledge graph.

Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
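The information shown in such a schema view can be thought of as the set of distinct (source type, relationship type, target type) combinations. The sketch below derives that set from made-up sample data; it does not reproduce the HTML visualization and is not the package's implementation.

```python
# Made-up sample graph: node id -> node type, and (source, target, edge type).
node_types = {"n1": "Gene", "n2": "Gene", "n3": "Disease", "n4": "Drug"}
edges = [
    ("n1", "n3", "associated_with"),
    ("n2", "n3", "associated_with"),
    ("n4", "n3", "treats"),
]

# Collapse individual edges into distinct type-level connections.
schema = sorted(
    {(node_types[source], edge_type, node_types[target])
     for source, target, edge_type in edges}
)
for source_type, edge_type, target_type in schema:
    print(f"{source_type} -[{edge_type}]-> {target_type}")
```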

to_sql()

Convert the knowledge graph to a SQL text file.

Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
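One way to obtain SQL text from an SQLite database is the standard library method Connection.iterdump(). The sketch below shows the idea on a made-up table; whether the package produces kg.sql this way is not stated here, and SQL dumped from SQLite may need dialect adjustments before other systems accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout and row, for illustration only.
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT)")
conn.execute("INSERT INTO nodes VALUES ('n1', 'Gene')")
conn.commit()

# iterdump() yields the SQL statements that recreate the database.
sql_text = "\n".join(conn.iterdump())
print(sql_text)
```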

to_csv()

Convert the knowledge graph to two CSV text files.

Generates the output files kg_nodes.csv and kg_edges.csv.

References

Wikipedia: CSV
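A minimal sketch of writing nodes and edges as two CSV texts with the standard csv module follows. The column names and rows are assumptions for illustration, and in-memory buffers stand in for the files kg_nodes.csv and kg_edges.csv.

```python
import csv
import io

# Made-up sample data; properties are stored as JSON strings.
nodes = [("n1", "Gene", '{"name": "example gene"}'),
         ("n2", "Disease", '{"name": "example, with comma"}')]
edges = [("n1", "n2", "associated_with", "{}")]


def to_csv_text(header, rows):
    """Write a header and rows to CSV text; the writer quotes where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv_text(["id", "type", "properties"], nodes)
edges_csv = to_csv_text(["source_id", "target_id", "type", "properties"], edges)

# Reading the text back recovers the original values despite commas and quotes.
nodes_back = list(csv.reader(io.StringIO(nodes_csv)))
print(nodes_csv)
```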

to_jsonl()

Convert the knowledge graph to two JSON Lines text files.

Generates the output files kg_nodes.jsonl and kg_edges.jsonl.

References

JSONL
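In the JSON Lines format every line is one complete JSON value. The sketch below writes and re-reads made-up node records; the record keys are assumptions and may not match the contents of kg_nodes.jsonl.

```python
import json

# Made-up node records for illustration.
nodes = [
    {"id": "n1", "type": "Gene", "properties": {"name": "example gene"}},
    {"id": "n2", "type": "Disease", "properties": {"synonyms": ["a", "b"]}},
]

# One JSON document per line, each line terminated by a newline.
nodes_jsonl = "".join(json.dumps(record) + "\n" for record in nodes)

# Reading works line by line, so large files never need to fit in memory.
parsed = [json.loads(line) for line in nodes_jsonl.splitlines()]
print(nodes_jsonl)
```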

to_metta(representation='spo')

Convert the knowledge graph to a MeTTa text file.

Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.

Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.

Parameters:

representation (str) – The format used to represent the knowledge graph in the MeTTa language.

Available options:

"spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
"properties_aggregated": Properties (=key-value pairs) are represented by putting each key on a separate line, but each value is ensured to be a single number or string. This means values that hold a compound data type like a list or dict are aggregated into one string in JSON string format. Text identifiers of nodes are reused to create the association with their properties, while text identifiers of the form “e{cnt}” are introduced for edges to serve the same purpose.
"properties_expanded": Properties (=key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to create the association between these elements and their properties.
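The aggregation rule described for "properties_aggregated" can be sketched as follows: scalar values stay as they are, compound values become one JSON string, and edges receive identifiers of the form e{cnt}. The helper name, the sample data and the choice to start counting at 0 are assumptions; the sketch produces Python tuples and does not reproduce the MeTTa text layout.

```python
import json


def aggregate_value(value):
    """Keep numbers and strings; turn lists and dicts into one JSON string."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value


# Made-up edges: (source, target, type, properties).
edges = [
    ("n1", "n2", "associated_with", {"score": 0.9, "sources": ["a", "b"]}),
    ("n2", "n3", "treats", {"note": "example"}),
]

rows = []
for cnt, (_source, _target, _edge_type, properties) in enumerate(edges):
    edge_id = f"e{cnt}"  # text identifier introduced for the edge
    for key, value in properties.items():
        rows.append((edge_id, key, aggregate_value(value)))

for row in rows:
    print(row)
```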

to_graphml()

Convert the knowledge graph to a GraphML text file.

Generates the output file kg.graphml.
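For orientation, a small GraphML document can be built with the standard library as sketched below. The attribute key "type", the graph id and the sample data are assumptions for illustration; the structure of the generated kg.graphml may differ.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"  # GraphML namespace

# Made-up sample graph.
nodes = [("n1", "Gene"), ("n2", "Disease")]
edges = [("n1", "n2", "associated_with")]

root = ET.Element("graphml", xmlns=NS)
# Declare one string attribute named "type" usable on nodes and edges.
ET.SubElement(root, "key", {
    "id": "type", "for": "all", "attr.name": "type", "attr.type": "string",
})
graph = ET.SubElement(root, "graph", id="kg", edgedefault="directed")

for node_id, node_type in nodes:
    node = ET.SubElement(graph, "node", id=node_id)
    ET.SubElement(node, "data", key="type").text = node_type

for cnt, (source, target, edge_type) in enumerate(edges):
    edge = ET.SubElement(graph, "edge", id=f"e{cnt}", source=source, target=target)
    ET.SubElement(edge, "data", key="type").text = edge_type

graphml_text = ET.tostring(root, encoding="unicode")
print(graphml_text)
```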

class kgw.biomedicine.Hald(version, workdir)

Human Aging and Longevity Dataset (HALD).

References

Publication: https://doi.org/10.1038/s41597-023-02781-0
Website: https://bis.zju.edu.cn/hald
Code: https://github.com/zexuwu/hald
Data: https://doi.org/10.6084/m9.figshare.22828196

to_schema()

Determine the schema of the knowledge graph.

Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.

__init__(version, workdir)

Initialize a project instance so that tasks can be defined on it.

Parameters:

version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.

Raises:

ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
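The two documented error conditions can be pictured with a small validation sketch. The function check_arguments, the order of the checks and the version labels are hypothetical; only the exception types and their triggers are taken from the description above.

```python
def check_arguments(version, workdir, available_versions):
    """Hypothetical validation mirroring the documented exceptions."""
    if not isinstance(workdir, str):
        raise TypeError("workdir must be a string")
    if version not in available_versions:
        raise ValueError(f"version {version!r} is not available")


def outcome(version, workdir):
    """Return 'ok' or the name of the exception that was raised."""
    try:
        check_arguments(version, workdir, available_versions=["1", "2"])
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return "ok"


print(outcome("1", "data"))  # valid arguments
print(outcome("3", "data"))  # unknown version
print(outcome("1", 42))      # workdir is not a string
```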

Notes

This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object and then passing it to the function run() that builds and executes a corresponding workflow.

classmethod get_versions()

Fetch all currently available versions from the data repository of the project.

to_sqlite()

Convert the knowledge graph to a file-based SQLite database.

Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for every project. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.

to_statistics()

Determine some statistical properties of the knowledge graph.

Generates the output file statistics.json. This is a JSON file that contains data about basic statistics of the elements in the knowledge graph, such as node, edge and type counts.

to_sql()

Convert the knowledge graph to a SQL text file.

Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.

to_csv()

Convert the knowledge graph to two CSV text files.

Generates the output files kg_nodes.csv and kg_edges.csv.

References

Wikipedia: CSV

to_jsonl()

Convert the knowledge graph to two JSON Lines text files.

Generates the output files kg_nodes.jsonl and kg_edges.jsonl.

References

JSONL

to_metta(representation='spo')

Convert the knowledge graph to a MeTTa text file.

Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.

Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.

Parameters:

representation (str) – The format used to represent the knowledge graph in the MeTTa language.

Available options:

"spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
"properties_aggregated": Properties (=key-value pairs) are represented by putting each key on a separate line, but each value is ensured to be a single number or string. This means values that hold a compound data type like a list or dict are aggregated into one string in JSON string format. Text identifiers of nodes are reused to create the association with their properties, while text identifiers of the form “e{cnt}” are introduced for edges to serve the same purpose.
"properties_expanded": Properties (=key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to create the association between these elements and their properties.
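The idea behind "properties_expanded" can be sketched by flattening nested property values into one row per leaf value and linking each row to a numerical identifier. The helper expand, the path tuples and the sample data are assumptions for illustration; the sketch does not reproduce the MeTTa text that the package writes.

```python
def expand(value, path=()):
    """Yield (path, leaf value) pairs for arbitrarily nested lists and dicts."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from expand(item, path + (key,))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from expand(item, path + (index,))
    else:
        yield path, value


# Made-up nodes: (text identifier, properties).
nodes = [("n1", {"name": "x", "synonyms": ["a", "b"]})]

rows = []
for numerical_id, (_text_id, properties) in enumerate(nodes):
    # The numerical identifier ties every expanded row to its node.
    for path, value in expand(properties):
        rows.append((numerical_id, path, value))

for row in rows:
    print(row)
```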

to_graphml()

Convert the knowledge graph to a GraphML text file.

Generates the output file kg.graphml.

class kgw.biomedicine.MonarchKg(version, workdir)

Monarch Knowledge Graph (MonarchKG).

References

Publication: https://doi.org/10.1093/nar/gkad1082
Website: https://monarchinitiative.org
Code: https://github.com/monarch-initiative/monarch-ingest
Data: https://data.monarchinitiative.org/monarch-kg/index.html

__init__(version, workdir)

Initialize a project instance so that tasks can be defined on it.

Parameters:

version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
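How a unique subdirectory per project and version might be derived is sketched below. The naming scheme, the helper project_directory and the version labels are purely hypothetical; the directory names the package actually creates are not specified on this page.

```python
import tempfile
from pathlib import Path


def project_directory(workdir, project_name, version):
    """Hypothetical helper: one subdirectory per project and version."""
    path = Path(workdir) / f"{project_name}_v{version}"
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as workdir:
    first = project_directory(workdir, "monarchkg", "1")
    second = project_directory(workdir, "monarchkg", "2")
    first_name, second_name = first.name, second.name
    both_exist = first.is_dir() and second.is_dir()

print(first_name, second_name)
```

Keeping versions in separate directories means files of one version are never overwritten by another, so several versions can share a single working directory.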

Raises:

ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.

Notes

This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object and then passing it to the function run() that builds and executes a corresponding workflow.

classmethod get_versions()

Fetch all currently available versions from the data repository of the project.

to_sqlite()

Convert the knowledge graph to a file-based SQLite database.

Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for every project. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.

to_statistics()

Determine some statistical properties of the knowledge graph.

Generates the output file statistics.json. This is a JSON file that contains data about basic statistics of the elements in the knowledge graph, such as node, edge and type counts.

to_schema()

Determine the schema of the knowledge graph.

Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.

to_sql()

Convert the knowledge graph to a SQL text file.

Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.

to_csv()

Convert the knowledge graph to two CSV text files.

Generates the output files kg_nodes.csv and kg_edges.csv.

References

Wikipedia: CSV

to_jsonl()

Convert the knowledge graph to two JSON Lines text files.

Generates the output files kg_nodes.jsonl and kg_edges.jsonl.

References

JSONL

to_metta(representation='spo')

Convert the knowledge graph to a MeTTa text file.

Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.

Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.

Parameters:

representation (str) – The format used to represent the knowledge graph in the MeTTa language.

Available options:

"spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
"properties_aggregated": Properties (=key-value pairs) are represented by putting each key on a separate line, but each value is ensured to be a single number or string. This means values that hold a compound data type like a list or dict are aggregated into one string in JSON string format. Text identifiers of nodes are reused to create the association with their properties, while text identifiers of the form “e{cnt}” are introduced for edges to serve the same purpose.
"properties_expanded": Properties (=key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to create the association between these elements and their properties.

to_graphml()

Convert the knowledge graph to a GraphML text file.

Generates the output file kg.graphml.

class kgw.biomedicine.Oregano(version, workdir)

Oregano Knowledge Graph.

References

Publication: https://doi.org/10.1038/s41597-023-02757-0
Code: https://gitub.u-bordeaux.fr/erias/oregano
Data: https://doi.org/10.6084/m9.figshare.23553114

__init__(version, workdir)

Initialize a project instance so that tasks can be defined on it.

Parameters:

version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.

Raises:

ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.

Notes

This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object and then passing it to the function run() that builds and executes a corresponding workflow.

classmethod get_versions()

Fetch all currently available versions from the data repository of the project.

to_sqlite()

Convert the knowledge graph to a file-based SQLite database.

Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for every project. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
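The benefit of a shared intermediate schema is that one export routine serves every project. The sketch below demonstrates this with two made-up databases and a single export function; the table layout, the helper names and the data are assumptions, not the package's implementation.

```python
import sqlite3


def build_db(nodes, edges):
    """Create an in-memory database using an assumed shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT)")
    conn.execute("CREATE TABLE edges (source_id TEXT, target_id TEXT, type TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", nodes)
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    return conn


def export_triples(conn):
    """One converter that works for any database with the shared schema."""
    query = "SELECT source_id, type, target_id FROM edges ORDER BY rowid"
    return [" ".join(row) for row in conn.execute(query)]


# Two different "projects" stored in the same layout.
project_a = build_db([("a1", "Drug"), ("a2", "Disease")], [("a1", "a2", "treats")])
project_b = build_db([("b1", "Gene"), ("b2", "Phenotype")], [("b1", "b2", "linked_to")])

triples_a = export_triples(project_a)
triples_b = export_triples(project_b)
print(triples_a, triples_b)
```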

to_statistics()

Determine some statistical properties of the knowledge graph.

Generates the output file statistics.json. This is a JSON file that contains data about basic statistics of the elements in the knowledge graph, such as node, edge and type counts.

to_schema()

Determine the schema of the knowledge graph.

Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.

to_sql()

Convert the knowledge graph to a SQL text file.

Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.

to_csv()

Convert the knowledge graph to two CSV text files.

Generates the output files kg_nodes.csv and kg_edges.csv.

References

Wikipedia: CSV

to_jsonl()

Convert the knowledge graph to two JSON Lines text files.

Generates the output files kg_nodes.jsonl and kg_edges.jsonl.

References

JSONL

to_metta(representation='spo')

Convert the knowledge graph to a MeTTa text file.

Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.

Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.

Parameters:

representation (str) – The format used to represent the knowledge graph in the MeTTa language.

Available options:

"spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
"properties_aggregated": Properties (=key-value pairs) are represented by putting each key on a separate line, but each value is ensured to be a single number or string. This means values that hold a compound data type like a list or dict are aggregated into one string in JSON string format. Text identifiers of nodes are reused to create the association with their properties, while text identifiers of the form “e{cnt}” are introduced for edges to serve the same purpose.
"properties_expanded": Properties (=key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to create the association between these elements and their properties.

to_graphml()

Convert the knowledge graph to a GraphML text file.

Generates the output file kg.graphml.

class kgw.biomedicine.PrimeKg(version, workdir)

Precision Medicine Knowledge Graph (PrimeKG).

References

Publication: https://doi.org/10.1038/s41597-023-01960-3
Website: https://zitniklab.hms.harvard.edu/projects/PrimeKG
Code: https://github.com/mims-harvard/PrimeKG
Data: https://doi.org/10.7910/DVN/IXA7BM

__init__(version, workdir)

Initialize a project instance so that tasks can be defined on it.

Parameters:

version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.

Raises:

ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.

Notes

This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object and then passing it to the function run() that builds and executes a corresponding workflow.

classmethod get_versions()

Fetch all currently available versions from the data repository of the project.

to_sqlite()

Convert the knowledge graph to a file-based SQLite database.

Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for every project. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.

to_statistics()

Determine some statistical properties of the knowledge graph.

Generates the output file statistics.json. This is a JSON file that contains data about basic statistics of the elements in the knowledge graph, such as node, edge and type counts.

to_schema()

Determine the schema of the knowledge graph.

Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.

to_sql()

Convert the knowledge graph to a SQL text file.

Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
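Importing SQL text into a database amounts to executing its statements in order. The sketch below uses an in-memory SQLite database as a stand-in, because it ships with Python; for MySQL or PostgreSQL the text would instead be fed to the respective client tool. The statements shown are made up and are not taken from a real kg.sql file.

```python
import sqlite3

# Made-up SQL text in the spirit of a dump: structure first, then content.
sql_text = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT);
INSERT INTO nodes VALUES ('n1', 'Gene');
INSERT INTO nodes VALUES ('n2', 'Disease');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(sql_text)  # runs all statements in sequence

count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(count)
```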

to_csv()

Convert the knowledge graph to two CSV text files.

Generates the output files kg_nodes.csv and kg_edges.csv.

References

Wikipedia: CSV

to_jsonl()

Convert the knowledge graph to two JSON Lines text files.

Generates the output files kg_nodes.jsonl and kg_edges.jsonl.

References

JSONL

to_metta(representation='spo')

Convert the knowledge graph to a MeTTa text file.

Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.

Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.

Parameters:

representation (str) – The format used to represent the knowledge graph in the MeTTa language.

Available options:

"spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
"properties_aggregated": Properties (=key-value pairs) are represented by putting each key on a separate line, but each value is ensured to be a single number or string. This means values that hold a compound data type like a list or dict are aggregated into one string in JSON string format. Text identifiers of nodes are reused to create the association with their properties, while text identifiers of the form “e{cnt}” are introduced for edges to serve the same purpose.
"properties_expanded": Properties (=key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to create the association between these elements and their properties.

to_graphml()

Convert the knowledge graph to a GraphML text file.

Generates the output file kg.graphml.
