kgw.biomedicine
Knowledge graph projects from the domain of biomedicine.
Classes
- Ckg: Clinical Knowledge Graph (CKG).
- Hald: Human Aging and Longevity Dataset (HALD).
- MonarchKg: Monarch Knowledge Graph (MonarchKG).
- Oregano: Oregano Knowledge Graph.
- PrimeKg: Precision Medicine Knowledge Graph (PrimeKG).
Package Contents
- class kgw.biomedicine.Ckg(version, workdir)
Clinical Knowledge Graph (CKG).
References
Publication: https://doi.org/10.1038/s41587-021-01145-6
Website: https://ckg.readthedocs.io
- __init__(version, workdir)
Initialize a project instance so that tasks can be defined on it.
- Parameters:
version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
- Raises:
ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
Notes
This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object. The object is then passed to the function run(), which builds and executes a corresponding workflow.
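The two-step pattern described in the note (declare tasks, then execute them) can be sketched as follows. The helper name `export_basic_outputs` is hypothetical, and `run` is passed in as a parameter because its import location is not stated in this section; it is assumed to be callable as `run(project)`. The stub class exists only so that the sketch runs without the package or a network connection.

```python
def export_basic_outputs(project, run):
    """Register a few export tasks on a project, then execute them.

    `project` is expected to offer the to_*() methods documented here;
    `run` is assumed to accept the project object as its only argument.
    """
    # Step 1: only declare what should be produced; nothing runs yet.
    project.to_sqlite()
    project.to_statistics()
    project.to_csv()
    # Step 2: hand the project over so the workflow is built and executed.
    run(project)


class StubProject:
    """Hypothetical stand-in that records which outputs were requested."""

    def __init__(self):
        self.tasks = []

    def to_sqlite(self):
        self.tasks.append("kg.sqlite")

    def to_statistics(self):
        self.tasks.append("statistics.json")

    def to_csv(self):
        self.tasks.append("kg_nodes.csv, kg_edges.csv")


executed = []
stub = StubProject()
export_basic_outputs(stub, executed.append)
print(stub.tasks)
```

With the real package, the stub would presumably be replaced by an instance such as `Ckg(version, workdir)`, where `version` is one of the values reported by `Ckg.get_versions()`, and `executed.append` by the package's `run()` function.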
- classmethod get_versions()
Fetch all currently available versions from the data repository of the project.
- to_sqlite()
Convert the knowledge graph to a file-based SQLite database.
Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for all projects. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
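The table layout of kg.sqlite is not spelled out in this section, so a schema-agnostic way to look inside the file is to ask SQLite itself which tables exist and how many rows each holds. The following is a minimal sketch using Python's standard `sqlite3` module; the function name and the demo table are invented for illustration and do not reflect the actual schema.

```python
import sqlite3


def summarize_tables(connection):
    """Return {table_name: row_count} for every table in the database."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    names = [row[0] for row in cursor.fetchall()]
    summary = {}
    for name in names:
        # Names come from sqlite_master itself; quote them defensively anyway.
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = connection.execute(
            f"SELECT COUNT(*) FROM {quoted}"
        ).fetchone()[0]
    return summary


# Demo on an in-memory database with a hypothetical table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_items (id TEXT, label TEXT)")
demo.executemany("INSERT INTO demo_items VALUES (?, ?)", [("a", "x"), ("b", "y")])
demo_summary = summarize_tables(demo)
demo.close()
print(demo_summary)
```

For a generated file, the in-memory database would be replaced by `sqlite3.connect(...)` pointing at the kg.sqlite file inside the project's working directory.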
- to_statistics()
Determine some statistical properties of the knowledge graph.
Generates the output file statistics.json. This is a JSON file that contains basic statistics about the elements in the knowledge graph, such as node, edge and type counts.
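The exact keys of statistics.json are not listed here, so a first look can simply report the top-level structure. Below is a small sketch with the standard `json` module, assuming the file holds a JSON object at the top level; the sample content written in the demo is invented and only stands in for a real file.

```python
import json
import os
import tempfile


def describe_json(path):
    """Map each top-level key of a JSON object file to its value's type name."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return {key: type(value).__name__ for key, value in data.items()}


# Demo with invented content standing in for a real statistics.json.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "statistics.json")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump({"example_count": 3, "example_breakdown": {"a": 1}}, handle)
    overview = describe_json(sample_path)
print(overview)
```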
- to_schema()
Determine the schema of the knowledge graph.
Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
- to_sql()
Convert the knowledge graph to a SQL text file.
Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
- to_csv()
Convert the knowledge graph to two CSV text files.
Generates the output files kg_nodes.csv and kg_edges.csv.
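Neither the column names nor the delimiter of the two CSV files are documented in this section. A cautious first step is therefore to read only the header and count the data rows, sketched here with the standard `csv` module under the assumption of a comma delimiter. The function name and the sample columns are invented.

```python
import csv
import io


def peek_csv(handle):
    """Return (header, number_of_data_rows) for an open CSV text stream."""
    reader = csv.reader(handle)
    header = next(reader, [])
    row_count = sum(1 for _ in reader)
    return header, row_count


# Demo on an in-memory stand-in; the column names are invented.
sample = io.StringIO("col_a,col_b\n1,x\n2,y\n")
header, row_count = peek_csv(sample)
print(header, row_count)
```

For the generated files, the stream would come from `open(path, newline="", encoding="utf-8")`, once for kg_nodes.csv and once for kg_edges.csv.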
- to_jsonl()
Convert the knowledge graph to two JSON Lines text files.
Generates the output files kg_nodes.jsonl and kg_edges.jsonl.
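JSON Lines files hold one JSON value per line, so they can be processed as a stream without loading everything into memory. Here is a minimal reader based on the standard `json` module; the record layout in the demo is invented, since the actual fields of the node and edge records are not described here.

```python
import io
import json


def iter_jsonl(handle):
    """Yield one parsed JSON value per non-empty line of a text stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Demo on an in-memory stand-in; the record layout is invented.
sample = io.StringIO('{"k": 1}\n\n{"k": 2}\n')
records = list(iter_jsonl(sample))
print(records)
```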
- to_metta(representation='spo')
Convert the knowledge graph to a MeTTa text file.
Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.
Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.
- Parameters:
representation (str) – The format used to represent the knowledge graph in the MeTTa language.
Available options:
- "spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
- "properties_aggregated": Properties (key-value pairs) are represented by putting each key on a separate line, with each value guaranteed to be a single number or string. Values that hold a compound data type such as a list or dict are therefore aggregated into one JSON-formatted string. Text identifiers of nodes are reused to associate nodes with their properties, while text identifiers of the form "e{cnt}" are introduced for edges to serve the same purpose.
- "properties_expanded": Properties (key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to associate these elements with their properties.
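The aggregation rule of the "properties_aggregated" option (compound values become one JSON string, plain numbers and strings stay as they are) can be illustrated in plain Python. This is only a sketch of the idea, not the package's implementation and not MeTTa syntax; the function name and sample properties are invented, and treating booleans and None as values to be JSON-encoded is an assumption.

```python
import json


def aggregate_value(value):
    """Keep plain numbers and strings; encode everything else as a JSON string."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number or isinstance(value, str):
        return value
    return json.dumps(value)


# Invented sample properties of a single node.
properties = {
    "name": "example",
    "score": 0.5,
    "synonyms": ["a", "b"],
    "xref": {"db": "id"},
}
aggregated = {key: aggregate_value(value) for key, value in properties.items()}
print(aggregated)
```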
- to_graphml()
Convert the knowledge graph to a GraphML text file.
Generates the output file kg.graphml.
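GraphML is an XML format, so the file can be inspected with a streaming XML parser even when it is large. The sketch below counts node and edge elements with the standard `xml.etree.ElementTree` module. It assumes the file declares the standard GraphML namespace; the function name and the tiny sample document are invented.

```python
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml_elements(source):
    """Count <node> and <edge> elements while streaming through a GraphML document."""
    counts = {"node": 0, "edge": 0}
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == GRAPHML_NS + "node":
            counts["node"] += 1
        elif element.tag == GRAPHML_NS + "edge":
            counts["edge"] += 1
        # Release the element's content to keep memory use low.
        element.clear()
    return counts


# Demo on an in-memory stand-in for kg.graphml.
sample = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    b'<graph id="G" edgedefault="directed">'
    b'<node id="n0"/><node id="n1"/>'
    b'<edge source="n0" target="n1"/>'
    b"</graph></graphml>"
)
counts = count_graphml_elements(sample)
print(counts)
```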
- class kgw.biomedicine.Hald(version, workdir)
Human Aging and Longevity Dataset (HALD).
References
Publication: https://doi.org/10.1038/s41597-023-02781-0
Website: https://bis.zju.edu.cn/hald
- to_schema()
Determine the schema of the knowledge graph.
Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
- __init__(version, workdir)
Initialize a project instance so that tasks can be defined on it.
- Parameters:
version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
- Raises:
ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
Notes
This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object. The object is then passed to the function run(), which builds and executes a corresponding workflow.
- classmethod get_versions()
Fetch all currently available versions from the data repository of the project.
- to_sqlite()
Convert the knowledge graph to a file-based SQLite database.
Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for all projects. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
- to_statistics()
Determine some statistical properties of the knowledge graph.
Generates the output file statistics.json. This is a JSON file that contains basic statistics about the elements in the knowledge graph, such as node, edge and type counts.
- to_sql()
Convert the knowledge graph to a SQL text file.
Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
- to_csv()
Convert the knowledge graph to two CSV text files.
Generates the output files kg_nodes.csv and kg_edges.csv.
- to_jsonl()
Convert the knowledge graph to two JSON Lines text files.
Generates the output files kg_nodes.jsonl and kg_edges.jsonl.
- to_metta(representation='spo')
Convert the knowledge graph to a MeTTa text file.
Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.
Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.
- Parameters:
representation (str) – The format used to represent the knowledge graph in the MeTTa language.
Available options:
- "spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
- "properties_aggregated": Properties (key-value pairs) are represented by putting each key on a separate line, with each value guaranteed to be a single number or string. Values that hold a compound data type such as a list or dict are therefore aggregated into one JSON-formatted string. Text identifiers of nodes are reused to associate nodes with their properties, while text identifiers of the form "e{cnt}" are introduced for edges to serve the same purpose.
- "properties_expanded": Properties (key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to associate these elements with their properties.
- to_graphml()
Convert the knowledge graph to a GraphML text file.
Generates the output file kg.graphml.
- class kgw.biomedicine.MonarchKg(version, workdir)
Monarch Knowledge Graph (MonarchKG).
References
Publication: https://doi.org/10.1093/nar/gkad1082
Website: https://monarchinitiative.org
Data: https://data.monarchinitiative.org/monarch-kg/index.html
- __init__(version, workdir)
Initialize a project instance so that tasks can be defined on it.
- Parameters:
version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
- Raises:
ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
Notes
This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object. The object is then passed to the function run(), which builds and executes a corresponding workflow.
- classmethod get_versions()
Fetch all currently available versions from the data repository of the project.
- to_sqlite()
Convert the knowledge graph to a file-based SQLite database.
Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for all projects. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
- to_statistics()
Determine some statistical properties of the knowledge graph.
Generates the output file statistics.json. This is a JSON file that contains basic statistics about the elements in the knowledge graph, such as node, edge and type counts.
- to_schema()
Determine the schema of the knowledge graph.
Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
- to_sql()
Convert the knowledge graph to a SQL text file.
Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
- to_csv()
Convert the knowledge graph to two CSV text files.
Generates the output files kg_nodes.csv and kg_edges.csv.
- to_jsonl()
Convert the knowledge graph to two JSON Lines text files.
Generates the output files kg_nodes.jsonl and kg_edges.jsonl.
- to_metta(representation='spo')
Convert the knowledge graph to a MeTTa text file.
Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.
Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.
- Parameters:
representation (str) – The format used to represent the knowledge graph in the MeTTa language.
Available options:
- "spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
- "properties_aggregated": Properties (key-value pairs) are represented by putting each key on a separate line, with each value guaranteed to be a single number or string. Values that hold a compound data type such as a list or dict are therefore aggregated into one JSON-formatted string. Text identifiers of nodes are reused to associate nodes with their properties, while text identifiers of the form "e{cnt}" are introduced for edges to serve the same purpose.
- "properties_expanded": Properties (key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to associate these elements with their properties.
- to_graphml()
Convert the knowledge graph to a GraphML text file.
Generates the output file kg.graphml.
- class kgw.biomedicine.Oregano(version, workdir)
Oregano Knowledge Graph.
References
Publication: https://doi.org/10.1038/s41597-023-02757-0
- __init__(version, workdir)
Initialize a project instance so that tasks can be defined on it.
- Parameters:
version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
- Raises:
ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
Notes
This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object. The object is then passed to the function run(), which builds and executes a corresponding workflow.
- classmethod get_versions()
Fetch all currently available versions from the data repository of the project.
- to_sqlite()
Convert the knowledge graph to a file-based SQLite database.
Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for all projects. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
- to_statistics()
Determine some statistical properties of the knowledge graph.
Generates the output file statistics.json. This is a JSON file that contains basic statistics about the elements in the knowledge graph, such as node, edge and type counts.
- to_schema()
Determine the schema of the knowledge graph.
Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
- to_sql()
Convert the knowledge graph to a SQL text file.
Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
- to_csv()
Convert the knowledge graph to two CSV text files.
Generates the output files kg_nodes.csv and kg_edges.csv.
- to_jsonl()
Convert the knowledge graph to two JSON Lines text files.
Generates the output files kg_nodes.jsonl and kg_edges.jsonl.
- to_metta(representation='spo')
Convert the knowledge graph to a MeTTa text file.
Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.
Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.
- Parameters:
representation (str) – The format used to represent the knowledge graph in the MeTTa language.
Available options:
- "spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
- "properties_aggregated": Properties (key-value pairs) are represented by putting each key on a separate line, with each value guaranteed to be a single number or string. Values that hold a compound data type such as a list or dict are therefore aggregated into one JSON-formatted string. Text identifiers of nodes are reused to associate nodes with their properties, while text identifiers of the form "e{cnt}" are introduced for edges to serve the same purpose.
- "properties_expanded": Properties (key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to associate these elements with their properties.
- to_graphml()
Convert the knowledge graph to a GraphML text file.
Generates the output file kg.graphml.
- class kgw.biomedicine.PrimeKg(version, workdir)
Precision Medicine Knowledge Graph (PrimeKG).
References
Publication: https://doi.org/10.1038/s41597-023-01960-3
- __init__(version, workdir)
Initialize a project instance so that tasks can be defined on it.
- Parameters:
version (str) – Version of the dataset that will be downloaded and processed. The method get_versions() returns all currently available versions.
workdir (str) – Path of the working directory in which a unique subdirectory will be created to hold all downloaded and generated files for this project and version.
- Raises:
ValueError – Raised if version is invalid or unavailable.
TypeError – Raised if workdir is not a string.
Notes
This class does not automatically download or process any data. Such tasks first need to be specified by calling the relevant methods on the project object. The object is then passed to the function run(), which builds and executes a corresponding workflow.
- classmethod get_versions()
Fetch all currently available versions from the data repository of the project.
- to_sqlite()
Convert the knowledge graph to a file-based SQLite database.
Generates the output file kg.sqlite. This database contains a unified representation of the knowledge graph, with the same schema being used for all projects. From this intermediate format it is possible to generate all other files, using just one method per output format rather than writing a custom converter for each project.
- to_statistics()
Determine some statistical properties of the knowledge graph.
Generates the output file statistics.json. This is a JSON file that contains basic statistics about the elements in the knowledge graph, such as node, edge and type counts.
- to_schema()
Determine the schema of the knowledge graph.
Generates the output file schema.html. This is a standalone HTML file with an interactive graph visualization of all entity types in the knowledge graph and the relationship types by which they are connected.
- to_sql()
Convert the knowledge graph to a SQL text file.
Generates the output file kg.sql. This is a text file with SQL commands that can be used to import the structure and content of the knowledge graph into relational database systems such as MySQL or PostgreSQL.
- to_csv()
Convert the knowledge graph to two CSV text files.
Generates the output files kg_nodes.csv and kg_edges.csv.
- to_jsonl()
Convert the knowledge graph to two JSON Lines text files.
Generates the output files kg_nodes.jsonl and kg_edges.jsonl.
- to_metta(representation='spo')
Convert the knowledge graph to a MeTTa text file.
Generates the output file kg_spo.metta, kg_properties_aggregated.metta or kg_properties_expanded.metta, depending on the chosen representation.
Caution: These representations are still subject to experimentation and testing. They might change in future versions of this package.
- Parameters:
representation (str) – The format used to represent the knowledge graph in the MeTTa language.
Available options:
- "spo": Semantic triples of the form ("subject", "predicate", "object"). If properties are present in the original knowledge graph, they are ignored in this representation.
- "properties_aggregated": Properties (key-value pairs) are represented by putting each key on a separate line, with each value guaranteed to be a single number or string. Values that hold a compound data type such as a list or dict are therefore aggregated into one JSON-formatted string. Text identifiers of nodes are reused to associate nodes with their properties, while text identifiers of the form "e{cnt}" are introduced for edges to serve the same purpose.
- "properties_expanded": Properties (key-value pairs) are represented by fully expanding their keys and values onto as many lines as required. Numerical identifiers for nodes and edges are introduced to associate these elements with their properties.
- to_graphml()
Convert the knowledge graph to a GraphML text file.
Generates the output file kg.graphml.