Update KGX code to process new merge.yaml design #500

Open
hrshdhgd opened this issue Aug 6, 2024 · 0 comments

hrshdhgd commented Aug 6, 2024

This follows up on a conversation with @sierra-moxon.

The project (or rather PR) that triggered this issue: Knowledge-Graph-Hub/kg-microbe-merge#1

The part that concerns KGX: instead of keeping a physical merge.yaml file, we now create the file dynamically through the following steps:

  1. Create a LinkML schema that replaces the merge.yaml file.
  2. Generate Python classes from the schema using gen-python from LinkML.
  3. Create a MergeKG object from those classes, based on the transforms that need to be merged.
    • This gives the flexibility to choose which individual transforms get merged, which fits well with the KG modularization discussion we've been having.
  4. Use yaml_dumper to dump the MergeKG object into a YAML file, which is then passed to load_and_merge() in KGX.
    • This step could be avoided if KGX accepted the MergeKG object directly in load_and_merge() instead of requiring it to be exported as a file.
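
To make steps 2 and 3 concrete, here is a rough sketch. The dataclasses below are hypothetical stand-ins for whatever gen-python actually generates from the schema (the real class and field names may differ), and `build_merge_kg` is an illustrative helper, not existing code. It stops at a plain dict, which is roughly what a YAML dumper would serialize in step 4.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


# Hypothetical stand-ins for the classes gen-python would generate.
@dataclass
class InputSpec:
    format: str
    filename: List[str]


@dataclass
class Source:
    name: str
    input: InputSpec


@dataclass
class Destination:
    format: str
    filename: str
    compression: Optional[str] = None


@dataclass
class MergedGraph:
    name: str
    source: List[Source] = field(default_factory=list)
    destination: List[Destination] = field(default_factory=list)


@dataclass
class MergeKG:
    merged_graph: MergedGraph


def build_merge_kg(transforms, base_dir="data/transformed/ontologies"):
    """Build a MergeKG that covers only the requested transforms (illustrative helper)."""
    sources = [
        Source(
            name=t.upper(),
            input=InputSpec(
                format="tsv",
                filename=[f"{base_dir}/{t}_nodes.tsv", f"{base_dir}/{t}_edges.tsv"],
            ),
        )
        for t in transforms
    ]
    destinations = [Destination(format="tsv", compression="tar.gz", filename="merged-kg")]
    graph = MergedGraph(name="kg-microbe graph", source=sources, destination=destinations)
    return MergeKG(merged_graph=graph)


# Pick only the transforms that should be merged in this run.
merge_kg = build_merge_kg(["hp"])
config = asdict(merge_kg)  # nested dataclasses become nested dicts and lists
```

Choosing the transforms at call time is what gives the modularity mentioned in step 3: the same schema can describe a merge of one transform or of many.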

The old merge.yaml had the following structure (showing only the parts that need to be updated):

merged_graph:
  name: kg-microbe-merge graph
  source:
    hp:
      name: "HP"
      input:
        format: tsv
        filename:
          - data/transformed/ontologies/HPTransform_nodes.tsv
          - data/transformed/ontologies/HPTransform_edges.tsv

  destination:
    merged-kg-tsv:
      format: tsv
      compression: tar.gz
      filename: merged-kg
    merged-kg-nt:
      format: nt
      compression: gz
      filename: kg_microbe_merge.nt.gz

The new format equivalent will look like this:

merged_graph:
  name: kg-microbe graph
  source:
    - name: "HP"
      input:
        format: tsv
        filename:
          - data/transformed/ontologies/hp_nodes.tsv
          - data/transformed/ontologies/hp_edges.tsv

  destination:
    - format: tsv
      compression: tar.gz
      filename: merged-kg

Only source and destination change in design: each becomes a list of entries instead of a mapping keyed by an identifier.
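
As a rough illustration of the difference between the two layouts, the sketch below converts an already-parsed old-style config (a plain dict) into the new list-based shape. `upgrade_merge_config` is a hypothetical helper, not part of KGX. It assumes the old mapping keys (e.g. `hp`, `merged-kg-tsv`) can simply be dropped, since the new example above does not carry them.

```python
import copy


def upgrade_merge_config(old):
    """Return a copy of `old` with source/destination turned into lists.

    Hypothetical helper: the mapping keys of the old layout are discarded.
    """
    new = copy.deepcopy(old)
    graph = new["merged_graph"]
    for key in ("source", "destination"):
        section = graph.get(key)
        if isinstance(section, dict):
            # Old layout: {"hp": {...}, ...} -> new layout: [{...}, ...]
            graph[key] = list(section.values())
    return new


old_config = {
    "merged_graph": {
        "name": "kg-microbe-merge graph",
        "source": {
            "hp": {
                "name": "HP",
                "input": {
                    "format": "tsv",
                    "filename": [
                        "data/transformed/ontologies/HPTransform_nodes.tsv",
                        "data/transformed/ontologies/HPTransform_edges.tsv",
                    ],
                },
            }
        },
        "destination": {
            "merged-kg-tsv": {
                "format": "tsv",
                "compression": "tar.gz",
                "filename": "merged-kg",
            }
        },
    }
}

new_config = upgrade_merge_config(old_config)
```

The input dict is left untouched, and a config that is already in the new layout passes through unchanged.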

So the two features that can come out of this issue are:

  • Update cli_utils.py to handle the updated merge.yaml.
  • Enable load_and_merge() to accept a MergeKG object (generated from the LinkML schema) directly.
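
One possible shape for both features, as a sketch only: a small normalization layer that accepts either a parsed dict or a MergeKG-like object, and returns the sources as a list regardless of layout. `coerce_merge_config` and `list_sources` are hypothetical names, not existing KGX functions, and the sketch assumes the generated classes behave like dataclasses.

```python
from dataclasses import asdict, dataclass, field, is_dataclass
from typing import Any, Dict, List


def coerce_merge_config(config: Any) -> Dict[str, Any]:
    """Return a plain dict from a parsed dict or a dataclass instance."""
    if is_dataclass(config) and not isinstance(config, type):
        return asdict(config)
    if isinstance(config, dict):
        return config
    raise TypeError(f"Unsupported merge config type: {type(config).__name__}")


def list_sources(config: Any) -> List[Dict[str, Any]]:
    """Return the source entries as a list, whichever layout was supplied."""
    graph = coerce_merge_config(config)["merged_graph"]
    source = graph.get("source", [])
    if isinstance(source, dict):
        return list(source.values())  # old layout: keyed mapping
    return list(source)  # new layout: already a list


# Minimal stand-ins for the generated classes, for demonstration only.
@dataclass
class DemoGraph:
    name: str
    source: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class DemoMergeKG:
    merged_graph: DemoGraph


as_object = DemoMergeKG(DemoGraph(name="kg-microbe graph", source=[{"name": "HP"}]))
as_old_dict = {"merged_graph": {"name": "kg-microbe-merge graph", "source": {"hp": {"name": "HP"}}}}

sources_from_object = list_sources(as_object)
sources_from_old_dict = list_sources(as_old_dict)
```

With something like this in place, the file export in step 4 becomes optional, and older merge.yaml files would keep working during the transition.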