Change test data file format for matching-ws #1317

Open
djtfmartin opened this issue May 1, 2024 · 2 comments
@djtfmartin
Contributor

djtfmartin commented May 1, 2024

Getting the integration tests from gbif/checklistbank to work in the ported API involved assembling all the test data, which is in the JSON format of the v1 species API, into a single CSV for the purpose of generating a test index.

Example of the current format:

```json
{
    "usageKey": 1011638,
    "scientificName": "Abacion tesselatum Rafinesque, 1820",
    "canonicalName": "Abacion tesselatum",
    "rank": "SPECIES",
    "status": "ACCEPTED",
    "confidence": 100,
    "note": "Individual confidence: name=120; classification=-2; rank=0; status=1; singleMatch=10",
    "matchType": "EXACT",
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "order": "Callipodida",
    "family": "Abacionidae",
    "genus": "Abacion",
    "species": "Abacion tesselatum",
    "kingdomKey": 1,
    "phylumKey": 54,
    "classKey": 361,
    "orderKey": 501,
    "familyKey": 7228,
    "genusKey": 1011637,
    "speciesKey": 1011638,
    "synonym": false,
    "class": "Diplopoda"
}
```
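A minimal Python sketch of the flattening step described above might look like this. It assumes each nub*.json file holds either a single v1 match response like the one shown or a list of them. The column list, the decision to drop response metadata (`confidence`, `note`, `matchType`), and the helper names `load_responses`, `responses_to_csv` and `merge_json_files` are illustrative assumptions. None of them describe what the ported API actually does.

```python
import csv
import io
import json

# Assumed CSV columns: the taxonomic fields of the v1 match response shown above.
# Response metadata (confidence, note, matchType) is deliberately left out.
# The real test-index loader may expect different columns or ordering.
COLUMNS = [
    "usageKey", "scientificName", "canonicalName", "rank", "status", "synonym",
    "kingdom", "phylum", "class", "order", "family", "genus", "species",
    "kingdomKey", "phylumKey", "classKey", "orderKey", "familyKey",
    "genusKey", "speciesKey",
]


def load_responses(text):
    """Parse the contents of one nub*.json file (a single response or a list of them)."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def responses_to_csv(responses, out):
    """Write v1 match responses as CSV rows; fields missing from a response become empty cells."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore", restval="")
    writer.writeheader()
    for response in responses:
        # Keep JSON-style booleans ("false") instead of Python's "False".
        row = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in response.items()}
        writer.writerow(row)


def merge_json_files(paths, out):
    """Hypothetical helper: concatenate several nub*.json files into one CSV stream."""
    responses = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            responses.extend(load_responses(f.read()))
    responses_to_csv(responses, out)


if __name__ == "__main__":
    example = ('{"usageKey": 1011638, "canonicalName": "Abacion tesselatum", '
               '"rank": "SPECIES", "synonym": false}')
    buf = io.StringIO()
    responses_to_csv(load_responses(example), buf)
    print(buf.getvalue())
```

Writing one CSV per input file (the "small CSVs" option) would just mean calling `responses_to_csv` once per file. When writing to a real file, it should be opened with `newline=""`, as the `csv` module documentation recommends.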

It may make sense to replace all the test data in the nub*.json files with a single CSV, or to replace the nub*.json files with small CSVs (which might be easier to maintain).

Another option would be to switch to the texttree format.

djtfmartin self-assigned this May 1, 2024
djtfmartin changed the title from "Change test data file format" to "Change test data file format for matching-ws" on May 1, 2024
@mdoering
Member

mdoering commented May 1, 2024

It was very convenient to use the matching response format to build the index. You could simply try out name matches that were problematic on the GBIF API, store them locally as seeds for the index, and create tests for them. Then you could work on fixing the matching until it responded as desired.
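Roughly, the seeding workflow described here could be sketched in Python as follows. The sketch assumes only the public v1 match endpoint (`https://api.gbif.org/v1/species/match`, queried with a `name` parameter). `match_url`, `save_seed` and the output file name are hypothetical, and the checklistbank tests may load their seeds differently.

```python
import json
import urllib.parse
import urllib.request

# Public GBIF v1 species match endpoint (the response format shown in this issue).
MATCH_URL = "https://api.gbif.org/v1/species/match"


def match_url(name, **params):
    """Build a v1 match request URL; extra query params (e.g. kingdom) are passed through."""
    query = urllib.parse.urlencode({"name": name, **params})
    return f"{MATCH_URL}?{query}"


def save_seed(name, path, **params):
    """Fetch the live v1 match for `name` and store it locally as a seed file.

    `path` is a hypothetical file name; the repository's real nub*.json
    naming convention should be followed instead.
    """
    with urllib.request.urlopen(match_url(name, **params)) as resp:
        data = json.load(resp)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data
```

A call such as `save_seed("Abacion tesselatum", "nub-abacion.json")` (hypothetical file name) would capture the live response. A test could then assert the expected `usageKey` and `matchType` against the locally built index.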

I fear that with the data in CSV or other formats, quite some time will be spent preparing the test data. Could we not just continue with the old format? Or maybe keep the old data as a single CSV, but allow new names to be added via the new v2 matching response?

@djtfmartin
Contributor Author

djtfmartin commented May 2, 2024

My main concern was leaving a bunch of files that won't make much sense to future developers maintaining the API, as I assume v1 responses will eventually be deprecated and then become unavailable. Also, the content of v2 responses won't fit into the v1 format (additional ranks, string keys, etc.).

For now I'll keep the existing files as they are and try to make it clear in the code that they are in the v1 response format.
For the new API, we can follow the same model, using the v2 matching response format to build the index and keeping the v2 responses in a separate directory.
