Change test data file format for `matching-ws` #1317

djtfmartin · 2024-05-01T13:15:33Z

Getting the integration tests from gbif/checklistbank to work in the ported API involved assembling all the test data, which is in the JSON format of the v1 species API, into a single CSV for the purpose of generating a test index.

Example of the current format:

```json
{
    "usageKey": 1011638,
    "scientificName": "Abacion tesselatum Rafinesque, 1820",
    "canonicalName": "Abacion tesselatum",
    "rank": "SPECIES",
    "status": "ACCEPTED",
    "confidence": 100,
    "note": "Individual confidence: name=120; classification=-2; rank=0; status=1; singleMatch=10",
    "matchType": "EXACT",
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "order": "Callipodida",
    "family": "Abacionidae",
    "genus": "Abacion",
    "species": "Abacion tesselatum",
    "kingdomKey": 1,
    "phylumKey": 54,
    "classKey": 361,
    "orderKey": 501,
    "familyKey": 7228,
    "genusKey": 1011637,
    "speciesKey": 1011638,
    "synonym": false,
    "class": "Diplopoda"
}
```

It may make sense to replace all the test data in the `nub*.json` files with a single CSV, or to replace the `nub*.json` files with small CSVs (which might be easier to maintain).
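A minimal Python sketch of the single-CSV option, assuming each `nub*.json` file holds either one v1 match response like the example above or a list of them (an assumption). The column selection is also an assumption: response-only fields such as `confidence`, `note` and `matchType` are dropped because they describe a match result rather than the index content. The helper names are hypothetical.

```python
import csv
import io
import json
from pathlib import Path

# Fields copied from the v1 match response example; which of them the test
# index really needs is an assumption. Fields not listed here are ignored.
COLUMNS = [
    "usageKey", "scientificName", "canonicalName", "rank", "status",
    "kingdom", "phylum", "class", "order", "family", "genus", "species",
    "kingdomKey", "phylumKey", "classKey", "orderKey", "familyKey",
    "genusKey", "speciesKey", "synonym",
]


def as_records(data) -> list:
    """A file may hold one response object or a list of them (assumption)."""
    return data if isinstance(data, list) else [data]


def load_responses(directory: Path, pattern: str = "nub*.json") -> list:
    """Read every v1 match response file in `directory` matching `pattern`."""
    records = []
    for path in sorted(directory.glob(pattern)):
        records.extend(as_records(json.loads(path.read_text(encoding="utf-8"))))
    return records


def to_csv(records: list) -> str:
    """Flatten match responses into one CSV; absent fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Names containing commas, such as `Abacion tesselatum Rafinesque, 1820`, are quoted automatically by the `csv` module, which is one of the details a hand-maintained CSV would otherwise have to get right.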

Another option would be to switch to the texttree format.
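For comparison, a rough Python sketch of how one v1 match response could be rendered as an indented tree, loosely in the style of the texttree format (one name per line, two-space indentation per level, rank in square brackets). This is an approximation for illustration only; the exact texttree syntax (synonym markers, basionyms, and so on) should be checked against the text-tree documentation. The function name is hypothetical.

```python
# Higher-rank fields of a v1 match response, from top to bottom.
HIGHER_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
INDENT = "  "


def to_texttree(record: dict) -> str:
    """Render the classification of one v1 match response as an indented tree."""
    usage_rank = record["rank"].lower()
    lines = []
    for rank in HIGHER_RANKS:
        if rank == usage_rank:
            break  # the matched usage itself is written below with its authorship
        name = record.get(rank)
        if name:
            # Each emitted line is one level deeper than the previous one.
            lines.append(f"{INDENT * len(lines)}{name} [{rank}]")
    lines.append(f"{INDENT * len(lines)}{record['scientificName']} [{usage_rank}]")
    return "\n".join(lines)
```

Applied to the example response, this yields a seven-line tree from `Animalia [kingdom]` down to `Abacion tesselatum Rafinesque, 1820 [species]`, which is arguably easier to read and edit by hand than either JSON or CSV.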

mdoering · 2024-05-01T20:45:42Z

It was very convenient to use the matching response format to build the index. You could simply try out name matches that were problematic on the GBIF API, store the responses locally as seeds for the index, and create tests for them. Then you could work on fixing the matching until it responded as desired.
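A minimal Python sketch of that workflow, assuming the public v1 endpoint `https://api.gbif.org/v1/species/match` with its `name` query parameter. The helper names and the `nub-<name>.json` file-naming scheme are hypothetical and not taken from the matching-ws code.

```python
import json
import re
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

MATCH_URL = "https://api.gbif.org/v1/species/match"


def match_url(name: str) -> str:
    """Build the v1 species match request for a (possibly problematic) name."""
    return f"{MATCH_URL}?{urlencode({'name': name})}"


def seed_filename(name: str) -> str:
    """Hypothetical naming scheme that still matches the nub*.json glob."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"nub-{slug}.json"


def fetch_match(name: str) -> dict:
    """Call the live GBIF API and return the parsed match response."""
    with urlopen(match_url(name), timeout=30) as resp:
        return json.load(resp)


def save_seed(response: dict, name: str, directory: Path) -> Path:
    """Store the response verbatim so it can seed the test index."""
    path = directory / seed_filename(name)
    path.write_text(json.dumps(response, indent=2), encoding="utf-8")
    return path
```

Usage would be along the lines of `save_seed(fetch_match(name), name, seed_dir)`, where `seed_dir` is whatever test-resources directory the project uses; a test asserting the desired match can then be written against the stored seed.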

I fear that with the data in CSV or other formats, quite some time will be spent preparing the test data. Could we not just continue with the old format? Or maybe keep the old data as a single CSV, but allow new names to be added via the new v2 matching response?

djtfmartin · 2024-05-02T12:22:48Z

My main concern was leaving a bunch of files that won't make much sense to future developers maintaining the API, as I assume v1 responses will eventually be deprecated and then become unavailable. Also, the content of v2 responses won't fit into the v1 format (additional ranks, string keys, etc.).

For now I'll keep the existing files as they are and try to make it clear in the code that it's a format for v1 responses.
For the new API, we can follow the same model of using the v2 matching response format to build the index, and keep the v2 responses in a separate directory.
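A minimal Python sketch of how an index builder could keep the two formats apart, assuming a hypothetical layout with `v1/` and `v2/` subdirectories under a common seed root. Each response is tagged with the format version it follows, so v1 files can stay untouched while a separate v2 parser is added later; no v2 field names are assumed here.

```python
import json
from pathlib import Path

# Hypothetical layout: <root>/v1/*.json holds legacy v1 match responses and
# <root>/v2/*.json holds v2 responses. The directory names are assumptions.
VERSIONS = ("v1", "v2")


def parse_seed(version: str, text: str) -> list:
    """Tag every response in one file with the response format it follows."""
    if version not in VERSIONS:
        raise ValueError(f"unknown response format: {version}")
    data = json.loads(text)
    items = data if isinstance(data, list) else [data]
    return [(version, item) for item in items]


def load_seeds(root: Path) -> list:
    """Collect (version, response) pairs so the builder can pick a parser per format."""
    seeds = []
    for version in VERSIONS:
        directory = root / version
        if not directory.is_dir():
            continue  # a missing directory simply contributes no seeds
        for path in sorted(directory.glob("*.json")):
            seeds.extend(parse_seed(version, path.read_text(encoding="utf-8")))
    return seeds
```

Keeping the version explicit in the directory name also documents the format for future maintainers, which addresses the concern about v1 files becoming hard to interpret once the v1 API is retired.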

djtfmartin self-assigned this May 1, 2024

djtfmartin changed the title ~~Change test data file format~~ Change test data file format for matching-ws May 1, 2024

djtfmartin added the matching label May 17, 2024
