Taxonomic Inference: 3. Publishing a Data Set

Katja Schulz edited this page Apr 19, 2023 · 4 revisions

Harvesting the Trait Data Set

Once a taxonomic inference trait data set has been uploaded to the EOL Open Data repository, it can be submitted to the EOL Resource Harvester. Under Import from OpenData (at the bottom of the page), enter the resource page URL in the Opendata url field. Make sure you enter the resource page URL, i.e., https://opendata.eol.org/dataset/[data set name]/resource/[resourceID], NOT the data set URL (the parent page of the resource page) or the URL of the resource file (which is listed on the resource page).

Please review the metadata for the imported resource and revise it as needed, using the "edit" button on the resource page. Then click on re-harvest to initiate the harvest. The harvest may take a while, depending on the size of your data set.

Once the harvest is complete, review the harvest report (under Harvests on the resource page) for errors and warnings. Check that the harvested content has the expected number of taxa (nodes) and traits, and make sure there are no unmatched nodes. Any unmatched nodes are listed in the harvest report immediately after "map_all_nodes_to_pages".
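Because the three kinds of Open Data URL are easy to confuse, a quick check before pasting one into the harvester can help. The following is a minimal sketch, not part of any EOL tool: the function name classify_opendata_url is hypothetical, and treating any longer path under /resource/[resourceID]/ as a resource file URL is an assumption about the site's URL layout.

```python
from urllib.parse import urlparse


def classify_opendata_url(url: str) -> str:
    """Classify an EOL Open Data URL by its path shape.

    Returns 'resource_page', 'dataset_page', 'resource_file', or 'unknown'.
    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url.strip())
    if parsed.netloc != "opendata.eol.org":
        return "unknown"

    # Split the path into non-empty segments so a trailing slash is harmless.
    parts = [segment for segment in parsed.path.split("/") if segment]

    # /dataset/[data set name] is the parent data set page.
    if len(parts) == 2 and parts[0] == "dataset":
        return "dataset_page"

    # /dataset/[data set name]/resource/[resourceID] is what the harvester needs.
    if len(parts) == 4 and parts[0] == "dataset" and parts[2] == "resource":
        return "resource_page"

    # Assumption: anything deeper under the resource path points at the file.
    if len(parts) > 4 and parts[0] == "dataset" and parts[2] == "resource":
        return "resource_file"

    return "unknown"
```

Only a URL classified as resource_page should go into the Opendata url field. The data set name and resource ID used to try this out are placeholders, not real EOL resources.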

Publishing the Trait Data Set

If your trait data set harvest was successful, you can publish your data set on EOL. On the EOL resource page, go to see in production, then click on republish. The publishing process may take a while, depending on the size of your data set. Keep an eye on the Publish Log. It will indicate when the publication process is completed. After publication, you can access all the taxa in your resource through the BROWSE ### NODES link. Spot check the data tabs of the EOL pages for some of your start nodes. These pages should now show the trait data records from your data set. Descendant pages will not yet have the data records, because the branches have not yet been painted.

Painting Branches

After your resource is published, you can use the branchpainting tool to propagate trait values to the descendants of your start nodes. (You'll need to log in to the EOL Jenkins platform.) The new trait data records inferred from the start and stop nodes in your data set will be added directly to the EOL graph database.

Click on Build with Parameters and enter the resource ID of your published resource. You can get the resource ID from the resource URL in production: for example, if the URL of your resource is https://eol.org/resources/12345, your resource ID is 12345. Click on Build to initiate the branchpainting process. The branchpainting script looks for start nodes in the data set and copies the parent trait data record to all of the node’s descendants until it encounters a stop node.
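If you handle several resources, the resource ID can also be read from the production URL programmatically. This is a small sketch under the assumption that the URL has exactly the form shown above (https://eol.org/resources/ followed by digits); the function name resource_id_from_url is hypothetical.

```python
import re


def resource_id_from_url(url: str) -> int:
    """Extract the numeric resource ID from an EOL production resource URL.

    Hypothetical helper; accepts only https://eol.org/resources/<digits>,
    with an optional trailing slash.
    """
    match = re.fullmatch(r"https://eol\.org/resources/(\d+)/?", url.strip())
    if match is None:
        raise ValueError(f"Not an EOL production resource URL: {url!r}")
    return int(match.group(1))


# The example URL from the text yields the ID to enter under Build with Parameters.
print(resource_id_from_url("https://eol.org/resources/12345"))  # prints 12345
```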

Once the branchpainting process has finished (you can watch its progress in the left sidebar), spot check the data tabs of the EOL pages of some of your start node descendants to make sure the branches got painted properly. Trait data records that were created via taxonomic inference will have an inherited from metadata value for the EOL page ID of the start node from which the record was inferred. There will also be a scientific name metadata record for the start node and a listing of relevant stop nodes. To make sure there aren't any systematic errors in your data set, we provide a series of sample quality control queries that you can use to check the integrity of your published data set.
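To make clear what to expect when spot checking, here is a toy model of the propagation described above. It is a sketch only, not the actual branchpainting script: the function paint_branch, the dictionary-based tree, and the record fields are all hypothetical, and it assumes that a stop node and everything below it receive no inferred record.

```python
from typing import Dict, List, Set


def paint_branch(
    children: Dict[int, List[int]],
    start: int,
    stop_nodes: Set[int],
    trait: str,
) -> List[dict]:
    """Copy the start node's trait to its descendants, halting at stop nodes.

    children maps a page ID to the page IDs of its direct children.
    Each inferred record notes the start node it was inherited from.
    """
    inferred = []
    stack = list(children.get(start, []))
    while stack:
        node = stack.pop()
        if node in stop_nodes:
            # Assumption: the stop node and its whole subtree stay unpainted.
            continue
        inferred.append({"page_id": node, "trait": trait, "inherited_from": start})
        stack.extend(children.get(node, []))
    return inferred


# Toy hierarchy: 1 is the start node, 5 is a stop node with one child (7).
tree = {1: [2, 3], 2: [4, 5], 3: [6], 5: [7]}
records = paint_branch(tree, start=1, stop_nodes={5}, trait="example trait value")
print(sorted(r["page_id"] for r in records))  # prints [2, 3, 4, 6]
```

In this toy hierarchy, pages 2, 3, 4, and 6 would show the inferred record with page 1 as its inherited from value, while pages 5 and 7 would not. Those are the two kinds of pages worth spot checking.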

Revising a Published Data Set

If you want to revise a published taxonomic inference data set, simply reharvest your resource using the Re-download OpenData and harvest option. When you then publish the reharvested data set, all previous data from this resource will be removed, and you can use the branchpainting tool to create a new set of taxonomically inferred trait data based on the start and stop nodes in the revised data set.