OHSU-Library/cleaning-data-openrefine

OHSU Library OpenRefine

Data Cleaning with OpenRefine: a lesson adapted from Data Carpentry by the OHSU Library

OpenRefine Version

The current version of this lesson was tested with OpenRefine 3.7.2 in May 2023.

Data set notes

  • This data set is derived from the Portal Project long-term desert ecology data. The data file was downloaded and then modified specifically for use with OpenRefine.
    • Taxon names were put back into the file.
    • The number of rows was reduced to simplify the reconciliation and URL parsing exercises.
    • These modifications were made to illustrate some features of OpenRefine.
      • Errors were added to the taxon names (scientificName field) to demonstrate OpenRefine's ability to find data that was likely entered incorrectly.
      • These errors can be found by running clustering algorithms on the scientificName column, which surface discrepancies quickly and make it simple to fix all the issues found.
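
To give a rough sense of how key-collision clustering groups variant spellings, here is a minimal Python sketch. It is a simplified approximation written for illustration, not OpenRefine's own code: the `fingerprint` and `cluster` helpers and the sample values are hypothetical. It only catches differences in case, whitespace, punctuation, accents, and word order. Genuine typos (a dropped or swapped letter) would need a nearest-neighbor method instead.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified normalization key for a string (illustrative only)."""
    # Trim, lowercase, and decompose accented characters.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks (accents) left over from the decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, not taken from the lesson's data file.
names = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Merriami, Dipodomys",
    "Chaetodipus baileyi",
]
clusters = cluster(names)
print(clusters)
```

Each returned group is a set of variants that normalize to the same key, which is the kind of candidate list a reviewer would then merge into one spelling.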