
Developer Guidance

Dylan Hall edited this page Dec 21, 2022 · 3 revisions

Developer Testing

The documentation above outlines the approach for a single data owner to run these tools. A developer testing on a synthetic data set might instead want to run all of the above steps quickly and repeatedly for a list of artificial data owners.

The linkage agent tools include a Jupyter notebook, still under development, that runs all of these steps by invoking scripts in the testing-and-tuning/ folder.

To test household linkage, you can currently run the garble.sh script, configuring the sites for which you have extracted PII. To test blocking, run the blocking_garble.sh script. Note: both scripts assume that the PII files created by extract.py have been renamed to pii_{site}.csv for their respective sites.
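Because both scripts rely on the pii_{site}.csv naming convention, it can help to confirm the files are in place before a run. The following is a minimal sketch, not part of the repository: the function name `missing_pii_files` and the `data_dir` parameter are hypothetical, and the directory in which the scripts expect the files is an assumption you should adjust to your setup.

```python
from pathlib import Path


def missing_pii_files(sites, data_dir="."):
    """Return the expected pii_{site}.csv paths that do not exist yet.

    `data_dir` is a hypothetical parameter: point it at wherever your
    renamed PII files actually live.
    """
    expected = [Path(data_dir) / f"pii_{site}.csv" for site in sites]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_pii_files(["site_a", "site_b"])` returns an empty list when both files are present, so a non-empty result signals that an extract still needs to be renamed.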

Salt Value

The testing-and-tuning/generate_secret.py script will create a secret salt for you if you require one, e.g.:

python testing-and-tuning/generate_secret.py

This should create a new file called deidentification_secret.txt in your root directory.
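For readers unfamiliar with what a secret salt file amounts to, the sketch below shows one way such a file could be produced. It is an illustration only and is not the code in generate_secret.py: the function name `write_secret`, the hex encoding, and the 32-byte length are all assumptions.

```python
import secrets
from pathlib import Path


def write_secret(path="deidentification_secret.txt", num_bytes=32):
    """Write a random hex salt to `path` and return it.

    Assumption: a hex string of `num_bytes` random bytes is an acceptable
    salt format. The project's own script may use a different format.
    """
    salt = secrets.token_hex(num_bytes)  # cryptographically strong randomness
    Path(path).write_text(salt)
    return salt
```

For real test runs, prefer the project's own script so that the salt format matches what the garbling tools expect.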

Cleanup

Between runs, it is advisable to run rm temp-data/* to clean up the temporary data files used for individual runs.
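When the steps are driven from Python (for example, from a notebook), the same cleanup can be done without shelling out. This is a minimal sketch under the assumption that temp-data/ holds only disposable files; the function name `clean_temp_data` is hypothetical.

```python
from pathlib import Path


def clean_temp_data(temp_dir="temp-data"):
    """Delete regular files directly inside temp_dir and return their names."""
    removed = []
    directory = Path(temp_dir)
    if not directory.is_dir():
        return removed
    for entry in sorted(directory.iterdir()):
        # Like `rm temp-data/*`, leave hidden files and subdirectories alone.
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Returning the removed names makes it easy to log what was cleared between runs.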

Formatting and Linting

This repository uses black, flake8, and isort to maintain consistent formatting and style. These tools can be run with the following commands:

black .
isort .
flake8
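To verify formatting without modifying any files (for example, before committing), the tools can be run in check mode. The sketch below is one way to do that from Python, assuming black, isort, and flake8 are installed and on your PATH; the names `CHECK_COMMANDS` and `run_checks` are hypothetical and not part of the repository.

```python
import subprocess

# Check-only variants of the commands above: they report problems
# instead of rewriting files.
CHECK_COMMANDS = [
    ["black", "--check", "."],
    ["isort", "--check-only", "."],
    ["flake8"],
]


def run_checks(commands=CHECK_COMMANDS, runner=subprocess.run):
    """Run each command and return the ones that exited non-zero."""
    failed = []
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            failed.append(" ".join(command))
    return failed
```

An empty list from `run_checks()` means all three tools passed; otherwise the list names the commands to rerun without the check flags.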