Skip to content

Latest commit

 

History

History
50 lines (30 loc) · 3.01 KB

PR_review_guide.md

File metadata and controls

50 lines (30 loc) · 3.01 KB

PR review guide for method workflows

(If you don't really know what to look for when reviewing a PR)

When browsing the pull requests, watch out for the Ready for review tag; here reviews are needed. If you come across Reviews to address there has already been at least one review, and some changes have to be done to the code. You could still review, or wait until this label disappears, or you address the suggestions yourself. In the latter case, do communicate this on slack and in the pull request comments, to avoid several people getting in each other's way.
At least 2 independent reviews are required before merging a PR.

Start with the README of the method workflow

  • Try and follow the instructions as they are to run the pipeline (using the test data in tests/test_data) (Most workflows should have configs and sample-sheets set up to use this data anyway). Request improvement of the README if you can't run the method workflow with the given instructions. The workflow should run through on the test data without any errors, else please report those errors. However, if it has been documented in the README already that/why the method doesn't work on the test data, and sufficient proof that it works on larger datasets is given (ideally with links to that data), you can still approve. Note that "method doesn't work with ensembl/provided file format/etc." is NOT a valid reason. Those issues have to be adressed.

  • Note anything else that’s unclear or missing, for example concerning

    • Format of required input data
    • Layout of sample sheet and what the columns mean (there should be an example)
    • Output files produced (names, what they contain)
    • links to the original repo/README/publication of the participant

Check the output data matches our specifications

Refer to the specifications.

Check that steps to run methods/pre-processing are wrapped in Dockerfiles

Code organisation & general comments

  • Stages of pipeline should (ideally) do one thing

    • E.g. separate scripts to run method and to generate APAeval output files
  • No hardcoding of parameters or output filenames (make them adjustable via a config file)

  • Sample sheet should contain sample specific info, pipeline specific info should go in config

  • Custom annotation files should be generated by the workflow and not be downloaded (e.g. from UCSC Genome Browser, method’s Github repo etc.)

    • If ‘gene model’ or BED12 format required, use the ‘GTF to BED12’ utility in the utils/ folder
  • Output files from test data shouldn’t be committed with the repository

  • Make sure (custom) code is clearly documented with comments explaining functions etc.