Releases: theiagen/public_health_viral_genomics

v2.3.2

25 May 16:49
d75e99b

PHVG v2.3.2 Patch Release

This patch release updates the Mercury workflows and adds a new output variable, ivar_variant_proportion_intermediate.

Mercury patches

This release adds the "covv_consortium" column to the output GISAID metadata file in the Mercury workflows. This new optional column has been added to the metadata formatters, which can be found here: Mercury_PE/_SE_Prep at gs://theiagen-public-files/terra/mercury-files/Terra_Metadata_Formatter_2023_05_22.xlsx, and Mercury_Prep_N_Batch at gs://theiagen-public-files/terra/mercury-files/Mercury_Prep_N_Batch_SC2_Metadata_Formatter_2023_05_22.xlsx.

Also, empty date values will now fail more informatively in Mercury_Prep_N_Batch.

New output variable

The variant_call task now calculates the proportion of variants at intermediate allele frequencies (60-90%). This value is reported in the output column ivar_variant_proportion_intermediate for workflows that use iVar to perform variant calling (TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE).
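
As a rough illustration of the calculation described above, the following Python sketch computes the share of variant rows whose allele frequency falls between 60% and 90%. It is not the variant_call task's code: the function name, the inclusive bounds, and the choice of denominator (all variant rows) are assumptions, and the example rows are made up. ALT_FREQ is the allele-frequency column in the iVar variants TSV.

```python
import csv
import io


def intermediate_variant_proportion(tsv_text, low=0.6, high=0.9):
    """Return the fraction of variant rows with low <= ALT_FREQ <= high.

    Assumption: bounds are inclusive and every row counts toward the
    denominator; the real task may define either differently.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if low <= float(row["ALT_FREQ"]) <= high)
    return hits / len(rows)


# Made-up example rows using two of the iVar variants TSV columns.
example = "POS\tALT_FREQ\n241\t0.99\n3037\t0.75\n14408\t0.62\n23403\t0.95\n"
proportion = intermediate_variant_proportion(example)
print(proportion)
```

Two of the four example variants fall inside the 60-90% window, so the sketch reports a proportion of one half.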

What's Changed

Full Changelog: v2.3.1...v2.3.2

Follow us on Twitter!

v2.3.1

10 Mar 16:42
715123f

PHVG v2.3.1 release notes

This patch release adds capability for detection of mutations known to be associated with Tamiflu resistance, includes bug fixes for Influenza Type B subtyping, and updates default input parameters (pangolin docker image, nextclade_dataset_tag, nextclade docker image).

New Features

  • New column tamiflu_resistance_aa_subs containing nextclade-detected substitutions that have been described in the literature to confer resistance to tamiflu (Influenza-specific)
  • New optional boolean input parameters for Mercury_Prep_N_Batch:
    • using_clearlabs_data, using_reads_dehosted, usa_territory
  • New optional input parameter for Freyja_Plot workflow: mincov

Default Docker Images and Input Parameter Updates

  • Default pangolin docker image: staphb/pangolin:4.2-pdata-1.18.1.1
  • Default nextclade docker image: nextstrain/nextclade:2.11.0
  • Default nextclade_dataset_tag for SARS-CoV-2: 2023-02-25T12:00:00Z
  • Default freyja docker image: staphb/freyja:1.3.11

Other Changes

  • Bug fix: Type B Influenza subtypes no longer duplicated from ABRicate output
  • Updates to GitHub Actions workflows for automated testing

Documentation can be found here: https://theiagen.notion.site/Theiagen-Public-Health-Resources-a4bd134b0c5c4fe39870e21029a30566

What's Changed

  • Expose minimum coverage option in Freyja_Plot by @sage-wright in #211
  • Enable alternative read and assembly files by @sage-wright in #210
  • Fix Bug RE Type-B subtyping (TheiaCoV_Illumina PE flu track) by @kevinlibuit in #216
  • update nextclade TSV parsing for SC2: clade_legacy. Also update Flu & nextclade by @kapsakcj and @cimendes in #213
  • update default pangolin docker to staphb/pangolin:4.2-pdata-1.18.1.1 and nextclade_dataset_tag for SC2 by @kapsakcj in #217

Full Changelog: v2.3.0...v2.3.1

Follow Theiagen on Twitter & LinkedIn!

v2.3.0

30 Dec 20:06
e492dec

PHVG v2.3.0 Release Notes

This minor release introduces organism track updates for the TheiaCoV workflow series as well as a new workflow for preparing and submitting metadata to public repositories (Mercury_Prep_N_Batch).

Updates to the TheiaCoV Workflow Series

Organism track updates:

  • “MPXV” for monkeypox analysis: VADR annotation assessment enabled (was previously not supported)
  • "WNV" for West Nile Virus analysis: VADR annotation assessment enabled (was previously not supported)
  • "flu" for influenza analysis: will initiate genome assembly with IRMA and characterization with ABRicate against InsaFlu database and NextClade; available in TheiaCoV_Illumina_PE only
  • "HIV" for Human Immunodeficiency Virus analysis: will initiate consensus assembly by alignment (BWA + iVar or minimap2 + Medaka for Illumina and ONT read data, respectively) and characterization with Quasitools HyDRA for antiretroviral drug resistance detection

Note: The default value for the organism variable is “sars-cov-2”

QC and read processing modules updates:

Mercury Prep-N-Batch Workflow

The Mercury_Prep_N_Batch workflow combines the previously separate Mercury_PE/SE_Prep and Mercury_Batch workflows into one.
This workflow functions as follows:

Step 1: Performs supermassive metadata wrangling (task sm_metadata_wrangling in task_mercury_file_wrangling)

  • downloads the entire origin Terra table where the data, analysis results, metadata, etc. are stored.
  • extracts the samples that the user intends to upload
  • creates some standard variables that are used multiple times (such as year, isolate, etc.)
  • determines which organism is being run (currently only supports sars-cov-2 and mpox) and sets the required and optional variables for each file that is being created (e.g., BioSample vs SRA vs GISAID vs GenBank/BankIt)
  • removes any entries that do not meet predetermined quality thresholds (vadr_num_alerts and number_N)
  • removes any entries that do not have all required fields present, and writes the samples that were removed to a table that also lists what fields were missing
  • renames columns as appropriate
  • reformats columns as appropriate
  • compiles all required and optional information in TSV files
  • renames files with the submission_id and edits fasta headers as appropriate
  • uploads read files to the Theiagen SRA GCP Google bucket
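
The two filtering bullets in Step 1 (quality thresholds, then required fields) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the sm_metadata_wrangling task itself: the function name, the threshold values, the "sample" key, and the choice of collection_date as a required field are hypothetical, while vadr_num_alerts and number_N are the column names given above.

```python
def filter_samples(rows, required_fields, max_vadr_alerts, max_number_n):
    """Split rows into kept samples and excluded samples with reasons.

    Hypothetical helper: thresholds and required fields are supplied by
    the caller because the real workflow's values are not stated here.
    """
    kept, excluded = [], []
    for row in rows:
        failed_qc = (
            int(row["vadr_num_alerts"]) > max_vadr_alerts
            or int(row["number_N"]) > max_number_n
        )
        # A field counts as missing when it is absent or empty.
        missing = [name for name in required_fields if not row.get(name)]
        if failed_qc or missing:
            excluded.append(
                {"sample": row["sample"], "failed_qc": failed_qc, "missing_fields": missing}
            )
        else:
            kept.append(row)
    return kept, excluded


# Made-up table rows for illustration.
rows = [
    {"sample": "s1", "vadr_num_alerts": "0", "number_N": "120", "collection_date": "2023-01-05"},
    {"sample": "s2", "vadr_num_alerts": "3", "number_N": "50", "collection_date": "2023-01-06"},
    {"sample": "s3", "vadr_num_alerts": "0", "number_N": "10", "collection_date": ""},
]
kept, excluded = filter_samples(rows, ["collection_date"], max_vadr_alerts=0, max_number_n=5000)
print([row["sample"] for row in kept])
print(excluded)
```

Keeping the missing field names alongside each excluded sample mirrors the table of removed samples that the workflow is described as writing.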

Step 2: If sars-cov-2, trim GenBank fasta files of terminal Ns (task trim_genbank_fastas in task_mercury_file_wrangling.wdl)

  • uses VADR to trim terminal ambiguous nucleotides
  • returns the edited fasta file

Step 3: If mpox, put metadata into sqn format (task table2asn in task_mercury_file_wrangling.wdl)

  • soft links the .sbt, .fsa, and .src files so they share a common name
  • converts the data into a sqn file with table2asn so it can be emailed to NCBI

New Documentation

Detailed documentation has been created for all workflows in the PHVG v2.3.0 repository.

What's Changed

Full Changelog: v2.2.0...v2.3.0

Follow Theiagen on Twitter!

v2.2.0

08 Aug 20:22
ec5f1a9

This release introduces TheiaCoV amenability to non-SARS-CoV-2 (e.g., MPXV) genomic characterization.

NOTE: Use of TheiaCoV for MPXV will require modified input variables; e.g., primer_bed and reference_genome. Please view our public Notion page for information on recommended input variables for MPXV genomic characterization.

Use of TheiaCoV for SARS-CoV-2 will not require any change to input variables; i.e., SARS-CoV-2 characterization is the default behavior of the TheiaCoV workflows. Please view our public Notion page to find the latest recommended workspace data elements for SARS-CoV-2 genomic characterization.

TheiaCoV amenability to non-SARS-CoV-2 genomic characterization

  • An organism variable has been implemented to indicate which organism you want to analyze. This is intended to allow future expansion of the workflow to viruses that are not currently supported.
    • The default value is “sars-cov-2”
    • Change to “MPXV” for monkeypox analysis
  • A new Boolean variable trim_primers indicates whether or not you want to trim primers. This is most applicable when analyzing data generated without primers; e.g., a metagenomic approach. Because of this change, the primer_bed variable is now optional and no longer will appear in the same location on the workflow input page. You must indicate a primer_bed file in order to trim primers. When you switch to this new version, the primer file will be inherited to the correct place so no change is required for SARS-CoV-2 users.
    • The default value is true; primer trimming will occur unless indicated otherwise.
  • SC2-specific calculations have been moved to a new task so these calculations are performed only on SC2 samples, and output variables such as s_gene_percent_coverage are now prefaced by sc2_, for example sc2_s_gene_percent_coverage, in order to indicate this variable is specific for SC2.
  • VADR is only performed on SC2 samples.
    • VADR is able to be run on MPXV samples but this release does not support this. Future releases will enable this feature.
  • Kraken2 has a new input variable target_org that enables the user to specify a target organism to pull from the Kraken2 report; e.g., if this value is set to "Monkeypox virus", the kraken_target_org percentage will populate with the percentage of MPXV identified in the sample.
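
To make the target_org behavior concrete, here is a minimal Python sketch that looks up one organism's percentage in a Kraken2 report. It assumes the standard six-column, tab-delimited Kraken2 report layout (percentage, clade reads, direct reads, rank code, taxonomy ID, indented scientific name). The function name and the example report lines are hypothetical, and the actual task may extract the value differently.

```python
def kraken_target_percent(report_text, target_org):
    """Return the percentage reported for target_org, or None if absent."""
    for line in report_text.splitlines():
        fields = line.split("\t")
        # Column 6 holds the scientific name, indented by taxonomic depth.
        if len(fields) >= 6 and fields[5].strip() == target_org:
            return float(fields[0])
    return None


# Made-up report lines for illustration only.
report = (
    " 85.00\t850\t850\tU\t0\tunclassified\n"
    " 12.50\t125\t125\tS\t10244\t          Monkeypox virus\n"
)
percent = kraken_target_percent(report, "Monkeypox virus")
print(percent)
```

Matching on the stripped name exactly, rather than by substring, avoids picking up a parent or child taxon whose name merely contains the target string.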

New features

  • Updated documentation is now available on our readthedocs page
  • Pangolin:
    • A new pango_lineage_expanded output variable has been created that is enabled by default through the expanded_lineage Boolean input variable. This output lists the pangolin lineage without any aliases (e.g., BA.5 → B.1.1.529.5)
    • --skip-scorpio and --skip-designation-cache are now Boolean inputs that are defaulted to false.
  • Freyja:
    • Two new workflows have been added: Freyja_Update, a workflow to create updated Freyja reference materials, and Freyja_Dash, a workflow to create an interactive HTML visualization of aggregated Freyja demixed output
    • The docker image has been updated to v1.3.10 for all Freyja tasks.
    • New boolean inputs have been created to enable bootstrapping (bootstrap; default=false) and use of confirmed lineages only (confirmed_only; default=false)
    • A new integer input, number_bootstraps, sets the number of bootstraps; it is used only when bootstrap is true
    • NOTE: Use of a dashboard configuration file is recommended for the Freyja_Dash workflow to create lineage groups and avoid “too many lineages” error messages. An example configuration file can be found here.
  • Nextclade:
    • The Nextclade task has been modified to be compatible with versions ≥v2.0.0.
    • The default dataset tag has been updated to 2022-07-26T12:00:00Z
    • The default docker image has been updated to nextstrain/nextclade:2.4.0
    • NOTE: In order to incorporate Nextclade v2.0.0, modifications were made that render our SARS-CoV-2 genomics characterization workflows (e.g., TheiaCoV_Illumina_PE) incompatible with older versions of Nextclade.
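
The alias expansion behind pango_lineage_expanded (BA.5 → B.1.1.529.5) can be illustrated with a short Python sketch. This is not how pangolin or the workflow implements it: the function name and the one-entry alias table are hypothetical, and a real table would carry many more aliases.

```python
# Hypothetical, deliberately tiny alias table; BA is the alias for B.1.1.529.
ALIASES = {"BA": "B.1.1.529"}


def expand_lineage(lineage, aliases):
    """Replace an aliased prefix with its full lineage path.

    Lineages whose prefix has no alias entry are returned unchanged.
    """
    prefix, _, rest = lineage.partition(".")
    full_prefix = aliases.get(prefix)
    if not full_prefix:
        return lineage
    return f"{full_prefix}.{rest}" if rest else full_prefix


expanded = expand_lineage("BA.5", ALIASES)
print(expanded)
```

Only the first dot-separated component is looked up, because a Pango alias always occupies the leading letters of the lineage name.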

What's Changed

Full Changelog: v2.1.2...v2.2.0

v2.1.2

03 May 18:51
acdbc23

This patch release addresses an issue identified with the TheiaCoV_Augur_Prep workflow.

Other modifications

  • Updated default pangolin_docker_image (staphb/pangolin:4.0.6-pdata-1.8)
  • Updated default nextclade_dataset_tag (2022-04-28T12:00:00Z)

What's Changed

  • Fix PHVG v2.1.1 bug and update default images and tags by @sage-wright in #141

Full Changelog: v2.1.1...v2.1.2

v2.1.1

26 Apr 19:20
2e366b6

This patch release addresses issues identified with the TheiaCoV_Augur_Run workflows:

  • CSV elements in metadata_merged now properly converted into CSV format
  • Multiple TheiaCoV_Augur_Run tasks modified to allow for graceful memory telemetry failure, described by @dpark01 here

Other Modifications:

  • Addition of the pangolin_arguments variable allows for additional user-defined arguments; e.g., --skip-scorpio

What's Changed

Full Changelog: v2.1.0...v2.1.1

v2.1.0

08 Apr 20:45
0926c09

This minor release modifies the pangolin task to ensure compatibility with Pangolin ≥v4.0.4.

NOTE: In order to incorporate Pangolin ≥v4.0.4, modifications were made that render our SARS-CoV-2 genomics characterization workflows (e.g. TheiaCoV_Illumina_PE) incompatible with older versions of Pangolin.

  • Default docker image for pangolin4 task set to: quay.io/staphb/pangolin:4.0.4-pdata-1.2.133

Other Modifications:

New Features

  • An s_gene_percent_coverage calculation was added to all TheiaCoV workflows for SARS-CoV-2 genomic characterization that incorporate an alignment step (TheiaCoV_ClearLabs, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT).
    • An additional TSV file is made that includes the percent coverage of all genes in SC2 genomes, assuming Wuhan-1 reference genome positions. It can be found under this column: percent_gene_coverage
  • A min_depth input variable was created for TheiaCoV_Illumina_PE and TheiaCoV_Illumina_SE workflows to specify the minimum depth of coverage required to call a base in the final assembly output and a variant in the VCF output.
    • The default value for min_depth is 100.
    • This parameter replaces min_depth parameter for two previous tasks consensus and variant_call. These variables have been consolidated.
  • The NextClade dataset tag used is now an output value generated in our SARS-CoV-2 genomics characterization workflows (e.g. TheiaCoV_Illumina_PE) under column: nextclade_ds_tag.
  • The TheiaCoV_Augur_Run merged_metadata output file is now in CSV format to be compatible with both Auspice and MicrobeTrace.
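
The interaction between min_depth and a per-gene percent coverage value can be sketched in Python as below. This is a simplified illustration, not the workflows' implementation: it assumes that positions below min_depth are written as N in the consensus and that percent coverage is the share of non-N bases within the gene's reference coordinates. The function names, the toy sequence, and the toy coordinates are all hypothetical.

```python
def mask_low_depth(bases, depths, min_depth=100):
    """Replace bases whose depth is below min_depth with N."""
    return "".join(b if d >= min_depth else "N" for b, d in zip(bases, depths))


def percent_gene_coverage(consensus, start, end):
    """Percent of non-N bases between start and end (1-based, inclusive)."""
    region = consensus[start - 1:end]
    if not region:
        return 0.0
    called = sum(1 for base in region if base.upper() != "N")
    return 100.0 * called / len(region)


# Toy ten-base example; real genes span thousands of positions.
bases = "ACGTACGTAC"
depths = [150, 150, 90, 150, 20, 150, 150, 150, 99, 150]
consensus = mask_low_depth(bases, depths)
coverage = percent_gene_coverage(consensus, 3, 6)
print(consensus, coverage)
```

With the default min_depth of 100, three positions are masked, and two of the four bases in the toy gene (positions 3 to 6) remain called, giving 50 percent coverage.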

Default Docker Image Updates

  • Default Nextclade docker image updated to: nextstrain/nextclade:1.11.0
  • Default nextclade_dataset_tag updated to: 2022-03-31T12:00:00Z
  • Default Freyja docker image updated to: quay.io/staphb/freyja:1.3.2

Bug Fixes

  • Several Mercury output files were labeled as CSV files when they were actually TSV files. This is fixed. #112

Pull Requests and Resolved Issues

Full Changelog: v2.0.0...v2.1.0

v2.0.0

16 Feb 23:47
a6df039

This major release renames workflows to utilize the TheiaCoV tag (previously Titan) and adds five new workflows for public health viral genomics.

Workflow names changed and modifications made:

  • Titan_Augur_Prep → TheiaCoV_Augur_Prep
  • Titan_Augur_Run → TheiaCoV_Augur_Run
    • Allow subsampling via user-defined builds.yml file
    • Update default nextstrain docker images (nextstrain/base:build-20210127T135203Z → nextstrain/base:build-20210218T081251)
  • Titan_ClearLabs → TheiaCoV_ClearLabs
    • Update default consensus task docker container image (quay.io/staphb/artic-ncov2019:1.3.0 → quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3)
      • Note: quay.io/staphb/artic-ncov2019:1.3.0 & quay.io/staphb/artic-ncov2019-epi2me are both compatible alternative docker images
    • Use of fastq-scan rather than fastqc to calculate number of reads and pairs
    • Allow for use of a user-defined reference genome for consensus genome assembly
      • reference_genome consensus task input variable
  • Titan_Illumina_PE → TheiaCoV_Illumina_PE
    • Default minimum coverage changed from 20x to 100x (ivar consensus and ivar variants tasks)
    • Use of fastq-scan rather than fastqc to calculate number of reads and pairs
    • Allow for use of a user-defined reference genome for consensus genome assembly
      • reference_genome workflow input variable
  • Titan_Illumina_SE → TheiaCoV_Illumina_SE
    • Default minimum coverage changed from 20x to 100x (ivar consensus and ivar variants tasks)
    • Use of fastq-scan rather than fastqc to calculate number of reads and pairs
    • Allow for use of a user-defined reference genome for consensus genome assembly
      • reference_genome workflow input variable
  • Titan_ONT → TheiaCoV_ONT
    • Update default consensus task docker container image (quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 → quay.io/staphb/artic-ncov2019-epi2me)
      • Note: quay.io/staphb/artic-ncov2019:1.3.0 & quay.io/staphb/artic-ncov2019:1.3.0-medaka-1.4.3 are both compatible alternative docker images
    • Use of fastq-scan rather than fastqc to calculate number of reads and pairs
    • Allow for use of a user-defined reference genome for consensus genome assembly
      • reference_genome consensus task input variable
  • Titan_FASTA → TheiaCoV_FASTA
  • Titan-GC → TheiaCoV-GC

Workflows Added:

  • TheiaCoV_Validate
    • Workflow that allows for the rapid comparison of critical output values generated by differing versions of TheiaCoV workflows for SARS-CoV-2 genomic characterization for bioinformatics validation purposes
  • TheiaCoV_DistanceTree
    • Workflow that allows for Augur distance trees to be generated without refinement
  • Workflows for SARS-CoV-2 Wastewater Data Analysis
    • Freyja_FASTQ
      • Workflow that allows running of the Freyja software with raw paired-end fastq files
        • This workflow will generate the required alignment that is used as input to the freyja variants command, whose output is then analyzed with freyja demix
    • Freyja_Plot
      • Workflow to visualize Freyja outputs using the freyja plot command
    • TheiaCoV_WWVC

Other modifications:

  • Default docker images updated for Pangolin (staphb/pangolin:3.1.11-pangolearn-2021-08-24 → quay.io/staphb/3.1.20-pangolearn-2022-02-02), VADR (staphb/vadr:1.3 → quay.io/staphb/1.4.1-models-1.3-2), and Nextclade (nextstrain/nextclade:1.3.0 → nextstrain/nextclade:1.10.3), along with the Nextclade dataset tag (2021-06-25T00:00:00Z → 2022-02-07T12:00:00Z), in all TheiaCoV workflows for SARS-CoV-2 genomic characterization (TheiaCoV_ClearLabs, TheiaCoV_FASTA, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT)
    • NOTE: In order to incorporate Nextclade ≥v1.10.0, modifications were made to the nextclade_one_sample task that render it incompatible with older versions of Nextclade.
  • Inclusion of S-gene coverage calculation in all TheiaCoV workflows for SARS-CoV-2 genomic characterization that incorporate an alignment step (TheiaCoV_ClearLabs, TheiaCoV_Illumina_PE, TheiaCoV_Illumina_SE, and TheiaCoV_ONT)
  • Mercury_Batch now requires Array[String] (i.e., gcp_uri) for the sra_reads input (was Array[File]); this change avoids the need to localize reads onto the VM before transferring them to the transfer bucket for SRA read submission, drastically decreasing runtime
    • This modification means that a zipped file of reads for web portal submission is no longer produced if a gcp_bucket is not specified; instead, users are encouraged to utilize the zip_column_content workflow from the Theiagen Terra_Utilities repository to generate these files.
  • Implementation of a repository style guide

v1.5.3

15 Sep 03:08

Patch to address vulnerability in Mercury Prep workflows to the inadvertent removal of internal Ns when preparing assemblies for GenBank submission
This patch replaces the sed one-liner that removed leading Ns from assembly files in preparation for GenBank submission with the NCBI fasta-trim-terminal-ambigs.pl script, as the sed solution was found to be vulnerable to inadvertent removal of non-terminal Ns in multi-line assembly files.
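
The failure mode described above can be reproduced with a small Python sketch. Neither function is the original sed command or the NCBI script: trim_per_line is a hypothetical stand-in for any line-by-line pattern that strips leading Ns, and trim_per_record is a simplified record-level alternative that only strips runs of N at the true ends of each sequence. The NCBI script itself does more than this.

```python
import re


def trim_per_line(fasta_text):
    """Flawed approach: strips leading Ns from every sequence line."""
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else re.sub(r"^N+", "", line))
    return "\n".join(out)


def trim_per_record(fasta_text, width=60):
    """Join each record's lines first, then strip only terminal Ns."""
    records, header, chunks = [], None, []
    for line in fasta_text.splitlines() + [">"]:  # sentinel flushes the last record
        if line.startswith(">"):
            if header is not None:
                seq = "".join(chunks).strip("Nn")
                wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
                records.append("\n".join([header] + wrapped))
            header, chunks = line, []
        else:
            chunks.append(line.strip())
    return "\n".join(records)


def sequence_of(fasta_text):
    """Concatenate all non-header lines into one sequence string."""
    return "".join(l for l in fasta_text.splitlines() if not l.startswith(">"))


# Made-up record: the second line starts with Ns that are internal to the sequence.
example = ">sample1\nNNACGT\nNNGGCC\nTTNN\n"
per_line = sequence_of(trim_per_line(example))
per_record = sequence_of(trim_per_record(example))
print(per_line, per_record)
```

In the example, the per-line version silently drops the internal NN at the start of the second line, while the per-record version keeps it and removes only the Ns at the two ends of the full sequence.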

Other modifications made

  • NextClade default image updated to v1.3.0; nextclade_one_sample task modified to accommodate changes in sourcing reference files
  • GISAID metadata passage_history field auto-populated as original in the Mercury Prep workflows; other required fields (patient_age, patient_gender, and patient_status) populated as unknown if no input value is provided

v1.5.2

04 Sep 00:59
b56ddc4

Minor release to update the Mercury Workflows
The Mercury workflows (Mercury_PE_Prep, Mercury_SE_Prep, and Mercury_Batch) have been updated to enable the inclusion of all required and suggested metadata as per the PHA4GE SARS-CoV-2 Contextual Data Specifications.

In addition to the files for submission to GISAID and GenBank, the Mercury workflows now prepare files for both BioSample registration and SRA submission. A protocol to utilize these new workflows for SC2 data submission has been made publicly available on Protocols.io.

Other modifications made

  • Pangolin task modified to capture all software and reference versions; outputs have changed accordingly:
    -- pangolin_version: deprecated
    -- pangolin_usher_version: deprecated
    -- pangolin_versions: all pangolin software and reference data versions
    -- pangolin_assignment_version: version captured from the final pangolin report, i.e. version of inference approach utilized to make the final pango lineage assignment
  • Titan workflows for genomic characterization modified to remove the pangolin_docker_image input parameter
    -- The pangolin_docker_image is now an optional input parameter for the pangolin3 task titled docker
    -- The default value for the pangolin3.docker input parameter has been set to staphb/pangolin:3.1.11-pangolearn-2021-08-24
  • nextclade_one_sample task modified to allow processing of 0bp assembly files (PR by @HNH0303 #64)
  • titan_augur_run workflow modified to address bug regarding processing of unmasked inputs (PR by @dpark01 #62)