fixed ncbi classification error
Ubuntu committed Aug 1, 2020
1 parent c8aa5b0 commit 95312b3
Showing 7 changed files with 21 additions and 16 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -45,7 +45,7 @@ Before updating, back up your `config-metawrap` file so you do not have to re-do
conda update -y -c ursky metawrap-mg
# or for a specific version:
-conda install -y -c ursky metawrap-mg=1.2.4
+conda install -y -c ursky metawrap-mg=1.3.0
```

If you are using the (recommended) manual installation of metaWRAP, simply run `git pull` inside the metaWRAP directory.
@@ -92,7 +92,7 @@ conda install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome
conda install -y -c ursky metawrap-mg
# Note: may take a while
-# To fix the CONCOCT endless warning messages in metaWRAP=1.2, run
+# To fix the CONCOCT endless warning messages in metaWRAP=1.2+, run
conda install -y blas=2.5=mkl
```

@@ -111,7 +111,7 @@ conda config --add channels ursky
conda install -y -c ursky metawrap-mg
# Note: may take a while
-# To fix the CONCOCT endless warning messages in metaWRAP=1.2, run
+# To fix the CONCOCT endless warning messages in metaWRAP=1.2+, run
conda install -y blas=2.5=mkl
```
4 changes: 2 additions & 2 deletions bin/config-metawrap
@@ -12,5 +12,5 @@ KRAKEN_DB=/scratch/gu/MY_KRAKEN_DB
BMTAGGER_DB=/scratch/gu/BMTAGGER_DB

# paths to BLAST databases
-BLASTDB=/localscratch/gu/NCBI_nt
-TAXDUMP=/localscratch/gu/NCBI_tax
+BLASTDB=~/PGScratch/testing_taxonomy/NCBI_nt_v4
+TAXDUMP=~/PGScratch/testing_taxonomy/NCBI_tax
2 changes: 1 addition & 1 deletion bin/metawrap
@@ -3,7 +3,7 @@
# Master metaWRAP script that calls on individual modules/pipelines
##############################################################################################################################################################

-VERSION="1.2.4"
+VERSION="1.3.0"

help_message () {
echo""
17 changes: 10 additions & 7 deletions bin/metawrap-modules/classify_bins.sh
@@ -113,14 +113,17 @@ for f in $(ls $bin_folder); do cat ${bin_folder}/${f} >> ${out}/all_contigs.fa;
if [[ ! -s ${out}/all_contigs.fa ]]; then error "something went wrong with joining files in $bin_folder into ${out}/all_contigs.fa"; fi


+if [[ -s ${out}/megablast_out.raw.tab ]]; then
+	comm "megablast alignment already done. Skipping..."
+else
-comm "aligning ${out}/all_contigs.fa to ${BLASTDB} database with MEGABLAST. This is the longest step - please be patient. You may look at the classification progress in ${out}/megablast_out.raw.tab"
-blastn -task megablast -num_threads $threads\
-	-db ${BLASTDB}/nt\
-	-outfmt '6 qseqid qstart qend qlen sseqid staxids sstart send bitscore evalue nident length'\
-	-query ${out}/all_contigs.fa > ${out}/megablast_out.raw.tab

+	comm "aligning ${out}/all_contigs.fa to ${BLASTDB} database with MEGABLAST. This is the longest step - please be patient. You may look at the classification progress in ${out}/megablast_out.raw.tab"
+	blastn -task megablast -num_threads $threads\
+		-db ${BLASTDB}/nt\
+		-outfmt '6 qseqid qstart qend qlen sseqid staxids sstart send bitscore evalue nident length'\
+		-query ${out}/all_contigs.fa > ${out}/megablast_out.raw.tab

-if [[ $? -ne 0 ]]; then error "Failed to run megablast. Exiting..."; fi
+	if [[ $? -ne 0 ]]; then error "Failed to run megablast. Exiting..."; fi
+fi


comm "removing unnecessary lines that lead to bad tax IDs (without a proper rank)"
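
The guard added in this hunk makes the MEGABLAST step resumable: the alignment is skipped when `${out}/megablast_out.raw.tab` already exists and is non-empty. Below is a minimal Python sketch of the same check, under stated assumptions: `has_output` and `run_step` are hypothetical helper names that are not part of metaWRAP, and the step is a stand-in callable rather than a real `blastn` call.

```python
import os
import tempfile


def has_output(path):
    """Roughly mirror the shell test `[[ -s path ]]`: the path exists and is non-empty."""
    return os.path.exists(path) and os.path.getsize(path) > 0


def run_step(output_path, step):
    """Run `step(output_path)` only if no non-empty output is already present.

    Hypothetical helper for illustration; returns "skipped" or "ran".
    """
    if has_output(output_path):
        return "skipped"
    step(output_path)
    return "ran"


def fake_alignment(path):
    # Stand-in for the long-running alignment; writes one made-up line.
    with open(path, "w") as handle:
        handle.write("contig_1\tplaceholder_hit\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_file = os.path.join(tmp, "megablast_out.raw.tab")
        print(run_step(out_file, fake_alignment))  # first call does the work
        print(run_step(out_file, fake_alignment))  # second call finds the file
```

Because the `blastn` output is redirected straight into the file being checked, an interrupted run also leaves a non-empty file behind, so a size-based check like this cannot tell a partial result from a complete one.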
4 changes: 2 additions & 2 deletions bin/metawrap-scripts/prune_blast_hits.py
@@ -7,7 +7,7 @@
cut=line.split('\t')
ranks[cut[0]]=cut[4]

-exclude=["no rank", "subspecies", "species group", "varietas", "forma", "subfamily", "cohort"]
+include=set(["species", "genus", "family", "order", "class", "phylum", "superkingdom"])

#prune blast output to remove mappings without a rank and remove taxid column
for line in open(sys.argv[2]):
@@ -19,7 +19,7 @@
ct=0
for id in ids.split(';'):
if id not in ranks: continue
-if ranks[id] in exclude: continue
+if ranks[id] not in include: continue
if ct>0: continue
cut[5]=id
ct+=1
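
The change in this file swaps a list of excluded ranks for a set of accepted ranks, so any rank that is not explicitly listed is now rejected instead of accepted. The sketch below contrasts the two behaviours. It is an illustration only: the function names, the taxids, and the small `ranks` table are made up, and `tribe` is used simply as an example of a rank that appears in neither list.

```python
# Rank lists copied from the diff above.
OLD_EXCLUDE = ["no rank", "subspecies", "species group", "varietas",
               "forma", "subfamily", "cohort"]
NEW_INCLUDE = {"species", "genus", "family", "order", "class",
               "phylum", "superkingdom"}


def first_taxid_excluding(ids, ranks, exclude=OLD_EXCLUDE):
    """Old behaviour: take the first known taxid whose rank is not excluded."""
    for taxid in ids.split(";"):
        if taxid not in ranks:
            continue
        if ranks[taxid] in exclude:
            continue
        return taxid
    return None


def first_taxid_including(ids, ranks, include=NEW_INCLUDE):
    """New behaviour: take the first known taxid whose rank is explicitly accepted."""
    for taxid in ids.split(";"):
        if ranks.get(taxid) in include:
            return taxid
    return None


if __name__ == "__main__":
    # Hypothetical taxid -> rank table, for illustration only.
    ranks = {"100": "tribe", "200": "species", "300": "no rank"}
    print(first_taxid_excluding("100;200", ranks))  # picks the unlisted rank
    print(first_taxid_including("100;200", ranks))  # skips it, picks the species
```

With an accept-list, a rank name that the script has never seen cannot slip through, which is the usual reason to prefer it over an exclude-list when the set of possible values can grow.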
2 changes: 1 addition & 1 deletion conda_pkg/meta.yaml
@@ -1,6 +1,6 @@
package:
name: metawrap-mg
-version: "1.2.4"
+version: "1.3.0"

source:
git_url: https://github.com/bxlab/metaWRAP.git
2 changes: 2 additions & 0 deletions installation/database_installation.md
@@ -33,6 +33,8 @@ cd NCBI_nt
wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*.tar.gz"
for a in nt.*.tar.gz; do tar xzf $a; done
```
+Note: if you are using a more recent BLAST version (beyond v2.6), you will need the newer database format: `wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/v4/nt_v4.*.tar.gz"`

Do not forget to set the BLASTDB variable in the config-metawrap file!
``` bash
BLASTDB=/your/location/of/database/NCBI_nt