Insufficient peaks in pseudoreplicates resulting in chip.idr_ppr failure #307

Open
nursyahr opened this issue Sep 18, 2024 · 0 comments

nursyahr commented Sep 18, 2024

Describe the bug

The pipeline fails at chip.idr_ppr, where IDR merges the peaks of the two pooled pseudoreplicates, because there are too few peaks. The two pseudoreplicate peak files contain only 20 and 6 peaks, respectively. As I understand it, pseudoreplicates are subsamples of the true replicates, so I do not see why there are so few peaks even before merging. I tried setting a seed for the pseudoreplication, but the failure persists. Other samples run without this error.
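
To make the understanding above concrete, here is a minimal sketch of pseudoreplication as described: the reads of one replicate are shuffled with a fixed seed and split into two halves. This is an illustration only, not the pipeline's actual code; the function name `split_into_pseudoreplicates` and the toy read names are hypothetical.

```python
import random


def split_into_pseudoreplicates(reads, seed=0):
    """Shuffle reads with a fixed seed and split them into two halves.

    Hypothetical helper for illustration; the real pipeline works on
    alignment files rather than Python lists.
    """
    shuffled = list(reads)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


# Toy input: ten made-up read names.
reads = [f"read_{i}" for i in range(10)]
pr1, pr2 = split_into_pseudoreplicates(reads, seed=42)
print(len(pr1), len(pr2))
```

Under this model each pseudoreplicate holds about half of the reads, so changing the seed changes which reads land in which half but not how many there are.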

OS/Platform

  • OS/Platform: Linux 4.18.0-372.9.1.el8.x86_64
  • Pipeline version: v2.2.2
  • Caper version: v2.3.2

Caper configuration file

Paste contents of ~/.caper/default.conf.

backend=local

# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=/home/cbi/grn_inference/database/raw/as_tf/ctcf/chipseq/chip-seq_encode_pipeline/pipeline_data

cromwell=/home/nursyahi001/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/nursyahi001/.caper/womtool_jar/womtool-82.jar

Input JSON file

Paste contents of your input JSON file.

{
    "chip.title" : "2024-08-20 FFF Control",
    "chip.description" : "Samples: WHC2180-2; FFF-C1,C2,C3",

    "chip.pipeline_type" : "tf",
    "chip.aligner" : "bowtie2",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv",
    "chip.genome_name" : "hg38",

    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,

    "chip.always_use_pooled_ctl" : true,

    "chip.fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_1_val_1.fq.gz"],
    "chip.fastqs_rep2_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_1_val_1.fq.gz"],
    "chip.fastqs_rep3_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_1_val_1.fq.gz"],

    "chip.fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2180_2_val_2.fq.gz"],
    "chip.fastqs_rep2_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2181_2_val_2.fq.gz"],
    "chip.fastqs_rep3_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2182_2_val_2.fq.gz"],

    "chip.ctl_fastqs_rep1_R1" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_1_val_1.fq.gz"],
    "chip.ctl_fastqs_rep1_R2" : [ "/home/cbi/projects/20240725_CHIPseq_data/fastq/trimmed/WHC2192_2_val_2.fq.gz"]
    
}

Troubleshooting result

If you ran caper run without Caper server then Caper automatically runs a troubleshooter for failed workflows. Find troubleshooting result in the bottom of Caper's screen log.

If you ran caper submit with a running Caper server then first find your workflow ID (1st column) with caper list and run caper debug [WORKFLOW_ID].

Paste troubleshooting result.



==== NAME=chip.idr_ppr, STATUS=Failed, PARENT=
SHARD_IDX=-1, RC=1, JOB_ID=2331552
START=2024-09-12T11:51:25.550Z, END=2024-09-12T11:51:39.781Z
STDOUT=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stdout
STDERR=/home/cbi/projects/20240725_PheckKhee_CHIPseq_data/croo_output/outputs/trimmed/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 213, in <module>
    main()
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 175, in main
    args.idr_thresh, args.idr_rank, args.mem_gb, args.out_dir,
  File "/software/chip-seq-pipeline/src/encode_task_idr.py", line 118, in idr
    idr_stdout=idr_stdout,
  File "/software/chip-seq-pipeline/src/encode_lib_common.py", line 359, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=2331702, PGID=2331702, RC=1, DURATION_SEC=5.2
STDERR=Traceback (most recent call last):
  File "/usr/local/bin/idr", line 4, in <module>
    __import__('pkg_resources').run_script('idr==2.0.3', 'idr')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
    idr.idr.main()
  File "/usr/local/lib/python3.6/dist-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 857, in main
    raise ValueError(error_msg)
ValueError: Peak files must contain at least 20 peaks post-merge
Hint: Merged peaks were written to the output file
STDOUT=/usr/local/bin/idr --samples /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/313145862/rep-pr1.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/305386503/rep-pr2.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --peak-list /cromwell-executions/chip/10977366-4d12-4516-b858-b2ecec8ef1d0/call-idr_ppr/attempt-2/inputs/1977487234/rep.pooled_x_WHC2192_1_val_1.srt.nodup.300K.regionPeak.gz --input-file-type narrowPeak --output-file pooled-pr1_vs_pooled-pr2.idr0.05.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05 --plot --use-best-multisummit-IDR --log-output-file pooled-pr1_vs_pooled-pr2.idr0.05.log
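
The peak counts in the gzipped regionPeak files passed to idr above can be checked with a short script like the one below. It is a sketch under stated assumptions: each non-empty line is counted as one peak, the threshold of 20 is taken from the error message, and the helper names `count_peaks` and `find_sparse_peak_files` are hypothetical. A raw line count is only a rough proxy for the post-merge count that idr actually checks.

```python
import gzip

# Minimum taken from the idr error message; idr applies it after merging,
# so this raw count is only an approximate pre-check.
MIN_PEAKS = 20


def count_peaks(path):
    """Count non-empty lines in a gzipped peak file (one peak per line assumed)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.strip())


def find_sparse_peak_files(paths, min_peaks=MIN_PEAKS):
    """Return {path: peak_count} for files with fewer than min_peaks peaks."""
    counts = {path: count_peaks(path) for path in paths}
    return {path: n for path, n in counts.items() if n < min_peaks}
```

Calling `find_sparse_peak_files` with the two `--samples` paths and the `--peak-list` path from the command above shows which inputs are already below the threshold before idr runs.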
