Galaxy HiCExplorer

Welcome to the Galaxy HiCExplorer -- a webserver to process, analyse and visualize Hi-C, capture Hi-C, HiChIP and single-cell Hi-C data.

Get started with Galaxy HiCExplorer

Are you new to Galaxy, or returning after a long time, and looking for help to get started? Take a guided tour through Galaxy's user interface.

Take a guided tour for an introduction to Galaxy HiCExplorer and Hi-C data analysis. This tour guides you through the Hi-C tutorial on the Galaxy Training Network, where you can analyse Hi-C data of Drosophila melanogaster. Follow the tutorial to understand the analysis steps better or as a guide to which parameters are useful.

A precomputed history of the tutorial can be viewed here.

A more advanced tutorial is hosted on readthedocs.io. It is designed for the shell-based version of HiCExplorer but can easily be adapted to Galaxy HiCExplorer. In this tutorial, mouse stem cells from Marks et al. (2015) are analysed. We provide the input FASTQ files in our data library.

We recommend following the tutorial on FastQC for quality checks.

Example data

The Galaxy Training Network tutorial uses Hi-C data from Drosophila melanogaster and is hosted on zenodo: DOI

Additionally, we provide the data in the shared data library of the Galaxy HiCExplorer. In contrast to the data hosted on Zenodo, it contains preprocessed intermediate files.

Galaxy HiCExplorer can process large Hi-C data sets. We processed Hi-C data with around 750 million reads from Rosa-Garrido et al. Have a look at the preprocessed files.

Capture Hi-C and HiChIP

The new chic*-modules of HiCExplorer provide powerful tools to analyse capture Hi-C and HiChIP data. We recommend following the tutorial on hicexplorer.readthedocs.io for an introduction to the analysis pipeline. A preprocessed cHi-C history with data from Andrey et al. (2017) is provided here.

Single-cell Hi-C

The newest members of the HiCExplorer tool suite are the schic*-modules, which bring the latest single-cell Hi-C research to Galaxy. We recommend following the tutorial on schicexplorer.readthedocs.io for an introduction to the analysis pipeline.

The raw scool matrices with the data from Nagano 2017 at 10 kb and 1 Mb resolution are hosted on Zenodo: DOI

HiGlass

The interactive Hi-C data exploration with HiGlass is accessible via the interactive live.usegalaxy.eu platform.

Galaxy HiCExplorer -- many possibilities

(A) Galaxy HiCExplorer workflows and tools. Quality control tools: (B) Output of hicCorrelate comparing two wild-type samples and one knockdown sample. (C) Output of hicPlotDistVsCounts showing changes in the number of contacts for different conditions. Analysis tools: (D) hicPlotMatrix of the Pearson correlation matrix derived from a contact matrix for chromosome 6 in mouse, computed with hicTransform. The optional data track at the bottom shows the first eigenvector for A/B compartments obtained using hicPCA. (E) The pixel difference between a corrected Hi-C matrix for the wild-type condition and a knockdown was computed using hicCompareMatrices, and a 7 Mb region is visualized using hicPlotMatrix. Visualization tools: (F) Contact matrix plot of an 80 to 105 Mb region of chromosome 2 in log scale. (G) Example output of hicPlotViewpoint showing the corrected number of Hi-C contacts for a single bin in chromosome 5 (output similar to 4C-seq) (Andrey 2017). (H) A Hi-C matrix was converted into an observed vs. expected matrix using hicTransform, and this matrix, together with the location of high-affinity sites from (Ramirez 2015), was used to run hicAggregateContacts. (I) 85 Mb to 110 Mb region from human chromosome 2 visualized using hicPlotTADs. TADs were computed by hicFindTADs. The additional tracks correspond to: TAD-separation score (as reported by hicFindTADs), chromatin state, principal component 1 (A/B compartment) computed using hicPCA, ChIP-seq coverage for the H3K27ac mark, DNA methylation, and a gene track. Hi-C data for B, C, E and H from Drosophila melanogaster S2 cells from (Ramirez 2018). Hi-C data for D, F and I from mouse cardiac myocytes (Nothjunge 2017). Additional tracks in I from (Nothjunge 2017).

The new tools in Galaxy HiCExplorer 3 for even better Hi-C data analyses: (A) Loops detected by hicDetectLoops and plotted with hicPlotMatrix on GM12878 primary data from Rao 2014. (B) Short to long range contact ratios created by hicPlotSVL on GM12878 primary, IMR90 and HMEC data from Rao 2014. (C) Average regions of detected TADs from hicFindTADs on GM12878 primary, chromosome 1; data from Rao 2014. (D) Compartmentalization of GM12878 primary data from Rao 2014, computed with hicCompartmentalization. (E) Viewpoint of the gene MSTN on FL-E13-5 and MB-E10-5 with mean background and p-values per relative distance via continuous negative binomial distributions. Data from Andrey 2017. (F) Quality control plot for FL-E13-5 and MB-E10-5 showing the sparsity distribution. Data from Andrey 2017. (G) Single-cell Hi-C cluster profile, created by dimension reduction with scHicClusterMinHash and spectral clustering on 1 Mb single-cell Hi-C data by Nagano 2017. (H) Quality control plot for single-cell Hi-C data by Nagano 2017. It shows the read coverage per cell; cells with fewer than 100,000 reads are discarded. (I) Consensus matrix plot for single-cell Hi-C data at 1 Mb resolution. Cells are dimension reduced by computing A/B compartments per cell and clustered with k-means. The consensus matrix of a cluster is the average of all interaction matrices of the cluster members. Data from Nagano 2017.

The different tools of Galaxy HiCExplorer in a workflow context: Analysis workflow for Hi-C (A), cHi-C / HiChIP (B) and scHi-C (C). All share the usage of hicBuildMatrix to create the individual contact matrices. Hi-C and cHi-C/HiChIP support HiCExplorer's h5 and cool interaction matrix file formats. For scHi-C data, one cool interaction matrix file is created per cell, and scHicMergeToSCool merges all single-cell matrices into one single-cell cool (scool) matrix.

Workflows

To automate consecutive steps, we provide the following workflows in three categories: from scratch (FASTQ files), from scratch (FASTQ files) with summing up of replicates, and starting from an existing contact matrix. Many workflows require collections of FASTQ files as input; how to create a collection is shown here. Please do not forget to check the quality of the FASTQ files with FastQC.

Please keep in mind that all workflows need additional input from the user. All mapping steps are done with BWA-MEM, and the correct reference genome needs to be defined by the user. The correct restriction site and the bin size for hicBuildMatrix need to be defined too. The correction of the matrix is done with the default parameters of -1.5 and 5; change these if necessary. Furthermore, the correct region and/or chromosome needs to be defined for plotting the matrix, TADs or PCA.
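
The two correction values can be read as a lower and an upper cut-off on a per-bin coverage score. The sketch below assumes (this is not stated above) that the score is the MAD-based modified z-score of each bin's total contacts and that bins outside the range are dropped before correction. The function names are hypothetical and this is not HiCExplorer's own code; it only illustrates what changing the two numbers would do.

```python
from statistics import median


def modified_z_scores(coverage):
    """Return the MAD-based modified z-score for each bin's total contacts."""
    med = median(coverage)
    mad = median(abs(c - med) for c in coverage)
    if mad == 0:
        # All bins are (nearly) identical; nothing stands out.
        return [0.0 for _ in coverage]
    return [0.6745 * (c - med) / mad for c in coverage]


def bins_to_keep(coverage, lower=-1.5, upper=5.0):
    """Indices of bins whose score lies within [lower, upper] (assumed rule)."""
    scores = modified_z_scores(coverage)
    return [i for i, z in enumerate(scores) if lower <= z <= upper]


# Toy coverage values: bin 5 is almost empty, bin 6 accumulates reads.
coverage = [100, 110, 90, 105, 95, 1, 2000]
kept = bins_to_keep(coverage)
```

Raising the lower value removes more low-coverage bins; lowering the upper value removes more bins that accumulate reads.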

From scratch (FASTQ files) individual

These workflows expect collections of FASTQ files as input. The first collection needs to contain all forward strand FASTQ files and the second one all reverse strand FASTQ files. Please make sure that the order of the FASTQ files is the same in both collections. The order is important for associating each forward read file with its reverse read file.
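
Before building the two collections, the ordering can be checked locally. The following is a minimal sketch that assumes a hypothetical naming scheme in which mates differ only by an `_R1`/`_R2` (or `_1`/`_2`) tag directly before the file extension; adapt `mate_key` if your files are named differently.

```python
import re


def mate_key(filename):
    """Remove the mate tag (_R1/_R2 or _1/_2) in front of the FASTQ extension."""
    return re.sub(r"_R?[12](?=\.f(ast)?q(\.gz)?$)", "", filename)


def check_collections(forward, reverse):
    """Return (forward, reverse) pairs, or raise if the two lists do not line up."""
    if len(forward) != len(reverse):
        raise ValueError("collections differ in length")
    for position, (fwd, rev) in enumerate(zip(forward, reverse)):
        if mate_key(fwd) != mate_key(rev):
            raise ValueError(f"position {position}: {fwd} does not match {rev}")
    return list(zip(forward, reverse))


pairs = check_collections(
    ["wt_rep1_R1.fastq.gz", "wt_rep2_R1.fastq.gz"],
    ["wt_rep1_R2.fastq.gz", "wt_rep2_R2.fastq.gz"],
)
```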

The following workflows are provided:

From scratch (FASTQ files) and summing up replicates

These workflows take collections of FASTQ files for the forward and reverse strand as input. For each pair a contact matrix is built, and all created contact matrices are summed up into one contact matrix. Use this workflow if you want to use replicates to increase the statistical power of your contact matrix and the replicates have been checked to be correct.
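
Conceptually, summing replicates adds the contact counts bin pair by bin pair. The sketch below illustrates this with a hypothetical sparse representation, a dict mapping `(bin_i, bin_j)` to a count. It assumes all replicates were built with the same bin size and reference genome, and it is not the code the workflow runs.

```python
from collections import Counter


def sum_contact_matrices(matrices):
    """Add up sparse contact matrices given as {(bin_i, bin_j): count} dicts."""
    total = Counter()
    for matrix in matrices:
        total.update(matrix)  # Counter.update adds counts instead of replacing
    return dict(total)


replicate_1 = {(0, 0): 5, (0, 1): 2}
replicate_2 = {(0, 1): 3, (1, 1): 7}
summed = sum_contact_matrices([replicate_1, replicate_2])
```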

Contact matrix as a basis

Use the following workflows if you have already created a contact matrix.

Single-cell Hi-C

Use the following workflow for an existing scool matrix with QC and normalization:

Python API access

With the bioblend API it is possible to use the Galaxy HiCExplorer via a script written in Python. A small usage example is provided here as an IPython notebook. It shows how to upload a dataset, run bowtie2 and download the mapped file to the local computer. Please note that the options offered by the bioblend API are extensive and go way beyond this example.
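
The three steps named above (upload, run a mapper, download) could look roughly like the following sketch. It is not the linked notebook. The tool ID and the tool input dictionary depend on the server and tool version, so they are passed in by the caller and are placeholders here; the history name is also made up.

```python
def connect(url, api_key):
    """Create a bioblend client (requires `pip install bioblend`)."""
    from bioblend.galaxy import GalaxyInstance

    return GalaxyInstance(url=url, key=api_key)


def upload_map_download(gi, fastq_path, tool_id, build_inputs, out_dir):
    """Upload a file, run one tool on it and download the first output.

    `build_inputs` is a caller-supplied function that receives the ID of the
    uploaded dataset and returns the tool-specific input dictionary.
    """
    history = gi.histories.create_history(name="HiCExplorer API example")
    upload = gi.tools.upload_file(fastq_path, history["id"])
    uploaded_id = upload["outputs"][0]["id"]

    result = gi.tools.run_tool(
        history_id=history["id"],
        tool_id=tool_id,
        tool_inputs=build_inputs(uploaded_id),
    )
    mapped_id = result["outputs"][0]["id"]

    # Block until the job has finished before fetching the result.
    gi.datasets.wait_for_dataset(mapped_id)
    return gi.datasets.download_dataset(
        mapped_id, file_path=out_dir, use_default_filename=True
    )
```

A call would pass `connect(server_url, api_key)` as `gi`, together with the ID of the bowtie2 tool as listed on the server and a function that builds its inputs.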

Known pitfalls

Preprocessed SAM/BAM files: To build the contact matrix, the SAM/BAM files need to be generated using the --reorder option of bowtie2 / hisat2 so that the reads are output in exactly the same order as in the FASTQ files. For the same reason, the SAM/BAM files should not be sorted. Please make sure your preprocessed SAM/BAM files fulfil these requirements; otherwise the creation of a contact matrix with hicBuildMatrix will fail.

We recommend using BWA-MEM with the Hi-C specific parameters, as shown in our tutorials.
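
The ordering requirement for preprocessed files can be checked on SAM text before uploading. This is a rough sketch with hypothetical helper names. It assumes both mate files use identical read names (no `/1` and `/2` suffixes) and that the input is plain SAM, for example a BAM file converted to text beforehand.

```python
def declared_sort_order(sam_lines):
    """Return the SO: value of the @HD header line, or None if absent."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header is over
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def read_names(sam_lines):
    """Read names in file order, collapsing consecutive repeats of one read."""
    names = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        name = line.split("\t", 1)[0]
        if not names or names[-1] != name:
            names.append(name)
    return names


def mates_in_same_order(forward_sam, reverse_sam):
    """True if neither file is coordinate sorted and the read order matches."""
    for sam in (forward_sam, reverse_sam):
        if declared_sort_order(sam) == "coordinate":
            return False
    return read_names(forward_sam) == read_names(reverse_sam)


forward = "@HD\tVN:1.6\tSO:unsorted\nr1\t0\tchr1\t100\nr2\t0\tchr1\t50\n".splitlines()
reverse = "@HD\tVN:1.6\tSO:unsorted\nr1\t16\tchr1\t300\nr2\t16\tchr2\t10\n".splitlines()
ok = mates_in_same_order(forward, reverse)
```

Both arguments are lists of lines because each file is scanned twice; for large files, a streaming comparison would be the better choice.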

Citation

Joachim Wolff, Vivek Bhardwaj, Stephan Nothjunge, Gautier Richard, Gina Renschler, Ralf Gilsbach, Thomas Manke, Rolf Backofen, Fidel Ramírez, Björn A Grüning. "Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization", Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W11–W16, doi: 10.1093/nar/gky504