Commit bc1ebc47, authored by Antonie Vietor, committed by GitHub

Peak analysis v2 (#5)

* preseq and picard collectalignmentsummarymetrics added

* changed PICARD COLLECTALIGNMENTSUMMARYMETRICS to PICARD COLLECTMULTIPLEMETRICS and integrated the new wrapper as a temporary script

* integration of next steps with temporary use of future wrappers as scripts

* wrapper integration for collectmultiplemetrics and genomecov, rule to create an igv-file from bigWig files

* deeptools and phantompeakqualtools integration

* phantompeakqualtools, headers and multiqc integration

* draft for integration of additional plots from phantompeakqualtools data into multiqc

* Changes according to view #4

* Cross-correlation plots are grouped now. Changes in the description of the plots.

* change to newer wrapper versions in all rules and code cleanup

* Changes according to view #4, temporary matplotlib dependency to multiqc added, github actions added

* github actions

* test config and test data

* changes according to PR #4

* update config

* more logs added

* lint: Mixed rules and functions in same snakefile -> moved a part of the rule_all input to common.smk, input functions for rule_all added

* undo the changes of the last commit

* moved all function from Snakefile to common.smk

* --cache flag added for github actions

* --cache flag added for github actions

* snakemake_output_cache location added

* test snakemake_output_cache location

* another test snakemake_output_cache location

* another test snakemake_output_cache location

* set cache in github actions

* fix: dependencies in pysam resulted in ContextualVersionConflict in multiqc

* test: set cache location in github actions

* removed config files in .test from gitignore

* pysam dependencies and changes for github actions

* directory for ngs-test-data added

* gitmodules

* config

* test submodules

* test submodules

* config added

* directory for snakemake output cache changed

* cache location removed

* creating directory for snakemake output cache in github actions

* test cache directory with mkdir and chmod

* code cleanup github actions

* code cleanup github actions

* conda-forge channel added to pysam env

* conda-forge channel added to pysam env

* rule phantompeak added in a script instead of a shell command via Rscript

* testing on saccharomyces cerevisiae data set with deactivated preseq_lc_extrap rule

* r-base environment added to rule phantompeak_correlation

* changed genome data back to human, added rule for downloading single chromosomes from the reference genome (to generate smaller test data)

* rule preseq_lc_extrap activated again, changed genome data back to human, added rule for downloading single chromosomes from the reference genome (to generate smaller test data)

* adopt changes from bam_post_analysis branch, control grouping for samples and input rule to get sample-control combinations

* minimal cleanup

* adjustment of the plot_fingerprint: input function, matching of each treatment sample to its control sample for common output, integration of the JSD calculation, new wildcard for the control samples

* changes on wildcard handling for controls

* rule for macs2 callpeak added

* rule for bedtools intersect added, drafts for multiqc peaks count added

* broad and narrow option handling via config, additional rule for narrow peaks output, peaks count and frip score for multiqc, peaks for igv

* adaptation and integration of plot scripts for results from homer and macs2 analysis, script for plot_peaks_count and its integration in snakemake-report, integration of older plots in snakemake-report

* changes for linter

* changes for linter

* changes for linter

* changes for linter

* changes for linter

* changes on input functions and on params parsing for rules plot_macs_qc and plot_homrer_annotatepeaks, peaks wildcard added to all outputs

* test for the behavior of the linter

* test for the behavior of the linter

* changes for the linter

* test for the linter

* refactoring the config variable, restoring the input functions

* plot for FRiP score and some reports added, plot for annotatepeaks summary as draft added

* plot for homer annotatepeaks summary and report description, changes on frip score and peak count plots, changes according to PR #5

* some code cleanup

* changes for PR #5 added

* code cleanup
parent 268eb612
......@@ -18,6 +18,8 @@ resources:
params:
lc_extrap: True
# choose "narrow" or "broad" for macs2 callpeak analysis, for documentation and source code please see https://github.com/macs3-project/MACS
peak-analysis: "broad"
# TODO: move adapter parameters into a `adapter` column in units.tsv and check for its presence via the units.schema.yaml -- this enables unit-specific adapters, e.g. when integrating multiple datasets
# these cutadapt parameters need to contain the required flag(s) for
# the type of adapter(s) to trim, i.e.:
......
sample unit fragment_len_mean fragment_len_sd fq1 fq2 platform
A 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
B 1 ngs-test-data/reads/b.chr21.1.fq ngs-test-data/reads/b.chr21.2.fq ILLUMINA
B 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
B 2 300 14 ngs-test-data/reads/b.chr21.1.fq ILLUMINA
C 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
D 1 ngs-test-data/reads/b.chr21.1.fq ngs-test-data/reads/b.chr21.2.fq ILLUMINA
......@@ -18,6 +18,9 @@ resources:
params:
lc_extrap: True
# choose "narrow" or "broad" for macs2 callpeak analysis, for documentation and source code please see https://github.com/macs3-project/MACS
peak-analysis: "broad"
# TODO: move adapter parameters into a `adapter` column in units.tsv and check for its presence via the units.schema.yaml -- this enables unit-specific adapters, e.g. when integrating multiple datasets
# these cutadapt parameters need to contain the required flag(s) for
# the type of adapter(s) to trim, i.e.:
# * https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
......
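As a rough illustration of how the `peak-analysis` setting could be consumed, the sketch below validates the value and derives the matching MACS2 option. The helper name `macs2_peak_flag` is hypothetical and is not taken from the workflow; `--broad` is the MACS2 `callpeak` switch for broad peak calling, and `narrowPeak`/`broadPeak` are the suffixes of the peak files MACS2 writes.

```python
# Hypothetical helper for illustration; not part of the workflow in this commit.
def macs2_peak_flag(config):
    """Return (extra_flag, peak_suffix) for the configured peak-analysis mode."""
    mode = config["params"]["peak-analysis"]
    if mode == "broad":
        # With --broad, MACS2 callpeak writes <name>_peaks.broadPeak
        return "--broad", "broadPeak"
    if mode == "narrow":
        # Default MACS2 behaviour: <name>_peaks.narrowPeak, no extra flag
        return "", "narrowPeak"
    raise ValueError(f'peak-analysis must be "narrow" or "broad", got {mode!r}')


# Mirrors the config fragment shown above
config = {"params": {"lc_extrap": True, "peak-analysis": "broad"}}
flag, suffix = macs2_peak_flag(config)
print(flag, suffix)
```

Failing early on an unexpected value keeps a typo in the config from silently falling back to narrow peak calling.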
......@@ -10,6 +10,7 @@ include: "rules/filtering.smk"
include: "rules/stats.smk"
include: "rules/utils.smk"
include: "rules/post-analysis.smk"
include: "rules/peak_analysis.smk"
rule all:
input: all_input
channels:
- conda-forge
- bioconda
dependencies:
- r-base =3.5
- r-optparse =1.6
- r-ggplot2 =3.1
- r-reshape2 =1.4
channels:
- conda-forge
- bioconda
dependencies:
- r-tidyverse =1.3
- r-base =4.0
channels:
- bioconda
- conda-forge
dependencies:
- homer ==4.11
**HOMER** peak annotation summary plot is generated by calculating the proportion of {{snakemake.config["params"]["peak-analysis"]}} peaks assigned to genomic features by `HOMER annotatePeaks.pl <http://homer.ucsd.edu/homer/ngs/annotation.html>`_.
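The proportion calculation described in this caption can be sketched as follows. This is a minimal illustration, not the plot script added in this commit: the function name `feature_proportions` is hypothetical, and the example strings only assume that each annotation starts with a feature label, optionally followed by details in parentheses (the transcript IDs are made up).

```python
from collections import Counter


def feature_proportions(annotations):
    """Map each genomic feature label to its share of the annotated peaks."""
    # Keep only the leading feature label and drop any "(...)" detail
    features = [a.split("(")[0].strip() for a in annotations if a]
    counts = Counter(features)
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


# Made-up annotation strings, one per peak
annotations = [
    "intron (NM_000000, intron 1 of 5)",
    "Intergenic",
    "promoter-TSS (NM_000001)",
    "intron (NM_000002, intron 3 of 9)",
]
print(feature_proportions(annotations))
```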
**MACS2 FRiP score** is generated by calculating the fraction of all mapped reads that fall into the {{snakemake.config["params"]["peak-analysis"]}} peak regions called by `MACS2 <https://github.com/taoliu/MACS>`_.
A read must overlap a peak by at least 20% to count toward the `FRiP score <https://www.encodeproject.org/data-standards/terms/>`_.
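To make the 20% rule concrete, here is a small pure-Python sketch of a FRiP calculation. It is a stand-in for illustration only: the commit message mentions a bedtools intersect rule, and this code is not that implementation. The function names, the half-open `(start, end)` intervals on a single chromosome, and the choice to measure overlap as a fraction of the read length are all assumptions.

```python
# Illustrative only: intervals are half-open (start, end) on one chromosome.
def overlap_fraction(read, peak):
    """Fraction of the read's length that is covered by the peak."""
    start = max(read[0], peak[0])
    end = min(read[1], peak[1])
    return max(0, end - start) / (read[1] - read[0])


def frip_score(reads, peaks, min_overlap=0.2):
    """Fraction of reads that overlap at least one peak by min_overlap or more."""
    if not reads:
        return 0.0
    in_peaks = sum(
        1
        for read in reads
        if any(overlap_fraction(read, peak) >= min_overlap for peak in peaks)
    )
    return in_peaks / len(reads)


reads = [(100, 150), (195, 245), (400, 450), (995, 1045)]
peaks = [(120, 200), (1000, 1100)]
print(frip_score(reads, peaks))  # 0.5
```

In this toy data the second read overlaps a peak by only 10% of its length and the third read overlaps no peak, so two of the four reads count.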
**MACS2 peak count** is calculated from the total number of {{snakemake.config["params"]["peak-analysis"]}} peaks called by `MACS2 <https://github.com/taoliu/MACS>`_.