Unverified Commit bc1ebc47 authored by Antonie Vietor, committed by GitHub

Peak analysis v2 (#5)

* preseq and picard collectalignmentsummarymetrics added

* changed PICARD COLLECTALIGNMENTSUMMARYMETRICS to PICARD COLLECTMULTIPLEMETRICS and integrated the new wrapper as a temporary script

* integration of next steps with temporary use of future wrappers as scripts

* wrapper integration for collectmultiplemetrics and genomecov, rule to create an igv-file from bigWig files

* deeptools and phantompeakqualtools integration

* phantompeakqualtools, headers and multiqc integration

* draft for integration of additional plots from phantompeakqualtools data into multiqc

* Changes according to review #4

* Cross-correlation plots are grouped now. Changes in the description of the plots.

* change to newer wrapper versions in all rules and code cleanup

* Changes according to review #4, temporary matplotlib dependency added to multiqc, github actions added

* github actions

* test config and test data

* changes according to PR #4

* update config

* more logs added

* lint: Mixed rules and functions in same snakefile -> moved a part of the rule_all input to common.smk, input functions for rule_all added

* undo the changes of the last commit

* moved all functions from Snakefile to common.smk

* --cache flag added for github actions

* --cache flag added for github actions

* snakemake_output_cache location added

* test snakemake_output_cache location

* another test snakemake_output_cache location

* another test snakemake_output_cache location

* set cache in github actions

* fix: dependencies in pysam resulted in ContextualVersionConflict in multiqc

* test: set cache location in github actions

* removed config files in .test from gitignore

* pysam dependencies and changes for github actions

* directory for ngs-test-data added

* gitmodules

* config

* test submodules

* test submodules

* config added

* directory for snakemake output cache changed

* cache location removed

* creating directory for snakemake output cache in github actions

* test cache directory with mkdir and chmod

* code cleanup github actions

* code cleanup github actions

* conda-forge channel added to pysam env

* conda-forge channel added to pysam env

* rule phantompeak added in a script instead of a shell command via Rscript

* testing on Saccharomyces cerevisiae data set with deactivated preseq_lc_extrap rule

* r-base environment added to rule phantompeak_correlation

* changed genome data back to human, added rule for downloading single chromosomes from the reference genome (to generate smaller test data)

* rule preseq_lc_extrap activated again, changed genome data back to human, added rule for downloading single chromosomes from the reference genome (to generate smaller test data)

* adopt changes from bam_post_analysis branch, control grouping for samples and input rule to get sample-control combinations

* minimal cleanup

* adjustment of the plot_fingerprint: input function, matching of each treatment sample to its control sample for common output, integration of the JSD calculation, new wildcard for the control samples

* changes on wildcard handling for controls

* rule for macs2 callpeak added

* rule for bedtools intersect added, drafts for multiqc peaks count added

* broad and narrow option handling via config, additional rule for narrow peaks output, peaks count and frip score for multiqc, peaks for igv

* adaptation and integration of plot scripts for results from homer and macs2 analysis, script for plot_peaks_count and its integration in snakemake-report, integration of older plots in snakemake-report

* changes for linter

* changes for linter

* changes for linter

* changes for linter

* changes for linter

* changes on input functions and on params parsing for rules plot_macs_qc and plot_homer_annotatepeaks, peaks wildcard added to all outputs

* test for the behavior of the linter

* test for the behavior of the linter

* changes for the linter

* test for the linter

* refactoring the config variable, restoring the input functions

* plot for FRiP score and some reports added, plot for annotatepeaks summary as draft added

* plot for homer annotatepeaks summary and report description, changes on frip score and peak count plots, changes according to PR #5

* some code cleanup

* changes for PR #5 added

* code cleanup
parent 268eb612
......@@ -18,6 +18,8 @@ resources:
params:
lc_extrap: True
# choose "narrow" or "broad" for macs2 callpeak analysis, for documentation and source code please see https://github.com/macs3-project/MACS
peak-analysis: "broad"
# TODO: move adapter parameters into a `adapter` column in units.tsv and check for its presence via the units.schema.yaml -- this enables unit-specific adapters, e.g. when integrating multiple datasets
# these cutadapt parameters need to contain the required flag(s) for
# the type of adapter(s) to trim, i.e.:
......
sample unit fragment_len_mean fragment_len_sd fq1 fq2 platform
A 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
B 1 ngs-test-data/reads/b.chr21.1.fq ngs-test-data/reads/b.chr21.2.fq ILLUMINA
B 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
B 2 300 14 ngs-test-data/reads/b.chr21.1.fq ILLUMINA
C 1 ngs-test-data/reads/a.chr21.1.fq ngs-test-data/reads/a.chr21.2.fq ILLUMINA
D 1 ngs-test-data/reads/b.chr21.1.fq ngs-test-data/reads/b.chr21.2.fq ILLUMINA
......@@ -18,6 +18,9 @@ resources:
params:
lc_extrap: True
# choose "narrow" or "broad" for macs2 callpeak analysis, for documentation and source code please see https://github.com/macs3-project/MACS
peak-analysis: "broad"
# TODO: move adapter parameters into a `adapter` column in units.tsv and check for its presence via the units.schema.yaml -- this enables unit-specific adapters, e.g. when integrating multiple datasets
# these cutadapt parameters need to contain the required flag(s) for
# the type of adapter(s) to trim, i.e.:
# * https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
......
......@@ -10,6 +10,7 @@ include: "rules/filtering.smk"
include: "rules/stats.smk"
include: "rules/utils.smk"
include: "rules/post-analysis.smk"
include: "rules/peak_analysis.smk"
rule all:
input: all_input
channels:
- conda-forge
- bioconda
dependencies:
- r-base =3.5
- r-optparse =1.6
- r-ggplot2 =3.1
- r-reshape2 =1.4
channels:
- conda-forge
- bioconda
dependencies:
- r-tidyverse =1.3
- r-base =4.0
channels:
- bioconda
- conda-forge
dependencies:
- homer ==4.11
**HOMER** peak annotation summary plot is generated by calculating the proportion of {{snakemake.config["params"]["peak-analysis"]}} peaks assigned to genomic features by `HOMER annotatePeaks.pl <http://homer.ucsd.edu/homer/ngs/annotation.html>`_.
**MACS2 FRiP score** is generated by calculating the fraction of all mapped reads that fall into the `MACS2 <https://github.com/taoliu/MACS>`_ called {{snakemake.config["params"]["peak-analysis"]}} peak regions.
A read must overlap a peak by at least 20% to be counted towards the `FRiP score <https://www.encodeproject.org/data-standards/terms/>`_.
**MACS2 peak count** is calculated from the total number of {{snakemake.config["params"]["peak-analysis"]}} peaks called by `MACS2 <https://github.com/taoliu/MACS>`_.
This plot is colored by {{ snakemake.wildcards.condition }}.
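As a quick illustration of the FRiP calculation described above — a sketch with made-up numbers; in the workflow the in-peak counts come from `bedtools intersect -c -f 0.20` and the mapped-read total from samtools flagstat:

```python
# Hypothetical FRiP computation; the numbers below are illustrative only.

def frip_score(reads_in_peaks: int, mapped_reads: int) -> float:
    """Fraction of Reads in Peaks: reads that overlap a called peak
    region (by at least 20% of their length) divided by all mapped reads."""
    return reads_in_peaks / mapped_reads

# e.g. if 1.25M of 10M mapped reads fall inside peak regions:
print(frip_score(1_250_000, 10_000_000))  # 0.125
```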
Put a description of your workflow here. See [here](https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html) for details.
**ChIP-seq** peak-calling, QC and differential analysis pipeline (Snakemake port of the `nextflow pipeline <https://nf-co.re/chipseq>`_).
......@@ -51,6 +51,60 @@ def get_individual_fastq(wildcards):
elif wildcards.read == "2":
return units.loc[ (wildcards.sample, wildcards.unit), "fq2" ]
def get_fastqs(wildcards):
"""Get raw FASTQ files from unit sheet."""
if is_single_end(wildcards.sample, wildcards.unit):
return units.loc[ (wildcards.sample, wildcards.unit), "fq1" ]
else:
u = units.loc[ (wildcards.sample, wildcards.unit), ["fq1", "fq2"] ].dropna()
return [ f"{u.fq1}", f"{u.fq2}" ]
def is_control(sample):
control = samples.loc[sample]["control"]
return pd.isna(control)
def get_sample_control_peak_combinations_list():
sam_contr = []
for sample in samples.index:
if not is_control(sample):
sam_contr.extend(expand(["{sample}-{control}.{peak}"], sample = sample, control = samples.loc[sample]["control"], peak = config["params"]["peak-analysis"]))
return sam_contr
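To make the grouping concrete, here is a minimal sketch of what `is_control` and the combination list above produce, assuming a hypothetical samples sheet where A is the control for C and D (names invented for illustration; the real sheet is config/samples.tsv):

```python
# Hypothetical samples sheet illustrating the sample-control grouping.
import pandas as pd

samples = pd.DataFrame(
    {"control": [None, "A", "A"]},
    index=pd.Index(["A", "C", "D"], name="sample"),
)
peak = "broad"  # config["params"]["peak-analysis"]

# entries without a control value are themselves controls and are skipped
combos = [
    f"{s}-{samples.loc[s, 'control']}.{peak}"
    for s in samples.index
    if not pd.isna(samples.loc[s, "control"])
]
print(combos)  # ['C-A.broad', 'D-A.broad']
```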
def get_peaks_count_plot_input():
return expand(
"results/macs2_callpeak/peaks_count/{sam_contr_peak}.peaks_count.tsv",
sam_contr_peak = get_sample_control_peak_combinations_list()
)
def get_frip_score_input():
return expand(
"results/intersect/{sam_contr_peak}.peaks_frip.tsv",
sam_contr_peak = get_sample_control_peak_combinations_list()
)
def get_plot_macs_qc_input():
return expand(
"results/macs2_callpeak/{sam_contr_peak}_peaks.{peak}Peak",
sam_contr_peak = get_sample_control_peak_combinations_list(), peak = config["params"]["peak-analysis"]
)
def get_plot_homer_annotatepeaks_input():
return expand("results/homer/annotate_peaks/{sam_contr_peak}_peaks.annotatePeaks.txt",
sam_contr_peak = get_sample_control_peak_combinations_list()
)
def get_map_reads_input(wildcards):
if is_single_end(wildcards.sample, wildcards.unit):
return "results/trimmed/{sample}-{unit}.fastq.gz"
return ["results/trimmed/{sample}-{unit}.1.fastq.gz", "results/trimmed/{sample}-{unit}.2.fastq.gz"]
def get_read_group(wildcards):
"""Denote sample name and platform in read group."""
return r"-R '@RG\tID:{sample}-{unit}\tSM:{sample}-{unit}\tPL:{platform}'".format(
sample=wildcards.sample,
unit=wildcards.unit,
platform=units.loc[(wildcards.sample, wildcards.unit), "platform"])
def get_multiqc_input(wildcards):
multiqc_input = []
for (sample, unit) in units.index:
......@@ -109,36 +163,31 @@ def get_multiqc_input(wildcards):
sample = sample
)
)
if not is_control(sample):
multiqc_input.extend(
expand (
[
"results/deeptools/{sample}-{control}.fingerprint_qcmetrics.txt",
"results/deeptools/{sample}-{control}.fingerprint_counts.txt",
"results/macs2_callpeak/{sample}-{control}.{peak}_peaks.xls"
],
sample = sample,
control = samples.loc[sample]["control"],
peak = config["params"]["peak-analysis"]
)
)
if config["params"]["lc_extrap"]:
multiqc_input.extend( expand(["results/preseq/{sample}.lc_extrap"], sample = sample))
return multiqc_input
def get_fastqs(wildcards):
"""Get raw FASTQ files from unit sheet."""
if is_single_end(wildcards.sample, wildcards.unit):
return units.loc[ (wildcards.sample, wildcards.unit), "fq1" ]
else:
u = units.loc[ (wildcards.sample, wildcards.unit), ["fq1", "fq2"] ].dropna()
return [ f"{u.fq1}", f"{u.fq2}" ]
def get_map_reads_input(wildcards):
if is_single_end(wildcards.sample, wildcards.unit):
return "results/trimmed/{sample}-{unit}.fastq.gz"
return ["results/trimmed/{sample}-{unit}.1.fastq.gz", "results/trimmed/{sample}-{unit}.2.fastq.gz"]
def get_read_group(wildcards):
"""Denote sample name and platform in read group."""
return r"-R '@RG\tID:{sample}-{unit}\tSM:{sample}-{unit}\tPL:{platform}'".format(
sample=wildcards.sample,
unit=wildcards.unit,
platform=units.loc[(wildcards.sample, wildcards.unit), "platform"])
def all_input(wildcards):
wanted_input = []
# QC with fastQC and multiQC
wanted_input.extend(["results/qc/multiqc/multiqc.html"])
wanted_input.extend([
"results/qc/multiqc/multiqc.html"
])
# trimming reads
for (sample, unit) in units.index:
......@@ -170,7 +219,7 @@ def all_input(wildcards):
wanted_input.extend(
expand (
[
"results/IGV/merged_library.bigWig.igv.txt",
"results/IGV/big_wig/merged_library.bigWig.igv.txt",
"results/deeptools/plot_profile.pdf",
"results/deeptools/heatmap.pdf",
"results/deeptools/heatmap_matrix.tab",
......@@ -179,5 +228,47 @@ def all_input(wildcards):
sample = sample
)
)
if not is_control(sample):
wanted_input.extend(
expand(
[
"results/deeptools/{sample}-{control}.plot_fingerprint.pdf",
"results/macs2_callpeak/{sample}-{control}.{peak}_treat_pileup.bdg",
"results/macs2_callpeak/{sample}-{control}.{peak}_control_lambda.bdg",
"results/macs2_callpeak/{sample}-{control}.{peak}_peaks.{peak}Peak",
"results/IGV/macs2_callpeak/{peak}/merged_library.{sample}-{control}.{peak}_peaks.igv.txt",
"results/macs2_callpeak/plots/plot_{peak}_peaks_count.pdf",
"results/macs2_callpeak/plots/plot_{peak}_peaks_frip_score.pdf",
"results/macs2_callpeak/plots/plot_{peak}_peaks_macs2.pdf",
"results/homer/plots/plot_{peak}_annotatepeaks.pdf",
"results/homer/plots/plot_{peak}_annotatepeaks_summary.pdf"
],
sample = sample,
control = samples.loc[sample]["control"],
peak = config["params"]["peak-analysis"]
)
)
if config["params"]["peak-analysis"] == "broad":
wanted_input.extend(
expand(
[
"results/macs2_callpeak/{sample}-{control}.{peak}_peaks.gappedPeak"
],
sample = sample,
control = samples.loc[sample]["control"],
peak = config["params"]["peak-analysis"]
)
)
if config["params"]["peak-analysis"] == "narrow":
wanted_input.extend(
expand(
[
"results/macs2_callpeak/{sample}-{control}.{peak}_summits.bed"
],
sample = sample,
control = samples.loc[sample]["control"],
peak = config["params"]["peak-analysis"]
)
)
return wanted_input
rule plot_fingerprint:
input:
bam_files=["results/orphan_rm_sorted/{sample}.bam", "results/orphan_rm_sorted/{control}.bam"],
bam_idx=["results/orphan_rm_sorted/{sample}.bam.bai", "results/orphan_rm_sorted/{control}.bam.bai"],
jsd_sample="results/orphan_rm_sorted/{control}.bam"
output: #ToDo: add description to report caption
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotfingerprint.html.
fingerprint=report("results/deeptools/{sample}-{control}.plot_fingerprint.pdf", caption="../report/plot_fingerprint_deeptools.rst", category="QC"),
counts="results/deeptools/{sample}-{control}.fingerprint_counts.txt",
qc_metrics="results/deeptools/{sample}-{control}.fingerprint_qcmetrics.txt"
log:
"logs/deeptools/plot_fingerprint.{sample}-{control}.log"
params:
"--labels {sample} {control}",
"--skipZeros ",
"--numberOfSamples 500000 ", # ToDo: to config?
threads:
8
wrapper:
"0.66.0/bio/deeptools/plotfingerprint"
rule macs2_callpeak_broad:
input:
treatment="results/orphan_rm_sorted/{sample}.bam",
control="results/orphan_rm_sorted/{control}.bam"
output:
# all output files must share the same basename and only differ by their extension
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
multiext("results/macs2_callpeak/{sample}-{control}.broad",
"_peaks.xls",
# these output extensions internally set the --bdg or -B option:
"_treat_pileup.bdg",
"_control_lambda.bdg",
# these output extensions internally set the --broad option:
"_peaks.broadPeak",
"_peaks.gappedPeak"
)
log:
"logs/macs2/callpeak.{sample}-{control}.broad.log"
params: # ToDo: move to config?
"--broad --broad-cutoff 0.1 -f BAMPE -g hs --SPMR --qvalue 0.05 --keep-dup all"
# ToDo: Update wrapper to check for " --broad$" or " --broad " instead of only "--broad" (line 47),
# then "--broad" in params can be removed here in params
wrapper:
"0.66.0/bio/macs2/callpeak"
rule macs2_callpeak_narrow:
input:
treatment="results/orphan_rm_sorted/{sample}.bam",
control="results/orphan_rm_sorted/{control}.bam"
output:
# all output files must share the same basename and only differ by their extension
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
multiext("results/macs2_callpeak/{sample}-{control}.narrow",
"_peaks.xls",
# these output extensions internally set the --bdg or -B option:
"_treat_pileup.bdg",
"_control_lambda.bdg",
# narrow peak calling is the default, so these extensions need no extra option:
"_peaks.narrowPeak",
"_summits.bed"
)
log:
"logs/macs2/callpeak.{sample}-{control}.narrow.log"
params: # ToDo: move to config?
"-f BAMPE -g hs --SPMR --qvalue 0.05 --keep-dup all"
wrapper:
"0.66.0/bio/macs2/callpeak"
rule peaks_count:
input:
peaks="results/macs2_callpeak/{sample}-{control}.{peak}_peaks.{peak}Peak"
output:
"results/macs2_callpeak/peaks_count/{sample}-{control}.{peak}.peaks_count.tsv"
log:
"logs/macs2_callpeak/peaks_count/{sample}-{control}.{peak}.peaks_count.log"
conda:
"../envs/gawk.yaml"
shell:
"cat {input.peaks} | "
" wc -l | "
" gawk -v OFS='\t' '{{print \"{wildcards.sample}-{wildcards.control}_{wildcards.peak}_peaks\", $1}}' "
" > {output} 2> {log}"
rule sm_report_peaks_count_plot:
input:
get_peaks_count_plot_input()
output:
report("results/macs2_callpeak/plots/plot_{peak}_peaks_count.pdf", caption="../report/plot_peaks_count_macs2.rst", category="CallPeaks")
log:
"logs/macs2_callpeak/plot_{peak}_peaks_count.log"
conda:
"../envs/r_plots.yaml"
script:
"../scripts/plot_peaks_count_macs2.R"
rule bedtools_intersect:
input:
left="results/orphan_rm_sorted/{sample}.bam",
right="results/macs2_callpeak/{sample}-{control}.{peak}_peaks.{peak}Peak"
output:
pipe("results/intersect/{sample}-{control}.{peak}.intersected.bed")
params:
extra="-bed -c -f 0.20"
log:
"logs/intersect/{sample}-{control}.{peak}.intersected.log"
wrapper:
"0.66.0/bio/bedtools/intersect"
rule frip_score:
input:
intersect="results/intersect/{sample}-{control}.{peak}.intersected.bed",
flagstats="results/orphan_rm_sorted/{sample}.orphan_rm_sorted.flagstat"
output:
"results/intersect/{sample}-{control}.{peak}.peaks_frip.tsv"
log:
"logs/intersect/{sample}-{control}.{peak}.peaks_frip.log"
conda:
"../envs/gawk.yaml"
shell:
"grep 'mapped (' {input.flagstats} | "
" gawk -v a=$(gawk -F '\t' '{{sum += $NF}} END {{print sum}}' < {input.intersect}) "
" -v OFS='\t' "
" '{{print \"{wildcards.sample}-{wildcards.control}_{wildcards.peak}_peaks\", a/$1}}' "
" > {output} 2> {log}"
rule sm_rep_frip_score:
input:
get_frip_score_input()
output:
report("results/macs2_callpeak/plots/plot_{peak}_peaks_frip_score.pdf", caption="../report/plot_frip_score_macs2_bedtools.rst", category="CallPeaks")
log:
"logs/intersect/plot_{peak}_peaks_frip_score.log"
conda:
"../envs/r_plots.yaml"
script:
"../scripts/plot_frip_score.R"
rule create_igv_peaks:
input:
"results/macs2_callpeak/{sample}-{control}.{peak}_peaks.{peak}Peak"
output:
"results/IGV/macs2_callpeak/{peak}/merged_library.{sample}-{control}.{peak}_peaks.igv.txt"
log:
"logs/igv/create_igv_peaks/merged_library.{sample}-{control}.{peak}_peaks.log"
shell:
" find {input} -type f -name '*_peaks.{wildcards.peak}Peak' -exec echo -e 'results/IGV/macs2_callpeak/{wildcards.peak}/\"{{}}\"\t0,0,178' \; > {output} 2> {log}"
rule homer_annotatepeaks:
input:
peaks="results/macs2_callpeak/{sample}-{control}.{peak}_peaks.{peak}Peak",
genome="resources/ref/genome.fasta",
gtf="resources/ref/annotation.gtf",
output:
annotations="results/homer/annotate_peaks/{sample}-{control}.{peak}_peaks.annotatePeaks.txt"
threads:
2
params:
mode="",
extra="-gid"
log:
"logs/homer/annotate_peaks/{sample}-{control}.{peak}.log"
conda:
"../envs/temp_annotatepeaks.yaml"
script:
"../scripts/temp_annotatepeaks_wrapper.py"
# #TODO: add wrapper and remove script, env and conda statement in this rule
# wrapper:
# "xxx/bio/homer/annotatePeaks"
rule plot_macs_qc:
input:
get_plot_macs_qc_input()
output: #ToDo: add description to report caption
summary="results/macs2_callpeak/plots/plot_{peak}_peaks_macs2_summary.txt",
plot=report("results/macs2_callpeak/plots/plot_{peak}_peaks_macs2.pdf", caption="../report/plot_macs2_qc.rst", category="CallPeaks")
params:
input = lambda wc, input: ','.join(input),
sample_control_combinations = ','.join(get_sample_control_peak_combinations_list())
log:
"logs/macs2_callpeak/plot_{peak}_peaks_macs2.log"
conda:
"../envs/plot_macs_annot.yaml"
shell:
"Rscript ../workflow/scripts/plot_macs_qc.R -i {params.input} -s {params.sample_control_combinations} -o {output.plot} -p {output.summary} 2> {log}"
rule plot_homer_annotatepeaks:
input:
get_plot_homer_annotatepeaks_input()
output: #ToDo: add description to report caption
summary="results/homer/plots/plot_{peak}_annotatepeaks_summary.txt",
plot=report("results/homer/plots/plot_{peak}_annotatepeaks.pdf", caption="../report/plot_annotatepeaks_homer.rst", category="CallPeaks")
params:
input = lambda wc, input: ','.join(input),
sample_control_combinations = ','.join(get_sample_control_peak_combinations_list())
log:
"logs/homer/plot_{peak}_annotatepeaks.log"
conda:
"../envs/plot_macs_annot.yaml"
shell:
"Rscript ../workflow/scripts/plot_homer_annotatepeaks.R -i {params.input} -s {params.sample_control_combinations} -o {output.plot} -p {output.summary} 2> {log}"
rule plot_sum_annotatepeaks:
input:
"results/homer/plots/plot_{peak}_annotatepeaks_summary.txt"
output:
report("results/homer/plots/plot_{peak}_annotatepeaks_summary.pdf", caption="../report/plot_annotatepeaks_summary_homer.rst", category="CallPeaks")
log:
"logs/homer/plot_{peak}_annotatepeaks_summary.log"
conda:
"../envs/r_plots.yaml"
script:
"../scripts/plot_annotatepeaks_summary_homer.R"
......@@ -14,7 +14,7 @@ rule collect_multiple_metrics:
input:
bam="results/orphan_rm_sorted/{sample}.bam",
ref="resources/ref/genome.fasta"
output:
output: #ToDo: add descriptions to report captions
# The output file extensions select which metrics tools are run,
# so it is not necessary to specify them under params with the "PROGRAM" option.
# Usable extensions (and which tools they implicitly call) are listed here:
......@@ -22,14 +22,14 @@ rule collect_multiple_metrics:
multiext("{path}{sample}",
".alignment_summary_metrics",
".base_distribution_by_cycle_metrics",
".base_distribution_by_cycle.pdf",
".insert_size_metrics",
".insert_size_histogram.pdf",
".quality_by_cycle_metrics",
".quality_by_cycle.pdf",
".quality_distribution_metrics",
".quality_distribution.pdf"
)
),
report("{path}{sample}.base_distribution_by_cycle.pdf", caption="../report/plot_base_distribution_by_cycle_picard_mm.rst", category="MultipleMetrics"),
report("{path}{sample}.insert_size_histogram.pdf", caption="../report/plot_insert_size_histogram_picard_mm.rst", category="MultipleMetrics"),
report("{path}{sample}.quality_by_cycle.pdf", caption="../report/plot_quality_by_cycle_picard_mm.rst", category="MultipleMetrics"),
report("{path}{sample}.quality_distribution.pdf", caption="../report/plot_quality_distribution_picard_mm.rst", category="MultipleMetrics")
resources:
# This parameter (default 3 GB) can be used to limit the total resources a pipeline is allowed to use, see:
# https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
......@@ -78,16 +78,16 @@ rule bedGraphToBigWig:
wrapper:
"0.64.0/bio/ucsc/bedGraphToBigWig"
rule create_igv:
rule create_igv_bigwig:
input:
"resources/ref/genome.bed",
expand("results/big_wig/{sample}.bigWig", sample=samples.index)
output:
"results/IGV/merged_library.bigWig.igv.txt"
"results/IGV/big_wig/merged_library.bigWig.igv.txt"
log:
"logs/igv/create_igv.log"
"logs/igv/create_igv_bigwig.log"
shell:
"find {input} -type f -name '*.bigWig' -exec echo -e 'results/big_wig/\"{{}}\"\t0,0,178' \; > {output} 2> {log}"
"find {input} -type f -name '*.bigWig' -exec echo -e 'results/IGV/big_wig/\"{{}}\"\t0,0,178' \; > {output} 2> {log}"
rule compute_matrix:
input:
......@@ -115,10 +115,10 @@ rule compute_matrix:
rule plot_profile:
input:
"results/deeptools/matrix_files/matrix.gz"
output:
output: #ToDo: add description to report caption
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotprofile.html.
plot_img="results/deeptools/plot_profile.pdf",
plot_img=report("results/deeptools/plot_profile.pdf", caption="../report/plot_profile_deeptools.rst", category="GenomicRegions"),
data="results/deeptools/plot_profile_data.tab"
log:
"logs/deeptools/plot_profile.log"
......@@ -130,10 +130,10 @@ rule plot_profile:
rule plot_heatmap:
input:
"results/deeptools/matrix_files/matrix.gz"
output:
output: #ToDo: add description to report caption
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotheatmap.html.
heatmap_img="results/deeptools/heatmap.pdf",
heatmap_img=report("results/deeptools/heatmap.pdf", caption="../report/plot_heatmap_deeptools.rst", category="Heatmaps"),
heatmap_matrix="results/deeptools/heatmap_matrix.tab"
log:
"logs/deeptools/heatmap.log"
......@@ -145,10 +145,10 @@ rule plot_heatmap:
rule phantompeakqualtools:
input:
"results/orphan_rm_sorted/{sample}.bam"
output:
output: #ToDo: add description to report caption
res_phantom="results/phantompeakqualtools/{sample}.phantompeak.spp.out",
r_data="results/phantompeakqualtools/{sample}.phantompeak.Rdata",
plot="results/phantompeakqualtools/{sample}.phantompeak.pdf"
plot=report("results/phantompeakqualtools/{sample}.phantompeak.pdf", caption="../report/plot_phantompeak_phantompeakqualtools.rst", category="Phantompeak")
threads:
8
log:
......
......@@ -10,6 +10,12 @@ properties:
condition:
type: string
description: sample condition that will be compared during differential analysis (e.g. a treatment, a tissue time, a disease)
control:
type: string
description: control associated with the sample; entries without a control are themselves controls
antibody:
type: string
description: name of the antibody used
# columns that the config/samples.tsv file must have to pass schema validation
required:
......
log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")
library("tidyverse")
homer_data <- read_tsv(snakemake@input[[1]])
homer_data <- homer_data %>% gather(`exon`, `Intergenic`, `intron`, `promoter-TSS`, `TTS`, key="sequence_element", value="counts")
peaks_sum <- ggplot(homer_data, aes(x = counts, y = sample, fill = sequence_element)) +
geom_bar(position="fill", stat="identity") +
theme_minimal() +
labs(x="", y="Peak count") +
theme(legend.position = "right") +
guides(fill=guide_legend("sequence element")) +
ggtitle("Peak to feature proportion")
ggsave(snakemake@output[[1]], peaks_sum)
log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")
library("tidyverse")
data <- lapply(snakemake@input, read.table, header=F, stringsAsFactors=F)