Commit 6bc32fb7 authored by Aaron's avatar Aaron

[DOC] update docs for merge and cluster

parent c6b49209
docs/content/images/tool-glyphs/cluster-glyph.png

35.5 KiB

docs/content/images/tool-glyphs/merge-glyph.png

36.2 KiB

###############
*cluster*
###############

|

.. image:: ../images/tool-glyphs/cluster-glyph.png
    :width: 600pt

|

Similar to :doc:`../tools/merge`, ``cluster`` reports each set of overlapping or
"book-ended" features in an interval file. In contrast to ``merge``,
``cluster`` does not flatten the cluster of intervals into a new meta-interval;
instead, it assigns a unique cluster ID to each record in each cluster. This
is useful when you need fine control over how sets of overlapping intervals in
a single interval file are combined.
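
As a rough illustration of that "fine control", the sketch below groups ``cluster`` output by its trailing cluster ID and keeps only the longest record in each cluster. It assumes tab-delimited output whose last column is the cluster ID; the function name ``longest_per_cluster`` and the "keep the longest" policy are hypothetical choices for illustration, not part of bedtools.

```python
from collections import defaultdict

# Example `bedtools cluster` output (tab-delimited; last column = cluster ID).
cluster_output = (
    "chr1\t100\t200\t1\n"
    "chr1\t180\t250\t1\n"
    "chr1\t250\t500\t1\n"
    "chr1\t501\t1000\t2\n"
)


def longest_per_cluster(text):
    """Return {cluster_id: (chrom, start, end)} for the longest record per cluster."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        cluster_id = fields[-1]  # assumption: cluster ID is the final column
        groups[cluster_id].append((chrom, start, end))
    # Pick the record with the largest (end - start) in each cluster.
    return {cid: max(recs, key=lambda r: r[2] - r[1]) for cid, recs in groups.items()}


print(longest_per_cluster(cluster_output))
# {'1': ('chr1', 250, 500), '2': ('chr1', 501, 1000)}
```

Any other per-cluster policy (for example, keeping the highest-scoring record) could be swapped in by changing the ``key`` function.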

.. note::

    ``bedtools cluster`` requires that you presort your data by chromosome and
    then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
    for BED files).

.. seealso::

    :doc:`../tools/merge`

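
If a shell ``sort`` is not convenient, the same ordering can be approximated in Python. This is a minimal sketch, assuming tab-delimited BED lines with no header or track lines; ``sort_bed_lines`` is a hypothetical helper, and chromosome names are compared as plain strings, which may differ from a locale-aware ``sort``.

```python
def sort_bed_lines(lines):
    """Sort BED lines by chromosome (string order), then by numeric start."""
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # Column 1 is the chromosome, column 2 the start coordinate.
    records.sort(key=lambda fields: (fields[0], int(fields[1])))
    return ["\t".join(fields) for fields in records]


unsorted = ["chr2\t5\t10\n", "chr1\t180\t250\n", "chr1\t100\t200\n"]
for row in sort_bed_lines(unsorted):
    print(row)
```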
==========================================================================
Usage and option summary
==========================================================================

**Usage**:

::

    bedtools cluster [OPTIONS] -i <BED/GFF/VCF>

**(or)**:

::

    clusterBed [OPTIONS] -i <BED/GFF/VCF>

=========================== ===============================================================================================================================================================================================================
Option                      Description
=========================== ===============================================================================================================================================================================================================
**-s**                      Force strandedness. That is, only cluster features that are on the same strand. *By default, this is disabled*.
**-d**                      Maximum distance between features allowed for features to be clustered. *Default is 0. That is, overlapping and/or book-ended features are clustered*.
=========================== ===============================================================================================================================================================================================================

==========================================================================
Default behavior
==========================================================================
By default, ``bedtools cluster`` collects overlapping (by at least 1 bp) and/or
bookended intervals into distinct clusters. In the example below, the 4th
column is the cluster ID.

.. code-block:: bash

    $ cat A.bed
    chr1  100  200
    chr1  180  250
    chr1  250  500
    chr1  501  1000

    $ bedtools cluster -i A.bed
    chr1  100  200   1
    chr1  180  250   1
    chr1  250  500   1
    chr1  501  1000  2

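
The default rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the bedtools implementation: the function name ``cluster`` is hypothetical, and the input is assumed to be sorted by chromosome and then by start.

```python
def cluster(intervals):
    """Append a cluster ID to each (chrom, start, end) tuple.

    Assumes `intervals` is sorted by chromosome, then by start.
    """
    out = []
    cluster_id = 0
    cur_chrom, cur_end = None, None
    for chrom, start, end in intervals:
        # A new cluster begins on a new chromosome, or when this interval
        # starts past the furthest end seen so far (start == end is book-ended).
        if chrom != cur_chrom or start > cur_end:
            cluster_id += 1
            cur_chrom, cur_end = chrom, end
        else:
            cur_end = max(cur_end, end)
        out.append((chrom, start, end, cluster_id))
    return out


a_bed = [("chr1", 100, 200), ("chr1", 180, 250), ("chr1", 250, 500), ("chr1", 501, 1000)]
for row in cluster(a_bed):
    print(*row, sep="\t")
```

Run on the ``A.bed`` records above, the sketch assigns IDs 1, 1, 1, 2: the third interval is book-ended with the second (start 250 equals end 250), while the fourth starts 1 bp past the end of the cluster.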
==========================================================================
``-s`` Enforcing "strandedness"
==========================================================================
The ``-s`` option will only cluster intervals that are overlapping/bookended
*and* are on the same strand.

.. code-block:: bash

    $ cat A.bed
    chr1  100  200   a1  1  +
    chr1  180  250   a2  2  +
    chr1  250  500   a3  3  -
    chr1  501  1000  a4  4  +

    $ bedtools cluster -i A.bed -s
    chr1  100  200   a1  1  +  1
    chr1  180  250   a2  2  +  1
    chr1  501  1000  a4  4  +  2
    chr1  250  500   a3  3  -  3

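
A strand-aware variant of the same idea can be sketched as follows. The function name ``cluster_by_strand`` is hypothetical, and processing all ``+`` records before all ``-`` records is an assumption inferred from the example output above, not a documented guarantee; behavior across multiple chromosomes may differ in bedtools itself.

```python
def cluster_by_strand(records):
    """Append a cluster ID to each 6-column BED tuple, clustering per strand.

    Assumes `records` is sorted by chromosome, then by start, and that each
    record is (chrom, start, end, name, score, strand).
    """
    out = []
    cluster_id = 0
    for strand in ("+", "-"):  # assumption: "+" clusters are numbered first
        cur_chrom, cur_end = None, None
        for rec in records:
            chrom, start, end = rec[0], rec[1], rec[2]
            if rec[5] != strand:
                continue  # only cluster records on the current strand
            if chrom != cur_chrom or start > cur_end:
                cluster_id += 1
                cur_chrom, cur_end = chrom, end
            else:
                cur_end = max(cur_end, end)
            out.append(rec + (cluster_id,))
    return out


a_bed = [
    ("chr1", 100, 200, "a1", 1, "+"),
    ("chr1", 180, 250, "a2", 2, "+"),
    ("chr1", 250, 500, "a3", 3, "-"),
    ("chr1", 501, 1000, "a4", 4, "+"),
]
for row in cluster_by_strand(a_bed):
    print(*row, sep="\t")
```

Here ``a3`` overlaps nothing on its own strand, so it receives its own cluster even though it is book-ended with ``a2``.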
==========================================================================
``-d`` Controlling how close two features must be in order to cluster
==========================================================================
By default, only overlapping or book-ended features are assigned to the same
cluster. However, one can force ``cluster`` to group more distant features
with the ``-d`` option. For example, were one to set ``-d`` to 1000, any
features that overlap or are within 1000 base pairs of one another will be
clustered.

.. code-block:: bash

    $ cat A.bed
    chr1  100  200
    chr1  501  1000

    $ bedtools cluster -i A.bed
    chr1  100  200   1
    chr1  501  1000  2

    $ bedtools cluster -i A.bed -d 1000
    chr1  100  200   1
    chr1  501  1000  1

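
To judge what value of ``-d`` might be useful, it can help to look at the gaps between neighboring features. The sketch below is a hypothetical helper (``gaps_between`` is not part of bedtools) that reports, for sorted input, the distance from each feature to the furthest end seen before it on the same chromosome.

```python
def gaps_between(intervals):
    """Return the gap (in bp) before each interval that has a predecessor
    on the same chromosome. Overlapping or book-ended intervals give 0.

    Assumes `intervals` is sorted by chromosome, then by start.
    """
    gaps = []
    cur_chrom, cur_end = None, None
    for chrom, start, end in intervals:
        if chrom == cur_chrom:
            gaps.append(max(0, start - cur_end))
            cur_end = max(cur_end, end)
        else:
            # First interval on this chromosome: nothing to measure against.
            cur_chrom, cur_end = chrom, end
    return gaps


a_bed = [("chr1", 100, 200), ("chr1", 501, 1000)]
print(gaps_between(a_bed))  # [301]
```

For the ``A.bed`` example above the single gap is 301 bp, which is larger than the default of 0 (two clusters) and smaller than 1000 (one cluster with ``-d 1000``).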
@@ -34,7 +34,7 @@ Usage and option summary
 **(or)**:
 ::
-mergeBed [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
+mergeBed [OPTIONS] -i <BED/GFF/VCF>
@@ -89,7 +89,7 @@ The ``-s`` option will only merge intervals that are overlapping/bookended
 chr1 250 500 a3 3 -
 chr1 501 1000 a4 4 +
-$ bedtools merge -i A.bed
+$ bedtools merge -i A.bed -s
 chr1 100 250 +
 chr1 501 1000 +
 chr1 250 500 -