diff --git a/docs/content/images/tool-glyphs/cluster-glyph.png b/docs/content/images/tool-glyphs/cluster-glyph.png
new file mode 100644
index 0000000000000000000000000000000000000000..4cea87073a492946f41f99d7ab8d101cfdacaaaa
Binary files /dev/null and b/docs/content/images/tool-glyphs/cluster-glyph.png differ
diff --git a/docs/content/images/tool-glyphs/merge-glyph.png b/docs/content/images/tool-glyphs/merge-glyph.png
index d22cea81fec728d14d13910468264434649eda51..2a1c8ac2b058353521da7d9d9ac05304e6072659 100644
Binary files a/docs/content/images/tool-glyphs/merge-glyph.png and b/docs/content/images/tool-glyphs/merge-glyph.png differ
diff --git a/docs/content/tools/cluster.rst b/docs/content/tools/cluster.rst
index 3cf15f7b3382f640deb2b5a2407c0bf58af6fec1..0ab1fd7e0841660b31ad1af94ece0f84b20faec9 100644
--- a/docs/content/tools/cluster.rst
+++ b/docs/content/tools/cluster.rst
@@ -1,3 +1,121 @@
 ###############
 *cluster*
-###############
\ No newline at end of file
+###############
+
+|
+
+.. image:: ../images/tool-glyphs/cluster-glyph.png
+    :width: 600pt
+|
+
+
+Similar to :doc:`../tools/merge`, ``cluster`` reports each set of overlapping or
+"book-ended" features in an interval file. In contrast to ``merge``,
+``cluster`` does not flatten the cluster of intervals into a new meta-interval;
+instead, it assigns a unique cluster ID to each record in each cluster. This
+is useful for having fine control over how sets of overlapping intervals in
+a single interval file are combined.
+
+.. note::
+
+    ``bedtools cluster`` requires that you presort your data by chromosome and
+    then by start position (e.g., ``sort -k1,1 -k2,2n in.bed > in.sorted.bed``
+    for BED files).
+
+.. seealso::
+
+    :doc:`../tools/merge`
+
+
+==========================================================================
+Usage and option summary
+==========================================================================
+**Usage**:
+::
+
+  bedtools cluster [OPTIONS] -i <BED/GFF/VCF>
+
+**(or)**:
+::
+
+  clusterBed [OPTIONS] -i <BED/GFF/VCF>
+
+
+
+=========================== ===============================================================================================================================================================================================================
+Option                      Description
+=========================== ===============================================================================================================================================================================================================
+**-s**                      Force strandedness. That is, only cluster features that are on the same strand. *By default, this is disabled*.
+**-d**                      Maximum distance between features allowed for features to be clustered. *Default is 0. That is, overlapping and/or book-ended features are clustered*.
+=========================== ===============================================================================================================================================================================================================
+
+
+
+
+
+==========================================================================
+Default behavior
+==========================================================================
+By default, ``bedtools cluster`` collects overlapping (by at least 1 bp) and/or
+bookended intervals into distinct clusters. In the example below, the 4th
+column is the cluster ID.
+
+.. code-block:: bash
+
+  $ cat A.bed
+  chr1  100  200
+  chr1  180  250
+  chr1  250  500
+  chr1  501  1000
+
+  $ bedtools cluster -i A.bed
+  chr1  100  200  1
+  chr1  180  250  1
+  chr1  250  500  1
+  chr1  501  1000  2
+
+
+==========================================================================
+``-s`` Enforcing "strandedness"
+==========================================================================
+The ``-s`` option will only cluster intervals that are overlapping/bookended
+*and* are on the same strand.
+
+.. code-block:: bash
+
+  $ cat A.bed
+  chr1  100  200  a1  1  +
+  chr1  180  250  a2  2  +
+  chr1  250  500  a3  3  -
+  chr1  501  1000  a4  4  +
+
+  $ bedtools cluster -i A.bed -s
+  chr1  100  200  a1  1  +  1
+  chr1  180  250  a2  2  +  1
+  chr1  501  1000  a4  4  +  2
+  chr1  250  500  a3  3  -  3
+
+
+==========================================================================
+``-d`` Controlling how close two features must be in order to cluster
+==========================================================================
+By default, only overlapping or book-ended features are placed in the same
+cluster. However, one can force ``cluster`` to group more distant features
+with the ``-d`` option. For example, were one to set ``-d`` to 1000, any
+features that overlap or are within 1000 base pairs of one another will be
+clustered.
+
+.. code-block:: bash
+
+  $ cat A.bed
+  chr1  100  200
+  chr1  501  1000
+
+  $ bedtools cluster -i A.bed
+  chr1  100  200  1
+  chr1  501  1000  2
+
+  $ bedtools cluster -i A.bed -d 1000
+  chr1  100  200  1
+  chr1  501  1000  1
+
diff --git a/docs/content/tools/merge.rst b/docs/content/tools/merge.rst
index 2c4b5da2cecb0ed65027ef69b93e557ed623b021..b8bfed7323cbdc0c43d103edb9f7e293aa045321 100755
--- a/docs/content/tools/merge.rst
+++ b/docs/content/tools/merge.rst
@@ -34,7 +34,7 @@ Usage and option summary
 **(or)**:
 ::
 
-  mergeBed [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
+  mergeBed [OPTIONS] -i <BED/GFF/VCF>
 
 
 
@@ -89,7 +89,7 @@ The ``-s`` option will only merge intervals that are overlapping/bookended
   chr1  250  500  a3  3  -
   chr1  501  1000 a4  4  +
 
-  $ bedtools merge -i A.bed
+  $ bedtools merge -i A.bed -s
   chr1  100  250  +
   chr1  501  1000 +
   chr1  250  500  -
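
Reviewer note: the clustering rule documented in ``cluster.rst`` (default distance 0, widened by ``-d``) can be illustrated with a minimal Python sketch. This is not the bedtools implementation. The function name ``assign_cluster_ids`` and the ``max_distance`` parameter are hypothetical, and the rule "start a new cluster when the gap to the current cluster's end exceeds the distance" is an assumption chosen because it reproduces the documented examples. Strand handling (``-s``) is deliberately left out.

```python
from typing import List, Optional, Tuple

# (chrom, start, end) in BED-style coordinates; a hypothetical alias for this sketch.
Interval = Tuple[str, int, int]


def assign_cluster_ids(intervals: List[Interval], max_distance: int = 0) -> List[int]:
    """Return one cluster ID per interval.

    Assumes the input is already sorted by chromosome, then by start,
    as the documentation requires for ``bedtools cluster``.
    """
    ids: List[int] = []
    cluster_id = 0
    current_chrom: Optional[str] = None
    current_end = 0  # furthest end coordinate seen in the current cluster

    for chrom, start, end in intervals:
        # A new chromosome, or a gap larger than max_distance, opens a new cluster.
        if chrom != current_chrom or start - current_end > max_distance:
            cluster_id += 1
            current_chrom = chrom
            current_end = end
        else:
            # Same cluster: extend its right edge if this interval reaches further.
            current_end = max(current_end, end)
        ids.append(cluster_id)
    return ids
```

Called on the four intervals of the "Default behavior" example, the sketch assigns the first three to one cluster and the last to a second; on the two intervals of the ``-d`` example it yields two clusters with the default distance and a single cluster with ``max_distance=1000``.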