diff --git a/docs/content/images/tool-glyphs/shuffle-glyph.png b/docs/content/images/tool-glyphs/shuffle-glyph.png new file mode 100644 index 0000000000000000000000000000000000000000..1c3479b64913aed2afffbca88c1891c304fdc76a Binary files /dev/null and b/docs/content/images/tool-glyphs/shuffle-glyph.png differ diff --git a/docs/content/tools/shuffle.rst b/docs/content/tools/shuffle.rst index 31949cf359593d321ecde3310627abb400601fbd..26d3716f7d8c65df82b90ec1590a3d53339b12f6 100755 --- a/docs/content/tools/shuffle.rst +++ b/docs/content/tools/shuffle.rst @@ -1,6 +1,13 @@ ############### *shuffle* ############### + +| + +.. image:: ../images/tool-glyphs/shuffle-glyph.png + :width: 600pt + + `bedtools shuffle` will randomly permute the genomic locations of a feature file among a genome defined in a genome file. One can also provide an "exclusions" BED/GFF/VCF file that lists regions where you do @@ -10,7 +17,12 @@ as a *null* basis against which to test the significance of associations of one feature with another. +.. seealso:: + :doc:`../tools/random` + :doc:`../tools/jaccard` + + ========================================================================== Usage and option summary ========================================================================== @@ -31,7 +43,7 @@ Usage and option summary **-incl** A BED file of coordinates in which features from -i *should* be placed. **-chrom** Keep features in -i on the same chromosome. Solely permute their location on the chromosome. *By default, both the chromosome and position are randomly chosen*. **-seed** Supply an integer seed for the shuffling. This will allow feature shuffling experiments to be recreated exactly as the seed for the pseudo-random number generation will be constant. *By default, the seed is chosen automatically*. -**-f** Maximum overlap (as a fraction of the -i feature) with an -excl feature that is tolerated before searching for a new, randomized locus. 
+**-f** Maximum overlap (as a fraction of the ``-i`` feature) with an ``-excl`` feature that is tolerated before searching for a new, randomized locus. **-chromFirst** Instead of choosing a position randomly among the entire genome (the default), first choose a chrom randomly, and then choose a random start coordinate on that chrom. This leads to features being ~uniformly distributed among the chroms, as opposed to features being distributed as a function of chrom size. **-bedpe** Indicate that the A file is in BEDPE format. **-maxTries** Max. number of attempts to find a home for a shuffled interval in the presence of -incl or -excl. *Default = 1000.* @@ -48,19 +60,20 @@ file on a random chromosome at a random position. The size and strand of each feature are preserved. For example: -:: - cat A.bed +.. code-block:: bash + + $ cat A.bed chr1 0 100 a1 1 + chr1 0 1000 a2 2 - - cat my.genome + $ cat my.genome chr1 10000 chr2 8000 chr3 5000 chr4 2000 - bedtools shuffle -i A.bed -g my.genome + $ bedtools shuffle -i A.bed -g my.genome chr4 1498 1598 a1 1 + chr3 2156 3156 a2 2 - @@ -69,25 +82,24 @@ For example: ========================================================================== -5.13.3 (-chrom) Requiring that features be shuffled on the same chromosome +``-chrom`` Requiring that features be shuffled on the same chromosome ========================================================================== The `-chrom` option behaves the same as the default behavior except that features are randomly placed on the same chromosome as defined in the BED file. -For example: -:: +.. 
code-block:: bash - cat A.bed + $ cat A.bed chr1 0 100 a1 1 + chr1 0 1000 a2 2 - - cat my.genome + $ cat my.genome chr1 10000 chr2 8000 chr3 5000 chr4 2000 - bedtools shuffle -i A.bed -g my.genome -chrom + $ bedtools shuffle -i A.bed -g my.genome -chrom chr1 9560 9660 a1 1 + chr1 7258 8258 a2 2 - @@ -95,55 +107,57 @@ For example: ========================================================================== -5.13.4 (-excl) Excluding certain genome regions from shuffleBed +``-excl`` Excluding certain genome regions from ``bedtools shuffle`` ========================================================================== One may want to prevent BED features from being placed in certain regions of the genome. For example, one may want to exclude genome gaps from permutation experiment. The `excl` option defines a BED file of regions that should be -excluded. **shuffleBed** will attempt to permute the locations of all features +excluded. ``bedtools shuffle`` will attempt to permute the locations of all features while adhering to the exclusion rules. However it will stop looking for an appropriate location if it cannot find a valid spot for a feature after 1,000,000 tries. For example (*note that the exclude file excludes all but 100 base pairs of the chromosome*): -:: - cat A.bed +.. code-block:: bash + + $ cat A.bed chr1 0 100 a1 1 + chr1 0 1000 a2 2 - - cat my.genome + $ cat my.genome chr1 10000 - cat exclude.bed + $ cat exclude.bed chr1 100 10000 - bedtools shuffle -i A.bed -g my.genome -excl exclude.bed + $ bedtools shuffle -i A.bed -g my.genome -excl exclude.bed chr1 0 100 a1 1 + Error, line 2: tried 1000000 potential loci for entry, but could not avoid excluded regions. Ignoring entry and moving on. For example (*now the exclusion file only excludes the first 100 bases of the chromosome*): -:: - cat A.bed +.. 
code-block:: bash + + $ cat A.bed chr1 0 100 a1 1 + chr1 0 1000 a2 2 - - cat my.genome + $ cat my.genome chr1 10000 - cat exclude.bed + $ cat exclude.bed chr1 0 100 - bedtools shuffle -i A.bed -g my.genome -excl exclude.bed + $ bedtools shuffle -i A.bed -g my.genome -excl exclude.bed chr1 147 247 a1 1 + chr1 2441 3441 a2 2 - ========================================================================== -5.13.5 (-seed) Defining a "seed" for the random replacement. +``-seed`` Defining a "seed" for the random replacement. ========================================================================== `bedtools shuffle` uses a pseudo-random number generator to permute the locations of BED features. Therefore, each run should produce a different @@ -154,25 +168,27 @@ seed and input files should produce identical results. For example (*note that the exclude file below excludes all but 100 base pairs of the chromosome*): -:: - cat A.bed + +.. code-block:: bash + + $ cat A.bed chr1 0 100 a1 1 + chr1 0 1000 a2 2 - - cat my.genome + $ cat my.genome chr1 10000 - shuffleBed -i A.bed -g my.genome -seed 927442958 + $ bedtools shuffle -i A.bed -g my.genome -seed 927442958 chr1 6177 6277 a1 1 + chr1 8119 9119 a2 2 - - shuffleBed -i A.bed -g my.genome -seed 927442958 + $ bedtools shuffle -i A.bed -g my.genome -seed 927442958 chr1 6177 6277 a1 1 + chr1 8119 9119 a2 2 - . . . - bedtools shuffle -i A.bed -g my.genome -seed 927442958 + $ bedtools shuffle -i A.bed -g my.genome -seed 927442958 chr1 6177 6277 a1 1 + chr1 8119 9119 a2 2 -
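The seeded behaviour documented above can be illustrated with a small Python sketch. This is not the bedtools implementation: the function name ``shuffle_intervals``, its parameters, and the placement strategy (choose a chromosome with probability proportional to its size, then a uniform start) are assumptions made for illustration only. It shows why a fixed seed yields identical output across runs and how feature sizes are preserved.

```python
import random


def shuffle_intervals(features, genome, seed=None, same_chrom=False, max_tries=1000):
    """Randomly relocate intervals while preserving their lengths.

    Hypothetical helper, not part of bedtools.
    features: list of (chrom, start, end, name) tuples.
    genome:   dict mapping chromosome name -> chromosome length.
    """
    # A private generator: the same seed always replays the same draws.
    rng = random.Random(seed)
    chroms = list(genome)
    sizes = [genome[c] for c in chroms]
    shuffled = []
    for chrom, start, end, name in features:
        length = end - start
        for _ in range(max_tries):
            if same_chrom:
                # Analogous to -chrom: only the position changes.
                target = chrom
            else:
                # Assumption: weight chromosomes by size to approximate
                # choosing a position uniformly across the whole genome.
                target = rng.choices(chroms, weights=sizes, k=1)[0]
            if genome[target] >= length:
                new_start = rng.randint(0, genome[target] - length)
                shuffled.append((target, new_start, new_start + length, name))
                break
        # If no placement was found within max_tries, the feature is skipped.
    return shuffled


features = [("chr1", 0, 100, "a1"), ("chr1", 0, 1000, "a2")]
genome = {"chr1": 10000, "chr2": 8000, "chr3": 5000, "chr4": 2000}

run_one = shuffle_intervals(features, genome, seed=927442958)
run_two = shuffle_intervals(features, genome, seed=927442958)
for record in run_one:
    print(*record, sep="\t")
```

Calling the function twice with the same seed returns the same list, while omitting the seed lets each run differ. The coordinates it prints will not match those produced by ``bedtools shuffle``, since the underlying random number generators differ.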