Commit 4a06a79a authored by Aaron

[DOC] update docs for shuffle.

parent 7ff5a929
docs/content/images/tool-glyphs/shuffle-glyph.png

30.9 KiB

###############
*shuffle*
###############

|

.. image:: ../images/tool-glyphs/shuffle-glyph.png
    :width: 600pt

`bedtools shuffle` will randomly permute the genomic locations of a feature
file among a genome defined in a genome file. One can also provide an
"exclusions" BED/GFF/VCF file that lists regions where you do
...@@ -10,7 +17,12 @@ as a *null* basis against which to test the significance of associations
of one feature with another.
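
To make the idea of a *null* basis concrete, the sketch below computes a one-sided empirical p-value from overlap counts. It assumes you have already counted overlaps for the real features (``observed``) and for each shuffled replicate (``null_counts``); the function name, the variable names, and the numbers are hypothetical and are not part of bedtools.

```python
def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value with the usual +1 correction.

    observed    -- overlap count measured with the real feature file
    null_counts -- overlap counts measured after each shuffle (hypothetical data)
    """
    # Count shuffled replicates that are at least as extreme as the observation.
    at_least_as_extreme = sum(1 for count in null_counts if count >= observed)
    # The +1 in numerator and denominator keeps the estimate away from exactly 0.
    return (at_least_as_extreme + 1) / (len(null_counts) + 1)


if __name__ == "__main__":
    # Illustrative numbers only; in practice each entry would come from one
    # shuffled replicate intersected with the second feature set.
    null_counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 3]
    print(empirical_p_value(6, null_counts))
```

More shuffled replicates give a finer-grained p-value, since the smallest value this estimator can return is ``1 / (N + 1)`` for ``N`` replicates.
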

.. seealso::

    :doc:`../tools/random`
    :doc:`../tools/jaccard`

==========================================================================
Usage and option summary
==========================================================================
...@@ -31,7 +43,7 @@ Usage and option summary
**-incl**        A BED file of coordinates in which features from -i *should* be placed.
**-chrom**       Keep features in -i on the same chromosome. Solely permute their location on the chromosome. *By default, both the chromosome and position are randomly chosen*.
**-seed**        Supply an integer seed for the shuffling. This allows feature shuffling experiments to be recreated exactly, because the seed for the pseudo-random number generation will be constant. *By default, the seed is chosen automatically*.
**-f**           Maximum overlap (as a fraction of the ``-i`` feature) with an ``-excl`` feature that is tolerated before searching for a new, randomized locus.
**-chromFirst**  Instead of choosing a position randomly among the entire genome (the default), first choose a chrom randomly, and then choose a random start coordinate on that chrom. This leads to features being ~uniformly distributed among the chroms, as opposed to features being distributed as a function of chrom size.
**-bedpe**       Indicate that the A file is in BEDPE format.
**-maxTries**    Max. number of attempts to find a home for a shuffled interval in the presence of -incl or -excl. *Default = 1000.*
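
The difference between the default placement and ``-chromFirst`` can be illustrated with a small simulation. This is only a conceptual sketch of the two sampling schemes described above, not the bedtools implementation; the function names are hypothetical and the genome sizes are borrowed from the ``my.genome`` examples in this document.

```python
import random

# Chromosome sizes as in the my.genome example file.
GENOME = {"chr1": 10000, "chr2": 8000, "chr3": 5000, "chr4": 2000}


def pick_chrom_by_size(genome, rng):
    """Default-like scheme: a random genome position lands on a chromosome
    with probability proportional to that chromosome's size."""
    names = list(genome)
    weights = [genome[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_chrom_uniform(genome, rng):
    """-chromFirst-like scheme: every chromosome is equally likely."""
    return rng.choice(list(genome))


def tally(picker, genome, n, seed):
    """Count how often each chromosome is chosen over n draws."""
    rng = random.Random(seed)
    counts = dict.fromkeys(genome, 0)
    for _ in range(n):
        counts[picker(genome, rng)] += 1
    return counts


if __name__ == "__main__":
    print("by size :", tally(pick_chrom_by_size, GENOME, 20000, seed=1))
    print("uniform :", tally(pick_chrom_uniform, GENOME, 20000, seed=1))
```

Under the size-weighted scheme the largest chromosome receives the most features, while the uniform scheme spreads them roughly evenly across chromosomes regardless of size.
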
...@@ -48,19 +60,20 @@ file on a random chromosome at a random position. The size and strand of each
feature are preserved.

For example:

.. code-block:: bash

  $ cat A.bed
  chr1  0  100  a1  1  +
  chr1  0  1000 a2  2  -

  $ cat my.genome
  chr1  10000
  chr2  8000
  chr3  5000
  chr4  2000

  $ bedtools shuffle -i A.bed -g my.genome
  chr4  1498  1598  a1  1  +
  chr3  2156  3156  a2  2  -
...@@ -69,25 +82,24 @@ For example:

==========================================================================
``-chrom`` Requiring that features be shuffled on the same chromosome
==========================================================================
The `-chrom` option behaves the same as the default behavior except that
features are randomly placed on the same chromosome as defined in the BED file.

.. code-block:: bash

  $ cat A.bed
  chr1  0  100  a1  1  +
  chr1  0  1000 a2  2  -

  $ cat my.genome
  chr1  10000
  chr2  8000
  chr3  5000
  chr4  2000

  $ bedtools shuffle -i A.bed -g my.genome -chrom
  chr1  9560  9660  a1  1  +
  chr1  7258  8258  a2  2  -
...@@ -95,55 +107,57 @@ For example:

==========================================================================
``-excl`` Excluding certain genome regions from ``bedtools shuffle``
==========================================================================
One may want to prevent BED features from being placed in certain regions of
the genome. For example, one may want to exclude genome gaps from a permutation
experiment. The `-excl` option defines a BED file of regions that should be
excluded. ``bedtools shuffle`` will attempt to permute the locations of all features
while adhering to the exclusion rules. However, it will stop looking for an
appropriate location if it cannot find a valid spot for a feature
after 1,000,000 tries.

For example (*note that the exclude file excludes all but 100 base pairs of the chromosome*):

.. code-block:: bash

  $ cat A.bed
  chr1  0  100  a1  1  +
  chr1  0  1000 a2  2  -

  $ cat my.genome
  chr1  10000

  $ cat exclude.bed
  chr1  100  10000

  $ bedtools shuffle -i A.bed -g my.genome -excl exclude.bed
  chr1  0  100  a1  1  +
  Error, line 2: tried 1000000 potential loci for entry, but could not avoid excluded
  regions.  Ignoring entry and moving on.

For example (*now the exclusion file only excludes the first 100 bases of the chromosome*):

.. code-block:: bash

  $ cat A.bed
  chr1  0  100  a1  1  +
  chr1  0  1000 a2  2  -

  $ cat my.genome
  chr1  10000

  $ cat exclude.bed
  chr1  0  100

  $ bedtools shuffle -i A.bed -g my.genome -excl exclude.bed
  chr1  147  247  a1  1  +
  chr1  2441 3441 a2  2  -
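
The retry behavior shown in the two examples above can be pictured as rejection sampling: draw a candidate position, reject it if it touches an excluded region, and give up after a fixed number of attempts. The sketch below is an assumption-laden illustration, not the bedtools code. It works on a single chromosome with half-open intervals, treats any overlap as a rejection (so it ignores ``-f``), and the names ``overlaps`` and ``place_avoiding`` are hypothetical.

```python
import random


def overlaps(start, end, regions):
    """True if the half-open interval [start, end) intersects any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def place_avoiding(length, chrom_size, excluded, rng, max_tries=1000):
    """Try up to max_tries random placements of a feature of the given length.

    Returns (start, end) for the first placement that avoids every excluded
    region, or None if no valid placement was found within max_tries.
    """
    if length > chrom_size:
        return None  # the feature cannot fit on this chromosome at all
    for _ in range(max_tries):
        start = rng.randint(0, chrom_size - length)  # inclusive bounds
        if not overlaps(start, start + length, excluded):
            return (start, start + length)
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Only the first 100 bases are excluded: a 1000 bp feature fits easily.
    print(place_avoiding(1000, 10000, [(0, 100)], rng))
    # All but the first 100 bases are excluded: a 1000 bp feature never fits.
    print(place_avoiding(1000, 10000, [(100, 10000)], rng))
```

When almost the whole chromosome is excluded, a short feature may still have a valid position, but a purely random search can need many attempts to hit it, which is why the retry limit matters.
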

==========================================================================
``-seed`` Defining a "seed" for the random replacement.
==========================================================================
`bedtools shuffle` uses a pseudo-random number generator to permute the
locations of BED features. Therefore, each run should produce a different
...@@ -154,25 +168,27 @@ seed and input files should produce identical results.

For example (*note that every run with the same seed produces the same
output*):

.. code-block:: bash

  $ cat A.bed
  chr1 0 100 a1 1 +
  chr1 0 1000 a2 2 -

  $ cat my.genome
  chr1 10000

  $ bedtools shuffle -i A.bed -g my.genome -seed 927442958
  chr1 6177 6277 a1 1 +
  chr1 8119 9119 a2 2 -

  $ bedtools shuffle -i A.bed -g my.genome -seed 927442958
  chr1 6177 6277 a1 1 +
  chr1 8119 9119 a2 2 -

  . . .

  $ bedtools shuffle -i A.bed -g my.genome -seed 927442958
  chr1 6177 6277 a1 1 +
  chr1 8119 9119 a2 2 -
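
The effect of a fixed seed can be reproduced in miniature with any seeded pseudo-random number generator. The sketch below uses Python's ``random.Random`` purely as a stand-in; bedtools uses its own generator, so the coordinates will not match the output above, and ``shuffle_starts`` is a hypothetical helper.

```python
import random


def shuffle_starts(lengths, chrom_size, seed=None):
    """Draw one random start per feature length on a single chromosome.

    With the same seed and the same inputs, the returned starts are identical
    on every call; with seed=None the generator is seeded automatically.
    """
    rng = random.Random(seed)
    return [rng.randint(0, chrom_size - length) for length in lengths]


if __name__ == "__main__":
    lengths = [100, 1000]  # sizes of a1 and a2 in the A.bed example
    first = shuffle_starts(lengths, 10000, seed=927442958)
    second = shuffle_starts(lengths, 10000, seed=927442958)
    print(first == second)  # same seed, same inputs, same placements
```

Recording the seed alongside the input files is therefore enough to recreate a shuffling experiment exactly.
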