Commit 0b78ddfd authored by Aaron

[DOCS] multiple improvements to HTML docs

parent ae48a3ee
###############
Example usage
###############
Below are several examples of basic BEDTools usage. Example BED files are provided in the
/data directory of the BEDTools distribution.
Below are several examples of basic bedtools usage. Example BED files are
provided in the /data directory of the bedtools distribution.
==========================================================================
6.1 intersectBed
bedtools intersect
==========================================================================
6.1.1 Report the base-pair overlap between sequence alignments and genes.
::
intersectBed -a reads.bed -b genes.bed
Report the base-pair overlap between sequence alignments and genes.
6.1.2 Report whether each alignment overlaps one or more genes. If not, the alignment is not reported.
::
intersectBed -a reads.bed -b genes.bed -u
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed
6.1.3 Report those alignments that overlap NO genes. Like "grep -v"
::
intersectBed -a reads.bed -b genes.bed -v
6.1.4 Report the number of genes that each alignment overlaps.
::
intersectBed -a reads.bed -b genes.bed -c
Report whether each alignment overlaps one or more genes. If not, the alignment is not reported.
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -u
Report those alignments that overlap NO genes. Like "grep -v"
6.1.5 Report the entire, original alignment entry for each overlap with a gene.
::
intersectBed -a reads.bed -b genes.bed -wa
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -v
Report the number of genes that each alignment overlaps.
6.1.6 Report the entire, original gene entry for each overlap with a gene.
::
intersectBed -a reads.bed -b genes.bed -wb
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -c
6.1.7 Report the entire, original alignment and gene entries for each overlap.
::
intersectBed -a reads.bed -b genes.bed -wa -wb
Report the entire, original alignment entry for each overlap with a gene.
6.1.8 Only report an overlap with a repeat if it spans at least 50% of the exon.
::
intersectBed -a exons.bed -b repeatMasker.bed -f 0.50
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -wa
6.1.9 Only report an overlap if it comprises at least 50% of the structural variant and at least 50% of the segmental duplication. Thus, the overlap is reciprocally at least 50%.
::
intersectBed -a SV.bed -b segmentalDups.bed -f 0.50 -r
Report the entire, original gene entry for each overlap with a gene.
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -wb
6.1.10 Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs.
::
intersectBed -a genes.bed -b LINES.bed | intersectBed -a stdin -b SINEs.bed -v
Report the entire, original alignment and gene entries for each overlap.
6.1.11 Retain only single-end BAM alignments that overlap exons.
::
intersectBed -abam reads.bam -b exons.bed > reads.touchingExons.bam
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed -wa -wb
6.1.12 Retain only single-end BAM alignments that do not overlap simple sequence
repeats.
::
intersectBed -abam reads.bam -b SSRs.bed -v > reads.noSSRs.bam
Only report an overlap with a repeat if it spans at least 50% of the exon.
==========================================================================
6.2 pairToBed
==========================================================================
.. code-block:: bash
bedtools intersect -a exons.bed -b repeatMasker.bed -f 0.50
6.2.1 Return all structural variants (in BEDPE format) that overlap with genes on either
end.
::
pairToBed -a sv.bedpe -b genes > sv.genes
Only report an overlap if it comprises at least 50% of the structural variant and at least 50% of the segmental duplication. Thus, the overlap is reciprocally at least 50%.
.. code-block:: bash
bedtools intersect -a SV.bed -b segmentalDups.bed -f 0.50 -r
6.2.2 Return all structural variants (in BEDPE format) that overlap with genes on both
ends.
::
pairToBed -a sv.bedpe -b genes -type both > sv.genes
Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs.
.. code-block:: bash
bedtools intersect -a genes.bed -b LINES.bed | bedtools intersect -a stdin -b SINEs.bed -v
6.2.3 Retain only paired-end BAM alignments where neither end overlaps simple
sequence repeats.
::
pairToBed -abam reads.bam -b SSRs.bed -type neither > reads.noSSRs.bam
6.2.4 Retain only paired-end BAM alignments where both ends overlap segmental
duplications.
::
pairToBed -abam reads.bam -b segdups.bed -type both > reads.SSRs.bam
Retain only single-end BAM alignments that overlap exons.
.. code-block:: bash
6.2.5 Retain only paired-end BAM alignments where neither or one and only one end
overlaps segmental duplications.
::
pairToBed -abam reads.bam -b segdups.bed -type notboth > reads.notbothSSRs.bam
bedtools intersect -abam reads.bam -b exons.bed > reads.touchingExons.bam
==========================================================================
6.3 pairToPair
==========================================================================
Retain only single-end BAM alignments that do not overlap simple sequence
repeats.
6.3.1 Find all SVs (in BEDPE format) in sample 1 that are also in sample 2.
::
pairToPair -a 1.sv.bedpe -b 2.sv.bedpe | cut -f 1-10 > 1.sv.in2.bedpe
.. code-block:: bash
bedtools intersect -abam reads.bam -b SSRs.bed -v > reads.noSSRs.bam
6.3.2 Find all SVs (in BEDPE format) in sample 1 that are not in sample 2.
::
pairToPair -a 1.sv.bedpe -b 2.sv.bedpe -type neither | cut -f 1-10 > 1.sv.notin2.bedpe
==========================================================================
bedtools bamtobed
==========================================================================
Convert BAM alignments to BED format.
.. code-block:: bash
bedtools bamtobed -i reads.bam > reads.bed
==========================================================================
6.4 bamToBed
==========================================================================
Convert BAM alignments to BED format using the BAM edit distance (NM) as the
BED "score".
6.4.1 Convert BAM alignments to BED format.
::
bamToBed -i reads.bam > reads.bed
.. code-block:: bash
bedtools bamtobed -i reads.bam -ed > reads.bed
6.4.2 Convert BAM alignments to BED format using the BAM edit distance (NM) as the
BED "score".
::
bamToBed -i reads.bam -ed > reads.bed
Convert BAM alignments to BEDPE format.
6.4.3 Convert BAM alignments to BEDPE format.
::
bamToBed -i reads.bam -bedpe > reads.bedpe
.. code-block:: bash
bedtools bamtobed -i reads.bam -bedpe > reads.bedpe
==========================================================================
6.5 windowBed
bedtools window
==========================================================================
6.5.1 Report all genes that are within 10000 bp upstream or downstream of CNVs.
::
windowBed -a CNVs.bed -b genes.bed -w 10000
Report all genes that are within 10000 bp upstream or downstream of CNVs.
.. code-block:: bash
bedtools window -a CNVs.bed -b genes.bed -w 10000
6.5.2 Report all genes that are within 10000 bp upstream or 5000 bp downstream of
Report all genes that are within 10000 bp upstream or 5000 bp downstream of
CNVs.
::
windowBed -a CNVs.bed -b genes.bed -l 10000 -r 5000
.. code-block:: bash
bedtools window -a CNVs.bed -b genes.bed -l 10000 -r 5000
6.5.3 Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes.
Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes.
Define upstream and downstream based on strand.
::
windowBed -a genes.bed -b snps.bed -l 5000 -r 1000 -sw
.. code-block:: bash
bedtools window -a genes.bed -b snps.bed -l 5000 -r 1000 -sw
==========================================================================
6.6 closestBed
bedtools closest
==========================================================================
Note: By default, if there is a tie for closest, all ties will be reported. **closestBed** allows overlapping
features to be the closest.
6.6.1 Find the closest ALU to each gene.
::
closestBed -a genes.bed -b ALUs.bed
Find the closest ALU to each gene.
.. code-block:: bash
bedtools closest -a genes.bed -b ALUs.bed
6.6.2 Find the closest ALU to each gene, choosing the first ALU in the file if there is a
Find the closest ALU to each gene, choosing the first ALU in the file if there is a
tie.
::
closestBed -a genes.bed -b ALUs.bed -t first
.. code-block:: bash
bedtools closest -a genes.bed -b ALUs.bed -t first
6.6.3 Find the closest ALU to each gene, choosing the last ALU in the file if there is a
Find the closest ALU to each gene, choosing the last ALU in the file if there is a
tie.
::
closestBed -a genes.bed -b ALUs.bed -t last
.. code-block:: bash
bedtools closest -a genes.bed -b ALUs.bed -t last
==========================================================================
6.7 subtractBed
bedtools subtract
==========================================================================
Note: If a feature in A is entirely "spanned" by any feature in B, it will not be reported.
.. note::
If a feature in A is entirely "spanned" by any feature in B, it will not be reported.
Remove introns from gene features. Exons will (should) be reported.
.. code-block:: bash
6.7.1 Remove introns from gene features. Exons will (should) be reported.
::
subtractBed -a genes.bed -b introns.bed
bedtools subtract -a genes.bed -b introns.bed
==========================================================================
6.8 mergeBed
bedtools merge
==========================================================================
.. note::
6.8.1 Merge overlapping repetitive elements into a single entry.
::
mergeBed -i repeatMasker.bed
``merge`` requires that the input is sorted by chromosome and then by start
coordinate. For example, for BED files, one would first sort the input
as follows: ``sort -k1,1 -k2,2n input.bed > input.sorted.bed``
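The sort step described in this note can be tried on a toy file before running it on real data. The following is a minimal sketch; the file names and coordinates are made up for illustration, and ``LC_ALL=C`` is an added assumption that keeps the chromosome ordering byte-wise and reproducible across locales.

```bash
# Build a small, unsorted, tab-delimited BED file (illustrative data only).
printf 'chr2\t100\t200\nchr1\t300\t400\nchr1\t5\t50\nchr10\t1\t9\n' > input.bed

# Sort by chromosome (column 1), then numerically by start (column 2).
LC_ALL=C sort -k1,1 -k2,2n input.bed > input.sorted.bed

# Show the sorted records.
cat input.sorted.bed
```

Note that the chromosome column is sorted as text, so ``chr10`` is placed before ``chr2``.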
Merge overlapping repetitive elements into a single entry.
.. code-block:: bash
6.8.2 Merge overlapping repetitive elements into a single entry, returning the number of
bedtools merge -i repeatMasker.bed
Merge overlapping repetitive elements into a single entry, returning the number of
entries merged.
::
mergeBed -i repeatMasker.bed -n
.. code-block:: bash
bedtools merge -i repeatMasker.bed -n
6.8.3 Merge nearby (within 1000 bp) repetitive elements into a single entry.
::
mergeBed -i repeatMasker.bed -d 1000
Merge nearby (within 1000 bp) repetitive elements into a single entry.
.. code-block:: bash
bedtools merge -i repeatMasker.bed -d 1000
==========================================================================
6.9 coverageBed
bedtools coverage
==========================================================================
6.9.1 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome.
::
coverageBed -a reads.bed -b windows10kb.bed | head
.. code-block:: bash
bedtools coverage -a reads.bed -b windows10kb.bed | head
chr1 0 10000 0 10000 0.00
chr1 10001 20000 33 10000 0.21
chr1 20001 30000 42 10000 0.29
chr1 30001 40000 71 10000 0.36
6.9.2 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome and create a BEDGRAPH of the number of aligned reads in each window for
display on the UCSC browser.
::
coverageBed -a reads.bed -b windows10kb.bed | cut -f 1-4 > windows10kb.cov.bedg
.. code-block:: bash
bedtools coverage -a reads.bed -b windows10kb.bed | cut -f 1-4 > windows10kb.cov.bedg
6.9.3 Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome and create a BEDGRAPH of the fraction of each window covered by at least
one aligned read for display on the UCSC browser.
::
coverageBed -a reads.bed -b windows10kb.bed | awk '{OFS="\t"; print $1,$2,$3,$6}'
> windows10kb.pctcov.bedg
.. code-block:: bash
bedtools coverage -a reads.bed -b windows10kb.bed | \
awk '{OFS="\t"; print $1,$2,$3,$6}' \
> windows10kb.pctcov.bedg
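The column selection in this pipeline can be checked without running a coverage computation. The sketch below feeds two of the six-column sample rows shown earlier in this section through an equivalent ``awk`` step; the intermediate file name ``windows10kb.cov.txt`` is hypothetical.

```bash
# Two rows copied from the sample coverage output shown earlier.
printf 'chr1\t0\t10000\t0\t10000\t0.00\nchr1\t10001\t20000\t33\t10000\t0.21\n' > windows10kb.cov.txt

# Keep chrom, start, end and the sixth column, joined by tabs.
awk 'BEGIN {OFS="\t"} {print $1,$2,$3,$6}' windows10kb.cov.txt > windows10kb.pctcov.bedg

# Show the resulting four-column records.
cat windows10kb.pctcov.bedg
```

Setting ``OFS`` in a ``BEGIN`` block behaves the same as setting it inside the main action here; it simply avoids reassigning the variable on every line.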
==========================================================================
6.10 complementBed
bedtools complement
==========================================================================
6.10.1 Report all intervals in the human genome that are not covered by repetitive
Report all intervals in the human genome that are not covered by repetitive
elements.
::
complementBed -i repeatMasker.bed -g hg18.genome
.. code-block:: bash
bedtools complement -i repeatMasker.bed -g hg18.genome
==========================================================================
6.11 shuffleBed
bedtools shuffle
==========================================================================
6.11.1 Randomly place all discovered variants in the genome. However, prevent them
Randomly place all discovered variants in the genome. However, prevent them
from being placed in known genome gaps.
::
shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed
.. code-block:: bash
bedtools shuffle -i variants.bed -g hg18.genome -excl genome_gaps.bed
6.11.2 Randomly place all discovered variants in the genome. However, prevent them
Randomly place all discovered variants in the genome. However, prevent them
from being placed in known genome gaps and require that the variants be randomly
placed on the same chromosome.
::
shuffleBed -i variants.bed -g hg18.genome -excl genome_gaps.bed -chrom
.. code-block:: bash
bedtools shuffle -i variants.bed -g hg18.genome -excl genome_gaps.bed -chrom
......@@ -3,19 +3,19 @@ General usage
###############
=======================
4.1 Supported file formats
Supported file formats
=======================
----------------------
4.1.1 BED format
BED format
----------------------
As described on the UCSC Genome Browser website (see link below), the BED format is a concise and
flexible way to represent genomic features and annotations. The BED format description supports up to
12 columns, but only the first 3 are required for the UCSC browser, the Galaxy browser and for
BEDTools. BEDTools allows one to use the "BED12" format (that is, all 12 fields listed below).
bedtools. bedtools allows one to use the "BED12" format (that is, all 12 fields listed below).
However, only intersectBed, coverageBed, genomeCoverageBed, and bamToBed will obey the BED12
"blocks" when computing overlaps, etc., via the **"-split"** option. For all other tools, the last six columns
are not used for any comparisons by the BEDTools. Instead, they will use the entire span (start to end)
are not used for any comparisons by the bedtools. Instead, they will use the entire span (start to end)
of the BED12 entry to perform any relevant feature comparisons. The last six columns will be reported
in the output of all comparisons.
......@@ -34,26 +34,26 @@ The file description below is modified from: http://genome.ucsc.edu/FAQ/FAQforma
4. **name** - Defines the name of the BED feature.
- *Any string can be used*. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
- *This column is optional*.
5. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. However, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
5. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. However, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
- *Any string can be used*. For example, 7.31E-05 (p-value), 0.33456 (mean enrichment value), "up", "down", etc.
- *This column is optional*.
6. **strand** - Defines the strand - either '+' or '-'.
- *This column is optional*.
7. **thickStart** - The starting position at which the feature is drawn thickly.
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
8. **thickEnd** - The ending position at which the feature is drawn thickly.
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
9. **itemRgb** - An RGB value of the form R,G,B (e.g. 255,0,0).
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
10. **blockCount** - The number of blocks (exons) in the BED line.
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
11. **blockSizes** - A comma-separated list of the block sizes.
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
12. **blockStarts** - A comma-separated list of block starts.
- *Allowed yet ignored by BEDTools*.
- *Allowed yet ignored by bedtools*.
BEDTools requires that all BED input files (and input received from stdin) are **tab-delimited**. The following types of BED files are supported by BEDTools:
bedtools requires that all BED input files (and input received from stdin) are **tab-delimited**. The following types of BED files are supported by bedtools:
1. | **BED3**: A BED file where each feature is described by **chrom**, **start**, and **end**.
......@@ -69,7 +69,7 @@ BEDTools requires that all BED input files (and input received from stdin) are *
| 11873 0 3 354,109,1189, 0,739,1347,
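Because the tools expect tab-delimited input, a quick check can catch files that were saved with spaces. This is a rough sketch using ``awk``, not the validation that bedtools itself performs; the file names are hypothetical, and header lines (for example ``track`` or ``#`` lines) would also be flagged by it.

```bash
# Two well-formed BED3 records and one record that uses spaces instead of tabs.
printf 'chr1\t100\t200\nchr1 300 400\nchr2\t50\t75\n' > candidate.bed

# Split strictly on tabs; report lines with fewer than three fields or with
# start/end values that are not non-negative integers.
awk -F'\t' 'NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {print "line " NR ": " $0}' \
    candidate.bed > bad_lines.txt

# Show any offending lines.
cat bad_lines.txt
```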
----------------------
4.1.2 BEDPE format
BEDPE format
----------------------
We have defined a new file format (BEDPE) in order to concisely describe disjoint genome features,
such as structural variations or paired-end sequence alignments. We chose to define a new format
......@@ -106,7 +106,7 @@ The BEDPE format is described below. The description is modified from: http://ge
7. **name** - Defines the name of the BEDPE feature.
- *Any string can be used*. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
- *This column is optional*.
8. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. *However, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features*. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
8. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. *However, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features*. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
- *Any string can be used*. For example, 7.31E-05 (p-value), 0.33456 (mean enrichment value), "up", "down", etc.
- *This column is optional*.
9. **strand1** - Defines the strand for the first end of the feature. Either '+' or '-'.
......@@ -115,27 +115,29 @@ The BEDPE format is described below. The description is modified from: http://ge
10. **strand2** - Defines the strand for the second end of the feature. Either '+' or '-'.
- *This column is optional*.
- *Use "." for unknown*.
11. **Any number of additional, user-defined fields** - BEDTools allows one to add as many additional fields to the normal, 10-column BEDPE format as necessary. These columns are merely "passed through" **pairToBed** and **pairToPair** and are not part of any analysis. One would use these additional columns to add extra information (e.g., edit distance for each end of an alignment, or "deletion", "inversion", etc.) to each BEDPE feature.
11. **Any number of additional, user-defined fields** - bedtools allows one to add as many additional fields to the normal, 10-column BEDPE format as necessary. These columns are merely "passed through" **pairToBed** and **pairToPair** and are not part of any analysis. One would use these additional columns to add extra information (e.g., edit distance for each end of an alignment, or "deletion", "inversion", etc.) to each BEDPE feature.
- *These additional columns are optional*.
Entries from a typical BEDPE file:
::
chr1 100 200 chr5 5000 5100 bedpe_example1 30 + -
chr9 1000 5000 chr9 3000 3800 bedpe_example2 100 + -
Entries from a BEDPE file with two custom fields added to each record:
::
chr1 10 20 chr5 50 60 a1 30 + - 0 1
chr9 30 40 chr9 80 90 a2 100 + - 2 1
----------------------
4.1.3 GFF format
GFF format
----------------------
The GFF format is described on the Sanger Institute's website (http://www.sanger.ac.uk/resources/software/gff/spec.html). The GFF description below is modified from the definition at this URL. All nine columns in the GFF format description are required by BEDTools.
The GFF format is described on the Sanger Institute's website (http://www.sanger.ac.uk/resources/software/gff/spec.html). The GFF description below is modified from the definition at this URL. All nine columns in the GFF format description are required by bedtools.
1. **seqname** - The name of the sequence (e.g. chromosome) on which the feature exists.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
......@@ -147,10 +149,10 @@ The GFF format is described on the Sanger Institute's website (http://www.sanger
- *This column is required*.
4. **start** - The one-based starting position of feature on **seqname**.
- *This column is required*.
- *BEDTools accounts for the fact the GFF uses a one-based position and BED uses a zero-based start position*.
- *bedtools accounts for the fact the GFF uses a one-based position and BED uses a zero-based start position*.
5. **end** - The one-based ending position of feature on **seqname**.
- *This column is required*.
6. **score** - A score assigned to the GFF feature. Like BED format, BEDTools allows any string to be stored in this field in order to allow greater flexibility in annotation features. We note that this differs from the GFF definition in the interest of flexibility.
6. **score** - A score assigned to the GFF feature. Like BED format, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features. We note that this differs from the GFF definition in the interest of flexibility.
- *This column is required*.
7. **strand** - Defines the strand. Use '+', '-' or '.'
- *This column is required*.
......@@ -167,13 +169,13 @@ An entry from an example GFF file :
----------------------
4.1.3 GFF format
----------------------
Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of
------------------------
*Genome* file format
------------------------
Some of the bedtools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of
the chromosomes for the organism for which your BED files are based. When using the UCSC Genome
Browser, Ensembl, or Galaxy, you typically indicate which species/genome build you are
working with. The way you do this for BEDTools is to create a "genome" file, which simply lists the names of
working with. The way you do this for bedtools is to create a "genome" file, which simply lists the names of
the chromosomes (or scaffolds, etc.) and their size (in basepairs).
......@@ -185,29 +187,28 @@ Genome files must be **tab-delimited** and are structured as follows (this is an
chrX 17718854
chrM 13794
BEDTools includes pre-defined genome files for human and mouse in the **/genomes** directory included
in the BEDTools distribution.
bedtools includes pre-defined genome files for human and mouse in the **/genomes** directory included
in the bedtools distribution.
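For organisms without a pre-defined genome file, one common approach (an assumption here, not something this manual prescribes) is to take the first two columns of a FASTA index (``.fai``), which hold each sequence name and its length. The sketch below uses a stand-in index whose names and sizes come from the example above; the remaining index columns are made-up values.

```bash
# Stand-in FASTA index: name, length, then three bookkeeping columns
# (offset, bases per line, bytes per line) filled with made-up values.
printf 'chrX\t17718854\t6\t60\t61\nchrM\t13794\t18014181\t60\t61\n' > example.fa.fai

# A genome file needs only the name and the size, tab-delimited.
cut -f 1,2 example.fa.fai > example.genome

# Show the resulting genome file.
cat example.genome
```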
----------------------
4.1.5 SAM/BAM format
SAM/BAM format
----------------------
The SAM / BAM format is a powerful and widely-used format for storing sequence alignment data (see
http://samtools.sourceforge.net/ for more details). It has quickly become the standard format to which
most DNA sequence alignment programs write their output. Currently, the following BEDTools
support inout in BAM format: *intersectBed, windowBed, coverageBed, genomeCoverageBed,
pairToBed, bamToBed*. Support for the BAM format in BEDTools allows one to (to name a few):
most DNA sequence alignment programs write their output. Currently, the following bedtools
support input in BAM format: ``intersect``, ``window``, ``coverage``, ``genomecov``,
``pairtobed``, ``bamtobed``. Support for the BAM format in bedtools allows one to (to name a few):
compare sequence alignments to annotations, refine alignment datasets, screen for potential mutations
and compute aligned sequence coverage.
The details of how these tools work with BAM files are addressed in **Section 5** of this manual.
----------------------
4.1.6 VCF format
VCF format
----------------------
The Variant Call Format (VCF) was conceived as part of the 1000 Genomes Project as a standardized
means to report genetic variation calls from SNP, INDEL and structural variant detection programs
(see http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0 for details).
BEDTools now supports the latest version of this format (i.e, Version 4.0). As a result, BEDTools can
bedtools now supports the latest version of this format (i.e., Version 4.0). As a result, bedtools can
be used to compare genetic variation calls with other genomic features.
......@@ -2,24 +2,28 @@
Installation
############
BEDTools is intended to run in a "command line" environment on UNIX, LINUX and Apple OS X
operating systems. Installing BEDTools involves downloading the latest source code archive followed by
compiling the source code into binaries on your local system. The following commands will install
BEDTools in a local directory on a NIX or OS X machine. Note that the **"<version>"** refers to the
latest posted version number on http://bedtools.googlecode.com/.
``bedtools`` is intended to run in a "command line" environment on UNIX, Linux
and Apple OS X operating systems. Installing ``bedtools`` involves downloading the
latest source code archive followed by compiling the source code into binaries
on your local system. The following commands will install ``bedtools`` in a
local directory on a UNIX or OS X machine. Note that the **"<version>"**
refers to the latest posted version number on http://bedtools.googlecode.com/.
Note: *The BEDTools "makefiles" use the GCC compiler. One should edit the Makefiles accordingly if
one wants to use a different compiler.*::
.. note::
curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz
tar -zxvf BEDTools.tar.gz
cd BEDTools-<version>
make clean
make all
ls bin
The bedtools Makefiles utilize the GCC compiler. One should edit the
Makefiles accordingly if one wants to use a different compiler.
.. code-block:: bash
$ curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz
$ tar -zxvf BEDTools.tar.gz
$ cd BEDTools-<version>
$ make
At this point, one should copy the binaries in BEDTools/bin/ to either /usr/local/bin/ or some
other repository for commonly used UNIX tools in your environment. You will typically require
administrator (e.g. "root" or "sudo") privileges to copy to /usr/local/bin/. If in doubt, contact your
At this point, one should copy the binaries in ./bin/ to either
``/usr/local/bin/`` or some other repository for commonly used UNIX tools in
your environment. You will typically require administrator (e.g. "root" or
"sudo") privileges to copy to ``/usr/local/bin/``. If in doubt, contact your
system administrator for help.
This diff is collapsed.
......@@ -3,54 +3,67 @@ Quick start
###########
================
Install BEDTools
Install bedtools
================
::
.. code-block:: bash
curl http://bedtools.googlecode.com/files/BEDTools.<version>.tar.gz > BEDTools.tar.gz
tar -zxvf BEDTools.tar.gz
cd BEDTools
make clean
make all
make
sudo cp bin/* /usr/local/bin/
===============
Use BEDTools
Use bedtools
===============
Below are examples of typical BEDTools usage. **Additional usage examples are described in
section 6 of this manual.** Using the "-h" option with any BEDTools will report a list of all command
line options.
Below are examples of typical bedtools usage. Using the "-h" option with any
bedtools will report a list of all command line options.
Report the base-pair overlap between the features in two BED files.
.. code-block:: bash
bedtools intersect -a reads.bed -b genes.bed
A. Report the base-pair overlap between the features in two BED files.
::
Report those entries in A that overlap NO entries in B. Like "grep -v"
intersectBed -a reads.bed -b genes.bed
.. code-block:: bash
B. Report those entries in A that overlap NO entries in B. Like "grep -v"
::
bedtools intersect -a reads.bed -b genes.bed -v
intersectBed -a reads.bed -b genes.bed -v
C. Read BED A from stdin. Useful for stringing together commands. For example, find genes that overlap LINEs
but not SINEs.
::
Read BED A from STDIN. Useful for stringing together commands. For example,
find genes that overlap LINEs but not SINEs.
intersectBed -a genes.bed -b LINES.bed | intersectBed -a stdin -b SINEs.bed -v
.. code-block:: bash
D. Find the closest ALU to each gene.
::
bedtools intersect -a genes.bed -b LINES.bed | \
bedtools intersect -a stdin -b SINEs.bed -v
closestBed -a genes.bed -b ALUs.bed
Find the closest ALU to each gene.
.. code-block:: bash
bedtools closest -a genes.bed -b ALUs.bed
E. Merge overlapping repetitive elements into a single entry, returning the number of entries merged.
::
mergeBed -i repeatMasker.bed -n
Merge overlapping repetitive elements into a single entry, returning the number of entries merged.
.. code-block:: bash
bedtools merge -i repeatMasker.bed -n
F. Merge nearby repetitive elements into a single entry, so long as they are within 1000 bp of one another.
::
mergeBed -i repeatMasker.bed -d 1000
Merge nearby repetitive elements into a single entry, so long as they are within 1000 bp of one another.
.. code-block:: bash
bedtools merge -i repeatMasker.bed -d 1000
......
......@@ -2,14 +2,16 @@
**bedtools**: *a powerful toolset for genome arithmetic*
================================================================
=================
Overview
=================
.. todo::
Brief paragraph of the software.
Collectively, the **bedtools** utilities are a Swiss Army knife of tools
for a wide range of genomics analysis tasks. The most widely-used
tools enable *genome arithmetic*: that is, set theory on the genome. For
example, **bedtools** allows one to *intersect*, *merge*, *count*, *complement*,
and *shuffle* genomic intervals from multiple files in widely-used
genomic file formats such as BAM, BED, GFF/GTF, and VCF.
While each individual tool is designed to do a relatively simple task (e.g.,
*intersect* two interval files), quite sophisticated analyses can be conducted
by combining multiple bedtools operations on the UNIX command line.
=================
Table of contents
......@@ -25,11 +27,51 @@ Table of contents
content/bedtools-suite
content/example-usage
content/advanced-usage
content/FAQ
content/tips-and-tricks
content/faq
=================
Brief example
=================
Let's imagine you have BED files of ChIP-seq peaks from two different
experiments. You want to identify peaks that were observed in *both* experiments
(requiring 50% reciprocal overlap) and, for those peaks, you want to
find the closest, non-overlapping gene. Such an analysis could be conducted
with two relatively simple bedtools commands.
.. code-block:: bash
# intersect the peaks from both experiments.
# -f 0.50 combined with -r requires 50% reciprocal overlap between the
# peaks from each experiment.
$ bedtools intersect -a exp1.bed -b exp2.bed -f 0.50 -r > both.bed
# find the closest, non-overlapping gene for each interval where
# both experiments had a peak
# -io ignores overlapping intervals and returns only the closest,
# non-overlapping interval (in this case, genes)
$ bedtools closest -a both.bed -b genes.bed -io > both.nearest.genes.txt
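To make the 50% reciprocal overlap requirement concrete, the arithmetic can be worked through by hand for a pair of intervals. The ``awk`` sketch below is only an illustration of that arithmetic on toy coordinates, under the assumption that "at least 50%" means the overlap fraction is greater than or equal to 0.50 for both intervals; it is not how bedtools is implemented, and the file names are hypothetical.

```bash
# Each row holds: startA endA startB endB (toy coordinates, same chromosome).
printf '100\t200\t150\t250\n100\t200\t180\t400\n' > pairs.txt

awk -v f=0.50 'BEGIN {OFS="\t"} {
    s  = ($1 > $3) ? $1 : $3      # larger of the two starts
    e  = ($2 < $4) ? $2 : $4      # smaller of the two ends
    ov = (e > s) ? e - s : 0      # overlap length in bp
    fa = ov / ($2 - $1)           # fraction of interval A that is overlapped
    fb = ov / ($4 - $3)           # fraction of interval B that is overlapped
    print $0, ov, ((fa >= f && fb >= f) ? "pass" : "fail")
}' pairs.txt > reciprocal.txt

# Show each pair with its overlap length and verdict.
cat reciprocal.txt
```

The first pair shares 50 bp, which is half of each 100 bp interval, so it passes. The second pair shares only 20 bp, so it fails on both sides.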
==========
License
==========
bedtools is freely available under a GNU General Public License (Version 2).
=====================================
Contributors.
=====================================
As open-source software, BEDTools greatly benefits from contributions made by other developers and
users of the tools. We encourage and welcome suggestions, contributions and complaints. This is how
software matures, improves and stays on top of the needs of its user community. The Google Code
(GC) site maintains a list of individuals who have contributed either source code or useful ideas for
improving the tools. In the near future, we hope to maintain a source repository on the GC site in
order to facilitate further contributions. We are currently unable to do so because we use Git for
version control, which is not yet supported by GC.
=================
Mailing list
=================
Refer to the mailing list.
If you have questions, requests, or bugs to report, please email the
`bedtools mailing list <https://groups.google.com/forum/?fromgroups#!forum/bedtools-discuss>`_