bedtools - a Swiss Army knife for genome arithmetic
===================================================

[Download current version](https://github.com/arq5x/bedtools2/releases/latest)

Note
-------
Stable releases of bedtools were formerly archived on Google Code. Because the Google Code
downloads facility is shutting down, all source code and stable releases will henceforth be
maintained in this GitHub repository.

**Full documentation**:  http://bedtools.readthedocs.org

Summary
-------
Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic, that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF.
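
To make "set theory on the genome" concrete, here is a minimal Python sketch of two such operations on plain tuples. It illustrates the idea only and is not how bedtools is implemented; the names `Interval`, `merge`, and `intersect` are hypothetical, coordinates are assumed to be 0-based and half-open as in BED, and joining intervals that merely touch is a choice made for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared portion of every overlapping pair (one from a, one from b).

    Brute force over all pairs: simple to read, quadratic in time.
    """
    shared: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # a non-empty overlap
                shared.append((chrom_a, start, end))
    return shared
```

For example, merging `("chr1", 10, 20)` with `("chr1", 15, 30)` yields the single interval `("chr1", 10, 30)`.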

While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
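
The value of composing simple operations can be sketched in Python as well. The example below chains a merge with a complement to find regions covered by neither of two interval sets. It is an illustration under stated assumptions, not bedtools code: the function names and the `chrom_sizes` mapping are hypothetical, coordinates are 0-based and half-open, and every chromosome that appears in the input is assumed to have an entry in `chrom_sizes`.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chrom, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def complement(merged: List[Interval], chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Return the gaps left by sorted, non-overlapping intervals.

    Assumes every chromosome in `merged` is a key of `chrom_sizes`.
    """
    gaps: List[Interval] = []
    cursor = {chrom: 0 for chrom in chrom_sizes}  # first uncovered position so far
    for chrom, start, end in merged:
        if start > cursor[chrom]:
            gaps.append((chrom, cursor[chrom], start))
        cursor[chrom] = max(cursor[chrom], end)
    for chrom, size in chrom_sizes.items():
        if cursor[chrom] < size:  # trailing gap up to the chromosome end
            gaps.append((chrom, cursor[chrom], size))
    return sorted(gaps)


def uncovered(a: List[Interval], b: List[Interval],
              chrom_sizes: Dict[str, int]) -> List[Interval]:
    """Compose the two steps: pool both sets, merge, then complement."""
    return complement(merge(a + b), chrom_sizes)
```

Each function stays small and single-purpose; the analysis comes from the composition in `uncovered`, which mirrors how simple tools are chained on the command line.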


Performance
-----------
As of version 2.18, ``bedtools`` is substantially more scalable thanks to improvements in the algorithm used to process datasets that are pre-sorted
by chromosome and start position. As the plots below show, speed and memory consumption scale well
with sorted data, in contrast to the poor scaling for unsorted data. The current version of ``bedtools intersect`` is as fast as (or slightly faster than) the ``bedops`` package's ``bedmap``, which uses a similar algorithm for sorted data. The plots represent counting the number of intersecting alignments from exome capture BAM files against CCDS exons.
The alignments have been converted to BED to facilitate comparisons to ``bedops``. We compare to the ``bedmap`` ``--ec`` option because similar error checking is enforced by ``bedtools``.
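
Why pre-sorted input helps can be illustrated with a generic sweep over two start-sorted lists. The Python sketch below is not the algorithm bedtools actually uses; it is a simplified, single-chromosome illustration in which `count_overlaps_sorted` and its parameters are hypothetical names. Because both inputs are sorted by start, each database interval is read once, and only the intervals that can still overlap a later query are kept in memory.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) on one chromosome, half-open.
Interval = Tuple[int, int]


def count_overlaps_sorted(queries: List[Interval],
                          database: List[Interval]) -> List[int]:
    """Count database intervals overlapping each query.

    Both lists must already be sorted by start position.
    """
    counts: List[int] = []
    active: List[Interval] = []  # database intervals that may still overlap
    next_db = 0                  # index of the next unread database interval

    for q_start, q_end in queries:
        # Read database intervals that start before this query ends.
        while next_db < len(database) and database[next_db][0] < q_end:
            active.append(database[next_db])
            next_db += 1

        # Discard intervals ending at or before this query's start: later
        # queries start no earlier, so these can never overlap again.
        active = [iv for iv in active if iv[1] > q_start]

        # An earlier, longer query may have pulled in intervals that start
        # after this query ends, so the start is checked again here.
        counts.append(sum(1 for start, _ in active if start < q_end))

    return counts
```

With unsorted input no interval can ever be safely discarded, so a general-purpose index over the whole dataset is needed instead, which is consistent with the memory behavior described above.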


**Note:** With 100 million alignments, bedtools could not complete when using the R-Tree algorithm applied to unsorted data.


![Speed Comparison](http://bedtools.readthedocs.org/en/latest/_images/speed-comparo.png)
![Memory Comparison](http://bedtools.readthedocs.org/en/latest/_images/memory-comparo.png)


Details
-------
First created through urgency and adrenaline by Aaron Quinlan in Spring 2009.
Maintained by the Quinlan Laboratory at the University of Virginia.

1. **Lead developers**:           Aaron Quinlan, Neil Kindlon
2. **Significant contributions**: Assaf Gordon, Royden Clark, John Marshall, Brent Pedersen, Ryan Dale
3. **Repository**:                https://github.com/arq5x/bedtools2
4. **Stable releases**:           https://github.com/arq5x/bedtools2/releases
5. **Documentation**:             http://bedtools.readthedocs.org
6. **License**:                   Released under the GNU General Public License version 2 (GPL v2).


Citation
--------
*Please cite the following article if you use BEDTools in your research*:
  * Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842. 

Also, if you use *pybedtools*, please cite the following.
  * Dale RK, Pedersen BS, and Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics (2011). doi:10.1093/bioinformatics/btr539