1. The ``coverage`` tool now takes advantage of pre-sorted intervals via the ``-sorted`` option. This allows the ``coverage`` tool to be much faster, use far less memory, and report coverage for intervals in their original order in the input file.
2. We have changed the behavior of the ``coverage`` tool such that it is consistent with the other tools. Specifically, coverage is now computed for the intervals in the A file based on the overlaps with the B file, rather than vice versa.
3. The ``subtract`` tool now supports pre-sorted data via the ``-sorted`` option and is therefore much faster and more scalable.
4. The ``-nonamecheck`` option provides greater tolerance for chromosome labeling when using the ``-sorted`` option.
5. Added support for multiple SVLEN tags in VCF format, and fixed a bug in which SVLEN tags at the end of a VCF INFO field were not processed.
6. Support for reverse complementing IUPAC codes in the ``getfasta`` tool.
7. Provided greater flexibility for "BED+" files, where the first 3 columns are chrom, start, and end, and the remaining columns are free-form.
8. We now detect stale FAI files and recreate an index thanks to a fix from @gtamazian.
9. New feature from Pierre Lindenbaum allowing the ``sort`` tool to sort files based on the chromosome order in a ``faidx`` file.
10. Eliminated multiple compilation warnings thanks to John Marshall.
11. Fixed bug in handling INS variants in VCF files.
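Item 2 above changes which file coverage is reported for. As a rough illustration of the new direction (one summary row per A interval, computed from its overlaps with B), here is a minimal Python sketch. It is not the bedtools implementation: the function name ``coverage_of_a``, the tuple layout, and the choice of summary columns are assumptions made for the example, and intervals are treated as 0-based, half-open BED coordinates.

```python
def coverage_of_a(a_intervals, b_intervals):
    """Summarize, for each A interval, how much of it is covered by B.

    Hypothetical helper for illustration only. Each interval is a
    (chrom, start, end) tuple with 0-based, half-open coordinates.
    """
    rows = []
    for chrom, start, end in a_intervals:
        covered = set()   # positions in A touched by at least one B feature
        n_overlaps = 0    # number of B features overlapping this A interval
        for b_chrom, b_start, b_end in b_intervals:
            if b_chrom != chrom:
                continue
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:
                n_overlaps += 1
                covered.update(range(lo, hi))
        length = end - start
        fraction = len(covered) / length if length else 0.0
        rows.append((chrom, start, end, n_overlaps, len(covered), length, fraction))
    return rows


a = [("chr1", 0, 100), ("chr1", 200, 300)]
b = [("chr1", 10, 20), ("chr1", 15, 40), ("chr2", 0, 50)]
rows = coverage_of_a(a, b)
for row in rows:
    print(*row, sep="\t")
```

The rows come back in the same order as the A intervals, which is the property the ``-sorted`` note in item 1 refers to. A real run over large files would use a sweep over sorted input rather than the nested loops shown here.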
Version 2.23.0 (22-Feb-2015)
============================
1. Added ``-k`` option to the closest tool to report the k-closest features in one or more -b files.
2. Added ``-fd`` option to the closest tool for the reporting of downstream features in one or more -b files. Requires -D to dictate how "downstream" should be defined.
3. Added ``-fu`` option to the closest tool for the reporting of upstream features in one or more -b files. Requires -D to dictate how "upstream" should be defined.
4. Pierre Lindenbaum added a new split tool that will split an input file into multiple sub files. Unlike UNIX split, it can balance the chunking of the sub files not just by number of lines, but also by total number of base pairs in each sub file.
5. Added a new spacing tool that reports the distances between features in a file.
6. Jay Hesselberth added a ``-reverse`` option to the makewindows tool that reverses the order of the assigned window numbers.
7. Fixed a bug that caused incorrect reporting of overlap for zero-length BED records. Thanks to @roryk.
8. Fixed a bug that caused the map tool to not allow ``-b`` to be specified before ``-a``. Thanks to @semenko.
9. Fixed a bug in ``makewindows`` that mistakenly required ``-s`` with ``-n``.
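To make the idea behind the ``-k`` option in item 1 concrete, the sketch below picks the k nearest features to a query interval. It is only an illustration under stated assumptions: ``gap`` and ``k_closest`` are hypothetical names, distance is defined here as the number of bases separating two half-open intervals (so overlapping and directly adjacent features both get 0), and strand, ties, and the ``-D`` orientation rules are ignored. The distance convention used by bedtools itself may differ.

```python
def gap(a_start, a_end, b_start, b_end):
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    if b_start >= a_end:
        return b_start - a_end
    if a_start >= b_end:
        return a_start - b_end
    return 0


def k_closest(query, features, k):
    """Return the k features nearest to `query` on the same chromosome.

    Hypothetical helper: `query` and each feature are (chrom, start, end).
    """
    chrom, q_start, q_end = query
    same_chrom = [f for f in features if f[0] == chrom]
    # sorted() is stable, so equally distant features keep their input order.
    ranked = sorted(same_chrom, key=lambda f: gap(q_start, q_end, f[1], f[2]))
    return ranked[:k]


query = ("chr1", 100, 200)
features = [
    ("chr1", 10, 50),     # 50 bases upstream of the query
    ("chr1", 150, 160),   # overlaps the query
    ("chr1", 230, 300),   # 30 bases downstream of the query
    ("chr2", 100, 200),   # different chromosome, never a candidate
]
nearest = k_closest(query, features, k=2)
print(nearest)
```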
Version 2.22.1 (01-Jan-2015)
============================
1. When using -sorted with intersect, map, and closest, bedtools can now detect and warn you when your input datasets employ different chromosome sorting orders.
2. Fixed multiple bugs in the new, faster closest tool. Specifically, the -iu, -id, and -D options were not behaving properly with the new "sweeping" algorithm that was implemented for the 2.22.0 release. Many thanks to Sol Katzman for reporting these issues and for providing a detailed analysis and example files.
3. We FINALLY wrote proper documentation for the closest tool (http://bedtools.readthedocs.org/en/latest/content/tools/closest.html)
4. Fixed bug in the tag tool when using -intervals, -names, or -scores. Thanks to Yarden Katz for reporting this.
5. Fixed issues with chromosome boundaries in the slop tool when using negative distances. Thanks to @acdaugherty!
6. Multiple improvements to the fisher tool. Added a -m option to the fisher tool to merge overlapping intervals prior to comparing overlaps between two input files. Thanks to @brentp.
7. Fixed a bug in the makewindows tool requiring the use of -b with -s.
8. Fixed a bug in intersect that prevented -split from detecting complete overlaps with -f 1. Thanks to @tleonardi.
9. Restored the default decimal precision to the groupby tool.
10. Added the -prec option to the merge and map tools to specify the decimal precision of the output.
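Item 1 above concerns inputs that are each sorted, but by different chromosome orderings (for example, lexicographic versus numeric). The following Python sketch shows one way such a mismatch could be detected. It is an assumption-laden illustration, not the check bedtools performs: ``chrom_order`` and ``orders_agree`` are hypothetical names, and records are assumed to be already grouped by chromosome.

```python
def chrom_order(records):
    """Chromosome names in order of first appearance in a list of records."""
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(rec[0] for rec in records))


def orders_agree(order_a, order_b):
    """True if the chromosomes shared by both files appear in the same relative order."""
    shared = set(order_a) & set(order_b)
    return [c for c in order_a if c in shared] == [c for c in order_b if c in shared]


# File A sorted lexicographically, file B sorted numerically.
file_a = [("chr1", 0, 10), ("chr10", 0, 10), ("chr2", 0, 10)]
file_b = [("chr1", 5, 15), ("chr2", 5, 15), ("chr10", 5, 15)]

order_a = chrom_order(file_a)
order_b = chrom_order(file_b)
if not orders_agree(order_a, order_b):
    print("WARNING: inputs use different chromosome sort orders:", order_a, order_b)
```

With a mismatch like this, a sweep that assumes a shared ordering can silently miss overlaps, which is why a warning is useful.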
Version 2.22.0 (12-Nov-2014)
============================
1. The "closest" tool now requires sorted files, but this requirement now enables it to simultaneously find the closest intervals from many (not just one) files.
2. We now have proper support for "imprecise" SVs in VCF format. This addresses a long-standing (sorry) limitation in the way bedtools handles VCF files.
Version 2.21.0 (18-Sep-2014)
============================
1. Added ability to intersect against multiple `-b` files in the `intersect` tool.
**-s** Force strandedness. That is, features in A are only counted towards coverage in B if they are on the same strand. *By default, this is disabled and coverage is counted without respect to strand*.
**-hist** Report a histogram of coverage for each feature in B as well as a summary histogram for _all_ features in B.
| Output (tab delimited) after each feature in B:
| 1) depth
| 2) # bases at depth
| 3) size of B
| 4) % of B at depth
**-d** Report the depth at each position in each B feature. Positions reported are one based. Each position and depth follow the complete B feature.
**-split** Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the blockCount, blockSizes, and blockStarts fields (i.e., columns 10, 11, and 12).
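The ``-d`` and ``-hist`` rows above describe two views of the same per-base depth data. The Python sketch below illustrates how they relate. It is not bedtools code: ``per_base_depth`` and ``depth_histogram`` are hypothetical names, coordinates are assumed to be 0-based and half-open, strand is ignored, and the last histogram column is emitted as a fraction rather than a percentage.

```python
from collections import Counter


def per_base_depth(feature, intervals):
    """Depth at every position of `feature`, counting overlapping `intervals`."""
    chrom, start, end = feature
    depth = [0] * (end - start)
    for i_chrom, i_start, i_end in intervals:
        if i_chrom != chrom:
            continue
        # Only the part of the interval inside the feature contributes.
        for pos in range(max(start, i_start), min(end, i_end)):
            depth[pos - start] += 1
    return depth


def depth_histogram(depth):
    """Rows of (depth, bases at depth, feature size, fraction of feature at depth)."""
    size = len(depth)
    counts = Counter(depth)
    return [(d, n, size, n / size) for d, n in sorted(counts.items())]


feature = ("chr1", 0, 10)
intervals = [("chr1", 0, 5), ("chr1", 3, 8), ("chr2", 0, 10)]

depth = per_base_depth(feature, intervals)
# -d style: one-based position within the feature, then depth.
for position, d in enumerate(depth, start=1):
    print(*feature, position, d, sep="\t")

hist = depth_histogram(depth)
# -hist style: depth, bases at depth, feature size, fraction at depth.
for row in hist:
    print(*feature, *row, sep="\t")
```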
**-a** BAM/BED/GFF/VCF file "A". Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe.
**-b** One or more BAM/BED/GFF/VCF file(s) "B". Use "stdin" if passing B with a UNIX pipe.
**NEW!!!**: -b may be followed with multiple databases and/or wildcard (*) character(s).
**-abam** BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe. For example: ``samtools view -b <BAM> | bedtools intersect -abam stdin -b genes.bed``. **Note**: no longer necessary after version 2.19.0.
**-hist** Report a histogram of coverage for each feature in A as well as a summary histogram for _all_ features in A.
| Output (tab delimited) after each feature in A:
| 1) depth
| 2) # bases at depth
| 3) size of A
| 4) % of A at depth
**-d** Report the depth at each position in each A feature. Positions reported are one based. Each position and depth follow the complete A feature.
**-counts** Only report the count of overlaps, don't compute fraction, etc. Restricted by -f and -r.
**-f** Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).
**-r** Require that the fraction of overlap be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B.
**-s** Force "strandedness". That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
**-S** Require different strandedness. That is, only report hits in B that overlap A on the _opposite_ strand. By default, overlaps are reported without respect to strand.
**-split** Treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals.
**-sorted** For very large B files, invoke a "sweeping" algorithm that requires position-sorted (e.g., ``sort -k1,1 -k2,2n`` for BED files) input. When using -sorted, memory usage remains low even for very large files.
**-g** Specify a genome file that defines the expected chromosome order in the input files for use with the ``-sorted`` option.
**-header** Print the header from the A file prior to results.
**-sortout** When using *multiple databases* (`-b`), sort the output DB hits for each record.
**-nobuf** Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time.
**-iobuf** Follow with desired integer size of read buffer. Optional suffixes `K/M/G` supported. **Note**: currently has no effect with compressed files.
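The interaction between ``-f`` and ``-r`` described above is easy to get wrong, so here is a small Python sketch of the rule as stated in the option descriptions. It is an illustration only: ``passes_overlap`` is a hypothetical name, both intervals are assumed to be on the same chromosome, and coordinates are assumed to be 0-based and half-open.

```python
def passes_overlap(a, b, min_frac=1e-9, reciprocal=False):
    """Apply the -f / -r rule to two (start, end) intervals on one chromosome.

    min_frac   plays the role of -f (minimum overlap as a fraction of A).
    reciprocal plays the role of -r (also require that fraction of B).
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False
    if overlap / (a_end - a_start) < min_frac:
        return False
    if reciprocal and overlap / (b_end - b_start) < min_frac:
        return False
    return True


a = (0, 100)
long_b = (10, 1000)   # covers 90% of A, but A covers only about 9% of long_b
tight_b = (5, 100)    # covers 95% of A, and A covers 100% of tight_b

print(passes_overlap(a, long_b, min_frac=0.90))                    # -f 0.90
print(passes_overlap(a, long_b, min_frac=0.90, reciprocal=True))   # -f 0.90 -r
print(passes_overlap(a, tight_b, min_frac=0.90, reciprocal=True))  # -f 0.90 -r
```

With the default ``min_frac`` of 1E-9, any overlap of at least one base passes, which matches the "i.e. 1bp" note on ``-f``.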