Commit 47c6a2ec authored by arq5x

tweaks for 2.20.0

parent 69eb4eaa
bedtools - a swiss army knife for genome arithmetic
===================================================
**Current version**: 2.20.0
Note
-------
Version 2.18.2 (16-Dec-2013)
bedtools. The changes to bedtools reflect fixes to compilation errors, performance enhancements for smaller files, and a bug fix for BAM files that lack a formal header. Our current focus for the 2.19.* release is on addressing some outstanding bugs and enhancement requests, and on updating some of the other widely used tools (e.g., coverage, map, and subtract) to use the new API. We will also continue to look into ways to improve performance while hopefully reducing memory usage for algorithms that work with unsorted data (thanks to Ian Sudberry for the ping!).
pybedtools. Ryan Dale has updated pybedtools to accommodate bedtools 2.18.*, added unit tests, and provided new functionality and bug fixes. The details for this release are here:
http://pythonhosted.org/pybedtools/changes.html
Version 2.18.1 (16-Dec-2013)
Fixes that address compilation errors with Clang and force compilation of the custom BamTools library.
@@ -51,9 +51,9 @@ copyright = u'2009 - 2013, Aaron R. Quinlan'
# built documents.
#
# The short X.Y version.
version = '2.20.0'
# The full version, including alpha/beta/rc tags.
release = '2.20.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -54,6 +54,9 @@ Option Description
**-b** BED/GFF/VCF file B. Use "stdin" if passing B with a UNIX pipe.
**-f** Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).
**-r** Require that the fraction of overlap be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, B must overlap at least 90% of A, and A must also overlap at least 90% of B.
**-s** Force "strandedness". That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
**-S** Require different strandedness. That is, only report hits in B that overlap A on the *opposite* strand. By default, overlaps are reported without respect to strand.
**-split** Treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals.
=========================== =========================================================================================================================================================
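To make the ``-f`` and ``-r`` semantics concrete, here is a minimal C++ sketch of the overlap-fraction test. It is not taken from the bedtools source: the ``Interval`` struct and the ``overlapBases`` and ``passesOverlap`` functions are hypothetical, and intervals are assumed to be zero-based and half-open, as in BED.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical BED-style interval: zero-based start, half-open end.
struct Interval {
    int64_t start;
    int64_t end;
};

// Number of bases shared by a and b (0 if they do not overlap;
// book-ended intervals such as [0,100) and [100,200) share no bases).
int64_t overlapBases(const Interval& a, const Interval& b) {
    int64_t ov = std::min(a.end, b.end) - std::max(a.start, b.start);
    return ov > 0 ? ov : 0;
}

// Sketch of the -f / -r test: the overlap must cover at least fraction f of A;
// when reciprocal is true, it must also cover at least fraction f of B.
bool passesOverlap(const Interval& a, const Interval& b, double f, bool reciprocal) {
    assert(a.end > a.start && b.end > b.start);  // assume non-empty intervals
    int64_t ov = overlapBases(a, b);
    if (ov == 0) return false;
    double fracA = static_cast<double>(ov) / (a.end - a.start);
    if (fracA < f) return false;
    if (!reciprocal) return true;
    double fracB = static_cast<double>(ov) / (b.end - b.start);
    return fracB >= f;
}
```

For example, a 100 bp A lying entirely inside a 1,000 bp B passes ``-f 0.90`` on its own, but fails once ``-r`` is added, because the overlap covers only 10% of B.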
@@ -87,8 +87,6 @@ Option Description
| **collapse** (i.e., print a comma-separated list) - *numeric or text*
| **distinct** (i.e., print a comma-separated list of unique values) - *numeric or text*
| **concat** (i.e., print a comma-separated list) - *numeric or text*
|
| ``Default: 5``
**-f** Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).
**-r** Require that the fraction of overlap be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, B must overlap at least 90% of A, and A must also overlap at least 90% of B.
@@ -33,28 +33,42 @@ Usage and option summary
**Usage**:
::

  bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>

**(or)**:
::

  mergeBed [OPTIONS] -i <BED/GFF/VCF/BAM>

=========================== ===============================================================================================================================================================================================================
Option Description
=========================== ===============================================================================================================================================================================================================
**-s** Force strandedness. That is, only merge features that are the same strand. *By default, this is disabled*.
**-S** Force merge for one specific strand only. Follow with + or - to force merge from only the forward or reverse strand, respectively. *By default, merging is done without respect to strand*.
**-d** Maximum distance between features allowed for features to be merged. *Default is 0. That is, overlapping and/or book-ended features are merged*.
**-c** Specify columns from the input file to operate upon (see -o option, below). Multiple columns can be specified in a comma-delimited list.
**-o** | Specify the operation that should be applied to ``-c``.
| Valid operations:
| sum, min, max, absmin, absmax,
| mean, median,
| collapse (i.e., print a delimited list; duplicates allowed),
| distinct (i.e., print a delimited list; NO duplicates allowed),
| count,
| count_distinct (i.e., a count of the unique values in the column)
| **Default:** sum
| Multiple operations can be specified in a comma-delimited list.
| If there is only one column, but multiple operations, all operations will be
| applied to that column. Likewise, if there is only one operation, but
| multiple columns, that operation will be applied to all columns.
| Otherwise, the number of columns must match the number of operations,
| and the operations will be applied to the columns in respective order.
|
| E.g., ``-c 5,4,6 -o sum,mean,count`` will give the sum of column 5,
| the mean of column 4, and the count of column 6.
| The order of output columns will match the ordering given in the command.
**-header** | Print the header from the input (-i) file prior to results.
**-delim** | Specify a custom delimiter for the ``collapse`` and ``distinct`` operations (see ``-o``).
| Example: ``-delim "|"``
@@ -103,26 +117,31 @@ The ``-s`` option will only merge intervals that are overlapping/bookended
chr1 501 1000 +
chr1 250 500 -
==========================================================================
``-S`` Reporting merged intervals on a specific strand.
==========================================================================
The ``-S`` option will only merge intervals for a specific strand. For example,
to only report merged intervals on the "+" strand:

.. code-block:: bash

  $ cat A.bed
  chr1 100 200 a1 1 +
  chr1 180 250 a2 2 +
  chr1 250 500 a3 3 -
  chr1 501 1000 a4 4 +

  $ bedtools merge -i A.bed -S +
  chr1 100 250
  chr1 501 1000

To also report the strand, you could use the ``-c`` and ``-o`` operators (see below for more details):

.. code-block:: bash

  $ bedtools merge -i A.bed -S + -c 6 -o distinct
  chr1 100 250 +
  chr1 501 1000 +

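The strand filter followed by an ordinary merge can be sketched in C++ as below. This is an illustration under stated assumptions, not bedtools' implementation: ``Feature``, ``Merged`` and ``mergeOneStrand`` are hypothetical names, input is assumed to be sorted by chrom and then start, and overlapping or book-ended features are merged (the default ``-d 0`` behavior).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record type holding only the fields this sketch needs.
struct Feature {
    std::string chrom;
    int64_t start;  // zero-based
    int64_t end;    // half-open
    char strand;    // '+' or '-'
};

struct Merged {
    std::string chrom;
    int64_t start;
    int64_t end;
};

// Sketch of "merge -S <strand>": skip features on the other strand, then merge
// overlapping or book-ended ones. Assumes input sorted by chrom, then start.
std::vector<Merged> mergeOneStrand(const std::vector<Feature>& in, char strand) {
    assert(strand == '+' || strand == '-');
    std::vector<Merged> out;
    for (const auto& f : in) {
        if (f.strand != strand) continue;  // the other strand is ignored entirely
        if (!out.empty() && out.back().chrom == f.chrom && f.start <= out.back().end) {
            if (f.end > out.back().end) out.back().end = f.end;  // extend current block
        } else {
            out.push_back({f.chrom, f.start, f.end});  // start a new block
        }
    }
    return out;
}
```

Fed the four ``A.bed`` records from the ``-S`` example with strand ``'+'``, it returns chr1 100-250 and chr1 501-1000, the same intervals as ``bedtools merge -i A.bed -S +``; with ``'-'`` it returns the single interval chr1 250-500.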
==========================================================================
@@ -147,55 +166,87 @@ combined.
$ bedtools merge -i A.bed -d 1000
chr1 100 200 1000
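The role of ``-d`` can be illustrated with a short C++ sketch of a single sweep over sorted intervals. The ``Bed3`` struct and ``mergeWithDistance`` function are hypothetical, the gap is assumed to be the next feature's start minus the current block's end (zero-based, half-open coordinates), and bedtools' exact boundary handling may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical BED3-style record: zero-based start, half-open end.
struct Bed3 {
    std::string chrom;
    int64_t start;
    int64_t end;
};

// Sketch of merging with a maximum gap d: a feature joins the current block if
// it starts no more than d bases after the block ends. With d = 0, only
// overlapping or book-ended features are merged.
std::vector<Bed3> mergeWithDistance(const std::vector<Bed3>& in, int64_t d) {
    std::vector<Bed3> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Bed3& f = in[i];
        // Precondition: sorted by chrom, then start. This check only catches
        // starts that decrease within a chromosome, not revisited chromosomes.
        assert(i == 0 || in[i - 1].chrom != f.chrom || in[i - 1].start <= f.start);
        if (!out.empty() && out.back().chrom == f.chrom && f.start - out.back().end <= d) {
            if (f.end > out.back().end) out.back().end = f.end;  // extend the block
        } else {
            out.push_back(f);  // start a new block
        }
    }
    return out;
}
```

For the intervals chr1 100-200, 180-250, 250-500 and 501-1000, ``d = 0`` produces 100-500 and 501-1000, while ``d = 1`` also bridges the 1 bp gap and produces a single 100-1000 block. A single pass suffices only because the input is sorted, which is why the sort requirement matters.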
==========================================================================
``-c`` and ``-o`` Applying operations to columns from merged intervals.
==========================================================================
When merging intervals, we often want to summarize or keep track of the
values observed in specific columns (e.g., the feature name or score) from
the original, unmerged intervals. When used together, the ``-c`` and ``-o``
options allow one to select specific columns (``-c``) and apply an operation
(``-o``) to each column. The result is appended to the default, merged
interval output. For example, one could use the following to report the
number of intervals that were merged into each resulting interval (this
replaces the ``-n`` option that existed prior to version ``2.20.0``):

.. code-block:: bash

  $ cat A.bed
  chr1 100 200
  chr1 180 250
  chr1 250 500
  chr1 501 1000

  $ bedtools merge -i A.bed -c 1 -o count
  chr1 100 500 3
  chr1 501 1000 1

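Conceptually, each ``-o`` operation reduces the values that one ``-c`` column contributed to a merged interval to a single output field. The C++ sketch below covers a subset of the operations (``sum``, ``min``, ``max``, ``mean``, ``median``, ``collapse``, ``distinct``, ``count`` and ``count_distinct``). It is hypothetical rather than bedtools' code: ``applyOp`` and its helpers are illustrative names, the even-length median is assumed to be the mean of the two middle values, ``distinct`` keeps first-seen order, and number formatting is simplified, so output may not match bedtools character for character.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Join values with a comma, the default separator for collapse and distinct.
std::string joinComma(const std::vector<std::string>& v) {
    std::string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i > 0) out += ",";
        out += v[i];
    }
    return out;
}

// Default stream formatting prints 2.0 as "2" and 2.5 as "2.5".
std::string formatNumber(double x) {
    std::ostringstream os;
    os << x;
    return os.str();
}

// Hypothetical sketch: apply one -o operation to the values that a single -c
// column contributed to one merged interval.
std::string applyOp(const std::string& op, const std::vector<std::string>& vals) {
    assert(!vals.empty());  // a merged interval always has at least one source row
    if (op == "count") return std::to_string(vals.size());
    if (op == "collapse") return joinComma(vals);  // duplicates kept
    if (op == "distinct" || op == "count_distinct") {
        std::vector<std::string> uniq;
        std::unordered_set<std::string> seen;
        for (const auto& v : vals)
            if (seen.insert(v).second) uniq.push_back(v);  // first-seen order
        return op == "distinct" ? joinComma(uniq) : std::to_string(uniq.size());
    }
    // Every remaining operation treats the column as numeric.
    std::vector<double> x;
    for (const auto& v : vals) x.push_back(std::stod(v));
    double sum = std::accumulate(x.begin(), x.end(), 0.0);
    if (op == "sum") return formatNumber(sum);
    if (op == "mean") return formatNumber(sum / x.size());
    if (op == "min") return formatNumber(*std::min_element(x.begin(), x.end()));
    if (op == "max") return formatNumber(*std::max_element(x.begin(), x.end()));
    if (op == "median") {
        std::sort(x.begin(), x.end());
        std::size_t n = x.size();
        return formatNumber(n % 2 == 1 ? x[n / 2] : (x[n / 2 - 1] + x[n / 2]) / 2.0);
    }
    throw std::invalid_argument("unsupported operation: " + op);
}
```

For the score column of the merged interval chr1 100-500 (values 1, 2 and 3), ``applyOp("mean", ...)`` returns ``"2"``, consistent with the ``-c 5 -o mean`` example in this section.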
We could also use these options to report the mean of the score (#5) field:

.. code-block:: bash

  $ cat A.bed
  chr1 100 200 a1 1 +
  chr1 180 250 a2 2 +
  chr1 250 500 a3 3 -
  chr1 501 1000 a4 4 +

  $ bedtools merge -i A.bed -c 5 -o mean
  chr1 100 500 2
  chr1 501 1000 4

Let's get fancy and report the mean, min, and max of the score column:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 5 -o mean,min,max
  chr1 100 500 2 1 3
  chr1 501 1000 4 4 4

Let's also report a comma-separated list of the strands:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 5,5,5,6 -o mean,min,max,collapse
  chr1 100 500 2 1 3 +,+,-
  chr1 501 1000 4 4 4 +

Hopefully this provides a clear picture of what can be done.
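The rule for pairing ``-c`` columns with ``-o`` operations, as described in the option table, can be sketched as follows. ``pairColumnsWithOps`` is a hypothetical helper written for illustration; bedtools' actual argument handling and error messages may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper showing how the -c and -o lists are paired:
//  - one column, many operations -> every operation applies to that column;
//  - many columns, one operation -> that operation applies to every column;
//  - otherwise the lists must have equal length and are paired in order.
std::vector<std::pair<int, std::string>>
pairColumnsWithOps(const std::vector<int>& cols, const std::vector<std::string>& ops) {
    assert(!cols.empty() && !ops.empty());
    std::vector<std::pair<int, std::string>> pairs;
    if (cols.size() == 1) {
        for (const auto& op : ops) pairs.emplace_back(cols[0], op);
    } else if (ops.size() == 1) {
        for (int c : cols) pairs.emplace_back(c, ops[0]);
    } else if (cols.size() == ops.size()) {
        for (std::size_t i = 0; i < cols.size(); ++i) pairs.emplace_back(cols[i], ops[i]);
    } else {
        throw std::invalid_argument("number of -c columns must match number of -o operations");
    }
    return pairs;  // output columns follow this order
}
```

With ``-c 5,5,5,6 -o mean,min,max,collapse`` the lists have equal length and pair up in order; with ``-c 5 -o mean,min,max`` the single column is reused for every operation.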
==========================================================================
``-n`` Reporting the number of features that were merged
==========================================================================

.. deprecated:: 2.20.0
   See the ``-c`` and ``-o`` operators.

==========================================================================
``-nms`` Reporting the names of the features that were merged
==========================================================================

.. deprecated:: 2.20.0
   See the ``-c`` and ``-o`` operators.

==========================================================================
``-scores`` Reporting the scores of the features that were merged
==========================================================================

.. deprecated:: 2.20.0
   See the ``-c`` and ``-o`` operators.

==========================================================================
``-delim`` Change the delimiter for ``-c`` and ``-o``
==========================================================================
One can override the use of a comma as the delimiter for the ``-c`` and
``-o collapse|distinct`` options via the use of the ``-delim`` option.
.. code-block:: bash
@@ -208,12 +259,12 @@ Compare:
.. code-block:: bash

  $ bedtools merge -i A.bed -c 4 -o collapse
  chr1 100 500 A1,A2,A3

to:

.. code-block:: bash

  $ bedtools merge -i A.bed -c 4 -o collapse -delim "|"
  chr1 100 500 A1|A2|A3
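A minimal sketch of what ``-delim`` changes: only the separator placed between values within a ``collapse`` or ``distinct`` field. ``collapseWithDelim`` is a hypothetical C++ helper, and the first-seen ordering it uses for distinct values is an assumption.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of -delim: join one column's values with a caller-chosen
// delimiter (comma by default); unique == true mimics distinct by skipping repeats.
std::string collapseWithDelim(const std::vector<std::string>& vals,
                              const std::string& delim = ",", bool unique = false) {
    assert(!vals.empty());  // a merged interval always has at least one source row
    std::string out;
    std::unordered_set<std::string> seen;
    bool first = true;
    for (const auto& v : vals) {
        if (unique && !seen.insert(v).second) continue;  // already emitted
        if (!first) out += delim;
        out += v;
        first = false;
    }
    return out;
}
```

Calling it with ``"|"`` on the names A1, A2 and A3 yields ``A1|A2|A3``, mirroring the second command above.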
@@ -32,12 +32,12 @@ Table of contents
=================
.. toctree::
   :maxdepth: 1
   :numbered:

   content/overview
   content/installation
   content/quick-start
   content/general-usage
   content/history
   content/bedtools-suite
   content/example-usage
   content/advanced-usage
@@ -70,11 +70,9 @@ void merge_help(void) {
cerr << "\t\tMultiple columns can be specified in a comma-delimited list." << endl << endl;
KeyListOpsHelp();
cerr << "Notes: " << endl;
cerr << "\t(1) The input file (-i) must be sorted by chrom, then start." << endl << endl;
// end the program here
exit(1);