As of version 2.22.0, the `closest` tool can find the closest
intervals in multiple `-b` files. Consider the following examples.
.. note::
When using multiple `-b` files, an additional column describing the file number from which the closest B interval came will be added between the columns representing the full A interval and the columns representing the full B interval. This file number refers to the order in which the files were provided on the command line.
.. code-block:: bash
$ cat a.bed
chr1 10 20 a1 1 -
$ cat b1.bed
chr1 5 6 b1.1 1 -
chr1 30 40 b1.2 2 +
$ cat b2.bed
chr1 0 1 b2.1 1 -
chr1 21 22 b2.2 2 +
# In this example, the 7th column reflects the file number from
# which the closest interval came.
$ bedtools closest -a a.bed -b b1.bed b2.bed
chr1 10 20 a1 1 - 1 chr1 5 6 b1.1 1 -
chr1 10 20 a1 1 - 2 chr1 21 22 b2.2 2 +
Instead of using file numbers, you can also provide more informative labels via the `-names` option.
By default, the closest interval from **each** file is reported when using multiple `-b` files.
.. code-block:: bash
$ cat a.bed
chr1 10 20 a1 1 -
$ cat b1.bed
chr1 5 6 b1.1 1 -
chr1 30 40 b1.2 2 +
$ cat b2.bed
chr1 0 1 b2.1 1 -
chr1 21 22 b2.2 2 +
$ bedtools closest -a a.bed -b b1.bed b2.bed -d
chr1 10 20 a1 1 - 1 chr1 5 6 b1.1 1 - 5
chr1 10 20 a1 1 - 2 chr1 21 22 b2.2 2 + 2
$ bedtools closest -a a.bed -b b1.bed b2.bed -mdb each -d
chr1 10 20 a1 1 - 1 chr1 5 6 b1.1 1 - 5
chr1 10 20 a1 1 - 2 chr1 21 22 b2.2 2 + 2
However, one can optionally choose to report only the closest interval(s) observed among **all** of the `-b` files. In this example, the second interval from b2.bed is only 2 base pairs away from the interval in A, whereas the first interval in b1.bed is 5 base pairs away. Therefore, when using `-mdb all`, the second interval from b2.bed wins.
.. code-block:: bash
$ bedtools closest -a a.bed -b b1.bed b2.bed -mdb all -d
chr1 10 20 a1 1 - 2 chr1 21 22 b2.2 2 + 2
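The difference between the two `-mdb` modes can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the bedtools implementation: the helper names (`distance`, `closest_each`, `closest_all`) are hypothetical, the distance convention (0 for overlaps, otherwise the gap plus one) is inferred from the `-d` output in this section, and edge cases such as an A interval with no B interval on the same chromosome are simply skipped.

```python
# Illustrative model of `-mdb each` versus `-mdb all`; not bedtools code.
# Intervals are (chrom, start, end, name) tuples; all helper names are hypothetical.

def distance(a, b):
    """Assumed convention: 0 on overlap, otherwise the gap plus one."""
    _, a_start, a_end, _ = a
    _, b_start, b_end, _ = b
    if a_start < b_end and b_start < a_end:
        return 0
    if b_end <= a_start:          # B lies entirely to the left of A
        return a_start - b_end + 1
    return b_start - a_end + 1    # B lies entirely to the right of A


def closest_each(a, b_files):
    """Closest interval(s) from every file, tagged with a 1-based file number."""
    hits = []
    for file_number, intervals in enumerate(b_files, start=1):
        same_chrom = [b for b in intervals if b[0] == a[0]]
        if not same_chrom:
            continue
        best = min(distance(a, b) for b in same_chrom)
        hits += [(file_number, b, best)
                 for b in same_chrom if distance(a, b) == best]
    return hits


def closest_all(a, b_files):
    """Only the closest interval(s) across all files combined."""
    hits = closest_each(a, b_files)
    if not hits:
        return []
    best = min(dist for _, _, dist in hits)
    return [hit for hit in hits if hit[2] == best]


a = ("chr1", 10, 20, "a1")
b1 = [("chr1", 5, 6, "b1.1"), ("chr1", 30, 40, "b1.2")]
b2 = [("chr1", 0, 1, "b2.1"), ("chr1", 21, 22, "b2.2")]

for file_number, b, dist in closest_each(a, [b1, b2]):
    print("each:", file_number, b[3], dist)
for file_number, b, dist in closest_all(a, [b1, b2]):
    print("all: ", file_number, b[3], dist)
```

In this model, "all" is just "each" followed by one more minimum over the per-file winners, which is why b2.2 is the only record that survives.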
One often wants to also know the distance in base pairs between the interval in A and the closest interval(s) in B. The `closest` tool will optionally report the distance to the closest feature in the B file using the `-d` option. The distance (in base pairs) is reported as the last column in the output.
.. note::
When a feature in B overlaps a feature in A, a distance of 0 is reported.
Whereas the `-d` option always reports distances as positive integers, the
`-D` option will use negative integers to report distances to "upstream" features. There are three options for dictating how "upstream" should be defined.
1. `-D ref`: Report distance with respect to the reference genome. That is, B features with lower start/stop coordinates are considered to be upstream.
2. `-D a`: Report distance with respect to the orientation of the interval in A. That is, when A is on the - strand, "upstream" means B has higher start/stop coordinates. When A is on the + strand, "upstream" means B has lower start/stop coordinates.
3. `-D b`: Report distance with respect to the orientation of the interval in B. That is, when B is on the - strand, "upstream" means A has higher start/stop coordinates. When B is on the + strand, "upstream" means A has lower start/stop coordinates.
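The three definitions of "upstream" can be expressed as a small Python function. This is a sketch under stated assumptions rather than the tool's actual code: the function name `signed_distance` and its tuple layout are hypothetical, and the unsigned distance (gap plus one, 0 on overlap) is inferred from the example output in this section.

```python
# Illustrative model of the `-D` sign conventions; not bedtools code.
# Function name and argument layout are hypothetical.

def signed_distance(a, b, mode):
    """a and b are (start, end, strand) tuples on the same chromosome.
    mode is "ref", "a", or "b". Negative values mean "upstream"."""
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    if a_start < b_end and b_start < a_end:
        return 0  # overlapping features
    b_is_left = b_end <= a_start
    # Assumed unsigned convention: gap between the features plus one.
    dist = (a_start - b_end if b_is_left else b_start - a_end) + 1
    if mode == "ref":
        # Upstream: B has lower coordinates than A.
        upstream = b_is_left
    elif mode == "a":
        # Upstream relative to A: lower coordinates on "+", higher on "-".
        upstream = b_is_left if a_strand == "+" else not b_is_left
    elif mode == "b":
        # Upstream relative to B: A has lower coordinates on "+", higher on "-".
        a_is_left = not b_is_left
        upstream = a_is_left if b_strand == "+" else not a_is_left
    else:
        raise ValueError("mode must be 'ref', 'a', or 'b'")
    return -dist if upstream else dist


a_plus, a_minus = (10, 20, "+"), (10, 20, "-")
b1, b2 = (7, 8, "+"), (22, 23, "-")

for label, a, mode in [("ref, A+", a_plus, "ref"), ("a,   A+", a_plus, "a"),
                       ("a,   A-", a_minus, "a"), ("b,   A+", a_plus, "b")]:
    print(label, signed_distance(a, b1, mode), signed_distance(a, b2, mode))
```

Only the sign depends on the mode; the magnitude is the same value that `-d` would report.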
This is best demonstrated through multiple examples.
.. code-block:: bash
$ cat a.bed
chr1 10 20 a1 1 +
$ cat b.bed
chr1 7 8 b1 1 +
chr1 22 23 b2 2 -
$ bedtools closest -a a.bed -b b.bed -D ref
chr1 10 20 a1 1 + chr1 7 8 b1 1 + -3
chr1 10 20 a1 1 + chr1 22 23 b2 2 - 3
Since the A record is on the "+" strand in this example, `-D ref` and `-D a` have the same effect.
.. code-block:: bash
$ bedtools closest -a a.bed -b b.bed -D a
chr1 10 20 a1 1 + chr1 7 8 b1 1 + -3
chr1 10 20 a1 1 + chr1 22 23 b2 2 - 3
However, the signs of the distances change if the A interval is on the "-" strand.
.. code-block:: bash
$ cat a.bed
chr1 10 20 a1 1 -
$ bedtools closest -a a.bed -b b.bed -D a
chr1 10 20 a1 1 - chr1 7 8 b1 1 + 3
chr1 10 20 a1 1 - chr1 22 23 b2 2 - -3
Let's switch the A interval back to the "+" strand and now report distances with respect to the orientation of the closest B records.
.. code-block:: bash
$ cat a.bed
chr1 10 20 a1 1 +
$ bedtools closest -a a.bed -b b.bed -D b
chr1 10 20 a1 1 + chr1 7 8 b1 1 + 3
chr1 10 20 a1 1 + chr1 22 23 b2 2 - 3
Let's flip the strand of the two B records and compare.
As of version 2.21.0, the `intersect` tool can detect overlaps between
a single `-a` file and multiple `-b` files (previously, only one `-b` file was supported).
One simply provides multiple `-b` files on the command line.
For example, consider the following query (`-a`) file and three distinct (`-b`) files:
.. code-block:: bash
$ cat query.bed
chr1 1 20
chr1 40 45
chr1 70 90
chr1 105 120
chr2 1 20
chr2 40 45
chr2 70 90
chr2 105 120
chr3 1 20
chr3 40 45
chr3 70 90
chr3 105 120
chr3 150 200
chr4 10 20
$ cat d1.bed
chr1 5 25
chr1 65 75
chr1 95 100
chr2 5 25
chr2 65 75
chr2 95 100
chr3 5 25
chr3 65 75
chr3 95 100
$ cat d2.bed
chr1 40 50
chr1 110 125
chr2 40 50
chr2 110 125
chr3 40 50
chr3 110 125
$ cat d3.bed
chr1 85 115
chr2 85 115
chr3 85 115
We can now compare query.bed to all three database files at once:
.. code-block:: bash
$ bedtools intersect -a query.bed \
-b d1.bed d2.bed d3.bed
chr1 5 20
chr1 40 45
chr1 70 75
chr1 85 90
chr1 110 120
chr1 105 115
chr2 5 20
chr2 40 45
chr2 70 75
chr2 85 90
chr2 110 120
chr2 105 115
chr3 5 20
chr3 40 45
chr3 70 75
chr3 85 90
chr3 110 120
chr3 105 115
Clearly this is not completely informative because we cannot tell from which file each intersection came. However, if we use `-wa` and `-wb`, this becomes abundantly clear. When these options are used, the first column after the complete `-a` record lists the file number from which the overlap came. The number corresponds to the order in which the files were given on the command line.
.. code-block:: bash
$ bedtools intersect -wa -wb \
-a query.bed \
-b d1.bed d2.bed d3.bed \
-sorted
chr1 1 20 1 chr1 5 25
chr1 40 45 2 chr1 40 50
chr1 70 90 1 chr1 65 75
chr1 70 90 3 chr1 85 115
chr1 105 120 2 chr1 110 125
chr1 105 120 3 chr1 85 115
chr2 1 20 1 chr2 5 25
chr2 40 45 2 chr2 40 50
chr2 70 90 1 chr2 65 75
chr2 70 90 3 chr2 85 115
chr2 105 120 2 chr2 110 125
chr2 105 120 3 chr2 85 115
chr3 1 20 1 chr3 5 25
chr3 40 45 2 chr3 40 50
chr3 70 90 1 chr3 65 75
chr3 70 90 3 chr3 85 115
chr3 105 120 2 chr3 110 125
chr3 105 120 3 chr3 85 115
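How the file-number column relates to the order of the `-b` files can be modelled with a short Python sketch. It is a naive nested-loop illustration under stated assumptions, not how bedtools works internally (the real tool is far more efficient, particularly on sorted input); the function name `intersect_wa_wb` is hypothetical, and only the chr1 records from the example files are used to keep it short.

```python
# Illustrative model of `-wa -wb` with several `-b` files; not bedtools code.
# Intervals are (chrom, start, end) tuples; the function name is hypothetical.

def intersect_wa_wb(query, databases):
    """Yield (a, file_number, b) for every overlap. Files are numbered
    from 1 in the order they are supplied, as on the command line."""
    for a in query:
        for file_number, intervals in enumerate(databases, start=1):
            for b in intervals:
                same_chrom = a[0] == b[0]
                if same_chrom and a[1] < b[2] and b[1] < a[2]:
                    yield a, file_number, b


query = [("chr1", 1, 20), ("chr1", 40, 45), ("chr1", 70, 90), ("chr1", 105, 120)]
d1 = [("chr1", 5, 25), ("chr1", 65, 75), ("chr1", 95, 100)]
d2 = [("chr1", 40, 50), ("chr1", 110, 125)]
d3 = [("chr1", 85, 115)]

for a, file_number, b in intersect_wa_wb(query, [d1, d2, d3]):
    print(*a, file_number, *b)
```

For the chr1 records, the printed rows match the first six lines of the output above, with the file number sitting between the A record and the B record.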
In many cases, it may be more useful to report an informative "label" for each file instead of a file number. One can do this with the `-names` option.
.. code-block:: bash
$ bedtools intersect -wa -wb \
-a query.bed \
-b d1.bed d2.bed d3.bed \
-names d1 d2 d3 \
-sorted
chr1 1 20 d1 chr1 5 25
chr1 40 45 d2 chr1 40 50
chr1 70 90 d1 chr1 65 75
chr1 70 90 d3 chr1 85 115
chr1 105 120 d2 chr1 110 125
chr1 105 120 d3 chr1 85 115
chr2 1 20 d1 chr2 5 25
chr2 40 45 d2 chr2 40 50
chr2 70 90 d1 chr2 65 75
chr2 70 90 d3 chr2 85 115
chr2 105 120 d2 chr2 110 125
chr2 105 120 d3 chr2 85 115
chr3 1 20 d1 chr3 5 25
chr3 40 45 d2 chr3 40 50
chr3 70 90 d1 chr3 65 75
chr3 70 90 d3 chr3 85 115
chr3 105 120 d2 chr3 110 125
chr3 105 120 d3 chr3 85 115
Or perhaps it may be more useful to report the file name. One can do this with the `-filenames` option.
.. code-block:: bash
$ bedtools intersect -wa -wb \
-a query.bed \
-b d1.bed d2.bed d3.bed \
-sorted \
-filenames
chr1 1 20 d1.bed chr1 5 25
chr1 40 45 d2.bed chr1 40 50
chr1 70 90 d1.bed chr1 65 75
chr1 70 90 d3.bed chr1 85 115
chr1 105 120 d2.bed chr1 110 125
chr1 105 120 d3.bed chr1 85 115
chr2 1 20 d1.bed chr2 5 25
chr2 40 45 d2.bed chr2 40 50
chr2 70 90 d1.bed chr2 65 75
chr2 70 90 d3.bed chr2 85 115
chr2 105 120 d2.bed chr2 110 125
chr2 105 120 d3.bed chr2 85 115
chr3 1 20 d1.bed chr3 5 25
chr3 40 45 d2.bed chr3 40 50
chr3 70 90 d1.bed chr3 65 75
chr3 70 90 d3.bed chr3 85 115
chr3 105 120 d2.bed chr3 110 125
chr3 105 120 d3.bed chr3 85 115
Other options to `intersect` can be used as well. For example, let's use `-v` to report those intervals in query.bed that do not overlap any of the intervals in the three database files:
.. code-block:: bash
$ bedtools intersect -wa -wb \
-a query.bed \
-b d1.bed d2.bed d3.bed \
-sorted \
-v
chr3 150 200
chr4 10 20
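With multiple `-b` files, `-v` keeps a query interval only if it overlaps nothing in any of the files. The Python sketch below models that rule under stated assumptions; it is not bedtools code, the function name `non_overlapping` is hypothetical, and only a few records from the example files are included.

```python
# Illustrative model of `-v` with several `-b` files; not bedtools code.
# Intervals are (chrom, start, end) tuples; the function name is hypothetical.

def non_overlapping(query, databases):
    """Return the query intervals that overlap no interval in any file."""
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    return [a for a in query
            if not any(overlaps(a, b)
                       for intervals in databases
                       for b in intervals)]


query = [("chr3", 105, 120), ("chr3", 150, 200), ("chr4", 10, 20)]
d1 = [("chr3", 5, 25), ("chr3", 65, 75), ("chr3", 95, 100)]
d2 = [("chr3", 40, 50), ("chr3", 110, 125)]
d3 = [("chr3", 85, 115)]

for interval in non_overlapping(query, [d1, d2, d3]):
    print(*interval)
```

The chr3 105-120 record is dropped because it overlaps intervals in d2.bed and d3.bed, leaving the same two records as the output above.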
Or, let's report only those intersections where 100% of the query record is overlapped by a database record: