Commit 6d5a7587 authored by arq5x

[DOC] cleanup docs to facilitate man page creation.

parent e5e5d751
......@@ -22,51 +22,81 @@ in the output of all comparisons.
The file description below is modified from: http://genome.ucsc.edu/FAQ/FAQformat#format1.
1. **chrom** - The name of the chromosome on which the genome feature exists.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
- *This column is required*.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
- *This column is required*.
2. **start** - The zero-based starting position of the feature in the chromosome.
- *The first base in a chromosome is numbered 0*.
- *The start position in each BED feature is therefore interpreted to be 1 greater than the start position listed in the feature. For example, start=9, end=20 is interpreted to span bases 10 through 20, inclusive*.
- *This column is required*.
3. **end** - The one-based ending position of the feature in the chromosome.
- *The end position in each BED feature is one-based. See example above*.
- *This column is required*.
4. **name** - Defines the name of the BED feature.
- *Any string can be used*. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
- *This column is optional*.
5. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. However, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
- *Any string can be used*. For example, 7.31E-05 (p-value), 0.33456 (mean enrichment value), "up", "down", etc.
- *This column is optional*.
6. **strand** - Defines the strand - either '+' or '-'.
- *This column is optional*.
7. **thickStart** - The starting position at which the feature is drawn thickly.
- *Allowed yet ignored by bedtools*.
8. **thickEnd** - The ending position at which the feature is drawn thickly.
- *Allowed yet ignored by bedtools*.
9. **itemRgb** - An RGB value of the form R,G,B (e.g. 255,0,0).
- *Allowed yet ignored by bedtools*.
10. **blockCount** - The number of blocks (exons) in the BED line.
- *Allowed yet ignored by bedtools*.
11. **blockSizes** - A comma-separated list of the block sizes.
- *Allowed yet ignored by bedtools*.
12. **blockStarts** - A comma-separated list of block starts.
- *Allowed yet ignored by bedtools*.
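The start/end convention described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not bedtools code; the function names ``bed_to_one_based`` and ``bed_length`` are hypothetical.

```python
def bed_to_one_based(start, end):
    """Convert a BED (zero-based start, one-based end) pair
    into a one-based, inclusive span of bases."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Only the start shifts; the end is already one-based.
    return start + 1, end


def bed_length(start, end):
    """Number of bases covered by a BED feature."""
    return end - start


# The example from the text: start=9, end=20 spans bases 10 through 20.
print(bed_to_one_based(9, 20))  # (10, 20)
print(bed_length(9, 20))        # 11
```

Because only the start is shifted, the feature length is simply ``end - start`` with no off-by-one correction.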
bedtools requires that all BED input files (and input received from stdin) are **tab-delimited**. The following types of BED files are supported by bedtools:
1. | **BED3**: A BED file where each feature is described by **chrom**, **start**, and **end**.
| For example: chr1 11873 14409
2. | **BED4**: A BED file where each feature is described by **chrom**, **start**, **end**, and **name**.
| For example: chr1 11873 14409 uc001aaa.3
3. | **BED5**: A BED file where each feature is described by **chrom**, **start**, **end**, **name**, and **score**.
| For example: chr1 11873 14409 uc001aaa.3 0
4. | **BED6**: A BED file where each feature is described by **chrom**, **start**, **end**, **name**, **score**, and **strand**.
| For example: chr1 11873 14409 uc001aaa.3 0 +
5. | **BED12**: A BED file where each feature is described by all twelve columns listed above.
| For example: chr1 11873 14409 uc001aaa.3 0 + 11873
| 11873 0 3 354,109,1189, 0,739,1347,
1. **BED3**: A BED file where each feature is described by **chrom**, **start**, and **end**.
For example: ``chr1 11873 14409``
2. **BED4**: A BED file where each feature is described by **chrom**, **start**, **end**, and **name**.
For example: ``chr1 11873 14409 uc001aaa.3``
3. **BED5**: A BED file where each feature is described by **chrom**, **start**, **end**, **name**, and **score**.
For example: ``chr1 11873 14409 uc001aaa.3 0``
4. **BED6**: A BED file where each feature is described by **chrom**, **start**, **end**, **name**, **score**, and **strand**.
For example: ``chr1 11873 14409 uc001aaa.3 0 +``
5. **BED12**: A BED file where each feature is described by all twelve columns listed above.
For example: ``chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,``
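As a rough illustration of the five BED flavors listed above, the sketch below labels a line by its number of tab-delimited columns. The helper name ``classify_bed_line`` and the ``"other"`` label are assumptions made for this example; how bedtools itself treats other column counts is not covered here.

```python
# Column counts for the BED flavors listed above.
BED_TYPES = {3: "BED3", 4: "BED4", 5: "BED5", 6: "BED6", 12: "BED12"}


def classify_bed_line(line):
    """Label a BED line by its number of tab-delimited fields."""
    fields = line.rstrip("\n").split("\t")
    return BED_TYPES.get(len(fields), "other")


print(classify_bed_line("chr1\t11873\t14409"))                   # BED3
print(classify_bed_line("chr1\t11873\t14409\tuc001aaa.3\t0\t+"))  # BED6
# A space-delimited line is seen as a single field, which is why
# tab delimiting matters.
print(classify_bed_line("chr1 11873 14409"))                     # other
```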
----------------------
BEDPE format
......@@ -80,42 +110,63 @@ alignments, especially when studying structural variation.
The BEDPE format is described below. The description is modified from: http://genome.ucsc.edu/FAQ/FAQformat#format1.
1. **chrom1** - The name of the chromosome on which the **first** end of the feature exists.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
- *This column is required*.
- *Use "." for unknown*.
2. **start1** - The zero-based starting position of the **first** end of the feature on **chrom1**.
- *The first base in a chromosome is numbered 0*.
- *As with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required*.
- *Use -1 for unknown*.
3. **end1** - The one-based ending position of the first end of the feature on **chrom1**.
- *The end position in each BEDPE feature is one-based*.
- *This column is required*.
- *Use -1 for unknown*.
4. **chrom2** - The name of the chromosome on which the **second** end of the feature exists.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
- *This column is required*.
- *Use "." for unknown*.
5. **start2** - The zero-based starting position of the **second** end of the feature on **chrom2**.
- *The first base in a chromosome is numbered 0*.
- *As with BED format, the start position in each BEDPE feature is therefore interpreted to be 1 greater than the start position listed in the feature. This column is required*.
- *Use -1 for unknown*.
6. **end2** - The one-based ending position of the **second** end of the feature on **chrom2**.
- *The end position in each BEDPE feature is one-based*.
- *This column is required*.
- *Use -1 for unknown*.
7. **name** - Defines the name of the BEDPE feature.
- *Any string can be used*. For example, "LINE", "Exon3", "HWIEAS_0001:3:1:0:266#0/1", or "my_Feature".
- *This column is optional*.
8. **score** - The UCSC definition requires that a BED score range from 0 to 1000, inclusive. *However, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features*. For example, strings allow scientific notation for p-values, mean enrichment values, etc. It should be noted that this flexibility could prevent such annotations from being correctly displayed on the UCSC browser.
- *Any string can be used*. For example, 7.31E-05 (p-value), 0.33456 (mean enrichment value), "up", "down", etc.
- *This column is optional*.
9. **strand1** - Defines the strand for the first end of the feature. Either '+' or '-'.
- *This column is optional*.
- *Use "." for unknown*.
10. **strand2** - Defines the strand for the second end of the feature. Either '+' or '-'.
- *This column is optional*.
- *Use "." for unknown*.
11. **Any number of additional, user-defined fields** - bedtools allows one to add as many additional fields to the normal, 10-column BEDPE format as necessary. These columns are merely "passed through" **pairToBed** and **pairToPair** and are not part of any analysis. One would use these additional columns to add extra information (e.g., edit distance for each end of an alignment, or "deletion", "inversion", etc.) to each BEDPE feature.
- *These additional columns are optional*.
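The "unknown" conventions above ("." for a chromosome, -1 for a position) can be handled when reading a BEDPE line. The following is a minimal sketch under those conventions; ``BedpeEnd`` and ``parse_bedpe_ends`` are hypothetical names, and the sketch only looks at the six required columns.

```python
from typing import NamedTuple, Optional


class BedpeEnd(NamedTuple):
    chrom: Optional[str]  # None when the column holds "."
    start: Optional[int]  # None when the column holds -1
    end: Optional[int]    # None when the column holds -1


def parse_bedpe_ends(line):
    """Return the two ends of a tab-delimited BEDPE feature."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("BEDPE requires at least six columns")

    def make_end(chrom, start, end):
        return BedpeEnd(
            None if chrom == "." else chrom,
            None if start == "-1" else int(start),
            None if end == "-1" else int(end),
        )

    return make_end(*fields[0:3]), make_end(*fields[3:6])


first, second = parse_bedpe_ends("chr1\t100\t200\t.\t-1\t-1\tpair1")
print(first)   # BedpeEnd(chrom='chr1', start=100, end=200)
print(second)  # BedpeEnd(chrom=None, start=None, end=None)
```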
......@@ -140,28 +191,46 @@ GFF format
The GFF format is described on the Sanger Institute's website (http://www.sanger.ac.uk/resources/software/gff/spec.html). The GFF description below is modified from the definition at this URL. All nine columns in the GFF format description are required by bedtools.
1. **seqname** - The name of the sequence (e.g. chromosome) on which the feature exists.
- *Any string can be used*. For example, "chr1", "III", "myChrom", "contig1112.23".
- *This column is required*.
2. **source** - The source of this feature. This field will normally be used to indicate the program making the prediction, or if it comes from public database annotation, or is experimentally verified, etc.
- *This column is required*.
3. **feature** - The feature type name. Equivalent to BED's **name** field.
- *Any string can be used*. For example, "exon", etc.
- *This column is required*.
4. **start** - The one-based starting position of feature on **seqname**.
- *This column is required*.
- *bedtools accounts for the fact that GFF uses a one-based start position, whereas BED uses a zero-based start position*.
5. **end** - The one-based ending position of feature on **seqname**.
- *This column is required*.
6. **score** - A score assigned to the GFF feature. Like BED format, bedtools allows any string to be stored in this field in order to allow greater flexibility in annotation features. We note that this differs from the GFF definition in the interest of flexibility.
- *This column is required*.
7. **strand** - Defines the strand. Use '+', '-' or '.'
- *This column is required*.
8. **frame** - The frame of the coding sequence. Use '0', '1', '2', or '.'.
- *This column is required*.
9. **attribute** - Taken from http://www.sanger.ac.uk/resources/software/gff/spec.html: From version 2 onwards, the attribute field must have an tag value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators. Tags must be standard identifiers ([A-Za-z][A-Za-z0-9_]*). Free text values must be quoted with double quotes. *Note: all non-printing characters in such free text value strings (e.g. newlines, tabs, control characters, etc) must be explicitly represented by their C (UNIX) style backslash-escaped representation (e.g. newlines as '\n', tabs as '\t')*. As in ACEDB, multiple values can follow a specific tag. The aim is to establish consistent use of particular tags, corresponding to an underlying implied ACEDB model if you want to think that way (but acedb is not required).
9. **attribute** - Taken from http://www.sanger.ac.uk/resources/software/gff/spec.html: From version 2 onwards, the attribute field must have an tag value structure following the syntax used within objects in a .ace file, flattened onto one line by semicolon separators. Free text values must be quoted with double quotes. *Note: all non-printing characters in such free text value strings (e.g. newlines, tabs, control characters, etc) must be explicitly represented by their C (UNIX) style backslash-escaped representation (e.g. newlines as '\n', tabs as '\t')*. As in ACEDB, multiple values can follow a specific tag. The aim is to establish consistent use of particular tags, corresponding to an underlying implied ACEDB model if you want to think that way (but acedb is not required).
- *This column is required*.
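To make the coordinate difference between GFF and BED concrete, the sketch below maps one nine-column GFF line onto BED6-style fields. It only illustrates the conversion described above and is not how bedtools is implemented; the function name ``gff_line_to_bed6`` is hypothetical.

```python
def gff_line_to_bed6(line):
    """Map a tab-delimited, nine-column GFF line to BED6-style fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected nine tab-delimited GFF columns")
    seqname, _source, feature, start, end, score, strand, _frame, _attr = fields
    # GFF start is one-based and BED start is zero-based, so subtract 1.
    # Both formats use a one-based end, so the end is unchanged.
    return (seqname, int(start) - 1, int(end), feature, score, strand)


gff = 'seq1\tBLASTX\tsimilarity\t101\t235\t87.1\t+\t0\tTarget "HBA_HUMAN" 11 55 ;'
print(gff_line_to_bed6(gff))
# ('seq1', 100, 235, 'similarity', '87.1', '+')
```

The score is kept as a string because, as noted above, bedtools allows any string in that field.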
An entry from an example GFF file :
::
seq1 BLASTX similarity 101 235 87.1 + 0 Target "HBA_HUMAN" 11 55 ;
E_value 0.0003 dJ102G20 GD_mRNA coding_exon 7105 7201 . - 2 Sequence
......@@ -180,6 +249,7 @@ the chromosomes (or scaffolds, etc.) and their size (in basepairs).
Genome files must be **tab-delimited** and are structured as follows (this is an example for *C. elegans*):
::
chrI 15072421
chrII 15279323
......
......@@ -77,9 +77,9 @@ reported after the complete feature in the file to be annotated.
=======================================================================================
===========================================================================================
``-both`` Report both the count of hits and the fraction covered from the annotation files
=======================================================================================
===========================================================================================
.. code-block:: bash
......
......@@ -145,6 +145,7 @@ BED format.
the entire span of a spliced/split BAM alignment. However, when using the
``-split`` option, a BED12 feature is reported where BED blocks will be
created for each aligned portion of the sequencing read.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
......@@ -10,6 +10,7 @@ bed12ToBed6 would create six separate BED6 features (i.e., one for each exon).
Usage and option summary
==========================================================================
Usage:
::
bed12ToBed6 [OPTIONS] -i <BED12>
......@@ -26,14 +27,12 @@ Usage:
Default behavior
==========================================================================
Figure:
::
head data/knownGene.hg18.chr21.bed | tail -n 3
chr21 10079666 10120808 uc002yiv.1 0 - 10081686 10120608
0 4 528,91,101,215, 0,1930,39750,40927,
chr21 10080031 10081687 uc002yiw.1 0 - 10080031 10080031
0 2 200,91, 0,1565,
chr21 10081660 10120796 uc002yix.2 0 - 10081660 10081660
0 3 27,101,223, 0,37756,38913,
chr21 10079666 10120808 uc002yiv.1 0 - 10081686 10120608 0 4 528,91,101,215, 0,1930,39750,40927,
chr21 10080031 10081687 uc002yiw.1 0 - 10080031 10080031 0 2 200,91, 0,1565,
chr21 10081660 10120796 uc002yix.2 0 - 10081660 10081660 0 3 27,101,223, 0,37756,38913,
head data/knownGene.hg18.chr21.bed | tail -n 3 | bed12ToBed6 -i stdin
chr21 10079666 10080194 uc002yiv.1 0 -
......
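The block arithmetic behind splitting a BED12 feature into per-exon BED6 features can be sketched as follows. This is an illustration of how **blockSizes** and **blockStarts** relate to the feature start, not the tool's implementation; the function name ``bed12_blocks`` is hypothetical.

```python
def bed12_blocks(chrom, start, name, score, strand, block_sizes, block_starts):
    """Expand BED12 block lists into one BED6-style tuple per block."""
    # Both lists may carry a trailing comma, so skip empty items.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in block_starts.split(",") if x]
    if len(sizes) != len(starts):
        raise ValueError("blockSizes and blockStarts differ in length")
    # Block starts are offsets relative to the feature start.
    return [
        (chrom, start + offset, start + offset + size, name, score, strand)
        for size, offset in zip(sizes, starts)
    ]


blocks = bed12_blocks(
    "chr21", 10079666, "uc002yiv.1", "0", "-",
    "528,91,101,215,", "0,1930,39750,40927,",
)
print(blocks[0])
# ('chr21', 10079666, 10080194, 'uc002yiv.1', '0', '-')
```

The first tuple agrees with the first line of bed12ToBed6 output shown above for ``uc002yiv.1``.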
......@@ -8,6 +8,7 @@ storing large genome annotations in a compact, indexed format for visualization
Usage and option summary
==========================================================================
Usage:
::
bedToBam [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> > <BAM>
......@@ -26,6 +27,7 @@ Usage:
Default behavior
==========================================================================
The default behavior is to assume that the input file is in unblocked format. For example:
::
head -5 rmsk.hg18.chr21.bed
chr21 9719768 9721892 ALR/Alpha 1004 +
......@@ -53,6 +55,7 @@ alignments. The image illustrates this behavior, as the top track is a BAM repre
bedToBam) of a BED file of UCSC genes.
For example:
::
bedToBam -i knownGene.hg18.chr21.bed -g human.hg18.genome -bed12 > knownGene.bam
......
......@@ -11,6 +11,7 @@ overlapping feature as the closest---that is, it does not restrict to closest *n
5.6.1 Usage and option summary
==========================================================================
**Usage:**
::
closestBed [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
......@@ -41,6 +42,7 @@ Default behavior
in B that overlaps the highest fraction of A is reported. If no overlaps are found, **closestBed** looks for
the feature in B that is *closest* (that is, least genomic distance to the start or end of A) to A. For
example, in the figure below, feature B1 would be reported as the closest feature to A1.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -52,6 +54,7 @@ example, in the figure below, feature B1 would be reported as the closest featur
For example:
::
cat A.bed
chr1 100 200
......@@ -84,6 +87,7 @@ choose the just first or last feature (in terms of where it occurred in the inpu
position) that occurred in B.
For example (note the difference between -l 200 and -l 300):
::
cat A.bed
chr1 100 101 rs1234
......@@ -116,6 +120,7 @@ For example (note the difference between -l 200 and -l 300):
==========================================================================
ClosestBed will optionally report the distance to the closest feature in the B file using the **-d** option.
When a feature in B overlaps a feature in A, a distance of 0 is reported.
::
cat A.bed
chr1 100 200
......
......@@ -12,6 +12,7 @@ computes the fraction of bases in B interval that were overlapped by one or more
Usage and option summary
==========================================================================
Usage:
::
coverageBed [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
......@@ -51,6 +52,7 @@ After each interval in B, **coverageBed** will report:
4) The fraction of bases in B that had non-zero coverage from features in A.
Below are the number of features in A (N=...) overlapping B and fraction of bases in B with coverage.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -63,6 +65,7 @@ Below are the number of features in A (N=...) overlapping B and fraction of base
For example:
::
cat A.bed
chr1 10 20
......@@ -89,6 +92,7 @@ Use the "**-s**" option if one wants to only count coverage if features in A are
feature / window in B. This is especially useful for RNA-seq experiments.
For example (note the difference in coverage with and without **-s**):
::
cat A.bed
chr1 10 20 a1 1 -
......@@ -120,6 +124,7 @@ features in A across B.
In this case, each entire feature in B will be reported, followed by the depth of coverage, the number of
bases at that depth, the size of the feature, and the fraction covered. After all of the features in B have
been reported, a histogram summarizing the coverage among all features in B will be reported.
::
cat A.bed
chr1 10 20 a1 1 -
......@@ -150,6 +155,7 @@ positions across each B interval.
The output will consist of a line for each one-based position in each B feature, followed by the coverage
detected at that position.
::
cat A.bed
chr1 0 5
......
###############
*igv*
###############
\ No newline at end of file
......@@ -506,6 +506,7 @@ For example, the diagram below illustrates the *default* behavior. The blue dots
spliced" portion of the alignment (i.e., CIGAR "N" operation). In this case, the two exon annotations
are reported as overlapping with the "split" BAM alignment, but in addition, a third feature that
overlaps the "split" portion of the alignment is also reported.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -519,6 +520,7 @@ overlaps the "split" portion of the alignment is also reported.
In contrast, when using the **-split** option, only the exon overlaps are reported.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
......@@ -9,6 +9,7 @@ annotations or features.
Usage and option summary
==========================================================================
Usage:
::
linksBed [OPTIONS] -i <BED/GFF/VCF> > <HTML file>
......@@ -29,6 +30,7 @@ Default behavior
By default, **linksBed** creates links to the public UCSC Genome Browser.
For example:
::
head genes.bed
chr21 9928613 10012791 uc002yip.1 0 -
......@@ -57,6 +59,7 @@ Creating HTML links to a local UCSC Browser installation
Optionally, **linksBed** will create links to a local copy of the UCSC Genome Browser.
For example:
::
head -3 genes.bed
chr21 9928613 10012791 uc002yip.1 0 -
......@@ -65,6 +68,7 @@ For example:
linksBed -i genes.bed -base http://mirror.uni.edu > genes.html
One can point the links to the appropriate organism and genome build as well:
::
head -3 genes.bed
chr21 9928613 10012791 uc002yip.1 0 -
......
......@@ -9,6 +9,7 @@
:align: center
|
``bedtools map`` allows one to map overlapping features in a B file onto
features in an A file and apply statistics and/or summary operations on those
features.
......
......@@ -9,6 +9,7 @@
|
``bedtools merge`` combines overlapping or "book-ended" features in an interval
file into a single feature which spans all of the combined features.
......
......@@ -10,6 +10,7 @@ the output of other BEDTools.
Usage and option summary
==========================================================================
Usage:
::
overlap [OPTIONS] -i <input> -cols s1,e1,s2,e2
......@@ -27,6 +28,7 @@ Default behavior
==========================================================================
The default behavior is to compute the amount of overlap between the features you specify based on the
start and end coordinates. For example:
::
windowBed -a A.bed -b B.bed -w 10
chr1 10 20 A chr1 15 25 B
......@@ -34,6 +36,7 @@ start and end coordinates. For example:
# Now let's say we want to compute the number of base pairs of overlap
# between the overlapping features from the output of windowBed.
::
windowBed -a A.bed -b B.bed -w 10 | overlap -i stdin -cols 2,3,6,7
chr1 10 20 A chr1 15 25 B 5
......
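The overlap reported in the example above (5 bp between 10-20 and 15-25) follows from a simple min/max calculation. The sketch below shows that arithmetic; ``overlap_bp`` is a hypothetical name, and clamping non-overlapping pairs to zero is a choice made for this sketch only, since the excerpt does not show what the tool reports in that case.

```python
def overlap_bp(s1, e1, s2, e2):
    """Number of bases shared by intervals (s1, e1) and (s2, e2),
    using BED-style coordinates (zero-based start, one-based end)."""
    shared = min(e1, e2) - max(s1, s2)
    # Assumption for this sketch: report 0 when the intervals do not overlap.
    return max(shared, 0)


# Columns 2,3,6,7 of the windowBed output line above.
print(overlap_bp(10, 20, 15, 25))  # 5
```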
......@@ -11,6 +11,7 @@ discordant pair suggests the same structural variation in each file/sample.
Usage and option summary
================================
**Usage:**
::
pairToPair [OPTIONS] -a <BEDPE> -b <BEDPE>
......@@ -45,6 +46,7 @@ overlaps on each end be on the same strand. This way, an otherwise overlapping (
locations) F/R alignment will not be matched with a R/R alignment.
Default: Report A if *both* ends overlap B.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -57,6 +59,7 @@ Default: Report A if *both* ends overlaps B.
Default when strand information is present in both BEDPE files: Report A if *both* ends overlap B *on
the same strands*.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -83,6 +86,7 @@ Using then **-type neither, pairToPair** will only report A if *neither* end ove
feature in B.
**-type neither**: Report A only if *neither* end overlaps B.
::
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
......@@ -23,12 +23,14 @@ that should be generated.
Usage and option summary
==========================================================================
**Usage**:
::
bedtools random [OPTIONS] -g <GENOME>
**(or)**:
::
:
:
randomBed [OPTIONS] -g <GENOME>
......
......@@ -7,6 +7,7 @@
Usage and option summary
==========================================================================
Usage:
::
sortBed [OPTIONS] -i <BED/GFF/VCF>
......@@ -29,6 +30,7 @@ Default behavior
By default, **sortBed** sorts a BED file by chromosome and then by start position in ascending order.
For example:
::
cat A.bed
chr1 800 1000
......@@ -51,6 +53,7 @@ Optional sorting behavior
**sortBed** can also sort a BED file by chromosome and then by other criteria.
For example, to sort by chromosome and then by feature size (in descending order):
::
cat A.bed
chr1 800 1000
......@@ -68,6 +71,7 @@ For example, to sort by chromosome and then by feature size (in descending order
**Disclaimer:** it should be noted that **sortBed** is merely a convenience utility, as the UNIX sort utility
will sort BED files more quickly while using less memory. For example, UNIX sort will sort a BED file
by chromosome then by start position in the following manner:
::
sort -k 1,1 -k2,2n a.bed
chr1 1 10
......
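For readers more familiar with Python than with the UNIX sort flags, the same ordering (chromosome as text, then start as a number) can be sketched as below. The function name ``sort_bed_lines`` and the input records are made up for illustration, and Python compares strings by code point, which may differ from a locale-aware ``sort``.

```python
def sort_bed_lines(lines):
    """Sort tab-delimited BED lines by chromosome, then by numeric start."""
    def key(line):
        fields = line.split("\t")
        # Like `-k 1,1 -k2,2n`: column 1 as text, column 2 as a number.
        return (fields[0], int(fields[1]))

    return sorted(lines, key=key)


# Illustrative input, not taken from the example file.
records = ["chr1\t800\t1000", "chr2\t5\t9", "chr1\t80\t180", "chr1\t1\t10"]
for line in sort_bed_lines(records):
    print(line)
```

Sorting the start column as text instead would place 800 before 80 only by accident of length; converting to ``int`` mirrors the ``n`` modifier in ``-k2,2n``.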
......@@ -9,6 +9,7 @@ compare coverage (and other text-values such as genotypes) across multiple sampl
Usage and option summary
==========================================================================
Usage:
::
unionBedGraphs [OPTIONS] -i FILE1 FILE2 FILE3 ... FILEn
......@@ -31,6 +32,7 @@ Usage:
Default behavior
==========================================================================
Figure:
::
cat 1.bg
chr1 1000 1500 10
......@@ -62,6 +64,7 @@ Figure:
``-header`` Add a header line to the output
==========================================================================
Figure:
::
unionBedGraphs -i 1.bg 2.bg 3.bg -header
chrom start end 1 2 3
......@@ -80,6 +83,7 @@ Figure:
``-names`` Add a header line with custom file names to the output
==========================================================================
Figure:
::
unionBedGraphs -i 1.bg 2.bg 3.bg -header -names WT-1 WT-2 KO-1
chrom start end WT-1 WT-2 KO-1
......@@ -100,6 +104,7 @@ Figure:
``-empty`` Include regions that have zero coverage in all BEDGRAPH files.
==========================================================================
Figure:
::
unionBedGraphs -i 1.bg 2.bg 3.bg -empty -g sizes.txt -header
chrom start end WT-1 WT-2 KO-1
......@@ -122,6 +127,7 @@ Figure:
``-filler`` Use a custom value for missing values.
==========================================================================
Figure:
::
unionBedGraphs -i 1.bg 2.bg 3.bg -empty -g sizes.txt -header -filler N/A
chrom start end WT-1 WT-2 KO-1
......@@ -144,6 +150,7 @@ Figure:
Use BEDGRAPH files with non-numeric values.
==========================================================================
Figure:
::
cat 1.snp.bg
chr1 0 1 A/G
......
......@@ -120,11 +120,11 @@ For example (note the difference between -l 200 and -l 300):
chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 200 -r 20000
chr1 100 200 chr1 10000 20000
chr1 1000 2000 chr1 10000 20000
$ bedtools window -a A.bed -b B.bed -l 300 -r 20000
chr1 100 200 chr1 500 800
chr1 100 200 chr1 10000 20000
chr1 1000 2000 chr1 500 800
chr1 1000 2000 chr1 10000 20000
==========================================================================
......