Commit caf701ed authored by St. Elmo

merged branches

parents 01460618 0a296f34
name: Documentation
on:
  push:
    branches:
      - master
    tags: '*'
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@latest
        with:
          version: '1.5'
      - name: Install dependencies
        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
      - name: Build and deploy
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # For authentication with GitHub Actions token
          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
        run: julia --project=docs/ docs/make.jl
\ No newline at end of file
name: CI
on:
  pull_request:
    branches:
      - master
  push:
    branches:
      - master
    tags: '*'
jobs:
  test:
    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        version:
          - '1.5'
          - '1'
        os:
          - ubuntu-latest
          - macOS-latest
          - windows-latest
        arch:
          - x64
    steps:
      - uses: actions/checkout@v2
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v1
        with:
          file: lcov.info
\ No newline at end of file
name: format-check
on:
  push:
    branches:
      - 'master'
    tags: '*'
  pull_request:
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        julia-version: [1.3.0]
        julia-arch: [x86]
        os: [ubuntu-latest]
    steps:
      - uses: julia-actions/setup-julia@latest
        with:
          version: ${{ matrix.julia-version }}
      - uses: actions/checkout@v1
      - name: Install JuliaFormatter and format
        run: |
          julia -e 'using Pkg; Pkg.add(PackageSpec(name="JuliaFormatter"))'
          julia -e 'using JuliaFormatter; format(".", verbose=true)'
      - name: Format check
        run: |
          julia -e '
          out = Cmd(`git diff --name-only`) |> read |> String
          if out == ""
              exit(0)
          else
              @error "Some files have not been formatted !!!"
              write(stdout, out)
              exit(1)
          end'
\ No newline at end of file
.DS_Store
/Manifest.toml
docs/build
test/data/ecoli_core.xml
# ignore VScode clutter
/.vscode
/vscode
/.vscode
\ No newline at end of file
*.code-workspace
# Build artifacts for creating documentation generated by the Documenter package
docs/build/
docs/site/
# File generated by Pkg, the package manager, based on a corresponding Project.toml
# It records a fixed state of all packages used by the project. As such, it should not be
# committed for packages, but should be committed for applications that require a static
# environment.
Manifest.toml
# Ignore temporary script for testing functions
temp.jl
......@@ -5,22 +5,34 @@ version = "0.1.0"
[deps]
Clp = "e2554f3b-3117-50c0-817c-e040a3ddf72d"
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
DistributedData = "f6a0035f-c5ac-4ad0-b410-ad102ced35df"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GLPK = "60bf3e95-4087-53dc-ae20-288a0d20c6a6"
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
JuliaFormatter = "98e50ef6-434e-11e9-1051-2b60c6c9e899"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MAT = "23992714-dd62-5051-b70f-ba57cb901cac"
Measurements = "eff96d63-e80a-5855-80a2-b1b0885c5ab7"
OSQP = "ab2f91bb-94b4-55e3-9ba0-7f65df51de79"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
SBML = "e5567a89-2604-4b09-9718-f5f78e97c3bb"
SBML_jll = "bb12108a-f4ef-5f88-8ef3-0b33ff7017f1"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Tulip = "6dd1b50a-3aae-11e9-10b5-ef983d2400fa"
[compat]
DistributedData = "0.1.3"
julia = "1"
......@@ -3,4 +3,54 @@
</div>
<br>
# Constraint-Based Reconstruction and EXascale Analysis
[docs-img]: https://img.shields.io/badge/docs-latest-blue.svg
[docs-url]: https://stelmo.github.io/CobraTools.jl/dev
[ci-img]: https://github.com/stelmo/CobraTools.jl/actions/workflows/ci.yml/badge.svg?branch=master&event=push
[ci-url]: https://github.com/stelmo/CobraTools.jl/actions/workflows/ci.yml
[cov-img]: https://codecov.io/gh/stelmo/CobraTools.jl/branch/master/graph/badge.svg?token=3AE3ZDCJJG
[cov-url]: https://codecov.io/gh/stelmo/CobraTools.jl
[contrib]: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat
[license-img]: http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat
[license-url]: LICENSE.md
[![][license-img]][license-url] [![contributions welcome][contrib]](https://github.com/LCSB-BioCore/COBREXA.jl/issues)
| **Documentation** | **Tests** | **Coverage** |
|:--------------:|:-------:|:---------:|
| [![docs-img]][docs-url] | [![CI][ci-img]][ci-url] | [![codecov][cov-img]][cov-url] |
This package aims to provide constraint-based reconstruction and analysis tools at the exascale in Julia.
## Installation
To install this package: `] add ???`. See the documentation for more information.
## Quick Example
Let's use `COBREXA.jl` to perform classic flux balance analysis on an *E. coli* community.
```julia
using COBREXA
using JuMP
using Tulip # pick any solver supported by JuMP
# Import E. coli models (models have pretty printing)
model_1 = read_model("iJO1366.json")
model_2 = read_model("iJO1366.json")
model_3 = read_model("iJO1366.json")
# Build an exascale model
exascale_model = join(model_1, model_2, model_3,...)
```
More functionality is described in the documentation, e.g. model construction and analysis in pure Julia.
### Citations
1) Ebrahim, A., Lerman, J.A., Palsson, B.O. & Hyduke, D. R. (2013). COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Systems Biology, 7(74). https://doi.org/10.1186/1752-0509-7-74
2) Heirendt, L., Arreckx, S., Pfau, T. et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc 14, 639–702. https://doi.org/10.1038/s41596-018-0098-2
3) Noor, E., Bar-Even, A., Flamholz, A., Lubling, Y., Davidi, D., & Milo, R. (2012). An integrated open framework for thermodynamics of reactions that combines accuracy and coverage. Bioinformatics, 28(15), 2037–2044. https://doi.org/10.1093/bioinformatics/bts317
4) Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Research, 49(D1). https://doi.org/10.1093/nar/gkaa1025
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
[compat]
Documenter = "0.26"
\ No newline at end of file
......@@ -15,5 +15,11 @@ makedocs(modules = [COBREXA],
"Home" => "index.md",
"Functions" => "functions.md",
"How to contribute" => "howToContribute.md",
"Model Structure" => "model_structure.md",
"Model IO" => "io.md",
"Model Construction" => "model_construction.md",
"Optimization Based Analysis Tools" => "basic_analysis.md",
"Sampling Tools" => "sampling_tools.md",
"External Tools" => "external_tools.md",
],
)
\ No newline at end of file
# Optimization Based Analysis
A selection of standard COBRA functions has been implemented to make basic model analysis more convenient.
Additionally, `CobraTools.jl` allows you to easily formulate your own optimization problems using the structure of a constraint based model.
This makes it easy to experiment with custom algorithms etc.
## Flux balance analysis (FBA)
Flux balance analysis solves the linear program,
```math
\begin{aligned}
& \underset{v}{\text{max}}
& \sum_{i \in I} {w_i}{v_i} \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
\end{aligned}
```
using any `JuMP` compatible solver. Typically, ``I`` is a singleton set containing only the index of the biomass objective function, ``v_\mu``, with weight ``w_\mu=1``.
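As a rough illustration of the linear program above, the sketch below writes it directly in `JuMP` for a hypothetical three-reaction toy network. The matrix `S`, the bounds, the weights `w`, and all variable names are assumptions made up for this example; it is not how `fba` is implemented internally.

```julia
using JuMP
using Tulip

# Hypothetical toy network (not part of CobraTools.jl):
# v1: -> A (uptake), v2: A -> B, v3: B -> (objective reaction)
S = [1.0 -1.0  0.0;
     0.0  1.0 -1.0]
lb = [0.0, 0.0, 0.0]
ub = [10.0, 1000.0, 1000.0]
w = [0.0, 0.0, 1.0] # only the third reaction carries weight

lp = Model(Tulip.Optimizer)
set_silent(lp)
@variable(lp, lb[i] <= v[i = 1:3] <= ub[i])        # v_LB <= v <= v_UB
@constraint(lp, S * v .== 0)                        # steady state, Sv = 0
@objective(lp, Max, sum(w[i] * v[i] for i in 1:3))  # weighted objective
optimize!(lp)

fluxes = value.(v)
```

In this toy problem the uptake bound on `v1` limits the objective, so all three fluxes should end up close to 10 (an interior point solver returns values within its tolerance, not exact numbers).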
```@docs
fba
```
Here, we use `Tulip.jl`, a pure Julia interior point linear program solver, with the `fba` function from `CobraTools.jl`.
```@setup fba
model_location = joinpath("..","..", "models", "e_coli_core.json")
```
```@example fba
using CobraTools
using JuMP
using Tulip
model = read_model(model_location)
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
optimizer = Tulip.Optimizer
sol = fba(model, biomass, optimizer)
```
Note that the result of the optimization problem, `sol`, maps reaction `id`s in the model to fluxes, which simplifies downstream analysis.
Since the fluxes are stored in a dictionary, it is simple to export them as a JSON file for visualization in, e.g., [Escher](https://escher.github.io/#/).
```@example fba
using JSON
open("fluxes.json", "w") do io
JSON.print(io, sol)
end
```
## Solution inspection
Sometimes it is useful to investigate which reactions consume or produce a certain metabolite, or which exchange reactions are active.
This functionality is exposed via `metabolite_fluxes` and `exchange_reactions`.
```@docs
metabolite_fluxes
exchange_reactions
```
```@example fba
consuming, producing = metabolite_fluxes(sol, model)
consuming["atp_c"]
```
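To make the idea of consuming and producing reactions concrete, here is a minimal sketch using plain dictionaries. The function `split_metabolite_fluxes`, the toy stoichiometry, and the flux values are all hypothetical and only illustrate the bookkeeping; the actual `metabolite_fluxes` function may organize its output differently.

```julia
# Hypothetical toy data: reaction id => (metabolite id => stoichiometric coefficient)
stoichiometry = Dict(
    "R1" => Dict("atp_c" => -1.0, "adp_c" => 1.0),
    "R2" => Dict("atp_c" => 2.0, "adp_c" => -2.0),
)
fluxes = Dict("R1" => 3.0, "R2" => 1.0)

# Split each metabolite's turnover into reactions that consume it
# (coefficient * flux < 0) and reactions that produce it (> 0).
function split_metabolite_fluxes(fluxes, stoichiometry)
    consuming = Dict{String,Dict{String,Float64}}()
    producing = Dict{String,Dict{String,Float64}}()
    for (rxn_id, flux) in fluxes
        for (met_id, coeff) in stoichiometry[rxn_id]
            rate = coeff * flux
            rate == 0 && continue # inactive reactions contribute nothing
            target = rate < 0 ? consuming : producing
            get!(target, met_id, Dict{String,Float64}())[rxn_id] = rate
        end
    end
    return consuming, producing
end

consuming, producing = split_metabolite_fluxes(fluxes, stoichiometry)
consuming["atp_c"]
```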
## Parsimonious FBA
Parsimonious FBA (pFBA) solves a two stage optimization problem. First, a classic FBA problem is solved to identify the unique maximum of the objective.
However, it should be noted that the fluxes from FBA are not unique (i.e. many fluxes may yield the objective optimum).
To yield a unique set of fluxes, and remove internal futile cycles, a secondary quadratic problem is imposed *ad hoc*.
Suppose that FBA has found the optimum of ``v_\mu = \mu``, pFBA then solves,
```math
\begin{aligned}
& \underset{v}{\text{min}}
& \sum_{i} {v_i^2} \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
& & v_\mu = \mu
\end{aligned}
```
again using any JuMP compatible solver(s). In the `CobraTools.jl` implementation of pFBA, both the FBA and QP problems are solved internally in `pfba`, using input arguments similar to those of `fba`. If multiple solvers are given, the first solver is used to solve the LP and the second solver the QP; otherwise, the same solver is used to solve both problems.
This is useful if the QP solver does not handle the LP problem well, as is the case with OSQP.
An alternative, related formulation of this idea exists, called "CycleFreeFlux".
This replaces the quadratic formulation with an L1 (taxicab) norm. While this should also remove futile cycles, it does not have the same uniqueness properties, and its
main benefit is that only two linear programs need to be solved. See [Building your own optimization analysis script](@ref) for ideas about how to implement this yourself if you need it.
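For reference, one standard way to write an L1 norm objective as a linear program is sketched below. The auxiliary variables ``t_i`` are an assumption introduced here for illustration; this is the generic textbook linearization of ``\sum_i |v_i|``, not necessarily the exact formulation used by CycleFreeFlux.

```math
\begin{aligned}
& \underset{v, t}{\text{min}}
& \sum_{i} {t_i} \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
& & v_\mu = \mu \\
& & -t_i \leq v_i \leq t_i
\end{aligned}
```

At the optimum each ``t_i`` equals ``|v_i|``, so the objective is the L1 norm of the flux vector while the problem stays linear.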
```@docs
pfba
```
Here, we use `Tulip.jl` followed by `OSQP.jl`, with the `pfba` function from `CobraTools.jl`. Note that `OSQP.jl` can be unreliable, and is only included here because it is open source. We recommend using a commercial solver, e.g. `Gurobi.jl`, for a smoother user experience.
```@example fba
using OSQP
atts = Dict("eps_abs" => 5e-4, "eps_rel" => 5e-4, "max_iter" => 100_000, "verbose" => false) # set solver attributes for OSQP
sol = pfba(model, biomass, [Tulip.Optimizer, OSQP.Optimizer]; solver_attributes=Dict("opt1" => Dict{Any, Any}(), "opt2" => atts))
```
## Flux variability analysis (FVA)
Flux variability analysis can also be used to investigate the degeneracy associated with flux balance analysis derived solutions (see also [Sampling Tools](@ref)).
`CobraTools.jl` exposes `fva` that sequentially maximizes and minimizes each reaction in a model subject to the constraint that each optimization problem also satisfies an initial FBA type objective optimum, below denoted by ``v_{\mu}=\mu``,
```math
\begin{aligned}
& \underset{v}{\text{max or min}}
& v_i \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
& & v_{\mu} = \mu \\
\end{aligned}
```
```@docs
fva
```
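The procedure can be sketched on a hypothetical toy network with two parallel routes, so that some fluxes are free to vary at the optimum. Everything below (the matrix `S`, the bounds, the helper `solve_flux`, and the small tolerance `tol` used to relax ``v_{\mu}=\mu`` for numerical robustness) is an assumption made for illustration and does not reflect how `fva` is implemented.

```julia
using JuMP
using Tulip

# Hypothetical toy network with two parallel routes from A to B:
# v1: -> A, v2: A -> B, v3: A -> B, v4: B -> (objective reaction)
S = [1.0 -1.0 -1.0  0.0;
     0.0  1.0  1.0 -1.0]
lb = zeros(4)
ub = [10.0, 1000.0, 1000.0, 1000.0]
objective_index = 4

# Build and solve one LP. If `mu` is given, the objective flux is also
# required to stay (almost) at its optimum; `tol` is an assumed relaxation.
function solve_flux(sense, index; mu = nothing, tol = 1e-6)
    lp = Model(Tulip.Optimizer)
    set_silent(lp)
    @variable(lp, lb[i] <= v[i = 1:4] <= ub[i])
    @constraint(lp, S * v .== 0)
    if mu !== nothing
        @constraint(lp, v[objective_index] >= (1 - tol) * mu)
    end
    @objective(lp, sense, v[index])
    optimize!(lp)
    return objective_value(lp)
end

# Stage 1: FBA-type optimum. Stage 2: minimize and maximize every flux.
mu = solve_flux(MOI.MAX_SENSE, objective_index)
flux_ranges = [
    (solve_flux(MOI.MIN_SENSE, i; mu = mu), solve_flux(MOI.MAX_SENSE, i; mu = mu))
    for i in 1:4
]
```

Here the two parallel reactions `v2` and `v3` should each be able to range between roughly 0 and 10, while `v1` and `v4` are pinned near 10. This is the kind of degeneracy that FVA is meant to expose.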
## Building your own optimization analysis script
`CobraTools.jl` also makes it simple to construct customized optimization problems by making judicious use of [JuMP](https://jump.dev/).
Convenience functions make optimization problem construction, modification and data extraction from JuMP result objects easy.
```@docs
get_core_model
build_cbm
set_bound
map_fluxes(::Array{Float64,1}, ::CobraTools.Model)
```
```@example fba
using CobraTools
using JuMP
using Tulip
model = read_model(model_location)
cbm, v, mb, ubs, lbs = build_cbm(model)
glucose_index = model[findfirst(model.reactions, "EX_glc__D_e")]
set_bound(glucose_index, ubs, lbs; ub=-12.0, lb=-12.0)
set_optimizer(cbm, Tulip.Optimizer)
@objective(cbm, Max, v[model[biomass]])
optimize!(cbm)
sol = map_fluxes(v, model)
```
# External Tools
Numerous external databases exist that can add functionality to constraint based models.
Currently, `CobraTools.jl` tries to make it easier to incorporate two such databases in analyses.
The first is Brenda, a database that maps enzymes to kinetic data.
The second is Equilibrator, a database that allows one to calculate thermodynamic information, e.g. ``\Delta G``, from reactions.
`CobraTools.jl` includes lightweight interfaces to these sources.
## Brenda Interface
To use the Brenda interface functions, you will need to download the database as a txt file [available here](https://www.brenda-enzymes.org/download_brenda_without_registration.php) (~250 MB).
Once the database has been downloaded, it can be parsed by `parse_brenda`.
```@docs
parse_brenda
```
This function returns an array of `BrendaEntry` structs, which are composed of `EnzymeParams` for each field extracted from the Brenda database.
Currently, only the ID (= EC number), TN (= turnover number), KM (= Michaelis-Menten constant, ``K_M``), KI (= inhibition term for Michaelis-Menten kinetics), and KKM (= ratio of TN/KM) numbers for each enzyme class (ID, or EC number) are extracted. All the structs have pretty printing enabled.
```@docs
CobraTools.BrendaEntry
CobraTools.EnzymeParams
```
```@setup brenda
brenda_loc = model_location = joinpath("..","..", "test", "data", "small_brenda.txt")
```
```@example brenda
using CobraTools
brenda_data = parse_brenda(brenda_loc)
```
```@example brenda
brenda_data[1]
```
```@example brenda
brenda_data[1].TN[1]
```
## Equilibrator Interface
The Equilibrator interface requires that the Equilibrator-API has been installed and can be accessed through Julia's PyCall package. Refer to the [Equilibrator-API website](https://gitlab.com/equilibrator/equilibrator-api) for installation instructions. Within Julia, if you can call `pyimport("equilibrator_api")` successfully, then you will be able to use the functions exposed here. To actually use the functions, insert `using PyCall` in your main level script (before or after `using CobraTools`).
```@docs
map_gibbs_rxns
map_gibbs_external
map_gibbs_internal
```
......@@ -22,4 +22,50 @@ If you want to contribute, please read these guidelines first:
```@contents
Pages = ["howToContribute.md"]
```
## Contents
```@contents
Pages = [
"model_structure.md",
"io.md",
"model_construction.md",
"basic_analysis.md",
"sampling_tools.md",
"external_tools.md",
"thermodynamics.md"
]
Depth=2
```
## Installation
To install this package: `] add https://github.com/stelmo/CobraTools.jl`.
Some of the optional features used in this package require external programs and/or data to be available. These are described below:
* The Equilibrator interface requires that the Equilibrator-API has been installed and can be accessed through Julia's PyCall package. Refer to the [Equilibrator-API website](https://gitlab.com/equilibrator/equilibrator-api) for installation instructions. Within Julia, if you can call `pyimport("equilibrator_api")` successfully, then you will be able to use the functions exposed here. To actually use the functions, insert `using PyCall` in your main level script (before or after `using CobraTools`).
* To extract turnover numbers, Km, Kcat/Km and Ki from the Brenda database, you will need to download the database as a txt file [available here](https://www.brenda-enzymes.org/download_brenda_without_registration.php) (~250 MB).
The optimization solvers are implemented through `JuMP` and thus this package should be solver agnostic. All tests are conducted using `Tulip.jl` and `OSQP.jl`, but other solvers should also work (I mostly use `Gurobi.jl`).
## Quick Example
Let's perform flux balance analysis on a constraint based model.
```@setup intro
model_location = joinpath("..","..", "models", "e_coli_core.json")
```
```@example intro
using CobraTools
using JuMP
using Tulip
model = read_model(model_location)
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
optimizer = Tulip.Optimizer
sol = fba(model, biomass, optimizer)
```
# Model IO
## Reading constraint based models
Currently, JSON and Matlab formatted models can be imported.
```@docs
read_model(file_location::String)
```
```@example
using CobraTools
model_location = joinpath("..","..", "models", "iJO1366.json")
model = read_model(model_location)
model # pretty printing
```
## Writing constraint based models
Currently, JSON and Matlab models can be exported.
```@docs
save_model(model::CobraTools.Model, file_location::String)
```
```@example
using CobraTools # hide
model_location = joinpath("..","..", "models", "iJO1366.json") # hide
model = read_model(model_location) # hide
# "e_coli_json_model.json" is the file name we are going to use to save the model
model_location = joinpath("e_coli_json_model.json")
# model is a CobraTools.Model object previously imported or created
save_model(model, model_location)
rm(model_location) # hide
```
## IO Problems?
Please let me know when you run into model import/export problems by filing an issue.
\ No newline at end of file
# Model Construction
## Defining genes
Genes are represented by the `Gene` type in `CobraTools.jl`, see [Model Structure](@ref) for details.
`Gene`s can be constructed using either an empty constructor, or a constructor taking only
the string `id` of the gene.
```@docs
Gene()
Gene(::String)
```
```@example
using CobraTools # hide
gene = Gene("gene1")
gene.name = "gene 1 name"
gene # pretty printing
```
Helper functions from Base have also been overloaded to make accessing arrays of genes easy.
```@docs
findfirst(::Array{Gene, 1}, ::String)
getindex(::Array{Gene, 1}, ::Gene)
```
## Defining metabolites
Metabolites are represented by the `Metabolite` type in `CobraTools.jl`, see [Model Structure](@ref) for details.
The simplest way to define a new metabolite is by using the empty constructor `Metabolite()`.
Alternatively, `Metabolite(id::String)` can be used to assign only the `id` field of the `Metabolite`.
```@docs
Metabolite()
Metabolite(::String)
```
The other fields can be modified as usual, if desired.
```@example
using CobraTools
atp = Metabolite("atp")
atp.name = "Adenosine triphosphate"
atp.formula = "C10H12N5O13P3"
atp.charge = -4 # note that the charge is an Int
atp # pretty printing
```
Basic analysis of `Metabolite`s is also possible. The molecular structure of a metabolite can be inspected by calling
`get_atoms(met::Metabolite)`.
This function is useful for checking atom balances across reactions, or the entire model.
```@docs
CobraTools.get_atoms(met::Metabolite)
```
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
get_atoms(atp)
```
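To show the idea behind atom counting, the following sketch parses a simple chemical formula with a regular expression. The function `count_atoms` is a hypothetical stand-in written for this example (it ignores parentheses and charges) and is not the implementation of `get_atoms`.

```julia
# Hypothetical helper: count atoms in a simple formula such as "C10H12N5O13P3".
# An element is one uppercase letter, optionally followed by one lowercase
# letter; a missing count means one atom.
function count_atoms(formula::AbstractString)
    atoms = Dict{String,Int}()
    for m in eachmatch(r"([A-Z][a-z]?)(\d*)", formula)
        element = String(m.captures[1])
        count = isempty(m.captures[2]) ? 1 : parse(Int, m.captures[2])
        atoms[element] = get(atoms, element, 0) + count
    end
    return atoms
end

atp_atoms = count_atoms("C10H12N5O13P3")
```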
Helper functions from Base have also been overloaded to make accessing arrays of metabolites easy.
```@docs
findfirst(mets::Array{Metabolite, 1}, metid::String)
getindex(mets::Array{Metabolite, 1}, met::Metabolite)
```
## Defining reactions
Reactions are represented by the `Reaction` type in `CobraTools.jl`, see [Model Structure](@ref) for details.
The simplest way to define a new reaction is by using the empty constructor `Reaction()`.
All the other fields still need to be assigned.
```@docs
Reaction()
```
Another option is to use `Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir="bidir")`, which
assigns the reaction `id`, the reaction stoichiometry (through the metabolite dictionary argument), and the directionality of the reaction.
The remaining fields still need to be assigned, if desired.
```@docs
Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir="bidir")
```
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
gene = Gene("gene1") # hide
adp = Metabolite("adp") # define another metabolite
metdict = Dict(atp => -1.0, adp => 1.0) # NB: stoichiometric coefficients need to be floats
rxn = Reaction("dummy rxn", metdict, "for")
rxn.annotation["ec-code"] = ["0.0.0.0"]
rxn.grr = [[gene]] # only gene1 is required for this reaction to work
rxn # pretty printing
```
See the discussion in [Model Structure](@ref) about how to assign `grr` to a reaction.
Yet another way of defining a reaction is through overloading of the operators: `*, +, ∅, ⟶, →, ←, ⟵, ↔, ⟷`.
The longer and shorter arrows mean the same thing, i.e. `⟶` is the same as `→`, etc.
The other fields of the reaction still need to be set directly.
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
another_rxn = 2.0*adp ⟶ 2.0*atp # forward reaction
another_rxn.id = "another dummy rxn"
another_rxn
```
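For readers curious how such a syntax can work in Julia, here is a minimal, self-contained sketch of operator overloading. The types `Species` and `Terms` and the dictionary returned by `→` are hypothetical and much simpler than the `Metabolite` and `Reaction` types in `CobraTools.jl`.

```julia
# Hypothetical minimal types, for illustration only.
struct Species
    id::String
end

# One side of a reaction: species id => (positive) coefficient
struct Terms
    coeffs::Dict{String,Float64}
end

# `2.0 * adp` builds a single term, and `+` merges terms on the same side.
Base.:*(coeff::Real, s::Species) = Terms(Dict(s.id => Float64(coeff)))
Base.:+(a::Terms, b::Terms) = Terms(merge(+, a.coeffs, b.coeffs))

# The arrow combines both sides into one stoichiometry dictionary:
# substrates get negative coefficients, products positive ones.
→(lhs::Terms, rhs::Terms) =
    merge(+, Dict(k => -c for (k, c) in lhs.coeffs), rhs.coeffs)

adp = Species("adp")
atp = Species("atp")
stoich = 2.0 * adp → 2.0 * atp
```

Because arrow operators bind more loosely than `*` and `+` in Julia, the expression parses as `(2.0 * adp) → (2.0 * atp)` without extra parentheses.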
When building exchange, demand, or sink reactions, the `∅` empty metabolite should be used to indicate that a metabolite is being created or destroyed.
```@example
using CobraTools # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
ex_rxn = ∅ ⟷ adp # exchange reaction
```
It is also possible to check if a reaction is mass balanced by using `is_mass_balanced(rxn::Reaction)`.
Note, this function requires that all the metabolites in the reaction have formulas assigned to them to work properly.
```@docs
is_mass_balanced
```
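The check itself amounts to summing atom counts weighted by stoichiometric coefficients. The sketch below does this with plain dictionaries; `count_atoms` and `atom_balance` are hypothetical helpers written for this example and are not the implementation of `is_mass_balanced`.

```julia
# Hypothetical helper: count atoms in a simple formula (no parentheses or charges).
function count_atoms(formula::AbstractString)
    atoms = Dict{String,Int}()
    for m in eachmatch(r"([A-Z][a-z]?)(\d*)", formula)
        element = String(m.captures[1])
        count = isempty(m.captures[2]) ? 1 : parse(Int, m.captures[2])
        atoms[element] = get(atoms, element, 0) + count
    end
    return atoms
end

# Hypothetical helper: net atom change of a reaction. Every metabolite needs a
# formula; an empty result means the reaction is mass balanced.
function atom_balance(stoich::Dict{String,Float64}, formulas::Dict{String,String})
    balance = Dict{String,Float64}()
    for (met_id, coeff) in stoich
        for (element, n) in count_atoms(formulas[met_id])
            balance[element] = get(balance, element, 0.0) + coeff * n
        end
    end
    return filter(kv -> !iszero(kv.second), balance)
end

formulas = Dict("atp" => "C10H12N5O13P3", "adp" => "C10H12N5O10P2")
imbalance = atom_balance(Dict("atp" => -1.0, "adp" => 1.0), formulas)
```

For this toy reaction converting `atp` to `adp`, three oxygen atoms and one phosphorus atom are unaccounted for, so `imbalance` is not empty and the reaction is not mass balanced.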
```@example
using CobraTools # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide