Unverified Commit 56e5902f authored by St. Elmo's avatar St. Elmo
Browse files

housekeeping changes

parent 85e7f1cf
......@@ -68,7 +68,7 @@ If you run a non-standard platform (e.g. a customized operating systems), or if
## Quick start guide
In addition to `COBREXA`, you also need to include a Julia package which provides an appropriate solver. One such solver is `Tulip`, which is provided by the [Tulip.jl](https://github.com/ds4dm/Tulip.jl) package.
In addition to `COBREXA`, you also need to include a Julia package that provides an appropriate optimizer. One such optimizer for linear programs is `Tulip`, which is provided by the [Tulip.jl](https://github.com/ds4dm/Tulip.jl) package.
```julia
] add Tulip
......
# Optimization Based Analysis
A selection of standard COBRA functions have been implemented to make basic
model analysis more convenient.
Additionally, `COBREXA.jl` allows you to easily formulate your own optimization
problems using the structure of a constraint based model.
This makes it easy to experiment with custom algorithms etc.
## Flux balance analysis (FBA)
Flux balance analysis solves the linear program,
```math
\begin{aligned}
& \underset{v}{\text{max}}
& \sum_{i}^I {w_i}{v_i} \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
\end{aligned}
```
using any `JuMP` compatible solver. Typically ``I`` is a singleton set that
only includes the index of the biomass objective function, ``v_\mu``, and
weight, ``w_\mu=1``.
Here, we use `Tulip.jl`, a pure Julia interior point linear program solver,
with the `fba` function from `COBREXA.jl`.
```
using COBREXA
using JuMP
using Tulip
if !isfile("e_coli_core.json")
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")
end
model = read_model("e_coli_core.json")
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
optimizer = Tulip.Optimizer
sol = flux_balance_analysis(model, biomass, optimizer)
```
Note that the result of the optimization problem, `sol` maps fluxes to reaction
`id`s in the model, to simplify down stream analysis.
Since the fluxes are mapped to a dictionary, it makes it simple to export them
as a JSON file for visualization in, e.g.
[Escher](https://escher.github.io/#/).
```
using JSON
open("fluxes.json", "w") do io
JSON.print(io, sol)
end
```
## Solution inspection
Sometimes it is useful to investigate which reactions consume or produce a
certain metabolite, or which exchange reactions are active. This functionality
is exposed via `metabolite_fluxes` and `exchange_reactions`:
```
consuming, producing = metabolite_fluxes(sol, model)
consuming["atp_c"]
```
## Parsimonious FBA
Parsimonious FBA (pFBA) solves a two stage optimization problem. First, a
classic FBA problem is solved to identify the unique maximum of the objective.
However, it should be noted that the fluxes from FBA are not unique (i.e. many
fluxes may yield the objective optimum). To yield a unique set of fluxes, and
remove internal futile cycles, a secondary quadratic problem is imposed *ad
hoc*.
Suppose that FBA has found the optimum of ``v_\mu = \mu``, pFBA then solves:
```math
\begin{aligned}
& \underset{v}{\text{min}}
& \sum_{i} {v_i^2} \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
& & v_\mu = \mu
\end{aligned}
```
again using any JuMP compatible solver(s). In the `COBREXA.jl` implementation
of pFBA, both the FBA and QP problem are solved internally in `pfba`, using
similar input arguments as in `fba`. If multiple solvers are given, the first
solver is used to solve the LP, and the second solver the QP, otherwise the
same solver is used to solve both problems.
This is useful if the QP solver does not handle the LP problem well, as with
OSQP.
An alternative, related formulation of this idea exists, called
"CycleFreeFlux".
This replaces the quadratic formulation with an L1 (taxi cab) norm. While this
should also remove futile cycles, it doesn't have the same uniqueness qualities
and doesn't really have much benefit beyond only solving two linear programs.
See [Building your own optimization analysis script](@ref) for ideas about how
to implement this yourself if you really need it.
Here, we use `Tulip.jl` followed by `OSQP.jl`, with the `pfba` function from
`COBREXA.jl`. Note that `OSQP.jl` has iffy performance, and is only included
here because it is open source. We recommend that a commercial solver, e.g.
`Gubobi.jl`, be used to simplify your user experience.
```
using OSQP
atts = Dict("eps_abs" => 5e-4,"eps_rel" => 5e-4, "max_iter" => 100_000, "verbose"=>false) # set solver attributes for QSQP
sol = pfba(model, biomass, [Tulip.Optimizer, OSQP.Optimizer]; solver_attributes=Dict("opt1" => Dict{Any, Any}(), "opt2" => atts))
```
## Flux variability analysis (FVA)
Flux variability analysis can also be used to investigate the degeneracy
associated with flux balance analysis derived solutions (see also [Sampling
Tools](@ref)).
`COBREXA.jl` exposes `fva` that sequentially maximizes and minimizes each
reaction in a model subject to the constraint that each optimization problem
also satisfies an initial FBA type objective optimum, below denoted by
``v_{\mu}=\mu``,
```math
\begin{aligned}
& \underset{v}{\text{max or min}}
& v_i \\
& \text{s. t.}
& Sv = 0 \\
& & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\
& & v_{\mu} = \mu \\
\end{aligned}
```
## Building your own optimization analysis script
`COBREXA.jl` also makes it simple to construct customized optimization problems
by making judicious use of [JuMP](https://jump.dev/).
Convenience functions make optimization problem construction, modification and
data extraction from JuMP result objects easy.
```
model = read_model("e_coli_core.json")
cbm, v, mb, ubs, lbs = build_cbm(model)
glucose_index = model[findfirst(model.reactions, "EX_glc__D_e")]
set_bound(glucose_index, ubs, lbs; ub=-12.0, lb=-12.0)
set_optimizer(cbm, Tulip.Optimizer)
@objective(cbm, Max, v[model[biomass]])
optimize!(cbm)
sol = map_fluxes(v, model)
```
# Model IO
## Reading constraint based models
Currently, JSON and Matlab formatted models can be imported.
```
using COBREXA
if !isfile("e_coli_core.json")
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")
end
model = read_model("e_coli_core.json")
model # pretty print the model
```
## Writing constraint based models
Currently, JSON and Matlab models can be exported.
```
save_model(model, "e_coli_core2.json")
```
## IO Problems?
Please let me know when you run into model import/export problems by filing an issue.
# Model Construction
## Defining genes
Genes are represented by the `Gene` type in `COBREXA.jl`, see [Model
Structure](@ref) for details. `Gene`s can be constructed using either an empty
constructor, or a constructor taking only the string `id` of the gene.
```
using COBREXA
gene = Gene("gene1")
gene.name = "gene 1 name"
gene # pretty printing
```
Helper functions from Base have also been overwritten to make accessing arrays
of genes easy.
## Defining metabolites
Metabolites are represented by the `Metabolite` type in `COBREXA.jl`, see
[Model Structure](@ref) for details. The simplest way to define a new
metabolite is by using the empty constructor `Metabolite()`.
Alternatively, `Metabolite(id::String)` can be used to assign only the `id`
field of the `Metabolite`.
The other fields can be modified as usual, if desired.
```
using COBREXA # hide
atp = Metabolite("atp")
atp.name = "Adenosine triphosphate"
atp.formula = "C10H12N5O13P3"
atp.charge = -4 # note that the charge is an Int
atp # pretty printing
```
Basic analysis of `Metabolite`s is also possible. The molecular structure of a
metabolite can be inspected by calling `get_atoms(met::Metabolite)`. This
function is useful for checking atom balances across reactions, or the entire
model.
```
using COBREXA # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
get_atoms(atp)
```
Helper functions from Base have also been overwritten to make accessing arrays
of metabolites easy.
## Defining reactions
Reactions are represented by the `Reaction` type in `COBREXA.jl`, see [Model
Structure](@ref) for details. The simplest way to define a new reaction is by
using the empty constructor `Reaction()`. All the other fields still need to
be assigned.
Another option is to use
`Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir=:bidirectional)`,
which assigns the reaction `id`, the reaction stoichiometry (through the
metabolite dictionary argument), and the directionality of the reaction. The
remaining fields still need to be assigned, if desired.
```
using COBREXA # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
gene = Gene("gene1") # hide
adp = Metabolite("adp") # define another metabolite
metdict = Dict(atp => -1.0, adp => 1.0) # nb stoichiometries need to be floats
rxn = Reaction("dummy rxn", metdict, :forward)
rxn.annotation["ec-code"] = ["0.0.0.0"]
rxn.grr = [[gene]] # only gene1 is required for this reaction to work
rxn # pretty printing
```
See the discussion in [Model Structure](@ref) about how to assign `grr` to a
reaction.
Yet another way of defining a reaction is through overloading of the operators:
`*, +, ∅, ⟶, →, ←, ⟵, ↔, ⟷`. The longer and shorter arrows mean the same
thing, i.e. `⟶` is the same as `→`, etc. The other fields of the reaction
still need to be set directly.
```
using COBREXA # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
another_rxn = 2.0adp ⟶ 2.0*atp # forward reaction
another_rxn.id = "another dummy rxn"
another_rxn
```
When building exchange, demand, or sinks reactions the `∅` empty metabolite
should be used to indicate that a metabolite is being created or destroyed.
```
using COBREXA # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
ex_rxn = ∅ ⟷ adp # exchange reaction
```
It is also possible to check if a reaction is mass balanced by using
`is_mass_balanced(rxn::Reaction)`. Note, this function requires that all the
metabolites in the reaction have formulas assigned to them to work properly.
```
using COBREXA # hide
atp = Metabolite("atp") # hide
atp.name = "Adenosine triphosphate" # hide
atp.formula = "C10H12N5O13P3" # hide
atp.charge = -4 # hide
adp = Metabolite("adp") # hide
adp.formula = "C10H12N5O10P2" # hide
unbal_rxn = 2.0adp ⟶ 1.0*atp # unbalanced reaction
is_mass_balanced(unbal_rxn)
```
Helper functions from Base have also been overwritten to make accessing arrays
of reactions easy.
## Building models
Once you have defined some metabolites, genes, and reactions, you can construct
a model! This is most simply done by using the empty model constructor:
The fields of `StandardModel` can then be assigned as usual.
```
using COBREXA
atp = Metabolite("atp")
adp = Metabolite("adp")
glc = Metabolite("glc")
h2o = Metabolite("h2o")
pp = Metabolite("pp") # pi (phosphate) usually means π, so pp is used here instead
lac = Metabolite("lac")
g1 = Gene("gene1")
g2 = Gene("gene2")
g3 = Gene("gene3")
g4 = Gene("gene4")
catabolism = glc + 2.0*adp + 2.0*pp 2.0 * lac + 2.0 * h2o + 2.0 * atp
catabolism.id = "catabolism"
catabolism.grr = [[g1, g2], [g3, g4]]
anabolism = 10.0 * atp + 10.0*h2o 10.0*adp + 10.0*pp
anabolism.id = "anabolism"
glc_ex = glc # exchanges are defined like this so negative fluxes mean import
lac_ex = lac # positive flux means export
mets = [atp, adp, pp, h2o, glc, lac]
genes = [g1, g2, g3, g4]
rxns = [catabolism, anabolism, lac_ex, glc_ex]
model = StandardModel()
model.id = "Test model"
add!(model, mets)
add!(model, rxns)
add!(model, genes)
model # pretty printing
```
Here the `add` functions were used to add the reactions, metabolites and genes
to the model.
Checking for duplicates of genes, metabolites or reactions can also be done.
Note that duplicate checking is NOT performed when models are imported. If you
suspect the model quality you should check each reaction, metabolite and gene
yourself.
```
using COBREXA
met1 = Metabolite()
met1.id = "met1"
met1.name = "Metabolite 1"
met1.formula = "C6H12O6N"
met1.charge = 1
met1.compartment = "c"
met1.notes = Dict("notes"=>["This is a made up metabolite", "Another note"])
met1.annotation = Dict("sboterm" => "sbo000001", "kegg.compound" => ["C0001", "C0010"])
met2 = Metabolite("met2")
met2.formula = "C6H12O6N"
met3 = Metabolite("met3")
met3.formula = "X"
met3.annotation = Dict("sboterm" => "sbo00001", "kegg.compound" => ["C02222", "C0001"])
mets = [met1, met2, met3]
dup, ind = check_duplicate_annotations(mets, met3)
```
```
mms = check_same_formula([met3, met1], met2)
```
Similar functionality exists for genes and reactions. Duplicate reactions,
metabolites or genes can be removed using `rm!`.
```
using COBREXA # hide
atp = Metabolite("atp")
atp2 = Metabolite("atp2")
mets = [atp, atp2]
model = StandardModel()
add!(model, mets)
rm!(model, atp2)
```
A model can also be checked to ensure that no metabolites or genes are missing
relative to the reactions. `fix_model` also ensures that no extra metabolites
are present.
```
using COBREXA # hide
atp = Metabolite("atp")
adp = Metabolite("adp")
anabolism = 10.0 * atp 10.0*adp
anabolism.id = "anabolism"
mets = [atp]
rxns = [anabolism]
model = StandardModel()
model.id = "Test model"
add!(model, mets) # missing adp
add!(model, rxns)
fix_model!(model) # adp added
model # now has 2 metabolites
```
Helper functions from Base have also been overwritten to make accessing
reactions, metabolites and genes easy from a model.
# Model Structure
Before reading, writing, or building models, it is important to understand how
they are represented internally in `COBREXA.jl`.
Each model is a struct of the type `StandardModel`, which is composed of a model
`id`, and arrays of `Reaction`s, `Metabolite`s and `Gene`s.
The fields of `Reaction`, `Metabolite`, `Gene` types are shown below. When
reading, writing, building or analysing models, these fields are what is used
by `COBREXA.jl`.
Note, the format of `grr` (gene reaction rule) in `Reaction` should be a nested
array, like `[[g1, g2], [g3, g4], ...]`.
Each sub-array, e.g. `[g1, g2]`, is composed of essential genes
(`g1::COBREXA.Gene`, etc.) for the reaction to function.
Thus, if the reaction requires (`g1` and `g2`) or (`g3` and `g4`) to function,
then this would be represented by `[[g1, g2], [g3, g4]]` in `grr`. Also note,
the metabolites dictionary field of `Reaction` maps a `Metabolite` to its
stoichiometrix coefficient, i.e. it represents the reaction equation.
Also note, the format used the `annotation` field in `Reaction`, `Metabolite`
and `Gene` should always be a dictionary with string ids mapped to vectors of
entries EXCEPT for "sbo" terms, which are not vectors but strings, e.g.:
```
reaction.annotation = Dict(
"bigg.reaction" => ["PFK"],
"rhea" => ["16111", "16109", "16110", "16112"],
"sbo" => "SBO:0000176",
"ec-code" => ["2.7.1.11"],
)
```
...is correct, but any other format will cause issues.
# Sampling Tools
It is well known that the FBA does not yield a unique solution, i.e. many flux
distributions are capable of satisfying the system constraints as well as
optimizing the imposed objective function.
Let the feasible space be defined by
``\mathcal{P} = \left\{ v : Sv = 0 \cap v_{\text{min}} \leq v \leq v_{\text{max}} \right\}``.
Sampling methods have been developed to uniformly sample from this feasible
solution space. `COBREXA.jl` implements `hit_and_run` to
sample from ``\mathcal{P}``.
```
using COBREXA
using JuMP
using Tulip
model = read_model("e_coli_core.json")
optimizer = COBREXA.Tulip.Optimizer
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
cons = Dict("EX_glc__D_e" => (-12.0, -12.0))
sol = fba(model, biomass, optimizer, constraints=cons) # classic flux balance analysis
cons["BIOMASS_Ecoli_core_w_GAM"] = (sol["BIOMASS_Ecoli_core_w_GAM"], sol["BIOMASS_Ecoli_core_w_GAM"]*0.99)
samples = hit_and_run(100_000, model, optimizer; keepevery=10, samplesize=5000, constraints=cons)
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment