Unverified Commit a0564590 authored by Miroslav Kratochvil's avatar Miroslav Kratochvil Committed by GitHub
Browse files

Merge pull request #209 from LCSB-BioCore/mk-tut2

tutorial2 + notebook2
parents 07566c1d 07ef694a
......@@ -30,5 +30,5 @@ temp.*
.ipynb_checkpoints
# add generated tutorial specifics
docs/src/notebooks/*.md
docs/src/notebooks/*.ipynb
!docs/src/notebooks/.git-keep
docs/src/notebooks/
# # Constraint-based analysis of single cell models
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# In this tutorial we will use `COBREXA`'s `flux_balance_analysis`,
# `flux_variability_analysis`, and `parsimonious_flux_balance_analysis`
# functions to analyze a toy model of *E. coli*.
# If it is not already present, load the model.
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")
using COBREXA
#md # !!! tip "Tip: use `?` when unsure about a function"
#md # When you are unsure about how a function works, use `?
#md # function_name` to access the docstrings of a function.
#nb # When you are unsure about how a function works, use `? function_name` to
#nb # access the docstrings of a function.
model = load_model("e_coli_core.xml")
# ## Optimization solvers in `COBREXA`
#
# To actually perform any optimization based analysis we need to load an
# optimizer. Any [`JuMP` supported
# optimizers](https://jump.dev/JuMP.jl/stable/installation/#Supported-solvers)
# will work. Here we will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl) to
# optimize linear programs and
# [OSQP.jl](https://osqp.org/docs/get_started/julia.html) to optimize quadratic
# programs.
using Tulip
using OSQP
lp_optimizer = Tulip.Optimizer
qp_optimizer = OSQP.Optimizer
# ## Flux balance analysis
#
# Most analysis functions come in two flavours that determine how the output
# will be returned. The suffix of the function determines this behaviour.
# 1. `flux_balance_analysis_vec` returns a vector of fluxes in the order of the
# reactions returned by `reactions` (recall that this is one of the generic
# interface accessors).
# 2. `flux_balance_analysis_dict` returns a
# dictionary mapping reaction ids to fluxes.
# In both cases there are two required inputs: the `model` and the `optimizer`.
vec_soln = flux_balance_analysis_vec(model, lp_optimizer)
#
dict_soln = flux_balance_analysis_dict(model, lp_optimizer)
# ## Problem modification
# Often it is desirable to modify the problem before performing analysis.
# Problem modifications include things like changing the: objective sense,
# optimizer, optimizer attributes, flux constraints, and optimization objective. It
# is also possible to knock out genes. For completeness we demonstrate all of
# the mentioned problem modifications below.
#md # !!! note "Note: modifications added to analysis functions are temporary"
#md # `COBREXA` supports two flavours of modifications to models. The type
#md # shown here is temporary, i.e. after the analysis function terminates the model
#md # is the same as it was before. See the reconstruction tutorial for more information
#md # about ways to permanently change the model.
#nb # Note: `COBREXA` supports two flavours of modifications to models. The type
#nb # shown here is temporary, i.e. after the analysis function terminates the model
#nb # is the same as it was before. See the reconstruction tutorial for more information
#nb # about ways to permanently change the model.
dict_soln = flux_balance_analysis_dict(
model,
qp_optimizer; # will change to Tulip below
modifications = [ # modifications are evaluated in order
change_objective("R_BIOMASS_Ecoli_core_w_GAM"),
change_constraint("R_EX_glc__D_e", -12, -12),
knockout(["b0978", "b0734"]), # knocks out cytochrome oxidase (CYTBD)
change_optimizer(lp_optimizer), # swap back to using Tulip
change_optimizer_attribute("IPM_IterationsLimit", 110), # this is a Tulip specific attribute, other optimizers have other attributes
change_sense(MAX_SENSE), # another valid option is MIN_SENSE
],
)
# ## Flux variability analysis
# Flux variability analysis can be performed in serial or in parallel. Here the
# serial version is demonstrated, see the more advanced tutorials for the
# parallel implementation.
fva_mins, fva_maxs = flux_variability_analysis_dict(
model,
Tulip.Optimizer;
bounds = objective_bounds(0.99), # the objective function is allowed to vary by ~1% from the FBA optimum
modifications = [
change_optimizer_attribute("IPM_IterationsLimit", 500),
change_constraint("R_EX_glc__D_e", -10, -10),
change_constraint("R_EX_o2_e", 0.0, 0.0),
],
)
fva_maxs["R_EX_ac_e"]["R_EX_ac_e"] # acetate exchange maximized, acetate exchange flux
# ## Parsimonious flux balance analysis
# Parsimonious flux balance analysis finds a unique flux solution that minimizes
# the sum of fluxes of the system subject to maintaining the same objective
# value as the flux balance analysis solution. This function requires the use of
# a quadratic program optimizer (`OSQP.jl` is used in this example). Like in
# `flux_balance_analysis`, two variants exist where the suffix determines the
# function output.
#md # !!! note "Note: A quadratic programming optimizer is required"
#md # If you are using an optimizer that supports quadratic programming, like
#md # `Gurobi.jl`, then you only need to specify one optimizer e.g.
#md # `parsimonious_flux_balance_analysis_dict(model, Gurobi.Optimizer)`.
#md # All problem and/or optimizer modifications should be passed to the
#md # `modifications` keyword argument. The `qp_modifications` keyword
#md # argument should be ignored in this case. If you are using two optimizers,
#md # then the optimizer passed first is used as the linear
#md # programming optimizer, and the second as the quadratic
#md # programming optimizer. This is demonstrated below.
#nb # If you are using an optimizer that supports quadratic programming, like
#nb # `Gurobi.jl`, then you only need to specify one optimizer e.g.
#nb # `parsimonious_flux_balance_analysis_dict(model, Gurobi.Optimizer)`.
#nb # All problem and/or optimizer modifications should be passed to the
#nb # `modifications` keyword argument. The `qp_modifications` keyword
#nb # argument should be ignored in this case. If you are using two optimizers,
#nb # then the optimizer passed first is used as the linear
#nb # programming optimizer, and the second as the quadratic
#nb # programming optimizer. This is demonstrated below.
dict_soln = parsimonious_flux_balance_analysis_dict(
model,
lp_optimizer;
modifications = [
change_constraint("R_EX_glc__D_e", -12, -12),
change_optimizer_attribute("IPM_IterationsLimit", 500), # modifies the required optimizer input (lp_optimizer in this case)
],
qp_modifications = [
change_optimizer(qp_optimizer), # only necessary if the first optimizer cannot handle QPs
change_optimizer_attribute("verbose", false),
],
)
#
vec_soln = parsimonious_flux_balance_analysis_vec(
model,
lp_optimizer;
modifications = [
change_constraint("R_EX_glc__D_e", -12, -12),
change_optimizer_attribute("IPM_IterationsLimit", 500), # modifies the required optimizer input
],
qp_modifications = [
change_optimizer(qp_optimizer), # only necessary if the first optimizer cannot handle QPs
change_optimizer_attribute("verbose", false),
],
)
# # Finding balance and variability of constraint-based models
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# Here we use [`flux_balance_analysis`](@ref),
# [`flux_variability_analysis`](@ref), and
# [`parsimonious_flux_balance_analysis`](@ref) of `COBREXA.jl` functions to
# analyze a toy model of *E. coli*.
# If it is not already present, download the model.
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")
using COBREXA
#md # !!! tip "Tip: use `?` to get quick help about functions"
#md # When you are unsure about how a function works, write `?
#md # function_name` to see the function reference documentation.
model = load_model("e_coli_core.xml")
# ## Optimization solvers in `COBREXA`
#
# To actually perform any optimization based analysis we need to load an
# optimizer. Any [`JuMP.jl`-supported
# optimizers](https://jump.dev/JuMP.jl/stable/installation/#Supported-solvers)
# will work. Here, we will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl)
# to optimize linear programs and
# [`OSQP.jl`](https://osqp.org/docs/get_started/julia.html) to optimize quadratic
# programs.
using Tulip
using OSQP
# ## Flux balance analysis (FBA)
#
# Most analysis functions come in several variants that produce different types
# of output. All of them usually require a model and `JuMP.jl`-compatible
# optimizer to work in the model.
#
# In the case of FBA, you may choose from these variants (here using the
# `Tulip` optimizer):
vec_soln = flux_balance_analysis_vec(model, Tulip.Optimizer)
#
dict_soln = flux_balance_analysis_dict(model, Tulip.Optimizer)
# ## Modifications
# Often it is desirable to add a slight modififaction to the problem before
# performing analysis, to see e.g. differences of the model behavior caused by
# the change introduced.
#
# `COBREXA.jl` supports several modifications by default, which include
# changing objective sense, optimizer attributes, flux constraints,
# optimization objective, reaction and gene knockouts, and others.
dict_soln = flux_balance_analysis_dict(
model,
OSQP.Optimizer;
modifications = [ # modifications are applied in order
## this changes the objective to maximize the biomass production
change_objective("R_BIOMASS_Ecoli_core_w_GAM"),
## this fixes a specific rate of the glucose exchange
change_constraint("R_EX_glc__D_e", -12, -12),
## this knocks out two genes, i.e. constrains their associated reactions to zero.
knockout(["b0978", "b0734"]), ## the gene IDs are cytochrome oxidase (CYTBD)
## ignore the optimizer specified above and change it to Tulip
change_optimizer(Tulip.Optimizer),
## set a custom attribute of the Tulip optimizer (see Tulip docs for more possibilities)
change_optimizer_attribute("IPM_IterationsLimit", 110),
## explicitly tell the optimizer to maximize the new objective
change_sense(MAX_SENSE),
],
)
# ## Flux variability analysis (FVA)
# The default FVA in [`flux_variability_analysis`](@ref) returns maximized and
# minimized reaction fluxes in a matrix. Here we use the dictionary variant in
# flux_variability_analysis_dict, to show how to easily access specific fluxes
# from its results.
fva_mins, fva_maxs = flux_variability_analysis_dict(
model,
Tulip.Optimizer;
bounds = objective_bounds(0.99), # the objective function is allowed to vary by ~1% from the FBA optimum
modifications = [
change_optimizer_attribute("IPM_IterationsLimit", 500),
change_constraint("R_EX_glc__D_e", -10, -10),
change_constraint("R_EX_o2_e", 0.0, 0.0),
],
)
#
fva_maxs["R_EX_ac_e"]["R_EX_ac_e"] # get the maximal acetate exchange flux
# ## Parsimonious flux balance analysis (pFBA)
# Parsimonious flux balance analysis (here in
# [`parsimonious_flux_balance_analysis`](@ref) finds a unique flux solution
# that minimizes the squared sum of fluxes of the system subject, while
# maintaining the same objective value as the flux balance analysis solution.
# Since we are optimizing a quadratic objective, we also need to switch to a
# quadratic optimizer. In this case, OSQP will work. We demonstrate it on the
# dictionary-returning variant of pFBA,
# [`parsimonious_flux_balance_analysis_dict`](@ref):
dict_soln = parsimonious_flux_balance_analysis_dict(
model,
OSQP.Optimizer;
modifications = [
change_optimizer_attribute("verbose", false), # silence the optimizer (OSQP is very verbose by default)
change_constraint("R_EX_glc__D_e", -12, -12),
],
)
# The function also has the expectable second variant that returns a vector of
# solutions, in [`parsimonious_flux_balance_analysis_vec`](@ref). Here, we
# utilize it to show how to use different optimizers for finding the optimum
# and for solving the quadratic problem. That may be preferable if the
# optimizer qualities differ for the differing tasks. pFBA allows you to
# specify `qp_modifications` that are applied after the original optimum is
# found, and before the quadratic part of the problem solving begins.
vec_soln = parsimonious_flux_balance_analysis_vec(
model,
Tulip.Optimizer; # start with Tulip
modifications = [
change_constraint("R_EX_glc__D_e", -12, -12),
change_optimizer_attribute("IPM_IterationsLimit", 500), # we may change Tulip-specific attributes here
],
qp_modifications = [
change_optimizer(OSQP.Optimizer), # now switch to OSQP (Tulip wouldn't be able to finish the computation)
change_optimizer_attribute("verbose", false), # and silence it.
],
)
# Basic analysis of constraint-based models
`COBREXA.jl` supports several common analysis methods that are often used for
exploring the biological models. The currently supported methods include
- Flux balance analysis (FBA), in function [`flux_balance_analysis`](@ref)
- Flux variability analysis (FVA), in [`flux_variability_analysis`](@ref)
- Flux sampling by hit-and-run algorithm, in [`hit_and_run`](@ref)
- Parsimonious flux balance analysis (pFBA), in
[`parsimonious_flux_balance_analysis`](@ref)
Other analysis methods are either in development and testing, or may be
specified or customized by the user. Implementing new analyses is generally
feasible; you may want to watch [the `COBREXA.jl`
repository](https://github.com/LCSB-BioCore/COBREXA.jl) for newly incoming
analysis methods.
`COBREXA.jl` additionally exports several helper functions that may help you in
running custom analysis methods:
- you can convert all types of [`MetabolicModel`](@ref)s to `JuMP.jl` models
using [`make_optimization_model`](@ref), then you may explore and analyze the
models independently of `COBREXA.jl` using the tools provided by `JuMP.jl`
- there is a system of analysis *modifications* that allows you to easily
specify various adjustments to the existing analysis methods
!!! tip "Notebook available!"
Examples of running the analysis functions are [available
here](../notebooks/2_finding_balance.md).
## Optimization problem solvers
For solving most analysis tasks, you need an optimization problem solver that
is compatible with `JuMP.jl`. You may refer to the [official list of supported
solvers](https://jump.dev/JuMP.jl/stable/installation/#Supported-solvers), but
we generally recommend to use either of these:
- `Tulip` (pure Julia implementation) for linear problems
- `GLPK` (based on a C library) for linear and mixed-integer problems
- `Gurobi` (based on an external library, but requires a license that is free
for academic use) for linear, mixed-integer and quadratic problems
- `OSQP` as a free alternative to `Gurobi` for solving quadratic problems
All solvers can be installed using the Julia package manger.
## Flux balance analysis
The above methods generally accept 2 arguments: the model, and the optimizer.
In particular, having installed e.g. GLPK from the above optimizers, you can
run FBA on [your favorite *E. Coli* core
model](http://bigg.ucsd.edu/models/e_coli_core) as follows:
```
using COBREXA
m = load_model(CoreModel, "e_coli_core.xml")
using GLPK
opt_model = flux_balance_analysis(m, GLPK.Optimizer)
```
After a short while (the solver machinery usually needs to get precompiled
before the first use), you should get `opt_model`, which is now an optimized
`JuMP.jl` model. It may print out information like this:
```
A JuMP Model
Maximization problem with:
Variables: 95
Objective function type: JuMP.GenericAffExpr{Float64,JuMP.VariableRef}
`JuMP.GenericAffExpr{Float64,JuMP.VariableRef}`-in-`MathOptInterface.EqualTo{Float64}`: 73 constraints
`JuMP.GenericAffExpr{Float64,JuMP.VariableRef}`-in-`MathOptInterface.LessThan{Float64}`: 192 constraints
Model mode: AUTOMATIC
CachingOptimizer state: ATTACHED_OPTIMIZER
Solver name: GLPK
Names registered in the model: lbs, mb, ubs, x
```
From that, you can extract the required information with the JuMP interface,
loaded with `using JuMP`. With that,
- `objective_value(opt_model)` prints roughly `0.87`,
- `value.(opt_model[:x])` prints the vector of individual reaction fluxes.
For convenience, you can get the results nicely formatted without manually
getting them out of the optimized models:
- [`flux_balance_analysis_vec`](@ref) works like
[`flux_balance_analysis`](@ref), but returns the vector of fluxes directly
(in the same order as in `reactions(m)`)
- [`flux_balance_analysis_dict`](@ref) returns a dictionary with the fluxes,
keyed by reaction identifier
## Flux variability analysis
FVA is implemented in [`flux_variability_analysis`](@ref), which returns the
usual matrix of minimal and maximal feasible fluxes for each reaction in the
model.
The result of calling `flux_variability_analysis(m, GLPK.Optimizer)` may look
like this (possibly showing minor numeric errors in the GLPK optimizer):
```
95×2 Array{Float64,2}:
0.0 0.0
6.00725 6.00725
3.64414e-13 3.17348e-13
3.2149 3.2149
```
You can relax the optimality requirement of the reactions by specifying a wider
objective bound, getting a wider range of reaction fluxes, e.g. using
[`gamma_bounds`](@ref) (for COBRA-like γ-bound) and [`objective_bounds`](@ref)
(for a multiplicative bound around the original optimal objective).
As a result, `flux_variability_analysis(m, GLPK.Optimizer;
bounds=gamma_bounds(0.8))` will return a much less constrained system:
```
95×2 Array{Float64,2}:
0.0 0.0
0.754299 10.1285
-4.42865 0.0
2.57192 3.2149
```
You may additionally restrict the analysis to a list of reactions (passing the
list as the second argument, see documentation of
[`flux_variability_analysis`](@ref)), or retrieve a dictionary of the resulting
fluxes with [`flux_variability_analysis_dict`](@ref).
## Parsimonious flux balance analysis
Parsimonious flux balance analysis (pFBA) requires a solver that can handle
quadratic problems. You may use e.g. `OSQP` or `Gurobi`.
Otherwise, the function behaves just like [`flux_balance_analysis`](@ref):
- [`parsimonious_flux_balance_analysis`](@ref)`(m, OSQP.Optimizer)` will return
a `JuMP.jl` model optimized to a slightly more realistic (parsimonious)
optimum than plain FBA,
- [`parsimonious_flux_balance_analysis_vec`](@ref) will return the fluxes in a
vector,
- [`parsimonious_flux_balance_analysis_dict`](@ref) will return a
reaction-keyed dictionary.
## Flux sampling
For the [`hit_and_run`](@ref), you need a previously optimized and constrained
model from another analysis function, such as [`flux_balance_analysis`](@ref),
or created by [`make_optimization_model`](@ref). You may need to carefully
choose the number of iterations and sample sizes to match your model; see the
documentation of [`hit_and_run`](@ref) for details.
As an example, you can run the sampling for 100 thousand iterations with:
```
hit_and_run(100_000, make_optimization_model(m, GLPK.Optimizer))
```
You should receive a matching flux sample with the (default) 1000 samples in a
matrix that may look like this one:
```
95×1000 Array{Float64,2}:
0.0 0.0 … 0.0
7.82669 9.38895 3.30653
7.13016 4.36813 9.64434
-0.290925 -9.3037 -0.0908829
24.1294 17.4794 0.0511032
⋮ ⋱
-16.243 -37.4763 -5.57301
0.0 0.0 0.0
-0.310819 -1.20057e-7 -2.13126
5.71597e-5 0.00990677 0.692399
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment