Unverified Commit 131adbf3 authored by Miroslav Kratochvil's avatar Miroslav Kratochvil Committed by GitHub
Browse files

Merge pull request #321 from LCSB-BioCore/mk-tutorial-screening

Screening cleanup + tutorial
parents 866501a6 310d6a07
......@@ -33,8 +33,8 @@ temp.*
.ipynb_checkpoints
# add generated tutorial specifics
docs/src/notebooks/*
!docs/src/notebooks/*.jl
docs/src/notebooks/
# add generated docs and tutorial specifics
docs/src/index.md
......
......@@ -71,7 +71,8 @@ makedocs(
linkcheck = !("skiplinks" in ARGS),
pages = [
"Home" => "index.md",
"Tutorials" => "tutorials.md",
"Quickstart tutorials" => "tutorials.md",
"Advanced tutorials" => "advanced.md",
"Examples and notebooks" => "notebooks.md",
"Function reference" => "functions.md",
"How to contribute" => "howToContribute.md",
......
# COBREXA Advanced tutorials
```@contents
Pages = joinpath.("advanced", filter(x -> endswith(x, ".md"), readdir("advanced")))
Depth = 2
```
# Exploring many model variants
A major goal of COBREXA.jl is to make exploring of many model variants easy and
fast.
One main concept that can be utilized for doing that is implemented in the
function [`screen`](@ref), which takes your model, a list of model _variants_
that you want to explore by some specified _analysis_, and schedules the
analysis of the model variants parallely on the available distributed workers.
In its most basic form, the "screening" may use the slightly simplified variant
of [`screen`](@ref) that is called [`screen_variants`](@ref), which works as
follows:
```julia
m = load_model(StandardModel, "e_coli_core.json")
screen_variants(
m, # the model for screening
[
[], # a variant with no modifications
[with_set_bound("CO2t", lb = 0, ub = 0)], # disable CO2 transport
[with_set_bound("O2t", lb = 0, ub = 0)], # disable O2 transport
[with_set_bound("CO2t", lb = 0, ub = 0), with_set_bound("O2t", lb = 0, ub = 0)], # disable both transports
],
m -> flux_balance_analysis_dict(m, Tulip.Optimizer)["BIOMASS_Ecoli_core_w_GAM"],
)
```
The call specifies a model (the `m` that we have loaded) that is being tested,
then a vector of model variants to be created and tested, and then the analysis
that is being run on each variant -- in this case, we find an optimal steady
state of each of the variants, and check out the biomass production rate at
that state. In this particular case, we are checking what will be the effect of
disabling combinations of CO2 transport and O2 transport in the cells. For
that, we get the following result:
```
4-element Vector{Float64}:
0.8739215022678488
0.46166961413944896
0.21166294973372135
0.21114065173865518
```
The numbers are the biomass production rates for the specified variants. We can
see that disabling O2 transport really does not help the organism much.
## Variant specification
In the above example, we have specified 4 variants, thus the analysis returned
4 different results that correspond with the specifications. Let us have a look
at the precise format of the specification and result.
Importantly, the `variants` argument is of type `Array{Vector{Any}}`, meaning
that it can be an array of any dimensionality that contains vectors. Each of the
vectors specifies precisely one variant, possibly with more modifications
applied to the model in sequence.
For example:
- `[]` specifies no modifications at all
- `[with_set_bound("CO2t", lb=0, ub=10)]` limits the CO2 transport
- `[with_set_bound("CO2t", lb=0, ub=2), with_set_bound("O2t", lb=0, ub=100)]`
severely limits the CO2 transport _and_ slightly restricts the transport of
O2
!!! note "Variants are single-parameter model-transforming functions"
Because the variants are just generators of single parameter functions
that take the model and return its modified version, you can also use
`identity` to specify a variant that does nothing -- `[identity]` is
perfectly same as `[]`
The shape of the variants array is important too, because it is precisely
retained in the result (just as with `pmap`). If you pass in a matrix of
variants, you will receive a matrix of analysis results of the same size. That
can be exploited for easily exploring many combinations of possible model
properties. Let's try exploring a "cube" of possible restricted reactions:
```julia
using IterTools # for cartesian products
res = screen_variants(m,
[
# for each variant we restricts 2 reactions
[with_set_bound(r1, lb=-3, ub=3), with_set_bound(r2, lb=-1, ub=1)]
# the reaction pair will be chosen from a cartesian product
for (r1,r2) in product(
["H2Ot", "CO2t", "O2t", "NH4t"], # of this set of transport reactions
["EX_h2o_e", "EX_co2_e", "EX_o2_e", "EX_nh4_e"], # and this set of exchanges
)
],
m -> flux_balance_analysis_dict(m, Tulip.Optimizer)["BIOMASS_Ecoli_core_w_GAM"],
)
```
As a result, we will receive a full matrix of the biomass productions:
```
4×4 Matrix{Float64}:
0.407666 0.454097 0.240106 0.183392
0.407666 0.485204 0.24766 0.183392
0.314923 0.319654 0.24766 0.183392
0.407666 0.485204 0.24766 0.183392
```
Notably, this shows that O2 transport and NH4 exchange may be serious
bottlenecks for biomass production.
For clarity, you may always annotate the result by zipping it with the
specification structure you have used and collecting the data:
``` julia
collect(zip(
product(
["H2Ot", "CO2t", "O2t", "NH4t"],
["EX_h2o_e", "EX_co2_e", "EX_o2_e", "EX_nh4_e"],
),
res,
))
```
...which gives the following annotated result:
```
4×4 Matrix{Tuple{Tuple{String, String}, Float64}}:
(("H2Ot", "EX_h2o_e"), 0.407666) (("H2Ot", "EX_co2_e"), 0.454097) (("H2Ot", "EX_o2_e"), 0.240106) (("H2Ot", "EX_nh4_e"), 0.183392)
(("CO2t", "EX_h2o_e"), 0.407666) (("CO2t", "EX_co2_e"), 0.485204) (("CO2t", "EX_o2_e"), 0.24766) (("CO2t", "EX_nh4_e"), 0.183392)
(("O2t", "EX_h2o_e"), 0.314923) (("O2t", "EX_co2_e"), 0.319654) (("O2t", "EX_o2_e"), 0.24766) (("O2t", "EX_nh4_e"), 0.183392)
(("NH4t", "EX_h2o_e"), 0.407666) (("NH4t", "EX_co2_e"), 0.485204) (("NH4t", "EX_o2_e"), 0.24766) (("NH4t", "EX_nh4_e"), 0.183392)
```
This may be easily used for e.g. scrutinizing all possible reaction pairs, to
find the ones that are redundant and not.
!!! tip "Notebook available"
A notebook is available that demonstrates
[the screening on a larger scale](../notebooks/6_screening.md).
There are many other variant "specifications" to choose from. You may use
[`with_added_reactions`](@ref), [`with_removed_reactions`](@ref),
[`with_removed_metabolites`](@ref), and others. Function reference contains a
complete list; as a convention, names of the specifications all start with
`with_`.
## Writing custom variant functions
It is actually very easy to create custom specifications that do any
modification that you can implement, to be later used with
[`screen_variants`](@ref) and [`screen`](@ref).
Generally, the "specifications" are supposed to return a _function_ that
creates a modified copy of the model. The copy of the model may be shallow, but
the functions should always prevent modifying the original model structure --
`screen` is keeping a single copy of the original model at each worker to
prevent unnecessary bulk data transport, and if that is changed in-place, all
following analyses of the model will work on inconsistent data, usually
returning wrong results (even randomly changing ones, because of the
asynchronous nature of [`screen`](@ref) execution).
Despite of that, writing a modification is easy. The simplest modification that
"does nothing" (isomorphic to standard `identity`) can be formatted as follows:
```julia
with_no_change = model -> model
```
The modifications may change the model, provided it is copied properly. The
following modification will remove a reaction called "O2t", effectively
removing the possibility to transport oxygen. We require a specific type of
model where this change is easy to perform (generally, not all variants may be
feasible on all model types).
```julia
with_disabled_oxygen_transport = (model::StandardModel) -> begin
# make "as shallow as possible" copy of the `model`.
# Utilizing `deepcopy` is also possible, but inefficient.
new_model = copy(model)
new_model.reactions = copy(model.reactions)
# remove the O2 transport from the model copy
delete!(new_model.reactions, "O2t")
return new_model #return the newly created variant
end
```
Finally, the whole definition may be parametrized as a normal function. The
following variant removes any user-selected reaction:
```julia
with_disabled_reaction(reaction_id) = (model::StandardModel) -> begin
new_model = copy(model)
new_model.reactions = copy(model.reactions)
delete!(new_model.reactions, reaction_id) # use the parameter from the specification
return new_model
end
```
In turn, these variants can be used in [`screen_variants`](@ref) just as we
used [`with_set_bound`](@ref) above:
```julia
screen_variants(
m, # the model for screening
[
[with_no_change],
[with_disabled_oxygen_transport],
[with_disabled_reaction("NH4t")],
],
m -> flux_balance_analysis_dict(m, Tulip.Optimizer)["BIOMASS_Ecoli_core_w_GAM"],
)
```
That should get you the results for all new variants of the model:
```
3-element Vector{Float64}:
0.8739215022674809
0.21166294865468896
1.2907224478973395e-15
```
!!! warning "Custom variants with distributed processing"
If using distributed evaluation, remember the variant-generating functions
need to be defined on all used workers (generating the variants in parallel
on the workers allows COBREXA to run the screening process very
efficiently, without unnecessary sending of bulk model data). Prefixing the
definition with `@everywhere` is usually sufficient for that purpose.
## Passing extra arguments to the analysis function
Some analysis functions may take additional arguments, which you might want to
vary for the analysis. `modifications` argument of
[`flux_balance_analysis_dict`](@ref) is one example of such argument, allowing
you to specify details of the optimization procedure.
[`screen`](@ref) function allows you to do precisely that -- apart from
`variants`, you may also specify an array of `args` of the same shape as
`variants`, the entries of which will get passed together with the generated
model variants to your specified analysis function. If either of the arguments
is missing (or set to `nothing`), it is defaulted to "no modifications" or "no
arguments".
The arguments _must_ be tuples; you may need to make 1-tuples from your data
(e.g. using `(value,)`) if you want to pass just a single argument.
Let's try to use that functionality for trying to find a sufficient amount of
iterations needed for Tulip solver to find a feasible solution:
```julia
screen(m,
args = [(i,) for i in 5:15], # the iteration counts, packed in 1-tuples
analysis = (m,a) -> # `args` elements get passed as the extra parameter here
flux_balance_analysis_vec(m,
Tulip.Optimizer;
modifications=[change_optimizer_attribute("IPM_IterationsLimit", a)],
),
)
```
From the result, we can see that Tulip would need at least 14 iterations to
find a feasible region:
```
11-element Vector{Union{Nothing, Vector{Float64}}}:
nothing
nothing
nothing
nothing
nothing
nothing
nothing
nothing
nothing
[7.47738193404817, 1.8840414375838503e-8, 4.860861010127701, -16.023526104614593, … ]
[7.47738193404817, 1.8840414375838503e-8, 4.860861010127701, -16.023526104614593, … ]
```
......@@ -42,6 +42,13 @@ Modules = [COBREXA]
Pages = map(file -> joinpath("reconstruction", file), readdir("../src/reconstruction"))
```
### Model variant specifications
```@autodocs
Modules = [COBREXA]
Pages = map(file -> joinpath("reconstruction", "modifications", file), readdir("../src/reconstruction/modifications"))
```
## Analysis functions
```@autodocs
......
......@@ -41,7 +41,7 @@ Development history of COBREXA.jl.
<!--insert_quickstart-->
## Tutorials
## Basic and quick-start tutorials
Detailed tutorial content is [available here](tutorials.md).
......@@ -50,6 +50,15 @@ Pages = joinpath.("tutorials", filter(x -> endswith(x, ".md"), readdir("tutorial
Depth = 1
```
## Advanced tutorials
Detailed listing of advanced tutorials is [available here](advanced.md).
```@contents
Pages = joinpath.("advanced", filter(x -> endswith(x, ".md"), readdir("advanced")))
Depth = 1
```
## Example notebooks and workflows
Detailed notebook content is [available here](notebooks.md).
......
......@@ -37,12 +37,7 @@ model = load_model("e_coli_core.xml")
#md # you use. Commercial solvers like `Gurobi`, `Mosek`, `CPLEX`, etc.
#md # require less user engagement.
import Pkg
Pkg.add("Tulip")
Pkg.add("OSQP")
using Tulip
using OSQP
using Tulip, OSQP
# ## Flux balance analysis (FBA)
#
......
# # Exploring model variants with `screen`
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# This notebooks demonstrates a simple use of [`screen`](@ref) to explore large
# number of model variants. On the toy *E. Coli* model, we try to map the
# impact of knocking out single reactions and 2-reaction combinations.
# First, let's download the data and load the packages and the model
!isfile("e_coli_core.json") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json");
using COBREXA, Tulip
model = load_model(StandardModel, "e_coli_core.json")
# ## Preparing the functions
#
# While we could use the [`with_set_bound`](@ref) to limit the reaction rates,
# but we will make a slightly more precise, usage-tailored modification. This
# is a straightforward modification of the [`with_set_bound`](@ref) that does
# not set bounds "outside" of the original bounds:
with_limited_rate(reaction_id::String, limit) =
model::StandardModel -> begin
m = copy(model)
m.reactions = copy(model.reactions)
r = m.reactions[reaction_id] = copy(model.reactions[reaction_id])
if -limit > r.lb
r.lb = -limit
end
if limit < r.ub
r.ub = limit
end
m
end
# ## Knocking out single reactions
#
# This can be applied to see how much biomass can the model produce with
# certain reactions limited to almost zero:
get_biomass(x) = isnothing(x) ? 0 : x["BIOMASS_Ecoli_core_w_GAM"]
productions = screen_variants(
model,
[[with_limited_rate(rxn, 0.1)] for rxn in reactions(model)],
model -> get_biomass(flux_balance_analysis_dict(model, Tulip.Optimizer)),
)
# This can be nicely plotted to give a more comprehensive overview of which
# reactions are critical or not:
using Plots
bar(reactions(model), productions, orientation = :hor, dpi = 600)
# ## Knocking out reaction combinations
#
# With a bit of help from `IterTools` package, it is very easy to prepare
# matrices of biomass productions from all possible two-reaction knockouts. To
# make it more interesting, we will restrict one of the reactions of the pair a
# bit less, to see more possible outcomes.
#
using IterTools
# We do not process all reactions here to make the notebook rendering
# efficient, but you can easily remove the restriction, and speed the process
# up using parallel execution, by specifying `workers` argument (see
# documentation of [`screen`](@ref) for details)
rxns = reactions(model)
productions = screen_variants(
model,
[
[with_limited_rate(rxn1, 3), with_limited_rate(rxn2, 0.1)] for
(rxn1, rxn2) in product(rxns, rxns)
],
model -> get_biomass(flux_balance_analysis_dict(model, Tulip.Optimizer)),
)
heatmap(productions, dpi = 600)
......@@ -37,8 +37,9 @@ _inc_all.(
"io",
joinpath("io", "show"),
"reconstruction",
joinpath("analysis", "modifications"),
joinpath("reconstruction", "modifications"),
"analysis",
joinpath("analysis", "modifications"),
joinpath("analysis", "sampling"),
joinpath("base", "utils"),
],
......
......@@ -26,7 +26,7 @@ function _do_knockout(model::MetabolicModel, opt_model, gene_ids::Vector{String}
rga = reaction_gene_association(model, rxn_id)
if !isnothing(rga) &&
all([any(in.(gene_ids, Ref(conjunction))) for conjunction in rga])
set_bound(rxn_num, opt_model, ub = 0, lb = 0)
set_optmodel_bound!(rxn_num, opt_model, ub = 0, lb = 0)
end
end
end
......@@ -59,7 +59,7 @@ Change the lower and upper bounds (`lb` and `ub` respectively) of reaction `id`.
function change_constraint(id::String, lb, ub)
(model, opt_model) -> begin
ind = first(indexin([id], reactions(model)))
set_bound(ind, opt_model, lb = lb, ub = ub)
set_optmodel_bound!(ind, opt_model, lb = lb, ub = ub)
end
end
......
......@@ -82,7 +82,7 @@ function warmup_from_variability(
# snatch the bounds from whatever worker is around
lbs, ubs = get_val_from(
workers[1],
:($COBREXA.get_bound_vectors(cobrexa_sampling_warmup_optmodel)),
:($COBREXA.get_optmodel_bounds(cobrexa_sampling_warmup_optmodel)),
)
# free the data on workers
......
......@@ -66,6 +66,7 @@ screen_variants(m,
[reverse_reaction(3), reverse_reaction(6)]
],
mod -> flux_balance_analysis_vec(mod, GLPK.Optimizer)) # run analysis
```
"""
function screen(
model::MetabolicModel;
......
......@@ -60,19 +60,19 @@ function is_solved(optmodel)
end
"""
get_bound_vectors(opt_model)
get_optmodel_bounds(opt_model)
Returns vectors of the lower and upper bounds of `opt_model` constraints, where
`opt_model` is a JuMP model constructed by e.g.
[`make_optimization_model`](@ref) or [`flux_balance_analysis`](@ref).
"""
get_bound_vectors(opt_model) = (
get_optmodel_bounds(opt_model) = (
[-normalized_rhs(lb) for lb in opt_model[:lbs]],
[normalized_rhs(ub) for ub in opt_model[:ubs]],
)
"""
set_bound(index, optimization_model;
set_optmodel_bound!(index, optimization_model;
ub=_constants.default_reaction_rate,
lb=-_constants.default_reaction_rate)
......@@ -87,7 +87,7 @@ change the constraints.
Just supply the constraint `index` and the JuMP model (`opt_model`) that
will be solved, and the variable's bounds will be set to `ub` and `lb`.
"""
function set_bound(
function set_optmodel_bound!(
vind,
opt_model;
ub = _constants.default_reaction_rate,
......
......@@ -4,7 +4,12 @@
Shallow copy of a [`StandardModel`](@ref)
"""
Base.copy(m::StandardModel) = StandardModel(m.id, m.reactions, m.metabolites, m.genes)
Base.copy(m::StandardModel) = StandardModel(
m.id,
reactions = m.reactions,
metabolites = m.metabolites,
genes = m.genes,
)
"""
Base.copy(r::Reaction)
......
......@@ -11,7 +11,7 @@ flux_variability_analysis(model, some_optimizer; bounds = gamma_bounds(0.9))
gamma_bounds(gamma) = z -> (gamma * z, Inf)
"""
(tolerance) = z -> begin
objective_bounds(tolerance)
A bounds-generating function for [`flux_variability_analysis`](@ref) that
limits the objective value to a small multiple of Z₀. Use as `bounds` argument,
......
......@@ -213,8 +213,27 @@ remove_gene!(model, "g1")
remove_gene!(model::StandardModel, gid::String; knockout_reactions::Bool = false) =
remove_genes!(model, [gid]; knockout_reactions = knockout_reactions)
function set_bound(model::StandardModel, reaction_id::String; ub, lb)
"""
set_bound!(model::StandardModel, reaction_id::String; lb, ub)
Change the bounds of a reaction in-place.
"""
function set_bound!(model::StandardModel, reaction_id::String; lb, ub)
reaction = model.reactions[reaction_id]
reaction.lb = lb
reaction.ub = ub
end
"""
set_bound(model::StandardModel, reaction_id::String; lb, ub)
Return a shallow copy of the `model` with reaction bounds changed.
"""
function set_bound(model::StandardModel, reaction_id::String; lb, ub)
m = copy(model)
m.reactions = copy(model.reactions)
r = m.reactions[reaction_id] = copy(model.reactions[reaction_id])
r.lb = lb
r.ub = ub
m
end
"""
with_set_bound(args...; kwargs...)
Specifies a model variant that has a new bound set. Forwards arguments to
[`set_bound`](@ref). Intended for usage with [`screen`](@ref).
"""
with_set_bound(args...; kwargs...) = m -> set_bound(m, args...; kwargs...)
"""
with_removed_metabolites(args...; kwargs...)
Specifies a model variant without specified metabolites. Forwards arguments to
[`remove_metabolites`](@ref). Intended to be used with [`screen`](@ref).
"""
with_removed_metabolites(args...; kwargs...) =
m -> remove_metabolites(m, args...; kwargs...)
"""
with_added_reactions(args...; kwargs...)
Specifies a model variant with reactions added. Forwards the arguments to
[`add_Reactions`](@ref). Intended to be used with [`screen`](@ref).
"""
with_added_reactions(args...; kwargs...) = m -> add_reactions(m, args...; kwargs...)
"""
with_removed_reactions(args...; kwargs...)
Specifies a model variant with specified reactions removed. Forwards arguments
to [`remove_reactions`](@ref). Intended to be used with [`screen`](@ref).
"""
with_removed_reactions(args...; kwargs...) = m -> remove_reactions(m, args...; kwargs...)
......@@ -32,10 +32,10 @@
glucose_index = first(indexin(["EX_glc__D_e"], reactions(model)))
o2_index = first(indexin(["EX_o2_e"], reactions(model)))