Unverified Commit 10389068 authored by St. Elmo's avatar St. Elmo
Browse files

Added notebook generation files back

parent 9e8bd001
# # Loading, converting, and saving models
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# `COBREXA` can load models stored in `.mat`, `.json`, and `.xml` formats (with
# the latter denoting SBML formatted models).
# We will primarily use the *E. Coli* "core" model to demonstrate the utilities
# found in `COBREXA`. First, let's download the model in several formats.
## Downloads the model files if they don't already exist
!isfile("e_coli_core.mat") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.mat", "e_coli_core.mat");
!isfile("e_coli_core.json") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json");
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml");
# Now, load the package:
using COBREXA
#md # !!! tip "Save bandwidth!"
#md # The published models usually do not change very often. It is
#md # therefore pretty useful to save them to a central location and load
#md # them from there. That saves your time, and does not unnecessarily
#md # consume the connectivity resources of the model repository.
# ## Loading models
# Load the models using the [`load_model`](@ref) function. Each model is able to
# "pretty-print" itself, hiding the inner complexity.
mat_model = load_model("e_coli_core.mat")
#
json_model = load_model("e_coli_core.json")
#
sbml_model = load_model("e_coli_core.xml")
#
#md # !!! note "Note: `load_model` infers the output type from the file extension"
#md # Notice how each model was read into memory as a model type corresponding
#md # to its file type, i.e. the file ending with `.json` loaded as a
#md # [`JSONModel`](@ref), the file ending with `.mat` loaded as [`MATModel`](@ref), and the
#md # file ending with `.xml` loaded as an [`SBMLModel`](@ref).
# You can directly inspect the model objects, although only with a specific way
# for each specific type.
# JSON models contain their corresponding JSON:
json_model.json
# SBML models contain a complicated structure from [`SBML.jl`
# package](https://github.com/LCSB-BioCore/SBML.jl):
typeof(sbml_model.sbml)
# MAT models contain MATLAB data:
mat_model.mat
# ## Using the generic interface to access model details
# To prevent the complexities of object representation, `COBREXA.jl` uses a set
# of generic interface functions that extract various important information
# from all supported model types. This approach ensures that the analysis
# functions can work on any data.
# For example, you can check the reactions and metabolites contained in SBML
# and JSON models using the same accessor:
reactions(json_model)
#
reactions(sbml_model)
#
issetequal(reactions(json_model), reactions(mat_model)) # do models contain the same reactions?
# All accessors are defined in a single file in COBREXA source code; you may
# therefore get a list of all accessors as follows:
using InteractiveUtils
for method in filter(
x -> endswith(string(x.file), "MetabolicModel.jl"),
InteractiveUtils.methodswith(MetabolicModel, COBREXA),
)
println(method.name)
end
# ## Converting between model types
# It is possible to convert model types to-and-fro. To do this, use the
# `convert` function, which is overloaded from Julia's `Base`.
#md # !!! danger "Data loss may occur when converting between models"
#md # The conversion of models only uses the data accessible through the
#md # generic accessors. Other data may get lost.
m = convert(MATModel, json_model)
# `m` will now contain the MATLAB-style matrix representation of the model:
Matrix(m.mat["S"])
# The loading and conversion can be combined using a shortcut:
m = load_model(MATModel, "e_coli_core.json")
# ## Saving and exporting models
# `COBREXA.jl` supports exporting the models in JSON and MAT format, using [`save_model`](@ref).
save_model(m, "converted_model.json")
save_model(m, "converted_model.mat")
# If you need a non-standard suffix, use the type-specific saving functions:
save_json_model(m, "file.without.a.good.suffix")
save_mat_model(m, "another.file.matlab")
# If you are saving the models only for future processing in Julia environment,
# it is often wasteful to encode the models to external formats and decode them
# back. Instead, you can use the "native" Julia data format, accessible with
# package `Serialization`.
#
# This way, you can use `serialize` to save even the [`StandardModel`](@ref)
# that has no file format associated:
using Serialization
sm = convert(StandardModel, m)
open(f -> serialize(f, sm), "myModel.stdmodel", "w")
# The models can then be loaded back using `deserialize`:
sm2 = deserialize("myModel.stdmodel")
issetequal(metabolites(sm), metabolites(sm2))
# This form of loading operation is usually pretty quick:
t = @elapsed deserialize("myModel.stdmodel")
@info "Deserialization took $t seconds"
# Notably, large and complicated models with thousands of reactions and
# annotations can take seconds to decode properly. Serialization allows you to
# almost completely remove this overhead, and scales well to tens of millions
# of reactions.
# # Finding balance and variability of constraint-based models
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# Here we use [`flux_balance_analysis`](@ref),
# [`flux_variability_analysis`](@ref), and
# [`parsimonious_flux_balance_analysis`](@ref) of `COBREXA.jl` functions to
# analyze a toy model of *E. coli*.
# If it is not already present, download the model.
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")
using COBREXA
#md # !!! tip "Tip: use `?` to get quick help about functions"
#md # When you are unsure about how a function works, write `?
#md # function_name` to see the function reference documentation.
model = load_model("e_coli_core.xml")
# ## Optimization solvers in `COBREXA`
#
# To actually perform any optimization based analysis we need to load an
# optimizer. Any [`JuMP.jl`-supported
# optimizers](https://jump.dev/JuMP.jl/stable/installation/#Supported-solvers)
# will work. Here, we will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl)
# to optimize linear programs and
# [`OSQP.jl`](https://osqp.org/docs/get_started/julia.html) to optimize quadratic
# programs.
import Pkg
Pkg.add("Tulip")
Pkg.add("OSQP")
using Tulip
using OSQP
# ## Flux balance analysis (FBA)
#
# Most analysis functions come in several variants that produce different types
# of output. All of them usually require a model and `JuMP.jl`-compatible
# optimizer to work in the model.
#
# In the case of FBA, you may choose from these variants (here using the
# `Tulip` optimizer):
vec_soln = flux_balance_analysis_vec(model, Tulip.Optimizer)
#
dict_soln = flux_balance_analysis_dict(model, Tulip.Optimizer)
# ## Modifications
# Often it is desirable to add a slight modififaction to the problem before
# performing analysis, to see e.g. differences of the model behavior caused by
# the change introduced.
#
# `COBREXA.jl` supports several modifications by default, which include
# changing objective sense, optimizer attributes, flux constraints,
# optimization objective, reaction and gene knockouts, and others.
dict_soln = flux_balance_analysis_dict(
model,
OSQP.Optimizer;
modifications = [ # modifications are applied in order
## this changes the objective to maximize the biomass production
change_objective("R_BIOMASS_Ecoli_core_w_GAM"),
## this fixes a specific rate of the glucose exchange
change_constraint("R_EX_glc__D_e", -12, -12),
## this knocks out two genes, i.e. constrains their associated reactions to zero.
knockout(["b0978", "b0734"]), ## the gene IDs are cytochrome oxidase (CYTBD)
## ignore the optimizer specified above and change it to Tulip
change_optimizer(Tulip.Optimizer),
## set a custom attribute of the Tulip optimizer (see Tulip docs for more possibilities)
change_optimizer_attribute("IPM_IterationsLimit", 110),
## explicitly tell the optimizer to maximize the new objective
change_sense(MAX_SENSE),
],
)
# ## Flux variability analysis (FVA)
# The default FVA in [`flux_variability_analysis`](@ref) returns maximized and
# minimized reaction fluxes in a matrix. Here we use the dictionary variant in
# flux_variability_analysis_dict, to show how to easily access specific fluxes
# from its results.
fva_mins, fva_maxs = flux_variability_analysis_dict(
model,
Tulip.Optimizer;
bounds = objective_bounds(0.99), # the objective function is allowed to vary by ~1% from the FBA optimum
modifications = [
change_optimizer_attribute("IPM_IterationsLimit", 500),
change_constraint("R_EX_glc__D_e", -10, -10),
change_constraint("R_EX_o2_e", 0.0, 0.0),
],
)
#
fva_maxs["R_EX_ac_e"]["R_EX_ac_e"] # get the maximal acetate exchange flux
# ## Parsimonious flux balance analysis (pFBA)
# Parsimonious flux balance analysis (here in
# [`parsimonious_flux_balance_analysis`](@ref) finds a unique flux solution
# that minimizes the squared sum of fluxes of the system subject, while
# maintaining the same objective value as the flux balance analysis solution.
# Since we are optimizing a quadratic objective, we also need to switch to a
# quadratic optimizer. In this case, OSQP will work. We demonstrate it on the
# dictionary-returning variant of pFBA,
# [`parsimonious_flux_balance_analysis_dict`](@ref):
dict_soln = parsimonious_flux_balance_analysis_dict(
model,
OSQP.Optimizer;
modifications = [
change_optimizer_attribute("verbose", false), # silence the optimizer (OSQP is very verbose by default)
change_constraint("R_EX_glc__D_e", -12, -12),
],
)
# The function also has the expectable second variant that returns a vector of
# solutions, in [`parsimonious_flux_balance_analysis_vec`](@ref). Here, we
# utilize it to show how to use different optimizers for finding the optimum
# and for solving the quadratic problem. That may be preferable if the
# optimizer qualities differ for the differing tasks. pFBA allows you to
# specify `qp_modifications` that are applied after the original optimum is
# found, and before the quadratic part of the problem solving begins.
vec_soln = parsimonious_flux_balance_analysis_vec(
model,
Tulip.Optimizer; # start with Tulip
modifications = [
change_constraint("R_EX_glc__D_e", -12, -12),
change_optimizer_attribute("IPM_IterationsLimit", 500), # we may change Tulip-specific attributes here
],
qp_modifications = [
change_optimizer(OSQP.Optimizer), # now switch to OSQP (Tulip wouldn't be able to finish the computation)
change_optimizer_attribute("verbose", false), # and silence it.
],
)
# # Basic usage of `StandardModel`
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# In this tutorial we will use `COBREXA`'s `StandardModel` and functions that
# specifically operate on it. As usual we will use the toy model of *E. coli*
# for demonstration.
!isfile("e_coli_core.json") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")
using COBREXA
# ## Loading a model
model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel from the model file
#md # !!! note "Note: Loading `StandardModel`s implicitly uses `convert`"
#md # When using `load_model(StandardModel, file_location)` the model at
#md # `file_location` is first loaded into its inferred format and is then
#md # converted to a `StandardModel` using the generic accessor interface.
#md # Thus, data loss may occur. Always check your model to ensure that
#md # nothing important has been lost.
#nb # When using `load_model(StandardModel, file_location)` the model at
#nb # `file_location` is first loaded into its inferred format and is then
#nb # converted to a `StandardModel` using the generic accessor interface.
#nb # Thus, data loss may occur. Always check your model to ensure that
#nb # nothing important has been lost.
# ## Basic analysis
# As before, for optimization based analysis we need to load an optimizer. Here we
# will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl) to solve the linear
# programs of this tutorial. Refer to the basic constraint-based analysis
# tutorial for more informaiton.
# All the normal analysis functions work on `StandardModel`, due to it also
# having the same generic accessor interface as all the other model types.
using Tulip
dict_sol = flux_balance_analysis_dict(
model,
Tulip.Optimizer;
modifications = [
change_objective("BIOMASS_Ecoli_core_w_GAM"),
change_constraint("EX_glc__D_e", -12, -12),
change_constraint("EX_o2_e", 0, 0),
],
)
# This is not very exciting yet, since every other model type can also do this.
# However, deeper inspection of flux results is possible when using
# `StandardModel`.
# ## Inspecting the flux solution: `atom_exchange`
# It is sometimes interesting to keep track of the atoms entering and leaving
# the system through boundary reactions. This can be inspected by calling
# `atom_exchange`.
atom_exchange(dict_sol, model) # flux of individual atoms entering and leaving the system through boundary reactions (e.g. exchange reactions) based on flux_dict
# ## Inspecting the flux solution: `exchange_reactions`
# It is also sometimes useful to inspect the exchange reactions used by a flux
# solution. The function `exchange_reactions` fulfills this purpose.
consuming, producing = exchange_reactions(dict_sol, model; top_n = 4);
# ## Inspecting the flux solution: `metabolite_fluxes`
# Another useful flux result analysis function is `metabolite_fluxes`. This
# function keeps track of reactions consuming and producing each metabolite.
consuming, producing = metabolite_fluxes(dict_sol, model)
consuming["atp_c"] # reactions consuming atp_c
# ## Internals of `StandardModel`
# Another benefit of `StandardModel` is that it supports a richer internal
# infrastructure that can be used to manipulate internal model attributes in a
# systematic way. Specifically, the genes, reactions, and metabolites with of a
# model each have a type. This is particularly useful when modifying or even
# constructing a model from scratch.
# ## `Gene`s, `Reaction`s, and `Metabolite`s
# `StandardModel` is composed of ordered dictionaries of `Gene`s, `Metabolite`s
# and `Reaction`s. Ordered dictionaries are used because the order of the
# reactions and metabolites are important for constructing a stoichiometric
# matrix since the rows and columns should correspond to the order of the metabolites
# and reactions returned by calling the accessors `metabolites` and `reactions`.
# Each `StandardModel` is composed of the following fields:
fieldnames(StandardModel) # fields of a StandardModel
# The `:genes` field of a `StandardModel` contains an ordered dictionary of gene ids mapped to `Gene`s.
model.genes # the keys of this dictionary are the same as genes(model)
# The `Gene` type is a struct that can be used to store information about genes
# in a `StandardModel`. Each `Gene` is composed of the following fields:
fieldnames(Gene)
#md # !!! tip "Tip: Use <tab> complete to explore the structure of types"
#md # Use <tab> to quickly explore the fields of a struct. For example,
#md # Gene.<tab> will list all the fields shown above.
#nb # Use <tab> to quickly explore the fields of a struct. For example,
#nb # Gene.<tab> will list all the fields shown above.
# The keys used in the ordered dictionaries in
# `model.genes` are the ids returned using the generic accessor `genes`. `Gene`s
# have pretty printing, as demonstrated below for a random gene drawn from the
# model:
random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]
# The same idea holds for both metabolites (stored as `Metabolite`s) and
# reactions (stored as `Reaction`s). This is demonstrated below.
random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]
#
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]
# `StandardModel` can be used to build your own metabolic model or modify an
# existing one. One of the main use cases for `StandardModel` is that it can be
# used to merge multiple models or parts of multiple models together. Since the
# internals are uniform inside each `StandardModel`, attributes of other model
# types are squashed into the required format (using the generic accessors).
# This ensures that the internals of all `StandardModel`s are the same -
# allowing easy systematic evaluation.
#md # !!! warning "Warning: Combining models with different namespaces is tricky"
#md # Combining models that use different namespaces requires care.
#md # For example, in some models the water exchange reaction is called
#md # `EX_h2o_e`, while in others it is called `R_EX_h2o_s`. This needs to
#md # manually addressed (for now) to prevent duplicate, e.g. reactions,
#md # from being added.
# ## Checking the internals of `StandardModel`s: `annotation_index`
# Often when models are automatically reconstructed duplicate genes, reactions
# or metabolites end up in a model. `COBREXA` exports `annotation_index` to
# check for cases where the id of a struct may be different, but the annotations
# the same (possibly suggesting a duplication). `annotation_index` builds a
# dictionary mapping annotation features to the ids of whatever struct you are
# inspecting. This makes it easy to find structs that share certain annotation features.
rxn_annotations = annotation_index(model.reactions)
#
rxn_annotations["ec-code"]
# The `annotation_index` function can also be used on `Reaction`s and
# `Gene`s in the same way.
# ## Checking the internals of `StandardModel`s: `check_duplicate_reaction`
# Another useful function is `check_duplicate_reaction`, which checks for
# reactions that have duplicate (or similar) reaction equations.
pgm_duplicate = Reaction()
pgm_duplicate.id = "pgm2" # Phosphoglycerate mutase
pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1)
pgm_duplicate
#
check_duplicate_reaction(pgm_duplicate, model.reactions; only_metabolites = false) # can also just check if only the metabolites are the same but different stoichiometry is used
# ## Checking the internals of `StandardModel`s: `is_mass_balanced`
# Finally, `is_mass_balanced` can be used to check if a reaction is mass
# balanced based on the formulas of the reaction equation.
pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1, "h2o_c" => 1) # not mass balanced now
is_bal, extra_atoms = is_mass_balanced(pgm_duplicate, model) # extra_atoms shows which atoms are in excess/deficit
# # Basic usage of `CoreModel` and `CoreModelCoupled`
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# In this tutorial we will introduce `COBREXA`'s `CoreModel` and
# `CoreModelCoupled`. We will use *E. coli*'s toy model to start with.
!isfile("e_coli_core.xml") &&
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")
using COBREXA
# ## Loading a `CoreModel`
model = load_model(CoreModel, "e_coli_core.xml") # we specifically want to load a CoreModel from the model file
# ## Basic analysis on `CoreModel`
# As before, for optimization based analysis we need to load an optimizer. Here we
# will use [`Tulip.jl`](https://github.com/ds4dm/Tulip.jl) to optimize the linear
# programs of this tutorial. Refer to the constraint-based analysis basics
# tutorial if you are confused by any functions in this section.
# All the normal analysis functions work on `CoreModel`, due to it also having
# the same generic accessor interface as all the other model types.
using Tulip
dict_sol = flux_balance_analysis_dict(
model,
Tulip.Optimizer;
modifications = [
change_objective("R_BIOMASS_Ecoli_core_w_GAM"),
change_constraint("R_EX_glc__D_e", -12, -12),
change_constraint("R_EX_o2_e", 0, 0),
],
)
# ## Structure of `CoreModel`
# `CoreModel` is a special `COBREXA` type that is optimized for large scale
# analysis of large models. It stores data in a sparse format where possible.
fieldnames(CoreModel)
#
model.S
# ## `CoreModelCoupled` adds coupling constraints to `CoreModel`
# `CoreModelCoupled` extends `CoreModel` by adding coupling constraints.
fieldnames(CoreModelCoupled)
# In short, coupling constraints can be used to ensure that fluxes scale with
# the growth rate (μ) of a model. This reduces the impact of biologically
# infeasible cycles from occurring. Here we will model coupling constraints by
# assuming that they have the form: `-γ ≤ vᵢ/μ ≤ γ`, where `γ` is the ratio
# between each individual flux (vᵢ) in the model and the growth rate.
gamma = 40 # arbitrary
nr = n_reactions(model) # number of reactions
biomass_index = first(indexin(["R_BIOMASS_Ecoli_core_w_GAM"], reactions(model)))
using LinearAlgebra, SparseArrays
Cf = sparse(1.0I, nr, nr)
Cf[:, biomass_index] .= -gamma
Cb = sparse(1.0I, nr, nr)
Cb[:, biomass_index] .= gamma
C = [Cf; Cb] # coupling constraint matrix
clb = spzeros(2 * nr)
clb[1:nr] .= -1000.0
cub = spzeros(2 * nr)
cub[nr+1:end] .= 1000
cmodel = CoreModelCoupled(model, C, clb, cub)
#
d = flux_balance_analysis_dict(model, Tulip.Optimizer)
d["R_BIOMASS_Ecoli_core_w_GAM"]
#
dc = flux_balance_analysis_dict(cmodel, Tulip.Optimizer)
dc["R_BIOMASS_Ecoli_core_w_GAM"]
# # Model construction and modification
#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
# `COBREXA` can load models stored in `.mat`, `.json`, and `.xml` formats; and convert
# these into `StandardModel`s. However, it is also possible to construct models
# from scratch, and modify existing models. This will be demonstrated
# here.
# ## Model construction
# In `COBREXA`, model construction is primarily supported through `StandardModel`s.
# To begin, create an empty `StandardModel`.
using COBREXA
model = StandardModel("FirstModel") # assign model id = "FirstModel"
# Next, genes, metabolites and reactions need to be added to the model.
# ### Add genes to the model
gene_list = [Gene(string("g", num)) for num = 1:8]
#md # !!! warning "Warning: Don't accidentally overwrite the generic accessors"
#md # It may be tempting to call a variable `genes`, `metabolites`, or
#md # `reactions`. However, these names conflict with generic accessors
#md # functions and will create problems downstream.
add!(model, gene_list)
# ### Add metabolites to the model
metabolite_list = [Metabolite(string("m", num)) for num = 1:4]
metabolite_list[1].formula = "C6H12O6" # can edit metabolites, etc. directly
add!(model, metabolite_list)
# ### Add reactions to the model
# There are two ways to create and add reactions to a model.
# These are using functions, or macros.