Commit 56e5902f by St. Elmo

### housekeeping changes

parent 85e7f1cf
 ... ... @@ -68,7 +68,7 @@ If you run a non-standard platform (e.g. a customized operating systems), or if ## Quick start guide In addition to COBREXA, you also need to include a Julia package which provides an appropriate solver. One such solver is Tulip, which is provided by the [Tulip.jl](https://github.com/ds4dm/Tulip.jl) package. In addition to COBREXA, you also need to include a Julia package that provides an appropriate optimizer. One such optimizer for linear programs is Tulip, which is provided by the [Tulip.jl](https://github.com/ds4dm/Tulip.jl) package. julia ] add Tulip ... ...
 # Optimization Based Analysis A selection of standard COBRA functions have been implemented to make basic model analysis more convenient. Additionally, COBREXA.jl allows you to easily formulate your own optimization problems using the structure of a constraint based model. This makes it easy to experiment with custom algorithms etc. ## Flux balance analysis (FBA) Flux balance analysis solves the linear program, math \begin{aligned} & \underset{v}{\text{max}} & \sum_{i}^I {w_i}{v_i} \\ & \text{s. t.} & Sv = 0 \\ & & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\ \end{aligned}  using any JuMP compatible solver. Typically I is a singleton set that only includes the index of the biomass objective function, v_\mu, and weight, w_\mu=1. Here, we use Tulip.jl, a pure Julia interior point linear program solver, with the fba function from COBREXA.jl.  using COBREXA using JuMP using Tulip if !isfile("e_coli_core.json") download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json") end model = read_model("e_coli_core.json") biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM") optimizer = Tulip.Optimizer sol = flux_balance_analysis(model, biomass, optimizer)  Note that the result of the optimization problem, sol maps fluxes to reaction ids in the model, to simplify down stream analysis. Since the fluxes are mapped to a dictionary, it makes it simple to export them as a JSON file for visualization in, e.g. [Escher](https://escher.github.io/#/).  using JSON open("fluxes.json", "w") do io JSON.print(io, sol) end  ## Solution inspection Sometimes it is useful to investigate which reactions consume or produce a certain metabolite, or which exchange reactions are active. This functionality is exposed via metabolite_fluxes and exchange_reactions:  consuming, producing = metabolite_fluxes(sol, model) consuming["atp_c"]  ## Parsimonious FBA Parsimonious FBA (pFBA) solves a two stage optimization problem. First, a classic FBA problem is solved to identify the unique maximum of the objective. However, it should be noted that the fluxes from FBA are not unique (i.e. many fluxes may yield the objective optimum). To yield a unique set of fluxes, and remove internal futile cycles, a secondary quadratic problem is imposed *ad hoc*. Suppose that FBA has found the optimum of v_\mu = \mu, pFBA then solves: math \begin{aligned} & \underset{v}{\text{min}} & \sum_{i} {v_i^2} \\ & \text{s. t.} & Sv = 0 \\ & & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\ & & v_\mu = \mu \end{aligned}  again using any JuMP compatible solver(s). In the COBREXA.jl implementation of pFBA, both the FBA and QP problem are solved internally in pfba, using similar input arguments as in fba. If multiple solvers are given, the first solver is used to solve the LP, and the second solver the QP, otherwise the same solver is used to solve both problems. This is useful if the QP solver does not handle the LP problem well, as with OSQP. An alternative, related formulation of this idea exists, called "CycleFreeFlux". This replaces the quadratic formulation with an L1 (taxi cab) norm. While this should also remove futile cycles, it doesn't have the same uniqueness qualities and doesn't really have much benefit beyond only solving two linear programs. See [Building your own optimization analysis script](@ref) for ideas about how to implement this yourself if you really need it. Here, we use Tulip.jl followed by OSQP.jl, with the pfba function from COBREXA.jl. Note that OSQP.jl has iffy performance, and is only included here because it is open source. We recommend that a commercial solver, e.g. Gubobi.jl, be used to simplify your user experience.  using OSQP atts = Dict("eps_abs" => 5e-4,"eps_rel" => 5e-4, "max_iter" => 100_000, "verbose"=>false) # set solver attributes for QSQP sol = pfba(model, biomass, [Tulip.Optimizer, OSQP.Optimizer]; solver_attributes=Dict("opt1" => Dict{Any, Any}(), "opt2" => atts))  ## Flux variability analysis (FVA) Flux variability analysis can also be used to investigate the degeneracy associated with flux balance analysis derived solutions (see also [Sampling Tools](@ref)). COBREXA.jl exposes fva that sequentially maximizes and minimizes each reaction in a model subject to the constraint that each optimization problem also satisfies an initial FBA type objective optimum, below denoted by v_{\mu}=\mu, math \begin{aligned} & \underset{v}{\text{max or min}} & v_i \\ & \text{s. t.} & Sv = 0 \\ & & v_{\text{LB}} \leq v \leq v_{\text{UB}} \\ & & v_{\mu} = \mu \\ \end{aligned}  ## Building your own optimization analysis script COBREXA.jl also makes it simple to construct customized optimization problems by making judicious use of [JuMP](https://jump.dev/). Convenience functions make optimization problem construction, modification and data extraction from JuMP result objects easy.  model = read_model("e_coli_core.json") cbm, v, mb, ubs, lbs = build_cbm(model) glucose_index = model[findfirst(model.reactions, "EX_glc__D_e")] set_bound(glucose_index, ubs, lbs; ub=-12.0, lb=-12.0) set_optimizer(cbm, Tulip.Optimizer) @objective(cbm, Max, v[model[biomass]]) optimize!(cbm) sol = map_fluxes(v, model) 
 # Model IO ## Reading constraint based models Currently, JSON and Matlab formatted models can be imported.  using COBREXA if !isfile("e_coli_core.json") download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json") end model = read_model("e_coli_core.json") model # pretty print the model  ## Writing constraint based models Currently, JSON and Matlab models can be exported.  save_model(model, "e_coli_core2.json")  ## IO Problems? Please let me know when you run into model import/export problems by filing an issue.
 # Model Construction ## Defining genes Genes are represented by the Gene type in COBREXA.jl, see [Model Structure](@ref) for details. Genes can be constructed using either an empty constructor, or a constructor taking only the string id of the gene.  using COBREXA gene = Gene("gene1") gene.name = "gene 1 name" gene # pretty printing  Helper functions from Base have also been overwritten to make accessing arrays of genes easy. ## Defining metabolites Metabolites are represented by the Metabolite type in COBREXA.jl, see [Model Structure](@ref) for details. The simplest way to define a new metabolite is by using the empty constructor Metabolite(). Alternatively, Metabolite(id::String) can be used to assign only the id field of the Metabolite. The other fields can be modified as usual, if desired.  using COBREXA # hide atp = Metabolite("atp") atp.name = "Adenosine triphosphate" atp.formula = "C10H12N5O13P3" atp.charge = -4 # note that the charge is an Int atp # pretty printing  Basic analysis of Metabolites is also possible. The molecular structure of a metabolite can be inspected by calling get_atoms(met::Metabolite). This function is useful for checking atom balances across reactions, or the entire model.  using COBREXA # hide atp = Metabolite("atp") # hide atp.name = "Adenosine triphosphate" # hide atp.formula = "C10H12N5O13P3" # hide get_atoms(atp)  Helper functions from Base have also been overwritten to make accessing arrays of metabolites easy. ## Defining reactions Reactions are represented by the Reaction type in COBREXA.jl, see [Model Structure](@ref) for details. The simplest way to define a new reaction is by using the empty constructor Reaction(). All the other fields still need to be assigned. Another option is to use Reaction(id::String, metabolites::Dict{Metabolite, Float64}, dir=:bidirectional), which assigns the reaction id, the reaction stoichiometry (through the metabolite dictionary argument), and the directionality of the reaction. The remaining fields still need to be assigned, if desired.  using COBREXA # hide atp = Metabolite("atp") # hide atp.name = "Adenosine triphosphate" # hide atp.formula = "C10H12N5O13P3" # hide atp.charge = -4 # hide gene = Gene("gene1") # hide adp = Metabolite("adp") # define another metabolite metdict = Dict(atp => -1.0, adp => 1.0) # nb stoichiometries need to be floats rxn = Reaction("dummy rxn", metdict, :forward) rxn.annotation["ec-code"] = ["0.0.0.0"] rxn.grr = [[gene]] # only gene1 is required for this reaction to work rxn # pretty printing  See the discussion in [Model Structure](@ref) about how to assign grr to a reaction. Yet another way of defining a reaction is through overloading of the operators: *, +, ∅, ⟶, →, ←, ⟵, ↔, ⟷. The longer and shorter arrows mean the same thing, i.e. ⟶ is the same as →, etc. The other fields of the reaction still need to be set directly.  using COBREXA # hide atp = Metabolite("atp") # hide atp.name = "Adenosine triphosphate" # hide atp.formula = "C10H12N5O13P3" # hide atp.charge = -4 # hide adp = Metabolite("adp") # hide adp.formula = "C10H12N5O10P2" # hide another_rxn = 2.0adp ⟶ 2.0*atp # forward reaction another_rxn.id = "another dummy rxn" another_rxn  When building exchange, demand, or sinks reactions the ∅ empty metabolite should be used to indicate that a metabolite is being created or destroyed.  using COBREXA # hide adp = Metabolite("adp") # hide adp.formula = "C10H12N5O10P2" # hide ex_rxn = ∅ ⟷ adp # exchange reaction  It is also possible to check if a reaction is mass balanced by using is_mass_balanced(rxn::Reaction). Note, this function requires that all the metabolites in the reaction have formulas assigned to them to work properly.  using COBREXA # hide atp = Metabolite("atp") # hide atp.name = "Adenosine triphosphate" # hide atp.formula = "C10H12N5O13P3" # hide atp.charge = -4 # hide adp = Metabolite("adp") # hide adp.formula = "C10H12N5O10P2" # hide unbal_rxn = 2.0adp ⟶ 1.0*atp # unbalanced reaction is_mass_balanced(unbal_rxn)  Helper functions from Base have also been overwritten to make accessing arrays of reactions easy. ## Building models Once you have defined some metabolites, genes, and reactions, you can construct a model! This is most simply done by using the empty model constructor: The fields of StandardModel can then be assigned as usual.  using COBREXA atp = Metabolite("atp") adp = Metabolite("adp") glc = Metabolite("glc") h2o = Metabolite("h2o") pp = Metabolite("pp") # pi (phosphate) usually means π, so pp is used here instead lac = Metabolite("lac") g1 = Gene("gene1") g2 = Gene("gene2") g3 = Gene("gene3") g4 = Gene("gene4") catabolism = glc + 2.0*adp + 2.0*pp ⟶ 2.0 * lac + 2.0 * h2o + 2.0 * atp catabolism.id = "catabolism" catabolism.grr = [[g1, g2], [g3, g4]] anabolism = 10.0 * atp + 10.0*h2o ⟶ 10.0*adp + 10.0*pp anabolism.id = "anabolism" glc_ex = ∅ ⟷ glc # exchanges are defined like this so negative fluxes mean import lac_ex = ∅ ⟷ lac # positive flux means export mets = [atp, adp, pp, h2o, glc, lac] genes = [g1, g2, g3, g4] rxns = [catabolism, anabolism, lac_ex, glc_ex] model = StandardModel() model.id = "Test model" add!(model, mets) add!(model, rxns) add!(model, genes) model # pretty printing  Here the add functions were used to add the reactions, metabolites and genes to the model. Checking for duplicates of genes, metabolites or reactions can also be done. Note that duplicate checking is NOT performed when models are imported. If you suspect the model quality you should check each reaction, metabolite and gene yourself.  using COBREXA met1 = Metabolite() met1.id = "met1" met1.name = "Metabolite 1" met1.formula = "C6H12O6N" met1.charge = 1 met1.compartment = "c" met1.notes = Dict("notes"=>["This is a made up metabolite", "Another note"]) met1.annotation = Dict("sboterm" => "sbo000001", "kegg.compound" => ["C0001", "C0010"]) met2 = Metabolite("met2") met2.formula = "C6H12O6N" met3 = Metabolite("met3") met3.formula = "X" met3.annotation = Dict("sboterm" => "sbo00001", "kegg.compound" => ["C02222", "C0001"]) mets = [met1, met2, met3] dup, ind = check_duplicate_annotations(mets, met3)   mms = check_same_formula([met3, met1], met2)  Similar functionality exists for genes and reactions. Duplicate reactions, metabolites or genes can be removed using rm!.  using COBREXA # hide atp = Metabolite("atp") atp2 = Metabolite("atp2") mets = [atp, atp2] model = StandardModel() add!(model, mets) rm!(model, atp2)  A model can also be checked to ensure that no metabolites or genes are missing relative to the reactions. fix_model also ensures that no extra metabolites are present.  using COBREXA # hide atp = Metabolite("atp") adp = Metabolite("adp") anabolism = 10.0 * atp ⟶ 10.0*adp anabolism.id = "anabolism" mets = [atp] rxns = [anabolism] model = StandardModel() model.id = "Test model" add!(model, mets) # missing adp add!(model, rxns) fix_model!(model) # adp added model # now has 2 metabolites  Helper functions from Base have also been overwritten to make accessing reactions, metabolites and genes easy from a model.
 # Model Structure Before reading, writing, or building models, it is important to understand how they are represented internally in COBREXA.jl. Each model is a struct of the type StandardModel, which is composed of a model id, and arrays of Reactions, Metabolites and Genes. The fields of Reaction, Metabolite, Gene types are shown below. When reading, writing, building or analysing models, these fields are what is used by COBREXA.jl. Note, the format of grr (gene reaction rule) in Reaction should be a nested array, like [[g1, g2], [g3, g4], ...]. Each sub-array, e.g. [g1, g2], is composed of essential genes (g1::COBREXA.Gene, etc.) for the reaction to function. Thus, if the reaction requires (g1 and g2) or (g3 and g4) to function, then this would be represented by [[g1, g2], [g3, g4]] in grr. Also note, the metabolites dictionary field of Reaction maps a Metabolite to its stoichiometrix coefficient, i.e. it represents the reaction equation. Also note, the format used the annotation field in Reaction, Metabolite and Gene should always be a dictionary with string ids mapped to vectors of entries EXCEPT for "sbo" terms, which are not vectors but strings, e.g.:  reaction.annotation = Dict( "bigg.reaction" => ["PFK"], "rhea" => ["16111", "16109", "16110", "16112"], "sbo" => "SBO:0000176", "ec-code" => ["2.7.1.11"], )  ...is correct, but any other format will cause issues.
 # Sampling Tools It is well known that the FBA does not yield a unique solution, i.e. many flux distributions are capable of satisfying the system constraints as well as optimizing the imposed objective function. Let the feasible space be defined by \mathcal{P} = \left\{ v : Sv = 0 \cap v_{\text{min}} \leq v \leq v_{\text{max}} \right\}. Sampling methods have been developed to uniformly sample from this feasible solution space. COBREXA.jl implements hit_and_run to sample from \mathcal{P}.  using COBREXA using JuMP using Tulip model = read_model("e_coli_core.json") optimizer = COBREXA.Tulip.Optimizer biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM") cons = Dict("EX_glc__D_e" => (-12.0, -12.0)) sol = fba(model, biomass, optimizer, constraints=cons) # classic flux balance analysis cons["BIOMASS_Ecoli_core_w_GAM"] = (sol["BIOMASS_Ecoli_core_w_GAM"], sol["BIOMASS_Ecoli_core_w_GAM"]*0.99) samples = hit_and_run(100_000, model, optimizer; keepevery=10, samplesize=5000, constraints=cons) `
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!