Basic usage of StandardModel

In this tutorial we will use COBREXA's StandardModel and functions that specifically operate on it. As usual we will use the toy model of E. coli for demonstration.

!isfile("e_coli_core.json") &&
    download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json")

using COBREXA

Loading a model

model = load_model(StandardModel, "e_coli_core.json") # we specifically want to load a StandardModel from the model file
Metabolic model of type StandardModel

⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⢄⠀⠀⠀⠀⠀⠀⠀⠈⠶⠴⡆⠀⠀⠀⠀⠀⠀
⡀⢐⣀⢀⡀⡒⢒⣐⠀⣂⣂⠀⣂⣂⢂⠀⢀⠀⠀⠀⠀⠀⢀⠄⠀⠀⠀⢂⠀⢂⣀⣐⡒⡀⠆⢙⣀⠀⡀⠀
⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠰⠀⠀⠀⠀⠀⠀⠀⠀⠀⠠⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠈⢑⣀⣀⠀⠀
⠀⠀⠃⠀⠃⠀⠀⠀⠘⠀⡇⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠀⠀⡜⠀⡄⣤⢠⠘⠙⢣⡇⠘
⠀⠐⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠐⠁⠉⠀⠀⠀⠀⠀⠘⠄
⠀⢐⠀⠂⠀⠄⠠⠠⠀⠠⠆⠀⠄⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠠⠀⠠⠀⠀⢀⠀⠀⠠⠀⠀⠁
⢀⠐⠀⠨⢀⠁⠈⣈⠀⢁⣁⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠁⢀⠀⢊⠉⠀⠀⠀⢀⠀⣀⠀⢀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡈⠀⡀⠆⠀⠆⠀⡀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠆⠀
⠀⠀⠂⠀⡂⠀⠀⠁⠀⠀⠀⠈⠁⠀⠀⠀⠄⠄⢁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀
⠈⠀⠁⠀⠀⢀⡀⠀⠠⠁⠁⠀⠑⠀⠐⠲⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠂⠀⠀⠀⠀⠀⠀⠊⠀⠀⠀⠈
⠄⠠⢠⠀⠰⠀⠠⠀⠤⠦⠄⠈⠀⠀⠀⠠⠀⠁⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠤⠄⠄⠠⠀⠀⠀⠀⠀
⠂⠐⠀⠀⠐⡠⢐⠘⢃⠒⠂⡀⠄⠀⠀⠐⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠒⠀⢀⢀⠀⠀⣀⠀⢀
⠈⠀⠁⠀⡀⠀⠀⠀⠈⠁⠅⠀⠁⠀⢀⠈⠄⠔⠀⠀⠀⠀⠀⠀⠀⠀⠐⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠈
⠣⠁⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠀⠀⠈⠀⠁⠁⠀⠈⡀⠀⠀⠀⠀⠀⠐⢣⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⡄⠀⠀⠀⠀⠂⠄⠤⠀⠀⠈⠂⠀⠀⠀⠀⠠⠀⠊⠒⣠⠀⠀⠀⠀⠀⠀⠀⠀⠀
Number of reactions: 95
Number of metabolites: 72
Note: Loading `StandardModel`s implicitly uses `convert`

When using load_model(StandardModel, file_location) the model at file_location is first loaded into its inferred format and is then converted to a StandardModel using the generic accessor interface. Thus, data loss may occur. Always check your model to ensure that nothing important has been lost.

Basic analysis

As before, for optimization based analysis we need to load an optimizer. Here we will use Tulip.jl to solve the linear programs of this tutorial. Refer to the basic constraint-based analysis tutorial for more informaiton.

All the normal analysis functions work on StandardModel, due to it also having the same generic accessor interface as all the other model types.

using Tulip

dict_sol = flux_balance_analysis_dict(
    model,
    Tulip.Optimizer;
    modifications = [
        change_objective("BIOMASS_Ecoli_core_w_GAM"),
        change_constraint("EX_glc__D_e", -12, -12),
        change_constraint("EX_o2_e", 0, 0),
    ],
)
Dict{String, Float64} with 95 entries:
  "ACALD"       => -9.78427
  "PTAr"        => 10.0729
  "ALCD2x"      => -9.78427
  "PDH"         => 1.98381e-9
  "PYK"         => 9.94501
  "CO2t"        => 0.487021
  "EX_nh4_e"    => -1.48633
  "MALt2_2"     => -0.0
  "CS"          => 0.294088
  "PGM"         => -22.8676
  "TKT1"        => -0.0487648
  "EX_mal__L_e" => -0.0
  "ACONTa"      => 0.294088
  "EX_pi_e"     => -1.00274
  "GLNS"        => 0.069699
  "ICL"         => 5.34932e-11
  "EX_o2_e"     => -0.0
  "FBA"         => 11.7289
  "EX_gln__L_e" => -0.0
  ⋮             => ⋮

This is not very exciting yet, since every other model type can also do this. However, deeper inspection of flux results is possible when using StandardModel.

Inspecting the flux solution: atom_exchange

It is sometimes interesting to keep track of the atoms entering and leaving the system through boundary reactions. This can be inspected by calling atom_exchange.

atom_exchange(dict_sol, model) # flux of individual atoms entering and leaving the system through boundary reactions (e.g. exchange reactions) based on flux_dict
Dict{String, Float64} with 5 entries:
  "C" => -11.5998
  "N" => -1.48633
  "P" => -1.00274
  "H" => -20.7086
  "O" => -12.995

Inspecting the flux solution: exchange_reactions

It is also sometimes useful to inspect the exchange reactions used by a flux solution. The function exchange_reactions fulfills this purpose.

consuming, producing = exchange_reactions(dict_sol, model; top_n = 4);
Consuming fluxes:
EX_glc__D_e = -12.0
EX_h2o_e = -8.285701
EX_nh4_e = -1.48633
EX_pi_e = -1.002744
EX_co2_e = -0.487021
Producing fluxes:
EX_h_e = 36.713726
EX_for_e = 21.172843
EX_ac_e = 10.072906
EX_etoh_e = 9.78427
EX_succ_e = 0.0

Inspecting the flux solution: metabolite_fluxes

Another useful flux result analysis function is metabolite_fluxes. This function keeps track of reactions consuming and producing each metabolite.

consuming, producing = metabolite_fluxes(dict_sol, model)

consuming["atp_c"] # reactions consuming atp_c
Dict{String, Float64} with 5 entries:
  "PFK"                      => -11.7289
  "BIOMASS_Ecoli_core_w_GAM" => -16.3031
  "GLNS"                     => -0.069699
  "ATPM"                     => -8.39
  "ATPS4r"                   => -6.80168

Internals of StandardModel

Another benefit of StandardModel is that it supports a richer internal infrastructure that can be used to manipulate internal model attributes in a systematic way. Specifically, the genes, reactions, and metabolites with of a model each have a type. This is particularly useful when modifying or even constructing a model from scratch.

Genes, Reactions, and Metabolites

StandardModel is composed of ordered dictionaries of Genes, Metabolites and Reactions. Ordered dictionaries are used because the order of the reactions and metabolites are important for constructing a stoichiometric matrix since the rows and columns should correspond to the order of the metabolites and reactions returned by calling the accessors metabolites and reactions.

Each StandardModel is composed of the following fields:

fieldnames(StandardModel) # fields of a StandardModel
(:id, :reactions, :metabolites, :genes)

The :genes field of a StandardModel contains an ordered dictionary of gene ids mapped to Genes.

model.genes # the keys of this dictionary are the same as genes(model)
OrderedCollections.OrderedDict{String, Gene} with 137 entries:
  "b1241" => Gene("b1241", nothing, Dict("original_bigg_ids"=>["b1241"]), Dict(…
  "b0351" => Gene("b0351", nothing, Dict("original_bigg_ids"=>["b0351"]), Dict(…
  "s0001" => Gene("s0001", nothing, Dict("original_bigg_ids"=>["s0001"]), Dict(…
  "b1849" => Gene("b1849", nothing, Dict("original_bigg_ids"=>["b1849"]), Dict(…
  "b3115" => Gene("b3115", nothing, Dict("original_bigg_ids"=>["b3115"]), Dict(…
  "b2296" => Gene("b2296", nothing, Dict("original_bigg_ids"=>["b2296"]), Dict(…
  "b1276" => Gene("b1276", nothing, Dict("original_bigg_ids"=>["b1276"]), Dict(…
  "b0118" => Gene("b0118", nothing, Dict("original_bigg_ids"=>["b0118"]), Dict(…
  "b0474" => Gene("b0474", nothing, Dict("original_bigg_ids"=>["b0474"]), Dict(…
  "b0116" => Gene("b0116", nothing, Dict("original_bigg_ids"=>["b0116"]), Dict(…
  "b0727" => Gene("b0727", nothing, Dict("original_bigg_ids"=>["b0727"]), Dict(…
  "b0726" => Gene("b0726", nothing, Dict("original_bigg_ids"=>["b0726"]), Dict(…
  "b2587" => Gene("b2587", nothing, Dict("original_bigg_ids"=>["b2587"]), Dict(…
  "b0356" => Gene("b0356", nothing, Dict("original_bigg_ids"=>["b0356"]), Dict(…
  "b1478" => Gene("b1478", nothing, Dict("original_bigg_ids"=>["b1478"]), Dict(…
  "b3734" => Gene("b3734", nothing, Dict("original_bigg_ids"=>["b3734"]), Dict(…
  "b3733" => Gene("b3733", nothing, Dict("original_bigg_ids"=>["b3733"]), Dict(…
  "b3736" => Gene("b3736", nothing, Dict("original_bigg_ids"=>["b3736"]), Dict(…
  "b3737" => Gene("b3737", nothing, Dict("original_bigg_ids"=>["b3737"]), Dict(…
  ⋮       => ⋮

The Gene type is a struct that can be used to store information about genes in a StandardModel. Each Gene is composed of the following fields:

fieldnames(Gene)
(:id, :name, :notes, :annotations)
Tip: Use <tab> complete to explore the structure of types

Use <tab> to quickly explore the fields of a struct. For example, Gene.<tab> will list all the fields shown above.

The keys used in the ordered dictionaries in model.genes are the ids returned using the generic accessor genes. Genes have pretty printing, as demonstrated below for a random gene drawn from the model:

random_gene_id = genes(model)[rand(1:n_genes(model))]
model.genes[random_gene_id]
Gene.id: b4232
Gene.name: ---
Gene.notes: 
	original_bigg_ids: ["b4232"]
Gene.annotations: 
	sbo: ["SBO:0000243"]
	uniprot: ["P0A993"]
	ecogene: ["EG10283"]
	ncbigene: ["948753"]
	ncbigi: ["16132054"]
	refseq_locus_tag: ["b4232"]
	refseq_name: ["fbp"]
	asap: ["ABE-0013842"]
	refseq_synonym: JW4191, ..., fdp

The same idea holds for both metabolites (stored as Metabolites) and reactions (stored as Reactions). This is demonstrated below.

random_metabolite_id = metabolites(model)[rand(1:n_metabolites(model))]
model.metabolites[random_metabolite_id]
Metabolite.id: nad_c
Metabolite.name: ---
Metabolite.formula: C21N7P2H26O14
Metabolite.charge: -1
Metabolite.compartment: c
Metabolite.notes: 
	original_bigg_ids: ["nad_c"]
Metabolite.annotations: 
	kegg.drug: ["D00002"]
	sabiork: ["37"]
	kegg.compound: ["C00003"]
	sbo: ["SBO:0000247"]
	biocyc: ["META:NAD"]
	chebi: CHEBI:7422, ..., CHEBI:29867
	metanetx.chemical: ["MNXM8"]
	inchi_key: BAWFJGJZGIEFAR-NNYOX...
	hmdb: ["HMDB00902"]
	bigg.metabolite: ["nad"]
	seed.compound: ["cpd00003"]
	reactome.compound: 352330, ..., 29360
random_reaction_id = reactions(model)[rand(1:n_reactions(model))]
model.reactions[random_reaction_id]
Reaction.id: PDH
Reaction.name: ---
Reaction.metabolites: 1.0 coa_c + 1.0 nad_c + 1.0 pyr_c ⟶  1.0 accoa_c + 1.0 co2_c + 1.0 nadh_c
Reaction.lb: 0.0
Reaction.ub: 1000.0
Reaction.grr: (b0114 and b0115 and b0116)
Reaction.subsystem: Glycolysis/Gluconeogenesis
Reaction.notes: 
	original_bigg_ids: ["PDH"]
Reaction.annotations: 
	bigg.reaction: ["PDH"]
	sabiork: ["523"]
	metanetx.reaction: ["MNXR102425"]
	rhea: 28043, ..., 28042
	sbo: ["SBO:0000176"]
	seed.reaction: ["rxn00154"]
	kegg.reaction: ["R00209"]
	biocyc: ["META:PYRUVDEH-RXN"]
	reactome.reaction: R-RNO-71397, ..., R-MMU-71397
	ec-code: 1.2.1, ..., 2.3.1.12
Reaction.objective_coefficient: 0.0

StandardModel can be used to build your own metabolic model or modify an existing one. One of the main use cases for StandardModel is that it can be used to merge multiple models or parts of multiple models together. Since the internals are uniform inside each StandardModel, attributes of other model types are squashed into the required format (using the generic accessors). This ensures that the internals of all StandardModels are the same - allowing easy systematic evaluation.

Warning: Combining models with different namespaces is tricky

Combining models that use different namespaces requires care. For example, in some models the water exchange reaction is called EX_h2o_e, while in others it is called R_EX_h2o_s. This needs to manually addressed (for now) to prevent duplicate, e.g. reactions, from being added.

Checking the internals of StandardModels: annotation_index

Often when models are automatically reconstructed duplicate genes, reactions or metabolites end up in a model. COBREXA exports annotation_index to check for cases where the id of a struct may be different, but the annotations the same (possibly suggesting a duplication). annotation_index builds a dictionary mapping annotation features to the ids of whatever struct you are inspecting. This makes it easy to find structs that share certain annotation features.

rxn_annotations = annotation_index(model.reactions)
Dict{String, Dict{String, Set{String}}} with 10 entries:
  "ec-code"           => Dict("3.6.3.37"=>Set(["ATPM"]), "3.6.3.42"=>Set(["ATPM…
  "sabiork"           => Dict("109"=>Set(["PGL"]), "762"=>Set(["GLUN"]), "155"=…
  "metanetx.reaction" => Dict("MNXR104869"=>Set(["TKT2"]), "MNXR99715"=>Set(["E…
  "rhea"              => Dict("27626"=>Set(["TKT2"]), "10229"=>Set(["ACONTa"]),…
  "sbo"               => Dict("SBO:0000627"=>Set(["EX_for_e", "EX_nh4_e", "EX_p…
  "seed.reaction"     => Dict("rxn05297"=>Set(["GLUt2r"]), "rxn09717"=>Set(["PY…
  "kegg.reaction"     => Dict("R00114"=>Set(["GLUSy"]), "R00199"=>Set(["PPS"]),…
  "biocyc"            => Dict("META:TRANS-RXN-121B"=>Set(["FUMt2_2"]), "META:PE…
  "reactome.reaction" => Dict("R-TGU-71397"=>Set(["PDH"]), "R-XTR-70449"=>Set([…
  "bigg.reaction"     => Dict("ACALD"=>Set(["ACALD"]), "PTAr"=>Set(["PTAr"]), "…
rxn_annotations["ec-code"]
Dict{String, Set{String}} with 141 entries:
  "3.6.3.37" => Set(["ATPM"])
  "3.6.3.42" => Set(["ATPM"])
  "3.6.3.38" => Set(["ATPM"])
  "3.6.3.19" => Set(["ATPM"])
  "2.3.3.1"  => Set(["CS"])
  "1.6.1.2"  => Set(["NADTRHD"])
  "3.6.3.35" => Set(["ATPM"])
  "6.2.1.5"  => Set(["SUCOAS"])
  "6.3.5.4"  => Set(["GLUN"])
  "3.6.3.49" => Set(["ATPM"])
  "3.6.3.51" => Set(["ATPM"])
  "1.2.1.12" => Set(["GAPD"])
  "3.6.3.32" => Set(["ATPM"])
  "2.3.3.3"  => Set(["CS"])
  "2.7.4.3"  => Set(["ADK1"])
  "6.3.5.5"  => Set(["GLUN"])
  "3.5.1.2"  => Set(["GLUN"])
  "1.1.1.49" => Set(["G6PDH2r"])
  "5.3.1.9"  => Set(["PGI"])
  ⋮          => ⋮

The annotation_index function can also be used on Reactions and Genes in the same way.

Checking the internals of StandardModels: check_duplicate_reaction

Another useful function is check_duplicate_reaction, which checks for reactions that have duplicate (or similar) reaction equations.

pgm_duplicate = Reaction()
pgm_duplicate.id = "pgm2" # Phosphoglycerate mutase
pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1)
pgm_duplicate
Reaction.id: pgm2
Reaction.name: ---
Reaction.metabolites: 1.0 2pg_c ⟷  1.0 3pg_c
Reaction.lb: -1000.0
Reaction.ub: 1000.0
Reaction.grr: ---
Reaction.subsystem: ---
Reaction.notes: ---
Reaction.annotations: ---
Reaction.objective_coefficient: 0.0
check_duplicate_reaction(pgm_duplicate, model.reactions; only_metabolites = false) # can also just check if only the metabolites are the same but different stoichiometry is used
"PGM"

Checking the internals of StandardModels: is_mass_balanced

Finally, is_mass_balanced can be used to check if a reaction is mass balanced based on the formulas of the reaction equation.

pgm_duplicate.metabolites = Dict{String,Float64}("3pg_c" => 1, "2pg_c" => -1, "h2o_c" => 1) # not mass balanced now
is_bal, extra_atoms = is_mass_balanced(pgm_duplicate, model) # extra_atoms shows which atoms are in excess/deficit
(false, Dict("C" => 0.0, "P" => 0.0, "H" => 2.0, "O" => 1.0))

This page was generated using Literate.jl.