{"cells":[{"cell_type":"markdown","source":["# Loading, converting, and saving models"],"metadata":{}},{"cell_type":"markdown","source":["`COBREXA` can load models stored in `.mat`, `.json`, and `.xml` formats (the\n","last denoting SBML-formatted models)."],"metadata":{}},{"cell_type":"markdown","source":["We will primarily use the *E. coli* \"core\" model to demonstrate the utilities\n","found in `COBREXA`. First, let's download the model in several formats."],"metadata":{}},{"outputs":[],"cell_type":"code","source":["# Download the model files if they don't already exist\n","!isfile(\"e_coli_core.mat\") &&\n","    download(\"http://bigg.ucsd.edu/static/models/e_coli_core.mat\", \"e_coli_core.mat\");\n","!isfile(\"e_coli_core.json\") &&\n","    download(\"http://bigg.ucsd.edu/static/models/e_coli_core.json\", \"e_coli_core.json\");\n","!isfile(\"e_coli_core.xml\") &&\n","    download(\"http://bigg.ucsd.edu/static/models/e_coli_core.xml\", \"e_coli_core.xml\");"],"metadata":{},"execution_count":1},{"cell_type":"markdown","source":["Now, load the package:"],"metadata":{}},{"outputs":[],"cell_type":"code","source":["using COBREXA"],"metadata":{},"execution_count":2},{"cell_type":"markdown","source":["## Loading models"],"metadata":{}},{"cell_type":"markdown","source":["Load the models using the `load_model` function. 
Each model is able to\n","\"pretty-print\" itself, hiding the inner complexity."],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Metabolic model of type MATModel\nsparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64  …  20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3  …  93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95)\nNumber of reactions: 95\nNumber of metabolites: 72\n"},"metadata":{},"execution_count":3}],"cell_type":"code","source":["mat_model = load_model(\"e_coli_core.mat\")"],"metadata":{},"execution_count":3},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Metabolic model of type JSONModel\nsparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64  …  20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3  …  93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95)\nNumber of reactions: 95\nNumber of metabolites: 72\n"},"metadata":{},"execution_count":4}],"cell_type":"code","source":["json_model = load_model(\"e_coli_core.json\")"],"metadata":{},"execution_count":4},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Metabolic model of type SBMLModel\nsparse([41, 23, 51, 67, 61, 65, 1, 7, 19, 28  …  72, 3, 8, 33, 57, 66, 31, 45, 46, 57], [1, 2, 2, 2, 3, 3, 4, 4, 4, 4  …  93, 94, 94, 94, 94, 94, 95, 95, 95, 95], [-1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0  …  1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, 1.0], 72, 95)\nNumber of reactions: 95\nNumber of metabolites: 72\n"},"metadata":{},"execution_count":5}],"cell_type":"code","source":["sbml_model = load_model(\"e_coli_core.xml\")"],"metadata":{},"execution_count":5},{"cell_type":"markdown","source":["You can directly inspect the model objects, although only with a specific way\n","for each 
specific type."],"metadata":{}},{"cell_type":"markdown","source":["JSON models contain their corresponding JSON:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Dict{String, Any} with 6 entries:\n  \"metabolites\"  => Any[Dict{String, Any}(\"compartment\"=>\"e\", \"name\"=>\"D-Glucos…\n  \"id\"           => \"e_coli_core\"\n  \"compartments\" => Dict{String, Any}(\"c\"=>\"cytosol\", \"e\"=>\"extracellular space…\n  \"reactions\"    => Any[Dict{String, Any}(\"name\"=>\"Phosphofructokinase\", \"metab…\n  \"version\"      => \"1\"\n  \"genes\"        => Any[Dict{String, Any}(\"name\"=>\"adhE\", \"id\"=>\"b1241\", \"notes…"},"metadata":{},"execution_count":6}],"cell_type":"code","source":["json_model.json"],"metadata":{},"execution_count":6},{"cell_type":"markdown","source":["SBML models contain a complicated structure from [`SBML.jl`\n","package](https://github.com/LCSB-BioCore/SBML.jl):"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"SBML.Model"},"metadata":{},"execution_count":7}],"cell_type":"code","source":["typeof(sbml_model.sbml)"],"metadata":{},"execution_count":7},{"cell_type":"markdown","source":["MAT models contain MATLAB data:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Dict{String, Any} with 17 entries:\n  \"description\" => \"e_coli_core\"\n  \"c\"           => [0.0; 0.0; … ; 0.0; 0.0;;]\n  \"rev\"         => [0; 0; … ; 1; 0;;]\n  \"mets\"        => Any[\"glc__D_e\"; \"gln__L_c\"; … ; \"g3p_c\"; \"g6p_c\";;]\n  \"grRules\"     => Any[\"b3916 or b1723\"; \"((b0902 and b0903) and b2579) or (b09…\n  \"subSystems\"  => Any[\"Glycolysis/Gluconeogenesis\"; \"Pyruvate Metabolism\"; … ;…\n  \"b\"           => [0.0; 0.0; … ; 0.0; 0.0;;]\n  \"metFormulas\" => Any[\"C6H12O6\"; \"C5H10N2O3\"; … ; \"C3H5O6P\"; \"C6H11O9P\";;]\n  \"rxnGeneMat\"  => sparse([6, 10, 6, 11, 27, 82, 93, 94, 12, 12  …  37, 38, 38,…\n  \"S\"           => [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 
0.0 0.0; … ; 0.0 0.0 … 0.0 0.0…\n  \"metNames\"    => Any[\"D-Glucose\"; \"L-Glutamine\"; … ; \"Glyceraldehyde 3-phosph…\n  \"lb\"          => [0.0; 0.0; … ; -1000.0; 0.0;;]\n  \"metCharge\"   => [0.0; 0.0; … ; -2.0; -2.0;;]\n  \"ub\"          => [1000.0; 1000.0; … ; 1000.0; 1000.0;;]\n  \"rxnNames\"    => Any[\"Phosphofructokinase\"; \"Pyruvate formate lyase\"; … ; \"O2…\n  \"rxns\"        => Any[\"PFK\"; \"PFL\"; … ; \"O2t\"; \"PDH\";;]\n  \"genes\"       => Any[\"b1241\"; \"b0351\"; … ; \"b2935\"; \"b3919\";;]"},"metadata":{},"execution_count":8}],"cell_type":"code","source":["mat_model.mat"],"metadata":{},"execution_count":8},{"cell_type":"markdown","source":["## Using the generic interface to access model details"],"metadata":{}},{"cell_type":"markdown","source":["To hide the complexities of the individual object representations, `COBREXA.jl` uses a set\n","of generic interface functions that extract the important information\n","from all supported model types. This approach ensures that the analysis\n","functions can work on any supported model type."],"metadata":{}},{"cell_type":"markdown","source":["For example, you can list the reactions contained in SBML\n","and JSON models using the same accessor:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"95-element Vector{String}:\n \"PFK\"\n \"PFL\"\n \"PGI\"\n \"PGK\"\n \"PGL\"\n \"ACALD\"\n \"AKGt2r\"\n \"PGM\"\n \"PIt2r\"\n \"ALCD2x\"\n ⋮\n \"MALt2_2\"\n \"MDH\"\n \"ME1\"\n \"ME2\"\n \"NADH16\"\n \"NADTRHD\"\n \"NH4t\"\n \"O2t\"\n \"PDH\""},"metadata":{},"execution_count":9}],"cell_type":"code","source":["reactions(json_model)"],"metadata":{},"execution_count":9},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"95-element Vector{String}:\n \"R_EX_fum_e\"\n \"R_ACONTb\"\n \"R_TPI\"\n \"R_SUCOAS\"\n \"R_GLNS\"\n \"R_EX_pi_e\"\n \"R_PPC\"\n \"R_O2t\"\n \"R_G6PDH2r\"\n \"R_TALA\"\n ⋮\n \"R_THD2\"\n \"R_EX_h2o_e\"\n \"R_GLUSy\"\n \"R_ME1\"\n \"R_GLUN\"\n \"R_EX_o2_e\"\n 
\"R_FRUpts2\"\n \"R_ALCD2x\"\n \"R_PIt2r\""},"metadata":{},"execution_count":10}],"cell_type":"code","source":["reactions(sbml_model)"],"metadata":{},"execution_count":10},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"true"},"metadata":{},"execution_count":11}],"cell_type":"code","source":["issetequal(reactions(json_model), reactions(mat_model)) # do models contain the same reactions?"],"metadata":{},"execution_count":11},{"cell_type":"markdown","source":["All accessors are defined in a single file in COBREXA source code; you may\n","therefore get a list of all accessors as follows:"],"metadata":{}},{"outputs":[{"name":"stdout","output_type":"stream","text":["balance\n","bounds\n","coupling\n","coupling_bounds\n","fluxes\n","gene_annotations\n","gene_name\n","gene_notes\n","genes\n","metabolite_annotations\n","metabolite_charge\n","metabolite_compartment\n","metabolite_formula\n","metabolite_name\n","metabolite_notes\n","metabolites\n","n_coupling_constraints\n","n_fluxes\n","n_genes\n","n_metabolites\n","n_reactions\n","objective\n","precache!\n","reaction_annotations\n","reaction_flux\n","reaction_gene_association\n","reaction_name\n","reaction_notes\n","reaction_stoichiometry\n","reaction_subsystem\n","reactions\n","stoichiometry\n"]}],"cell_type":"code","source":["using InteractiveUtils\n","\n","for method in filter(\n","    x -> endswith(string(x.file), \"MetabolicModel.jl\"),\n","    InteractiveUtils.methodswith(MetabolicModel, COBREXA),\n",")\n","    println(method.name)\n","end"],"metadata":{},"execution_count":12},{"cell_type":"markdown","source":["## Converting between model types"],"metadata":{}},{"cell_type":"markdown","source":["It is possible to convert model types to-and-fro. 
To do this, use the\n","`convert` function, which is overloaded from Julia's `Base`."],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Metabolic model of type MATModel\nsparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64  …  20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3  …  93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95)\nNumber of reactions: 95\nNumber of metabolites: 72\n"},"metadata":{},"execution_count":13}],"cell_type":"code","source":["m = convert(MATModel, json_model)"],"metadata":{},"execution_count":13},{"cell_type":"markdown","source":["`m` will now contain the MATLAB-style matrix representation of the model:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"72×95 Matrix{Float64}:\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0  …  0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0  …  0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0  -1.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  1.0  0.0   0.0  0.0   1.0  1.0   1.0     0.0  0.0  -4.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0  -1.0     0.0  0.0   3.0  0.0  0.0  0.0  0.0\n  ⋮                          ⋮          ⋱             ⋮                   \n -1.0  0.0   1.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  1.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  1.0   0.0  0.0   0.0  0.0   0.0  …  0.0  0.0   0.0  0.0  0.0  0.0  
0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0   0.0  0.0   0.0  0.0   0.0  …  0.0  0.0   0.0  0.0  0.0  0.0  0.0\n  0.0  0.0  -1.0  0.0   0.0  0.0   0.0     0.0  0.0   0.0  0.0  0.0  0.0  0.0"},"metadata":{},"execution_count":14}],"cell_type":"code","source":["Matrix(m.mat[\"S\"])"],"metadata":{},"execution_count":14},{"cell_type":"markdown","source":["The loading and conversion can be combined using a shortcut:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"Metabolic model of type MATModel\nsparse([9, 51, 55, 64, 65, 34, 44, 59, 66, 64  …  20, 22, 23, 25, 16, 17, 34, 44, 57, 59], [1, 1, 1, 1, 1, 2, 2, 2, 2, 3  …  93, 93, 94, 94, 95, 95, 95, 95, 95, 95], [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0  …  1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0], 72, 95)\nNumber of reactions: 95\nNumber of metabolites: 72\n"},"metadata":{},"execution_count":15}],"cell_type":"code","source":["m = load_model(MATModel, \"e_coli_core.json\")"],"metadata":{},"execution_count":15},{"cell_type":"markdown","source":["## Saving and exporting models"],"metadata":{}},{"cell_type":"markdown","source":["`COBREXA.jl` supports exporting the models in JSON and MAT format, using `save_model`."],"metadata":{}},{"outputs":[],"cell_type":"code","source":["save_model(m, \"converted_model.json\")\n","save_model(m, \"converted_model.mat\")"],"metadata":{},"execution_count":16},{"cell_type":"markdown","source":["If you need a non-standard suffix, use the type-specific saving functions:"],"metadata":{}},{"outputs":[],"cell_type":"code","source":["save_json_model(m, \"file.without.a.good.suffix\")\n","save_mat_model(m, 
\"another.file.matlab\")"],"metadata":{},"execution_count":17},{"cell_type":"markdown","source":["If you are saving the models only for future processing in Julia environment,\n","it is often wasteful to encode the models to external formats and decode them\n","back. Instead, you can use the \"native\" Julia data format, accessible with\n","package `Serialization`.\n","\n","This way, you can use `serialize` to save even the `StandardModel`\n","that has no file format associated:"],"metadata":{}},{"outputs":[],"cell_type":"code","source":["using Serialization\n","\n","sm = convert(StandardModel, m)\n","\n","open(f -> serialize(f, sm), \"myModel.stdmodel\", \"w\")"],"metadata":{},"execution_count":18},{"cell_type":"markdown","source":["The models can then be loaded back using `deserialize`:"],"metadata":{}},{"outputs":[{"output_type":"execute_result","data":{"text/plain":"true"},"metadata":{},"execution_count":19}],"cell_type":"code","source":["sm2 = deserialize(\"myModel.stdmodel\")\n","issetequal(metabolites(sm), metabolites(sm2))"],"metadata":{},"execution_count":19},{"cell_type":"markdown","source":["This form of loading operation is usually pretty quick:"],"metadata":{}},{"outputs":[{"name":"stdout","output_type":"stream","text":["┌ Info: Deserialization took 0.001927563 seconds\n","└ @ Main.##275 string:2\n"]}],"cell_type":"code","source":["t = @elapsed deserialize(\"myModel.stdmodel\")\n","@info \"Deserialization took $t seconds\""],"metadata":{},"execution_count":20},{"cell_type":"markdown","source":["Notably, large and complicated models with thousands of reactions and\n","annotations can take seconds to decode properly. 
Serialization allows you to\n","almost completely remove this overhead, and scales well to tens of millions\n","of reactions."],"metadata":{}},{"cell_type":"markdown","source":["---\n","\n","*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"],"metadata":{}}],"nbformat_minor":3,"metadata":{"language_info":{"file_extension":".jl","mimetype":"application/julia","name":"julia","version":"1.1.0"},"kernelspec":{"name":"julia-1.1","display_name":"Julia 1.1.0","language":"julia"}},"nbformat":4}