Unverified Commit 2ca4bb63 authored by Miroslav Kratochvil

document the sampler

parent 4e5ef108
"""
TODO
Perform basic hit and run sampling for `N` iterations on `model` using `optimizer` to
generate the warmup points. Here warmup points are generated by iteratively
minimizing and maximizing each reaction index in `warmup_indices`. Any problem modifications
needs to be specified in `modifications`. The sampler will store every `keepevery` sample.
The sampler will return `nchains` which is a `Chain` type from `MCMCChains.jl`. This makes
investigating convergence easier. The sampler can also be run in parallel by specifying the
worker process indices. See "Robert L. Smith Efficient Monte Carlo Procedures for
Generating Points Uniformly Distributed over Bounded Regions. Operations
Research 32 (6) 1296-1308 https://doi.org/10.1287/opre.32.6.1296" for more
details about the algorithm.
Note that `N` needs to be much greater than sample size (especially if
`keepever` is not 1), and should be greater than
the dimensionality of the sampled space (i.e., at least same as the number of
reactions).
# Example: serial sampling
```
using COBREXA
using Tulip
function affine_hit_and_run(
warmup_points::Matrix{Float64},
lbs::Vector{Float64},
ubs::Vector{Float64};
sample_iters = 100 .* (1:5),
workers = [myid()],
chains = length(workers),
)
model = load_model(StandardModel, model_path)
Run a hit-and-run style sampling that starts from `warmup_points` and uses
their affine combinations for generating the run directions to sample the space
delimited by `lbs` and `ubs`. The points that represent fluxes in
`warmup_points` should be organized in columns, i.e. `warmup_points[:,1]` is
the first warmup flux.
chains = hit_and_run(
model,
Tulip.Optimizer;
N = 1000_000,
nchains = 3,
modifications = [change_constraint("EX_glc__D_e",-8, -8)]
)
```
There are total `chains` of hit-and-run runs, each on a batch of
`size(warmup_points, 2)` points. The runs are scheduled on `workers`, for good
load balancing `chains` should be ideally much greater than `length(workers)`.
# Example: parallel sampling
Each run continues for `maximum(sample_iters)` iterations; the numbers in
`sample_iters` represent the iterations at which the whole "current" batch of
points is collected for output. For example, `sample_iters=[1,4,5]` causes the
process run for 5 iterations, returning the sample batch that was produced by
1st, 4th and last (5th) iteration.
Returns a matrix of sampled fluxes (in columns), with all collected samples
horizontally concatenated. The total number of samples (columns) will be
`size(warmup_points,2) * chains * length(sample_iters)`.
# Example
```
using COBREXA
using Tulip
model = load_model(StandardModel, model_path)
using Distributed
addprocs(3)
@everywhere using COBREXA, Tulip
chains = hit_and_run(
model,
Tulip.Optimizer;
N = 1000_000,
nchains = 3,
modifications = [change_constraint("EX_glc__D_e",-8, -8)]
workerids = workers()
)
warmup, lbs, ubs = warmup_from_variability(model, Tulip.Optimizer, 100)
samples = affine_hit_and_run(warmup, lbs, ubs, sample_iters = 1:3)
```
"""
function affine_hit_and_run(
......@@ -85,6 +70,13 @@ function affine_hit_and_run(
    return samples
end
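The relation between the input sizes and the output shape described in the docstring can be checked directly. A minimal sketch, assuming the same `model_path` setup as in the docstring examples:

```
using COBREXA, Tulip

model = load_model(StandardModel, model_path)  # `model_path` as in the docstring examples

# 100 warmup points, 2 chains, 3 collection points
warmup, lbs, ubs = warmup_from_variability(model, Tulip.Optimizer, 100)
samples = affine_hit_and_run(warmup, lbs, ubs; sample_iters = [10, 20, 30], chains = 2)

# columns = warmup batch size * chains * number of collected iterations
@assert size(samples, 2) == size(warmup, 2) * 2 * 3
# every sampled flux stays inside the box given by `lbs` and `ubs`
@assert all(lbs .<= samples) && all(samples .<= ubs)
```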
"""
    _affine_hit_and_run_chain(warmup, lbs, ubs, iters, chain)
Internal helper function for computing a single affine hit-and-run chain. The
number of the chain is passed for possible future initialization of stable
RNGs.
"""
function _affine_hit_and_run_chain(warmup, lbs, ubs, iters, chain)
    points = copy(warmup)
......
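The body of the chain function is collapsed in this view. Purely as an illustration of the hit-and-run step that the docstring above describes, here is a minimal sketch of one iteration on a single point of the batch; this is not the package's actual implementation, and the helper name `_example_hit_and_run_step!` is hypothetical:

```
# illustrative sketch only; not COBREXA's actual implementation
# Moves the i-th point of the batch (points stored in columns) along a
# direction generated from a random combination of the whole batch, staying
# inside the box given by `lbs` and `ubs` (assumed finite here).
function _example_hit_and_run_step!(points, lbs, ubs, i)
    n = size(points, 2)
    w = rand(n)
    w ./= sum(w)                      # random convex (hence affine) combination weights
    dir = points * w .- points[:, i]  # run direction from the current point
    # admissible step range [lo, hi] that keeps points[:, i] + λ * dir in the box
    lo, hi = -Inf, Inf
    for j in axes(points, 1)
        d = dir[j]
        abs(d) < 1e-12 && continue
        a = (lbs[j] - points[j, i]) / d
        b = (ubs[j] - points[j, i]) / d
        if d < 0
            a, b = b, a
        end
        lo = max(lo, a)
        hi = min(hi, b)
    end
    λ = lo + rand() * (hi - lo)       # uniformly random admissible step length
    points[:, i] .+= λ .* dir
    return points
end
```

Repeating such a step for every column of the batch, `iters` times, and recording the batch at the requested iterations would produce the output that the `affine_hit_and_run` docstring documents.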