Compare revisions

Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -44,9 +44,6 @@ variables:
 base-image:
-  tags:
-    - docker
-    - $RUNNER_TAG
  stage: dep_images
  rules:
    - if: $CI_COMMIT_TAG == ""

--- a/DESCRIPTION
+++ b/DESCRIPTION
 Package: shinyscreen
 Title: Pre-screening of Mass Spectrometry Data 
-Version: 1.3.19
+Version: 1.3.21
 Author: Todor Kondić
 Maintainer: Todor Kondić <todor.kondic@uni.lu>
 Authors@R: 
@@ -50,6 +50,7 @@ Collate:
    'errors.R'
    'mix.R'
    'envopts.R'
+    'data-model.R'
    'state.R'
    'metfrag.R'
    'plotting.R'

--- a/Docker_Documentation/Shinyscreen_Docker.Rmd
+++ b/Docker_Documentation/Shinyscreen_Docker.Rmd
+---
+title: "shinyscreen Docker Documentation"
+author: "Anjana Elapavalore"
+date: '`r Sys.Date()`'
+output: pdf_document
+---
+The objective of this documentation is to provide a detailed procedure for running the shinyScreen application within a Docker container. shinyScreen is a web-based tool developed for visualizing and analyzing high-throughput screening data using Shiny, a web application framework for R. Docker is employed to containerize the application, ensuring reproducibility and simplifying the deployment process by packaging the application with all its dependencies.
+This document provides step-by-step instructions on how to install Docker, pull the ShinyScreen Docker image, and run the application.
+## Prerequisites
+Before proceeding, the following prerequisites must be met:
+- Docker Desktop must be installed and properly configured on the system.
+- Sufficient system resources, including at least 4 GB of RAM, are recommended for optimal performance.
+## Step 1: Install Docker Desktop on Windows
+If Docker Desktop is not already installed on your system, follow these steps to set it up:
+1. **Download Docker Desktop**:
+   - Visit the [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/) download page and download the installer.
+2. **Install Docker Desktop**:
+   - Run the downloaded installer.
+   - Follow the installation prompts, and make sure the option to "Use the WSL 2-based engine" is selected during setup (if available).
+3. **Verify Docker Installation**:
+   - After installation, launch Docker Desktop.
+   - Ensure Docker is running by checking the Docker icon in the system tray (it should be active).
+   - Open a command prompt (CMD) or PowerShell and type the following command to check the Docker version:
+```bash
+docker --version
+```
+This should return the installed Docker version.
+## Step 2: Search for the ShinyScreen Docker Image in Docker Desktop
+Docker Desktop provides a graphical interface to search for Docker images. To search and pull the **shinyScreen** Docker image:
+1. **Open Docker Desktop**:
+   - Launch Docker Desktop from the Windows Start menu or by clicking the Docker icon in the system tray.
+2. **Search for the ShinyScreen Image**:
+   - In Docker Desktop, navigate to the **search** tab as shown in the image below.
+   **Docker Image : anjanae/shinyscreen:v21_latest**
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("images/searchtab.png")
+```
+3. **Pull the Image**:
+   - Click on the **Pull** button next to the desired ShinyScreen image. This will download the image to your local system.
+You can monitor the download progress in Docker Desktop’s **Images** tab.
+## Step 3: Run the ShinyScreen Docker Container
+Once the ShinyScreen image has been pulled successfully, the next step is to run the container. This can be done directly from Docker Desktop or via the command line:
+### Running the Container via Docker Desktop:
+1. **Navigate to the Images Tab**:
+   - In Docker Desktop, go to the **Images** tab where your downloaded ShinyScreen image will be listed.
+2. **Launch the Container**:
+   - Click the **Run** button next to the ShinyScreen image.
+   - In the pop-up dialog, ensure that port `3838` is exposed. To do this, add the following port configuration in the **Optional Settings**:
+```
+Host Port: 3838
+Container Port: 3838
+```
+3. **To create the volume mounts**:
+  - To share files between the host and the container, you can use Docker’s volume mounting          feature.
+### local Directory Structure
+Before launching the Docker container, the following directory structure should be set up on your local system. These directories will be mounted to specific locations inside the container, enabling the application to access the required input data and store the output files.
+### Root Directory
+The **root directory** is the top-level folder where the entire project is organized. This root directory will contain the following subdirectories:
+1. **Project Directory**
+2. **Data Directory**
+3. **MetFrag Databases Directory**
+Below is an overview of each subdirectory and its purpose.
+### Project Directory
+The **Project Directory** should initially contain only the **compound list** file, which is an essential input for the ShinyScreen application.
+After the application runs, this directory will also store various output files generated during the analysis.
+### Data Directory
+The **Data Directory** should contain only the **mzML files**. These files serve as the primary data input for the ShinyScreen application and represent mass spectrometry data in mzML format.
+### MetFrag Databases Directory
+The **MetFrag Databases Directory** must include the latest **PubChemLite_Exposomics.csv** file, which will be used by the ShinyScreen application for MetFrag analysis.
+### Configure Volume Mounts in Docker Desktop
+Now that the local directories are set up, you can configure **volume mounts** in Docker Desktop to map these local directories to the corresponding paths inside the Docker container.
+For the volume mounts, each directory on the host machine (local system) will be mapped to a specific location within the Docker container, allowing the application to access the input data and save output files.
+In the **Optional Settings** pop-up window, you will configure the following **three volume mounts** (Please refer to the table and image below):
+   - **Local Directory on Windows**: 
+     - This is the folder on your local machine where the data will reside.
+   - **Container Directory**: 
+     - This is the path inside the Docker container where the files will be mounted.
+| Local Directory (Host)         | Container Directory (Mount)            | Description                     |
+| ------------------------------ | -------------------------------------- | ------------------------------- |
+| `C:\path\to\rootdirectory`           | `/home/ssuser/projects`                | Directory for project files.     |
+| `C:\path\to\rootdirectory`       | `/home/ssuser/top_data_dir`            | Directory for top-level data.    |
+| `C:\path\to\metfrag_dbs`        | `/home/ssuser/metfrag_dbs`             | Directory for MetFrag databases. |
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("images/runtab.png")
+```
+By setting up the volume mounts, any changes made to files in the mapped directories on the host machine will automatically reflect inside the container.
+Once the **port** and **volume configurations** are confirmed, click **Run** to start the container. The ShinyScreen application will now be running inside the container.
+Now click on the port mapping to access the application in the web browser
+```{r, echo=FALSE, out.width="85%"}
+knitr::include_graphics("images/containertab.png")
+```
--- a/Docker_Documentation/Shinyscreen_Docker.pdf
+++ b/Docker_Documentation/Shinyscreen_Docker.pdf
--- a/Docker_Documentation/images/containertab.png
+++ b/Docker_Documentation/images/containertab.png
--- a/Docker_Documentation/images/runtab.png
+++ b/Docker_Documentation/images/runtab.png
--- a/Docker_Documentation/images/searchtab.png
+++ b/Docker_Documentation/images/searchtab.png
--- a/Dockerfile
+++ b/Dockerfile
 FROM gitlab.lcsb.uni.lu:4567/eci/shinyscreen/dep/ssuser:latest
 MAINTAINER todor.kondic@uni.lu
-EXPOSE 5432
+EXPOSE 3838
 ENV SS_MF_DB="PubChemLite_exposomics.csv"
 ENV SS_CPU 2
 ADD . shinyscreen/
 RUN R CMD build shinyscreen
 USER root
 RUN R CMD INSTALL shinyscreen
 USER ssuser
 RUN cp shinyscreen/runme /home/ssuser/runme
 RUN chmod u+x /home/ssuser/runme
 # RUN chown ssuser /home/ssuser/runme
 # RUN chown -R ssuser /home/ssuser/shinyscreen
 RUN R -e 'library(shinyscreen);setwd("~");init(top_data_dir="~/top_data_dir",projects="~/projects",users_dir="~/users",metfrag_db_dir=Sys.getenv("SS_MF_DB_DIR"),metfrag_jar="/usr/local/bin/MetFragCommandLine.jar",no_structure_plots=T,save=T,merge=F)'
 ENTRYPOINT ["/home/ssuser/runme"]
 CMD ["app"]
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -6,7 +6,6 @@ export(conf_trans)
 export(create_plots)
 export(create_stub_gui)
 export(extr_data)
-export(extract)
 export(get_fn_comp)
 export(get_fn_conf)
 export(get_fn_extr)
@@ -47,8 +46,6 @@ export(sort_spectra)
 export(subset_summary)
 export(tk_save_file)
 import(data.table)
-importFrom(MSnbase,filterMz)
-importFrom(MSnbase,readMSData)
 importFrom(promises,"%...>%")
 importFrom(promises,future_promise)
 importFrom(shiny,HTML)

--- a/R/api.R
+++ b/R/api.R
@@ -32,6 +32,7 @@ run <- function(envopts,
    all_phases=list(setup=setup_phase,
                    comptab=mk_comp_tab,
+                    db=make_db,
                    extract=extr_data,
                    prescreen=prescreen,
                    sort=sort_spectra,
@@ -98,39 +99,14 @@ run_in_dir <- function(m) {
 }
 ##' @export
 load_compound_input <- function(m) {
-    coll <- list()
-    fields <- colnames(EMPTY_CMPD_LIST)
-    fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
-    message("fns:",paste0(fns,collapse=","))
-    coltypes <- c(ID="character",
-                  SMILES="character",
-                  Formula="character",
-                  Name="character",
-                  RT="numeric",
-                  mz="numeric")
-    for (l in 1:length(fns)) {
-        fn <- fns[[l]]
-        ## Figure out column headers.
-        nms <- colnames(file2tab(fn,nrows=0))
-        ## Read the table. Knowing column headers prevents unnecessary
-        ## warnings.
-        dt <- file2tab(fn, colClasses=coltypes[nms])
-        verify_cmpd_l(dt=dt,fn=fn)
-                                        # nonexist <- setdiff(fnfields,fields)
-        coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
-        coll[[l]]$ORIG <- fn
-    }
-    cmpds <- if (length(fns)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
+    fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
+    cmpds = join_compound_lists(fns)
+    ## Process sets.
+    cmpds = process_cmpd_sets(cmpds)
    dups <- duplicated(cmpds$ID)
    dups <- dups | duplicated(cmpds$ID,fromLast = T)
    dupIDs <- cmpds$ID[dups]
@@ -219,8 +195,6 @@ mk_comp_tab <- function(m) {
    smiforadd <- smiles[smiforadd,.(ID,SMILES,Formula,adduct),on=c("SMILES")]
    data.table::setkey(smiforadd,"adduct","ID")
-    ## FIXME: Why is Formula a list when there are no SMILES, instead
-    ## of an empty string?
    smiforadd[,Formula:=as.character(Formula)]
    ## Update the intermediate table with masses.
@@ -336,181 +310,67 @@ mk_tol_funcs <- function(m) {
 }
 ##' @export
 extr_data <-function(m) {
-    message("Stage: extract")
-    if (is.null(m$conf$serial) || !m$conf$serial) {
-        extr_data_future(m)
-    } else {
-        message("(extract): Serial extraction.")
-        extr_data_serial(m)
-    }
-}
-extr_data_future <- function(m) {
+    fine = create_fine_table(m)
-    ## Reduce the comp table to only unique masses (this is because
-    ## different sets can have same masses).
-    m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
-    m$out$tab$data[,set:=NULL] #This column is meaningless now.
-    file <- m$out$tab$data[,unique(file)]
-    fpaths <- file.path(m$run$paths$data,file)
-    allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
-        z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
+    dpath = m$run$paths$data
-        unique(MSnbase::collisionEnergy(z),fromLast=T)
-    }))
-    allCEs <- unique(allCEs)
-    allCEs <- allCEs[!is.na(allCEs)]
-    cols <-paste('CE',allCEs,sep = '')
-    vals <- rep(NA,length(cols))
-    m$out$tab$data[,(cols) := .(rep(NA,.N))]
-    file <- m$out$tab$data[,unique(file)]
-    ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
-    fpaths <- file.path(m$run$paths$data,ftags[,file])
-    futuref <- m$future
-    tmp <- lapply(1:nrow(ftags),function(ii) {
-        fn <- fpaths[[ii]]
-        the_tag <- ftags[ii,tag]
-        message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
-        tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
-        ## err_ms1_eic <- m$extr$tol$eic
-        ## err_coarse_fun <- m$extr$tol$coarse
-        ## err_fine_fun <- m$extr$tol$fine
-        ## err_rt <- m$extr$tol$rt
-        err_coarse <- m$conf$tolerance[["ms1 coarse"]]
-        err_fine <- m$conf$tolerance[["ms1 fine"]]
+    ## Open all files.
-        err_ms1_eic <- m$conf$tolerance$eic 
+    fns = fine[,unique(file)]
+    lms = lapply(fns,function(fn) read_data_file(file.path(dpath,fn)))
+    names(lms) = fns
-        err_rt <- m$conf$tolerance$rt
+    ## Load all feature data.
+    lfdata = lapply(lms,get_fdata)
+    names(lfdata) = fns
-        missing_precursor_info <- m$conf$extract$missing_precursor_info
+    ## Extract MS1 chromatograms using "fine" tolerance.
-        x <- futuref(extract(fn=fn,
+    cgram_ms1 = fine[,extr_cgrams_ms1(lms[[file]],.SD,lfdata[[file]]),
-                             tag=the_tag,
+                     by="file",
-                             tab=tab,
+                     .SDcols=c("iso_fine_min",
-                             err_ms1_eic=err_ms1_eic,
+                               "iso_fine_max",
-                             err_coarse = err_coarse,
+                               "rt_min",
-                             err_fine= err_fine,
+                               "rt_max",
-                             err_rt= err_rt,
+                               "precid")]
-                             missing_precursors = missing_precursor_info),
+    setkey(cgram_ms1,file,precid)
-                     lazy = F)
-        x
+    ## Extract MS2 chromatograms.
-    })
+    ## Create the "coarse" table. Parent masses are known with
+    ## "coarse". We will prefilter our ms2 results based on that...
+    coarse = create_coarse_table(m)
-    msk <- sapply(tmp,future::resolved)
-    curr_done <- which(msk)
-    for (x in curr_done) {
+    cgram_ms2 = data.table(precid=integer(0),
-        message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
+                           ce=numeric(0),
-    }
+                           scan=character(0),
-    while (!all(msk)) {
+                           idx=integer(0),
-        msk <- sapply(tmp,future::resolved)
+                           rt=numeric(0),
-        newly_done <- which(msk)
+                           intensity=numeric(0))
-        for (x in setdiff(newly_done,curr_done)) {
-            message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
-        }
+    ## Extract MS2 spectra.
-        Sys.sleep(0.5)
+    spectra = empty_spectra_table()
-        curr_done <- newly_done
+    for (fn in names(lfdata)) {
+        rtab = relate_ms2_to_precid(coarse=coarse[.(fn),on=.(file)],
+                                    ms2=lfdata[[fn]]$ms2,
+                                    cgram_ms1=cgram_ms1[.(fn),
+                                                        on=.(file)])
+        sptab = extract_spectra(lms[[fn]],rtab)
+        cgram_ms2 = rbind(cgram_ms2,rtab)
+        spectra = rbind(spectra,sptab)
    }
+    setkey(cgram_ms1,precid,rt)
-    ztmp <- lapply(tmp,future::value)
+    setkey(cgram_ms2,precid,ce,rt)
-    m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
+    setkey(spectra,precid,scan)
-    m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
+    m$db$extr$cgm$ms1 = cgram_ms1
-    data.table::setkeyv(m$extr$ms1,BASE_KEY)
+    m$db$extr$cgm$ms2 = cgram_ms2
-    data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
+    m$db$extr$spectra = spectra
-    fn_ex <- get_fn_extr(m)
-    timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
-    saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
    m
-}
-extr_data_serial <- function(m) {
-    ## Reduce the comp table to only unique masses (this is because
-    ## different sets can have same masses).
-    m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
-    m$out$tab$data[,set:=NULL] #This column is meaningless now.
-    file <- m$out$tab$data[,unique(file)]
-    fpaths <- file.path(m$run$paths$data,file)
-    allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
-        z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
-        unique(MSnbase::collisionEnergy(z),fromLast=T)
-    }))
-    allCEs <- unique(allCEs)
-    allCEs <- allCEs[!is.na(allCEs)]
-    cols <-paste('CE',allCEs,sep = '')
-    vals <- rep(NA,length(cols))
-    m$out$tab$data[,(cols) := .(rep(NA,.N))]
-    file <- file.path(m$run$paths$data,m$out$tab$data[,unique(file)])
-    ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
-    ftags[,path:=file.path(..m$run$paths$data,file)]
-    futuref <- m$future
-    tmp <- lapply(1:nrow(ftags),function(ii) {
-        fn <- ftags[ii,path]
-        the_tag <- ftags[ii,tag]
-        message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
-        tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
-        ## err_ms1_eic <- m$extr$tol$eic
-        ## err_coarse_fun <- m$extr$tol$coarse
-        ## err_fine_fun <- m$extr$tol$fine
-        ## err_rt <- m$extr$tol$rt
-        err_coarse <- m$conf$tolerance[["ms1 coarse"]]
-        err_fine <- m$conf$tolerance[["ms1 fine"]]
-        err_ms1_eic <- m$conf$tolerance$eic 
-        err_rt <- m$conf$tolerance$rt
-        missing_precursor_info <- m$conf$extract$missing_precursor_info
-        x <- extract(fn=fn,
-                     tag=the_tag,
-                     tab=tab,
-                     err_ms1_eic=err_ms1_eic,
-                     err_coarse = err_coarse,
-                     err_fine= err_fine,
-                     err_rt= err_rt,
-                     missing_precursors = missing_precursor_info)
-        message("Done extraction for ", x$ms1$tag[[1]])
-        x
-    })
-    ztmp <- tmp
-    m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
-    m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
-    data.table::setkeyv(m$extr$ms1,BASE_KEY)
-    data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
-    fn_ex <- get_fn_extr(m)
-    timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
-    saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
-    m
 }
 ##' @export
@@ -523,11 +383,10 @@ conf_trans <- function(conf) {
 prescreen <- function(m) {
    ## Top-level auto prescreening function.
    message("(prescreen): Start.")
-    ## confpres <- conf_trans_pres(m$conf$prescreen)
+    m$qa = NULL
-    m$qa <- NULL
+    m$out$tab$summ = NULL
-    m$out$tab$summ <- NULL
+    m$qa = analyse_extracted_data(m$db,m$conf$prescreen)
-    m$qa <- analyse_extracted_data(m$extr,m$conf$prescreen)
+    m$out$tab$summ = gen_summ(m$db,m$qa,m$out$tab$comp)
-    m$out$tab$summ <- gen_summ(m$out$tab$comp,m$qa)
    message("(prescreen): End.")
    m
 }
@@ -605,8 +464,8 @@ create_plots <- function(m) {
    ## Select the data nedeed for plotting.
    flt_summ <- m$out$tab$flt_summ
-    ms2 <- m$extr$ms2
+    ms2 <- m$db$extr$cgm$ms2
-    ms1 <- m$extr$ms1
+    ms1 <- m$db$extr$cgm$ms1
    rt_min <- rt_in_min(m$conf$figures$rt_min)
    rt_max <- rt_in_min(m$conf$figures$rt_max)
    keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
@@ -747,8 +606,8 @@ report <- function(m) {
    dir.create(REP_TOPDIR,recursive = T,showWarnings = F)
    header <- knitr::knit_expand(fn_header)
    flt_summ <- m$out$tab$reptab
-    ms2 <- m$extr$ms2
+    ms2 <- m$db$extr$cgm$ms2
-    ms1 <- m$extr$ms1
+    ms1 <- m$db$extr$cgm$ms1
    rt_min <- rt_in_min(m$conf$figures$rt_min)
    rt_max <- rt_in_min(m$conf$figures$rt_max)
    keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
@@ -922,7 +781,7 @@ metfrag <- function(m) {
                       path = m$run$metfrag$path,
                       subpaths = m$run$metfrag$subpaths,
                       db_file = m$run$metfrag$db_file,
-                       stag_tab = stagtab, ms2 = m$extr$ms2,
+                       stag_tab = stagtab, ms2 = m$db$extr$spectra,
                       runtime=m$run$metfrag$runtime,
                       java_bin=m$run$metfrag$java_bin,
                       nproc = m$conf$metfrag$nproc)
@@ -942,3 +801,9 @@ metfrag <- function(m) {
    m
 }
+make_db <- function(m) {
+    m = make_db_catalogue(m)
+    m = make_db_precursors(m)
+    m
+}
--- a/R/base.R
+++ b/R/base.R
@@ -18,7 +18,13 @@
 the_ifelse <- data.table::fifelse
 dtable <- data.table::data.table
-norm_path <- function(...) normalizePath(...,winslash='/')
+norm_path <- function(...) {
+    test = nchar(...) > 0L
+    res = character(length(test))
+    res[test] = normalizePath(...[test],winslash='/') 
+    res[!test] = ...[!test]
+    res
+}
 tab2file<-function(tab,file,...) {
    data.table::fwrite(x=tab,file=file,...)
@@ -143,3 +149,23 @@ uniqy_slugs <- function(slugs) {
    dt = data.table::data.table(slug=slugs)
    dt[,slug:=fifelse(rep(.N==1L,.N),slug,paste0(slug,"_",seq(1L,.N))),by="slug"]$slug
 }
+gen_val_unc <- function(x,dx) {
+    ## Doesn't work well for <=0.
+    p = floor(log10(x))
+    dp = floor(log10(dx))
+    ## Zero?
+    message("p ",p)
+    w = which(is.infinite(p))
+    p[w] = 0
+    ## Normalise x and dx.
+    main = x/10**p
+    unc = round(dx/10**dp,0)
+    place = p - dp
+    main = mapply(function (m,d) formatC(m,digits=d,format='f',flag="#"),main,place,USE.NAMES=F)
+    w = which(main=='10.')
+    main[w]='1'
+    p[w]=p[w]+1
+    paste0(main,"(",unc,") x 10^",p)
+}
--- a/R/data-model.R
+++ b/R/data-model.R
+#Copyright (C) 2023 by University of Luxembourg
+## Shinyscreen works of an internal relational database implemented
+## using `data.table' package. Implementation is here.
+make_db_catalogue <- function(m) {
+    ## Takes comprehensive database from state `m' and generates a
+    ## catalogue with a unique key. This catalogue is based on
+    ## inputs. Each entry in the catalogue corresponds to a single
+    ## target mass from a single experimental run.
+    res = m$out$tab$comp[,unique(.SD),.SDcols=c("set","tag","adduct","ID")]
+    res[,catid:=.I]
+    setkeyv(res,DB_CATALOGUE_KEY)
+    setindex(res,catid)
+    m$db$cat = res
+    m
+}
+merge_precid_4_isobars <- function(orig_precids,masses,up_masses) {
+    start = head(orig_precids,1L)
+    n = length(orig_precids)
+    precid = orig_precids
+    i = 1L
+    while (i < n) {
+        theprecid = orig_precids[[i]]
+        themz = masses[[i]]
+        mzup = up_masses[[i]]
+        w = which(masses[(i+1L):n]<mzup)
+        precid[(i+1L):n][w] = theprecid
+        i = i + length(w) + 1L
+    }
+    precid
+}
+make_db_precursors <- function(m) {
+    ## Generate masses and label isobars.
+    cat = m$db$cat
+    masses = m$out$tab$comp[cat,.(tag=tag,catid=catid,mz=mz,rt=rt),on=key(cat)]
+    setkey(masses,tag,mz)
+    ## Retention time.
+    tmp = get_val_unit(m$conf$tolerance[['rt']])
+    rttol = as.numeric(tmp[['val']])
+    rtunit = tmp[['unit']]
+    if (rtunit == "s") {
+        rttol = rttol/60.
+    } else if (rtunit != "min") {
+        stop('make_db_precursors: Unknown retention time unit.')
+    }
+    masses[!is.na(rt),`:=`(rt_min=rt-rttol,rt_max=rt+rttol)]
+    ## Fine error.
+    tmp = get_val_unit(m$conf$tolerance[['ms1 fine']])
+    ms1tol = as.numeric(tmp[['val']])
+    ms1unit = tmp[['unit']]
+    if (ms1unit == "ppm") {
+        masses[,`:=`(mz_fine_min=mz-ms1tol*mz*1e-6,mz_fine_max=mz+ms1tol*mz*1e-6)]
+    } else if (ms1unit == "Da") {
+        masses[,`:=`(mz_fine_min=mz-ms1tol,mz_fine_max=mz+ms1tol)]
+    } else {
+        stop('make_db_precursors: Unknown mass unit (fine).')
+    }
+    ## Coarse error.
+    tmp = get_val_unit(m$conf$tolerance[['ms1 coarse']])
+    ms1tol = as.numeric(tmp[['val']])
+    ms1unit = tmp[['unit']]
+    if (ms1unit == "ppm") {
+        masses[,`:=`(mz_coarse_min=mz-ms1tol*mz*1e-6,mz_coarse_max=mz+ms1tol*mz*1e-6)]
+    } else if (ms1unit == "Da") {
+        masses[,`:=`(mz_coarse_min=mz-ms1tol,mz_coarse_max=mz+ms1tol)]
+    } else {
+        stop('make_db_precursors: Unknown mass unit (coarse).')
+    }
+    ## Assign "fine" isobars to same isocoarse number.
+    masses[,precid:=merge_precid_4_isobars(catid,mz,mz_fine_max),by="tag"]
+    ## Assign "coarse" isobars to same isocoarse number.
+    masses[,isocoarse:=merge_precid_4_isobars(catid,mz,mz_coarse_max),by="tag"]
+    masses[,`:=`(iso_coarse_min=min(mz_coarse_min),
+                 iso_coarse_max=max(mz_coarse_max)),
+                 by=isocoarse]
+    masses[,`:=`(iso_fine_min=min(mz_fine_min),
+                 iso_fine_max=max(mz_fine_max)),
+                 by=precid]
+    setindex(masses,isocoarse,precid)
+    ## Add files.
+    filetab = m$input$tab$mzml[m$db$cat,
+                               .(catid=i.catid,file=file),
+                               on=c("set","tag"),nomatch=NULL]
+    masses[filetab,file:=i.file,on="catid"]
+    m$db$precursors = masses
+    m
+}
+empty_cgram_ms1 <- function(n=0L) {
+    r = data.table(file=character(n),
+                   cgmidx=integer(n),
+                   precid=integer(n),
+                   scan=integer(n),
+                   rt=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,rt)
+    r
+}
+empty_cgram_ms2 <- function(n=0L) {
+    r = data.table(precid=integer(n),
+                   ce=numeric(n),
+                   scan=character(n),
+                   idx=integer(n),
+                   rt=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,ce,idx)
+    r
+}
+empty_spectra_table <- function(n=0L) {
+    r = data.table(precid=integer(n),
+                   scan=character(n),
+                   mz=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,scan)
+    r
+}
+summ_needs_from_cat <- function(cat) {
+    ## Catalogue columns.
+    cat
+}
+summ_needs_from_precursors <- function(res,precursors) {
+    ## Mass columns.
+    precursors[res,on=.(catid),.(precid,
+                                 mz,
+                                 set,
+                                 adduct,
+                                 tag,
+                                 ID,
+                                 mz_l=mz_fine_min,
+                                 mz_r=mz_fine_max),by=.EACHI]
+}
+summ_needs_from_qa <- function(res,qa) {
+    needs = qa[,.SD,.SDcols=c("precid",
+                              "ce",
+                              "scan",
+                              "ms1_rt",
+                              "ms1_int",
+                              "ms2_rt",
+                              "ms2_int",
+                              "ms1_mean",
+                              "ms2_sel",
+                              "qa_pass",
+                              "qa_ms1_exists",
+                              "qa_ms2_exists",
+                              "qa_ms1_good_int",
+                              "qa_ms1_above_noise",
+                              "qa_ms2_near",
+                              "qa_ms2_good_int",
+                              "qlt_ms1",
+                              "qlt_ms2")]
+    res = needs[res,on=.(precid),allow.cartesian=T]
+    ## TODO: additional processing?
+    res
+}
+summ_needs_from_comp <- function(res,comp) {
+    needs = comp[,.(set,ID,Name,SMILES)]
+    setkey(needs,set,ID)
+    res[needs,on=.(set,ID),`:=`(Name=i.Name,
+                                SMILES=i.SMILES)]
+}
+## This function creates `summ' table. 
+gen_summ <- function(db,qa,comp) {
+    ## Start with the basic things.
+    res = summ_needs_from_cat(db$cat)
+    ## Add masses and precids.
+    res = summ_needs_from_precursors(res,db$precursors)
+    ## Add qa columns.
+    res = summ_needs_from_qa(res,qa)
+    setkeyv(res,SUMM_KEY)
+    ## Add comp columns.
+    summ_needs_from_comp(res,comp)
+}
--- a/R/extraction.R
+++ b/R/extraction.R
@@ -12,598 +12,298 @@
 ## See the License for the specific language governing permissions and
 ## limitations under the License.
-load_raw_data<-function(fn,mode="inMemory") {
+create_fine_table <- function(m) {
-    ms1<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=1)
+    ## Select fine mz-ranges and split them into those with rt entries
-    ms2<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=2)
+    ## and those without.
-    c(ms1=ms1,ms2=ms2)
+    precs = m$db$precursors
-}
+    precs[,unique(.SD),.SDcols=c("iso_fine_min",
+                                 "iso_fine_max",
+                                 "rt_min",
+                                 "rt_max",
+                                 "file"),
+          keyby=c("file","precid")]
-centroided1 <- function(ms) {
-    if (all(MSnbase::centroided(ms)) == T)
-        return(T) else {
-                      state <- MSnbase::isCentroided(ms)
-                      N <- length(state)
-                      fls <-length(which(state == F))
-                      if (fls/(1.*N) < 0.01) T else F
-                  }
 }
-centroided <- function(msvec) {
+create_coarse_table <- function(m) {
-    if (is.vector(msvec)) {
+    ## Select coarse mz-ranges and split them into those with rt entries
-        f <- list()
+    ## and those without.
-        for (i in 1:length(msvec)) {
+    precs = m$db$precursors
-            f[[i]] <- future::future(centroided1(msvec[[i]]))
+    precs[,unique(.SD),.SDcols=c("iso_coarse_min",
-        }
+                                 "iso_coarse_max",
-        lapply(f, FUN = future::value)
+                                 "rt_min",
-    } else {
+                                 "rt_max",
-        centroided1(msvec)
+                                 "file"),
-    }
+          keyby=c("file","isocoarse","precid")]
-}
-acq_mz<-function(tabFn) {
-    df<-read.csv(tabFn,
-                 stringsAsFactors=F,
-                 comment.char='')
-    x<-as.numeric(df$mz)
-    names(x)<-paste("ID:",as.character(df$ID),sep='')
-    x
 }
-name2id<-function(nm) {as.integer(substring(nm,4))}
+read_data_file <- function(file) {
-id2name<-function(id) {paste("ID:",id,sep='')}
+    MSnbase::readMSData(file=file,msLevel=c(1,2),mode="onDisk")
-ppm2dev<-function(m,ppm) 1e-6*ppm*m
-gen_mz_range<-function(mz,err) {
-    mat<-matrix(data=numeric(1),nrow=length(mz),ncol=2)
-    mat[,1]<-mz - err
-    mat[,2]<-mz + err
-    mat
 }
-gen_rt_range<-function(rt,err) {
-    mat<-matrix(data=numeric(1),nrow=length(rt),ncol=2)
-    rV<-which(!is.na(rt))
-    rNA<-which(is.na(rt))
-    mat[rV,1]<-(rt[rV] - err)*60
-    mat[rV,2]<-(rt[rV] + err)*60
-    mat[rNA,1]<--Inf
-    mat[rNA,2]<-Inf
-    mat
-}
-filt_ms2_by_prcs <- function(ms2,mzrng,ids,adduct) {
+extr_cgrams_ms1 <- function(ms,tab,fdata) {
-    pre<-MSnbase::precursorMz(ms2)
+    ## Some helpers.
-    psn<-MSnbase::precScanNum(ms2)
+    new_restab <- function(intab,cgm) {
-    acN<-MSnbase::acquisitionNum(ms2)
+        base = intab[,.(precid=precid,cgmidx=.I)]
-    nR<-length(pre)
+        cgm = base[,{
+            rt = rtime(cgm[cgmidx,1])
-    inRange<-function(i) {
+            inte = intensity(cgm[cgmidx,1])
-        mp<-pre[[i]]
+            .(precid=precid,
-        x<-mzrng[,1]<mp & mp<mzrng[,2]
+              rt = rt,
-        ind<-which(x)
+              intensity = inte,
-        sids <- ids[ind]
+              scan = names(rt))},
-        add <- adduct[ind]
+            by="cgmidx"]
-        dtable(ID=sids,adduct=add)
+        setkey(cgm,scan)
+        cgm[fdata$ms1,idx:=i.idx,on="scan"]
+        cgm
    }
-    lst<-lapply(1:nR,function(i) {
-        dt <- inRange(i)
-        list(n=i,prec_scan=psn[[i]],aN=acN[[i]],ids=dt$ID,adduct=dt$adduct)
-    })
-    nemp<-sapply(lst,function(m) length(m$ids)>0)
-    wrk<-lst[nemp]
-    dfL<-sum(sapply(wrk,function(w) length(w$ids)))
-    df<-dtable(ID=character(dfL),
-               adduct=character(dfL),
-               prec_scan=integer(dfL),
-               aN=integer(dfL),
-               OK=logical(dfL))
-    df$OK<-T #TODO Introduced for testing, be careful.
-    offD<-0
-    for (m in wrk) {
-        l<-length(m$ids)
-        rng<-(offD+1):(offD+l)
-        df[rng,"ID"] <- m$ids
-        df[rng,"prec_scan"] <- m$prec_scan
-        df[rng,"aN"] <- m$aN
-        df[rng,"adduct"] <- m$adduct
-        offD<-offD+l
-    }
-    df[order(df$aN),]
-}
-filt_ms2_by_prcs_ht<-function(ms2,mzrng,ids,adduct) {
-    lgnd<-filt_ms2_by_prcs(ms2,mzrng=mzrng,ids=ids,adduct=adduct)
-    scans<-unique(lgnd$aN)
-    ns<-which(MSnbase::acquisitionNum(ms2) %in% scans)
-    sms2<-ms2[ns]
-    list(ms2=sms2,leg=lgnd)
-}
-pick_unique_precScans<-function(idx) {
-    ps<-unique(idx$prec_scan)
-    mind<-match(ps,idx$prec_scan)
-    data.frame(prec_scan=idx$prec_scan[mind],
-               ID=idx$ID[mind],
-               adduct=idx$adduct[mind],
-               stringsAsFactors=F)
-}
+    trt = tab[!is.na(rt_min)]
+    tnort = tab[is.na(rt_min)]
-pick_uniq_pscan<-function(leg) {
+    resrt = if (nrow(trt)>0L) {
-    res <- leg[,.(prec_scan=unique(prec_scan)),by=c("ID","adduct")]
+                ## Call with rt argument (in seconds).
-    res[order(prec_scan),]
+                mzrng = as.matrix(trt[,.(iso_fine_min,iso_fine_max)])
-    ## ids<-unique(leg$ID)
+                rtrng = as.matrix(trt[,.(rt_min*60,rt_max*60)])
-    ## x<-lapply(ids,function(id) {ups<-unique(leg[id==leg$ID,"prec_scan"]);data.frame(ID=rep(id,length(ups)),prec_scan=ups,stringsAsFactors = F)})
+                new_restab(trt,MSnbase::chromatogram(ms,mz = mzrng, rt = rtrng))
-    ## res<-do.call(rbind,c(x,list(stringsAsFactors=F)))
+            } else data.table()
-    ## res[order(res$prec_scan),]
-}
-verif_prec_fine_ht<-function(preLeg,ms1,mz,mzrng,ids,adduct) {
+    resnort = if (nrow(tnort)>0L) {
-    ## TODO FIXME TESTPHASE Something goes wrong here, all mapply results are
+                  mzrng = as.matrix(tnort[,.(iso_fine_min,iso_fine_max)])
-    ## not OK. More testing needed. ... huh? But, it works now? (23/02/2022)
+                  new_restab(tnort,MSnbase::chromatogram(ms,mz = mzrng))
-    df<-preLeg
+              } else data.table()
-    xx <- dtable(adduct=adduct,ID=ids,mz=mz,mz1=mzrng[,1],mz2=mzrng[,2])
-    df <- preLeg[xx,on=c("ID","adduct")]
-    df$ipns<-match(df$prec_scan,MSnbase::acquisitionNum(ms1))
-    df[, ("mzsp") := .(lapply(ipns,function (ip) if (!is.na(ip)) MSnbase::mz(ms1[[ip]]) else NA_real_))]
-    df$OK<-mapply(function(m1,sp,m2) any((m1<sp) & (sp<m2)),df$mz1,df$mzsp,df$mz2)
-    res<-df[df$OK,]
-    res$ipns<-NULL
-    res$mz1<-NULL
-    res$mz2<-NULL
-    res$mzsp<-NULL
-    res
-}
-filt_ms2<-function(ms1,ms2,mz,errCoarse,errFinePPM) {
-    tmp<-filt_ms2_by_prcs_ht(ms2,mz,errCoarse=errCoarse)
-    legMS2<-tmp$leg
-    legPcs<-pick_uniq_pscan(legMS2)
-    legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,errFinePPM=errFinePPM)
-    x<-Map(function (id,psn) {legMS2[id==legMS2$ID & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"])
-    x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
-    rownames(x)<-NULL
-    x<-x[order(x$aN),]
-    uids<-unique(x$ID)
-    acN<-MSnbase::acquisitionNum(ms2)
-    res<-lapply(uids,function(id) {
-        x<-ms2[match(x[id==x$ID,"aN"],acN)]
-        fData(x)[,"rtm"]<-MSnbase::rtime(x)/60.
-        fData(x)[,"maxI"]<-sapply(MSnbase::intensity(x),max)
-        x})
-    names(res)<-uids
-    res
-}
-filt_ms2_fine <- function(ms1,ms2,mz,ids,adduct,err_coarse_fun,err_fine_fun) {
-    ## This function is supposed to extract only those MS2 spectra for
-    ## which it is proven that the precursor exists within the fine
-    ## error range.
-    mzrng_c <- gen_mz_range(mz,err_coarse_fun(mz))
-    mzrng_f <- gen_mz_range(mz,err_fine_fun(mz))
-    tmp<-filt_ms2_by_prcs_ht(ms2,mzrng=mzrng_c,ids=ids,adduct=adduct)
-    legMS2<-tmp$leg
-    message("nrow legMS2:", nrow(legMS2))
-    legPcs<-pick_uniq_pscan(legMS2)
-    legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,mzrng=mzrng_f,ids=ids,adduct=adduct)
-    ## x<-Map(function (id,psn,a) {legMS2[id==legMS2$ID & a==legMS2$adduct & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"],legPcs[,"adduct"])
-    ## x <- data.table::rbindlist(x)[,.(ID,adduct,aN)]
-    x <- legMS2[legPcs[,.(ID,adduct,prec_scan)],on=c("ID","adduct","prec_scan")]
-    ## x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
-    ## rownames(x)<-NULL
-    x<-x[order(x$aN),]
-    x
-}
-extr_ms2<-function(ms1,ms2,ids,mz,adduct,err_coarse_fun, err_fine_fun) {
-    ## Extraction of MS2 EICs and spectra.
-    x <- filt_ms2_fine(ms1=ms1,
-                       ms2=ms2,
-                       mz=mz,
-                       ids=ids,
-                       adduct=adduct,
-                       err_coarse_fun=err_coarse_fun,
-                       err_fine_fun=err_fine_fun)
-    ## This was here before and obviously wrong when multiple adducts
-    ## correspond to the same ID:
-    ## 
-    ## uids <- unique(x$ID)
-    ## uadds <- unique(x$adduct)
-    idadd <- x[,unique(.SD),.SDcols=c("ID","adduct")]
-    acN<-MSnbase::acquisitionNum(ms2)
-    chunks <- Map(function(id,ad) {
-        ans <- x[id==x$ID & ad==x$adduct,]$aN
-        sp<-ms2[which(acN %in% ans)]
-        res <- gen_ms2_spec_blk(sp)
-        res$ID <- id
-        res$adduct <- ad
-        res
-    },
-    idadd$ID,idadd$adduct)
-    data.table::rbindlist(chunks,fill = T)
-}
+    res = rbind(resnort,resrt,fill=T)
+    res[,rt:=rt/60]
-add_ms2_prcs_scans<-function(ms2,idx) {
-    df<-idx
-    df$prec_scan<-integer(nrow(idx))
-    for (i in 1:nrow(df)) {
-        sn<-df$sn[[i]]
-        df$prec_scan[[i]]<-MSnbase::precScanNum(ms2[[sn]])
-    }
-    ## sn<-as.integer(df$sn)
-    ## df$prec_scan[]<-MSnbase::precScanNum(ms2[sn])
-    ## 
-    ## This errors with: msLevel for signature "NULL" for a specific
-    ## compound. However, the above approach is cool.
-    df
 }
-refn_ms2_by_prec<-function(idxMS2,preFine) {
+get_fdata <- function(ms) {
-    pf<-preFine[preFine$OK,]
+    fdata = as.data.table(fData(ms),keep.rownames="scan")
-    pf$ID<-as.character(pf$ID)
+    setkey(fdata,scan)
-    idxMS2$OK<-logical(nrow(idxMS2))
+    res = list()
-    idxMS2$ID<-as.character(idxMS2$ID)
+    res$ms1 = fdata[msLevel==1L,.(scan,
-    for (n in 1:nrow(idxMS2)) {
+                                  idx=spIdx)]
-        scan<-idxMS2$prec_scan[[n]]
+    res$ms2 = fdata[msLevel==2L,.(scan,
-        id2<-idxMS2$ID[[n]]
+                                  idx=spIdx,
-        ppf<-pf[pf$ID==id2,]
+                                  an=acquisitionNum,
-        inPF<- ppf$prec_scan %in% scan
+                                  rt=retentionTime/60.,
-        idxMS2$OK[[n]]<-any(inPF)
+                                  intensity=basePeakIntensity,
-    }
+                                  ce=collisionEnergy,
+                                  prec_mz=precursorMZ,
-    idxMS2
+                                  prec_idx=precursorScanNum)]
-}
-trim_ms2_by_prec<-function(rawMS2,mz,errCoarse,errFinePPM) {
-    idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
-}
-grab_ms2_spec<-function(idx,raw) {
-    idx<-idx[idx$OK,]
-    IDs<-unique(idx$ID)
-    res<-lapply(IDs,function (id) {
-        sn<-idx$sn[idx$ID==id]
-        spec<-raw[sn]
-        rts<-MSnbase::rtime(spec)
-        lmz<-MSnbase::mz(spec)
-        lI<-MSnbase::intensity(spec)
-        rts<-rts/60.
-        names(lmz)<-NULL
-        names(lI)<-NULL
-        Map(function (mz,I,rt) {
-            mat<-matrix(data=0.0,ncol=length(mz),nrow=2,dimnames=list(c("mz","intensity")))
-            mat["mz",]<-as.numeric(mz)
-            mat["intensity",]<-as.numeric(I)
-            list(rt=rt,spec=mat)
-        },lmz,lI,rts)
-    })
-    names(res)<-IDs
    res
-}
-gen_ms2_chrom<-function(ms2Spec) {
-    lapply(ms2Spec, function(sp)
-    {
-        if (length(sp)>0) {
-            nRow<-length(sp)
-            mat<-matrix(0.0,nrow=nRow,ncol=2)
-            rt<-sapply(sp,function(x) x$rt)
-            ord<-order(rt)
-            intn<-lapply(sp,function (x) max(x$spec))
-            rt<-as.numeric(rt[ord])
-            intn<-as.numeric(intn[ord])
-            names(intn)<-NULL
-            names(rt)<-NULL
-            mat[,1]<-rt
-            mat[,2]<-intn
-            colnames(mat)<-c("rt","intensity")
-            mat
-        } else list()
-    })
 }
-gen_ms1_chrom <- function(raw,mz,errEIC,id,rt=NULL,errRT=NULL) {
+relate_ms2_to_precid <- function(coarse,ms2,cgram_ms1) {
-    mzRng<-gen_mz_range(mz,err = errEIC)
+    ## Take `coarse' table (the one with coarse mass limits), ms2
-    rtRng<-gen_rt_range(rt,err = errRT)
+    ## fData and ms1 chromatogram, then relate precids from cgram_ms1
-    x<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng)
+    ## to ms2 data.
-    res<-lapply(x,function (xx) {
+    ## Select those MS2 entries the parents of which coarsely match
-        rt<-MSnbase::rtime(xx)/60.
+    ## compound lists masses.
-        ints<-MSnbase::intensity(xx)
+    res = ms2[coarse,on=.(prec_mz>iso_coarse_min,prec_mz<iso_coarse_max),.(prec_mz=x.prec_mz,precid,prec_idx,scan,idx,ce,rt,intensity),nomatch=NULL]
-        df<-dtable(rt=rt,intensity=ints)
+    setkey(res,precid,prec_idx)
-        df
-    })
+    ## Now, make sure those coarsely matched MS2 actually have a
-    names(res)<-id
+    ## parent that finely matches something in the chromatogram (and
-    res
+    ## this is by ensuring that a `precid' with the correct scan (idx)
+    ## shows up in the chromatogram.
+    x = cgram_ms1[!is.na(intensity)]
+    x[res,on=.(precid,idx==prec_idx),
+      .(precid,ce,scan=i.scan,
+        idx=i.idx,rt=i.rt,
+        intensity=i.intensity),nomatch=NULL]
 }
+extract_spectra <- function(ms,cgram_ms2) {
-gen_ms1_chrom_ht<-function(raw,mz,errEIC,rt=NULL,errRT=NULL) {
+    ## This will extract full MS2 spectra based on ms2 chromatogram entries.
-    mzRng<-gen_mz_range(mz,err=errEIC)
+    indices = cgram_ms2[,.SD,.SDcol=c("precid","scan","idx")]
-    rtRng<-gen_rt_range(rt,err=errRT)
-    res<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng)
+    res = empty_spectra_table()
-    fData(res)[["ID"]]<-rownames(mzRng)
+    selind = indices[,unique(.SD),.SDcol=c("scan","idx")]
-    res
+    sel = ms[selind$idx]
-}
+    masses = mz(sel)
-get_ext_width <- function(maxid) {as.integer(log10(maxid)+1)}
+    intensities = intensity(sel)
-id_fn_ext<-function(width,id) {
+    res = selind
-    formatC(as.numeric(id),width=width,flag=0)
+    setkey(res,scan)
-}
+    res = res[,data.table(mz=masses[[scan]],
+                          intensity=intensities[[scan]]),
-write_eic<-function(eic,suff="eic.csv",dir=".",width=get_ext_width(max(as.numeric(names(eic))))) {
+              keyby=c("scan")]
-    Map(function (e,n) {
+    res[indices,on=.(scan),precid:=i.precid]
-        if (length(e)>0) {
-            fn<-file.path(dir,paste(id_fn_ext(width,n),suff,sep="."))
-            tab2file(tab=e,file=fn)
-        }
-    },eic,names(eic))
 }
-write_ms2_spec<-function(ms2Spec,dir=".") {
+## PRESCREENING
-    ids<-as.numeric(names(ms2Spec))
-    maxid<-max(ids)
-    width<-get_ext_width(maxid)
+## This function extracts intensity maxima on intervals given by
+## RT vectors rt_1 and rt_2.
-    for (id in names(ms2Spec)) {
+find_ms1_max <- function(rt,intensity,rt_1,rt_2)
-        sp<-ms2Spec[[id]]
+{
-        if (length(sp)>0) {
+    x = mapply(function (rt_1,rt_2) {
-            dr<-file.path(dir,id_fn_ext(width,id))
+        rt_ival <- c(rt_1,rt_2)
-            dir.create(path=dr,showWarnings=F)
+        intv <- findInterval(rt,rt_ival)
-            for (s in sp) {
+        lintv = length(intv)
-                fn<-file.path(dr,paste("RT_",s$rt,"_spectrum.csv",sep=""))
+        if (intv[1]==0L && intv[lintv] == 2L) {
-                df<-t(s$spec)
+            pos = match(c(1L,2L),intv)
-                colnames(df)<-c("mz","intensity")
+        } else if (intv[1]==1L && intv[lintv]!=1L) {
-                tab2file(tab=df,file=fn)
+            pos = c(1L,match(2L,intv))
-            }
+        } else if (intv[1]==0L && intv[lintv]!=0L) {
+            pos = c(match(1L,intv),lintv)
+        } else {
+            pos = c(1L,lintv)
        }
-    }
+        pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
-}
+        if (length(pmax)==0L) pmax = pos[[1]]
+        c(rt[pmax],intensity[pmax])
-extr_msnb <-function(file,wd,mz,errEIC, errFinePPM,errCoarse=0.5,rt=NULL,errRT=NULL,mode="inMemory") {
+    }, rt_1, rt_2, USE.NAMES=F)
-    ## Perform the entire data extraction procedure.
+    x
-    ## 
-    ## file - The input mzML file.
-    ## wd - Top-level directory where the results should be deposited.
-    ## mz - A named vector of precursor masses for which to scan the
-    ## file. The names can be RMassBank IDs.
-    ## rt - A named vector of length 1, or same as mz, giving the retention
-    ## times in minutes. The names should be the same as for mz.
-    ## errRT - A vector of length 1, or same as mz, giving the
-    ## half-width of the time window in which the peak for the
-    ## corresponding mz is supposed to be.
-    ## errEIC - Absolute mz tolerance used to extract precursor EICs.
-    ## errFinePPM - Tolerance given in PPM used to associate input
-    ## masses with what the instrument assigned as precursors to MS2
-    ## products.
-    message("Loading ",file," in mode ",mode, ".")
-    data<-load_raw_data(file,mode=mode)
-    ms1<-data[["ms1"]]
-    ms2<-data[["ms2"]]
-    message("Done loading ",file,".")
-    ## EICs for precursors.
-    message("Extracting precursor EICs. Please wait.")
-    eicMS1<-gen_ms1_chrom(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
-    write_eic(eicMS1,dir=wd)
-    message("Extracting precursor EICs finished.")
-    ## Extract MS2 spectra.
-    message("Extracting MS2 spectra.")
-    idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
-    message("Resampling MS2 spectra.")
-                                        # idxMS2<-add_ms2_prcs_scans(ms2,idxMS2)
-    prsc<-pick_unique_precScans(idxMS2)
-    vprsc<-verif_prec_fine(preSc=prsc,ms1=ms1,mz=mz,errFinePPM = errFinePPM)
-    idxMS2<-refn_ms2_by_prec(idxMS2=idxMS2,preFine=vprsc)
-    message("Resampling MS2 spectra finished.")
-    ms2Spec<-grab_ms2_spec(idxMS2,ms2)
-    eicMS2<-gen_ms2_chrom(ms2Spec)
-    message("Extracting MS2 spectra finished.")
-    write_eic(eicMS2,dir=wd,suff="kids.csv",width=get_ext_width(max(as.numeric(names(eicMS1)))))
-    specDir<-file.path(wd,"ms2_spectra")
-    dir.create(specDir,showWarnings = F)
-    write_ms2_spec(ms2Spec,dir=specDir)
-    message("Done with ", file)
 }
-##' @importFrom MSnbase filterMz
+analyse_extracted_data <- function(db,prescreen_param) {
-extr_msnb_ht <-function(file,wd,mz,errEIC, errFinePPM,errCoarse,fnSpec,rt=NULL,errRT=NULL,mode="onDisk") {
+    ## Note
-    ## Perform the entire data extraction procedure.
+    ##
-    ## 
+    ## I am working on this two days before the group meeting. The
-    ## file - The input mzML file.
+    ## point of a meeting is to have something to show, so I will just
-    ## wd - Top-level directory where the results should be deposited.
+    ## minimally adapt the old `analyse_extracted_data' to the new
-    ## mz - A named vector of precursor masses for which to scan the
+    ## `db' entries in the state. I suspect, even this is not going to
-    ## file. The names can be RMassBank IDs.
+    ## be very easy.
-    ## rt - A named vector of length 1, or same as mz, giving the retention
+    ##
-    ## times in minutes. The names should be the same as for mz.
+    ## If no meeting was happening, then I'd create a nice, sleek
-    ## errRT - A vector of length 1, or same as mz, giving the
+    ## function that fully adheres to the new `data model'
-    ## half-width of the time window in which the peak for the
+    ## philosophy. Alas, ... 
-    ## corresponding mz is supposed to be.
-    ## errEIC - Absolute mz tolerance used to extract precursor EICs.
-    ## errFinePPM - Tolerance given in PPM used to associate input
-    ## masses with what the instrument assigned as precursors to MS2
-    ## products.
-    message("Loading ",file," in mode ",mode, ".")
-    data<-load_raw_data(file,mode=mode)
-    ms1<-data[["ms1"]]
-    ms2<-data[["ms2"]]
-    message("Done loading ",file,".")
-    ## Filtering
-    mzCrs<-gen_mz_range(mz=mz,err=errCoarse)
-    mzMin<-min(mzCrs)
-    mzMax<-max(mzCrs)
-    ms1<-filterMz(ms1,c(mzMin,mzMax))
-    fms2<-filt_ms2(ms1,ms2,mz,errCoarse=errCoarse,errFinePPM=errFinePPM)
-    ## EICs for precursors.
-    message("Extracting precursor EICs. Please wait.")
-    eicMS1<-gen_ms1_chrom_ht(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
-    message("Extracting precursor EICs finished.")
+    ms1 = db$extr$cgm$ms1
+    ms2 = db$extr$cgm$ms2
+    spectra = db$extr$spectra
+    precursors = db$precursors
-    x<-list(eic=eicMS1,ms2=fms2)
+    ## Get file info.
-    saveRDS(object=x,file=file.path(wd,fnSpec))
+    ms2_noise_table = precursors[spectra,.(file,intensity),on="precid",by=.EACHI,nomatch=NULL]
-    x
-}
+    ## Calculate threshold.
+    ms2_noise_table[,threshold:=0.33333333*mean(intensity),by="file"]
-extr_eic_ms1 <- function(tab,err) {
+    ## Reduce table.
-    ## Asynchronous extraction of ms1 spectra. The result is a list of
+    ms2_noise_table = ms2_noise_table[,.(threshold=first(threshold,1L)),keyby="precid"]
-    ## running futures.
-    file <- unique(tab$file)
-    res <-lapply(file,function (fn) future::futur(extr_fn(fn), lazy=T))
-    names(res) <- file
-    res
-}
-##' @importFrom MSnbase filterMz readMSData
-##' @export
+    ## Parameters.
-extract <- function(fn,tag,tab,err_ms1_eic.,err_coarse,err_fine,err_rt.,missing_precursors) {
+    presconf = conf_trans_pres(prescreen_param)
-    ## Extracts MS1 and MS2 EICs, as well as MS2 spectra, subject to
+    rt_shift = presconf$ret_time_shift_tol
-    ## tolerance specifications.
+    det_ms2_noise = presconf$det_ms2_noise
+    ms2_int_thresh = presconf$ms2_int_thresh
+    ms1_int_thresh = presconf$ms1_int_thresh
+    ## We start populating the ms2 qa table.
+    tab_ms2 = copy(ms2)
+    ## Calculate noise.
+    tab_ms2[ms2_noise_table,qa_ms2_good_int:=intensity>threshold,on=.(precid)]
+    ## Rename as downstream wants it.
+    setnames(tab_ms2,c("rt","intensity"),c("ms2_rt","ms2_int"))
+    tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
+    ## Get mean ms1 value.
+    tab_ms1 = copy(ms1)
-    ## TODO: Still detecting external references ... but which?
+    ## To (artificially) differentiate beween ms1 and ms2 (because,
-    ## However, the results check out, compared to sequential access.
+    ## they get stapled together later on, set scan to NA_character_.
-    err_coarse_fun <- gen_mz_err_f(err_coarse,
+    tab_ms1[,scan:=NA_character_]
-                                   "ms1 coarse error: Only ppm, or Da units allowed.")
+    tab_ms1_mean = tab_ms1[,.(ms1_mean=mean(intensity,na.rm=T)),keyby="precid"]
-    err_fine_fun <- gen_mz_err_f(err_fine,
+    ## Perform MS1 maxima calculation in the neighbourhood of each
-                                 "ms1 fine error: Only ppm, or Da units allowed.")
+    ## MS2 result.
-    err_ms1_eic <- gen_mz_err_f(err_ms1_eic.,
+    tmp = tab_ms1[tab_ms2,
-                                "eic error: Only ppm, or Da units allowed.")
+    {
+        xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
+        .(scan=i.scan,
+          ms1_rt = xx[1,],
+          ms1_int = xx[2,])
+    },on=.(precid),
+    by=.EACHI,
+    nomatch=NULL]
+    ## Calculate QA values.
+    tab_ms2[tmp,on=.(precid,scan),c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int)]
+    tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
+    tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
+    tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
+               qa_ms1_above_noise=F,
+               qa_ms2_near=F)]
+    ## TODO: I wonder if so stupidly auto-calculated ms1 noise should
+    ## be taken into account at all? My recollection from earlier
+    ## times was that it was helpful, at least sometimes.
+    tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
+    tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
+    tab_ms2$qa_ms2_exists=T
+    ## Reduce tab_ms1 to only to MS1 with no children.
+    precs_ms1 = tab_ms1[,unique(precid)]
+    precs_ms2 = tab_ms2[,unique(precid)]
+    precs_noms =  data.table(precid=precs_ms1[!(precs_ms1 %in% precs_ms2)])
+    tab_noms2 = tab_ms1[precs_noms,
+                        on=.(precid),
+                        .SD,nomatch=NULL]
+    tab_noms2 = tab_noms2[,.(ms1_mean=mean(intensity,na.rm=T),
+                             ms1_rt=rt[which.max(intensity)],
+                             ms1_int=max(intensity, na.rm=T)),
+                          keyby="precid"]
+    ## QA for the above (lazy qa ... take only the max peak into account).
+    tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
+    ## MS2 QA criteria all fail.
+    tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
+    ## Bind MS1-only and MS1/MS2 entries together.
+    res = rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
+    ## If ms1_int has been calculated as a Na(N) value, this means
+    ## that no MS1 has been found for that precid.
+    res[,qa_ms1_exists:=F]
+    res[!is.na(ms1_int),qa_ms1_exists:=T]
+    data.table::setkey(res,precid)
-    err_rt <- gen_rt_err(err_rt.,
-                         "rt error: Only s(econds), or min(utes) allowed.")
-    tab <- data.table::as.data.table(tab)
+    qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
-    ## chunk <- tab[file==fn]
+    res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
-    mz <- tab$mz
+    res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='scan']
-    rt <- tab$rt
+    resby = BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'scan')]
-    id <- tab$ID
+    res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
-    adduct <- tab$adduct
+    res[,ms2_sel:=F]
-    names(mz) <- id
+    res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
-    names(rt) <- id
+    res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
-    mzerr <- err_coarse_fun(mz)
+                                                                 "qa_ms1_above_noise",
-    mzrng <- gen_mz_range(mz=mz,err=mzerr)
+                                                                 "qa_ms1_good_int")]
-    rtrng <- gen_rt_range(rt=rt,err=err_rt)
+    res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
-    mzmin <- min(mzrng)
+                                                                 "qa_ms2_near",
-    mzmax <- max(mzrng)
+                                                                 "qa_ms2_good_int")]
-    read_ms1 <- function() {
+    res[is.na(qlt_ms1),qlt_ms1:=0L]
-        ms1 <- readMSData(file=fn,msLevel=1,mode="onDisk")
+    res[is.na(qlt_ms2),qlt_ms2:=0L]
-        ms1 <- filterMz(ms1,mz=c(mzmin,mzmax),msLevel=1)
-        ms1
+    ## Set all other flags to false when qa_ms1_exists == F by decree.
-    }
+    flgs = c(QA_FLAGS,"ms2_sel")
-    read_ms2 <- function() {
+    res[qa_ms1_exists == F,(flgs):=F]
-        ms2 <- MSnbase::readMSData(file=fn,msLevel=2,mode="onDisk")
-        ms2
-    }
-    read_all <- function() {
-        MSnbase::readMSData(file=fn,msLevel. = c(1,2),mode="onDisk")
-    }
-    extr_ms1_eic <- function(ms1) {
-        eic <- MSnbase::chromatogram(ms1,mz=mzrng,msLevel=1,missing=0.0,rt=rtrng)
-        bits <- dtable(N=sapply(eic,NROW))
-        bigN <- bits[,sum(N)]
-        bits[,idx:=paste0('I',.I)]
-        bits$ID <- id
-        bits$adduct <- adduct
-        bits$tag <- tag
-        res<-dtable(rt=numeric(bigN),
-                    intensity=numeric(bigN),
-                    tag=tag,
-                    adduct=bits[,rep(adduct,N)],
-                    ID=bits[,rep(ID,N)],
-                    idx=bits[,rep(idx,N)])
-        data.table::setkey(res,idx)
-        names(eic)<-bits$idx
-        res[,c("rt","intensity") :=
-                 .(MSnbase::rtime(eic[[idx]])/60.,
-                   MSnbase::intensity(eic[[idx]])),
-                   by=idx]
-        data.table::setkeyv(res,BASE_KEY)
-        res
-    }
-    if (is.null(missing_precursors)) missing_precursors <- "do_nothing"
-    if (missing_precursors != "do_nothing") {
-        ms <- clean_na_precs(read_all(),missing_precursors = missing_precursors)
-        fdta <- as.data.table(MSnbase::fData(ms),keep.rownames = "rn")
-        fdta[,idx := .I]
-        idx_ms1 <- fdta[msLevel == 1,idx]
-        idx_ms2 <- fdta[msLevel == 2,idx]
-        ms1 <- ms[idx_ms1]
-        ms2 <- ms[idx_ms2]
-    } else {
-        ms1 <- read_ms1()
-        ms2 <- read_ms2()
-    }
-    res_ms1 <- extr_ms1_eic(ms1)
-    res_ms2 <- extr_ms2(ms1=ms1,
-                        ms2=ms2,
-                        ids=id,
-                        mz=mz,
-                        adduct=adduct,
-                        err_coarse_fun=err_coarse_fun,
-                        err_fine_fun=err_fine_fun)
-    res_ms2[,"tag":=tag]
-    ## Clean all the NA intensity MS2 (sometimes a consequence of 'fill'.)
-    res_ms2<-res_ms2[!is.na(intensity)]
-    res <- list(ms1=res_ms1,
-                ms2=res_ms2)
    res
 }
-gen_ms2_spec_blk <- function(spectra) {
-    dt <- dtable(mz=MSnbase::mz(spectra),
-                 intensity=MSnbase::intensity(spectra),
-                 rt = lapply(MSnbase::rtime(spectra),function (z) z/60.),
-                 CE = MSnbase::collisionEnergy(spectra),
-                 an = MSnbase::acquisitionNum(spectra))
-    dt[,maspI:=sapply(intensity,function (zz) max(zz))]
-    data.table::rbindlist(apply(dt,1,function(row) dtable(intensity=row[["intensity"]],
-                                                          rt = row[["rt"]],
-                                                          mz = row[["mz"]],
-                                                          CE = row[["CE"]],
-                                                          an = row[["an"]])))
-}
--- a/R/metfrag.R
+++ b/R/metfrag.R
@@ -36,7 +36,7 @@ metfrag_get_stag_tab <- function(summ) {
    ## Argument summ can be a subset of actual `summ' table.
    x = gen_1d_keytab(summ)
    data.table::setnames(x,old="key1d",new="stag")
-    res = x[summ,`:=`(CE=i.CE,ion_mz=mz)]
+    res = x[summ,`:=`(ce=i.ce,ion_mz=mz)]
    res
 }
@@ -54,18 +54,21 @@ get_mf_res_ext <- function(fn) {
 metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bin,nproc = 1L) {
    keys = intersect(colnames(stag_tab),colnames(ms2))
+    rms2 = ms2[stag_tab,on=keys,nomatch=NULL]
    message("Generating MetFrag configs.")
-    file_tab = ms2[stag_tab,{
+    file_tab = rms2[,{
        r = write_metfrag_config(param = ..param,
                                 path = ..path,
                                 subpaths = ..subpaths,
                                 db_file = ..db_file,
-                                 stag = stag,
+                                 stag = first(stag),
-                                 adduct = adduct,
+                                 adduct = first(adduct),
-                                 ion_mz = ion_mz,
+                                 ion_mz = first(ion_mz),
                                 spec = data.table(mz=mz,intensity=intensity))
-        c(r,stag = stag)
+        c(r,stag = first(stag))
-    },by=.EACHI,on=keys]
+    },keyby=keys]
    message("Done generating MetFrag configs.")
    withr::with_dir(path,{
@@ -89,33 +92,30 @@ metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bi
 mf_narrow_summ <- function(summ,kv,ms2_rt_i=NA_integer_,ms2_rt_f=NA_integer_) {
    skey = data.table::key(summ)
-    cols = c("adduct","tag","ID","CE","an","mz","qa_pass","ms2_rt")
+    cols = union(names(skey),c("adduct","tag","ID","ce","precid","scan","mz","qa_pass","ms2_rt"))
-    nsumm = get_rows_from_summ(summ,kv,cols)
+    dtkv = as.data.table(kv)
+    nsumm = summ[dtkv,on=names(kv),.SD,.SDcols=cols]
    nsumm = nsumm[qa_pass==T] # Those that make sense.
-    nsumm_key = union(SUMM_KEY,"ms2_rt")
+    nsumm_key = intersect(union(SUMM_KEY,"ms2_rt"),colnames(nsumm))
    data.table::setkeyv(nsumm,nsumm_key)
-    if (!is.na(ms2_rt_i)) {
+    ms2_rt_i = if (!is.na(ms2_rt_i)) ms2_rt_i else 0.
-        nsumm = nsumm[ms2_rt>(ms2_rt_i)]
+    ms2_rt_f = if (!is.na(ms2_rt_f)) ms2_rt_f else Inf
-    }
-    if (!is.na(ms2_rt_f)) {
+    nsumm[ms2_rt > (ms2_rt_i) & ms2_rt < (ms2_rt_f)]
-        nsumm = nsumm[ms2_rt<(ms2_rt_f)]
-    }
-    nsumm
 }
 get_metfrag_targets <- function(stag_tab,ms2) {
    ## Take the columns we need from summ.
    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz")]
-    mrg_keys = c(intersect(key(ms2),key(summ)),"an")
+    mrg_keys = c(intersect(key(ms2),key(summ)),"scan")
    x=ms2[x,.(CE=CE,ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
    ## Get column order so that `an' follows `CE'.
-    resnms = setdiff(mrg_keys,"an")
+    resnms = setdiff(mrg_keys,"scan")
-    nms = union(union(resnms,"CE"),c("an","ion_mz","mz","intensity"))
+    nms = union(union(resnms,"CE"),c("scan","ion_mz","mz","intensity"))
    data.table::setcolorder(x,neworder = nms)
-    setkeyv(x,unique(c(resnms,"CE","an")))
+    setkeyv(x,unique(c(resnms,"CE","scan")))
    x
 }
@@ -220,7 +220,7 @@ summarise_metfrag_results <- function(param,path,subpaths,cand_parameters,db_sco
    }
    .adapt_col_types <- function(x) {
-        x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)]
+        if (length(db_scores)>0) x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)] else x
    }
    .calc_basic_scores <- function(x) {

--- a/R/mix.R
+++ b/R/mix.R
@@ -191,7 +191,7 @@ gen_empty_summ <- function() {
 ##     ms2_cols <- intersect(colnames(qa_ms2),SUMM_COLS)
 ##     ms2_cols <- setdiff(ms2_cols,colnames(summ))
 ##     summ <- qa_ms2[summ,c(..comp_cols,..ms1_cols,..ms2_cols),on=BASE_KEY]
-##     data.table::setkeyv(summ,c(BASE_KEY_MS2,"an"))
+##     data.table::setkeyv(summ,c(BASE_KEY_MS2,"scan"))
 ##     summ[,qa_ms1_exists:=the_ifelse(!is.na(qa_ms1_good_int),T,F)]
 ##     summ[,qa_ms2_exists:=the_ifelse(!is.na(CE),T,F)]
 ##     summ[,qa_pass:=apply(.SD,1,all),.SDcols=QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]]
@@ -547,274 +547,6 @@ conf_trans_pres <- function(pres_list) {
    pres_list
 }
-## PRESCREENING
-create_qa_table <- function(extr,conf_presc) {
-    ## The first input argument is the extracted `ms`, table
-    ## containing MS1 and MS2 spectra. The argument `conf_presc` is
-    ## m$conf$prescreen, the prescreening parameters given in the conf
-    ## file.
-    ## The qa table is just a copy of ms with added quality control
-    ## columns QA_COLS.
-    ## The QA_FLAGS columns are flags specifying which properties of
-    ## compounds are known well, or not.
-    ## For each compound (mass) we ask the following questions:
-    ## qa_ms1_exists -- does the MS1 spectrum exist at all?
-    ## qa_ms2_exists -- do we have any MS2 spectra at all?
-    ## qa_ms1_above_noise -- is MS1 above the noise treshold?
-    ## qa_ms2_near -- is there any MS2 spectrum inside the tolerated
-    ## retention time window around the MS1 peak? That is, are we
-    ## non-RT-shifted?
-    ## qa_ms2_good_int -- Is there any MS2 spectral intensity greater
-    ## than the MS2 threshold and less than the MS1 peak?
-    ## qa_pass -- did the spectrum pass all the checks?
-    ## The columns in QA_NUM_REAL are:
-    ## 
-    ## ms1_int -- the maximum intensity of MS1 spectrum over the
-    ## entire run;
-    ##
-    ## ms1_rt -- the retention time of the peak MS1.
-    ## The columns in QA_NUM_INT are:
-    ##
-    ## ms2_sel -- index of the selected MS2 spectrum; if not NA, the
-    ## associated spectrum passed all the checks (qa_pass == T); the
-    ## spectrum itself is in one of the member sublists of the `spec'
-    ## column. The integer `ms2_sel' is then the index of the spectrum
-    ## in that sublist.
-    ##
-    ## ms1_rt_ind -- TODO (but not important to end users).
-    qa <- list(prescreen=conf_presc)
-    ## checks <- extr$ms2[,{
-    ## },keyby=BASE_KEY_MS2]
-    checks <- extr$ms2
-    checks[,(QA_FLAGS):=F]
-    ## message("checks:")
-    ## print(checks)
-    ## message("done checks")
-    checks[,(QA_NUM_INT):=NA_integer_]
-    checks[,(QA_NUM_REAL):=NA_real_]
-    setkeyv(checks,BASE_KEY_MS2)
-    qa$checks <- checks
-    qa
-}
-assess_ms1 <- function(m) {
-    qa <- m$qa
-    ms1 <- m$extr$ms1
-    ## Calculate auxiliary variables and indices.
-    qa_ms1 <- ms1[,.(ms1_rt_ind=which.max(intensity)),keyby=BASE_KEY]
-    qa_ms1 <- ms1[qa_ms1,.(ms1_rt_ind=ms1_rt_ind,
-                           ms1_int=intensity[[ms1_rt_ind]],
-                           ms1_rt=rt[[ms1_rt_ind]],
-                           ms1_mean=mean(intensity)),on=BASE_KEY,by=.EACHI]
-    qa_ms1[,qa_ms1_good_int := ms1_int > qa$prescreen$ms1_int_thresh]
-    qa_ms1[,qa_ms1_above_noise := F]
-    qa_ms1[qa_ms1_good_int==T,qa_ms1_above_noise := .(ms1_int > qa$prescreen$s2n*ms1_mean)]
-    ## checks[(!qa_ms1_above_noise),c("qa_ms2_good_int","qa_ms2_near","qa_ms2_exists","qa_pass"):=F]
-    ## qa_ms1 <- check_ms1_noise(check_ms1(qa_ms1))
-    m$qa$ms1 <- qa_ms1
-    m
-}
-assess_ms2 <- function(m) {
-    presconf <- conf_trans_pres(m$conf$prescreen)
-    ms1 <- m$extr$ms1
-    ms2 <- m$extr$ms2
-    qa_ms1 <- m$qa$ms1
-    qa_ms2 <- ms2[qa_ms1[qa_ms1_above_noise==T],.(CE=unique(CE),
-                                                  pc_rt=i.ms1_rt,
-                                                  pc_int=i.ms1_int,
-                                                  an=unique(an)),on=BASE_KEY,by=.EACHI,nomatch=NULL]
-    rt_win2 <- presconf$ret_time_shift_tol
-    qa_ms2 <- ms2[qa_ms2,.(pc_rt=pc_rt,
-                           pc_int=pc_int,
-                           ms2_int=max(intensity),
-                           ms2_rt=unique(rt),
-                           qa_ms2_near=head(rt,1) < pc_rt + rt_win2 & head(rt,1) > pc_rt - rt_win2),
-                  by=.EACHI,on=c(BASE_KEY_MS2,"an")]
-    qa_ms2$qa_ms2_good_int <-F
-    qa_ms2[qa_ms2_near==T,
-           qa_ms2_good_int := ms2_int > presconf$ms2_int_thresh & ms2_int < pc_int,
-           by=c(BASE_KEY_MS2,"an")]
-    ## qa_ms2$qa_pass <- F
-    ## qa_ms2[qa_ms2_good_int==T,qa_pass:=T]
-    qa_ms2$ms2_sel <- F
-    qa_ms2[qa_ms2_good_int==T,ms2_sel:={
-        ind<-which.min(abs(ms2_rt-pc_rt))
-        z<-ms2_sel
-        z[[ind]]<-T
-        z
-    },by=BASE_KEY_MS2]
-    setkeyv(qa_ms2,BASE_KEY_MS2)
-    m$qa$ms2 <- qa_ms2
-    m    
-}
-## Analyze extracted data.
-analyse_extracted_data <- function(extr,prescreen_param) {
-    ms1 <- extr$ms1
-    ms2 <- extr$ms2
-    ## Parameters.
-    presconf <- conf_trans_pres(prescreen_param)
-    rt_shift <- presconf$ret_time_shift_tol
-    det_ms2_noise <- presconf$det_ms2_noise
-    ms2_int_thresh <- presconf$ms2_int_thresh
-    ms1_int_thresh <- presconf$ms1_int_thresh
-    ## Detect MS2 noise.
-    ms2_clc_ns <- if (det_ms2_noise) {
-                      if (ms2_int_thresh>0) {
-                          warning("Ignore user specified ms2_int_thresh value and autodetect noise instead.")
-                      }
-                      ms2[,ms2_thr:=0.33333333*mean(intensity),by="tag"]
-                  } else {
-                      ms2[,ms2_thr:=ms2_int_thresh]
-                  }
-    ## message('ms2_clc_ns:',key(ms2_clc_ns))
-    ## ms2_clc_ns <- ms2_clc_ns[intensity>ms2_thr]
-    ## We drop mz info.
-    tab_ms2 <- ms2_clc_ns[,.(ms2_rt=first(rt),ms2_int=max(intensity),ms2_thr=first(ms2_thr)),by=c(BASE_KEY_MS2,'an')]
-    tab_ms2[,qa_ms2_good_int:=ms2_int>ms2_thr,by="an"]
-    data.table::setkeyv(tab_ms2,BASE_KEY_MS2)
-    tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
-    ## Get mean ms1 value.
-    tab_ms1 <- extr$ms1
-    tab_ms1_mean <- tab_ms1[,.(ms1_mean=mean(intensity)),keyby=BASE_KEY]
-    ## This function extracts intensity maxima on intervals given by
-    ## RT vectors rt_1 and rt_2.
-    find_ms1_max <- function(rt,intensity,rt_1,rt_2)
-    {
-        mapply(function (rt_1,rt_2) {
-            rt_ival <- c(rt_1,rt_2)
-            intv <- findInterval(rt,rt_ival)
-            lintv = length(intv)
-            if (intv[1]==0L && intv[lintv] == 2L) {
-                pos = match(c(1L,2L),intv)
-            } else if (intv[1]==1L && intv[lintv]!=1L) {
-                pos = c(1L,match(2L,intv))
-            } else if (intv[1]==0L && intv[lintv]!=0L) {
-                pos = c(match(1L,intv),lintv)
-            } else {
-                pos = c(1L,lintv)
-            }
-            pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
-            c(rt[pmax],intensity[pmax])
-        }, rt_1, rt_2, USE.NAMES=F)
-    }
-    ## Perform MS1 maxima calculation in the neighbourhood of each
-    ## MS2 result.
-    tmp = tab_ms1[tab_ms2,{
-        xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
-        .(an=i.an,
-          ms1_rt = xx[1,],
-          ms1_int = xx[2,])
-    },by=.EACHI, nomatch=NULL]
-    ## Calculate QA values.
-    tab_ms2[tmp,c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int),on=c(BASE_KEY,'an')]
-    tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
-    tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
-    tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
-               qa_ms1_above_noise=F,
-               qa_ms2_near=F)]
-    ## TODO: I wonder if so stupidly auto-calculated ms1 noise should
-    ## be taken into account at all? My recollection from earlier
-    ## times was that it was helpful, at least sometimes.
-    tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
-    tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
-    tab_ms2$qa_ms2_exists=T
-    ## Find MS1 with no corresponding MS2.
-    ms1key <- tab_ms1[,unique(.SD),.SDcol=BASE_KEY]
-    ms2key <- tab_ms2[,unique(.SD),.SDcol=BASE_KEY]
-    ms2key$mch=T
-    tabmatches <- ms2key[ms1key]
-    ms1woms2 <- tabmatches[is.na(mch)][,mch:=NULL]
-    ## calculate the most intense peak, its location and the mean for
-    ## childless MS1.
-    tab_noms2 <- tab_ms1[ms1woms2,.(ms1_mean=mean(intensity),ms1_rt=rt[which.max(intensity)],ms1_int=max(intensity)),by=.EACHI,nomatch=NULL]
-    ## QA for the above (lazy qa ... take only the max peak into account).
-    tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
-    ## MS2 QA criteria all fail.
-    tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
-    ## Bind MS1-only and MS1/MS2 entries together.
-    res <- rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
-    ## Every single entry which was extracted has at least MS1.
-    res[,qa_ms1_exists:=T]
-    data.table::setkeyv(res,BASE_KEY)
-    qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
-    res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
-    res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='an']
-    resby <- BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'an')]
-    res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
-    res[,ms2_sel:=F]
-    res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
-    res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
-                                                                 "qa_ms1_above_noise",
-                                                                 "qa_ms1_good_int")]
-    res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
-                                                                 "qa_ms2_near",
-                                                                 "qa_ms2_good_int")]
-    res
-}
-## Based on the `comprehensive' and `qa' tabs, greate `summ'.
-gen_summ <- function(comp,qa) {
-    comp_cols <- intersect(SUMM_COLS,colnames(comp))
-    rdcomp <- comp[,..comp_cols]
-    data.table::setkeyv(rdcomp,BASE_KEY)
-    summ <- qa[rdcomp,nomatch=F] #We changed `nomatch' cases from NA
-                                 #to F, because NA does not work well
-                                 #with X == F condition.
-    ## flgs <- c(QA_FLAGS,"ms2_sel")
-    ## summ[is.na(qa_ms1_exists),(flgs):=F]
-    data.table::setkeyv(summ,SUMM_KEY)
-    summ[.(F),c("qlt_ms1","qlt_ms2"):=0.,on="qa_ms1_exists"]
-    summ
-}
 gen_mz_err_f <- function(entry,msg) {
    eppm <- grab_unit(entry,"ppm")
    eda <- grab_unit(entry,"Da")

--- a/R/plotting.R
+++ b/R/plotting.R
@@ -21,8 +21,8 @@ aes <- ggplot2::aes
 plot_fname <- function(kvals) {
    if (!is.null(kvals)) {
-        kparts <- mapply(paste0,names(kvals),kvals,USE.NAMES=F)
+        kparts = mapply(paste0,names(kvals),kvals,USE.NAMES=F)
-        stump <- paste(kparts,collapse="_")
+        stump = paste(kparts,collapse="_")
        paste0("plot_",stump,".pdf")
    } else 'default.pdf'
 }
@@ -33,23 +33,23 @@ is_fname_rds <- function(fn) {
 }
 sci10 <- function(x) {
-    prefmt <- formatC(x,format="e",digits=2)
+    prefmt = formatC(x,format="e",digits=2)
-    bits <- strsplit(prefmt,split="e")
+    bits = strsplit(prefmt,split="e")
-    bits1 <-sapply(bits,function(x) {
+    bits1 =sapply(bits,function(x) {
        if (length(x) > 1) {
-            res <- x[[1]]
+            res = x[[1]]
            sub(" ","~",res)
        } else {
            x
        }
    })
-    bits2 <-sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "")
+    bits2 =sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "")
-    txt <- mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA,
+    txt = mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA,
                  bits1,
                  bits2,
                  SIMPLIFY = F)
-    names(txt) <- NULL
+    names(txt) = NULL
-    txt <- gsub(pattern = "^'0\\.00'.*$","  0",x=txt)
+    txt = gsub(pattern = "^'0\\.00'.*$","  0",x=txt)
    parse(text=txt)
@@ -58,15 +58,15 @@ sci10 <- function(x) {
 scale_legend <- function(colrdata,pdata) {
    if (is.null(colrdata) || is.null(pdata)) NULL
-    labs <- data.table::key(colrdata)
+    labs = data.table::key(colrdata)
-    sdcols <- c(labs,"label")
+    sdcols = c(labs,"label")
-    tab_lab <- pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL]
+    tab_lab = pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL]
-    x <- tab_lab$colour
+    x = tab_lab$colour
-    names(x) <- tab_lab$label
+    names(x) = tab_lab$label
    ggplot2::scale_colour_manual(values=x)
 }
@@ -79,15 +79,15 @@ pal_maker <- function(n,palname = NULL) {
    ## the first colours to the original palette, then go over to the
    ## generated ones. Returns a vector of colours.
-    krzywinski <- c("#68023F","#008169","#EF0096","#00DCB5",
+    krzywinski = c("#68023F","#008169","#EF0096","#00DCB5",
                    "#FFCFE2","#003C86","#9400E6","#009FFA",
                    "#FF71FD","#7CFFFA","#6A0213","#008607")
-    info <- as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T)
+    info = as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T)
-    maxcol <- if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski)
+    maxcol = if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski)
-    startpal <- if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski
+    startpal = if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski
-    pal <- if (n>length(startpal)) {
+    pal = if (n>length(startpal)) {
-               intrppal <-(colorRampPalette(startpal))(n)
+               intrppal =(colorRampPalette(startpal))(n)
-               newcol <- setdiff(intrppal,startpal)
+               newcol = setdiff(intrppal,startpal)
               unique(c(startpal,intrppal))
           } else startpal
@@ -106,16 +106,16 @@ make_struct_plot <- function(smiles, kekulise=TRUE, width=300, height=300,
    ## structure -> grob
    smiles2img <- function() {
        if (is.na(smiles) || nchar(smiles)==0) return(NULL) #Handle empty SMILES.
-        dep <- rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate,
+        dep = rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate,
                                  abbr = abbr, suppressh = suppressh, showTitle = showTitle, smaLimit = smaLimit,
                                  sma = NULL)
-        mol <- RMassBank::getMolecule(smiles)
+        mol = RMassBank::getMolecule(smiles)
-        z<-rcdk::view.image.2d(mol, depictor=dep)
+        z=rcdk::view.image.2d(mol, depictor=dep)
        grid::rasterGrob(z)
    }
-    grob <- smiles2img()
+    grob = smiles2img()
    if (!is.null(grob)) {
        qplot(1:5, 2*(1:5), geom="blank") +
@@ -156,13 +156,13 @@ theme_print <- function(...) ggplot2::theme_light()+ggplot2::theme(axis.title=gg
                                                        ...)+guide_fun()
-theme_empty <- ggplot2::theme_bw()
+theme_empty = ggplot2::theme_bw()
-theme_empty$line <- ggplot2::element_blank()
+theme_empty$line = ggplot2::element_blank()
-theme_empty$rect <- ggplot2::element_blank()
+theme_empty$rect = ggplot2::element_blank()
-theme_empty$strip.text <- ggplot2::element_blank()
+theme_empty$strip.text = ggplot2::element_blank()
-theme_empty$axis.text <- ggplot2::element_blank()
+theme_empty$axis.text = ggplot2::element_blank()
-theme_empty$plot.title <- ggplot2::element_blank()
+theme_empty$plot.title = ggplot2::element_blank()
-theme_empty$axis.title <- ggplot2::element_blank()
+theme_empty$axis.title = ggplot2::element_blank()
 cust_geom_line <- function(key_glyph="rect",...) ggplot2::geom_line(...,key_glyph=key_glyph)
@@ -187,18 +187,31 @@ mk_logic_exp <- function(rest,sofar=NULL) {
    } else {
        nm = names(rest)[[1]]
        val = rest[[1]]
-        ex <- bquote(.(as.symbol(nm)) %in% .(val))
+        ex = bquote(.(as.symbol(nm)) %in% .(val))
-        zz <- if (is.null(sofar)) ex else bquote(.(ex) & .(sofar))
+        zz = if (is.null(sofar)) ex else bquote(.(ex) & .(sofar))
        mk_logic_exp(tail(rest,-1L), zz)
    }
 }
-get_data_from_key <- function(tab,key) {
+get_data_from_key <- function(db,tab,kvals,outcols=NULL) {
-    skey <- mk_logic_exp(key)
-    tab <- eval(bquote(tab[.(skey)]))
+    ## Ensure only names that exist in cat are used in selection. Or,
-    setkeyv(tab,names(key))
+    ## should we not do this?
-    tab
+    valid_names = intersect(names(kvals),colnames(db$cat))
+    ## Turn list into a data.table.
+    dt = as.data.table(kvals[valid_names])
+    ## Get catids.
+    cattab = db$cat[dt,on=valid_names]
+    ## Get precids.
+    mztab = db$precursors[cattab,on="catid"]
+    outnames = union(valid_names,outcols)
+    res = tab[mztab,on="precid"]
+    if (!is.null(outcols)) res[,..outnames] else res
 }
@@ -220,9 +233,9 @@ define_colrdata <- function(comptab,labs) {
    ## Determine colours based on `labs'. 
    one_keyset <- function(dt) {
        labtab = dt[,unique(.SD),.SDcol=labs]
-        n <- NROW(labtab)
+        n = NROW(labtab)
-        cols <- if (n<13L) {
+        cols = if (n<13L) {
-                    pal <- RColorBrewer::brewer.pal(n=n,name="Paired") 
+                    pal = RColorBrewer::brewer.pal(n=n,name="Paired") 
                    if (n>3L) pal else if (n>0L) pal[1:n] else character()
                } else {
                    scales::viridis_pal()(n)
@@ -232,14 +245,14 @@ define_colrdata <- function(comptab,labs) {
    }
    ## Calculate lengths of all the COLRDATA_KEY subgroups.
-    dt <- comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY]
+    dt = comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY]
    ## Arrange colours to map to specific labels by sorting.
-    allcols <- union(COLRDATA_KEY,labs)
+    allcols = union(COLRDATA_KEY,labs)
    data.table::setkeyv(dt,allcols)
    ## Assign colours to labels subgroups.
-    res <- dt[,one_keyset(.SD),by=COLRDATA_KEY]
+    res = dt[,one_keyset(.SD),by=COLRDATA_KEY]
    ## Sort everything again, 
    data.table::setkeyv(res,allcols)
@@ -251,10 +264,10 @@ define_colrdata <- function(comptab,labs) {
 ## compound set.).
 narrow_colrdata <- function(colrdata,kvals) {
    if (is.null(colrdata)) return(NULL)
-    vals <- as.list(kvals[COLRDATA_KEY])
+    vals = as.list(kvals[COLRDATA_KEY])
-    res <- colrdata[,(COLRDATA_KEY):=NULL]
+    res = colrdata[,(COLRDATA_KEY):=NULL]
-    labs <- names(res)[names(res)!="colour"]
+    labs = names(res)[names(res)!="colour"]
    data.table::setkeyv(res,labs)
    res
 }
@@ -266,201 +279,219 @@ narrow_colrdata <- function(colrdata,kvals) {
 ## subset of the `summ' table based on `kvals'. We need it for rt-s in
 ## the labels. Argument `labs' is a vector of names that will be used
 ## to construct the legend labels.
-get_data_4_eic_ms1 <- function(extr_ms1,summ_rows,kvals,labs) {
+get_data_4_eic_ms1 <- function(db,summ_rows,kvals,labs) {
-    ## Which of the selected keys are in the extr_ms1? This can be
+    ## Which of the selected keys are in the db$extr$cgm$ms1? This can be
    ## made more obvious to the user, but note necessary atm.
-    keys <- names(kvals)
+    keys = names(kvals)
-    actual_key <- intersect(keys,names(extr_ms1))
+    actual_key = intersect(keys,names(db$extr$cgm$ms1))
-    actual_kvals <- kvals[actual_key]
+    actual_kvals = kvals[actual_key]
+    ## Subset db$extr$cgm$ms1 by the actual key.
+    tab = get_data_from_key(db=db,
+                            tab=db$extr$cgm$ms1,
+                            kvals=kvals,
+                            outcols = c("catid","mz","rt","intensity"))
-    ## Subset extr_ms1 by the actual key.
-    tab <-get_data_from_key(tab=extr_ms1,key=actual_kvals)
-    ## Group the plot data per label group (ie tags, or adducts, or
+    meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
-    ## both).
-    xlxx <- intersect(labs,names(extr_ms1))
-    xlxx <- as.character(xlxx)
-    pdata <- tab[,.(rt,intensity),by=xlxx]
+    ## Attach catid information.
+    tab = meta[tab,on=.(catid),nomatch=NULL]
+    pdata = tab[,.(rt,intensity),by=labs]
-    ## TODO: FIXME: This fails because summ_rows sux wrt calcing of ms1_rt for labels. #Now, add the RTs in.
+    text_labels = labs
-    ## pdata[summ_rows,ms1_rt:=signif(i.ms1_rt,5),on=xlxx]
+    pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
+    setkeyv(pdata,cols=unique(as.character(text_labels)))
-    ## Create labels.
-    ## xlxx <- unique(c(xlxx,"ms1_rt"))
-    pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=xlxx],splice=T))
-    setkeyv(pdata,cols=unique(as.character(xlxx),"rt"))
    pdata
 }
 ## Prepare MS2 eic data: rt and intensity + key made of splitby.
-get_data_4_eic_ms2 <- function(summ,kvals,labs) {
+get_data_4_eic_ms2 <- function(db,summ,kvals,labs) {
-    tab <-get_data_from_key(tab=summ,key=kvals)
+    tab = get_data_from_key(db=db,tab=db$extr$cgm$ms2,kvals=kvals,outcols=c("catid","ce","mz","rt","intensity"))
-    nms <- names(kvals)
-    byby <- unique(c(nms,labs,"an"))
+    meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
-    pdata <- tab[,.(intensity=ms2_int,rt=ms2_rt),by=byby]
+    ## Attach catid information.
+    tab = meta[tab,on=.(catid),nomatch=NULL]
+    pdata = tab[,.(rt,intensity),by=labs]
    if (NROW(pdata)==0L) return(NULL)
-    xlxx <- as.character(labs)
-    pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=.(xlxx)],splice=T))
-    setkeyv(pdata,cols=c(labs,"rt"))
+    text_labels = labs
+    pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
+    setkeyv(pdata,cols=unique(as.character(text_labels)))
    pdata
 }
-get_rows_from_summ <- function(summ,kvals,...) {
-    summ_rows_cols <- union(names(kvals),c(...))
-    get_data_from_key(summ,key=kvals)[,unique(.SD),.SDcol=summ_rows_cols]
-}
-narrow_summ <- function(summ,kvals,labs,...) {
+narrow_summ <- function(db,summ,kvals,labs,...) {
-        keys <- names(kvals)
+        keys = names(kvals)
-        ## keys <- keys[!is.na(keys)]
+        nms = union(names(kvals),
-        needed <- setdiff(labs,keys)
+                    labs)
-        x <- as.list(c(needed,...))
+        nms = union(union("precid",nms),c(...))
-        x <- c(list(summ,kvals),x)
+        nsumm = get_data_from_key(db=db,
-        do.call(get_rows_from_summ,x)
+                                  tab=summ,
+                                  kvals=kvals,
+                                  outcols=nms)
+        nsumm
 }
 ### PLOTTING: TOP-LEVEL PLOT CREATION
-make_eic_ms1_plot <- function(extr_ms1,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) {
+make_eic_ms1_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) {
    ## If nothing selected, just return NULL.
    if (is.null(kvals)) return(NULL)
-    key <- names(kvals)
+    key = names(kvals)
    ## Get metadata.
-    ## TODO: FIXME: Somehow calculating representationve ms1_rt for
+    summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","qa_ms1_exists","scan","ms2_sel")
-    ## plots is wrong. Horrible and wrong. Will remove those labels
+    rows_key = union(data.table::key(summ_rows),labs)
-    ## until we fix.
-    summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","Formula","qa_ms1_exists","an","ms2_sel")
-    rows_key <- union(data.table::key(summ_rows),labs)
    summ_rows$sel_ms1_rt=NA_real_
    summ_rows[ms2_sel==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
    summ_rows[is.na(sel_ms1_rt) & ms2_sel==F & qa_ms1_exists==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
    summ_rows[,ms1_rt:=sel_ms1_rt]
    summ_rows[,sel_ms1_rt:=NULL]
-    summ_rows[,c("an","qa_ms1_exists","ms2_sel"):=NULL]
+    summ_rows[,c("scan","qa_ms1_exists","ms2_sel"):=NULL]
-    summ_rows <- summ_rows[,unique(.SD)]
+    summ_rows = summ_rows[,unique(.SD)]
    ## Get the table with ms1 data.
-    pdata <- get_data_4_eic_ms1(extr_ms1, summ_rows, kvals, labs)
+    pdata = get_data_4_eic_ms1(db=db, summ_rows, kvals, labs)
    ## Deal with retention time range.
-    coord <- if (is.null(rt_range) && is.null(i_range)) {
+    coord = if (is.null(rt_range) && is.null(i_range)) {
                 NULL
             } else {
                 ggplot2::coord_cartesian(xlim=rt_range,
                                          ylim=i_range)
             }
-    xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
+    xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
-    dx <- abs(xrng[[2]]-xrng[[1]])
+    dx = abs(xrng[[2]]-xrng[[1]])
-    yrng <- range(pdata$intensity)
+    yrng = range(pdata$intensity)
-    dy <- abs(yrng[[2]]-yrng[[1]])
+    dy = abs(yrng[[2]]-yrng[[1]])
    ## Calculate aspect ratio.
-    aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
+    aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
    tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
                     collapse='; ') ## paste0("Set: ", set, " ID: ",id)
    title_txt = paste0("MS1 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
-    nm <- paste(unique(summ_rows$Name),collapse="; ")
+    nm = paste(unique(summ_rows$Name),collapse="; ")
    subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
-    p <- ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+
+    p = ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+
        ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+
        ggplot2::xlab("retention time")+
        cust_geom_line()+
        scale_y(axis=axis,labels=sci10)+
        coord
-    colrdata <- narrow_colrdata(colrdata,kvals)
+    colrdata = narrow_colrdata(colrdata,kvals)
    p + scale_legend(colrdata,pdata) + theme_eic()
 }
-make_eic_ms2_plot <- function(summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) {
+make_eic_ms2_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) {
    ## If nothing selected, just return NULL.
    if (is.null(kvals)) return(NULL)
    ## Get metadata.
-    summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES","Formula")
+    summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES")
    ## Get plotting data for the compound.
-    pdata <- get_data_4_eic_ms2(summ,
+    pdata = get_data_4_eic_ms2(db=db,
-                                kvals=kvals,
+                               summ,
-                                labs=labs)
+                               kvals=kvals,
+                               labs=labs)
    if (NROW(pdata)==0L) return(NULL)
    ## Deal with retention time range.
-    rt_lim <- if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range)
+    rt_lim = if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range)
-    xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
+    xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
-    dx <- abs(xrng[[2]]-xrng[[1]])
+    dx = abs(xrng[[2]]-xrng[[1]])
-    yrng <- range(pdata$intensity)
+    yrng = range(pdata$intensity)
-    dy <- abs(yrng[[2]]-yrng[[1]])
+    dy = abs(yrng[[2]]-yrng[[1]])
    ## Fix aspect ratio.
-    aspr <- if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
+    aspr = if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
    ## Derive various labels.
    tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
                     collapse='; ')
    title_txt = paste0("MS2 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
    subt_txt = if (!length(summ_rows$Name)==0L && !is.na(summ_rows$Name[[1]]) && nchar(summ_rows$Name[[1]])>0L) summ_rows$Name[[1]] else NULL
    ## Base plot.
-    p <- ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) +
+    p = ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) +
         ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt) +
         ggplot2::xlab("retention time")+ggplot2::ylab("intensity")+cust_geom_linerange()+
         scale_y(axis=axis,labels=sci10)+rt_lim+guide_fun()
-    ans <- pdata[,unique(an)]
    ## Add theme.
-    colrdata <- narrow_colrdata(colrdata,kvals)
+    colrdata = narrow_colrdata(colrdata,kvals)
    p + scale_legend(colrdata,pdata) + theme_eic()
 }
-make_spec_ms2_plot <- function(extr_ms2,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
+make_spec_ms2_plot <- function(db,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
    ## Only the chosen ones.
-    mdata  <- get_data_from_key(summ,key=kvals)[ms2_sel==T]
+    mdata  = get_data_from_key(db=db,
-    common_key <- intersect(names(extr_ms2),names(kvals))
+                               tab=summ,
-    common_vals <- kvals[common_key]
+                               kvals=kvals,
-    if (length(common_key) == 0L) return(NULL)
+                               outcols=union(names(kvals),
-    subxdata <- get_data_from_key(extr_ms2,key=common_vals)
+                                             colnames(summ)))[ms2_sel==T]
+    spectra = db$extr$spectra[mdata,on=.(precid,scan),.(catid,ce,mz,scan,rt=i.ms2_rt,intensity)]
    if (NROW(mdata)==0L) return(NULL)
-    if (NROW(subxdata) == 0L) return(NULL)
+    if (NROW(spectra) == 0L) return(NULL)
-    ans <- data.table(an=mdata[,unique(an)],key="an")
-    ms2ctg <- c(intersect(c(names(kvals),labs),names(extr_ms2)),"CE")
+    meta = db$cat[spectra[,.(catid=unique(catid))],
-    xlxx <- intersect(as.character(labs),names(extr_ms2))
+                  on=.(catid),
-    common_labels <- unique(c("an",common_key,intersect(names(extr_ms2),labs)))
+                  .SD,
-    pdata <- subxdata[ans,on="an"][,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=common_labels]
+                  by=.EACHI,
-    pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(c(xlxx,"rt"),as.symbol))),by=.(xlxx)],splice=T))
+                  .SDcols=labs]
+    subxdata = meta[spectra,on=.(catid),nomatch=NULL]
+    labs = c(labs,"ce")
+    pdata = subxdata[,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=labs]
+    pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(c(labs,"rt"),as.symbol))),by=.(labs)],splice=T))
    if (NROW(pdata)==0L) return(NULL)
    # Aspect ratio.
-    xrng <- range(pdata$mz)
+    xrng = range(pdata$mz,na.rm=T)
-    dx <- abs(xrng[[2]]-xrng[[1]])
+    dx = abs(xrng[[2]]-xrng[[1]])
-    yrng <- range(pdata$intensity)
+    yrng = range(pdata$intensity)
-    dy <- abs(yrng[[2]]-yrng[[1]])
+    dy = abs(yrng[[2]]-yrng[[1]])
-    aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
+    aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
    ## Get labels.
    tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
                     collapse='; ')
    title_txt = paste0("MS2 spectra for ion m/z = ",paste0(signif(unique(mdata$mz),digits=7L),collapse=", "))
-    nm <- paste(unique(mdata$Name),collapse="; ")
+    nm = paste(unique(mdata$Name),collapse="; ")
    subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
-    p <- ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun()
+    p = ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun()
    ## Add theme.
-    colrdata <- narrow_colrdata(colrdata,kvals)
+    colrdata = narrow_colrdata(colrdata,kvals)
    p + scale_legend(colrdata,pdata) + theme_eic()
 }

--- a/R/resources.R
+++ b/R/resources.R
@@ -124,13 +124,14 @@ NUM_INP_HEIGHT="5%"
 ## Possible compound list fields
 EMPTY_CMPD_LIST = dtable(ID=character(),
-                          SMILES=character(),
+                         SMILES=character(),
-                          Name=character(),
+                         Name=character(),
-                          Formula=character(),
+                         Formula=character(),
-                          RT=numeric(),
+                         RT=numeric(),
-                          mz=numeric(),
+                         mz=numeric(),
-                          known=character(),
+                         known=character(),
-                          ORIG=character())
+                         set=character(),
+                         ORIG=character())
 COMP_LIST_COLS = c("ID","Name","SMILES","Formula","RT","mz")
 ## Comprehensive table properties
 COMP_NAME_MAP = list(RT="rt")
@@ -204,8 +205,8 @@ REPORT_TITLE = "Plots of EICs and MS2 Spectra"
 ## Select the most fundamental group of entries. Within this group,
 ## each ID is unique.
-BASE_KEY = c("adduct","tag","ID")
+BASE_KEY = "precid"#c("adduct","tag","ID")
-BASE_KEY_MS2 = c(BASE_KEY,"CE","an")
+BASE_KEY_MS2 = c("precid","ce","scan")#c(BASE_KEY,"CE","scan")
 FIG_DEF_CONF =list(grouping=list(group="adduct",
                                  plot="ID",
@@ -216,7 +217,7 @@ FIG_DEF_CONF =list(grouping=list(group="adduct",
 SUMM_COLS=c("set",BASE_KEY_MS2,"mz","ms1_rt", "ms1_int", "ms2_rt", "ms2_int",
            "ms1_mean","ms2_sel",QA_FLAGS,"Name", "SMILES", "Formula", "known","Comments","file")
-SUMM_KEY = c("set","ID","adduct","tag","an")
+SUMM_KEY = c("set","ID","adduct","tag","precid","catid","scan")
 PLOT_FEATURES = c("adduct",
                   "tag",
@@ -224,34 +225,34 @@ PLOT_FEATURES = c("adduct",
 ## Empty summary table.
 EMPTY_SUMM = data.table::data.table(set=character(0),
-                                     adduct=character(0),
+                                    adduct=character(0),
-                                     tag=character(0),
+                                    tag=character(0),
-                                     ID=character(0),
+                                    ID=character(0),
-                                     CE=character(0),
+                                    CE=character(0),
-                                     an=integer(0),
+                                    scan=character(0),
-                                     mz=numeric(0),
+                                    mz=numeric(0),
-                                     ms1_rt=numeric(0),
+                                    ms1_rt=numeric(0),
-                                     ms1_int=numeric(0),
+                                    ms1_int=numeric(0),
-                                     ms2_rt=numeric(0),
+                                    ms2_rt=numeric(0),
-                                     ms2_int=numeric(0),
+                                    ms2_int=numeric(0),
-                                     ms1_mean=numeric(0),
+                                    ms1_mean=numeric(0),
-                                     ms2_sel=logical(0),
+                                    ms2_sel=logical(0),
-                                     qa_pass=logical(0),
+                                    qa_pass=logical(0),
-                                     qa_ms1_exists=logical(0),
+                                    qa_ms1_exists=logical(0),
-                                     qa_ms2_exists=logical(0),
+                                    qa_ms2_exists=logical(0),
-                                     qa_ms1_good_int=logical(0),
+                                    qa_ms1_good_int=logical(0),
-                                     qa_ms1_above_noise=logical(0),
+                                    qa_ms1_above_noise=logical(0),
-                                     qa_ms2_near=logical(0),
+                                    qa_ms2_near=logical(0),
-                                     qa_ms2_good_int=logical(0),
+                                    qa_ms2_good_int=logical(0),
-                                     Name=character(0),
+                                    Name=character(0),
-                                     SMILES=character(0),
+                                    SMILES=character(0),
-                                     Formula=character(0),
+                                    Formula=character(0),
-                                     known=character(0),
+                                    known=character(0),
-                                     Comments=character(0),
+                                    Comments=character(0),
-                                     file=character(0))
+                                    file=character(0))
 ## Default sorting keys of spectra in the summary table
-DEF_KEY_SUMM = c(BASE_KEY_MS2,"an")
+DEF_KEY_SUMM = c(BASE_KEY_MS2,"scan")
 SUBSET_VALS = c(IGNORE="ignore",
@@ -360,3 +361,7 @@ METFRAG_RESULT_READF = list(csv = function(file,...) data.table::fread(file=file
                            xml = function(file,...) readxl::read_excel(path=file,...))
 METFRAG_DEFAULT_PROC = 1L
+## DATA MODEL
+DB_CATALOGUE_KEY = c("set","tag","adduct","ID")
--- a/R/shiny-state.R
+++ b/R/shiny-state.R
@@ -22,7 +22,7 @@
 #' @importFrom shiny selectInput numericInput textInput HTML
-GUI_SELECT_INPUTS <- c("proj_list",
+GUI_SELECT_INPUTS = c("proj_list",
                       "indir_list",
                       "ms1_coarse_unit",
                       "ms1_fine_unit",
@@ -30,7 +30,7 @@ GUI_SELECT_INPUTS <- c("proj_list",
                       "ret_time_shift_tol_unit",
                       "dfile_list")
-GUI_NUMERIC_INPUTS <- c("ms1_coarse",
+GUI_NUMERIC_INPUTS = c("ms1_coarse",
                   "ms1_fine",
                   "ms1_eic",
                   "ms1_rt_win",
@@ -39,14 +39,14 @@ GUI_NUMERIC_INPUTS <- c("ms1_coarse",
                   "s2n",
                   "ret_time_shift_tol")
-GUI_TEXT_INPUTS <- c("rep_aut",
+GUI_TEXT_INPUTS = c("rep_aut",
                     "rep_tit")
-GUI_RADIO_INPUTS <- c("missingprec")
+GUI_RADIO_INPUTS = c("missingprec")
-GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
+GUI_ALL_INPUTS = c(GUI_SELECT_INPUTS,
                    GUI_NUMERIC_INPUTS,
                    GUI_TEXT_INPUTS,
                    GUI_RADIO_INPUTS)
@@ -55,8 +55,8 @@ GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
 add_new_def_tag <- function(old_tags,how_many) {
    ind = which(grepl(r"(^F\d+$)",old_tags))
    st_num = if (length(ind)>0L) {
-                    old_def_tags <- old_tags[ind]
+                    old_def_tags = old_tags[ind]
-                    tag_nums <- gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
+                    tag_nums = gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
                    max(as.integer(tag_nums))
@@ -162,26 +162,26 @@ create_gui <- function(project_path=NA_character_) {
 #'@export
 r2datatab <- function(rdatatab) {
    shiny::isolate({
-        file <- rdatatab$file
+        file = rdatatab$file
-        adduct <- rdatatab$adduct
+        adduct = rdatatab$adduct
-        tag <- rdatatab$tag
+        tag = rdatatab$tag
-        set <- rdatatab$set
+        set = rdatatab$set
        })
-    if (length(file)==0L) file <- character(0)
+    if (length(file)==0L) file = character(0)
-    if (length(adduct)==0L) adduct <- rep(NA_character_,length(file))
+    if (length(adduct)==0L) adduct = rep(NA_character_,length(file))
-    if (length(tag)==0L) tag <- rep(NA_character_,length(file))
+    if (length(tag)==0L) tag = rep(NA_character_,length(file))
-    if (length(set)==0L) tag <- rep(NA_character_,length(file))
+    if (length(set)==0L) tag = rep(NA_character_,length(file))
    data.table(tag=tag,adduct=adduct,set=set,file=file)
 }
 r2filetag <- function(rfiletag) {
    shiny::isolate({
-        file <- rfiletag$file
+        file = rfiletag$file
-        tag <- rfiletag$tag
+        tag = rfiletag$tag
        })
-    if (length(file)==0L) file <- character(0)
+    if (length(file)==0L) file = character(0)
-    if (length(tag)==0L) tag <- rep(NA_character_,length(file))
+    if (length(tag)==0L) tag = rep(NA_character_,length(file))
    data.table(tag=tag,file=file)
 }
@@ -193,7 +193,7 @@ gen_dtab <- function(tablist,sets) {
 r2compounds <- function(rcompounds) {
    shiny::isolate({
-        cmpd_lists <- rcompounds$lists
+        cmpd_lists = rcompounds$lists
        })
    list(lists=cmpd_lists)
@@ -202,17 +202,17 @@ r2compounds <- function(rcompounds) {
 #' @export
 pack_app_state <- function(input, gui) {
-    pack <- list()
+    pack = list()
    shiny::isolate({
-        pack_inputs <- list()
+        pack_inputs = list()
-        pack_input_names <- which_gui_inputs(inputs)
+        pack_input_names = which_gui_inputs(inputs)
-        pack_inputs <- shiny::reactiveValuesToList(input)[pack_input_names]
+        pack_inputs = shiny::reactiveValuesToList(input)[pack_input_names]
-        pack$input <- pack_inputs
+        pack$input = pack_inputs
-        pack$datatab <- r2datatab(gui$datatab)
+        pack$datatab = r2datatab(gui$datatab)
-        pack$filetag <- r2filetag(gui$filetag)
+        pack$filetag = r2filetag(gui$filetag)
-        pack$compounds <- r2compounds(gui$compounds)
+        pack$compounds = r2compounds(gui$compounds)
-        pack$paths <- list()
+        pack$paths = list()
-        pack$paths$data <- gui$paths$data
+        pack$paths$data = gui$paths$data
        })
    pack
@@ -318,19 +318,21 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
                                 input=input,
                                 packed_state=packed_state)
-        gui <- create_gui(project_path=project_path)
+        gui = create_gui(project_path=project_path)
-        gui$compounds$lists <- packed_state$compounds$lists
+        gui$compounds$lists = packed_state$compounds$lists
-        gui$datatab$file <- packed_state$datatab$file
+        gui$datatab$file = packed_state$datatab$file
-        gui$datatab$adduct <- packed_state$datatab$adduct
+        gui$datatab$adduct = packed_state$datatab$adduct
-        gui$datatab$tag <- packed_state$datatab$tag
+        gui$datatab$tag = packed_state$datatab$tag
-        gui$datatab$set <- packed_state$datatab$set
+        gui$datatab$set = packed_state$datatab$set
-        gui$filetag$file <- packed_state$filetag$file
+        gui$filetag$file = packed_state$filetag$file
-        gui$filetag$tag <- packed_state$filetag$tag
+        gui$filetag$tag = packed_state$filetag$tag
-        x <- packed_state$paths$data
+        x = packed_state$paths$data
-        gui$paths$data = if (length(x)>0 && nchar(x)>0) basename(x) else ""
+        gui$paths$data = if (length(x)>0 && nchar(x)>0) {
-        if (!dir.exists(file.path(top_data_dir,gui$paths$data))) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
+                             file.path(top_data_dir,basename(x))
+                         } else ""
+        if (!dir.exists(gui$paths$data)) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
        gui
    })
@@ -340,12 +342,12 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
 input2conf_setup <- function(input,gui,conf=list()) {
    if (length(conf)==0L) {
-        conf$compounds <- list()
+        conf$compounds = list()
-        conf$summary_table <- list()
+        conf$summary_table = list()
-        conf$debug <- F
+        conf$debug = F
    }
-    conf$compounds$lists <- gui$compounds$lists
+    conf$compounds$lists = gui$compounds$lists
    conf$paths$data = basename(gui$paths$data)
    conf
 }
@@ -353,11 +355,11 @@ input2conf_setup <- function(input,gui,conf=list()) {
 input2conf_extract <- function(input,conf) {
    conf$tolerance = list()
    conf$extract = list()
-    conf$tolerance[["ms1 fine"]] <- paste(input$ms1_fine,input$ms1_fine_unit)
+    conf$tolerance[["ms1 fine"]] = paste(input$ms1_fine,input$ms1_fine_unit)
-    conf$tolerance[["ms1 coarse"]] <- paste(input$ms1_coarse,input$ms1_coarse_unit)
+    conf$tolerance[["ms1 coarse"]] = paste(input$ms1_coarse,input$ms1_coarse_unit)
-    conf$tolerance[["eic"]] <- paste(input$ms1_eic,input$ms1_eic_unit)
+    conf$tolerance[["eic"]] = paste(input$ms1_eic,input$ms1_eic_unit)
-    conf$tolerance[["rt"]] <- paste(input$ms1_rt_win,input$ms1_rt_win_unit)
+    conf$tolerance[["rt"]] = paste(input$ms1_rt_win,input$ms1_rt_win_unit)
-    conf$extract$missing_precursor_info <- input$missingprec
+    conf$extract$missing_precursor_info = input$missingprec
    conf
 }
@@ -365,26 +367,26 @@ input2conf_extract <- function(input,conf) {
 input2conf_prescreen <- function(input,conf) {
    conf$prescreen = list()
-    conf$prescreen[["ms1_int_thresh"]] <- input$ms1_int_thresh
+    conf$prescreen[["ms1_int_thresh"]] = input$ms1_int_thresh
-    conf$prescreen[["ms2_int_thresh"]] <- input$ms2_int_thresh
+    conf$prescreen[["ms2_int_thresh"]] = input$ms2_int_thresh
-    conf$prescreen[["s2n"]] <- input$s2n
+    conf$prescreen[["s2n"]] = input$s2n
-    conf$prescreen[["ret_time_shift_tol"]] <- paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
+    conf$prescreen[["ret_time_shift_tol"]] = paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
    conf
 }
 input2conf_figures <- function(input,conf) {
    conf$figures = list()
-    conf$figures$rt_min <- paste(input$plot_rt_min,input$plot_rt_min_unit)
+    conf$figures$rt_min = paste(input$plot_rt_min,input$plot_rt_min_unit)
-    conf$figures$rt_max <- paste(input$plot_rt_max,input$plot_rt_max_unit)
+    conf$figures$rt_max = paste(input$plot_rt_max,input$plot_rt_max_unit)
-    conf$figures$ext <- input$plot_ext
+    conf$figures$ext = input$plot_ext
    conf
 }
 input2conf_report <- function(input,conf) {
    conf$report = list()
-    conf$report$author <- input$rep_aut
+    conf$report$author = input$rep_aut
-    conf$report$title <- input$rep_tit
+    conf$report$title = input$rep_tit
    conf
 }
@@ -401,7 +403,7 @@ input2conf_metfrag <- function(input,conf) {
                          MetFragPreProcessingCandidateFilter = paste(input$mf_pre_processing_candidate_filter,collapse=","),
                          MetFragPostProcessingCandidateFilter = paste(input$mf_post_processing_candidate_filter,collapse=","))
-    ## TODO: FIXME: We need to move away from unit weights from some
+    ## TODO: FIXME: We need to move away from unit weights at some
    ## point. This needs some extra widgets (Sigh!). 
    insc = sapply(input$mf_scores_intrinsic,function(x) 1.0)
    names(insc) = input$mf_scores_intrinsic
@@ -425,7 +427,7 @@ app_update_conf <- function(input,gui,envopts,fconf,m) {
        fstr = paste0("input2conf_",fstrp)
        m$conf = do.call(fstr,list(input,conf=m$conf))
    }
-    m$run <- new_runtime_state(project=gui$paths$project,
+    m$run = new_runtime_state(project=gui$paths$project,
                               envopts = envopts,
                               conf=m$conf)
    m
@@ -453,28 +455,24 @@ app_state2state <- function(input,gui,envopts,m=NULL) {
 }
 get_sets <- function(gui) {
-    ## TODO FIXME
+    fn_lists = file.path(gui$paths$project,gui$compounds$lists)
-    ## Think about this
+    cmpds = join_compound_lists(fn_lists)
-    ## fn_lists <- file.path(gui$paths$project,gui$compounds$lists)
+    cmpds = process_cmpd_sets(cmpds)
+    cmpds[,unique(set)]
-    ## df <- fread(file=fn_lists)
-    ## if (!
-    ## res = df[,unique(set)]
-    if (length(res)==0L) res = "ALL"
 }
 gen_dfiles_tab <- function(gui) {
-    curr_file <- gui$filetag$file
+    curr_file = gui$filetag$file
-    curr_tag <- gui$filetag$tag
+    curr_tag = gui$filetag$tag
-    res <- data.table(file=curr_file,tag=curr_tag)
+    res = data.table(file=curr_file,tag=curr_tag)
    res
 }
 gui2datatab <- function(gui) {
-    df <- data.table(tag=as.character(gui$datatab$tag),
+    df = data.table(tag=as.character(gui$datatab$tag),
                     adduct=as.character(gui$datatab$adduct),
                     set=as.character(gui$datatab$set),
                     file=as.character(gui$datatab$file))
@@ -493,55 +491,59 @@ gui2datatab <- function(gui) {
 ## (of tags, CEs) in the index.
 gen_cindex <- function(summ,sorder,cols = CINDEX_COLS,by. = CINDEX_BY) {
    if (NROW(summ) == 0L) return(NULL)
-    allc <- c(by.,cols)
+    allc = c(by.,cols)
-    xsumm <- summ[,..allc]
+    xsumm = summ[,..allc]
    setnames(xsumm,old="ms1_rt",new="rt",skip_absent=T)
-    res <- xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
+    res = xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
-    res <- res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(first(mz),
+    res = res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(mean(mz,na.rm=T),
-                                                         first(mean(rt)),
+                                                           mean(rt,na.rm=T),
-                                                         first(Name),
+                                                           first(Name),
-                                                         first(qlt_ms1),
+                                                           max(qlt_ms1,na.rm=T),
-                                                         first(qlt_ms2)),
+                                                           max(qlt_ms2,na.rm=T)),
               by=by.]
-    res <- res[,unique(.SD),by=by.]
+    res = res[,unique(.SD),by=by.]
-    sorder <- unique(sorder)
+    sorder = unique(sorder)
-    wna <- which(sorder=="nothing"); if (length(wna)>0L) sorder <- sorder[-wna]
+    wna = which(sorder=="nothing"); if (length(wna)>0L) sorder = sorder[-wna]
-    quality <- which("quality"==sorder)
+    quality = which("quality"==sorder)
    if (length(quality)>0L) {
-        pre <- head(sorder,quality-1L)
+        pre = head(sorder,quality-1L)
-        post <- tail(sorder,-quality)
+        post = tail(sorder,-quality)
-        sorder <- c(pre,"qlt_ms1","qlt_ms2",post)
+        sorder = c(pre,"qlt_ms1","qlt_ms2",post)
    }
-    ord <- rep(1L,length(sorder))
+    ord = rep(1L,length(sorder))
    if ("qlt_ms1" %in% sorder) {
-        ind <- which(sorder %in% c("qlt_ms1","qlt_ms2"))
+        ind = which(sorder %in% c("qlt_ms1","qlt_ms2"))
-        ord[ind] <- -1L
+        ord[ind] = -1L
    }
    if (length(sorder)>0) setorderv(res,cols=sorder,order=ord)
-    setnames(res,old="rt",new="rt(ms1)")
+    ## Remove confusing columns.
+    res[,c("rt","mz"):=NULL]
+    res[,c("qlt_ms1","qlt_ms2"):=.(signif(100*qlt_ms1/10.,3),signif(100*qlt_ms2/10.,3))]
+    setnames(res,c("qlt_ms1","qlt_ms2"),c("best score (ms1)","best score (ms2)"))
    res
 }
 cindex_from_input <- function(clabs,sort_catg=character(4),summ) {
-    grp <- if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
+    grp = if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
-    sorder <- setdiff(sort_catg,clabs)
+    sorder = setdiff(sort_catg,clabs)
    gen_cindex(summ,sorder=sorder,by=grp)
 }
 get_cindex_key <- function(cindex) {
    ## Select only valid category names.
-    x <- which(CINDEX_BY %in% names(cindex))
+    x = which(CINDEX_BY %in% names(cindex))
    CINDEX_BY[x]
 }
 get_cindex_parents <- function(summ,ckey,kvals,labs) {
    ## Get kvals part of summ.
-    tab <- summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
+    tab = summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
    tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcol=labs]
-    keys <- names(tab)[names(tab)!="item"]
+    keys = names(tab)[names(tab)!="item"]
    data.table::setkeyv(tab,keys)
    tab
 }
@@ -550,28 +552,28 @@ get_cindex_kval <- function(cindex,row,key) {
    ## Accounting for not fully initialised state.
    if (!is.numeric(row) || is.na(row) || length(key)==0L || is.na(key) || NROW(cindex)==0L) return(NULL)
-    rowtab <- cindex[(row),..key]
+    rowtab = cindex[(row),..key]
-    res <- lapply(rowtab,function (x) x[[1]])
+    res = lapply(rowtab,function (x) x[[1]])
-    names(res) <- key
+    names(res) = key
    res
 }
-get_summ_subset <- function(summ,ptab,paritem,kvals) {
+get_summ_subset <- function(db,summ,ptab,paritem,kvals) {
-    select <- ptab[item==(paritem)]
+    select = ptab[item==(paritem)]
-    tab <- get_data_from_key(summ,kvals)[select,nomatch=NULL,on=key(ptab)]
+    tab = get_data_from_key(db=db,tab=summ,kvals=kvals)[select,nomatch=NULL,on=key(ptab)]
-    if ("an.1" %in% names(tab)) tab[,an.1:=NULL] #TODO: This is
+    if ("scan.1" %in% names(tab)) tab[,scan.1:=NULL] #TODO: This is
                                                 #probably a lousy
                                                 #hack.
    tab
 }
-get_ltab <- function(summ_subs,cols=c("an","ms2_rt")) {
+get_ltab <- function(summ_subs,cols=c("scan","ms2_rt")) {
-    tab <- summ_subs
+    tab = summ_subs
    if (NROW(tab)==1L && is.na(tab$an)) return(data.table::data.table(item=character()))
-    tab[is.na(ms2_sel),ms2_sel:=F] #TODO FIXME: Check why NAs exist at all?
+    tab[is.na(ms2_sel),ms2_sel:=F]
    tab[,passval:=fifelse(qa_pass==T,"OK","BAD")]
    tab[ms2_sel==T,passval:="SELECTED"]
-    res <- tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
+    res = tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
    data.table::setkey(res,"ms2_rt")
    res
 }
@@ -585,18 +587,18 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
    n_ms2_sel = input$chg_ms2sel
-    sel_par <- input$sel_parent_trace
+    sel_par = input$sel_parent_trace
-    sel_spec <- input$sel_spec
+    sel_spec = input$sel_spec
-    pkvals <- ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
+    pkvals = ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
-    lkvals <- ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
+    lkvals = ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
-    kvals <- c(as.list(pkvals),as.list(lkvals))
+    kvals = c(as.list(pkvals),as.list(lkvals))
-    kvals <- kvals[unique(names(kvals))]
+    kvals = kvals[unique(names(kvals))]
    if ('an' %in% names(kvals) && n_ms2_sel) {
-        rkvals <- kvals[!(names(kvals) %in% 'an')]
+        rkvals = kvals[!(names(kvals) %in% 'an')]
-        rktab <- tabkey(summ,kvals=rkvals)
+        rktab = tabkey(summ,kvals=rkvals)
-        tabsel <- summ[rktab,.(an,ms2_sel)]
+        tabsel = summ[rktab,.(scan,ms2_sel)]
-        ansel <- tabsel[ms2_sel == T,an]
+        ansel = tabsel[ms2_sel == T,scan]
        print('ansel')
        print(ansel)
        if (length(ansel)!=0) {
@@ -607,29 +609,29 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
    }
-    tgts <- c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
+    tgts = c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
-    srcs <- c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
+    srcs = c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
-    the_row <- tabkey(summ,kvals=kvals)
+    the_row = tabkey(summ,kvals=kvals)
    summ[the_row,(tgts):=..srcs]
-    summ[,an.1:=NULL] #FIXME: an.1 pops up somewhere.
+    summ[,scan.1:=NULL]
-    qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
+    qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
    summ[the_row,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
    summ
 }
 get_mprop_ms2_metadata <- function(ltab_entry) {
- res <- list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
+ res = list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
 if (NROW(ltab_entry)==0L) return(res)
 res$rt = ltab_entry$ms1_rt
 res$int = ltab_entry$ms1_int
- z <- ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
+ z = ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
- lqa_vals <- as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
+ lqa_vals = as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
- qa_names <- names(lqa_vals)
+ qa_names = names(lqa_vals)
- res$qa <- qa_names[as.logical(lqa_vals)]
+ res$qa = qa_names[as.logical(lqa_vals)]
 res$ms2_sel = ltab_entry$ms2_sel
 res

--- a/R/shiny-ui-base.R
+++ b/R/shiny-ui-base.R
@@ -564,7 +564,7 @@ mk_shinyscreen_server <- function(projects,init) {
        res = if (NROW(sel_dt)>0) {
-                   tab[sel_dt,..coln,on=c("adduct","tag","ID","an")]
+                   tab[sel_dt,..coln,on=c("adduct","tag","ID","scan")]
               } else triv
        data.table::setnames(res,"intensity","ms2_int")
        res
@@ -700,8 +700,6 @@ mk_shinyscreen_server <- function(projects,init) {
        ## REACTIVE FUNCTIONS: COMPOUND INDEX
        rf_get_cindex <- reactive({
-            ## TODO: FIXME: Uncomment after rearranging everything.
-            ## input$cmt_changes_b
            rvs$status$is_qa_stat
            s1 = input$sort1
            s2 = input$sort2
@@ -741,8 +739,8 @@ mk_shinyscreen_server <- function(projects,init) {
        rf_get_cindex_parents <- reactive({
            rvs$m
            isolate({
-                ms1 = rvs$m$extr$ms1
+                ms1 = rvs$m$db$extr$cgm$ms1
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
                summ = req(rvs$m$out$tab$summ)
            })
@@ -760,7 +758,8 @@ mk_shinyscreen_server <- function(projects,init) {
            kvals = req(rf_get_cindex_kval())
            ptab = rf_get_cindex_parents()
            if (isTruthy(parent)) {
-                get_summ_subset(summ=summ,
+                get_summ_subset(db=rvs$m$db,
+                                summ=summ,
                                ptab=ptab,
                                paritem=parent,
                                kvals=kvals)
@@ -813,20 +812,23 @@ mk_shinyscreen_server <- function(projects,init) {
        rf_plot_eic_ms1 <- reactive({
            isolate({
-                ms1 = rvs$m$extr$ms1
+                ms1 = rvs$m$db$extr$cgm$ms1
                summ = rvs$m$out$tab$summ
+                db = rvs$m$db
            })
            req(NROW(summ)>0L)
            req(NROW(ms1)>0L)
-            p = make_eic_ms1_plot(ms1,summ,kvals=rf_get_cindex_kval(),
+            p = make_eic_ms1_plot(db=db,
-                                   labs=rf_get_cindex_labs(),
+                                  summ,
-                                   asp=PLOT_EIC_ASPECT,
+                                  kvals=rf_get_cindex_kval(),
-                                   rt_range=rf_get_rtrange(),
+                                  labs=rf_get_cindex_labs(),
-                                   i_range=rf_get_irange(),
+                                  asp=PLOT_EIC_ASPECT,
-                                   colrdata = rf_colrdata())
+                                  rt_range=rf_get_rtrange(),
+                                  i_range=rf_get_irange(),
+                                  colrdata = rf_colrdata())
            p = if (!is.null(p)) p else empty_plot("Nothing to plot")
@@ -851,12 +853,13 @@ mk_shinyscreen_server <- function(projects,init) {
            gg = rf_plot_eic_ms1()
            rt_rng = range(gg$data$rt)
-            p = make_eic_ms2_plot(summ,
+            p = make_eic_ms2_plot(rvs$m$db,
-                                   kvals=rf_get_cindex_kval(),
+                                  summ,
-                                   labs=rf_get_cindex_labs(),
+                                  kvals=rf_get_cindex_kval(),
-                                   rt_range = rf_get_ms2_eic_rtrange(),
+                                  labs=rf_get_cindex_labs(),
-                                   asp=PLOT_EIC_ASPECT,
+                                  rt_range = rf_get_ms2_eic_rtrange(),
-                                   colrdata=rf_colrdata())
+                                  asp=PLOT_EIC_ASPECT,
+                                  colrdata=rf_colrdata())
            p = if (!is.null(p)) p else empty_plot("Nothing to plot")
@@ -878,15 +881,15 @@ mk_shinyscreen_server <- function(projects,init) {
        rf_plot_spec_ms2 <- reactive({
            isolate({
                summ = rvs$m$out$tab$summ
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
            })
            req(NROW(summ)>0L)
            req(NROW(ms2)>0L)
-            p = make_spec_ms2_plot(ms2,
+            p = make_spec_ms2_plot(db = rvs$m$db,
-                                    summ,
+                                   summ,
-                                    kvals=req(rf_get_cindex_kval()),
+                                   kvals=req(rf_get_cindex_kval()),
-                                    labs=req(rf_get_cindex_labs()),
+                                   labs=req(rf_get_cindex_labs()),
-                                    colrdata=rf_colrdata())
+                                   colrdata=rf_colrdata())
            p = if (!is.null(p)) p else empty_plot("Nothing to plot")
            p
@@ -905,29 +908,35 @@ mk_shinyscreen_server <- function(projects,init) {
                              choices = list.files(path=top_data_dir,
                                                   pattern = CMPD_LIST_PATT))
-            updateSelectInput(session = session,
-                              inputId = "dfile_list",
-                              choices = list.files(path=top_data_dir,
-                                                   pattern = DFILES_LIST_PATT))
-            updateSelectInput(session = session,
-                              inputId = "top_data_dir_list",
-                              selected = basename(top_data_dir),
-                              choices = list.dirs(path = init$envopts$top_data_dir,
-                                                  full.names = F,
-                                                  recursive = F))
        })
        observe({
-            top_data_dir = rvs$gui$paths$data
+            data_dir = rvs$gui$paths$data
-            req(isTruthy(top_data_dir) && dir.exists(top_data_dir))
+            if ((isTruthy(data_dir) && dir.exists(data_dir))) {
+                dfchoices = list.files(path=data_dir,
+                                       pattern = DFILES_LIST_PATT)
+                updateSelectInput(session = session,
+                                  inputId = "dfile_list",
+                                  choices = dfchoices)
+                updateSelectInput(session = session,
+                                  inputId = "top_data_dir_list",
+                                  selected = basename(rvs$gui$paths$data),
+                                  choices = list.dirs(path = init$envopts$top_data_dir,
+                                                      full.names = F,
+                                                      recursive = F))
+            }
-            updateSelectInput(session = session,
-                              inputId = "dfile_list",
-                              choices = list.files(path=top_data_dir,
-                                                   pattern = DFILES_LIST_PATT))
        })
        ## Update projects and data directories every second.
        observeEvent(rtimer1000(),{
@@ -981,13 +990,15 @@ mk_shinyscreen_server <- function(projects,init) {
                rvs$status$ms2_int_thresh_stat = rvs$m$conf$prescreen[["ms2_int_thresh"]]
                rvs$status$s2n_stat = rvs$m$conf$prescreen[["s2n"]]
                rvs$status$ret_time_shift_tol_stat = rvs$m$conf$prescreen[["ret_time_shift_tol"]]
-                if (NROW(m$extr$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
+                if (NROW(m$db$extr$cgm$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
                if (NROW(m$out$tab$summ)>0L) rvs$status$is_qa_stat = "Yes."
            } else {
                message("Initialising project: ",wd)
                rvs$gui = create_gui(project_path=fullwd)
            }
            message("project: ",rvs$gui$project())
        }, label = "project-b")
@@ -1017,10 +1028,11 @@ mk_shinyscreen_server <- function(projects,init) {
        observeEvent(input$sel_data_dir_b,{
            data_dir = input$top_data_dir_list
-            req(isTruthy(data_dir))
+            if (isTruthy(data_dir)) {
-            rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
+                rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
+                message("Selected data dir:",rvs$gui$paths$data)
-            message("Selected data dir:",rvs$gui$paths$data)
+            }
        })
@@ -1045,32 +1057,9 @@ mk_shinyscreen_server <- function(projects,init) {
        observeEvent(input$datafiles_b,{
            new_file = input$dfile_list
-            if (isTruthy(new_file)) {
+            if (isTruthy(new_file)) rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
-                rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
+                                                                       new_file)
-                                                   new_file)
-                 ## curr_file = rvs$gui$datatab$file
-                ## curr_tag = rvs$gui$datatab$tag
-                ## curr_adduct = rvs$gui$datatab$adduct
-                ## curr_set = rvs$gui$datatab$set
-                ## res_file = union(curr_file,new_file)
-                ## res_adduct = c(curr_adduct,rep(NA_character_,nd))
-                ## res_set = c(curr_set,rep(NA_character_,nd))
-                ## rvs$gui$datatab$file = res_file
-                ## rvs$gui$datatab$tag = add_new_def_tag(as.character(rvs$gui$datatab$tag),nd)
-                ## rvs$gui$datatab$adduct = res_adduct
-                ## rvs$gui$datatab$set = res_set
-            }
-            updateSelectInput(session=session,
-                              inputId="dfile_list",
-                              selected=NULL)
        })
        observeEvent(input$rem_dfiles_b,{
@@ -1162,7 +1151,7 @@ mk_shinyscreen_server <- function(projects,init) {
                    rv_extr_flag(F)
                    rvs$m = run(m=rvs$m,
                                envopts=init$envopts,
-                                phases=c("setup","comptab","extract"))
+                                phases=c("setup","comptab","db","extract"))
                    rvs$status$is_extracted_stat = "Yes."
                    rvs$status$is_qa_stat = "No."
                    fn_c_state = file.path(rvs$m$run$paths$project,
@@ -1178,7 +1167,7 @@ mk_shinyscreen_server <- function(projects,init) {
        observeEvent(input$presc_b,{
-            if (NROW(rvs$m$extr$ms1)>0L) {
+            if (NROW(rvs$m$db$extr$cgm$ms1)>0L) {
                ## Update just prescreening conf.
                rvs$m = app_update_conf(input=input,
                                        gui=rvs$gui,
@@ -1310,9 +1299,10 @@ mk_shinyscreen_server <- function(projects,init) {
        observeEvent(input$make_report_b,{
            isolate({
-                ms1 = rvs$m$extr$ms1
+                ms1 = rvs$m$db$extr$cgm$ms1
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
                summ = rvs$m$out$tab$summ
+                db = rvs$m$db
            })
            req(NROW(summ)>0L)
@@ -1334,18 +1324,21 @@ mk_shinyscreen_server <- function(projects,init) {
                names(kvals) = key
                message('Compound index row: ',ri)
-                p1 = make_eic_ms1_plot(ms1,summ,kvals=kvals,
+                p1 = make_eic_ms1_plot(db=db,
-                                        labs=labs,
+                                       summ,
-                                        asp=PLOT_EIC_ASPECT,
+                                       kvals=kvals,
-                                        rt_range=rt_range,
+                                       labs=labs,
-                                        i_range=i_range,
+                                       asp=PLOT_EIC_ASPECT,
-                                        colrdata = colrdata) + theme_print()
+                                       rt_range=rt_range,
+                                       i_range=i_range,
-                p2 = make_eic_ms2_plot(summ,kvals=kvals,
+                                       colrdata = colrdata) + theme_print()
-                                        labs=labs,
-                                        asp=PLOT_EIC_ASPECT,
+                p2 = make_eic_ms2_plot(db=db,
-                                        rt_range=rt_range,
+                                       summ,kvals=kvals,
-                                        colrdata = colrdata) + theme_print()
+                                       labs=labs,
+                                       asp=PLOT_EIC_ASPECT,
+                                       rt_range=rt_range,
+                                       colrdata = colrdata) + theme_print()
@@ -1353,11 +1346,11 @@ mk_shinyscreen_server <- function(projects,init) {
                smi = rvs$m$out$tab$comp[ID==(id),SMILES][[1]]
                p_struc = make_struct_plot(smi)
-                p_spec = make_spec_ms2_plot(ms2,
+                p_spec = make_spec_ms2_plot(db=db,
-                                             summ,
+                                            summ,
-                                             kvals=kvals,
+                                            kvals=kvals,
-                                             labs=labs,
+                                            labs=labs,
-                                             colrdata=colrdata)+theme_print()
+                                            colrdata=colrdata)+theme_print()
                cmb = combine_plots(p1,p2,p_spec,p_struc)
                print(cmb)
@@ -1377,7 +1370,7 @@ mk_shinyscreen_server <- function(projects,init) {
            fn = file.path(projdir,input$ms2_spectra_tab_name)
            shinymsg(paste0("Saving MS2 spectra table to: ",basename(fn)))
            tab2file(pack_ms2_w_summ(rvs$m$out$tab$summ,
-                                    rvs$m$extr$ms2),
+                                    rvs$m$db$extr$spectra),
                     fn)
            shinymsg("Done saving MS2 spectra table.")
        })
@@ -1588,7 +1581,7 @@ mk_shinyscreen_server <- function(projects,init) {
        })
        output$comp_table = DT::renderDataTable({
-            ## TODO FIXME
+            ## TODO 
            ## cmpds = rf_get_cmpd_tab()
            ## validate(need(NROW(cmpds)>0,"No compound list loaded yet."))
            ## DT::datatable(cmpds,
@@ -1626,8 +1619,6 @@ mk_shinyscreen_server <- function(projects,init) {
                                        m=rvs$m)
                kv = rf_get_cindex_kval()
-                ## Some cols that might be needed to be specified
-                ## explicitely. FIXME TODO: Make this more robust.
                nsumm = mf_narrow_summ(rvs$m$out$tab$summ,kv,
                                       ms2_rt_i=input$mf_entry_rt_min,
                                       ms2_rt_f=input$mf_entry_rt_max)
@@ -1639,7 +1630,8 @@ mk_shinyscreen_server <- function(projects,init) {
                                       path = rvs$m$run$metfrag$path,
                                       subpaths = rvs$m$run$metfrag$subpaths,
                                       db_file = rvs$m$run$metfrag$db_file,
-                                       stag_tab = stagtab, ms2 = rvs$m$extr$ms2,
+                                       stag_tab = stagtab,
+                                       ms2 = rvs$m$db$extr$spectra,
                                       runtime=rvs$m$run$metfrag$runtime,
                                       java_bin=rvs$m$run$metfrag$java_bin,
                                       nproc = rvs$m$conf$metfrag$nproc)
@@ -1768,7 +1760,7 @@ mk_shinyscreen_server <- function(projects,init) {
            selMS2 = req(input$sel_spec)
            if (NROW(ms2tabsel)!=0L) {
                lval = lapply(ms2tabsel[item==(selMS2)],function(x) x)
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
                kval = rf_get_cindex_kval()
                allval = c(kval,lval)
                ## There can be some duplicates.
@@ -1778,10 +1770,11 @@ mk_shinyscreen_server <- function(projects,init) {
                #more than the names existing in extr$ms2. Also,
                #BASE_KEY_MS2 does not contain `an', so we need to readd
                #it.
-                key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"an"))
+                key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"scan"))
                kval2 = allval[key]
-                spec = get_data_from_key(ms2,kval2)[,.(mz,intensity)]
+                ## Only precid and scan can be used for selection in spectra.
-                ## as.character(lapply(1L:NROW(spec),function(nr) paste0(spec[nr,mz]," ",spec[nr,intensity])))
+                kval2 = as.data.table(kval2[c("precid","scan")])
+                spec = rvs$m$db$extr$spectra[kval2,.(mz,intensity),on=.(precid,scan)]
                print(as.data.frame(spec),row.names=F)
            } else {

--- a/R/state.R
+++ b/R/state.R
@@ -37,13 +37,12 @@ new_state <- function() {
 runtime_from_conf <- function(run,envopts,conf) {
    lst_cmpl <- conf$compounds$lists
-    lst_fn_cmpl <- lapply(names(lst_cmpl),function (nm) {
+    lst_fn_cmpl <- lapply(lst_cmpl,function (lst) {
-        bfn_cmpl <- lst_cmpl[[nm]]
+        bfn_cmpl <- lst
        fn <- file.path(run$paths$project,bfn_cmpl)
        if (!file.exists(fn)) stop("File ", fn, " does not exist in ", run$paths$project," .")
        fn
    })
-    names(lst_fn_cmpl) <- names(lst_cmpl)
    run$paths$compounds$lists <- lst_fn_cmpl
    run$paths$data = norm_path(file.path(envopts$top_data_dir,conf$paths$data))
@@ -95,7 +94,7 @@ new_runtime_state <- function(project,envopts,conf=NULL) {
            ## If MetFrag possible, set up the file structure.
            if (metfrag$cando_metfrag) {
                mfdir = file.path(project_path,"metfrag")
-                dir.create(mfdir)
+                dir.create(mfdir, showWarnings = F)
                metfrag$path=mfdir
                subpaths = list(results = "results",
                                config = "config",
@@ -204,7 +203,7 @@ new_project <- function(project,envopts,datatab=NULL,conf=NULL) {
                 check_file_absent(fn_conf,what="conf-file")
                 yaml::yaml.load_file(fn_conf)
             } else conf 
-    m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
+    ## m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
    m$run = new_runtime_state(project,envopts=envopts,conf=m$conf)
    if (!is.null(datatab)) {
        m$input$tab$mzml = datatab
@@ -409,9 +408,11 @@ pack_ms2_w_summ <- function(summ,ms2) {
    ## Takes summ, finds entries with high quality spectra and subsets ms2 based on that.
    ## Take the columns we need from summ.
-    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Formula","Name")]
+    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Name")]
-    mrg_keys = c(intersect(key(ms2),key(summ)),"an")
+    mrg_keys = intersect(key(ms2),key(summ))
-    ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
+    mrg_keys = c(mrg_keys,"scan")
+    ## ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
+    ms2[x,.(ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
 }
@@ -431,3 +432,40 @@ pack_project <- function(m,fn_arch) {
    })
    fn_arch
 }
+join_compound_lists <- function(fname) {
+    coll <- list()
+    fields <- colnames(EMPTY_CMPD_LIST)
+    coltypes <- c(ID="character",
+                  SMILES="character",
+                  Formula="character",
+                  Name="character",
+                  RT="numeric",
+                  mz="numeric")
+    l=0L
+    for (fn in fname) {
+        l = l + 1L
+        ## Figure out column headers.
+        nms <- colnames(file2tab(fn,nrows=0))
+        ## Read the table. Knowing column headers prevents unnecessary
+        ## warnings.
+        dt <- file2tab(fn, colClasses=coltypes[nms])
+        verify_cmpd_l(dt=dt,fn=fn)
+                                        # nonexist <- setdiff(fnfields,fields)
+        coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
+        coll[[l]]$ORIG <- fn
+    }
+    if (length(fname)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
+}
+process_cmpd_sets <- function(cmpdlist) {
+    if (nrow(cmpdlist)==0L) return(EMPTY_CMPD_LIST)
+    ## Process sets.
+    if (! ("set" %in% colnames(cmpdlist))) cmpdlist$set=NA_character_ else cmpdlist[,set:=as.character(set)]
+    slugs = cmpdlist[,.(slug=gen_fname_slug(ORIG)),by="ORIG"]
+    slugs[,slug:=uniqy_slugs(slug)]
+    cmpdlist[slugs,set:=fifelse(is.na(set),i.slug,set),on="ORIG"]
+    cmpdlist
+}
No results found