Compare revisions

Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić · Todor Kondić
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -44,9 +44,6 @@ variables:


 base-image:
-  tags:
-    - docker
-    - $RUNNER_TAG
  stage: dep_images
  rules:
    - if: $CI_COMMIT_TAG == ""

--- a/DESCRIPTION
+++ b/DESCRIPTION
 Package: shinyscreen
 Title: Pre-screening of Mass Spectrometry Data 
-Version: 1.3.19
+Version: 1.3.21
 Author: Todor Kondić
 Maintainer: Todor Kondić <todor.kondic@uni.lu>
 Authors@R: 
@@ -50,6 +50,7 @@ Collate:
    'errors.R'
    'mix.R'
    'envopts.R'
+    'data-model.R'
    'state.R'
    'metfrag.R'
    'plotting.R'

--- a/Docker_Documentation/Shinyscreen_Docker.Rmd
+++ b/Docker_Documentation/Shinyscreen_Docker.Rmd
+---
+title: "shinyscreen Docker Documentation"
+author: "Anjana Elapavalore"
+date: '`r Sys.Date()`'
+output: pdf_document
+---
+
+The objective of this documentation is to provide a detailed procedure for running the shinyScreen application within a Docker container. shinyScreen is a web-based tool developed for visualizing and analyzing high-throughput screening data using Shiny, a web application framework for R. Docker is employed to containerize the application, ensuring reproducibility and simplifying the deployment process by packaging the application with all its dependencies.
+
+This document provides step-by-step instructions on how to install Docker, pull the ShinyScreen Docker image, and run the application.
+
+## Prerequisites
+
+Before proceeding, the following prerequisites must be met:
+
+- Docker Desktop must be installed and properly configured on the system.
+- Sufficient system resources, including at least 4 GB of RAM, are recommended for optimal performance.
+
+## Step 1: Install Docker Desktop on Windows
+
+If Docker Desktop is not already installed on your system, follow these steps to set it up:
+
+1. **Download Docker Desktop**:
+   - Visit the [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/) download page and download the installer.
+   
+2. **Install Docker Desktop**:
+   - Run the downloaded installer.
+   - Follow the installation prompts, and make sure the option to "Use the WSL 2-based engine" is selected during setup (if available).
+   
+3. **Verify Docker Installation**:
+   - After installation, launch Docker Desktop.
+   - Ensure Docker is running by checking the Docker icon in the system tray (it should be active).
+   - Open a command prompt (CMD) or PowerShell and type the following command to check the Docker version:
+
+```bash
+
+docker --version
+
+```
+
+This should return the installed Docker version.
+
+## Step 2: Search for the ShinyScreen Docker Image in Docker Desktop
+
+Docker Desktop provides a graphical interface to search for Docker images. To search and pull the **shinyScreen** Docker image:
+
+1. **Open Docker Desktop**:
+   - Launch Docker Desktop from the Windows Start menu or by clicking the Docker icon in the system tray.
+
+2. **Search for the ShinyScreen Image**:
+   - In Docker Desktop, navigate to the **search** tab as shown in the image below.
+   
+   **Docker Image : anjanae/shinyscreen:v21_latest**
+   
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("images/searchtab.png")
+
+```
+
+3. **Pull the Image**:
+   - Click on the **Pull** button next to the desired ShinyScreen image. This will download the image to your local system.
+
+You can monitor the download progress in Docker Desktop’s **Images** tab.
+
+## Step 3: Run the ShinyScreen Docker Container
+
+Once the ShinyScreen image has been pulled successfully, the next step is to run the container. This can be done directly from Docker Desktop or via the command line:
+
+### Running the Container via Docker Desktop:
+
+1. **Navigate to the Images Tab**:
+   - In Docker Desktop, go to the **Images** tab where your downloaded ShinyScreen image will be listed.
+
+2. **Launch the Container**:
+   - Click the **Run** button next to the ShinyScreen image.
+   - In the pop-up dialog, ensure that port `3838` is exposed. To do this, add the following port configuration in the **Optional Settings**:
+     
+```
+Host Port: 3838
+Container Port: 3838
+
+```
+3. **To create the volume mounts**:
+
+  - To share files between the host and the container, you can use Docker’s volume mounting          feature.
+  
+### local Directory Structure
+
+Before launching the Docker container, the following directory structure should be set up on your local system. These directories will be mounted to specific locations inside the container, enabling the application to access the required input data and store the output files.
+
+### Root Directory
+
+The **root directory** is the top-level folder where the entire project is organized. This root directory will contain the following subdirectories:
+1. **Project Directory**
+2. **Data Directory**
+3. **MetFrag Databases Directory**
+
+Below is an overview of each subdirectory and its purpose.
+
+### Project Directory
+
+The **Project Directory** should initially contain only the **compound list** file, which is an essential input for the ShinyScreen application.
+
+After the application runs, this directory will also store various output files generated during the analysis.
+
+### Data Directory
+
+The **Data Directory** should contain only the **mzML files**. These files serve as the primary data input for the ShinyScreen application and represent mass spectrometry data in mzML format.
+
+### MetFrag Databases Directory
+
+The **MetFrag Databases Directory** must include the latest **PubChemLite_Exposomics.csv** file, which will be used by the ShinyScreen application for MetFrag analysis.
+
+
+### Configure Volume Mounts in Docker Desktop
+
+Now that the local directories are set up, you can configure **volume mounts** in Docker Desktop to map these local directories to the corresponding paths inside the Docker container.
+
+For the volume mounts, each directory on the host machine (local system) will be mapped to a specific location within the Docker container, allowing the application to access the input data and save output files.
+
+In the **Optional Settings** pop-up window, you will configure the following **three volume mounts** (Please refer to the table and image below):
+
+   - **Local Directory on Windows**: 
+     - This is the folder on your local machine where the data will reside.
+   - **Container Directory**: 
+     - This is the path inside the Docker container where the files will be mounted.
+
+
+| Local Directory (Host)         | Container Directory (Mount)            | Description                     |
+| ------------------------------ | -------------------------------------- | ------------------------------- |
+| `C:\path\to\rootdirectory`           | `/home/ssuser/projects`                | Directory for project files.     |
+| `C:\path\to\rootdirectory`       | `/home/ssuser/top_data_dir`            | Directory for top-level data.    |
+| `C:\path\to\metfrag_dbs`        | `/home/ssuser/metfrag_dbs`             | Directory for MetFrag databases. |
+
+```{r, echo=FALSE, out.width="70%"}
+knitr::include_graphics("images/runtab.png")
+
+```
+
+By setting up the volume mounts, any changes made to files in the mapped directories on the host machine will automatically reflect inside the container.
+
+Once the **port** and **volume configurations** are confirmed, click **Run** to start the container. The ShinyScreen application will now be running inside the container.
+
+Now click on the port mapping to access the application in the web browser
+
+```{r, echo=FALSE, out.width="85%"}
+knitr::include_graphics("images/containertab.png")
+
+```
+
+
+
--- a/Docker_Documentation/Shinyscreen_Docker.pdf
+++ b/Docker_Documentation/Shinyscreen_Docker.pdf
--- a/Docker_Documentation/images/containertab.png
+++ b/Docker_Documentation/images/containertab.png
--- a/Docker_Documentation/images/runtab.png
+++ b/Docker_Documentation/images/runtab.png
--- a/Docker_Documentation/images/searchtab.png
+++ b/Docker_Documentation/images/searchtab.png
--- a/Dockerfile
+++ b/Dockerfile
 FROM gitlab.lcsb.uni.lu:4567/eci/shinyscreen/dep/ssuser:latest
 MAINTAINER todor.kondic@uni.lu
-EXPOSE 5432
+
+EXPOSE 3838
+
 ENV SS_MF_DB="PubChemLite_exposomics.csv"
 ENV SS_CPU 2
+
 ADD . shinyscreen/
 RUN R CMD build shinyscreen
+
 USER root
 RUN R CMD INSTALL shinyscreen
+
 USER ssuser
 RUN cp shinyscreen/runme /home/ssuser/runme
 RUN chmod u+x /home/ssuser/runme
 # RUN chown ssuser /home/ssuser/runme
 # RUN chown -R ssuser /home/ssuser/shinyscreen
+
 RUN R -e 'library(shinyscreen);setwd("~");init(top_data_dir="~/top_data_dir",projects="~/projects",users_dir="~/users",metfrag_db_dir=Sys.getenv("SS_MF_DB_DIR"),metfrag_jar="/usr/local/bin/MetFragCommandLine.jar",no_structure_plots=T,save=T,merge=F)'
+
 ENTRYPOINT ["/home/ssuser/runme"]
 CMD ["app"]
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -6,7 +6,6 @@ export(conf_trans)
 export(create_plots)
 export(create_stub_gui)
 export(extr_data)
-export(extract)
 export(get_fn_comp)
 export(get_fn_conf)
 export(get_fn_extr)
@@ -47,8 +46,6 @@ export(sort_spectra)
 export(subset_summary)
 export(tk_save_file)
 import(data.table)
-importFrom(MSnbase,filterMz)
-importFrom(MSnbase,readMSData)
 importFrom(promises,"%...>%")
 importFrom(promises,future_promise)
 importFrom(shiny,HTML)

--- a/R/api.R
+++ b/R/api.R
@@ -32,6 +32,7 @@ run <- function(envopts,
    
    all_phases=list(setup=setup_phase,
                    comptab=mk_comp_tab,
+                    db=make_db,
                    extract=extr_data,
                    prescreen=prescreen,
                    sort=sort_spectra,
@@ -98,39 +99,14 @@ run_in_dir <- function(m) {
    
 }

-
-
-
-
 ##' @export
 load_compound_input <- function(m) {
-    coll <- list()
-    fields <- colnames(EMPTY_CMPD_LIST)
-    fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
-    message("fns:",paste0(fns,collapse=","))
-    coltypes <- c(ID="character",
-                  SMILES="character",
-                  Formula="character",
-                  Name="character",
-                  RT="numeric",
-                  mz="numeric")
-    for (l in 1:length(fns)) {
-        fn <- fns[[l]]
-
-        ## Figure out column headers.
-        nms <- colnames(file2tab(fn,nrows=0))
-
-        ## Read the table. Knowing column headers prevents unnecessary
-        ## warnings.
-        dt <- file2tab(fn, colClasses=coltypes[nms])
-        verify_cmpd_l(dt=dt,fn=fn)
-                                        # nonexist <- setdiff(fnfields,fields)
-        coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
-        coll[[l]]$ORIG <- fn
-        
-    }
-    cmpds <- if (length(fns)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST

+    fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
+    cmpds = join_compound_lists(fns)
+    ## Process sets.
+    cmpds = process_cmpd_sets(cmpds)
+    
    dups <- duplicated(cmpds$ID)
    dups <- dups | duplicated(cmpds$ID,fromLast = T)
    dupIDs <- cmpds$ID[dups]
@@ -219,8 +195,6 @@ mk_comp_tab <- function(m) {
    smiforadd <- smiles[smiforadd,.(ID,SMILES,Formula,adduct),on=c("SMILES")]
    data.table::setkey(smiforadd,"adduct","ID")

-    ## FIXME: Why is Formula a list when there are no SMILES, instead
-    ## of an empty string?
    smiforadd[,Formula:=as.character(Formula)]
    
    ## Update the intermediate table with masses.
@@ -336,181 +310,67 @@ mk_tol_funcs <- function(m) {
    
 }

-
 ##' @export
 extr_data <-function(m) {
-    message("Stage: extract")
-    if (is.null(m$conf$serial) || !m$conf$serial) {
-        extr_data_future(m)
-    } else {
-        message("(extract): Serial extraction.")
-        extr_data_serial(m)
-    }
-}

-extr_data_future <- function(m) {
-    ## Reduce the comp table to only unique masses (this is because
-    ## different sets can have same masses).
-    
-    m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
-    m$out$tab$data[,set:=NULL] #This column is meaningless now.
-    file <- m$out$tab$data[,unique(file)]
-    fpaths <- file.path(m$run$paths$data,file)
-    allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
-        z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
+    fine = create_fine_table(m)

-        
-        unique(MSnbase::collisionEnergy(z),fromLast=T)
-        
-    }))
-    allCEs <- unique(allCEs)
-    allCEs <- allCEs[!is.na(allCEs)]
-    cols <-paste('CE',allCEs,sep = '')
-    vals <- rep(NA,length(cols))
-    m$out$tab$data[,(cols) := .(rep(NA,.N))]
-    file <- m$out$tab$data[,unique(file)]
-    ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
-    fpaths <- file.path(m$run$paths$data,ftags[,file])
-    futuref <- m$future
-    tmp <- lapply(1:nrow(ftags),function(ii) {
-        fn <- fpaths[[ii]]
-        the_tag <- ftags[ii,tag]
-        message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
-        tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
-        ## err_ms1_eic <- m$extr$tol$eic
-        ## err_coarse_fun <- m$extr$tol$coarse
-        ## err_fine_fun <- m$extr$tol$fine
-        ## err_rt <- m$extr$tol$rt
-
-        err_coarse <- m$conf$tolerance[["ms1 coarse"]]
-
-
-        err_fine <- m$conf$tolerance[["ms1 fine"]]
+    dpath = m$run$paths$data

-        
-        err_ms1_eic <- m$conf$tolerance$eic 
-        
-        
-        err_rt <- m$conf$tolerance$rt
+    ## Open all files.
+    fns = fine[,unique(file)]
+    lms = lapply(fns,function(fn) read_data_file(file.path(dpath,fn)))
+    names(lms) = fns
+
+    ## Load all feature data.
+    lfdata = lapply(lms,get_fdata)
+    names(lfdata) = fns

-        missing_precursor_info <- m$conf$extract$missing_precursor_info
-        x <- futuref(extract(fn=fn,
-                             tag=the_tag,
-                             tab=tab,
-                             err_ms1_eic=err_ms1_eic,
-                             err_coarse = err_coarse,
-                             err_fine= err_fine,
-                             err_rt= err_rt,
-                             missing_precursors = missing_precursor_info),
-                     lazy = F)
+    ## Extract MS1 chromatograms using "fine" tolerance.
+    cgram_ms1 = fine[,extr_cgrams_ms1(lms[[file]],.SD,lfdata[[file]]),
+                     by="file",
+                     .SDcols=c("iso_fine_min",
+                               "iso_fine_max",
+                               "rt_min",
+                               "rt_max",
+                               "precid")]
+    setkey(cgram_ms1,file,precid)

-        x
+    ## Extract MS2 chromatograms.

-    })
+    ## Create the "coarse" table. Parent masses are known with
+    ## "coarse". We will prefilter our ms2 results based on that...
+    coarse = create_coarse_table(m)

-    msk <- sapply(tmp,future::resolved)
-    curr_done <- which(msk)
    
-    for (x in curr_done) {
-        message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
-    }
-    while (!all(msk)) {
-        msk <- sapply(tmp,future::resolved)
-        newly_done <- which(msk)
-        for (x in setdiff(newly_done,curr_done)) {
-            message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
-        }
-        Sys.sleep(0.5)
-        curr_done <- newly_done
+    cgram_ms2 = data.table(precid=integer(0),
+                           ce=numeric(0),
+                           scan=character(0),
+                           idx=integer(0),
+                           rt=numeric(0),
+                           intensity=numeric(0))
+
+
+    ## Extract MS2 spectra.
+    spectra = empty_spectra_table()
+
+    for (fn in names(lfdata)) {
+        rtab = relate_ms2_to_precid(coarse=coarse[.(fn),on=.(file)],
+                                    ms2=lfdata[[fn]]$ms2,
+                                    cgram_ms1=cgram_ms1[.(fn),
+                                                        on=.(file)])
+        sptab = extract_spectra(lms[[fn]],rtab)
+        cgram_ms2 = rbind(cgram_ms2,rtab)
+        spectra = rbind(spectra,sptab)
    }
-    
-    ztmp <- lapply(tmp,future::value)
-    m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
-    m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
-    data.table::setkeyv(m$extr$ms1,BASE_KEY)
-    data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
-
-    fn_ex <- get_fn_extr(m)
-    timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
-    saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
+    setkey(cgram_ms1,precid,rt)
+    setkey(cgram_ms2,precid,ce,rt)
+    setkey(spectra,precid,scan)
+    m$db$extr$cgm$ms1 = cgram_ms1
+    m$db$extr$cgm$ms2 = cgram_ms2
+    m$db$extr$spectra = spectra
    m
-    
-}
-
-
-extr_data_serial <- function(m) {
-    ## Reduce the comp table to only unique masses (this is because
-    ## different sets can have same masses).
-    
-    m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
-    m$out$tab$data[,set:=NULL] #This column is meaningless now.
-    file <- m$out$tab$data[,unique(file)]
-    fpaths <- file.path(m$run$paths$data,file)
-    allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
-        z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
-
-        
-        unique(MSnbase::collisionEnergy(z),fromLast=T)
-        
-    }))
-    allCEs <- unique(allCEs)
-    allCEs <- allCEs[!is.na(allCEs)]
-    cols <-paste('CE',allCEs,sep = '')
-    vals <- rep(NA,length(cols))
-    m$out$tab$data[,(cols) := .(rep(NA,.N))]
-    file <- file.path(m$run$paths$data,m$out$tab$data[,unique(file)])
-    ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
-    ftags[,path:=file.path(..m$run$paths$data,file)]
-    futuref <- m$future
-    tmp <- lapply(1:nrow(ftags),function(ii) {
-        fn <- ftags[ii,path]
-        the_tag <- ftags[ii,tag]
-        message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
-        tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
-        ## err_ms1_eic <- m$extr$tol$eic
-        ## err_coarse_fun <- m$extr$tol$coarse
-        ## err_fine_fun <- m$extr$tol$fine
-        ## err_rt <- m$extr$tol$rt
-
-        err_coarse <- m$conf$tolerance[["ms1 coarse"]]
-
-
-        err_fine <- m$conf$tolerance[["ms1 fine"]]
-
-        
-        err_ms1_eic <- m$conf$tolerance$eic 
-        
-        
-        err_rt <- m$conf$tolerance$rt
-
-        missing_precursor_info <- m$conf$extract$missing_precursor_info
-       
-        x <- extract(fn=fn,
-                     tag=the_tag,
-                     tab=tab,
-                     err_ms1_eic=err_ms1_eic,
-                     err_coarse = err_coarse,
-                     err_fine= err_fine,
-                     err_rt= err_rt,
-                     missing_precursors = missing_precursor_info)
-
-        
-        message("Done extraction for ", x$ms1$tag[[1]])
-        x
-
-    })
-
-    ztmp <- tmp
-    m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
-    m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
-    data.table::setkeyv(m$extr$ms1,BASE_KEY)
-    data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))

-    fn_ex <- get_fn_extr(m)
-    timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
-    saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
-    m
-    
 }

 ##' @export
@@ -523,11 +383,10 @@ conf_trans <- function(conf) {
 prescreen <- function(m) {
    ## Top-level auto prescreening function.
    message("(prescreen): Start.")
-    ## confpres <- conf_trans_pres(m$conf$prescreen)
-    m$qa <- NULL
-    m$out$tab$summ <- NULL
-    m$qa <- analyse_extracted_data(m$extr,m$conf$prescreen)
-    m$out$tab$summ <- gen_summ(m$out$tab$comp,m$qa)
+    m$qa = NULL
+    m$out$tab$summ = NULL
+    m$qa = analyse_extracted_data(m$db,m$conf$prescreen)
+    m$out$tab$summ = gen_summ(m$db,m$qa,m$out$tab$comp)
    message("(prescreen): End.")
    m
 }
@@ -605,8 +464,8 @@ create_plots <- function(m) {

    ## Select the data nedeed for plotting.
    flt_summ <- m$out$tab$flt_summ
-    ms2 <- m$extr$ms2
-    ms1 <- m$extr$ms1
+    ms2 <- m$db$extr$cgm$ms2
+    ms1 <- m$db$extr$cgm$ms1
    rt_min <- rt_in_min(m$conf$figures$rt_min)
    rt_max <- rt_in_min(m$conf$figures$rt_max)
    keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
@@ -747,8 +606,8 @@ report <- function(m) {
    dir.create(REP_TOPDIR,recursive = T,showWarnings = F)
    header <- knitr::knit_expand(fn_header)
    flt_summ <- m$out$tab$reptab
-    ms2 <- m$extr$ms2
-    ms1 <- m$extr$ms1
+    ms2 <- m$db$extr$cgm$ms2
+    ms1 <- m$db$extr$cgm$ms1
    rt_min <- rt_in_min(m$conf$figures$rt_min)
    rt_max <- rt_in_min(m$conf$figures$rt_max)
    keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
@@ -922,7 +781,7 @@ metfrag <- function(m) {
                       path = m$run$metfrag$path,
                       subpaths = m$run$metfrag$subpaths,
                       db_file = m$run$metfrag$db_file,
-                       stag_tab = stagtab, ms2 = m$extr$ms2,
+                       stag_tab = stagtab, ms2 = m$db$extr$spectra,
                       runtime=m$run$metfrag$runtime,
                       java_bin=m$run$metfrag$java_bin,
                       nproc = m$conf$metfrag$nproc)
@@ -942,3 +801,9 @@ metfrag <- function(m) {

    m
 }
+
+make_db <- function(m) {
+    m = make_db_catalogue(m)
+    m = make_db_precursors(m)
+    m
+}
--- a/R/base.R
+++ b/R/base.R
@@ -18,7 +18,13 @@
 the_ifelse <- data.table::fifelse
 dtable <- data.table::data.table

-norm_path <- function(...) normalizePath(...,winslash='/')
+norm_path <- function(...) {
+    test = nchar(...) > 0L
+    res = character(length(test))
+    res[test] = normalizePath(...[test],winslash='/') 
+    res[!test] = ...[!test]
+    res
+}

 tab2file<-function(tab,file,...) {
    data.table::fwrite(x=tab,file=file,...)
@@ -143,3 +149,23 @@ uniqy_slugs <- function(slugs) {
    dt = data.table::data.table(slug=slugs)
    dt[,slug:=fifelse(rep(.N==1L,.N),slug,paste0(slug,"_",seq(1L,.N))),by="slug"]$slug
 }
+
+gen_val_unc <- function(x,dx) {
+    ## Doesn't work well for <=0.
+    p = floor(log10(x))
+    dp = floor(log10(dx))
+    ## Zero?
+    message("p ",p)
+    w = which(is.infinite(p))
+    p[w] = 0
+
+    ## Normalise x and dx.
+    main = x/10**p
+    unc = round(dx/10**dp,0)
+    place = p - dp
+    main = mapply(function (m,d) formatC(m,digits=d,format='f',flag="#"),main,place,USE.NAMES=F)
+    w = which(main=='10.')
+    main[w]='1'
+    p[w]=p[w]+1
+    paste0(main,"(",unc,") x 10^",p)
+}
--- a/R/data-model.R
+++ b/R/data-model.R
+#Copyright (C) 2023 by University of Luxembourg
+
+## Shinyscreen works of an internal relational database implemented
+## using `data.table' package. Implementation is here.
+
+
+make_db_catalogue <- function(m) {
+    ## Takes comprehensive database from state `m' and generates a
+    ## catalogue with a unique key. This catalogue is based on
+    ## inputs. Each entry in the catalogue corresponds to a single
+    ## target mass from a single experimental run.
+    res = m$out$tab$comp[,unique(.SD),.SDcols=c("set","tag","adduct","ID")]
+    res[,catid:=.I]
+    setkeyv(res,DB_CATALOGUE_KEY)
+    setindex(res,catid)
+    m$db$cat = res
+    m
+}
+
+
+
+merge_precid_4_isobars <- function(orig_precids,masses,up_masses) {
+    start = head(orig_precids,1L)
+    n = length(orig_precids)
+    precid = orig_precids
+    i = 1L
+    while (i < n) {
+        theprecid = orig_precids[[i]]
+        themz = masses[[i]]
+        mzup = up_masses[[i]]
+        w = which(masses[(i+1L):n]<mzup)
+        precid[(i+1L):n][w] = theprecid
+        i = i + length(w) + 1L
+    }
+    precid
+}
+
+make_db_precursors <- function(m) {
+    ## Generate masses and label isobars.
+
+    cat = m$db$cat
+    masses = m$out$tab$comp[cat,.(tag=tag,catid=catid,mz=mz,rt=rt),on=key(cat)]
+    setkey(masses,tag,mz)
+
+    ## Retention time.
+    tmp = get_val_unit(m$conf$tolerance[['rt']])
+    rttol = as.numeric(tmp[['val']])
+    rtunit = tmp[['unit']]
+    if (rtunit == "s") {
+        rttol = rttol/60.
+    } else if (rtunit != "min") {
+        stop('make_db_precursors: Unknown retention time unit.')
+    }
+    masses[!is.na(rt),`:=`(rt_min=rt-rttol,rt_max=rt+rttol)]
+    
+    ## Fine error.
+    tmp = get_val_unit(m$conf$tolerance[['ms1 fine']])
+    ms1tol = as.numeric(tmp[['val']])
+    ms1unit = tmp[['unit']]
+    if (ms1unit == "ppm") {
+        masses[,`:=`(mz_fine_min=mz-ms1tol*mz*1e-6,mz_fine_max=mz+ms1tol*mz*1e-6)]
+    } else if (ms1unit == "Da") {
+        masses[,`:=`(mz_fine_min=mz-ms1tol,mz_fine_max=mz+ms1tol)]
+    } else {
+        stop('make_db_precursors: Unknown mass unit (fine).')
+    }
+
+
+    ## Coarse error.
+    tmp = get_val_unit(m$conf$tolerance[['ms1 coarse']])
+    ms1tol = as.numeric(tmp[['val']])
+    ms1unit = tmp[['unit']]
+    if (ms1unit == "ppm") {
+        masses[,`:=`(mz_coarse_min=mz-ms1tol*mz*1e-6,mz_coarse_max=mz+ms1tol*mz*1e-6)]
+    } else if (ms1unit == "Da") {
+        masses[,`:=`(mz_coarse_min=mz-ms1tol,mz_coarse_max=mz+ms1tol)]
+    } else {
+        stop('make_db_precursors: Unknown mass unit (coarse).')
+    }
+
+    ## Assign "fine" isobars to same isocoarse number.
+    masses[,precid:=merge_precid_4_isobars(catid,mz,mz_fine_max),by="tag"]
+
+    ## Assign "coarse" isobars to same isocoarse number.
+    masses[,isocoarse:=merge_precid_4_isobars(catid,mz,mz_coarse_max),by="tag"]
+    
+
+    masses[,`:=`(iso_coarse_min=min(mz_coarse_min),
+                 iso_coarse_max=max(mz_coarse_max)),
+                 by=isocoarse]
+
+    masses[,`:=`(iso_fine_min=min(mz_fine_min),
+                 iso_fine_max=max(mz_fine_max)),
+                 by=precid]
+    setindex(masses,isocoarse,precid)
+    ## Add files.
+    filetab = m$input$tab$mzml[m$db$cat,
+                               .(catid=i.catid,file=file),
+                               on=c("set","tag"),nomatch=NULL]
+    masses[filetab,file:=i.file,on="catid"]
+    m$db$precursors = masses
+    m
+}
+
+
+empty_cgram_ms1 <- function(n=0L) {
+    r = data.table(file=character(n),
+                   cgmidx=integer(n),
+                   precid=integer(n),
+                   scan=integer(n),
+                   rt=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,rt)
+    r
+}
+
+empty_cgram_ms2 <- function(n=0L) {
+    r = data.table(precid=integer(n),
+                   ce=numeric(n),
+                   scan=character(n),
+                   idx=integer(n),
+                   rt=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,ce,idx)
+    r
+               
+}
+
+
+empty_spectra_table <- function(n=0L) {
+    r = data.table(precid=integer(n),
+                   scan=character(n),
+                   mz=numeric(n),
+                   intensity=numeric(n))
+    setkey(r,precid,scan)
+    r
+}
+
+
+summ_needs_from_cat <- function(cat) {
+    ## Catalogue columns.
+    cat
+}
+
+summ_needs_from_precursors <- function(res,precursors) {
+    ## Mass columns.
+    precursors[res,on=.(catid),.(precid,
+                                 mz,
+                                 set,
+                                 adduct,
+                                 tag,
+                                 ID,
+                                 mz_l=mz_fine_min,
+                                 mz_r=mz_fine_max),by=.EACHI]
+}
+
+summ_needs_from_qa <- function(res,qa) {
+    needs = qa[,.SD,.SDcols=c("precid",
+                              "ce",
+                              "scan",
+                              "ms1_rt",
+                              "ms1_int",
+                              "ms2_rt",
+                              "ms2_int",
+                              "ms1_mean",
+                              "ms2_sel",
+                              "qa_pass",
+                              "qa_ms1_exists",
+                              "qa_ms2_exists",
+                              "qa_ms1_good_int",
+                              "qa_ms1_above_noise",
+                              "qa_ms2_near",
+                              "qa_ms2_good_int",
+                              "qlt_ms1",
+                              "qlt_ms2")]
+
+    res = needs[res,on=.(precid),allow.cartesian=T]
+    ## TODO: additional processing?
+    res
+}
+
+summ_needs_from_comp <- function(res,comp) {
+    needs = comp[,.(set,ID,Name,SMILES)]
+    setkey(needs,set,ID)
+    res[needs,on=.(set,ID),`:=`(Name=i.Name,
+                                SMILES=i.SMILES)]
+}
+
+## This function creates `summ' table. 
+gen_summ <- function(db,qa,comp) {
+
+    ## Start with the basic things.
+    res = summ_needs_from_cat(db$cat)
+
+    ## Add masses and precids.
+    res = summ_needs_from_precursors(res,db$precursors)
+
+    ## Add qa columns.
+    res = summ_needs_from_qa(res,qa)
+
+    setkeyv(res,SUMM_KEY)
+
+    ## Add comp columns.
+    summ_needs_from_comp(res,comp)
+    
+}
--- a/R/extraction.R
+++ b/R/extraction.R
--- a/R/metfrag.R
+++ b/R/metfrag.R
@@ -36,7 +36,7 @@ metfrag_get_stag_tab <- function(summ) {
    ## Argument summ can be a subset of actual `summ' table.
    x = gen_1d_keytab(summ)
    data.table::setnames(x,old="key1d",new="stag")
-    res = x[summ,`:=`(CE=i.CE,ion_mz=mz)]
+    res = x[summ,`:=`(ce=i.ce,ion_mz=mz)]
    res
 }

@@ -54,18 +54,21 @@ get_mf_res_ext <- function(fn) {

 metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bin,nproc = 1L) {
    keys = intersect(colnames(stag_tab),colnames(ms2))
+    rms2 = ms2[stag_tab,on=keys,nomatch=NULL]
    message("Generating MetFrag configs.")
-    file_tab = ms2[stag_tab,{
+
+
+    file_tab = rms2[,{
        r = write_metfrag_config(param = ..param,
                                 path = ..path,
                                 subpaths = ..subpaths,
                                 db_file = ..db_file,
-                                 stag = stag,
-                                 adduct = adduct,
-                                 ion_mz = ion_mz,
+                                 stag = first(stag),
+                                 adduct = first(adduct),
+                                 ion_mz = first(ion_mz),
                                 spec = data.table(mz=mz,intensity=intensity))
-        c(r,stag = stag)
-    },by=.EACHI,on=keys]
+        c(r,stag = first(stag))
+    },keyby=keys]
    message("Done generating MetFrag configs.")

    withr::with_dir(path,{
@@ -89,33 +92,30 @@ metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bi

 mf_narrow_summ <- function(summ,kv,ms2_rt_i=NA_integer_,ms2_rt_f=NA_integer_) {
    skey = data.table::key(summ)
-    cols = c("adduct","tag","ID","CE","an","mz","qa_pass","ms2_rt")
-    nsumm = get_rows_from_summ(summ,kv,cols)
+    cols = union(names(skey),c("adduct","tag","ID","ce","precid","scan","mz","qa_pass","ms2_rt"))
+    dtkv = as.data.table(kv)
+    nsumm = summ[dtkv,on=names(kv),.SD,.SDcols=cols]
    nsumm = nsumm[qa_pass==T] # Those that make sense.
-    nsumm_key = union(SUMM_KEY,"ms2_rt")
+    nsumm_key = intersect(union(SUMM_KEY,"ms2_rt"),colnames(nsumm))
    data.table::setkeyv(nsumm,nsumm_key)
    
-    if (!is.na(ms2_rt_i)) {
-        nsumm = nsumm[ms2_rt>(ms2_rt_i)]
-    }
+    ms2_rt_i = if (!is.na(ms2_rt_i)) ms2_rt_i else 0.
+    ms2_rt_f = if (!is.na(ms2_rt_f)) ms2_rt_f else Inf

-    if (!is.na(ms2_rt_f)) {
-        nsumm = nsumm[ms2_rt<(ms2_rt_f)]
-    }
+    nsumm[ms2_rt > (ms2_rt_i) & ms2_rt < (ms2_rt_f)]

-    nsumm
 }

 get_metfrag_targets <- function(stag_tab,ms2) {
    ## Take the columns we need from summ.
    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz")]
-    mrg_keys = c(intersect(key(ms2),key(summ)),"an")
+    mrg_keys = c(intersect(key(ms2),key(summ)),"scan")
    x=ms2[x,.(CE=CE,ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
    ## Get column order so that `an' follows `CE'.
-    resnms = setdiff(mrg_keys,"an")
-    nms = union(union(resnms,"CE"),c("an","ion_mz","mz","intensity"))
+    resnms = setdiff(mrg_keys,"scan")
+    nms = union(union(resnms,"CE"),c("scan","ion_mz","mz","intensity"))
    data.table::setcolorder(x,neworder = nms)
-    setkeyv(x,unique(c(resnms,"CE","an")))
+    setkeyv(x,unique(c(resnms,"CE","scan")))
    x
    
 }
@@ -220,7 +220,7 @@ summarise_metfrag_results <- function(param,path,subpaths,cand_parameters,db_sco
    }

    .adapt_col_types <- function(x) {
-        x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)]
+        if (length(db_scores)>0) x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)] else x
    }

    .calc_basic_scores <- function(x) {

--- a/R/mix.R
+++ b/R/mix.R
@@ -191,7 +191,7 @@ gen_empty_summ <- function() {
 ##     ms2_cols <- intersect(colnames(qa_ms2),SUMM_COLS)
 ##     ms2_cols <- setdiff(ms2_cols,colnames(summ))
 ##     summ <- qa_ms2[summ,c(..comp_cols,..ms1_cols,..ms2_cols),on=BASE_KEY]
-##     data.table::setkeyv(summ,c(BASE_KEY_MS2,"an"))
+##     data.table::setkeyv(summ,c(BASE_KEY_MS2,"scan"))
 ##     summ[,qa_ms1_exists:=the_ifelse(!is.na(qa_ms1_good_int),T,F)]
 ##     summ[,qa_ms2_exists:=the_ifelse(!is.na(CE),T,F)]
 ##     summ[,qa_pass:=apply(.SD,1,all),.SDcols=QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]]
@@ -547,274 +547,6 @@ conf_trans_pres <- function(pres_list) {
    pres_list
 }

-## PRESCREENING
-
-create_qa_table <- function(extr,conf_presc) {
-    ## The first input argument is the extracted `ms`, table
-    ## containing MS1 and MS2 spectra. The argument `conf_presc` is
-    ## m$conf$prescreen, the prescreening parameters given in the conf
-    ## file.
-
-    ## The qa table is just a copy of ms with added quality control
-    ## columns QA_COLS.
-
-    ## The QA_FLAGS columns are flags specifying which properties of
-    ## compounds are known well, or not.
-
-    ## For each compound (mass) we ask the following questions:
-    ## qa_ms1_exists -- does the MS1 spectrum exist at all?
-    ## qa_ms2_exists -- do we have any MS2 spectra at all?
-    ## qa_ms1_above_noise -- is MS1 above the noise treshold?
-    ## qa_ms2_near -- is there any MS2 spectrum inside the tolerated
-    ## retention time window around the MS1 peak? That is, are we
-    ## non-RT-shifted?
-    ## qa_ms2_good_int -- Is there any MS2 spectral intensity greater
-    ## than the MS2 threshold and less than the MS1 peak?
-    ## qa_pass -- did the spectrum pass all the checks?
-    
-
-    ## The columns in QA_NUM_REAL are:
-    ## 
-    ## ms1_int -- the maximum intensity of MS1 spectrum over the
-    ## entire run;
-    ##
-    ## ms1_rt -- the retention time of the peak MS1.
-
-    ## The columns in QA_NUM_INT are:
-    ##
-    ## ms2_sel -- index of the selected MS2 spectrum; if not NA, the
-    ## associated spectrum passed all the checks (qa_pass == T); the
-    ## spectrum itself is in one of the member sublists of the `spec'
-    ## column. The integer `ms2_sel' is then the index of the spectrum
-    ## in that sublist.
-    ##
-    ## ms1_rt_ind -- TODO (but not important to end users).
-    
-    
-    qa <- list(prescreen=conf_presc)
-    
-    ## checks <- extr$ms2[,{
-    ## },keyby=BASE_KEY_MS2]
-    checks <- extr$ms2
-    checks[,(QA_FLAGS):=F]
-    ## message("checks:")
-    ## print(checks)
-    ## message("done checks")
-    checks[,(QA_NUM_INT):=NA_integer_]
-    checks[,(QA_NUM_REAL):=NA_real_]
-    setkeyv(checks,BASE_KEY_MS2)
-    qa$checks <- checks
-    qa
-}
-
-assess_ms1 <- function(m) {
-    qa <- m$qa
-    ms1 <- m$extr$ms1
-    ## Calculate auxiliary variables and indices.
-    qa_ms1 <- ms1[,.(ms1_rt_ind=which.max(intensity)),keyby=BASE_KEY]
-    qa_ms1 <- ms1[qa_ms1,.(ms1_rt_ind=ms1_rt_ind,
-                           ms1_int=intensity[[ms1_rt_ind]],
-                           ms1_rt=rt[[ms1_rt_ind]],
-                           ms1_mean=mean(intensity)),on=BASE_KEY,by=.EACHI]
-
-    
-    qa_ms1[,qa_ms1_good_int := ms1_int > qa$prescreen$ms1_int_thresh]
-    qa_ms1[,qa_ms1_above_noise := F]
-    qa_ms1[qa_ms1_good_int==T,qa_ms1_above_noise := .(ms1_int > qa$prescreen$s2n*ms1_mean)]
-    
-    
-    
-
-    
-    ## checks[(!qa_ms1_above_noise),c("qa_ms2_good_int","qa_ms2_near","qa_ms2_exists","qa_pass"):=F]
-    ## qa_ms1 <- check_ms1_noise(check_ms1(qa_ms1))
-    m$qa$ms1 <- qa_ms1
-    m
-}
-
-assess_ms2 <- function(m) {
-
-    presconf <- conf_trans_pres(m$conf$prescreen)
-
-    ms1 <- m$extr$ms1
-    ms2 <- m$extr$ms2
-    qa_ms1 <- m$qa$ms1
-    qa_ms2 <- ms2[qa_ms1[qa_ms1_above_noise==T],.(CE=unique(CE),
-                                                  pc_rt=i.ms1_rt,
-                                                  pc_int=i.ms1_int,
-                                                  an=unique(an)),on=BASE_KEY,by=.EACHI,nomatch=NULL]
-
-    rt_win2 <- presconf$ret_time_shift_tol
-    qa_ms2 <- ms2[qa_ms2,.(pc_rt=pc_rt,
-                           pc_int=pc_int,
-                           ms2_int=max(intensity),
-                           ms2_rt=unique(rt),
-                           qa_ms2_near=head(rt,1) < pc_rt + rt_win2 & head(rt,1) > pc_rt - rt_win2),
-                  by=.EACHI,on=c(BASE_KEY_MS2,"an")]
-
-    qa_ms2$qa_ms2_good_int <-F
-    qa_ms2[qa_ms2_near==T,
-           qa_ms2_good_int := ms2_int > presconf$ms2_int_thresh & ms2_int < pc_int,
-           by=c(BASE_KEY_MS2,"an")]
-
-
-    ## qa_ms2$qa_pass <- F
-    ## qa_ms2[qa_ms2_good_int==T,qa_pass:=T]
-    qa_ms2$ms2_sel <- F
-    qa_ms2[qa_ms2_good_int==T,ms2_sel:={
-        ind<-which.min(abs(ms2_rt-pc_rt))
-        z<-ms2_sel
-        z[[ind]]<-T
-        z
-    },by=BASE_KEY_MS2]
-    setkeyv(qa_ms2,BASE_KEY_MS2)
-    m$qa$ms2 <- qa_ms2
-    m    
-}
-
-## Analyze extracted data.
-analyse_extracted_data <- function(extr,prescreen_param) {
-    ms1 <- extr$ms1
-    ms2 <- extr$ms2
-    ## Parameters.
-    presconf <- conf_trans_pres(prescreen_param)
-    rt_shift <- presconf$ret_time_shift_tol
-    det_ms2_noise <- presconf$det_ms2_noise
-    ms2_int_thresh <- presconf$ms2_int_thresh
-    ms1_int_thresh <- presconf$ms1_int_thresh
-
-    ## Detect MS2 noise.
-    ms2_clc_ns <- if (det_ms2_noise) {
-                      if (ms2_int_thresh>0) {
-                          warning("Ignore user specified ms2_int_thresh value and autodetect noise instead.")
-                      }
-                      ms2[,ms2_thr:=0.33333333*mean(intensity),by="tag"]
-
-                          
-                  } else {
-                      ms2[,ms2_thr:=ms2_int_thresh]
-                      
-                  }
-
-    ## message('ms2_clc_ns:',key(ms2_clc_ns))
-    ## ms2_clc_ns <- ms2_clc_ns[intensity>ms2_thr]
-
-
-    ## We drop mz info.
-    tab_ms2 <- ms2_clc_ns[,.(ms2_rt=first(rt),ms2_int=max(intensity),ms2_thr=first(ms2_thr)),by=c(BASE_KEY_MS2,'an')]
-    tab_ms2[,qa_ms2_good_int:=ms2_int>ms2_thr,by="an"]
-    data.table::setkeyv(tab_ms2,BASE_KEY_MS2)
-    tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
-
-    ## Get mean ms1 value.
-    tab_ms1 <- extr$ms1
-    tab_ms1_mean <- tab_ms1[,.(ms1_mean=mean(intensity)),keyby=BASE_KEY]
-
-    ## This function extracts intensity maxima on intervals given by
-    ## RT vectors rt_1 and rt_2.
-    find_ms1_max <- function(rt,intensity,rt_1,rt_2)
-    {
-        mapply(function (rt_1,rt_2) {
-            rt_ival <- c(rt_1,rt_2)
-            intv <- findInterval(rt,rt_ival)
-            lintv = length(intv)
-            if (intv[1]==0L && intv[lintv] == 2L) {
-                pos = match(c(1L,2L),intv)
-            } else if (intv[1]==1L && intv[lintv]!=1L) {
-                pos = c(1L,match(2L,intv))
-            } else if (intv[1]==0L && intv[lintv]!=0L) {
-                pos = c(match(1L,intv),lintv)
-            } else {
-                pos = c(1L,lintv)
-            }
-            pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
-            c(rt[pmax],intensity[pmax])
-        }, rt_1, rt_2, USE.NAMES=F)
-        
-    }
-
-    ## Perform MS1 maxima calculation in the neighbourhood of each
-    ## MS2 result.
-    tmp = tab_ms1[tab_ms2,{
-        xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
-        .(an=i.an,
-          ms1_rt = xx[1,],
-          ms1_int = xx[2,])
-    },by=.EACHI, nomatch=NULL]
-    ## Calculate QA values.
-    tab_ms2[tmp,c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int),on=c(BASE_KEY,'an')]
-    tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
-    tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
-    tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
-               qa_ms1_above_noise=F,
-               qa_ms2_near=F)]
-
-    ## TODO: I wonder if so stupidly auto-calculated ms1 noise should
-    ## be taken into account at all? My recollection from earlier
-    ## times was that it was helpful, at least sometimes.
-    tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
-    tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
-    tab_ms2$qa_ms2_exists=T
-
-
-    ## Find MS1 with no corresponding MS2.
-    ms1key <- tab_ms1[,unique(.SD),.SDcol=BASE_KEY]
-    ms2key <- tab_ms2[,unique(.SD),.SDcol=BASE_KEY]
-    ms2key$mch=T
-    tabmatches <- ms2key[ms1key]
-    ms1woms2 <- tabmatches[is.na(mch)][,mch:=NULL]
-
-    ## calculate the most intense peak, its location and the mean for
-    ## childless MS1.
-    tab_noms2 <- tab_ms1[ms1woms2,.(ms1_mean=mean(intensity),ms1_rt=rt[which.max(intensity)],ms1_int=max(intensity)),by=.EACHI,nomatch=NULL]
-
-    ## QA for the above (lazy qa ... take only the max peak into account).
-    tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
-
-    ## MS2 QA criteria all fail.
-    tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
-
-    ## Bind MS1-only and MS1/MS2 entries together.
-    res <- rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
-
-    ## Every single entry which was extracted has at least MS1.
-    res[,qa_ms1_exists:=T]
-    data.table::setkeyv(res,BASE_KEY)
-    
-    qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
-    res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
-    res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='an']
-    resby <- BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'an')]
-    res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
-    res[,ms2_sel:=F]
-    res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
-    res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
-                                                                 "qa_ms1_above_noise",
-                                                                 "qa_ms1_good_int")]
-    res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
-                                                                 "qa_ms2_near",
-                                                                 "qa_ms2_good_int")]
-    res
-}
-
-## Based on the `comprehensive' and `qa' tabs, greate `summ'.
-gen_summ <- function(comp,qa) {
-    comp_cols <- intersect(SUMM_COLS,colnames(comp))
-    rdcomp <- comp[,..comp_cols]
-    data.table::setkeyv(rdcomp,BASE_KEY)
-    summ <- qa[rdcomp,nomatch=F] #We changed `nomatch' cases from NA
-                                 #to F, because NA does not work well
-                                 #with X == F condition.
-    ## flgs <- c(QA_FLAGS,"ms2_sel")
-    ## summ[is.na(qa_ms1_exists),(flgs):=F]
-    data.table::setkeyv(summ,SUMM_KEY)
-    summ[.(F),c("qlt_ms1","qlt_ms2"):=0.,on="qa_ms1_exists"]
-    summ
-}
-
-
-
-
 gen_mz_err_f <- function(entry,msg) {
    eppm <- grab_unit(entry,"ppm")
    eda <- grab_unit(entry,"Da")

--- a/R/plotting.R
+++ b/R/plotting.R
--- a/R/resources.R
+++ b/R/resources.R
@@ -124,13 +124,14 @@ NUM_INP_HEIGHT="5%"

 ## Possible compound list fields
 EMPTY_CMPD_LIST = dtable(ID=character(),
-                          SMILES=character(),
-                          Name=character(),
-                          Formula=character(),
-                          RT=numeric(),
-                          mz=numeric(),
-                          known=character(),
-                          ORIG=character())
+                         SMILES=character(),
+                         Name=character(),
+                         Formula=character(),
+                         RT=numeric(),
+                         mz=numeric(),
+                         known=character(),
+                         set=character(),
+                         ORIG=character())
 COMP_LIST_COLS = c("ID","Name","SMILES","Formula","RT","mz")
 ## Comprehensive table properties
 COMP_NAME_MAP = list(RT="rt")
@@ -204,8 +205,8 @@ REPORT_TITLE = "Plots of EICs and MS2 Spectra"

 ## Select the most fundamental group of entries. Within this group,
 ## each ID is unique.
-BASE_KEY = c("adduct","tag","ID")
-BASE_KEY_MS2 = c(BASE_KEY,"CE","an")
+BASE_KEY = "precid"#c("adduct","tag","ID")
+BASE_KEY_MS2 = c("precid","ce","scan")#c(BASE_KEY,"CE","scan")

 FIG_DEF_CONF =list(grouping=list(group="adduct",
                                  plot="ID",
@@ -216,7 +217,7 @@ FIG_DEF_CONF =list(grouping=list(group="adduct",
 SUMM_COLS=c("set",BASE_KEY_MS2,"mz","ms1_rt", "ms1_int", "ms2_rt", "ms2_int",
            "ms1_mean","ms2_sel",QA_FLAGS,"Name", "SMILES", "Formula", "known","Comments","file")

-SUMM_KEY = c("set","ID","adduct","tag","an")
+SUMM_KEY = c("set","ID","adduct","tag","precid","catid","scan")

 PLOT_FEATURES = c("adduct",
                   "tag",
@@ -224,34 +225,34 @@ PLOT_FEATURES = c("adduct",

 ## Empty summary table.
 EMPTY_SUMM = data.table::data.table(set=character(0),
-                                     adduct=character(0),
-                                     tag=character(0),
-                                     ID=character(0),
-                                     CE=character(0),
-                                     an=integer(0),
-                                     mz=numeric(0),
-                                     ms1_rt=numeric(0),
-                                     ms1_int=numeric(0),
-                                     ms2_rt=numeric(0),
-                                     ms2_int=numeric(0),
-                                     ms1_mean=numeric(0),
-                                     ms2_sel=logical(0),
-                                     qa_pass=logical(0),
-                                     qa_ms1_exists=logical(0),
-                                     qa_ms2_exists=logical(0),
-                                     qa_ms1_good_int=logical(0),
-                                     qa_ms1_above_noise=logical(0),
-                                     qa_ms2_near=logical(0),
-                                     qa_ms2_good_int=logical(0),
-                                     Name=character(0),
-                                     SMILES=character(0),
-                                     Formula=character(0),
-                                     known=character(0),
-                                     Comments=character(0),
-                                     file=character(0))
+                                    adduct=character(0),
+                                    tag=character(0),
+                                    ID=character(0),
+                                    CE=character(0),
+                                    scan=character(0),
+                                    mz=numeric(0),
+                                    ms1_rt=numeric(0),
+                                    ms1_int=numeric(0),
+                                    ms2_rt=numeric(0),
+                                    ms2_int=numeric(0),
+                                    ms1_mean=numeric(0),
+                                    ms2_sel=logical(0),
+                                    qa_pass=logical(0),
+                                    qa_ms1_exists=logical(0),
+                                    qa_ms2_exists=logical(0),
+                                    qa_ms1_good_int=logical(0),
+                                    qa_ms1_above_noise=logical(0),
+                                    qa_ms2_near=logical(0),
+                                    qa_ms2_good_int=logical(0),
+                                    Name=character(0),
+                                    SMILES=character(0),
+                                    Formula=character(0),
+                                    known=character(0),
+                                    Comments=character(0),
+                                    file=character(0))

 ## Default sorting keys of spectra in the summary table
-DEF_KEY_SUMM = c(BASE_KEY_MS2,"an")
+DEF_KEY_SUMM = c(BASE_KEY_MS2,"scan")


 SUBSET_VALS = c(IGNORE="ignore",
@@ -360,3 +361,7 @@ METFRAG_RESULT_READF = list(csv = function(file,...) data.table::fread(file=file
                            xml = function(file,...) readxl::read_excel(path=file,...))

 METFRAG_DEFAULT_PROC = 1L
+
+
+## DATA MODEL
+DB_CATALOGUE_KEY = c("set","tag","adduct","ID")
--- a/R/shiny-state.R
+++ b/R/shiny-state.R
@@ -22,7 +22,7 @@
 #' @importFrom shiny selectInput numericInput textInput HTML


-GUI_SELECT_INPUTS <- c("proj_list",
+GUI_SELECT_INPUTS = c("proj_list",
                       "indir_list",
                       "ms1_coarse_unit",
                       "ms1_fine_unit",
@@ -30,7 +30,7 @@ GUI_SELECT_INPUTS <- c("proj_list",
                       "ret_time_shift_tol_unit",
                       "dfile_list")
                  
-GUI_NUMERIC_INPUTS <- c("ms1_coarse",
+GUI_NUMERIC_INPUTS = c("ms1_coarse",
                   "ms1_fine",
                   "ms1_eic",
                   "ms1_rt_win",
@@ -39,14 +39,14 @@ GUI_NUMERIC_INPUTS <- c("ms1_coarse",
                   "s2n",
                   "ret_time_shift_tol")

-GUI_TEXT_INPUTS <- c("rep_aut",
+GUI_TEXT_INPUTS = c("rep_aut",
                     "rep_tit")

-GUI_RADIO_INPUTS <- c("missingprec")
+GUI_RADIO_INPUTS = c("missingprec")



-GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
+GUI_ALL_INPUTS = c(GUI_SELECT_INPUTS,
                    GUI_NUMERIC_INPUTS,
                    GUI_TEXT_INPUTS,
                    GUI_RADIO_INPUTS)
@@ -55,8 +55,8 @@ GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
 add_new_def_tag <- function(old_tags,how_many) {
    ind = which(grepl(r"(^F\d+$)",old_tags))
    st_num = if (length(ind)>0L) {
-                    old_def_tags <- old_tags[ind]
-                    tag_nums <- gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
+                    old_def_tags = old_tags[ind]
+                    tag_nums = gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
                    max(as.integer(tag_nums))
                   
        
@@ -162,26 +162,26 @@ create_gui <- function(project_path=NA_character_) {
 #'@export
 r2datatab <- function(rdatatab) {
    shiny::isolate({
-        file <- rdatatab$file
-        adduct <- rdatatab$adduct
-        tag <- rdatatab$tag
-        set <- rdatatab$set
+        file = rdatatab$file
+        adduct = rdatatab$adduct
+        tag = rdatatab$tag
+        set = rdatatab$set
        })
-    if (length(file)==0L) file <- character(0)
-    if (length(adduct)==0L) adduct <- rep(NA_character_,length(file))
-    if (length(tag)==0L) tag <- rep(NA_character_,length(file))
-    if (length(set)==0L) tag <- rep(NA_character_,length(file))
+    if (length(file)==0L) file = character(0)
+    if (length(adduct)==0L) adduct = rep(NA_character_,length(file))
+    if (length(tag)==0L) tag = rep(NA_character_,length(file))
+    if (length(set)==0L) tag = rep(NA_character_,length(file))
    data.table(tag=tag,adduct=adduct,set=set,file=file)
 }


 r2filetag <- function(rfiletag) {
    shiny::isolate({
-        file <- rfiletag$file
-        tag <- rfiletag$tag
+        file = rfiletag$file
+        tag = rfiletag$tag
        })
-    if (length(file)==0L) file <- character(0)
-    if (length(tag)==0L) tag <- rep(NA_character_,length(file))
+    if (length(file)==0L) file = character(0)
+    if (length(tag)==0L) tag = rep(NA_character_,length(file))
    data.table(tag=tag,file=file)
 }

@@ -193,7 +193,7 @@ gen_dtab <- function(tablist,sets) {

 r2compounds <- function(rcompounds) {
    shiny::isolate({
-        cmpd_lists <- rcompounds$lists
+        cmpd_lists = rcompounds$lists
        })

    list(lists=cmpd_lists)
@@ -202,17 +202,17 @@ r2compounds <- function(rcompounds) {

 #' @export
 pack_app_state <- function(input, gui) {
-    pack <- list()
+    pack = list()
    shiny::isolate({
-        pack_inputs <- list()
-        pack_input_names <- which_gui_inputs(inputs)
-        pack_inputs <- shiny::reactiveValuesToList(input)[pack_input_names]
-        pack$input <- pack_inputs
-        pack$datatab <- r2datatab(gui$datatab)
-        pack$filetag <- r2filetag(gui$filetag)
-        pack$compounds <- r2compounds(gui$compounds)
-        pack$paths <- list()
-        pack$paths$data <- gui$paths$data
+        pack_inputs = list()
+        pack_input_names = which_gui_inputs(inputs)
+        pack_inputs = shiny::reactiveValuesToList(input)[pack_input_names]
+        pack$input = pack_inputs
+        pack$datatab = r2datatab(gui$datatab)
+        pack$filetag = r2filetag(gui$filetag)
+        pack$compounds = r2compounds(gui$compounds)
+        pack$paths = list()
+        pack$paths$data = gui$paths$data

        })
    pack
@@ -318,19 +318,21 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
                                 input=input,
                                 packed_state=packed_state)
        
-        gui <- create_gui(project_path=project_path)
-        gui$compounds$lists <- packed_state$compounds$lists
-        gui$datatab$file <- packed_state$datatab$file
-        gui$datatab$adduct <- packed_state$datatab$adduct
-        gui$datatab$tag <- packed_state$datatab$tag
-        gui$datatab$set <- packed_state$datatab$set
-
-        gui$filetag$file <- packed_state$filetag$file
-        gui$filetag$tag <- packed_state$filetag$tag
-
-        x <- packed_state$paths$data
-        gui$paths$data = if (length(x)>0 && nchar(x)>0) basename(x) else ""
-        if (!dir.exists(file.path(top_data_dir,gui$paths$data))) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
+        gui = create_gui(project_path=project_path)
+        gui$compounds$lists = packed_state$compounds$lists
+        gui$datatab$file = packed_state$datatab$file
+        gui$datatab$adduct = packed_state$datatab$adduct
+        gui$datatab$tag = packed_state$datatab$tag
+        gui$datatab$set = packed_state$datatab$set
+
+        gui$filetag$file = packed_state$filetag$file
+        gui$filetag$tag = packed_state$filetag$tag
+
+        x = packed_state$paths$data
+        gui$paths$data = if (length(x)>0 && nchar(x)>0) {
+                             file.path(top_data_dir,basename(x))
+                         } else ""
+        if (!dir.exists(gui$paths$data)) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
        gui
    })

@@ -340,12 +342,12 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac

 input2conf_setup <- function(input,gui,conf=list()) {
    if (length(conf)==0L) {
-        conf$compounds <- list()
-        conf$summary_table <- list()
-        conf$debug <- F
+        conf$compounds = list()
+        conf$summary_table = list()
+        conf$debug = F
    }

-    conf$compounds$lists <- gui$compounds$lists
+    conf$compounds$lists = gui$compounds$lists
    conf$paths$data = basename(gui$paths$data)
    conf
 }
@@ -353,11 +355,11 @@ input2conf_setup <- function(input,gui,conf=list()) {
 input2conf_extract <- function(input,conf) {
    conf$tolerance = list()
    conf$extract = list()
-    conf$tolerance[["ms1 fine"]] <- paste(input$ms1_fine,input$ms1_fine_unit)
-    conf$tolerance[["ms1 coarse"]] <- paste(input$ms1_coarse,input$ms1_coarse_unit)
-    conf$tolerance[["eic"]] <- paste(input$ms1_eic,input$ms1_eic_unit)
-    conf$tolerance[["rt"]] <- paste(input$ms1_rt_win,input$ms1_rt_win_unit)
-    conf$extract$missing_precursor_info <- input$missingprec
+    conf$tolerance[["ms1 fine"]] = paste(input$ms1_fine,input$ms1_fine_unit)
+    conf$tolerance[["ms1 coarse"]] = paste(input$ms1_coarse,input$ms1_coarse_unit)
+    conf$tolerance[["eic"]] = paste(input$ms1_eic,input$ms1_eic_unit)
+    conf$tolerance[["rt"]] = paste(input$ms1_rt_win,input$ms1_rt_win_unit)
+    conf$extract$missing_precursor_info = input$missingprec
    conf

 }
@@ -365,26 +367,26 @@ input2conf_extract <- function(input,conf) {

 input2conf_prescreen <- function(input,conf) {
    conf$prescreen = list()
-    conf$prescreen[["ms1_int_thresh"]] <- input$ms1_int_thresh
-    conf$prescreen[["ms2_int_thresh"]] <- input$ms2_int_thresh
-    conf$prescreen[["s2n"]] <- input$s2n
-    conf$prescreen[["ret_time_shift_tol"]] <- paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
+    conf$prescreen[["ms1_int_thresh"]] = input$ms1_int_thresh
+    conf$prescreen[["ms2_int_thresh"]] = input$ms2_int_thresh
+    conf$prescreen[["s2n"]] = input$s2n
+    conf$prescreen[["ret_time_shift_tol"]] = paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
    conf
 }

 input2conf_figures <- function(input,conf) {
    conf$figures = list()
-    conf$figures$rt_min <- paste(input$plot_rt_min,input$plot_rt_min_unit)
-    conf$figures$rt_max <- paste(input$plot_rt_max,input$plot_rt_max_unit)
-    conf$figures$ext <- input$plot_ext
+    conf$figures$rt_min = paste(input$plot_rt_min,input$plot_rt_min_unit)
+    conf$figures$rt_max = paste(input$plot_rt_max,input$plot_rt_max_unit)
+    conf$figures$ext = input$plot_ext
    conf
    
 }

 input2conf_report <- function(input,conf) {
    conf$report = list()
-    conf$report$author <- input$rep_aut
-    conf$report$title <- input$rep_tit
+    conf$report$author = input$rep_aut
+    conf$report$title = input$rep_tit
    conf

 }
@@ -401,7 +403,7 @@ input2conf_metfrag <- function(input,conf) {
                          MetFragPreProcessingCandidateFilter = paste(input$mf_pre_processing_candidate_filter,collapse=","),
                          MetFragPostProcessingCandidateFilter = paste(input$mf_post_processing_candidate_filter,collapse=","))

-    ## TODO: FIXME: We need to move away from unit weights from some
+    ## TODO: FIXME: We need to move away from unit weights at some
    ## point. This needs some extra widgets (Sigh!). 
    insc = sapply(input$mf_scores_intrinsic,function(x) 1.0)
    names(insc) = input$mf_scores_intrinsic
@@ -425,7 +427,7 @@ app_update_conf <- function(input,gui,envopts,fconf,m) {
        fstr = paste0("input2conf_",fstrp)
        m$conf = do.call(fstr,list(input,conf=m$conf))
    }
-    m$run <- new_runtime_state(project=gui$paths$project,
+    m$run = new_runtime_state(project=gui$paths$project,
                               envopts = envopts,
                               conf=m$conf)
    m
@@ -453,28 +455,24 @@ app_state2state <- function(input,gui,envopts,m=NULL) {
 }
    
 get_sets <- function(gui) {
-    ## TODO FIXME
-    ## Think about this
-    ## fn_lists <- file.path(gui$paths$project,gui$compounds$lists)
-
-    ## df <- fread(file=fn_lists)
-    ## if (!
-    ## res = df[,unique(set)]
-    if (length(res)==0L) res = "ALL"
+    fn_lists = file.path(gui$paths$project,gui$compounds$lists)
+    cmpds = join_compound_lists(fn_lists)
+    cmpds = process_cmpd_sets(cmpds)
+    cmpds[,unique(set)]
 }


 gen_dfiles_tab <- function(gui) {
-    curr_file <- gui$filetag$file
-    curr_tag <- gui$filetag$tag
+    curr_file = gui$filetag$file
+    curr_tag = gui$filetag$tag
    
-    res <- data.table(file=curr_file,tag=curr_tag)
+    res = data.table(file=curr_file,tag=curr_tag)
    res
    
 }

 gui2datatab <- function(gui) {
-    df <- data.table(tag=as.character(gui$datatab$tag),
+    df = data.table(tag=as.character(gui$datatab$tag),
                     adduct=as.character(gui$datatab$adduct),
                     set=as.character(gui$datatab$set),
                     file=as.character(gui$datatab$file))
@@ -493,55 +491,59 @@ gui2datatab <- function(gui) {
 ## (of tags, CEs) in the index.
 gen_cindex <- function(summ,sorder,cols = CINDEX_COLS,by. = CINDEX_BY) {
    if (NROW(summ) == 0L) return(NULL)
-    allc <- c(by.,cols)
-    xsumm <- summ[,..allc]
+    allc = c(by.,cols)
+    xsumm = summ[,..allc]
    setnames(xsumm,old="ms1_rt",new="rt",skip_absent=T)
-    res <- xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
-    res <- res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(first(mz),
-                                                         first(mean(rt)),
-                                                         first(Name),
-                                                         first(qlt_ms1),
-                                                         first(qlt_ms2)),
+    res = xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
+    res = res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(mean(mz,na.rm=T),
+                                                           mean(rt,na.rm=T),
+                                                           first(Name),
+                                                           max(qlt_ms1,na.rm=T),
+                                                           max(qlt_ms2,na.rm=T)),
               by=by.]
-    res <- res[,unique(.SD),by=by.]
+    res = res[,unique(.SD),by=by.]
   
-    sorder <- unique(sorder)
-    wna <- which(sorder=="nothing"); if (length(wna)>0L) sorder <- sorder[-wna]
-    quality <- which("quality"==sorder)
+    sorder = unique(sorder)
+    wna = which(sorder=="nothing"); if (length(wna)>0L) sorder = sorder[-wna]
+    quality = which("quality"==sorder)
    if (length(quality)>0L) {
-        pre <- head(sorder,quality-1L)
-        post <- tail(sorder,-quality)
-        sorder <- c(pre,"qlt_ms1","qlt_ms2",post)
+        pre = head(sorder,quality-1L)
+        post = tail(sorder,-quality)
+        sorder = c(pre,"qlt_ms1","qlt_ms2",post)
    }
-    ord <- rep(1L,length(sorder))
+    ord = rep(1L,length(sorder))

    if ("qlt_ms1" %in% sorder) {
-        ind <- which(sorder %in% c("qlt_ms1","qlt_ms2"))
-        ord[ind] <- -1L
+        ind = which(sorder %in% c("qlt_ms1","qlt_ms2"))
+        ord[ind] = -1L
    }
    if (length(sorder)>0) setorderv(res,cols=sorder,order=ord)
-    setnames(res,old="rt",new="rt(ms1)")
+
+    ## Remove confusing columns.
+    res[,c("rt","mz"):=NULL]
+    res[,c("qlt_ms1","qlt_ms2"):=.(signif(100*qlt_ms1/10.,3),signif(100*qlt_ms2/10.,3))]
+    setnames(res,c("qlt_ms1","qlt_ms2"),c("best score (ms1)","best score (ms2)"))
    res
 }


 cindex_from_input <- function(clabs,sort_catg=character(4),summ) {
-    grp <- if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
-    sorder <- setdiff(sort_catg,clabs)
+    grp = if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
+    sorder = setdiff(sort_catg,clabs)
    gen_cindex(summ,sorder=sorder,by=grp)
 }

 get_cindex_key <- function(cindex) {
    ## Select only valid category names.
-    x <- which(CINDEX_BY %in% names(cindex))
+    x = which(CINDEX_BY %in% names(cindex))
    CINDEX_BY[x]
 }

 get_cindex_parents <- function(summ,ckey,kvals,labs) {
    ## Get kvals part of summ.
-    tab <- summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
+    tab = summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
    tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcol=labs]
-    keys <- names(tab)[names(tab)!="item"]
+    keys = names(tab)[names(tab)!="item"]
    data.table::setkeyv(tab,keys)
    tab
 }
@@ -550,28 +552,28 @@ get_cindex_kval <- function(cindex,row,key) {
    ## Accounting for not fully initialised state.
    if (!is.numeric(row) || is.na(row) || length(key)==0L || is.na(key) || NROW(cindex)==0L) return(NULL)
    
-    rowtab <- cindex[(row),..key]
-    res <- lapply(rowtab,function (x) x[[1]])
-    names(res) <- key
+    rowtab = cindex[(row),..key]
+    res = lapply(rowtab,function (x) x[[1]])
+    names(res) = key
    res
 }

-get_summ_subset <- function(summ,ptab,paritem,kvals) {
-    select <- ptab[item==(paritem)]
-    tab <- get_data_from_key(summ,kvals)[select,nomatch=NULL,on=key(ptab)]
-    if ("an.1" %in% names(tab)) tab[,an.1:=NULL] #TODO: This is
+get_summ_subset <- function(db,summ,ptab,paritem,kvals) {
+    select = ptab[item==(paritem)]
+    tab = get_data_from_key(db=db,tab=summ,kvals=kvals)[select,nomatch=NULL,on=key(ptab)]
+    if ("scan.1" %in% names(tab)) tab[,scan.1:=NULL] #TODO: This is
                                                 #probably a lousy
                                                 #hack.
    tab
 }

-get_ltab <- function(summ_subs,cols=c("an","ms2_rt")) {
-    tab <- summ_subs
+get_ltab <- function(summ_subs,cols=c("scan","ms2_rt")) {
+    tab = summ_subs
    if (NROW(tab)==1L && is.na(tab$an)) return(data.table::data.table(item=character()))
-    tab[is.na(ms2_sel),ms2_sel:=F] #TODO FIXME: Check why NAs exist at all?
+    tab[is.na(ms2_sel),ms2_sel:=F]
    tab[,passval:=fifelse(qa_pass==T,"OK","BAD")]
    tab[ms2_sel==T,passval:="SELECTED"]
-    res <- tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
+    res = tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
    data.table::setkey(res,"ms2_rt")
    res
 }
@@ -585,18 +587,18 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {

    n_ms2_sel = input$chg_ms2sel
    
-    sel_par <- input$sel_parent_trace
-    sel_spec <- input$sel_spec
+    sel_par = input$sel_parent_trace
+    sel_spec = input$sel_spec

-    pkvals <- ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
-    lkvals <- ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
-    kvals <- c(as.list(pkvals),as.list(lkvals))
-    kvals <- kvals[unique(names(kvals))]
+    pkvals = ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
+    lkvals = ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
+    kvals = c(as.list(pkvals),as.list(lkvals))
+    kvals = kvals[unique(names(kvals))]
    if ('an' %in% names(kvals) && n_ms2_sel) {
-        rkvals <- kvals[!(names(kvals) %in% 'an')]
-        rktab <- tabkey(summ,kvals=rkvals)
-        tabsel <- summ[rktab,.(an,ms2_sel)]
-        ansel <- tabsel[ms2_sel == T,an]
+        rkvals = kvals[!(names(kvals) %in% 'an')]
+        rktab = tabkey(summ,kvals=rkvals)
+        tabsel = summ[rktab,.(scan,ms2_sel)]
+        ansel = tabsel[ms2_sel == T,scan]
        print('ansel')
        print(ansel)
        if (length(ansel)!=0) {
@@ -607,29 +609,29 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
        
    }

-    tgts <- c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
-    srcs <- c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
+    tgts = c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
+    srcs = c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))

-    the_row <- tabkey(summ,kvals=kvals)
+    the_row = tabkey(summ,kvals=kvals)
    summ[the_row,(tgts):=..srcs]
-    summ[,an.1:=NULL] #FIXME: an.1 pops up somewhere.
-    qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
+    summ[,scan.1:=NULL]
+    qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
    summ[the_row,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
    summ
 }


 get_mprop_ms2_metadata <- function(ltab_entry) {
- res <- list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
+ res = list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)

 if (NROW(ltab_entry)==0L) return(res)
 
 res$rt = ltab_entry$ms1_rt
 res$int = ltab_entry$ms1_int
- z <- ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
- lqa_vals <- as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
- qa_names <- names(lqa_vals)
- res$qa <- qa_names[as.logical(lqa_vals)]
+ z = ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
+ lqa_vals = as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
+ qa_names = names(lqa_vals)
+ res$qa = qa_names[as.logical(lqa_vals)]
 res$ms2_sel = ltab_entry$ms2_sel

 res

--- a/R/shiny-ui-base.R
+++ b/R/shiny-ui-base.R
@@ -564,7 +564,7 @@ mk_shinyscreen_server <- function(projects,init) {
        
        res = if (NROW(sel_dt)>0) {
                   
-                   tab[sel_dt,..coln,on=c("adduct","tag","ID","an")]
+                   tab[sel_dt,..coln,on=c("adduct","tag","ID","scan")]
               } else triv
        data.table::setnames(res,"intensity","ms2_int")
        res
@@ -700,8 +700,6 @@ mk_shinyscreen_server <- function(projects,init) {
        ## REACTIVE FUNCTIONS: COMPOUND INDEX
        rf_get_cindex <- reactive({

-            ## TODO: FIXME: Uncomment after rearranging everything.
-            ## input$cmt_changes_b
            rvs$status$is_qa_stat
            s1 = input$sort1
            s2 = input$sort2
@@ -741,8 +739,8 @@ mk_shinyscreen_server <- function(projects,init) {
        rf_get_cindex_parents <- reactive({
            rvs$m
            isolate({
-                ms1 = rvs$m$extr$ms1
-                ms2 = rvs$m$extr$ms2
+                ms1 = rvs$m$db$extr$cgm$ms1
+                ms2 = rvs$m$db$extr$cgm$ms2
                summ = req(rvs$m$out$tab$summ)
            })

@@ -760,7 +758,8 @@ mk_shinyscreen_server <- function(projects,init) {
            kvals = req(rf_get_cindex_kval())
            ptab = rf_get_cindex_parents()
            if (isTruthy(parent)) {
-                get_summ_subset(summ=summ,
+                get_summ_subset(db=rvs$m$db,
+                                summ=summ,
                                ptab=ptab,
                                paritem=parent,
                                kvals=kvals)
@@ -813,20 +812,23 @@ mk_shinyscreen_server <- function(projects,init) {
        
        rf_plot_eic_ms1 <- reactive({
            isolate({
-                ms1 = rvs$m$extr$ms1
+                ms1 = rvs$m$db$extr$cgm$ms1
                summ = rvs$m$out$tab$summ
+                db = rvs$m$db

            })
            req(NROW(summ)>0L)
            req(NROW(ms1)>0L)


-            p = make_eic_ms1_plot(ms1,summ,kvals=rf_get_cindex_kval(),
-                                   labs=rf_get_cindex_labs(),
-                                   asp=PLOT_EIC_ASPECT,
-                                   rt_range=rf_get_rtrange(),
-                                   i_range=rf_get_irange(),
-                                   colrdata = rf_colrdata())
+            p = make_eic_ms1_plot(db=db,
+                                  summ,
+                                  kvals=rf_get_cindex_kval(),
+                                  labs=rf_get_cindex_labs(),
+                                  asp=PLOT_EIC_ASPECT,
+                                  rt_range=rf_get_rtrange(),
+                                  i_range=rf_get_irange(),
+                                  colrdata = rf_colrdata())

            p = if (!is.null(p)) p else empty_plot("Nothing to plot")

@@ -851,12 +853,13 @@ mk_shinyscreen_server <- function(projects,init) {

            gg = rf_plot_eic_ms1()
            rt_rng = range(gg$data$rt)
-            p = make_eic_ms2_plot(summ,
-                                   kvals=rf_get_cindex_kval(),
-                                   labs=rf_get_cindex_labs(),
-                                   rt_range = rf_get_ms2_eic_rtrange(),
-                                   asp=PLOT_EIC_ASPECT,
-                                   colrdata=rf_colrdata())
+            p = make_eic_ms2_plot(rvs$m$db,
+                                  summ,
+                                  kvals=rf_get_cindex_kval(),
+                                  labs=rf_get_cindex_labs(),
+                                  rt_range = rf_get_ms2_eic_rtrange(),
+                                  asp=PLOT_EIC_ASPECT,
+                                  colrdata=rf_colrdata())

            
            p = if (!is.null(p)) p else empty_plot("Nothing to plot")
@@ -878,15 +881,15 @@ mk_shinyscreen_server <- function(projects,init) {
        rf_plot_spec_ms2 <- reactive({
            isolate({
                summ = rvs$m$out$tab$summ
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
            })
            req(NROW(summ)>0L)
            req(NROW(ms2)>0L)
-            p = make_spec_ms2_plot(ms2,
-                                    summ,
-                                    kvals=req(rf_get_cindex_kval()),
-                                    labs=req(rf_get_cindex_labs()),
-                                    colrdata=rf_colrdata())
+            p = make_spec_ms2_plot(db = rvs$m$db,
+                                   summ,
+                                   kvals=req(rf_get_cindex_kval()),
+                                   labs=req(rf_get_cindex_labs()),
+                                   colrdata=rf_colrdata())

            p = if (!is.null(p)) p else empty_plot("Nothing to plot")
            p
@@ -905,29 +908,35 @@ mk_shinyscreen_server <- function(projects,init) {
                              choices = list.files(path=top_data_dir,
                                                   pattern = CMPD_LIST_PATT))

-            updateSelectInput(session = session,
-                              inputId = "dfile_list",
-                              choices = list.files(path=top_data_dir,
-                                                   pattern = DFILES_LIST_PATT))
            
-            updateSelectInput(session = session,
-                              inputId = "top_data_dir_list",
-                              selected = basename(top_data_dir),
-                              choices = list.dirs(path = init$envopts$top_data_dir,
-                                                  full.names = F,
-                                                  recursive = F))
        })

        observe({
-            top_data_dir = rvs$gui$paths$data
-            req(isTruthy(top_data_dir) && dir.exists(top_data_dir))
+            data_dir = rvs$gui$paths$data
+            if ((isTruthy(data_dir) && dir.exists(data_dir))) {
+
+                dfchoices = list.files(path=data_dir,
+                                       pattern = DFILES_LIST_PATT)
+
+
+                updateSelectInput(session = session,
+                                  inputId = "dfile_list",
+                                  choices = dfchoices)
+
+            
+            
+                updateSelectInput(session = session,
+                                  inputId = "top_data_dir_list",
+                                  selected = basename(rvs$gui$paths$data),
+                                  choices = list.dirs(path = init$envopts$top_data_dir,
+                                                      full.names = F,
+                                                      recursive = F))
+            }
+
            
-            updateSelectInput(session = session,
-                              inputId = "dfile_list",
-                              choices = list.files(path=top_data_dir,
-                                                   pattern = DFILES_LIST_PATT))
        })

+
        ## Update projects and data directories every second.
        observeEvent(rtimer1000(),{

@@ -981,13 +990,15 @@ mk_shinyscreen_server <- function(projects,init) {
                rvs$status$ms2_int_thresh_stat = rvs$m$conf$prescreen[["ms2_int_thresh"]]
                rvs$status$s2n_stat = rvs$m$conf$prescreen[["s2n"]]
                rvs$status$ret_time_shift_tol_stat = rvs$m$conf$prescreen[["ret_time_shift_tol"]]
-                if (NROW(m$extr$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
+                if (NROW(m$db$extr$cgm$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
                if (NROW(m$out$tab$summ)>0L) rvs$status$is_qa_stat = "Yes."

            } else {
                message("Initialising project: ",wd)
                rvs$gui = create_gui(project_path=fullwd)
            }
+
+
            message("project: ",rvs$gui$project())
        }, label = "project-b")

@@ -1017,10 +1028,11 @@ mk_shinyscreen_server <- function(projects,init) {

        observeEvent(input$sel_data_dir_b,{
            data_dir = input$top_data_dir_list
-            req(isTruthy(data_dir))
-            rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
-            
-            message("Selected data dir:",rvs$gui$paths$data)
+            if (isTruthy(data_dir)) {
+                rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
+                message("Selected data dir:",rvs$gui$paths$data)
+            }
+

        })

@@ -1045,32 +1057,9 @@ mk_shinyscreen_server <- function(projects,init) {

        observeEvent(input$datafiles_b,{
            new_file = input$dfile_list
-            if (isTruthy(new_file)) {
-                rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
-                                                   new_file)
+            if (isTruthy(new_file)) rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
+                                                                       new_file)
                                                         
-                 ## curr_file = rvs$gui$datatab$file
-                ## curr_tag = rvs$gui$datatab$tag
-                ## curr_adduct = rvs$gui$datatab$adduct
-                ## curr_set = rvs$gui$datatab$set
-
-                ## res_file = union(curr_file,new_file)
-
-                ## res_adduct = c(curr_adduct,rep(NA_character_,nd))
-                ## res_set = c(curr_set,rep(NA_character_,nd))
-                ## rvs$gui$datatab$file = res_file
-                ## rvs$gui$datatab$tag = add_new_def_tag(as.character(rvs$gui$datatab$tag),nd)
-                ## rvs$gui$datatab$adduct = res_adduct
-                ## rvs$gui$datatab$set = res_set
-
-
-            }
-
-            updateSelectInput(session=session,
-                              inputId="dfile_list",
-                              selected=NULL)
-
-            
        })
    
        observeEvent(input$rem_dfiles_b,{
@@ -1162,7 +1151,7 @@ mk_shinyscreen_server <- function(projects,init) {
                    rv_extr_flag(F)
                    rvs$m = run(m=rvs$m,
                                envopts=init$envopts,
-                                phases=c("setup","comptab","extract"))
+                                phases=c("setup","comptab","db","extract"))
                    rvs$status$is_extracted_stat = "Yes."
                    rvs$status$is_qa_stat = "No."
                    fn_c_state = file.path(rvs$m$run$paths$project,
@@ -1178,7 +1167,7 @@ mk_shinyscreen_server <- function(projects,init) {
        

        observeEvent(input$presc_b,{
-            if (NROW(rvs$m$extr$ms1)>0L) {
+            if (NROW(rvs$m$db$extr$cgm$ms1)>0L) {
                ## Update just prescreening conf.
                rvs$m = app_update_conf(input=input,
                                        gui=rvs$gui,
@@ -1310,9 +1299,10 @@ mk_shinyscreen_server <- function(projects,init) {
        
        observeEvent(input$make_report_b,{
            isolate({
-                ms1 = rvs$m$extr$ms1
-                ms2 = rvs$m$extr$ms2
+                ms1 = rvs$m$db$extr$cgm$ms1
+                ms2 = rvs$m$db$extr$cgm$ms2
                summ = rvs$m$out$tab$summ
+                db = rvs$m$db

            })
            req(NROW(summ)>0L)
@@ -1334,18 +1324,21 @@ mk_shinyscreen_server <- function(projects,init) {
                names(kvals) = key
                message('Compound index row: ',ri)

-                p1 = make_eic_ms1_plot(ms1,summ,kvals=kvals,
-                                        labs=labs,
-                                        asp=PLOT_EIC_ASPECT,
-                                        rt_range=rt_range,
-                                        i_range=i_range,
-                                        colrdata = colrdata) + theme_print()
-
-                p2 = make_eic_ms2_plot(summ,kvals=kvals,
-                                        labs=labs,
-                                        asp=PLOT_EIC_ASPECT,
-                                        rt_range=rt_range,
-                                        colrdata = colrdata) + theme_print()
+                p1 = make_eic_ms1_plot(db=db,
+                                       summ,
+                                       kvals=kvals,
+                                       labs=labs,
+                                       asp=PLOT_EIC_ASPECT,
+                                       rt_range=rt_range,
+                                       i_range=i_range,
+                                       colrdata = colrdata) + theme_print()
+
+                p2 = make_eic_ms2_plot(db=db,
+                                       summ,kvals=kvals,
+                                       labs=labs,
+                                       asp=PLOT_EIC_ASPECT,
+                                       rt_range=rt_range,
+                                       colrdata = colrdata) + theme_print()

                

@@ -1353,11 +1346,11 @@ mk_shinyscreen_server <- function(projects,init) {
                smi = rvs$m$out$tab$comp[ID==(id),SMILES][[1]]
                p_struc = make_struct_plot(smi)

-                p_spec = make_spec_ms2_plot(ms2,
-                                             summ,
-                                             kvals=kvals,
-                                             labs=labs,
-                                             colrdata=colrdata)+theme_print()
+                p_spec = make_spec_ms2_plot(db=db,
+                                            summ,
+                                            kvals=kvals,
+                                            labs=labs,
+                                            colrdata=colrdata)+theme_print()

                cmb = combine_plots(p1,p2,p_spec,p_struc)
                print(cmb)
@@ -1377,7 +1370,7 @@ mk_shinyscreen_server <- function(projects,init) {
            fn = file.path(projdir,input$ms2_spectra_tab_name)
            shinymsg(paste0("Saving MS2 spectra table to: ",basename(fn)))
            tab2file(pack_ms2_w_summ(rvs$m$out$tab$summ,
-                                    rvs$m$extr$ms2),
+                                    rvs$m$db$extr$spectra),
                     fn)
            shinymsg("Done saving MS2 spectra table.")
        })
@@ -1588,7 +1581,7 @@ mk_shinyscreen_server <- function(projects,init) {
        })

        output$comp_table = DT::renderDataTable({
-            ## TODO FIXME
+            ## TODO 
            ## cmpds = rf_get_cmpd_tab()
            ## validate(need(NROW(cmpds)>0,"No compound list loaded yet."))
            ## DT::datatable(cmpds,
@@ -1626,8 +1619,6 @@ mk_shinyscreen_server <- function(projects,init) {
                                        m=rvs$m)
                
                kv = rf_get_cindex_kval()
-                ## Some cols that might be needed to be specified
-                ## explicitely. FIXME TODO: Make this more robust.
                nsumm = mf_narrow_summ(rvs$m$out$tab$summ,kv,
                                       ms2_rt_i=input$mf_entry_rt_min,
                                       ms2_rt_f=input$mf_entry_rt_max)
@@ -1639,7 +1630,8 @@ mk_shinyscreen_server <- function(projects,init) {
                                       path = rvs$m$run$metfrag$path,
                                       subpaths = rvs$m$run$metfrag$subpaths,
                                       db_file = rvs$m$run$metfrag$db_file,
-                                       stag_tab = stagtab, ms2 = rvs$m$extr$ms2,
+                                       stag_tab = stagtab,
+                                       ms2 = rvs$m$db$extr$spectra,
                                       runtime=rvs$m$run$metfrag$runtime,
                                       java_bin=rvs$m$run$metfrag$java_bin,
                                       nproc = rvs$m$conf$metfrag$nproc)
@@ -1768,7 +1760,7 @@ mk_shinyscreen_server <- function(projects,init) {
            selMS2 = req(input$sel_spec)
            if (NROW(ms2tabsel)!=0L) {
                lval = lapply(ms2tabsel[item==(selMS2)],function(x) x)
-                ms2 = rvs$m$extr$ms2
+                ms2 = rvs$m$db$extr$cgm$ms2
                kval = rf_get_cindex_kval()
                allval = c(kval,lval)
                ## There can be some duplicates.
@@ -1778,10 +1770,11 @@ mk_shinyscreen_server <- function(projects,init) {
                #more than the names existing in extr$ms2. Also,
                #BASE_KEY_MS2 does not contain `an', so we need to readd
                #it.
-                key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"an"))
+                key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"scan"))
                kval2 = allval[key]
-                spec = get_data_from_key(ms2,kval2)[,.(mz,intensity)]
-                ## as.character(lapply(1L:NROW(spec),function(nr) paste0(spec[nr,mz]," ",spec[nr,intensity])))
+                ## Only precid and scan can be used for selection in spectra.
+                kval2 = as.data.table(kval2[c("precid","scan")])
+                spec = rvs$m$db$extr$spectra[kval2,.(mz,intensity),on=.(precid,scan)]
                print(as.data.frame(spec),row.names=F)
                
            } else {

--- a/R/state.R
+++ b/R/state.R
@@ -37,13 +37,12 @@ new_state <- function() {

 runtime_from_conf <- function(run,envopts,conf) {
    lst_cmpl <- conf$compounds$lists
-    lst_fn_cmpl <- lapply(names(lst_cmpl),function (nm) {
-        bfn_cmpl <- lst_cmpl[[nm]]
+    lst_fn_cmpl <- lapply(lst_cmpl,function (lst) {
+        bfn_cmpl <- lst
        fn <- file.path(run$paths$project,bfn_cmpl)
        if (!file.exists(fn)) stop("File ", fn, " does not exist in ", run$paths$project," .")
        fn
    })
-    names(lst_fn_cmpl) <- names(lst_cmpl)
    run$paths$compounds$lists <- lst_fn_cmpl

    run$paths$data = norm_path(file.path(envopts$top_data_dir,conf$paths$data))
@@ -95,7 +94,7 @@ new_runtime_state <- function(project,envopts,conf=NULL) {
            ## If MetFrag possible, set up the file structure.
            if (metfrag$cando_metfrag) {
                mfdir = file.path(project_path,"metfrag")
-                dir.create(mfdir)
+                dir.create(mfdir, showWarnings = F)
                metfrag$path=mfdir
                subpaths = list(results = "results",
                                config = "config",
@@ -204,7 +203,7 @@ new_project <- function(project,envopts,datatab=NULL,conf=NULL) {
                 check_file_absent(fn_conf,what="conf-file")
                 yaml::yaml.load_file(fn_conf)
             } else conf 
-    m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
+    ## m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
    m$run = new_runtime_state(project,envopts=envopts,conf=m$conf)
    if (!is.null(datatab)) {
        m$input$tab$mzml = datatab
@@ -409,9 +408,11 @@ pack_ms2_w_summ <- function(summ,ms2) {
    ## Takes summ, finds entries with high quality spectra and subsets ms2 based on that.

    ## Take the columns we need from summ.
-    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Formula","Name")]
-    mrg_keys = c(intersect(key(ms2),key(summ)),"an")
-    ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
+    x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Name")]
+    mrg_keys = intersect(key(ms2),key(summ))
+    mrg_keys = c(mrg_keys,"scan")
+    ## ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
+    ms2[x,.(ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
 }


@@ -431,3 +432,40 @@ pack_project <- function(m,fn_arch) {
    })
    fn_arch
 }
+
+join_compound_lists <- function(fname) {
+    coll <- list()
+    fields <- colnames(EMPTY_CMPD_LIST)
+    coltypes <- c(ID="character",
+                  SMILES="character",
+                  Formula="character",
+                  Name="character",
+                  RT="numeric",
+                  mz="numeric")
+    l=0L
+    for (fn in fname) {
+        l = l + 1L
+        ## Figure out column headers.
+        nms <- colnames(file2tab(fn,nrows=0))
+        
+        ## Read the table. Knowing column headers prevents unnecessary
+        ## warnings.
+        dt <- file2tab(fn, colClasses=coltypes[nms])
+        verify_cmpd_l(dt=dt,fn=fn)
+                                        # nonexist <- setdiff(fnfields,fields)
+        coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
+        coll[[l]]$ORIG <- fn
+    }
+    if (length(fname)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
+
+}
+
+process_cmpd_sets <- function(cmpdlist) {
+    if (nrow(cmpdlist)==0L) return(EMPTY_CMPD_LIST)
+    ## Process sets.
+    if (! ("set" %in% colnames(cmpdlist))) cmpdlist$set=NA_character_ else cmpdlist[,set:=as.character(set)]
+    slugs = cmpdlist[,.(slug=gen_fname_slug(ORIG)),by="ORIG"]
+    slugs[,slug:=uniqy_slugs(slug)]
+    cmpdlist[slugs,set:=fifelse(is.na(set),i.slug,set),on="ORIG"]
+    cmpdlist
+}
No results found