Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • eci/shinyscreen
  • miroslav.kratochvil/shinyscreen
2 results
Show changes
Commits on Source (67)
Showing with 1220 additions and 1468 deletions
......@@ -44,9 +44,6 @@ variables:
base-image:
tags:
- docker
- $RUNNER_TAG
stage: dep_images
rules:
- if: $CI_COMMIT_TAG == ""
......
Package: shinyscreen
Title: Pre-screening of Mass Spectrometry Data
Version: 1.3.19
Version: 1.3.21
Author: Todor Kondić
Maintainer: Todor Kondić <todor.kondic@uni.lu>
Authors@R:
......@@ -50,6 +50,7 @@ Collate:
'errors.R'
'mix.R'
'envopts.R'
'data-model.R'
'state.R'
'metfrag.R'
'plotting.R'
......
---
title: "shinyscreen Docker Documentation"
author: "Anjana Elapavalore"
date: '`r Sys.Date()`'
output: pdf_document
---
The objective of this documentation is to provide a detailed procedure for running the shinyScreen application within a Docker container. shinyScreen is a web-based tool developed for visualizing and analyzing high-throughput screening data using Shiny, a web application framework for R. Docker is employed to containerize the application, ensuring reproducibility and simplifying the deployment process by packaging the application with all its dependencies.
This document provides step-by-step instructions on how to install Docker, pull the ShinyScreen Docker image, and run the application.
## Prerequisites
Before proceeding, the following prerequisites must be met:
- Docker Desktop must be installed and properly configured on the system.
- Sufficient system resources, including at least 4 GB of RAM, are recommended for optimal performance.
## Step 1: Install Docker Desktop on Windows
If Docker Desktop is not already installed on your system, follow these steps to set it up:
1. **Download Docker Desktop**:
- Visit the [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/) download page and download the installer.
2. **Install Docker Desktop**:
- Run the downloaded installer.
- Follow the installation prompts, and make sure the option to "Use the WSL 2-based engine" is selected during setup (if available).
3. **Verify Docker Installation**:
- After installation, launch Docker Desktop.
- Ensure Docker is running by checking the Docker icon in the system tray (it should be active).
- Open a command prompt (CMD) or PowerShell and type the following command to check the Docker version:
```bash
docker --version
```
This should return the installed Docker version.
## Step 2: Search for the ShinyScreen Docker Image in Docker Desktop
Docker Desktop provides a graphical interface to search for Docker images. To search and pull the **shinyScreen** Docker image:
1. **Open Docker Desktop**:
- Launch Docker Desktop from the Windows Start menu or by clicking the Docker icon in the system tray.
2. **Search for the ShinyScreen Image**:
- In Docker Desktop, navigate to the **search** tab as shown in the image below.
**Docker Image : anjanae/shinyscreen:v21_latest**
```{r, echo=FALSE, out.width="70%"}
knitr::include_graphics("images/searchtab.png")
```
3. **Pull the Image**:
- Click on the **Pull** button next to the desired ShinyScreen image. This will download the image to your local system.
You can monitor the download progress in Docker Desktop’s **Images** tab.
## Step 3: Run the ShinyScreen Docker Container
Once the ShinyScreen image has been pulled successfully, the next step is to run the container. This can be done directly from Docker Desktop or via the command line:
### Running the Container via Docker Desktop:
1. **Navigate to the Images Tab**:
- In Docker Desktop, go to the **Images** tab where your downloaded ShinyScreen image will be listed.
2. **Launch the Container**:
- Click the **Run** button next to the ShinyScreen image.
- In the pop-up dialog, ensure that port `3838` is exposed. To do this, add the following port configuration in the **Optional Settings**:
```
Host Port: 3838
Container Port: 3838
```
3. **To create the volume mounts**:
- To share files between the host and the container, you can use Docker’s volume mounting feature.
### local Directory Structure
Before launching the Docker container, the following directory structure should be set up on your local system. These directories will be mounted to specific locations inside the container, enabling the application to access the required input data and store the output files.
### Root Directory
The **root directory** is the top-level folder where the entire project is organized. This root directory will contain the following subdirectories:
1. **Project Directory**
2. **Data Directory**
3. **MetFrag Databases Directory**
Below is an overview of each subdirectory and its purpose.
### Project Directory
The **Project Directory** should initially contain only the **compound list** file, which is an essential input for the ShinyScreen application.
After the application runs, this directory will also store various output files generated during the analysis.
### Data Directory
The **Data Directory** should contain only the **mzML files**. These files serve as the primary data input for the ShinyScreen application and represent mass spectrometry data in mzML format.
### MetFrag Databases Directory
The **MetFrag Databases Directory** must include the latest **PubChemLite_Exposomics.csv** file, which will be used by the ShinyScreen application for MetFrag analysis.
### Configure Volume Mounts in Docker Desktop
Now that the local directories are set up, you can configure **volume mounts** in Docker Desktop to map these local directories to the corresponding paths inside the Docker container.
For the volume mounts, each directory on the host machine (local system) will be mapped to a specific location within the Docker container, allowing the application to access the input data and save output files.
In the **Optional Settings** pop-up window, you will configure the following **three volume mounts** (Please refer to the table and image below):
- **Local Directory on Windows**:
- This is the folder on your local machine where the data will reside.
- **Container Directory**:
- This is the path inside the Docker container where the files will be mounted.
| Local Directory (Host) | Container Directory (Mount) | Description |
| ------------------------------ | -------------------------------------- | ------------------------------- |
| `C:\path\to\rootdirectory` | `/home/ssuser/projects` | Directory for project files. |
| `C:\path\to\rootdirectory` | `/home/ssuser/top_data_dir` | Directory for top-level data. |
| `C:\path\to\metfrag_dbs` | `/home/ssuser/metfrag_dbs` | Directory for MetFrag databases. |
```{r, echo=FALSE, out.width="70%"}
knitr::include_graphics("images/runtab.png")
```
By setting up the volume mounts, any changes made to files in the mapped directories on the host machine will automatically reflect inside the container.
Once the **port** and **volume configurations** are confirmed, click **Run** to start the container. The ShinyScreen application will now be running inside the container.
Now click on the port mapping to access the application in the web browser
```{r, echo=FALSE, out.width="85%"}
knitr::include_graphics("images/containertab.png")
```
File added
Docker_Documentation/images/containertab.png

198 KiB

Docker_Documentation/images/runtab.png

97.9 KiB

Docker_Documentation/images/searchtab.png

141 KiB

FROM gitlab.lcsb.uni.lu:4567/eci/shinyscreen/dep/ssuser:latest
MAINTAINER todor.kondic@uni.lu
EXPOSE 5432
EXPOSE 3838
ENV SS_MF_DB="PubChemLite_exposomics.csv"
ENV SS_CPU 2
ADD . shinyscreen/
RUN R CMD build shinyscreen
USER root
RUN R CMD INSTALL shinyscreen
USER ssuser
RUN cp shinyscreen/runme /home/ssuser/runme
RUN chmod u+x /home/ssuser/runme
# RUN chown ssuser /home/ssuser/runme
# RUN chown -R ssuser /home/ssuser/shinyscreen
RUN R -e 'library(shinyscreen);setwd("~");init(top_data_dir="~/top_data_dir",projects="~/projects",users_dir="~/users",metfrag_db_dir=Sys.getenv("SS_MF_DB_DIR"),metfrag_jar="/usr/local/bin/MetFragCommandLine.jar",no_structure_plots=T,save=T,merge=F)'
ENTRYPOINT ["/home/ssuser/runme"]
CMD ["app"]
......@@ -6,7 +6,6 @@ export(conf_trans)
export(create_plots)
export(create_stub_gui)
export(extr_data)
export(extract)
export(get_fn_comp)
export(get_fn_conf)
export(get_fn_extr)
......@@ -47,8 +46,6 @@ export(sort_spectra)
export(subset_summary)
export(tk_save_file)
import(data.table)
importFrom(MSnbase,filterMz)
importFrom(MSnbase,readMSData)
importFrom(promises,"%...>%")
importFrom(promises,future_promise)
importFrom(shiny,HTML)
......
......@@ -32,6 +32,7 @@ run <- function(envopts,
all_phases=list(setup=setup_phase,
comptab=mk_comp_tab,
db=make_db,
extract=extr_data,
prescreen=prescreen,
sort=sort_spectra,
......@@ -98,39 +99,14 @@ run_in_dir <- function(m) {
}
##' @export
load_compound_input <- function(m) {
coll <- list()
fields <- colnames(EMPTY_CMPD_LIST)
fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
message("fns:",paste0(fns,collapse=","))
coltypes <- c(ID="character",
SMILES="character",
Formula="character",
Name="character",
RT="numeric",
mz="numeric")
for (l in 1:length(fns)) {
fn <- fns[[l]]
## Figure out column headers.
nms <- colnames(file2tab(fn,nrows=0))
## Read the table. Knowing column headers prevents unnecessary
## warnings.
dt <- file2tab(fn, colClasses=coltypes[nms])
verify_cmpd_l(dt=dt,fn=fn)
# nonexist <- setdiff(fnfields,fields)
coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
coll[[l]]$ORIG <- fn
}
cmpds <- if (length(fns)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
cmpds = join_compound_lists(fns)
## Process sets.
cmpds = process_cmpd_sets(cmpds)
dups <- duplicated(cmpds$ID)
dups <- dups | duplicated(cmpds$ID,fromLast = T)
dupIDs <- cmpds$ID[dups]
......@@ -219,8 +195,6 @@ mk_comp_tab <- function(m) {
smiforadd <- smiles[smiforadd,.(ID,SMILES,Formula,adduct),on=c("SMILES")]
data.table::setkey(smiforadd,"adduct","ID")
## FIXME: Why is Formula a list when there are no SMILES, instead
## of an empty string?
smiforadd[,Formula:=as.character(Formula)]
## Update the intermediate table with masses.
......@@ -336,181 +310,67 @@ mk_tol_funcs <- function(m) {
}
##' @export
extr_data <-function(m) {
message("Stage: extract")
if (is.null(m$conf$serial) || !m$conf$serial) {
extr_data_future(m)
} else {
message("(extract): Serial extraction.")
extr_data_serial(m)
}
}
extr_data_future <- function(m) {
## Reduce the comp table to only unique masses (this is because
## different sets can have same masses).
m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
m$out$tab$data[,set:=NULL] #This column is meaningless now.
file <- m$out$tab$data[,unique(file)]
fpaths <- file.path(m$run$paths$data,file)
allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
fine = create_fine_table(m)
unique(MSnbase::collisionEnergy(z),fromLast=T)
}))
allCEs <- unique(allCEs)
allCEs <- allCEs[!is.na(allCEs)]
cols <-paste('CE',allCEs,sep = '')
vals <- rep(NA,length(cols))
m$out$tab$data[,(cols) := .(rep(NA,.N))]
file <- m$out$tab$data[,unique(file)]
ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
fpaths <- file.path(m$run$paths$data,ftags[,file])
futuref <- m$future
tmp <- lapply(1:nrow(ftags),function(ii) {
fn <- fpaths[[ii]]
the_tag <- ftags[ii,tag]
message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
## err_ms1_eic <- m$extr$tol$eic
## err_coarse_fun <- m$extr$tol$coarse
## err_fine_fun <- m$extr$tol$fine
## err_rt <- m$extr$tol$rt
err_coarse <- m$conf$tolerance[["ms1 coarse"]]
err_fine <- m$conf$tolerance[["ms1 fine"]]
dpath = m$run$paths$data
err_ms1_eic <- m$conf$tolerance$eic
err_rt <- m$conf$tolerance$rt
## Open all files.
fns = fine[,unique(file)]
lms = lapply(fns,function(fn) read_data_file(file.path(dpath,fn)))
names(lms) = fns
## Load all feature data.
lfdata = lapply(lms,get_fdata)
names(lfdata) = fns
missing_precursor_info <- m$conf$extract$missing_precursor_info
x <- futuref(extract(fn=fn,
tag=the_tag,
tab=tab,
err_ms1_eic=err_ms1_eic,
err_coarse = err_coarse,
err_fine= err_fine,
err_rt= err_rt,
missing_precursors = missing_precursor_info),
lazy = F)
## Extract MS1 chromatograms using "fine" tolerance.
cgram_ms1 = fine[,extr_cgrams_ms1(lms[[file]],.SD,lfdata[[file]]),
by="file",
.SDcols=c("iso_fine_min",
"iso_fine_max",
"rt_min",
"rt_max",
"precid")]
setkey(cgram_ms1,file,precid)
x
## Extract MS2 chromatograms.
})
## Create the "coarse" table. Parent masses are known with
## "coarse". We will prefilter our ms2 results based on that...
coarse = create_coarse_table(m)
msk <- sapply(tmp,future::resolved)
curr_done <- which(msk)
for (x in curr_done) {
message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
}
while (!all(msk)) {
msk <- sapply(tmp,future::resolved)
newly_done <- which(msk)
for (x in setdiff(newly_done,curr_done)) {
message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
}
Sys.sleep(0.5)
curr_done <- newly_done
cgram_ms2 = data.table(precid=integer(0),
ce=numeric(0),
scan=character(0),
idx=integer(0),
rt=numeric(0),
intensity=numeric(0))
## Extract MS2 spectra.
spectra = empty_spectra_table()
for (fn in names(lfdata)) {
rtab = relate_ms2_to_precid(coarse=coarse[.(fn),on=.(file)],
ms2=lfdata[[fn]]$ms2,
cgram_ms1=cgram_ms1[.(fn),
on=.(file)])
sptab = extract_spectra(lms[[fn]],rtab)
cgram_ms2 = rbind(cgram_ms2,rtab)
spectra = rbind(spectra,sptab)
}
ztmp <- lapply(tmp,future::value)
m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
data.table::setkeyv(m$extr$ms1,BASE_KEY)
data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
fn_ex <- get_fn_extr(m)
timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
setkey(cgram_ms1,precid,rt)
setkey(cgram_ms2,precid,ce,rt)
setkey(spectra,precid,scan)
m$db$extr$cgm$ms1 = cgram_ms1
m$db$extr$cgm$ms2 = cgram_ms2
m$db$extr$spectra = spectra
m
}
extr_data_serial <- function(m) {
## Reduce the comp table to only unique masses (this is because
## different sets can have same masses).
m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
m$out$tab$data[,set:=NULL] #This column is meaningless now.
file <- m$out$tab$data[,unique(file)]
fpaths <- file.path(m$run$paths$data,file)
allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
unique(MSnbase::collisionEnergy(z),fromLast=T)
}))
allCEs <- unique(allCEs)
allCEs <- allCEs[!is.na(allCEs)]
cols <-paste('CE',allCEs,sep = '')
vals <- rep(NA,length(cols))
m$out$tab$data[,(cols) := .(rep(NA,.N))]
file <- file.path(m$run$paths$data,m$out$tab$data[,unique(file)])
ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
ftags[,path:=file.path(..m$run$paths$data,file)]
futuref <- m$future
tmp <- lapply(1:nrow(ftags),function(ii) {
fn <- ftags[ii,path]
the_tag <- ftags[ii,tag]
message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
## err_ms1_eic <- m$extr$tol$eic
## err_coarse_fun <- m$extr$tol$coarse
## err_fine_fun <- m$extr$tol$fine
## err_rt <- m$extr$tol$rt
err_coarse <- m$conf$tolerance[["ms1 coarse"]]
err_fine <- m$conf$tolerance[["ms1 fine"]]
err_ms1_eic <- m$conf$tolerance$eic
err_rt <- m$conf$tolerance$rt
missing_precursor_info <- m$conf$extract$missing_precursor_info
x <- extract(fn=fn,
tag=the_tag,
tab=tab,
err_ms1_eic=err_ms1_eic,
err_coarse = err_coarse,
err_fine= err_fine,
err_rt= err_rt,
missing_precursors = missing_precursor_info)
message("Done extraction for ", x$ms1$tag[[1]])
x
})
ztmp <- tmp
m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
data.table::setkeyv(m$extr$ms1,BASE_KEY)
data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
fn_ex <- get_fn_extr(m)
timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
m
}
##' @export
......@@ -523,11 +383,10 @@ conf_trans <- function(conf) {
prescreen <- function(m) {
## Top-level auto prescreening function.
message("(prescreen): Start.")
## confpres <- conf_trans_pres(m$conf$prescreen)
m$qa <- NULL
m$out$tab$summ <- NULL
m$qa <- analyse_extracted_data(m$extr,m$conf$prescreen)
m$out$tab$summ <- gen_summ(m$out$tab$comp,m$qa)
m$qa = NULL
m$out$tab$summ = NULL
m$qa = analyse_extracted_data(m$db,m$conf$prescreen)
m$out$tab$summ = gen_summ(m$db,m$qa,m$out$tab$comp)
message("(prescreen): End.")
m
}
......@@ -605,8 +464,8 @@ create_plots <- function(m) {
## Select the data nedeed for plotting.
flt_summ <- m$out$tab$flt_summ
ms2 <- m$extr$ms2
ms1 <- m$extr$ms1
ms2 <- m$db$extr$cgm$ms2
ms1 <- m$db$extr$cgm$ms1
rt_min <- rt_in_min(m$conf$figures$rt_min)
rt_max <- rt_in_min(m$conf$figures$rt_max)
keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
......@@ -747,8 +606,8 @@ report <- function(m) {
dir.create(REP_TOPDIR,recursive = T,showWarnings = F)
header <- knitr::knit_expand(fn_header)
flt_summ <- m$out$tab$reptab
ms2 <- m$extr$ms2
ms1 <- m$extr$ms1
ms2 <- m$db$extr$cgm$ms2
ms1 <- m$db$extr$cgm$ms1
rt_min <- rt_in_min(m$conf$figures$rt_min)
rt_max <- rt_in_min(m$conf$figures$rt_max)
keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
......@@ -922,7 +781,7 @@ metfrag <- function(m) {
path = m$run$metfrag$path,
subpaths = m$run$metfrag$subpaths,
db_file = m$run$metfrag$db_file,
stag_tab = stagtab, ms2 = m$extr$ms2,
stag_tab = stagtab, ms2 = m$db$extr$spectra,
runtime=m$run$metfrag$runtime,
java_bin=m$run$metfrag$java_bin,
nproc = m$conf$metfrag$nproc)
......@@ -942,3 +801,9 @@ metfrag <- function(m) {
m
}
make_db <- function(m) {
m = make_db_catalogue(m)
m = make_db_precursors(m)
m
}
......@@ -18,7 +18,13 @@
the_ifelse <- data.table::fifelse
dtable <- data.table::data.table
norm_path <- function(...) normalizePath(...,winslash='/')
norm_path <- function(...) {
test = nchar(...) > 0L
res = character(length(test))
res[test] = normalizePath(...[test],winslash='/')
res[!test] = ...[!test]
res
}
tab2file<-function(tab,file,...) {
data.table::fwrite(x=tab,file=file,...)
......@@ -143,3 +149,23 @@ uniqy_slugs <- function(slugs) {
dt = data.table::data.table(slug=slugs)
dt[,slug:=fifelse(rep(.N==1L,.N),slug,paste0(slug,"_",seq(1L,.N))),by="slug"]$slug
}
gen_val_unc <- function(x,dx) {
## Doesn't work well for <=0.
p = floor(log10(x))
dp = floor(log10(dx))
## Zero?
message("p ",p)
w = which(is.infinite(p))
p[w] = 0
## Normalise x and dx.
main = x/10**p
unc = round(dx/10**dp,0)
place = p - dp
main = mapply(function (m,d) formatC(m,digits=d,format='f',flag="#"),main,place,USE.NAMES=F)
w = which(main=='10.')
main[w]='1'
p[w]=p[w]+1
paste0(main,"(",unc,") x 10^",p)
}
#Copyright (C) 2023 by University of Luxembourg
## Shinyscreen works of an internal relational database implemented
## using `data.table' package. Implementation is here.
make_db_catalogue <- function(m) {
## Takes comprehensive database from state `m' and generates a
## catalogue with a unique key. This catalogue is based on
## inputs. Each entry in the catalogue corresponds to a single
## target mass from a single experimental run.
res = m$out$tab$comp[,unique(.SD),.SDcols=c("set","tag","adduct","ID")]
res[,catid:=.I]
setkeyv(res,DB_CATALOGUE_KEY)
setindex(res,catid)
m$db$cat = res
m
}
merge_precid_4_isobars <- function(orig_precids,masses,up_masses) {
start = head(orig_precids,1L)
n = length(orig_precids)
precid = orig_precids
i = 1L
while (i < n) {
theprecid = orig_precids[[i]]
themz = masses[[i]]
mzup = up_masses[[i]]
w = which(masses[(i+1L):n]<mzup)
precid[(i+1L):n][w] = theprecid
i = i + length(w) + 1L
}
precid
}
make_db_precursors <- function(m) {
## Generate masses and label isobars.
cat = m$db$cat
masses = m$out$tab$comp[cat,.(tag=tag,catid=catid,mz=mz,rt=rt),on=key(cat)]
setkey(masses,tag,mz)
## Retention time.
tmp = get_val_unit(m$conf$tolerance[['rt']])
rttol = as.numeric(tmp[['val']])
rtunit = tmp[['unit']]
if (rtunit == "s") {
rttol = rttol/60.
} else if (rtunit != "min") {
stop('make_db_precursors: Unknown retention time unit.')
}
masses[!is.na(rt),`:=`(rt_min=rt-rttol,rt_max=rt+rttol)]
## Fine error.
tmp = get_val_unit(m$conf$tolerance[['ms1 fine']])
ms1tol = as.numeric(tmp[['val']])
ms1unit = tmp[['unit']]
if (ms1unit == "ppm") {
masses[,`:=`(mz_fine_min=mz-ms1tol*mz*1e-6,mz_fine_max=mz+ms1tol*mz*1e-6)]
} else if (ms1unit == "Da") {
masses[,`:=`(mz_fine_min=mz-ms1tol,mz_fine_max=mz+ms1tol)]
} else {
stop('make_db_precursors: Unknown mass unit (fine).')
}
## Coarse error.
tmp = get_val_unit(m$conf$tolerance[['ms1 coarse']])
ms1tol = as.numeric(tmp[['val']])
ms1unit = tmp[['unit']]
if (ms1unit == "ppm") {
masses[,`:=`(mz_coarse_min=mz-ms1tol*mz*1e-6,mz_coarse_max=mz+ms1tol*mz*1e-6)]
} else if (ms1unit == "Da") {
masses[,`:=`(mz_coarse_min=mz-ms1tol,mz_coarse_max=mz+ms1tol)]
} else {
stop('make_db_precursors: Unknown mass unit (coarse).')
}
## Assign "fine" isobars to same isocoarse number.
masses[,precid:=merge_precid_4_isobars(catid,mz,mz_fine_max),by="tag"]
## Assign "coarse" isobars to same isocoarse number.
masses[,isocoarse:=merge_precid_4_isobars(catid,mz,mz_coarse_max),by="tag"]
masses[,`:=`(iso_coarse_min=min(mz_coarse_min),
iso_coarse_max=max(mz_coarse_max)),
by=isocoarse]
masses[,`:=`(iso_fine_min=min(mz_fine_min),
iso_fine_max=max(mz_fine_max)),
by=precid]
setindex(masses,isocoarse,precid)
## Add files.
filetab = m$input$tab$mzml[m$db$cat,
.(catid=i.catid,file=file),
on=c("set","tag"),nomatch=NULL]
masses[filetab,file:=i.file,on="catid"]
m$db$precursors = masses
m
}
empty_cgram_ms1 <- function(n=0L) {
r = data.table(file=character(n),
cgmidx=integer(n),
precid=integer(n),
scan=integer(n),
rt=numeric(n),
intensity=numeric(n))
setkey(r,precid,rt)
r
}
empty_cgram_ms2 <- function(n=0L) {
r = data.table(precid=integer(n),
ce=numeric(n),
scan=character(n),
idx=integer(n),
rt=numeric(n),
intensity=numeric(n))
setkey(r,precid,ce,idx)
r
}
empty_spectra_table <- function(n=0L) {
r = data.table(precid=integer(n),
scan=character(n),
mz=numeric(n),
intensity=numeric(n))
setkey(r,precid,scan)
r
}
summ_needs_from_cat <- function(cat) {
## Catalogue columns.
cat
}
summ_needs_from_precursors <- function(res,precursors) {
## Mass columns.
precursors[res,on=.(catid),.(precid,
mz,
set,
adduct,
tag,
ID,
mz_l=mz_fine_min,
mz_r=mz_fine_max),by=.EACHI]
}
summ_needs_from_qa <- function(res,qa) {
needs = qa[,.SD,.SDcols=c("precid",
"ce",
"scan",
"ms1_rt",
"ms1_int",
"ms2_rt",
"ms2_int",
"ms1_mean",
"ms2_sel",
"qa_pass",
"qa_ms1_exists",
"qa_ms2_exists",
"qa_ms1_good_int",
"qa_ms1_above_noise",
"qa_ms2_near",
"qa_ms2_good_int",
"qlt_ms1",
"qlt_ms2")]
res = needs[res,on=.(precid),allow.cartesian=T]
## TODO: additional processing?
res
}
summ_needs_from_comp <- function(res,comp) {
needs = comp[,.(set,ID,Name,SMILES)]
setkey(needs,set,ID)
res[needs,on=.(set,ID),`:=`(Name=i.Name,
SMILES=i.SMILES)]
}
## This function creates `summ' table.
gen_summ <- function(db,qa,comp) {
## Start with the basic things.
res = summ_needs_from_cat(db$cat)
## Add masses and precids.
res = summ_needs_from_precursors(res,db$precursors)
## Add qa columns.
res = summ_needs_from_qa(res,qa)
setkeyv(res,SUMM_KEY)
## Add comp columns.
summ_needs_from_comp(res,comp)
}
......@@ -12,598 +12,298 @@
## See the License for the specific language governing permissions and
## limitations under the License.
load_raw_data<-function(fn,mode="inMemory") {
ms1<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=1)
ms2<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=2)
c(ms1=ms1,ms2=ms2)
}
create_fine_table <- function(m) {
## Select fine mz-ranges and split them into those with rt entries
## and those without.
precs = m$db$precursors
precs[,unique(.SD),.SDcols=c("iso_fine_min",
"iso_fine_max",
"rt_min",
"rt_max",
"file"),
keyby=c("file","precid")]
centroided1 <- function(ms) {
if (all(MSnbase::centroided(ms)) == T)
return(T) else {
state <- MSnbase::isCentroided(ms)
N <- length(state)
fls <-length(which(state == F))
if (fls/(1.*N) < 0.01) T else F
}
}
centroided <- function(msvec) {
if (is.vector(msvec)) {
f <- list()
for (i in 1:length(msvec)) {
f[[i]] <- future::future(centroided1(msvec[[i]]))
}
lapply(f, FUN = future::value)
} else {
centroided1(msvec)
}
}
create_coarse_table <- function(m) {
## Select coarse mz-ranges and split them into those with rt entries
## and those without.
precs = m$db$precursors
precs[,unique(.SD),.SDcols=c("iso_coarse_min",
"iso_coarse_max",
"rt_min",
"rt_max",
"file"),
keyby=c("file","isocoarse","precid")]
acq_mz<-function(tabFn) {
df<-read.csv(tabFn,
stringsAsFactors=F,
comment.char='')
x<-as.numeric(df$mz)
names(x)<-paste("ID:",as.character(df$ID),sep='')
x
}
name2id<-function(nm) {as.integer(substring(nm,4))}
id2name<-function(id) {paste("ID:",id,sep='')}
ppm2dev<-function(m,ppm) 1e-6*ppm*m
gen_mz_range<-function(mz,err) {
mat<-matrix(data=numeric(1),nrow=length(mz),ncol=2)
mat[,1]<-mz - err
mat[,2]<-mz + err
mat
read_data_file <- function(file) {
MSnbase::readMSData(file=file,msLevel=c(1,2),mode="onDisk")
}
gen_rt_range<-function(rt,err) {
mat<-matrix(data=numeric(1),nrow=length(rt),ncol=2)
rV<-which(!is.na(rt))
rNA<-which(is.na(rt))
mat[rV,1]<-(rt[rV] - err)*60
mat[rV,2]<-(rt[rV] + err)*60
mat[rNA,1]<--Inf
mat[rNA,2]<-Inf
mat
}
filt_ms2_by_prcs <- function(ms2,mzrng,ids,adduct) {
pre<-MSnbase::precursorMz(ms2)
psn<-MSnbase::precScanNum(ms2)
acN<-MSnbase::acquisitionNum(ms2)
nR<-length(pre)
inRange<-function(i) {
mp<-pre[[i]]
x<-mzrng[,1]<mp & mp<mzrng[,2]
ind<-which(x)
sids <- ids[ind]
add <- adduct[ind]
dtable(ID=sids,adduct=add)
extr_cgrams_ms1 <- function(ms,tab,fdata) {
## Some helpers.
new_restab <- function(intab,cgm) {
base = intab[,.(precid=precid,cgmidx=.I)]
cgm = base[,{
rt = rtime(cgm[cgmidx,1])
inte = intensity(cgm[cgmidx,1])
.(precid=precid,
rt = rt,
intensity = inte,
scan = names(rt))},
by="cgmidx"]
setkey(cgm,scan)
cgm[fdata$ms1,idx:=i.idx,on="scan"]
cgm
}
lst<-lapply(1:nR,function(i) {
dt <- inRange(i)
list(n=i,prec_scan=psn[[i]],aN=acN[[i]],ids=dt$ID,adduct=dt$adduct)
})
nemp<-sapply(lst,function(m) length(m$ids)>0)
wrk<-lst[nemp]
dfL<-sum(sapply(wrk,function(w) length(w$ids)))
df<-dtable(ID=character(dfL),
adduct=character(dfL),
prec_scan=integer(dfL),
aN=integer(dfL),
OK=logical(dfL))
df$OK<-T #TODO Introduced for testing, be careful.
offD<-0
for (m in wrk) {
l<-length(m$ids)
rng<-(offD+1):(offD+l)
df[rng,"ID"] <- m$ids
df[rng,"prec_scan"] <- m$prec_scan
df[rng,"aN"] <- m$aN
df[rng,"adduct"] <- m$adduct
offD<-offD+l
}
df[order(df$aN),]
}
filt_ms2_by_prcs_ht<-function(ms2,mzrng,ids,adduct) {
lgnd<-filt_ms2_by_prcs(ms2,mzrng=mzrng,ids=ids,adduct=adduct)
scans<-unique(lgnd$aN)
ns<-which(MSnbase::acquisitionNum(ms2) %in% scans)
sms2<-ms2[ns]
list(ms2=sms2,leg=lgnd)
}
pick_unique_precScans<-function(idx) {
ps<-unique(idx$prec_scan)
mind<-match(ps,idx$prec_scan)
data.frame(prec_scan=idx$prec_scan[mind],
ID=idx$ID[mind],
adduct=idx$adduct[mind],
stringsAsFactors=F)
}
trt = tab[!is.na(rt_min)]
tnort = tab[is.na(rt_min)]
pick_uniq_pscan<-function(leg) {
res <- leg[,.(prec_scan=unique(prec_scan)),by=c("ID","adduct")]
res[order(prec_scan),]
## ids<-unique(leg$ID)
## x<-lapply(ids,function(id) {ups<-unique(leg[id==leg$ID,"prec_scan"]);data.frame(ID=rep(id,length(ups)),prec_scan=ups,stringsAsFactors = F)})
## res<-do.call(rbind,c(x,list(stringsAsFactors=F)))
## res[order(res$prec_scan),]
}
resrt = if (nrow(trt)>0L) {
## Call with rt argument (in seconds).
mzrng = as.matrix(trt[,.(iso_fine_min,iso_fine_max)])
rtrng = as.matrix(trt[,.(rt_min*60,rt_max*60)])
new_restab(trt,MSnbase::chromatogram(ms,mz = mzrng, rt = rtrng))
} else data.table()
verif_prec_fine_ht<-function(preLeg,ms1,mz,mzrng,ids,adduct) {
## TODO FIXME TESTPHASE Something goes wrong here, all mapply results are
## not OK. More testing needed. ... huh? But, it works now? (23/02/2022)
df<-preLeg
xx <- dtable(adduct=adduct,ID=ids,mz=mz,mz1=mzrng[,1],mz2=mzrng[,2])
df <- preLeg[xx,on=c("ID","adduct")]
df$ipns<-match(df$prec_scan,MSnbase::acquisitionNum(ms1))
df[, ("mzsp") := .(lapply(ipns,function (ip) if (!is.na(ip)) MSnbase::mz(ms1[[ip]]) else NA_real_))]
df$OK<-mapply(function(m1,sp,m2) any((m1<sp) & (sp<m2)),df$mz1,df$mzsp,df$mz2)
res<-df[df$OK,]
res$ipns<-NULL
res$mz1<-NULL
res$mz2<-NULL
res$mzsp<-NULL
res
}
filt_ms2<-function(ms1,ms2,mz,errCoarse,errFinePPM) {
tmp<-filt_ms2_by_prcs_ht(ms2,mz,errCoarse=errCoarse)
legMS2<-tmp$leg
legPcs<-pick_uniq_pscan(legMS2)
legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,errFinePPM=errFinePPM)
x<-Map(function (id,psn) {legMS2[id==legMS2$ID & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"])
x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
rownames(x)<-NULL
x<-x[order(x$aN),]
uids<-unique(x$ID)
acN<-MSnbase::acquisitionNum(ms2)
res<-lapply(uids,function(id) {
x<-ms2[match(x[id==x$ID,"aN"],acN)]
fData(x)[,"rtm"]<-MSnbase::rtime(x)/60.
fData(x)[,"maxI"]<-sapply(MSnbase::intensity(x),max)
x})
names(res)<-uids
res
}
filt_ms2_fine <- function(ms1,ms2,mz,ids,adduct,err_coarse_fun,err_fine_fun) {
## This function is supposed to extract only those MS2 spectra for
## which it is proven that the precursor exists within the fine
## error range.
mzrng_c <- gen_mz_range(mz,err_coarse_fun(mz))
mzrng_f <- gen_mz_range(mz,err_fine_fun(mz))
tmp<-filt_ms2_by_prcs_ht(ms2,mzrng=mzrng_c,ids=ids,adduct=adduct)
legMS2<-tmp$leg
message("nrow legMS2:", nrow(legMS2))
legPcs<-pick_uniq_pscan(legMS2)
legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,mzrng=mzrng_f,ids=ids,adduct=adduct)
## x<-Map(function (id,psn,a) {legMS2[id==legMS2$ID & a==legMS2$adduct & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"],legPcs[,"adduct"])
## x <- data.table::rbindlist(x)[,.(ID,adduct,aN)]
x <- legMS2[legPcs[,.(ID,adduct,prec_scan)],on=c("ID","adduct","prec_scan")]
## x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
## rownames(x)<-NULL
x<-x[order(x$aN),]
x
}
extr_ms2<-function(ms1,ms2,ids,mz,adduct,err_coarse_fun, err_fine_fun) {
## Extraction of MS2 EICs and spectra.
x <- filt_ms2_fine(ms1=ms1,
ms2=ms2,
mz=mz,
ids=ids,
adduct=adduct,
err_coarse_fun=err_coarse_fun,
err_fine_fun=err_fine_fun)
## This was here before and obviously wrong when multiple adducts
## correspond to the same ID:
##
## uids <- unique(x$ID)
## uadds <- unique(x$adduct)
idadd <- x[,unique(.SD),.SDcols=c("ID","adduct")]
acN<-MSnbase::acquisitionNum(ms2)
chunks <- Map(function(id,ad) {
ans <- x[id==x$ID & ad==x$adduct,]$aN
sp<-ms2[which(acN %in% ans)]
res <- gen_ms2_spec_blk(sp)
res$ID <- id
res$adduct <- ad
res
},
idadd$ID,idadd$adduct)
data.table::rbindlist(chunks,fill = T)
}
resnort = if (nrow(tnort)>0L) {
mzrng = as.matrix(tnort[,.(iso_fine_min,iso_fine_max)])
new_restab(tnort,MSnbase::chromatogram(ms,mz = mzrng))
} else data.table()
res = rbind(resnort,resrt,fill=T)
res[,rt:=rt/60]
add_ms2_prcs_scans<-function(ms2,idx) {
df<-idx
df$prec_scan<-integer(nrow(idx))
for (i in 1:nrow(df)) {
sn<-df$sn[[i]]
df$prec_scan[[i]]<-MSnbase::precScanNum(ms2[[sn]])
}
## sn<-as.integer(df$sn)
## df$prec_scan[]<-MSnbase::precScanNum(ms2[sn])
##
## This errors with: msLevel for signature "NULL" for a specific
## compound. However, the above approach is cool.
df
}
refn_ms2_by_prec<-function(idxMS2,preFine) {
pf<-preFine[preFine$OK,]
pf$ID<-as.character(pf$ID)
idxMS2$OK<-logical(nrow(idxMS2))
idxMS2$ID<-as.character(idxMS2$ID)
for (n in 1:nrow(idxMS2)) {
scan<-idxMS2$prec_scan[[n]]
id2<-idxMS2$ID[[n]]
ppf<-pf[pf$ID==id2,]
inPF<- ppf$prec_scan %in% scan
idxMS2$OK[[n]]<-any(inPF)
}
idxMS2
}
trim_ms2_by_prec<-function(rawMS2,mz,errCoarse,errFinePPM) {
idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
}
grab_ms2_spec<-function(idx,raw) {
idx<-idx[idx$OK,]
IDs<-unique(idx$ID)
res<-lapply(IDs,function (id) {
sn<-idx$sn[idx$ID==id]
spec<-raw[sn]
rts<-MSnbase::rtime(spec)
lmz<-MSnbase::mz(spec)
lI<-MSnbase::intensity(spec)
rts<-rts/60.
names(lmz)<-NULL
names(lI)<-NULL
Map(function (mz,I,rt) {
mat<-matrix(data=0.0,ncol=length(mz),nrow=2,dimnames=list(c("mz","intensity")))
mat["mz",]<-as.numeric(mz)
mat["intensity",]<-as.numeric(I)
list(rt=rt,spec=mat)
},lmz,lI,rts)
})
names(res)<-IDs
get_fdata <- function(ms) {
fdata = as.data.table(fData(ms),keep.rownames="scan")
setkey(fdata,scan)
res = list()
res$ms1 = fdata[msLevel==1L,.(scan,
idx=spIdx)]
res$ms2 = fdata[msLevel==2L,.(scan,
idx=spIdx,
an=acquisitionNum,
rt=retentionTime/60.,
intensity=basePeakIntensity,
ce=collisionEnergy,
prec_mz=precursorMZ,
prec_idx=precursorScanNum)]
res
}
gen_ms2_chrom<-function(ms2Spec) {
lapply(ms2Spec, function(sp)
{
if (length(sp)>0) {
nRow<-length(sp)
mat<-matrix(0.0,nrow=nRow,ncol=2)
rt<-sapply(sp,function(x) x$rt)
ord<-order(rt)
intn<-lapply(sp,function (x) max(x$spec))
rt<-as.numeric(rt[ord])
intn<-as.numeric(intn[ord])
names(intn)<-NULL
names(rt)<-NULL
mat[,1]<-rt
mat[,2]<-intn
colnames(mat)<-c("rt","intensity")
mat
} else list()
})
}
gen_ms1_chrom <- function(raw,mz,errEIC,id,rt=NULL,errRT=NULL) {
mzRng<-gen_mz_range(mz,err = errEIC)
rtRng<-gen_rt_range(rt,err = errRT)
x<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng)
res<-lapply(x,function (xx) {
rt<-MSnbase::rtime(xx)/60.
ints<-MSnbase::intensity(xx)
df<-dtable(rt=rt,intensity=ints)
df
})
names(res)<-id
res
relate_ms2_to_precid <- function(coarse,ms2,cgram_ms1) {
## Take `coarse' table (the one with coarse mass limits), ms2
## fData and ms1 chromatogram, then relate precids from cgram_ms1
## to ms2 data.
## Select those MS2 entries the parents of which coarsely match
## compound lists masses.
res = ms2[coarse,on=.(prec_mz>iso_coarse_min,prec_mz<iso_coarse_max),.(prec_mz=x.prec_mz,precid,prec_idx,scan,idx,ce,rt,intensity),nomatch=NULL]
setkey(res,precid,prec_idx)
## Now, make sure those coarsely matched MS2 actually have a
## parent that finely matches something in the chromatogram (and
## this is by ensuring that a `precid' with the correct scan (idx)
## shows up in the chromatogram.
x = cgram_ms1[!is.na(intensity)]
x[res,on=.(precid,idx==prec_idx),
.(precid,ce,scan=i.scan,
idx=i.idx,rt=i.rt,
intensity=i.intensity),nomatch=NULL]
}
gen_ms1_chrom_ht<-function(raw,mz,errEIC,rt=NULL,errRT=NULL) {
mzRng<-gen_mz_range(mz,err=errEIC)
rtRng<-gen_rt_range(rt,err=errRT)
res<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng)
fData(res)[["ID"]]<-rownames(mzRng)
res
}
get_ext_width <- function(maxid) {as.integer(log10(maxid)+1)}
id_fn_ext<-function(width,id) {
formatC(as.numeric(id),width=width,flag=0)
}
write_eic<-function(eic,suff="eic.csv",dir=".",width=get_ext_width(max(as.numeric(names(eic))))) {
Map(function (e,n) {
if (length(e)>0) {
fn<-file.path(dir,paste(id_fn_ext(width,n),suff,sep="."))
tab2file(tab=e,file=fn)
}
},eic,names(eic))
extract_spectra <- function(ms,cgram_ms2) {
## This will extract full MS2 spectra based on ms2 chromatogram entries.
indices = cgram_ms2[,.SD,.SDcol=c("precid","scan","idx")]
res = empty_spectra_table()
selind = indices[,unique(.SD),.SDcol=c("scan","idx")]
sel = ms[selind$idx]
masses = mz(sel)
intensities = intensity(sel)
res = selind
setkey(res,scan)
res = res[,data.table(mz=masses[[scan]],
intensity=intensities[[scan]]),
keyby=c("scan")]
res[indices,on=.(scan),precid:=i.precid]
}
write_ms2_spec<-function(ms2Spec,dir=".") {
ids<-as.numeric(names(ms2Spec))
maxid<-max(ids)
width<-get_ext_width(maxid)
for (id in names(ms2Spec)) {
sp<-ms2Spec[[id]]
if (length(sp)>0) {
dr<-file.path(dir,id_fn_ext(width,id))
dir.create(path=dr,showWarnings=F)
for (s in sp) {
fn<-file.path(dr,paste("RT_",s$rt,"_spectrum.csv",sep=""))
df<-t(s$spec)
colnames(df)<-c("mz","intensity")
tab2file(tab=df,file=fn)
}
## PRESCREENING
## This function extracts intensity maxima on intervals given by
## RT vectors rt_1 and rt_2.
find_ms1_max <- function(rt,intensity,rt_1,rt_2)
{
x = mapply(function (rt_1,rt_2) {
rt_ival <- c(rt_1,rt_2)
intv <- findInterval(rt,rt_ival)
lintv = length(intv)
if (intv[1]==0L && intv[lintv] == 2L) {
pos = match(c(1L,2L),intv)
} else if (intv[1]==1L && intv[lintv]!=1L) {
pos = c(1L,match(2L,intv))
} else if (intv[1]==0L && intv[lintv]!=0L) {
pos = c(match(1L,intv),lintv)
} else {
pos = c(1L,lintv)
}
}
}
extr_msnb <-function(file,wd,mz,errEIC, errFinePPM,errCoarse=0.5,rt=NULL,errRT=NULL,mode="inMemory") {
## Perform the entire data extraction procedure.
##
## file - The input mzML file.
## wd - Top-level directory where the results should be deposited.
## mz - A named vector of precursor masses for which to scan the
## file. The names can be RMassBank IDs.
## rt - A named vector of length 1, or same as mz, giving the retention
## times in minutes. The names should be the same as for mz.
## errRT - A vector of length 1, or same as mz, giving the
## half-width of the time window in which the peak for the
## corresponding mz is supposed to be.
## errEIC - Absolute mz tolerance used to extract precursor EICs.
## errFinePPM - Tolerance given in PPM used to associate input
## masses with what the instrument assigned as precursors to MS2
## products.
message("Loading ",file," in mode ",mode, ".")
data<-load_raw_data(file,mode=mode)
ms1<-data[["ms1"]]
ms2<-data[["ms2"]]
message("Done loading ",file,".")
## EICs for precursors.
message("Extracting precursor EICs. Please wait.")
eicMS1<-gen_ms1_chrom(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
write_eic(eicMS1,dir=wd)
message("Extracting precursor EICs finished.")
## Extract MS2 spectra.
message("Extracting MS2 spectra.")
idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
message("Resampling MS2 spectra.")
# idxMS2<-add_ms2_prcs_scans(ms2,idxMS2)
prsc<-pick_unique_precScans(idxMS2)
vprsc<-verif_prec_fine(preSc=prsc,ms1=ms1,mz=mz,errFinePPM = errFinePPM)
idxMS2<-refn_ms2_by_prec(idxMS2=idxMS2,preFine=vprsc)
message("Resampling MS2 spectra finished.")
pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
if (length(pmax)==0L) pmax = pos[[1]]
c(rt[pmax],intensity[pmax])
}, rt_1, rt_2, USE.NAMES=F)
x
ms2Spec<-grab_ms2_spec(idxMS2,ms2)
eicMS2<-gen_ms2_chrom(ms2Spec)
message("Extracting MS2 spectra finished.")
write_eic(eicMS2,dir=wd,suff="kids.csv",width=get_ext_width(max(as.numeric(names(eicMS1)))))
specDir<-file.path(wd,"ms2_spectra")
dir.create(specDir,showWarnings = F)
write_ms2_spec(ms2Spec,dir=specDir)
message("Done with ", file)
}
##' @importFrom MSnbase filterMz
extr_msnb_ht <-function(file,wd,mz,errEIC, errFinePPM,errCoarse,fnSpec,rt=NULL,errRT=NULL,mode="onDisk") {
## Perform the entire data extraction procedure.
##
## file - The input mzML file.
## wd - Top-level directory where the results should be deposited.
## mz - A named vector of precursor masses for which to scan the
## file. The names can be RMassBank IDs.
## rt - A named vector of length 1, or same as mz, giving the retention
## times in minutes. The names should be the same as for mz.
## errRT - A vector of length 1, or same as mz, giving the
## half-width of the time window in which the peak for the
## corresponding mz is supposed to be.
## errEIC - Absolute mz tolerance used to extract precursor EICs.
## errFinePPM - Tolerance given in PPM used to associate input
## masses with what the instrument assigned as precursors to MS2
## products.
message("Loading ",file," in mode ",mode, ".")
data<-load_raw_data(file,mode=mode)
ms1<-data[["ms1"]]
ms2<-data[["ms2"]]
message("Done loading ",file,".")
## Filtering
mzCrs<-gen_mz_range(mz=mz,err=errCoarse)
mzMin<-min(mzCrs)
mzMax<-max(mzCrs)
ms1<-filterMz(ms1,c(mzMin,mzMax))
fms2<-filt_ms2(ms1,ms2,mz,errCoarse=errCoarse,errFinePPM=errFinePPM)
## EICs for precursors.
message("Extracting precursor EICs. Please wait.")
eicMS1<-gen_ms1_chrom_ht(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
message("Extracting precursor EICs finished.")
analyse_extracted_data <- function(db,prescreen_param) {
## Note
##
## I am working on this two days before the group meeting. The
## point of a meeting is to have something to show, so I will just
## minimally adapt the old `analyse_extracted_data' to the new
## `db' entries in the state. I suspect, even this is not going to
## be very easy.
##
## If no meeting was happening, then I'd create a nice, sleek
## function that fully adheres to the new `data model'
## philosophy. Alas, ...
ms1 = db$extr$cgm$ms1
ms2 = db$extr$cgm$ms2
spectra = db$extr$spectra
precursors = db$precursors
x<-list(eic=eicMS1,ms2=fms2)
saveRDS(object=x,file=file.path(wd,fnSpec))
x
}
## Get file info.
ms2_noise_table = precursors[spectra,.(file,intensity),on="precid",by=.EACHI,nomatch=NULL]
## Calculate threshold.
ms2_noise_table[,threshold:=0.33333333*mean(intensity),by="file"]
extr_eic_ms1 <- function(tab,err) {
## Asynchronous extraction of ms1 spectra. The result is a list of
## running futures.
file <- unique(tab$file)
## Reduce table.
ms2_noise_table = ms2_noise_table[,.(threshold=first(threshold,1L)),keyby="precid"]
res <-lapply(file,function (fn) future::futur(extr_fn(fn), lazy=T))
names(res) <- file
res
}
##' @importFrom MSnbase filterMz readMSData
##' @export
extract <- function(fn,tag,tab,err_ms1_eic.,err_coarse,err_fine,err_rt.,missing_precursors) {
## Extracts MS1 and MS2 EICs, as well as MS2 spectra, subject to
## tolerance specifications.
## Parameters.
presconf = conf_trans_pres(prescreen_param)
rt_shift = presconf$ret_time_shift_tol
det_ms2_noise = presconf$det_ms2_noise
ms2_int_thresh = presconf$ms2_int_thresh
ms1_int_thresh = presconf$ms1_int_thresh
## We start populating the ms2 qa table.
tab_ms2 = copy(ms2)
## Calculate noise.
tab_ms2[ms2_noise_table,qa_ms2_good_int:=intensity>threshold,on=.(precid)]
## Rename as downstream wants it.
setnames(tab_ms2,c("rt","intensity"),c("ms2_rt","ms2_int"))
tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
## Get mean ms1 value.
tab_ms1 = copy(ms1)
## TODO: Still detecting external references ... but which?
## However, the results check out, compared to sequential access.
err_coarse_fun <- gen_mz_err_f(err_coarse,
"ms1 coarse error: Only ppm, or Da units allowed.")
## To (artificially) differentiate beween ms1 and ms2 (because,
## they get stapled together later on, set scan to NA_character_.
tab_ms1[,scan:=NA_character_]
tab_ms1_mean = tab_ms1[,.(ms1_mean=mean(intensity,na.rm=T)),keyby="precid"]
err_fine_fun <- gen_mz_err_f(err_fine,
"ms1 fine error: Only ppm, or Da units allowed.")
## Perform MS1 maxima calculation in the neighbourhood of each
## MS2 result.
err_ms1_eic <- gen_mz_err_f(err_ms1_eic.,
"eic error: Only ppm, or Da units allowed.")
tmp = tab_ms1[tab_ms2,
{
xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
.(scan=i.scan,
ms1_rt = xx[1,],
ms1_int = xx[2,])
},on=.(precid),
by=.EACHI,
nomatch=NULL]
## Calculate QA values.
tab_ms2[tmp,on=.(precid,scan),c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int)]
tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
qa_ms1_above_noise=F,
qa_ms2_near=F)]
## TODO: I wonder if so stupidly auto-calculated ms1 noise should
## be taken into account at all? My recollection from earlier
## times was that it was helpful, at least sometimes.
tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
tab_ms2$qa_ms2_exists=T
## Reduce tab_ms1 to only to MS1 with no children.
precs_ms1 = tab_ms1[,unique(precid)]
precs_ms2 = tab_ms2[,unique(precid)]
precs_noms = data.table(precid=precs_ms1[!(precs_ms1 %in% precs_ms2)])
tab_noms2 = tab_ms1[precs_noms,
on=.(precid),
.SD,nomatch=NULL]
tab_noms2 = tab_noms2[,.(ms1_mean=mean(intensity,na.rm=T),
ms1_rt=rt[which.max(intensity)],
ms1_int=max(intensity, na.rm=T)),
keyby="precid"]
## QA for the above (lazy qa ... take only the max peak into account).
tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
## MS2 QA criteria all fail.
tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
## Bind MS1-only and MS1/MS2 entries together.
res = rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
## If ms1_int has been calculated as a Na(N) value, this means
## that no MS1 has been found for that precid.
res[,qa_ms1_exists:=F]
res[!is.na(ms1_int),qa_ms1_exists:=T]
data.table::setkey(res,precid)
err_rt <- gen_rt_err(err_rt.,
"rt error: Only s(econds), or min(utes) allowed.")
tab <- data.table::as.data.table(tab)
## chunk <- tab[file==fn]
mz <- tab$mz
rt <- tab$rt
id <- tab$ID
adduct <- tab$adduct
names(mz) <- id
names(rt) <- id
mzerr <- err_coarse_fun(mz)
mzrng <- gen_mz_range(mz=mz,err=mzerr)
rtrng <- gen_rt_range(rt=rt,err=err_rt)
mzmin <- min(mzrng)
mzmax <- max(mzrng)
read_ms1 <- function() {
ms1 <- readMSData(file=fn,msLevel=1,mode="onDisk")
ms1 <- filterMz(ms1,mz=c(mzmin,mzmax),msLevel=1)
ms1
}
read_ms2 <- function() {
ms2 <- MSnbase::readMSData(file=fn,msLevel=2,mode="onDisk")
ms2
}
read_all <- function() {
MSnbase::readMSData(file=fn,msLevel. = c(1,2),mode="onDisk")
}
extr_ms1_eic <- function(ms1) {
eic <- MSnbase::chromatogram(ms1,mz=mzrng,msLevel=1,missing=0.0,rt=rtrng)
bits <- dtable(N=sapply(eic,NROW))
bigN <- bits[,sum(N)]
bits[,idx:=paste0('I',.I)]
bits$ID <- id
bits$adduct <- adduct
bits$tag <- tag
res<-dtable(rt=numeric(bigN),
intensity=numeric(bigN),
tag=tag,
adduct=bits[,rep(adduct,N)],
ID=bits[,rep(ID,N)],
idx=bits[,rep(idx,N)])
data.table::setkey(res,idx)
names(eic)<-bits$idx
res[,c("rt","intensity") :=
.(MSnbase::rtime(eic[[idx]])/60.,
MSnbase::intensity(eic[[idx]])),
by=idx]
data.table::setkeyv(res,BASE_KEY)
res
}
qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='scan']
resby = BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'scan')]
res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
res[,ms2_sel:=F]
res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
"qa_ms1_above_noise",
"qa_ms1_good_int")]
res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
"qa_ms2_near",
"qa_ms2_good_int")]
res[is.na(qlt_ms1),qlt_ms1:=0L]
res[is.na(qlt_ms2),qlt_ms2:=0L]
## Set all other flags to false when qa_ms1_exists == F by decree.
flgs = c(QA_FLAGS,"ms2_sel")
res[qa_ms1_exists == F,(flgs):=F]
if (is.null(missing_precursors)) missing_precursors <- "do_nothing"
if (missing_precursors != "do_nothing") {
ms <- clean_na_precs(read_all(),missing_precursors = missing_precursors)
fdta <- as.data.table(MSnbase::fData(ms),keep.rownames = "rn")
fdta[,idx := .I]
idx_ms1 <- fdta[msLevel == 1,idx]
idx_ms2 <- fdta[msLevel == 2,idx]
ms1 <- ms[idx_ms1]
ms2 <- ms[idx_ms2]
} else {
ms1 <- read_ms1()
ms2 <- read_ms2()
}
res_ms1 <- extr_ms1_eic(ms1)
res_ms2 <- extr_ms2(ms1=ms1,
ms2=ms2,
ids=id,
mz=mz,
adduct=adduct,
err_coarse_fun=err_coarse_fun,
err_fine_fun=err_fine_fun)
res_ms2[,"tag":=tag]
## Clean all the NA intensity MS2 (sometimes a consequence of 'fill'.)
res_ms2<-res_ms2[!is.na(intensity)]
res <- list(ms1=res_ms1,
ms2=res_ms2)
res
}
gen_ms2_spec_blk <- function(spectra) {
dt <- dtable(mz=MSnbase::mz(spectra),
intensity=MSnbase::intensity(spectra),
rt = lapply(MSnbase::rtime(spectra),function (z) z/60.),
CE = MSnbase::collisionEnergy(spectra),
an = MSnbase::acquisitionNum(spectra))
dt[,maspI:=sapply(intensity,function (zz) max(zz))]
data.table::rbindlist(apply(dt,1,function(row) dtable(intensity=row[["intensity"]],
rt = row[["rt"]],
mz = row[["mz"]],
CE = row[["CE"]],
an = row[["an"]])))
}
......@@ -36,7 +36,7 @@ metfrag_get_stag_tab <- function(summ) {
## Argument summ can be a subset of actual `summ' table.
x = gen_1d_keytab(summ)
data.table::setnames(x,old="key1d",new="stag")
res = x[summ,`:=`(CE=i.CE,ion_mz=mz)]
res = x[summ,`:=`(ce=i.ce,ion_mz=mz)]
res
}
......@@ -54,18 +54,21 @@ get_mf_res_ext <- function(fn) {
metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bin,nproc = 1L) {
keys = intersect(colnames(stag_tab),colnames(ms2))
rms2 = ms2[stag_tab,on=keys,nomatch=NULL]
message("Generating MetFrag configs.")
file_tab = ms2[stag_tab,{
file_tab = rms2[,{
r = write_metfrag_config(param = ..param,
path = ..path,
subpaths = ..subpaths,
db_file = ..db_file,
stag = stag,
adduct = adduct,
ion_mz = ion_mz,
stag = first(stag),
adduct = first(adduct),
ion_mz = first(ion_mz),
spec = data.table(mz=mz,intensity=intensity))
c(r,stag = stag)
},by=.EACHI,on=keys]
c(r,stag = first(stag))
},keyby=keys]
message("Done generating MetFrag configs.")
withr::with_dir(path,{
......@@ -89,33 +92,30 @@ metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bi
mf_narrow_summ <- function(summ,kv,ms2_rt_i=NA_integer_,ms2_rt_f=NA_integer_) {
skey = data.table::key(summ)
cols = c("adduct","tag","ID","CE","an","mz","qa_pass","ms2_rt")
nsumm = get_rows_from_summ(summ,kv,cols)
cols = union(names(skey),c("adduct","tag","ID","ce","precid","scan","mz","qa_pass","ms2_rt"))
dtkv = as.data.table(kv)
nsumm = summ[dtkv,on=names(kv),.SD,.SDcols=cols]
nsumm = nsumm[qa_pass==T] # Those that make sense.
nsumm_key = union(SUMM_KEY,"ms2_rt")
nsumm_key = intersect(union(SUMM_KEY,"ms2_rt"),colnames(nsumm))
data.table::setkeyv(nsumm,nsumm_key)
if (!is.na(ms2_rt_i)) {
nsumm = nsumm[ms2_rt>(ms2_rt_i)]
}
ms2_rt_i = if (!is.na(ms2_rt_i)) ms2_rt_i else 0.
ms2_rt_f = if (!is.na(ms2_rt_f)) ms2_rt_f else Inf
if (!is.na(ms2_rt_f)) {
nsumm = nsumm[ms2_rt<(ms2_rt_f)]
}
nsumm[ms2_rt > (ms2_rt_i) & ms2_rt < (ms2_rt_f)]
nsumm
}
get_metfrag_targets <- function(stag_tab,ms2) {
## Take the columns we need from summ.
x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz")]
mrg_keys = c(intersect(key(ms2),key(summ)),"an")
mrg_keys = c(intersect(key(ms2),key(summ)),"scan")
x=ms2[x,.(CE=CE,ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
## Get column order so that `an' follows `CE'.
resnms = setdiff(mrg_keys,"an")
nms = union(union(resnms,"CE"),c("an","ion_mz","mz","intensity"))
resnms = setdiff(mrg_keys,"scan")
nms = union(union(resnms,"CE"),c("scan","ion_mz","mz","intensity"))
data.table::setcolorder(x,neworder = nms)
setkeyv(x,unique(c(resnms,"CE","an")))
setkeyv(x,unique(c(resnms,"CE","scan")))
x
}
......@@ -220,7 +220,7 @@ summarise_metfrag_results <- function(param,path,subpaths,cand_parameters,db_sco
}
.adapt_col_types <- function(x) {
x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)]
if (length(db_scores)>0) x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)] else x
}
.calc_basic_scores <- function(x) {
......
......@@ -191,7 +191,7 @@ gen_empty_summ <- function() {
## ms2_cols <- intersect(colnames(qa_ms2),SUMM_COLS)
## ms2_cols <- setdiff(ms2_cols,colnames(summ))
## summ <- qa_ms2[summ,c(..comp_cols,..ms1_cols,..ms2_cols),on=BASE_KEY]
## data.table::setkeyv(summ,c(BASE_KEY_MS2,"an"))
## data.table::setkeyv(summ,c(BASE_KEY_MS2,"scan"))
## summ[,qa_ms1_exists:=the_ifelse(!is.na(qa_ms1_good_int),T,F)]
## summ[,qa_ms2_exists:=the_ifelse(!is.na(CE),T,F)]
## summ[,qa_pass:=apply(.SD,1,all),.SDcols=QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]]
......@@ -547,274 +547,6 @@ conf_trans_pres <- function(pres_list) {
pres_list
}
## PRESCREENING
create_qa_table <- function(extr,conf_presc) {
## The first input argument is the extracted `ms`, table
## containing MS1 and MS2 spectra. The argument `conf_presc` is
## m$conf$prescreen, the prescreening parameters given in the conf
## file.
## The qa table is just a copy of ms with added quality control
## columns QA_COLS.
## The QA_FLAGS columns are flags specifying which properties of
## compounds are known well, or not.
## For each compound (mass) we ask the following questions:
## qa_ms1_exists -- does the MS1 spectrum exist at all?
## qa_ms2_exists -- do we have any MS2 spectra at all?
## qa_ms1_above_noise -- is MS1 above the noise treshold?
## qa_ms2_near -- is there any MS2 spectrum inside the tolerated
## retention time window around the MS1 peak? That is, are we
## non-RT-shifted?
## qa_ms2_good_int -- Is there any MS2 spectral intensity greater
## than the MS2 threshold and less than the MS1 peak?
## qa_pass -- did the spectrum pass all the checks?
## The columns in QA_NUM_REAL are:
##
## ms1_int -- the maximum intensity of MS1 spectrum over the
## entire run;
##
## ms1_rt -- the retention time of the peak MS1.
## The columns in QA_NUM_INT are:
##
## ms2_sel -- index of the selected MS2 spectrum; if not NA, the
## associated spectrum passed all the checks (qa_pass == T); the
## spectrum itself is in one of the member sublists of the `spec'
## column. The integer `ms2_sel' is then the index of the spectrum
## in that sublist.
##
## ms1_rt_ind -- TODO (but not important to end users).
qa <- list(prescreen=conf_presc)
## checks <- extr$ms2[,{
## },keyby=BASE_KEY_MS2]
checks <- extr$ms2
checks[,(QA_FLAGS):=F]
## message("checks:")
## print(checks)
## message("done checks")
checks[,(QA_NUM_INT):=NA_integer_]
checks[,(QA_NUM_REAL):=NA_real_]
setkeyv(checks,BASE_KEY_MS2)
qa$checks <- checks
qa
}
assess_ms1 <- function(m) {
qa <- m$qa
ms1 <- m$extr$ms1
## Calculate auxiliary variables and indices.
qa_ms1 <- ms1[,.(ms1_rt_ind=which.max(intensity)),keyby=BASE_KEY]
qa_ms1 <- ms1[qa_ms1,.(ms1_rt_ind=ms1_rt_ind,
ms1_int=intensity[[ms1_rt_ind]],
ms1_rt=rt[[ms1_rt_ind]],
ms1_mean=mean(intensity)),on=BASE_KEY,by=.EACHI]
qa_ms1[,qa_ms1_good_int := ms1_int > qa$prescreen$ms1_int_thresh]
qa_ms1[,qa_ms1_above_noise := F]
qa_ms1[qa_ms1_good_int==T,qa_ms1_above_noise := .(ms1_int > qa$prescreen$s2n*ms1_mean)]
## checks[(!qa_ms1_above_noise),c("qa_ms2_good_int","qa_ms2_near","qa_ms2_exists","qa_pass"):=F]
## qa_ms1 <- check_ms1_noise(check_ms1(qa_ms1))
m$qa$ms1 <- qa_ms1
m
}
assess_ms2 <- function(m) {
presconf <- conf_trans_pres(m$conf$prescreen)
ms1 <- m$extr$ms1
ms2 <- m$extr$ms2
qa_ms1 <- m$qa$ms1
qa_ms2 <- ms2[qa_ms1[qa_ms1_above_noise==T],.(CE=unique(CE),
pc_rt=i.ms1_rt,
pc_int=i.ms1_int,
an=unique(an)),on=BASE_KEY,by=.EACHI,nomatch=NULL]
rt_win2 <- presconf$ret_time_shift_tol
qa_ms2 <- ms2[qa_ms2,.(pc_rt=pc_rt,
pc_int=pc_int,
ms2_int=max(intensity),
ms2_rt=unique(rt),
qa_ms2_near=head(rt,1) < pc_rt + rt_win2 & head(rt,1) > pc_rt - rt_win2),
by=.EACHI,on=c(BASE_KEY_MS2,"an")]
qa_ms2$qa_ms2_good_int <-F
qa_ms2[qa_ms2_near==T,
qa_ms2_good_int := ms2_int > presconf$ms2_int_thresh & ms2_int < pc_int,
by=c(BASE_KEY_MS2,"an")]
## qa_ms2$qa_pass <- F
## qa_ms2[qa_ms2_good_int==T,qa_pass:=T]
qa_ms2$ms2_sel <- F
qa_ms2[qa_ms2_good_int==T,ms2_sel:={
ind<-which.min(abs(ms2_rt-pc_rt))
z<-ms2_sel
z[[ind]]<-T
z
},by=BASE_KEY_MS2]
setkeyv(qa_ms2,BASE_KEY_MS2)
m$qa$ms2 <- qa_ms2
m
}
## Analyze extracted data.
analyse_extracted_data <- function(extr,prescreen_param) {
ms1 <- extr$ms1
ms2 <- extr$ms2
## Parameters.
presconf <- conf_trans_pres(prescreen_param)
rt_shift <- presconf$ret_time_shift_tol
det_ms2_noise <- presconf$det_ms2_noise
ms2_int_thresh <- presconf$ms2_int_thresh
ms1_int_thresh <- presconf$ms1_int_thresh
## Detect MS2 noise.
ms2_clc_ns <- if (det_ms2_noise) {
if (ms2_int_thresh>0) {
warning("Ignore user specified ms2_int_thresh value and autodetect noise instead.")
}
ms2[,ms2_thr:=0.33333333*mean(intensity),by="tag"]
} else {
ms2[,ms2_thr:=ms2_int_thresh]
}
## message('ms2_clc_ns:',key(ms2_clc_ns))
## ms2_clc_ns <- ms2_clc_ns[intensity>ms2_thr]
## We drop mz info.
tab_ms2 <- ms2_clc_ns[,.(ms2_rt=first(rt),ms2_int=max(intensity),ms2_thr=first(ms2_thr)),by=c(BASE_KEY_MS2,'an')]
tab_ms2[,qa_ms2_good_int:=ms2_int>ms2_thr,by="an"]
data.table::setkeyv(tab_ms2,BASE_KEY_MS2)
tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
## Get mean ms1 value.
tab_ms1 <- extr$ms1
tab_ms1_mean <- tab_ms1[,.(ms1_mean=mean(intensity)),keyby=BASE_KEY]
## This function extracts intensity maxima on intervals given by
## RT vectors rt_1 and rt_2.
find_ms1_max <- function(rt,intensity,rt_1,rt_2)
{
mapply(function (rt_1,rt_2) {
rt_ival <- c(rt_1,rt_2)
intv <- findInterval(rt,rt_ival)
lintv = length(intv)
if (intv[1]==0L && intv[lintv] == 2L) {
pos = match(c(1L,2L),intv)
} else if (intv[1]==1L && intv[lintv]!=1L) {
pos = c(1L,match(2L,intv))
} else if (intv[1]==0L && intv[lintv]!=0L) {
pos = c(match(1L,intv),lintv)
} else {
pos = c(1L,lintv)
}
pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
c(rt[pmax],intensity[pmax])
}, rt_1, rt_2, USE.NAMES=F)
}
## Perform MS1 maxima calculation in the neighbourhood of each
## MS2 result.
tmp = tab_ms1[tab_ms2,{
xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
.(an=i.an,
ms1_rt = xx[1,],
ms1_int = xx[2,])
},by=.EACHI, nomatch=NULL]
## Calculate QA values.
tab_ms2[tmp,c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int),on=c(BASE_KEY,'an')]
tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
qa_ms1_above_noise=F,
qa_ms2_near=F)]
## TODO: I wonder if so stupidly auto-calculated ms1 noise should
## be taken into account at all? My recollection from earlier
## times was that it was helpful, at least sometimes.
tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
tab_ms2$qa_ms2_exists=T
## Find MS1 with no corresponding MS2.
ms1key <- tab_ms1[,unique(.SD),.SDcol=BASE_KEY]
ms2key <- tab_ms2[,unique(.SD),.SDcol=BASE_KEY]
ms2key$mch=T
tabmatches <- ms2key[ms1key]
ms1woms2 <- tabmatches[is.na(mch)][,mch:=NULL]
## calculate the most intense peak, its location and the mean for
## childless MS1.
tab_noms2 <- tab_ms1[ms1woms2,.(ms1_mean=mean(intensity),ms1_rt=rt[which.max(intensity)],ms1_int=max(intensity)),by=.EACHI,nomatch=NULL]
## QA for the above (lazy qa ... take only the max peak into account).
tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
## MS2 QA criteria all fail.
tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
## Bind MS1-only and MS1/MS2 entries together.
res <- rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
## Every single entry which was extracted has at least MS1.
res[,qa_ms1_exists:=T]
data.table::setkeyv(res,BASE_KEY)
qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='an']
resby <- BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'an')]
res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
res[,ms2_sel:=F]
res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
"qa_ms1_above_noise",
"qa_ms1_good_int")]
res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
"qa_ms2_near",
"qa_ms2_good_int")]
res
}
## Based on the `comprehensive' and `qa' tabs, greate `summ'.
gen_summ <- function(comp,qa) {
comp_cols <- intersect(SUMM_COLS,colnames(comp))
rdcomp <- comp[,..comp_cols]
data.table::setkeyv(rdcomp,BASE_KEY)
summ <- qa[rdcomp,nomatch=F] #We changed `nomatch' cases from NA
#to F, because NA does not work well
#with X == F condition.
## flgs <- c(QA_FLAGS,"ms2_sel")
## summ[is.na(qa_ms1_exists),(flgs):=F]
data.table::setkeyv(summ,SUMM_KEY)
summ[.(F),c("qlt_ms1","qlt_ms2"):=0.,on="qa_ms1_exists"]
summ
}
gen_mz_err_f <- function(entry,msg) {
eppm <- grab_unit(entry,"ppm")
eda <- grab_unit(entry,"Da")
......
......@@ -21,8 +21,8 @@ aes <- ggplot2::aes
plot_fname <- function(kvals) {
if (!is.null(kvals)) {
kparts <- mapply(paste0,names(kvals),kvals,USE.NAMES=F)
stump <- paste(kparts,collapse="_")
kparts = mapply(paste0,names(kvals),kvals,USE.NAMES=F)
stump = paste(kparts,collapse="_")
paste0("plot_",stump,".pdf")
} else 'default.pdf'
}
......@@ -33,23 +33,23 @@ is_fname_rds <- function(fn) {
}
sci10 <- function(x) {
prefmt <- formatC(x,format="e",digits=2)
bits <- strsplit(prefmt,split="e")
bits1 <-sapply(bits,function(x) {
prefmt = formatC(x,format="e",digits=2)
bits = strsplit(prefmt,split="e")
bits1 =sapply(bits,function(x) {
if (length(x) > 1) {
res <- x[[1]]
res = x[[1]]
sub(" ","~",res)
} else {
x
}
})
bits2 <-sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "")
txt <- mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA,
bits2 =sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "")
txt = mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA,
bits1,
bits2,
SIMPLIFY = F)
names(txt) <- NULL
txt <- gsub(pattern = "^'0\\.00'.*$"," 0",x=txt)
names(txt) = NULL
txt = gsub(pattern = "^'0\\.00'.*$"," 0",x=txt)
parse(text=txt)
......@@ -58,15 +58,15 @@ sci10 <- function(x) {
scale_legend <- function(colrdata,pdata) {
if (is.null(colrdata) || is.null(pdata)) NULL
labs <- data.table::key(colrdata)
labs = data.table::key(colrdata)
sdcols <- c(labs,"label")
sdcols = c(labs,"label")
tab_lab <- pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL]
tab_lab = pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL]
x <- tab_lab$colour
x = tab_lab$colour
names(x) <- tab_lab$label
names(x) = tab_lab$label
ggplot2::scale_colour_manual(values=x)
}
......@@ -79,15 +79,15 @@ pal_maker <- function(n,palname = NULL) {
## the first colours to the original palette, then go over to the
## generated ones. Returns a vector of colours.
krzywinski <- c("#68023F","#008169","#EF0096","#00DCB5",
krzywinski = c("#68023F","#008169","#EF0096","#00DCB5",
"#FFCFE2","#003C86","#9400E6","#009FFA",
"#FF71FD","#7CFFFA","#6A0213","#008607")
info <- as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T)
maxcol <- if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski)
startpal <- if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski
pal <- if (n>length(startpal)) {
intrppal <-(colorRampPalette(startpal))(n)
newcol <- setdiff(intrppal,startpal)
info = as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T)
maxcol = if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski)
startpal = if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski
pal = if (n>length(startpal)) {
intrppal =(colorRampPalette(startpal))(n)
newcol = setdiff(intrppal,startpal)
unique(c(startpal,intrppal))
} else startpal
......@@ -106,16 +106,16 @@ make_struct_plot <- function(smiles, kekulise=TRUE, width=300, height=300,
## structure -> grob
smiles2img <- function() {
if (is.na(smiles) || nchar(smiles)==0) return(NULL) #Handle empty SMILES.
dep <- rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate,
dep = rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate,
abbr = abbr, suppressh = suppressh, showTitle = showTitle, smaLimit = smaLimit,
sma = NULL)
mol <- RMassBank::getMolecule(smiles)
z<-rcdk::view.image.2d(mol, depictor=dep)
mol = RMassBank::getMolecule(smiles)
z=rcdk::view.image.2d(mol, depictor=dep)
grid::rasterGrob(z)
}
grob <- smiles2img()
grob = smiles2img()
if (!is.null(grob)) {
qplot(1:5, 2*(1:5), geom="blank") +
......@@ -156,13 +156,13 @@ theme_print <- function(...) ggplot2::theme_light()+ggplot2::theme(axis.title=gg
...)+guide_fun()
theme_empty <- ggplot2::theme_bw()
theme_empty$line <- ggplot2::element_blank()
theme_empty$rect <- ggplot2::element_blank()
theme_empty$strip.text <- ggplot2::element_blank()
theme_empty$axis.text <- ggplot2::element_blank()
theme_empty$plot.title <- ggplot2::element_blank()
theme_empty$axis.title <- ggplot2::element_blank()
theme_empty = ggplot2::theme_bw()
theme_empty$line = ggplot2::element_blank()
theme_empty$rect = ggplot2::element_blank()
theme_empty$strip.text = ggplot2::element_blank()
theme_empty$axis.text = ggplot2::element_blank()
theme_empty$plot.title = ggplot2::element_blank()
theme_empty$axis.title = ggplot2::element_blank()
cust_geom_line <- function(key_glyph="rect",...) ggplot2::geom_line(...,key_glyph=key_glyph)
......@@ -187,18 +187,31 @@ mk_logic_exp <- function(rest,sofar=NULL) {
} else {
nm = names(rest)[[1]]
val = rest[[1]]
ex <- bquote(.(as.symbol(nm)) %in% .(val))
zz <- if (is.null(sofar)) ex else bquote(.(ex) & .(sofar))
ex = bquote(.(as.symbol(nm)) %in% .(val))
zz = if (is.null(sofar)) ex else bquote(.(ex) & .(sofar))
mk_logic_exp(tail(rest,-1L), zz)
}
}
get_data_from_key <- function(tab,key) {
skey <- mk_logic_exp(key)
tab <- eval(bquote(tab[.(skey)]))
setkeyv(tab,names(key))
tab
get_data_from_key <- function(db,tab,kvals,outcols=NULL) {
## Ensure only names that exist in cat are used in selection. Or,
## should we not do this?
valid_names = intersect(names(kvals),colnames(db$cat))
## Turn list into a data.table.
dt = as.data.table(kvals[valid_names])
## Get catids.
cattab = db$cat[dt,on=valid_names]
## Get precids.
mztab = db$precursors[cattab,on="catid"]
outnames = union(valid_names,outcols)
res = tab[mztab,on="precid"]
if (!is.null(outcols)) res[,..outnames] else res
}
......@@ -220,9 +233,9 @@ define_colrdata <- function(comptab,labs) {
## Determine colours based on `labs'.
one_keyset <- function(dt) {
labtab = dt[,unique(.SD),.SDcol=labs]
n <- NROW(labtab)
cols <- if (n<13L) {
pal <- RColorBrewer::brewer.pal(n=n,name="Paired")
n = NROW(labtab)
cols = if (n<13L) {
pal = RColorBrewer::brewer.pal(n=n,name="Paired")
if (n>3L) pal else if (n>0L) pal[1:n] else character()
} else {
scales::viridis_pal()(n)
......@@ -232,14 +245,14 @@ define_colrdata <- function(comptab,labs) {
}
## Calculate lengths of all the COLRDATA_KEY subgroups.
dt <- comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY]
dt = comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY]
## Arrange colours to map to specific labels by sorting.
allcols <- union(COLRDATA_KEY,labs)
allcols = union(COLRDATA_KEY,labs)
data.table::setkeyv(dt,allcols)
## Assign colours to labels subgroups.
res <- dt[,one_keyset(.SD),by=COLRDATA_KEY]
res = dt[,one_keyset(.SD),by=COLRDATA_KEY]
## Sort everything again,
data.table::setkeyv(res,allcols)
......@@ -251,10 +264,10 @@ define_colrdata <- function(comptab,labs) {
## compound set.).
narrow_colrdata <- function(colrdata,kvals) {
if (is.null(colrdata)) return(NULL)
vals <- as.list(kvals[COLRDATA_KEY])
vals = as.list(kvals[COLRDATA_KEY])
res <- colrdata[,(COLRDATA_KEY):=NULL]
labs <- names(res)[names(res)!="colour"]
res = colrdata[,(COLRDATA_KEY):=NULL]
labs = names(res)[names(res)!="colour"]
data.table::setkeyv(res,labs)
res
}
......@@ -266,201 +279,219 @@ narrow_colrdata <- function(colrdata,kvals) {
## subset of the `summ' table based on `kvals'. We need it for rt-s in
## the labels. Argument `labs' is a vector of names that will be used
## to construct the legend labels.
get_data_4_eic_ms1 <- function(extr_ms1,summ_rows,kvals,labs) {
get_data_4_eic_ms1 <- function(db,summ_rows,kvals,labs) {
## Which of the selected keys are in the extr_ms1? This can be
## Which of the selected keys are in the db$extr$cgm$ms1? This can be
## made more obvious to the user, but note necessary atm.
keys <- names(kvals)
actual_key <- intersect(keys,names(extr_ms1))
actual_kvals <- kvals[actual_key]
keys = names(kvals)
actual_key = intersect(keys,names(db$extr$cgm$ms1))
actual_kvals = kvals[actual_key]
## Subset db$extr$cgm$ms1 by the actual key.
tab = get_data_from_key(db=db,
tab=db$extr$cgm$ms1,
kvals=kvals,
outcols = c("catid","mz","rt","intensity"))
## Subset extr_ms1 by the actual key.
tab <-get_data_from_key(tab=extr_ms1,key=actual_kvals)
## Group the plot data per label group (ie tags, or adducts, or
## both).
xlxx <- intersect(labs,names(extr_ms1))
xlxx <- as.character(xlxx)
pdata <- tab[,.(rt,intensity),by=xlxx]
meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
## Attach catid information.
tab = meta[tab,on=.(catid),nomatch=NULL]
pdata = tab[,.(rt,intensity),by=labs]
## TODO: FIXME: This fails because summ_rows sux wrt calcing of ms1_rt for labels. #Now, add the RTs in.
## pdata[summ_rows,ms1_rt:=signif(i.ms1_rt,5),on=xlxx]
text_labels = labs
pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
setkeyv(pdata,cols=unique(as.character(text_labels)))
## Create labels.
## xlxx <- unique(c(xlxx,"ms1_rt"))
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=xlxx],splice=T))
setkeyv(pdata,cols=unique(as.character(xlxx),"rt"))
pdata
}
## Prepare MS2 eic data: rt and intensity + key made of splitby.
get_data_4_eic_ms2 <- function(summ,kvals,labs) {
tab <-get_data_from_key(tab=summ,key=kvals)
nms <- names(kvals)
byby <- unique(c(nms,labs,"an"))
pdata <- tab[,.(intensity=ms2_int,rt=ms2_rt),by=byby]
get_data_4_eic_ms2 <- function(db,summ,kvals,labs) {
tab = get_data_from_key(db=db,tab=db$extr$cgm$ms2,kvals=kvals,outcols=c("catid","ce","mz","rt","intensity"))
meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
## Attach catid information.
tab = meta[tab,on=.(catid),nomatch=NULL]
pdata = tab[,.(rt,intensity),by=labs]
if (NROW(pdata)==0L) return(NULL)
xlxx <- as.character(labs)
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=.(xlxx)],splice=T))
setkeyv(pdata,cols=c(labs,"rt"))
text_labels = labs
pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
setkeyv(pdata,cols=unique(as.character(text_labels)))
pdata
}
get_rows_from_summ <- function(summ,kvals,...) {
summ_rows_cols <- union(names(kvals),c(...))
get_data_from_key(summ,key=kvals)[,unique(.SD),.SDcol=summ_rows_cols]
}
narrow_summ <- function(summ,kvals,labs,...) {
keys <- names(kvals)
## keys <- keys[!is.na(keys)]
needed <- setdiff(labs,keys)
x <- as.list(c(needed,...))
x <- c(list(summ,kvals),x)
do.call(get_rows_from_summ,x)
narrow_summ <- function(db,summ,kvals,labs,...) {
keys = names(kvals)
nms = union(names(kvals),
labs)
nms = union(union("precid",nms),c(...))
nsumm = get_data_from_key(db=db,
tab=summ,
kvals=kvals,
outcols=nms)
nsumm
}
### PLOTTING: TOP-LEVEL PLOT CREATION
make_eic_ms1_plot <- function(extr_ms1,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) {
make_eic_ms1_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) {
## If nothing selected, just return NULL.
if (is.null(kvals)) return(NULL)
key <- names(kvals)
key = names(kvals)
## Get metadata.
## TODO: FIXME: Somehow calculating representationve ms1_rt for
## plots is wrong. Horrible and wrong. Will remove those labels
## until we fix.
summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","Formula","qa_ms1_exists","an","ms2_sel")
rows_key <- union(data.table::key(summ_rows),labs)
summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","qa_ms1_exists","scan","ms2_sel")
rows_key = union(data.table::key(summ_rows),labs)
summ_rows$sel_ms1_rt=NA_real_
summ_rows[ms2_sel==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
summ_rows[is.na(sel_ms1_rt) & ms2_sel==F & qa_ms1_exists==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
summ_rows[,ms1_rt:=sel_ms1_rt]
summ_rows[,sel_ms1_rt:=NULL]
summ_rows[,c("an","qa_ms1_exists","ms2_sel"):=NULL]
summ_rows <- summ_rows[,unique(.SD)]
summ_rows[,c("scan","qa_ms1_exists","ms2_sel"):=NULL]
summ_rows = summ_rows[,unique(.SD)]
## Get the table with ms1 data.
pdata <- get_data_4_eic_ms1(extr_ms1, summ_rows, kvals, labs)
pdata = get_data_4_eic_ms1(db=db, summ_rows, kvals, labs)
## Deal with retention time range.
coord <- if (is.null(rt_range) && is.null(i_range)) {
coord = if (is.null(rt_range) && is.null(i_range)) {
NULL
} else {
ggplot2::coord_cartesian(xlim=rt_range,
ylim=i_range)
}
xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx <- abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]])
xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx = abs(xrng[[2]]-xrng[[1]])
yrng = range(pdata$intensity)
dy = abs(yrng[[2]]-yrng[[1]])
## Calculate aspect ratio.
aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ') ## paste0("Set: ", set, " ID: ",id)
title_txt = paste0("MS1 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
nm <- paste(unique(summ_rows$Name),collapse="; ")
nm = paste(unique(summ_rows$Name),collapse="; ")
subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
p <- ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+
p = ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+
ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+
ggplot2::xlab("retention time")+
cust_geom_line()+
scale_y(axis=axis,labels=sci10)+
coord
colrdata <- narrow_colrdata(colrdata,kvals)
colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic()
}
make_eic_ms2_plot <- function(summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) {
make_eic_ms2_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) {
## If nothing selected, just return NULL.
if (is.null(kvals)) return(NULL)
## Get metadata.
summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES","Formula")
summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES")
## Get plotting data for the compound.
pdata <- get_data_4_eic_ms2(summ,
kvals=kvals,
labs=labs)
pdata = get_data_4_eic_ms2(db=db,
summ,
kvals=kvals,
labs=labs)
if (NROW(pdata)==0L) return(NULL)
## Deal with retention time range.
rt_lim <- if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range)
xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx <- abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]])
rt_lim = if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range)
xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx = abs(xrng[[2]]-xrng[[1]])
yrng = range(pdata$intensity)
dy = abs(yrng[[2]]-yrng[[1]])
## Fix aspect ratio.
aspr <- if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
aspr = if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
## Derive various labels.
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ')
title_txt = paste0("MS2 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
subt_txt = if (!length(summ_rows$Name)==0L && !is.na(summ_rows$Name[[1]]) && nchar(summ_rows$Name[[1]])>0L) summ_rows$Name[[1]] else NULL
## Base plot.
p <- ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) +
p = ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) +
ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt) +
ggplot2::xlab("retention time")+ggplot2::ylab("intensity")+cust_geom_linerange()+
scale_y(axis=axis,labels=sci10)+rt_lim+guide_fun()
ans <- pdata[,unique(an)]
## Add theme.
colrdata <- narrow_colrdata(colrdata,kvals)
colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic()
}
make_spec_ms2_plot <- function(extr_ms2,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
make_spec_ms2_plot <- function(db,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
## Only the chosen ones.
mdata <- get_data_from_key(summ,key=kvals)[ms2_sel==T]
common_key <- intersect(names(extr_ms2),names(kvals))
common_vals <- kvals[common_key]
if (length(common_key) == 0L) return(NULL)
subxdata <- get_data_from_key(extr_ms2,key=common_vals)
mdata = get_data_from_key(db=db,
tab=summ,
kvals=kvals,
outcols=union(names(kvals),
colnames(summ)))[ms2_sel==T]
spectra = db$extr$spectra[mdata,on=.(precid,scan),.(catid,ce,mz,scan,rt=i.ms2_rt,intensity)]
if (NROW(mdata)==0L) return(NULL)
if (NROW(subxdata) == 0L) return(NULL)
ans <- data.table(an=mdata[,unique(an)],key="an")
ms2ctg <- c(intersect(c(names(kvals),labs),names(extr_ms2)),"CE")
xlxx <- intersect(as.character(labs),names(extr_ms2))
common_labels <- unique(c("an",common_key,intersect(names(extr_ms2),labs)))
pdata <- subxdata[ans,on="an"][,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=common_labels]
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(c(xlxx,"rt"),as.symbol))),by=.(xlxx)],splice=T))
if (NROW(spectra) == 0L) return(NULL)
meta = db$cat[spectra[,.(catid=unique(catid))],
on=.(catid),
.SD,
by=.EACHI,
.SDcols=labs]
subxdata = meta[spectra,on=.(catid),nomatch=NULL]
labs = c(labs,"ce")
pdata = subxdata[,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=labs]
pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(c(labs,"rt"),as.symbol))),by=.(labs)],splice=T))
if (NROW(pdata)==0L) return(NULL)
# Aspect ratio.
xrng <- range(pdata$mz)
dx <- abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]])
aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
xrng = range(pdata$mz,na.rm=T)
dx = abs(xrng[[2]]-xrng[[1]])
yrng = range(pdata$intensity)
dy = abs(yrng[[2]]-yrng[[1]])
aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
## Get labels.
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ')
title_txt = paste0("MS2 spectra for ion m/z = ",paste0(signif(unique(mdata$mz),digits=7L),collapse=", "))
nm <- paste(unique(mdata$Name),collapse="; ")
nm = paste(unique(mdata$Name),collapse="; ")
subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
p <- ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun()
p = ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun()
## Add theme.
colrdata <- narrow_colrdata(colrdata,kvals)
colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic()
}
......
......@@ -124,13 +124,14 @@ NUM_INP_HEIGHT="5%"
## Possible compound list fields
EMPTY_CMPD_LIST = dtable(ID=character(),
SMILES=character(),
Name=character(),
Formula=character(),
RT=numeric(),
mz=numeric(),
known=character(),
ORIG=character())
SMILES=character(),
Name=character(),
Formula=character(),
RT=numeric(),
mz=numeric(),
known=character(),
set=character(),
ORIG=character())
COMP_LIST_COLS = c("ID","Name","SMILES","Formula","RT","mz")
## Comprehensive table properties
COMP_NAME_MAP = list(RT="rt")
......@@ -204,8 +205,8 @@ REPORT_TITLE = "Plots of EICs and MS2 Spectra"
## Select the most fundamental group of entries. Within this group,
## each ID is unique.
BASE_KEY = c("adduct","tag","ID")
BASE_KEY_MS2 = c(BASE_KEY,"CE","an")
BASE_KEY = "precid"#c("adduct","tag","ID")
BASE_KEY_MS2 = c("precid","ce","scan")#c(BASE_KEY,"CE","scan")
FIG_DEF_CONF =list(grouping=list(group="adduct",
plot="ID",
......@@ -216,7 +217,7 @@ FIG_DEF_CONF =list(grouping=list(group="adduct",
SUMM_COLS=c("set",BASE_KEY_MS2,"mz","ms1_rt", "ms1_int", "ms2_rt", "ms2_int",
"ms1_mean","ms2_sel",QA_FLAGS,"Name", "SMILES", "Formula", "known","Comments","file")
SUMM_KEY = c("set","ID","adduct","tag","an")
SUMM_KEY = c("set","ID","adduct","tag","precid","catid","scan")
PLOT_FEATURES = c("adduct",
"tag",
......@@ -224,34 +225,34 @@ PLOT_FEATURES = c("adduct",
## Empty summary table.
EMPTY_SUMM = data.table::data.table(set=character(0),
adduct=character(0),
tag=character(0),
ID=character(0),
CE=character(0),
an=integer(0),
mz=numeric(0),
ms1_rt=numeric(0),
ms1_int=numeric(0),
ms2_rt=numeric(0),
ms2_int=numeric(0),
ms1_mean=numeric(0),
ms2_sel=logical(0),
qa_pass=logical(0),
qa_ms1_exists=logical(0),
qa_ms2_exists=logical(0),
qa_ms1_good_int=logical(0),
qa_ms1_above_noise=logical(0),
qa_ms2_near=logical(0),
qa_ms2_good_int=logical(0),
Name=character(0),
SMILES=character(0),
Formula=character(0),
known=character(0),
Comments=character(0),
file=character(0))
adduct=character(0),
tag=character(0),
ID=character(0),
CE=character(0),
scan=character(0),
mz=numeric(0),
ms1_rt=numeric(0),
ms1_int=numeric(0),
ms2_rt=numeric(0),
ms2_int=numeric(0),
ms1_mean=numeric(0),
ms2_sel=logical(0),
qa_pass=logical(0),
qa_ms1_exists=logical(0),
qa_ms2_exists=logical(0),
qa_ms1_good_int=logical(0),
qa_ms1_above_noise=logical(0),
qa_ms2_near=logical(0),
qa_ms2_good_int=logical(0),
Name=character(0),
SMILES=character(0),
Formula=character(0),
known=character(0),
Comments=character(0),
file=character(0))
## Default sorting keys of spectra in the summary table
DEF_KEY_SUMM = c(BASE_KEY_MS2,"an")
DEF_KEY_SUMM = c(BASE_KEY_MS2,"scan")
SUBSET_VALS = c(IGNORE="ignore",
......@@ -360,3 +361,7 @@ METFRAG_RESULT_READF = list(csv = function(file,...) data.table::fread(file=file
xml = function(file,...) readxl::read_excel(path=file,...))
METFRAG_DEFAULT_PROC = 1L
## DATA MODEL
DB_CATALOGUE_KEY = c("set","tag","adduct","ID")
......@@ -22,7 +22,7 @@
#' @importFrom shiny selectInput numericInput textInput HTML
GUI_SELECT_INPUTS <- c("proj_list",
GUI_SELECT_INPUTS = c("proj_list",
"indir_list",
"ms1_coarse_unit",
"ms1_fine_unit",
......@@ -30,7 +30,7 @@ GUI_SELECT_INPUTS <- c("proj_list",
"ret_time_shift_tol_unit",
"dfile_list")
GUI_NUMERIC_INPUTS <- c("ms1_coarse",
GUI_NUMERIC_INPUTS = c("ms1_coarse",
"ms1_fine",
"ms1_eic",
"ms1_rt_win",
......@@ -39,14 +39,14 @@ GUI_NUMERIC_INPUTS <- c("ms1_coarse",
"s2n",
"ret_time_shift_tol")
GUI_TEXT_INPUTS <- c("rep_aut",
GUI_TEXT_INPUTS = c("rep_aut",
"rep_tit")
GUI_RADIO_INPUTS <- c("missingprec")
GUI_RADIO_INPUTS = c("missingprec")
GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
GUI_ALL_INPUTS = c(GUI_SELECT_INPUTS,
GUI_NUMERIC_INPUTS,
GUI_TEXT_INPUTS,
GUI_RADIO_INPUTS)
......@@ -55,8 +55,8 @@ GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
add_new_def_tag <- function(old_tags,how_many) {
ind = which(grepl(r"(^F\d+$)",old_tags))
st_num = if (length(ind)>0L) {
old_def_tags <- old_tags[ind]
tag_nums <- gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
old_def_tags = old_tags[ind]
tag_nums = gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
max(as.integer(tag_nums))
......@@ -162,26 +162,26 @@ create_gui <- function(project_path=NA_character_) {
#'@export
r2datatab <- function(rdatatab) {
shiny::isolate({
file <- rdatatab$file
adduct <- rdatatab$adduct
tag <- rdatatab$tag
set <- rdatatab$set
file = rdatatab$file
adduct = rdatatab$adduct
tag = rdatatab$tag
set = rdatatab$set
})
if (length(file)==0L) file <- character(0)
if (length(adduct)==0L) adduct <- rep(NA_character_,length(file))
if (length(tag)==0L) tag <- rep(NA_character_,length(file))
if (length(set)==0L) tag <- rep(NA_character_,length(file))
if (length(file)==0L) file = character(0)
if (length(adduct)==0L) adduct = rep(NA_character_,length(file))
if (length(tag)==0L) tag = rep(NA_character_,length(file))
if (length(set)==0L) tag = rep(NA_character_,length(file))
data.table(tag=tag,adduct=adduct,set=set,file=file)
}
r2filetag <- function(rfiletag) {
shiny::isolate({
file <- rfiletag$file
tag <- rfiletag$tag
file = rfiletag$file
tag = rfiletag$tag
})
if (length(file)==0L) file <- character(0)
if (length(tag)==0L) tag <- rep(NA_character_,length(file))
if (length(file)==0L) file = character(0)
if (length(tag)==0L) tag = rep(NA_character_,length(file))
data.table(tag=tag,file=file)
}
......@@ -193,7 +193,7 @@ gen_dtab <- function(tablist,sets) {
r2compounds <- function(rcompounds) {
shiny::isolate({
cmpd_lists <- rcompounds$lists
cmpd_lists = rcompounds$lists
})
list(lists=cmpd_lists)
......@@ -202,17 +202,17 @@ r2compounds <- function(rcompounds) {
#' @export
pack_app_state <- function(input, gui) {
pack <- list()
pack = list()
shiny::isolate({
pack_inputs <- list()
pack_input_names <- which_gui_inputs(inputs)
pack_inputs <- shiny::reactiveValuesToList(input)[pack_input_names]
pack$input <- pack_inputs
pack$datatab <- r2datatab(gui$datatab)
pack$filetag <- r2filetag(gui$filetag)
pack$compounds <- r2compounds(gui$compounds)
pack$paths <- list()
pack$paths$data <- gui$paths$data
pack_inputs = list()
pack_input_names = which_gui_inputs(inputs)
pack_inputs = shiny::reactiveValuesToList(input)[pack_input_names]
pack$input = pack_inputs
pack$datatab = r2datatab(gui$datatab)
pack$filetag = r2filetag(gui$filetag)
pack$compounds = r2compounds(gui$compounds)
pack$paths = list()
pack$paths$data = gui$paths$data
})
pack
......@@ -318,19 +318,21 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
input=input,
packed_state=packed_state)
gui <- create_gui(project_path=project_path)
gui$compounds$lists <- packed_state$compounds$lists
gui$datatab$file <- packed_state$datatab$file
gui$datatab$adduct <- packed_state$datatab$adduct
gui$datatab$tag <- packed_state$datatab$tag
gui$datatab$set <- packed_state$datatab$set
gui$filetag$file <- packed_state$filetag$file
gui$filetag$tag <- packed_state$filetag$tag
x <- packed_state$paths$data
gui$paths$data = if (length(x)>0 && nchar(x)>0) basename(x) else ""
if (!dir.exists(file.path(top_data_dir,gui$paths$data))) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
gui = create_gui(project_path=project_path)
gui$compounds$lists = packed_state$compounds$lists
gui$datatab$file = packed_state$datatab$file
gui$datatab$adduct = packed_state$datatab$adduct
gui$datatab$tag = packed_state$datatab$tag
gui$datatab$set = packed_state$datatab$set
gui$filetag$file = packed_state$filetag$file
gui$filetag$tag = packed_state$filetag$tag
x = packed_state$paths$data
gui$paths$data = if (length(x)>0 && nchar(x)>0) {
file.path(top_data_dir,basename(x))
} else ""
if (!dir.exists(gui$paths$data)) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
gui
})
......@@ -340,12 +342,12 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
input2conf_setup <- function(input,gui,conf=list()) {
if (length(conf)==0L) {
conf$compounds <- list()
conf$summary_table <- list()
conf$debug <- F
conf$compounds = list()
conf$summary_table = list()
conf$debug = F
}
conf$compounds$lists <- gui$compounds$lists
conf$compounds$lists = gui$compounds$lists
conf$paths$data = basename(gui$paths$data)
conf
}
......@@ -353,11 +355,11 @@ input2conf_setup <- function(input,gui,conf=list()) {
input2conf_extract <- function(input,conf) {
conf$tolerance = list()
conf$extract = list()
conf$tolerance[["ms1 fine"]] <- paste(input$ms1_fine,input$ms1_fine_unit)
conf$tolerance[["ms1 coarse"]] <- paste(input$ms1_coarse,input$ms1_coarse_unit)
conf$tolerance[["eic"]] <- paste(input$ms1_eic,input$ms1_eic_unit)
conf$tolerance[["rt"]] <- paste(input$ms1_rt_win,input$ms1_rt_win_unit)
conf$extract$missing_precursor_info <- input$missingprec
conf$tolerance[["ms1 fine"]] = paste(input$ms1_fine,input$ms1_fine_unit)
conf$tolerance[["ms1 coarse"]] = paste(input$ms1_coarse,input$ms1_coarse_unit)
conf$tolerance[["eic"]] = paste(input$ms1_eic,input$ms1_eic_unit)
conf$tolerance[["rt"]] = paste(input$ms1_rt_win,input$ms1_rt_win_unit)
conf$extract$missing_precursor_info = input$missingprec
conf
}
......@@ -365,26 +367,26 @@ input2conf_extract <- function(input,conf) {
input2conf_prescreen <- function(input,conf) {
conf$prescreen = list()
conf$prescreen[["ms1_int_thresh"]] <- input$ms1_int_thresh
conf$prescreen[["ms2_int_thresh"]] <- input$ms2_int_thresh
conf$prescreen[["s2n"]] <- input$s2n
conf$prescreen[["ret_time_shift_tol"]] <- paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
conf$prescreen[["ms1_int_thresh"]] = input$ms1_int_thresh
conf$prescreen[["ms2_int_thresh"]] = input$ms2_int_thresh
conf$prescreen[["s2n"]] = input$s2n
conf$prescreen[["ret_time_shift_tol"]] = paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
conf
}
input2conf_figures <- function(input,conf) {
conf$figures = list()
conf$figures$rt_min <- paste(input$plot_rt_min,input$plot_rt_min_unit)
conf$figures$rt_max <- paste(input$plot_rt_max,input$plot_rt_max_unit)
conf$figures$ext <- input$plot_ext
conf$figures$rt_min = paste(input$plot_rt_min,input$plot_rt_min_unit)
conf$figures$rt_max = paste(input$plot_rt_max,input$plot_rt_max_unit)
conf$figures$ext = input$plot_ext
conf
}
input2conf_report <- function(input,conf) {
conf$report = list()
conf$report$author <- input$rep_aut
conf$report$title <- input$rep_tit
conf$report$author = input$rep_aut
conf$report$title = input$rep_tit
conf
}
......@@ -401,7 +403,7 @@ input2conf_metfrag <- function(input,conf) {
MetFragPreProcessingCandidateFilter = paste(input$mf_pre_processing_candidate_filter,collapse=","),
MetFragPostProcessingCandidateFilter = paste(input$mf_post_processing_candidate_filter,collapse=","))
## TODO: FIXME: We need to move away from unit weights from some
## TODO: FIXME: We need to move away from unit weights at some
## point. This needs some extra widgets (Sigh!).
insc = sapply(input$mf_scores_intrinsic,function(x) 1.0)
names(insc) = input$mf_scores_intrinsic
......@@ -425,7 +427,7 @@ app_update_conf <- function(input,gui,envopts,fconf,m) {
fstr = paste0("input2conf_",fstrp)
m$conf = do.call(fstr,list(input,conf=m$conf))
}
m$run <- new_runtime_state(project=gui$paths$project,
m$run = new_runtime_state(project=gui$paths$project,
envopts = envopts,
conf=m$conf)
m
......@@ -453,28 +455,24 @@ app_state2state <- function(input,gui,envopts,m=NULL) {
}
get_sets <- function(gui) {
## TODO FIXME
## Think about this
## fn_lists <- file.path(gui$paths$project,gui$compounds$lists)
## df <- fread(file=fn_lists)
## if (!
## res = df[,unique(set)]
if (length(res)==0L) res = "ALL"
fn_lists = file.path(gui$paths$project,gui$compounds$lists)
cmpds = join_compound_lists(fn_lists)
cmpds = process_cmpd_sets(cmpds)
cmpds[,unique(set)]
}
gen_dfiles_tab <- function(gui) {
curr_file <- gui$filetag$file
curr_tag <- gui$filetag$tag
curr_file = gui$filetag$file
curr_tag = gui$filetag$tag
res <- data.table(file=curr_file,tag=curr_tag)
res = data.table(file=curr_file,tag=curr_tag)
res
}
gui2datatab <- function(gui) {
df <- data.table(tag=as.character(gui$datatab$tag),
df = data.table(tag=as.character(gui$datatab$tag),
adduct=as.character(gui$datatab$adduct),
set=as.character(gui$datatab$set),
file=as.character(gui$datatab$file))
......@@ -493,55 +491,59 @@ gui2datatab <- function(gui) {
## (of tags, CEs) in the index.
gen_cindex <- function(summ,sorder,cols = CINDEX_COLS,by. = CINDEX_BY) {
if (NROW(summ) == 0L) return(NULL)
allc <- c(by.,cols)
xsumm <- summ[,..allc]
allc = c(by.,cols)
xsumm = summ[,..allc]
setnames(xsumm,old="ms1_rt",new="rt",skip_absent=T)
res <- xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
res <- res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(first(mz),
first(mean(rt)),
first(Name),
first(qlt_ms1),
first(qlt_ms2)),
res = xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
res = res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(mean(mz,na.rm=T),
mean(rt,na.rm=T),
first(Name),
max(qlt_ms1,na.rm=T),
max(qlt_ms2,na.rm=T)),
by=by.]
res <- res[,unique(.SD),by=by.]
res = res[,unique(.SD),by=by.]
sorder <- unique(sorder)
wna <- which(sorder=="nothing"); if (length(wna)>0L) sorder <- sorder[-wna]
quality <- which("quality"==sorder)
sorder = unique(sorder)
wna = which(sorder=="nothing"); if (length(wna)>0L) sorder = sorder[-wna]
quality = which("quality"==sorder)
if (length(quality)>0L) {
pre <- head(sorder,quality-1L)
post <- tail(sorder,-quality)
sorder <- c(pre,"qlt_ms1","qlt_ms2",post)
pre = head(sorder,quality-1L)
post = tail(sorder,-quality)
sorder = c(pre,"qlt_ms1","qlt_ms2",post)
}
ord <- rep(1L,length(sorder))
ord = rep(1L,length(sorder))
if ("qlt_ms1" %in% sorder) {
ind <- which(sorder %in% c("qlt_ms1","qlt_ms2"))
ord[ind] <- -1L
ind = which(sorder %in% c("qlt_ms1","qlt_ms2"))
ord[ind] = -1L
}
if (length(sorder)>0) setorderv(res,cols=sorder,order=ord)
setnames(res,old="rt",new="rt(ms1)")
## Remove confusing columns.
res[,c("rt","mz"):=NULL]
res[,c("qlt_ms1","qlt_ms2"):=.(signif(100*qlt_ms1/10.,3),signif(100*qlt_ms2/10.,3))]
setnames(res,c("qlt_ms1","qlt_ms2"),c("best score (ms1)","best score (ms2)"))
res
}
cindex_from_input <- function(clabs,sort_catg=character(4),summ) {
grp <- if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
sorder <- setdiff(sort_catg,clabs)
grp = if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
sorder = setdiff(sort_catg,clabs)
gen_cindex(summ,sorder=sorder,by=grp)
}
get_cindex_key <- function(cindex) {
## Select only valid category names.
x <- which(CINDEX_BY %in% names(cindex))
x = which(CINDEX_BY %in% names(cindex))
CINDEX_BY[x]
}
get_cindex_parents <- function(summ,ckey,kvals,labs) {
## Get kvals part of summ.
tab <- summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
tab = summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcol=labs]
keys <- names(tab)[names(tab)!="item"]
keys = names(tab)[names(tab)!="item"]
data.table::setkeyv(tab,keys)
tab
}
......@@ -550,28 +552,28 @@ get_cindex_kval <- function(cindex,row,key) {
## Accounting for not fully initialised state.
if (!is.numeric(row) || is.na(row) || length(key)==0L || is.na(key) || NROW(cindex)==0L) return(NULL)
rowtab <- cindex[(row),..key]
res <- lapply(rowtab,function (x) x[[1]])
names(res) <- key
rowtab = cindex[(row),..key]
res = lapply(rowtab,function (x) x[[1]])
names(res) = key
res
}
get_summ_subset <- function(summ,ptab,paritem,kvals) {
select <- ptab[item==(paritem)]
tab <- get_data_from_key(summ,kvals)[select,nomatch=NULL,on=key(ptab)]
if ("an.1" %in% names(tab)) tab[,an.1:=NULL] #TODO: This is
get_summ_subset <- function(db,summ,ptab,paritem,kvals) {
select = ptab[item==(paritem)]
tab = get_data_from_key(db=db,tab=summ,kvals=kvals)[select,nomatch=NULL,on=key(ptab)]
if ("scan.1" %in% names(tab)) tab[,scan.1:=NULL] #TODO: This is
#probably a lousy
#hack.
tab
}
get_ltab <- function(summ_subs,cols=c("an","ms2_rt")) {
tab <- summ_subs
get_ltab <- function(summ_subs,cols=c("scan","ms2_rt")) {
tab = summ_subs
if (NROW(tab)==1L && is.na(tab$an)) return(data.table::data.table(item=character()))
tab[is.na(ms2_sel),ms2_sel:=F] #TODO FIXME: Check why NAs exist at all?
tab[is.na(ms2_sel),ms2_sel:=F]
tab[,passval:=fifelse(qa_pass==T,"OK","BAD")]
tab[ms2_sel==T,passval:="SELECTED"]
res <- tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
res = tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
data.table::setkey(res,"ms2_rt")
res
}
......@@ -585,18 +587,18 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
n_ms2_sel = input$chg_ms2sel
sel_par <- input$sel_parent_trace
sel_spec <- input$sel_spec
sel_par = input$sel_parent_trace
sel_spec = input$sel_spec
pkvals <- ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
lkvals <- ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
kvals <- c(as.list(pkvals),as.list(lkvals))
kvals <- kvals[unique(names(kvals))]
pkvals = ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
lkvals = ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
kvals = c(as.list(pkvals),as.list(lkvals))
kvals = kvals[unique(names(kvals))]
if ('an' %in% names(kvals) && n_ms2_sel) {
rkvals <- kvals[!(names(kvals) %in% 'an')]
rktab <- tabkey(summ,kvals=rkvals)
tabsel <- summ[rktab,.(an,ms2_sel)]
ansel <- tabsel[ms2_sel == T,an]
rkvals = kvals[!(names(kvals) %in% 'an')]
rktab = tabkey(summ,kvals=rkvals)
tabsel = summ[rktab,.(scan,ms2_sel)]
ansel = tabsel[ms2_sel == T,scan]
print('ansel')
print(ansel)
if (length(ansel)!=0) {
......@@ -607,29 +609,29 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
}
tgts <- c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
srcs <- c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
tgts = c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
srcs = c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
the_row <- tabkey(summ,kvals=kvals)
the_row = tabkey(summ,kvals=kvals)
summ[the_row,(tgts):=..srcs]
summ[,an.1:=NULL] #FIXME: an.1 pops up somewhere.
qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
summ[,scan.1:=NULL]
qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
summ[the_row,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
summ
}
get_mprop_ms2_metadata <- function(ltab_entry) {
res <- list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
res = list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
if (NROW(ltab_entry)==0L) return(res)
res$rt = ltab_entry$ms1_rt
res$int = ltab_entry$ms1_int
z <- ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
lqa_vals <- as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
qa_names <- names(lqa_vals)
res$qa <- qa_names[as.logical(lqa_vals)]
z = ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
lqa_vals = as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
qa_names = names(lqa_vals)
res$qa = qa_names[as.logical(lqa_vals)]
res$ms2_sel = ltab_entry$ms2_sel
res
......
......@@ -564,7 +564,7 @@ mk_shinyscreen_server <- function(projects,init) {
res = if (NROW(sel_dt)>0) {
tab[sel_dt,..coln,on=c("adduct","tag","ID","an")]
tab[sel_dt,..coln,on=c("adduct","tag","ID","scan")]
} else triv
data.table::setnames(res,"intensity","ms2_int")
res
......@@ -700,8 +700,6 @@ mk_shinyscreen_server <- function(projects,init) {
## REACTIVE FUNCTIONS: COMPOUND INDEX
rf_get_cindex <- reactive({
## TODO: FIXME: Uncomment after rearranging everything.
## input$cmt_changes_b
rvs$status$is_qa_stat
s1 = input$sort1
s2 = input$sort2
......@@ -741,8 +739,8 @@ mk_shinyscreen_server <- function(projects,init) {
rf_get_cindex_parents <- reactive({
rvs$m
isolate({
ms1 = rvs$m$extr$ms1
ms2 = rvs$m$extr$ms2
ms1 = rvs$m$db$extr$cgm$ms1
ms2 = rvs$m$db$extr$cgm$ms2
summ = req(rvs$m$out$tab$summ)
})
......@@ -760,7 +758,8 @@ mk_shinyscreen_server <- function(projects,init) {
kvals = req(rf_get_cindex_kval())
ptab = rf_get_cindex_parents()
if (isTruthy(parent)) {
get_summ_subset(summ=summ,
get_summ_subset(db=rvs$m$db,
summ=summ,
ptab=ptab,
paritem=parent,
kvals=kvals)
......@@ -813,20 +812,23 @@ mk_shinyscreen_server <- function(projects,init) {
rf_plot_eic_ms1 <- reactive({
isolate({
ms1 = rvs$m$extr$ms1
ms1 = rvs$m$db$extr$cgm$ms1
summ = rvs$m$out$tab$summ
db = rvs$m$db
})
req(NROW(summ)>0L)
req(NROW(ms1)>0L)
p = make_eic_ms1_plot(ms1,summ,kvals=rf_get_cindex_kval(),
labs=rf_get_cindex_labs(),
asp=PLOT_EIC_ASPECT,
rt_range=rf_get_rtrange(),
i_range=rf_get_irange(),
colrdata = rf_colrdata())
p = make_eic_ms1_plot(db=db,
summ,
kvals=rf_get_cindex_kval(),
labs=rf_get_cindex_labs(),
asp=PLOT_EIC_ASPECT,
rt_range=rf_get_rtrange(),
i_range=rf_get_irange(),
colrdata = rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot")
......@@ -851,12 +853,13 @@ mk_shinyscreen_server <- function(projects,init) {
gg = rf_plot_eic_ms1()
rt_rng = range(gg$data$rt)
p = make_eic_ms2_plot(summ,
kvals=rf_get_cindex_kval(),
labs=rf_get_cindex_labs(),
rt_range = rf_get_ms2_eic_rtrange(),
asp=PLOT_EIC_ASPECT,
colrdata=rf_colrdata())
p = make_eic_ms2_plot(rvs$m$db,
summ,
kvals=rf_get_cindex_kval(),
labs=rf_get_cindex_labs(),
rt_range = rf_get_ms2_eic_rtrange(),
asp=PLOT_EIC_ASPECT,
colrdata=rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot")
......@@ -878,15 +881,15 @@ mk_shinyscreen_server <- function(projects,init) {
rf_plot_spec_ms2 <- reactive({
isolate({
summ = rvs$m$out$tab$summ
ms2 = rvs$m$extr$ms2
ms2 = rvs$m$db$extr$cgm$ms2
})
req(NROW(summ)>0L)
req(NROW(ms2)>0L)
p = make_spec_ms2_plot(ms2,
summ,
kvals=req(rf_get_cindex_kval()),
labs=req(rf_get_cindex_labs()),
colrdata=rf_colrdata())
p = make_spec_ms2_plot(db = rvs$m$db,
summ,
kvals=req(rf_get_cindex_kval()),
labs=req(rf_get_cindex_labs()),
colrdata=rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot")
p
......@@ -905,29 +908,35 @@ mk_shinyscreen_server <- function(projects,init) {
choices = list.files(path=top_data_dir,
pattern = CMPD_LIST_PATT))
updateSelectInput(session = session,
inputId = "dfile_list",
choices = list.files(path=top_data_dir,
pattern = DFILES_LIST_PATT))
updateSelectInput(session = session,
inputId = "top_data_dir_list",
selected = basename(top_data_dir),
choices = list.dirs(path = init$envopts$top_data_dir,
full.names = F,
recursive = F))
})
observe({
top_data_dir = rvs$gui$paths$data
req(isTruthy(top_data_dir) && dir.exists(top_data_dir))
data_dir = rvs$gui$paths$data
if ((isTruthy(data_dir) && dir.exists(data_dir))) {
dfchoices = list.files(path=data_dir,
pattern = DFILES_LIST_PATT)
updateSelectInput(session = session,
inputId = "dfile_list",
choices = dfchoices)
updateSelectInput(session = session,
inputId = "top_data_dir_list",
selected = basename(rvs$gui$paths$data),
choices = list.dirs(path = init$envopts$top_data_dir,
full.names = F,
recursive = F))
}
updateSelectInput(session = session,
inputId = "dfile_list",
choices = list.files(path=top_data_dir,
pattern = DFILES_LIST_PATT))
})
## Update projects and data directories every second.
observeEvent(rtimer1000(),{
......@@ -981,13 +990,15 @@ mk_shinyscreen_server <- function(projects,init) {
rvs$status$ms2_int_thresh_stat = rvs$m$conf$prescreen[["ms2_int_thresh"]]
rvs$status$s2n_stat = rvs$m$conf$prescreen[["s2n"]]
rvs$status$ret_time_shift_tol_stat = rvs$m$conf$prescreen[["ret_time_shift_tol"]]
if (NROW(m$extr$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
if (NROW(m$db$extr$cgm$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
if (NROW(m$out$tab$summ)>0L) rvs$status$is_qa_stat = "Yes."
} else {
message("Initialising project: ",wd)
rvs$gui = create_gui(project_path=fullwd)
}
message("project: ",rvs$gui$project())
}, label = "project-b")
......@@ -1017,10 +1028,11 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$sel_data_dir_b,{
data_dir = input$top_data_dir_list
req(isTruthy(data_dir))
rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
message("Selected data dir:",rvs$gui$paths$data)
if (isTruthy(data_dir)) {
rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
message("Selected data dir:",rvs$gui$paths$data)
}
})
......@@ -1045,32 +1057,9 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$datafiles_b,{
new_file = input$dfile_list
if (isTruthy(new_file)) {
rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
new_file)
if (isTruthy(new_file)) rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
new_file)
## curr_file = rvs$gui$datatab$file
## curr_tag = rvs$gui$datatab$tag
## curr_adduct = rvs$gui$datatab$adduct
## curr_set = rvs$gui$datatab$set
## res_file = union(curr_file,new_file)
## res_adduct = c(curr_adduct,rep(NA_character_,nd))
## res_set = c(curr_set,rep(NA_character_,nd))
## rvs$gui$datatab$file = res_file
## rvs$gui$datatab$tag = add_new_def_tag(as.character(rvs$gui$datatab$tag),nd)
## rvs$gui$datatab$adduct = res_adduct
## rvs$gui$datatab$set = res_set
}
updateSelectInput(session=session,
inputId="dfile_list",
selected=NULL)
})
observeEvent(input$rem_dfiles_b,{
......@@ -1162,7 +1151,7 @@ mk_shinyscreen_server <- function(projects,init) {
rv_extr_flag(F)
rvs$m = run(m=rvs$m,
envopts=init$envopts,
phases=c("setup","comptab","extract"))
phases=c("setup","comptab","db","extract"))
rvs$status$is_extracted_stat = "Yes."
rvs$status$is_qa_stat = "No."
fn_c_state = file.path(rvs$m$run$paths$project,
......@@ -1178,7 +1167,7 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$presc_b,{
if (NROW(rvs$m$extr$ms1)>0L) {
if (NROW(rvs$m$db$extr$cgm$ms1)>0L) {
## Update just prescreening conf.
rvs$m = app_update_conf(input=input,
gui=rvs$gui,
......@@ -1310,9 +1299,10 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$make_report_b,{
isolate({
ms1 = rvs$m$extr$ms1
ms2 = rvs$m$extr$ms2
ms1 = rvs$m$db$extr$cgm$ms1
ms2 = rvs$m$db$extr$cgm$ms2
summ = rvs$m$out$tab$summ
db = rvs$m$db
})
req(NROW(summ)>0L)
......@@ -1334,18 +1324,21 @@ mk_shinyscreen_server <- function(projects,init) {
names(kvals) = key
message('Compound index row: ',ri)
p1 = make_eic_ms1_plot(ms1,summ,kvals=kvals,
labs=labs,
asp=PLOT_EIC_ASPECT,
rt_range=rt_range,
i_range=i_range,
colrdata = colrdata) + theme_print()
p2 = make_eic_ms2_plot(summ,kvals=kvals,
labs=labs,
asp=PLOT_EIC_ASPECT,
rt_range=rt_range,
colrdata = colrdata) + theme_print()
p1 = make_eic_ms1_plot(db=db,
summ,
kvals=kvals,
labs=labs,
asp=PLOT_EIC_ASPECT,
rt_range=rt_range,
i_range=i_range,
colrdata = colrdata) + theme_print()
p2 = make_eic_ms2_plot(db=db,
summ,kvals=kvals,
labs=labs,
asp=PLOT_EIC_ASPECT,
rt_range=rt_range,
colrdata = colrdata) + theme_print()
......@@ -1353,11 +1346,11 @@ mk_shinyscreen_server <- function(projects,init) {
smi = rvs$m$out$tab$comp[ID==(id),SMILES][[1]]
p_struc = make_struct_plot(smi)
p_spec = make_spec_ms2_plot(ms2,
summ,
kvals=kvals,
labs=labs,
colrdata=colrdata)+theme_print()
p_spec = make_spec_ms2_plot(db=db,
summ,
kvals=kvals,
labs=labs,
colrdata=colrdata)+theme_print()
cmb = combine_plots(p1,p2,p_spec,p_struc)
print(cmb)
......@@ -1377,7 +1370,7 @@ mk_shinyscreen_server <- function(projects,init) {
fn = file.path(projdir,input$ms2_spectra_tab_name)
shinymsg(paste0("Saving MS2 spectra table to: ",basename(fn)))
tab2file(pack_ms2_w_summ(rvs$m$out$tab$summ,
rvs$m$extr$ms2),
rvs$m$db$extr$spectra),
fn)
shinymsg("Done saving MS2 spectra table.")
})
......@@ -1588,7 +1581,7 @@ mk_shinyscreen_server <- function(projects,init) {
})
output$comp_table = DT::renderDataTable({
## TODO FIXME
## TODO
## cmpds = rf_get_cmpd_tab()
## validate(need(NROW(cmpds)>0,"No compound list loaded yet."))
## DT::datatable(cmpds,
......@@ -1626,8 +1619,6 @@ mk_shinyscreen_server <- function(projects,init) {
m=rvs$m)
kv = rf_get_cindex_kval()
## Some cols that might be needed to be specified
## explicitely. FIXME TODO: Make this more robust.
nsumm = mf_narrow_summ(rvs$m$out$tab$summ,kv,
ms2_rt_i=input$mf_entry_rt_min,
ms2_rt_f=input$mf_entry_rt_max)
......@@ -1639,7 +1630,8 @@ mk_shinyscreen_server <- function(projects,init) {
path = rvs$m$run$metfrag$path,
subpaths = rvs$m$run$metfrag$subpaths,
db_file = rvs$m$run$metfrag$db_file,
stag_tab = stagtab, ms2 = rvs$m$extr$ms2,
stag_tab = stagtab,
ms2 = rvs$m$db$extr$spectra,
runtime=rvs$m$run$metfrag$runtime,
java_bin=rvs$m$run$metfrag$java_bin,
nproc = rvs$m$conf$metfrag$nproc)
......@@ -1768,7 +1760,7 @@ mk_shinyscreen_server <- function(projects,init) {
selMS2 = req(input$sel_spec)
if (NROW(ms2tabsel)!=0L) {
lval = lapply(ms2tabsel[item==(selMS2)],function(x) x)
ms2 = rvs$m$extr$ms2
ms2 = rvs$m$db$extr$cgm$ms2
kval = rf_get_cindex_kval()
allval = c(kval,lval)
## There can be some duplicates.
......@@ -1778,10 +1770,11 @@ mk_shinyscreen_server <- function(projects,init) {
#more than the names existing in extr$ms2. Also,
#BASE_KEY_MS2 does not contain `an', so we need to readd
#it.
key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"an"))
key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"scan"))
kval2 = allval[key]
spec = get_data_from_key(ms2,kval2)[,.(mz,intensity)]
## as.character(lapply(1L:NROW(spec),function(nr) paste0(spec[nr,mz]," ",spec[nr,intensity])))
## Only precid and scan can be used for selection in spectra.
kval2 = as.data.table(kval2[c("precid","scan")])
spec = rvs$m$db$extr$spectra[kval2,.(mz,intensity),on=.(precid,scan)]
print(as.data.frame(spec),row.names=F)
} else {
......
......@@ -37,13 +37,12 @@ new_state <- function() {
runtime_from_conf <- function(run,envopts,conf) {
lst_cmpl <- conf$compounds$lists
lst_fn_cmpl <- lapply(names(lst_cmpl),function (nm) {
bfn_cmpl <- lst_cmpl[[nm]]
lst_fn_cmpl <- lapply(lst_cmpl,function (lst) {
bfn_cmpl <- lst
fn <- file.path(run$paths$project,bfn_cmpl)
if (!file.exists(fn)) stop("File ", fn, " does not exist in ", run$paths$project," .")
fn
})
names(lst_fn_cmpl) <- names(lst_cmpl)
run$paths$compounds$lists <- lst_fn_cmpl
run$paths$data = norm_path(file.path(envopts$top_data_dir,conf$paths$data))
......@@ -95,7 +94,7 @@ new_runtime_state <- function(project,envopts,conf=NULL) {
## If MetFrag possible, set up the file structure.
if (metfrag$cando_metfrag) {
mfdir = file.path(project_path,"metfrag")
dir.create(mfdir)
dir.create(mfdir, showWarnings = F)
metfrag$path=mfdir
subpaths = list(results = "results",
config = "config",
......@@ -204,7 +203,7 @@ new_project <- function(project,envopts,datatab=NULL,conf=NULL) {
check_file_absent(fn_conf,what="conf-file")
yaml::yaml.load_file(fn_conf)
} else conf
m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
## m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
m$run = new_runtime_state(project,envopts=envopts,conf=m$conf)
if (!is.null(datatab)) {
m$input$tab$mzml = datatab
......@@ -409,9 +408,11 @@ pack_ms2_w_summ <- function(summ,ms2) {
## Takes summ, finds entries with high quality spectra and subsets ms2 based on that.
## Take the columns we need from summ.
x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Formula","Name")]
mrg_keys = c(intersect(key(ms2),key(summ)),"an")
ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Name")]
mrg_keys = intersect(key(ms2),key(summ))
mrg_keys = c(mrg_keys,"scan")
## ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
ms2[x,.(ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
}
......@@ -431,3 +432,40 @@ pack_project <- function(m,fn_arch) {
})
fn_arch
}
join_compound_lists <- function(fname) {
coll <- list()
fields <- colnames(EMPTY_CMPD_LIST)
coltypes <- c(ID="character",
SMILES="character",
Formula="character",
Name="character",
RT="numeric",
mz="numeric")
l=0L
for (fn in fname) {
l = l + 1L
## Figure out column headers.
nms <- colnames(file2tab(fn,nrows=0))
## Read the table. Knowing column headers prevents unnecessary
## warnings.
dt <- file2tab(fn, colClasses=coltypes[nms])
verify_cmpd_l(dt=dt,fn=fn)
# nonexist <- setdiff(fnfields,fields)
coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
coll[[l]]$ORIG <- fn
}
if (length(fname)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
}
process_cmpd_sets <- function(cmpdlist) {
if (nrow(cmpdlist)==0L) return(EMPTY_CMPD_LIST)
## Process sets.
if (! ("set" %in% colnames(cmpdlist))) cmpdlist$set=NA_character_ else cmpdlist[,set:=as.character(set)]
slugs = cmpdlist[,.(slug=gen_fname_slug(ORIG)),by="ORIG"]
slugs[,slug:=uniqy_slugs(slug)]
cmpdlist[slugs,set:=fifelse(is.na(set),i.slug,set),on="ORIG"]
cmpdlist
}