Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
  • eci/shinyscreen
  • miroslav.kratochvil/shinyscreen
2 results
Show changes
Commits on Source (67)
Showing with 1220 additions and 1468 deletions
...@@ -44,9 +44,6 @@ variables: ...@@ -44,9 +44,6 @@ variables:
base-image: base-image:
tags:
- docker
- $RUNNER_TAG
stage: dep_images stage: dep_images
rules: rules:
- if: $CI_COMMIT_TAG == "" - if: $CI_COMMIT_TAG == ""
......
Package: shinyscreen Package: shinyscreen
Title: Pre-screening of Mass Spectrometry Data Title: Pre-screening of Mass Spectrometry Data
Version: 1.3.19 Version: 1.3.21
Author: Todor Kondić Author: Todor Kondić
Maintainer: Todor Kondić <todor.kondic@uni.lu> Maintainer: Todor Kondić <todor.kondic@uni.lu>
Authors@R: Authors@R:
...@@ -50,6 +50,7 @@ Collate: ...@@ -50,6 +50,7 @@ Collate:
'errors.R' 'errors.R'
'mix.R' 'mix.R'
'envopts.R' 'envopts.R'
'data-model.R'
'state.R' 'state.R'
'metfrag.R' 'metfrag.R'
'plotting.R' 'plotting.R'
......
---
title: "shinyscreen Docker Documentation"
author: "Anjana Elapavalore"
date: '`r Sys.Date()`'
output: pdf_document
---
The objective of this documentation is to provide a detailed procedure for running the shinyScreen application within a Docker container. shinyScreen is a web-based tool developed for visualizing and analyzing high-throughput screening data using Shiny, a web application framework for R. Docker is employed to containerize the application, ensuring reproducibility and simplifying the deployment process by packaging the application with all its dependencies.
This document provides step-by-step instructions on how to install Docker, pull the ShinyScreen Docker image, and run the application.
## Prerequisites
Before proceeding, the following prerequisites must be met:
- Docker Desktop must be installed and properly configured on the system.
- Sufficient system resources, including at least 4 GB of RAM, are recommended for optimal performance.
## Step 1: Install Docker Desktop on Windows
If Docker Desktop is not already installed on your system, follow these steps to set it up:
1. **Download Docker Desktop**:
- Visit the [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/) download page and download the installer.
2. **Install Docker Desktop**:
- Run the downloaded installer.
- Follow the installation prompts, and make sure the option to "Use the WSL 2-based engine" is selected during setup (if available).
3. **Verify Docker Installation**:
- After installation, launch Docker Desktop.
- Ensure Docker is running by checking the Docker icon in the system tray (it should be active).
- Open a command prompt (CMD) or PowerShell and type the following command to check the Docker version:
```bash
docker --version
```
This should return the installed Docker version.
## Step 2: Search for the ShinyScreen Docker Image in Docker Desktop
Docker Desktop provides a graphical interface to search for Docker images. To search and pull the **shinyScreen** Docker image:
1. **Open Docker Desktop**:
- Launch Docker Desktop from the Windows Start menu or by clicking the Docker icon in the system tray.
2. **Search for the ShinyScreen Image**:
- In Docker Desktop, navigate to the **search** tab as shown in the image below.
**Docker Image : anjanae/shinyscreen:v21_latest**
```{r, echo=FALSE, out.width="70%"}
knitr::include_graphics("images/searchtab.png")
```
3. **Pull the Image**:
- Click on the **Pull** button next to the desired ShinyScreen image. This will download the image to your local system.
You can monitor the download progress in Docker Desktop’s **Images** tab.
## Step 3: Run the ShinyScreen Docker Container
Once the ShinyScreen image has been pulled successfully, the next step is to run the container. This can be done directly from Docker Desktop or via the command line:
### Running the Container via Docker Desktop:
1. **Navigate to the Images Tab**:
- In Docker Desktop, go to the **Images** tab where your downloaded ShinyScreen image will be listed.
2. **Launch the Container**:
- Click the **Run** button next to the ShinyScreen image.
- In the pop-up dialog, ensure that port `3838` is exposed. To do this, add the following port configuration in the **Optional Settings**:
```
Host Port: 3838
Container Port: 3838
```
3. **To create the volume mounts**:
- To share files between the host and the container, you can use Docker’s volume mounting feature.
### local Directory Structure
Before launching the Docker container, the following directory structure should be set up on your local system. These directories will be mounted to specific locations inside the container, enabling the application to access the required input data and store the output files.
### Root Directory
The **root directory** is the top-level folder where the entire project is organized. This root directory will contain the following subdirectories:
1. **Project Directory**
2. **Data Directory**
3. **MetFrag Databases Directory**
Below is an overview of each subdirectory and its purpose.
### Project Directory
The **Project Directory** should initially contain only the **compound list** file, which is an essential input for the ShinyScreen application.
After the application runs, this directory will also store various output files generated during the analysis.
### Data Directory
The **Data Directory** should contain only the **mzML files**. These files serve as the primary data input for the ShinyScreen application and represent mass spectrometry data in mzML format.
### MetFrag Databases Directory
The **MetFrag Databases Directory** must include the latest **PubChemLite_Exposomics.csv** file, which will be used by the ShinyScreen application for MetFrag analysis.
### Configure Volume Mounts in Docker Desktop
Now that the local directories are set up, you can configure **volume mounts** in Docker Desktop to map these local directories to the corresponding paths inside the Docker container.
For the volume mounts, each directory on the host machine (local system) will be mapped to a specific location within the Docker container, allowing the application to access the input data and save output files.
In the **Optional Settings** pop-up window, you will configure the following **three volume mounts** (Please refer to the table and image below):
- **Local Directory on Windows**:
- This is the folder on your local machine where the data will reside.
- **Container Directory**:
- This is the path inside the Docker container where the files will be mounted.
| Local Directory (Host) | Container Directory (Mount) | Description |
| ------------------------------ | -------------------------------------- | ------------------------------- |
| `C:\path\to\rootdirectory` | `/home/ssuser/projects` | Directory for project files. |
| `C:\path\to\rootdirectory` | `/home/ssuser/top_data_dir` | Directory for top-level data. |
| `C:\path\to\metfrag_dbs` | `/home/ssuser/metfrag_dbs` | Directory for MetFrag databases. |
```{r, echo=FALSE, out.width="70%"}
knitr::include_graphics("images/runtab.png")
```
By setting up the volume mounts, any changes made to files in the mapped directories on the host machine will automatically reflect inside the container.
Once the **port** and **volume configurations** are confirmed, click **Run** to start the container. The ShinyScreen application will now be running inside the container.
Now click on the port mapping to access the application in the web browser
```{r, echo=FALSE, out.width="85%"}
knitr::include_graphics("images/containertab.png")
```
File added
Docker_Documentation/images/containertab.png

198 KiB

Docker_Documentation/images/runtab.png

97.9 KiB

Docker_Documentation/images/searchtab.png

141 KiB

FROM gitlab.lcsb.uni.lu:4567/eci/shinyscreen/dep/ssuser:latest FROM gitlab.lcsb.uni.lu:4567/eci/shinyscreen/dep/ssuser:latest
MAINTAINER todor.kondic@uni.lu MAINTAINER todor.kondic@uni.lu
EXPOSE 5432
EXPOSE 3838
ENV SS_MF_DB="PubChemLite_exposomics.csv" ENV SS_MF_DB="PubChemLite_exposomics.csv"
ENV SS_CPU 2 ENV SS_CPU 2
ADD . shinyscreen/ ADD . shinyscreen/
RUN R CMD build shinyscreen RUN R CMD build shinyscreen
USER root USER root
RUN R CMD INSTALL shinyscreen RUN R CMD INSTALL shinyscreen
USER ssuser USER ssuser
RUN cp shinyscreen/runme /home/ssuser/runme RUN cp shinyscreen/runme /home/ssuser/runme
RUN chmod u+x /home/ssuser/runme RUN chmod u+x /home/ssuser/runme
# RUN chown ssuser /home/ssuser/runme # RUN chown ssuser /home/ssuser/runme
# RUN chown -R ssuser /home/ssuser/shinyscreen # RUN chown -R ssuser /home/ssuser/shinyscreen
RUN R -e 'library(shinyscreen);setwd("~");init(top_data_dir="~/top_data_dir",projects="~/projects",users_dir="~/users",metfrag_db_dir=Sys.getenv("SS_MF_DB_DIR"),metfrag_jar="/usr/local/bin/MetFragCommandLine.jar",no_structure_plots=T,save=T,merge=F)' RUN R -e 'library(shinyscreen);setwd("~");init(top_data_dir="~/top_data_dir",projects="~/projects",users_dir="~/users",metfrag_db_dir=Sys.getenv("SS_MF_DB_DIR"),metfrag_jar="/usr/local/bin/MetFragCommandLine.jar",no_structure_plots=T,save=T,merge=F)'
ENTRYPOINT ["/home/ssuser/runme"] ENTRYPOINT ["/home/ssuser/runme"]
CMD ["app"] CMD ["app"]
...@@ -6,7 +6,6 @@ export(conf_trans) ...@@ -6,7 +6,6 @@ export(conf_trans)
export(create_plots) export(create_plots)
export(create_stub_gui) export(create_stub_gui)
export(extr_data) export(extr_data)
export(extract)
export(get_fn_comp) export(get_fn_comp)
export(get_fn_conf) export(get_fn_conf)
export(get_fn_extr) export(get_fn_extr)
...@@ -47,8 +46,6 @@ export(sort_spectra) ...@@ -47,8 +46,6 @@ export(sort_spectra)
export(subset_summary) export(subset_summary)
export(tk_save_file) export(tk_save_file)
import(data.table) import(data.table)
importFrom(MSnbase,filterMz)
importFrom(MSnbase,readMSData)
importFrom(promises,"%...>%") importFrom(promises,"%...>%")
importFrom(promises,future_promise) importFrom(promises,future_promise)
importFrom(shiny,HTML) importFrom(shiny,HTML)
......
...@@ -32,6 +32,7 @@ run <- function(envopts, ...@@ -32,6 +32,7 @@ run <- function(envopts,
all_phases=list(setup=setup_phase, all_phases=list(setup=setup_phase,
comptab=mk_comp_tab, comptab=mk_comp_tab,
db=make_db,
extract=extr_data, extract=extr_data,
prescreen=prescreen, prescreen=prescreen,
sort=sort_spectra, sort=sort_spectra,
...@@ -98,39 +99,14 @@ run_in_dir <- function(m) { ...@@ -98,39 +99,14 @@ run_in_dir <- function(m) {
} }
##' @export ##' @export
load_compound_input <- function(m) { load_compound_input <- function(m) {
coll <- list()
fields <- colnames(EMPTY_CMPD_LIST)
fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
message("fns:",paste0(fns,collapse=","))
coltypes <- c(ID="character",
SMILES="character",
Formula="character",
Name="character",
RT="numeric",
mz="numeric")
for (l in 1:length(fns)) {
fn <- fns[[l]]
## Figure out column headers.
nms <- colnames(file2tab(fn,nrows=0))
## Read the table. Knowing column headers prevents unnecessary
## warnings.
dt <- file2tab(fn, colClasses=coltypes[nms])
verify_cmpd_l(dt=dt,fn=fn)
# nonexist <- setdiff(fnfields,fields)
coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
coll[[l]]$ORIG <- fn
}
cmpds <- if (length(fns)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
fns <- file.path(m$run$paths$project,m$conf$compounds$lists)
cmpds = join_compound_lists(fns)
## Process sets.
cmpds = process_cmpd_sets(cmpds)
dups <- duplicated(cmpds$ID) dups <- duplicated(cmpds$ID)
dups <- dups | duplicated(cmpds$ID,fromLast = T) dups <- dups | duplicated(cmpds$ID,fromLast = T)
dupIDs <- cmpds$ID[dups] dupIDs <- cmpds$ID[dups]
...@@ -219,8 +195,6 @@ mk_comp_tab <- function(m) { ...@@ -219,8 +195,6 @@ mk_comp_tab <- function(m) {
smiforadd <- smiles[smiforadd,.(ID,SMILES,Formula,adduct),on=c("SMILES")] smiforadd <- smiles[smiforadd,.(ID,SMILES,Formula,adduct),on=c("SMILES")]
data.table::setkey(smiforadd,"adduct","ID") data.table::setkey(smiforadd,"adduct","ID")
## FIXME: Why is Formula a list when there are no SMILES, instead
## of an empty string?
smiforadd[,Formula:=as.character(Formula)] smiforadd[,Formula:=as.character(Formula)]
## Update the intermediate table with masses. ## Update the intermediate table with masses.
...@@ -336,181 +310,67 @@ mk_tol_funcs <- function(m) { ...@@ -336,181 +310,67 @@ mk_tol_funcs <- function(m) {
} }
##' @export ##' @export
extr_data <-function(m) { extr_data <-function(m) {
message("Stage: extract")
if (is.null(m$conf$serial) || !m$conf$serial) {
extr_data_future(m)
} else {
message("(extract): Serial extraction.")
extr_data_serial(m)
}
}
extr_data_future <- function(m) { fine = create_fine_table(m)
## Reduce the comp table to only unique masses (this is because
## different sets can have same masses).
m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
m$out$tab$data[,set:=NULL] #This column is meaningless now.
file <- m$out$tab$data[,unique(file)]
fpaths <- file.path(m$run$paths$data,file)
allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
dpath = m$run$paths$data
unique(MSnbase::collisionEnergy(z),fromLast=T)
}))
allCEs <- unique(allCEs)
allCEs <- allCEs[!is.na(allCEs)]
cols <-paste('CE',allCEs,sep = '')
vals <- rep(NA,length(cols))
m$out$tab$data[,(cols) := .(rep(NA,.N))]
file <- m$out$tab$data[,unique(file)]
ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
fpaths <- file.path(m$run$paths$data,ftags[,file])
futuref <- m$future
tmp <- lapply(1:nrow(ftags),function(ii) {
fn <- fpaths[[ii]]
the_tag <- ftags[ii,tag]
message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
## err_ms1_eic <- m$extr$tol$eic
## err_coarse_fun <- m$extr$tol$coarse
## err_fine_fun <- m$extr$tol$fine
## err_rt <- m$extr$tol$rt
err_coarse <- m$conf$tolerance[["ms1 coarse"]]
err_fine <- m$conf$tolerance[["ms1 fine"]]
## Open all files.
err_ms1_eic <- m$conf$tolerance$eic fns = fine[,unique(file)]
lms = lapply(fns,function(fn) read_data_file(file.path(dpath,fn)))
names(lms) = fns
err_rt <- m$conf$tolerance$rt
## Load all feature data.
lfdata = lapply(lms,get_fdata)
names(lfdata) = fns
missing_precursor_info <- m$conf$extract$missing_precursor_info ## Extract MS1 chromatograms using "fine" tolerance.
x <- futuref(extract(fn=fn, cgram_ms1 = fine[,extr_cgrams_ms1(lms[[file]],.SD,lfdata[[file]]),
tag=the_tag, by="file",
tab=tab, .SDcols=c("iso_fine_min",
err_ms1_eic=err_ms1_eic, "iso_fine_max",
err_coarse = err_coarse, "rt_min",
err_fine= err_fine, "rt_max",
err_rt= err_rt, "precid")]
missing_precursors = missing_precursor_info), setkey(cgram_ms1,file,precid)
lazy = F)
x ## Extract MS2 chromatograms.
}) ## Create the "coarse" table. Parent masses are known with
## "coarse". We will prefilter our ms2 results based on that...
coarse = create_coarse_table(m)
msk <- sapply(tmp,future::resolved)
curr_done <- which(msk)
for (x in curr_done) { cgram_ms2 = data.table(precid=integer(0),
message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]]) ce=numeric(0),
} scan=character(0),
while (!all(msk)) { idx=integer(0),
msk <- sapply(tmp,future::resolved) rt=numeric(0),
newly_done <- which(msk) intensity=numeric(0))
for (x in setdiff(newly_done,curr_done)) {
message("Done extraction for ", future::value(tmp[[x]])$ms1$tag[[1]])
} ## Extract MS2 spectra.
Sys.sleep(0.5) spectra = empty_spectra_table()
curr_done <- newly_done
for (fn in names(lfdata)) {
rtab = relate_ms2_to_precid(coarse=coarse[.(fn),on=.(file)],
ms2=lfdata[[fn]]$ms2,
cgram_ms1=cgram_ms1[.(fn),
on=.(file)])
sptab = extract_spectra(lms[[fn]],rtab)
cgram_ms2 = rbind(cgram_ms2,rtab)
spectra = rbind(spectra,sptab)
} }
setkey(cgram_ms1,precid,rt)
ztmp <- lapply(tmp,future::value) setkey(cgram_ms2,precid,ce,rt)
m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1)) setkey(spectra,precid,scan)
m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2)) m$db$extr$cgm$ms1 = cgram_ms1
data.table::setkeyv(m$extr$ms1,BASE_KEY) m$db$extr$cgm$ms2 = cgram_ms2
data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE")) m$db$extr$spectra = spectra
fn_ex <- get_fn_extr(m)
timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
m m
}
extr_data_serial <- function(m) {
## Reduce the comp table to only unique masses (this is because
## different sets can have same masses).
m$out$tab$data <- m$out$tab$comp[,head(.SD,1),by=BASE_KEY]
m$out$tab$data[,set:=NULL] #This column is meaningless now.
file <- m$out$tab$data[,unique(file)]
fpaths <- file.path(m$run$paths$data,file)
allCEs <- do.call(c,args=lapply(fpaths,function(fn) {
z <- MSnbase::readMSData(files=fn,msLevel = c(1,2),mode="onDisk")
unique(MSnbase::collisionEnergy(z),fromLast=T)
}))
allCEs <- unique(allCEs)
allCEs <- allCEs[!is.na(allCEs)]
cols <-paste('CE',allCEs,sep = '')
vals <- rep(NA,length(cols))
m$out$tab$data[,(cols) := .(rep(NA,.N))]
file <- file.path(m$run$paths$data,m$out$tab$data[,unique(file)])
ftags <- m$out$tab$data[,.(tag=unique(tag)),by=file]
ftags[,path:=file.path(..m$run$paths$data,file)]
futuref <- m$future
tmp <- lapply(1:nrow(ftags),function(ii) {
fn <- ftags[ii,path]
the_tag <- ftags[ii,tag]
message("(extract): Commencing extraction for tag: ", the_tag, "; file: ",fn)
tab <- as.data.frame(data.table::copy(m$out$tab$data[tag==the_tag,.(file,tag,adduct,mz,rt,ID)]))
## err_ms1_eic <- m$extr$tol$eic
## err_coarse_fun <- m$extr$tol$coarse
## err_fine_fun <- m$extr$tol$fine
## err_rt <- m$extr$tol$rt
err_coarse <- m$conf$tolerance[["ms1 coarse"]]
err_fine <- m$conf$tolerance[["ms1 fine"]]
err_ms1_eic <- m$conf$tolerance$eic
err_rt <- m$conf$tolerance$rt
missing_precursor_info <- m$conf$extract$missing_precursor_info
x <- extract(fn=fn,
tag=the_tag,
tab=tab,
err_ms1_eic=err_ms1_eic,
err_coarse = err_coarse,
err_fine= err_fine,
err_rt= err_rt,
missing_precursors = missing_precursor_info)
message("Done extraction for ", x$ms1$tag[[1]])
x
})
ztmp <- tmp
m$extr$ms1 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms1))
m$extr$ms2 <- data.table::rbindlist(lapply(ztmp,function(x) x$ms2))
data.table::setkeyv(m$extr$ms1,BASE_KEY)
data.table::setkeyv(m$extr$ms2,c(BASE_KEY,"CE"))
fn_ex <- get_fn_extr(m)
timetag <- format(Sys.time(), "%Y%m%d_%H%M%S")
saveRDS(object = m, file = file.path(m$run$paths$project,FN_EXTR_STATE))
m
} }
##' @export ##' @export
...@@ -523,11 +383,10 @@ conf_trans <- function(conf) { ...@@ -523,11 +383,10 @@ conf_trans <- function(conf) {
prescreen <- function(m) { prescreen <- function(m) {
## Top-level auto prescreening function. ## Top-level auto prescreening function.
message("(prescreen): Start.") message("(prescreen): Start.")
## confpres <- conf_trans_pres(m$conf$prescreen) m$qa = NULL
m$qa <- NULL m$out$tab$summ = NULL
m$out$tab$summ <- NULL m$qa = analyse_extracted_data(m$db,m$conf$prescreen)
m$qa <- analyse_extracted_data(m$extr,m$conf$prescreen) m$out$tab$summ = gen_summ(m$db,m$qa,m$out$tab$comp)
m$out$tab$summ <- gen_summ(m$out$tab$comp,m$qa)
message("(prescreen): End.") message("(prescreen): End.")
m m
} }
...@@ -605,8 +464,8 @@ create_plots <- function(m) { ...@@ -605,8 +464,8 @@ create_plots <- function(m) {
## Select the data nedeed for plotting. ## Select the data nedeed for plotting.
flt_summ <- m$out$tab$flt_summ flt_summ <- m$out$tab$flt_summ
ms2 <- m$extr$ms2 ms2 <- m$db$extr$cgm$ms2
ms1 <- m$extr$ms1 ms1 <- m$db$extr$cgm$ms1
rt_min <- rt_in_min(m$conf$figures$rt_min) rt_min <- rt_in_min(m$conf$figures$rt_min)
rt_max <- rt_in_min(m$conf$figures$rt_max) rt_max <- rt_in_min(m$conf$figures$rt_max)
keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")] keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
...@@ -747,8 +606,8 @@ report <- function(m) { ...@@ -747,8 +606,8 @@ report <- function(m) {
dir.create(REP_TOPDIR,recursive = T,showWarnings = F) dir.create(REP_TOPDIR,recursive = T,showWarnings = F)
header <- knitr::knit_expand(fn_header) header <- knitr::knit_expand(fn_header)
flt_summ <- m$out$tab$reptab flt_summ <- m$out$tab$reptab
ms2 <- m$extr$ms2 ms2 <- m$db$extr$cgm$ms2
ms1 <- m$extr$ms1 ms1 <- m$db$extr$cgm$ms1
rt_min <- rt_in_min(m$conf$figures$rt_min) rt_min <- rt_in_min(m$conf$figures$rt_min)
rt_max <- rt_in_min(m$conf$figures$rt_max) rt_max <- rt_in_min(m$conf$figures$rt_max)
keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")] keytab <- flt_summ[,unique(.SD),.SDcol=c("adduct","ID")]
...@@ -922,7 +781,7 @@ metfrag <- function(m) { ...@@ -922,7 +781,7 @@ metfrag <- function(m) {
path = m$run$metfrag$path, path = m$run$metfrag$path,
subpaths = m$run$metfrag$subpaths, subpaths = m$run$metfrag$subpaths,
db_file = m$run$metfrag$db_file, db_file = m$run$metfrag$db_file,
stag_tab = stagtab, ms2 = m$extr$ms2, stag_tab = stagtab, ms2 = m$db$extr$spectra,
runtime=m$run$metfrag$runtime, runtime=m$run$metfrag$runtime,
java_bin=m$run$metfrag$java_bin, java_bin=m$run$metfrag$java_bin,
nproc = m$conf$metfrag$nproc) nproc = m$conf$metfrag$nproc)
...@@ -942,3 +801,9 @@ metfrag <- function(m) { ...@@ -942,3 +801,9 @@ metfrag <- function(m) {
m m
} }
make_db <- function(m) {
m = make_db_catalogue(m)
m = make_db_precursors(m)
m
}
...@@ -18,7 +18,13 @@ ...@@ -18,7 +18,13 @@
the_ifelse <- data.table::fifelse the_ifelse <- data.table::fifelse
dtable <- data.table::data.table dtable <- data.table::data.table
norm_path <- function(...) normalizePath(...,winslash='/') norm_path <- function(...) {
test = nchar(...) > 0L
res = character(length(test))
res[test] = normalizePath(...[test],winslash='/')
res[!test] = ...[!test]
res
}
tab2file<-function(tab,file,...) { tab2file<-function(tab,file,...) {
data.table::fwrite(x=tab,file=file,...) data.table::fwrite(x=tab,file=file,...)
...@@ -143,3 +149,23 @@ uniqy_slugs <- function(slugs) { ...@@ -143,3 +149,23 @@ uniqy_slugs <- function(slugs) {
dt = data.table::data.table(slug=slugs) dt = data.table::data.table(slug=slugs)
dt[,slug:=fifelse(rep(.N==1L,.N),slug,paste0(slug,"_",seq(1L,.N))),by="slug"]$slug dt[,slug:=fifelse(rep(.N==1L,.N),slug,paste0(slug,"_",seq(1L,.N))),by="slug"]$slug
} }
gen_val_unc <- function(x,dx) {
## Doesn't work well for <=0.
p = floor(log10(x))
dp = floor(log10(dx))
## Zero?
message("p ",p)
w = which(is.infinite(p))
p[w] = 0
## Normalise x and dx.
main = x/10**p
unc = round(dx/10**dp,0)
place = p - dp
main = mapply(function (m,d) formatC(m,digits=d,format='f',flag="#"),main,place,USE.NAMES=F)
w = which(main=='10.')
main[w]='1'
p[w]=p[w]+1
paste0(main,"(",unc,") x 10^",p)
}
#Copyright (C) 2023 by University of Luxembourg
## Shinyscreen works of an internal relational database implemented
## using `data.table' package. Implementation is here.
make_db_catalogue <- function(m) {
## Takes comprehensive database from state `m' and generates a
## catalogue with a unique key. This catalogue is based on
## inputs. Each entry in the catalogue corresponds to a single
## target mass from a single experimental run.
res = m$out$tab$comp[,unique(.SD),.SDcols=c("set","tag","adduct","ID")]
res[,catid:=.I]
setkeyv(res,DB_CATALOGUE_KEY)
setindex(res,catid)
m$db$cat = res
m
}
merge_precid_4_isobars <- function(orig_precids,masses,up_masses) {
start = head(orig_precids,1L)
n = length(orig_precids)
precid = orig_precids
i = 1L
while (i < n) {
theprecid = orig_precids[[i]]
themz = masses[[i]]
mzup = up_masses[[i]]
w = which(masses[(i+1L):n]<mzup)
precid[(i+1L):n][w] = theprecid
i = i + length(w) + 1L
}
precid
}
make_db_precursors <- function(m) {
## Generate masses and label isobars.
cat = m$db$cat
masses = m$out$tab$comp[cat,.(tag=tag,catid=catid,mz=mz,rt=rt),on=key(cat)]
setkey(masses,tag,mz)
## Retention time.
tmp = get_val_unit(m$conf$tolerance[['rt']])
rttol = as.numeric(tmp[['val']])
rtunit = tmp[['unit']]
if (rtunit == "s") {
rttol = rttol/60.
} else if (rtunit != "min") {
stop('make_db_precursors: Unknown retention time unit.')
}
masses[!is.na(rt),`:=`(rt_min=rt-rttol,rt_max=rt+rttol)]
## Fine error.
tmp = get_val_unit(m$conf$tolerance[['ms1 fine']])
ms1tol = as.numeric(tmp[['val']])
ms1unit = tmp[['unit']]
if (ms1unit == "ppm") {
masses[,`:=`(mz_fine_min=mz-ms1tol*mz*1e-6,mz_fine_max=mz+ms1tol*mz*1e-6)]
} else if (ms1unit == "Da") {
masses[,`:=`(mz_fine_min=mz-ms1tol,mz_fine_max=mz+ms1tol)]
} else {
stop('make_db_precursors: Unknown mass unit (fine).')
}
## Coarse error.
tmp = get_val_unit(m$conf$tolerance[['ms1 coarse']])
ms1tol = as.numeric(tmp[['val']])
ms1unit = tmp[['unit']]
if (ms1unit == "ppm") {
masses[,`:=`(mz_coarse_min=mz-ms1tol*mz*1e-6,mz_coarse_max=mz+ms1tol*mz*1e-6)]
} else if (ms1unit == "Da") {
masses[,`:=`(mz_coarse_min=mz-ms1tol,mz_coarse_max=mz+ms1tol)]
} else {
stop('make_db_precursors: Unknown mass unit (coarse).')
}
## Assign "fine" isobars to same isocoarse number.
masses[,precid:=merge_precid_4_isobars(catid,mz,mz_fine_max),by="tag"]
## Assign "coarse" isobars to same isocoarse number.
masses[,isocoarse:=merge_precid_4_isobars(catid,mz,mz_coarse_max),by="tag"]
masses[,`:=`(iso_coarse_min=min(mz_coarse_min),
iso_coarse_max=max(mz_coarse_max)),
by=isocoarse]
masses[,`:=`(iso_fine_min=min(mz_fine_min),
iso_fine_max=max(mz_fine_max)),
by=precid]
setindex(masses,isocoarse,precid)
## Add files.
filetab = m$input$tab$mzml[m$db$cat,
.(catid=i.catid,file=file),
on=c("set","tag"),nomatch=NULL]
masses[filetab,file:=i.file,on="catid"]
m$db$precursors = masses
m
}
empty_cgram_ms1 <- function(n=0L) {
r = data.table(file=character(n),
cgmidx=integer(n),
precid=integer(n),
scan=integer(n),
rt=numeric(n),
intensity=numeric(n))
setkey(r,precid,rt)
r
}
empty_cgram_ms2 <- function(n=0L) {
r = data.table(precid=integer(n),
ce=numeric(n),
scan=character(n),
idx=integer(n),
rt=numeric(n),
intensity=numeric(n))
setkey(r,precid,ce,idx)
r
}
empty_spectra_table <- function(n=0L) {
r = data.table(precid=integer(n),
scan=character(n),
mz=numeric(n),
intensity=numeric(n))
setkey(r,precid,scan)
r
}
summ_needs_from_cat <- function(cat) {
## Catalogue columns.
cat
}
summ_needs_from_precursors <- function(res,precursors) {
## Mass columns.
precursors[res,on=.(catid),.(precid,
mz,
set,
adduct,
tag,
ID,
mz_l=mz_fine_min,
mz_r=mz_fine_max),by=.EACHI]
}
summ_needs_from_qa <- function(res,qa) {
needs = qa[,.SD,.SDcols=c("precid",
"ce",
"scan",
"ms1_rt",
"ms1_int",
"ms2_rt",
"ms2_int",
"ms1_mean",
"ms2_sel",
"qa_pass",
"qa_ms1_exists",
"qa_ms2_exists",
"qa_ms1_good_int",
"qa_ms1_above_noise",
"qa_ms2_near",
"qa_ms2_good_int",
"qlt_ms1",
"qlt_ms2")]
res = needs[res,on=.(precid),allow.cartesian=T]
## TODO: additional processing?
res
}
summ_needs_from_comp <- function(res,comp) {
needs = comp[,.(set,ID,Name,SMILES)]
setkey(needs,set,ID)
res[needs,on=.(set,ID),`:=`(Name=i.Name,
SMILES=i.SMILES)]
}
## This function creates `summ' table.
gen_summ <- function(db,qa,comp) {
## Start with the basic things.
res = summ_needs_from_cat(db$cat)
## Add masses and precids.
res = summ_needs_from_precursors(res,db$precursors)
## Add qa columns.
res = summ_needs_from_qa(res,qa)
setkeyv(res,SUMM_KEY)
## Add comp columns.
summ_needs_from_comp(res,comp)
}
...@@ -12,598 +12,298 @@ ...@@ -12,598 +12,298 @@
## See the License for the specific language governing permissions and ## See the License for the specific language governing permissions and
## limitations under the License. ## limitations under the License.
load_raw_data<-function(fn,mode="inMemory") { create_fine_table <- function(m) {
ms1<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=1) ## Select fine mz-ranges and split them into those with rt entries
ms2<-MSnbase::readMSData(files=fn,mode=mode,msLevel.=2) ## and those without.
c(ms1=ms1,ms2=ms2) precs = m$db$precursors
} precs[,unique(.SD),.SDcols=c("iso_fine_min",
"iso_fine_max",
"rt_min",
"rt_max",
"file"),
keyby=c("file","precid")]
centroided1 <- function(ms) {
if (all(MSnbase::centroided(ms)) == T)
return(T) else {
state <- MSnbase::isCentroided(ms)
N <- length(state)
fls <-length(which(state == F))
if (fls/(1.*N) < 0.01) T else F
}
} }
centroided <- function(msvec) { create_coarse_table <- function(m) {
if (is.vector(msvec)) { ## Select coarse mz-ranges and split them into those with rt entries
f <- list() ## and those without.
for (i in 1:length(msvec)) { precs = m$db$precursors
f[[i]] <- future::future(centroided1(msvec[[i]])) precs[,unique(.SD),.SDcols=c("iso_coarse_min",
} "iso_coarse_max",
lapply(f, FUN = future::value) "rt_min",
} else { "rt_max",
centroided1(msvec) "file"),
} keyby=c("file","isocoarse","precid")]
}
acq_mz<-function(tabFn) {
df<-read.csv(tabFn,
stringsAsFactors=F,
comment.char='')
x<-as.numeric(df$mz)
names(x)<-paste("ID:",as.character(df$ID),sep='')
x
} }
name2id<-function(nm) {as.integer(substring(nm,4))} read_data_file <- function(file) {
id2name<-function(id) {paste("ID:",id,sep='')} MSnbase::readMSData(file=file,msLevel=c(1,2),mode="onDisk")
ppm2dev<-function(m,ppm) 1e-6*ppm*m
gen_mz_range<-function(mz,err) {
mat<-matrix(data=numeric(1),nrow=length(mz),ncol=2)
mat[,1]<-mz - err
mat[,2]<-mz + err
mat
} }
gen_rt_range<-function(rt,err) {
mat<-matrix(data=numeric(1),nrow=length(rt),ncol=2)
rV<-which(!is.na(rt))
rNA<-which(is.na(rt))
mat[rV,1]<-(rt[rV] - err)*60
mat[rV,2]<-(rt[rV] + err)*60
mat[rNA,1]<--Inf
mat[rNA,2]<-Inf
mat
}
filt_ms2_by_prcs <- function(ms2,mzrng,ids,adduct) { extr_cgrams_ms1 <- function(ms,tab,fdata) {
pre<-MSnbase::precursorMz(ms2) ## Some helpers.
psn<-MSnbase::precScanNum(ms2) new_restab <- function(intab,cgm) {
acN<-MSnbase::acquisitionNum(ms2) base = intab[,.(precid=precid,cgmidx=.I)]
nR<-length(pre) cgm = base[,{
rt = rtime(cgm[cgmidx,1])
inRange<-function(i) { inte = intensity(cgm[cgmidx,1])
mp<-pre[[i]] .(precid=precid,
x<-mzrng[,1]<mp & mp<mzrng[,2] rt = rt,
ind<-which(x) intensity = inte,
sids <- ids[ind] scan = names(rt))},
add <- adduct[ind] by="cgmidx"]
dtable(ID=sids,adduct=add) setkey(cgm,scan)
cgm[fdata$ms1,idx:=i.idx,on="scan"]
cgm
} }
lst<-lapply(1:nR,function(i) {
dt <- inRange(i)
list(n=i,prec_scan=psn[[i]],aN=acN[[i]],ids=dt$ID,adduct=dt$adduct)
})
nemp<-sapply(lst,function(m) length(m$ids)>0)
wrk<-lst[nemp]
dfL<-sum(sapply(wrk,function(w) length(w$ids)))
df<-dtable(ID=character(dfL),
adduct=character(dfL),
prec_scan=integer(dfL),
aN=integer(dfL),
OK=logical(dfL))
df$OK<-T #TODO Introduced for testing, be careful.
offD<-0
for (m in wrk) {
l<-length(m$ids)
rng<-(offD+1):(offD+l)
df[rng,"ID"] <- m$ids
df[rng,"prec_scan"] <- m$prec_scan
df[rng,"aN"] <- m$aN
df[rng,"adduct"] <- m$adduct
offD<-offD+l
}
df[order(df$aN),]
}
filt_ms2_by_prcs_ht<-function(ms2,mzrng,ids,adduct) {
lgnd<-filt_ms2_by_prcs(ms2,mzrng=mzrng,ids=ids,adduct=adduct)
scans<-unique(lgnd$aN)
ns<-which(MSnbase::acquisitionNum(ms2) %in% scans)
sms2<-ms2[ns]
list(ms2=sms2,leg=lgnd)
}
pick_unique_precScans<-function(idx) {
ps<-unique(idx$prec_scan)
mind<-match(ps,idx$prec_scan)
data.frame(prec_scan=idx$prec_scan[mind],
ID=idx$ID[mind],
adduct=idx$adduct[mind],
stringsAsFactors=F)
} trt = tab[!is.na(rt_min)]
tnort = tab[is.na(rt_min)]
pick_uniq_pscan<-function(leg) { resrt = if (nrow(trt)>0L) {
res <- leg[,.(prec_scan=unique(prec_scan)),by=c("ID","adduct")] ## Call with rt argument (in seconds).
res[order(prec_scan),] mzrng = as.matrix(trt[,.(iso_fine_min,iso_fine_max)])
## ids<-unique(leg$ID) rtrng = as.matrix(trt[,.(rt_min*60,rt_max*60)])
## x<-lapply(ids,function(id) {ups<-unique(leg[id==leg$ID,"prec_scan"]);data.frame(ID=rep(id,length(ups)),prec_scan=ups,stringsAsFactors = F)}) new_restab(trt,MSnbase::chromatogram(ms,mz = mzrng, rt = rtrng))
## res<-do.call(rbind,c(x,list(stringsAsFactors=F))) } else data.table()
## res[order(res$prec_scan),]
}
verif_prec_fine_ht<-function(preLeg,ms1,mz,mzrng,ids,adduct) { resnort = if (nrow(tnort)>0L) {
## TODO FIXME TESTPHASE Something goes wrong here, all mapply results are mzrng = as.matrix(tnort[,.(iso_fine_min,iso_fine_max)])
## not OK. More testing needed. ... huh? But, it works now? (23/02/2022) new_restab(tnort,MSnbase::chromatogram(ms,mz = mzrng))
df<-preLeg } else data.table()
xx <- dtable(adduct=adduct,ID=ids,mz=mz,mz1=mzrng[,1],mz2=mzrng[,2])
df <- preLeg[xx,on=c("ID","adduct")]
df$ipns<-match(df$prec_scan,MSnbase::acquisitionNum(ms1))
df[, ("mzsp") := .(lapply(ipns,function (ip) if (!is.na(ip)) MSnbase::mz(ms1[[ip]]) else NA_real_))]
df$OK<-mapply(function(m1,sp,m2) any((m1<sp) & (sp<m2)),df$mz1,df$mzsp,df$mz2)
res<-df[df$OK,]
res$ipns<-NULL
res$mz1<-NULL
res$mz2<-NULL
res$mzsp<-NULL
res
}
filt_ms2<-function(ms1,ms2,mz,errCoarse,errFinePPM) {
tmp<-filt_ms2_by_prcs_ht(ms2,mz,errCoarse=errCoarse)
legMS2<-tmp$leg
legPcs<-pick_uniq_pscan(legMS2)
legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,errFinePPM=errFinePPM)
x<-Map(function (id,psn) {legMS2[id==legMS2$ID & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"])
x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
rownames(x)<-NULL
x<-x[order(x$aN),]
uids<-unique(x$ID)
acN<-MSnbase::acquisitionNum(ms2)
res<-lapply(uids,function(id) {
x<-ms2[match(x[id==x$ID,"aN"],acN)]
fData(x)[,"rtm"]<-MSnbase::rtime(x)/60.
fData(x)[,"maxI"]<-sapply(MSnbase::intensity(x),max)
x})
names(res)<-uids
res
}
filt_ms2_fine <- function(ms1,ms2,mz,ids,adduct,err_coarse_fun,err_fine_fun) {
## This function is supposed to extract only those MS2 spectra for
## which it is proven that the precursor exists within the fine
## error range.
mzrng_c <- gen_mz_range(mz,err_coarse_fun(mz))
mzrng_f <- gen_mz_range(mz,err_fine_fun(mz))
tmp<-filt_ms2_by_prcs_ht(ms2,mzrng=mzrng_c,ids=ids,adduct=adduct)
legMS2<-tmp$leg
message("nrow legMS2:", nrow(legMS2))
legPcs<-pick_uniq_pscan(legMS2)
legPcs<-verif_prec_fine_ht(legPcs,ms1=ms1,mz=mz,mzrng=mzrng_f,ids=ids,adduct=adduct)
## x<-Map(function (id,psn,a) {legMS2[id==legMS2$ID & a==legMS2$adduct & psn==legMS2$prec_scan,]},legPcs[,"ID"],legPcs[,"prec_scan"],legPcs[,"adduct"])
## x <- data.table::rbindlist(x)[,.(ID,adduct,aN)]
x <- legMS2[legPcs[,.(ID,adduct,prec_scan)],on=c("ID","adduct","prec_scan")]
## x<-do.call(rbind,c(x,list(make.row.names=F,stringsAsFactors=F)))[c("ID","aN")]
## rownames(x)<-NULL
x<-x[order(x$aN),]
x
}
extr_ms2<-function(ms1,ms2,ids,mz,adduct,err_coarse_fun, err_fine_fun) {
## Extraction of MS2 EICs and spectra.
x <- filt_ms2_fine(ms1=ms1,
ms2=ms2,
mz=mz,
ids=ids,
adduct=adduct,
err_coarse_fun=err_coarse_fun,
err_fine_fun=err_fine_fun)
## This was here before and obviously wrong when multiple adducts
## correspond to the same ID:
##
## uids <- unique(x$ID)
## uadds <- unique(x$adduct)
idadd <- x[,unique(.SD),.SDcols=c("ID","adduct")]
acN<-MSnbase::acquisitionNum(ms2)
chunks <- Map(function(id,ad) {
ans <- x[id==x$ID & ad==x$adduct,]$aN
sp<-ms2[which(acN %in% ans)]
res <- gen_ms2_spec_blk(sp)
res$ID <- id
res$adduct <- ad
res
},
idadd$ID,idadd$adduct)
data.table::rbindlist(chunks,fill = T)
}
res = rbind(resnort,resrt,fill=T)
res[,rt:=rt/60]
add_ms2_prcs_scans<-function(ms2,idx) {
df<-idx
df$prec_scan<-integer(nrow(idx))
for (i in 1:nrow(df)) {
sn<-df$sn[[i]]
df$prec_scan[[i]]<-MSnbase::precScanNum(ms2[[sn]])
}
## sn<-as.integer(df$sn)
## df$prec_scan[]<-MSnbase::precScanNum(ms2[sn])
##
## This errors with: msLevel for signature "NULL" for a specific
## compound. However, the above approach is cool.
df
} }
refn_ms2_by_prec<-function(idxMS2,preFine) { get_fdata <- function(ms) {
pf<-preFine[preFine$OK,] fdata = as.data.table(fData(ms),keep.rownames="scan")
pf$ID<-as.character(pf$ID) setkey(fdata,scan)
idxMS2$OK<-logical(nrow(idxMS2)) res = list()
idxMS2$ID<-as.character(idxMS2$ID) res$ms1 = fdata[msLevel==1L,.(scan,
for (n in 1:nrow(idxMS2)) { idx=spIdx)]
scan<-idxMS2$prec_scan[[n]] res$ms2 = fdata[msLevel==2L,.(scan,
id2<-idxMS2$ID[[n]] idx=spIdx,
ppf<-pf[pf$ID==id2,] an=acquisitionNum,
inPF<- ppf$prec_scan %in% scan rt=retentionTime/60.,
idxMS2$OK[[n]]<-any(inPF) intensity=basePeakIntensity,
} ce=collisionEnergy,
prec_mz=precursorMZ,
idxMS2 prec_idx=precursorScanNum)]
}
trim_ms2_by_prec<-function(rawMS2,mz,errCoarse,errFinePPM) {
idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
}
grab_ms2_spec<-function(idx,raw) {
idx<-idx[idx$OK,]
IDs<-unique(idx$ID)
res<-lapply(IDs,function (id) {
sn<-idx$sn[idx$ID==id]
spec<-raw[sn]
rts<-MSnbase::rtime(spec)
lmz<-MSnbase::mz(spec)
lI<-MSnbase::intensity(spec)
rts<-rts/60.
names(lmz)<-NULL
names(lI)<-NULL
Map(function (mz,I,rt) {
mat<-matrix(data=0.0,ncol=length(mz),nrow=2,dimnames=list(c("mz","intensity")))
mat["mz",]<-as.numeric(mz)
mat["intensity",]<-as.numeric(I)
list(rt=rt,spec=mat)
},lmz,lI,rts)
})
names(res)<-IDs
res res
}
gen_ms2_chrom<-function(ms2Spec) {
lapply(ms2Spec, function(sp)
{
if (length(sp)>0) {
nRow<-length(sp)
mat<-matrix(0.0,nrow=nRow,ncol=2)
rt<-sapply(sp,function(x) x$rt)
ord<-order(rt)
intn<-lapply(sp,function (x) max(x$spec))
rt<-as.numeric(rt[ord])
intn<-as.numeric(intn[ord])
names(intn)<-NULL
names(rt)<-NULL
mat[,1]<-rt
mat[,2]<-intn
colnames(mat)<-c("rt","intensity")
mat
} else list()
})
} }
gen_ms1_chrom <- function(raw,mz,errEIC,id,rt=NULL,errRT=NULL) { relate_ms2_to_precid <- function(coarse,ms2,cgram_ms1) {
mzRng<-gen_mz_range(mz,err = errEIC) ## Take `coarse' table (the one with coarse mass limits), ms2
rtRng<-gen_rt_range(rt,err = errRT) ## fData and ms1 chromatogram, then relate precids from cgram_ms1
x<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng) ## to ms2 data.
res<-lapply(x,function (xx) { ## Select those MS2 entries the parents of which coarsely match
rt<-MSnbase::rtime(xx)/60. ## compound lists masses.
ints<-MSnbase::intensity(xx) res = ms2[coarse,on=.(prec_mz>iso_coarse_min,prec_mz<iso_coarse_max),.(prec_mz=x.prec_mz,precid,prec_idx,scan,idx,ce,rt,intensity),nomatch=NULL]
df<-dtable(rt=rt,intensity=ints) setkey(res,precid,prec_idx)
df
}) ## Now, make sure those coarsely matched MS2 actually have a
names(res)<-id ## parent that finely matches something in the chromatogram (and
res ## this is by ensuring that a `precid' with the correct scan (idx)
## shows up in the chromatogram.
x = cgram_ms1[!is.na(intensity)]
x[res,on=.(precid,idx==prec_idx),
.(precid,ce,scan=i.scan,
idx=i.idx,rt=i.rt,
intensity=i.intensity),nomatch=NULL]
} }
extract_spectra <- function(ms,cgram_ms2) {
gen_ms1_chrom_ht<-function(raw,mz,errEIC,rt=NULL,errRT=NULL) { ## This will extract full MS2 spectra based on ms2 chromatogram entries.
mzRng<-gen_mz_range(mz,err=errEIC) indices = cgram_ms2[,.SD,.SDcol=c("precid","scan","idx")]
rtRng<-gen_rt_range(rt,err=errRT)
res<-MSnbase::chromatogram(raw,mz=mzRng,msLevel=1,missing=0.0,rt=rtRng) res = empty_spectra_table()
fData(res)[["ID"]]<-rownames(mzRng) selind = indices[,unique(.SD),.SDcol=c("scan","idx")]
res sel = ms[selind$idx]
}
masses = mz(sel)
get_ext_width <- function(maxid) {as.integer(log10(maxid)+1)} intensities = intensity(sel)
id_fn_ext<-function(width,id) { res = selind
formatC(as.numeric(id),width=width,flag=0) setkey(res,scan)
} res = res[,data.table(mz=masses[[scan]],
intensity=intensities[[scan]]),
write_eic<-function(eic,suff="eic.csv",dir=".",width=get_ext_width(max(as.numeric(names(eic))))) { keyby=c("scan")]
Map(function (e,n) { res[indices,on=.(scan),precid:=i.precid]
if (length(e)>0) {
fn<-file.path(dir,paste(id_fn_ext(width,n),suff,sep="."))
tab2file(tab=e,file=fn)
}
},eic,names(eic))
} }
write_ms2_spec<-function(ms2Spec,dir=".") { ## PRESCREENING
ids<-as.numeric(names(ms2Spec))
maxid<-max(ids)
width<-get_ext_width(maxid) ## This function extracts intensity maxima on intervals given by
## RT vectors rt_1 and rt_2.
for (id in names(ms2Spec)) { find_ms1_max <- function(rt,intensity,rt_1,rt_2)
sp<-ms2Spec[[id]] {
if (length(sp)>0) { x = mapply(function (rt_1,rt_2) {
dr<-file.path(dir,id_fn_ext(width,id)) rt_ival <- c(rt_1,rt_2)
dir.create(path=dr,showWarnings=F) intv <- findInterval(rt,rt_ival)
for (s in sp) { lintv = length(intv)
fn<-file.path(dr,paste("RT_",s$rt,"_spectrum.csv",sep="")) if (intv[1]==0L && intv[lintv] == 2L) {
df<-t(s$spec) pos = match(c(1L,2L),intv)
colnames(df)<-c("mz","intensity") } else if (intv[1]==1L && intv[lintv]!=1L) {
tab2file(tab=df,file=fn) pos = c(1L,match(2L,intv))
} } else if (intv[1]==0L && intv[lintv]!=0L) {
pos = c(match(1L,intv),lintv)
} else {
pos = c(1L,lintv)
} }
} pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
} if (length(pmax)==0L) pmax = pos[[1]]
c(rt[pmax],intensity[pmax])
extr_msnb <-function(file,wd,mz,errEIC, errFinePPM,errCoarse=0.5,rt=NULL,errRT=NULL,mode="inMemory") { }, rt_1, rt_2, USE.NAMES=F)
## Perform the entire data extraction procedure. x
##
## file - The input mzML file.
## wd - Top-level directory where the results should be deposited.
## mz - A named vector of precursor masses for which to scan the
## file. The names can be RMassBank IDs.
## rt - A named vector of length 1, or same as mz, giving the retention
## times in minutes. The names should be the same as for mz.
## errRT - A vector of length 1, or same as mz, giving the
## half-width of the time window in which the peak for the
## corresponding mz is supposed to be.
## errEIC - Absolute mz tolerance used to extract precursor EICs.
## errFinePPM - Tolerance given in PPM used to associate input
## masses with what the instrument assigned as precursors to MS2
## products.
message("Loading ",file," in mode ",mode, ".")
data<-load_raw_data(file,mode=mode)
ms1<-data[["ms1"]]
ms2<-data[["ms2"]]
message("Done loading ",file,".")
## EICs for precursors.
message("Extracting precursor EICs. Please wait.")
eicMS1<-gen_ms1_chrom(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
write_eic(eicMS1,dir=wd)
message("Extracting precursor EICs finished.")
## Extract MS2 spectra.
message("Extracting MS2 spectra.")
idxMS2<-filt_ms2_by_prcs(ms2=ms2,mz=mz,errCoarse=errCoarse)
message("Resampling MS2 spectra.")
# idxMS2<-add_ms2_prcs_scans(ms2,idxMS2)
prsc<-pick_unique_precScans(idxMS2)
vprsc<-verif_prec_fine(preSc=prsc,ms1=ms1,mz=mz,errFinePPM = errFinePPM)
idxMS2<-refn_ms2_by_prec(idxMS2=idxMS2,preFine=vprsc)
message("Resampling MS2 spectra finished.")
ms2Spec<-grab_ms2_spec(idxMS2,ms2)
eicMS2<-gen_ms2_chrom(ms2Spec)
message("Extracting MS2 spectra finished.")
write_eic(eicMS2,dir=wd,suff="kids.csv",width=get_ext_width(max(as.numeric(names(eicMS1)))))
specDir<-file.path(wd,"ms2_spectra")
dir.create(specDir,showWarnings = F)
write_ms2_spec(ms2Spec,dir=specDir)
message("Done with ", file)
} }
##' @importFrom MSnbase filterMz analyse_extracted_data <- function(db,prescreen_param) {
extr_msnb_ht <-function(file,wd,mz,errEIC, errFinePPM,errCoarse,fnSpec,rt=NULL,errRT=NULL,mode="onDisk") { ## Note
## Perform the entire data extraction procedure. ##
## ## I am working on this two days before the group meeting. The
## file - The input mzML file. ## point of a meeting is to have something to show, so I will just
## wd - Top-level directory where the results should be deposited. ## minimally adapt the old `analyse_extracted_data' to the new
## mz - A named vector of precursor masses for which to scan the ## `db' entries in the state. I suspect, even this is not going to
## file. The names can be RMassBank IDs. ## be very easy.
## rt - A named vector of length 1, or same as mz, giving the retention ##
## times in minutes. The names should be the same as for mz. ## If no meeting was happening, then I'd create a nice, sleek
## errRT - A vector of length 1, or same as mz, giving the ## function that fully adheres to the new `data model'
## half-width of the time window in which the peak for the ## philosophy. Alas, ...
## corresponding mz is supposed to be.
## errEIC - Absolute mz tolerance used to extract precursor EICs.
## errFinePPM - Tolerance given in PPM used to associate input
## masses with what the instrument assigned as precursors to MS2
## products.
message("Loading ",file," in mode ",mode, ".")
data<-load_raw_data(file,mode=mode)
ms1<-data[["ms1"]]
ms2<-data[["ms2"]]
message("Done loading ",file,".")
## Filtering
mzCrs<-gen_mz_range(mz=mz,err=errCoarse)
mzMin<-min(mzCrs)
mzMax<-max(mzCrs)
ms1<-filterMz(ms1,c(mzMin,mzMax))
fms2<-filt_ms2(ms1,ms2,mz,errCoarse=errCoarse,errFinePPM=errFinePPM)
## EICs for precursors.
message("Extracting precursor EICs. Please wait.")
eicMS1<-gen_ms1_chrom_ht(raw=ms1,mz=mz,errEIC=errEIC,rt=rt,errRT=errRT)
message("Extracting precursor EICs finished.")
ms1 = db$extr$cgm$ms1
ms2 = db$extr$cgm$ms2
spectra = db$extr$spectra
precursors = db$precursors
x<-list(eic=eicMS1,ms2=fms2) ## Get file info.
saveRDS(object=x,file=file.path(wd,fnSpec)) ms2_noise_table = precursors[spectra,.(file,intensity),on="precid",by=.EACHI,nomatch=NULL]
x
}
## Calculate threshold.
ms2_noise_table[,threshold:=0.33333333*mean(intensity),by="file"]
extr_eic_ms1 <- function(tab,err) { ## Reduce table.
## Asynchronous extraction of ms1 spectra. The result is a list of ms2_noise_table = ms2_noise_table[,.(threshold=first(threshold,1L)),keyby="precid"]
## running futures.
file <- unique(tab$file)
res <-lapply(file,function (fn) future::futur(extr_fn(fn), lazy=T))
names(res) <- file
res
}
##' @importFrom MSnbase filterMz readMSData
##' @export ## Parameters.
extract <- function(fn,tag,tab,err_ms1_eic.,err_coarse,err_fine,err_rt.,missing_precursors) { presconf = conf_trans_pres(prescreen_param)
## Extracts MS1 and MS2 EICs, as well as MS2 spectra, subject to rt_shift = presconf$ret_time_shift_tol
## tolerance specifications. det_ms2_noise = presconf$det_ms2_noise
ms2_int_thresh = presconf$ms2_int_thresh
ms1_int_thresh = presconf$ms1_int_thresh
## We start populating the ms2 qa table.
tab_ms2 = copy(ms2)
## Calculate noise.
tab_ms2[ms2_noise_table,qa_ms2_good_int:=intensity>threshold,on=.(precid)]
## Rename as downstream wants it.
setnames(tab_ms2,c("rt","intensity"),c("ms2_rt","ms2_int"))
tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
## Get mean ms1 value.
tab_ms1 = copy(ms1)
## TODO: Still detecting external references ... but which? ## To (artificially) differentiate beween ms1 and ms2 (because,
## However, the results check out, compared to sequential access. ## they get stapled together later on, set scan to NA_character_.
err_coarse_fun <- gen_mz_err_f(err_coarse, tab_ms1[,scan:=NA_character_]
"ms1 coarse error: Only ppm, or Da units allowed.")
tab_ms1_mean = tab_ms1[,.(ms1_mean=mean(intensity,na.rm=T)),keyby="precid"]
err_fine_fun <- gen_mz_err_f(err_fine, ## Perform MS1 maxima calculation in the neighbourhood of each
"ms1 fine error: Only ppm, or Da units allowed.") ## MS2 result.
err_ms1_eic <- gen_mz_err_f(err_ms1_eic., tmp = tab_ms1[tab_ms2,
"eic error: Only ppm, or Da units allowed.") {
xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
.(scan=i.scan,
ms1_rt = xx[1,],
ms1_int = xx[2,])
},on=.(precid),
by=.EACHI,
nomatch=NULL]
## Calculate QA values.
tab_ms2[tmp,on=.(precid,scan),c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int)]
tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
qa_ms1_above_noise=F,
qa_ms2_near=F)]
## TODO: I wonder if so stupidly auto-calculated ms1 noise should
## be taken into account at all? My recollection from earlier
## times was that it was helpful, at least sometimes.
tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
tab_ms2$qa_ms2_exists=T
## Reduce tab_ms1 to only to MS1 with no children.
precs_ms1 = tab_ms1[,unique(precid)]
precs_ms2 = tab_ms2[,unique(precid)]
precs_noms = data.table(precid=precs_ms1[!(precs_ms1 %in% precs_ms2)])
tab_noms2 = tab_ms1[precs_noms,
on=.(precid),
.SD,nomatch=NULL]
tab_noms2 = tab_noms2[,.(ms1_mean=mean(intensity,na.rm=T),
ms1_rt=rt[which.max(intensity)],
ms1_int=max(intensity, na.rm=T)),
keyby="precid"]
## QA for the above (lazy qa ... take only the max peak into account).
tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
## MS2 QA criteria all fail.
tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
## Bind MS1-only and MS1/MS2 entries together.
res = rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
## If ms1_int has been calculated as a Na(N) value, this means
## that no MS1 has been found for that precid.
res[,qa_ms1_exists:=F]
res[!is.na(ms1_int),qa_ms1_exists:=T]
data.table::setkey(res,precid)
err_rt <- gen_rt_err(err_rt.,
"rt error: Only s(econds), or min(utes) allowed.")
tab <- data.table::as.data.table(tab) qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
## chunk <- tab[file==fn] res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
mz <- tab$mz res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='scan']
rt <- tab$rt resby = BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'scan')]
id <- tab$ID res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
adduct <- tab$adduct res[,ms2_sel:=F]
names(mz) <- id res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
names(rt) <- id res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
mzerr <- err_coarse_fun(mz) "qa_ms1_above_noise",
mzrng <- gen_mz_range(mz=mz,err=mzerr) "qa_ms1_good_int")]
rtrng <- gen_rt_range(rt=rt,err=err_rt) res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
mzmin <- min(mzrng) "qa_ms2_near",
mzmax <- max(mzrng) "qa_ms2_good_int")]
read_ms1 <- function() { res[is.na(qlt_ms1),qlt_ms1:=0L]
ms1 <- readMSData(file=fn,msLevel=1,mode="onDisk") res[is.na(qlt_ms2),qlt_ms2:=0L]
ms1 <- filterMz(ms1,mz=c(mzmin,mzmax),msLevel=1)
ms1 ## Set all other flags to false when qa_ms1_exists == F by decree.
} flgs = c(QA_FLAGS,"ms2_sel")
read_ms2 <- function() { res[qa_ms1_exists == F,(flgs):=F]
ms2 <- MSnbase::readMSData(file=fn,msLevel=2,mode="onDisk")
ms2
}
read_all <- function() {
MSnbase::readMSData(file=fn,msLevel. = c(1,2),mode="onDisk")
}
extr_ms1_eic <- function(ms1) {
eic <- MSnbase::chromatogram(ms1,mz=mzrng,msLevel=1,missing=0.0,rt=rtrng)
bits <- dtable(N=sapply(eic,NROW))
bigN <- bits[,sum(N)]
bits[,idx:=paste0('I',.I)]
bits$ID <- id
bits$adduct <- adduct
bits$tag <- tag
res<-dtable(rt=numeric(bigN),
intensity=numeric(bigN),
tag=tag,
adduct=bits[,rep(adduct,N)],
ID=bits[,rep(ID,N)],
idx=bits[,rep(idx,N)])
data.table::setkey(res,idx)
names(eic)<-bits$idx
res[,c("rt","intensity") :=
.(MSnbase::rtime(eic[[idx]])/60.,
MSnbase::intensity(eic[[idx]])),
by=idx]
data.table::setkeyv(res,BASE_KEY)
res
}
if (is.null(missing_precursors)) missing_precursors <- "do_nothing"
if (missing_precursors != "do_nothing") {
ms <- clean_na_precs(read_all(),missing_precursors = missing_precursors)
fdta <- as.data.table(MSnbase::fData(ms),keep.rownames = "rn")
fdta[,idx := .I]
idx_ms1 <- fdta[msLevel == 1,idx]
idx_ms2 <- fdta[msLevel == 2,idx]
ms1 <- ms[idx_ms1]
ms2 <- ms[idx_ms2]
} else {
ms1 <- read_ms1()
ms2 <- read_ms2()
}
res_ms1 <- extr_ms1_eic(ms1)
res_ms2 <- extr_ms2(ms1=ms1,
ms2=ms2,
ids=id,
mz=mz,
adduct=adduct,
err_coarse_fun=err_coarse_fun,
err_fine_fun=err_fine_fun)
res_ms2[,"tag":=tag]
## Clean all the NA intensity MS2 (sometimes a consequence of 'fill'.)
res_ms2<-res_ms2[!is.na(intensity)]
res <- list(ms1=res_ms1,
ms2=res_ms2)
res res
} }
gen_ms2_spec_blk <- function(spectra) {
dt <- dtable(mz=MSnbase::mz(spectra),
intensity=MSnbase::intensity(spectra),
rt = lapply(MSnbase::rtime(spectra),function (z) z/60.),
CE = MSnbase::collisionEnergy(spectra),
an = MSnbase::acquisitionNum(spectra))
dt[,maspI:=sapply(intensity,function (zz) max(zz))]
data.table::rbindlist(apply(dt,1,function(row) dtable(intensity=row[["intensity"]],
rt = row[["rt"]],
mz = row[["mz"]],
CE = row[["CE"]],
an = row[["an"]])))
}
...@@ -36,7 +36,7 @@ metfrag_get_stag_tab <- function(summ) { ...@@ -36,7 +36,7 @@ metfrag_get_stag_tab <- function(summ) {
## Argument summ can be a subset of actual `summ' table. ## Argument summ can be a subset of actual `summ' table.
x = gen_1d_keytab(summ) x = gen_1d_keytab(summ)
data.table::setnames(x,old="key1d",new="stag") data.table::setnames(x,old="key1d",new="stag")
res = x[summ,`:=`(CE=i.CE,ion_mz=mz)] res = x[summ,`:=`(ce=i.ce,ion_mz=mz)]
res res
} }
...@@ -54,18 +54,21 @@ get_mf_res_ext <- function(fn) { ...@@ -54,18 +54,21 @@ get_mf_res_ext <- function(fn) {
metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bin,nproc = 1L) { metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bin,nproc = 1L) {
keys = intersect(colnames(stag_tab),colnames(ms2)) keys = intersect(colnames(stag_tab),colnames(ms2))
rms2 = ms2[stag_tab,on=keys,nomatch=NULL]
message("Generating MetFrag configs.") message("Generating MetFrag configs.")
file_tab = ms2[stag_tab,{
file_tab = rms2[,{
r = write_metfrag_config(param = ..param, r = write_metfrag_config(param = ..param,
path = ..path, path = ..path,
subpaths = ..subpaths, subpaths = ..subpaths,
db_file = ..db_file, db_file = ..db_file,
stag = stag, stag = first(stag),
adduct = adduct, adduct = first(adduct),
ion_mz = ion_mz, ion_mz = first(ion_mz),
spec = data.table(mz=mz,intensity=intensity)) spec = data.table(mz=mz,intensity=intensity))
c(r,stag = stag) c(r,stag = first(stag))
},by=.EACHI,on=keys] },keyby=keys]
message("Done generating MetFrag configs.") message("Done generating MetFrag configs.")
withr::with_dir(path,{ withr::with_dir(path,{
...@@ -89,33 +92,30 @@ metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bi ...@@ -89,33 +92,30 @@ metfrag_run <- function(param,path,subpaths,db_file,stag_tab,ms2,runtime,java_bi
mf_narrow_summ <- function(summ,kv,ms2_rt_i=NA_integer_,ms2_rt_f=NA_integer_) { mf_narrow_summ <- function(summ,kv,ms2_rt_i=NA_integer_,ms2_rt_f=NA_integer_) {
skey = data.table::key(summ) skey = data.table::key(summ)
cols = c("adduct","tag","ID","CE","an","mz","qa_pass","ms2_rt") cols = union(names(skey),c("adduct","tag","ID","ce","precid","scan","mz","qa_pass","ms2_rt"))
nsumm = get_rows_from_summ(summ,kv,cols) dtkv = as.data.table(kv)
nsumm = summ[dtkv,on=names(kv),.SD,.SDcols=cols]
nsumm = nsumm[qa_pass==T] # Those that make sense. nsumm = nsumm[qa_pass==T] # Those that make sense.
nsumm_key = union(SUMM_KEY,"ms2_rt") nsumm_key = intersect(union(SUMM_KEY,"ms2_rt"),colnames(nsumm))
data.table::setkeyv(nsumm,nsumm_key) data.table::setkeyv(nsumm,nsumm_key)
if (!is.na(ms2_rt_i)) { ms2_rt_i = if (!is.na(ms2_rt_i)) ms2_rt_i else 0.
nsumm = nsumm[ms2_rt>(ms2_rt_i)] ms2_rt_f = if (!is.na(ms2_rt_f)) ms2_rt_f else Inf
}
if (!is.na(ms2_rt_f)) { nsumm[ms2_rt > (ms2_rt_i) & ms2_rt < (ms2_rt_f)]
nsumm = nsumm[ms2_rt<(ms2_rt_f)]
}
nsumm
} }
get_metfrag_targets <- function(stag_tab,ms2) { get_metfrag_targets <- function(stag_tab,ms2) {
## Take the columns we need from summ. ## Take the columns we need from summ.
x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz")] x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz")]
mrg_keys = c(intersect(key(ms2),key(summ)),"an") mrg_keys = c(intersect(key(ms2),key(summ)),"scan")
x=ms2[x,.(CE=CE,ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI] x=ms2[x,.(CE=CE,ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
## Get column order so that `an' follows `CE'. ## Get column order so that `an' follows `CE'.
resnms = setdiff(mrg_keys,"an") resnms = setdiff(mrg_keys,"scan")
nms = union(union(resnms,"CE"),c("an","ion_mz","mz","intensity")) nms = union(union(resnms,"CE"),c("scan","ion_mz","mz","intensity"))
data.table::setcolorder(x,neworder = nms) data.table::setcolorder(x,neworder = nms)
setkeyv(x,unique(c(resnms,"CE","an"))) setkeyv(x,unique(c(resnms,"CE","scan")))
x x
} }
...@@ -220,7 +220,7 @@ summarise_metfrag_results <- function(param,path,subpaths,cand_parameters,db_sco ...@@ -220,7 +220,7 @@ summarise_metfrag_results <- function(param,path,subpaths,cand_parameters,db_sco
} }
.adapt_col_types <- function(x) { .adapt_col_types <- function(x) {
x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)] if (length(db_scores)>0) x[,(names(db_scores)):=lapply(.SD, as.numeric),.SDcol=names(db_scores)] else x
} }
.calc_basic_scores <- function(x) { .calc_basic_scores <- function(x) {
......
...@@ -191,7 +191,7 @@ gen_empty_summ <- function() { ...@@ -191,7 +191,7 @@ gen_empty_summ <- function() {
## ms2_cols <- intersect(colnames(qa_ms2),SUMM_COLS) ## ms2_cols <- intersect(colnames(qa_ms2),SUMM_COLS)
## ms2_cols <- setdiff(ms2_cols,colnames(summ)) ## ms2_cols <- setdiff(ms2_cols,colnames(summ))
## summ <- qa_ms2[summ,c(..comp_cols,..ms1_cols,..ms2_cols),on=BASE_KEY] ## summ <- qa_ms2[summ,c(..comp_cols,..ms1_cols,..ms2_cols),on=BASE_KEY]
## data.table::setkeyv(summ,c(BASE_KEY_MS2,"an")) ## data.table::setkeyv(summ,c(BASE_KEY_MS2,"scan"))
## summ[,qa_ms1_exists:=the_ifelse(!is.na(qa_ms1_good_int),T,F)] ## summ[,qa_ms1_exists:=the_ifelse(!is.na(qa_ms1_good_int),T,F)]
## summ[,qa_ms2_exists:=the_ifelse(!is.na(CE),T,F)] ## summ[,qa_ms2_exists:=the_ifelse(!is.na(CE),T,F)]
## summ[,qa_pass:=apply(.SD,1,all),.SDcols=QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]] ## summ[,qa_pass:=apply(.SD,1,all),.SDcols=QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]]
...@@ -547,274 +547,6 @@ conf_trans_pres <- function(pres_list) { ...@@ -547,274 +547,6 @@ conf_trans_pres <- function(pres_list) {
pres_list pres_list
} }
## PRESCREENING
create_qa_table <- function(extr,conf_presc) {
## The first input argument is the extracted `ms`, table
## containing MS1 and MS2 spectra. The argument `conf_presc` is
## m$conf$prescreen, the prescreening parameters given in the conf
## file.
## The qa table is just a copy of ms with added quality control
## columns QA_COLS.
## The QA_FLAGS columns are flags specifying which properties of
## compounds are known well, or not.
## For each compound (mass) we ask the following questions:
## qa_ms1_exists -- does the MS1 spectrum exist at all?
## qa_ms2_exists -- do we have any MS2 spectra at all?
## qa_ms1_above_noise -- is MS1 above the noise treshold?
## qa_ms2_near -- is there any MS2 spectrum inside the tolerated
## retention time window around the MS1 peak? That is, are we
## non-RT-shifted?
## qa_ms2_good_int -- Is there any MS2 spectral intensity greater
## than the MS2 threshold and less than the MS1 peak?
## qa_pass -- did the spectrum pass all the checks?
## The columns in QA_NUM_REAL are:
##
## ms1_int -- the maximum intensity of MS1 spectrum over the
## entire run;
##
## ms1_rt -- the retention time of the peak MS1.
## The columns in QA_NUM_INT are:
##
## ms2_sel -- index of the selected MS2 spectrum; if not NA, the
## associated spectrum passed all the checks (qa_pass == T); the
## spectrum itself is in one of the member sublists of the `spec'
## column. The integer `ms2_sel' is then the index of the spectrum
## in that sublist.
##
## ms1_rt_ind -- TODO (but not important to end users).
qa <- list(prescreen=conf_presc)
## checks <- extr$ms2[,{
## },keyby=BASE_KEY_MS2]
checks <- extr$ms2
checks[,(QA_FLAGS):=F]
## message("checks:")
## print(checks)
## message("done checks")
checks[,(QA_NUM_INT):=NA_integer_]
checks[,(QA_NUM_REAL):=NA_real_]
setkeyv(checks,BASE_KEY_MS2)
qa$checks <- checks
qa
}
assess_ms1 <- function(m) {
qa <- m$qa
ms1 <- m$extr$ms1
## Calculate auxiliary variables and indices.
qa_ms1 <- ms1[,.(ms1_rt_ind=which.max(intensity)),keyby=BASE_KEY]
qa_ms1 <- ms1[qa_ms1,.(ms1_rt_ind=ms1_rt_ind,
ms1_int=intensity[[ms1_rt_ind]],
ms1_rt=rt[[ms1_rt_ind]],
ms1_mean=mean(intensity)),on=BASE_KEY,by=.EACHI]
qa_ms1[,qa_ms1_good_int := ms1_int > qa$prescreen$ms1_int_thresh]
qa_ms1[,qa_ms1_above_noise := F]
qa_ms1[qa_ms1_good_int==T,qa_ms1_above_noise := .(ms1_int > qa$prescreen$s2n*ms1_mean)]
## checks[(!qa_ms1_above_noise),c("qa_ms2_good_int","qa_ms2_near","qa_ms2_exists","qa_pass"):=F]
## qa_ms1 <- check_ms1_noise(check_ms1(qa_ms1))
m$qa$ms1 <- qa_ms1
m
}
assess_ms2 <- function(m) {
presconf <- conf_trans_pres(m$conf$prescreen)
ms1 <- m$extr$ms1
ms2 <- m$extr$ms2
qa_ms1 <- m$qa$ms1
qa_ms2 <- ms2[qa_ms1[qa_ms1_above_noise==T],.(CE=unique(CE),
pc_rt=i.ms1_rt,
pc_int=i.ms1_int,
an=unique(an)),on=BASE_KEY,by=.EACHI,nomatch=NULL]
rt_win2 <- presconf$ret_time_shift_tol
qa_ms2 <- ms2[qa_ms2,.(pc_rt=pc_rt,
pc_int=pc_int,
ms2_int=max(intensity),
ms2_rt=unique(rt),
qa_ms2_near=head(rt,1) < pc_rt + rt_win2 & head(rt,1) > pc_rt - rt_win2),
by=.EACHI,on=c(BASE_KEY_MS2,"an")]
qa_ms2$qa_ms2_good_int <-F
qa_ms2[qa_ms2_near==T,
qa_ms2_good_int := ms2_int > presconf$ms2_int_thresh & ms2_int < pc_int,
by=c(BASE_KEY_MS2,"an")]
## qa_ms2$qa_pass <- F
## qa_ms2[qa_ms2_good_int==T,qa_pass:=T]
qa_ms2$ms2_sel <- F
qa_ms2[qa_ms2_good_int==T,ms2_sel:={
ind<-which.min(abs(ms2_rt-pc_rt))
z<-ms2_sel
z[[ind]]<-T
z
},by=BASE_KEY_MS2]
setkeyv(qa_ms2,BASE_KEY_MS2)
m$qa$ms2 <- qa_ms2
m
}
## Analyze extracted data.
analyse_extracted_data <- function(extr,prescreen_param) {
ms1 <- extr$ms1
ms2 <- extr$ms2
## Parameters.
presconf <- conf_trans_pres(prescreen_param)
rt_shift <- presconf$ret_time_shift_tol
det_ms2_noise <- presconf$det_ms2_noise
ms2_int_thresh <- presconf$ms2_int_thresh
ms1_int_thresh <- presconf$ms1_int_thresh
## Detect MS2 noise.
ms2_clc_ns <- if (det_ms2_noise) {
if (ms2_int_thresh>0) {
warning("Ignore user specified ms2_int_thresh value and autodetect noise instead.")
}
ms2[,ms2_thr:=0.33333333*mean(intensity),by="tag"]
} else {
ms2[,ms2_thr:=ms2_int_thresh]
}
## message('ms2_clc_ns:',key(ms2_clc_ns))
## ms2_clc_ns <- ms2_clc_ns[intensity>ms2_thr]
## We drop mz info.
tab_ms2 <- ms2_clc_ns[,.(ms2_rt=first(rt),ms2_int=max(intensity),ms2_thr=first(ms2_thr)),by=c(BASE_KEY_MS2,'an')]
tab_ms2[,qa_ms2_good_int:=ms2_int>ms2_thr,by="an"]
data.table::setkeyv(tab_ms2,BASE_KEY_MS2)
tab_ms2[,`:=`(rt_left = ms2_rt - rt_shift,rt_right = ms2_rt + rt_shift)]
## Get mean ms1 value.
tab_ms1 <- extr$ms1
tab_ms1_mean <- tab_ms1[,.(ms1_mean=mean(intensity)),keyby=BASE_KEY]
## This function extracts intensity maxima on intervals given by
## RT vectors rt_1 and rt_2.
find_ms1_max <- function(rt,intensity,rt_1,rt_2)
{
mapply(function (rt_1,rt_2) {
rt_ival <- c(rt_1,rt_2)
intv <- findInterval(rt,rt_ival)
lintv = length(intv)
if (intv[1]==0L && intv[lintv] == 2L) {
pos = match(c(1L,2L),intv)
} else if (intv[1]==1L && intv[lintv]!=1L) {
pos = c(1L,match(2L,intv))
} else if (intv[1]==0L && intv[lintv]!=0L) {
pos = c(match(1L,intv),lintv)
} else {
pos = c(1L,lintv)
}
pmax = pos[[1]] + which.max(intensity[pos[[1]]:pos[[2]]]) - 1L
c(rt[pmax],intensity[pmax])
}, rt_1, rt_2, USE.NAMES=F)
}
## Perform MS1 maxima calculation in the neighbourhood of each
## MS2 result.
tmp = tab_ms1[tab_ms2,{
xx = find_ms1_max(rt,intensity,i.rt_left,i.rt_right)
.(an=i.an,
ms1_rt = xx[1,],
ms1_int = xx[2,])
},by=.EACHI, nomatch=NULL]
## Calculate QA values.
tab_ms2[tmp,c("ms1_rt","ms1_int"):=.(i.ms1_rt,i.ms1_int),on=c(BASE_KEY,'an')]
tab_ms2[,c("rt_left","rt_right"):=c(NULL,NULL)]
tab_ms2[tab_ms1_mean,ms1_mean:=i.ms1_mean]
tab_ms2[,`:=`(qa_ms1_good_int=fifelse(ms1_int>ms1_int_thresh,T,F),
qa_ms1_above_noise=F,
qa_ms2_near=F)]
## TODO: I wonder if so stupidly auto-calculated ms1 noise should
## be taken into account at all? My recollection from earlier
## times was that it was helpful, at least sometimes.
tab_ms2[qa_ms1_good_int==T,qa_ms1_above_noise:=fifelse(ms1_int>ms1_mean/3.,T,F)]
tab_ms2[qa_ms1_good_int==T & qa_ms1_above_noise==T & qa_ms2_good_int==T,qa_ms2_near:=T]
tab_ms2$qa_ms2_exists=T
## Find MS1 with no corresponding MS2.
ms1key <- tab_ms1[,unique(.SD),.SDcol=BASE_KEY]
ms2key <- tab_ms2[,unique(.SD),.SDcol=BASE_KEY]
ms2key$mch=T
tabmatches <- ms2key[ms1key]
ms1woms2 <- tabmatches[is.na(mch)][,mch:=NULL]
## calculate the most intense peak, its location and the mean for
## childless MS1.
tab_noms2 <- tab_ms1[ms1woms2,.(ms1_mean=mean(intensity),ms1_rt=rt[which.max(intensity)],ms1_int=max(intensity)),by=.EACHI,nomatch=NULL]
## QA for the above (lazy qa ... take only the max peak into account).
tab_noms2[,c("qa_ms1_good_int","qa_ms1_above_noise"):=.(ms1_int>ms1_int_thresh,ms1_int>ms1_mean/3.)]
## MS2 QA criteria all fail.
tab_noms2[,c("qa_ms2_exists","qa_ms2_good_int","qa_ms2_near"):=.(F,F,F)]
## Bind MS1-only and MS1/MS2 entries together.
res <- rbind(tab_ms2,tab_noms2,fill=T,use.names=T)
## Every single entry which was extracted has at least MS1.
res[,qa_ms1_exists:=T]
data.table::setkeyv(res,BASE_KEY)
qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
res[,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
res[.(T),del_rt:=abs(ms2_rt - ms1_rt),on="qa_pass",by='an']
resby <- BASE_KEY_MS2[! (BASE_KEY_MS2 %in% 'an')]
res[.(T),qa_tmp_ms1_max:= ms1_int==max(ms1_int),on="qa_pass",by=resby]
res[,ms2_sel:=F]
res[.(T,T),ms2_sel:= del_rt == del_rt[which.min(del_rt)],on=c("qa_pass","qa_tmp_ms1_max"),by=resby]
res[,qlt_ms1:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms1_exists",
"qa_ms1_above_noise",
"qa_ms1_good_int")]
res[,qlt_ms2:=apply(.SD,1,function(rw) sum(c(5L,3L,2L)*rw)),.SDcol=c("qa_ms2_exists",
"qa_ms2_near",
"qa_ms2_good_int")]
res
}
## Based on the `comprehensive' and `qa' tabs, greate `summ'.
gen_summ <- function(comp,qa) {
comp_cols <- intersect(SUMM_COLS,colnames(comp))
rdcomp <- comp[,..comp_cols]
data.table::setkeyv(rdcomp,BASE_KEY)
summ <- qa[rdcomp,nomatch=F] #We changed `nomatch' cases from NA
#to F, because NA does not work well
#with X == F condition.
## flgs <- c(QA_FLAGS,"ms2_sel")
## summ[is.na(qa_ms1_exists),(flgs):=F]
data.table::setkeyv(summ,SUMM_KEY)
summ[.(F),c("qlt_ms1","qlt_ms2"):=0.,on="qa_ms1_exists"]
summ
}
gen_mz_err_f <- function(entry,msg) { gen_mz_err_f <- function(entry,msg) {
eppm <- grab_unit(entry,"ppm") eppm <- grab_unit(entry,"ppm")
eda <- grab_unit(entry,"Da") eda <- grab_unit(entry,"Da")
......
...@@ -21,8 +21,8 @@ aes <- ggplot2::aes ...@@ -21,8 +21,8 @@ aes <- ggplot2::aes
plot_fname <- function(kvals) { plot_fname <- function(kvals) {
if (!is.null(kvals)) { if (!is.null(kvals)) {
kparts <- mapply(paste0,names(kvals),kvals,USE.NAMES=F) kparts = mapply(paste0,names(kvals),kvals,USE.NAMES=F)
stump <- paste(kparts,collapse="_") stump = paste(kparts,collapse="_")
paste0("plot_",stump,".pdf") paste0("plot_",stump,".pdf")
} else 'default.pdf' } else 'default.pdf'
} }
...@@ -33,23 +33,23 @@ is_fname_rds <- function(fn) { ...@@ -33,23 +33,23 @@ is_fname_rds <- function(fn) {
} }
sci10 <- function(x) { sci10 <- function(x) {
prefmt <- formatC(x,format="e",digits=2) prefmt = formatC(x,format="e",digits=2)
bits <- strsplit(prefmt,split="e") bits = strsplit(prefmt,split="e")
bits1 <-sapply(bits,function(x) { bits1 =sapply(bits,function(x) {
if (length(x) > 1) { if (length(x) > 1) {
res <- x[[1]] res = x[[1]]
sub(" ","~",res) sub(" ","~",res)
} else { } else {
x x
} }
}) })
bits2 <-sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "") bits2 =sapply(bits,function(x) if (length(x)>1) paste0(" %*% 10^","'",sub("[+]"," ",x[[2]]),"'") else "")
txt <- mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA, txt = mapply(function(b1,b2) if (nchar(b2)!=0) {paste0("'",b1,"'",b2)} else NA,
bits1, bits1,
bits2, bits2,
SIMPLIFY = F) SIMPLIFY = F)
names(txt) <- NULL names(txt) = NULL
txt <- gsub(pattern = "^'0\\.00'.*$"," 0",x=txt) txt = gsub(pattern = "^'0\\.00'.*$"," 0",x=txt)
parse(text=txt) parse(text=txt)
...@@ -58,15 +58,15 @@ sci10 <- function(x) { ...@@ -58,15 +58,15 @@ sci10 <- function(x) {
scale_legend <- function(colrdata,pdata) { scale_legend <- function(colrdata,pdata) {
if (is.null(colrdata) || is.null(pdata)) NULL if (is.null(colrdata) || is.null(pdata)) NULL
labs <- data.table::key(colrdata) labs = data.table::key(colrdata)
sdcols <- c(labs,"label") sdcols = c(labs,"label")
tab_lab <- pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL] tab_lab = pdata[,unique(.SD),.SDcols=sdcols][colrdata,.(colour=i.colour),on=labs,nomatch=NULL]
x <- tab_lab$colour x = tab_lab$colour
names(x) <- tab_lab$label names(x) = tab_lab$label
ggplot2::scale_colour_manual(values=x) ggplot2::scale_colour_manual(values=x)
} }
...@@ -79,15 +79,15 @@ pal_maker <- function(n,palname = NULL) { ...@@ -79,15 +79,15 @@ pal_maker <- function(n,palname = NULL) {
## the first colours to the original palette, then go over to the ## the first colours to the original palette, then go over to the
## generated ones. Returns a vector of colours. ## generated ones. Returns a vector of colours.
krzywinski <- c("#68023F","#008169","#EF0096","#00DCB5", krzywinski = c("#68023F","#008169","#EF0096","#00DCB5",
"#FFCFE2","#003C86","#9400E6","#009FFA", "#FFCFE2","#003C86","#9400E6","#009FFA",
"#FF71FD","#7CFFFA","#6A0213","#008607") "#FF71FD","#7CFFFA","#6A0213","#008607")
info <- as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T) info = as.data.table(RColorBrewer::brewer.pal.info,keep.rownames = T)
maxcol <- if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski) maxcol = if (!is.null(palname)) info[rn == palname]$maxcolors else length(krzywinski)
startpal <- if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski startpal = if (!is.null(palname)) RColorBrewer::brewer.pal(maxcol,palname) else krzywinski
pal <- if (n>length(startpal)) { pal = if (n>length(startpal)) {
intrppal <-(colorRampPalette(startpal))(n) intrppal =(colorRampPalette(startpal))(n)
newcol <- setdiff(intrppal,startpal) newcol = setdiff(intrppal,startpal)
unique(c(startpal,intrppal)) unique(c(startpal,intrppal))
} else startpal } else startpal
...@@ -106,16 +106,16 @@ make_struct_plot <- function(smiles, kekulise=TRUE, width=300, height=300, ...@@ -106,16 +106,16 @@ make_struct_plot <- function(smiles, kekulise=TRUE, width=300, height=300,
## structure -> grob ## structure -> grob
smiles2img <- function() { smiles2img <- function() {
if (is.na(smiles) || nchar(smiles)==0) return(NULL) #Handle empty SMILES. if (is.na(smiles) || nchar(smiles)==0) return(NULL) #Handle empty SMILES.
dep <- rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate, dep = rcdk::get.depictor(width = width, height = height, zoom = zoom, style = style, annotate = annotate,
abbr = abbr, suppressh = suppressh, showTitle = showTitle, smaLimit = smaLimit, abbr = abbr, suppressh = suppressh, showTitle = showTitle, smaLimit = smaLimit,
sma = NULL) sma = NULL)
mol <- RMassBank::getMolecule(smiles) mol = RMassBank::getMolecule(smiles)
z<-rcdk::view.image.2d(mol, depictor=dep) z=rcdk::view.image.2d(mol, depictor=dep)
grid::rasterGrob(z) grid::rasterGrob(z)
} }
grob <- smiles2img() grob = smiles2img()
if (!is.null(grob)) { if (!is.null(grob)) {
qplot(1:5, 2*(1:5), geom="blank") + qplot(1:5, 2*(1:5), geom="blank") +
...@@ -156,13 +156,13 @@ theme_print <- function(...) ggplot2::theme_light()+ggplot2::theme(axis.title=gg ...@@ -156,13 +156,13 @@ theme_print <- function(...) ggplot2::theme_light()+ggplot2::theme(axis.title=gg
...)+guide_fun() ...)+guide_fun()
theme_empty <- ggplot2::theme_bw() theme_empty = ggplot2::theme_bw()
theme_empty$line <- ggplot2::element_blank() theme_empty$line = ggplot2::element_blank()
theme_empty$rect <- ggplot2::element_blank() theme_empty$rect = ggplot2::element_blank()
theme_empty$strip.text <- ggplot2::element_blank() theme_empty$strip.text = ggplot2::element_blank()
theme_empty$axis.text <- ggplot2::element_blank() theme_empty$axis.text = ggplot2::element_blank()
theme_empty$plot.title <- ggplot2::element_blank() theme_empty$plot.title = ggplot2::element_blank()
theme_empty$axis.title <- ggplot2::element_blank() theme_empty$axis.title = ggplot2::element_blank()
cust_geom_line <- function(key_glyph="rect",...) ggplot2::geom_line(...,key_glyph=key_glyph) cust_geom_line <- function(key_glyph="rect",...) ggplot2::geom_line(...,key_glyph=key_glyph)
...@@ -187,18 +187,31 @@ mk_logic_exp <- function(rest,sofar=NULL) { ...@@ -187,18 +187,31 @@ mk_logic_exp <- function(rest,sofar=NULL) {
} else { } else {
nm = names(rest)[[1]] nm = names(rest)[[1]]
val = rest[[1]] val = rest[[1]]
ex <- bquote(.(as.symbol(nm)) %in% .(val)) ex = bquote(.(as.symbol(nm)) %in% .(val))
zz <- if (is.null(sofar)) ex else bquote(.(ex) & .(sofar)) zz = if (is.null(sofar)) ex else bquote(.(ex) & .(sofar))
mk_logic_exp(tail(rest,-1L), zz) mk_logic_exp(tail(rest,-1L), zz)
} }
} }
get_data_from_key <- function(tab,key) { get_data_from_key <- function(db,tab,kvals,outcols=NULL) {
skey <- mk_logic_exp(key)
tab <- eval(bquote(tab[.(skey)])) ## Ensure only names that exist in cat are used in selection. Or,
setkeyv(tab,names(key)) ## should we not do this?
tab valid_names = intersect(names(kvals),colnames(db$cat))
## Turn list into a data.table.
dt = as.data.table(kvals[valid_names])
## Get catids.
cattab = db$cat[dt,on=valid_names]
## Get precids.
mztab = db$precursors[cattab,on="catid"]
outnames = union(valid_names,outcols)
res = tab[mztab,on="precid"]
if (!is.null(outcols)) res[,..outnames] else res
} }
...@@ -220,9 +233,9 @@ define_colrdata <- function(comptab,labs) { ...@@ -220,9 +233,9 @@ define_colrdata <- function(comptab,labs) {
## Determine colours based on `labs'. ## Determine colours based on `labs'.
one_keyset <- function(dt) { one_keyset <- function(dt) {
labtab = dt[,unique(.SD),.SDcol=labs] labtab = dt[,unique(.SD),.SDcol=labs]
n <- NROW(labtab) n = NROW(labtab)
cols <- if (n<13L) { cols = if (n<13L) {
pal <- RColorBrewer::brewer.pal(n=n,name="Paired") pal = RColorBrewer::brewer.pal(n=n,name="Paired")
if (n>3L) pal else if (n>0L) pal[1:n] else character() if (n>3L) pal else if (n>0L) pal[1:n] else character()
} else { } else {
scales::viridis_pal()(n) scales::viridis_pal()(n)
...@@ -232,14 +245,14 @@ define_colrdata <- function(comptab,labs) { ...@@ -232,14 +245,14 @@ define_colrdata <- function(comptab,labs) {
} }
## Calculate lengths of all the COLRDATA_KEY subgroups. ## Calculate lengths of all the COLRDATA_KEY subgroups.
dt <- comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY] dt = comptab[,unique(.SD),.SDcols=labs,by=COLRDATA_KEY]
## Arrange colours to map to specific labels by sorting. ## Arrange colours to map to specific labels by sorting.
allcols <- union(COLRDATA_KEY,labs) allcols = union(COLRDATA_KEY,labs)
data.table::setkeyv(dt,allcols) data.table::setkeyv(dt,allcols)
## Assign colours to labels subgroups. ## Assign colours to labels subgroups.
res <- dt[,one_keyset(.SD),by=COLRDATA_KEY] res = dt[,one_keyset(.SD),by=COLRDATA_KEY]
## Sort everything again, ## Sort everything again,
data.table::setkeyv(res,allcols) data.table::setkeyv(res,allcols)
...@@ -251,10 +264,10 @@ define_colrdata <- function(comptab,labs) { ...@@ -251,10 +264,10 @@ define_colrdata <- function(comptab,labs) {
## compound set.). ## compound set.).
narrow_colrdata <- function(colrdata,kvals) { narrow_colrdata <- function(colrdata,kvals) {
if (is.null(colrdata)) return(NULL) if (is.null(colrdata)) return(NULL)
vals <- as.list(kvals[COLRDATA_KEY]) vals = as.list(kvals[COLRDATA_KEY])
res <- colrdata[,(COLRDATA_KEY):=NULL] res = colrdata[,(COLRDATA_KEY):=NULL]
labs <- names(res)[names(res)!="colour"] labs = names(res)[names(res)!="colour"]
data.table::setkeyv(res,labs) data.table::setkeyv(res,labs)
res res
} }
...@@ -266,201 +279,219 @@ narrow_colrdata <- function(colrdata,kvals) { ...@@ -266,201 +279,219 @@ narrow_colrdata <- function(colrdata,kvals) {
## subset of the `summ' table based on `kvals'. We need it for rt-s in ## subset of the `summ' table based on `kvals'. We need it for rt-s in
## the labels. Argument `labs' is a vector of names that will be used ## the labels. Argument `labs' is a vector of names that will be used
## to construct the legend labels. ## to construct the legend labels.
get_data_4_eic_ms1 <- function(extr_ms1,summ_rows,kvals,labs) { get_data_4_eic_ms1 <- function(db,summ_rows,kvals,labs) {
## Which of the selected keys are in the extr_ms1? This can be ## Which of the selected keys are in the db$extr$cgm$ms1? This can be
## made more obvious to the user, but note necessary atm. ## made more obvious to the user, but note necessary atm.
keys <- names(kvals) keys = names(kvals)
actual_key <- intersect(keys,names(extr_ms1)) actual_key = intersect(keys,names(db$extr$cgm$ms1))
actual_kvals <- kvals[actual_key] actual_kvals = kvals[actual_key]
## Subset db$extr$cgm$ms1 by the actual key.
tab = get_data_from_key(db=db,
tab=db$extr$cgm$ms1,
kvals=kvals,
outcols = c("catid","mz","rt","intensity"))
## Subset extr_ms1 by the actual key.
tab <-get_data_from_key(tab=extr_ms1,key=actual_kvals)
## Group the plot data per label group (ie tags, or adducts, or meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
## both).
xlxx <- intersect(labs,names(extr_ms1))
xlxx <- as.character(xlxx)
pdata <- tab[,.(rt,intensity),by=xlxx]
## Attach catid information.
tab = meta[tab,on=.(catid),nomatch=NULL]
pdata = tab[,.(rt,intensity),by=labs]
## TODO: FIXME: This fails because summ_rows sux wrt calcing of ms1_rt for labels. #Now, add the RTs in. text_labels = labs
## pdata[summ_rows,ms1_rt:=signif(i.ms1_rt,5),on=xlxx] pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
setkeyv(pdata,cols=unique(as.character(text_labels)))
## Create labels.
## xlxx <- unique(c(xlxx,"ms1_rt"))
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=xlxx],splice=T))
setkeyv(pdata,cols=unique(as.character(xlxx),"rt"))
pdata pdata
} }
## Prepare MS2 eic data: rt and intensity + key made of splitby. ## Prepare MS2 eic data: rt and intensity + key made of splitby.
get_data_4_eic_ms2 <- function(summ,kvals,labs) { get_data_4_eic_ms2 <- function(db,summ,kvals,labs) {
tab <-get_data_from_key(tab=summ,key=kvals) tab = get_data_from_key(db=db,tab=db$extr$cgm$ms2,kvals=kvals,outcols=c("catid","ce","mz","rt","intensity"))
nms <- names(kvals)
byby <- unique(c(nms,labs,"an")) meta = db$cat[tab[,.(catid=unique(catid))],on=.(catid),.SD,by=.EACHI,.SDcols=labs]
pdata <- tab[,.(intensity=ms2_int,rt=ms2_rt),by=byby]
## Attach catid information.
tab = meta[tab,on=.(catid),nomatch=NULL]
pdata = tab[,.(rt,intensity),by=labs]
if (NROW(pdata)==0L) return(NULL) if (NROW(pdata)==0L) return(NULL)
xlxx <- as.character(labs)
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(xlxx,as.symbol))),by=.(xlxx)],splice=T))
setkeyv(pdata,cols=c(labs,"rt")) text_labels = labs
pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(text_labels,as.symbol))),by=text_labels],splice=T))
setkeyv(pdata,cols=unique(as.character(text_labels)))
pdata pdata
} }
get_rows_from_summ <- function(summ,kvals,...) {
summ_rows_cols <- union(names(kvals),c(...))
get_data_from_key(summ,key=kvals)[,unique(.SD),.SDcol=summ_rows_cols]
}
narrow_summ <- function(summ,kvals,labs,...) { narrow_summ <- function(db,summ,kvals,labs,...) {
keys <- names(kvals) keys = names(kvals)
## keys <- keys[!is.na(keys)] nms = union(names(kvals),
needed <- setdiff(labs,keys) labs)
x <- as.list(c(needed,...)) nms = union(union("precid",nms),c(...))
x <- c(list(summ,kvals),x) nsumm = get_data_from_key(db=db,
do.call(get_rows_from_summ,x) tab=summ,
kvals=kvals,
outcols=nms)
nsumm
} }
### PLOTTING: TOP-LEVEL PLOT CREATION ### PLOTTING: TOP-LEVEL PLOT CREATION
make_eic_ms1_plot <- function(extr_ms1,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) { make_eic_ms1_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,i_range=NULL, asp=1,colrdata=NULL) {
## If nothing selected, just return NULL. ## If nothing selected, just return NULL.
if (is.null(kvals)) return(NULL) if (is.null(kvals)) return(NULL)
key <- names(kvals) key = names(kvals)
## Get metadata. ## Get metadata.
## TODO: FIXME: Somehow calculating representationve ms1_rt for summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","qa_ms1_exists","scan","ms2_sel")
## plots is wrong. Horrible and wrong. Will remove those labels rows_key = union(data.table::key(summ_rows),labs)
## until we fix.
summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms1_rt","ms1_int","Name","SMILES","Formula","qa_ms1_exists","an","ms2_sel")
rows_key <- union(data.table::key(summ_rows),labs)
summ_rows$sel_ms1_rt=NA_real_ summ_rows$sel_ms1_rt=NA_real_
summ_rows[ms2_sel==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key] summ_rows[ms2_sel==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
summ_rows[is.na(sel_ms1_rt) & ms2_sel==F & qa_ms1_exists==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key] summ_rows[is.na(sel_ms1_rt) & ms2_sel==F & qa_ms1_exists==T,sel_ms1_rt:=ms1_rt[which.max(ms1_int)],by=rows_key]
summ_rows[,ms1_rt:=sel_ms1_rt] summ_rows[,ms1_rt:=sel_ms1_rt]
summ_rows[,sel_ms1_rt:=NULL] summ_rows[,sel_ms1_rt:=NULL]
summ_rows[,c("an","qa_ms1_exists","ms2_sel"):=NULL] summ_rows[,c("scan","qa_ms1_exists","ms2_sel"):=NULL]
summ_rows <- summ_rows[,unique(.SD)] summ_rows = summ_rows[,unique(.SD)]
## Get the table with ms1 data. ## Get the table with ms1 data.
pdata <- get_data_4_eic_ms1(extr_ms1, summ_rows, kvals, labs) pdata = get_data_4_eic_ms1(db=db, summ_rows, kvals, labs)
## Deal with retention time range. ## Deal with retention time range.
coord <- if (is.null(rt_range) && is.null(i_range)) { coord = if (is.null(rt_range) && is.null(i_range)) {
NULL NULL
} else { } else {
ggplot2::coord_cartesian(xlim=rt_range, ggplot2::coord_cartesian(xlim=rt_range,
ylim=i_range) ylim=i_range)
} }
xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt) xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx <- abs(xrng[[2]]-xrng[[1]]) dx = abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity) yrng = range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]]) dy = abs(yrng[[2]]-yrng[[1]])
## Calculate aspect ratio. ## Calculate aspect ratio.
aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy) aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])), tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ') ## paste0("Set: ", set, " ID: ",id) collapse='; ') ## paste0("Set: ", set, " ID: ",id)
title_txt = paste0("MS1 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", ")) title_txt = paste0("MS1 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
nm <- paste(unique(summ_rows$Name),collapse="; ") nm = paste(unique(summ_rows$Name),collapse="; ")
subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
p <- ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+ p = ggplot2::ggplot(pdata,aes(x=rt,y=intensity,colour=label))+
ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+
ggplot2::xlab("retention time")+ ggplot2::xlab("retention time")+
cust_geom_line()+ cust_geom_line()+
scale_y(axis=axis,labels=sci10)+ scale_y(axis=axis,labels=sci10)+
coord coord
colrdata <- narrow_colrdata(colrdata,kvals) colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic() p + scale_legend(colrdata,pdata) + theme_eic()
} }
make_eic_ms2_plot <- function(summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) { make_eic_ms2_plot <- function(db,summ,kvals,labs,axis="linear",rt_range=NULL,asp=1, colrdata=NULL) {
## If nothing selected, just return NULL. ## If nothing selected, just return NULL.
if (is.null(kvals)) return(NULL) if (is.null(kvals)) return(NULL)
## Get metadata. ## Get metadata.
summ_rows <- narrow_summ(summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES","Formula") summ_rows = narrow_summ(db=db,summ,kvals,labs,"mz","ms2_rt","ms2_int","Name","SMILES")
## Get plotting data for the compound. ## Get plotting data for the compound.
pdata <- get_data_4_eic_ms2(summ, pdata = get_data_4_eic_ms2(db=db,
kvals=kvals, summ,
labs=labs) kvals=kvals,
labs=labs)
if (NROW(pdata)==0L) return(NULL) if (NROW(pdata)==0L) return(NULL)
## Deal with retention time range. ## Deal with retention time range.
rt_lim <- if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range) rt_lim = if (is.null(rt_range)) NULL else ggplot2::coord_cartesian(xlim=rt_range)#ggplot2::xlim(rt_range)
xrng <- range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt) xrng = range(pdata$rt) #if (!is.null(rt_range)) rt_range else range(pdata$rt)
dx <- abs(xrng[[2]]-xrng[[1]]) dx = abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity) yrng = range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]]) dy = abs(yrng[[2]]-yrng[[1]])
## Fix aspect ratio. ## Fix aspect ratio.
aspr <- if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy) aspr = if (is.null(dx) || is.na(dx) || dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
## Derive various labels. ## Derive various labels.
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])), tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ') collapse='; ')
title_txt = paste0("MS2 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", ")) title_txt = paste0("MS2 EIC for ion m/z = ",paste0(signif(unique(summ_rows$mz),digits=7L),collapse=", "))
subt_txt = if (!length(summ_rows$Name)==0L && !is.na(summ_rows$Name[[1]]) && nchar(summ_rows$Name[[1]])>0L) summ_rows$Name[[1]] else NULL subt_txt = if (!length(summ_rows$Name)==0L && !is.na(summ_rows$Name[[1]]) && nchar(summ_rows$Name[[1]])>0L) summ_rows$Name[[1]] else NULL
## Base plot. ## Base plot.
p <- ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) + p = ggplot2::ggplot(pdata,aes(x=rt,ymin=0,ymax=intensity,colour=label)) +
ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt) + ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt) +
ggplot2::xlab("retention time")+ggplot2::ylab("intensity")+cust_geom_linerange()+ ggplot2::xlab("retention time")+ggplot2::ylab("intensity")+cust_geom_linerange()+
scale_y(axis=axis,labels=sci10)+rt_lim+guide_fun() scale_y(axis=axis,labels=sci10)+rt_lim+guide_fun()
ans <- pdata[,unique(an)]
## Add theme. ## Add theme.
colrdata <- narrow_colrdata(colrdata,kvals) colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic() p + scale_legend(colrdata,pdata) + theme_eic()
} }
make_spec_ms2_plot <- function(extr_ms2,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
make_spec_ms2_plot <- function(db,summ,kvals,labs,axis="linear",asp=1, colrdata=NULL) {
## Only the chosen ones. ## Only the chosen ones.
mdata <- get_data_from_key(summ,key=kvals)[ms2_sel==T] mdata = get_data_from_key(db=db,
common_key <- intersect(names(extr_ms2),names(kvals)) tab=summ,
common_vals <- kvals[common_key] kvals=kvals,
if (length(common_key) == 0L) return(NULL) outcols=union(names(kvals),
subxdata <- get_data_from_key(extr_ms2,key=common_vals) colnames(summ)))[ms2_sel==T]
spectra = db$extr$spectra[mdata,on=.(precid,scan),.(catid,ce,mz,scan,rt=i.ms2_rt,intensity)]
if (NROW(mdata)==0L) return(NULL) if (NROW(mdata)==0L) return(NULL)
if (NROW(subxdata) == 0L) return(NULL) if (NROW(spectra) == 0L) return(NULL)
ans <- data.table(an=mdata[,unique(an)],key="an")
ms2ctg <- c(intersect(c(names(kvals),labs),names(extr_ms2)),"CE") meta = db$cat[spectra[,.(catid=unique(catid))],
xlxx <- intersect(as.character(labs),names(extr_ms2)) on=.(catid),
common_labels <- unique(c("an",common_key,intersect(names(extr_ms2),labs))) .SD,
pdata <- subxdata[ans,on="an"][,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=common_labels] by=.EACHI,
pdata <- eval(bquote(pdata[,label:=make_line_label(..(lapply(c(xlxx,"rt"),as.symbol))),by=.(xlxx)],splice=T)) .SDcols=labs]
subxdata = meta[spectra,on=.(catid),nomatch=NULL]
labs = c(labs,"ce")
pdata = subxdata[,.(mz=mz,intensity=intensity,rt=signif(unique(rt),5)),by=labs]
pdata = eval(bquote(pdata[,label:=make_line_label(..(lapply(c(labs,"rt"),as.symbol))),by=.(labs)],splice=T))
if (NROW(pdata)==0L) return(NULL) if (NROW(pdata)==0L) return(NULL)
# Aspect ratio. # Aspect ratio.
xrng <- range(pdata$mz) xrng = range(pdata$mz,na.rm=T)
dx <- abs(xrng[[2]]-xrng[[1]]) dx = abs(xrng[[2]]-xrng[[1]])
yrng <- range(pdata$intensity) yrng = range(pdata$intensity)
dy <- abs(yrng[[2]]-yrng[[1]]) dy = abs(yrng[[2]]-yrng[[1]])
aspr <- if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy) aspr = if (dx < .Machine$double.eps) 1 else asp*as.numeric(dx)/as.numeric(dy)
## Get labels. ## Get labels.
tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])), tag_txt = paste0(sapply(names(kvals),function (nx) paste0(nx,": ", kvals[[nx]])),
collapse='; ') collapse='; ')
title_txt = paste0("MS2 spectra for ion m/z = ",paste0(signif(unique(mdata$mz),digits=7L),collapse=", ")) title_txt = paste0("MS2 spectra for ion m/z = ",paste0(signif(unique(mdata$mz),digits=7L),collapse=", "))
nm <- paste(unique(mdata$Name),collapse="; ") nm = paste(unique(mdata$Name),collapse="; ")
subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL subt_txt = if (!length(nm)==0L && !is.na(nm) && nchar(nm)>0L) nm else NULL
p <- ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun() p = ggplot2::ggplot(pdata,aes(x=mz,ymin=0,ymax=intensity,colour=label))+ggplot2::labs(caption=tag_txt,title=title_txt,subtitle=subt_txt)+ggplot2::xlab("m/z")+cust_geom_linerange()+scale_y(axis=axis,labels=sci10)+guide_fun()
## Add theme. ## Add theme.
colrdata <- narrow_colrdata(colrdata,kvals) colrdata = narrow_colrdata(colrdata,kvals)
p + scale_legend(colrdata,pdata) + theme_eic() p + scale_legend(colrdata,pdata) + theme_eic()
} }
......
...@@ -124,13 +124,14 @@ NUM_INP_HEIGHT="5%" ...@@ -124,13 +124,14 @@ NUM_INP_HEIGHT="5%"
## Possible compound list fields ## Possible compound list fields
EMPTY_CMPD_LIST = dtable(ID=character(), EMPTY_CMPD_LIST = dtable(ID=character(),
SMILES=character(), SMILES=character(),
Name=character(), Name=character(),
Formula=character(), Formula=character(),
RT=numeric(), RT=numeric(),
mz=numeric(), mz=numeric(),
known=character(), known=character(),
ORIG=character()) set=character(),
ORIG=character())
COMP_LIST_COLS = c("ID","Name","SMILES","Formula","RT","mz") COMP_LIST_COLS = c("ID","Name","SMILES","Formula","RT","mz")
## Comprehensive table properties ## Comprehensive table properties
COMP_NAME_MAP = list(RT="rt") COMP_NAME_MAP = list(RT="rt")
...@@ -204,8 +205,8 @@ REPORT_TITLE = "Plots of EICs and MS2 Spectra" ...@@ -204,8 +205,8 @@ REPORT_TITLE = "Plots of EICs and MS2 Spectra"
## Select the most fundamental group of entries. Within this group, ## Select the most fundamental group of entries. Within this group,
## each ID is unique. ## each ID is unique.
BASE_KEY = c("adduct","tag","ID") BASE_KEY = "precid"#c("adduct","tag","ID")
BASE_KEY_MS2 = c(BASE_KEY,"CE","an") BASE_KEY_MS2 = c("precid","ce","scan")#c(BASE_KEY,"CE","scan")
FIG_DEF_CONF =list(grouping=list(group="adduct", FIG_DEF_CONF =list(grouping=list(group="adduct",
plot="ID", plot="ID",
...@@ -216,7 +217,7 @@ FIG_DEF_CONF =list(grouping=list(group="adduct", ...@@ -216,7 +217,7 @@ FIG_DEF_CONF =list(grouping=list(group="adduct",
SUMM_COLS=c("set",BASE_KEY_MS2,"mz","ms1_rt", "ms1_int", "ms2_rt", "ms2_int", SUMM_COLS=c("set",BASE_KEY_MS2,"mz","ms1_rt", "ms1_int", "ms2_rt", "ms2_int",
"ms1_mean","ms2_sel",QA_FLAGS,"Name", "SMILES", "Formula", "known","Comments","file") "ms1_mean","ms2_sel",QA_FLAGS,"Name", "SMILES", "Formula", "known","Comments","file")
SUMM_KEY = c("set","ID","adduct","tag","an") SUMM_KEY = c("set","ID","adduct","tag","precid","catid","scan")
PLOT_FEATURES = c("adduct", PLOT_FEATURES = c("adduct",
"tag", "tag",
...@@ -224,34 +225,34 @@ PLOT_FEATURES = c("adduct", ...@@ -224,34 +225,34 @@ PLOT_FEATURES = c("adduct",
## Empty summary table. ## Empty summary table.
EMPTY_SUMM = data.table::data.table(set=character(0), EMPTY_SUMM = data.table::data.table(set=character(0),
adduct=character(0), adduct=character(0),
tag=character(0), tag=character(0),
ID=character(0), ID=character(0),
CE=character(0), CE=character(0),
an=integer(0), scan=character(0),
mz=numeric(0), mz=numeric(0),
ms1_rt=numeric(0), ms1_rt=numeric(0),
ms1_int=numeric(0), ms1_int=numeric(0),
ms2_rt=numeric(0), ms2_rt=numeric(0),
ms2_int=numeric(0), ms2_int=numeric(0),
ms1_mean=numeric(0), ms1_mean=numeric(0),
ms2_sel=logical(0), ms2_sel=logical(0),
qa_pass=logical(0), qa_pass=logical(0),
qa_ms1_exists=logical(0), qa_ms1_exists=logical(0),
qa_ms2_exists=logical(0), qa_ms2_exists=logical(0),
qa_ms1_good_int=logical(0), qa_ms1_good_int=logical(0),
qa_ms1_above_noise=logical(0), qa_ms1_above_noise=logical(0),
qa_ms2_near=logical(0), qa_ms2_near=logical(0),
qa_ms2_good_int=logical(0), qa_ms2_good_int=logical(0),
Name=character(0), Name=character(0),
SMILES=character(0), SMILES=character(0),
Formula=character(0), Formula=character(0),
known=character(0), known=character(0),
Comments=character(0), Comments=character(0),
file=character(0)) file=character(0))
## Default sorting keys of spectra in the summary table ## Default sorting keys of spectra in the summary table
DEF_KEY_SUMM = c(BASE_KEY_MS2,"an") DEF_KEY_SUMM = c(BASE_KEY_MS2,"scan")
SUBSET_VALS = c(IGNORE="ignore", SUBSET_VALS = c(IGNORE="ignore",
...@@ -360,3 +361,7 @@ METFRAG_RESULT_READF = list(csv = function(file,...) data.table::fread(file=file ...@@ -360,3 +361,7 @@ METFRAG_RESULT_READF = list(csv = function(file,...) data.table::fread(file=file
xml = function(file,...) readxl::read_excel(path=file,...)) xml = function(file,...) readxl::read_excel(path=file,...))
METFRAG_DEFAULT_PROC = 1L METFRAG_DEFAULT_PROC = 1L
## DATA MODEL
DB_CATALOGUE_KEY = c("set","tag","adduct","ID")
...@@ -22,7 +22,7 @@ ...@@ -22,7 +22,7 @@
#' @importFrom shiny selectInput numericInput textInput HTML #' @importFrom shiny selectInput numericInput textInput HTML
GUI_SELECT_INPUTS <- c("proj_list", GUI_SELECT_INPUTS = c("proj_list",
"indir_list", "indir_list",
"ms1_coarse_unit", "ms1_coarse_unit",
"ms1_fine_unit", "ms1_fine_unit",
...@@ -30,7 +30,7 @@ GUI_SELECT_INPUTS <- c("proj_list", ...@@ -30,7 +30,7 @@ GUI_SELECT_INPUTS <- c("proj_list",
"ret_time_shift_tol_unit", "ret_time_shift_tol_unit",
"dfile_list") "dfile_list")
GUI_NUMERIC_INPUTS <- c("ms1_coarse", GUI_NUMERIC_INPUTS = c("ms1_coarse",
"ms1_fine", "ms1_fine",
"ms1_eic", "ms1_eic",
"ms1_rt_win", "ms1_rt_win",
...@@ -39,14 +39,14 @@ GUI_NUMERIC_INPUTS <- c("ms1_coarse", ...@@ -39,14 +39,14 @@ GUI_NUMERIC_INPUTS <- c("ms1_coarse",
"s2n", "s2n",
"ret_time_shift_tol") "ret_time_shift_tol")
GUI_TEXT_INPUTS <- c("rep_aut", GUI_TEXT_INPUTS = c("rep_aut",
"rep_tit") "rep_tit")
GUI_RADIO_INPUTS <- c("missingprec") GUI_RADIO_INPUTS = c("missingprec")
GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS, GUI_ALL_INPUTS = c(GUI_SELECT_INPUTS,
GUI_NUMERIC_INPUTS, GUI_NUMERIC_INPUTS,
GUI_TEXT_INPUTS, GUI_TEXT_INPUTS,
GUI_RADIO_INPUTS) GUI_RADIO_INPUTS)
...@@ -55,8 +55,8 @@ GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS, ...@@ -55,8 +55,8 @@ GUI_ALL_INPUTS <- c(GUI_SELECT_INPUTS,
add_new_def_tag <- function(old_tags,how_many) { add_new_def_tag <- function(old_tags,how_many) {
ind = which(grepl(r"(^F\d+$)",old_tags)) ind = which(grepl(r"(^F\d+$)",old_tags))
st_num = if (length(ind)>0L) { st_num = if (length(ind)>0L) {
old_def_tags <- old_tags[ind] old_def_tags = old_tags[ind]
tag_nums <- gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags) tag_nums = gsub(r"(^F(\d+)$)",r"(\1)",old_def_tags)
max(as.integer(tag_nums)) max(as.integer(tag_nums))
...@@ -162,26 +162,26 @@ create_gui <- function(project_path=NA_character_) { ...@@ -162,26 +162,26 @@ create_gui <- function(project_path=NA_character_) {
#'@export #'@export
r2datatab <- function(rdatatab) { r2datatab <- function(rdatatab) {
shiny::isolate({ shiny::isolate({
file <- rdatatab$file file = rdatatab$file
adduct <- rdatatab$adduct adduct = rdatatab$adduct
tag <- rdatatab$tag tag = rdatatab$tag
set <- rdatatab$set set = rdatatab$set
}) })
if (length(file)==0L) file <- character(0) if (length(file)==0L) file = character(0)
if (length(adduct)==0L) adduct <- rep(NA_character_,length(file)) if (length(adduct)==0L) adduct = rep(NA_character_,length(file))
if (length(tag)==0L) tag <- rep(NA_character_,length(file)) if (length(tag)==0L) tag = rep(NA_character_,length(file))
if (length(set)==0L) tag <- rep(NA_character_,length(file)) if (length(set)==0L) tag = rep(NA_character_,length(file))
data.table(tag=tag,adduct=adduct,set=set,file=file) data.table(tag=tag,adduct=adduct,set=set,file=file)
} }
r2filetag <- function(rfiletag) { r2filetag <- function(rfiletag) {
shiny::isolate({ shiny::isolate({
file <- rfiletag$file file = rfiletag$file
tag <- rfiletag$tag tag = rfiletag$tag
}) })
if (length(file)==0L) file <- character(0) if (length(file)==0L) file = character(0)
if (length(tag)==0L) tag <- rep(NA_character_,length(file)) if (length(tag)==0L) tag = rep(NA_character_,length(file))
data.table(tag=tag,file=file) data.table(tag=tag,file=file)
} }
...@@ -193,7 +193,7 @@ gen_dtab <- function(tablist,sets) { ...@@ -193,7 +193,7 @@ gen_dtab <- function(tablist,sets) {
r2compounds <- function(rcompounds) { r2compounds <- function(rcompounds) {
shiny::isolate({ shiny::isolate({
cmpd_lists <- rcompounds$lists cmpd_lists = rcompounds$lists
}) })
list(lists=cmpd_lists) list(lists=cmpd_lists)
...@@ -202,17 +202,17 @@ r2compounds <- function(rcompounds) { ...@@ -202,17 +202,17 @@ r2compounds <- function(rcompounds) {
#' @export #' @export
pack_app_state <- function(input, gui) { pack_app_state <- function(input, gui) {
pack <- list() pack = list()
shiny::isolate({ shiny::isolate({
pack_inputs <- list() pack_inputs = list()
pack_input_names <- which_gui_inputs(inputs) pack_input_names = which_gui_inputs(inputs)
pack_inputs <- shiny::reactiveValuesToList(input)[pack_input_names] pack_inputs = shiny::reactiveValuesToList(input)[pack_input_names]
pack$input <- pack_inputs pack$input = pack_inputs
pack$datatab <- r2datatab(gui$datatab) pack$datatab = r2datatab(gui$datatab)
pack$filetag <- r2filetag(gui$filetag) pack$filetag = r2filetag(gui$filetag)
pack$compounds <- r2compounds(gui$compounds) pack$compounds = r2compounds(gui$compounds)
pack$paths <- list() pack$paths = list()
pack$paths$data <- gui$paths$data pack$paths$data = gui$paths$data
}) })
pack pack
...@@ -318,19 +318,21 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac ...@@ -318,19 +318,21 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
input=input, input=input,
packed_state=packed_state) packed_state=packed_state)
gui <- create_gui(project_path=project_path) gui = create_gui(project_path=project_path)
gui$compounds$lists <- packed_state$compounds$lists gui$compounds$lists = packed_state$compounds$lists
gui$datatab$file <- packed_state$datatab$file gui$datatab$file = packed_state$datatab$file
gui$datatab$adduct <- packed_state$datatab$adduct gui$datatab$adduct = packed_state$datatab$adduct
gui$datatab$tag <- packed_state$datatab$tag gui$datatab$tag = packed_state$datatab$tag
gui$datatab$set <- packed_state$datatab$set gui$datatab$set = packed_state$datatab$set
gui$filetag$file <- packed_state$filetag$file gui$filetag$file = packed_state$filetag$file
gui$filetag$tag <- packed_state$filetag$tag gui$filetag$tag = packed_state$filetag$tag
x <- packed_state$paths$data x = packed_state$paths$data
gui$paths$data = if (length(x)>0 && nchar(x)>0) basename(x) else "" gui$paths$data = if (length(x)>0 && nchar(x)>0) {
if (!dir.exists(file.path(top_data_dir,gui$paths$data))) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")} file.path(top_data_dir,basename(x))
} else ""
if (!dir.exists(gui$paths$data)) {warning("Data directory ", gui$paths$data, " does not exist. You must select one.")}
gui gui
}) })
...@@ -340,12 +342,12 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac ...@@ -340,12 +342,12 @@ unpack_app_state <- function(session,envopts,input,top_data_dir,project_path,pac
input2conf_setup <- function(input,gui,conf=list()) { input2conf_setup <- function(input,gui,conf=list()) {
if (length(conf)==0L) { if (length(conf)==0L) {
conf$compounds <- list() conf$compounds = list()
conf$summary_table <- list() conf$summary_table = list()
conf$debug <- F conf$debug = F
} }
conf$compounds$lists <- gui$compounds$lists conf$compounds$lists = gui$compounds$lists
conf$paths$data = basename(gui$paths$data) conf$paths$data = basename(gui$paths$data)
conf conf
} }
...@@ -353,11 +355,11 @@ input2conf_setup <- function(input,gui,conf=list()) { ...@@ -353,11 +355,11 @@ input2conf_setup <- function(input,gui,conf=list()) {
input2conf_extract <- function(input,conf) { input2conf_extract <- function(input,conf) {
conf$tolerance = list() conf$tolerance = list()
conf$extract = list() conf$extract = list()
conf$tolerance[["ms1 fine"]] <- paste(input$ms1_fine,input$ms1_fine_unit) conf$tolerance[["ms1 fine"]] = paste(input$ms1_fine,input$ms1_fine_unit)
conf$tolerance[["ms1 coarse"]] <- paste(input$ms1_coarse,input$ms1_coarse_unit) conf$tolerance[["ms1 coarse"]] = paste(input$ms1_coarse,input$ms1_coarse_unit)
conf$tolerance[["eic"]] <- paste(input$ms1_eic,input$ms1_eic_unit) conf$tolerance[["eic"]] = paste(input$ms1_eic,input$ms1_eic_unit)
conf$tolerance[["rt"]] <- paste(input$ms1_rt_win,input$ms1_rt_win_unit) conf$tolerance[["rt"]] = paste(input$ms1_rt_win,input$ms1_rt_win_unit)
conf$extract$missing_precursor_info <- input$missingprec conf$extract$missing_precursor_info = input$missingprec
conf conf
} }
...@@ -365,26 +367,26 @@ input2conf_extract <- function(input,conf) { ...@@ -365,26 +367,26 @@ input2conf_extract <- function(input,conf) {
input2conf_prescreen <- function(input,conf) { input2conf_prescreen <- function(input,conf) {
conf$prescreen = list() conf$prescreen = list()
conf$prescreen[["ms1_int_thresh"]] <- input$ms1_int_thresh conf$prescreen[["ms1_int_thresh"]] = input$ms1_int_thresh
conf$prescreen[["ms2_int_thresh"]] <- input$ms2_int_thresh conf$prescreen[["ms2_int_thresh"]] = input$ms2_int_thresh
conf$prescreen[["s2n"]] <- input$s2n conf$prescreen[["s2n"]] = input$s2n
conf$prescreen[["ret_time_shift_tol"]] <- paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit) conf$prescreen[["ret_time_shift_tol"]] = paste(input$ret_time_shift_tol,input$ret_time_shift_tol_unit)
conf conf
} }
input2conf_figures <- function(input,conf) { input2conf_figures <- function(input,conf) {
conf$figures = list() conf$figures = list()
conf$figures$rt_min <- paste(input$plot_rt_min,input$plot_rt_min_unit) conf$figures$rt_min = paste(input$plot_rt_min,input$plot_rt_min_unit)
conf$figures$rt_max <- paste(input$plot_rt_max,input$plot_rt_max_unit) conf$figures$rt_max = paste(input$plot_rt_max,input$plot_rt_max_unit)
conf$figures$ext <- input$plot_ext conf$figures$ext = input$plot_ext
conf conf
} }
input2conf_report <- function(input,conf) { input2conf_report <- function(input,conf) {
conf$report = list() conf$report = list()
conf$report$author <- input$rep_aut conf$report$author = input$rep_aut
conf$report$title <- input$rep_tit conf$report$title = input$rep_tit
conf conf
} }
...@@ -401,7 +403,7 @@ input2conf_metfrag <- function(input,conf) { ...@@ -401,7 +403,7 @@ input2conf_metfrag <- function(input,conf) {
MetFragPreProcessingCandidateFilter = paste(input$mf_pre_processing_candidate_filter,collapse=","), MetFragPreProcessingCandidateFilter = paste(input$mf_pre_processing_candidate_filter,collapse=","),
MetFragPostProcessingCandidateFilter = paste(input$mf_post_processing_candidate_filter,collapse=",")) MetFragPostProcessingCandidateFilter = paste(input$mf_post_processing_candidate_filter,collapse=","))
## TODO: FIXME: We need to move away from unit weights from some ## TODO: FIXME: We need to move away from unit weights at some
## point. This needs some extra widgets (Sigh!). ## point. This needs some extra widgets (Sigh!).
insc = sapply(input$mf_scores_intrinsic,function(x) 1.0) insc = sapply(input$mf_scores_intrinsic,function(x) 1.0)
names(insc) = input$mf_scores_intrinsic names(insc) = input$mf_scores_intrinsic
...@@ -425,7 +427,7 @@ app_update_conf <- function(input,gui,envopts,fconf,m) { ...@@ -425,7 +427,7 @@ app_update_conf <- function(input,gui,envopts,fconf,m) {
fstr = paste0("input2conf_",fstrp) fstr = paste0("input2conf_",fstrp)
m$conf = do.call(fstr,list(input,conf=m$conf)) m$conf = do.call(fstr,list(input,conf=m$conf))
} }
m$run <- new_runtime_state(project=gui$paths$project, m$run = new_runtime_state(project=gui$paths$project,
envopts = envopts, envopts = envopts,
conf=m$conf) conf=m$conf)
m m
...@@ -453,28 +455,24 @@ app_state2state <- function(input,gui,envopts,m=NULL) { ...@@ -453,28 +455,24 @@ app_state2state <- function(input,gui,envopts,m=NULL) {
} }
get_sets <- function(gui) { get_sets <- function(gui) {
## TODO FIXME fn_lists = file.path(gui$paths$project,gui$compounds$lists)
## Think about this cmpds = join_compound_lists(fn_lists)
## fn_lists <- file.path(gui$paths$project,gui$compounds$lists) cmpds = process_cmpd_sets(cmpds)
cmpds[,unique(set)]
## df <- fread(file=fn_lists)
## if (!
## res = df[,unique(set)]
if (length(res)==0L) res = "ALL"
} }
gen_dfiles_tab <- function(gui) { gen_dfiles_tab <- function(gui) {
curr_file <- gui$filetag$file curr_file = gui$filetag$file
curr_tag <- gui$filetag$tag curr_tag = gui$filetag$tag
res <- data.table(file=curr_file,tag=curr_tag) res = data.table(file=curr_file,tag=curr_tag)
res res
} }
gui2datatab <- function(gui) { gui2datatab <- function(gui) {
df <- data.table(tag=as.character(gui$datatab$tag), df = data.table(tag=as.character(gui$datatab$tag),
adduct=as.character(gui$datatab$adduct), adduct=as.character(gui$datatab$adduct),
set=as.character(gui$datatab$set), set=as.character(gui$datatab$set),
file=as.character(gui$datatab$file)) file=as.character(gui$datatab$file))
...@@ -493,55 +491,59 @@ gui2datatab <- function(gui) { ...@@ -493,55 +491,59 @@ gui2datatab <- function(gui) {
## (of tags, CEs) in the index. ## (of tags, CEs) in the index.
gen_cindex <- function(summ,sorder,cols = CINDEX_COLS,by. = CINDEX_BY) { gen_cindex <- function(summ,sorder,cols = CINDEX_COLS,by. = CINDEX_BY) {
if (NROW(summ) == 0L) return(NULL) if (NROW(summ) == 0L) return(NULL)
allc <- c(by.,cols) allc = c(by.,cols)
xsumm <- summ[,..allc] xsumm = summ[,..allc]
setnames(xsumm,old="ms1_rt",new="rt",skip_absent=T) setnames(xsumm,old="ms1_rt",new="rt",skip_absent=T)
res <- xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.] res = xsumm[,.SD[max(qlt_ms1)==qlt_ms1][max(qlt_ms2)==qlt_ms2],by=by.]
res <- res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(first(mz), res = res[,c("mz","rt","Name","qlt_ms1","qlt_ms2"):=.(mean(mz,na.rm=T),
first(mean(rt)), mean(rt,na.rm=T),
first(Name), first(Name),
first(qlt_ms1), max(qlt_ms1,na.rm=T),
first(qlt_ms2)), max(qlt_ms2,na.rm=T)),
by=by.] by=by.]
res <- res[,unique(.SD),by=by.] res = res[,unique(.SD),by=by.]
sorder <- unique(sorder) sorder = unique(sorder)
wna <- which(sorder=="nothing"); if (length(wna)>0L) sorder <- sorder[-wna] wna = which(sorder=="nothing"); if (length(wna)>0L) sorder = sorder[-wna]
quality <- which("quality"==sorder) quality = which("quality"==sorder)
if (length(quality)>0L) { if (length(quality)>0L) {
pre <- head(sorder,quality-1L) pre = head(sorder,quality-1L)
post <- tail(sorder,-quality) post = tail(sorder,-quality)
sorder <- c(pre,"qlt_ms1","qlt_ms2",post) sorder = c(pre,"qlt_ms1","qlt_ms2",post)
} }
ord <- rep(1L,length(sorder)) ord = rep(1L,length(sorder))
if ("qlt_ms1" %in% sorder) { if ("qlt_ms1" %in% sorder) {
ind <- which(sorder %in% c("qlt_ms1","qlt_ms2")) ind = which(sorder %in% c("qlt_ms1","qlt_ms2"))
ord[ind] <- -1L ord[ind] = -1L
} }
if (length(sorder)>0) setorderv(res,cols=sorder,order=ord) if (length(sorder)>0) setorderv(res,cols=sorder,order=ord)
setnames(res,old="rt",new="rt(ms1)")
## Remove confusing columns.
res[,c("rt","mz"):=NULL]
res[,c("qlt_ms1","qlt_ms2"):=.(signif(100*qlt_ms1/10.,3),signif(100*qlt_ms2/10.,3))]
setnames(res,c("qlt_ms1","qlt_ms2"),c("best score (ms1)","best score (ms2)"))
res res
} }
cindex_from_input <- function(clabs,sort_catg=character(4),summ) { cindex_from_input <- function(clabs,sort_catg=character(4),summ) {
grp <- if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY grp = if (isTruthy(clabs)) setdiff(CINDEX_BY,clabs) else CINDEX_BY
sorder <- setdiff(sort_catg,clabs) sorder = setdiff(sort_catg,clabs)
gen_cindex(summ,sorder=sorder,by=grp) gen_cindex(summ,sorder=sorder,by=grp)
} }
get_cindex_key <- function(cindex) { get_cindex_key <- function(cindex) {
## Select only valid category names. ## Select only valid category names.
x <- which(CINDEX_BY %in% names(cindex)) x = which(CINDEX_BY %in% names(cindex))
CINDEX_BY[x] CINDEX_BY[x]
} }
get_cindex_parents <- function(summ,ckey,kvals,labs) { get_cindex_parents <- function(summ,ckey,kvals,labs) {
## Get kvals part of summ. ## Get kvals part of summ.
tab <- summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals) tab = summ[(kvals),on=names(kvals)][,unique(.SD),.SDcols=labs,by=ckey] #get_data_from_key(summ,kvals)
tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcol=labs] tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcol=labs]
keys <- names(tab)[names(tab)!="item"] keys = names(tab)[names(tab)!="item"]
data.table::setkeyv(tab,keys) data.table::setkeyv(tab,keys)
tab tab
} }
...@@ -550,28 +552,28 @@ get_cindex_kval <- function(cindex,row,key) { ...@@ -550,28 +552,28 @@ get_cindex_kval <- function(cindex,row,key) {
## Accounting for not fully initialised state. ## Accounting for not fully initialised state.
if (!is.numeric(row) || is.na(row) || length(key)==0L || is.na(key) || NROW(cindex)==0L) return(NULL) if (!is.numeric(row) || is.na(row) || length(key)==0L || is.na(key) || NROW(cindex)==0L) return(NULL)
rowtab <- cindex[(row),..key] rowtab = cindex[(row),..key]
res <- lapply(rowtab,function (x) x[[1]]) res = lapply(rowtab,function (x) x[[1]])
names(res) <- key names(res) = key
res res
} }
get_summ_subset <- function(summ,ptab,paritem,kvals) { get_summ_subset <- function(db,summ,ptab,paritem,kvals) {
select <- ptab[item==(paritem)] select = ptab[item==(paritem)]
tab <- get_data_from_key(summ,kvals)[select,nomatch=NULL,on=key(ptab)] tab = get_data_from_key(db=db,tab=summ,kvals=kvals)[select,nomatch=NULL,on=key(ptab)]
if ("an.1" %in% names(tab)) tab[,an.1:=NULL] #TODO: This is if ("scan.1" %in% names(tab)) tab[,scan.1:=NULL] #TODO: This is
#probably a lousy #probably a lousy
#hack. #hack.
tab tab
} }
get_ltab <- function(summ_subs,cols=c("an","ms2_rt")) { get_ltab <- function(summ_subs,cols=c("scan","ms2_rt")) {
tab <- summ_subs tab = summ_subs
if (NROW(tab)==1L && is.na(tab$an)) return(data.table::data.table(item=character())) if (NROW(tab)==1L && is.na(tab$an)) return(data.table::data.table(item=character()))
tab[is.na(ms2_sel),ms2_sel:=F] #TODO FIXME: Check why NAs exist at all? tab[is.na(ms2_sel),ms2_sel:=F]
tab[,passval:=fifelse(qa_pass==T,"OK","BAD")] tab[,passval:=fifelse(qa_pass==T,"OK","BAD")]
tab[ms2_sel==T,passval:="SELECTED"] tab[ms2_sel==T,passval:="SELECTED"]
res <- tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")] res = tab[,item:=do.call(paste,c(.SD,list(sep=";"))),.SDcols=c(cols,"passval")]
data.table::setkey(res,"ms2_rt") data.table::setkey(res,"ms2_rt")
res res
} }
...@@ -585,18 +587,18 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) { ...@@ -585,18 +587,18 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
n_ms2_sel = input$chg_ms2sel n_ms2_sel = input$chg_ms2sel
sel_par <- input$sel_parent_trace sel_par = input$sel_parent_trace
sel_spec <- input$sel_spec sel_spec = input$sel_spec
pkvals <- ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))] pkvals = ptab[item==(sel_par),.SD,.SDcols=intersect(SUMM_KEY,names(ptab))]
lkvals <- ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))] lkvals = ltab[item==(sel_spec),.SD,.SDcols=intersect(SUMM_KEY,names(ltab))]
kvals <- c(as.list(pkvals),as.list(lkvals)) kvals = c(as.list(pkvals),as.list(lkvals))
kvals <- kvals[unique(names(kvals))] kvals = kvals[unique(names(kvals))]
if ('an' %in% names(kvals) && n_ms2_sel) { if ('an' %in% names(kvals) && n_ms2_sel) {
rkvals <- kvals[!(names(kvals) %in% 'an')] rkvals = kvals[!(names(kvals) %in% 'an')]
rktab <- tabkey(summ,kvals=rkvals) rktab = tabkey(summ,kvals=rkvals)
tabsel <- summ[rktab,.(an,ms2_sel)] tabsel = summ[rktab,.(scan,ms2_sel)]
ansel <- tabsel[ms2_sel == T,an] ansel = tabsel[ms2_sel == T,scan]
print('ansel') print('ansel')
print(ansel) print(ansel)
if (length(ansel)!=0) { if (length(ansel)!=0) {
...@@ -607,29 +609,29 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) { ...@@ -607,29 +609,29 @@ update_on_commit_chg <- function(summ,input,ptab,ltab) {
} }
tgts <- c("ms1_rt","ms1_int",names(n_qa),"ms2_sel") tgts = c("ms1_rt","ms1_int",names(n_qa),"ms2_sel")
srcs <- c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel)) srcs = c(list(n_ms1_rt,n_ms1_int),as.list(n_qa),as.list(n_ms2_sel))
the_row <- tabkey(summ,kvals=kvals) the_row = tabkey(summ,kvals=kvals)
summ[the_row,(tgts):=..srcs] summ[the_row,(tgts):=..srcs]
summ[,an.1:=NULL] #FIXME: an.1 pops up somewhere. summ[,scan.1:=NULL]
qflg <- QA_FLAGS[!(QA_FLAGS %in% "qa_pass")] qflg = QA_FLAGS[!(QA_FLAGS %in% "qa_pass")]
summ[the_row,qa_pass:=apply(.SD,1,all),.SDcols=qflg] summ[the_row,qa_pass:=apply(.SD,1,all),.SDcols=qflg]
summ summ
} }
get_mprop_ms2_metadata <- function(ltab_entry) { get_mprop_ms2_metadata <- function(ltab_entry) {
res <- list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F) res = list(rt=NA_real_,int=NA_real_,qa=character(0),ms2_sel=F)
if (NROW(ltab_entry)==0L) return(res) if (NROW(ltab_entry)==0L) return(res)
res$rt = ltab_entry$ms1_rt res$rt = ltab_entry$ms1_rt
res$int = ltab_entry$ms1_int res$int = ltab_entry$ms1_int
z <- ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")] z = ltab_entry[.SD,.SDcols=patterns("qa_ms[12].*")]
lqa_vals <- as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")]) lqa_vals = as.list(ltab_entry[,.SD,.SDcols=patterns("qa_ms[12].*")])
qa_names <- names(lqa_vals) qa_names = names(lqa_vals)
res$qa <- qa_names[as.logical(lqa_vals)] res$qa = qa_names[as.logical(lqa_vals)]
res$ms2_sel = ltab_entry$ms2_sel res$ms2_sel = ltab_entry$ms2_sel
res res
......
...@@ -564,7 +564,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -564,7 +564,7 @@ mk_shinyscreen_server <- function(projects,init) {
res = if (NROW(sel_dt)>0) { res = if (NROW(sel_dt)>0) {
tab[sel_dt,..coln,on=c("adduct","tag","ID","an")] tab[sel_dt,..coln,on=c("adduct","tag","ID","scan")]
} else triv } else triv
data.table::setnames(res,"intensity","ms2_int") data.table::setnames(res,"intensity","ms2_int")
res res
...@@ -700,8 +700,6 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -700,8 +700,6 @@ mk_shinyscreen_server <- function(projects,init) {
## REACTIVE FUNCTIONS: COMPOUND INDEX ## REACTIVE FUNCTIONS: COMPOUND INDEX
rf_get_cindex <- reactive({ rf_get_cindex <- reactive({
## TODO: FIXME: Uncomment after rearranging everything.
## input$cmt_changes_b
rvs$status$is_qa_stat rvs$status$is_qa_stat
s1 = input$sort1 s1 = input$sort1
s2 = input$sort2 s2 = input$sort2
...@@ -741,8 +739,8 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -741,8 +739,8 @@ mk_shinyscreen_server <- function(projects,init) {
rf_get_cindex_parents <- reactive({ rf_get_cindex_parents <- reactive({
rvs$m rvs$m
isolate({ isolate({
ms1 = rvs$m$extr$ms1 ms1 = rvs$m$db$extr$cgm$ms1
ms2 = rvs$m$extr$ms2 ms2 = rvs$m$db$extr$cgm$ms2
summ = req(rvs$m$out$tab$summ) summ = req(rvs$m$out$tab$summ)
}) })
...@@ -760,7 +758,8 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -760,7 +758,8 @@ mk_shinyscreen_server <- function(projects,init) {
kvals = req(rf_get_cindex_kval()) kvals = req(rf_get_cindex_kval())
ptab = rf_get_cindex_parents() ptab = rf_get_cindex_parents()
if (isTruthy(parent)) { if (isTruthy(parent)) {
get_summ_subset(summ=summ, get_summ_subset(db=rvs$m$db,
summ=summ,
ptab=ptab, ptab=ptab,
paritem=parent, paritem=parent,
kvals=kvals) kvals=kvals)
...@@ -813,20 +812,23 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -813,20 +812,23 @@ mk_shinyscreen_server <- function(projects,init) {
rf_plot_eic_ms1 <- reactive({ rf_plot_eic_ms1 <- reactive({
isolate({ isolate({
ms1 = rvs$m$extr$ms1 ms1 = rvs$m$db$extr$cgm$ms1
summ = rvs$m$out$tab$summ summ = rvs$m$out$tab$summ
db = rvs$m$db
}) })
req(NROW(summ)>0L) req(NROW(summ)>0L)
req(NROW(ms1)>0L) req(NROW(ms1)>0L)
p = make_eic_ms1_plot(ms1,summ,kvals=rf_get_cindex_kval(), p = make_eic_ms1_plot(db=db,
labs=rf_get_cindex_labs(), summ,
asp=PLOT_EIC_ASPECT, kvals=rf_get_cindex_kval(),
rt_range=rf_get_rtrange(), labs=rf_get_cindex_labs(),
i_range=rf_get_irange(), asp=PLOT_EIC_ASPECT,
colrdata = rf_colrdata()) rt_range=rf_get_rtrange(),
i_range=rf_get_irange(),
colrdata = rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot") p = if (!is.null(p)) p else empty_plot("Nothing to plot")
...@@ -851,12 +853,13 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -851,12 +853,13 @@ mk_shinyscreen_server <- function(projects,init) {
gg = rf_plot_eic_ms1() gg = rf_plot_eic_ms1()
rt_rng = range(gg$data$rt) rt_rng = range(gg$data$rt)
p = make_eic_ms2_plot(summ, p = make_eic_ms2_plot(rvs$m$db,
kvals=rf_get_cindex_kval(), summ,
labs=rf_get_cindex_labs(), kvals=rf_get_cindex_kval(),
rt_range = rf_get_ms2_eic_rtrange(), labs=rf_get_cindex_labs(),
asp=PLOT_EIC_ASPECT, rt_range = rf_get_ms2_eic_rtrange(),
colrdata=rf_colrdata()) asp=PLOT_EIC_ASPECT,
colrdata=rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot") p = if (!is.null(p)) p else empty_plot("Nothing to plot")
...@@ -878,15 +881,15 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -878,15 +881,15 @@ mk_shinyscreen_server <- function(projects,init) {
rf_plot_spec_ms2 <- reactive({ rf_plot_spec_ms2 <- reactive({
isolate({ isolate({
summ = rvs$m$out$tab$summ summ = rvs$m$out$tab$summ
ms2 = rvs$m$extr$ms2 ms2 = rvs$m$db$extr$cgm$ms2
}) })
req(NROW(summ)>0L) req(NROW(summ)>0L)
req(NROW(ms2)>0L) req(NROW(ms2)>0L)
p = make_spec_ms2_plot(ms2, p = make_spec_ms2_plot(db = rvs$m$db,
summ, summ,
kvals=req(rf_get_cindex_kval()), kvals=req(rf_get_cindex_kval()),
labs=req(rf_get_cindex_labs()), labs=req(rf_get_cindex_labs()),
colrdata=rf_colrdata()) colrdata=rf_colrdata())
p = if (!is.null(p)) p else empty_plot("Nothing to plot") p = if (!is.null(p)) p else empty_plot("Nothing to plot")
p p
...@@ -905,29 +908,35 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -905,29 +908,35 @@ mk_shinyscreen_server <- function(projects,init) {
choices = list.files(path=top_data_dir, choices = list.files(path=top_data_dir,
pattern = CMPD_LIST_PATT)) pattern = CMPD_LIST_PATT))
updateSelectInput(session = session,
inputId = "dfile_list",
choices = list.files(path=top_data_dir,
pattern = DFILES_LIST_PATT))
updateSelectInput(session = session,
inputId = "top_data_dir_list",
selected = basename(top_data_dir),
choices = list.dirs(path = init$envopts$top_data_dir,
full.names = F,
recursive = F))
}) })
observe({ observe({
top_data_dir = rvs$gui$paths$data data_dir = rvs$gui$paths$data
req(isTruthy(top_data_dir) && dir.exists(top_data_dir)) if ((isTruthy(data_dir) && dir.exists(data_dir))) {
dfchoices = list.files(path=data_dir,
pattern = DFILES_LIST_PATT)
updateSelectInput(session = session,
inputId = "dfile_list",
choices = dfchoices)
updateSelectInput(session = session,
inputId = "top_data_dir_list",
selected = basename(rvs$gui$paths$data),
choices = list.dirs(path = init$envopts$top_data_dir,
full.names = F,
recursive = F))
}
updateSelectInput(session = session,
inputId = "dfile_list",
choices = list.files(path=top_data_dir,
pattern = DFILES_LIST_PATT))
}) })
## Update projects and data directories every second. ## Update projects and data directories every second.
observeEvent(rtimer1000(),{ observeEvent(rtimer1000(),{
...@@ -981,13 +990,15 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -981,13 +990,15 @@ mk_shinyscreen_server <- function(projects,init) {
rvs$status$ms2_int_thresh_stat = rvs$m$conf$prescreen[["ms2_int_thresh"]] rvs$status$ms2_int_thresh_stat = rvs$m$conf$prescreen[["ms2_int_thresh"]]
rvs$status$s2n_stat = rvs$m$conf$prescreen[["s2n"]] rvs$status$s2n_stat = rvs$m$conf$prescreen[["s2n"]]
rvs$status$ret_time_shift_tol_stat = rvs$m$conf$prescreen[["ret_time_shift_tol"]] rvs$status$ret_time_shift_tol_stat = rvs$m$conf$prescreen[["ret_time_shift_tol"]]
if (NROW(m$extr$ms1)>0L) rvs$status$is_extracted_stat = "Yes." if (NROW(m$db$extr$cgm$ms1)>0L) rvs$status$is_extracted_stat = "Yes."
if (NROW(m$out$tab$summ)>0L) rvs$status$is_qa_stat = "Yes." if (NROW(m$out$tab$summ)>0L) rvs$status$is_qa_stat = "Yes."
} else { } else {
message("Initialising project: ",wd) message("Initialising project: ",wd)
rvs$gui = create_gui(project_path=fullwd) rvs$gui = create_gui(project_path=fullwd)
} }
message("project: ",rvs$gui$project()) message("project: ",rvs$gui$project())
}, label = "project-b") }, label = "project-b")
...@@ -1017,10 +1028,11 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1017,10 +1028,11 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$sel_data_dir_b,{ observeEvent(input$sel_data_dir_b,{
data_dir = input$top_data_dir_list data_dir = input$top_data_dir_list
req(isTruthy(data_dir)) if (isTruthy(data_dir)) {
rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir) rvs$gui$paths$data = file.path(init$envopts$top_data_dir, data_dir)
message("Selected data dir:",rvs$gui$paths$data)
message("Selected data dir:",rvs$gui$paths$data) }
}) })
...@@ -1045,32 +1057,9 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1045,32 +1057,9 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$datafiles_b,{ observeEvent(input$datafiles_b,{
new_file = input$dfile_list new_file = input$dfile_list
if (isTruthy(new_file)) { if (isTruthy(new_file)) rvs$gui$filetag = filetag_add_file(rvs$gui$filetag,
rvs$gui$filetag = filetag_add_file(rvs$gui$filetag, new_file)
new_file)
## curr_file = rvs$gui$datatab$file
## curr_tag = rvs$gui$datatab$tag
## curr_adduct = rvs$gui$datatab$adduct
## curr_set = rvs$gui$datatab$set
## res_file = union(curr_file,new_file)
## res_adduct = c(curr_adduct,rep(NA_character_,nd))
## res_set = c(curr_set,rep(NA_character_,nd))
## rvs$gui$datatab$file = res_file
## rvs$gui$datatab$tag = add_new_def_tag(as.character(rvs$gui$datatab$tag),nd)
## rvs$gui$datatab$adduct = res_adduct
## rvs$gui$datatab$set = res_set
}
updateSelectInput(session=session,
inputId="dfile_list",
selected=NULL)
}) })
observeEvent(input$rem_dfiles_b,{ observeEvent(input$rem_dfiles_b,{
...@@ -1162,7 +1151,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1162,7 +1151,7 @@ mk_shinyscreen_server <- function(projects,init) {
rv_extr_flag(F) rv_extr_flag(F)
rvs$m = run(m=rvs$m, rvs$m = run(m=rvs$m,
envopts=init$envopts, envopts=init$envopts,
phases=c("setup","comptab","extract")) phases=c("setup","comptab","db","extract"))
rvs$status$is_extracted_stat = "Yes." rvs$status$is_extracted_stat = "Yes."
rvs$status$is_qa_stat = "No." rvs$status$is_qa_stat = "No."
fn_c_state = file.path(rvs$m$run$paths$project, fn_c_state = file.path(rvs$m$run$paths$project,
...@@ -1178,7 +1167,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1178,7 +1167,7 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$presc_b,{ observeEvent(input$presc_b,{
if (NROW(rvs$m$extr$ms1)>0L) { if (NROW(rvs$m$db$extr$cgm$ms1)>0L) {
## Update just prescreening conf. ## Update just prescreening conf.
rvs$m = app_update_conf(input=input, rvs$m = app_update_conf(input=input,
gui=rvs$gui, gui=rvs$gui,
...@@ -1310,9 +1299,10 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1310,9 +1299,10 @@ mk_shinyscreen_server <- function(projects,init) {
observeEvent(input$make_report_b,{ observeEvent(input$make_report_b,{
isolate({ isolate({
ms1 = rvs$m$extr$ms1 ms1 = rvs$m$db$extr$cgm$ms1
ms2 = rvs$m$extr$ms2 ms2 = rvs$m$db$extr$cgm$ms2
summ = rvs$m$out$tab$summ summ = rvs$m$out$tab$summ
db = rvs$m$db
}) })
req(NROW(summ)>0L) req(NROW(summ)>0L)
...@@ -1334,18 +1324,21 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1334,18 +1324,21 @@ mk_shinyscreen_server <- function(projects,init) {
names(kvals) = key names(kvals) = key
message('Compound index row: ',ri) message('Compound index row: ',ri)
p1 = make_eic_ms1_plot(ms1,summ,kvals=kvals, p1 = make_eic_ms1_plot(db=db,
labs=labs, summ,
asp=PLOT_EIC_ASPECT, kvals=kvals,
rt_range=rt_range, labs=labs,
i_range=i_range, asp=PLOT_EIC_ASPECT,
colrdata = colrdata) + theme_print() rt_range=rt_range,
i_range=i_range,
p2 = make_eic_ms2_plot(summ,kvals=kvals, colrdata = colrdata) + theme_print()
labs=labs,
asp=PLOT_EIC_ASPECT, p2 = make_eic_ms2_plot(db=db,
rt_range=rt_range, summ,kvals=kvals,
colrdata = colrdata) + theme_print() labs=labs,
asp=PLOT_EIC_ASPECT,
rt_range=rt_range,
colrdata = colrdata) + theme_print()
...@@ -1353,11 +1346,11 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1353,11 +1346,11 @@ mk_shinyscreen_server <- function(projects,init) {
smi = rvs$m$out$tab$comp[ID==(id),SMILES][[1]] smi = rvs$m$out$tab$comp[ID==(id),SMILES][[1]]
p_struc = make_struct_plot(smi) p_struc = make_struct_plot(smi)
p_spec = make_spec_ms2_plot(ms2, p_spec = make_spec_ms2_plot(db=db,
summ, summ,
kvals=kvals, kvals=kvals,
labs=labs, labs=labs,
colrdata=colrdata)+theme_print() colrdata=colrdata)+theme_print()
cmb = combine_plots(p1,p2,p_spec,p_struc) cmb = combine_plots(p1,p2,p_spec,p_struc)
print(cmb) print(cmb)
...@@ -1377,7 +1370,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1377,7 +1370,7 @@ mk_shinyscreen_server <- function(projects,init) {
fn = file.path(projdir,input$ms2_spectra_tab_name) fn = file.path(projdir,input$ms2_spectra_tab_name)
shinymsg(paste0("Saving MS2 spectra table to: ",basename(fn))) shinymsg(paste0("Saving MS2 spectra table to: ",basename(fn)))
tab2file(pack_ms2_w_summ(rvs$m$out$tab$summ, tab2file(pack_ms2_w_summ(rvs$m$out$tab$summ,
rvs$m$extr$ms2), rvs$m$db$extr$spectra),
fn) fn)
shinymsg("Done saving MS2 spectra table.") shinymsg("Done saving MS2 spectra table.")
}) })
...@@ -1588,7 +1581,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1588,7 +1581,7 @@ mk_shinyscreen_server <- function(projects,init) {
}) })
output$comp_table = DT::renderDataTable({ output$comp_table = DT::renderDataTable({
## TODO FIXME ## TODO
## cmpds = rf_get_cmpd_tab() ## cmpds = rf_get_cmpd_tab()
## validate(need(NROW(cmpds)>0,"No compound list loaded yet.")) ## validate(need(NROW(cmpds)>0,"No compound list loaded yet."))
## DT::datatable(cmpds, ## DT::datatable(cmpds,
...@@ -1626,8 +1619,6 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1626,8 +1619,6 @@ mk_shinyscreen_server <- function(projects,init) {
m=rvs$m) m=rvs$m)
kv = rf_get_cindex_kval() kv = rf_get_cindex_kval()
## Some cols that might be needed to be specified
## explicitely. FIXME TODO: Make this more robust.
nsumm = mf_narrow_summ(rvs$m$out$tab$summ,kv, nsumm = mf_narrow_summ(rvs$m$out$tab$summ,kv,
ms2_rt_i=input$mf_entry_rt_min, ms2_rt_i=input$mf_entry_rt_min,
ms2_rt_f=input$mf_entry_rt_max) ms2_rt_f=input$mf_entry_rt_max)
...@@ -1639,7 +1630,8 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1639,7 +1630,8 @@ mk_shinyscreen_server <- function(projects,init) {
path = rvs$m$run$metfrag$path, path = rvs$m$run$metfrag$path,
subpaths = rvs$m$run$metfrag$subpaths, subpaths = rvs$m$run$metfrag$subpaths,
db_file = rvs$m$run$metfrag$db_file, db_file = rvs$m$run$metfrag$db_file,
stag_tab = stagtab, ms2 = rvs$m$extr$ms2, stag_tab = stagtab,
ms2 = rvs$m$db$extr$spectra,
runtime=rvs$m$run$metfrag$runtime, runtime=rvs$m$run$metfrag$runtime,
java_bin=rvs$m$run$metfrag$java_bin, java_bin=rvs$m$run$metfrag$java_bin,
nproc = rvs$m$conf$metfrag$nproc) nproc = rvs$m$conf$metfrag$nproc)
...@@ -1768,7 +1760,7 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1768,7 +1760,7 @@ mk_shinyscreen_server <- function(projects,init) {
selMS2 = req(input$sel_spec) selMS2 = req(input$sel_spec)
if (NROW(ms2tabsel)!=0L) { if (NROW(ms2tabsel)!=0L) {
lval = lapply(ms2tabsel[item==(selMS2)],function(x) x) lval = lapply(ms2tabsel[item==(selMS2)],function(x) x)
ms2 = rvs$m$extr$ms2 ms2 = rvs$m$db$extr$cgm$ms2
kval = rf_get_cindex_kval() kval = rf_get_cindex_kval()
allval = c(kval,lval) allval = c(kval,lval)
## There can be some duplicates. ## There can be some duplicates.
...@@ -1778,10 +1770,11 @@ mk_shinyscreen_server <- function(projects,init) { ...@@ -1778,10 +1770,11 @@ mk_shinyscreen_server <- function(projects,init) {
#more than the names existing in extr$ms2. Also, #more than the names existing in extr$ms2. Also,
#BASE_KEY_MS2 does not contain `an', so we need to readd #BASE_KEY_MS2 does not contain `an', so we need to readd
#it. #it.
key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"an")) key = unique(c(names(allval)[names(allval) %in% BASE_KEY_MS2],"scan"))
kval2 = allval[key] kval2 = allval[key]
spec = get_data_from_key(ms2,kval2)[,.(mz,intensity)] ## Only precid and scan can be used for selection in spectra.
## as.character(lapply(1L:NROW(spec),function(nr) paste0(spec[nr,mz]," ",spec[nr,intensity]))) kval2 = as.data.table(kval2[c("precid","scan")])
spec = rvs$m$db$extr$spectra[kval2,.(mz,intensity),on=.(precid,scan)]
print(as.data.frame(spec),row.names=F) print(as.data.frame(spec),row.names=F)
} else { } else {
......
...@@ -37,13 +37,12 @@ new_state <- function() { ...@@ -37,13 +37,12 @@ new_state <- function() {
runtime_from_conf <- function(run,envopts,conf) { runtime_from_conf <- function(run,envopts,conf) {
lst_cmpl <- conf$compounds$lists lst_cmpl <- conf$compounds$lists
lst_fn_cmpl <- lapply(names(lst_cmpl),function (nm) { lst_fn_cmpl <- lapply(lst_cmpl,function (lst) {
bfn_cmpl <- lst_cmpl[[nm]] bfn_cmpl <- lst
fn <- file.path(run$paths$project,bfn_cmpl) fn <- file.path(run$paths$project,bfn_cmpl)
if (!file.exists(fn)) stop("File ", fn, " does not exist in ", run$paths$project," .") if (!file.exists(fn)) stop("File ", fn, " does not exist in ", run$paths$project," .")
fn fn
}) })
names(lst_fn_cmpl) <- names(lst_cmpl)
run$paths$compounds$lists <- lst_fn_cmpl run$paths$compounds$lists <- lst_fn_cmpl
run$paths$data = norm_path(file.path(envopts$top_data_dir,conf$paths$data)) run$paths$data = norm_path(file.path(envopts$top_data_dir,conf$paths$data))
...@@ -95,7 +94,7 @@ new_runtime_state <- function(project,envopts,conf=NULL) { ...@@ -95,7 +94,7 @@ new_runtime_state <- function(project,envopts,conf=NULL) {
## If MetFrag possible, set up the file structure. ## If MetFrag possible, set up the file structure.
if (metfrag$cando_metfrag) { if (metfrag$cando_metfrag) {
mfdir = file.path(project_path,"metfrag") mfdir = file.path(project_path,"metfrag")
dir.create(mfdir) dir.create(mfdir, showWarnings = F)
metfrag$path=mfdir metfrag$path=mfdir
subpaths = list(results = "results", subpaths = list(results = "results",
config = "config", config = "config",
...@@ -204,7 +203,7 @@ new_project <- function(project,envopts,datatab=NULL,conf=NULL) { ...@@ -204,7 +203,7 @@ new_project <- function(project,envopts,datatab=NULL,conf=NULL) {
check_file_absent(fn_conf,what="conf-file") check_file_absent(fn_conf,what="conf-file")
yaml::yaml.load_file(fn_conf) yaml::yaml.load_file(fn_conf)
} else conf } else conf
m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists) ## m$conf$compounds$lists = label_cmpd_lists(m$conf$compounds$lists)
m$run = new_runtime_state(project,envopts=envopts,conf=m$conf) m$run = new_runtime_state(project,envopts=envopts,conf=m$conf)
if (!is.null(datatab)) { if (!is.null(datatab)) {
m$input$tab$mzml = datatab m$input$tab$mzml = datatab
...@@ -409,9 +408,11 @@ pack_ms2_w_summ <- function(summ,ms2) { ...@@ -409,9 +408,11 @@ pack_ms2_w_summ <- function(summ,ms2) {
## Takes summ, finds entries with high quality spectra and subsets ms2 based on that. ## Takes summ, finds entries with high quality spectra and subsets ms2 based on that.
## Take the columns we need from summ. ## Take the columns we need from summ.
x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Formula","Name")] x = summ[ms2_sel==T,.SD,.SDcols=c(key(summ),"mz","SMILES","Name")]
mrg_keys = c(intersect(key(ms2),key(summ)),"an") mrg_keys = intersect(key(ms2),key(summ))
ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI] mrg_keys = c(mrg_keys,"scan")
## ms2[x,.(mz=i.mz,ms2_spectrum=encode_ms2_to_line(.SD[,c("mz","intensity")])),on=mrg_keys,by=.EACHI]
ms2[x,.(ion_mz=i.mz,mz,intensity),on=mrg_keys,by=.EACHI]
} }
...@@ -431,3 +432,40 @@ pack_project <- function(m,fn_arch) { ...@@ -431,3 +432,40 @@ pack_project <- function(m,fn_arch) {
}) })
fn_arch fn_arch
} }
join_compound_lists <- function(fname) {
coll <- list()
fields <- colnames(EMPTY_CMPD_LIST)
coltypes <- c(ID="character",
SMILES="character",
Formula="character",
Name="character",
RT="numeric",
mz="numeric")
l=0L
for (fn in fname) {
l = l + 1L
## Figure out column headers.
nms <- colnames(file2tab(fn,nrows=0))
## Read the table. Knowing column headers prevents unnecessary
## warnings.
dt <- file2tab(fn, colClasses=coltypes[nms])
verify_cmpd_l(dt=dt,fn=fn)
# nonexist <- setdiff(fnfields,fields)
coll[[l]] <- dt #if (length(nonexist)==0) dt else dt[,(nonexist) := NULL]
coll[[l]]$ORIG <- fn
}
if (length(fname)>0) rbindlist(l=c(list(EMPTY_CMPD_LIST), coll), use.names = T, fill = T) else EMPTY_CMPD_LIST
}
process_cmpd_sets <- function(cmpdlist) {
if (nrow(cmpdlist)==0L) return(EMPTY_CMPD_LIST)
## Process sets.
if (! ("set" %in% colnames(cmpdlist))) cmpdlist$set=NA_character_ else cmpdlist[,set:=as.character(set)]
slugs = cmpdlist[,.(slug=gen_fname_slug(ORIG)),by="ORIG"]
slugs[,slug:=uniqy_slugs(slug)]
cmpdlist[slugs,set:=fifelse(is.na(set),i.slug,set),on="ORIG"]
cmpdlist
}