Todor Kondic authored
The Shinyscreen Package
Overview
The Shinyscreen R package is an application intended to give the user a first look into raw mass-spectrometry data. Currently, this means that, given data files and a list of masses of known or unknown compounds, the application produces the MS1 and MS2 chromatograms of the substances in the list, as well as the MS2 spectra. None of these features are post-processed in any way. However, a built-in prescreening aid helps the user assess the quality of the spectra.
The application is powered by the MSnbase package and built as a Shiny web application.
Installation
The Worst Case Scenario
The major issue that users, especially those on Windows, will often experience is the conflict between the x64 and x32 architectures of Java installations and the way they interact with many R package installation methods and their unfortunate default settings.
It seems that the best way to overcome the issue is to consistently install packages for a single architecture only. Various R install routines do not really help with this.
Steps for x64
- Download and install the latest Java runtime. We tested the most recent installation method with OpenJDK Java. During the installation, you will be presented with two options that are disabled by default: JAVA_HOME and the registry key entry. While the installation may work with both left unchecked, followed by fiddling with the environment variables in R itself, it seems easier to just switch both JAVA_HOME and the registry key on. As a side note, the procedure should work for the proprietary Oracle Java variant as well, but this was not very thoroughly tested.
- Download and install R and RStudio. After running RStudio, make sure that the R session it uses is indeed x64. Which session is being used can be discovered in the Tools>Global Options>General menu (R Sessions).
- During the procedure, the installation of some packages may be prevented by mostly harmless warnings. This is why the first order of business is to prevent this from happening.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
- From now on, it is essential to pass the `INSTALL_opts="--no-multiarch"` keyword argument to any install method which will be called subsequently.
- Get devtools and BiocManager.
install.packages(c("devtools","BiocManager"),INSTALL_opts="--no-multiarch")
- Get rJava, rcdk and rcdklibs.
install.packages(c("rJava","rcdk","rcdklibs"),INSTALL_opts = "--no-multiarch")
- Get RMassBank.
BiocManager::install("RMassBank", INSTALL_opts = "--no-multiarch")
- Get rsvg and enviPat.
install.packages(c("rsvg","enviPat"), INSTALL_opts = "--no-multiarch")
- Now, the big challenge - install RChemMass.
devtools::install_github("schymane/RChemMass", dependencies = FALSE, INSTALL_opts = "--no-multiarch")
- If the previous step worked, it only remains to install Shinyscreen.
devtools::install_url("https://git-r3lab.uni.lu/eci/shinyscreen/-/archive/master/shinyscreen-master.tar.gz", INSTALL_opts="--no-multiarch")
That’s it! Not exactly a piece of cake.
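Since the whole procedure depends on the R session being x64, it may be worth confirming this from within R before installing anything. The following is a minimal sketch using only base R; the helper name `is_64bit_session` is made up for this illustration.

```r
# Hypothetical helper: TRUE when the running R session is a 64-bit build.
# A 64-bit build uses 8-byte pointers.
is_64bit_session <- function() {
  .Machine$sizeof.pointer == 8
}

# Print the architecture string of the running R build, and the check result.
message("R architecture: ", R.version$arch)
message("64-bit session: ", is_64bit_session())
```

If the check reports a 32-bit session, adjust the R version in the RStudio options mentioned above before going through the installation steps.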
Problems that can arise
- rJava cannot be loaded because of some DLL. Either you did not follow the "--no-multiarch" rule, or you have a conflicting JAVA_HOME, maybe even conflicting Java registry settings. Try playing around with the JAVA_HOME environment variable: for example, try setting it either to "", or to the path of your Java JRE installation (somewhere inside Program Files).
- If nothing helps, try the Previously Recommended Method.
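Playing around with JAVA_HOME can be done from within the R session itself. The sketch below uses only base R functions; the helper `set_java_home` is hypothetical and the Java path in the last comment is a placeholder that has to be replaced with the actual location on your system.

```r
# Show the JAVA_HOME seen by this R session ("" means it is not set).
Sys.getenv("JAVA_HOME")

# Hypothetical helper: point JAVA_HOME at a directory, but only if it exists.
set_java_home <- function(path) {
  if (!dir.exists(path)) {
    stop("Directory does not exist: ", path)
  }
  Sys.setenv(JAVA_HOME = path)
  invisible(Sys.getenv("JAVA_HOME"))
}

# Option 1: set the variable to "" for this session.
Sys.setenv(JAVA_HOME = "")

# Option 2: point it at your Java installation (placeholder path).
# set_java_home("C:/Program Files/Java/<your-java-directory>")
```

The change only lasts for the current R session, so it is a cheap way to try out both options before loading rJava again.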
Steps for x32
Not tested, but the procedure should be exactly as outlined above, except that it is necessary to install the 32-bit Java runtime and make sure RStudio loads a 32-bit R session.
Previously Recommended Method
This was only tested successfully with Oracle’s JRE (please let us know if you manage to achieve the same results with OpenJDK). The steps are the same as above, with the following exceptions.
- Ensure that both the 32-bit and 64-bit versions of Java are available on 64-bit systems. In case of Windows, check in `C:\Program Files\Java` and `C:\Program Files (x86)\Java`.
- Drop the INSTALL_opts keyword (as we now have two architectures in parallel).
- Detailed explanation of this method is given here.
Less Bad Scenario
You are on an OS with sane package dependency management (i.e. one of the GNU/Linux distributions). There is a reason why people put effort into packaging R software for the distribution (R package management sucks). Try to install as many dependencies as possible from your official distro channels, then fill the gaps using the standard R installation frameworks. One caveat here is that some distros that focus on stability (such as Debian stable, or various so-called LTS editions) may have outdated R versions. This might not play well with some dependencies that change on shorter time-scales. The solution is to keep your R installation fresh.
Good Scenario
You have Guix installed. Great. Just subscribe to ECI’s Guix channel and install from there.
Running Shinyscreen
Provided Shinyscreen is successfully installed, this snippet will run it.
library(shinyscreen)
PROJECT="project/location/somewhere/on/my/storage/device"
launch(projDir=PROJECT)
The `projDir` argument can be left out, in which case shinyscreen is going to assume that the project directory is the result of
## Get current working directory of R instance.
getwd()
So, what is the project directory? This is the place where shinyscreen state, log and output files go by default. In other words, if you produce some PDF plots, this is where they are going to end up.
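Because all state, log and output files end up in the project directory, it can be convenient to create it explicitly before launching. This is only a sketch: whether `launch` requires the directory to exist beforehand is an assumption, `prepare_project` is a hypothetical helper, and the temporary location is a placeholder.

```r
# Hypothetical helper: make sure the project directory exists, return its path.
prepare_project <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path, mustWork = TRUE)
}

# Placeholder location; replace with a directory of your choice.
PROJECT <- prepare_project(file.path(tempdir(), "my-project"))

# Launch only in an interactive session where the package is installed.
if (interactive() && requireNamespace("shinyscreen", quietly = TRUE)) {
  library(shinyscreen)
  launch(projDir = PROJECT)
}
```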
Usage
Before Starting
Compound Lists
The lists of known and unknown compounds contain different information and are treated differently. The application needs at least one, but can take both known and unknown lists as inputs. The formats of both lists are explained below.
Known Compounds List
- A table in a comma-separated (CSV) file.
- The column names are case-sensitive.
- Required headers:
- ID
- This is an integer compound identifier. This column must be filled and each ID entry must be unique. If both unknown and known lists are given, IDs from both lists must not overlap.
- SMILES
- The SMILES character string. Shinyscreen accepts
only MS-Ready SMILES. This column must be filled.
- Name
- The compound name. This column can be left empty.
- RT
- The retention time of the peak in minutes. This column can be left empty.
- Optional headers:
- mz
- m/z mass of the compound. If both SMILES and mz entries are present for a given compound, mz takes precedence.
"ID","Name","SMILES","RT"
33,"Isoproturon","CC(C)C1=CC=C(NC(=O)N(C)C)C=C1",19.6
717,"epsilon-Decalactone","CCCCC1CCCCC(=O)O1",
67,,"CCCCC1CCCCCC(=O)O1",
...,...,...,...
It is strongly suggested to quote all the character strings, such as SMILES and Name.
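The rules for the known compounds list can be checked before the file is handed to the application. The following is a sketch in base R, not part of Shinyscreen: the function `check_known_list` and its `other_ids` argument (the IDs of the other compound list, if there is one) are names invented for this example.

```r
# Hypothetical helper: check a known-compounds CSV against the rules above.
check_known_list <- function(path, other_ids = integer(0)) {
  # Read every column as text so that empty fields stay recognisable.
  df <- read.csv(path, colClasses = "character", check.names = FALSE)

  # Header names are case-sensitive.
  required <- c("ID", "SMILES", "Name", "RT")
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop("Missing required headers: ", paste(absent, collapse = ", "))
  }

  # IDs must be filled, integer and unique.
  if (!all(grepl("^[0-9]+$", df$ID))) stop("Every ID must be a filled-in integer.")
  if (anyDuplicated(df$ID) > 0) stop("IDs must be unique.")

  # SMILES must be filled; Name and RT may stay empty.
  if (any(is.na(df$SMILES) | df$SMILES == "")) stop("Every row needs a SMILES string.")

  df$ID <- as.integer(df$ID)
  df$RT <- as.numeric(df$RT)  # an empty RT becomes NA

  # IDs must not overlap with those of the other compound list.
  clash <- intersect(df$ID, as.integer(other_ids))
  if (length(clash) > 0) {
    stop("IDs also used in the other list: ", paste(clash, collapse = ", "))
  }
  df
}

# Usage (hypothetical file name):
# known <- check_known_list("known.csv", other_ids = c(22, 888))
```

The check does not verify that the SMILES strings are MS-Ready; it only catches the structural problems described in the list above.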
Unknown Compounds List
- A table in a comma-separated (CSV) file.
- Required headers:
- ID
- This is an integer compound identifier. This column must be filled and each ID entry must be unique. If both unknown and known lists are given, IDs from both lists must not overlap.
- mz
- m/z mass of the compound.
- RT
- The retention time of the peak in minutes. This column can be left empty.
"ID","mz","RT"
22,296.1160,
888,503.2816,
Compound Sets
Shinyscreen organises its data around the concept of compound sets. If, given a collection of data files, it is possible to break down the compounds into logical groups, shinyscreen will make it easier to navigate those groups if they are specified in a CSV list. In this case, the CSV file contains two columns: ID and set. The ID is the identifier of the compound from the compound list and set is the name of the set. If there is no sensible way of splitting the compounds into groups, it is enough to copy all the IDs from the compound list into a new CSV and use any character string to fill out the set column.
ID | set | RT |
---|---|---|
33 | mixA | |
717 | mixA | |
999 | mixA | |
… | … | |
129 | mixB | |
516 | mixB | |
… | … | |
333 | mixC | |
999 | mixC | |
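For the case where the compounds cannot be split into sensible groups, the set CSV can be generated from the compound IDs. This is a sketch under the assumption that the two columns named in the text (ID and set) are all that is needed; `make_single_set` and the output file name are made up for the example.

```r
# Hypothetical helper: put every compound ID into one and the same set.
make_single_set <- function(ids, set_name = "all") {
  data.frame(ID = ids, set = set_name, stringsAsFactors = FALSE)
}

# IDs as they might appear in a compound list.
sets <- make_single_set(c(33L, 717L, 67L), set_name = "mixA")

# Write the table to a CSV file (placeholder location and file name).
out <- file.path(tempdir(), "sets.csv")
write.csv(sets, out, row.names = FALSE)
```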
Data Files
These should be in mzML format.
Sets, Tags, Modes, Files and IDs
Each file is labelled by a tag, mode and set. Sets are defined in the compound set CSV file and group compounds according to their IDs. Modes correspond to the adducts. Tags label files in the plots.
For known compounds, each set can contain multiple modes. Sets of unknowns can only contain a single mode. Any files belonging to the same set that have been acquired in a single mode must carry unique tags.
In addition, the IDs of compounds belonging to the same set/mode combination must be unique. Different ID sets may overlap.
Essentially, sets serve the purpose of visually grouping files in the plots. A set also links those groups of files with a particular collection of compounds (from the compound set CSV file).
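The rule that tags must be unique within a set/mode combination can be illustrated on a small table. Everything in this sketch is hypothetical: the data frame layout, the column names, the file names and the adduct labels used for the modes are not taken from Shinyscreen itself.

```r
# Illustrative file table: one row per mzML file.
files <- data.frame(
  file = c("a_pos.mzML", "b_pos.mzML", "a_neg.mzML"),
  set  = c("mixA", "mixA", "mixA"),
  mode = c("[M+H]+", "[M+H]+", "[M-H]-"),
  tag  = c("rep1", "rep2", "rep1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose tag is repeated within the same set and mode.
find_tag_clashes <- function(files) {
  cols <- files[, c("set", "mode", "tag")]
  clash <- duplicated(cols) | duplicated(cols, fromLast = TRUE)
  files[clash, , drop = FALSE]
}

# An empty result means every set/mode combination has unique tags.
find_tag_clashes(files)
```

Note that the tag rep1 appears twice in the table without a clash, because the two files were acquired in different modes.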
Config Screen
This is the start tab. Import the compound and set lists first, then proceed to import the mzML files. Provide tags in the tag text box and then assign the sets, modes and tags to the imported mzML files using the table widget. Once this is done, move on to the Spectra Extraction tab.
Resetting State
In case some inputs have been changed, but the program for some reason does not seem to respond to those changes, resetting the state using the Reset State button may help. This will clear the current compound state tables (but all the inputs remain unchanged).
Switching Projects
The Switch project button can be used to start new projects, or to change between them in the middle of processing.
Switching projects while the program is running makes most sense if it is desired to change some of the inputs (e.g. different set configuration, or same compound lists but different files) while retaining the others. The user is presented with a directory change dialogue which is then used to select the new project directory. If needed, a new project directory can be created from the same dialogue. All the inputs that currently exist on the configuration tab will be kept during switching. This way, only what needs to be changed can be changed.
Spectra Extraction
Set the extraction parameters and then select a certain number of sets to scan for. This may take a while.
After one, or more sets have been extracted (once the status box gets checked), it is possible to carry out the auto quality check. This check is going to perform a rudimentary analysis of the spectra, as well as retrieve the retention times of the precursor peaks and their MS2 spectra. This procedure must be done in order to plot the MS2 spectra.
TODO: Explain the parameters
For entries that had RT empty, the entire retention time interval is scanned for peaks. Entries with a known RT will only be scanned within the interval specified by the parameters (by default 1 min). This means that processing takes much less time than if RT were left out.
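The effect of a known RT on the scanned interval can be pictured with a small sketch. This is not the Shinyscreen implementation: the function name and arguments are invented, and it is an assumption that the window is applied symmetrically around the listed RT and clipped to the duration of the run.

```r
# Hypothetical helper: retention-time interval (in minutes) scanned for one entry.
# ASSUMPTION: the window is applied symmetrically around the listed RT.
rt_scan_interval <- function(rt, run_range, window = 1) {
  if (is.na(rt)) {
    return(run_range)                 # no RT given: scan the whole run
  }
  c(max(run_range[1], rt - window),   # clip to the start of the run
    min(run_range[2], rt + window))   # clip to the end of the run
}

# Entry without RT in a 30 minute run: the full interval is scanned.
rt_scan_interval(NA, run_range = c(0, 30))

# Entry with a listed RT of 19.6 min: only a narrow interval is scanned.
rt_scan_interval(19.6, run_range = c(0, 30))
```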
Prescreening
The third tab allows the visual inspection of the spectra and the chromatogram, as well as exporting the plots in a PDF format.
Significant Contributions (in no particular order)
- Anjana Elapavalore
- Hiba Mohammed-Taha
- Jessy Krier
- Mira Narayanan
- Emma Schymanski
- Randolph Singh (contributed good mood, mostly :-) )
Thanks
Many thanks to the students of the Masters in Integrated Systems Biology course (March 2020)
- Tessy Prohaska
- Jeff Didier
- Claudia Cipriani
- Parviel Chirsir
for boldly wading through the Windows installation procedure, a task that led to more clarity in the docs.