Commit f8bc6549 authored by Emma Schymanski

added Entrez section

..and tweaked figure captions to distinguish from text better.
parent 34a0b269
......@@ -82,9 +82,9 @@ There is also extensive documentation on the PubChem website, see:
|Navigating the Tree | [Go to heading](#search) | 7 |
|_Search via PubChem Search_ | [Go to heading](#pc-search) | 7 |
|_Interactions via Entrez_ | [Go to heading](#entrez) | 9 |
|_Interactions via PUG REST_ | [Go to heading](#pugrest) | 9 |
|Further Details | [Go to heading](#details) | 10 |
|Statements and References | [Go to heading](#statements) | 10 |
|_Interactions via PUG REST_ | [Go to heading](#pugrest) | 10 |
|Further Details | [Go to heading](#details) | 11 |
|Statements and References | [Go to heading](#statements) | 12 |
<!-- |References | [Go to heading](#statements) | 6 | -->
......@@ -107,7 +107,7 @@ Further details are given below.
<!-- To become more familiar with the PubChem Classification Browser features, -->
<!-- see Section [Navigating the Tree](#search). -->
![The "[PFAS and Fluorinated Organic Compounds in PubChem Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)" Landing Page.](fig/PFAS_Tree_Landing.png)
![_The "[PFAS and Fluorinated Organic Compounds in PubChem Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)" Landing Page._](fig/PFAS_Tree_Landing.png)
......@@ -133,13 +133,13 @@ CF~3~ groups (5.4 M), these were separated into individual sections
<!-- #### PFAS "parts": -->
![Examples of molecules with varying PFAS parts highlighted, drawn using CDK Depict [@mayfield_cdk].](fig/PFAS_parts_CDK.png)
![_Examples of molecules with varying PFAS parts highlighted, drawn using [CDK Depict](https://www.simolecule.com/cdkdepict/depict.html) [@mayfield_cdk]._](fig/PFAS_parts_CDK.png)
The _OECD PFAS Definition_ node
with the top two level subnodes, is shown in Figure 3.
![The OECD PFAS Definition part of the PFAS tree, with top two subnodes (24 March 2022).](fig/OECDPFAS_TopTwoSubnodes_v3.png)
![_The OECD PFAS Definition part of the PFAS tree, with top two subnodes (24 March 2022)._](fig/OECDPFAS_TopTwoSubnodes_v3.png)
......@@ -151,7 +151,7 @@ molecules in PubChem containing at least one isolated CF~2~ (top subnode)
or one isolated CF~3~ (next subnode). These are broken down similarly,
as shown in Figure 4 for CF~2~.
![The isolated CF~2~ section of the OECD PFAS Definition node, with breakdown of the major parts (numbers as of 24 March 2022).](fig/OECDPFAS_CF2combi.png)
![_The isolated CF~2~ section of the OECD PFAS Definition node, with breakdown of the major parts (numbers as of 24 March 2022)._](fig/OECDPFAS_CF2combi.png)
The larger PFAS parts (left) are broken down by part type (linear, branched,
_etc._). Within these subcategories, dynamic construction is used.
......@@ -181,7 +181,7 @@ This part of the tree is constructed dynamically - in other words,
the subnodes present depend on the contents within - to prevent
excessive scrolling.
![The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part count (numbers from 24 March 2022).](fig/OECDPFAS_largerPFASparts.png)
![_The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part count (numbers from 24 March 2022)._](fig/OECDPFAS_largerPFASparts.png)
#### The _Breakdown by isolated PFAS part count_
......@@ -221,7 +221,7 @@ a breakdown by the count of PFAS parts is added before the
breakdown by "_Also contains..._".
![The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part type (numbers from 24 March 2022).](fig/OECDPFAS_PFAS_part_type.png)
![_The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part type (numbers from 24 March 2022)._](fig/OECDPFAS_PFAS_part_type.png)
The dynamic navigation approach reduces the scrolling by users and
......@@ -239,7 +239,10 @@ for use in
[MetFragCL](https://ipb-halle.github.io/MetFrag/projects/metfragcl/)
and will be made available from the
[MetFragWeb](https://msbi.ipb-halle.de/MetFrag/)
drop down menu. See the description on the Zenodo record
drop down menu. This file contains several useful fields
from the [Download](#pc-search) file as well as Patent and Literature
(PMID) counts.
See the description on the Zenodo record
[@pubchem_pfas_metfrag_2022] for more details.
......@@ -259,7 +262,7 @@ to the right. Note that one additional
category was added ("_Other fluorinated substances_") to capture content
that did not fit into any other category defined in the OECD figure.
![The categorization of PFAS (blue shading) and non-PFAS (grey) from the OECD 2021 report [@oecd_reconciling_2021] (left panel) and the "Organofluorine compounds" node (right panel). Numbers from 24 March 2022.](fig/Organofluorine_OECDfig_tree.png)
![_The categorization of PFAS (blue shading) and non-PFAS (grey) from the OECD 2021 report [@oecd_reconciling_2021] (left panel) and the "Organofluorine compounds" node (right panel). Numbers from 24 March 2022._](fig/Organofluorine_OECDfig_tree.png)
The _Organofluorine compounds_ node is broken down very differently
to the _OECD PFAS Definition_ node, since not all the contents are
......@@ -292,7 +295,7 @@ also be added here. The mapping files to construct this are kept
on the [eci/pubchem](https://gitlab.lcsb.uni.lu/eci/pubchem/)
repository on GitLab.
![The "PFAS and Fluorinated Organic Compound Collections" node, with all major collections shown (CompTox as inset). Numbers and content listing from 24 March 2022.](fig/PFAS_list_of_lists.png)
![_The "PFAS and Fluorinated Organic Compound Collections" node, with all major collections shown (CompTox as inset). Numbers and content listing from 24 March 2022._](fig/PFAS_list_of_lists.png)
Currently, the content (see Figure 8) comes from:
......@@ -337,7 +340,7 @@ or sent to Entrez for advanced querying (see next section).
Note that clicking on the "?" beside a node (where present) will open a tool tip
explaining the node contents (Figure 9, bottom left).
![Querying node contents in PubChem Search. When clicking on the blue numbers (left), a search window will open in a new tab (right, main image). This collection can be browsed, downloaded (see inset) or sent to Entrez (see next section). Clicking on the "?" sign will open a tool tip (left panel, bottom, see yellow blurb).](fig/Tree_PubChemSearch.png)
![_Querying node contents in PubChem Search. When clicking on the blue numbers (left), a search window will open in a new tab (right, main image). This collection can be browsed, downloaded (see inset) or sent to Entrez (see next section). Clicking on the "?" sign will open a tool tip (left panel, bottom, see yellow blurb)._](fig/Tree_PubChemSearch.png)
The download file contains a number of fields of interest,
including: CIDs, names and synonyms, several properties (_e.g._ XlogP),
......@@ -347,7 +350,7 @@ of interest to indicate the amount of information available
to support the structures in the tree are explained
in the following table; a preview is shown in Figure 10.
Table: Relevant metadata files in the PubChem Download files
Table: _Relevant metadata files in the PubChem Download files._
| Header | Description | Type |
|-----------|---------|----|
......@@ -361,7 +364,7 @@ Table: Relevant metadata files in the PubChem Download files
<!-- cid cmpdname cmpdsynonym mw mf polararea complexity xlogp heavycnt hbonddonor hbondacc rotbonds inchi isosmiles inchikey iupacname meshheadings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation -->
![PubChem Download file. Top left: CID, names, properties. Middle: structural information and metadata. Bottom: selected metadata with expanded view to show the information content of records. Downloaded from the query shown in Figure 9.](fig/PubChem_Download_File.png)
![_PubChem Download file. Top left: CID, names, properties. Middle: structural information and metadata. Bottom: selected metadata with expanded view to show the information content of records. Downloaded from the query shown in Figure 9 on 27 March 2022._](fig/PubChem_Download_File.png)
Note that the categories visible in the "_annothits_" column align
with the individual sections in PubChem records and can
......@@ -389,14 +392,77 @@ as explained in the next section.
### Interactions via Entrez {#entrez}
Content still to come ...
It is possible to build more extensive queries via the
[Entrez](https://pubchemdocs.ncbi.nlm.nih.gov/advanced-search-entrez)
interface, which is accessible through the button below
the download button (see Figure 9) or by clicking the "Use Entrez"
option on the PubChem landing page. More documentation on Entrez is given
[here](https://pubchemdocs.ncbi.nlm.nih.gov/advanced-search-entrez).
This section steps through a few interactive examples.
#### Example 1: All PFAS containing one linear C~4~F~9~ part with use information:
To find all molecules from the query in Figure 9 that also have
use information in PubChem, first send the 10,555 CIDs
from that query to Entrez via the "Push to Entrez" option (Figure 9,
second box encircled in red on the right). This opens a new page in the
Entrez interface (not shown).
Next, go to the "Use and Manufacturing" section of the
[PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72),
send this to PubChem Search via the numbers next to the node (Figure 11,
red circle on left), and push to Entrez (Figure 11, top right). By
selecting the "Advanced" option under the search bar (Figure 11, top),
the Advanced Search builder is opened and further queries can be built.
Selecting "#2 AND #6" returns only the 436 chemicals with a single
C~4~F~9~ linear PFAS part (query #2) that also have use and manufacturing
information in PubChem (query #6).
![_Advanced search via Entrez. Left: [PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). Top right: the Use and Manufacturing query in Entrez. Bottom right: the Advanced Search builder in Entrez, where query #2 (one C~4~F~9~ part only) AND #6 (Use information) is built. This is then sent again to search via Entrez (middle right) and the 436 C~4~F~9~ compounds with use information can be browsed or downloaded via the "View or Download Structures in PubChem" option. Queries run 27 March 2022._](fig/Entrez_C4F9andUse.png)
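The "#2 AND #6" combination can also be reproduced offline once both result sets have been downloaded. The following is a minimal sketch, not the Entrez mechanism itself: it assumes each download is a CSV file with a `cid` header column (as in the PubChem Download file described earlier), and the function names and the small in-memory tables are hypothetical placeholders.

```python
import csv
import io


def read_cids(handle, column="cid"):
    """Collect the integer CIDs found in one column of a CSV download."""
    cids = set()
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value.isdigit():
            cids.add(int(value))
    return cids


def cids_in_both(handle_a, handle_b, column="cid"):
    """Return the sorted CIDs present in both downloads (a local "AND")."""
    return sorted(read_cids(handle_a, column) & read_cids(handle_b, column))


# In-memory stand-ins for the two downloads; the CIDs are placeholders,
# not real query results.
pfas_part_csv = io.StringIO("cid,cmpdname\n101,a\n102,b\n103,c\n")
use_info_csv = io.StringIO("cid,cmpdname\n102,b\n103,c\n104,d\n")

shared = cids_in_both(pfas_part_csv, use_info_csv)
print(shared)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles. For very large categories such as "Use and Manufacturing", the Entrez route above avoids having to download the full result set.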
#### Example 2: Browse all OECD PFAS with mass spectrometry information:
Analytical chemists may, for instance, be particularly keen on finding
out which PFAS or organofluorine compounds have mass spectrometry information
available in PubChem (or in resources integrated within PubChem). It is
also possible to use the Entrez functionality to subset the tree
contents according to other available information - shown in Figure 12
for this example. First, go to the "Mass Spectrometry" section of the
[PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72),
which is under the "Spectral Information" heading, and send this query
to Entrez (see Figure 11 left and top right). Then, go back to the
[PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)
and ***refresh*** the contents. A new dropdown menu will appear
(if not already present) called "Filter by Entrez History" (Figure 12,
bottom right). By selecting the chosen query in this dropdown menu,
the tree will then be subset by the contents within that query, such
that only CIDs that are in the tree _and_ in the query will show.
The same holds for any advanced query, so it would be possible,
_e.g._, to restrict the subset to mass spectra that occur in
[MassBank EU](https://massbank.eu/MassBank/) or NIST by additionally
adding the relevant "_Information Sources_" (from the
[PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72))
to the Entrez query. Since large queries such as the Mass Spectrometry
category, or advanced AND/OR combinations, can become quite complicated,
it is a good idea to carefully note the query number (#XXX) and the
number of compounds in the result, to ensure the correct entries are
chosen once multiple queries have been performed.
Note that queries can also be sent to Entrez via the
[PubChem Identifier Exchange Service](https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi).
Thus, external queries can be added to the Entrez history
by uploading this information via the ID Exchange.
![_Subsetting Tree Contents via Entrez. Left: [PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72), "Mass Spectrometry" subsection. Top right: the "Mass Spectrometry" query in PubChem Search (to be sent to Entrez). Bottom right: the [PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120) subset by Mass Spectrometry, now only displaying CIDs where mass spectrometry information is available in PubChem. Queries run 27 March 2022._](fig/Entrez_MSandPFAS.png)
More examples coming soon ...
### Interactions via PUG REST {#pugrest}
It is also possible to interact with the PubChem PFAS Tree
programmatically. For more extensive details on PUG REST
and other programmatic access than contained below,
please see the PubChem documentation:
please see the following locations in the PubChem documentation:
- https://pubchemdocs.ncbi.nlm.nih.gov/programmatic-access
- https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
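
As a minimal sketch of programmatic access, the snippet below builds a PUG REST request for the CIDs under one classification node and parses a plain-text response. It assumes the `classification/hnid/<HNID>/cids/TXT` URL pattern described in the PUG REST documentation; the helper names are hypothetical, and the node identifier (HNID) must be looked up for the node of interest (it is not the same as the tree identifier `hid=120`).

```python
import urllib.request

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def classification_cids_url(hnid, output="TXT"):
    """Build the (assumed) PUG REST URL listing the CIDs under one node."""
    return f"{PUG_REST_BASE}/classification/hnid/{int(hnid)}/cids/{output}"


def parse_cid_text(text):
    """Turn a whitespace-separated TXT response into a list of integer CIDs."""
    return [int(token) for token in text.split() if token.isdigit()]


def fetch_node_cids(hnid, timeout=30):
    """Download and parse the CID list for one node (needs network access)."""
    url = classification_cids_url(hnid)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_cid_text(response.read().decode("utf-8"))
```

A call such as `fetch_node_cids(hnid)` would then return the CIDs for the chosen node. Large nodes can contain many thousands of CIDs, so the usage policies in the documentation listed above should be checked before running bulk requests.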
......