Commit bf239024 authored by Emma Schymanski

More minor tweaks

... new image and proofing corrections/adjustments.
parent 5486a812
......@@ -121,10 +121,10 @@ along with the count of parts. For more information, see section
Browsing the 6 million entries in this node (see Figure 3) is challenging.
Since most of these PFAS contain isolated CF~2~ (600 K entries) or
CF~3~ groups (5.4 M entries), these were separated into individual sections
(see "[_isolated CF~2~ and CF~3~_](#isonodes)").
(see "[Isolated CF~2~ and CF~3~ Nodes](#isonodes)").
<!-- (see [next section](#isonodes)). -->
Approximately 188 K compounds contain PFAS parts larger than CF~2~/CF~3~
(see "[larger PFAS parts](#largerparts)").
(see "[PFAS Parts Larger than CF~2~/CF~3~](#largerparts)").
![_Examples of molecules with varying PFAS parts highlighted, drawn using [CDK Depict](https://www.simolecule.com/cdkdepict/depict.html) [@mayfield_cdk]._](fig/PFAS_parts_CDK.png)
......@@ -154,11 +154,12 @@ is added (_e.g.,_ Figure 4, bottom left, "_Contains isolated unsaturated-linear
PFAS part_"), if not, a list of the possibilities is given directly
(_e.g.,_ Figure 4, middle left, "_Contains isolated unsaturated-cyclic part_").
The "_Contains only isolated CF~2~_" (or, for the CF~3~ node, only isolated
CF~3~) is broken down by the number of isolated groups (CF~2~ or,
The "_Contains only isolated CF~2~_"
(or, for the CF~3~ node, "_Contains only isolated CF~3~_")
is broken down by the number of isolated groups (CF~2~ or,
for the CF~3~ node, by CF~3~ groups) - see Figure 4, middle panel. In both
cases, the vast majority of molecules have only one isolated group.
The "_Contains only isolated CF~2~/CF~3~_" is also broken down by
The "_Contains only isolated CF~2~/CF~3~_" node is also broken down by
the number of groups, sorted by increasing number of CF~2~ groups
(for both nodes). See Figure 4, right panel.
......@@ -168,8 +169,8 @@ the number of groups, sorted by increasing number of CF~2~ groups
The "_Molecule contains PFAS parts larger than CF~2~/CF~3~_" part of the
OECD PFAS node includes about 188 K molecules, which can be browsed
in two major breakdowns, by isolated PFAS part count (see Figure 5)
and by isolated PFAS part type (see Figure 6).
in two major breakdowns, by _isolated PFAS part count_ (see Figure 5)
and by _isolated PFAS part type_ (see Figure 6).
This section of the tree is constructed dynamically - in other words,
the subnodes present depend on the contents within - to prevent
excessive scrolling.
......@@ -199,7 +200,7 @@ ensure logical sorting.
#### The _Breakdown by isolated PFAS part type_
is first broken down by the
part type (linear, cyclic, _etc._) (_e.g.,_ Figure 6, left panel). These are
part type (linear, cyclic, _etc._) as shown in Figure 6, left panel. These are
again split dynamically. With fewer than 20 entries, the list split
according to PFAS part formulas appears. If a greater breakdown is needed,
an extra layer of "_Also contains ..._" or "_Only contains ..._" is
......@@ -275,21 +276,23 @@ if there are CIDs within this range.
The "_Other Diverse Fluorinated Compounds_" section of the
[PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)
is designed to help users explore various
cases of fluorine chemistry not necessarily covered in the [OECD PFAS](#oecddef)
cases of fluorine chemistry that are not necessarily covered in the
[OECD PFAS](#oecddef)
or [Organofluorine compound](#orgf) sections above. The navigation in this
section helps explore fluorinated compound chemistry by various
fluorine-heteroatom bonds and the occurrence of different elements
(see Figure 8).
Many of the compounds present in this section are also present
in the other sections of the PubChem PFAS Tree - the overlap
can be investigated in Entrez (see section
in the other sections of the
[PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120).
The overlap can be investigated in Entrez (see section
[Interactions via Entrez](#entrez) below).
![_The "Other diverse fluorinated compounds" part of the PubChem PFAS Tree, showing the breakdown by fluorine bonded to non-carbon elements and by non-organic element (numbers from 29 May 2022)._](fig/DiverseFcmpds.png)
#### The "Contains fluorine bond to non-carbon element"
#### The _Contains fluorine bond to non-carbon element_
section (Figure 8, middle panel) is broken down first by the
count of molecules present in the given category, then by the
non-carbon element present in the F-element bond (sorted alphabetically).
......@@ -297,7 +300,7 @@ For the sections with counts above 100, there is an extra breakdown
by the numbers of fluorine present overall.
#### The "Contains non-organic element"
#### The _Contains non-organic element_
section (Figure 8, right panel) is likewise broken down first by the
count of molecules present in the given category, then by the
non-organic element present (sorted alphabetically).
......@@ -312,7 +315,7 @@ present overall for the sections with counts above 100.
The "_PFAS and Fluorinated Compound Collections_"
section of the PubChem PFAS tree contains various lists gathered
across PubChem content (see Figure 9).
across PubChem content (see Figure 9).
The mapping files to construct this are kept
on the [eci/pubchem](https://gitlab.lcsb.uni.lu/eci/pubchem/)
repository on GitLab.
......@@ -334,6 +337,7 @@ from the NORMAN Suspect List Exchange
in PubChem;
- The CORE and Patent PFAS lists from OntoChem [@barnabas_extracting_2022];
- Other collections from within PubChem Classification Trees, including
collections from
[Cameo](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=86),
[ChEBI](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=2) and
[MeSH](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=1).
......@@ -362,14 +366,14 @@ entire node contents, as shown in Figure 10. This query follows
"_OECD PFAS Definition_" > "_Molecule contains PFAS parts larger than
CF~2~/CF~3~_" > "_Breakdown by isolated PFAS part count_" >
"_Contains 01 isolated PFAS part_" > "_Count of molecules 10001-100000_" >
"_Contains 01xC04F09-linear_" and returns the 10,555 CIDs containing
only one single linear C~4~F~9~ PFAS part.
"_Contains 01xC04F09-linear_" and returns the 10,555 CIDs (26 March, 2022)
containing only one single linear C~4~F~9~ PFAS part.
This query can then be downloaded (Figure 10, inset),
or sent to Entrez for advanced querying (see [next section](#entrez)).
Note that clicking on the "**?**" beside a node (where present) will open a
tool tip explaining the node contents (Figure 10, bottom left).
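The node labels in the path above encode the part count, formula and type in a compact, zero-padded form. The following Python sketch shows how such a label could be split into its components. The pattern is an assumption inferred only from the single example "Contains 01xC04F09-linear"; other nodes in the tree may use different label forms, and `parse_part_label` is a hypothetical helper, not part of any PubChem tooling.

```python
import re

# Assumed label pattern, inferred from the example "Contains 01xC04F09-linear":
# "Contains <count>x<formula>-<part type>", with zero-padded numbers.
NODE_LABEL = re.compile(r"^Contains (\d+)x((?:[A-Z][a-z]?\d+)+)-(.+)$")
ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d+)")


def parse_part_label(label):
    """Split a node label into part count, element counts and part type."""
    match = NODE_LABEL.match(label)
    if match is None:
        raise ValueError(f"Label does not follow the assumed pattern: {label!r}")
    count, formula, part_type = match.groups()
    # Convert e.g. "C04F09" into {"C": 4, "F": 9}
    elements = {element: int(n) for element, n in ELEMENT_COUNT.findall(formula)}
    return {"count": int(count), "formula": elements, "type": part_type}


parsed = parse_part_label("Contains 01xC04F09-linear")
print(parsed)
```

Zero-padding the numbers (01, C04, F09) means that a plain alphabetical sort of the labels agrees with the numerical order, which is presumably why the tree uses this form.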
![_Querying node contents in PubChem Search. When clicking on the blue numbers (left), a search window will open in a new tab (right, main image). This collection can be browsed, downloaded (see inset) or sent to Entrez (see next section). Clicking on the "**?**" sign next to a node name will open a tool tip (left panel, bottom, see yellow blurb)._](fig/Tree_PubChemSearch.png)
![_Querying node contents in PubChem Search. When clicking on the blue numbers (left), a search window will open in a new tab (right, main image). This collection can be browsed, downloaded (see inset) or sent to Entrez (see next section). Clicking on the "**?**" sign next to a node name will open a tool tip (left panel, bottom, see yellow blurb). Image from 26 March 2022._](fig/Tree_PubChemSearch.png)
The download file contains a number of fields of interest,
including: CIDs, names and synonyms, several properties (_e.g._ XlogP),
......@@ -394,7 +398,7 @@ Table: _Relevant metadata files in the PubChem Download files._
<!-- cid cmpdname cmpdsynonym mw mf polararea complexity xlogp heavycnt hbonddonor hbondacc rotbonds inchi isosmiles inchikey iupacname meshheadings annothits annothitcnt aids cidcdate sidsrcname depcatg annotation -->
![_PubChem Download file. Top left: CID, names, properties. Middle: structural information and metadata. Bottom: selected metadata with expanded view to show the information content of records. Downloaded from the query shown in Figure 9 on 27 March 2022._](fig/PubChem_Download_File.png)
![_PubChem Download file. Top left: CID, names, properties. Middle: structural information and metadata. Bottom: selected metadata with expanded view to show the information content of records. Downloaded from the query shown in Figure 10 on 27 March 2022._](fig/PubChem_Download_File.png)
Note that the categories visible in the "_annothits_" column align
with the individual sections in PubChem records and can
......@@ -412,7 +416,7 @@ in Figure 10 can be viewed with the following hyperlinks:
- https://pubchem.ncbi.nlm.nih.gov/compound/105447#section=Safety-and-Hazards
- https://pubchem.ncbi.nlm.nih.gov/compound/105447#section=Use-and-Manufacturing
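Links of this form can be generated for any CID and annotation category. The Python sketch below assumes that the section anchor is the category name with spaces replaced by hyphens, which is inferred only from the two links above; `record_section_url` is a hypothetical helper name.

```python
def record_section_url(cid, section):
    """Build a link to one section of a PubChem compound record.

    Assumption: the anchor is the section name with spaces replaced by
    hyphens, as in the two example links for CID 105447.
    """
    anchor = section.strip().replace(" ", "-")
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}#section={anchor}"


for category in ("Safety and Hazards", "Use and Manufacturing"):
    print(record_section_url(105447, category))
```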
As visible in the figure, there are many records where the information
As visible in Figure 11, there are many records where the information
has only been extracted from patents, or for which no annotation exists.
Thus, this metadata can help add a lot of context to the relevance of the
entries for the particular question at hand.
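As a minimal sketch of how this metadata could be used, the following Python snippet selects rows of a download file by annotation category. The column names `cid`, `cmpdname` and `annothits` come from the download file header; the "|" separator inside the `annothits` field is an assumption that should be checked against an actual download, and the sample rows are invented purely for illustration.

```python
import csv
import io


def rows_with_annotation(csv_text, category, sep="|"):
    """Return the rows whose 'annothits' field lists the given category.

    Assumption: categories within 'annothits' are separated by `sep`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        categories = [c.strip() for c in (row.get("annothits") or "").split(sep)]
        if category in categories:
            hits.append(row)
    return hits


# Invented sample data (not real PubChem records)
sample = (
    "cid,cmpdname,annothits\n"
    "1,hypothetical compound A,Patents\n"
    "2,hypothetical compound B,Patents|Use and Manufacturing\n"
    "3,hypothetical compound C,\n"
)

with_use = rows_with_annotation(sample, "Use and Manufacturing")
print([row["cid"] for row in with_use])
```

For a real download, the file contents can be read from disk and passed in as `csv_text`; rows with an empty `annothits` field are never selected.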
......@@ -431,7 +435,7 @@ option on the PubChem landing page. More documentation on Entrez is given
This section steps through a few interactive examples.
#### Example 1: Find all PFAS containing one linear C~4~F~9~ part with use information:
To find all molecules from the query in Figure 9 that also have
To find all molecules from the query in Figure 10 that also have
use information in PubChem, the first step is to send the 10,555 CIDs
from the query above to Entrez via the "Push to Entrez" option (Figure 10,
second box encircled in red on the right). This opens a new page in the
......@@ -465,14 +469,14 @@ and ***refresh*** the contents. A new dropdown menu will appear
bottom right). By selecting the chosen query in this dropdown menu,
the tree will then be subset by the contents within that query, such
that only CIDs that are in the tree _and_ in the query will show
(here, ~54K not 19M CIDs).
(in Figure 13, ~54K not 19M CIDs).
The same holds for any advanced query, so it would be possible to
_e.g._ do a subset of only mass spectra that occur in
[MassBank EU](https://massbank.eu/MassBank/) or NIST by additionally
adding the relevant "_Information Sources_" (from the
adding the relevant "Information Sources" (from the
[PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72))
to the Entrez query. Since large queries such as the Mass Spectrometry
to the Entrez query. Since large queries such as the "Mass Spectrometry"
category, or advanced AND/OR combinations can end up quite complicated,
it is a good idea to carefully note the query number (#XXX) and the
number of compounds in the result, to ensure the correct entries are
......@@ -482,13 +486,18 @@ Also note that it is possible to send queries to Entrez via the
[PubChem Identifier Exchange Service](https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi).
Thus, it is possible to add external queries to Entrez history
by uploading this information via the
[ID Exchange](https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi).
[ID Exchange](https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi),
as shown in Figure 14.
![_Subsetting Tree Contents via Entrez. Left: [PubChem TOC Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72), "Mass Spectrometry" subsection. Top right: the "Mass Spectrometry" query in PubChem Search (to be sent to Entrez). Bottom right: the [PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120) subset by Mass Spectrometry, now only displaying CIDs where mass spectrometry information is available in PubChem. Queries run on 27 March 2022._](fig/Entrez_MSandPFAS.png)
<!-- More examples coming soon ... -->
![_Sending queries to Entrez via the [PubChem ID Exchange](https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi)._](fig/IDExch_to_Entrez.png)
### Interactions via PUG REST {#pugrest}
It is also possible to interact with the PubChem PFAS Tree
......@@ -502,7 +511,7 @@ please see the following locations in the PubChem documentation:
- https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest$classification_nodes
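As a rough illustration, the Python sketch below builds a PUG REST request URL for the CIDs in one classification node. It assumes the `classification/hnid/<HNID>/cids/<format>` URL form described in the PubChem documentation linked above; the HNID used here is a made-up placeholder, not a node of the PFAS Tree, and `classification_cids_url` is a hypothetical helper name. The HNID of an individual node is distinct from the tree identifier (`hid=120`) used in the links elsewhere in this document.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def classification_cids_url(hnid, output="TXT"):
    """Build the request URL for all CIDs under one classification node.

    Assumption: the node is addressed by its HNID, following the
    'classification nodes' section of the PUG REST documentation.
    """
    return f"{PUG_REST_BASE}/classification/hnid/{int(hnid)}/cids/{output}"


# Placeholder HNID for illustration only
url = classification_cids_url(1234567)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; only the URL construction is shown here so that the sketch runs without network access.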
#### Interacting with the PubChem PFAS Tree in R
#### Interacting with the PubChem PFAS Tree in R:
The following contains a few tips to start interacting with the tree in R;
note that some of these features are also in active development.
......