Commit 5177c750 authored by Emma Schymanski

Rearranged docs

... and added some bits of new content. Mainly rearranged contents to the first page and added more pointers to PubChem docs
parent 1b56852f
......@@ -42,12 +42,12 @@ at the [LCSB](https://wwwen.uni.lu/lcsb/),
several community representatives (see [Contributions](#contrib)
and [Acknowledgements](#ack)).
![The "[PFAS and Fluorinated Organic Compounds in PubChem Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)" Landing Page.](fig/PFAS_Tree_Landing.png)
<!-- ![The "[PFAS and Fluorinated Organic Compounds in PubChem Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)" Landing Page.](fig/PFAS_Tree_Landing.png) -->
## Contents
This document is organised as follows:
This document is organised into several sections, as follows:
| Section | Navigation | PDF Page |
|-----------|---------|:----:|
......@@ -59,89 +59,104 @@ This document is organised as follows:
|_Search via PubChem Search_ | [Go to heading](#pc-search) | 5 |
|_Interactions via Entrez_ | [Go to heading](#entrez) | 6 |
|Implementation | [Go to heading](#impl) | 6 |
|Statements and References | [Go to heading](#statements) | 6 |
|Statements | [Go to heading](#statements) | 6 |
|References | [Go to heading](#statements) | 6 |
To become more familiar with the PubChem Classification Browser features
in general before embarking on content specific to the PFAS tree,
see Section [Navigating the Tree](#search).
There is also extensive documentation on the PubChem website, see:
- https://pubchem.ncbi.nlm.nih.gov/classification/
- https://pubchemdocs.ncbi.nlm.nih.gov/classification-browser
- https://pubchem.ncbi.nlm.nih.gov/classification/docs/classification_help.html
## PubChem PFAS Tree Nodes {#treenodes}
The tree is currently split into three main nodes that are constructed and
compiled separately. More nodes are under development and will be released
as they are ready.
Further details are given in the sections below.
To become more familiar with the PubChem Classification Browser features,
see Section [Navigating the Tree](#search).
compiled separately (see Figure 1).
More nodes are under development and will be released as they are ready.
Further details are given below.
<!-- To become more familiar with the PubChem Classification Browser features, -->
<!-- see Section [Navigating the Tree](#search). -->
![The "[PFAS and Fluorinated Organic Compounds in PubChem Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)" Landing Page.](fig/PFAS_Tree_Landing.png)
### OECD PFAS Definition {#oecddef}
This node is constructed out of per- and polyfluoroalkyl substances
(PFAS) satisfying the OECD 2021 definition (contains at least one saturated CF~2~ or
CF~3~ part) in the OECD Report ENV/CBC/MONO(2021)25 (9 July 2021), available from
[here](https://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=ENV/CBC/MONO(2021)25&docLanguage=En)
CF~3~ part) in the 2021 OECD Report
[ENV/CBC/MONO(2021)25](https://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=ENV/CBC/MONO(2021)25&docLanguage=En)
[@oecd_reconciling_2021].
As this node includes over 6 million entries, browseability is a challenge.
Since the majority of these PFAS contained isolated CF~2~ or CF~3~ groups,
these were separated into individual sections. There are approx.
600K compounds with isolated CF~2~ groups, and approx 5.4 M compounds
with isolated CF~3~ groups. 188 K compounds remain that contain PFAS
parts greater than CF~2/3~.
#### PFAS "parts":
Note that here, "part" is used to describe a connected portion of the molecule
that satisfies the OECD PFAS definition. A given molecule may have
more than on PFAS parts present, some examples are given in Figure 2,
Browsing the 6 million entries in this node is a challenge.
Since the majority of these PFAS contained isolated CF~2~ (600 K) or
CF~3~ groups (5.4 M), these were separated into individual sections.
188 K compounds remain that contain PFAS parts larger than CF~2~/CF~3~.
<!-- #### PFAS "parts": -->
Note that here, "**PFAS part**" is used to describe a connected portion of
the molecule that satisfies the OECD PFAS definition. A given molecule may have
more than one PFAS part present; some examples are given in Figure 2,
along with the count of parts.
![Examples of molecules with varying PFAS parts highlighted, drawn using CDK Depict [@mayfield_cdk].](fig/PFAS_parts_CDK.png)
#### The OECD PFAS Definition node
The _OECD PFAS Definition_ node,
with its top two levels of subnodes, is shown in Figure 3.
![The OECD PFAS Definition part of the PFAS tree, with top two subnodes (numbers from 24 March 2022).](fig/OECDPFAS_TopTwoSubnodes.png)
![The OECD PFAS Definition part of the PFAS tree, with top two subnodes (24 March 2022).](fig/OECDPFAS_TopTwoSubnodes_v3.png)
#### Isolated CF~2~ and CF~3~ Nodes:
### OECD PFAS - Isolated CF~2~ and CF~3~ Nodes:
The top two subnodes of the OECD PFAS Definition allows the browsing of all PFAS
molecules in PubChem containing at least one isolated CF~2~ part (top subnode)
The _Isolated CF~2~ and CF~3~_ subnodes of the _OECD PFAS Definition_ node
allow the browsing of all PFAS
molecules in PubChem containing at least one isolated CF~2~ (top subnode)
or one isolated CF~3~ (next subnode). These are broken down similarly,
as shown in Figure 4 for the CF~2~ case.
as shown in Figure 4 for CF~2~.
![The isolated CF~2~ section of the OECD PFAS Definition node, with breakdown of the major parts (numbers from 24 March 2022).](fig/OECDPFAS_CF2combi.png)
![The isolated CF~2~ section of the OECD PFAS Definition node, with breakdown of the major parts (24 March 2022).](fig/OECDPFAS_CF2combi.png)
The larger PFAS parts (left) are broken down by part type (linear, branched,
_etc._). Within these subcategories, dynamic construction is used.
If many (>20) variants are present, a breakdown by number of PFAS parts
is added (bottom left, Figure 4, see "Contains isolated unsaturated-linear
PFAS part"), if not, a list of the possibilities is given directly
(middle left, Figure 4, see "Contains isolated unsaturated-cyclic part").
is added (Figure 4, bottom left, "_Contains isolated unsaturated-linear
PFAS part_"), if not, a list of the possibilities is given directly
(Figure 4, middle left, "_Contains isolated unsaturated-cyclic part_").
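The dynamic construction rule described above can be sketched roughly as follows. This is only an illustration of the threshold logic, not the code used to build the PubChem tree: the function name `build_subnodes`, the `(formula, n_parts)` input shape and the label format are all assumptions made for this example; only the threshold of 20 comes from the text.

```python
from collections import defaultdict

# Threshold taken from the text ("many (>20) variants"); everything else
# in this sketch is hypothetical and chosen for illustration only.
THRESHOLD = 20


def build_subnodes(variants):
    """Return the subnodes for one part-type category.

    variants: list of (formula, n_parts) tuples, one per PFAS part variant.
    With more than THRESHOLD variants, group them by number of PFAS parts
    (a dict of label -> formulas); otherwise list the formulas directly.
    """
    if len(variants) > THRESHOLD:
        by_count = defaultdict(list)
        for formula, n_parts in variants:
            by_count[n_parts].append(formula)
        # Zero-padded labels keep the groups in numeric order when sorted.
        return {f"{n:02d} parts": sorted(fs) for n, fs in sorted(by_count.items())}
    return sorted(formula for formula, _ in variants)
```

Called with a short list of variants, the function returns a flat, sorted list of formulas; with more than 20 variants it returns a dictionary keyed by part count, mirroring the two cases shown in Figure 4.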
The "Contains only isolated CF~2~" (or, for the CF~3~ node, only isolated
The "_Contains only isolated CF~2~_" (or, for the CF~3~ node, only isolated
CF~3~) node (Figure 4, middle panel) is broken down by the number of
isolated groups (CF~2~ or, for the CF~3~ node, by CF~3~ groups). In both
cases the vast majority of molecules have only one isolated group.
The "Contains only isolated CF~2~/CF~3~" is also broken down by
The "_Contains only isolated CF~2~/CF~3~_" is also broken down by
the number of groups, sorted by increasing number of CF~2~ groups
(for both nodes). See Figure 4, right panel.
### PFAS Parts Larger than CF~2~/CF~3~
### OECD PFAS - PFAS Parts Larger than CF~2~/CF~3~
The "_Molecule contains PFAS parts larger than CF~2~/CF~3~_" part of the
OECD PFAS node includes approx. 188K molecules, which can be browsed
in two major breakdowns, by isolated PFAS part count (see Figure 5)
and by isolated PFAS part type (see Figure 6).
This part of the tree is constructed dynamically - in other words,
the subnodes present depend on the contents within - to prevent
excessive scrolling. The breakdown by _isolated PFAS part count_ is first
subset by the number of parts (Figure 5, left panel).
excessive scrolling.
![The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part count (numbers from 24 March 2022).](fig/OECDPFAS_largerPFASparts.png)
#### The _Breakdown by isolated PFAS part count_
is first subset by the number of parts (Figure 5, left panel).
Should there be fewer than ~20 categories,
the immediate breakdown is by the formula of the parts (see
Figure 5, bottom right, "_Contains 11 isolated PFAS parts_").
......@@ -160,9 +175,8 @@ Note that throughout the tree, leading zeros are present to
ensure logical sorting.
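As a small illustration of why leading zeros matter, the sketch below compares plain and zero-padded node labels under ordinary string sorting. The label wording is modelled on the node names quoted in this document, but the helper `padded_label` and the padding width of 2 are assumptions for this example, not the tree's actual implementation.

```python
def padded_label(count: int, width: int = 2) -> str:
    """Build a node label with a zero-padded part count (hypothetical helper)."""
    return f"Contains {count:0{width}d} isolated PFAS parts"


counts = [1, 2, 10, 11]

# Without padding, string sorting places 10 and 11 before 2.
plain = sorted(f"Contains {c} isolated PFAS parts" for c in counts)

# With padding, string order matches numeric order.
padded = sorted(padded_label(c) for c in counts)

for label in padded:
    print(label)
```

Because the labels are sorted as text, padding is needed whenever part counts span more than one digit.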
![The "Molecule contains PFAS parts larger than CF~2~/CF~3~" part of the OECD PFAS Definition node, with dynamic breakdown of subnodes by isolated PFAS part count (numbers from 24 March 2022).](fig/OECDPFAS_largerPFASparts.png)
The breakdown by _isolated PFAS part type_ is first broken down by the
#### The _Breakdown by isolated PFAS part type_
is first broken down by the
part type (linear, cyclic, _etc._) (Figure 6, left panel). These are
again split dynamically. With fewer than 20 entries, a list split
by PFAS part formula appears. If a greater breakdown is needed,
......@@ -185,9 +199,11 @@ the load time for large parts of the tree. It is possible to use some
advanced search and querying capabilities to improve the interaction
with the tree, see [Navigating the Tree](#search) below.
The **PFAS Parts Larger than CF~2~/CF~3~** will soon be available as
The _PFAS Parts Larger than CF~2~/CF~3~_ will soon be available as
a [MetFrag](https://msbi.ipb-halle.de/MetFrag/) file for further interactions.
#### TODO: add to Zenodo
### Organofluorine Compounds {#orgf}
......