......@@ -42,14 +42,22 @@ Environmental Cheminformatics group
at the [LCSB](,
[University of Luxembourg]( in consultation with
several community representatives (see [Contributions](#contrib)
and [Acknowledgements](#ack)).
[PFAS Tree](
includes all compounds in [PubChem](
that satisfy various definitions, as explained later in this document.
Each compound in PubChem has a PubChem Compound Identifier (CID), and the
blue numbers next to each node header reflect the number of
compounds (_i.e._ CIDs) in that node.
## Contents
This document is organised into several sections, as follows:
| Section | Navigation | PDF Page |
......@@ -57,12 +65,12 @@ This document is organised into several sections, as follows:
|_OECD PFAS Definition_ | [Go to heading](#oecddef) | 2 |
|_Organofluorine Compounds_ | [Go to heading](#orgf) | 5 |
|_PFAS and Fluorinated Organic Compound Collections_ | [Go to heading](#lists) | 5 |
|Navigating the Tree | [Go to heading](#search) | 6 |
|_Search via PubChem Search_ | [Go to heading](#pc-search) | 6 |
|_Interactions via Entrez_ | [Go to heading](#entrez) | 6 |
|_Interactions via PUG REST_ | [Go to heading](#pugrest) | 6 |
|Further Details | [Go to heading](#details) | 7 |
|Statements and References | [Go to heading](#statements) | 8 |
|Navigating the Tree | [Go to heading](#search) | 7 |
|_Search via PubChem Search_ | [Go to heading](#pc-search) | 7 |
|_Interactions via Entrez_ | [Go to heading](#entrez) | 8 |
|_Interactions via PUG REST_ | [Go to heading](#pugrest) | 8 |
|Further Details | [Go to heading](#details) | 9 |
|Statements and References | [Go to heading](#statements) | 9 |
......@@ -218,9 +226,35 @@ more details.
This node contains _organofluorine compounds_ as defined in Figure 8 in
the 2021 OECD PFAS Report
Further content still to come ...
Figure 7 of this document shows an extract from Figure 8 of
the OECD report in the left panel,
and the corresponding node breakdown in the _Organofluorine compounds_
section of the PubChem PFAS Tree in the right panel. Note that one additional
category was added ("_Other fluorinated substances_") to capture content
that did not fit into any other category defined in the OECD figure.
![The categorization of PFAS (blue shading) and non-PFAS (grey) from the OECD 2021 report [@oecd_reconciling_2021] (left panel) and the "Organofluorine compounds" node (right panel). Numbers from 24 March 2022.](fig/Organofluorine_OECDfig_tree.png)
The _Organofluorine compounds_ node is broken down very differently
to the _OECD PFAS Definition_ node, since not all the contents are
PFAS (and thus do not contain PFAS parts). Each subnode is broken down first
by the number of fluorine atoms (1 through to 15, then >15) and then by
an exact mass range. Categories that contain no CIDs are omitted from the
tree. For instance, the "_Fluorinated aliphatic substances that have a
fully fluorinated methyl or methylene carbon atom_" category starts at
"_Contains 02 Fluorine atoms_" as no entries in this category could contain
only one F.
The exact mass is split into the ranges 1-250, 250-500, 500-750, 750-1000
and >1000.
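
As a rough illustration of this two-level breakdown, the sketch below assigns a compound to a fluorine-count label and an exact-mass range. It is not the code used to build the tree: the function names are hypothetical, the wording of the ">15" label is a guess, and the handling of masses that fall exactly on a boundary (for example 250) is an assumption, since the text does not say which range they belong to.

```python
from bisect import bisect_left

# Upper edges of the exact-mass ranges named in the text
# (1-250, 250-500, 500-750, 750-1000 and >1000).
MASS_EDGES = [250, 500, 750, 1000]
MASS_LABELS = ["1-250", "250-500", "500-750", "750-1000", ">1000"]


def fluorine_label(n_fluorine: int) -> str:
    """Return a node label for the fluorine count (1 through 15, then >15)."""
    if n_fluorine < 1:
        raise ValueError("an organofluorine compound needs at least one F atom")
    if n_fluorine > 15:
        # Hypothetical wording: the exact label of the >15 node is not given.
        return "Contains >15 Fluorine atoms"
    # Zero-padded to two digits, as in "Contains 02 Fluorine atoms".
    return f"Contains {n_fluorine:02d} Fluorine atoms"


def mass_label(exact_mass: float) -> str:
    """Return the exact-mass range label for a compound.

    Assumption: a mass exactly on a boundary goes to the lower range.
    """
    return MASS_LABELS[bisect_left(MASS_EDGES, exact_mass)]
```

For example, `fluorine_label(2)` gives the "_Contains 02 Fluorine atoms_" label quoted above, and `mass_label(413.97)` falls into the 250-500 range.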
### PFAS and Fluorinated Organic Compound Collections {#lists}
......@@ -231,7 +265,9 @@ also be added here. The mapping files to construct this are kept
on the [eci/pubchem](
repository on GitLab.
Currently, the content (see Figure 7) comes from:
![The "PFAS and Fluorinated Organic Compound Collections" node, with all major collections shown (as of 24 March 2022).](fig/PFAS_list_of_lists.png)
Currently, the content (see Figure 8) comes from:
- All [PFAS lists](
from the
......@@ -252,17 +288,47 @@ in PubChem;
## Navigating the Tree {#search}
While the tree offers possibilities for browsing and searching
PFAS and other organofluorine content, there are several powerful
search capabilities to empower this further.
search capabilities to empower this further, explained in the next sections.
### Search via PubChem Search {#pc-search}
Content still to come ...
Perhaps the most intuitive interaction is clicking directly
on the numbers beside each node. This sends a query
directly to the PubChem Search interface and displays the
entire node contents, as shown in Figure 9. The example query follows the path
"OECD PFAS Definition" > "Molecule contains PFAS parts larger than
CF~2~/CF~3~" > "Breakdown by isolated PFAS part count" >
"Contains 01 isolated PFAS part" > "Count of molecules 10001-100000" >
"Contains 01xC04F09-linear" and returns the 10,555 CIDs with
a C~4~F~9~ PFAS part. The results can then be downloaded
or sent to Entrez (see next section).
![Querying node contents in PubChem Search. When clicking on the blue numbers (left), a search window will open in a new tab (right, main image). This collection can be browsed, downloaded (see inset) or sent to Entrez (see next section). Clicking on the "?" sign will open a tool tip (left panel, yellow blurb).](fig/Tree_PubChemSearch.png)
The download file contains a number of fields of interest,
including: CIDs, names and synonyms, several properties (_e.g._ XlogP),
structural information (molecular formula, SMILES, InChI, InChIKey)
as well as several metadata entries. The most relevant metadata entries
are explained in the following table:
| Header | Description | Type |
|--------|-------------|------|
|annothits| Annotation categories present for this CID | Text |
|annothitcount | Count of annotation categories for CID | Numeric |
|cidcdate | CID creation date | YYYYMMDD |
|sidsrcname | Name of the data source(s) contributing substance (SID) information for given CID | Text |
|depcatg | Deposition category, reveals what type of sources contributed information | Text |
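
A minimal sketch of how these metadata fields could be read from a CSV download is given below. Only the header names and types come from the table above. The `cid` column name, the `|` separator inside multi-valued text fields, and the sample rows themselves are assumptions made for illustration and should be checked against a real download.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only; a real download has many more columns.
SAMPLE = """cid,annothits,annothitcount,cidcdate,sidsrcname,depcatg
111,Patents|Toxicity,2,20050326,Source A|Source B,Chemical Vendors
222,Patents,1,20190114,Source C,Research and Development
"""


def split_multi(value: str) -> list:
    """Split a multi-valued text field (assumed to be separated by '|')."""
    return value.split("|") if value else []


def parse_row(row: dict) -> dict:
    """Convert the metadata fields of one row to Python types."""
    return {
        "cid": int(row["cid"]),
        "annothits": split_multi(row["annothits"]),
        "annothitcount": int(row["annothitcount"]),
        # cidcdate is given as YYYYMMDD in the table above.
        "cidcdate": datetime.strptime(row["cidcdate"], "%Y%m%d").date(),
        "sidsrcname": split_multi(row["sidsrcname"]),
        "depcatg": row["depcatg"],
    }


# Replace io.StringIO(SAMPLE) with an opened file to read a real download.
records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Converting `cidcdate` to a date object makes it straightforward to sort or filter CIDs by creation date.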
### Interactions via Entrez {#entrez}
