Commit 17c92dc9 authored by Emma Schymanski

finished list section

parent 7e804b12
@@ -230,12 +230,21 @@ external content. The mapping files to construct this are kept
on the [eci/pubchem](
repository on GitLab.
More content to come including:
Currently, the content (see Figure 7) comes from:
- all [PFAS lists](
from the
[CompTox Chemicals Dashboard](
- all [PFAS lists](
from the NORMAN Suspect List Exchange
- the CORE PFAS lists from OntoChem [@barnabas_extracting_2022]
- Other collections from within PubChem Classification Trees, including
[ChEBI]( and
- CompTox
- OntoChem
- Other PubChem collections
![The "PFAS and Fluorinated Organic Compound Collections" node, with all major collections shown (status 24 March 2022).](fig/PFAS_list_of_lists.png)
@@ -90,7 +90,6 @@
author = {Mayfield, John}
title = {{PubChem} {OECD} {PFAS} {Larger} {PFAS} {Parts} file for {MetFrag}},
copyright = {Creative Commons Attribution 4.0 International, Open Access},
@@ -107,4 +106,32 @@ Type: dataset},
keywords = {MetFrag, PFAS},
title = {The {CompTox} {Chemistry} {Dashboard}: a community data resource for environmental chemistry},
volume = {9},
issn = {1758-2946},
shorttitle = {The {CompTox} {Chemistry} {Dashboard}},
url = {},
doi = {10.1186/s13321-017-0247-6},
language = {en},
number = {1},
urldate = {2018-11-01},
journal = {Journal of Cheminformatics},
author = {Williams, Antony J. and Grulke, Christopher M. and Edwards, Jeff and McEachran, Andrew D. and Mansouri, Kamel and Baker, Nancy C. and Patlewicz, Grace and Shah, Imran and Wambaugh, John F. and Judson, Richard S. and Richard, Ann M.},
month = dec,
year = {2017},
pages = {61}
type = {preprint},
title = {Extracting and {Comparing} {PFAS} from {Literature} and {Patent} {Documents} using {Open} {Access} {Chemistry} {Toolkits}},
url = {},
abstract = {The extraction of chemical information from documents is a demanding task in cheminformatics due to the variety of text and image-based representations of chemistry. The present work describes the extraction of chemical compounds with unique chemical structures from the open access CORE (COnnecting REpositories) and Google Patents full text document repositories. The importance of structure normalization is demonstrated using three open access cheminformatics toolkits: CDK, RDKit and OpenChemLib. Each toolkit was used for structure parsing, normalization and subsequent substructure searching, using SMILES as structure representations of chemical molecules. Per- and polyfluoroalkyl substances (PFAS) were chosen as a case study to perform the substructure search, due to their high environmental relevance, their presence in both literature and patent corpuses, and the current lack of community consensus on their definition. Three different structural definitions of PFAS were chosen to highlight the implications of various definitions from a cheminformatics perspective. Since CDK, RDKit and OpenChemLib implement different criteria and methods for SMILES parsing and normalization, different numbers of parsed compounds were extracted, which were then evaluated using the three PFAS definitions. A comparison of these toolkits and definitions is provided, along with a discussion of the implications for PFAS screening and text mining efforts in cheminformatics. Finally, the extracted PFAS ({\textasciitilde}1.7 M PFAS from patents and {\textasciitilde}27K from CORE) were compared against various existing PFAS lists and are provided in various formats for further community research efforts.},
urldate = {2022-03-25},
institution = {Chemistry},
author = {Barnabas, Shadrack and Böhme, Timo and Boyer, Stephen and Irmer, Matthias and Ruttkies, Christoph and Kondic, Todor and Schymanski, Emma and Weber, Lutz},
month = mar,
year = {2022},
doi = {10.26434/chemrxiv-2022-nmnnd}