Commit 1da53d98 authored by Emma Schymanski

added further details

...and with that, the first draft is complete.
parent f8bc6549
@@ -531,16 +531,47 @@ write.csv(PFAS_tree, file="PubChem_PFAS_Tree_Details.csv",row.names = F)
## Further Details {#details}
Content still to come ...
This documentation is primarily aimed at describing the features of the
[PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120),
with the implementation to be described elsewhere. Nonetheless, some
technical details are necessary, and this is the section for them. This section
will be expanded as questions arise.
The [PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120)
currently excludes a molecule (compound) from consideration if it:
- test set
- notes on implementation
- Exclude molecules from consideration if:
- Is a mixture (i.e., has multiple components, which includes any salts)
- Contains a radical or isotopically labelled atom
- [molecules with non-organic elements not currently being removed]
Since the entire tree is constructed on compound identifiers (CIDs), substance
entries (denoted by substance identifiers, SIDs) are not included either. Thus,
undefined or poorly defined entities are also excluded.
While there is code to remove molecules with non-organic elements,
some such entries currently still get through.
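
The exclusion criteria above could be approximated on a table of PubChem
computed properties. The following is a minimal sketch only, not the actual
implementation of the tree (which is to be described elsewhere). It assumes a
data frame with the PubChem property columns `MolecularFormula`,
`CovalentUnitCount` and `IsotopeAtomCount`. The helper names
(`formula_elements`, `flag_exclusions`), the list of allowed elements and the
example rows are hypothetical and purely illustrative. Radicals are not covered,
as these properties do not describe them.

```r
# Assumed set of "organic" elements; the set used for the tree is not stated here.
allowed_elements <- c("C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I")

# Extract the unique element symbols from a molecular formula string.
formula_elements <- function(formula) {
  # An element symbol is one capital letter, optionally followed by one lowercase letter
  unique(regmatches(formula, gregexpr("[A-Z][a-z]?", formula))[[1]])
}

# Hypothetical helper: add one logical column per criterion plus an overall flag.
flag_exclusions <- function(props, allowed = allowed_elements) {
  # More than one covalent unit indicates a multi-component entry (e.g. a salt)
  props$is_mixture <- props$CovalentUnitCount > 1
  # Any isotopically specified atom counts as a label
  props$is_isotope_labelled <- props$IsotopeAtomCount > 0
  # TRUE if the formula contains an element outside the allowed set
  props$has_other_elements <- vapply(
    as.character(props$MolecularFormula),
    function(f) !all(formula_elements(f) %in% allowed),
    logical(1),
    USE.NAMES = FALSE
  )
  props$exclude <- props$is_mixture |
    props$is_isotope_labelled |
    props$has_other_elements
  props
}

# Illustrative rows (values typed by hand, not retrieved from PubChem)
example_props <- data.frame(
  Label = c("neutral acid", "sodium salt", "13C-labelled acid"),
  MolecularFormula = c("C8HF15O2", "C8F15NaO2", "C8HF15O2"),
  CovalentUnitCount = c(1, 2, 1),
  IsotopeAtomCount = c(0, 0, 8)
)

flagged <- flag_exclusions(example_props)
print(flagged[, c("Label", "exclude")])
```

Keeping one column per criterion makes it easier to see why a given entry
was flagged, which may help when tracking down entries that get through.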
A test set of PFAS and non-PFAS from the OECD Report
[@oecd_reconciling_2021] has been compiled to check the
performance of the
[PubChem PFAS Tree](https://pubchem.ncbi.nlm.nih.gov/classification/#hid=120).
The test set (XLSX) can be downloaded
[here](https://gitlab.lcsb.uni.lu/eci/pubchem/-/raw/master/annotations/pfas/OECD_Report_Examples.xlsx?inline=false). Other formats can be made available if
requested (and if possible).
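
For readers working in R, the test set could be fetched and read as sketched
below. This is an illustration under assumptions: the URL is the one given
above, the function name `fetch_oecd_test_set` and the local file name are
hypothetical, and the `readxl` package is assumed to be installed. The sheet
layout is not described here, so the sketch simply reads the first sheet.

```r
# URL as given in the text above
oecd_url <- "https://gitlab.lcsb.uni.lu/eci/pubchem/-/raw/master/annotations/pfas/OECD_Report_Examples.xlsx?inline=false"

# Hypothetical helper: download the XLSX file and read its first sheet.
fetch_oecd_test_set <- function(url = oecd_url,
                                destfile = "OECD_Report_Examples.xlsx") {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Package 'readxl' is required to read XLSX files.")
  }
  # mode = "wb" keeps the binary XLSX file intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  readxl::read_excel(destfile)
}

# Not run automatically (needs network access):
# test_set <- fetch_oecd_test_set()
# str(test_set)
```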
The current approach has a few limitations, which will be addressed
in future developments (and released when ready). These include:
- Ether (and other connecting atom) handling
- Handling of unsaturated PFAS
- Better browseability of special cases
User feedback is extremely valuable to help improve this tree further.
Please reach out to either contact author (details on first page,
or email [Evan](mailto:evan.bolton@nih.gov) and
[Emma](mailto:emma.schymanski@uni.lu) directly)
with feedback and comments!
@@ -554,6 +585,7 @@ Content still to come ...
- ELS: Conceptualization (equal), data curation, methodology, software, validation, writing - original draft preparation, writing - review and editing.
- PC: Validation (supporting)
- TK: Software
- PAT: Data curation, methodology, software
- JZ: Data curation, methodology, software
- EEB: Conceptualization (equal), data curation, methodology, software (lead), validation, writing - original draft preparation, writing - review and editing.