Commit 141cd40d authored by Mira Narayanan

Added Documentation

parent c834d1e4
#+TITLE: TRANSFORMATION PRODUCTS-EXPANSION OF EXISTING REFTPs WITH ANLIKER TPs
*** Task 1: Data Retrieval
* Finding the unique TPs in Anliker and S66 EAWAGTPs
• Download the Eawag data set *S66 | EAWAGTPs | Parent – Transformation Products* from [[https://zenodo.org/record/3754449#.YD2BP3WYXQz][Zenodo]]
• Data file - *Parent – TPs – EawagTPandParents.csv*
[[file:Images/Image1.png]]
* Download the supplementary file containing the list of target analytes (*es9b07085_si_002.xlsx*) from [[https://pubs.acs.org/doi/10.1021/acs.est.9b07085?goto=supporting-info][Anliker et al.]]
* The supplementary file contains the substance name, CAS No., molecular formula and class of each TP. It lacks information about the parent compound from which each TP was derived
[[file:Images/Image2.png]]
*** Task 2: CompTox Batch Search
* The Anliker et al. data contains only CAS-Nos. and no *PubChem CIDs*, hence the CAS-Nos. were used as input for the [[https://comptox.epa.gov/dashboard/dsstoxdb/batch_search][CompTox Batch Search]]
* Select Input Type(s) – *CASRN* > select Download Chemical Data
[[file:Images/Image3.png]]
* Select Output Format (Excel) and download the file containing the following parameters:
• DTXCID, DTXSID, Chem Name, CAS-RN, INCHIKEY, IUPAC, SMILES, INCHISTRING, Molecular formula and Monoisotopic mass
[[file:Images/Image4.png]]
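Before pasting the CAS-Nos. into the batch search, it can help to screen them for typos, since malformed entries tend to come back without a match. The following is a minimal sketch, not part of the original workflow: it checks the standard CAS check digit (the digits before the check digit are weighted 1, 2, 3, ... from the right, and the sum modulo 10 must equal the check digit). The function names is_valid_cas and split_cas_list are hypothetical helpers introduced here for illustration.

```python
import re

# A CAS number has 2-7 digits, a hyphen, 2 digits, a hyphen and 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check


def split_cas_list(values):
    """Split raw cell values into (valid, rejected) lists, preserving order."""
    valid, rejected = [], []
    for value in values:
        text = "" if value is None else str(value).strip()
        (valid if is_valid_cas(text) else rejected).append(text)
    return valid, rejected
```

The valid list can be pasted into the batch search input box, while the rejected list marks the entries that need to be checked by hand against the supplementary file.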
*** Task 3: Finding out PubChem CIDs of Anliker TPs
* Use *CASRN* from the output of batch search as an input in [[https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi][PubChem Identifier Exchange]] to get CIDs
* Under Input ID list, choose *Synonyms* and provide the CAS-Nos. SMILES and InChIs can also be given as input
* For the Operator type select *Same CID*
* For the Output ID select *CID*
* Select *Entrez History* in Output method and click *Submit job*
[[file:Images/Image5.png]]
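For a short list, the same CAS-to-CID lookup can also be scripted. The sketch below is an alternative to the Identifier Exchange form described above, not the method used in this workflow. It assumes that each CAS-No. is registered as a synonym in PubChem, so that the PUG REST compound/name endpoint can resolve it. The helper names cid_lookup_url, parse_cids and fetch_cids are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(cas: str) -> str:
    """Build the PUG REST URL that resolves a name or synonym to CIDs."""
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(cas)}/cids/JSON"


def parse_cids(payload: str) -> list:
    """Extract the CID list from a PUG REST JSON response body."""
    data = json.loads(payload)
    # Error responses carry a "Fault" object instead of "IdentifierList".
    return data.get("IdentifierList", {}).get("CID", [])


def fetch_cids(cas: str, timeout: float = 30.0) -> list:
    """Query PubChem for one CAS-No.; return [] when nothing matches."""
    try:
        with urllib.request.urlopen(cid_lookup_url(cas), timeout=timeout) as response:
            return parse_cids(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no compound found for this synonym
            return []
        raise
```

A CAS-No. can map to more than one CID, so the returned list should be reviewed rather than truncated to its first element. When looping over many CAS-Nos., pause between requests to stay within PubChem's usage limits.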
*** Task 4: Identifying unique TPs
* Data retrieval of existing TPs from PubChem Classification Browser
• TP information for Swiss Pesticides and Metabolites (S60), EawagTPs (S66), HSDBTPs (S68), LUXPEST (S69) and REFTPs (S74) was obtained from the [[https://pubchem.ncbi.nlm.nih.gov/classification/#hid=101][Norman Suspect List Exchange Classification]] of the PubChem Classification Browser
[[file:Images/Image6.png]]
/(Image obtained from FAIRTP documentation- Emma and Adelene)/
* In the above figure, under S66|EAWAGTPS, the Transformation Products section has 158 TPs. Select the blue icon showing 158 and it will display the transformations in a new window (see the image below)
[[file:Images/Image7.png]]
/(Image obtained from FAIRTP documentation- Emma and Adelene)/
* The annotation section of the TPs can be found under the [[https://pubchem.ncbi.nlm.nih.gov/compound/408#section=Use-and-Manufacturing][Use and Manufacturing]] section of the PubChem compound page
[[file:Images/Image8.png]]
/(Image obtained from FAIRTP documentation- Emma and Adelene)/
* If a compound has known transformations, the TP information can be found directly under the [[https://pubchem.ncbi.nlm.nih.gov/compound/1615#section=Transformations][Transformations]] section of its PubChem compound page
[[file:Images/Image9.png]]
/(Image obtained from FAIRTP documentation- Emma and Adelene)/
* A complete list of Pharmacology and Biochemistry TPs can be obtained from the [[https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72][PubChem Compound TOC]] of the PubChem Classification Browser
[[file:Images/Image10.png]]
* Removing duplicate TPs between Anliker and existing TPs of Norman Suspect List Exchange
• CIDs were used as the key field to identify unique TPs, and duplicates were removed
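The CID-based de-duplication can be sketched as a simple set comparison. This is a minimal illustration, assuming that the Anliker TPs are available as a list of records with a CID column and that the CIDs of the existing Norman Suspect List Exchange TPs have been collected into one list. The function name unique_new_tps and the default column name "CID" are hypothetical.

```python
def unique_new_tps(candidate_rows, existing_cids, cid_field="CID"):
    """Return candidate rows whose CID is not already known, dropping repeats."""
    seen = {int(cid) for cid in existing_cids}
    unique = []
    for row in candidate_rows:
        raw = row.get(cid_field)
        if raw in (None, ""):
            continue  # rows without a CID cannot be compared; review them by hand
        cid = int(raw)
        if cid not in seen:
            seen.add(cid)  # also removes duplicates within the candidate list
            unique.append(row)
    return unique
```

Records read with csv.DictReader can be passed in directly, because CIDs given as strings are converted to integers before comparison. Rows without a CID are skipped here and need a separate manual check.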
*** Challenges:
Overall, the Anliker dataset cannot be considered a ‘good primary reference’, for the reasons listed below:
* CAS-Nos. were missing for some Anliker entries after the CompTox batch search (most of them had to be filled in manually)
* Substance/compound names differed from those in PubChem, and most of them were misspelt
* Major issue: the Anliker dataset provides only TP information and no parent information (missing links), so many assumptions had to be made to fill the gaps