From f82af11f4da02716632f6a7b1e70a6852156f369 Mon Sep 17 00:00:00 2001
From: Marek Ostaszewski
Date: Fri, 28 May 2021 08:50:43 +0000
Subject: [PATCH] Update Readme.md

---
 Resources/Text mining/Readme.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Resources/Text mining/Readme.md b/Resources/Text mining/Readme.md
index 2b170a1..993426f 100644
--- a/Resources/Text mining/Readme.md
+++ b/Resources/Text mining/Readme.md
@@ -1,3 +1,3 @@
-#OpenNLP-based text mining workflow (by Miguel Vázquez and Arnau Montagud)
+# OpenNLP-based text mining workflow (by Miguel Vázquez and Arnau Montagud)
 
 Text mining and natural language processing (NLP) were used to help curators and modellers complete their networks. The same pipeline was applied to two different corpora: One is the CORD-19 [PMID: 32510522], the other is the collection of MEDLINE abstracts associated to the genes in the PPI network from Gordon et al [PMID: 32353859] using the Entrez Gene reference into function (GeneRIF). The text-mining pipeline consisted in identifying sentences using OpenNLP (https://opennlp.apache.org/) and mentions of genes using GNormPlus [DOI: dx.doi.org/10.1155/2015/918710]. Sentences with mentions to at least two different genes were reported. Each of these sentences thus contains one or more pairs of genes that might have a protein-protein interaction (PPI) between them, for each of the corpora. Additionally, a tentative network consisting of all the potential PPIs is derived from each corpora but restricted to only genes in the Gordon et al. publication. These 4 resources (2 for each corpora) contain the literal and normalized gene mentions, the text of the corresponding sentence, and the coordinates of those sentences in the original document.
-- 
GitLab
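
The co-mention step described in the README text can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call OpenNLP or GNormPlus: it assumes sentence splitting and gene normalization have already been done, and that each sentence is a dictionary with hypothetical `text` and `genes` keys. The function name `co_mention_pairs`, the `allowed_genes` parameter, and the gene identifiers `G1` to `G3` are placeholders chosen for this example.

```python
from itertools import combinations


def co_mention_pairs(sentences, allowed_genes=None):
    """Collect candidate gene pairs from sentences mentioning >= 2 different genes.

    sentences: iterable of dicts with hypothetical keys "text" (sentence text)
        and "genes" (normalized gene identifiers found in that sentence).
    allowed_genes: optional set of identifiers; when given, only these genes
        are considered (e.g. to restrict the network to a reference gene list).
    Returns a dict mapping each sorted gene pair to the supporting sentences.
    """
    pairs = {}
    for sent in sentences:
        genes = set(sent["genes"])  # repeated mentions of one gene count once
        if allowed_genes is not None:
            genes &= allowed_genes
        if len(genes) < 2:
            continue  # fewer than two different genes: nothing to report
        for a, b in combinations(sorted(genes), 2):
            pairs.setdefault((a, b), []).append(sent["text"])
    return pairs


# Toy input with placeholder identifiers, not real annotations.
toy_sentences = [
    {"text": "s1", "genes": ["G1", "G2", "G1"]},
    {"text": "s2", "genes": ["G3"]},
    {"text": "s3", "genes": ["G2", "G3", "G1"]},
]

all_pairs = co_mention_pairs(toy_sentences)
restricted = co_mention_pairs(toy_sentences, allowed_genes={"G1", "G2"})
```

Calling the function without `allowed_genes` corresponds to the per-sentence report, while passing a gene set corresponds to the tentative network restricted to a reference gene list. A co-mention is only a candidate interaction, so each pair keeps its supporting sentences for manual review.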