Commit f82af11f authored by Marek Ostaszewski
parent 5ba70be1
Pipeline #42410 passed with stages in 42 seconds
# OpenNLP-based text mining workflow (by Miguel Vázquez and Arnau Montagud)
Text mining and natural language processing (NLP) were used to help curators and modellers complete their networks. The same pipeline was applied to two different corpora: one is the CORD-19 dataset [PMID: 32510522]; the other is the collection of MEDLINE abstracts linked, via Entrez Gene Reference Into Function (GeneRIF) records, to the genes in the protein-protein interaction (PPI) network from Gordon et al. [PMID: 32353859]. The text-mining pipeline consisted of identifying sentences using OpenNLP ( and mentions of genes using GNormPlus [DOI:]. Sentences mentioning at least two different genes were reported. For each corpus, each of these sentences thus contains one or more pairs of genes that might have a PPI between them. Additionally, a tentative network consisting of all the potential PPIs was derived from each corpus, restricted to genes in the Gordon et al. publication. These four resources (two per corpus) contain the literal and normalized gene mentions, the text of the corresponding sentence, and the coordinates of those sentences in the original document.
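
The pairing step described above can be sketched in Python as follows. This is only an illustration, not the authors' implementation: it assumes sentence splitting and gene normalization have already been done, and the record layout (`doc_id`, `start`, `end`, `mentions`, `gene_id`), the function name `candidate_pairs`, and the gene identifiers `G1`..`G3` are all hypothetical. It also assumes that "restricted to genes in the Gordon et al. publication" means both genes of a pair must belong to that set.

```python
from itertools import combinations


def candidate_pairs(sentences, allowed_genes=None):
    """Map each co-mentioned gene pair to the sentences that support it.

    sentences: iterable of dicts with hypothetical keys 'doc_id', 'start',
        'end' (sentence coordinates) and 'mentions' (list of dicts carrying
        a normalized 'gene_id').
    allowed_genes: optional set of gene ids; if given, only pairs whose two
        genes are both in the set are kept (an assumption about the filter).
    """
    network = {}
    for sent in sentences:
        # Several literal mentions may normalize to the same gene,
        # so work with the set of distinct normalized ids.
        genes = {m["gene_id"] for m in sent["mentions"]}
        if allowed_genes is not None:
            genes &= allowed_genes
        if len(genes) < 2:
            continue  # fewer than two different genes: nothing to report
        coords = (sent["doc_id"], sent["start"], sent["end"])
        # Sorting makes each pair an order-independent key.
        for pair in combinations(sorted(genes), 2):
            network.setdefault(pair, []).append(coords)
    return network


# Toy input with made-up identifiers and coordinates.
example = [
    {"doc_id": "d1", "start": 0, "end": 40,
     "mentions": [{"gene_id": "G1"}, {"gene_id": "G2"}]},
    {"doc_id": "d1", "start": 41, "end": 80,
     "mentions": [{"gene_id": "G1"}, {"gene_id": "G1"}]},
    {"doc_id": "d2", "start": 0, "end": 30,
     "mentions": [{"gene_id": "G1"}, {"gene_id": "G2"}, {"gene_id": "G3"}]},
]
all_pairs = candidate_pairs(example)
restricted = candidate_pairs(example, allowed_genes={"G1", "G2"})
```

Here `all_pairs` plays the role of the per-sentence pair resource and `restricted` the tentative network limited to a reference gene set; the second toy sentence is skipped because both of its mentions normalize to the same gene.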