Commit 8ad1c154 authored by Marek Ostaszewski

Merge branch 'marek.ostaszewski-master-patch-29936' into 'master'


See merge request !297
parents 5ba70be1 f82af11f
# OpenNLP-based text mining workflow (by Miguel Vázquez and Arnau Montagud)
Text mining and natural language processing (NLP) were used to help curators and modellers complete their networks. The same pipeline was applied to two different corpora: one is CORD-19 [PMID: 32510522]; the other is the collection of MEDLINE abstracts associated with the genes in the PPI network from Gordon et al. [PMID: 32353859] through Entrez Gene Reference Into Function (GeneRIF) annotations. The text-mining pipeline consisted of identifying sentences using OpenNLP and mentions of genes using GNormPlus [DOI:]. Sentences mentioning at least two different genes were reported. Each of these sentences thus contains one or more pairs of genes that might have a protein-protein interaction (PPI) between them. Additionally, for each corpus, a tentative network consisting of all the potential PPIs was derived, restricted to genes in the Gordon et al. publication. These four resources (two per corpus) contain the literal and normalized gene mentions, the text of the corresponding sentence, and the coordinates of those sentences in the original document.
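
The co-mention logic described above can be illustrated with a minimal Python sketch. It is not the workflow's actual code: the regex sentence splitter is a naive stand-in for OpenNLP, the dictionary lookup is a stand-in for GNormPlus, and all function names, the record layout, the gene names (`GENEA`, `GA1`, ...) and the example text are hypothetical.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive stand-in for OpenNLP: return (start, end, sentence) tuples."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = m.group()
        stripped = raw.strip()
        if stripped:
            # Skip leading whitespace so offsets point at the sentence itself.
            start = m.start() + (len(raw) - len(raw.lstrip()))
            spans.append((start, start + len(stripped), stripped))
    return spans


def find_gene_mentions(sentence, lexicon):
    """Dictionary stand-in for GNormPlus: literal mention -> normalized gene."""
    mentions = []
    for literal, normalized in lexicon.items():
        for m in re.finditer(r"\b" + re.escape(literal) + r"\b", sentence):
            mentions.append((m.start(), literal, normalized))
    return sorted(mentions)


def comention_sentences(doc_id, text, lexicon):
    """Keep sentences that mention at least two different normalized genes."""
    records = []
    for start, end, sentence in split_sentences(text):
        mentions = find_gene_mentions(sentence, lexicon)
        genes = sorted({norm for _, _, norm in mentions})
        if len(genes) >= 2:
            records.append({
                "doc": doc_id,
                "start": start,  # sentence coordinates in the document
                "end": end,
                "sentence": sentence,
                "mentions": [(lit, norm) for _, lit, norm in mentions],
                "pairs": list(combinations(genes, 2)),  # candidate PPIs
            })
    return records


def tentative_network(records, allowed_genes):
    """Collect candidate pairs where both genes are in the reference set."""
    return {
        (a, b)
        for record in records
        for a, b in record["pairs"]
        if a in allowed_genes and b in allowed_genes
    }


# Made-up example: "GA1" is treated as an alias of GENEA.
text = "GENEA binds GENEB in vitro. GENEC was also measured. GA1 and GENEC co-localize."
lexicon = {"GENEA": "GENEA", "GA1": "GENEA", "GENEB": "GENEB", "GENEC": "GENEC"}

records = comention_sentences("doc-1", text, lexicon)
network = tentative_network(records, allowed_genes={"GENEA", "GENEB"})
```

Here `records` plays the role of the per-corpus sentence resource (literal and normalized mentions, sentence text, coordinates), and `network` plays the role of the tentative network restricted to a reference gene set, such as the genes in the Gordon et al. publication.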