Software
We are developing systems that may eventually be released under an open source license. At present, all of them are at a preliminary stage; source code is available to collaborators from our SVN server (contact us for access).
Text Mining Infrastructure
Having built a research prototype for in-house work on our local, private corpus, we now seek to develop this approach so that it interoperates with (and provides additional functionality to) other BioNLP groups. Our approach uses the Conditional Random Field (CRF) model co-developed by Andrew McCallum at the University of Massachusetts Amherst. To work well with other groups (such as Larry Hunter's group at the University of Colorado Denver and the Textpresso group run by Paul Sternberg at Caltech), we seek to build on a standard platform such as UIMA (originally developed by IBM and now maintained by Apache).
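As a rough illustration of how a trained linear-chain CRF assigns labels to the tokens of a sentence, the sketch below implements Viterbi decoding over per-token and label-transition scores. It is not our system or any toolkit's API: the BIO label set, the hand-set scores and the `viterbi` function are hypothetical, and a real tagger would learn its weights from annotated text with a trained CRF toolkit (for example McCallum's MALLET).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO label set for tagging brain-region mentions.
LABELS = ["O", "B-REGION", "I-REGION"]


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str] = LABELS,
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y] is the weighted feature score for label y at token t, and
    transitions[(y_prev, y)] scores the move from y_prev to y (missing pairs
    score 0). The partition function Z(x) is the same for every candidate
    sequence, so it can be ignored when taking the argmax.
    """
    if not emissions:
        return []
    # best[t][y]: score of the best partial path that ends in label y at token t
    best: List[Dict[str, float]] = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((yp, best[t - 1][yp] + transitions.get((yp, y), 0.0)) for yp in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy example with hand-set (not learned) scores for "the arcuate nucleus".
emissions = [
    {"O": 2.0, "B-REGION": 0.0, "I-REGION": -1.0},  # "the"
    {"O": 0.5, "B-REGION": 1.5, "I-REGION": 0.0},   # "arcuate"
    {"O": 0.5, "B-REGION": 0.0, "I-REGION": 1.0},   # "nucleus"
]
transitions = {("O", "I-REGION"): -5.0, ("B-REGION", "I-REGION"): 1.0}
tags = viterbi(emissions, transitions)
print(tags)  # ['O', 'B-REGION', 'I-REGION']
```

The strongly negative O to I-REGION transition is what lets the model prefer well-formed region spans; this dependence between neighbouring labels is the main reason to use a CRF rather than classifying each token independently.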
Our distinguishing feature is that we provide tools to extract text accurately from PDF files (which is how the majority of biomedical full-text articles are published). We also base our biocuration interface and NLP annotation system on a PDF-based viewer.
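Our extraction tools are not described in detail here. To give a hedged sense of one sub-problem such tools must handle, namely reading order on a two-column page, the sketch below assumes a PDF parser has already produced positioned text lines. The `TextLine` type and `reading_order` function are hypothetical, and the layout rule is deliberately naive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """Hypothetical output of a PDF parser: one line of text and its position."""
    x0: float   # left edge of the line, in points
    top: float  # distance from the top of the page, in points
    text: str


def reading_order(lines: List[TextLine], page_width: float) -> str:
    """Reassemble a two-column page: the whole left column, then the right.

    Lines are assigned to a column by their left edge, sorted top-to-bottom
    within each column, and a word split by a hyphen at a line end is rejoined.
    """
    mid = page_width / 2
    left = sorted((ln for ln in lines if ln.x0 < mid), key=lambda ln: ln.top)
    right = sorted((ln for ln in lines if ln.x0 >= mid), key=lambda ln: ln.top)
    out = ""
    for ln in left + right:
        text = ln.text.strip()
        if out.endswith("-"):
            out = out[:-1] + text  # naive de-hyphenation
        elif out:
            out += " " + text
        else:
            out = text
    return out


# Lines arrive in content-stream order, which need not match reading order.
page = [
    TextLine(320, 100, "neurons in the"),
    TextLine(50, 100, "We injected tracer into the hypo-"),
    TextLine(50, 112, "thalamus and labelled"),
    TextLine(320, 112, "arcuate nucleus."),
]
print(reading_order(page, page_width=612))
# We injected tracer into the hypothalamus and labelled neurons in the arcuate nucleus.
```

The midpoint split and the unconditional de-hyphenation are simplifications: they would mangle single-column pages, captions that span both columns, and genuinely hyphenated terms, which is why accurate extraction takes more than a plain text dump.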
KE-f-ED model demo
We have completed a preliminary demonstration system of the KE-f-ED model that provides a breakdown of neuroendocrinology experiments from 11 papers. We are currently writing this up as a paper and so will not divulge additional details here.
SciMaps
We contribute data and domain expertise to the development of online tools for mapping the neuroscientific literature. We seek to integrate our text-mining infrastructure with this framework in collaboration with our colleagues at UCI and IU.
BioScholar
The next iteration of the NeuroScholar system will provide a local implementation of the KE-f-ED model with a biocuration interface that is driven by our text-mining approaches.
NeuARt II
NeuARt II provides an interface modeled on neuroanatomical atlases. We will augment this with indexes of the neuroanatomical literature based on extracted brain-region names.
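One simple way to build such an index is to invert per-paper extraction results into a map from canonical region names to papers. The sketch assumes region mentions have already been extracted from each paper; the synonym table, the paper identifiers and `build_region_index` are all hypothetical, and a real index would take its canonical names from the atlas the interface displays.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical synonym table: lower-cased surface forms -> canonical region name.
SYNONYMS = {
    "arcuate nucleus": "Arcuate hypothalamic nucleus",
    "arh": "Arcuate hypothalamic nucleus",
    "paraventricular nucleus": "Paraventricular hypothalamic nucleus",
    "pvh": "Paraventricular hypothalamic nucleus",
}


def build_region_index(mentions: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert {paper ID: extracted region strings} into {region: paper IDs}.

    Strings that do not match the synonym table are skipped rather than guessed.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for paper_id, names in mentions.items():
        for name in names:
            canonical = SYNONYMS.get(name.strip().lower())
            if canonical is not None:
                index[canonical].add(paper_id)
    return dict(index)


# Hypothetical extraction output for two papers.
mentions = {
    "paper-1": ["arcuate nucleus", "PVH"],
    "paper-2": ["ARH", "amygdala"],
}
index = build_region_index(mentions)
print(sorted(index["Arcuate hypothalamic nucleus"]))  # ['paper-1', 'paper-2']
```

Skipping unmatched strings (here "amygdala") keeps papers from being silently attached to the wrong atlas region; one option would be to queue such strings for curator review instead.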
References & Links
Conditional Random Fields
Hanna M. Wallach's introduction to CRFs
An excellent tutorial paper on the CRF model
Text Mining Infrastructure
Unstructured Information Management Architecture (UIMA)
NeuroScholar and NeuARt II







