Funded Projects
The Dragon Toolkit
The Dragon Toolkit is a Java-based development package for academic research
in language modeling (LM) and information retrieval (IR). Language modeling
has recently emerged as an attractive framework for text information
retrieval and text mining (TM). However, most free Java-based search engines,
such as Lucene, do not support LM well. The Lemur toolkit is designed for LM
and IR but is written in C and C++, which may be a hindrance to people who
prefer Java. The Dragon Toolkit is therefore tailored to researchers who work
on large-scale LM and IR and prefer Java programming. Moreover, unlike Lucene
and Lemur, it provides built-in support for semantic-based IR and TM. The
Dragon Toolkit integrates and implements a set of NLP tools that enable it to
index text collections with various representation schemes, including words,
phrases, ontology-based concepts, and relationships. To minimize learning
time, however, we intentionally keep the package small and simple. The toolkit
lacks some features that are part of the Lemur toolkit, such as distributed
IR and cross-language IR.
How to Cite Dragon Toolkit
If you are using the Dragon Toolkit for research work, please cite it in your
published papers: Zhou, X., Zhang, X., and Hu, X., The Dragon Toolkit, Data
Mining & Bioinformatics Lab, iSchool at
Download Dragon Toolkit
Get the Dragon Toolkit source code and binary libraries (including external
libraries) and the necessary supporting data. Click here to download.