Terrier Information Retrieval Platform version 1.0.1

This is the documentation for Terrier Information Retrieval Platform version 1.0.1. Terrier is open source and it is distributed under the Mozilla Public License (MPL) Version 1.1. The license can also be found in the file LICENSE.txt, in the top directory of the distribution.

Overview

Terrier is a cross-platform framework for the rapid development of Information Retrieval (IR) applications, and it is implemented in Java. Among other features, it offers a range of IR models derived from the Divergence From Randomness (DFR) Framework [1], along with several classic IR models, such as tf-idf, BM25 and Ponte-Croft's language model.

Terrier has been tested on Linux, Windows and Mac OS X. On Mac OS X (Darwin), the Linux scripts can be used. Terrier is distributed as a .zip file, or as a .tar.gz file created using the GNU tar utility. The only requirement for using Terrier is a Java Virtual Machine.
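
Since a Java Virtual Machine is the only prerequisite, it may be worth confirming that one is reachable before going further. The following is a minimal sketch and is not part of the Terrier distribution: the function name have_command is hypothetical, and the sketch assumes that the JVM launcher is called java and is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Terrier): succeeds if the named
# command can be found on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the JVM launcher is named "java".
if have_command java; then
    # Print the version of the JVM that was found.
    java -version
else
    echo "No 'java' executable found on the PATH; install a JVM first." >&2
fi
```

If the launcher lives elsewhere on your system, the same helper can be called with its name instead.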

After downloading and uncompressing Terrier, you can start using it. For the remainder of this document, we assume that the operating system is Linux, the shell is Bash, and the .tar.gz distribution is extracted in the directory /local:

bash-2.05b$ cd /local
bash-2.05b$ gzip -dc terrier-1.0.1.tar.gz | tar xf -
bash-2.05b$ ls terrier
bin  etc  licenses     Makefile    share  var
doc  lib  LICENSE.txt  README.txt  src
bash-2.05b$ cd terrier
bash-2.05b$ ls
bin  etc  licenses     Makefile    share  var
doc  lib  LICENSE.txt  README.txt  src
bash-2.05b$

A brief description of the created directories and files is given below:

bin          A directory that contains scripts for setting up and running the applications distributed with Terrier.
doc          A directory that contains documentation on setting up Terrier, indexing with it and programming with it, as well as source code level documentation generated automatically with Javadoc.
etc          A directory that contains all the configuration files for Terrier.
lib          A directory that contains a jar file with the compiled classes of Terrier and all the other third-party libraries used by Terrier.
licenses     A directory that contains license information about Terrier and all the third-party libraries used by Terrier.
share        A directory that contains various files used by the Terrier platform and its applications (e.g. image files for DesktopTerrier).
src          A directory that contains the source code of Terrier.
var          A directory that contains the data structures created by Terrier during indexing, and the results of batch retrieval experiments.
LICENSE.txt  The Mozilla Public License.
Makefile     A makefile for building the Terrier platform.
README.txt   A short description of the distribution.
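
The layout described above can be checked with a few lines of shell after extraction. This is a sketch only: the function name check_terrier_layout is hypothetical and is not shipped with Terrier, and the list of entries is copied from the directory listing shown earlier.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Terrier): reports any entry from the
# listing above that is missing under the given directory.
# Returns 0 if every entry is present, 1 otherwise.
check_terrier_layout() {
    root=$1
    status=0
    for dir in bin doc etc lib licenses share src var; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing directory: $root/$dir" >&2
            status=1
        fi
    done
    for file in LICENSE.txt Makefile README.txt; do
        if [ ! -f "$root/$file" ]; then
            echo "missing file: $root/$file" >&2
            status=1
        fi
    done
    return $status
}
```

With the assumptions used in this document, it would be called as check_terrier_layout /local/terrier. Any missing entry is reported on standard error and the function returns a non-zero status.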

By this stage, Terrier is ready to use with the sample applications it comes with. If you want to use Terrier for developing your own applications, you may refer to the guide for developing with Terrier, or to the automatically generated source code documentation. You may also see a list of configurable properties, or a list of new features and changes for each version of Terrier.

The Terrier team actively welcomes patches, especially in areas identified in the TO-DO list. If you wish to alter Terrier, please first join the mailing lists and discuss your needs there, and if possible document the results on the Terrier Wiki. As with any large system, there are often many ways of achieving the same outcome. The Terrier development team have used Terrier in many contexts since 2001, and are able to provide advice on methods, along with their advantages and pitfalls.

For more information, you can visit the homepage of Terrier, or you can contact the Terrier team at terrier@dcs.gla.ac.uk. Additional information can be found in [2].

[1] G. Amati and C.J. van Rijsbergen. Probabilistic models of information retrieval based on measuring divergence from randomness. ACM Transactions on Information Systems, 20(4):357-389, 2002.
[2] I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald and D. Johnson. Terrier Information Retrieval Platform. In Proceedings of ECIR'05, Santiago de Compostela, Spain, 2005.