
Installing and Running Terrier

If you are interested in using Terrier straight away to index and retrieve from standard test collections, follow the steps described below. We provide step-by-step instructions for installing Terrier on Linux and Windows and guide you through your first indexing and retrieval steps on the TREC WT2G test collection.

Terrier Requirements

Terrier's only requirement is an installed Java JRE 1.6.0 or higher. You can download the JRE, or the JDK (if you want to develop with Terrier or run the web-based interface), from the Java website.

Download Terrier

A copy of Terrier version 3.6 can be downloaded from the following location: [Terrier Home]. The site offers pre-compiled releases of the newest and previous Unix and Windows versions of Terrier.

Step by Step Unix Installation

After downloading Terrier, copy the file to the directory where you want to install Terrier. Navigate to this directory and run the following command to decompress the distribution:

tar -zxvf terrier-3.6.tar.gz

This creates a terrier directory in your current directory. Next, make sure that the correct Java version is available on the system. Type:

echo $JAVA_HOME

If the environment variable $JAVA_HOME is set, this command will output the path of your Java installation (e.g. /usr/java/jre1.6.0). If this path points to a suitable Java version (1.6.0 or later), you're all done. Otherwise, download Java 1.6 from the JRE 1.6 download website and set the environment variable by adding the following line to either your /etc/profile or ~/.bashrc file:

export JAVA_HOME="Absolute_Path_of_Java_Installation"
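The version check can also be scripted. The following is a minimal POSIX shell sketch, not a script shipped with Terrier; the function names java_version_ok and check_java_home are hypothetical. It reads the version string printed by $JAVA_HOME/bin/java -version and accepts both the old 1.x numbering (e.g. 1.6.0_45) and the newer scheme (e.g. 11.0.2):

```shell
#!/bin/sh
# Sketch (not part of Terrier): check that JAVA_HOME points to Java 1.6 or later.

# java_version_ok VERSION: succeeds if VERSION (e.g. "1.6.0_45" or "11.0.2")
# denotes Java 6 or later. Old releases use a "1.x" scheme; Java 9+ do not.
java_version_ok() {
    major=$(printf '%s\n' "$1" | cut -d. -f1)
    if [ "$major" = "1" ]; then
        # Legacy scheme: "1.6.0_45" -> the release number is the second field.
        feature=$(printf '%s\n' "$1" | cut -d. -f2)
    else
        feature=$major
    fi
    # Drop any suffix such as "-ea" so the value is a plain integer.
    feature=$(printf '%s\n' "$feature" | sed 's/[^0-9].*//')
    [ -n "$feature" ] && [ "$feature" -ge 6 ]
}

# check_java_home: reports the Java version found under $JAVA_HOME and
# returns non-zero if JAVA_HOME is unset or the version is too old.
check_java_home() {
    if [ -z "$JAVA_HOME" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    # "java -version" writes to stderr, e.g.: java version "1.6.0_45"
    version=$("$JAVA_HOME/bin/java" -version 2>&1 |
        sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    if java_version_ok "$version"; then
        echo "Java $version found in $JAVA_HOME"
    else
        echo "Java 1.6 or later required (found: ${version:-none})" >&2
        return 1
    fi
}
```

After sourcing the file, running check_java_home prints the detected version, or returns a non-zero status if JAVA_HOME is unset or points to a Java release older than 1.6.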

Step by Step Windows Installation

To use Terrier, simply extract the contents of the downloaded Zip file into a directory of your choice. Terrier requires Java version 1.6 or higher. If your system does not meet this requirement, you can download an appropriate version from the JRE download website. Finally, Terrier assumes that java.exe is on the path, so use the System applet in the Control Panel to ensure that your Java\bin folder is in your PATH environment variable.

Using Terrier

Terrier comes with three applications:

Batch (TREC) Terrier

This allows you to easily index, retrieve, and evaluate results on TREC collections. In the next section, we provide a step-by-step tutorial on how to use this application.

Interactive Terrier

This allows you to do interactive retrieval, and is a quick way to test Terrier. On Windows, you can start Interactive Terrier by running the interactive_terrier.bat file in Terrier's bin directory. On a Unix system or Mac, run the interactive_terrier.sh file instead. You can configure the retrieval functionality of InteractiveTerrier using the properties described in the InteractiveQuerying class.

Desktop Terrier

A sample desktop search application. To learn more about it, take a look at its tutorial.

Tutorial: How to use the Batch (TREC) Terrier

Indexing

This guide will provide step-by-step instructions for using Terrier to index a TREC collection. We assume that the operating system is Linux, and that the collection, along with the topics and the relevance assessments (qrels), is stored in the directory /local/collections/WT2G/.

1. Go to the Terrier folder.

cd terrier

2. Set up Terrier to use a TREC test collection by calling

./bin/trec_setup.sh "Absolute_Path_To_Collection_Files"
in our example:
./bin/trec_setup.sh /local/collections/WT2G/

This creates a collection.spec file in the "etc" directory, listing the files found in the specified collection directory.

3. If necessary, modify the collection.spec file. This might be required if the collection directory contains files that you don't want to index. Alternatively, you can generate a filtered list directly with the following command:

find /local/collections/WT2G/ -type f | grep -v "PATTERN" > etc/collection.spec
where "PATTERN" is the regular expression used to identify the files that should not be indexed.
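In the WT2G layout assumed in this tutorial, the topics and qrels live in the info/ subdirectory of the collection, so they are natural candidates for exclusion. The sketch below wraps the command above in a hypothetical shell function, make_collection_spec, with the pattern /info/ used purely as an example:

```shell
#!/bin/sh
# make_collection_spec DIR PATTERN OUTPUT  (hypothetical helper, not part of Terrier)
# Lists every regular file under DIR, drops paths matching the regular
# expression PATTERN, and writes the remaining paths to OUTPUT, one per line.
# OUTPUT should not live inside DIR, or the listing would include it.
make_collection_spec() {
    find "$1" -type f | grep -v -- "$2" > "$3"
}

# Example (assumed WT2G layout): skip the topics and qrels under info/.
# make_collection_spec /local/collections/WT2G/ '/info/' etc/collection.spec
```

Checking the result with wc -l etc/collection.spec before indexing is a cheap way to confirm that the expected number of files remains.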

4. Now we are ready to index the collection.

./bin/trec_terrier.sh -i

  NB: If you don't need the direct index (required, e.g., for query expansion), you can use bin/trec_terrier.sh -i -j for the faster single-pass indexing introduced in Terrier 2.0.

Retrieval

In order to perform retrieval from the just indexed test collection, follow the steps described below.

1. First of all we have to do some configuration. Much of Terrier's functionality is controlled by properties. You can pre-set these in the etc/terrier.properties file, or specify each on the command line. In the following, we're going to use the command line to specify the appropriate properties. To perform retrieval and evaluate the results of a batch of queries, we need to know:

  1. The location of the queries (also known as topic files) - specified using trec.topics
  2. The weighting model (e.g. TF_IDF) to use - specified using trec.model - along with any parameters.
  3. The corresponding relevance assessments file (or qrels) for the topics - specified by trec.qrels.
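As an alternative to the -D options used below, the same three properties can be pre-set in etc/terrier.properties, which uses the standard Java key=value properties format. A minimal sketch, assuming the WT2G paths from this tutorial (write_retrieval_properties is a hypothetical helper name):

```shell
#!/bin/sh
# write_retrieval_properties FILE  (hypothetical helper, not part of Terrier)
# Appends the three retrieval properties used in this tutorial to FILE,
# e.g. etc/terrier.properties. The paths assume the WT2G layout.
write_retrieval_properties() {
    cat >> "$1" <<'EOF'
trec.topics=/local/collections/WT2G/info/topics.401-450
trec.model=PL2
trec.qrels=/local/collections/WT2G/info/qrels.trec8.small_web.gz
EOF
}

# Example: write_retrieval_properties etc/terrier.properties
```

With these set, the corresponding -D options in the following commands can be omitted. Because the function appends, running it twice leaves duplicate keys; Java's Properties loader keeps the last value read for each key.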

2. Let's do a retrieval run:

./bin/trec_terrier.sh -r -Dtrec.model=PL2 -c 10.99 -Dtrec.topics=/local/collections/WT2G/info/topics.401-450

So what do these options mean? The "-r" parameter instructs Terrier to perform retrieval, while "-c" sets the value of the weighting model's parameter (here, c = 10.99). PL2 is an advanced Divergence From Randomness weighting model, which is usually more effective than TF_IDF (to learn more about the model, see the description of the DFR framework).

If all goes well, this will result in a .res file in the var/results directory called PL2c10.99_0.res.

3. Now we will evaluate the obtained results by using the "-e" parameter.

./bin/trec_terrier.sh -e -Dtrec.qrels=/local/collections/WT2G/info/qrels.trec8.small_web.gz

Note that Terrier can easily read compressed files (e.g. Gzip compression - indicated by the .gz suffix).

Terrier will look in the var/results directory, evaluate each .res file, and save the output in a .eval file with the same name as the corresponding .res file.
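Since each .eval file takes its name from the corresponding .res file, a short shell loop can report runs that have not been evaluated yet. This is a sketch relying only on that naming convention; list_unevaluated is a hypothetical helper:

```shell
#!/bin/sh
# list_unevaluated DIR  (hypothetical helper, not part of Terrier)
# Prints every DIR/*.res file that has no matching DIR/*.eval file.
list_unevaluated() {
    for res in "$1"/*.res; do
        # If no .res file exists, the glob stays literal; skip it.
        [ -e "$res" ] || continue
        [ -e "${res%.res}.eval" ] || printf '%s\n' "$res"
    done
}
```

For example, list_unevaluated var/results should print nothing once ./bin/trec_terrier.sh -e has processed every run.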

4. Now we will perform retrieval again, but this time with query expansion (QE) enabled by using the "-q" parameter in addition to "-r".

./bin/trec_terrier.sh -r -q -Dtrec.model=PL2 -c 10.99 -Dtrec.topics=/local/collections/WT2G/info/topics.401-450

See the Information Retrieval Wiki page on Query Expansion for more information about QE. Note that your index must have a direct index structure to support QE, and single-pass indexing does not build one by default (see Configuring Indexing for more information). Afterwards, run the evaluation again by using trec_terrier.sh with the "-e" parameter.

./bin/trec_terrier.sh -e -Dtrec.qrels=/local/collections/WT2G/info/qrels.trec8.small_web.gz

5. Now we can look at the Mean Average Precision (MAP) values of all the runs by executing:

tail -1 var/results/*.eval
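With GNU tail, the command above prints a "==> filename <==" header above each value. To get one line per run instead, the following sketch can be used; it assumes, as the tail -1 command suggests, that the last whitespace-separated field of each .eval file's final line is the MAP. summarise_map is a hypothetical helper:

```shell
#!/bin/sh
# summarise_map DIR  (hypothetical helper, not part of Terrier)
# Prints "run<TAB>value" for each DIR/*.eval file, taking the last
# whitespace-separated field of the file's final line (assumed to be the MAP).
summarise_map() {
    for f in "$1"/*.eval; do
        [ -e "$f" ] || continue
        map=$(tail -n 1 "$f" | awk '{ print $NF }')
        printf '%s\t%s\n' "$(basename "$f" .eval)" "$map"
    done
}
```

Piping the output through sort, as in summarise_map var/results | sort -k2,2nr, ranks the runs from highest to lowest MAP.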

The MAP obtained for the first run should be 0.3140, and the MAP for the run using query expansion should be 0.3305.
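The batch steps of this tutorial can be collected into a single script. The sketch below is one possible arrangement, not a script distributed with Terrier: TERRIER_HOME, COLLECTION and run_wt2g_experiment are hypothetical names, and the paths default to the WT2G layout assumed throughout. It uses the two-pass indexer (-i without -j) because the query expansion run needs the direct index, and evaluates both runs with a single -e call, since that call evaluates every .res file in var/results:

```shell
#!/bin/sh
# Sketch of the batch workflow from this tutorial (not shipped with Terrier).
# Override TERRIER_HOME and COLLECTION to match your installation.
TERRIER_HOME=${TERRIER_HOME:-terrier}
COLLECTION=${COLLECTION:-/local/collections/WT2G}

run_wt2g_experiment() {
    topics=$COLLECTION/info/topics.401-450
    qrels=$COLLECTION/info/qrels.trec8.small_web.gz
    # Subshell: the cd does not affect the caller; && stops at the first failure.
    (
        cd "$TERRIER_HOME" &&
        ./bin/trec_setup.sh "$COLLECTION/" &&
        # Two-pass indexing keeps the direct index needed for query expansion.
        ./bin/trec_terrier.sh -i &&
        # Baseline PL2 run, then the same run with query expansion (-q).
        ./bin/trec_terrier.sh -r -Dtrec.model=PL2 -c 10.99 -Dtrec.topics="$topics" &&
        ./bin/trec_terrier.sh -r -q -Dtrec.model=PL2 -c 10.99 -Dtrec.topics="$topics" &&
        # Evaluate every .res file, then show the last line of each .eval file.
        ./bin/trec_terrier.sh -e -Dtrec.qrels="$qrels" &&
        tail -n 1 var/results/*.eval
    )
}
```

Running run_wt2g_experiment from the directory that contains terrier should reproduce both runs; chaining the steps with && means a failed indexing step prevents retrieval from running against an incomplete index.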

Interacting with Terrier

You can interact with your index using the Web-based querying interface. First, start the included HTTP server:

./bin/http_terrier.sh

You can then enter queries and view results at http://localhost:8080. If you're running Terrier on another machine, replace localhost with the hostname of the remote machine. For more information on configuring the Web interface, please see Using Web-based results.
