
Terrier provides APIs for indexing documents, and querying the generated indices. If you are developing applications using Terrier or extending it for your own research, then you may find the following information useful.
Terrier has a flexible, modular architecture: many of its classes have alternative implementations, so most parts of the indexing and retrieval process are easy to change. Any in-depth extension of Terrier should start with an examination of the many properties that can be configured in Terrier. For instance, if you write a new Matching class, you can use it in a TREC-like setting by setting the property trec.matching; if you write a new document weighting model, set the property trec.model to use it, or add it to the etc/trec.models file. For more information about extending the retrieval functionality of Terrier, see Extending Retrieval; for more information about the indexing process that Terrier uses, see Extending Indexing.
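As a rough illustration of the two properties named above, the following sketch parses a properties fragment with the standard java.util.Properties class. It does not use Terrier's own configuration classes. The class names com.example.MyMatching and com.example.MyWeightingModel are hypothetical placeholders for your own classes, and PropertiesSketch is an illustrative name, not part of Terrier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative sketch only: parses a properties fragment, not Terrier's own loader. */
final class PropertiesSketch {

    /** Example fragment; both class names are hypothetical placeholders. */
    static final String FRAGMENT =
            "# use a custom Matching class in a TREC-like setting\n"
          + "trec.matching=com.example.MyMatching\n"
          + "# use a custom document weighting model\n"
          + "trec.model=com.example.MyWeightingModel\n";

    /** Parses key=value lines in the standard Java properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(FRAGMENT);
        System.out.println("trec.matching = " + props.getProperty("trec.matching"));
        System.out.println("trec.model    = " + props.getProperty("trec.model"));
    }
}
```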
All file I/O in Terrier (excluding the Desktop application and Terrier configuration) is performed using the Files class. This allows Terrier to run in a variety of environments. A FileSystem abstraction layer is integrated into the Files class, so that other FileSystem implementations can be plugged in. By default, Terrier ships with two implementations: LocalFileSystem, for reading the local file system using the Java API, and HTTPFileSystem, for reading files accessible over the HTTP or HTTPS protocols. Each filename is checked for a scheme prefix (e.g. "file://"), similar to a URI or URL. If a scheme is detected, Terrier searches its known file system implementations for one that supports that scheme. If no scheme is found in the filename, file:// is used as the default; if the filename starts with http://, the file is fetched over HTTP. Since Terrier 2.2, this abstraction layer has also supported the Hadoop Distributed File System for filenames prefixed with hdfs://. For more information, see Configuring Terrier for Hadoop.
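The scheme lookup described above can be sketched in a few lines of plain Java. This is a minimal illustration of the idea, not Terrier's actual Files code: SchemeLookupSketch, SketchFileSystem and the registry contents are hypothetical names, and the rule for what counts as a valid scheme is an assumption borrowed from URI syntax.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only: not Terrier's actual Files implementation. */
final class SchemeLookupSketch {

    /** Hypothetical stand-in for a file system implementation. */
    interface SketchFileSystem {
        String name();
    }

    private static final String DEFAULT_SCHEME = "file";

    /** Maps each known scheme to the implementation that handles it. */
    private static final Map<String, SketchFileSystem> REGISTRY = new LinkedHashMap<>();

    static {
        REGISTRY.put("file", () -> "local");
        REGISTRY.put("http", () -> "http");
        REGISTRY.put("https", () -> "http");
    }

    /** Returns the scheme prefix of a filename, or "file" when none is present. */
    static String schemeOf(String filename) {
        int idx = filename.indexOf("://");
        if (idx <= 0) {
            return DEFAULT_SCHEME;
        }
        String candidate = filename.substring(0, idx);
        // Assumption: a scheme looks like a URI scheme (letter, then letters, digits, '+', '.', '-').
        if (!candidate.matches("[A-Za-z][A-Za-z0-9+.-]*")) {
            return DEFAULT_SCHEME;
        }
        return candidate.toLowerCase(Locale.ROOT);
    }

    /** Finds the registered implementation for a filename, or fails if none supports its scheme. */
    static SketchFileSystem resolve(String filename) {
        String scheme = schemeOf(filename);
        SketchFileSystem fs = REGISTRY.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
        }
        return fs;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/data/collection.spec").name());       // prints "local"
        System.out.println(resolve("http://example.org/docs.txt").name()); // prints "http"
    }
}
```

In this sketch, supporting a further scheme such as hdfs:// would amount to adding one more entry to the registry.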
The Files layer can also transform paths to filenames on the fly. For example, if a certain HTTP namespace is also accessible as a local file system, the Files layer can be told about this using Files.addPathTransformation(). If you have a slow network file system, consider using the built-in caching layer in Files.
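To make the idea of a path transformation concrete, here is a minimal sketch that rewrites paths using an ordered list of regular-expression rules. It only illustrates the concept and is not the implementation behind Files.addPathTransformation(), whose exact signature should be checked in the Terrier Javadoc. The class PathTransformSketch, its methods, the host mirror.example.org and the mount point /mnt/mirror/ are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch only: regex-based path rewriting, not Terrier's Files class. */
final class PathTransformSketch {

    /** One rewrite rule: a compiled pattern and its replacement text. */
    private static final class Rule {
        final Pattern find;
        final String replacement;

        Rule(Pattern find, String replacement) {
            this.find = find;
            this.replacement = replacement;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers a rule; rules are applied in the order they were added. */
    void addTransformation(String findRegex, String replacement) {
        rules.add(new Rule(Pattern.compile(findRegex), replacement));
    }

    /** Applies every registered rule to the path and returns the result. */
    String transform(String path) {
        String result = path;
        for (Rule rule : rules) {
            result = rule.find.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        PathTransformSketch sketch = new PathTransformSketch();
        // Hypothetical mapping: an HTTP namespace that is also mounted locally.
        sketch.addTransformation("^http://mirror\\.example\\.org/", "/mnt/mirror/");
        System.out.println(sketch.transform("http://mirror.example.org/corpus/doc1.txt"));
        // prints "/mnt/mirror/corpus/doc1.txt"
    }
}
```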
Additional file system implementations can implement the methods of the FileSystem interface that they support, and register themselves by calling the Files.addFileSystemCapability() method. Each FileSystem indicates which operations it supports on a file or path by returning the bit-wise OR of the relevant constants defined in Files.FSCapability.
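The bit-wise OR convention for capabilities can be illustrated as follows. The constant names and values below are assumptions made for this sketch; consult Files.FSCapability in the Terrier Javadoc for the real constants. CapabilitySketch and its methods are likewise hypothetical.

```java
/** Illustrative sketch only: capability flags combined with bit-wise OR. */
final class CapabilitySketch {

    // Hypothetical flags; each occupies its own bit so they can be combined freely.
    static final int READ = 1;
    static final int WRITE = 1 << 1;
    static final int RANDOM_READ = 1 << 2;
    static final int STAT = 1 << 3;

    /** Capabilities that a read-only file system (for example, one backed by HTTP) might report. */
    static int readOnlyCapabilities() {
        return READ | STAT;
    }

    /** Returns true only if every bit in 'required' is also set in 'capabilities'. */
    static boolean supports(int capabilities, int required) {
        return (capabilities & required) == required;
    }

    public static void main(String[] args) {
        int caps = readOnlyCapabilities();
        System.out.println(supports(caps, READ));         // prints "true"
        System.out.println(supports(caps, WRITE));        // prints "false"
        System.out.println(supports(caps, READ | STAT));  // prints "true"
    }
}
```

Packing the flags into a single integer lets a caller test for several operations at once with one mask comparison.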
The main Terrier distribution comes pre-compiled as Java bytecode, and can be run on any Java 1.5 JDK. You should not need to compile Terrier unless you have altered the Terrier source code and wish to check or use your changes.
Terrier is distributed with an Ant build.xml file for building Terrier. It compiles the files in the src folder, and creates terrier-$VERSION.jar files in the lib/ folder. The following Ant targets are defined:
If you use the Eclipse IDE, then you can get it to correctly compile Terrier by installing the Antlr Eclipse Plugin.
Copyright © 2010 University of Glasgow | All Rights Reserved