Configuring Terrier
Terrier is configured through a small number of files, all in the etc/ directory. Some of these hold settings specific to particular applications, such as collection.spec (indexing), or trec.models and trec.topics.list (TrecTerrier). The two most central files, however, are terrier.properties and terrier-log.xml. In terrier.properties, you can set any of the properties that Terrier defines. The Properties documentation page lists these. The default terrier.properties file is given below:
#directory names
terrier.home=/local/terrier

#default controls for query expansion
querying.postprocesses.order=QueryExpansion
querying.postprocesses.controls=qe:QueryExpansion

#default and allowed controls
querying.default.controls=c:1.0,start:0,end:999
querying.allowed.controls=c,scope,qe,qemodel,start,end

#document tags specification
#for processing the contents of
#the documents, ignoring DOCHDR
TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.skip=DOCHDR

#query tags specification
TrecQueryTags.doctag=TOP
TrecQueryTags.idtag=NUM
TrecQueryTags.process=TOP,NUM,TITLE
TrecQueryTags.skip=DESC,NARR

#stop-words file
stopwords.filename=stopword-list.txt

#create a temporary lexicon after
#indexing bundle.size documents
bundle.size=2500

#the processing stages a term goes through
termpipelines=Stopwords,PorterStemmer
In the terrier.properties file, properties are specified in the format name=value. Comments are lines starting with #.
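The name=value format with # comments can be read by Java's standard java.util.Properties class. The sketch below is only an illustration of the format, not Terrier's own loading code: the class name PropertiesFormatDemo and the helper parse are hypothetical, and it assumes the file follows the usual java.util.Properties conventions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; not part of Terrier.
public class PropertiesFormatDemo {

    // Parses text in the name=value format. Lines starting with # are
    // treated as comments and ignored by Properties.load.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A small excerpt of the default file shown above.
        String sample = String.join("\n",
                "#stop-words file",
                "stopwords.filename=stopword-list.txt",
                "#the processing stages a term goes through",
                "termpipelines=Stopwords,PorterStemmer");

        Properties props = parse(sample);
        System.out.println(props.getProperty("termpipelines"));
        System.out.println(props.getProperty("stopwords.filename"));
        System.out.println(props.size()); // comment lines are not counted
    }
}
```

Run on the excerpt, the two comment lines are skipped and only the two name=value pairs are kept.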
When looking up a property, Terrier first checks the System properties provided by Java, so a value given there takes precedence over the one in terrier.properties. This means that you can set any property on the command line. For example, to index without a stemmer, you could use the command line:
[user@machine]$ TERRIER_OPTIONS="-Dtermpipelines=Stopwords" bin/trec_terrier.sh -i
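The lookup order described above (Java system properties first, then the properties file) can be sketched as follows. This is a minimal illustration under assumptions, not Terrier's actual implementation: the class PropertyLookupDemo and the helper lookup are hypothetical names, and the file contents are simulated with an in-memory Properties object.

```java
import java.util.Properties;

// Hypothetical demo class; not part of Terrier.
public class PropertyLookupDemo {

    // Returns the JVM system property (as set with -Dname=value) if present,
    // otherwise the value from the file, otherwise the supplied default.
    static String lookup(Properties fileProps, String name, String defaultValue) {
        String fromSystem = System.getProperty(name);
        if (fromSystem != null) {
            return fromSystem;
        }
        return fileProps.getProperty(name, defaultValue);
    }

    public static void main(String[] args) {
        // Stand-in for the values read from etc/terrier.properties.
        Properties fileProps = new Properties();
        fileProps.setProperty("termpipelines", "Stopwords,PorterStemmer");

        // No override yet: the file value is used.
        System.out.println(lookup(fileProps, "termpipelines", ""));

        // Simulates starting the JVM with -Dtermpipelines=Stopwords.
        System.setProperty("termpipelines", "Stopwords");
        System.out.println(lookup(fileProps, "termpipelines", ""));
    }
}
```

When the JVM is started without -Dtermpipelines, the first call returns the file value and the second returns the overriding system property, which mirrors the effect of the TERRIER_OPTIONS command line above.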
As another example, you can run Terrier once for each of several settings of the expansion.terms property of query expansion. The following loop tries the values 2 through 9:
[user@machine]$ for ((i=2; i<10; i++)); do TERRIER_OPTIONS="-Dexpansion.terms=$i" bin/trec_terrier.sh -r -q; done
Terrier now uses Log4j for logging. You can control the amount of logging information that Terrier outputs by altering the log4j config in etc/terrier-log.xml. For more information about configuring Log4j, see the Log4j documentation.
Copyright © 2015 University of Glasgow | All Rights Reserved