Description of Configurable properties of Terrier

Terrier allows the user to configure many different aspects of the framework, in order to be adaptable to the specific needs of different applications. Here, we describe the properties that are used while indexing or retrieving. A sample of how to set up the basic properties can be found in etc/terrier.properties.sample.
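
As a rough illustration of the file format only (this is not the content of etc/terrier.properties.sample), a properties file might set a few of the properties described below as key=value pairs. The installation path is a hypothetical placeholder; the other values are simply the defaults listed on this page, written out explicitly.

```properties
# Hypothetical installation directory; replace with your own.
terrier.home=/opt/terrier
# Documented defaults, stated explicitly.
termpipelines=Stopwords,PorterStemmer
trec.output.format.length=1000
lowercase=true
```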

Property termcodes.garbagecollect
Used in uk.ac.gla.terrier.utility.TermCodes
Possible values true, false
Default value true
Configures Enables or disables garbage collection while resetting the hash map of the class TermCodes.

Property interactive.output.format.length
Used in uk.ac.gla.terrier.applications.InteractiveQuerying
Possible values integer number > 0
Default value 1000
Configures the maximum number of results to be displayed for Interactive querying

Property trec.output.format.length
Used in uk.ac.gla.terrier.applications.TRECQuerying, uk.ac.gla.terrier.applications.TRECQueryingExpansion, uk.ac.gla.terrier.applications.TRECLMQuerying
Possible values integer number > 0
Default value 1000
Configures the maximum number of results to be displayed for TREC querying

Property language.model
Used in uk.ac.gla.terrier.applications.TRECLMIndexing, uk.ac.gla.terrier.applications.TRECLMQuerying
Possible values Class names of implemented language models from the package uk.ac.gla.terrier.matching.models.languagemodel
Default value PonteCroft
Configures the language model to be used for indexing and querying the collection. Querying requires the collection to have been indexed for language modelling.

Property trec.iteration
Used in uk.ac.gla.terrier.applications.TRECQuerying, uk.ac.gla.terrier.applications.TRECQueryingExpansion, uk.ac.gla.terrier.applications.TRECLMQuerying
Possible values String
Default value Q
Configures The value written in the iteration column of the standard TREC results format.

Property matching.tsms
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values Comma delimited names of classes in uk/ac/gla/terrier/matching/tsms, or other fully qualified models
Default value not specified
Configures Specifies the static uk.ac.gla.terrier.matching.tsms.TermScoreModifiers that should be applied to all terms of all queries.

Property matching.dsms
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values Comma delimited names of classes in uk/ac/gla/terrier/matching/dsms, or other fully qualified models
Default value not specified
Configures Specifies the static uk.ac.gla.terrier.matching.dsms.DocumentScoreModifiers that should be applied to all queries.

Property matching.retrieved_set_size
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values integer values > 0
Default value 1000
Configures Maximum size of the result set.

Property frequency.upper.threshold
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values integer values >= 0
Default value 0
Configures Sets a maximum value for the frequency of any term within a document (term spam prevention). 0 means no threshold.

Property ignore.low.idf.terms
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values true, false
Default value true
Configures Ignores a term that has a low IDF, i.e. one that appears in many documents. You may wish to turn this off for small or focused collections.

Property match.empty.query
Used in uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching
Possible values true, false
Default value true
Configures If true, returns all documents for an empty query. Use this if you have post filters or post processes that filter out documents, e.g. link: or site:.
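
The matching properties above could be combined as follows for a small, focused collection. This is only a sketch: the choice of values is an assumption for illustration, following the advice above about low-IDF terms, and the remaining values are the documented defaults.

```properties
# Keep frequent terms: in a small collection they may still be informative.
ignore.low.idf.terms=false
# Documented defaults, stated explicitly.
matching.retrieved_set_size=1000
frequency.upper.threshold=0
match.empty.query=true
```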

Property querying.allowed.controls
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma delimited list of control names.
Default value c, range
Configures Comma delimited list of the controls that may be specified in the query. For use in interactive querying. A "String:String" token in the query is assumed to be a field query unless the first string is an allowed control. An example value would be: c, range, link, site.

Property querying.default.controls
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma delimited list of control names and values. Names and values are separated by colon.
Default value not specified
Configures Sets the default control values for the querying process. Controls are used to control the querying process, and may be used to set matching models, post filters, post processes, etc. An example value would be: c:10,site:gla.ac.uk
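
Putting the two control properties together, a configuration might look like the sketch below. The values are the example values quoted above. The list is written without spaces here; whether surrounding whitespace is trimmed is not stated on this page.

```properties
# Controls a user may specify in the query as name:value.
querying.allowed.controls=c,range,link,site
# Default control values, as name:value pairs.
querying.default.controls=c:10,site:gla.ac.uk
```

With these settings, a query token such as site:gla.ac.uk would be read as a control, while a token whose first part is not in the allowed list would be treated as a field.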

Property querying.postprocesses.order
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma delimited list of all allowed post processes.
Default value not specified
Configures Specifies which post processes may be called and the order in which they are called. The order matters because post processes often have interdependencies. An example value would be: QueryExpansion,Scope,Site

Property querying.postprocesses.controls
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma and colon delimited list of control names and post process names.
Default value not specified
Configures Specifies which controls enable which post processes. An example value would be: site:Site,qe:QueryExpansion,scope:Scope
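
The two post process properties are meant to be used together. A minimal sketch, reusing the example values given above:

```properties
# Allowed post processes, in the order they would be called.
querying.postprocesses.order=QueryExpansion,Scope,Site
# control name : post process that the control enables
querying.postprocesses.controls=site:Site,qe:QueryExpansion,scope:Scope
```

Under these settings, the qe control would enable QueryExpansion, which would run before Scope and Site. Whether each control must also be listed in querying.allowed.controls to be usable from a query is not stated here and should be checked.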

Property querying.postfilters.order
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma delimited list of all allowed post filters.
Default value not specified
Configures Specifies which post filters may be called and the order in which they are called. The order matters because post filters often have interdependencies. An example value would be: LinkFilter

Property querying.postfilters.controls
Used in uk.ac.gla.terrier.querying.Manager
Possible values Comma and colon delimited list of control names and post filter names.
Default value not specified
Configures Specifies which controls enable which post filters. An example value would be: link:LinkFilter
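
Post filters follow the same pattern as post processes. The sketch below reuses the example values given above and, as an assumption, also adds link to the allowed controls so that it could be specified in a query.

```properties
# Assumption: the link control is allowed in queries.
querying.allowed.controls=c,range,link
# Allowed post filters, in calling order.
querying.postfilters.order=LinkFilter
# control name : post filter that the control enables
querying.postfilters.controls=link:LinkFilter
```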

Property termpipelines
Used in uk.ac.gla.terrier.querying.Manager, uk.ac.gla.terrier.indexing.Indexer
Possible values Comma delimited list of term pipeline entities to pass query terms through
Default value Stopwords,PorterStemmer
Configures Defines which term pipeline entities to pass query terms through.

Property invertedfile.processterms
Used in uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder, uk.ac.gla.terrier.structures.indexing.BlockInvertedIndexBuilder
Possible values Integer value > 0
Default value 75000
Configures Defines the number of terms that should be processed at once when building the inverted index. The InvertedIndexBuilder scans the direct index looking for each of these terms, writes them to the inverted index, then repeats the scan for the next batch of terms. Increasing this value speeds up inverted index building for large collections, but uses more memory.
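
For example, on a machine with plenty of memory the batch size could be raised above the default. The value below is an arbitrary illustration, not a recommendation; a suitable value depends on the collection and the available heap.

```properties
# Default is 75000. A larger batch means fewer scans of the direct index,
# at the cost of more memory.
invertedfile.processterms=150000
```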

Property lexicon.builder.templexperdir
Used in uk.ac.gla.terrier.structures.indexing.LexiconBuilder, uk.ac.gla.terrier.structures.indexing.BlockLexiconBuilder
Possible values integer values > 0
Default value 100
Configures Number of temporary lexicon files to place in each temporary directory during lexicon building.

Property te.suffix
Used in uk.ac.gla.terrier.structures.indexing.TermEstimateIndex
Possible values A reasonable filename extension for the Language modelling term estimates data structure.
Default value te
Configures The filename extension for the Language modelling term estimates data structure

Property dw.suffix
Used in uk.ac.gla.terrier.structures.indexing.DocumentInitialWeightIndex
Possible values A reasonable filename extension for the Language modelling document weight data structure.
Default value dw
Configures The filename extension for the Language modelling document weights data structure

Property field.modifiers
Used in uk.ac.gla.terrier.utility.FieldScore
Possible values List of double values, comma separated
Default value not specified
Configures Boosts the score assigned to a term by the given amount when the term occurs in the field at the same position in the list of fields.

Property ???.process
Used in uk.ac.gla.terrier.utility.TagSet
Possible values Comma delimited list of tags to process
Default value not specified
Configures For many of the tokenisers, configures which tags should be processed.

Property ???.skip
Used in uk.ac.gla.terrier.utility.TagSet
Possible values Comma delimited list of tags to not process
Default value not specified
Configures For many of the tokenisers, configures which tags should be skipped completely

Property ???.doctag
Used in uk.ac.gla.terrier.utility.TagSet
Possible values Name of tag that marks the start of the document (trec only)
Default value not specified
Configures For some of the tokenisers, configures which tag marks the start of a document (or query).

Property ???.idtag
Used in uk.ac.gla.terrier.utility.TagSet
Possible values Name of tag that contains the unique identifier (trec only)
Default value not specified
Configures For some of the tokenisers, configures which tag contains the document ID (or query ID).

Property termcodes.initialcapacity
Used in uk.ac.gla.terrier.utility.TermCodes
Possible values integer value > 0
Default value 3000000
Configures Specifies the initial size of the hashmap used for the term->term_id mapping. Setting this appropriately decreases the likelihood of the hashmap having to grow.

Property termcodes.garbagecollect
Used in uk.ac.gla.terrier.utility.TermCodes
Possible values true, false
Default value true
Configures If true, forces a full Java garbage collection to reclaim memory once the direct index creation has finished, as the hashmap is then no longer required.

Property terrier.home
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Absolute directory path
Default value not specified
Configures TERRIER_HOME. Where Terrier is installed.

Property terrier.etc
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.TRECCollection
Possible values Absolute directory path
Default value TERRIER_HOME + "etc/"
Configures TERRIER_ETC. Where Terrier finds its terrier.properties file if -Dterrier.setup is not specified.

Property terrier.share
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.terms.Stopwords
Possible values Absolute directory path
Default value TERRIER_HOME + "share/"
Configures TERRIER_SHARE. Where static distribution files are found.

Property terrier.var
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.applications.desktop.filehandling.WindowsFileOpener, uk.ac.gla.terrier.structures.Index
Possible values Absolute directory path
Default value TERRIER_HOME + "var/"
Configures TERRIER_VAR. Where the files that Terrier builds are found.
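
The directory properties above default to locations under TERRIER_HOME, so in the simplest case only terrier.home needs to be set. The sketch below uses hypothetical paths and additionally moves the generated files to another disk. The trailing slash mirrors the documented defaults ("var/"); whether it is required is an assumption.

```properties
# Hypothetical installation directory.
terrier.home=/opt/terrier
# terrier.etc and terrier.share are left unset, so they keep their defaults
# under terrier.home.
# Hypothetical override: keep the files Terrier builds on a separate disk.
terrier.var=/data/terrier/var/
```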

Property collection.spec
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.SimpleFileCollection, uk.ac.gla.terrier.indexing.TRECCollection
Possible values Absolute filename
Default value TERRIER_ETC + value of "collection.spec"
Configures COLLECTION_SPEC. Where the indexing process should find its configuration for the Collection object. This is often a list of files or directories.

Property trec.results
Used in uk.ac.gla.terrier.utility.ApplicationSetup, TrecTerrier
Possible values Absolute directory path
Default value TERRIER_VAR + value of "trec.results"
Configures TREC_RESULTS. Where TREC*Querying applications should store their results files and where evaluation files should be placed.

Property trec.topics.list
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.structures.TRECQuery
Possible values Absolute filename
Default value TERRIER_ETC + value of "trec.topics.list"
Configures TREC_TOPICS_LIST. The file containing the list of TREC topic (query) files to run.

Property trec.results.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Sensible filename extension for results files.
Default value ".res"
Configures TREC_RESULTS_SUFFIX. Filename extensions given to output files

Property trec.models
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Absolute filename
Default value TERRIER_ETC + value of "trec.models"
Configures TREC_MODELS. The file containing the list of weighting models to query with.

Property trec.qrels
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Absolute filename
Default value TERRIER_ETC + "trec.qrels"
Configures TREC_QRELS. The file containing the list of TREC qrel files to evaluate with.

Property if.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.structures.Index
Possible values Sensible filename extension for inverted index files.
Default value ".if"
Configures IFSUFFIX. Filename extension for inverted index files.

Property lexicon.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer, uk.ac.gla.terrier.structures.Index
Possible values Sensible filename extension for lexicon files.
Default value ".lex"
Configures LEXICONSUFFIX. Filename extension for lexicon files.

Property doc.index.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer
Possible values Sensible filename extension for document index files.
Default value ".docid"
Configures DOC_INDEX_SUFFIX. Filename extension for document index files.

Property lexicon.index.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.structures.Index
Possible values Sensible filename extension for lexicon index files.
Default value ".lexid"
Configures LEXICON_INDEX_SUFFIX. Filename extension for lexicon index files.

Property log.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Sensible filename extension for index log files.
Default value ".log"
Configures LOG_SUFFIX. Filename extension for index log files.

Property df.suffix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer
Possible values Sensible filename extension for direct index files.
Default value ".df"
Configures DF_SUFFIX. Filename extension for direct index files.

Property merge.prefix
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values part of a filename
Default value "MRG_"
Configures MERGE_PREFIX. Prefix of temporary lexicon files created during merging

Property merge.temp.number
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Integer values > 0
Default value 100000
Configures MERGE_TEMP_NUMBER. Used in temporary lexicon building

Property bundle.size
Used in uk.ac.gla.terrier.utility.ApplicationSetup
Possible values Integer values > 0
Default value 2000
Configures BUNDLE_SIZE. During indexing, number of documents to be processed before a new index is created.

Property string.byte.length
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.FileDocument, uk.ac.gla.terrier.indexing.HTMLDocument, uk.ac.gla.terrier.indexing.TRECDocument, uk.ac.gla.terrier.indexing.TRECFullTokenizer, uk.ac.gla.terrier.structures.Lexicon, uk.ac.gla.terrier.structures.BlockLexicon, uk.ac.gla.terrier.structures.BlockLexiconInputStream, uk.ac.gla.terrier.structures.BlockLexiconOutputStream, uk.ac.gla.terrier.structures.DocumentIndex, uk.ac.gla.terrier.structures.DocumentIndexEncoded, uk.ac.gla.terrier.structures.DocumentIndexInMemory, uk.ac.gla.terrier.structures.DocumentIndexInputStream, uk.ac.gla.terrier.structures.LexiconInputStream, uk.ac.gla.terrier.structures.LexiconOutputStream
Possible values Integer value > 0
Default value 20
Configures STRING_BYTE_LENGTH. The size in the lexicon reserved for a string term, and the size in the document index reserved for the document ID.

Property ignore.empty.documents
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.Indexer
Possible values true, false
Default value false
Configures IGNORE_EMPTY_DOCUMENTS. Whether empty documents have an entry in the document index.

Property terrier.index.prefix
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.Indexer
Possible values Filename prefix for all the indices
Default value "data"
Configures TERRIER_INDEX_PREFIX. Filename prefix for all the indices.

Property desktop.file.associations
Used in uk.ac.gla.terrier.applications.desktop.filehandling.AssociationFileOpener
Possible values absolute path to filename
Default value TERRIER_VAR/desktop.fileassoc
Configures the name of the file in which we save the file type associations with applications

Property block.size
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BlockIndexer
Possible values integer > 0
Default value 1
Configures ApplicationSetup.BLOCK_SIZE. The number of terms contained in the same block

Property block.indexing
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.applications.TRECIndexing
Possible values true, false
Default value false
Configures ApplicationSetup.BLOCK_INDEXING. Sets whether block positions should be saved during indexing. This is required to do phrasal searches. Client code should examine this to determine whether to use the BasicIndexer or the BlockIndexer.

Property max.blocks
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BlockIndexer
Possible values integer >= 0
Default value 100000
Configures MAX_BLOCKS. The maximum number of blocks a document may contain.
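
Since block positions are required for phrasal searches, an index intended for phrase queries would need block indexing switched on before indexing. A minimal sketch, keeping the documented defaults for the other two block properties:

```properties
# Save block positions during indexing (default is false).
block.indexing=true
# Documented defaults: one term per block, at most 100000 blocks per document.
block.size=1
max.blocks=100000
```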

Property lowercase
Used in uk.ac.gla.terrier.indexing.HTMLDocument, uk.ac.gla.terrier.indexing.TRECDocument, uk.ac.gla.terrier.indexing.TRECFullTokenizer
Possible values true, false
Default value true
Configures Whether text is converted to lowercase before parsing

Property indexing.max.tokens
Used in uk.ac.gla.terrier.indexing.Indexer
Possible values integer >= 0
Default value 0
Configures Sets a limit to the maximum number of tokens indexed for a document. The default value 0 means that there is no limit.

Property indexing.excel.maxfilesize.mb
Used in uk.ac.gla.terrier.indexing.MSExcelDocument
Possible values size of a file in megabytes
Default value 0.5
Configures The maximum file size of an Excel spreadsheet to be parsed.

Property indexing.simplefilecollection.extensionsparsers
Used in uk.ac.gla.terrier.indexing.SimpleFileCollection
Possible values comma delimited list of file extensions and associated parsers to use for the corresponding files.
Default value txt:FileDocument,text:FileDocument,tex:FileDocument,bib:FileDocument, pdf:PDFDocument,html:HTMLDocument,htm:HTMLDocument,xhtml:HTMLDocument, xml:HTMLDocument,doc:MSWordDocument,ppt:MSPowerpointDocument,xls:MSExcelDocument
Configures The parsers to be used for processing files with the specified extensions.

Property indexing.simplefilecollection.defaultparser
Used in uk.ac.gla.terrier.indexing.SimpleFileCollection
Possible values fully qualified class name
Default value not specified
Configures The parser to use by default for processing files with unknown extensions
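
As an illustration, the two SimpleFileCollection properties could be set as below. The extension list is a shortened version of the documented default plus one hypothetical extra mapping (log). Whether setting the property replaces the default list entirely or adds to it is not stated above, so treat this as a sketch to verify.

```properties
# extension:parser pairs; "log" is a hypothetical addition.
indexing.simplefilecollection.extensionsparsers=txt:FileDocument,html:HTMLDocument,pdf:PDFDocument,log:FileDocument
# Fall back to the plain text parser for unknown extensions.
indexing.simplefilecollection.defaultparser=uk.ac.gla.terrier.indexing.FileDocument
```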

Property terrier.index.path
Used in uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.SimpleFileCollection, uk.ac.gla.terrier.indexing.TRECCollection
Possible values full path of a directory
Default value TERRIER_VAR + "index/"
Configures TERRIER_INDEX_PATH. The name of the directory in which the data structures created by Terrier are stored

Property trec.blacklist.docids
Used in uk.ac.gla.terrier.indexing.TRECCollection
Possible values full path to filename
Default value not specified
Configures The name of a file that contains a black list of document identifiers to be ignored during indexing

Property trec.collection.pointers
Used in uk.ac.gla.terrier.indexing.TRECCollection
Possible values full path to filename
Default value TERRIER_INDEX_PATH + "docpointers.col"
Configures The name of a file that saves pointers for each file to the original text in the collection files.

Property parameter.free.expansion
Used in uk.ac.gla.terrier.matching.models.queryexpansion.QueryExpansionModel, uk.ac.gla.terrier.structures.ExpansionTerms
Possible values true, false
Default value true
Configures Whether we apply parameter-free query expansion or not.

Property rocchio_beta
Used in uk.ac.gla.terrier.matching.models.queryexpansion.QueryExpansionModel, uk.ac.gla.terrier.structures.ExpansionTerms
Possible values float
Default value 0.4
Configures The beta parameter of Rocchio's automatic query expansion.
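
A sketch of the two query expansion properties used together. It assumes that rocchio_beta only takes effect when parameter-free expansion is switched off, which is not stated explicitly above and should be checked.

```properties
# Switch off parameter-free expansion (default is true).
parameter.free.expansion=false
# Documented default for the Rocchio parameter.
rocchio_beta=0.4
```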

Property stopwords.filename
Used in uk.ac.gla.terrier.terms.Stopwords
Possible values absolute path to file
Default value TERRIER_SHARE + "stopword-list.txt"
Configures The name of the file which contains a list of stopwords.