Description of Configurable properties of Terrier |
Terrier allows the user to configure many different aspects of the framework, in order to be adaptable to the specific needs of different applications. Here, we describe the properties that are used while indexing or retrieving. A sample of how to set up the basic properties can be found in etc/terrier.properties.sample.
Property | termcodes.garbagecollect |
Used in | uk.ac.gla.terrier.utility.TermCodes |
Possible values | true, false |
Default value | true |
Configures | enables or disables garbage collection while resetting the hash map of the class TermCodes |
Property | interactive.output.format.length |
Used in | uk.ac.gla.terrier.applications.InteractiveQuerying |
Possible values | integer number > 0 |
Default value | 1000 |
Configures | the maximum number of results to be displayed for Interactive querying |
Property | trec.output.format.length |
Used in | uk.ac.gla.terrier.applications.TRECQuerying, uk.ac.gla.terrier.applications.TRECQueryingExpansion, uk.ac.gla.terrier.applications.TRECLMQuerying |
Possible values | integer number > 0 |
Default value | 1000 |
Configures | the maximum number of results to be displayed for TREC querying |
Property | language.model |
Used in | uk.ac.gla.terrier.applications.TRECLMIndexing, uk.ac.gla.terrier.applications.TRECLMQuerying |
Possible values | Class names of implemented language models from the package uk.ac.gla.terrier.matching.models.languagemodel |
Default value | PonteCroft |
Configures | the language model to be used for indexing and querying the collection. Querying requires the collection to have been indexed for language modelling. |
Property | trec.iteration |
Used in | uk.ac.gla.terrier.applications.TRECQuerying, uk.ac.gla.terrier.applications.TRECQueryingExpansion, uk.ac.gla.terrier.applications.TRECLMQuerying |
Possible values | String |
Default value | Q |
Configures | The value written to the iteration column of the standard TREC results format |
Property | matching.tsms |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | Comma delimited names of classes in uk/ac/gla/terrier/matching/tsms, or other fully qualified models |
Default value | not specified |
Configures | Specifies the static uk.ac.gla.terrier.matching.tsms.TermScoreModifiers that should be applied to all terms of all queries. |
Property | matching.dsms |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | Comma delimited names of classes in uk/ac/gla/terrier/matching/dsms, or other fully qualified models |
Default value | not specified |
Configures | Specifies the static uk.ac.gla.terrier.matching.dsms.DocumentScoreModifiers that should be applied for all queries. |
Property | matching.retrieved_set_size |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | integer values > 0 |
Default value | 1000 |
Configures | Maximum size of the result set. |
Property | frequency.upper.threshold |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | integer values >= 0 |
Default value | 0 |
Configures | Sets a maximum value for the frequency of any term within a document (term spam prevention). 0 means no threshold. |
Property | ignore.low.idf.terms |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | true, false |
Default value | true |
Configures | Ignores a term that has a low IDF, i.e. one that appears in many documents. You may wish to turn this off for small or focused collections. |
Property | match.empty.query |
Used in | uk.ac.gla.terrier.matching.Matching, uk.ac.gla.terrier.matching.LMMatching |
Possible values | true, false |
Default value | true |
Configures | If true, returns all documents for an empty query. Use this if you have post filters or post processes that filter out documents, e.g. link:, site:, etc. |
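As a rough illustration, the matching properties above could be set together in terrier.properties as sketched below. The values are the documented defaults, except ignore.low.idf.terms, which is switched off here on the assumption of a small, focused collection.

```properties
# Matching settings (a sketch; only the property names and defaults come from the tables above)
matching.retrieved_set_size=1000
# 0 means that no upper threshold is applied
frequency.upper.threshold=0
# Assumed scenario: a small collection, so low-IDF terms are kept
ignore.low.idf.terms=false
match.empty.query=true
```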
Property | querying.allowed.controls |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma delimited list of which controls are allowed to be specified on the query. For use in interactive querying. |
Default value | c, range |
Configures | Comma delimited list of which controls are allowed to be specified on the query. For use in interactive querying. Any "String:String" in the query is assumed to be a field unless the first string is an allowed control. An example value would be: c, range, link, site. |
Property | querying.default.controls |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma delimited list of control names and values. Names and values are separated by colon. |
Default value | not specified |
Configures | Sets the default control values for the querying process. Controls are used to control the querying process, and may be used to set matching models, post filters, post processes, etc. An example value would be: c:10,site:gla.ac.uk |
Property | querying.postprocesses.order |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma delimited list of all allowed post processes. |
Default value | not specified |
Configures | Specifies which post processes may be called, and the order in which they are called. The order matters because post processes often have inter-dependencies. An example value would be: QueryExpansion,Scope,Site |
Property | querying.postprocesses.controls |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma and colon delimited list of control names and post process names. |
Default value | not specified |
Configures | Specifies which controls enable which post processes. An example value would be: site:Site,qe:QueryExpansion,scope:Scope |
Property | querying.postfilters.order |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma delimited list of all allowed post filters. |
Default value | not specified |
Configures | Specifies which post filters may be called, and the order in which they are called. The order matters because post filters often have inter-dependencies. An example value would be: LinkFilter |
Property | querying.postfilters.controls |
Used in | uk.ac.gla.terrier.querying.Manager |
Possible values | Comma and colon delimited list of control names and post filter names. |
Default value | not specified |
Configures | Specifies which controls enable which post filters. An example value would be: link:LinkFilter |
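The example values given for the querying properties above can be combined into one configuration, sketched here. Putting them together in a single file is an illustration only; whether the named post processes and post filters (Site, Scope, QueryExpansion, LinkFilter) are available depends on the installation.

```properties
# Controls that users may specify on a query
querying.allowed.controls=c, range, link, site
# Control values applied when the query does not set them
querying.default.controls=c:10,site:gla.ac.uk

# Post processes: permitted set and calling order, then the controls that enable them
querying.postprocesses.order=QueryExpansion,Scope,Site
querying.postprocesses.controls=site:Site,qe:QueryExpansion,scope:Scope

# Post filters: permitted set and calling order, then the controls that enable them
querying.postfilters.order=LinkFilter
querying.postfilters.controls=link:LinkFilter
```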
Property | termpipelines |
Used in | uk.ac.gla.terrier.querying.Manager, uk.ac.gla.terrier.indexing.Indexer |
Possible values | Comma delimited list of term pipeline entities to pass query terms through |
Default value | Stopwords,PorterStemmer |
Configures | Defines which term pipeline entities to pass query terms through. |
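For instance, the default pipeline can be written out explicitly as below. The commented-out alternative assumes that a single-entry list is accepted, which the table above does not state.

```properties
# Default: stopword removal followed by Porter stemming
termpipelines=Stopwords,PorterStemmer

# Assumed variant: stopword removal only, no stemming
# termpipelines=Stopwords
```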
Property | invertedfile.processterms |
Used in | uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder, uk.ac.gla.terrier.structures.indexing.BlockInvertedIndexBuilder |
Possible values | Integer value > 0 |
Default value | 75000 |
Configures | Defines the number of terms that should be processed at once when building the inverted index. The InvertedIndexBuilder scans the direct index looking for each of these terms, writes them to the inverted index, then repeats the scan for the next batch of terms. Increasing this value speeds up inverted index building for large collections, but uses more memory. |
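A sketch of trading memory for speed with this property follows. The value 150000 is an arbitrary illustration (twice the default), not a recommended setting; a suitable value depends on the memory available to the JVM.

```properties
# Process more terms per scan of the direct index (default is 75000).
# 150000 is an illustrative figure, not a tested recommendation.
invertedfile.processterms=150000
```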
Property | lexicon.builder.templexperdir |
Used in | uk.ac.gla.terrier.structures.indexing.LexiconBuilder, uk.ac.gla.terrier.structures.indexing.BlockLexiconBuilder |
Possible values | integer values > 0 |
Default value | 100 |
Configures | Number of temporary lexicon files to place in each temporary directory during lexicon building. |
Property | te.suffix |
Used in | uk.ac.gla.terrier.structures.indexing.TermEstimateIndex |
Possible values | A reasonable filename extension for the Language modelling term estimates data structure. |
Default value | te |
Configures | The filename extension for the Language modelling term estimates data structure |
Property | dw.suffix |
Used in | uk.ac.gla.terrier.structures.indexing.DocumentInitialWeightIndex |
Possible values | A reasonable filename extension for the Language modelling document weight data structure. |
Default value | dw |
Configures | The filename extension for the Language modelling document weights data structure |
Property | field.modifiers |
Used in | uk.ac.gla.terrier.utility.FieldScore |
Possible values | List of double values, comma separated |
Default value | not specified |
Configures | Boosts the score assigned to a term by the given amount when the term occurs in the field at the same position in the list |
Property | ???.process |
Used in | uk.ac.gla.terrier.utility.TagSet |
Possible values | Comma delimited list of tags to process |
Default value | not specified |
Configures | For many of the tokenisers, configures which tags should be processed. |
Property | ???.skip |
Used in | uk.ac.gla.terrier.utility.TagSet |
Possible values | Comma delimited list of tags to not process |
Default value | not specified |
Configures | For many of the tokenisers, configures which tags should be skipped completely |
Property | ???.doctag |
Used in | uk.ac.gla.terrier.utility.TagSet |
Possible values | Name of tag that marks the start of the document (trec only) |
Default value | not specified |
Configures | For some of the tokenisers, configures which tag marks the start of a document (or query) |
Property | ???.idtag |
Used in | uk.ac.gla.terrier.utility.TagSet |
Possible values | Name of tag that contains the unique identifier (trec only) |
Default value | not specified |
Configures | For some of the tokenisers, configures which tag contains the document ID (or query ID) |
Property | termcodes.initialcapacity |
Used in | uk.ac.gla.terrier.utility.TermCodes |
Possible values | integer value > 0 |
Default value | 3000000 |
Configures | Specifies the initial size of the hashmap used for the term->term_id mapping. Setting this appropriately decreases the likelihood of the hashmap having to grow. |
Property | termcodes.garbagecollect |
Used in | uk.ac.gla.terrier.utility.TermCodes |
Possible values | true, false |
Default value | true |
Configures | If true, forces a full Java garbage collection to reclaim memory once the direct index creation has finished, as the hashmap is then no longer required. |
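The two TermCodes properties might appear together as in the sketch below, shown with their documented defaults. An appropriate initial capacity depends on the expected number of distinct terms in the collection, which is not something this page quantifies.

```properties
# Initial size of the term -> term_id hashmap (documented default)
termcodes.initialcapacity=3000000
# Force a full garbage collection once the direct index has been built
termcodes.garbagecollect=true
```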
Property | terrier.home |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Absolute directory path |
Default value | not specified |
Configures | TERRIER_HOME. Where Terrier is installed. |
Property | terrier.etc |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.TRECCollection |
Possible values | Absolute directory path |
Default value | TERRIER_HOME + "etc/" |
Configures | TERRIER_ETC. Where Terrier finds its terrier.properties file if -Dterrier.setup is not specified |
Property | terrier.share |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.terms.Stopwords |
Possible values | Absolute directory path |
Default value | TERRIER_HOME + "share/" |
Configures | TERRIER_SHARE. Where static distribution files are found. |
Property | terrier.var |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.applications.desktop.filehandling.WindowsFileOpener, uk.ac.gla.terrier.structures.Index |
Possible values | Absolute directory path |
Default value | TERRIER_HOME + "var/" |
Configures | TERRIER_VAR. Where the files that Terrier builds are found. |
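The directory properties above could be spelled out explicitly as in this sketch. The path /opt/terrier/ is a hypothetical installation location; only terrier.home should need setting if the default sub-directories are acceptable.

```properties
# /opt/terrier/ is a hypothetical path used for illustration only
terrier.home=/opt/terrier/
# The remaining entries mirror the documented defaults (TERRIER_HOME + "etc/", etc.)
terrier.etc=/opt/terrier/etc/
terrier.share=/opt/terrier/share/
terrier.var=/opt/terrier/var/
```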
Property | collection.spec |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.SimpleFileCollection, uk.ac.gla.terrier.indexing.TRECCollection |
Possible values | Absolute filename |
Default value | TERRIER_ETC + value of "collection.spec" |
Configures | COLLECTION_SPEC. Where the indexing process should find its configuration for the Collection object. This is often a list of files or directories. |
Property | trec.results |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, TrecTerrier |
Possible values | Absolute directory path |
Default value | TERRIER_VAR + value of "trec.results" |
Configures | TREC_RESULTS. Where TREC*Querying applications should store their results files and where evaluation files should be placed. |
Property | trec.topics.list |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.structures.TRECQuery |
Possible values | Absolute filename |
Default value | TERRIER_ETC + value of "trec.topics.list" |
Configures | TREC_TOPICS_LIST. The file containing the list of TREC topics (queries) files to run. |
Property | trec.results.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Sensible filename extension for results files. |
Default value | ".res" |
Configures | TREC_RESULTS_SUFFIX. Filename extensions given to output files |
Property | trec.models |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Absolute filename |
Default value | TERRIER_ETC + value of "trec.models" |
Configures | TREC_MODELS. The file containing the list of weighting models to query with |
Property | trec.qrels |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Absolute filename |
Default value | TERRIER_ETC + "trec.qrels" |
Configures | TREC_QRELS. The file containing the list of TREC qrels files to evaluate with |
Property | if.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.structures.Index |
Possible values | Sensible filename extension for inverted index files. |
Default value | ".if" |
Configures | IFSUFFIX. Filename extension for inverted index files. |
Property | lexicon.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer, uk.ac.gla.terrier.structures.Index |
Possible values | Sensible filename extension for lexicon files. |
Default value | ".lex" |
Configures | LEXICONSUFFIX. Filename extension for lexicon files. |
Property | doc.index.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer |
Possible values | Sensible filename extension for document index files. |
Default value | ".docid" |
Configures | DOC_INDEX_SUFFIX. Filename extension for document index files. |
Property | lexicon.index.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.structures.Index |
Possible values | Sensible filename extension for lexicon index files. |
Default value | ".lexid" |
Configures | LEXICON_INDEX_SUFFIX. Filename extension for lexicon index files. |
Property | log.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Sensible filename extension for index log files. |
Default value | ".log" |
Configures | LOG_SUFFIX. Filename extension for index log files. |
Property | df.suffix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BasicIndexer, uk.ac.gla.terrier.indexing.BlockIndexer |
Possible values | Sensible filename extension for direct index files. |
Default value | ".df" |
Configures | DF_SUFFIX. Filename extension for direct index files. |
Property | merge.prefix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | part of a filename |
Default value | "MRG_" |
Configures | MERGE_PREFIX. Prefix of temporary lexicon files created during merging |
Property | merge.temp.number |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Integer values > 0 |
Default value | 100000 |
Configures | MERGE_TEMP_NUMBER. Used in temporary lexicon building |
Property | bundle.size |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup |
Possible values | Integer values > 0 |
Default value | 2000 |
Configures | BUNDLE_SIZE. During indexing, number of documents to be processed before a new index is created. |
Property | string.byte.length |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.FileDocument, uk.ac.gla.terrier.indexing.HTMLDocument, uk.ac.gla.terrier.indexing.TRECDocument, uk.ac.gla.terrier.indexing.TRECFullTokenizer, uk.ac.gla.terrier.structures.Lexicon, uk.ac.gla.terrier.structures.BlockLexicon, uk.ac.gla.terrier.structures.BlockLexiconInputStream, uk.ac.gla.terrier.structures.BlockLexiconOutputStream, uk.ac.gla.terrier.structures.DocumentIndex, uk.ac.gla.terrier.structures.DocumentIndexEncoded, uk.ac.gla.terrier.structures.DocumentIndexInMemory, uk.ac.gla.terrier.structures.DocumentIndexInputStream, uk.ac.gla.terrier.structures.LexiconInputStream, uk.ac.gla.terrier.structures.LexiconOutputStream |
Possible values | Integer value > 0 |
Default value | 20 |
Configures | STRING_BYTE_LENGTH. The size in the lexicon reserved for a string term, and the size in the document index reserved for the document ID. |
Property | ignore.empty.documents |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.Indexer |
Possible values | true, false |
Default value | false |
Configures | IGNORE_EMPTY_DOCUMENTS. Whether empty documents have an entry in the document index. |
Property | terrier.index.prefix |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.Indexer |
Possible values | Filename prefix for all the indices |
Default value | "data" |
Configures | TERRIER_INDEX_PREFIX. Filename prefix for all the indices. |
Property | desktop.file.associations |
Used in | uk.ac.gla.terrier.applications.desktop.filehandling.AssociationFileOpener |
Possible values | absolute path to filename |
Default value | TERRIER_VAR/desktop.fileassoc |
Configures | the name of the file in which we save the file type associations with applications |
Property | block.size |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BlockIndexer |
Possible values | integer > 0 |
Default value | 1 |
Configures | ApplicationSetup.BLOCK_SIZE. The number of terms contained in the same block |
Property | block.indexing |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.applications.TRECIndexing |
Possible values | true, false |
Default value | false |
Configures | ApplicationSetup.BLOCK_INDEXING. Sets whether block positions should be saved during indexing. This is required to do phrasal searches. Client code should examine this to determine whether to use the BasicIndexer or the BlockIndexer. |
Property | max.blocks |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.BlockIndexer |
Possible values | integer >= 0 |
Default value | 100000 |
Configures | MAX_BLOCKS. The maximum number of blocks a document may contain. |
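A sketch of enabling block indexing, which the block.indexing entry above says is required for phrasal searches, follows. block.size and max.blocks are shown with their documented defaults.

```properties
# Record block positions during indexing (needed for phrasal search)
block.indexing=true
# Number of terms per block (documented default)
block.size=1
# Upper bound on the number of blocks per document (documented default)
max.blocks=100000
```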
Property | lowercase |
Used in | uk.ac.gla.terrier.indexing.HTMLDocument, uk.ac.gla.terrier.indexing.TRECDocument, uk.ac.gla.terrier.indexing.TRECFullTokenizer |
Possible values | true, false |
Default value | true |
Configures | Whether text is converted to lowercase before parsing |
Property | indexing.max.tokens |
Used in | uk.ac.gla.terrier.indexing.Indexer |
Possible values | integer >=0 |
Default value | 0 |
Configures | Sets a limit to the maximum number of tokens indexed for a document. The default value 0 means that there is no limit. |
Property | indexing.excel.maxfilesize.mb |
Used in | uk.ac.gla.terrier.indexing.MSExcelDocument |
Possible values | size of a file in megabytes |
Default value | 0.5 |
Configures | The maximum file size of an Excel spreadsheet to be parsed. |
Property | indexing.simplefilecollection.extensionsparsers |
Used in | uk.ac.gla.terrier.indexing.SimpleFileCollection |
Possible values | comma delimited list of file extensions and associated parsers to use for the corresponding files. |
Default value | txt:FileDocument,text:FileDocument,tex:FileDocument,bib:FileDocument, pdf:PDFDocument,html:HTMLDocument,htm:HTMLDocument,xhtml:HTMLDocument, xml:HTMLDocument,doc:MSWordDocument,ppt:MSPowerpointDocument,xls:MSExcelDocument |
Configures | The parsers to be used for processing files with the specified extensions. |
Property | indexing.simplefilecollection.defaultparser |
Used in | uk.ac.gla.terrier.indexing.SimpleFileCollection |
Possible values | fully qualified class name |
Default value | not specified |
Configures | The parser to use by default for processing files with unknown extensions |
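The two SimpleFileCollection properties could be combined as sketched below. The extension list is a subset of the documented default, and the choice of FileDocument as the fallback parser is an assumption made for illustration; its package name is taken from the class listings elsewhere on this page.

```properties
# Map a reduced set of extensions to parsers (subset of the documented default)
indexing.simplefilecollection.extensionsparsers=txt:FileDocument,html:HTMLDocument,pdf:PDFDocument
# Assumed fallback: treat files with unknown extensions as plain text
indexing.simplefilecollection.defaultparser=uk.ac.gla.terrier.indexing.FileDocument
```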
Property | terrier.index.path |
Used in | uk.ac.gla.terrier.utility.ApplicationSetup, uk.ac.gla.terrier.indexing.SimpleFileCollection, uk.ac.gla.terrier.indexing.TRECCollection |
Possible values | full path of a directory |
Default value | TERRIER_VAR + "index/" |
Configures | TERRIER_INDEX_PATH. The name of the directory in which the data structures created by Terrier are stored |
Property | trec.blacklist.docids |
Used in | uk.ac.gla.terrier.indexing.TRECCollection |
Possible values | full path to filename |
Default value | not specified |
Configures | The name of a file that contains a black list of document identifiers to be ignored during indexing |
Property | trec.collection.pointers |
Used in | uk.ac.gla.terrier.indexing.TRECCollection |
Possible values | full path to filename |
Default value | TERRIER_INDEX_PATH + "docpointers.col" |
Configures | The name of a file that saves pointers for each file to the original text in the collection files. |
Property | parameter.free.expansion |
Used in | uk.ac.gla.terrier.matching.models.queryexpansion.QueryExpansionModel, uk.ac.gla.terrier.structures.ExpansionTerms |
Possible values | true or false |
Default value | true |
Configures | Whether we apply parameter-free query expansion or not. |
Property | rocchio_beta |
Used in | uk.ac.gla.terrier.matching.models.queryexpansion.QueryExpansionModel, uk.ac.gla.terrier.structures.ExpansionTerms |
Possible values | float |
Default value | 0.4 |
Configures | The parameter of Rocchio's automatic query expansion |
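As a sketch, the two query expansion properties can be set explicitly as below. Turning parameter-free expansion off alongside an explicit rocchio_beta is an illustrative pairing; the tables above do not state how the two settings interact.

```properties
# Illustrative: disable parameter-free expansion (default is true)
parameter.free.expansion=false
# Rocchio parameter, shown with its documented default
rocchio_beta=0.4
```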
Property | stopwords.filename |
Used in | uk.ac.gla.terrier.terms.Stopwords |
Possible values | absolute path to file |
Default value | TERRIER_SHARE + "stopword-list.txt" |
Configures | The name of the file which contains a list of stopwords. |
Copyright © 2015 University of Glasgow | All Rights Reserved