
Description of Configurable properties of Terrier

Terrier allows the user to configure many different aspects of the framework, so that it can be adapted to the specific needs of different applications. Here, we describe the properties that are used while indexing or retrieving. A sample of how to set up the basic properties can be found in etc/terrier.properties.sample. This page contains many of the properties in Terrier, broken down by category: General, Indexing, Retrieval, Desktop Terrier and Miscellaneous.

General properties

Property terrier.setup
Used in org.terrier.utility.ApplicationSetup
Possible values Absolute directory path
Default value not specified
Configures Specifies where Terrier finds the terrier.properties file, which is usually in the etc/ directory. Analogous to the terrier.etc property.
Property terrier.home
Used in org.terrier.utility.ApplicationSetup
Possible values Absolute directory path
Default value not specified
Configures ApplicationSetup.TERRIER_HOME. Where Terrier is installed.
Property terrier.etc
Used in org.terrier.utility.ApplicationSetup
Possible values Absolute directory path
Default value TERRIER_HOME + "etc/"
Configures TERRIER_ETC. Where Terrier finds its terrier.properties file if -Dterrier.setup is not specified.
Property terrier.share
Used in org.terrier.utility.ApplicationSetup, org.terrier.terms.Stopwords
Possible values Absolute directory path
Default value TERRIER_HOME + "share/"
Configures ApplicationSetup.TERRIER_SHARE. Where static distribution files are found, for instance the stopword files.
Property terrier.var
Used in org.terrier.utility.ApplicationSetup, org.terrier.applications.desktop.filehandling.WindowsFileOpener, org.terrier.structures.Index
Possible values Absolute directory path
Default value TERRIER_HOME + "var/"
Configures TERRIER_VAR. Where Terrier puts files that it creates, e.g. indices and results files.
Property terrier.plugins
Used in org.terrier.utility.ApplicationSetup
Possible values A comma-separated list of plugins.
Default value not specified
Configures The list of plugins to be preloaded.
Property log4j.config
Used in org.terrier.utility.ApplicationSetup
Possible values A valid log4j configuration file
Default value terrier-log.xml
Configures ApplicationSetup.LOG4J_CONFIG. The configuration file used by log4j.
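
As a minimal sketch, the directory defaults above could be overridden in terrier.properties along the following lines. The paths are hypothetical and only illustrate the format. terrier.setup is not shown, since it is passed on the command line (as -Dterrier.setup) to tell Terrier where the properties file is.

```properties
# Hypothetical locations; adjust to your own installation.
terrier.var=/data/terrier/var/
terrier.share=/opt/terrier/share/

# Logging configuration file (this is the documented default name).
log4j.config=terrier-log.xml
```
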

Indexing


Property terrier.index.path
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.SimpleFileCollection, org.terrier.indexing.TRECCollection
Possible values Full path of a directory
Default value TERRIER_VAR + "index/"
Configures TERRIER_INDEX_PATH. The name of the directory in which the data structures created by Terrier are stored
Property terrier.index.prefix
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.Indexer
Possible values Filename prefix for all the indices
Default value "data"
Configures TERRIER_INDEX_PREFIX. Filename prefix for all the indices.
Property stopwords.filename
Used in org.terrier.terms.Stopwords
Possible values absolute path to file
Default value TERRIER_SHARE + "stopword-list.txt"
Configures The name of the file which contains a list of stopwords.
Property collection.spec
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.SimpleFileCollection, org.terrier.indexing.TRECCollection
Possible values Absolute filename
Default value TERRIER_ETC + value of "collection.spec"
Configures COLLECTION_SPEC. Where the indexing process should find its configuration for the Collection object. This is often a list of files or directories.
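
A short sketch of how the index location and the collection specification might be set together. All paths here are hypothetical placeholders; the prefix shown is the documented default.

```properties
# Directory in which the index data structures are stored (hypothetical path).
terrier.index.path=/data/terrier/var/index/

# Filename prefix shared by all index files (documented default).
terrier.index.prefix=data

# File that tells the Collection object what to index, often a list of
# files or directories (hypothetical path).
collection.spec=/data/terrier/etc/collection.spec
```
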
Property ignore.empty.documents
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.Indexer
Possible values true, false
Default value false
Configures IGNORE_EMPTY_DOCUMENTS. Whether empty documents have an entry in the document index.
Property ???.process
Used in org.terrier.utility.TagSet
Possible values Comma delimited list of tags to process
Default value not specified
Configures For many of the tokenisers, configures which tags should be processed. ??? can be TrecDocTags or TrecQueryTags, to configure the TREC Collection and Query parsers respectively. ??? as FieldTags specifies the fields that should be stored in the index.
Property ???.skip
Used in org.terrier.utility.TagSet
Possible values Comma delimited list of tags to not process
Default value not specified
Configures For many of the tokenisers, configures which tags should be skipped completely. ??? can be TrecDocTags or TrecQueryTags, to configure the TREC Collection and Query parsers respectively.
Property ???.doctag
Used in org.terrier.utility.TagSet
Possible values Name of tag that marks the start of the document (trec only)
Default value not specified
Configures For some of the tokenisers, configures which tag marks the start of a document (or of a query). ??? can be TrecDocTags or TrecQueryTags, to configure the TREC Collection and Query parsers respectively.
Property ???.idtag
Used in org.terrier.utility.TagSet
Possible values Name of tag that contains the unique identifier (trec only)
Default value not specified
Configures For some of the tokenisers, configures which tag contains the document ID (or query ID). ??? can be TrecDocTags or TrecQueryTags, to configure the TREC Collection and Query parsers respectively.
Property ???.casesensitive
Used in org.terrier.utility.TagSet, org.terrier.indexing.TRECCollection
Possible values true or false
Default value true for TrecDocTags, false otherwise
Configures For some of the tokenisers, configures whether tag matching is case-sensitive. The default is true for TrecDocTags (TRECCollection), and false for FieldTags and TrecQueryTags (TRECFullTokenizer, which is used by the TREC query parser, TRECQuery).
Property ???.propertytags
Used in org.terrier.utility.TagSet, org.terrier.indexing.TRECCollection
Possible values Comma delimited list of tags to add as document properties
Default value not specified
Configures During indexing this enables document tags to be saved as document properties instead of being indexed. This is useful to store document properties in the meta index for use later, e.g. for display by the Terrier Web-based interface.
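
The ??? placeholder in the tag properties above is replaced by the name of the tag set. The fragment below is a sketch for a TREC-style collection and topic file. The tag names (DOC, DOCNO, DOCHDR, TOP, NUM, TITLE, DESC, NARR) are assumptions about the markup of the data being parsed, not values fixed by Terrier.

```properties
# Document parsing: <DOC> opens a document, <DOCNO> holds its identifier,
# and the contents of <DOCHDR> are skipped entirely.
TrecDocTags.doctag=DOC
TrecDocTags.idtag=DOCNO
TrecDocTags.skip=DOCHDR
# Match tags regardless of case (the default for TrecDocTags is true).
TrecDocTags.casesensitive=false

# Query parsing: only the title field of each topic is used as query text.
TrecQueryTags.doctag=TOP
TrecQueryTags.idtag=NUM
TrecQueryTags.process=TOP,NUM,TITLE
TrecQueryTags.skip=DESC,NARR
```
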
Property block.indexing
Used in org.terrier.utility.ApplicationSetup, org.terrier.applications.TRECIndexing
Possible values true, false
Default value false
Configures ApplicationSetup.BLOCK_INDEXING. Sets whether block positions should be saved during indexing. This is required to do phrasal searches. Client code should examine this to determine whether to use the BasicIndexer or the BlockIndexer.
Property blocks.size
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.BlockIndexer
Possible values integer > 0
Default value 1
Configures ApplicationSetup.BLOCK_SIZE. The number of terms contained in the same block
Property blocks.max
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.BlockIndexer
Possible values integer >= 0
Default value 100000
Configures MAX_BLOCKS. The maximum number of blocks a document may contain.
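
As an illustration, the three block properties above might be combined as follows to record term positions so that phrasal searches are possible. The block size and maximum shown are simply the documented defaults, written out explicitly.

```properties
# Save block (position) information during indexing; needed for phrase search.
block.indexing=true
# One term per block, i.e. exact term positions.
blocks.size=1
# Upper bound on the number of blocks recorded for a single document.
blocks.max=100000
```
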
Property lowercase
Used in org.terrier.indexing.HTMLDocument, org.terrier.indexing.TRECDocument, org.terrier.indexing.TRECFullTokenizer
Possible values true, false
Default value true
Configures Whether text is converted to lowercase before parsing
Property tokeniser
Used in org.terrier.indexing.tokenisation.Tokeniser
Possible values a classname implementing the Tokeniser interface
Default value EnglishTokeniser
Configures The Tokeniser implementation to be used when splitting text into tokens. This allows for corpora in different languages to be indexed by setting a Tokeniser implementation appropriate for each language.
Property indexing.max.tokens
Used in org.terrier.indexing.Indexer
Possible values integer >=0
Default value 0
Configures Sets a limit to the maximum number of tokens indexed for a document. The default value 0 means that there is no limit.
Property indexing.max.docs.per.builder
Used in org.terrier.indexing.Indexer
Possible values integer >=0
Default value 18,000,000
Configures Sets a limit to the maximum number of documents in one index during indexing. After this point, a new index will be created, and at the end, all the indices will be merged. Reasoning: During classical two-pass indexing, memory is constrained by the TermCodes table. If too many different unique terms are indexed, then an OutOfMemoryError will occur. For TREC GOV2 collection, 18 million documents is a good point to start a new index. The special value 0 means that there is no limit. This property also applies for single-pass indexing, although it can be safely set higher. It does not apply for MapReduce indexing.
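
A sketch of how the two limits above could be set. The token limit is a hypothetical choice, and the document limit is written without thousands separators on the assumption that the value is parsed as a plain integer.

```properties
# Index at most the first 10000 tokens of each document (0 would mean no limit).
indexing.max.tokens=10000

# Start a new index after 18 million documents; the partial indices are
# merged at the end of indexing (0 would mean no limit).
indexing.max.docs.per.builder=18000000
```
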

Advanced

Property termpipelines
Used in org.terrier.querying.Manager, org.terrier.indexing.Indexer
Possible values Comma delimited list of term pipeline entities to pass query terms through. Use blank to denote no termpipeline objects
Default value Stopwords,PorterStemmer
Configures Defines which term pipeline entities to pass query terms through.
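
For example, the default pipeline and the blank form described above would look as follows. Because the property is read by both the Indexer and the Manager, it is usually sensible to keep the same value for indexing and for retrieval.

```properties
# Default: remove stopwords, then apply Porter stemming.
termpipelines=Stopwords,PorterStemmer

# Alternative: a blank value means no term pipeline objects at all.
#termpipelines=
```
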
Property invertedfile.processpointers
Used in org.terrier.structures.indexing.InvertedIndexBuilder, org.terrier.structures.indexing.BlockInvertedIndexBuilder
Possible values Integer value > 0
Default value 20000000
Configures Defines the number of pointers that should be processed at once when building the inverted index. The InvertedIndexBuilder first works out how many terms correspond to that many pointers, then scans the direct index looking for each of these terms, writes them to the inverted index, and repeats the scan for the next batch of terms. Increasing this value speeds up inverted index building for large collections, but uses more memory. Decrease it if you encounter OutOfMemory errors while building the inverted index. Note that for block indexing, the default is lower: 2,000,000 pointers.

This option supersedes invertedfile.processterms. For the invertedfile.processterms strategy to be used, set invertedfile.processpointers to 0.
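
If inverted index building runs out of memory, a lower value can be tried. The figure below is only an illustrative choice (half the documented non-block default), not a recommended setting.

```properties
# Process fewer pointers per pass over the direct index, trading speed
# for a lower memory footprint (documented default: 20000000).
invertedfile.processpointers=10000000
```
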

Property lexicon.builder.merge.lex.max
Used in org.terrier.structures.indexing.LexiconBuilder, org.terrier.structures.indexing.BlockLexiconBuilder
Possible values integer values > 1
Default value 16
Configures The number of temporary lexicons to merge at once during lexicon building. Bigger is generally faster, but too many open file handles cause slowness. 16 is a good trade-off. (See also the MERGE_FACTOR in GNU sort source code).
Property indexing.excel.maxfilesize.mb
Used in org.terrier.indexing.MSExcelDocument
Possible values size of a file in megabytes
Default value 0.5
Configures The maximum file size of an Excel spreadsheet to be parsed.
Property indexing.simplefilecollection.extensionsparsers
Used in org.terrier.indexing.SimpleFileCollection
Possible values comma delimited list of file extensions and associated parsers to use for the corresponding files.
Default value txt:FileDocument,text:FileDocument,tex:FileDocument,bib:FileDocument,pdf:PDFDocument,html:HTMLDocument,htm:HTMLDocument,xhtml:HTMLDocument,xml:HTMLDocument,doc:MSWordDocument,ppt:MSPowerpointDocument,xls:MSExcelDocument
Configures The parsers to be used for processing files with the specified extensions.
Property indexing.simplefilecollection.defaultparser
Used in org.terrier.indexing.SimpleFileCollection
Possible values fully qualified class name
Default value not specified
Configures The parser to use by default for processing files with unknown extensions
Property trec.blacklist.docids
Used in org.terrier.indexing.TRECCollection
Possible values full path to filename
Default value not specified
Configures The name of a file that contains a black list of document identifiers to be ignored during indexing
Property trec.collection.class
Used in org.terrier.applications.TRECIndexing
Possible values a classname implementing Collection interface
Default value TRECCollection
Configures The Collection object to be used to parse the collection. This allows test collections similar but not identical to TREC to be parsed using Terrier's TREC tools. New in Terrier 1.1.0 is the ability to chain Collections. The Collection specified last is the inner-most one of the chain, the first is the outer-most (i.e. instantiation is right-to-left). The collection instantiated first (the inner-most one, specified last) should have a default constructor (no arguments), while the other collections should take the inner collection as an argument to their constructor. E.g. trec.collection.class=RemoveSmallDocsCollection,TRECCollection. Instantiation is handled by the CollectionFactory class.
Property indexer.meta.forward.keys
Used in CompressingMetaIndexBuilder
Possible values comma delimited list of properties of a Document object that should be used as metadata.
Default value docno
Configures The document properties that should be recorded as document metadata.
Property indexer.meta.forward.keylens
Used in CompressingMetaIndexBuilder
Possible values comma delimited list of the lengths of the values corresponding to the keys to be used as document metadata.
Default value 20
Configures How long values can be in the MetaIndex.
Property indexer.meta.reverse.keys
Used in CompressingMetaIndexBuilder
Possible values comma delimited list of the keys that can be used to uniquely identify documents.
Default value 20
Configures The MetaIndex keys that can uniquely identify a document. E.g. docno,url.
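
A sketch of how the three meta index properties fit together. It assumes that the documents being indexed expose a url property in addition to docno, and that the lengths are matched to the keys by position. The length 140 is a hypothetical choice.

```properties
# Document properties to record as metadata.
indexer.meta.forward.keys=docno,url
# Maximum value length for each key, in the same order as the keys above.
indexer.meta.forward.keylens=20,140
# Keys that uniquely identify a document, allowing lookup by value.
indexer.meta.reverse.keys=docno
```
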
Property max.term.length
Used in org.terrier.utility.ApplicationSetup, org.terrier.indexing.FileDocument, org.terrier.indexing.HTMLDocument, org.terrier.indexing.TRECDocument, org.terrier.indexing.TRECFullTokenizer, org.terrier.structures.Lexicon, org.terrier.structures.BlockLexicon, org.terrier.structures.BlockLexiconInputStream, org.terrier.structures.BlockLexiconOutputStream, org.terrier.structures.LexiconInputStream, org.terrier.structures.LexiconOutputStream
Possible values Integer value > 0
Default value 20
Configures MAX_TERM_LENGTH. The size in the lexicon reserved for a string, i.e. the max length of any term in the index.
Property memory.reserved
Used in org.terrier.indexing.BasicSinglePassIndexer
Possible values integer > 0, probably around 50 million
Default value 50000000
Configures Free memory threshold that forces a run to be committed to disk in the single-pass indexer. Higher values mean less chance of an OutOfMemoryError occurring, but slower indexing, as more runs will be generated.
Property memory.heap.usage
Used in org.terrier.indexing.BasicSinglePassIndexer
Possible values positive float, range 0.0f - 1.0f
Default value 0.70
Configures The fraction of the maximum heap that the JVM may have allocated before a run is committed. Smaller values mean more runs and hence slower indexing. Larger values mean more risk of OutOfMemoryError occurrences.
Property docs.check
Used in org.terrier.indexing.BasicSinglePassIndexer
Possible values positive integer > 0
Default value 20
Configures How often to check the amount of free memory. Lower values give more protection from OutOfMemoryError.
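
The three single-pass memory properties above can be tuned together. The fragment below is a sketch of a more cautious configuration than the defaults, following the directions given in each description. The specific numbers are hypothetical.

```properties
# Commit a run once free memory drops below about 100 MB (default: 50000000).
memory.reserved=100000000
# Commit a run once 60% of the maximum heap is allocated (default: 0.70).
memory.heap.usage=0.60
# Check free memory more often than the default of 20.
docs.check=10
```

These settings make an OutOfMemoryError less likely at the cost of more runs, and therefore slower indexing.
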
Property inverted2direct.processtokens
Used in org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
Possible values positive long > 0
Default value 100000000, 10000000 for blocks
Configures Total number of tokens to attempt in each iteration of building the direct index. Use a lower value if an OutOfMemoryError occurs.
Property terrier.index.retrievalLoadingProfile.default
Used in org.terrier.structures.Index
Possible values true, false
Default value true
Configures Index.RETRIEVAL_LOADING_PROFILE. Whether index structures should be preloaded for retrieval.
Property TaggedDocument.abstracts
Used in org.terrier.indexing.TaggedDocument
Possible values Comma delimited list of abstract names to save as document properties
Default value not specified
Configures The list of abstract names to save as document properties when indexing a TaggedDocument or one of its subclasses.
Property TaggedDocument.abstracts.tags
Used in org.terrier.indexing.TaggedDocument
Possible values Comma delimited list of tags from which to save abstracts
Default value not specified
Configures The names of tags to save text from. ELSE is a special tag name, which means anything not consumed by other tags.
Property TaggedDocument.abstracts.tags.casesensitive
Used in org.terrier.indexing.TaggedDocument
Possible values true or false
Default value false
Configures Configures if the tag matching is case-sensitive or not.
Property TaggedDocument.abstracts.lengths
Used in org.terrier.indexing.TaggedDocument
Possible values Comma delimited list of maximum lengths for each abstract
Default value Length 0
Configures The max lengths of the abstracts. Defaults to empty.
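
As a sketch, the abstract properties above might be combined as follows to keep a title and a body snippet for each document. It assumes that the three lists are matched by position, and that the documents contain a TITLE tag. The abstract names and lengths are hypothetical.

```properties
# Names under which the abstracts are saved as document properties.
TaggedDocument.abstracts=title,body
# Source of each abstract: the TITLE tag, and ELSE for all remaining text.
TaggedDocument.abstracts.tags=TITLE,ELSE
# Maximum length of each abstract, in the same order.
TaggedDocument.abstracts.lengths=120,500
```
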
Property FileDocument.abstract
Used in org.terrier.indexing.FileDocument
Possible values Name to call the abstract
Default value not specified
Configures The name of the abstract to save from the document. Note that only if this is set will an abstract be generated. Only a single abstract can be generated from a FileDocument.
Property FileDocument.abstract.length
Used in org.terrier.indexing.FileDocument
Possible values The maximum length for the abstract
Default value 0
Configures The maximum length for the abstract.

Retrieval

Model

Property ignore.low.idf.terms
Used in org.terrier.matching.Matching, org.terrier.matching.LMMatching
Possible values true, false
Default value true
Configures Ignores a term that has a low IDF, i.e. one that appears in many documents. You may wish to turn this off for small or focused collections.

Interactive Retrieval

Property interactive.output.format.length
Used in org.terrier.applications.InteractiveQuerying
Possible values integer number > 0
Default value 1000
Configures the maximum number of results to be displayed for Interactive querying

TREC-style Batch Retrieval

Property trec.model
Used in org.terrier.applications.TRECQuerying
Possible values Name of weighting models
Default value InL2
Configures The weighting model to use during retrieval.
Property trec.results
Used in org.terrier.utility.ApplicationSetup, TrecTerrier
Possible values Absolute directory path
Default value TERRIER_VAR + value of "trec.results"
Configures TREC_RESULTS. Where TREC*Querying applications should store their results files and where evaluation files should be placed.
Property trec.results.file
Used in org.terrier.applications.TRECQuerying
Possible values A valid file name.
Default value not specified
Configures An arbitrary name for a TREC results file.
Property trec.querycounter.type
Used in org.terrier.applications.TRECQuerying
Possible values sequential, random
Default value sequential
Configures Whether to use sequential (auto-incremented) or randomly generated suffixes for run names.
Property trec.results.suffix
Used in org.terrier.utility.ApplicationSetup
Possible values string
Default value .res
Configures ApplicationSetup.TREC_RESULTS_SUFFIX. The suffix to be used for result files.
Property trec.runtag
Used in org.terrier.applications.TRECQuerying, org.terrier.applications.TRECQueryingExpansion
Possible values string
Default value not specified
Configures An arbitrary runtag (6th field) for a TREC results file.
Property trec.topics
Used in org.terrier.applications.TRECQuerying
Possible values A valid topics file name
Default value not specified
Configures A single file containing the topics to be processed.
Property trec.topics.parser
Used in org.terrier.applications.TRECQuerying
Possible values A sub-class of org.terrier.structures.QuerySource
Default value TRECQuery
Configures The class to be used when parsing a topics file.
Property trec.encoding
Used in org.terrier.structures.TRECQuery, org.terrier.indexing.TRECCollection, org.terrier.indexing.TRECUTFCollection, org.terrier.terms.Stopwords
Possible values A valid encoding scheme.
Default value The system's default charset.
Configures The encoding to use for topics, documents, and stopwords files.
Property trec.qrels
Used in org.terrier.utility.ApplicationSetup
Possible values Absolute filename
Default value not specified
Configures A single file containing the qrels to evaluate with.
Property trec.output.format.length
Used in org.terrier.applications.TRECQuerying, org.terrier.applications.TRECQueryingExpansion, org.terrier.applications.TRECLMQuerying
Possible values integer number > 0
Default value 1000
Configures the maximum number of results to be displayed for TREC querying
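
A minimal batch retrieval configuration might look like the sketch below. The file paths and the run tag are hypothetical. BM25 is used here as one example of a weighting model name, in place of the default InL2.

```properties
# Weighting model used for ranking.
trec.model=BM25
# Topics to run and relevance assessments to evaluate against (hypothetical paths).
trec.topics=/data/trec/topics.txt
trec.qrels=/data/trec/qrels.txt
# Keep at most 1000 results per query in the results file.
trec.output.format.length=1000
# Run tag written to the 6th field of each results line.
trec.runtag=myrun
```
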
Property trec.querying.outputformat
Used in org.terrier.applications.TRECQuerying
Possible values A sub-class of TRECQuerying$OutputFormat
Default value TRECQuerying$TRECDocnoOutputFormat
Configures The class used to write the results file.
Property trec.querying.resultscache
Used in org.terrier.applications.TRECQuerying
Possible values A sub-class of TRECQuerying$QueryResultCache
Default value TRECQuerying$NullQueryResultCache
Configures The class used to cache the results.
Property trec.querying.dump.settings
Used in org.terrier.applications.TRECQuerying
Possible values true, false
Default value true
Configures Whether the settings used to generate a results file should be dumped to a .settings file in conjunction with the .res file.
Property trec.iteration
Used in org.terrier.applications.TRECQuerying, org.terrier.applications.TRECQueryingExpansion, org.terrier.applications.TRECLMQuerying
Possible values String
Default value Q
Configures Related to standard format of TREC results
Property trec.manager
Used in org.terrier.applications.TRECQuerying, org.terrier.applications.TRECQueryingExpansion
Possible values String, Class name in org.terrier.querying
Default value Manager
Configures The Manager class to use during querying
Property trec.matching
Used in org.terrier.applications.TRECQuerying, org.terrier.applications.TRECQueryingExpansion
Possible values String, Class name in org.terrier.matching
Default value org.terrier.matching.taat.Full
Configures The Matching class to use during querying
Property matching.trecresults.file
Used in org.terrier.matching.TRECResultsMatching
Possible values A valid TREC results file
Default value not specified
Configures The TREC-formatted results file containing search results for each of the topics specified in the trec.topics property
Property matching.trecresults.format
Used in org.terrier.matching.TRECResultsMatching
Possible values DOCNO, DOCID
Default value DOCNO
Configures Whether the TREC-formatted results file contains DOCNOs or Terrier's internal (integer) docids
Property matching.trecresults.scores
Used in org.terrier.matching.TRECResultsMatching
Possible values true, false
Default value true
Configures Whether Terrier should use the relevance scores from the TREC-formatted results file
Property matching.trecresults.length
Used in org.terrier.matching.TRECResultsMatching
Possible values a non-negative integer
Default value 1000
Configures The maximum number of results to be retrieved from a TREC results file for each query. If set to 0, all available results are retrieved (note that setting this property to 0 may slow down the retrieval process for large collections, as a result set of the size of the collection will be allocated in memory)
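
To re-use an existing results file instead of matching against the index, the matching.trecresults properties could be combined as sketched here. The path is hypothetical, and the fragment assumes that trec.matching accepts the short class name TRECResultsMatching, as its description (a class name in org.terrier.matching) suggests.

```properties
# Take results from a file rather than computing them from the index.
trec.matching=TRECResultsMatching
# Existing TREC-formatted run covering the topics in trec.topics (hypothetical path).
matching.trecresults.file=/data/runs/baseline.res
# The file identifies documents by DOCNO rather than internal docid.
matching.trecresults.format=DOCNO
# Re-use the scores stored in the file.
matching.trecresults.scores=true
# Read at most 1000 results per query (0 would read all available results).
matching.trecresults.length=1000
```
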

Query Expansion

Property parameter.free.expansion
Used in org.terrier.matching.models.queryexpansion.QueryExpansionModel
Possible values true or false
Default value true
Configures Whether we apply parameter-free query expansion or not.
Property rocchio.beta
Used in org.terrier.matching.models.queryexpansion.QueryExpansionModel
Possible values float
Default value 0.4
Configures The parameter of Rocchio's automatic query expansion
Property trec.qe.model
Used in org.terrier.applications.TRECQuerying
Possible values Query expansion models
Default value Bo1
Configures A name of a query expansion model
Property expansion.documents
Used in org.terrier.matching.models.queryexpansion.QueryExpansionModel
Possible values integer
Default value 3
Configures The number of top-ranked documents to be considered in the pseudo relevance set
Property expansion.terms
Used in org.terrier.matching.models.queryexpansion.QueryExpansionModel
Possible values integer
Default value 10
Configures The number of the highest weighted terms from the pseudo relevance set to be added to the original query. There can be overlap between the original query terms and the added terms from the pseudo relevance set
Property expansion.mindocuments
Used in org.terrier.querying.ExpansionTerms
Possible values integer
Default value 2
Configures The minimum number of documents a term must exist in before it can be considered to be informative. Defaults to 2. For more information, see Giambattista Amati: Information Theoretic Approach to Information Extraction. FQAS 2006: 519-529 DOI 10.1007/11766254_44
Property qe.feedback.selector
Used in org.terrier.querying.QueryExpansion
Possible values classname, or comma-delimited class names
Default value PseudoRelevanceFeedbackSelector
Configures Class(es) that select feedback documents for query expansion. All classes must implement FeedbackSelector. If more than one is specified, then a chain is assumed, with last being innermost in the chain.
Property qe.expansion.terms.class
Used in org.terrier.querying.QueryExpansion
Possible values classname, or comma-delimited class names
Default value DFRBagExpansionTerms
Configures Class(es) that select terms during query expansion. All classes must extend ExpansionTerms. If more than one is specified, then a chain is assumed, with last being innermost in the chain.
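
For reference, the fragment below writes out the documented query expansion defaults explicitly, as a starting point for experimentation. Changing expansion.documents or expansion.terms is the usual way to vary the size of the pseudo relevance set and of the expanded query.

```properties
# Query expansion model.
trec.qe.model=Bo1
# Number of top-ranked documents in the pseudo relevance set.
expansion.documents=3
# Number of highest-weighted terms added to the original query.
expansion.terms=10
# A term must occur in at least this many feedback documents to be considered.
expansion.mindocuments=2
# How feedback documents are selected.
qe.feedback.selector=PseudoRelevanceFeedbackSelector
```
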

Querying

Property match.empty.query
Used in org.terrier.matching.Matching, org.terrier.matching.LMMatching
Possible values true, false
Default value true
Configures If true, return all documents for an empty query. Use this if you have post filter/processes to filter out the documents. E.g. link: site: etc
Property querying.allowed.controls
Used in org.terrier.querying.Manager
Possible values Comma delimited list of which controls are allowed to be specified on the query. For use in interactive querying.
Default value c, range
Configures Comma delimited list of which controls are allowed to be specified on the query. For use in interactive querying. "String:String" in the query are assumed to be fields unless the first string is an allowed control. An example value would be: c, range, link, site.
Property querying.default.controls
Used in org.terrier.querying.Manager
Possible values Comma delimited list of control names and values. Names and values are separated by colon.
Default value not specified
Configures Sets the default control values for the querying process. Controls are used to control the querying process, and may be used to set matching models, post filters, post processes, etc. An example value would be: c:10,site:gla.ac.uk
Property querying.postprocesses.order
Used in org.terrier.querying.Manager
Possible values Comma delimited list of all allowed post processes.
Default value not specified
Configures Specifies the order in which post processes may be called, and those that may be called. This is because post processes often have inter-dependencies. An example value would be: QueryExpansion,Scope,Site
Property querying.postprocesses.controls
Used in org.terrier.querying.Manager
Possible values Comma and colon delimited list of control names and post process names.
Default value not specified
Configures Specifies which controls enable which post processes. An example value would be: site:Site,qe:QueryExpansion,scope:Scope
Property querying.postfilters.order
Used in org.terrier.querying.Manager
Possible values Comma delimited list of all allowed post filters.
Default value not specified
Configures Specifies the order in which post filters may be called, and those that may be called. This is because post filters often have inter-dependencies. An example value would be: LinkFilter
Property querying.postfilters.controls
Used in org.terrier.querying.Manager
Possible values Comma and colon delimited list of control names and post filter names.
Default value not specified
Configures Specifies which controls enable which post filters. An example value would be: link:LinkFilter
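
Putting the example values from the querying properties above into one fragment gives the sketch below. It assumes that the named post processes (QueryExpansion, Scope, Site) and the post filter (LinkFilter) are available in the installation being configured.

```properties
# Controls a user may specify in a query, e.g. site:gla.ac.uk.
querying.allowed.controls=c,range,link,site
# Controls applied to every query unless overridden.
querying.default.controls=c:10,site:gla.ac.uk

# Post processes: the order in which they may run, and which control enables each.
querying.postprocesses.order=QueryExpansion,Scope,Site
querying.postprocesses.controls=site:Site,qe:QueryExpansion,scope:Scope

# Post filters: the order in which they may run, and which control enables each.
querying.postfilters.order=LinkFilter
querying.postfilters.controls=link:LinkFilter
```
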

Advanced

Property matching.dsms
Used in org.terrier.matching.Matching, org.terrier.matching.LMMatching
Possible values Comma delimited names of classes in org.terrier.matching.dsms, or other fully qualified models
Default value not specified
Configures Specifies the static org.terrier.matching.dsms.DocumentScoreModifiers that should be applied to all terms of all queries.
Property matching.retrieved_set_size
Used in org.terrier.matching.Matching, org.terrier.matching.LMMatching
Possible values integer values > 0
Default value 1000
Configures Maximum size of the result set.

Desktop Terrier

Property desktop.file.associations
Used in org.terrier.applications.desktop.filehandling.AssociationFileOpener
Possible values absolute path to filename
Default value TERRIER_VAR/desktop.fileassoc
Configures The name of the file in which the associations between file types and applications are saved. If no absolute path is specified, the file is assumed to be in TERRIER_HOME/var.
Property desktop.indexing.singlepass
Used in org.terrier.applications.desktop.DesktopTerrier
Possible values true, false
Default value false
Configures Whether single-pass indexing is used by Desktop Terrier.
Property desktop.directories.spec
Used in org.terrier.applications.desktop.DesktopTerrier
Possible values absolute path to filename
Default value TERRIER_VAR/desktop.spec
Configures the name of the file that holds a list of directories that are to be indexed by the Desktop Terrier application
Property desktop.directories.filelist
Used in org.terrier.applications.desktop.DesktopTerrier
Possible values absolute path to filename
Default value TERRIER_VAR/index/data.filelist
Configures the name of the file in which we list all files that have been indexed

Miscellaneous

Property trec.collection.pointers
Used in org.terrier.indexing.TRECCollection
Possible values full path to filename
Default value TERRIER_INDEX_PATH + "docpointers.col"
Configures The name of a file that saves pointers for each file to the original text in the collection files.
Property stopwords.intern.terms
Used in org.terrier.terms.Stopwords
Possible values true, false
Default value false
Configures Whether stopwords should be interned during indexing.
Property string.use_utf
Used in org.terrier.structures.TRECQuery, org.terrier.structures.LexiconMerger, org.terrier.indexing.BasicIndexer, org.terrier.indexing.BlockIndexer, org.terrier.indexing.SimpleXMLCollection
Possible values true, false
Default value false
Configures Whether UTF support should be enabled.
