Terrier IR Platform
2.2.1

uk.ac.gla.terrier.utility
Class ApplicationSetup

java.lang.Object
  extended by uk.ac.gla.terrier.utility.ApplicationSetup

public class ApplicationSetup
extends java.lang.Object

This class retrieves and provides access to all the constants and parameters for the system. When it is statically initialised, it loads the properties file specified by the system property terrier.setup. If this is not specified, the default is the value of the terrier.home system property followed by etc/terrier.properties.
e.g. java -Dterrier.home=$TERRIER_HOME -Dterrier.setup=$TERRIER_HOME/etc/terrier.properties TrecTerrier
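
The lookup order described above can be sketched as follows. This is a minimal illustration built only on java.util.Properties, not Terrier's own code: the class SetupLocator and the method resolveSetupFile are hypothetical names, and joining with "/" is a simplifying assumption.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Terrier: mirrors the documented lookup order. */
class SetupLocator {
    /**
     * Returns the value of terrier.setup if it is set. Otherwise falls back to
     * the value of terrier.home followed by etc/terrier.properties.
     * Returns null if neither property is set.
     */
    static String resolveSetupFile(Properties sys) {
        String setup = sys.getProperty("terrier.setup");
        if (setup != null) {
            return setup;
        }
        String home = sys.getProperty("terrier.home");
        if (home == null) {
            return null;
        }
        // Assumption: "/" is used as the separator for brevity.
        return home + "/etc/terrier.properties";
    }

    public static void main(String[] args) {
        // Uses whatever -D options were passed to the JVM.
        System.out.println(resolveSetupFile(System.getProperties()));
    }
}
```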

System Properties used:

terrier.setup - Specifies where the terrier.properties file can be found.
terrier.home - Specifies where Terrier has been installed. Used if the terrier.properties file cannot be found, or if the terrier.properties file does not specify terrier.home.
NB: In the future, this may further default to $TERRIER_HOME from the environment.
file.separator - What separates directory names on this platform. Set automatically by Java.
line.separator - What separates lines in a file on this platform. Set automatically by Java.

In essence, for Terrier to function properly, you need to specify one of the system properties terrier.setup or terrier.home on the command line.

Any property defined in the properties file can be overridden as follows:

Version:
$Revision: 1.71 $
Author:
Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald

Nested Class Summary
static interface ApplicationSetup.TerrierApplicationPlugin
           
 
Field Summary
static boolean BLOCK_INDEXING
          Specifies whether block information will be used for indexing.
static int BLOCK_SIZE
          The size of a block of terms in a document.
static int BUNDLE_SIZE
          The number of documents to be processed as a group during indexing.
static java.lang.String COLLECTION_SPEC
          The name of the file that contains the list of resources to be processed during indexing.
static java.lang.String DEFAULT_LOG4J_CONFIG
          Default log4j config Terrier loads if no TERRIER_ETC/terrier-log.xml file exists
static java.lang.String DF_SUFFIX
          The suffix of the direct index.
static java.lang.String DIRECT_FILENAME
          The filename of the direct file.
static java.lang.String DOC_INDEX_SUFFIX
          The suffix of the file that contains the document index.
static int DOCNO_BYTE_LENGTH
          The number of bytes used to store a document number.
static int DOCS_CHECK_SINGLEPASS
          Number of documents between each memory check in the single pass inversion method.
static java.lang.String DOCUMENT_INDEX_FILENAME
          The filename of the document index.
static java.lang.String EOL
          The new line character used by the operating system.
static int EXPANSION_DOCUMENTS
          The number of top ranked documents considered for expanding the query.
static java.lang.String EXPANSION_MODELS
          The name of the file which contains the query expansion methods used.
static int EXPANSION_TERMS
          The number of terms added to the original query.
static boolean FIELD_QUERYING
          Specifies whether fields will be used for querying.
static java.lang.String FILE_SEPARATOR
          The file separator used by the operating system.
static java.lang.String IFSUFFIX
          The suffix of the inverted file.
static boolean IGNORE_EMPTY_DOCUMENTS
          Whether empty documents are ignored.
static java.lang.String INVERTED_FILENAME
          The filename of the inverted file.
static java.lang.String LEXICON_FILENAME
          The filename of the lexicon file.
static java.lang.String LEXICON_HASH_SUFFIX
          The suffix of the lexicon hash file.
static java.lang.String LEXICON_INDEX_FILENAME
          The filename of the lexicon index file.
static java.lang.String LEXICON_INDEX_SUFFIX
          The suffix of the lexicon index file that contains the offset of each term in the lexicon.
static java.lang.String LEXICONSUFFIX
          The suffix of the file that contains the lexicon.
static java.lang.String LOG_FILENAME
          The filename of the log (statistics) file.
static java.lang.String LOG_SUFFIX
          The suffix of the file that contains the collection statistics.
static java.lang.String LOG4J_CONFIG
          The configuration file used by log4j
static int MAX_BLOCKS
          The maximum number of blocks in a document.
static int MAX_TERM_LENGTH
          The maximum size of a term.
static int MEMORY_THRESHOLD_SINGLEPASS
          Memory threshold in the single pass inversion method.
static java.lang.String MERGE_PREFIX
          The prefix of the temporary merged files, which are created during merging the lexicon files.
static int MERGE_TEMP_NUMBER
          A progressive number which is assigned to the temporary lexicon files built during the indexing.
static java.lang.String PROPERTIES_SUFFIX
          The suffix of the file that contains the index properties.
static int STRING_BYTE_LENGTH
          The number of bytes used to store a term.
static java.lang.String TERRIER_ETC
          The directory under which the configuration files of Terrier are stored.
static java.lang.String TERRIER_HOME
          The directory under which the application is installed.
static java.lang.String TERRIER_INDEX_PATH
          The name of the directory where the inverted file and other data structures are stored.
static java.lang.String TERRIER_INDEX_PREFIX
          The prefix of the data structures' filenames.
static java.lang.String TERRIER_SHARE
          The name of the directory where installation independent read-only data is stored.
static java.lang.String TERRIER_VAR
          The name of the directory where the data structures and the output of Terrier are stored.
static java.lang.String TERRIER_VERSION
           
static java.lang.String TREC_MODELS
          The filename of the file that contains the weighting models to be used.
static java.lang.String TREC_QRELS
          The name of the file that contains a list of qrels files to be used for evaluation.
static java.lang.String TREC_RESULTS
          The name of the directory where the results are stored.
static java.lang.String TREC_RESULTS_SUFFIX
          The suffix of the files in which the results are stored.
static java.lang.String TREC_TOPICS_LIST
          The name of the file that contains a list of files where queries are stored.
 
Constructor Summary
ApplicationSetup()
           
 
Method Summary
static void configure(java.io.InputStream propertiesStream)
           
 ApplicationSetup.TerrierApplicationPlugin getPlugin(java.lang.String name)
          Return a loaded plugin by name.
static java.util.Properties getProperties()
           
static java.lang.String getProperty(java.lang.String propertyKey, java.lang.String defaultValue)
          Returns the value for the specified property, given a default value, in case the property was not defined during the initialization of the system.
static java.util.Properties getUsedProperties()
          Returns a properties object detailing all the properties fetched during the lifetime of this class.
static void loadCommonProperties()
           
static java.lang.String makeAbsolute(java.lang.String filename, java.lang.String DefaultPath)
          Checks whether the given filename is absolute and if not, it adds on the default path to make it absolute.
static void setDefaultProperty(java.lang.String propertyKey, java.lang.String defaultValue)
          Sets a property value only if it has not already been set.
static void setProperty(java.lang.String propertyKey, java.lang.String value)
          Sets a value for the specified property.
static void setupFilenames()
          Sets up the names of the inverted file, the direct file, the document index file and the lexicon file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TERRIER_VERSION

public static final java.lang.String TERRIER_VERSION
See Also:
Constant Field Values

DEFAULT_LOG4J_CONFIG

public static final java.lang.String DEFAULT_LOG4J_CONFIG
Default log4j config Terrier loads if no TERRIER_ETC/terrier-log.xml file exists

Since:
1.1.0
See Also:
Constant Field Values

FILE_SEPARATOR

public static java.lang.String FILE_SEPARATOR
The file separator used by the operating system. Defaults to the system property file.separator.


EOL

public static java.lang.String EOL
The new line character used by the operating system. Defaults to the system property line.separator.


TERRIER_HOME

public static java.lang.String TERRIER_HOME
The directory under which the application is installed. It corresponds to the property terrier.home and it should be set in the properties file, or as a property on the command line.


TERRIER_ETC

public static java.lang.String TERRIER_ETC
The directory under which the configuration files of Terrier are stored. The corresponding property is terrier.etc and it should be set in the properties file. If a relative path is given, TERRIER_HOME will be prefixed.


TERRIER_SHARE

public static java.lang.String TERRIER_SHARE
The name of the directory where installation independent read-only data is stored. Files like stopword lists, and example and testing data are examples. The corresponding property is terrier.share and its default value is share. If a relative path is given, then TERRIER_HOME will be prefixed.


TERRIER_VAR

public static java.lang.String TERRIER_VAR
The name of the directory where the data structures and the output of Terrier are stored. The corresponding property is terrier.var and its default value is var. If a relative path is given, TERRIER_HOME will be prefixed.


TERRIER_INDEX_PATH

public static java.lang.String TERRIER_INDEX_PATH
The name of the directory where the inverted file and other data structures are stored. The default value is InvFileCollection but it can be overridden with the property terrier.index.path. If a relative path is given, TERRIER_VAR will be prefixed.


COLLECTION_SPEC

public static java.lang.String COLLECTION_SPEC
The name of the file that contains the list of resources to be processed during indexing. The contents of this file are collection implementation dependent. For example, for a TREC collection, this file must contain the list of files to index. The corresponding property is collection.spec and by default its value is collection.spec. If a relative path is given, TERRIER_ETC will be prefixed.


TREC_RESULTS

public static java.lang.String TREC_RESULTS
The name of the directory where the results are stored. The corresponding property is trec.results and the default value is results. If a relative path is given, TERRIER_VAR will be prefixed.


TREC_TOPICS_LIST

public static java.lang.String TREC_TOPICS_LIST
The name of the file that contains a list of files where queries are stored. The corresponding property is trec.topics.list and the default value is trec.topics.list. If a relative path is given, TERRIER_ETC will be prefixed.


TREC_QRELS

public static java.lang.String TREC_QRELS
The name of the file that contains a list of qrels files to be used for evaluation. The corresponding property is trec.qrels and its default value is trec.qrels. If a relative path is given, TERRIER_ETC will be prefixed.


TREC_RESULTS_SUFFIX

public static java.lang.String TREC_RESULTS_SUFFIX
The suffix of the files in which the results are stored. It corresponds to the property trec.results.suffix and the default value is .res.


TREC_MODELS

public static java.lang.String TREC_MODELS
The filename of the file that contains the weighting models to be used. The corresponding property is trec.models and the default value is trec.models. If a relative path is given, then it is prefixed with TERRIER_ETC.


IFSUFFIX

public static java.lang.String IFSUFFIX
The suffix of the inverted file. The corresponding property is if.suffix and by default the value of this property is .if


LEXICONSUFFIX

public static java.lang.String LEXICONSUFFIX
The suffix of the file that contains the lexicon. The corresponding property is lexicon.suffix and by default the value of this property is .lex


DOC_INDEX_SUFFIX

public static java.lang.String DOC_INDEX_SUFFIX
The suffix of the file that contains the document index. The corresponding property is doc.index.suffix and by default the value of this property is .docid


LEXICON_INDEX_SUFFIX

public static java.lang.String LEXICON_INDEX_SUFFIX
The suffix of the lexicon index file that contains the offset of each term in the lexicon. The corresponding property is lexicon.index.suffix and by default its value is .lexid.


LEXICON_HASH_SUFFIX

public static java.lang.String LEXICON_HASH_SUFFIX
The suffix of the lexicon hash file. Corresponding property is lexicon.hash.suffix, default is ".lexhash".


LOG_SUFFIX

public static java.lang.String LOG_SUFFIX
The suffix of the file that contains the collection statistics. It corresponds to the property log.suffix and by default the value of this property is .log


PROPERTIES_SUFFIX

public static java.lang.String PROPERTIES_SUFFIX
The suffix of the file that contains the index properties. It corresponds to the property indexproperties.suffix and by default the value of this property is .properties


DF_SUFFIX

public static java.lang.String DF_SUFFIX
The suffix of the direct index. It corresponds to the property df.suffix and by default the value of this property is .df


MERGE_PREFIX

public static java.lang.String MERGE_PREFIX
The prefix of the temporary merged files, which are created during merging the lexicon files. It corresponds to the property merge.prefix and the default value is MRG_.


MERGE_TEMP_NUMBER

public static int MERGE_TEMP_NUMBER
A progressive number which is assigned to the temporary lexicon files built during the indexing. It is used to keep track of the order with which the temporary files were created. It corresponds to the property merge.temp.number and the default value is 100000


BUNDLE_SIZE

public static int BUNDLE_SIZE
The number of documents to be processed as a group during indexing. For each such group of documents, a temporary lexicon is built, and after indexing, all temporary lexicons are merged in order to create a single lexicon. It corresponds to the property bundle.size and the default value is 2000.


STRING_BYTE_LENGTH

public static int STRING_BYTE_LENGTH
The number of bytes used to store a term. Corresponds to MAX_TERM_LENGTH if not using UTF, and 3*MAX_TERM_LENGTH if using UTF. No property is associated. UTF support can be enabled by setting the property string.use_utf to true.
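
The rule above amounts to a single conditional. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and are not part of Terrier.

```java
/** Hypothetical illustration of the documented STRING_BYTE_LENGTH rule. */
class TermByteLength {
    /**
     * @param maxTermLength value of MAX_TERM_LENGTH (property max.term.length)
     * @param useUtf        value of the property string.use_utf
     * @return bytes reserved per term: 3 * maxTermLength with UTF, else maxTermLength
     */
    static int stringByteLength(int maxTermLength, boolean useUtf) {
        return useUtf ? 3 * maxTermLength : maxTermLength;
    }

    public static void main(String[] args) {
        // With the documented default max.term.length of 20:
        System.out.println(stringByteLength(20, false)); // prints 20
        System.out.println(stringByteLength(20, true));  // prints 60
    }
}
```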


DOCNO_BYTE_LENGTH

public static int DOCNO_BYTE_LENGTH
The number of bytes used to store a document number. It corresponds to the property docno.byte.length, and the default value is 20.

Since:
1.1.0

MAX_TERM_LENGTH

public static int MAX_TERM_LENGTH
The maximum size of a term. It corresponds to the property max.term.length, and the default value is 20.

Since:
1.1.0

IGNORE_EMPTY_DOCUMENTS

public static boolean IGNORE_EMPTY_DOCUMENTS
Whether empty documents are ignored. That is, if it is false, then a document that does not contain any terms will still have a corresponding entry in the .docid file, and the total number of documents in the statistics will be the total number of documents in the collection, even if some of them are empty. It corresponds to the property ignore.empty.documents and the default value is false.


TERRIER_INDEX_PREFIX

public static java.lang.String TERRIER_INDEX_PREFIX
The prefix of the data structures' filenames. It corresponds to the property terrier.index.prefix and the default value is data.


INVERTED_FILENAME

public static java.lang.String INVERTED_FILENAME
The filename of the inverted file.


DIRECT_FILENAME

public static java.lang.String DIRECT_FILENAME
The filename of the direct file.


DOCUMENT_INDEX_FILENAME

public static java.lang.String DOCUMENT_INDEX_FILENAME
The filename of the document index.


LEXICON_FILENAME

public static java.lang.String LEXICON_FILENAME
The filename of the lexicon file.


LEXICON_INDEX_FILENAME

public static java.lang.String LEXICON_INDEX_FILENAME
The filename of the lexicon index file.


LOG_FILENAME

public static java.lang.String LOG_FILENAME
The filename of the log (statistics) file.


EXPANSION_TERMS

public static int EXPANSION_TERMS
The number of terms added to the original query. The corresponding property is expansion.terms and the default value is 10.


EXPANSION_DOCUMENTS

public static int EXPANSION_DOCUMENTS
The number of top ranked documents considered for expanding the query. The corresponding property is expansion.documents and the default value is 3.


EXPANSION_MODELS

public static java.lang.String EXPANSION_MODELS
The name of the file which contains the query expansion methods used. The corresponding property is expansion.models and the default value is qemodels. If a relative path is given, it is prefixed with TERRIER_ETC.


BLOCK_SIZE

public static int BLOCK_SIZE
The size of a block of terms in a document. The corresponding property is block.size and the default value is 1.


MAX_BLOCKS

public static int MAX_BLOCKS
The maximum number of blocks in a document. The corresponding property is max.blocks and the default value is 100000.


BLOCK_INDEXING

public static boolean BLOCK_INDEXING
Specifies whether block information will be used for indexing. The corresponding property is block.indexing and the default value is false. The value of this property cannot be modified after the index of a collection has been built.


FIELD_QUERYING

public static boolean FIELD_QUERYING
Specifies whether fields will be used for querying. The corresponding property is field.querying and the default value is false.


MEMORY_THRESHOLD_SINGLEPASS

public static int MEMORY_THRESHOLD_SINGLEPASS
Memory threshold in the single pass inversion method. If a memory check is below this value, the postings in memory are written to disk. The default value is 50M (100M for 64bit JVMs), and this can be configured using the property memory.reserved.


DOCS_CHECK_SINGLEPASS

public static int DOCS_CHECK_SINGLEPASS
Number of documents between each memory check in the single pass inversion method. The default value is 20, and this can be configured using the property docs.check.
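
How the two single pass settings interact can be sketched as below. This is not Terrier's indexing code: the class MemoryCheckSketch and the method shouldFlush are hypothetical, and measuring free memory with Runtime.freeMemory() and reading "50M" as 50 * 1024 * 1024 bytes are both assumptions made for illustration.

```java
/** Hypothetical sketch, not Terrier code: when to write in-memory postings to disk. */
class MemoryCheckSketch {
    /**
     * @param docsIndexed   number of documents processed so far
     * @param docsCheck     documents between memory checks (property docs.check)
     * @param freeBytes     free memory observed at this point
     * @param reservedBytes threshold (property memory.reserved)
     * @return true if this is a check point and free memory is below the threshold
     */
    static boolean shouldFlush(int docsIndexed, int docsCheck, long freeBytes, long reservedBytes) {
        if (docsIndexed == 0 || docsIndexed % docsCheck != 0) {
            return false; // not a check point, so memory is not inspected
        }
        return freeBytes < reservedBytes;
    }

    public static void main(String[] args) {
        // Assumption: free memory is taken from the JVM runtime.
        long free = Runtime.getRuntime().freeMemory();
        long reserved = 50L * 1024 * 1024; // assumed reading of the 50M default
        System.out.println(shouldFlush(40, 20, free, reserved));
    }
}
```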


LOG4J_CONFIG

public static java.lang.String LOG4J_CONFIG
The configuration file used by log4j

Constructor Detail

ApplicationSetup

public ApplicationSetup()
Method Detail

loadCommonProperties

public static void loadCommonProperties()

configure

public static void configure(java.io.InputStream propertiesStream)
                      throws java.io.IOException
Throws:
java.io.IOException

getProperty

public static java.lang.String getProperty(java.lang.String propertyKey,
                                           java.lang.String defaultValue)
Returns the value for the specified property, given a default value, in case the property was not defined during the initialization of the system. The property values are read from the properties file. If the value of the property terrier.usecontext is true, then the properties file is overridden by the context. If the value of the property terrier.usecontext is false, then the properties file is overridden

Parameters:
propertyKey - The property to be returned
defaultValue - The default value used, in case it is not defined
Returns:
the value for the given property.

getUsedProperties

public static java.util.Properties getUsedProperties()
Returns a properties object detailing all the properties fetched during the lifetime of this class. Note that this is NOT the underlying appProperties table: updating that table on every fetch would mean that a property fetched with a default could not have different defaults in different places.
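
The reason for keeping fetched values apart from the main table can be illustrated with a small sketch. It is an assumption-laden stand-in, not the class's actual implementation: PropertyTracker and its methods are hypothetical names, and only java.util.Properties is used.

```java
import java.util.Properties;

/** Hypothetical stand-in showing why fetched values are recorded separately. */
class PropertyTracker {
    private final Properties app = new Properties();  // plays the role of the underlying table
    private final Properties used = new Properties(); // records what callers actually received

    void set(String key, String value) {
        app.setProperty(key, value);
    }

    /** Returns the stored value, or defaultValue if the key is undefined. */
    String get(String key, String defaultValue) {
        String value = app.getProperty(key, defaultValue);
        if (value != null) {
            // Only the "used" table is updated, so the default is never
            // written back and a later caller may pass a different default.
            used.setProperty(key, value);
        }
        return value;
    }

    boolean isDefined(String key) {
        return app.containsKey(key);
    }

    Properties getUsed() {
        return used;
    }

    public static void main(String[] args) {
        PropertyTracker t = new PropertyTracker();
        System.out.println(t.get("expansion.terms", "10")); // prints 10
        System.out.println(t.get("expansion.terms", "40")); // prints 40
        System.out.println(t.isDefined("expansion.terms")); // prints false
    }
}
```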


getProperties

public static java.util.Properties getProperties()

setProperty

public static void setProperty(java.lang.String propertyKey,
                               java.lang.String value)
Sets a value for the specified property. The properties set with this method are not saved in the properties file.

Parameters:
propertyKey - the name of the property to set.
value - the value of the property to set.

setDefaultProperty

public static void setDefaultProperty(java.lang.String propertyKey,
                                      java.lang.String defaultValue)
Sets a property value only if it has not already been set.

Parameters:
propertyKey - the name of the property to set.
defaultValue - the value of the property to set.
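
The "only if not already set" behaviour can be sketched against a plain java.util.Properties object. The class DefaultSetter and its method are hypothetical and only illustrate the semantics stated above.

```java
import java.util.Properties;

/** Hypothetical illustration of set-if-absent semantics on a Properties table. */
class DefaultSetter {
    /** Stores defaultValue under key only when the key has no value yet. */
    static void setDefault(Properties props, String key, String defaultValue) {
        if (props.getProperty(key) == null) {
            props.setProperty(key, defaultValue);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("block.size", "5");
        setDefault(props, "block.size", "1");  // ignored, a value already exists
        setDefault(props, "max.blocks", "100000");
        System.out.println(props.getProperty("block.size")); // prints 5
        System.out.println(props.getProperty("max.blocks")); // prints 100000
    }
}
```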

setupFilenames

public static void setupFilenames()
Sets up the names of the inverted file, the direct file, the document index file and the lexicon file.
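
This page does not spell out how the filenames are composed. The sketch below assumes the natural reading of the fields above, namely index path, then the index prefix, then the structure's suffix. The class FilenameSketch and the method indexFile are hypothetical, as is the example directory.

```java
import java.io.File;

/** Hypothetical sketch: composing an index filename from path, prefix and suffix. */
class FilenameSketch {
    /**
     * Assumption: the filename is the index path, a separator, the index
     * prefix (terrier.index.prefix) and the suffix of the structure.
     */
    static String indexFile(String indexPath, String prefix, String suffix) {
        return indexPath + File.separator + prefix + suffix;
    }

    public static void main(String[] args) {
        // "var/index" is an example directory; "data" and ".if" are the documented defaults.
        System.out.println(indexFile("var/index", "data", ".if"));
    }
}
```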


getPlugin

public ApplicationSetup.TerrierApplicationPlugin getPlugin(java.lang.String name)
Return a loaded plugin by name. Returns null if a plugin of that name has not been loaded.


makeAbsolute

public static java.lang.String makeAbsolute(java.lang.String filename,
                                            java.lang.String DefaultPath)
Checks whether the given filename is absolute and if not, it adds on the default path to make it absolute. If a URI scheme is present, the filename is assumed to be absolute.

Parameters:
filename - String the filename to make absolute
DefaultPath - String the prefix to add
Returns:
the absolute filename
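
A rough sketch of the behaviour described for makeAbsolute follows. It is not the library's implementation: the class PathSketch is hypothetical, and detecting a URI scheme with a "scheme://" pattern is an assumption about what "a URI scheme is present" means.

```java
import java.io.File;

/** Hypothetical re-creation of the documented makeAbsolute behaviour. */
class PathSketch {
    static String makeAbsolute(String filename, String defaultPath) {
        if (filename == null || filename.isEmpty()) {
            return filename;
        }
        // Assumption: anything of the form "scheme://..." counts as absolute.
        if (filename.matches("^[a-zA-Z][a-zA-Z0-9+.-]*://.*")) {
            return filename;
        }
        if (new File(filename).isAbsolute()) {
            return filename;
        }
        // Relative name: prefix it with the default path.
        if (defaultPath.endsWith(File.separator)) {
            return defaultPath + filename;
        }
        return defaultPath + File.separator + filename;
    }

    public static void main(String[] args) {
        System.out.println(makeAbsolute("trec.qrels", "etc"));
        System.out.println(makeAbsolute("http://example.org/trec.qrels", "etc"));
    }
}
```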

Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow