org.terrier.applications
Class TRECQuerying

java.lang.Object
  extended by org.terrier.applications.TRECQuerying
Direct Known Subclasses:
TRECQueryingExpansion

public class TRECQuerying
extends java.lang.Object

This class performs a batch mode retrieval from a set of TREC queries.

Configuring

In the following, we list the main ways for configuring TRECQuerying, before exhaustively listing the properties that can affect TRECQuerying.

Topics

Files containing topics (queries to be evaluated) should be set using the trec.topics property. Multiple topic files can be used together by separating their filenames using commas. By default TRECQuerying assumes TREC tagged topic files, e.g.:
 <top>
 <num> Number 1 </num>
 <title> Query terms </title>
 <desc> Description : A sentence about the information need </desc>
 <narr> Narrative: More sentences about what is relevant or not</narr>
 </top>
 
If you have topic files in a different format, you can use a different QuerySource by setting the property trec.topics.parser. For instance, trec.topics.parser=SingleLineTRECQuery should be used for topics where one line is one query. See TRECQuery and SingleLineTRECQuery for more information.
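
To make the tagged layout above concrete, the following is a minimal sketch that pulls the num and title fields out of text shaped like the sample. It is not Terrier's TRECQuery parser: the class name TaggedTopicSketch, the regular expressions, and the assumption that every tag is closed (as in the sample) are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not Terrier's TRECQuery parser. */
final class TaggedTopicSketch {
    // DOTALL lets '.' span line breaks, since each tag may cover several lines.
    private static final Pattern TOP = Pattern.compile("<top>(.*?)</top>", Pattern.DOTALL);
    private static final Pattern NUM = Pattern.compile("<num>(.*?)</num>", Pattern.DOTALL);
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);

    /** Returns one {num, title} pair per <top> block, with surrounding whitespace trimmed. */
    static List<String[]> parse(String text) {
        List<String[]> topics = new ArrayList<>();
        Matcher top = TOP.matcher(text);
        while (top.find()) {
            String body = top.group(1);
            topics.add(new String[] { first(NUM, body), first(TITLE, body) });
        }
        return topics;
    }

    // Returns the trimmed first capture of the pattern, or "" when the tag is absent.
    private static String first(Pattern p, String body) {
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String sample = "<top>\n<num> Number 1 </num>\n<title> Query terms </title>\n</top>\n";
        for (String[] topic : parse(sample)) {
            System.out.println(topic[0] + " -> " + topic[1]);
        }
    }
}
```

The sketch keeps the raw field text (for example "Number 1"); any further normalisation of the query identifier is left out because the text above does not specify it.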

Models

By default, Terrier uses the InL2 retrieval model for all runs. To use a different weighting model, set the trec.model property; all runs will then be made using that model. E.g., to use PL2, set trec.model=PL2. Similarly, when query expansion is enabled, the default query expansion model is Bo1, controlled by the property trec.qe.model.
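
The fallback behaviour described above can be sketched with plain java.util.Properties. This is an illustration of the documented defaults only: the class ModelSelectionSketch is hypothetical, and Terrier reads its properties through its own configuration layer rather than through this code.

```java
import java.util.Properties;

/** Illustrative only: mirrors the documented defaults using java.util.Properties. */
final class ModelSelectionSketch {
    /** Weighting model: the value of trec.model, or InL2 when the property is unset. */
    static String weightingModel(Properties p) {
        return p.getProperty("trec.model", "InL2");
    }

    /** Query expansion model: the value of trec.qe.model, or Bo1 when the property is unset. */
    static String expansionModel(Properties p) {
        return p.getProperty("trec.qe.model", "Bo1");
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        System.out.println(weightingModel(p)); // no property set, falls back to InL2
        p.setProperty("trec.model", "PL2");
        System.out.println(weightingModel(p)); // now PL2
        System.out.println(expansionModel(p)); // still the Bo1 default
    }
}
```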

Result Files

The results from the system are output in a trec_eval compatible format. The results file is named WEIGHTINGMODELNAME_cCVALUE.RUNNO.res and is written to the var/results folder. RUNNO is (usually) a constantly increasing number, as specified by a file in the results folder. The location of the results folder can be altered by the trec.results property. If the property trec.querycounter.type is not set to sequential, the RUNNO will be a string including the time and a randomly generated number. This is best to use when many instances of Terrier are writing to the same results folder, as the incrementing RUNNO method is not multi-process safe (e.g. one Terrier could delete the counter file while another is reading it).
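
As a rough illustration of the naming scheme, the sketch below assembles a results filename and chooses between the two kinds of RUNNO. The class ResultFilenameSketch is hypothetical. Several details are assumptions rather than documented behaviour: that an unset trec.querycounter.type behaves as sequential, that the non-sequential form joins the time in milliseconds and a random number below 1000 with a hyphen, and that CVALUE is rendered by Java's default double formatting.

```java
import java.util.Properties;
import java.util.Random;

/** Illustrative only: a hypothetical helper, not Terrier's implementation. */
final class ResultFilenameSketch {
    private static final Random RANDOM = new Random();

    /** Builds WEIGHTINGMODELNAME_cCVALUE.RUNNO.res (the file name only, without the folder). */
    static String resultName(String model, double c, String runNo) {
        return model + "_c" + c + "." + runNo + ".res";
    }

    /**
     * Picks the RUNNO. When trec.querycounter.type is "sequential" (assumed default),
     * the caller-supplied sequential value is used; otherwise a time-plus-random string.
     */
    static String runNo(Properties p, String sequentialValue) {
        String type = p.getProperty("trec.querycounter.type", "sequential");
        if ("sequential".equals(type)) {
            return sequentialValue;
        }
        // Assumed layout: <millis>-<random below 1000>; the real layout may differ.
        return System.currentTimeMillis() + "-" + RANDOM.nextInt(1000);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        System.out.println(resultName("PL2", 1.0, runNo(p, "0"))); // prints PL2_c1.0.0.res
    }
}
```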

Properties

Author:
Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Nut Limsopatham

Nested Class Summary
static class TRECQuerying.NullOutputFormat
          An OutputFormat instance which does nothing.
static interface TRECQuerying.OutputFormat
          interface for adjusting the output of TRECQuerying
static interface TRECQuerying.QueryResultCache
          Interface for introducing caching strategies into TRECQuerying
static interface TRECQuerying.QuerySource
          This interface denotes a source of queries for batch evaluation
static class TRECQuerying.TRECDocidOutputFormat
          OutputFormat for writing TREC runs where the docnos are NOT looked up, but instead the (integer, internal) docids are recorded in the .res file.
static class TRECQuerying.TRECDocnoOutputFormat
          Standard OutputFormat for writing TREC runs
 
Field Summary
protected  java.lang.String defaultQEModel
          The name of the query expansion model used.
protected static boolean DUMP_SETTINGS
          Dump the current settings along with the results.
protected  Index index
          The object that encapsulates the data structures used by Terrier.
protected static org.apache.log4j.Logger logger
          The logger used
protected  java.lang.String managerName
          The name of the manager object that handles the queries.
protected  int matchingCount
          The number of matched queries.
protected  java.lang.String method
          The method - ie the weighting model and parameters.
protected  java.lang.String mModel
          The name of the matching model that is used for retrieval.
protected  TRECQuerying.OutputFormat printer
          Where results of the stream of queries are output to.
protected  boolean queryexpansion
          Indicates whether to expand queries.
protected  Manager queryingManager
          The manager object that handles the queries.
protected  TRECQuerying.QuerySource querySource
          Where the stream of queries is obtained from.
protected static java.util.Random random
          random number generator
protected static boolean removeQueryPeriods
           
protected  java.io.PrintWriter resultFile
          The file to store the output to.
protected  TRECQuerying.QueryResultCache resultsCache
          Results are obtained from a query cache if one is enabled.
protected  java.lang.String resultsFilename
          The filename of the last file results were output to.
protected  java.lang.String topicsParser
          The class used to parse the batch topic files.
protected  java.lang.String wModel
          The name of the weighting model that is used for retrieval.
 
Constructor Summary
TRECQuerying()
          TRECQuerying default constructor initialises the inverted index, the lexicon and the document index structures.
TRECQuerying(boolean _queryexpansion)
          TRECQuerying constructor initialises the inverted index, the lexicon and the document index structures.
TRECQuerying(Index i)
          TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
 
Method Summary
 void close()
          Closes the used structures.
protected  void createManager()
          Create a querying manager.
protected  void finishedQueries()
          After finishing with a batch of queries, close the result file
 Index getIndex()
          Get the index pointer.
 Manager getManager()
          Get the querying manager.
protected  java.lang.String getNextQueryCounter(java.lang.String resultsFolder)
          Get the sequential number of the next result file in the results folder.
protected  TRECQuerying.OutputFormat getOutputFormat()
           
protected  TRECQuerying.QuerySource getQueryParser()
          Get the query parser that is being used.
protected  java.lang.String getRandomQueryCounter()
          Get a random number between 0 and 1000.
 java.io.PrintWriter getResultFile(java.lang.String predefinedName)
          Returns a PrintWriter used to store the results.
protected  TRECQuerying.QueryResultCache getResultsCache()
          Obtain the query cache.
protected  java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder)
          Get the sequential number of the current result file in the results folder.
protected  void initSearchRequestModification(java.lang.String queryId, SearchRequest srq)
           
protected  void loadIndex()
          Loads index(s) from disk.
protected  void preQueryingSearchRequestModification(java.lang.String queryId, SearchRequest srq)
           
 void printSettings(SearchRequest default_q, java.lang.String[] topicsFiles, java.lang.String otherComments)
          Prints the current settings to a file with the same name as the current results file.
 java.lang.String processQueries()
          Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism.
 java.lang.String processQueries(double c)
          Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism.
 java.lang.String processQueries(double c, boolean c_set)
          Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism.
 SearchRequest processQuery(java.lang.String queryId, java.lang.String query)
          According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
 SearchRequest processQuery(java.lang.String queryId, java.lang.String query, double cParameter)
          According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
 SearchRequest processQuery(java.lang.String queryId, java.lang.String query, double cParameter, boolean c_set)
          According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
protected  void processQueryAndWrite(java.lang.String queryId, java.lang.String query, double cParameter, boolean c_set)
          According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
 void setIndex(Index i)
          Set the index pointer.
protected  void startingBatchOfQueries()
          Before starting a batch of queries, this method is called by processQueries()
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultQEModel

protected java.lang.String defaultQEModel
The name of the query expansion model used.


logger

protected static final org.apache.log4j.Logger logger
The logger used


removeQueryPeriods

protected static boolean removeQueryPeriods

random

protected static final java.util.Random random
random number generator


matchingCount

protected int matchingCount
The number of matched queries.


queryexpansion

protected boolean queryexpansion
Indicates whether to expand queries.


resultFile

protected java.io.PrintWriter resultFile
The file to store the output to.


resultsFilename

protected java.lang.String resultsFilename
The filename of the last file results were output to.


DUMP_SETTINGS

protected static boolean DUMP_SETTINGS
Dump the current settings along with the results. Controlled by property trec.querying.dump.settings, defaults to true.


managerName

protected java.lang.String managerName
The name of the manager object that handles the queries. Set by property trec.manager, defaults to Manager.


queryingManager

protected Manager queryingManager
The manager object that handles the queries.


wModel

protected java.lang.String wModel
The name of the weighting model that is used for retrieval. Defaults to PL2


mModel

protected java.lang.String mModel
The name of the matching model that is used for retrieval. Defaults to Matching


index

protected Index index
The object that encapsulates the data structures used by Terrier.


method

protected java.lang.String method
The method - ie the weighting model and parameters. Examples: TF_IDF, PL2c1.0


topicsParser

protected java.lang.String topicsParser
The class used to parse the batch topic files. Configured by property trec.topics.parser


querySource

protected TRECQuerying.QuerySource querySource
Where the stream of queries is obtained from. Configured by property trec.topics.parser


printer

protected TRECQuerying.OutputFormat printer
Where results of the stream of queries are output to. Specified by property trec.querying.outputformat - defaults to TRECDocnoOutputFormat


resultsCache

protected TRECQuerying.QueryResultCache resultsCache
Results are obtained from a query cache if one is enabled. Configured to a class using property trec.querying.resultscache. Defaults to NullQueryResultCache (no caching).

Constructor Detail

TRECQuerying

public TRECQuerying()
TRECQuerying default constructor initialises the inverted index, the lexicon and the document index structures.


TRECQuerying

public TRECQuerying(boolean _queryexpansion)
TRECQuerying constructor initialises the inverted index, the lexicon and the document index structures.


TRECQuerying

public TRECQuerying(Index i)
TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.

Parameters:
i - The specified index.
Method Detail

getResultsCache

protected TRECQuerying.QueryResultCache getResultsCache()
Obtain the query cache. Loads the class specified by property trec.querying.resultscache


getOutputFormat

protected TRECQuerying.OutputFormat getOutputFormat()

createManager

protected void createManager()
Create a querying manager. This method should be overridden if another matching model is required.


loadIndex

protected void loadIndex()
Loads index(s) from disk.


getIndex

public Index getIndex()
Get the index pointer.

Returns:
The index pointer.

setIndex

public void setIndex(Index i)
Set the index pointer.

Parameters:
i - The index pointer.

getManager

public Manager getManager()
Get the querying manager.

Returns:
The querying manager.

close

public void close()
Closes the used structures.


getNextQueryCounter

protected java.lang.String getNextQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the next result file in the results folder.

Parameters:
resultsFolder - The path of the results folder.
Returns:
The sequential number of the next result file in the results folder.
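
The class description notes that the incrementing counter is kept in a file in the results folder and is not multi-process safe. The sketch below shows the general read, increment, write pattern and where the race arises. It is not Terrier's implementation: the class SequentialCounterSketch, the counter file name, and the choice to start counting at 0 are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: a naive file-backed counter, not Terrier's implementation. */
final class SequentialCounterSketch {
    // Hypothetical file name, chosen for this sketch.
    static final String COUNTER_FILE = "querycounter";

    /** Returns the next counter value as a string and stores it in the results folder. */
    static String next(Path resultsFolder) {
        Path counter = resultsFolder.resolve(COUNTER_FILE);
        try {
            int next = 0; // assumed starting value when no counter file exists yet
            if (Files.exists(counter)) {
                String text = new String(Files.readAllBytes(counter), StandardCharsets.UTF_8).trim();
                next = Integer.parseInt(text) + 1;
            }
            // Race window: another process may read, replace or delete the file between
            // the read above and this write, so two processes can obtain the same value.
            Files.write(counter, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
            return Integer.toString(next);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("results");
        System.out.println(next(folder)); // first call in an empty folder
        System.out.println(next(folder)); // second call sees the stored value
    }
}
```

Because nothing in this pattern locks the file, concurrent Terrier instances sharing one results folder are better served by the non-sequential counter described in the class overview.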

getRandomQueryCounter

protected java.lang.String getRandomQueryCounter()
Get a random number between 0 and 1000.

Returns:
A random number between 0 and 1000.

getSequentialQueryCounter

protected java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the current result file in the results folder.

Parameters:
resultsFolder - The path of the results folder.
Returns:
The sequential number of the current result file in the results folder.

getResultFile

public java.io.PrintWriter getResultFile(java.lang.String predefinedName)
Returns a PrintWriter used to store the results.

Parameters:
predefinedName - java.lang.String a non-standard prefix for the result file.
Returns:
a handle used as a destination for storing results.

processQuery

public SearchRequest processQuery(java.lang.String queryId,
                                  java.lang.String query)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.

Parameters:
queryId - the identifier of the query to process.
query - the query to process.

processQuery

public SearchRequest processQuery(java.lang.String queryId,
                                  java.lang.String query,
                                  double cParameter)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.

Parameters:
queryId - the identifier of the query to process.
query - the query to process.
cParameter - double the value of the parameter to use.

processQueryAndWrite

protected void processQueryAndWrite(java.lang.String queryId,
                                    java.lang.String query,
                                    double cParameter,
                                    boolean c_set)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.

Parameters:
queryId - the identifier of the query to process.
query - the query to process.
cParameter - double the value of the parameter to use.
c_set - A boolean variable indicating if cParameter has been specified.

processQuery

public SearchRequest processQuery(java.lang.String queryId,
                                  java.lang.String query,
                                  double cParameter,
                                  boolean c_set)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.

Parameters:
queryId - the identifier of the query to process.
query - the query to process.
cParameter - double the value of the parameter to use.
c_set - boolean specifies whether the parameter c is set.

preQueryingSearchRequestModification

protected void preQueryingSearchRequestModification(java.lang.String queryId,
                                                    SearchRequest srq)

initSearchRequestModification

protected void initSearchRequestModification(java.lang.String queryId,
                                             SearchRequest srq)

processQueries

public java.lang.String processQueries()
Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the files containing the queries (named by the trec.topics property), creates the file of results, and for each query, gets the relevant documents, scores them, and outputs the results to the result file.

Returns:
String the filename that the results have been written to

processQueries

public java.lang.String processQueries(double c)
Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the file with the queries, creates the file of results, and for each query, gets the relevant documents, scores them, and outputs the results to the result file. It sets the term frequency normalisation parameter equal to the given value.

Parameters:
c - double the value of the term frequency parameter to use.
Returns:
String the filename that the results have been written to

getQueryParser

protected TRECQuerying.QuerySource getQueryParser()
Get the query parser that is being used.

Returns:
The query parser that is being used.

processQueries

public java.lang.String processQueries(double c,
                                       boolean c_set)
Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the file with the queries, creates the file of results, and for each query, gets the relevant documents, scores them, and outputs the results to the result file.

Queries
Queries are parsed from the files specified by the trec.topics property (comma-delimited).

Parameters:
c - the value of c.
c_set - specifies whether a value for c has been specified.
Returns:
String the filename that the results have been written to
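
The batch flow described above (take each query, retrieve and score documents, write one line per result) can be sketched independently of Terrier. In the sketch below, BatchRunSketch, Scored, and the retrieve function are hypothetical stand-ins for the querying manager and its result set. The line layout follows the usual trec_eval run format (query id, the literal Q0, document number, rank, score, run tag); counting ranks from 0 is a choice made in this sketch.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: a stand-in for the batch loop, not Terrier's implementation. */
final class BatchRunSketch {
    /** A scored document; a placeholder for whatever the retrieval engine returns. */
    static final class Scored {
        final String docno;
        final double score;

        Scored(String docno, double score) {
            this.docno = docno;
            this.score = score;
        }
    }

    /** Runs every query through retrieve and writes one trec_eval style line per result. */
    static void run(Map<String, String> queries,
                    Function<String, List<Scored>> retrieve,
                    String runTag,
                    PrintWriter out) {
        for (Map.Entry<String, String> query : queries.entrySet()) {
            int rank = 0; // ranks restart for every query
            for (Scored doc : retrieve.apply(query.getValue())) {
                out.println(query.getKey() + " Q0 " + doc.docno + " " + rank + " "
                        + doc.score + " " + runTag);
                rank++;
            }
        }
        out.flush();
    }

    public static void main(String[] args) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("1", "query terms");
        // A fixed ranking stands in for real retrieval.
        Function<String, List<Scored>> retrieve =
                q -> Arrays.asList(new Scored("D7", 2.5), new Scored("D3", 1.0));
        run(queries, retrieve, "PL2_c1.0", new PrintWriter(System.out));
    }
}
```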

startingBatchOfQueries

protected void startingBatchOfQueries()
Before starting a batch of queries, this method is called by processQueries()

Since:
2.2

finishedQueries

protected void finishedQueries()
After finishing with a batch of queries, close the result file


printSettings

public void printSettings(SearchRequest default_q,
                          java.lang.String[] topicsFiles,
                          java.lang.String otherComments)
Prints the current settings to a file with the same name as the current results file. This assists in tracing the settings used to generate a given run.



Terrier 3.5. Copyright © 2004-2011 University of Glasgow