Class TRECQuerying
- java.lang.Object
  - org.terrier.applications.AbstractQuerying
    - org.terrier.applications.batchquerying.TRECQuerying
- Direct Known Subclasses:
ParallelTRECQuerying
public class TRECQuerying extends AbstractQuerying
This class performs batch-mode retrieval for a set of TREC queries.
Configuring
In the following, we list the main ways for configuring TRECQuerying, before exhaustively listing the properties that can affect TRECQuerying.
Topics
Files containing topics (queries to be evaluated) should be set using the trec.topics property. Multiple topic files can be used together by separating their filenames with commas. By default, TRECQuerying assumes TREC-tagged topic files, e.g.:
  <top>
  <num> Number 1 </num>
  <title> Query terms </title>
  <desc> Description : A sentence about the information need </desc>
  <narr> Narrative: More sentences about what is relevant or not</narr>
  </top>
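To make the tagged layout concrete, the following sketch extracts each topic's number and title with regular expressions. It is a hypothetical helper (TopicSketch is not a Terrier class) and is much simpler than TRECQuery. It assumes the identifier is the last token inside <num>, and it tolerates a missing </num> or </title> by stopping at the next tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts topic numbers and titles from TREC-tagged text.
 *  Far simpler than Terrier's TRECQuery; for illustration only. */
public final class TopicSketch {

    /** One parsed topic: its identifier and its title (the query terms). */
    public static final class Topic {
        public final String id;
        public final String title;

        Topic(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // DOTALL lets '.' match newlines, since real topic files span many lines.
    private static final Pattern TOP =
        Pattern.compile("<top>(.*?)</top>", Pattern.DOTALL);
    // Closing tags are often omitted, so also stop at the next opening tag.
    private static final Pattern NUM =
        Pattern.compile("<num>(.*?)(?:</num>|<title>)", Pattern.DOTALL);
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)(?:</title>|<desc>|$)", Pattern.DOTALL);

    public static List<Topic> parse(String text) {
        List<Topic> topics = new ArrayList<>();
        Matcher top = TOP.matcher(text);
        while (top.find()) {
            String body = top.group(1);
            Matcher num = NUM.matcher(body);
            Matcher title = TITLE.matcher(body);
            if (!num.find() || !title.find()) {
                continue; // this sketch silently skips malformed topics
            }
            // Assumption: the identifier is the last token, e.g. "Number 1" -> "1".
            String[] tokens = num.group(1).trim().split("\\s+");
            String id = tokens[tokens.length - 1];
            // Collapse internal whitespace so multi-line titles become one line.
            String query = title.group(1).trim().replaceAll("\\s+", " ");
            topics.add(new Topic(id, query));
        }
        return topics;
    }

    public static void main(String[] args) {
        String sample = "<top> <num> Number 1 </num> <title> Query terms </title> "
            + "<desc> Description : A sentence about the information need </desc> "
            + "<narr> Narrative: More sentences about what is relevant or not</narr> </top>";
        for (Topic t : parse(sample)) {
            System.out.println(t.id + "\t" + t.title);
        }
    }
}
```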
If your topic files are in a different format, you can use a different QuerySource by setting the property trec.topics.parser. For instance, trec.topics.parser=SingleLineTRECQuery should be used for topics where each line is one query. See TRECQuery and SingleLineTRECQuery for more information.
Models
By default, Terrier uses the InL2 retrieval model for all runs. To use another weighting model, set the property trec.model; all runs will then be made using that model. E.g., to use PL2, set trec.model=PL2. Similarly, when query expansion is enabled, the default query expansion model is Bo1, controlled by the property trec.qe.model.
Result Files
The results from the system are output in a trec_eval-compatible format. The results filename has the form WEIGHTINGMODELNAME_cCVALUE.RUNNO.res and is written to the var/results folder. RUNNO is (usually) a constantly increasing number, maintained in a file in the results folder. The location of the results folder can be altered by the trec.results property. If the property trec.querycounter.type is not set to sequential, RUNNO will instead be a string including the time and a randomly generated number. This is the better choice when many instances of Terrier write to the same results folder, as the incrementing RUNNO method is not multi-process safe (e.g. one Terrier instance could delete the counter file while another is reading it).
Properties
- trec.topics.parser - the query parser that parses the topic file(s). TRECQuery by default. Subclass the TRECQuery class and alter this property if your topics come in a very different format to those of TREC.
- trec.topics - the name of the topic file. Multiple topic files can be used, if separated by commas.
- trec.topics.matchopql - if the topics should be parsed using the matchopql parser. Defaults to false.
- trec.model - the name of the weighting model to be used during retrieval. Defaults to InL2.
- trec.qe.model - the name of the query expansion model to be used during query expansion. Defaults to Bo1.
- c - the term frequency normalisation parameter value. A value specified at runtime as an API parameter (e.g. TrecTerrier -c) overrides this property.
- trec.matching - the name of the matching model that is used for retrieval. Defaults to org.terrier.matching.daat.Full.
- trec.results - the location of the results folder. Defaults to TERRIER_VAR/results/
- trec.results.file the exact result filename to be output. Defaults to an automatically generated filename - see trec.querycounter.type.
- trec.querycounter.type - how the number (RUNNO) at the end of a run file should be generated. Defaults to sequential, in which case RUNNO is a constantly increasing number. Otherwise it is a string including the time and a randomly generated number.
- trec.output.format.length - the maximum number of results ever output per query into the results file. Default value 1000. 0 means no limit.
- trec.iteration - the contents of the Iteration column in the trec_eval compatible results. Defaults to 0.
- trec.querying.dump.settings - controls whether the settings used to generate a results file should be dumped to a .settings file in conjunction with the .res file. Defaults to true.
- trec.querying.outputformat - controls the class used to write the results file. Defaults to TRECDocnoOutputFormat. Alternatives: TRECDocnoOutputFormat, TRECDocidOutputFormat, NullOutputFormat.
- trec.querying.outputformat.docno.meta.key - for TRECDocnoOutputFormat, defines the MetaIndex key to use as the docno. Defaults to "docno".
- trec.querying.resultscache - controls the cache to use for query caching. Defaults to NullQueryResultCache.
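Several of the defaults listed above can be summarised in a small lookup sketch. It uses java.util.Properties purely as a stand-in for Terrier's own configuration machinery (QueryingConfigSketch is hypothetical), and it assumes whitespace around comma-separated topic filenames should be trimmed, which the property description does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch: resolves a few TRECQuerying properties against their
 *  documented defaults. java.util.Properties stands in for Terrier's configuration. */
public final class QueryingConfigSketch {

    public static String model(Properties p) {
        return p.getProperty("trec.model", "InL2");
    }

    public static String qeModel(Properties p) {
        return p.getProperty("trec.qe.model", "Bo1");
    }

    /** trec.output.format.length: defaults to 1000; 0 means no limit. */
    public static int resultsLength(Properties p) {
        return Integer.parseInt(p.getProperty("trec.output.format.length", "1000").trim());
    }

    /** trec.topics: one or more topic files separated by commas (trimming is assumed). */
    public static List<String> topicFiles(Properties p) {
        List<String> files = new ArrayList<>();
        for (String name : p.getProperty("trec.topics", "").split(",")) {
            if (!name.isBlank()) {
                files.add(name.trim());
            }
        }
        return files;
    }

    /** trec.querycounter.type: anything other than "sequential" selects the time+random RUNNO. */
    public static boolean sequentialCounter(Properties p) {
        return "sequential".equals(p.getProperty("trec.querycounter.type", "sequential"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("trec.topics", "topics.401-450, topics.451-500");
        p.setProperty("trec.model", "PL2");
        System.out.println(model(p) + " " + qeModel(p) + " " + topicFiles(p));
    }
}
```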
- Author:
- Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Nut Limsopatham
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- static class TRECQuerying.Command
Nested classes/interfaces inherited from class org.terrier.applications.AbstractQuerying
- AbstractQuerying.AbstractQueryingCommand
Field Summary
Fields (Modifier and Type, Field, Description)
- static java.lang.String BATCHRETRIEVE_COMMAND
- static java.lang.String BATCHRETRIEVE_PROP_PREFIX
- protected java.lang.String defaultQEModel - The name of the query expansion model used.
- protected static boolean DUMP_SETTINGS - Dump the current settings along with the results.
- protected static java.lang.String ITERATION - A TREC specific output field.
- protected static org.slf4j.Logger logger - The logger used.
- protected java.lang.String method - The method, i.e. the weighting model and parameters.
- protected java.lang.String mModel - The name of the matching model that is used for retrieval.
- protected OutputFormat printer - Where results of the stream of queries are output to.
- protected Manager queryingManager - The manager object that handles the queries.
- protected QuerySource querySource - Where the stream of queries is obtained from.
- protected static java.util.Random random - Random number generator.
- protected static boolean removeQueryPeriods
- protected java.io.PrintWriter resultFile - The file to store the output to.
- protected java.io.OutputStream resultFileRaw
- protected static int RESULTS_LENGTH - The number of results to output.
- protected QueryResultCache resultsCache - The query cache from which results are obtained, if one is enabled.
- protected java.lang.String resultsFilename - The filename of the last file results were output to.
Fields inherited from class org.terrier.applications.AbstractQuerying
controls, indexref, matchingCount, matchopQl
Constructor Summary
Constructors (Constructor, Description)
- TRECQuerying() - Deprecated.
- TRECQuerying(IndexRef _indexref) - TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
Method Summary
Methods (Modifier and Type, Method, Description)
- void close() - Closes the used structures.
- protected void createManager() - Create a querying manager.
- protected void finishedQueries() - After finishing with a batch of queries, close the result file.
- IndexRef getIndexRef() - Get the index pointer.
- Manager getManager() - Get the querying manager.
- protected java.lang.String getNextQueryCounter(java.lang.String resultsFolder) - Get the sequential number of the next result file in the results folder.
- protected OutputFormat getOutputFormat()
- static QuerySource getQueryParser(java.lang.String parserName) - Get the query parser that is being used.
- protected java.lang.String getRandomQueryCounter() - Get a random number between 0 and 1000.
- java.io.PrintWriter getResultFile(java.lang.String predefinedName) - Returns a PrintWriter used to store the results.
- protected QueryResultCache getResultsCache() - Obtain the query cache.
- protected java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder) - Get the sequential number of the current result file in the results folder.
- java.lang.String getTopicsParser()
- protected void initSearchRequestModification(java.lang.String queryId, SearchRequest srq)
- void intialise()
- protected void loadIndex() - Loads index(es) from disk.
- protected void preQueryingSearchRequestModification(java.lang.String queryId, SearchRequest srq)
- void printSettings(SearchRequest default_q, java.lang.String[] topicsFiles, java.lang.String otherComments) - Prints the current settings to a file with the same name as the current results file.
- java.lang.String processQueries() - Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism.
- java.lang.String processQueries(QuerySource _qs)
- SearchRequest processQuery(java.lang.String queryId, java.lang.String query) - According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- protected void processQueryAndWrite(java.lang.String queryId, java.lang.String query) - According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- void setTopicsParser(java.lang.String topicsParser)
- protected void startingBatchOfQueries() - Before starting a batch of queries, this method is called by processQueries().
Methods inherited from class org.terrier.applications.AbstractQuerying
controls
Field Detail
-
BATCHRETRIEVE_COMMAND
public static final java.lang.String BATCHRETRIEVE_COMMAND
- See Also:
- Constant Field Values
-
BATCHRETRIEVE_PROP_PREFIX
public static final java.lang.String BATCHRETRIEVE_PROP_PREFIX
- See Also:
- Constant Field Values
-
defaultQEModel
protected java.lang.String defaultQEModel
The name of the query expansion model used.
-
logger
protected static final org.slf4j.Logger logger
The logger used.
-
removeQueryPeriods
protected static boolean removeQueryPeriods
-
random
protected static final java.util.Random random
Random number generator.
-
resultFile
protected volatile java.io.PrintWriter resultFile
The file to store the output to.
-
resultFileRaw
protected java.io.OutputStream resultFileRaw
-
resultsFilename
protected java.lang.String resultsFilename
The filename of the last file results were output to.
-
DUMP_SETTINGS
protected static boolean DUMP_SETTINGS
Dump the current settings along with the results. Controlled by property trec.querying.dump.settings, defaults to true.
-
queryingManager
protected Manager queryingManager
The manager object that handles the queries.
-
mModel
protected java.lang.String mModel
The name of the matching model that is used for retrieval. If not set, defaults to the matching model configured in the Manager.
- See Also:
- LocalManager
-
RESULTS_LENGTH
protected static int RESULTS_LENGTH
The number of results to output. Set by property trec.output.format.length.
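The trec.output.format.length rule (at most N results per query, with 0 meaning no limit) amounts to a simple truncation of the ranked list. A minimal sketch with a hypothetical class name; rejecting negative values is this sketch's own choice, since their behaviour is not documented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the RESULTS_LENGTH cut-off. */
public final class ResultsLengthSketch {

    /** Keeps at most `limit` results from an already-ranked list; 0 means no limit. */
    public static <T> List<T> truncate(List<T> ranked, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0, got " + limit);
        }
        if (limit == 0 || ranked.size() <= limit) {
            return new ArrayList<>(ranked); // nothing to cut; return a copy
        }
        return new ArrayList<>(ranked.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("d3", "d1", "d7", "d2");
        System.out.println(truncate(ranked, 2));
        System.out.println(truncate(ranked, 0));
    }
}
```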
-
ITERATION
protected static java.lang.String ITERATION
A TREC specific output field.
-
method
protected java.lang.String method
The method, i.e. the weighting model and parameters. Examples: TF_IDF, PL2c1.0
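The examples suggest that the method string is the weighting model name followed, when a term frequency normalisation value c is set, by "c" and that value. A minimal sketch of that pattern under this assumption (MethodNameSketch is hypothetical; Terrier may build the string differently):

```java
/** Hypothetical sketch: compose a method string from a model name and optional c value. */
public final class MethodNameSketch {

    /** Builds strings like "PL2c1.0"; a null c yields just the model name, e.g. "TF_IDF". */
    public static String method(String model, Double c) {
        // String concatenation renders the Double via Double.toString, so 1.0 -> "1.0".
        return c == null ? model : model + "c" + c;
    }

    public static void main(String[] args) {
        System.out.println(method("PL2", 1.0));
        System.out.println(method("TF_IDF", null));
    }
}
```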
-
querySource
protected QuerySource querySource
Where the stream of queries is obtained from. Configured by property trec.topics.parser
-
printer
protected OutputFormat printer
Where results of the stream of queries are output to. Specified by property trec.querying.outputformat - defaults to TRECDocnoOutputFormat
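A trec_eval-compatible run file has one line per retrieved document with six space-separated columns: query id, the iteration field (trec.iteration, default 0), docno, rank, score and a run tag. The sketch below formats one such line. It is hypothetical: whether TRECDocnoOutputFormat numbers ranks from 0 or 1, how it formats scores, and what it writes as the run tag are not stated here, so all three are assumptions.

```java
import java.util.List;

/** Hypothetical sketch of one line in a six-column TREC run file. */
public final class TrecRunLineSketch {

    /** Formats: queryId iteration docno rank score runTag */
    public static String line(String queryId, String iteration, String docno,
                              int rank, double score, String runTag) {
        return String.join(" ", queryId, iteration, docno,
            Integer.toString(rank), Double.toString(score), runTag);
    }

    public static void main(String[] args) {
        List<String> docnos = List.of("DOC-7", "DOC-3");
        double[] scores = {12.5, 9.25};
        for (int i = 0; i < docnos.size(); i++) {
            // Ranks counted from 0 and the method string used as run tag: both assumptions.
            System.out.println(line("1", "0", docnos.get(i), i, scores[i], "PL2c1.0"));
        }
    }
}
```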
-
resultsCache
protected QueryResultCache resultsCache
The query cache from which results are obtained, if one is enabled. Configured to a class using property trec.querying.resultscache. Defaults to NullQueryResultCache (no caching).
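The caching behaviour follows a check-then-compute pattern: consult the cache, run retrieval only on a miss, and store the result. The sketch below uses its own hypothetical Cache interface rather than Terrier's QueryResultCache API. NullCache mimics the no-caching default, and MapCache is an illustrative in-memory alternative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of query-result caching; not Terrier's QueryResultCache API. */
public final class ResultCacheSketch {

    /** Cache contract keyed by query text. */
    public interface Cache<R> {
        Optional<R> get(String query);
        void put(String query, R results);
    }

    /** Like the no-caching default: every lookup misses and nothing is stored. */
    public static final class NullCache<R> implements Cache<R> {
        @Override public Optional<R> get(String query) { return Optional.empty(); }
        @Override public void put(String query, R results) { /* intentionally a no-op */ }
    }

    /** Illustrative in-memory cache; not thread-safe. */
    public static final class MapCache<R> implements Cache<R> {
        private final Map<String, R> entries = new HashMap<>();
        @Override public Optional<R> get(String query) { return Optional.ofNullable(entries.get(query)); }
        @Override public void put(String query, R results) { entries.put(query, results); }
    }

    /** Returns cached results when present, otherwise runs search and caches its output. */
    public static <R> R retrieve(Cache<R> cache, String query, Function<String, R> search) {
        Optional<R> hit = cache.get(query);
        if (hit.isPresent()) {
            return hit.get();
        }
        R results = search.apply(query);
        cache.put(query, results);
        return results;
    }

    public static void main(String[] args) {
        int[] searches = {0};
        Function<String, String> search = q -> { searches[0]++; return "results for " + q; };
        Cache<String> cache = new MapCache<>();
        retrieve(cache, "query terms", search);
        retrieve(cache, "query terms", search);
        System.out.println("searches run: " + searches[0]);
    }
}
```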
-
-
Constructor Detail
-
TRECQuerying
public TRECQuerying()
Deprecated.
TRECQuerying default constructor initialises the inverted index, the lexicon and the document index structures.
-
TRECQuerying
public TRECQuerying(IndexRef _indexref)
TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
- Parameters:
- _indexref - The specified index reference.
-
-
Method Detail
-
intialise
public void intialise()
-
getResultsCache
protected QueryResultCache getResultsCache()
Obtain the query cache. Loads the class specified by property trec.querying.resultscache
-
getOutputFormat
protected OutputFormat getOutputFormat()
-
createManager
protected void createManager()
Description copied from class: AbstractQuerying
Create a querying manager.
- Overrides:
- createManager in class AbstractQuerying
-
loadIndex
protected void loadIndex()
Loads index(es) from disk.
-
getIndexRef
public IndexRef getIndexRef()
Get the index pointer.
- Returns:
- The index pointer.
-
getManager
public Manager getManager()
Get the querying manager.
- Returns:
- The querying manager.
-
close
public void close()
Closes the used structures.
-
getNextQueryCounter
protected java.lang.String getNextQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the next result file in the results folder.
- Parameters:
- resultsFolder - The path of the results folder.
- Returns:
- The sequential number of the next result file in the results folder.
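A sequential run number kept in a file inside the results folder can be sketched as a read-increment-write cycle. The counter file name here is hypothetical, and returning the stored value while saving value + 1 is an assumption about the semantics. The unlocked read-then-write is exactly what makes this scheme unsafe when several processes share one results folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a file-backed sequential run counter. */
public final class SequentialCounterSketch {

    // Hypothetical counter file name; the file Terrier actually uses may differ.
    static final String COUNTER_FILE = "querycounter";

    /** Returns the run number to use now and stores the incremented value for the next run. */
    public static String nextCounter(Path resultsFolder) throws IOException {
        Path counterFile = resultsFolder.resolve(COUNTER_FILE);
        int value = 0;
        if (Files.exists(counterFile)) {
            String text = Files.readString(counterFile, StandardCharsets.UTF_8).trim();
            if (!text.isEmpty()) {
                value = Integer.parseInt(text);
            }
        }
        // Read-then-write with no locking: two processes can both read the same value
        // here, which is why the sequential scheme is not multi-process safe.
        Files.writeString(counterFile, Integer.toString(value + 1), StandardCharsets.UTF_8);
        return Integer.toString(value);
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("results");
        System.out.println(nextCounter(folder));
        System.out.println(nextCounter(folder));
    }
}
```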
-
getRandomQueryCounter
protected java.lang.String getRandomQueryCounter()
Get a random number between 0 and 1000.
- Returns:
- A random number between 0 and 1000.
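When trec.querycounter.type is not sequential, RUNNO combines the time with a random number. The sketch below shows one such combination; the "<millis>_<random>" layout is hypothetical, and using nextInt(1000) (0 to 999) assumes the documented upper bound of 1000 is exclusive. The clock value and Random are passed in so that the output can be reproduced in tests.

```java
import java.util.Random;

/** Hypothetical sketch of a time-plus-random run counter. */
public final class RandomCounterSketch {

    /** Returns e.g. "1700000000000_417"; the real Terrier layout may differ. */
    public static String randomCounter(long nowMillis, Random random) {
        // nextInt(1000) yields 0..999; inclusiveness of 1000 is an assumption.
        return nowMillis + "_" + random.nextInt(1000);
    }

    public static void main(String[] args) {
        System.out.println(randomCounter(System.currentTimeMillis(), new Random()));
    }
}
```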
-
getSequentialQueryCounter
protected java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the current result file in the results folder.
- Parameters:
- resultsFolder - The path of the results folder.
- Returns:
- The sequential number of the current result file in the results folder.
-
getResultFile
public java.io.PrintWriter getResultFile(java.lang.String predefinedName)
Returns a PrintWriter used to store the results.
- Parameters:
- predefinedName - java.lang.String a non-standard prefix for the result file.
- Returns:
- a handle used as a destination for storing results.
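Putting the naming rules together, a results path is WEIGHTINGMODELNAME_cCVALUE.RUNNO.res inside the results folder. The sketch below assumes that a non-empty predefinedName simply replaces the WEIGHTINGMODELNAME_cCVALUE part; that substitution, and the class itself, are hypothetical. It also ignores trec.results.file, which names the output file exactly.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of results-file naming. */
public final class ResultFileNameSketch {

    /**
     * Builds WEIGHTINGMODELNAME_cCVALUE.RUNNO.res inside the results folder, or
     * PREFIX.RUNNO.res when a non-empty predefined prefix is supplied (assumed behaviour).
     */
    public static Path resultPath(Path resultsFolder, String predefinedName,
                                  String model, String cValue, String runNo) {
        String prefix = (predefinedName != null && !predefinedName.isEmpty())
            ? predefinedName
            : model + "_c" + cValue;
        return resultsFolder.resolve(prefix + "." + runNo + ".res");
    }

    public static void main(String[] args) {
        Path folder = Paths.get("var", "results");
        System.out.println(resultPath(folder, null, "PL2", "1.0", "3"));
    }
}
```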
-
processQueryAndWrite
protected void processQueryAndWrite(java.lang.String queryId, java.lang.String query)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- Parameters:
- queryId - the identifier of the query to process.
- query - the query to process.
-
processQuery
public SearchRequest processQuery(java.lang.String queryId, java.lang.String query)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- Overrides:
- processQuery in class AbstractQuerying
- Parameters:
- queryId - the identifier of the query to process.
- query - the query to process.
-
preQueryingSearchRequestModification
protected void preQueryingSearchRequestModification(java.lang.String queryId, SearchRequest srq)
-
initSearchRequestModification
protected void initSearchRequestModification(java.lang.String queryId, SearchRequest srq)
-
getQueryParser
public static QuerySource getQueryParser(java.lang.String parserName)
Get the query parser that is being used.
- Returns:
- The query parser that is being used.
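getQueryParser turns a parser name such as SingleLineTRECQuery into a QuerySource, and trec.querying.resultscache and trec.querying.outputformat likewise name classes to load. The sketch below shows generic by-name instantiation with reflection. How Terrier actually resolves short names (which package, which constructor) is not shown here; the defaultPackage convention is hypothetical, and the demo uses JDK classes so it runs standalone.

```java
import java.util.List;

/** Hypothetical sketch: instantiate a class named by a configuration value. */
public final class ClassByNameSketch {

    /**
     * Unqualified names get defaultPackage prepended (a convention assumed for this
     * sketch); the class must have a public no-argument constructor.
     */
    public static <T> T instantiate(String name, String defaultPackage, Class<T> type) {
        String qualified = name.contains(".") ? name : defaultPackage + name;
        try {
            Object instance = Class.forName(qualified).getDeclaredConstructor().newInstance();
            return type.cast(instance); // ClassCastException if it is the wrong kind of class
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + qualified, e);
        }
    }

    public static void main(String[] args) {
        List<?> list = instantiate("ArrayList", "java.util.", List.class);
        System.out.println(list.getClass().getName());
    }
}
```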
-
processQueries
public java.lang.String processQueries()
Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the file of queries, creates the results file, and, for each query, gets the relevant documents, scores them, and outputs the results to the results file.
Queries
Queries are parsed from the file(s) specified by the trec.topics property (comma-delimited).
- Returns:
- String the filename that the results have been written to
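The flow described for processQueries() (start a batch, process and write each query, then finish and report the results filename) can be outlined as a template method. This is a hypothetical sketch: the hook names mirror TRECQuerying's methods, but the bodies are placeholders, and calling finishedQueries() from a finally block is this sketch's design choice rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical outline of the batch loop; not Terrier's implementation. */
public abstract class BatchLoopSketch {

    /** Records the order in which hooks run, to make the flow visible. */
    protected final List<String> log = new ArrayList<>();

    protected void startingBatchOfQueries() { log.add("start"); }  // e.g. open the result file
    protected abstract void processQueryAndWrite(String queryId, String query);
    protected void finishedQueries() { log.add("finish"); }        // e.g. close the result file
    protected abstract String resultsFilename();

    /** Processes every (queryId, query) pair in iteration order and returns the results filename. */
    public final String processQueries(Map<String, String> queries) {
        startingBatchOfQueries();
        try {
            for (Map.Entry<String, String> entry : queries.entrySet()) {
                processQueryAndWrite(entry.getKey(), entry.getValue());
            }
        } finally {
            finishedQueries(); // runs even if a query throws, so the file is not left open
        }
        return resultsFilename();
    }

    public static void main(String[] args) {
        BatchLoopSketch batch = new BatchLoopSketch() {
            @Override protected void processQueryAndWrite(String queryId, String query) {
                log.add("query " + queryId + ": " + query);
            }
            @Override protected String resultsFilename() { return "PL2_c1.0.0.res"; }
        };
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("1", "query terms");
        queries.put("2", "more terms");
        System.out.println(batch.processQueries(queries) + " " + batch.log);
    }
}
```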
-
processQueries
public java.lang.String processQueries(QuerySource _qs)
-
startingBatchOfQueries
protected void startingBatchOfQueries()
Before starting a batch of queries, this method is called by processQueries().
- Since:
- 2.2
-
finishedQueries
protected void finishedQueries()
After finishing with a batch of queries, close the result file
-
printSettings
public void printSettings(SearchRequest default_q, java.lang.String[] topicsFiles, java.lang.String otherComments)
Prints the current settings to a file with the same name as the current results file. This assists in tracing the settings used to generate a given run.
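The settings file sits alongside the results file. The sketch below shows one way to derive its name and render the settings. It is hypothetical: it assumes the ".res" extension is replaced by ".settings" (the documentation only says the names match), and the key=value layout with a leading comment line (loosely playing the role of otherComments) is illustrative, not Terrier's actual file format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of naming and rendering a .settings file. */
public final class SettingsDumpSketch {

    /** "PL2_c1.0.3.res" -> "PL2_c1.0.3.settings" (assumed naming). */
    public static String settingsFilename(String resultsFilename) {
        String base = resultsFilename.endsWith(".res")
            ? resultsFilename.substring(0, resultsFilename.length() - ".res".length())
            : resultsFilename;
        return base + ".settings";
    }

    /** Renders settings as sorted key=value lines after a comment, so runs are easy to diff. */
    public static String dump(Map<String, String> settings, String comment) {
        StringBuilder out = new StringBuilder("# ").append(comment).append('\n');
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(settingsFilename("PL2_c1.0.3.res"));
        System.out.print(dump(Map.of("trec.model", "PL2", "c", "1.0"), "topics.401-450"));
    }
}
```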
-
getTopicsParser
public java.lang.String getTopicsParser()
-
setTopicsParser
public void setTopicsParser(java.lang.String topicsParser)