Class TRECQuerying
- java.lang.Object
  - org.terrier.applications.AbstractQuerying
    - org.terrier.applications.batchquerying.TRECQuerying
- Direct Known Subclasses:
ParallelTRECQuerying
public class TRECQuerying extends AbstractQuerying
This class performs a batch mode retrieval from a set of TREC queries.
Configuring
In the following, we list the main ways for configuring TRECQuerying, before exhaustively listing the properties that can affect TRECQuerying.
Topics
Files containing topics (queries to be evaluated) should be set using the trec.topics property. Multiple topic files can be used together by separating their filenames with commas. By default, TRECQuerying assumes TREC tagged topic files, e.g.:
<top>
<num> Number 1 </num>
<title> Query terms </title>
<desc> Description : A sentence about the information need </desc>
<narr> Narrative: More sentences about what is relevant or not</narr>
</top>
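As a rough illustration of the two conventions described above (comma-separated filenames in trec.topics, and TREC tagged topics), the following self-contained sketch splits a property value into filenames and extracts the <num> and <title> fields with regular expressions. It is not Terrier's TRECQuery parser: the class TopicSketch, its methods, the example filenames, and the choice to strip the leading "Number" label from the identifier are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Terrier.
final class TopicSketch {

    // One <top>...</top> block; DOTALL lets '.' span line breaks.
    private static final Pattern TOP =
            Pattern.compile("<top>(.*?)</top>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern NUM =
            Pattern.compile("<num>(.*?)</num>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Splits a comma-separated trec.topics value into trimmed, non-empty filenames. */
    static List<String> splitTopicFiles(String propertyValue) {
        List<String> files = new ArrayList<>();
        for (String part : propertyValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                files.add(name);
            }
        }
        return files;
    }

    /** Maps query id to title text for every <top> block, in order of appearance. */
    static Map<String, String> parseTitles(String taggedTopics) {
        Map<String, String> queries = new LinkedHashMap<>();
        Matcher top = TOP.matcher(taggedTopics);
        while (top.find()) {
            String body = top.group(1);
            Matcher num = NUM.matcher(body);
            Matcher title = TITLE.matcher(body);
            if (num.find() && title.find()) {
                // Assumption: drop a leading "Number" label (optionally followed by a colon).
                String id = num.group(1).trim().replaceFirst("(?i)^Number:?\\s*", "");
                queries.put(id, title.group(1).trim());
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // The filenames below are made up for the example.
        System.out.println(splitTopicFiles("topics.1-50, topics.51-100"));
        String topic = "<top> <num> Number 1 </num> <title> Query terms </title> </top>";
        System.out.println(parseTitles(topic));
    }
}
```

Only the title field is kept here; a fuller parser would also decide whether the <desc> and <narr> fields contribute to the query.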
If you have topic files in a different format, you can use a different QuerySource by setting the property trec.topics.parser. For instance, trec.topics.parser=SingleLineTRECQuery should be used for topics where one line is one query. See TRECQuery and SingleLineTRECQuery for more information.
Models
By default, Terrier uses the InL2 retrieval model for all runs. You can change this by specifying another model using the property trec.model, in which case all runs will be made using that weighting model. E.g., to use PL2, set trec.model=PL2. Similarly, when query expansion is enabled, the default query expansion model is Bo1, controlled by the property trec.qe.model.
Result Files
The results from the system are output in a trec_eval compatible format. The results file is named WEIGHTINGMODELNAME_cCVALUE.RUNNO.res and is written to the var/results folder. RUNNO is (usually) a constantly increasing number, as specified by a file in the results folder. The location of the results folder can be altered by the trec.results property. If the property trec.querycounter.type is not set to sequential, the RUNNO will be a string including the time and a randomly generated number. This setting is best when many instances of Terrier are writing to the same results folder, as the incrementing RUNNO method is not multi-process safe (e.g. one Terrier could delete that file while another is reading it).
Properties
- trec.topics.parser - the query parser that parses the topic file(s). TRECQuery by default. Subclass the TRECQuery class and alter this property if your topics come in a very different format to those of TREC.
- trec.topics - the name of the topic file. Multiple topic files can be used, if separated by commas.
- trec.topics.matchopql - if the topics should be parsed using the matchopql parser. Defaults to false.
- trec.model - the name of the weighting model to be used during retrieval. Default InL2.
- trec.qe.model - the name of the query expansion model to be used during query expansion. Default Bo1.
- c - the term frequency normalisation parameter value. A value specified at runtime as an API parameter (e.g. TrecTerrier -c) overrides this property.
- trec.matching - the name of the matching model that is used for retrieval. Defaults to org.terrier.matching.daat.Full.
- trec.results - the location of the results folder. Defaults to TERRIER_VAR/results/
- trec.results.file - the exact result filename to be output. Defaults to an automatically generated filename - see trec.querycounter.type.
- trec.querycounter.type - how the number (RUNNO) at the end of a run file should be generated. Defaults to sequential, in which case RUNNO is a constantly increasing number. Otherwise it is a string including the time and a randomly generated number.
- trec.output.format.length - the maximum number of results output per query into the results file. Default value 1000. 0 means no limit.
- trec.iteration - the contents of the Iteration column in the trec_eval compatible results. Defaults to 0.
- trec.querying.dump.settings - controls whether the settings used to generate a results file should be dumped to a .settings file in conjunction with the .res file. Defaults to true.
- trec.querying.outputformat - controls the class used to write the results file. Defaults to TRECDocnoOutputFormat. Alternatives: TRECDocnoOutputFormat, TRECDocidOutputFormat, NullOutputFormat.
- trec.querying.outputformat.docno.meta.key - for TRECDocnoOutputFormat, defines the MetaIndex key to use as the docno. Defaults to "docno".
- trec.querying.resultscache - controls the cache to use for query caching. Defaults to NullQueryResultCache.
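To make the documented defaults concrete, here is a minimal sketch that looks up a few of the properties listed above with java.util.Properties and falls back to the defaults stated in this list. It is only an illustration: Terrier resolves these properties through its own configuration classes, and the class BatchRetrievalDefaults, its method names, and the mapping of 0 to Integer.MAX_VALUE are assumptions.

```java
import java.util.Properties;

// Hypothetical reader for illustration only; Terrier resolves these
// properties through its own configuration classes.
final class BatchRetrievalDefaults {

    private final Properties props;

    BatchRetrievalDefaults(Properties props) {
        this.props = props;
    }

    /** Convenience factory for examples: of("key1", "value1", "key2", "value2", ...). */
    static BatchRetrievalDefaults of(String... keyValues) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            p.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return new BatchRetrievalDefaults(p);
    }

    String weightingModel() {
        return props.getProperty("trec.model", "InL2");
    }

    String queryExpansionModel() {
        return props.getProperty("trec.qe.model", "Bo1");
    }

    String iteration() {
        return props.getProperty("trec.iteration", "0");
    }

    boolean dumpSettings() {
        return Boolean.parseBoolean(props.getProperty("trec.querying.dump.settings", "true"));
    }

    /** True unless trec.querycounter.type is set to something other than "sequential". */
    boolean sequentialRunCounter() {
        return "sequential".equals(props.getProperty("trec.querycounter.type", "sequential"));
    }

    /** Per-query result cap; the documented "0 means no limit" is mapped to Integer.MAX_VALUE. */
    int maxResultsPerQuery() {
        int configured = Integer.parseInt(
                props.getProperty("trec.output.format.length", "1000").trim());
        return configured == 0 ? Integer.MAX_VALUE : configured;
    }

    public static void main(String[] args) {
        BatchRetrievalDefaults cfg = of("trec.model", "PL2", "trec.output.format.length", "0");
        System.out.println(cfg.weightingModel());
        System.out.println(cfg.queryExpansionModel());
        System.out.println(cfg.maxResultsPerQuery() == Integer.MAX_VALUE);
    }
}
```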
- Author:
- Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Nut Limsopatham
-
-
Nested Class Summary
Nested Classes (modifier and type, class):
- static class TRECQuerying.Command
Nested classes/interfaces inherited from class org.terrier.applications.AbstractQuerying
AbstractQuerying.AbstractQueryingCommand
-
-
Field Summary
Fields (modifier and type, field, description):
- static java.lang.String BATCHRETRIEVE_COMMAND
- static java.lang.String BATCHRETRIEVE_PROP_PREFIX
- protected java.lang.String defaultQEModel - The name of the query expansion model used.
- protected static boolean DUMP_SETTINGS - Dump the current settings along with the results.
- protected static java.lang.String ITERATION - A TREC specific output field.
- protected static org.slf4j.Logger logger - The logger used.
- protected java.lang.String method - The method, i.e. the weighting model and parameters.
- protected java.lang.String mModel - The name of the matching model that is used for retrieval.
- protected OutputFormat printer - Where results of the stream of queries are output to.
- protected Manager queryingManager - The manager object that handles the queries.
- protected QuerySource querySource - Where the stream of queries is obtained from.
- protected static java.util.Random random - Random number generator.
- protected static boolean removeQueryPeriods
- protected java.io.PrintWriter resultFile - The file to store the output to.
- protected java.io.OutputStream resultFileRaw
- protected static int RESULTS_LENGTH - The number of results to output.
- protected QueryResultCache resultsCache - Results are obtained from a query cache if one is enabled.
- protected java.lang.String resultsFilename - The filename of the last file results were output to.
Fields inherited from class org.terrier.applications.AbstractQuerying
controls, indexref, matchingCount, matchopQl
-
-
Constructor Summary
Constructors (constructor, description):
- TRECQuerying() - Deprecated.
- TRECQuerying(IndexRef _indexref) - TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
-
Method Summary
Methods (modifier and type, method, description):
- void close() - Closes the used structures.
- protected void createManager() - Create a querying manager.
- protected void finishedQueries() - After finishing with a batch of queries, close the result file.
- IndexRef getIndexRef() - Get the index pointer.
- Manager getManager() - Get the querying manager.
- protected java.lang.String getNextQueryCounter(java.lang.String resultsFolder) - Get the sequential number of the next result file in the results folder.
- protected OutputFormat getOutputFormat()
- static QuerySource getQueryParser(java.lang.String parserName) - Get the query parser that is being used.
- protected java.lang.String getRandomQueryCounter() - Get a random number between 0 and 1000.
- java.io.PrintWriter getResultFile(java.lang.String predefinedName) - Returns a PrintWriter used to store the results.
- protected QueryResultCache getResultsCache() - Obtain the query cache.
- protected java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder) - Get the sequential number of the current result file in the results folder.
- java.lang.String getTopicsParser()
- protected void initSearchRequestModification(java.lang.String queryId, SearchRequest srq)
- void intialise()
- protected void loadIndex() - Loads index(s) from disk.
- protected void preQueryingSearchRequestModification(java.lang.String queryId, SearchRequest srq)
- void printSettings(SearchRequest default_q, java.lang.String[] topicsFiles, java.lang.String otherComments) - Prints the current settings to a file with the same name as the current results file.
- java.lang.String processQueries() - Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism.
- java.lang.String processQueries(QuerySource _qs)
- SearchRequest processQuery(java.lang.String queryId, java.lang.String query) - According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- protected void processQueryAndWrite(java.lang.String queryId, java.lang.String query) - According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- void setTopicsParser(java.lang.String topicsParser)
- protected void startingBatchOfQueries() - Before starting a batch of queries, this method is called by processQueries().
Methods inherited from class org.terrier.applications.AbstractQuerying
controls
-
-
-
-
Field Detail
-
BATCHRETRIEVE_COMMAND
public static final java.lang.String BATCHRETRIEVE_COMMAND
- See Also:
- Constant Field Values
-
BATCHRETRIEVE_PROP_PREFIX
public static final java.lang.String BATCHRETRIEVE_PROP_PREFIX
- See Also:
- Constant Field Values
-
defaultQEModel
protected java.lang.String defaultQEModel
The name of the query expansion model used.
-
logger
protected static final org.slf4j.Logger logger
The logger used
-
removeQueryPeriods
protected static boolean removeQueryPeriods
-
random
protected static final java.util.Random random
random number generator
-
resultFile
protected volatile java.io.PrintWriter resultFile
The file to store the output to.
-
resultFileRaw
protected java.io.OutputStream resultFileRaw
-
resultsFilename
protected java.lang.String resultsFilename
The filename of the last file results were output to.
-
DUMP_SETTINGS
protected static boolean DUMP_SETTINGS
Dump the current settings along with the results. Controlled by property trec.querying.dump.settings, defaults to true.
-
queryingManager
protected Manager queryingManager
The manager object that handles the queries.
-
mModel
protected java.lang.String mModel
The name of the matching model that is used for retrieval. If not set, defaults to the matching configured in the Manager.
- See Also:
- LocalManager
-
RESULTS_LENGTH
protected static int RESULTS_LENGTH
The number of results to output. Set by property trec.output.format.length.
-
ITERATION
protected static java.lang.String ITERATION
A TREC specific output field.
-
method
protected java.lang.String method
The method - ie the weighting model and parameters. Examples: TF_IDF, PL2c1.0
-
querySource
protected QuerySource querySource
Where the stream of queries is obtained from. Configured by property trec.topics.parser
-
printer
protected OutputFormat printer
Where results of the stream of queries are output to. Specified by property trec.querying.outputformat - defaults to TRECDocnoOutputFormat
-
resultsCache
protected QueryResultCache resultsCache
Results are obtained from a query cache if one is enabled. Configured to a class using property trec.querying.resultscache. Defaults to NullQueryResultCache (no caching).
-
-
Constructor Detail
-
TRECQuerying
public TRECQuerying()
Deprecated. TRECQuerying default constructor initialises the inverted index, the lexicon and the document index structures.
-
TRECQuerying
public TRECQuerying(IndexRef _indexref)
TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
- Parameters:
- _indexref - The specified index reference.
-
-
Method Detail
-
intialise
public void intialise()
-
getResultsCache
protected QueryResultCache getResultsCache()
Obtain the query cache. Loads the class specified by property trec.querying.resultscache
-
getOutputFormat
protected OutputFormat getOutputFormat()
-
createManager
protected void createManager()
Description copied from class: AbstractQuerying
Create a querying manager.
- Overrides:
- createManager in class AbstractQuerying
-
loadIndex
protected void loadIndex()
Loads index(s) from disk.
-
getIndexRef
public IndexRef getIndexRef()
Get the index pointer.
- Returns:
- The index pointer.
-
getManager
public Manager getManager()
Get the querying manager.
- Returns:
- The querying manager.
-
close
public void close()
Closes the used structures.
-
getNextQueryCounter
protected java.lang.String getNextQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the next result file in the results folder.
- Parameters:
- resultsFolder - The path of the results folder.
- Returns:
- The sequential number of the next result file in the results folder.
-
getRandomQueryCounter
protected java.lang.String getRandomQueryCounter()
Get a random number between 0 and 1000.
- Returns:
- A random number between 0 and 1000.
-
getSequentialQueryCounter
protected java.lang.String getSequentialQueryCounter(java.lang.String resultsFolder)
Get the sequential number of the current result file in the results folder.
- Parameters:
- resultsFolder - The path of the results folder.
- Returns:
- The sequential number of the current result file in the results folder.
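The two RUNNO strategies behind these counter methods can be sketched as follows. This is not Terrier's implementation: the class RunCounterSketch, the counter file name "querycounter", the choice to start counting at 0, and the "-" separator between time and random number are all assumptions. The sketch mainly shows why a file-based counter is not safe across processes: reading and writing the file are separate steps.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical illustration of the two RUNNO strategies; not Terrier's code.
final class RunCounterSketch {

    // Assumption: the counter is kept in a small text file with this name.
    static final String COUNTER_FILE = "querycounter";

    /** Reads the counter file (absent means no runs yet), increments it, and writes it back. */
    static String nextSequential(Path resultsFolder) {
        Path counter = resultsFolder.resolve(COUNTER_FILE);
        try {
            int last = -1;
            if (Files.exists(counter)) {
                String text = new String(Files.readAllBytes(counter), StandardCharsets.UTF_8).trim();
                last = Integer.parseInt(text);
            }
            int next = last + 1;
            // Read and write are separate steps, so two processes running
            // concurrently can both observe the same value and reuse a number.
            Files.write(counter, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
            return Integer.toString(next);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Combines a timestamp with a random number in [0, 1000); needs no shared file. */
    static String timeAndRandom(long nowMillis, Random random) {
        return nowMillis + "-" + random.nextInt(1000);
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("results");
        System.out.println(nextSequential(folder));
        System.out.println(nextSequential(folder));
        System.out.println(timeAndRandom(System.currentTimeMillis(), new Random()));
    }
}
```

The time-plus-random form avoids the shared file entirely, which is why it suits many Terrier instances writing to the same results folder, at the cost of run identifiers that are no longer ordered.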
-
getResultFile
public java.io.PrintWriter getResultFile(java.lang.String predefinedName)
Returns a PrintWriter used to store the results.
- Parameters:
- predefinedName - java.lang.String a non-standard prefix for the result file.
- Returns:
- a handle used as a destination for storing results.
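For orientation, the automatically generated filename pattern described in the class overview (WEIGHTINGMODELNAME_cCVALUE.RUNNO.res inside the results folder) can be assembled as in the sketch below. The class ResultFileNameSketch and its methods are hypothetical, and the sketch does not reproduce how getResultFile treats predefinedName.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper showing the documented naming pattern; not Terrier's code.
final class ResultFileNameSketch {

    /** Builds WEIGHTINGMODELNAME_cCVALUE.RUNNO.res from its three parts. */
    static String fileName(String weightingModel, String cValue, String runNo) {
        return weightingModel + "_c" + cValue + "." + runNo + ".res";
    }

    /** Places the generated filename inside the given results folder. */
    static Path resolve(String resultsFolder, String weightingModel, String cValue, String runNo) {
        return Paths.get(resultsFolder).resolve(fileName(weightingModel, cValue, runNo));
    }

    public static void main(String[] args) {
        System.out.println(fileName("InL2", "1.0", "3"));
        System.out.println(resolve("var/results", "PL2", "1.0", "0"));
    }
}
```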
-
processQueryAndWrite
protected void processQueryAndWrite(java.lang.String queryId, java.lang.String query)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- Parameters:
- queryId - the identifier of the query to process.
- query - the query to process.
-
processQuery
public SearchRequest processQuery(java.lang.String queryId, java.lang.String query)
According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
- Overrides:
- processQuery in class AbstractQuerying
- Parameters:
- queryId - the identifier of the query to process.
- query - the query to process.
-
preQueryingSearchRequestModification
protected void preQueryingSearchRequestModification(java.lang.String queryId, SearchRequest srq)
-
initSearchRequestModification
protected void initSearchRequestModification(java.lang.String queryId, SearchRequest srq)
-
getQueryParser
public static QuerySource getQueryParser(java.lang.String parserName)
Get the query parser that is being used.
- Returns:
- The query parser that is being used.
-
processQueries
public java.lang.String processQueries()
Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the file with the queries, creates the file of results, and for each query, gets the relevant documents, scores them, and outputs the results to the result file.
Queries
Queries are parsed from file, specified by the trec.topics property (comma delimited)
- Returns:
- String the filename that the results have been written to
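The output step of this batch loop can be pictured with the sketch below, which formats one query's ranked documents as trec_eval-style lines (query id, iteration, document number, rank, score, run tag) and applies the per-query cap described for trec.output.format.length. It is a simplified stand-in for what an OutputFormat does, not Terrier's TRECDocnoOutputFormat: the class TrecRunWriterSketch, the rank numbering from 0, and the plain Double formatting of scores are assumptions, and the inputs are expected to be sorted by descending score already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical formatter for illustration; not Terrier's OutputFormat.
final class TrecRunWriterSketch {

    /**
     * Formats up to maxResults lines for one query; maxResults == 0 means no limit.
     * docnos and scores must have the same length and be sorted by descending score.
     */
    static List<String> formatRun(String queryId, String iteration, String[] docnos,
                                  double[] scores, String runId, int maxResults) {
        int limit = (maxResults == 0) ? docnos.length : Math.min(maxResults, docnos.length);
        List<String> lines = new ArrayList<>(limit);
        for (int rank = 0; rank < limit; rank++) {
            // Assumption: ranks are numbered from 0.
            lines.add(queryId + " " + iteration + " " + docnos[rank] + " "
                    + rank + " " + scores[rank] + " " + runId);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] docnos = {"D1", "D2", "D3"};
        double[] scores = {3.5, 2.25, 1.0};
        for (String line : formatRun("1", "0", docnos, scores, "InL2_c1.0", 2)) {
            System.out.println(line);
        }
    }
}
```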
-
processQueries
public java.lang.String processQueries(QuerySource _qs)
-
startingBatchOfQueries
protected void startingBatchOfQueries()
Before starting a batch of queries, this method is called by processQueries().
- Since:
- 2.2
-
finishedQueries
protected void finishedQueries()
After finishing with a batch of queries, close the result file
-
printSettings
public void printSettings(SearchRequest default_q, java.lang.String[] topicsFiles, java.lang.String otherComments)
Prints the current settings to a file with the same name as the current results file. This assists in tracing the settings used to generate a given run.
-
getTopicsParser
public java.lang.String getTopicsParser()
-
setTopicsParser
public void setTopicsParser(java.lang.String topicsParser)
-
-