Class TRECQuerying

  • Direct Known Subclasses:
    ParallelTRECQuerying

    public class TRECQuerying
    extends AbstractQuerying
    This class performs a batch mode retrieval from a set of TREC queries.

    Configuring

    In the following, we list the main ways for configuring TRECQuerying, before exhaustively listing the properties that can affect TRECQuerying.

    Topics

    Files containing topics (queries to be evaluated) should be set using the trec.topics property. Multiple topic files can be used together by separating their filenames using commas. By default TRECQuerying assumes TREC tagged topic files, e.g.:
     <top>
     <num> Number 1 </num>
     <title> Query terms </title>
     <desc> Description : A sentence about the information need </desc>
     <narr> Narrative: More sentences about what is relevant or not</narr>
     </top>
     
    If your topic files are in a different format, you can use a different QuerySource by setting the property trec.topics.parser. For instance, trec.topics.parser=SingleLineTRECQuery should be used for topics where each line is one query. See TRECQuery and SingleLineTRECQuery for more information.
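    The following sketch extracts the topic number and title from tagged topics like the one above, to show what the default format contains. It is a hypothetical, simplified helper: TopicSketch is not part of Terrier, it reads only the <num> and <title> fields, and it does not reproduce everything TRECQuery does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of reading TREC tagged topics; not Terrier's TRECQuery. */
final class TopicSketch {
    // Each topic is delimited by <top> ... </top>; DOTALL lets '.' span newlines.
    private static final Pattern TOP = Pattern.compile("<top>(.*?)</top>", Pattern.DOTALL);
    // The number may follow a label such as "Number"; take the first run of digits.
    private static final Pattern NUM = Pattern.compile("<num>\\D*(\\d+)");
    // Title text runs until </title>, or until the next tag if the closing tag is absent.
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)(?:</title>|<)", Pattern.DOTALL);

    /** Returns topic number -> title query terms, in file order. */
    static Map<String, String> parse(String text) {
        Map<String, String> queries = new LinkedHashMap<>();
        Matcher top = TOP.matcher(text);
        while (top.find()) {
            String body = top.group(1);
            Matcher num = NUM.matcher(body);
            Matcher title = TITLE.matcher(body);
            if (num.find() && title.find()) {
                // Collapse internal whitespace so multi-line titles become one query string.
                queries.put(num.group(1), title.group(1).trim().replaceAll("\\s+", " "));
            }
        }
        return queries;
    }
}
```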

    Models

    By default, Terrier uses the InL2 weighting model for all runs. To use another model, set the trec.model property; all runs will then be made using that weighting model. E.g., to use PL2, set trec.model=PL2. Similarly, when query expansion is enabled, the default query expansion model is Bo1, controlled by the property trec.qe.model.
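    The default-versus-override rule for these two properties can be sketched with java.util.Properties standing in for Terrier's own configuration (for example the terrier.properties file). ModelSettingsSketch is hypothetical and only illustrates the fallback behaviour described above, not how Terrier reads its settings.

```java
import java.util.Properties;

/** Hypothetical illustration of the model property defaults; not Terrier code. */
final class ModelSettingsSketch {
    /** trec.model selects the weighting model for every run; InL2 when unset. */
    static String weightingModel(Properties p) {
        return p.getProperty("trec.model", "InL2");
    }

    /** trec.qe.model is only relevant when query expansion is enabled; Bo1 when unset. */
    static String qeModel(Properties p) {
        return p.getProperty("trec.qe.model", "Bo1");
    }
}
```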

    Result Files

    The results from the system are output in a trec_eval compatible format. The results file is named WEIGHTINGMODELNAME_cCVALUE.RUNNO.res and is written to the var/results folder. RUNNO is (usually) a constantly increasing number, as specified by a file in the results folder. The location of the results folder can be altered by the trec.results property. If the property trec.querycounter.type is not set to sequential, RUNNO will instead be a string including the time and a randomly generated number. This is best when many instances of Terrier write to the same results folder, as the incrementing RUNNO method is not multi-process safe (e.g. one Terrier instance could delete the counter file while another is reading it).
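    The two RUNNO strategies can be sketched as follows. This is an illustration under stated assumptions rather than Terrier's code: the counter file name querycounter and the exact layout of the time-and-random string are hypothetical. The sequential variant shows why the incrementing method is not multi-process safe: reading and rewriting the counter file are two separate steps, so two processes can read the same value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

/** Hypothetical sketch of the two RUNNO strategies; not Terrier's implementation. */
final class RunNumberSketch {
    private static final Random RANDOM = new Random();

    /** Sequential: read the last number from a counter file, increment it, write it back. */
    static String sequential(Path resultsFolder) throws IOException {
        Path counter = resultsFolder.resolve("querycounter"); // assumed file name
        int next = 0;
        if (Files.exists(counter)) {
            String last = new String(Files.readAllBytes(counter), StandardCharsets.UTF_8).trim();
            next = Integer.parseInt(last) + 1;
        }
        // Not atomic: another process may read the same old value before this write lands.
        Files.write(counter, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
        return Integer.toString(next);
    }

    /** Non-sequential: current time plus a random number, so concurrent runs rarely collide. */
    static String timeAndRandom() {
        return System.currentTimeMillis() + "_" + RANDOM.nextInt(1000);
    }
}
```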

    Properties

    • trec.topics.parser - the query parser that parses the topic file(s). TRECQuery by default. Subclass the TRECQuery class and alter this property if your topics come in a very different format from those of TREC.
    • trec.topics - the name of the topic file. Multiple topic files can be used if separated by commas.
    • trec.topics.matchopql - whether the topics should be parsed using the matchopql parser. Defaults to false.
    • trec.model - the name of the weighting model to be used during retrieval. Defaults to InL2.
    • trec.qe.model - the name of the query expansion model to be used during query expansion. Defaults to Bo1.
    • c - the term frequency normalisation parameter value. A value specified at runtime as an API parameter (e.g. TrecTerrier -c) overrides this property.
    • trec.matching - the name of the matching model that is used for retrieval. Defaults to org.terrier.matching.daat.Full.
    • trec.results - the location of the results folder. Defaults to TERRIER_VAR/results/
    • trec.results.file - the exact result filename to be output. Defaults to an automatically generated filename; see trec.querycounter.type.
    • trec.querycounter.type - how the number (RUNNO) at the end of a run file should be generated. Defaults to sequential, in which case RUNNO is a constantly increasing number. Otherwise it is a string including the time and a randomly generated number.
    • trec.output.format.length - the maximum number of results ever output per query into the results file. Defaults to 1000; 0 means no limit.
    • trec.iteration - the contents of the Iteration column in the trec_eval compatible results. Defaults to 0.
    • trec.querying.dump.settings - controls whether the settings used to generate a results file should be dumped to a .settings file in conjunction with the .res file. Defaults to true.
    • trec.querying.outputformat - the class used to write the results file. Defaults to TRECDocnoOutputFormat. Available classes: TRECDocnoOutputFormat, TRECDocidOutputFormat, NullOutputFormat.
    • trec.querying.outputformat.docno.meta.key - for TRECDocnoOutputFormat, defines the MetaIndex key to use as the docno. Defaults to "docno".
    • trec.querying.resultscache - the cache used for query result caching. Defaults to NullQueryResultCache (no caching).
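    The trec.output.format.length rule (default 1000, with 0 meaning no limit) reduces to a small calculation. The following is a hypothetical sketch, again using java.util.Properties as a stand-in for Terrier's configuration; OutputLengthSketch is not a Terrier class.

```java
import java.util.Properties;

/** Hypothetical illustration of the per-query output cap; not Terrier code. */
final class OutputLengthSketch {
    /** How many of the retrieved results to write for one query. */
    static int resultsToWrite(Properties p, int retrieved) {
        int limit = Integer.parseInt(p.getProperty("trec.output.format.length", "1000").trim());
        // 0 disables the cap; otherwise write at most `limit` results.
        return limit == 0 ? retrieved : Math.min(limit, retrieved);
    }
}
```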
    Author:
    Gianni Amati, Vassilis Plachouras, Ben He, Craig Macdonald, Nut Limsopatham
    • Field Detail

      • BATCHRETRIEVE_COMMAND

        public static final java.lang.String BATCHRETRIEVE_COMMAND
        See Also:
        Constant Field Values
      • BATCHRETRIEVE_PROP_PREFIX

        public static final java.lang.String BATCHRETRIEVE_PROP_PREFIX
        See Also:
        Constant Field Values
      • defaultQEModel

        protected java.lang.String defaultQEModel
        The name of the query expansion model used.
      • logger

        protected static final org.slf4j.Logger logger
        The logger used
      • removeQueryPeriods

        protected static boolean removeQueryPeriods
      • random

        protected static final java.util.Random random
        random number generator
      • resultFile

        protected volatile java.io.PrintWriter resultFile
        The file to store the output to.
      • resultFileRaw

        protected java.io.OutputStream resultFileRaw
      • resultsFilename

        protected java.lang.String resultsFilename
        The filename of the last file results were output to.
      • DUMP_SETTINGS

        protected static boolean DUMP_SETTINGS
        Dump the current settings along with the results. Controlled by property trec.querying.dump.settings, defaults to true.
      • queryingManager

        protected Manager queryingManager
        The manager object that handles the queries.
      • mModel

        protected java.lang.String mModel
        The name of the matching model that is used for retrieval. If not set, defaults to matching configured in the Manager.
        See Also:
        LocalManager
      • RESULTS_LENGTH

        protected static int RESULTS_LENGTH
        The number of results to output. Set by property trec.output.format.length.
      • ITERATION

        protected static java.lang.String ITERATION
        A TREC-specific output field: the contents of the Iteration column of the results file. Set by property trec.iteration.
      • method

        protected java.lang.String method
        The method, i.e. the weighting model and its parameters. Examples: TF_IDF, PL2c1.0
      • querySource

        protected QuerySource querySource
        Where the stream of queries is obtained from. Configured by property trec.topics.parser
      • printer

        protected OutputFormat printer
        Where results of the stream of queries are output to. Specified by property trec.querying.outputformat - defaults to TRECDocnoOutputFormat
      • resultsCache

        protected QueryResultCache resultsCache
        Results are obtained from a query cache if one is enabled. Configured to a class using property trec.querying.resultscache. Defaults to NullQueryResultCache (no caching).
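        The results cache follows a familiar pattern: either answer from memory or fall through to retrieval, with a no-op default. The sketch below is hypothetical; ResultCacheSketch, NoCacheSketch and MapCacheSketch are illustrative names and do not reproduce Terrier's QueryResultCache interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache contract; not Terrier's QueryResultCache. */
interface ResultCacheSketch<R> {
    /** Returns cached results for the query, or computes them via retrieval. */
    R getOrCompute(String query, Function<String, R> retrieve);
}

/** The "no caching" default: every lookup goes straight to retrieval. */
final class NoCacheSketch<R> implements ResultCacheSketch<R> {
    public R getOrCompute(String query, Function<String, R> retrieve) {
        return retrieve.apply(query);
    }
}

/** A simple unbounded in-memory cache keyed by the query text. */
final class MapCacheSketch<R> implements ResultCacheSketch<R> {
    private final Map<String, R> cache = new HashMap<>();

    public R getOrCompute(String query, Function<String, R> retrieve) {
        // Retrieval runs only the first time a given query string is seen.
        return cache.computeIfAbsent(query, retrieve);
    }
}
```

        NoCacheSketch plays the role described for NullQueryResultCache. An unbounded HashMap keeps the sketch short; a practical cache would usually bound its size.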
    • Constructor Detail

      • TRECQuerying

        public TRECQuerying()
        Deprecated.
        TRECQuerying default constructor initialises the inverted index, the lexicon and the document index structures.
      • TRECQuerying

        public TRECQuerying​(IndexRef _indexref)
        TRECQuerying constructor initialises the specified inverted index, the lexicon and the document index structures.
        Parameters:
        _indexref - The specified index reference.
    • Method Detail

      • intialise

        public void intialise()
      • getResultsCache

        protected QueryResultCache getResultsCache()
        Obtain the query cache. Loads the class specified by property trec.querying.resultscache
      • getOutputFormat

        protected OutputFormat getOutputFormat()
      • loadIndex

        protected void loadIndex()
        Loads the index(es) from disk.
      • getIndexRef

        public IndexRef getIndexRef()
        Get the index pointer.
        Returns:
        The index pointer.
      • getManager

        public Manager getManager()
        Get the querying manager.
        Returns:
        The querying manager.
      • close

        public void close()
        Closes the used structures.
      • getNextQueryCounter

        protected java.lang.String getNextQueryCounter​(java.lang.String resultsFolder)
        Get the sequential number of the next result file in the results folder.
        Parameters:
        resultsFolder - The path of the results folder.
        Returns:
        The sequential number of the next result file in the results folder.
      • getRandomQueryCounter

        protected java.lang.String getRandomQueryCounter()
        Get a random number between 0 and 1000.
        Returns:
        A random number between 0 and 1000.
      • getSequentialQueryCounter

        protected java.lang.String getSequentialQueryCounter​(java.lang.String resultsFolder)
        Get the sequential number of the current result file in the results folder.
        Parameters:
        resultsFolder - The path of the results folder.
        Returns:
        The sequential number of the current result file in the results folder.
      • getResultFile

        public java.io.PrintWriter getResultFile​(java.lang.String predefinedName)
        Returns a PrintWriter used to store the results.
        Parameters:
        predefinedName - a non-standard prefix for the result file.
        Returns:
        a handle used as a destination for storing results.
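        A trec_eval run file has six whitespace-separated columns per line: query id, iteration, document identifier, rank, score and run tag. The following is a minimal, hypothetical sketch of writing one query's results to such a PrintWriter. TrecRunWriterSketch is not Terrier's OutputFormat, and numbering ranks from 0 is an assumption of the sketch.

```java
import java.io.PrintWriter;

/** Hypothetical writer for trec_eval run lines; not Terrier's OutputFormat. */
final class TrecRunWriterSketch {
    /** Writes one query's ranked results, one six-column line per document. */
    static void write(PrintWriter out, String queryId, String iteration,
                      String[] docnos, double[] scores, String runTag) {
        if (docnos.length != scores.length) {
            throw new IllegalArgumentException("docnos and scores differ in length");
        }
        for (int rank = 0; rank < docnos.length; rank++) {
            out.println(queryId + " " + iteration + " " + docnos[rank] + " "
                    + rank + " " + scores[rank] + " " + runTag);
        }
        out.flush();
    }
}
```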
      • processQueryAndWrite

        protected void processQueryAndWrite​(java.lang.String queryId,
                                            java.lang.String query)
        According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
        Parameters:
        queryId - the identifier of the query to process.
        query - the query to process.
      • processQuery

        public SearchRequest processQuery​(java.lang.String queryId,
                                          java.lang.String query)
        According to the given parameters, it sets up the correct matching class and performs retrieval for the given query.
        Overrides:
        processQuery in class AbstractQuerying
        Parameters:
        queryId - the identifier of the query to process.
        query - the query to process.
      • preQueryingSearchRequestModification

        protected void preQueryingSearchRequestModification​(java.lang.String queryId,
                                                            SearchRequest srq)
      • initSearchRequestModification

        protected void initSearchRequestModification​(java.lang.String queryId,
                                                     SearchRequest srq)
      • getQueryParser

        public static QuerySource getQueryParser​(java.lang.String parserName)
        Get the query parser that is being used.
        Returns:
        The query parser that is being used.
      • processQueries

        public java.lang.String processQueries()
        Performs the matching using the specified weighting model from the setup and possibly a combination of evidence mechanism. It parses the file containing the queries, creates the results file, and, for each query, retrieves the relevant documents, scores them, and writes the results to the results file.

        Queries
        Queries are parsed from the file(s) specified by the trec.topics property (comma delimited).

        Returns:
        String the filename that the results have been written to
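        Splitting the comma-delimited trec.topics value into individual files could look like the following hypothetical sketch. TopicFilesSketch is not a Terrier class, and trimming whitespace and skipping empty entries are choices of the sketch, not documented Terrier behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for the comma-delimited trec.topics property; not Terrier code. */
final class TopicFilesSketch {
    /** Returns the individual topic file names listed in trec.topics. */
    static List<String> topicFiles(Properties p) {
        List<String> files = new ArrayList<>();
        for (String part : p.getProperty("trec.topics", "").split(",")) {
            String f = part.trim();
            if (!f.isEmpty()) {
                files.add(f); // tolerates stray spaces and trailing commas
            }
        }
        return files;
    }
}
```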
      • processQueries

        public java.lang.String processQueries​(QuerySource _qs)
      • startingBatchOfQueries

        protected void startingBatchOfQueries()
        Before starting a batch of queries, this method is called by processQueries()
        Since:
        2.2
      • finishedQueries

        protected void finishedQueries()
        After finishing a batch of queries, closes the result file.
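        Taken together, processQueries(), startingBatchOfQueries() and finishedQueries() describe a template-method lifecycle. The sketch below is a hypothetical, simplified illustration that borrows the documented hook names; it is not Terrier's implementation, and the try/finally (so the results file is released even if a query fails) is a design choice of the sketch.

```java
import java.util.Map;

/** Hypothetical template-method sketch of the batch lifecycle; not Terrier code. */
abstract class BatchQueryingSketch {
    /** Called once before the first query (e.g. to open the results file). */
    protected void startingBatchOfQueries() {}

    /** Retrieves and writes results for a single query. */
    protected abstract void processQueryAndWrite(String queryId, String query);

    /** Called once after the last query (e.g. to close the results file). */
    protected void finishedQueries() {}

    /** Drives the batch: one start hook, one call per query in order, one finish hook. */
    public final void processQueries(Map<String, String> queries) {
        startingBatchOfQueries();
        try {
            for (Map.Entry<String, String> q : queries.entrySet()) {
                processQueryAndWrite(q.getKey(), q.getValue());
            }
        } finally {
            finishedQueries(); // runs even if a query throws
        }
    }
}
```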
      • printSettings

        public void printSettings​(SearchRequest default_q,
                                  java.lang.String[] topicsFiles,
                                  java.lang.String otherComments)
        Prints the current settings to a file with the same name as the current results file. This assists in tracing the settings used to generate a given run.
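        A minimal sketch of dumping settings next to a results file, assuming the settings are available as java.util.Properties and that the .settings file replaces the .res extension (as suggested by the trec.querying.dump.settings description). SettingsDumpSketch is hypothetical and does not reproduce the exact content Terrier writes; sorting the keys is a choice of the sketch so that runs can be compared with a plain diff.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical settings dump alongside a results file; not Terrier code. */
final class SettingsDumpSketch {
    /** Maps e.g. "run.res" to "run.settings" in the same folder. */
    static Path settingsPathFor(Path resultsFile) {
        String name = resultsFile.getFileName().toString();
        String base = name.endsWith(".res") ? name.substring(0, name.length() - 4) : name;
        return resultsFile.resolveSibling(base + ".settings");
    }

    /** Writes the settings as sorted key=value lines and returns the file written. */
    static Path dump(Path resultsFile, Properties settings) throws IOException {
        Path out = settingsPathFor(resultsFile);
        Map<String, String> sorted = new TreeMap<>();
        for (String key : settings.stringPropertyNames()) {
            sorted.put(key, settings.getProperty(key));
        }
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> e : sorted.entrySet()) {
                w.write(e.getKey() + "=" + e.getValue() + System.lineSeparator());
            }
        }
        return out;
    }
}
```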
      • getTopicsParser

        public java.lang.String getTopicsParser()
      • setTopicsParser

        public void setTopicsParser​(java.lang.String topicsParser)