Class SingleLineTRECQuery

  • All Implemented Interfaces:
    java.util.Iterator<java.lang.String>, QuerySource

    public class SingleLineTRECQuery
    extends TRECQuery
    This class extracts batch queries from a simpler format than the regular SGML TREC format. It reads queries, one per line, verbatim from the specified file(s); empty lines and lines starting with # are ignored. By default, queries are not tokenised by this class and are passed verbatim to the query parser. Tokenisation can be turned on with the property SingleLineTRECQuery.tokenise, in which case the tokeniser is the one specified by the tokeniser property. The class also assumes that the first token on each line is the query Id; this is controlled by the property SingleLineTRECQuery.queryid.exists (default true). Trailing colons in the query Id are removed, as in the TREC single-line format from the Million Query track. To use this class, specify trec.topics.parser=SingleLineTRECQuery and run TRECQuerying or TrecTerrier as normal.

    Properties:

    • SingleLineTRECQuery.queryid.exists - does the line start with a query Id? (defaults to true)
    • SingleLineTRECQuery.tokenise - should the query be passed through a tokeniser? (defaults to false). If set to true, the query is passed through the tokeniser configured by the tokeniser property.
    • trec.encoding - expected encoding of the topics file
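
    The line format described above can be illustrated with a minimal, self-contained sketch. It is not Terrier's implementation: the class SingleLineQuerySketch, its ParsedQuery holder and the parseLine method are hypothetical names introduced here, and the whitespace handling (trimming the line before checking for #, splitting the Id on the first run of whitespace) is an assumption. How the real class assigns an Id when SingleLineTRECQuery.queryid.exists is false is not covered by this page, so the sketch simply leaves the Id null in that case.

```java
import java.util.Optional;

/** Hypothetical stand-in that mimics the documented line format; not Terrier code. */
final class SingleLineQuerySketch {

    /** One parsed line. The id is null when no query Id is expected on the line. */
    static final class ParsedQuery {
        final String id;
        final String text;

        ParsedQuery(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /**
     * Parses one line of a single-line topics file.
     * Returns Optional.empty() for empty lines and lines starting with '#'.
     */
    static Optional<ParsedQuery> parseLine(String line, boolean queryIdExists) {
        // Assumption: surrounding whitespace is not significant.
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty();
        }
        if (!queryIdExists) {
            // The whole line is the query; Id assignment is left to the caller.
            return Optional.of(new ParsedQuery(null, trimmed));
        }
        // The first token is the query Id, the remainder is the query text.
        String[] parts = trimmed.split("\\s+", 2);
        // Remove trailing colons from the Id, e.g. "301:" becomes "301".
        String id = parts[0].replaceAll(":+$", "");
        String text = parts.length > 1 ? parts[1] : "";
        return Optional.of(new ParsedQuery(id, text));
    }
}
```

    Under these assumptions, a line such as "301: international organized crime" yields the Id "301" and the query text "international organized crime", while a line starting with # yields no query.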
    • Field Detail

      • tokenise

        protected boolean tokenise
    • Constructor Detail

      • SingleLineTRECQuery

        public SingleLineTRECQuery()
        Default constructor.
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String[] queryfilenames)
        Reads queries from the specified filenames
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String queryfilename)
        Reads queries from the specified filename
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String queryfilename,
                                   boolean tokenise)
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String queryfilename,
                                   Tokeniser tokeniser)
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String[] queryfilenames,
                                   boolean tokenise)
      • SingleLineTRECQuery

        public SingleLineTRECQuery(java.lang.String[] queryfilenames,
                                   Tokeniser tokeniser)
    • Method Detail

      • extractQuery

        public boolean extractQuery(java.lang.String queryfilename,
                                    TagSet ignore,
                                    java.util.Vector<java.lang.String> vecStringQueries,
                                    java.util.Vector<java.lang.String> vecStringIds)
        Extracts queries from the specified filename, adding their contents to vecStringQueries and the corresponding query ids to vecStringIds.
        Overrides:
        extractQuery in class TRECQuery
        Parameters:
        queryfilename - String the name of a file containing topics.
        vecStringQueries - Vector a vector containing the queries as strings.
        vecStringIds - Vector a vector containing the query identifiers as strings.
        Returns:
        true if some queries were successfully read
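
        The contract of extractQuery (two parallel vectors filled in step, and a boolean reporting whether anything was read) can be sketched as follows. This is an illustration only, not the Terrier source: ExtractQuerySketch and its extract method are hypothetical, a BufferedReader stands in for the topics file so that the example is self-contained, the TagSet parameter is omitted, and returning whatever was read so far when an IOException occurs is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Vector;

/** Hypothetical stand-in mirroring the extractQuery contract; not Terrier code. */
final class ExtractQuerySketch {

    /**
     * Reads single-line queries from the reader, appending each query text to
     * vecStringQueries and the matching Id to vecStringIds at the same index.
     * Returns true if at least one query was read.
     */
    static boolean extract(BufferedReader in,
                           Vector<String> vecStringQueries,
                           Vector<String> vecStringIds) {
        boolean gotSome = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                // Empty lines and comment lines carry no query.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                // First token is the Id (trailing colons removed), rest is the query.
                String[] parts = line.split("\\s+", 2);
                vecStringIds.add(parts[0].replaceAll(":+$", ""));
                vecStringQueries.add(parts.length > 1 ? parts[1] : "");
                gotSome = true;
            }
        } catch (IOException e) {
            // Assumption: a read failure ends extraction and keeps what was read.
            return gotSome;
        }
        return gotSome;
    }
}
```

        Because both vectors are appended to in the same iteration, the Id at index i always belongs to the query at index i, which is what lets a caller walk the two vectors together.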