org.terrier.structures
Class SingleLineTRECQuery

java.lang.Object
  extended by org.terrier.structures.TRECQuery
      extended by org.terrier.structures.SingleLineTRECQuery
All Implemented Interfaces:
Iterator<String>, TRECQuerying.QuerySource

public class SingleLineTRECQuery
extends TRECQuery

This class extracts batch queries from a simpler format than the regular SGML TREC format. It reads queries verbatim, one per line, from the specified file(s). Empty lines and lines starting with # are ignored. By default, this class does not tokenise queries; they are passed verbatim to the query parser. Tokenisation can be turned on with the property SingleLineTRECQuery.tokenise, with the tokeniser specified by the property tokeniser. This class also assumes that the first token on each line is the query Id; this is controlled by the property SingleLineTRECQuery.queryid.exists (default true). Trailing colons in the query Id are removed, which matches the TREC single-line format used by the Million Query track. To use this class, specify trec.topics.parser=SingleLineTRECQuery and run TRECQuerying or TrecTerrier as normal.
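The line format described above can be illustrated with a small, self-contained sketch. This is not the Terrier implementation: the class SingleLineFormatSketch, the method parseLine and the fallbackId parameter are hypothetical names introduced here, and trimming each line before inspecting it is an assumption. The sketch covers only the untokenised default.

```java
import java.util.Optional;

/** Illustrative sketch of the single-line topic format; NOT Terrier's own code. */
final class SingleLineFormatSketch {

    /** One parsed topic line: the query id and the query text. */
    static final class ParsedQuery {
        final String id;
        final String text;
        ParsedQuery(String id, String text) { this.id = id; this.text = text; }
    }

    /**
     * Parses one line of a topics file.
     * queryIdExists mirrors the role of SingleLineTRECQuery.queryid.exists.
     * fallbackId is a hypothetical id used when the line carries no id
     * (how Terrier assigns ids in that case is not stated in this page).
     */
    static Optional<ParsedQuery> parseLine(String line, boolean queryIdExists, String fallbackId) {
        String trimmed = line.trim(); // assumption: surrounding whitespace is not significant
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty(); // empty lines and # comment lines are ignored
        }
        if (!queryIdExists) {
            return Optional.of(new ParsedQuery(fallbackId, trimmed));
        }
        // The first whitespace-delimited token is the query id; the rest is the query.
        String[] parts = trimmed.split("\\s+", 2);
        String id = parts[0].replaceAll(":+$", ""); // drop trailing colons, e.g. "101:" becomes "101"
        String text = parts.length > 1 ? parts[1] : "";
        return Optional.of(new ParsedQuery(id, text));
    }

    public static void main(String[] args) {
        parseLine("101: information retrieval", true, "n/a")
            .ifPresent(q -> System.out.println(q.id + " -> " + q.text));
    }
}
```

Under these assumptions, a line such as "101: information retrieval" yields the query Id 101 and the query text "information retrieval", while a line beginning with # yields nothing.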

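Properties:
SingleLineTRECQuery.tokenise - whether queries are tokenised by this class. Off by default.
tokeniser - the tokeniser to use when tokenisation is turned on.
SingleLineTRECQuery.queryid.exists - whether the first token on each line is the query Id. Defaults to true.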

Field Summary
 
Fields inherited from class org.terrier.structures.TRECQuery
desiredEncoding, IGNORE_DESC_NARR_NAME_TOKENS, index, logger, queries, query_ids, topicFiles
 
Constructor Summary
SingleLineTRECQuery()
          Default constructor
SingleLineTRECQuery(File queryfile)
          Reads queries from the specified file
SingleLineTRECQuery(String queryfilename)
          Reads queries from the specified filename
SingleLineTRECQuery(String[] queryfilenames)
          Reads queries from the specified filenames
 
Method Summary
 boolean extractQuery(String queryfilename, Vector<String> vecStringQueries, Vector<String> vecStringIds)
          Extracts queries from the specified filename, adding their contents to vecStringQueries and the corresponding query ids to vecStringIds.
 
Methods inherited from class org.terrier.structures.TRECQuery
extractQuery, getIndexOfCurrentQuery, getInfo, getNumberOfQueries, getQuery, getQueryId, getQueryIds, getTopicFilenames, hasMoreQueries, hasNext, main, next, nextQuery, remove, reset, toArray
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SingleLineTRECQuery

public SingleLineTRECQuery()
Default constructor


SingleLineTRECQuery

public SingleLineTRECQuery(File queryfile)
Reads queries from the specified file


SingleLineTRECQuery

public SingleLineTRECQuery(String queryfilename)
Reads queries from the specified filename


SingleLineTRECQuery

public SingleLineTRECQuery(String[] queryfilenames)
Reads queries from the specified filenames

Method Detail

extractQuery

public boolean extractQuery(String queryfilename,
                            Vector<String> vecStringQueries,
                            Vector<String> vecStringIds)
Extracts queries from the specified filename, adding their contents to vecStringQueries and the corresponding query ids to vecStringIds.

Overrides:
extractQuery in class TRECQuery
Parameters:
queryfilename - the name of a file containing topics.
vecStringQueries - the vector to which the queries are added, as strings.
vecStringIds - the vector to which the corresponding query identifiers are added, as strings.
Returns:
true if some queries were successfully read
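
The contract described above (two parallel vectors filled in step, and a boolean result) can be sketched independently of Terrier as follows. The class ExtractQueryContractSketch and the helper extractFrom are hypothetical names, the line handling is a simplified reading of the class description, and treating an unreadable file as "no queries read" is an assumption rather than documented behaviour.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Vector;

/** Illustrative sketch of the extractQuery contract; NOT Terrier's own code. */
final class ExtractQueryContractSketch {

    /** Opens the named file and delegates to extractFrom (hypothetical helper). */
    static boolean extractQuery(String queryfilename,
                                Vector<String> vecStringQueries,
                                Vector<String> vecStringIds) {
        try (Reader in = Files.newBufferedReader(Paths.get(queryfilename), StandardCharsets.UTF_8)) {
            return extractFrom(in, vecStringQueries, vecStringIds);
        } catch (IOException e) {
            return false; // assumption: a file that cannot be opened yields no queries
        }
    }

    /** Appends each query and its id at the same index of the two vectors. */
    static boolean extractFrom(Reader in,
                               Vector<String> vecStringQueries,
                               Vector<String> vecStringIds) {
        boolean gotSome = false;
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                String[] parts = line.split("\\s+", 2);
                vecStringIds.add(parts[0].replaceAll(":+$", ""));
                vecStringQueries.add(parts.length > 1 ? parts[1] : "");
                gotSome = true;
            }
        } catch (IOException e) {
            // assumption: keep whatever was read before the failure
        }
        return gotSome; // true if at least one query was read
    }
}
```

Because the two vectors are filled together, the identifier at a given index in vecStringIds belongs to the query at the same index in vecStringQueries.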


Terrier 3.6. Copyright © 2004-2011 University of Glasgow