Class SingleLineTRECQuery
- java.lang.Object
  - org.terrier.applications.batchquerying.TRECQuery
    - org.terrier.applications.batchquerying.SingleLineTRECQuery
- All Implemented Interfaces:
java.util.Iterator<java.lang.String>, QuerySource
public class SingleLineTRECQuery extends TRECQuery
This class can be used to extract batch queries from a simpler format than the regular SGML TREC format. It reads queries, one per line, verbatim from the specified file(s). Empty lines and lines starting with # are ignored. By default, queries are not tokenised by this class and are passed verbatim to the query parser. Tokenisation can be turned on with the property SingleLineTRECQuery.tokenise, in which case the tokeniser specified by the tokeniser property is used. This class also assumes that the first token on each line is the query Id; this is controlled by the property SingleLineTRECQuery.queryid.exists (default true). Trailing colons in the query Id are removed, which supports the single-line topic format used by the TREC Million Query track. Use this class by specifying trec.topics.parser=SingleLineTRECQuery and running TRECQuerying or TrecTerrier as normal.
Properties:
- SingleLineTRECQuery.queryid.exists - does the line start with a query Id? (defaults to true)
- SingleLineTRECQuery.tokenise - should the query be tokenised? (defaults to false). By default, the query is not passed through a tokeniser. If set to true, it is passed through the tokeniser configured by the tokeniser property.
- trec.encoding - expected encoding of the topics file(s)
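The line format described above can be illustrated with a small, self-contained Java sketch that does not depend on Terrier. The class SingleLineFormatSketch, its method parseLine, and the stand-in tokeniser simpleTokenise (lowercase, split on non-alphanumeric characters) are all hypothetical; Terrier's configured Tokeniser may behave differently. Trimming surrounding whitespace before the # check is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the single-line topic format; NOT Terrier's implementation.
 */
public class SingleLineFormatSketch {

    /** A parsed line; id is null when queryIdExists is false. */
    public static final class ParsedQuery {
        public final String id;
        public final String text;

        ParsedQuery(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /**
     * Parses one line of a topics file, returning null for ignored lines.
     * Assumption: surrounding whitespace is trimmed before the '#' check.
     */
    public static ParsedQuery parseLine(String line, boolean queryIdExists, boolean tokenise) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null; // empty and comment lines are ignored
        }
        String id = null;
        String text = trimmed;
        if (queryIdExists) {
            // The first whitespace-delimited token is the query Id.
            String[] parts = trimmed.split("\\s+", 2);
            id = parts[0].replaceAll(":+$", ""); // strip trailing colon(s), e.g. "101:" -> "101"
            text = parts.length > 1 ? parts[1] : "";
        }
        if (tokenise) {
            // Only when tokenise is true; otherwise the text is passed on verbatim.
            text = String.join(" ", simpleTokenise(text));
        }
        return new ParsedQuery(id, text);
    }

    /** Hypothetical stand-in tokeniser: lowercase, split on anything but letters/digits. */
    static List<String> simpleTokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // split can yield a leading empty string
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] lines = {"# a comment", "", "101: hubble telescope achievements", "102 Black-Bear attacks"};
        for (String line : lines) {
            ParsedQuery q = parseLine(line, true, true);
            if (q != null) {
                System.out.println(q.id + " -> " + q.text);
            }
        }
    }
}
```

With queryIdExists set to false the sketch simply returns a null id; how Terrier numbers queries in that case is not described on this page.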
Field Summary
Fields
- protected boolean tokenise
- protected Tokeniser tokeniser
Fields inherited from class org.terrier.applications.batchquerying.TRECQuery
desiredEncoding, IGNORE_DESC_NARR_NAME_TOKENS, index, logger, queries, query_ids, tags, topicFiles
Constructor Summary
- SingleLineTRECQuery() - Default constructor.
- SingleLineTRECQuery(java.lang.String queryfilename) - Reads queries from the specified filename.
- SingleLineTRECQuery(java.lang.String[] queryfilenames) - Reads queries from the specified filenames.
- SingleLineTRECQuery(java.lang.String[] queryfilenames, boolean tokenise)
- SingleLineTRECQuery(java.lang.String[] queryfilenames, Tokeniser tokeniser)
- SingleLineTRECQuery(java.lang.String queryfilename, boolean tokenise)
- SingleLineTRECQuery(java.lang.String queryfilename, Tokeniser tokeniser)
Method Summary
- boolean extractQuery(java.lang.String queryfilename, TagSet ignore, java.util.Vector<java.lang.String> vecStringQueries, java.util.Vector<java.lang.String> vecStringIds) - Extracts queries from the specified filename, adding their contents to vecStringQueries and the corresponding query ids to vecStringIds.
Methods inherited from class org.terrier.applications.batchquerying.TRECQuery
checkEncoding, extractQuery, getIndexOfCurrentQuery, getInfo, getNumberOfQueries, getQuery, getQueryId, getQueryIds, hasNext, main, next, performExtraction, remove, reset, toArray
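Because the class implements java.util.Iterator<java.lang.String>, callers typically consume queries with hasNext() and next() and look up each query's Id with getQueryId(). The sketch below uses a hypothetical in-memory class, InMemoryQuerySource, that mirrors the names of the inherited methods hasNext(), next(), getQueryId() and reset(). Its assumption that getQueryId() returns the Id of the query most recently returned by next() is not taken from this page.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a QuerySource-style iterator. Method names mirror
 * the inherited TRECQuery methods; the semantics are assumptions.
 */
public class InMemoryQuerySource implements Iterator<String> {
    private final List<String> ids;
    private final List<String> queries;
    private int index = 0; // position of the next query to return

    public InMemoryQuerySource(List<String> ids, List<String> queries) {
        if (ids.size() != queries.size()) {
            throw new IllegalArgumentException("ids and queries must be parallel lists");
        }
        this.ids = ids;
        this.queries = queries;
    }

    @Override
    public boolean hasNext() {
        return index < queries.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return queries.get(index++);
    }

    /** Id of the query most recently returned by next() (assumed semantics). */
    public String getQueryId() {
        if (index == 0) {
            throw new IllegalStateException("next() has not been called");
        }
        return ids.get(index - 1);
    }

    /** Rewinds to the first query. */
    public void reset() {
        index = 0;
    }

    public static void main(String[] args) {
        InMemoryQuerySource source = new InMemoryQuerySource(
                List.of("101", "102"),
                List.of("hubble telescope achievements", "black bear attacks"));
        while (source.hasNext()) {
            String query = source.next();
            System.out.println(source.getQueryId() + "\t" + query);
        }
    }
}
```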
Field Detail
-
tokeniser
protected Tokeniser tokeniser
-
tokenise
protected boolean tokenise
Constructor Detail
-
SingleLineTRECQuery
public SingleLineTRECQuery()
Default constructor.
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String[] queryfilenames)
Reads queries from the specified filenames
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String queryfilename)
Reads queries from the specified filename
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String queryfilename, boolean tokenise)
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String queryfilename, Tokeniser tokeniser)
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String[] queryfilenames, boolean tokenise)
-
SingleLineTRECQuery
public SingleLineTRECQuery(java.lang.String[] queryfilenames, Tokeniser tokeniser)
Method Detail
-
extractQuery
public boolean extractQuery(java.lang.String queryfilename, TagSet ignore, java.util.Vector<java.lang.String> vecStringQueries, java.util.Vector<java.lang.String> vecStringIds)
Extracts queries from the specified filename, adding their contents to vecStringQueries and the corresponding query ids to vecStringIds.
- Overrides:
extractQuery in class TRECQuery
- Parameters:
queryfilename - the name of a file containing topics.
vecStringQueries - a Vector containing the queries as strings.
vecStringIds - a Vector containing the query identifiers as strings.
- Returns:
true if some queries were successfully read
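The extractQuery contract can be sketched with the standard library alone. The hypothetical ExtractQuerySketch reads a file in a given encoding (compare trec.encoding), skips blank and # lines, assumes SingleLineTRECQuery.queryid.exists=true, appends to the two Vectors, and returns true only if at least one query was added. It omits the TagSet argument and tokenisation, and, unlike the documented signature, lets IOException propagate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Vector;

/**
 * Hypothetical sketch of the extractQuery contract; NOT Terrier's implementation.
 */
public class ExtractQuerySketch {

    public static boolean extractQuery(Path queryFile, Charset encoding,
                                       Vector<String> vecStringQueries,
                                       Vector<String> vecStringIds) throws IOException {
        int before = vecStringQueries.size();
        try (BufferedReader br = Files.newBufferedReader(queryFile, encoding)) {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // skip blank and comment lines
                }
                // Assumes queryid.exists=true: the first token is the Id, trailing colons removed.
                String[] parts = trimmed.split("\\s+", 2);
                vecStringIds.add(parts[0].replaceAll(":+$", ""));
                vecStringQueries.add(parts.length > 1 ? parts[1] : "");
            }
        }
        // True only if this call added at least one query.
        return vecStringQueries.size() > before;
    }

    public static void main(String[] args) throws IOException {
        Path topics = Files.createTempFile("topics", ".txt");
        Files.write(topics, List.of("# sample topics", "1: hubble telescope", "2 black bear attacks"),
                StandardCharsets.UTF_8);
        Vector<String> queries = new Vector<>();
        Vector<String> ids = new Vector<>();
        boolean found = extractQuery(topics, StandardCharsets.UTF_8, queries, ids);
        System.out.println(found + " " + ids + " " + queries);
        Files.delete(topics);
    }
}
```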