Class TRECQuery

  • All Implemented Interfaces:
    java.util.Iterator<java.lang.String>, QuerySource
    Direct Known Subclasses:
    SingleLineTRECQuery

    public class TRECQuery
    extends java.lang.Object
    implements QuerySource
    This class is used for reading the queries from TREC topic files.

    Properties:

    • trecquery.ignore.desc.narr.name.tokens - should the tokens DESCRIPTION and NARRATIVE in the desc and narr fields be ignored? Defaults to true.
    • tokeniser - name of the Tokeniser class to use to tokenise topics. Defaults to EnglishTokeniser.
    • trec.encoding - sets the encoding of TREC topic files. Defaults to the system's default encoding.
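The defaults listed above can be illustrated with a minimal sketch. It uses plain java.util.Properties rather than Terrier's own configuration mechanism, and the class TopicPropertiesSketch and its method names are hypothetical. Only the three property keys and their stated defaults come from this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.Properties;

/** Hypothetical helper: resolves the three documented properties with their stated defaults. */
class TopicPropertiesSketch {

    /** Parses properties-file text (key=value lines) into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string is not expected to fail.
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Whether the DESCRIPTION and NARRATIVE tokens are ignored; true unless configured otherwise. */
    static boolean ignoreDescNarrNameTokens(Properties props) {
        return Boolean.parseBoolean(
                props.getProperty("trecquery.ignore.desc.narr.name.tokens", "true"));
    }

    /** Name of the tokeniser class; EnglishTokeniser unless configured otherwise. */
    static String tokeniser(Properties props) {
        return props.getProperty("tokeniser", "EnglishTokeniser");
    }

    /** Encoding of the topic files; the platform default unless configured otherwise. */
    static String encoding(Properties props) {
        return props.getProperty("trec.encoding", Charset.defaultCharset().name());
    }
}
```

With an empty configuration every accessor falls back to its default. A line such as trec.encoding=UTF-8 overrides only that one value.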
    Author:
    Ben He & Craig Macdonald
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.lang.String desiredEncoding
      Encoding to be used to open all files.
      protected static boolean IGNORE_DESC_NARR_NAME_TOKENS
      Value of trecquery.ignore.desc.narr.name.tokens - should the tokens DESCRIPTION and NARRATIVE in the desc and narr fields be ignored? Defaults to true.
      protected int index
      The index of the queries.
      protected static org.slf4j.Logger logger
      The logger used for this class
      protected java.lang.String[] queries
      The queries in the topic files.
      protected java.lang.String[] query_ids
      The query identifiers in the topic files.
      protected TagSet tags  
      protected java.lang.String[] topicFiles
      The topic files used in this object
    • Constructor Summary

      Constructors 
      Constructor Description
      TRECQuery()
      Constructs an instance of TRECQuery that reads and stores all the queries from the files defined in the trec.topics property.
      TRECQuery(java.lang.String queryfilename)
      Constructs an instance of TRECQuery that reads and stores all the queries from a file with the specified filename.
      TRECQuery(java.lang.String[] queryfilenames)
      Constructs an instance of TRECQuery that reads and stores all the queries from the files with the specified filenames.
      TRECQuery(java.lang.String[] queryfilenames, java.lang.String docTag, java.lang.String idTag, java.lang.String[] whitelist, java.lang.String[] blacklist)
    • Method Summary

      Modifier and Type Method Description
      protected void checkEncoding()  
      boolean extractQuery(java.lang.String[] queryfilenames, TagSet t, java.util.Vector<java.lang.String> vecStringQueries, java.util.Vector<java.lang.String> vecStringIds)
      Extracts and stores all the queries from query files.
      boolean extractQuery(java.lang.String queryfilename, TagSet t, java.util.Vector<java.lang.String> vecStringQueries, java.util.Vector<java.lang.String> vecStringIds)
      Extracts and stores all the queries from a query file.
      int getIndexOfCurrentQuery()
      Returns the index of the last obtained query.
      java.lang.String[] getInfo()
      Returns the filenames of the topic files from which the queries were extracted
      int getNumberOfQueries()
      Returns the number of the queries read from the processed topic files.
      java.lang.String getQuery(java.lang.String queryNo)
      Returns the query for the given query number.
      java.lang.String getQueryId()
      Returns the query identifier of the last query fetched, or the first one, if none has been fetched yet.
      java.lang.String[] getQueryIds()
      Returns the query ids
      boolean hasNext()
      static void main(java.lang.String[] args)
      main
      java.lang.String next()
      protected void performExtraction()  
      void remove()
      void reset()
      Resets the query source back to the first query.
      java.lang.String[] toArray()
      Returns the queries in an array of strings
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining
    • Field Detail

      • logger

        protected static final org.slf4j.Logger logger
        The logger used for this class
      • IGNORE_DESC_NARR_NAME_TOKENS

        protected static final boolean IGNORE_DESC_NARR_NAME_TOKENS
        Value of trecquery.ignore.desc.narr.name.tokens - should the tokens DESCRIPTION and NARRATIVE in the desc and narr fields be ignored? Defaults to true.
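As a rough illustration of what this flag controls, the sketch below drops the field-name tokens from already tokenised field text. The class FieldNameTokenSketch and its filter method are hypothetical. The case-insensitive matching, and the choice to drop every occurrence rather than only the leading one, are assumptions of this sketch and not confirmed behaviour of TRECQuery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing the effect of ignoring field-name tokens. */
class FieldNameTokenSketch {

    /**
     * Returns the tokens of a topic field, omitting "description" in the desc
     * field and "narrative" in the narr field when ignoreNameTokens is true.
     */
    static List<String> filter(String field, List<String> tokens, boolean ignoreNameTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Assumption: matching ignores case for both field names and tokens.
            boolean isNameToken =
                    (field.equalsIgnoreCase("desc") && token.equalsIgnoreCase("description"))
                    || (field.equalsIgnoreCase("narr") && token.equalsIgnoreCase("narrative"));
            if (ignoreNameTokens && isNameToken) {
                continue; // skip the field-name token itself
            }
            kept.add(token);
        }
        return kept;
    }
}
```

With the flag off, or for any other field such as title, the tokens pass through unchanged.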
      • desiredEncoding

        protected java.lang.String desiredEncoding
        Encoding to be used to open all files.
      • topicFiles

        protected java.lang.String[] topicFiles
        The topic files used in this object
      • queries

        protected java.lang.String[] queries
        The queries in the topic files.
      • query_ids

        protected java.lang.String[] query_ids
        The query identifiers in the topic files.
      • index

        protected int index
        The index of the queries.
    • Constructor Detail

      • TRECQuery

        public TRECQuery(java.lang.String[] queryfilenames,
                         java.lang.String docTag,
                         java.lang.String idTag,
                         java.lang.String[] whitelist,
                         java.lang.String[] blacklist)
      • TRECQuery

        public TRECQuery()
        Constructs an instance of TRECQuery that reads and stores all the queries from the files defined in the trec.topics property.
      • TRECQuery

        public TRECQuery(java.lang.String queryfilename)
        Constructs an instance of TRECQuery that reads and stores all the queries from a file with the specified filename.
        Parameters:
        queryfilename - String the name of the file containing all the queries.
      • TRECQuery

        public TRECQuery(java.lang.String[] queryfilenames)
        Constructs an instance of TRECQuery that reads and stores all the queries from the files with the specified filenames.
        Parameters:
        queryfilenames - String[] the names of the files containing all the queries.
    • Method Detail

      • extractQuery

        public boolean extractQuery(java.lang.String[] queryfilenames,
                                    TagSet t,
                                    java.util.Vector<java.lang.String> vecStringQueries,
                                    java.util.Vector<java.lang.String> vecStringIds)
        Extracts and stores all the queries from query files.
        Parameters:
        queryfilenames - String[] the names of the files containing topics.
        vecStringQueries - Vector a vector containing the queries as strings.
        vecStringIds - Vector a vector containing the query identifiers as strings.
        Returns:
        boolean true if some queries were successfully extracted.
      • extractQuery

        public boolean extractQuery(java.lang.String queryfilename,
                                    TagSet t,
                                    java.util.Vector<java.lang.String> vecStringQueries,
                                    java.util.Vector<java.lang.String> vecStringIds)
        Extracts and stores all the queries from a query file.
        Parameters:
        queryfilename - String the name of a file containing topics.
        vecStringQueries - Vector a vector containing the queries as strings.
        vecStringIds - Vector a vector containing the query identifiers as strings.
        Returns:
        boolean true if some queries were successfully extracted.
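To make the role of the two vectors concrete, here is a minimal sketch that fills parallel vectors of query text and query identifiers from TREC-style topic markup. It is not the Terrier implementation: it works on an in-memory string, uses regular expressions instead of a TagSet, reads only the num and title tags, and does not tokenise. The class TopicExtractionSketch and the method extractTopics are hypothetical names.

```java
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of filling parallel query and id vectors from topic markup. */
class TopicExtractionSketch {
    // One <top>...</top> element per topic.
    private static final Pattern TOP = Pattern.compile("<top>(.*?)</top>", Pattern.DOTALL);
    // Identifier after <num>, skipping an optional "Number:" label.
    private static final Pattern NUM = Pattern.compile("<num>\\s*(?:Number:)?\\s*(\\S+)");
    // Title text up to the next tag.
    private static final Pattern TITLE = Pattern.compile("<title>\\s*([^<]*)");

    /** Returns true if at least one topic was extracted, mirroring the documented return value. */
    static boolean extractTopics(String topicText,
                                 Vector<String> vecStringQueries,
                                 Vector<String> vecStringIds) {
        boolean extractedSome = false;
        Matcher top = TOP.matcher(topicText);
        while (top.find()) {
            String body = top.group(1);
            Matcher num = NUM.matcher(body);
            Matcher title = TITLE.matcher(body);
            if (num.find() && title.find()) {
                // Entry i of one vector corresponds to entry i of the other.
                vecStringIds.add(num.group(1));
                vecStringQueries.add(title.group(1).trim().replaceAll("\\s+", " "));
                extractedSome = true;
            }
        }
        return extractedSome;
    }
}
```

Keeping the identifiers and the query strings at matching positions is what lets a later lookup by identifier or by index return the right query.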
      • checkEncoding

        protected void checkEncoding()
      • performExtraction

        protected void performExtraction()
      • getIndexOfCurrentQuery

        public int getIndexOfCurrentQuery()
        Returns the index of the last obtained query.
        Returns:
        int the index of the last obtained query.
      • getNumberOfQueries

        public int getNumberOfQueries()
        Returns the number of the queries read from the processed topic files.
        Returns:
        int the number of topics contained in the processed topic files.
      • getInfo

        public java.lang.String[] getInfo()
        Returns the filenames of the topic files from which the queries were extracted
        Specified by:
        getInfo in interface QuerySource
      • getQuery

        public java.lang.String getQuery(java.lang.String queryNo)
        Returns the query for the given query number.
        Parameters:
        queryNo - String the number of a query.
        Returns:
        String the string representing the query.
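A lookup of this kind can be sketched as a linear scan over the parallel queries and query_ids arrays described under Field Detail. The class QueryLookupSketch is hypothetical, and returning null for an unknown query number is an assumption of this sketch, since this page does not state what happens in that case.

```java
/** Hypothetical sketch of looking up a query by its identifier in parallel arrays. */
class QueryLookupSketch {

    /** Returns the query stored at the same position as queryNo, or null if it is absent. */
    static String getQuery(String[] queryIds, String[] queries, String queryNo) {
        for (int i = 0; i < queryIds.length; i++) {
            if (queryIds[i].equals(queryNo)) {
                return queries[i];
            }
        }
        return null; // assumption: unknown identifiers yield null
    }
}
```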
      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface java.util.Iterator<java.lang.String>
      • next

        public java.lang.String next()
        Specified by:
        next in interface java.util.Iterator<java.lang.String>
      • getQueryId

        public java.lang.String getQueryId()
        Returns the query identifier of the last query fetched, or the first one, if none has been fetched yet.
        Specified by:
        getQueryId in interface QuerySource
        Returns:
        String the identifier of the last query fetched, or of the first query if none has been fetched yet.
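The way hasNext, next, getQueryId and reset fit together can be sketched with a small stand-in class. QueryIterationSketch is hypothetical and is not the Terrier class. It only mirrors the behaviour documented on this page, and the internal use of a "next position" index is an assumption.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in mirroring the documented iteration contract. */
class QueryIterationSketch implements Iterator<String> {
    private final String[] queryIds;
    private final String[] queries;
    private int index = 0; // assumption: position of the next query to hand out

    QueryIterationSketch(String[] queryIds, String[] queries) {
        this.queryIds = queryIds;
        this.queries = queries;
    }

    @Override
    public boolean hasNext() {
        return index < queries.length;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return queries[index++];
    }

    /** Identifier of the last query fetched, or of the first query if none was fetched yet. */
    public String getQueryId() {
        return queryIds[index == 0 ? 0 : index - 1];
    }

    /** Moves back to the first query. */
    public void reset() {
        index = 0;
    }

    /** Typical loop: fetch each query with next(), then ask for its identifier. */
    static String describeAll(QueryIterationSketch source) {
        StringBuilder out = new StringBuilder();
        while (source.hasNext()) {
            String query = source.next();
            out.append(source.getQueryId()).append(": ").append(query).append('\n');
        }
        return out.toString();
    }
}
```

The order matters in the loop: getQueryId is called after next, so it reports the identifier of the query just returned. Calling reset afterwards allows a second pass over the same queries.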
      • getQueryIds

        public java.lang.String[] getQueryIds()
        Returns the query ids
        Returns:
        String array containing the query ids.
        Since:
        2.2
      • toArray

        public java.lang.String[] toArray()
        Returns the queries in an array of strings
        Returns:
        String[] an array containing the strings that represent the queries.
      • reset

        public void reset()
        Resets the query source back to the first query.
        Specified by:
        reset in interface QuerySource
      • remove

        public void remove()
        Specified by:
        remove in interface java.util.Iterator<java.lang.String>
      • main

        public static void main(java.lang.String[] args)
        main
        Parameters:
        args -