Class TwitterJSONCollection

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, Collection

    public class TwitterJSONCollection
    extends java.lang.Object
    implements Collection
    This class represents a collection of tweets stored in JSON format. Like TRECCollection, it expects a collection specification listing all of the files to be read. Each file is assumed to be gzip-compressed, with one tweet per line. The Gson parser (com.google.gson) is used to read the tweet JSON, and each tweet is represented as a FlatJSONDocument.
    Since:
    4.0
    Author:
    Richard McCreadie
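
    The file layout described above (gzip-compressed, one tweet per line) can be illustrated with a minimal sketch that uses only the JDK. This is not the class's actual implementation: the class name GzipTweetLines, the method readTweetLines, and the UTF-8 encoding are assumptions made for illustration, and the lines are returned as raw strings rather than parsed with Gson.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Terrier.
class GzipTweetLines {

    /**
     * Reads every non-blank line of a gzip-compressed file.
     * Each line is assumed to hold one tweet as a JSON object.
     * UTF-8 is an assumption; the original page does not state the encoding.
     */
    static List<String> readTweetLines(Path gzFile) throws IOException {
        List<String> tweets = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    tweets.add(line); // raw JSON text, left unparsed here
                }
            }
        }
        return tweets;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch can be run on its own.
        Path tmp = Files.createTempFile("tweets", ".json.gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)),
                StandardCharsets.UTF_8)) {
            w.write("{\"id\":1,\"text\":\"first\"}\n");
            w.write("{\"id\":2,\"text\":\"second\"}\n");
        }
        for (String tweet : readTweetLines(tmp)) {
            System.out.println(tweet);
        }
        Files.delete(tmp);
    }
}
```

    According to the field summary, the class itself holds a com.google.gson.JsonStreamParser over the current file, so each line would be parsed into a JSON object rather than kept as text.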
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected Document currentDocument
      The current document
      protected java.lang.String currentFilename
      The name of the current file
      protected java.io.BufferedReader currentTweetStream
      The underlying file stream reading tweets from the current file
      protected boolean endOfCollection
      Have we reached the end of the collection yet?
      protected int FileNumber
      The index in FilesToProcess of the file currently being processed.
      protected java.util.List<java.lang.String> FilesToProcess
      The list of files to process.
      protected com.google.gson.JsonStreamParser JSONStream
      The JSON stream containing the tweets
      protected static org.slf4j.Logger logger
      Logger for this class.
      protected boolean SkipFile
      A boolean which is true when a new file is open.
    • Constructor Summary

      Constructors 
      Constructor Description
      TwitterJSONCollection()  
      TwitterJSONCollection​(java.lang.String CollectionSpecFile)  
      TwitterJSONCollection​(java.lang.String addressCollectionFilename, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
      Additional constructor required by TRECIndexing.
      TwitterJSONCollection​(java.util.List<java.lang.String> files, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)  
    • Method Summary

      Modifier and Type Method Description
      void addFileToProcess​(java.lang.String JSONFile)  
      void close()  
      boolean endOfCollection()
      Returns true if the end of the collection has been reached
      Document getDocument()
      Get the document object representing the current document.
      void init()  
      protected void loadJSON​(java.lang.String file)  
      boolean nextDocument()
      Move the collection to the start of the next document.
      boolean openNextFile()
      Opens the next file listed in the collection specification.
      protected void readCollectionSpec​(java.lang.String CollectionSpecFilename)  
      com.google.gson.JsonObject readTweet()  
      void reset()
      Resets the Collection iterator to the start of the collection.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • logger

        protected static final org.slf4j.Logger logger
        Logger for this class.
      • FilesToProcess

        protected java.util.List<java.lang.String> FilesToProcess
        The list of files to process.
      • SkipFile

        protected boolean SkipFile
        A boolean which is true when a new file is open.
      • JSONStream

        protected com.google.gson.JsonStreamParser JSONStream
        The JSON stream containing the tweets
      • currentTweetStream

        protected java.io.BufferedReader currentTweetStream
        The underlying file stream reading tweets from the current file
      • currentDocument

        protected Document currentDocument
        The current document
      • currentFilename

        protected java.lang.String currentFilename
        The name of the current file
      • FileNumber

        protected int FileNumber
        The index in FilesToProcess of the file currently being processed.
      • endOfCollection

        protected boolean endOfCollection
        Have we reached the end of the collection yet?
    • Constructor Detail

      • TwitterJSONCollection

        public TwitterJSONCollection​(java.lang.String CollectionSpecFile)
      • TwitterJSONCollection

        public TwitterJSONCollection()
      • TwitterJSONCollection

        public TwitterJSONCollection​(java.lang.String addressCollectionFilename,
                                     java.lang.String ignored1,
                                     java.lang.String ignored2,
                                     java.lang.String ignored3)
        Additional constructor required by TRECIndexing.
      • TwitterJSONCollection

        public TwitterJSONCollection​(java.util.List<java.lang.String> files,
                                     java.lang.String ignored1,
                                     java.lang.String ignored2,
                                     java.lang.String ignored3)
    • Method Detail

      • init

        public void init()
      • loadJSON

        protected void loadJSON​(java.lang.String file)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • addFileToProcess

        public void addFileToProcess​(java.lang.String JSONFile)
      • readCollectionSpec

        protected void readCollectionSpec​(java.lang.String CollectionSpecFilename)
      • openNextFile

        public boolean openNextFile()
                             throws java.io.IOException
        Opens the next file listed in the collection specification.
        Returns:
        boolean true if the file was opened successfully; false if there are no more files to open.
        Throws:
        java.io.IOException - if there is an exception while opening the collection files.
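
        The return contract of openNextFile (true while another file can be opened, false once the list is exhausted) can be sketched as a simple cursor over a list of file names. This is a hedged illustration only: FileCursorSketch and its members are hypothetical names loosely modelled on the FilesToProcess, FileNumber, and currentFilename fields, and no file is actually opened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over a list of file names; not Terrier code.
class FileCursorSketch {
    private final List<String> filesToProcess;
    private int fileNumber = -1;     // index of the current file; -1 before the first one
    private String currentFilename;  // null until a file has been selected

    FileCursorSketch(List<String> files) {
        this.filesToProcess = new ArrayList<>(files);
    }

    /** Advances to the next listed file; returns false when none remain. */
    boolean openNextFile() {
        if (fileNumber + 1 >= filesToProcess.size()) {
            return false; // nothing left to open
        }
        fileNumber++;
        currentFilename = filesToProcess.get(fileNumber);
        // A real implementation would open a stream on currentFilename here
        // and could throw java.io.IOException if that fails.
        return true;
    }

    String currentFilename() {
        return currentFilename;
    }
}
```

        A caller would typically invoke openNextFile in a loop and stop at the first false.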
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • nextDocument

        public boolean nextDocument()
        Description copied from interface: Collection
        Move the collection to the start of the next document.
        Specified by:
        nextDocument in interface Collection
        Returns:
        boolean true if there exists another document in the collection, otherwise it returns false.
      • readTweet

        public com.google.gson.JsonObject readTweet()
      • getDocument

        public Document getDocument()
        Description copied from interface: Collection
        Get the document object representing the current document.
        Specified by:
        getDocument in interface Collection
        Returns:
        Document the current document.
      • endOfCollection

        public boolean endOfCollection()
        Description copied from interface: Collection
        Returns true if the end of the collection has been reached
        Specified by:
        endOfCollection in interface Collection
        Returns:
        boolean true if the end of collection has been reached, otherwise it returns false.
      • reset

        public void reset()
        Description copied from interface: Collection
        Resets the Collection iterator to the start of the collection.
        Specified by:
        reset in interface Collection
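
        The nextDocument, getDocument, endOfCollection, and reset methods documented above are usually driven by a simple loop. The sketch below shows that loop against a hypothetical stand-in interface, MiniCollection, which mirrors only those four method names and uses plain strings in place of Document objects. It is an illustration under those assumptions, not Terrier's Collection interface or this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained illustration of the iteration contract.
class CollectionLoopSketch {

    /** Stand-in mirroring four method names of the Collection interface. */
    interface MiniCollection {
        boolean nextDocument();
        String getDocument();
        boolean endOfCollection();
        void reset();
    }

    /** In-memory implementation backed by a list of strings. */
    static class InMemoryCollection implements MiniCollection {
        private final List<String> docs;
        private int position = -1; // -1 means "before the first document"

        InMemoryCollection(List<String> docs) {
            this.docs = new ArrayList<>(docs);
        }

        public boolean nextDocument() {
            if (position + 1 >= docs.size()) {
                position = docs.size(); // mark the end as reached
                return false;
            }
            position++;
            return true;
        }

        public String getDocument() {
            return (position >= 0 && position < docs.size()) ? docs.get(position) : null;
        }

        public boolean endOfCollection() {
            return position >= docs.size();
        }

        public void reset() {
            position = -1;
        }
    }

    /** Visits every document once, in order, and returns what was seen. */
    static List<String> drain(MiniCollection collection) {
        List<String> seen = new ArrayList<>();
        while (collection.nextDocument()) {
            seen.add(collection.getDocument());
        }
        return seen;
    }
}
```

        After drain returns, endOfCollection reports true; calling reset allows the same collection to be iterated again from the start.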