Package org.terrier.indexing
Class TwitterJSONCollection
- java.lang.Object
  - org.terrier.indexing.TwitterJSONCollection
- All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, Collection
public class TwitterJSONCollection extends java.lang.Object implements Collection
This class represents a collection of tweets stored in JSON format. Like TRECCollection, it expects a collection specification listing all of the files to be read. Each file is assumed to be gzip-compressed, with one tweet per line. The Gson parser (com.google.gson) is used to read the tweet JSON, and each tweet is represented as a FlatJSONDocument.
- Since: 4.0
- Author: Richard McCreadie
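The file layout described above (gzip-compressed, one tweet per line) can be illustrated with a minimal sketch that uses only the JDK. It is not this class's actual implementation: `GzipTweetLines`, `readLines`, and `gzip` are hypothetical names, the sample tweets are invented, and JSON parsing (which the class delegates to Gson) is deliberately left out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Terrier.
class GzipTweetLines {

    /** Reads every non-blank line of a gzip stream; each line is assumed to hold one tweet's JSON. */
    static List<String> readLines(InputStream gzipped) {
        List<String> tweets = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    tweets.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tweets;
    }

    /** Compresses text in memory so the sketch needs no files on disk. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Two invented tweets, one JSON object per line.
        String sample = "{\"id\":1,\"text\":\"first tweet\"}\n"
                      + "{\"id\":2,\"text\":\"second tweet\"}\n";
        for (String tweet : readLines(new ByteArrayInputStream(gzip(sample)))) {
            System.out.println(tweet);
        }
    }
}
```

In a real indexing run each returned line would then be handed to a JSON parser; the sketch stops at the line level to stay dependency-free.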
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected Document currentDocument: The current document.
- protected java.lang.String currentFilename: The name of the current file.
- protected java.io.BufferedReader currentTweetStream: The underlying file stream reading tweets from the current file.
- protected boolean endOfCollection: Have we reached the end of the collection yet?
- protected int FileNumber: The index in the FilesToProcess of the currently processed file.
- protected java.util.List<java.lang.String> FilesToProcess: The list of files to process.
- protected com.google.gson.JsonStreamParser JSONStream: The JSON stream containing the tweets.
- protected static org.slf4j.Logger logger: Logger for this class.
- protected boolean SkipFile: A boolean which is true when a new file is open.
-
Constructor Summary
Constructors (Constructor, Description):
- TwitterJSONCollection()
- TwitterJSONCollection(java.lang.String CollectionSpecFile)
- TwitterJSONCollection(java.lang.String addressCollectionFilename, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3): Additional constructors required by TRECIndexing.
- TwitterJSONCollection(java.util.List<java.lang.String> files, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void addFileToProcess(java.lang.String JSONFile)
- void close()
- boolean endOfCollection(): Returns true if the end of the collection has been reached.
- Document getDocument(): Get the document object representing the current document.
- void init()
- protected void loadJSON(java.lang.String file)
- boolean nextDocument(): Move the collection to the start of the next document.
- boolean openNextFile(): Opens the next file from the collection specification.
- protected void readCollectionSpec(java.lang.String CollectionSpecFilename)
- com.google.gson.JsonObject readTweet()
- void reset(): Resets the Collection iterator to the start of the collection.
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
logger for this class
-
FilesToProcess
protected java.util.List<java.lang.String> FilesToProcess
The list of files to process.
-
SkipFile
protected boolean SkipFile
A boolean which is true when a new file is open.
-
JSONStream
protected com.google.gson.JsonStreamParser JSONStream
The JSON stream containing the tweets
-
currentTweetStream
protected java.io.BufferedReader currentTweetStream
The underlying file stream reading tweets from the current file
-
currentDocument
protected Document currentDocument
The current document
-
currentFilename
protected java.lang.String currentFilename
The name of the current file
-
FileNumber
protected int FileNumber
The index in the FilesToProcess of the currently processed file.
-
endOfCollection
protected boolean endOfCollection
Have we reached the end of the collection yet?
-
Constructor Detail
-
TwitterJSONCollection
public TwitterJSONCollection(java.lang.String CollectionSpecFile)
-
TwitterJSONCollection
public TwitterJSONCollection()
-
TwitterJSONCollection
public TwitterJSONCollection(java.lang.String addressCollectionFilename, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
Additional constructors required by TRECIndexing.
-
TwitterJSONCollection
public TwitterJSONCollection(java.util.List<java.lang.String> files, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
-
Method Detail
-
init
public void init()
-
loadJSON
protected void loadJSON(java.lang.String file) throws java.io.IOException
- Throws: java.io.IOException
-
addFileToProcess
public void addFileToProcess(java.lang.String JSONFile)
-
readCollectionSpec
protected void readCollectionSpec(java.lang.String CollectionSpecFilename)
-
openNextFile
public boolean openNextFile() throws java.io.IOException
Opens the next file from the collection specification.
- Returns: true if the file was opened successfully; false if there are no more files to open.
- Throws: java.io.IOException - if there is an exception while opening the collection files.
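The return contract above (true while files remain, false once the list is exhausted) can be sketched as a simple cursor over a file list. This is an assumption-laden illustration, not the Terrier source: `FileCursorSketch` and its members are hypothetical names loosely modelled on the FilesToProcess, FileNumber, and currentFilename fields, and the sketch only selects a filename instead of opening a gzip stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor for illustration only; not part of Terrier.
class FileCursorSketch {
    private final List<String> filesToProcess = new ArrayList<>();
    private int fileNumber = -1;          // index of the currently selected file, -1 before the first
    private String currentFilename = null;

    void addFileToProcess(String file) {
        filesToProcess.add(file);
    }

    /** Returns true if another file was selected, false when no files remain. */
    boolean openNextFile() {
        if (fileNumber + 1 >= filesToProcess.size()) {
            return false;
        }
        fileNumber++;
        currentFilename = filesToProcess.get(fileNumber);
        // A real implementation would open a reader on the file here,
        // which is where an IOException could arise.
        return true;
    }

    String currentFilename() {
        return currentFilename;
    }

    public static void main(String[] args) {
        FileCursorSketch cursor = new FileCursorSketch();
        cursor.addFileToProcess("tweets-a.json.gz"); // invented filenames
        cursor.addFileToProcess("tweets-b.json.gz");
        while (cursor.openNextFile()) {
            System.out.println("processing " + cursor.currentFilename());
        }
    }
}
```

Returning a boolean rather than throwing at the end of the list lets a caller drive the whole collection with a plain `while` loop.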
-
close
public void close() throws java.io.IOException
- Specified by: close in interface java.lang.AutoCloseable
- Specified by: close in interface java.io.Closeable
- Throws: java.io.IOException
-
nextDocument
public boolean nextDocument()
Description copied from interface: Collection
Move the collection to the start of the next document.
- Specified by: nextDocument in interface Collection
- Returns: true if there exists another document in the collection, otherwise false.
-
readTweet
public com.google.gson.JsonObject readTweet()
-
getDocument
public Document getDocument()
Description copied from interface: Collection
Get the document object representing the current document.
- Specified by: getDocument in interface Collection
- Returns: the current document.
-
endOfCollection
public boolean endOfCollection()
Description copied from interface: Collection
Returns true if the end of the collection has been reached.
- Specified by: endOfCollection in interface Collection
- Returns: true if the end of the collection has been reached, otherwise false.
-
reset
public void reset()
Description copied from interface: Collection
Resets the Collection iterator to the start of the collection.
- Specified by: reset in interface Collection
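The four Collection methods documented above (nextDocument, getDocument, endOfCollection, reset) together form an iteration protocol. A minimal sketch of that protocol follows, under stated assumptions: `MiniCollection` and `InMemoryCollection` are hypothetical stand-ins that use String where Terrier uses Document, and the choice to report endOfCollection as true only after nextDocument has returned false is an assumption of this sketch, not a documented guarantee of the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; not part of Terrier.
class CollectionIterationSketch {

    /** Mirrors the four iteration methods, with String in place of Document. */
    interface MiniCollection {
        boolean nextDocument();
        String getDocument();
        boolean endOfCollection();
        void reset();
    }

    static class InMemoryCollection implements MiniCollection {
        private final List<String> docs;
        private int position = -1; // index of the current document, -1 before the first

        InMemoryCollection(List<String> docs) {
            this.docs = new ArrayList<>(docs);
        }

        public boolean nextDocument() {
            if (position + 1 >= docs.size()) {
                position = docs.size(); // mark the collection as exhausted
                return false;
            }
            position++;
            return true;
        }

        public String getDocument() {
            return (position >= 0 && position < docs.size()) ? docs.get(position) : null;
        }

        public boolean endOfCollection() {
            return position >= docs.size();
        }

        public void reset() {
            position = -1;
        }
    }

    /** Typical consumer loop: advance, then fetch the current document. */
    static List<String> drain(MiniCollection collection) {
        List<String> seen = new ArrayList<>();
        while (collection.nextDocument()) {
            seen.add(collection.getDocument());
        }
        return seen;
    }

    public static void main(String[] args) {
        MiniCollection c = new InMemoryCollection(Arrays.asList("tweet-1", "tweet-2"));
        System.out.println(drain(c));
        System.out.println(c.endOfCollection());
        c.reset(); // a second pass sees the same documents again
        System.out.println(drain(c));
    }
}
```

The loop always calls nextDocument before getDocument, so the collection is positioned on a document before it is read.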