public class SimpleFileCollection extends Object implements Collection
Modifier and Type | Field and Description |
---|---|
protected InputStream |
currentStream
The InputStream of the most recently opened document.
|
protected int |
Docid
The identifier of a document in the collection.
|
protected Map<String,Class<? extends Document>> |
extension_DocumentClass
Maps filename extensions to Document classes.
|
protected LinkedList<String> |
FileList
The list of files to index.
|
protected List<String> |
firstList
Contains the list of files first handed to the SimpleFileCollection, allowing
the SimpleFileCollection instance to be simply reset.
|
protected List<String> |
indexedFiles
This is filled during traversal, so that document IDs can be matched with filenames.
|
protected static org.slf4j.Logger |
logger |
static String |
NAMESPACE_DOCUMENTS
The default namespace for all parsers to be loaded from.
|
protected boolean |
Recurse
Whether directories should be recursed into by this class.
|
protected String |
thisFilename
The filename of the current file we are processing.
|
protected Tokeniser |
tokeniser |
Constructor and Description |
---|
SimpleFileCollection()
A default constructor that uses the files to be processed
by this collection, as specified by the property
collection.spec
|
SimpleFileCollection(List<String> filelist,
boolean recurse)
Constructs an instance of the class with the given list of files.
|
SimpleFileCollection(String addressCollectionFilename)
Creates an instance of the class from a file that lists the files to be processed.
|
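The third constructor takes the name of a file that lists the files to be processed. As a rough illustration of that idea only, the sketch below reads such a list with the standard library. The class `FileListReader`, the one-path-per-line layout, and the skipping of blank lines and of lines starting with `#` are assumptions made for this example; this page does not confirm any of them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Terrier: turns the lines of a
// "list of files" document into the filenames to be processed.
final class FileListReader {
    private FileListReader() {}

    // Keeps one filename per line. Skipping blank lines and '#' comment
    // lines is an assumption of this sketch.
    static List<String> filter(List<String> lines) {
        List<String> files = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            files.add(trimmed);
        }
        return files;
    }

    // Reads the list file from disk and applies the same filtering.
    static List<String> readList(Path listFile) throws IOException {
        return filter(Files.readAllLines(listFile, StandardCharsets.UTF_8));
    }
}
```

A list obtained this way could then be handed to the `SimpleFileCollection(List<String> filelist, boolean recurse)` constructor described above.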
Modifier and Type | Method and Description |
---|---|
protected void |
addDirectoryListing()
Called when the current file (thisFilename) is identified as a directory; adds the entire
contents of the directory to the list to be processed.
|
void |
close() |
protected void |
createExtensionDocumentMapping()
Parses the properties indexing.simplefilecollection.extensionsparsers
and indexing.simplefilecollection.defaultparser and attempts to load
all the classes they name, building a table that maps filename extensions to their
respective parsers.
|
boolean |
endOfCollection()
Checks whether there are more documents in the collection.
|
String |
getDocid()
Returns the current document's identifier string.
|
Document |
getDocument()
Return the current document in the collection.
|
List<String> |
getFileList()
Returns the list of indexed files, in the order they were indexed.
|
boolean |
hasNext()
Checks whether there is a next document in the collection to be processed.
|
static void |
main(String[] args)
Simple test case.
|
protected Document |
makeDocument(String Filename,
InputStream in)
Given the open stream in for the file named Filename, works out which
parser to try, and instantiates it.
|
Document |
next()
Move onto the next document in the collection to be processed.
|
boolean |
nextDocument()
Move onto the next document in the collection to be processed.
|
void |
remove()
This is unsupported by this Collection implementation;
every call throws UnsupportedOperationException.
|
void |
reset()
Starts again from the beginning of the collection.
|
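The method summary above exposes two traversal styles: the Collection style (`nextDocument()` then `getDocument()`) and the Iterator style (`hasNext()` then `next()`), with `remove()` unsupported and `reset()` starting again from the beginning. The sketch below models that contract with a stand-in class so it can run without Terrier. `MiniFileCollection` is hypothetical, documents are reduced to filename strings, and details such as `reset()` clearing the indexed-file list are assumptions rather than confirmed behaviour.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in, NOT the Terrier class: each "document" is
// just its filename, so the traversal contract can be shown in isolation.
final class MiniFileCollection implements Iterator<String> {
    private final List<String> firstList;        // files first handed in, kept for reset()
    private final LinkedList<String> fileList;   // files still to be processed
    private final List<String> indexedFiles = new ArrayList<>(); // filled during traversal
    private String current;                      // the current "document"

    MiniFileCollection(List<String> files) {
        this.firstList = new ArrayList<>(files);
        this.fileList = new LinkedList<>(files);
    }

    // Collection style: advance first, then ask for the current document.
    boolean nextDocument() {
        if (fileList.isEmpty()) {
            current = null;
            return false;
        }
        current = fileList.removeFirst();
        indexedFiles.add(current);
        return true;
    }

    String getDocument() { return current; }

    boolean endOfCollection() { return fileList.isEmpty(); }

    // Iterator style, layered on the same state.
    @Override public boolean hasNext() { return !fileList.isEmpty(); }

    @Override public String next() {
        if (!nextDocument()) {
            throw new NoSuchElementException();
        }
        return getDocument();
    }

    // Removal is unsupported, mirroring the summary above.
    @Override public void remove() {
        throw new UnsupportedOperationException();
    }

    // Starts again from the list first handed in (assumed reset semantics).
    void reset() {
        fileList.clear();
        fileList.addAll(firstList);
        indexedFiles.clear();
        current = null;
    }

    List<String> getFileList() { return indexedFiles; }
}
```

With the stand-in, `while (c.nextDocument()) { use(c.getDocument()); }` and `while (c.hasNext()) { use(c.next()); }` visit the same files in the same order; `use` is a placeholder for whatever processing the caller does.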
protected static final org.slf4j.Logger logger
public static final String NAMESPACE_DOCUMENTS
protected LinkedList<String> FileList
protected List<String> firstList
protected List<String> indexedFiles
protected int Docid
protected boolean Recurse
protected Map<String,Class<? extends Document>> extension_DocumentClass
protected String thisFilename
protected InputStream currentStream
protected Tokeniser tokeniser
public SimpleFileCollection(List<String> filelist, boolean recurse)
filelist
- the list of files to be processed by this collection.
public SimpleFileCollection()
public SimpleFileCollection(String addressCollectionFilename)
addressCollectionFilename
- the name of the file that
contains the list of files to be processed by this collection.
protected void createExtensionDocumentMapping()
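To make the extension-to-parser mapping concrete, here is a minimal sketch of how such a table might be built and consulted. It is not Terrier's implementation: the class `ExtensionParserTable`, the comma-separated `extension:ClassName` property format, and the `org.example.documents.` namespace are all assumptions for illustration. The sketch stores class names instead of loading classes, so that it runs without any parser classes on the classpath.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps filename extensions to parser class names.
final class ExtensionParserTable {
    // Assumed default namespace for unqualified class names; this value is
    // made up for the example and is not the NAMESPACE_DOCUMENTS constant.
    static final String NAMESPACE = "org.example.documents.";

    private final Map<String, String> byExtension = new HashMap<>();
    private final String defaultParser;

    // spec is assumed to look like "txt:TextDoc, html:com.acme.HtmlDoc".
    ExtensionParserTable(String spec, String defaultParser) {
        this.defaultParser = qualify(defaultParser);
        for (String pair : spec.split("\\s*,\\s*")) {
            String[] parts = pair.trim().split(":", 2);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                continue; // skip malformed entries instead of failing
            }
            byExtension.put(parts[0].toLowerCase(Locale.ROOT), qualify(parts[1]));
        }
    }

    // Names without a package are resolved against the default namespace.
    private static String qualify(String name) {
        return name.indexOf('.') >= 0 ? name : NAMESPACE + name;
    }

    // Picks the parser for a filename, falling back to the default parser
    // when the extension is missing or unknown.
    String parserFor(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot >= 0
                ? filename.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
        return byExtension.getOrDefault(ext, defaultParser);
    }
}
```

Lower-casing the extension is a design choice of the sketch, made so that `notes.TXT` and `notes.txt` resolve to the same parser.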
public boolean hasNext()
public Document next()
public void remove()
public boolean nextDocument()
nextDocument
in interface Collection
public Document getDocument()
getDocument
in interface Collection
protected Document makeDocument(String Filename, InputStream in)
Filename
- the filename of the currently open document
in
- the stream of the currently open document
public boolean endOfCollection()
endOfCollection
in interface Collection
public void reset()
reset
in interface Collection
public String getDocid()
public void close()
close
in interface Closeable
close
in interface AutoCloseable
public List<String> getFileList()
protected void addDirectoryListing()
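The directory handling summarised for `addDirectoryListing()` (a directory's contents are added to the list still to be processed) can be sketched with a simple work list. This is an illustration under assumptions, not the class's code: `DirectoryExpander` is a hypothetical name, directories are skipped when `recurse` is false, and entries are sorted only to make the output order predictable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of expanding directories onto a pending-file list.
final class DirectoryExpander {
    private DirectoryExpander() {}

    static List<String> expand(List<String> start, boolean recurse) {
        LinkedList<String> pending = new LinkedList<>(start); // files still to look at
        List<String> files = new ArrayList<>();               // regular files, in visit order
        while (!pending.isEmpty()) {
            File f = new File(pending.removeFirst());
            if (f.isDirectory()) {
                if (!recurse) {
                    continue; // assumed behaviour: directories are ignored
                }
                String[] names = f.list();
                if (names == null) {
                    continue; // directory could not be read
                }
                Arrays.sort(names); // deterministic order for the example
                for (String name : names) {
                    // Queue each entry; nested directories are expanded
                    // when their turn comes.
                    pending.add(new File(f, name).getPath());
                }
            } else if (f.isFile()) {
                files.add(f.getPath());
            }
        }
        return files;
    }
}
```

Because entries are appended to the end of the pending list, the traversal is breadth-first: files directly inside a directory are visited before files in its subdirectories.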
public static void main(String[] args)
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow