Packages that use Document | |
---|---|
org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
org.terrier.indexing.hadoop | Provides classes for Terrier's MapReduce indexer. |
org.terrier.structures.indexing.singlepass.hadoop | Provides classes implementing the Hadoop MapReduce indexing in Terrier. |
Uses of Document in org.terrier.indexing |
---|
Classes in org.terrier.indexing that implement Document | |
---|---|
class |
FileDocument
Models a document which corresponds to one file. |
class |
HTMLDocument
Deprecated. |
class |
MSExcelDocument
Implements a Document object for a Microsoft Excel spreadsheet. |
class |
MSPowerpointDocument
Implements a Document object for reading Microsoft Powerpoint files. |
class |
MSWordDocument
This class is used for indexing MS Word document files (i.e., files ending in .doc). |
class |
PDFDocument
Implements a Document object for reading PDF documents. |
class |
TaggedDocument
Models a tagged document (e.g., an HTML or TREC document). |
class |
TRECDocument
Deprecated. |
Fields in org.terrier.indexing with type parameters of type Document | |
---|---|
protected java.lang.Class<? extends Document> |
WARC09Collection.documentClass
Class to use for all documents parsed by this class |
protected java.lang.Class<? extends Document> |
WARC018Collection.documentClass
Class to use for all documents parsed by this class |
protected java.lang.Class<? extends Document> |
TRECCollection.documentClass
|
protected java.util.Map<java.lang.String,java.lang.Class<? extends Document>> |
SimpleFileCollection.extension_DocumentClass
Maps filename extensions to Document classes. |
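
The extension-to-class mapping held in `SimpleFileCollection.extension_DocumentClass` can be illustrated with a minimal, self-contained sketch. It only shows the general technique of looking up a `Class<? extends Document>` by filename extension and instantiating it reflectively; it is not Terrier's implementation. The nested `Document` interface, the two parser classes, and the `extensionOf` and `makeDocument` helpers are all hypothetical stand-ins, and the real parsers may take constructor arguments that this sketch ignores.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionDispatchSketch {
    /** Hypothetical stand-in for org.terrier.indexing.Document; not the real interface. */
    interface Document {
        String describe();
    }

    /** Hypothetical parser classes, used only for this illustration. */
    static class PlainTextDoc implements Document {
        public String describe() { return "plain text"; }
    }

    static class PdfDoc implements Document {
        public String describe() { return "pdf"; }
    }

    // Same generic shape as the extension_DocumentClass field listed above.
    static final Map<String, Class<? extends Document>> EXTENSION_DOCUMENT_CLASS = new HashMap<>();
    static {
        EXTENSION_DOCUMENT_CLASS.put("txt", PlainTextDoc.class);
        EXTENSION_DOCUMENT_CLASS.put("pdf", PdfDoc.class);
    }

    /** Returns the lower-cased extension of filename, or "" if it has none. */
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Instantiates the class registered for the extension, or returns null if none is registered. */
    static Document makeDocument(String filename) {
        Class<? extends Document> cls = EXTENSION_DOCUMENT_CLASS.get(extensionOf(filename));
        if (cls == null) {
            return null; // no parser registered for this extension
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(makeDocument("report.PDF").describe());
        System.out.println(makeDocument("notes.txt").describe());
        System.out.println(makeDocument("image.png"));
    }
}
```

Storing `Class` objects rather than instances lets a fresh parser be created per file, which is presumably why the field is typed `Class<? extends Document>`.
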
Methods in org.terrier.indexing that return Document | |
---|---|
static Document |
TaggedDocument.generateDocumentFromFile(java.lang.String filename)
Instantiates a TREC document from a file. |
Document |
WARC09Collection.getDocument()
Get the document object representing the current document. |
Document |
WARC018Collection.getDocument()
Get the document object representing the current document. |
Document |
TRECCollection.getDocument()
Returns the current document to process. |
Document |
SimpleXMLCollection.getDocument()
Get the document object representing the current document. |
Document |
SimpleFileCollection.getDocument()
Return the current document in the collection. |
Document |
Collection.getDocument()
Get the document object representing the current document. |
Document |
TRECCollection.getDocument(TagSet _tags,
TagSet _exact,
TagSet _fields)
Deprecated. |
protected Document |
SimpleFileCollection.makeDocument(java.lang.String Filename,
java.io.InputStream in)
Given the open input stream in for the file named Filename, work out which parser to try, and instantiate it. |
Document |
WARC09Collection.next()
Return the next document |
Document |
WARC018Collection.next()
Return the next document |
Document |
TRECCollection.next()
Return the next document |
Document |
SimpleXMLCollection.next()
Get the next document |
Document |
SimpleFileCollection.next()
Move onto the next document in the collection to be processed. |
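
The `next()` and `getDocument()` methods listed above suggest a cursor-style traversal: `next()` advances, and `getDocument()` returns whatever the cursor currently points at. The sketch below illustrates that pattern with stand-in types only. The nested `Document` interface, its `getProperty` accessor, the `InMemoryCollection` class, and the use of `java.util.Iterator` (including `hasNext()`) are assumptions made for illustration, not the Terrier API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CollectionLoopSketch {
    /** Hypothetical stand-in for org.terrier.indexing.Document. */
    interface Document {
        String getProperty(String name); // assumed accessor, for illustration only
    }

    /** Hypothetical in-memory collection exposing next() and getDocument(). */
    static class InMemoryCollection implements Iterator<Document> {
        private final List<String> docnos;
        private int position = 0;
        private Document current = null;

        InMemoryCollection(List<String> docnos) {
            this.docnos = docnos;
        }

        @Override
        public boolean hasNext() {
            return position < docnos.size();
        }

        /** Advances the cursor and returns the document it now points at. */
        @Override
        public Document next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            final String docno = docnos.get(position++);
            current = name -> "docno".equals(name) ? docno : null;
            return current;
        }

        /** Returns the current document without advancing; null before the first next(). */
        Document getDocument() {
            return current;
        }
    }

    /** Walks the collection and gathers each document's "docno" property. */
    static List<String> collectDocnos(InMemoryCollection collection) {
        List<String> seen = new ArrayList<>();
        while (collection.hasNext()) {
            Document fromNext = collection.next();
            Document current = collection.getDocument(); // same object as fromNext in this sketch
            if (current != fromNext) {
                throw new IllegalStateException("cursor out of sync");
            }
            seen.add(current.getProperty("docno"));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectDocnos(new InMemoryCollection(Arrays.asList("d1", "d2", "d3"))));
    }
}
```
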
Methods in org.terrier.indexing with parameters of type Document | |
---|---|
static void |
TaggedDocument.dumpDocument(Document d)
Dumps a document to stdout |
protected abstract void |
ExtensibleSinglePassIndexer.preProcess(Document doc,
java.lang.String term)
Perform an operation before the term pipeline is initiated. |
Uses of Document in org.terrier.indexing.hadoop |
---|
Method parameters in org.terrier.indexing.hadoop with type arguments of type Document | |
---|---|
void |
Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key,
SplitAwareWrapper<Document> value,
org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector,
org.apache.hadoop.mapred.Reporter reporter)
Map processes a single document. |
Uses of Document in org.terrier.structures.indexing.singlepass.hadoop |
---|
Methods in org.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document | |
---|---|
SplitAwareWrapper<Document> |
CollectionRecordReader.createValue()
Create a new value; each value is a document. |
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>> |
MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
|
Method parameters in org.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document | |
---|---|
boolean |
CollectionRecordReader.next(org.apache.hadoop.io.Text DocID,
SplitAwareWrapper<Document> document)
Moves to the next Document in the Collections accessing this InputSplit if one exists, setting DocID to the property "DOCID" and Document to the text within the document. |
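
`createValue()` and `next(key, value)` follow the reader convention of Hadoop's older `org.apache.hadoop.mapred` API: the caller allocates one key and one value object, then repeatedly asks the reader to overwrite them until `next` returns false. The following sketch shows that loop without depending on Hadoop or Terrier. `Key`, `Holder`, `InMemoryReader`, and `readAll` are hypothetical stand-ins for `Text`, `SplitAwareWrapper<Document>`, `CollectionRecordReader`, and the framework's driver loop respectively.

```java
import java.util.ArrayList;
import java.util.List;

class RecordReaderLoopSketch {
    /** Hypothetical mutable key, standing in for org.apache.hadoop.io.Text. */
    static class Key {
        String value = "";
    }

    /** Hypothetical mutable holder, standing in for SplitAwareWrapper<Document>. */
    static class Holder<T> {
        T object;
    }

    /** Hypothetical reader over in-memory (docid, text) pairs. */
    static class InMemoryReader {
        private final List<String[]> records;
        private int position = 0;

        InMemoryReader(List<String[]> records) {
            this.records = records;
        }

        Key createKey() {
            return new Key();
        }

        Holder<String> createValue() {
            return new Holder<>();
        }

        /** Overwrites docId and document with the next record; returns false when none remain. */
        boolean next(Key docId, Holder<String> document) {
            if (position >= records.size()) {
                return false;
            }
            String[] record = records.get(position++);
            docId.value = record[0];
            document.object = record[1];
            return true;
        }
    }

    /** Drives the reader the way a map task would, reusing one key and one value object. */
    static List<String> readAll(InMemoryReader reader) {
        Key key = reader.createKey();
        Holder<String> value = reader.createValue();
        List<String> out = new ArrayList<>();
        while (reader.next(key, value)) {
            out.add(key.value + "=" + value.object);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"d1", "first document"});
        records.add(new String[] {"d2", "second document"});
        System.out.println(readAll(new InMemoryReader(records)));
    }
}
```

Because the key and value are reused, a caller that needs to keep a record past the next call to `next` has to copy its contents first, as `readAll` does by building a new string.
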