| Packages that use Document | |
|---|---|
| org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
| org.terrier.indexing.hadoop | Provides classes for Terrier's MapReduce indexer. |
| org.terrier.structures.indexing.singlepass.hadoop | Provides classes implementing the Hadoop MapReduce indexing in Terrier. |
| Uses of Document in org.terrier.indexing |
|---|
| Classes in org.terrier.indexing that implement Document | |
|---|---|
| class | FileDocument: Models a document which corresponds to one file. |
| class | HTMLDocument: Deprecated. |
| class | MSExcelDocument: Implements a Document object for a Microsoft Excel spreadsheet. |
| class | MSPowerpointDocument: Implements a Document object for reading Microsoft PowerPoint files. |
| class | MSWordDocument: This class is used for indexing MS Word document files (i.e., files ending in .doc). |
| class | PDFDocument: Implements a Document object for reading PDF documents. |
| class | TaggedDocument: Models a tagged document (e.g., an HTML or TREC document). |
| class | TRECDocument: Deprecated. |
| Fields in org.terrier.indexing with type parameters of type Document | |
|---|---|
| protected java.lang.Class<? extends Document> | WARC09Collection.documentClass: Class to use for all documents parsed by this class. |
| protected java.lang.Class<? extends Document> | WARC018Collection.documentClass: Class to use for all documents parsed by this class. |
| protected java.lang.Class<? extends Document> | TRECCollection.documentClass |
| protected java.util.Map<java.lang.String,java.lang.Class<? extends Document>> | SimpleFileCollection.extension_DocumentClass: Maps filename extensions to Document classes. |
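
The extension-to-class mapping described for SimpleFileCollection.extension_DocumentClass can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: `Doc`, `PlainDoc`, `MarkupDoc`, `makeDoc` and `kindOf` are hypothetical stand-ins, not Terrier classes or methods, and the sketch does not claim to reproduce how SimpleFileCollection resolves or instantiates parsers.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionDispatchSketch {

    /** Hypothetical stand-in for a document type; not org.terrier.indexing.Document. */
    public interface Doc {
        String kind();
    }

    /** Hypothetical parser for plain text; a real parser would read from the stream. */
    public static class PlainDoc implements Doc {
        public PlainDoc(InputStream in) { }
        @Override public String kind() { return "plain"; }
    }

    /** Hypothetical parser for markup files. */
    public static class MarkupDoc implements Doc {
        public MarkupDoc(InputStream in) { }
        @Override public String kind() { return "markup"; }
    }

    /** Maps lower-cased filename extensions to the class that should parse them. */
    private static final Map<String, Class<? extends Doc>> EXTENSION_TO_CLASS = new HashMap<>();
    static {
        EXTENSION_TO_CLASS.put("txt", PlainDoc.class);
        EXTENSION_TO_CLASS.put("html", MarkupDoc.class);
    }

    /** Returns a parser chosen by file extension, or null if no class is registered. */
    public static Doc makeDoc(String filename, InputStream in) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so nothing to look up
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        Class<? extends Doc> cls = EXTENSION_TO_CLASS.get(ext);
        if (cls == null) {
            return null; // unknown extension
        }
        try {
            // Each registered class is assumed to expose a public (InputStream) constructor.
            return cls.getConstructor(InputStream.class).newInstance(in);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    /** Convenience helper: reports which kind of parser a filename would get. */
    public static String kindOf(String filename) {
        Doc doc = makeDoc(filename, new ByteArrayInputStream(new byte[0]));
        return doc == null ? "unknown" : doc.kind();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"notes.txt", "index.html", "README"}) {
            System.out.println(name + " -> " + kindOf(name));
        }
    }
}
```

Keeping the mapping in a `Map<String, Class<? extends Doc>>` means new file types can be supported by registering one more entry, without touching the dispatch logic.
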
| Methods in org.terrier.indexing that return Document | |
|---|---|
| static Document | TaggedDocument.generateDocumentFromFile(java.lang.String filename): Instantiates a TREC document from a file. |
| Document | WARC09Collection.getDocument(): Get the document object representing the current document. |
| Document | WARC018Collection.getDocument(): Get the document object representing the current document. |
| Document | TRECCollection.getDocument(): Returns the current document to process. |
| Document | SimpleXMLCollection.getDocument(): Get the document object representing the current document. |
| Document | SimpleFileCollection.getDocument(): Return the current document in the collection. |
| Document | Collection.getDocument(): Get the document object representing the current document. |
| Document | TRECCollection.getDocument(TagSet _tags, TagSet _exact, TagSet _fields): Deprecated. |
| protected Document | SimpleFileCollection.makeDocument(java.lang.String Filename, java.io.InputStream in): Given the input stream in of the opened document named Filename, work out which parser to try, and instantiate it. |
| Document | WARC09Collection.next(): Return the next document. |
| Document | WARC018Collection.next(): Return the next document. |
| Document | TRECCollection.next(): Return the next document. |
| Document | SimpleXMLCollection.next(): Get the next document. |
| Document | SimpleFileCollection.next(): Move on to the next document in the collection to be processed. |
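
The pairing of next() and getDocument() listed above suggests an iterator-style traversal. The sketch below shows that pattern on a hypothetical in-memory collection. `Doc`, `InMemoryCollection` and `collectIds` are illustrative names only, and the assumption that getDocument() returns the document most recently produced by next() is made for this sketch rather than taken from the Terrier classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CollectionIterationSketch {

    /** Hypothetical stand-in for a document: an identifier plus its text. */
    public static final class Doc {
        private final String id;
        private final String text;

        public Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }

        public String getId() { return id; }
        public String getText() { return text; }
    }

    /** Hypothetical collection: next() advances, getDocument() returns the current document. */
    public static final class InMemoryCollection implements Iterator<Doc> {
        private final List<Doc> docs;
        private int position = -1; // -1 means next() has not been called yet

        public InMemoryCollection(List<Doc> docs) {
            this.docs = new ArrayList<>(docs);
        }

        @Override
        public boolean hasNext() {
            return position + 1 < docs.size();
        }

        @Override
        public Doc next() {
            if (!hasNext()) {
                throw new NoSuchElementException("end of collection");
            }
            position++;
            return docs.get(position);
        }

        /** Returns the document last produced by next(), or null before the first call. */
        public Doc getDocument() {
            return position < 0 ? null : docs.get(position);
        }
    }

    /** Walks the whole collection and records the identifier of each current document. */
    public static List<String> collectIds(InMemoryCollection collection) {
        List<String> ids = new ArrayList<>();
        while (collection.hasNext()) {
            collection.next();                      // advance to the next document
            Doc current = collection.getDocument(); // read the document now in focus
            ids.add(current.getId());
        }
        return ids;
    }

    public static void main(String[] args) {
        InMemoryCollection collection = new InMemoryCollection(Arrays.asList(
                new Doc("d1", "first document"),
                new Doc("d2", "second document")));
        System.out.println(collectIds(collection));
    }
}
```
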
| Methods in org.terrier.indexing with parameters of type Document | |
|---|---|
| static void | TaggedDocument.dumpDocument(Document d): Dumps a document to stdout. |
| protected abstract void | ExtensibleSinglePassIndexer.preProcess(Document doc, java.lang.String term): Perform an operation before the term pipeline is initiated. |
| Uses of Document in org.terrier.indexing.hadoop |
|---|
| Method parameters in org.terrier.indexing.hadoop with type arguments of type Document | |
|---|---|
| void | Hadoop_BasicSinglePassIndexer.map(org.apache.hadoop.io.Text key, SplitAwareWrapper<Document> value, org.apache.hadoop.mapred.OutputCollector<SplitEmittedTerm,MapEmittedPostingList> _outputPostingListCollector, org.apache.hadoop.mapred.Reporter reporter): Map processes a single document. |
| Uses of Document in org.terrier.structures.indexing.singlepass.hadoop |
|---|
| Methods in org.terrier.structures.indexing.singlepass.hadoop that return types with arguments of type Document | |
|---|---|
| SplitAwareWrapper<Document> | CollectionRecordReader.createValue(): Create a new Text value; each value is a document. |
| org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,SplitAwareWrapper<Document>> | MultiFileCollectionInputFormat.getRecordReader(org.apache.hadoop.mapred.InputSplit genericSplit, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) |
| Method parameters in org.terrier.structures.indexing.singlepass.hadoop with type arguments of type Document | |
|---|---|
| boolean | CollectionRecordReader.next(org.apache.hadoop.io.Text DocID, SplitAwareWrapper<Document> document): Moves to the next Document in the Collections accessing this InputSplit if one exists, setting DocID to the property "DOCID" and Document to the text within the document. |
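
The createValue() and next(key, value) signatures above follow the reusable key/value style of Hadoop's older `mapred` record readers: the caller allocates the key and value once, and each call to next overwrites them and returns false when the input is exhausted. The sketch below imitates that shape without depending on Hadoop or Terrier. `Reader`, `Holder`, `ListReader` and `drain` are hypothetical names, with `Holder<String>` standing in loosely for a wrapper such as SplitAwareWrapper<Document>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordReaderLoopSketch {

    /** Hypothetical mutable wrapper that the reader refills on every call to next(). */
    public static final class Holder<T> {
        private T value;

        public T get() { return value; }
        public void set(T value) { this.value = value; }
    }

    /** Hypothetical interface mirroring the createKey/createValue/next shape. */
    public interface Reader<K, V> {
        K createKey();
        V createValue();
        boolean next(K key, V value);
    }

    /** Reads {docid, text} pairs from a list, reusing the caller's key and value objects. */
    public static final class ListReader implements Reader<StringBuilder, Holder<String>> {
        private final List<String[]> records;
        private int pos = 0;

        public ListReader(List<String[]> records) {
            this.records = new ArrayList<>(records);
        }

        @Override
        public StringBuilder createKey() { return new StringBuilder(); }

        @Override
        public Holder<String> createValue() { return new Holder<>(); }

        @Override
        public boolean next(StringBuilder key, Holder<String> value) {
            if (pos >= records.size()) {
                return false; // nothing left in this split
            }
            String[] record = records.get(pos++);
            key.setLength(0);      // overwrite the reused key with the document id
            key.append(record[0]);
            value.set(record[1]);  // overwrite the reused value with the document text
            return true;
        }
    }

    /** Drains the reader with a single key/value pair allocated up front. */
    public static List<String> drain(ListReader reader) {
        StringBuilder key = reader.createKey();
        Holder<String> value = reader.createValue();
        List<String> seen = new ArrayList<>();
        while (reader.next(key, value)) {
            seen.add(key + "=" + value.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        ListReader reader = new ListReader(Arrays.asList(
                new String[] {"d1", "alpha"},
                new String[] {"d2", "beta"}));
        System.out.println(drain(reader));
    }
}
```

Because the same key and value objects are refilled on each call, anything that must outlive an iteration has to be copied out, as `drain` does by building a new string per record.
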