Package | Description |
---|---|
org.terrier.indexing | Provides classes and interfaces related to the indexing of documents. |
org.terrier.structures.indexing | Provides the classes used for creating the data structures of the Terrier platform. |
org.terrier.structures.indexing.classical | Provides functionality for creating on-disk indices via indexer classes. |
org.terrier.structures.indexing.singlepass | Provides implementations of the structures needed for performing single-pass indexing. |
org.terrier.structures.indexing.singlepass.hadoop | Provides classes implementing Hadoop MapReduce indexing in Terrier. |
Modifier and Type | Class and Description |
---|---|
class | MultiDocumentFileCollection |
class | SimpleFileCollection: Implements a collection that can read arbitrary files on disk. |
class | SimpleMedlineXMLCollection: Initial implementation of a class that generates a Collection with Documents from a series of XML files in the Medline format. |
class | SimpleXMLCollection: Initial implementation of a class that generates a Collection with Documents from a series of XML files. |
class | TRECCollection: Models a TREC test collection by implementing the interfaces Collection and DocumentExtractor. |
class | TRECUTFCollection: Deprecated. |
class | TRECWebCollection: Version of TRECCollection that can parse standard-form DOCHDR tags in TREC Web corpora. |
class | TwitterJSONCollection: Represents a collection of tweets stored in JSON format. |
class | WARC018Collection: Parses WARC-format web crawls, version 0.18. |
class | WARC09Collection: Parses WARC-format web crawls, version 0.9. |
class | WARC10Collection: Parses WARC-format web crawls, version 0.10. |
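Each class above implements `Collection`, which an indexer walks one document at a time. The sketch below mirrors that iteration contract with hypothetical, cut-down interfaces (`SimpleCollection`, `SimpleDocument`) whose method names are modelled on Terrier's `nextDocument()`, `getDocument()`, `endOfCollection()` and `reset()`. The real interfaces carry more methods and their exact semantics may differ, so treat this as an illustration under those assumptions rather than a drop-in implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down stand-ins for Terrier's Document and Collection
// interfaces; method names are modelled on Terrier's, semantics are simplified.
interface SimpleDocument {
    String getNextTerm();      // next token, or null if none remains
    boolean endOfDocument();   // true once every token has been read
}

interface SimpleCollection {
    boolean nextDocument();    // move to the next document; false at the end
    SimpleDocument getDocument();
    boolean endOfCollection();
    void reset();              // rewind to before the first document
}

// A document held in memory and split on whitespace.
class StringDocument implements SimpleDocument {
    private final String[] tokens;
    private int pos = 0;

    StringDocument(String text) {
        String t = text.trim();
        this.tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    public String getNextTerm() {
        return pos < tokens.length ? tokens[pos++] : null;
    }

    public boolean endOfDocument() {
        return pos >= tokens.length;
    }
}

// One document per string in the list.
class InMemoryCollection implements SimpleCollection {
    private final List<String> texts;
    private int index = -1;    // -1 means "before the first document"

    InMemoryCollection(List<String> texts) {
        this.texts = new ArrayList<>(texts);
    }

    public boolean nextDocument() {
        if (index + 1 >= texts.size()) {
            index = texts.size();   // mark the collection as exhausted
            return false;
        }
        index++;
        return true;
    }

    public SimpleDocument getDocument() {
        return new StringDocument(texts.get(index));
    }

    public boolean endOfCollection() {
        return index >= texts.size();
    }

    public void reset() {
        index = -1;
    }

    // Walks every document and counts its terms, the way an indexing loop would.
    static int countTerms(SimpleCollection c) {
        int n = 0;
        while (c.nextDocument()) {
            SimpleDocument d = c.getDocument();
            while (!d.endOfDocument()) {
                if (d.getNextTerm() != null) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

For example, `countTerms(new InMemoryCollection(Arrays.asList("a b c", "d e")))` visits both documents and counts five terms; calling `reset()` afterwards allows a second pass.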
Modifier and Type | Method and Description |
---|---|
static Collection | CollectionFactory.loadCollection(String CollectionName): Load collection(s) of the specified name. |
static Collection | CollectionFactory.loadCollection(String CollectionName, Class<?>[] contructorTypes, Object[] constructorValues): Load collection(s) of the specified name, using the given constructor argument types and values. |
static Collection | CollectionFactory.loadCollections(): Load the collection named by the property trec.collection.class, or its default value, TRECCollection. |
static Collection | CollectionFactory.loadCollections(String[] collNames): Load collection(s) of the specified names. |
static Collection | CollectionFactory.loadCollections(String[] collNames, Class<?>[] contructorTypes, Object[] constructorValues): Load collection(s) of the specified names, using the given constructor argument types and values. |
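The overloads taking `Class<?>[] contructorTypes` and `Object[] constructorValues` suggest that the collection class is instantiated reflectively by name. A minimal sketch of that pattern using only `java.lang.reflect`, assuming (hypothetically) that names without a dot are resolved against `org.terrier.indexing.` and that the default is read from a plain `java.util.Properties` object rather than Terrier's own configuration class. `CollectionLoaderSketch`, `qualify`, `load` and `collectionClassName` are illustrative names, not Terrier API.

```java
import java.util.Properties;

// Hypothetical sketch of reflective collection loading; not Terrier's code.
class CollectionLoaderSketch {
    // Assumption: names without a dot are resolved against this package.
    static final String DEFAULT_PACKAGE = "org.terrier.indexing.";

    static String qualify(String name) {
        return name.contains(".") ? name : DEFAULT_PACKAGE + name;
    }

    // Instantiates className via its public constructor whose parameter types
    // exactly match `types`, checking that the class is a subtype of `expected`.
    static <T> T load(String className, Class<T> expected,
                      Class<?>[] types, Object[] values)
            throws ReflectiveOperationException {
        Class<? extends T> cls = Class.forName(qualify(className)).asSubclass(expected);
        return cls.getConstructor(types).newInstance(values);
    }

    // Reads the collection class name from a property, falling back to
    // TRECCollection, the documented default of trec.collection.class.
    static String collectionClassName(Properties props) {
        return props.getProperty("trec.collection.class", "TRECCollection");
    }
}
```

With an unqualified name such as `TRECCollection`, `qualify` yields `org.terrier.indexing.TRECCollection`; the reflective call then throws `ClassNotFoundException` unless that class is on the classpath. Passing explicit types and values is what lets a caller choose a non-default constructor.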
Modifier and Type | Method and Description |
---|---|
abstract void | Indexer.createDirectIndex(Collection[] collections): An abstract method for creating the direct index, the document index and the lexicon for the given collections. |
void | Indexer.index(Collection[] collections): Creates the data structures for a set of collections. |
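`Indexer` pairs a concrete `index(Collection[])` with an abstract `createDirectIndex(Collection[])`, which is the shape of the template-method pattern. A hypothetical sketch, assuming `index()` first builds the direct index and then inverts it (the real method may do additional bookkeeping). Collections are modelled as arrays of document ids, and `IndexerSketch` and `CountingIndexer` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical template-method sketch; collections are String[] of doc ids.
abstract class IndexerSketch {
    final List<String> log = new ArrayList<>();   // records the order of steps

    // Fixed driver: subclasses customise the steps, not their order.
    void index(String[][] collections) {
        createDirectIndex(collections);
        createInvertedIndex();
    }

    // Subclass hook, analogous to Indexer.createDirectIndex(Collection[]).
    abstract void createDirectIndex(String[][] collections);

    // Assumption: inversion runs once the direct index exists.
    void createInvertedIndex() {
        log.add("invert");
    }
}

// A trivial concrete indexer that only counts documents.
class CountingIndexer extends IndexerSketch {
    int documents = 0;

    @Override
    void createDirectIndex(String[][] collections) {
        for (String[] collection : collections) {
            documents += collection.length;
        }
        log.add("direct");
    }
}
```

The design keeps the sequencing in one place, so a subclass such as a block-aware indexer only changes how the direct index is built.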
Modifier and Type | Method and Description |
---|---|
void | BasicIndexer.createDirectIndex(Collection[] collections): Creates the direct index, the document index and the lexicon. |
void | BlockIndexer.createDirectIndex(Collection[] collections): Iterates through the documents of the given collections and creates the direct index, document index and lexicon, using information about blocks and possibly fields. |
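`BlockIndexer` differs from `BasicIndexer` by also recording where in each document a term occurs. The following hypothetical sketch builds one direct-index row per document, storing for each term its frequency and the ascending list of block numbers in which it appears; dropping the block list leaves roughly the frequency-only row a basic indexer needs. The `blockSize` parameter and the `DirectPosting` type are assumptions for illustration, not Terrier's on-disk format, and fields are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-term entry in a document's direct-index row.
class DirectPosting {
    int tf = 0;                                      // occurrences in the document
    final List<Integer> blocks = new ArrayList<>();  // distinct block numbers, ascending
}

class DirectIndexSketch {
    // Builds one row per document. blockSize is an assumed parameter: token
    // position p falls into block p / blockSize, so blockSize = 1 records positions.
    static List<Map<String, DirectPosting>> build(List<String[]> docs, int blockSize) {
        List<Map<String, DirectPosting>> rows = new ArrayList<>();
        for (String[] tokens : docs) {
            Map<String, DirectPosting> row = new TreeMap<>();
            for (int pos = 0; pos < tokens.length; pos++) {
                int block = pos / blockSize;
                DirectPosting p = row.computeIfAbsent(tokens[pos], t -> new DirectPosting());
                p.tf++;
                // positions only grow, so checking the last entry avoids duplicates
                if (p.blocks.isEmpty() || p.blocks.get(p.blocks.size() - 1) != block) {
                    p.blocks.add(block);
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With `blockSize = 1`, the document `a b a` yields `a` with frequency 2 in blocks 0 and 2, and `b` with frequency 1 in block 1; larger blocks trade positional precision for a smaller index.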
Modifier and Type | Method and Description |
---|---|
void | BasicSinglePassIndexer.createDirectIndex(Collection[] collections) |
void | ExtensibleSinglePassIndexer.createInvertedIndex(Collection[] collections): Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing). |
void | BasicSinglePassIndexer.createInvertedIndex(Collection[] collections): Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing). |
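The single-pass methods loop over every document, push each term through the Term Pipeline, and accumulate postings for the inverted file. A hypothetical, in-memory sketch of that flow, with the pipeline modelled as a chain of stages that each transform a term and pass it on or drop it. The `Stage` interface and stage classes are illustrative, not Terrier's `TermPipeline` API, and stemming is omitted. A real single-pass indexer would also need to flush partial postings to disk when memory runs low and merge them afterwards, which this sketch does not attempt.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical pipeline stage: transform a term and pass it on, or drop it.
interface Stage {
    void process(String term);
}

class LowercaseStage implements Stage {
    private final Stage next;
    LowercaseStage(Stage next) { this.next = next; }
    public void process(String term) { next.process(term.toLowerCase()); }
}

class StopwordStage implements Stage {
    private final Set<String> stopwords;
    private final Stage next;
    StopwordStage(Set<String> stopwords, Stage next) { this.stopwords = stopwords; this.next = next; }
    public void process(String term) {
        if (!stopwords.contains(term)) {
            next.process(term);   // stopwords are dropped here
        }
    }
}

// Final stage: accumulates postings (docid -> term frequency) per term.
class PostingSink implements Stage {
    final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
    int currentDoc;
    public void process(String term) {
        postings.computeIfAbsent(term, t -> new TreeMap<>()).merge(currentDoc, 1, Integer::sum);
    }
}

class SinglePassSketch {
    // One pass over the documents builds the whole in-memory inverted index.
    static Map<String, Map<Integer, Integer>> invert(List<String> docs, Set<String> stopwords) {
        PostingSink sink = new PostingSink();
        // A stemming stage would sit between stopping and the sink; omitted here.
        Stage pipeline = new LowercaseStage(new StopwordStage(stopwords, sink));
        for (int docid = 0; docid < docs.size(); docid++) {
            sink.currentDoc = docid;
            for (String token : docs.get(docid).split("\\s+")) {
                if (!token.isEmpty()) {
                    pipeline.process(token);
                }
            }
        }
        return sink.postings;
    }
}
```

Because lowercasing runs before stopping, `The` and `the` are both removed when the stopword set contains `the`, and `Cat` and `cat` share one posting list.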
Modifier and Type | Field and Description |
---|---|
protected Collection | CollectionRecordReader.documentCollection: The document collection currently being iterated through. |
Modifier and Type | Method and Description |
---|---|
protected abstract Collection | CollectionRecordReader.openCollectionSplit(int index): Opens a collection for the index-th part of the current split. |
protected Collection | FileCollectionRecordReader.openCollectionSplit(int index): Opens a collection on the next file. |
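`CollectionRecordReader` keeps the `documentCollection` currently being read and relies on subclasses to implement `openCollectionSplit(int index)` for each part of the split. A hypothetical sketch of that hand-off, assuming the reader advances to the next part only when the current collection is exhausted. Collections are modelled as iterators over document ids, and `SplitReaderSketch` and `ListSplitReader` are illustrative names, not the Hadoop `RecordReader` API.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical reader that walks one collection per part of a split.
abstract class SplitReaderSketch {
    protected Iterator<String> documentCollection;   // collection currently being iterated
    private int partIndex = -1;
    private final int partCount;

    SplitReaderSketch(int partCount) {
        this.partCount = partCount;
    }

    // Subclasses decide how the index-th part becomes a collection.
    protected abstract Iterator<String> openCollectionSplit(int index);

    // Returns the next document id, or null once every part is consumed.
    // Empty parts are skipped by the loop.
    String next() {
        while (documentCollection == null || !documentCollection.hasNext()) {
            if (partIndex + 1 >= partCount) {
                return null;
            }
            partIndex++;
            documentCollection = openCollectionSplit(partIndex);
        }
        return documentCollection.next();
    }
}

// Analogue of a file-based reader: each part is one "file" of document ids.
class ListSplitReader extends SplitReaderSketch {
    private final List<List<String>> files;

    ListSplitReader(List<List<String>> files) {
        super(files.size());
        this.files = files;
    }

    @Override
    protected Iterator<String> openCollectionSplit(int index) {
        return files.get(index).iterator();
    }
}
```

Keeping the "open the next part" decision in the base class means a file-based subclass only has to know how to turn one file into a collection.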
Terrier Information Retrieval Platform 4.1. Copyright © 2004-2015, University of Glasgow