public class Inverted2DirectIndexBuilder extends Object
Algorithm:
For a selection of document ids
-(Scan the inverted index looking for postings with these document ids)
-For each term in the inverted index
--Select required postings from all the postings of that term
--Add these to the posting objects that represent each document
-For each posting object
--Write out the postings for that document
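The steps above can be illustrated with a minimal, self-contained sketch. It does not use the Terrier API: the inverted index is assumed to be an in-memory list indexed by termid, each posting is a hypothetical `int[]{id, frequency}` pair, and the names `InvertedToDirectSketch`, `invert`, `postings` and `sampleInvertedIndex` are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class InvertedToDirectSketch {

    /** Builds a posting list from a flat sequence of (id, frequency) pairs. */
    static List<int[]> postings(int... flat) {
        List<int[]> list = new ArrayList<>();
        for (int i = 0; i + 1 < flat.length; i += 2) {
            list.add(new int[] {flat[i], flat[i + 1]});
        }
        return list;
    }

    /** A tiny hypothetical inverted index: position in the list is the termid. */
    static List<List<int[]>> sampleInvertedIndex() {
        List<List<int[]>> inverted = new ArrayList<>();
        inverted.add(postings(0, 2, 2, 1)); // term 0 occurs in doc 0 (tf 2) and doc 2 (tf 1)
        inverted.add(postings(1, 1, 2, 3)); // term 1 occurs in doc 1 (tf 1) and doc 2 (tf 3)
        inverted.add(postings(0, 1));       // term 2 occurs in doc 0 (tf 1)
        return inverted;
    }

    /**
     * For the docids in [firstDocid, firstDocid + countDocuments), collects
     * (termid, frequency) pairs by scanning every term's posting list.
     */
    static List<List<int[]>> invert(List<List<int[]>> inverted, int firstDocid, int countDocuments) {
        // One posting object (here: a list) per selected document.
        List<List<int[]>> direct = new ArrayList<>();
        for (int i = 0; i < countDocuments; i++) {
            direct.add(new ArrayList<>());
        }
        // For each term, select the postings that fall in the docid range.
        for (int termId = 0; termId < inverted.size(); termId++) {
            for (int[] posting : inverted.get(termId)) {
                int offset = posting[0] - firstDocid;
                if (offset >= 0 && offset < countDocuments) {
                    direct.get(offset).add(new int[] {termId, posting[1]});
                }
            }
        }
        return direct;
    }

    public static void main(String[] args) {
        // "Write out" the postings of each selected document.
        List<List<int[]>> direct = invert(sampleInvertedIndex(), 0, 2);
        for (int doc = 0; doc < direct.size(); doc++) {
            StringBuilder sb = new StringBuilder("doc " + doc + ":");
            for (int[] p : direct.get(doc)) {
                sb.append(" (term ").append(p[0]).append(", tf ").append(p[1]).append(")");
            }
            System.out.println(sb);
        }
    }
}
```

Because terms are visited in increasing termid order, each document's list comes out sorted by termid without an extra sort step.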
Notes:
This algorithm assumes that termids start at 0 and are strictly increasing. This assumption holds true
only for inverted indices generated by the single pass indexing method.
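As a concrete reading of this assumption, the following sketch checks a sequence of termids. It assumes the termids are listed in the order in which their posting lists are stored; the class and method names are hypothetical and not part of Terrier.

```java
class TermIdOrderSketch {

    /**
     * Returns true if the termids start at 0 and each one is strictly
     * greater than the one before it. An empty sequence is accepted.
     */
    static boolean startsAtZeroAndStrictlyIncreasing(int[] termIds) {
        if (termIds.length == 0) {
            return true;
        }
        if (termIds[0] != 0) {
            return false;
        }
        for (int i = 1; i < termIds.length; i++) {
            if (termIds[i] <= termIds[i - 1]) {
                return false;
            }
        }
        return true;
    }
}
```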
Properties:
Modifier and Type | Field and Description |
---|---|
protected String | basicDirectIndexPostingIteratorClass |
protected String | destinationStructure |
protected String | directIndexClass: Class to read the generated direct index |
protected String | directIndexInputStreamClass: Class to read the generated inverted index |
protected int | fieldCount: The number of different fields that are used for indexing field information. |
protected String | fieldDirectIndexPostingIteratorClass |
protected IndexOnDisk | index: index currently being used |
protected static org.slf4j.Logger | logger: The logger used |
protected long | processTokens: number of tokens limit per iteration |
protected boolean | saveTagInformation: Indicates whether field information is used. |
protected String | sourceStructure |
Constructor and Description |
---|
Inverted2DirectIndexBuilder(IndexOnDisk i): Construct a new instance of this builder class |
Modifier and Type | Method and Description |
---|---|
void | createDirectIndex(): create the direct index when the collection contains an existing inverted index |
protected PostingInRun | getPostingReader(): returns the SPIR implementation that should be used for reading the postings written earlier |
protected Posting[] | getPostings(int count): get an array of posting objects of the specified size. |
static void | main(String[] args) |
protected static int | scanDocumentIndexForTokens(long _processTokens, Iterator<DocumentIndexEntry> docidStream): Iterates through the document index until it has reached the given number of terms |
protected long | traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, Posting[] directPostings): traverse the inverted file, looking for all occurrences of documents in the given range |
protected static final org.slf4j.Logger logger
protected IndexOnDisk index
protected final int fieldCount
protected final boolean saveTagInformation
protected String directIndexClass
protected String directIndexInputStreamClass
protected String basicDirectIndexPostingIteratorClass
protected String fieldDirectIndexPostingIteratorClass
protected long processTokens
protected String sourceStructure
protected String destinationStructure
public Inverted2DirectIndexBuilder(IndexOnDisk i)
public void createDirectIndex()
protected Posting[] getPostings(int count)
protected PostingInRun getPostingReader()
protected long traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, Posting[] directPostings) throws IOException
Throws:
IOException
protected static int scanDocumentIndexForTokens(long _processTokens, Iterator<DocumentIndexEntry> docidStream) throws IOException
Parameters:
_processTokens - Number of tokens to stop reading the document index after
docidStream - the document index stream to read
Throws:
IOException
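The token-limited scan described for scanDocumentIndexForTokens can be sketched as follows. This is not the Terrier implementation: document index entries are replaced by plain document lengths, the names `TokenBatchSketch`, `scanForTokens` and `batchSizes` are hypothetical, and the choice to stop as soon as the running total reaches the limit is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TokenBatchSketch {

    /**
     * Consumes document lengths from the stream until the running token
     * total reaches tokenLimit (assumed stopping rule) or the stream ends.
     * Returns the number of documents consumed.
     */
    static int scanForTokens(long tokenLimit, Iterator<Integer> docLengths) {
        long tokens = 0;
        int documents = 0;
        while (docLengths.hasNext() && tokens < tokenLimit) {
            tokens += docLengths.next();
            documents++;
        }
        return documents;
    }

    /**
     * Splits a collection into consecutive docid ranges, one per iteration,
     * and returns the number of documents in each range.
     */
    static List<Integer> batchSizes(long tokenLimit, List<Integer> docLengths) {
        if (tokenLimit <= 0) {
            throw new IllegalArgumentException("tokenLimit must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        Iterator<Integer> stream = docLengths.iterator();
        int firstDocid = 0;
        while (stream.hasNext()) {
            int countDocuments = scanForTokens(tokenLimit, stream);
            // A real builder would now process docids
            // [firstDocid, firstDocid + countDocuments) against the inverted file.
            sizes.add(countDocuments);
            firstDocid += countDocuments;
        }
        return sizes;
    }
}
```

Bounding each iteration by a token count rather than a document count keeps the number of postings held in memory roughly constant, whatever the document lengths are.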
Terrier Information Retrieval Platform 5.1. Copyright © 2004-2019, University of Glasgow