Class Inverted2DirectIndexBuilder
- java.lang.Object
-
- org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
-
- Direct Known Subclasses:
BlockInverted2DirectIndexBuilder
public class Inverted2DirectIndexBuilder extends java.lang.Object
Create a direct index from an InvertedIndex. The algorithm is similar to that followed by InvertedIndexBuilder. To summarise, InvertedIndexBuilder builds an InvertedIndex from a DirectIndex. This class does the opposite, building a DirectIndex from an InvertedIndex.
Algorithm:
For a selection of document ids
- (Scan the inverted index looking for postings with these document ids)
- For each term in the inverted index
-- Select required postings from all the postings of that term
-- Add these to posting objects that represent each document
- For each posting object
-- Write out the postings for that document
Notes:
This algorithm assumes that termids start at 0 and are strictly increasing. This assumption holds true only for inverted indices generated by the single pass indexing method.
Properties:
- inverted2direct.processtokens - total number of tokens to attempt each iteration. Defaults to 100000000. Memory usage is more likely to be linked to the number of pointers; however, because the document index does not record the number of unique terms in each document, the number of pointers cannot be calculated.
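As an illustration of the default described above, the following minimal sketch resolves the property from a plain java.util.Properties object. The class and method names are hypothetical, and the sketch does not show how Terrier itself reads its configuration; only the property key and the default value come from this page.

```java
import java.util.Properties;

// Hypothetical helper, not part of Terrier: shows the "value or default" lookup only.
class ProcessTokensPropertySketch {
    static final String KEY = "inverted2direct.processtokens";
    static final long DEFAULT_TOKENS = 100000000L; // default stated in the class description

    /** Returns the configured token budget per iteration, or the default if the key is unset. */
    static long resolve(Properties props) {
        return Long.parseLong(props.getProperty(KEY, Long.toString(DEFAULT_TOKENS)));
    }
}
```

Lowering the value should reduce the number of tokens held in memory per iteration, at the cost of more passes over the inverted index.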
- Since:
- 2.0
- Author:
- Craig Macdonald
-
-
Field Summary
Fields
Modifier and Type Field - Description
protected java.lang.String basicDirectIndexPostingIteratorClass
protected java.lang.String destinationStructure
protected java.lang.String directIndexClass - Class to read the generated direct index
protected java.lang.String directIndexInputStreamClass - Class to read the generated inverted index
protected int fieldCount - The number of different fields that are used for indexing field information.
protected java.lang.String fieldDirectIndexPostingIteratorClass
protected IndexOnDisk index - index currently being used
protected static org.slf4j.Logger logger - The logger used
protected long processTokens - number of tokens limit per iteration
protected boolean saveTagInformation - Indicates whether field information is used.
protected java.lang.String sourceStructure
-
Constructor Summary
Constructors
Constructor - Description
Inverted2DirectIndexBuilder(IndexOnDisk i) - Construct a new instance of this builder class
-
Method Summary
Methods
Modifier and Type Method - Description
void createDirectIndex() - create the direct index when the collection contains an existing inverted index
protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader() - returns the SPIR implementation that should be used for reading the postings written earlier
protected org.terrier.structures.indexing.singlepass.Posting[] getPostings(int count) - get an array of posting objects of the specified size.
static void main(java.lang.String[] args)
protected static int scanDocumentIndexForTokens(long _processTokens, java.util.Iterator<DocumentIndexEntry> docidStream) - Iterates through the document index, until it has reached the given number of tokens
protected long traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, org.terrier.structures.indexing.singlepass.Posting[] directPostings) - traverse the inverted file, looking for all occurrences of documents in the given range
-
-
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
The logger used
-
index
protected IndexOnDisk index
index currently being used
-
fieldCount
protected final int fieldCount
The number of different fields that are used for indexing field information.
-
saveTagInformation
protected final boolean saveTagInformation
Indicates whether field information is used.
-
directIndexClass
protected java.lang.String directIndexClass
Class to read the generated direct index
-
directIndexInputStreamClass
protected java.lang.String directIndexInputStreamClass
Class to read the generated inverted index
-
basicDirectIndexPostingIteratorClass
protected java.lang.String basicDirectIndexPostingIteratorClass
-
fieldDirectIndexPostingIteratorClass
protected java.lang.String fieldDirectIndexPostingIteratorClass
-
processTokens
protected long processTokens
number of tokens limit per iteration
-
sourceStructure
protected java.lang.String sourceStructure
-
destinationStructure
protected java.lang.String destinationStructure
-
-
Constructor Detail
-
Inverted2DirectIndexBuilder
public Inverted2DirectIndexBuilder(IndexOnDisk i)
Construct a new instance of this builder class
-
-
Method Detail
-
createDirectIndex
public void createDirectIndex()
create the direct index when the collection contains an existing inverted index
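The class description outlines the batched inversion that this method carries out. The following is a minimal, self-contained sketch of that idea using plain arrays and lists. It is not Terrier's implementation: the class name, the method name, the array-based index layout and the fixed batch size are all assumptions made for illustration (the real builder sizes each batch by token count and writes postings to disk).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of inverted-to-direct conversion; not Terrier code.
class InvertedToDirectSketch {
    /**
     * postingDocids[t] and postingFreqs[t] hold the postings of termid t
     * (termids start at 0 and increase, as the class notes assume).
     * Returns, for each docid, a list of {termid, frequency} pairs.
     */
    static List<List<int[]>> buildDirect(int[][] postingDocids, int[][] postingFreqs,
                                         int numDocs, int batchSize) {
        List<List<int[]>> direct = new ArrayList<>();
        // For a selection (batch) of document ids...
        for (int first = 0; first < numDocs; first += batchSize) {
            int count = Math.min(batchSize, numDocs - first);
            // One in-memory posting object per document in the batch.
            List<List<int[]>> batch = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                batch.add(new ArrayList<>());
            }
            // Scan the whole inverted index, term by term.
            for (int termId = 0; termId < postingDocids.length; termId++) {
                for (int p = 0; p < postingDocids[termId].length; p++) {
                    int docid = postingDocids[termId][p];
                    // Select only the postings that fall in the current batch.
                    if (docid >= first && docid < first + count) {
                        batch.get(docid - first).add(new int[] {termId, postingFreqs[termId][p]});
                    }
                }
            }
            // "Write out" the postings for each document (here: append to the result).
            direct.addAll(batch);
        }
        return direct;
    }
}
```

Because terms are visited in increasing termid order, each document's list comes out sorted by termid without an extra sort, which is why the assumption about termids matters.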
-
getPostings
protected org.terrier.structures.indexing.singlepass.Posting[] getPostings(int count)
Get an array of posting objects of the specified size. These will be used to hold the postings for a range of documents.
-
getPostingReader
protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader()
returns the SPIR implementation that should be used for reading the postings written earlier
-
traverseInvertedFile
protected long traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, org.terrier.structures.indexing.singlepass.Posting[] directPostings) throws java.io.IOException
traverse the inverted file, looking for all occurrences of documents in the given range
- Returns:
- the number of tokens found in all of the documents.
- Throws:
java.io.IOException
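To make the range selection and the return value concrete, here is a small sketch that filters postings to a docid range and totals the tokens it finds. It is a hypothetical stand-in using plain arrays rather than PostingIndexInputStream and Posting objects, and it assumes each posting list is sorted by docid.

```java
// Hypothetical stand-in for the traversal step; not Terrier code.
class TraverseRangeSketch {
    /**
     * docids[t] and freqs[t] hold the postings of termid t, sorted by docid (assumption).
     * Adds each in-range frequency to tokensPerDoc[docid - firstDocid] and
     * returns the total number of tokens found for documents in the range.
     */
    static long traverse(int[][] docids, int[][] freqs, int firstDocid,
                         int countDocuments, long[] tokensPerDoc) {
        long tokens = 0;
        int lastDocid = firstDocid + countDocuments - 1;
        for (int t = 0; t < docids.length; t++) {
            for (int p = 0; p < docids[t].length; p++) {
                int docid = docids[t][p];
                if (docid < firstDocid) {
                    continue; // before the range: skip this posting
                }
                if (docid > lastDocid) {
                    break; // past the range: the rest of this sorted list is irrelevant
                }
                tokensPerDoc[docid - firstDocid] += freqs[t][p];
                tokens += freqs[t][p];
            }
        }
        return tokens;
    }
}
```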
-
scanDocumentIndexForTokens
protected static int scanDocumentIndexForTokens(long _processTokens, java.util.Iterator<DocumentIndexEntry> docidStream) throws java.io.IOException
Iterates through the document index, until it has reached the given number of tokens
- Parameters:
_processTokens - Number of tokens to stop reading the document index after
docidStream - the document index stream to read
- Returns:
- the number of documents to process
- Throws:
java.io.IOException
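The batching idea behind this method can be sketched with an iterator of document lengths standing in for the DocumentIndexEntry stream. The class name, the use of Integer lengths and the exact stopping rule (the document that reaches the budget is included in the batch) are assumptions for illustration; the actual method may treat the boundary document differently.

```java
import java.util.Iterator;

// Hypothetical illustration of token-budgeted batching; not Terrier code.
class ScanForTokensSketch {
    /**
     * Consumes document lengths until the running token total reaches processTokens
     * or the stream ends, and returns how many documents were consumed.
     */
    static int scan(long processTokens, Iterator<Integer> docLengths) {
        long tokens = 0;
        int docs = 0;
        while (docLengths.hasNext() && tokens < processTokens) {
            tokens += docLengths.next(); // length of the next document, in tokens
            docs++;
        }
        return docs;
    }
}
```

Calling the method repeatedly on the same iterator yields consecutive batches, each holding roughly processTokens tokens, until the iterator is exhausted.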
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
- Throws:
java.lang.Exception
-
-