Class BlockInverted2DirectIndexBuilder
- java.lang.Object
  - org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
    - org.terrier.structures.indexing.singlepass.BlockInverted2DirectIndexBuilder
public class BlockInverted2DirectIndexBuilder extends Inverted2DirectIndexBuilder
Creates a block direct index from a BlockInvertedIndex.
Properties:
- inverted2direct.processtokens - the total number of tokens to process in each iteration. Defaults to 50000000. Memory usage is more likely to depend on the number of pointers and the number of blocks; however, because the document index does not record these statistics per document, they cannot be estimated. Note that this default is lower than that of Inverted2DirectIndexBuilder.
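The per-iteration token budget can be pictured as a greedy split of the document index into consecutive document ranges. The sketch below is a hypothetical, self-contained illustration under that assumption, not Terrier's implementation: TokenBatchSketch, planBatches and the per-document lengths are made up, and only token counts are used, because (as noted above) pointer and block counts are not available per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Terrier code): plan document ranges so that each
// iteration processes at most about `processTokens` tokens.
public class TokenBatchSketch {

    // A range of documents [firstDocid, firstDocid + count) and its token total.
    static final class DocRange {
        final int firstDocid;
        final int count;
        final long tokens;

        DocRange(int firstDocid, int count, long tokens) {
            this.firstDocid = firstDocid;
            this.count = count;
            this.tokens = tokens;
        }

        @Override
        public String toString() {
            return "docs " + firstDocid + ".." + (firstDocid + count - 1) + " (" + tokens + " tokens)";
        }
    }

    // Greedily groups consecutive documents while the running token total fits the budget.
    static List<DocRange> planBatches(int[] docLengths, long processTokens) {
        List<DocRange> batches = new ArrayList<>();
        int first = 0;
        while (first < docLengths.length) {
            long tokens = 0;
            int count = 0;
            // Always take at least one document, even if it alone exceeds the budget.
            while (first + count < docLengths.length
                    && (count == 0 || tokens + docLengths[first + count] <= processTokens)) {
                tokens += docLengths[first + count];
                count++;
            }
            batches.add(new DocRange(first, count, tokens));
            first += count;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical document lengths and a tiny budget, for illustration only.
        int[] lengths = {40, 30, 50, 10, 90};
        for (DocRange r : planBatches(lengths, 100)) {
            System.out.println(r);
        }
    }
}
```

With a budget of 100, these lengths yield three ranges: documents 0..1 (70 tokens), 2..3 (60 tokens) and 4 on its own (90 tokens). In this sketch a smaller budget means more ranges, and so more passes over the inverted file, in exchange for less data held in memory per pass.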
- Since:
- 2.0
- Author:
- Craig Macdonald
Field Summary
Fields inherited from class org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
basicDirectIndexPostingIteratorClass, destinationStructure, directIndexClass, directIndexInputStreamClass, fieldCount, fieldDirectIndexPostingIteratorClass, index, logger, processTokens, saveTagInformation, sourceStructure
Constructor Summary
Constructor and description:
- BlockInverted2DirectIndexBuilder(IndexOnDisk i) - constructor
Method Summary
Instance methods (modifier and type, method, description):
- protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader() - Returns the SPIR implementation that should be used for reading the postings written earlier.
- protected org.terrier.structures.indexing.singlepass.Posting[] getPostings(int count) - Gets an array of posting objects of the specified size.
- protected long traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, org.terrier.structures.indexing.singlepass.Posting[] directPostings) - Traverses the inverted file, looking for all occurrences of documents in the given range.
Methods inherited from class org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
createDirectIndex, main, scanDocumentIndexForTokens
Constructor Detail
BlockInverted2DirectIndexBuilder
public BlockInverted2DirectIndexBuilder(IndexOnDisk i)
Constructor.
- Parameters:
- i - the index from which the block direct index is built
Method Detail
getPostings
protected org.terrier.structures.indexing.singlepass.Posting[] getPostings(int count)
Gets an array of posting objects of the specified size. These are used to hold the postings for a range of documents.
- Overrides:
- getPostings in class Inverted2DirectIndexBuilder
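An array of this kind needs one live accumulator per document in the range, each able to carry block information alongside term frequencies. The sketch below is a hypothetical illustration only: BlockDocPostings stands in for Terrier's Posting classes, and the convention that slot i holds document firstDocid + i is an assumption, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Terrier code) of a getPostings(count)-style factory.
public class PostingArraySketch {

    // Hypothetical accumulator for one document's direct postings with block ids.
    static final class BlockDocPostings {
        final List<Integer> termIds = new ArrayList<>();
        final List<Integer> freqs = new ArrayList<>();
        final List<int[]> blockIds = new ArrayList<>();

        void add(int termId, int tf, int[] blocks) {
            termIds.add(termId);
            freqs.add(tf);
            blockIds.add(blocks.clone()); // copy so the caller can reuse its buffer
        }

        int size() {
            return termIds.size();
        }
    }

    // One empty accumulator per document in the range (assumed: slot i = document firstDocid + i).
    static BlockDocPostings[] getPostings(int count) {
        BlockDocPostings[] postings = new BlockDocPostings[count];
        for (int i = 0; i < count; i++) {
            postings[i] = new BlockDocPostings(); // new T[count] alone would leave every slot null
        }
        return postings;
    }

    public static void main(String[] args) {
        BlockDocPostings[] postings = getPostings(3);
        postings[0].add(7, 2, new int[] {1, 4});
        System.out.println(postings.length + " accumulators, first has " + postings[0].size() + " posting(s)");
    }
}
```

Pre-filling every slot keeps the traversal loop free of null checks; the cost is allocating an accumulator even for documents that turn out to have no postings in the range.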
getPostingReader
protected org.terrier.structures.indexing.singlepass.PostingInRun getPostingReader()
Returns the SPIR implementation that should be used for reading the postings written earlier.
- Overrides:
- getPostingReader in class Inverted2DirectIndexBuilder
traverseInvertedFile
protected long traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int countDocuments, org.terrier.structures.indexing.singlepass.Posting[] directPostings) throws java.io.IOException
Traverses the inverted file, looking for all occurrences of documents in the given range.
- Overrides:
- traverseInvertedFile in class Inverted2DirectIndexBuilder
- Returns:
- the number of tokens found in all of the documents.
- Throws:
- java.io.IOException
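The traversal can be sketched in memory to show the idea. This is a hypothetical illustration, not Terrier's code: the real method streams postings from a PostingIndexInputStream into Posting objects, whereas here the inverted index is a plain list of posting lists indexed by termid, each assumed sorted by docid. InvPosting and DirectPosting are made-up classes; the return value mirrors the documented one, the total number of tokens found in the range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory sketch (not Terrier code) of inverted-to-direct traversal for one document range.
public class InvertedToDirectSketch {

    // One inverted posting: document, term frequency, block ids.
    static final class InvPosting {
        final int docid;
        final int tf;
        final int[] blocks;

        InvPosting(int docid, int tf, int... blocks) {
            this.docid = docid;
            this.tf = tf;
            this.blocks = blocks;
        }
    }

    // One direct posting: term, term frequency, block ids.
    static final class DirectPosting {
        final int termId;
        final int tf;
        final int[] blocks;

        DirectPosting(int termId, int tf, int[] blocks) {
            this.termId = termId;
            this.tf = tf;
            this.blocks = blocks;
        }

        @Override
        public String toString() {
            return "term " + termId + " tf=" + tf + " blocks=" + Arrays.toString(blocks);
        }
    }

    // Copies every posting with docid in [firstDocid, firstDocid + countDocuments)
    // into that document's direct list; returns the total tf copied (tokens found).
    static long traverseInvertedFile(List<List<InvPosting>> inverted, int firstDocid,
                                     int countDocuments, List<List<DirectPosting>> direct) {
        long tokens = 0;
        int endDocid = firstDocid + countDocuments; // exclusive
        for (int termId = 0; termId < inverted.size(); termId++) {
            for (InvPosting p : inverted.get(termId)) {
                if (p.docid < firstDocid) continue; // before the range
                if (p.docid >= endDocid) break;     // lists are sorted by docid
                direct.get(p.docid - firstDocid).add(new DirectPosting(termId, p.tf, p.blocks));
                tokens += p.tf;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two terms over documents 0..2; arguments are (docid, tf, block ids...).
        List<List<InvPosting>> inverted = List.of(
                List.of(new InvPosting(0, 2, 1, 5), new InvPosting(2, 1, 3)),
                List.of(new InvPosting(1, 4, 0, 1, 2, 3), new InvPosting(2, 3, 4, 6, 7)));
        List<List<DirectPosting>> direct = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            direct.add(new ArrayList<>());
        }
        long tokens = traverseInvertedFile(inverted, 1, 2, direct); // documents 1 and 2
        System.out.println("tokens = " + tokens);
        for (int i = 0; i < direct.size(); i++) {
            System.out.println("doc " + (1 + i) + ": " + direct.get(i));
        }
    }
}
```

Here the range covering documents 1 and 2 collects 8 tokens. Because terms are visited in termid order, each document's direct list comes out sorted by termid without a separate sort; the price is one pass over the whole inverted index per document range.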