Inverted2DirectIndexBuilder (Terrier 4.0 API)

java.lang.Object
- org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder

Direct Known Subclasses:

BlockInverted2DirectIndexBuilder
```
public class Inverted2DirectIndexBuilder
extends Object
```
Create a direct index from an InvertedIndex. The algorithm is similar to that followed by InvertedIndexBuilder. To summarise, InvertedIndexBuilder builds an InvertedIndex from a DirectIndex. This class does the opposite, building a DirectIndex from an InvertedIndex.
Algorithm:
- For a selection of document ids
  - (Scan the inverted index looking for postings with these document ids)
  - For each term in the inverted index
    - Select required postings from all the postings of that term
    - Add these to posting objects that represent each document
  - For each posting object
    - Write out the postings for that document
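The steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Terrier's implementation: the class `Inverted2DirectSketch`, the `Posting` pair and the method `invertRange` are hypothetical names, the index is held in in-memory lists rather than on disk, and the batch size is fixed instead of being derived from a token budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Inverted2DirectSketch {

    /** An (id, frequency) pair: id is a docid in inverted postings, a termid in direct postings. */
    static final class Posting {
        final int id;
        final int freq;
        Posting(int id, int freq) { this.id = id; this.freq = freq; }
        @Override public String toString() { return "(" + id + "," + freq + ")"; }
    }

    /**
     * Builds the direct postings for documents firstDocid..lastDocid (inclusive).
     * inverted.get(termid) holds that term's postings, so termids start at 0
     * and are visited in increasing order.
     */
    static List<List<Posting>> invertRange(List<List<Posting>> inverted,
                                           int firstDocid, int lastDocid) {
        // One posting object (here, a list) per document in the batch.
        List<List<Posting>> direct = new ArrayList<>();
        for (int docid = firstDocid; docid <= lastDocid; docid++) {
            direct.add(new ArrayList<>());
        }
        // Scan every term, keeping only the postings that fall inside the batch.
        for (int termid = 0; termid < inverted.size(); termid++) {
            for (Posting p : inverted.get(termid)) {
                if (p.id >= firstDocid && p.id <= lastDocid) {
                    direct.get(p.id - firstDocid).add(new Posting(termid, p.freq));
                }
            }
        }
        return direct;
    }

    public static void main(String[] args) {
        // Toy inverted index over 3 documents and 3 terms (assumed data).
        List<List<Posting>> inverted = Arrays.asList(
            Arrays.asList(new Posting(0, 2), new Posting(2, 1)),
            Arrays.asList(new Posting(1, 3)),
            Arrays.asList(new Posting(0, 1), new Posting(1, 1), new Posting(2, 4)));
        int numDocs = 3;
        int batchSize = 2; // fixed here for simplicity
        for (int first = 0; first < numDocs; first += batchSize) {
            int last = Math.min(first + batchSize, numDocs) - 1;
            List<List<Posting>> direct = invertRange(inverted, first, last);
            for (int i = 0; i < direct.size(); i++) {
                // "Write out" step: print instead of writing to an index file.
                System.out.println("doc " + (first + i) + " -> " + direct.get(i));
            }
        }
    }
}
```

Because terms are visited in increasing termid order, each document's postings come out already sorted by termid, so no per-document sort is needed. The cost is one full scan of the inverted index per batch of documents.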
Notes:
This algorithm assumes that termids start at 0 and are strictly increasing. This assumption holds true only for inverted indices generated by the single pass indexing method.
Properties:
1. inverted2direct.processtokens - the total number of tokens to attempt in each iteration. Defaults to 100000000. Memory usage is more closely linked to the number of pointers; however, because the document index does not record the number of unique terms in each document, the number of pointers cannot be calculated.
Since:

2.0

Author:

Craig Macdonald

Field Summary

Fields
Modifier and Type	Field and Description
`protected String`	`basicDirectIndexPostingIteratorClass`
`protected String`	`destinationStructure`
`protected String`	`directIndexClass` Class to read the generated direct index
`protected String`	`directIndexInputStreamClass` Class to read the generated inverted index
`protected int`	`fieldCount` The number of different fields that are used for indexing field information.
`protected String`	`fieldDirectIndexPostingIteratorClass`
`protected IndexOnDisk`	`index` index currently being used
`protected static org.apache.log4j.Logger`	`logger` The logger used
`protected long`	`processTokens` number of tokens limit per iteration
`protected boolean`	`saveTagInformation` Indicates whether field information is used.
`protected String`	`sourceStructure`

Constructor Summary

Constructors
Constructor and Description
`Inverted2DirectIndexBuilder(IndexOnDisk i)` Construct a new instance of this builder class

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`createDirectIndex()` create the direct index when the collection contains an existing inverted index
`protected PostingInRun`	`getPostingReader()` returns the SPIR implementation that should be used for reading the postings written earlier
`protected Posting[]`	`getPostings(int count)` get an array of posting objects of the specified size.
`static void`	`main(String[] args)` main
`protected int`	`scanDocumentIndexForTokens(long _processTokens, Iterator<DocumentIndexEntry> docidStream)` Iterates through the document index until it has reached the given number of tokens
`protected long`	`traverseInvertedFile(PostingIndexInputStream iiis, int firstDocid, int lastDocid, Posting[] directPostings)` traverse the inverted file, looking for all occurrences of documents in the given range

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - logger
```
protected static final org.apache.log4j.Logger logger
```
    The logger used
  - index
```
protected IndexOnDisk index
```
    index currently being used
  - fieldCount
```
protected final int fieldCount
```
    The number of different fields that are used for indexing field information.
  - saveTagInformation
```
protected final boolean saveTagInformation
```
    Indicates whether field information is used.
  - directIndexClass
```
protected String directIndexClass
```
    Class to read the generated direct index
  - directIndexInputStreamClass
```
protected String directIndexInputStreamClass
```
    Class to read the generated inverted index
  - basicDirectIndexPostingIteratorClass
```
protected String basicDirectIndexPostingIteratorClass
```
  - fieldDirectIndexPostingIteratorClass
```
protected String fieldDirectIndexPostingIteratorClass
```
  - processTokens
```
protected long processTokens
```
    number of tokens limit per iteration
  - sourceStructure
```
protected String sourceStructure
```
  - destinationStructure
```
protected String destinationStructure
```
- Constructor Detail
  - Inverted2DirectIndexBuilder
```
public Inverted2DirectIndexBuilder(IndexOnDisk i)
```
    Construct a new instance of this builder class
- Method Detail
  - createDirectIndex
```
public void createDirectIndex()
```
    create the direct index when the collection contains an existing inverted index
  - getPostings
```
protected Posting[] getPostings(int count)
```
    get an array of posting objects of the specified size. These will be used to hold the postings for a range of documents
  - getPostingReader
```
protected PostingInRun getPostingReader()
```
    returns the SPIR implementation that should be used for reading the postings written earlier
  - traverseInvertedFile
```
protected long traverseInvertedFile(PostingIndexInputStream iiis,
                        int firstDocid,
                        int lastDocid,
                        Posting[] directPostings)
                             throws IOException
```
    traverse the inverted file, looking for all occurrences of documents in the given range
    
    Returns:
    the number of tokens found in all of the documents.
    
    Throws:
    
    IOException
  - scanDocumentIndexForTokens
```
protected int scanDocumentIndexForTokens(long _processTokens,
                             Iterator<DocumentIndexEntry> docidStream)
                                  throws IOException
```
    Iterates through the document index until it has reached the given number of tokens
    
    Parameters:
    _processTokens - Number of tokens after which to stop reading the document index
    docidStream - the document index stream to read
    
    Returns:
    the number of documents to process
    
    Throws:
    
    IOException
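
A token budget of this kind can be sketched as follows. This is an illustration only, not the Terrier source: `TokenBudgetSketch` and `countDocsWithinBudget` are hypothetical names, document lengths are supplied as plain integers instead of `DocumentIndexEntry` objects, and counting the document that crosses the budget is a choice made in this sketch so that every call makes progress.

```java
import java.util.Arrays;
import java.util.Iterator;

class TokenBudgetSketch {

    /**
     * Consumes document lengths from the stream until the running token total
     * exceeds tokenBudget, and returns how many documents were consumed.
     * The document that crosses the budget is included in the count.
     */
    static int countDocsWithinBudget(long tokenBudget, Iterator<Integer> docLengths) {
        long tokens = 0;
        int docs = 0;
        while (docLengths.hasNext()) {
            tokens += docLengths.next();
            docs++;
            if (tokens > tokenBudget) {
                break;
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Assumed document lengths, in tokens.
        Iterator<Integer> lengths = Arrays.asList(40, 50, 30, 20).iterator();
        int batch;
        // The same iterator is reused, so each call picks up where the last one stopped.
        while ((batch = countDocsWithinBudget(100, lengths)) > 0) {
            System.out.println("next batch: " + batch + " document(s)");
        }
    }
}
```

With a budget of 100 tokens and lengths 40, 50, 30, 20, the first call consumes three documents (120 tokens) and the second consumes the remaining one.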
  - main
```
public static void main(String[] args)
                 throws Exception
```
    main
    
    Parameters:
    args -
    
    Throws:
    
    Exception
