org.terrier.structures.indexing.singlepass
Class Inverted2DirectIndexBuilder

java.lang.Object
  extended by org.terrier.structures.indexing.singlepass.Inverted2DirectIndexBuilder
Direct Known Subclasses:
BlockInverted2DirectIndexBuilder

public class Inverted2DirectIndexBuilder
extends java.lang.Object

Create a direct index from an InvertedIndex. The algorithm is similar to that followed by InvertedIndexBuilder. To summarise, InvertedIndexBuilder builds an InvertedIndex from a DirectIndex. This class does the opposite, building a DirectIndex from an InvertedIndex.

Algorithm:
For a selection of document ids
  (Scan the inverted index looking for postings with these document ids)
  For each term in the inverted index
    Select required postings from all the postings of that term
    Add these to posting objects that represent each document
  For each posting object
    Write out the postings for that document
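
The steps above can be sketched on in-memory lists. This is a toy illustration only, not Terrier code: the class name DirectFromInvertedSketch, the method invert, the int[] {id, tf} posting representation and the docsPerPass parameter are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch (not Terrier code): build per-document postings from per-term postings. */
class DirectFromInvertedSketch {

    /**
     * inverted.get(termid) holds that term's postings as {docid, tf} pairs.
     * Returns a list where element docid holds that document's {termid, tf} pairs.
     * docsPerPass (hypothetical) is the size of each selected docid range.
     */
    static List<List<int[]>> invert(List<List<int[]>> inverted, int numDocs, int docsPerPass) {
        List<List<int[]>> direct = new ArrayList<>();
        for (int first = 0; first < numDocs; first += docsPerPass) {
            int last = Math.min(first + docsPerPass, numDocs) - 1;
            // One posting object per document in the selected range.
            List<List<int[]>> range = new ArrayList<>();
            for (int d = first; d <= last; d++) {
                range.add(new ArrayList<>());
            }
            // Scan every term and keep only the postings whose docid is in the range.
            for (int termid = 0; termid < inverted.size(); termid++) {
                for (int[] posting : inverted.get(termid)) {
                    int docid = posting[0];
                    if (docid >= first && docid <= last) {
                        range.get(docid - first).add(new int[] {termid, posting[1]});
                    }
                }
            }
            // "Write out" the finished documents of this range.
            direct.addAll(range);
        }
        return direct;
    }

    public static void main(String[] args) {
        List<List<int[]>> inverted = List.of(
            List.of(new int[] {0, 2}, new int[] {2, 1}),   // term 0
            List.of(new int[] {1, 3}),                     // term 1
            List.of(new int[] {0, 1}, new int[] {1, 1}));  // term 2
        List<List<int[]>> direct = invert(inverted, 3, 2);
        for (int docid = 0; docid < direct.size(); docid++) {
            StringBuilder sb = new StringBuilder("doc " + docid + ":");
            for (int[] p : direct.get(docid)) {
                sb.append(" (term ").append(p[0]).append(", tf ").append(p[1]).append(")");
            }
            System.out.println(sb);
        }
    }
}
```

Because terms are visited in increasing termid order, each document's postings come out already sorted by termid, so no per-document sort is needed in this sketch.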

Notes:
This algorithm assumes that termids start at 0 and are strictly increasing. This assumption holds true only for inverted indices generated by the single pass indexing method.

Properties:

  1. inverted2direct.processtokens - total number of tokens to attempt in each iteration. Defaults to 100000000. Memory usage is more likely to be linked to the number of pointers; however, because the document index does not record the number of unique terms in each document, the number of pointers cannot be calculated.
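
As a rough illustration of how a property with this name and default could be resolved, the sketch below uses java.util.Properties as a stand-in. How Terrier itself loads its configuration is not shown on this page, so the class and method names here (ProcessTokensProperty, processTokens) are hypothetical.

```java
import java.util.Properties;

/** Sketch only: resolve the token limit from a Properties object, with the documented default. */
class ProcessTokensProperty {

    static final String KEY = "inverted2direct.processtokens";
    static final String DEFAULT = "100000000";

    /** Returns the configured token limit, or the default when the property is unset. */
    static long processTokens(Properties props) {
        return Long.parseLong(props.getProperty(KEY, DEFAULT));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(processTokens(props));  // default value
        props.setProperty(KEY, "50000000");        // smaller limit, e.g. for a smaller heap
        System.out.println(processTokens(props));
    }
}
```

A smaller value should mean fewer documents per iteration and therefore more passes over the inverted index.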

Since:
2.0
Author:
Craig Macdonald

Field Summary
protected  java.lang.String basicDirectIndexPostingIteratorClass
           
protected  java.lang.String destinationStructure
           
protected  java.lang.String directIndexClass
          Class to read the generated direct index
protected  java.lang.String directIndexInputStreamClass
          Class to read the generated direct index as a stream
protected  int fieldCount
          The number of different fields that are used for indexing field information.
protected  java.lang.String fieldDirectIndexPostingIteratorClass
           
protected  Index index
          index currently being used
protected static org.apache.log4j.Logger logger
          The logger used
protected  long processTokens
          limit on the number of tokens processed per iteration
protected  boolean saveTagInformation
          Indicates whether field information is used.
protected  java.lang.String sourceStructure
           
 
Constructor Summary
Inverted2DirectIndexBuilder(Index i)
          Construct a new instance of this builder class
 
Method Summary
 void createDirectIndex()
          create the direct index when the collection contains an existing inverted index
protected  PostingInRun getPostingReader()
          returns the SPIR implementation that should be used for reading the postings written earlier
protected  Posting[] getPostings(int count)
          get an array of posting objects of the specified size.
static void main(java.lang.String[] args)
          main
protected  int scanDocumentIndexForTokens(long _processTokens, java.util.Iterator<DocumentIndexEntry> docidStream)
          Iterates through the document index, until it has reached the given number of tokens
protected  long traverseInvertedFile(InvertedIndexInputStream iiis, int firstDocid, int lastDocid, Posting[] directPostings)
          traverse the inverted file, looking for all occurrences of documents in the given range
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
The logger used


index

protected Index index
index currently being used


fieldCount

protected final int fieldCount
The number of different fields that are used for indexing field information.


saveTagInformation

protected final boolean saveTagInformation
Indicates whether field information is used.


directIndexClass

protected java.lang.String directIndexClass
Class to read the generated direct index


directIndexInputStreamClass

protected java.lang.String directIndexInputStreamClass
Class to read the generated direct index as a stream


basicDirectIndexPostingIteratorClass

protected java.lang.String basicDirectIndexPostingIteratorClass

fieldDirectIndexPostingIteratorClass

protected java.lang.String fieldDirectIndexPostingIteratorClass

processTokens

protected long processTokens
limit on the number of tokens processed per iteration


sourceStructure

protected java.lang.String sourceStructure

destinationStructure

protected java.lang.String destinationStructure
Constructor Detail

Inverted2DirectIndexBuilder

public Inverted2DirectIndexBuilder(Index i)
Construct a new instance of this builder class

Method Detail

createDirectIndex

public void createDirectIndex()
create the direct index when the collection contains an existing inverted index


getPostings

protected Posting[] getPostings(int count)
get an array of posting objects of the specified size. These will be used to hold the postings for a range of documents


getPostingReader

protected PostingInRun getPostingReader()
returns the SPIR implementation that should be used for reading the postings written earlier


traverseInvertedFile

protected long traverseInvertedFile(InvertedIndexInputStream iiis,
                                    int firstDocid,
                                    int lastDocid,
                                    Posting[] directPostings)
                             throws java.io.IOException
traverse the inverted file, looking for all occurrences of documents in the given range

Returns:
the number of tokens found across all of the documents in the given range.
Throws:
java.io.IOException

scanDocumentIndexForTokens

protected int scanDocumentIndexForTokens(long _processTokens,
                                         java.util.Iterator<DocumentIndexEntry> docidStream)
                                  throws java.io.IOException
Iterates through the document index, until it has reached the given number of tokens

Parameters:
_processTokens - number of tokens after which to stop reading the document index
docidStream - the document index stream to read
Returns:
the number of documents to process
Throws:
java.io.IOException
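
A minimal sketch of this kind of token-budget scan, assuming each document index entry can be reduced to its document length. The class TokenBudgetScan, the method countDocs and the use of Iterator<Integer> are hypothetical stand-ins, and whether the document that crosses the limit is counted is an assumption of this example rather than something this page states.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch only: count how many documents fit within a token budget. */
class TokenBudgetScan {

    /**
     * Consumes document lengths from the iterator until the running total
     * exceeds the budget, and returns the number of documents consumed.
     * Assumption: the document that crosses the budget is included in the count.
     */
    static int countDocs(long budget, Iterator<Integer> docLengths) {
        long tokens = 0;
        int docs = 0;
        while (docLengths.hasNext()) {
            tokens += docLengths.next();
            docs++;
            if (tokens > budget) {
                break;
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Iterator<Integer> lengths = List.of(40, 30, 50, 10).iterator();
        // The same iterator is reused, so each call starts where the last one stopped.
        while (lengths.hasNext()) {
            System.out.println("documents in this pass: " + countDocs(100, lengths));
        }
    }
}
```

Reusing one iterator across calls gives consecutive docid ranges, one per pass over the inverted index.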

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
main

Parameters:
args -
Throws:
java.lang.Exception


Terrier 3.5. Copyright © 2004-2011 University of Glasgow