org.terrier.structures.indexing.singlepass
Class MemoryPostings

java.lang.Object
  extended by org.terrier.structures.indexing.singlepass.MemoryPostings
Direct Known Subclasses:
BlockMemoryPostings, FieldsMemoryPostings

public class MemoryPostings
extends java.lang.Object

Class for handling simple posting lists in memory while indexing.

Author:
Roi Blanco

Field Summary
protected  long keyBytes
           
protected static org.apache.log4j.Logger logger
          logger to use in this class
protected  int maxSize
          The maximum number of documents for any term in this run
protected  long numPointers
          Number of pointers, i.e. unique (term,docid) tuples, in memory in this run.
protected  java.util.Map<java.lang.String,Posting> postings
          Hashmap indexed by the term, containing the posting lists
protected  long valueBytes
           
 
Constructor Summary
MemoryPostings()
           
 
Method Summary
 void add(java.lang.String term, int doc, int frequency)
          Adds an occurrence of a term in a document to the posting in memory.
 void addTerms(DocumentPostingList docPostings, int docid)
          Add the terms in a DocumentPostingList to the postings in memory.
 void finish(RunWriter runWriter)
          Triggers the writing of the postings in memory to the specified RunWriter.
 void finish(java.lang.String[] file)
          Triggers the writing of the postings in memory to disk.
 long getMemoryConsumption()
          Returns the number of bytes consumed by this set of postings.
 long getPointers()
          Returns the number of pointers in this posting list.
 int getSize()
          Returns the number of terms in this posting list.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
logger to use in this class


postings

protected java.util.Map<java.lang.String,Posting> postings
Hashmap indexed by the term, containing the posting lists


maxSize

protected int maxSize
The maximum number of documents for any term in this run


numPointers

protected long numPointers
Number of pointers, i.e. unique (term,docid) tuples, in memory in this run.


keyBytes

protected long keyBytes


valueBytes

protected long valueBytes

Constructor Detail

MemoryPostings

public MemoryPostings()

Method Detail

addTerms

public void addTerms(DocumentPostingList docPostings,
                     int docid)
              throws java.io.IOException
Add the terms in a DocumentPostingList to the postings in memory.

Parameters:
docPostings - DocumentPostingList containing the term information for the denoted document.
docid - Current document Identifier.
Throws:
java.io.IOException - if an I/O error occurs.

add

public void add(java.lang.String term,
                int doc,
                int frequency)
         throws java.io.IOException
Adds an occurrence of a term in a document to the posting in memory.

Parameters:
term - String representing the term.
doc - int containing the document identifier.
frequency - int containing the frequency of the term in the document.
Throws:
java.io.IOException - if an I/O error occurs.
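
The accumulation performed by addTerms and add can be pictured with a minimal, self-contained sketch. It is not Terrier's implementation: ToyMemoryPostings is a hypothetical stand-in, a plain Map<String,Integer> of term frequencies stands in for DocumentPostingList, and postings are kept as uncompressed int pairs. The sketch also assumes each (term, doc) pair is added only once, so every call to add counts as one new pointer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not Terrier's MemoryPostings.
class ToyMemoryPostings {
    // term -> list of {docid, frequency} pairs, in the order they were added
    private final Map<String, List<int[]>> postings = new HashMap<>();
    // number of (term, docid) tuples added so far
    private long numPointers = 0;

    // Records one (term, doc) pointer with its within-document frequency.
    void add(String term, int doc, int frequency) {
        postings.computeIfAbsent(term, t -> new ArrayList<>())
                .add(new int[] {doc, frequency});
        numPointers++;
    }

    // Adds every term of one document; the map stands in for DocumentPostingList.
    void addTerms(Map<String, Integer> docTermFrequencies, int docid) {
        for (Map.Entry<String, Integer> e : docTermFrequencies.entrySet()) {
            add(e.getKey(), docid, e.getValue());
        }
    }

    // Number of distinct terms, i.e. posting lists, held in memory.
    int getSize() {
        return postings.size();
    }

    // Number of (term, docid) tuples held in memory.
    long getPointers() {
        return numPointers;
    }

    public static void main(String[] args) {
        ToyMemoryPostings p = new ToyMemoryPostings();
        p.add("index", 0, 2);

        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("index", 1);
        doc1.put("terrier", 3);
        p.addTerms(doc1, 1);

        // prints "2 terms, 3 pointers"
        System.out.println(p.getSize() + " terms, " + p.getPointers() + " pointers");
    }
}
```

In this sketch the term "index" occurs in two documents, so it contributes one posting list but two pointers, which is the distinction between getSize() and getPointers().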

finish

public void finish(java.lang.String[] file)
            throws java.io.IOException
Triggers the writing of the postings in memory to disk. Uses the default RunWriter, writing to the specified files.

Parameters:
file - names of the files to write the postings to.
Throws:
java.io.IOException - if an I/O error occurs.

finish

public void finish(RunWriter runWriter)
            throws java.io.IOException
Triggers the writing of the postings in memory to the specified RunWriter. If the RunWriter requires that terms are written in order, then this will happen.

Parameters:
runWriter - the RunWriter to write the postings to.
Throws:
java.io.IOException - if an I/O error occurs.
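
As a rough illustration of writing terms in order only when the writer requires it, the sketch below flushes a map of terms either in map iteration order or in sorted order. Everything in it is an assumption made for illustration: SortedFlushSketch, the flush method, and the boolean sorted flag are hypothetical names, the output format is invented, and sorting by String natural order is a guess rather than documented Terrier behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Terrier's RunWriter or MemoryPostings.
class SortedFlushSketch {
    /**
     * Emits one "term documentFrequency" line per term. When sorted is true,
     * terms are written in String natural order; otherwise they are written
     * in the map's own iteration order.
     */
    static String flush(Map<String, Integer> docFrequencies, boolean sorted) {
        List<String> terms = new ArrayList<>(docFrequencies.keySet());
        if (sorted) {
            Collections.sort(terms);
        }
        StringBuilder out = new StringBuilder();
        for (String term : terms) {
            out.append(term).append(' ')
               .append(docFrequencies.get(term)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the unsorted case is predictable.
        Map<String, Integer> run = new LinkedHashMap<>();
        run.put("terrier", 1);
        run.put("index", 2);
        // prints "index 2" then "terrier 1"
        System.out.print(flush(run, true));
    }
}
```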

getSize

public int getSize()
Returns the number of terms in this posting list.

Returns:
the number of posting lists in memory.

getMemoryConsumption

public long getMemoryConsumption()
Returns the number of bytes consumed by this set of postings.


getPointers

public long getPointers()
Returns the number of pointers in this posting list. Pointers are unique (term,docid) tuples.

Returns:
the number of pointers in memory.


Terrier 3.5. Copyright © 2004-2011 University of Glasgow