org.terrier.structures
Class CompressingMetaIndex

java.lang.Object
  extended by org.terrier.structures.CompressingMetaIndex
All Implemented Interfaces:
java.io.Closeable, MetaIndex

public class CompressingMetaIndex
extends java.lang.Object
implements MetaIndex

A MetaIndex implementation that compresses its contents. Each value has a maximum length, and the overall value blobs are stored compressed and are decompressed on read using java.util.zip.Inflater.
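The following is a minimal sketch of how values with maximum lengths can be laid out in a fixed-size record, which is what fields such as recordLength, valueByteOffsets and valueByteLengths suggest. The class FixedRecordSketch, its methods, and the zero-byte padding are assumptions made for illustration; this page does not confirm the actual on-disk layout.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the Terrier implementation.
final class FixedRecordSketch {
    final String[] keyNames;
    final int[] valueByteLengths;  // maximum byte length of each value
    final int[] valueByteOffsets;  // start of each value inside a record
    final int recordLength;        // sum of all maximum lengths

    FixedRecordSketch(String[] keyNames, int[] valueByteLengths) {
        this.keyNames = keyNames.clone();
        this.valueByteLengths = valueByteLengths.clone();
        this.valueByteOffsets = new int[valueByteLengths.length];
        int offset = 0;
        for (int i = 0; i < valueByteLengths.length; i++) {
            valueByteOffsets[i] = offset;
            offset += valueByteLengths[i];
        }
        this.recordLength = offset;
    }

    /** Packs one value per key into a record, truncating values that are too long. */
    byte[] pack(String[] values) {
        byte[] record = new byte[recordLength]; // zero-filled, so short values are padded
        for (int i = 0; i < values.length; i++) {
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            // Byte-level truncation; this could split a multi-byte character.
            int n = Math.min(bytes.length, valueByteLengths[i]);
            System.arraycopy(bytes, 0, record, valueByteOffsets[i], n);
        }
        return record;
    }

    /** Reads the value for the key at keyIndex, stopping at the zero padding. */
    String unpack(byte[] record, int keyIndex) {
        int start = valueByteOffsets[keyIndex];
        int limit = start + valueByteLengths[keyIndex];
        int end = start;
        while (end < limit && record[end] != 0) {
            end++;
        }
        return new String(record, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FixedRecordSketch layout =
            new FixedRecordSketch(new String[] {"docno", "url"}, new int[] {5, 8});
        byte[] record = layout.pack(new String[] {"D1", "http://example.org"});
        System.out.println(layout.unpack(record, 0));
        System.out.println(layout.unpack(record, 1));
    }
}
```

With fixed-size records, the position of any value follows from the key's offset alone, at the cost of truncating values longer than the configured maximum.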

Since:
3.0
Author:
Craig Macdonald & Vassilis Plachouras

Nested Class Summary
static class CompressingMetaIndex.CompressingMetaIndexInputFormat
          A Hadoop input format for a compressing meta index (allows the reading of a meta index as input to a MapReduce job).
static class CompressingMetaIndex.InputStream
          An iterator for reading a MetaIndex as a stream
 
Field Summary
protected  int compressionLevel
           
protected  org.terrier.structures.CompressingMetaIndex.ByteAccessor dataSource
           
protected  java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[] forwardMetaMaps
           
protected static java.lang.ThreadLocal<java.util.zip.Inflater> inflaterCache
          thread-local cache of Inflaters to be re-used for decompression
protected  gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength
           
protected  gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset
           
protected  gnu.trove.TObjectIntHashMap<java.lang.String> key2forwardOffset
           
protected  int keyCount
           
protected  FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
           
protected  java.lang.String[] keyNames
           
protected  org.terrier.structures.CompressingMetaIndex.Docid2OffsetLookup offsetLookup
           
protected  java.lang.String path
           
protected  java.lang.String prefix
           
protected  int recordLength
           
protected  int[] valueByteLengths
           
protected  int[] valueByteOffsets
           
 
Constructor Summary
CompressingMetaIndex(Index index, java.lang.String structureName)
          Construct an instance of the class with the specified index and structure name.
 
Method Summary
 void close()
          Closes the underlying structures.
 java.lang.String[] getAllItems(int docid)
          Obtain all metadata for specified document.
 int getDocument(java.lang.String key, java.lang.String value)
          Obtain the docid of the document that has the specified metadata value for the specified key.
 java.lang.String getItem(java.lang.String Key, int docid)
          Obtain metadata of specified type for specified document.
 java.lang.String[] getItems(java.lang.String[] Keys, int docid)
          Obtain metadata of specified types for specified document.
 java.lang.String[][] getItems(java.lang.String[] Keys, int[] _docids)
          Obtain metadata of specified types for specified documents.
 java.lang.String[] getItems(java.lang.String Key, int[] _docids)
          Obtain metadata of specified type for specified documents.
 java.lang.String[] getKeys()
          Returns the keys of this meta index
protected  void loadIndex(Index index, java.lang.String structureName)
           
static void main(java.lang.String[] args)
          Command-line entry point.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inflaterCache

protected static final java.lang.ThreadLocal<java.util.zip.Inflater> inflaterCache
thread-local cache of Inflaters to be re-used for decompression
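
As an illustration of the pattern this field describes, the sketch below keeps one java.util.zip.Inflater per thread and resets it before each use. The class InflaterCacheSketch and its compress and decompress helpers are hypothetical names introduced here; only ThreadLocal, Inflater and Deflater are real JDK classes, and the actual Terrier code may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: not the Terrier implementation.
final class InflaterCacheSketch {
    // One Inflater per thread, created lazily on first use.
    private static final ThreadLocal<Inflater> INFLATER_CACHE =
        ThreadLocal.withInitial(Inflater::new);

    /** Compresses raw bytes in zlib format (stands in for what a builder would write). */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses into a buffer of known length, re-using this thread's Inflater. */
    static byte[] decompress(byte[] compressed, int rawLength) {
        Inflater inflater = INFLATER_CACHE.get();
        inflater.reset(); // clear state left over from the previous call
        inflater.setInput(compressed);
        byte[] raw = new byte[rawLength];
        int filled = 0;
        try {
            while (filled < rawLength) {
                int n = inflater.inflate(raw, filled, rawLength - filled);
                if (n == 0) {
                    break; // stream finished or input exhausted
                }
                filled += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        }
        if (filled != rawLength) {
            throw new IllegalStateException("expected " + rawLength + " bytes, got " + filled);
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] raw = "docno=D1;url=http://example.org".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(raw), raw.length);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Inflater instances are not safe to share between threads, so a thread-local cache avoids both synchronisation and the cost of allocating a new Inflater for every lookup.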


offsetLookup

protected org.terrier.structures.CompressingMetaIndex.Docid2OffsetLookup offsetLookup

compressionLevel

protected int compressionLevel

recordLength

protected int recordLength

keyNames

protected java.lang.String[] keyNames

key2byteoffset

protected gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset

key2bytelength

protected gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength

key2forwardOffset

protected gnu.trove.TObjectIntHashMap<java.lang.String> key2forwardOffset

keyCount

protected int keyCount

valueByteOffsets

protected int[] valueByteOffsets

valueByteLengths

protected int[] valueByteLengths

path

protected final java.lang.String path

prefix

protected final java.lang.String prefix

dataSource

protected final org.terrier.structures.CompressingMetaIndex.ByteAccessor dataSource

forwardMetaMaps

protected java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[] forwardMetaMaps

keyFactories

protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
Constructor Detail

CompressingMetaIndex

public CompressingMetaIndex(Index index,
                            java.lang.String structureName)
                     throws java.io.IOException
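Construct an instance of the class with the specified index and structure name.

Parameters:
index - the index that the meta index structure belongs to
structureName - the name of the meta index structure within the index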
Throws:
java.io.IOException
Method Detail

getKeys

public java.lang.String[] getKeys()
Returns the keys of this meta index

Specified by:
getKeys in interface MetaIndex

close

public void close()
           throws java.io.IOException
Closes the underlying structures.

Specified by:
close in interface java.io.Closeable
Throws:
java.io.IOException

getDocument

public int getDocument(java.lang.String key,
                       java.lang.String value)
                throws java.io.IOException
Obtain the docid of the document that has the specified metadata value for the specified key. Returns -1 if the value cannot be found for the specified key.

Specified by:
getDocument in interface MetaIndex
Throws:
java.io.IOException
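
The -1 convention described for getDocument can be sketched with an in-memory reverse map from value to docid, one map per key. ReverseLookupSketch and its put method are hypothetical, and returning -1 for an unknown key is an assumption of this sketch rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not the Terrier implementation.
final class ReverseLookupSketch {
    // key name -> (metadata value -> docid)
    private final Map<String, Map<String, Integer>> valueToDocid = new HashMap<>();

    /** Registers that the document docid has the given value for the given key. */
    void put(String key, String value, int docid) {
        valueToDocid.computeIfAbsent(key, k -> new HashMap<>()).put(value, docid);
    }

    /** Returns the docid for the value, or -1 if the value is not found. */
    int getDocument(String key, String value) {
        Map<String, Integer> forKey = valueToDocid.get(key);
        if (forKey == null) {
            return -1; // assumption: an unknown key is treated like a missing value
        }
        Integer docid = forKey.get(value);
        return docid == null ? -1 : docid;
    }

    public static void main(String[] args) {
        ReverseLookupSketch lookup = new ReverseLookupSketch();
        lookup.put("docno", "D1", 0);
        lookup.put("docno", "D2", 1);
        System.out.println(lookup.getDocument("docno", "D2"));
        System.out.println(lookup.getDocument("docno", "missing"));
    }
}
```

Callers should check for -1 before using the result as a docid in further lookups.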

getItems

public java.lang.String[] getItems(java.lang.String Key,
                                   int[] _docids)
                            throws java.io.IOException
Obtain metadata of specified type for specified documents. In this implementation, the docids are read in sorted order to improve disk cache hits; the _docids array itself is left unchanged.

Specified by:
getItems in interface MetaIndex
Throws:
java.io.IOException

getItems

public java.lang.String[][] getItems(java.lang.String[] Keys,
                                     int[] _docids)
                              throws java.io.IOException
Obtain metadata of specified types for specified documents. In this implementation, the docids are read in sorted order to improve disk cache hits; the _docids array itself is left unchanged.

Specified by:
getItems in interface MetaIndex
Throws:
java.io.IOException
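
One way to read docids in sorted order without modifying the caller's array is sketched below. SortedLookupSketch and its lookup parameter (a stand-in for a per-document metadata read) are hypothetical; the sketch also assumes results are returned in the caller's original order, which this page does not state explicitly.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

// Hypothetical illustration only: not the Terrier implementation.
final class SortedLookupSketch {
    /**
     * Calls lookup for each docid in ascending docid order, and returns the
     * results in the same order as the input array. The input is not modified.
     */
    static String[] lookupInSortedOrder(int[] docids, IntFunction<String> lookup) {
        // Sort positions rather than the docids themselves.
        Integer[] order = new Integer[docids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> docids[i]));

        String[] results = new String[docids.length];
        for (int position : order) {
            // Ascending docids mean mostly forward, nearby reads.
            results[position] = lookup.apply(docids[position]);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] docids = {5, 1, 3};
        String[] items = lookupInSortedOrder(docids, id -> "doc" + id);
        System.out.println(Arrays.toString(items));
        System.out.println(Arrays.toString(docids));
    }
}
```

Sorting a separate array of positions keeps the caller's array intact and lets each result be written back to the slot matching its original docid.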

getItem

public java.lang.String getItem(java.lang.String Key,
                                int docid)
                         throws java.io.IOException
Obtain metadata of specified type for specified document.

Specified by:
getItem in interface MetaIndex
Throws:
java.io.IOException

getItems

public java.lang.String[] getItems(java.lang.String[] Keys,
                                   int docid)
                            throws java.io.IOException
Obtain metadata of specified types for specified document.

Specified by:
getItems in interface MetaIndex
Throws:
java.io.IOException

getAllItems

public java.lang.String[] getAllItems(int docid)
                               throws java.io.IOException
Obtain all metadata for specified document.

Specified by:
getAllItems in interface MetaIndex
Throws:
java.io.IOException

loadIndex

protected void loadIndex(Index index,
                         java.lang.String structureName)
                  throws java.io.IOException
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Command-line entry point.

Parameters:
args - command-line arguments
Throws:
java.lang.Exception


Terrier 3.5. Copyright © 2004-2011 University of Glasgow