Class BaseCompressingMetaIndex

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, MetaIndex
    Direct Known Subclasses:
    CompressingMetaIndex, LZ4CompressedMetaIndex, ZstdCompressedMetaIndex

    public abstract class BaseCompressingMetaIndex
    extends java.lang.Object
    implements MetaIndex
    A MetaIndex implementation that compresses its contents. Each value has a maximum length, but the value blobs as a whole are compressed. Subclasses differ in the compression algorithm they use. As of version 3.0, zlib deflate was the default.
    Since:
    3.0
    Author:
    Craig Macdonald & Vassilis Plachouras
    • Constructor Summary

      Constructors 
      Constructor Description
      BaseCompressingMetaIndex​(IndexOnDisk index, java.lang.String structureName)
      Constructs an instance of the class for the given index and structure name.
    • Method Summary

      Modifier and Type Method Description
      protected int _binarySearch​(java.lang.String key, java.lang.String value)
      Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
      void close()
      Closes the underlying structures.
      protected abstract byte[] decode​(byte[] input)  
      java.lang.String[] getAllItems​(int docid)
      Obtain all metadata for specified document.
      int getDocument​(java.lang.String key, java.lang.String value)
      Obtain the docid of the document that has the specified metadata value for the specified key type.
      java.lang.String getItem​(java.lang.String Key, int docid)
      Obtain metadata of specified type for specified document.
      java.lang.String[] getItems​(java.lang.String[] Keys, int docid)
      Obtain metadata of specified types for specified document.
      java.lang.String[][] getItems​(java.lang.String[] Keys, int[] _docids)
      Obtain metadata of specified types for specified documents.
      java.lang.String[] getItems​(java.lang.String Key, int[] _docids)
      Obtain metadata of specified type for specified documents.
      java.lang.String[] getKeys()
      Returns the keys of this meta index
      java.lang.String[] getReverseKeys()
      Returns the reverse keys of this meta index
      protected void loadIndex​(IndexOnDisk index, java.lang.String structureName)  
      static void main​(java.lang.String[] args)
      main
      int size()
      Returns the number of documents in this metaindex.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • pointerCache

        protected static final java.lang.ThreadLocal<org.terrier.structures.BaseCompressingMetaIndex.OffsetPointer> pointerCache
      • offsetLookup

        protected org.terrier.structures.BaseCompressingMetaIndex.Docid2OffsetLookup offsetLookup
      • recordLength

        protected int recordLength
      • keyNames

        protected java.lang.String[] keyNames
      • key2byteoffset

        protected gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset
      • key2bytelength

        protected gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength
      • key2reverseOffset

        protected gnu.trove.TObjectIntHashMap<java.lang.String> key2reverseOffset
      • keyCount

        protected int keyCount
      • valueByteOffsets

        protected int[] valueByteOffsets
      • valueByteLengths

        protected int[] valueByteLengths
      • valuesSorted

        protected boolean[] valuesSorted
      • numDocs

        protected int numDocs
      • path

        protected final java.lang.String path
      • prefix

        protected final java.lang.String prefix
      • dataSource

        protected final org.terrier.structures.BaseCompressingMetaIndex.ByteAccessor dataSource
      • reverseMetaMaps

        protected java.util.Map<org.apache.hadoop.io.Text,​org.apache.hadoop.io.IntWritable>[] reverseMetaMaps
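        The fields recordLength, valueByteOffsets and valueByteLengths suggest that each decoded record is a fixed-width byte array in which every key owns a slot of known offset and maximum length. The following is a minimal sketch of reading one value from such a record. The class FixedWidthRecordSketch, the method valueAt, and the zero-byte padding convention are assumptions made for illustration; they are not taken from the Terrier source.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone helper; not part of Terrier. */
final class FixedWidthRecordSketch {

    /**
     * Extracts the value stored for one key from a fixed-width record.
     * ASSUMPTION: unused bytes in a slot are zero-padded.
     */
    static String valueAt(byte[] record, int[] valueByteOffsets,
                          int[] valueByteLengths, int keyIndex) {
        int offset = valueByteOffsets[keyIndex];
        int maxLength = valueByteLengths[keyIndex];
        // Stop at the first padding byte, or at the slot boundary.
        int length = 0;
        while (length < maxLength && record[offset + length] != 0) {
            length++;
        }
        return new String(record, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two slots: 8 bytes for the first key, 4 bytes for the second.
        byte[] record = new byte[12];
        byte[] first = "doc7".getBytes(StandardCharsets.UTF_8);
        byte[] second = "abcd".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(first, 0, record, 0, first.length);
        System.arraycopy(second, 0, record, 8, second.length);
        int[] offsets = {0, 8};
        int[] lengths = {8, 4};
        System.out.println(valueAt(record, offsets, lengths, 0));
        System.out.println(valueAt(record, offsets, lengths, 1));
    }
}
```

        A fixed slot per key makes the position of every value computable without scanning the record, which is one plausible reason for values having maximum lengths.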
    • Constructor Detail

      • BaseCompressingMetaIndex

        public BaseCompressingMetaIndex​(IndexOnDisk index,
                                        java.lang.String structureName)
                                 throws java.io.IOException
        Constructs an instance of the class for the given index and structure name.
        Parameters:
        index -
        structureName -
        Throws:
        java.io.IOException
    • Method Detail

      • size

        public int size()
        Description copied from interface: MetaIndex
        Returns the number of documents in this metaindex.
        Specified by:
        size in interface MetaIndex
      • getKeys

        public java.lang.String[] getKeys()
        Returns the keys of this meta index
        Specified by:
        getKeys in interface MetaIndex
      • close

        public void close()
                   throws java.io.IOException
        Closes the underlying structures.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • getReverseKeys

        public java.lang.String[] getReverseKeys()
        Returns the reverse keys of this meta index
        Specified by:
        getReverseKeys in interface MetaIndex
      • _binarySearch

        protected int _binarySearch​(java.lang.String key,
                                    java.lang.String value)
                             throws java.io.IOException
        Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
        Throws:
        java.io.IOException
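        As an illustration of the idea behind _binarySearch, the sketch below searches values that are sorted in String.compareTo order and are addressed by docid. The class MetaBinarySearchSketch, the valueAt accessor, and the choice to return -1 when nothing matches (mirroring what getDocument documents) are assumptions for this example, not the actual Terrier implementation.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-alone helper; not part of Terrier. */
final class MetaBinarySearchSketch {

    /**
     * Finds the docid whose value equals target.
     * Precondition: valueAt(0..numDocs-1) is sorted by String.compareTo.
     * Returns -1 if no value matches (an assumption of this sketch).
     */
    static int binarySearch(IntFunction<String> valueAt, int numDocs, String target) {
        int low = 0;
        int high = numDocs - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // overflow-safe midpoint
            int cmp = valueAt.apply(mid).compareTo(target);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] docnos = {"doc01", "doc02", "doc05", "doc09"};
        System.out.println(binarySearch(i -> docnos[i], docnos.length, "doc05")); // prints 2
        System.out.println(binarySearch(i -> docnos[i], docnos.length, "doc03")); // prints -1
    }
}
```

        Each probe costs one value lookup, so a search touches roughly log2(numDocs) records; this only works when the values for the key are stored in sorted order.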
      • getDocument

        public int getDocument​(java.lang.String key,
                               java.lang.String value)
                        throws java.io.IOException
        Obtain the docid of the document that has the specified metadata value for the specified key type. Returns -1 if the value cannot be found for the specified key type.
        Specified by:
        getDocument in interface MetaIndex
        Throws:
        java.io.IOException
      • getItems

        public java.lang.String[] getItems​(java.lang.String Key,
                                           int[] _docids)
                                    throws java.io.IOException
        Obtain metadata of specified type for specified documents. In this implementation, the docids are accessed in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
        Specified by:
        getItems in interface MetaIndex
        Throws:
        java.io.IOException
      • getItems

        public java.lang.String[][] getItems​(java.lang.String[] Keys,
                                             int[] _docids)
                                      throws java.io.IOException
        Obtain metadata of specified types for specified documents. The returned array is indexed by document, then by metakey. In this implementation, the docids are accessed in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
        Specified by:
        getItems in interface MetaIndex
        Throws:
        java.io.IOException
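        One way to read records in ascending docid order while leaving the caller's array untouched is to sort a separate array of positions and write each result back to its original slot. The sketch below shows that pattern only. The class SortedLookupSketch and the fetch callback are hypothetical stand-ins for the real record access, and this is not claimed to be how Terrier implements getItems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-alone helper; not part of Terrier. */
final class SortedLookupSketch {

    /**
     * Calls fetch in ascending docid order, but returns results in the
     * order of the input array. The docids array is never modified.
     */
    static String[] lookupInDocidOrder(int[] docids, IntFunction<String> fetch) {
        // Sort positions rather than the docids themselves.
        Integer[] order = new Integer[docids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> docids[i]));

        String[] results = new String[docids.length];
        for (int position : order) {
            results[position] = fetch.apply(docids[position]);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] docids = {7, 2, 5};
        List<Integer> accessOrder = new ArrayList<>();
        String[] values = lookupInDocidOrder(docids, id -> {
            accessOrder.add(id); // record the order in which "disk" is touched
            return "doc" + id;
        });
        System.out.println(Arrays.toString(values));
        System.out.println(accessOrder);
    }
}
```

        Sorting positions costs one extra array of the same length, in exchange for sequential access to the underlying data and a result array that lines up with the caller's docids.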
      • decode

        protected abstract byte[] decode​(byte[] input)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
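        Subclasses supply decode to turn a compressed buffer back into the raw record bytes. The sketch below shows what such a method could look like for zlib deflate data, using java.util.zip.Inflater. The class ZlibDecodeSketch and its encode helper are hypothetical and exist only to make the example self-contained; the real subclasses may manage buffers and inflater instances differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-alone helper; not part of Terrier. */
final class ZlibDecodeSketch {

    /** Inflates a zlib-compressed buffer; same shape as decode(byte[]). */
    static byte[] decode(byte[] input) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and no way to continue: the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("Truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Invalid zlib data", e);
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    /** Helper for the example only: compresses with zlib deflate. */
    static byte[] encode(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "doc-0001 http://example.org/".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decode(encode(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

        Wrapping DataFormatException in IOException keeps the sketch compatible with the throws clause declared on decode.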
      • getItem

        public java.lang.String getItem​(java.lang.String Key,
                                        int docid)
                                 throws java.io.IOException
        Obtain metadata of specified type for specified document.
        Specified by:
        getItem in interface MetaIndex
        Throws:
        java.io.IOException
      • getItems

        public java.lang.String[] getItems​(java.lang.String[] Keys,
                                           int docid)
                                    throws java.io.IOException
        Obtain metadata of specified types for specified document.
        Specified by:
        getItems in interface MetaIndex
        Throws:
        java.io.IOException
      • getAllItems

        public java.lang.String[] getAllItems​(int docid)
                                       throws java.io.IOException
        Obtain all metadata for specified document.
        Specified by:
        getAllItems in interface MetaIndex
        Throws:
        java.io.IOException
      • loadIndex

        protected void loadIndex​(IndexOnDisk index,
                                 java.lang.String structureName)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • main

        public static void main​(java.lang.String[] args)
                         throws java.lang.Exception
        main
        Parameters:
        args -
        Throws:
        java.lang.Exception