Package org.terrier.structures
Class BaseCompressingMetaIndex
- java.lang.Object
-
- org.terrier.structures.BaseCompressingMetaIndex
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, MetaIndex
- Direct Known Subclasses:
CompressingMetaIndex, LZ4CompressedMetaIndex, ZstdCompressedMetaIndex
public abstract class BaseCompressingMetaIndex extends java.lang.Object implements MetaIndex
A MetaIndex implementation that compresses contents. Values have maximum lengths, but overall value blobs are compressed. Sub-classes differ in the particular compression algorithm used. From version 3.0, zlib deflate was the default.
- Since:
- 3.0
- Author:
- Craig Macdonald & Vassilis Plachouras
-
-
Nested Class Summary
Modifier and Type Class Description
static class
BaseCompressingMetaIndex.InputStream
An iterator for reading a MetaIndex as a stream
-
Field Summary
Modifier and Type Field Description
protected org.terrier.structures.BaseCompressingMetaIndex.ByteAccessor
dataSource
protected gnu.trove.TObjectIntHashMap<java.lang.String>
key2bytelength
protected gnu.trove.TObjectIntHashMap<java.lang.String>
key2byteoffset
protected gnu.trove.TObjectIntHashMap<java.lang.String>
key2reverseOffset
protected int
keyCount
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[]
keyFactories
protected java.lang.String[]
keyNames
protected int
numDocs
protected org.terrier.structures.BaseCompressingMetaIndex.Docid2OffsetLookup
offsetLookup
protected java.lang.String
path
protected static java.lang.ThreadLocal<org.terrier.structures.BaseCompressingMetaIndex.OffsetPointer>
pointerCache
protected java.lang.String
prefix
protected int
recordLength
protected java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[]
reverseMetaMaps
protected int[]
valueByteLengths
protected int[]
valueByteOffsets
protected boolean[]
valuesSorted
-
Constructor Summary
Constructor Description
BaseCompressingMetaIndex(IndexOnDisk index, java.lang.String structureName)
Construct an instance of the class with
-
Method Summary
Modifier and Type Method Description
protected int
_binarySearch(java.lang.String key, java.lang.String value)
Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
void
close()
Closes the underlying structures.
protected abstract byte[]
decode(byte[] input)
java.lang.String[]
getAllItems(int docid)
Obtain all metadata for specified document.
int
getDocument(java.lang.String key, java.lang.String value)
Obtain docid where document has specified metadata value in the specified type.
java.lang.String
getItem(java.lang.String Key, int docid)
Obtain metadata of specified type for specified document.
java.lang.String[]
getItems(java.lang.String[] Keys, int docid)
Obtain metadata of specified types for specified document.
java.lang.String[][]
getItems(java.lang.String[] Keys, int[] _docids)
Obtain metadata of specified types for specified documents.
java.lang.String[]
getItems(java.lang.String Key, int[] _docids)
Obtain metadata of specified type for specified documents.
java.lang.String[]
getKeys()
Returns the keys of this meta index.
java.lang.String[]
getReverseKeys()
Returns the reverse keys of this meta index.
protected void
loadIndex(IndexOnDisk index, java.lang.String structureName)
static void
main(java.lang.String[] args)
main
int
size()
How many documents in this metaindex
-
-
-
Field Detail
-
pointerCache
protected static final java.lang.ThreadLocal<org.terrier.structures.BaseCompressingMetaIndex.OffsetPointer> pointerCache
-
offsetLookup
protected org.terrier.structures.BaseCompressingMetaIndex.Docid2OffsetLookup offsetLookup
-
recordLength
protected int recordLength
-
keyNames
protected java.lang.String[] keyNames
-
key2byteoffset
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2byteoffset
-
key2bytelength
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2bytelength
-
key2reverseOffset
protected gnu.trove.TObjectIntHashMap<java.lang.String> key2reverseOffset
-
keyCount
protected int keyCount
-
valueByteOffsets
protected int[] valueByteOffsets
-
valueByteLengths
protected int[] valueByteLengths
-
valuesSorted
protected boolean[] valuesSorted
-
numDocs
protected int numDocs
-
path
protected final java.lang.String path
-
prefix
protected final java.lang.String prefix
-
dataSource
protected final org.terrier.structures.BaseCompressingMetaIndex.ByteAccessor dataSource
-
reverseMetaMaps
protected java.util.Map<org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable>[] reverseMetaMaps
-
keyFactories
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
-
-
Constructor Detail
-
BaseCompressingMetaIndex
public BaseCompressingMetaIndex(IndexOnDisk index, java.lang.String structureName) throws java.io.IOException
Construct an instance of the class with
- Parameters:
index -
structureName -
- Throws:
java.io.IOException
-
-
Method Detail
-
size
public int size()
Description copied from interface: MetaIndex
How many documents in this metaindex
-
getKeys
public java.lang.String[] getKeys()
Returns the keys of this meta index
-
close
public void close() throws java.io.IOException
Closes the underlying structures.
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
- Throws:
java.io.IOException
-
getReverseKeys
public java.lang.String[] getReverseKeys()
Returns the reverse keys of this meta index.
- Specified by:
getReverseKeys
in interface MetaIndex
-
_binarySearch
protected int _binarySearch(java.lang.String key, java.lang.String value) throws java.io.IOException
Performs a binary search on the metaindex, if the keys happen to be in lexicographical order.
- Throws:
java.io.IOException
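As an illustration of the idea only, the following sketch runs a binary search over values stored as fixed-width records in lexicographical order, with the record position standing in for the docid. The zero-padded layout, the class name SortedMetaSearchSketch and its helper methods are assumptions made for this example; they are not taken from the Terrier source, which reads from the index's own data structures.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not the Terrier implementation.
final class SortedMetaSearchSketch {

    // Packs values into fixed-width, zero-padded records; record i belongs to docid i.
    static byte[] pack(String[] sortedValues, int width) {
        byte[] data = new byte[sortedValues.length * width];
        for (int docid = 0; docid < sortedValues.length; docid++) {
            byte[] v = sortedValues[docid].getBytes(StandardCharsets.UTF_8);
            if (v.length > width) {
                throw new IllegalArgumentException("value longer than " + width + " bytes");
            }
            System.arraycopy(v, 0, data, docid * width, v.length);
        }
        return data;
    }

    // Reads the value of one record, stopping at the first padding byte.
    static String read(byte[] data, int width, int docid) {
        int start = docid * width;
        int end = start;
        while (end < start + width && data[end] != 0) {
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    // Returns the docid whose value equals the query, or -1 if there is none.
    // Only valid when records are sorted according to String.compareTo.
    static int binarySearch(byte[] data, int width, int numDocs, String value) {
        int lo = 0;
        int hi = numDocs - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = read(data, width, mid).compareTo(value);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] docnos = {"doc01", "doc02", "doc10", "doc11"};
        byte[] data = pack(docnos, 8);
        System.out.println(binarySearch(data, 8, docnos.length, "doc10"));
        System.out.println(binarySearch(data, 8, docnos.length, "doc05"));
    }
}
```

The search only makes sense when the stored values really are sorted, which is why the method is described as applying only if the keys happen to be in order.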
-
getDocument
public int getDocument(java.lang.String key, java.lang.String value) throws java.io.IOException
Obtain docid where document has specified metadata value in the specified type. Returns -1 if the value cannot be found for the specified key type.
- Specified by:
getDocument
in interface MetaIndex
- Throws:
java.io.IOException
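The -1 convention means callers need to check the result before using it as a docid. A minimal in-memory sketch of that contract follows; the class ReverseLookupSketch, its put method and the treatment of an unknown key as "not found" are assumptions for illustration, not the behaviour of the Terrier class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reverse (value -> docid) lookup; not the Terrier implementation.
final class ReverseLookupSketch {

    // One value -> docid map per metadata key.
    private final Map<String, Map<String, Integer>> reverse = new HashMap<>();

    void put(String key, String value, int docid) {
        reverse.computeIfAbsent(key, k -> new HashMap<>()).put(value, docid);
    }

    // Returns the docid holding the value under the key, or -1 when it is not found.
    int getDocument(String key, String value) {
        Map<String, Integer> forKey = reverse.get(key);
        if (forKey == null) {
            return -1; // assumption of this sketch: an unknown key counts as "not found"
        }
        Integer docid = forKey.get(value);
        return docid == null ? -1 : docid;
    }

    public static void main(String[] args) {
        ReverseLookupSketch meta = new ReverseLookupSketch();
        meta.put("docno", "WT01-B01-1", 3);
        int docid = meta.getDocument("docno", "WT99-B99-9");
        if (docid == -1) {
            System.out.println("no document with that docno");
        }
    }
}
```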
-
getItems
public java.lang.String[] getItems(java.lang.String Key, int[] _docids) throws java.io.IOException
Obtain metadata of specified type for specified documents. In this implementation, the docids are accessed in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
-
getItems
public java.lang.String[][] getItems(java.lang.String[] Keys, int[] _docids) throws java.io.IOException
Obtain metadata of specified types for specified documents. The returned array is indexed by document, then by metakey. In this implementation, the docids are accessed in sorted order to improve disk cache hits; the _docids array itself is left unchanged.
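The access pattern described for the two array-based getItems overloads can be sketched as follows: positions are ordered by docid, reads happen in that order, and each result is written back at the position of its docid in the caller's array. The class SortedFetchSketch and the fetch function (a stand-in for a disk read) are hypothetical names for this example; how Terrier performs the sort internally is not shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

// Hypothetical sketch of sorted access with results in caller order; not the Terrier implementation.
final class SortedFetchSketch {

    static String[] getItems(int[] docids, IntFunction<String> fetch) {
        // Sort positions rather than the docids, so the caller's array is never modified.
        Integer[] order = new Integer[docids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(i -> docids[i]));

        // Read in ascending docid order, store each value at its original position.
        String[] result = new String[docids.length];
        for (int position : order) {
            result[position] = fetch.apply(docids[position]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] docids = {7, 2, 5};
        String[] items = getItems(docids, docid -> "doc" + docid);
        System.out.println(Arrays.toString(items));
    }
}
```

Sorting positions instead of a copy of the docids keeps the mapping back to the caller's order without a second lookup structure.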
-
decode
protected abstract byte[] decode(byte[] input) throws java.io.IOException
- Throws:
java.io.IOException
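Because decode is abstract, each subclass supplies its own decompressor. As a minimal sketch, assuming a record compressed with zlib deflate (the default mentioned in the class description), a decoder could be written with java.util.zip.Inflater. The class ZlibDecodeSketch and its compress helper are hypothetical and exist only to make the example self-contained; they are not taken from the Terrier source.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-alone sketch; not the Terrier implementation.
final class ZlibDecodeSketch {

    // Inflates a zlib-compressed blob, mirroring the shape of decode(byte[] input).
    static byte[] decode(byte[] input) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("Truncated or unsupported compressed record");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Corrupt compressed record", e);
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    // Helper used only to produce input for the sketch.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = "WT01-B01-1".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decode(compress(record));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

Wrapping DataFormatException in an IOException matches the declared throws clause of decode, so callers deal with a single exception type.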
-
getItem
public java.lang.String getItem(java.lang.String Key, int docid) throws java.io.IOException
Obtain metadata of specified type for specified document.
-
getItems
public java.lang.String[] getItems(java.lang.String[] Keys, int docid) throws java.io.IOException
Obtain metadata of specified types for specified document.
-
getAllItems
public java.lang.String[] getAllItems(int docid) throws java.io.IOException
Obtain all metadata for specified document.
- Specified by:
getAllItems
in interface MetaIndex
- Throws:
java.io.IOException
-
loadIndex
protected void loadIndex(IndexOnDisk index, java.lang.String structureName) throws java.io.IOException
- Throws:
java.io.IOException
-
main
public static void main(java.lang.String[] args) throws java.lang.Exception
main
- Parameters:
args -
- Throws:
java.lang.Exception
-
-