Package org.terrier.structures.indexing
Class BaseMetaIndexBuilder
- java.lang.Object
  - org.terrier.structures.indexing.MetaIndexBuilder
    - org.terrier.structures.indexing.BaseMetaIndexBuilder
- All Implemented Interfaces:
java.io.Closeable, java.io.Flushable, java.lang.AutoCloseable
- Direct Known Subclasses:
CompressingMetaIndexBuilder, LZ4MetaIndexBuilder, UncompressedMetaIndexBuilder, ZstdMetaIndexBuilder
public abstract class BaseMetaIndexBuilder extends MetaIndexBuilder implements java.io.Flushable
Abstract base class for compressed and uncompressed metaindex building.
Properties:
- metaindex.compressed.max.data.in-mem.mb - maximum size of a meta index .zdata file that will be kept in memory. Defaults to 400 (MB).
- metaindex.compressed.max.index.in-mem.mb - maximum size of a meta index .idx file that will be kept in memory. Defaults to 100 (MB).
- metaindex.compressed.reverse.allow.duplicates - set this property to true to suppress errors when a reverse meta value is not unique. Defaults to false.
- metaindex.compressed.crop.long - set this property to true to suppress errors for overlong document metadata, which will instead be cropped.
- Since:
- 3.0
- Author:
- Craig Macdonald & Vassilis Plachouras
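The properties listed in the class description are ordinary key/value settings. As a minimal sketch (not Terrier code), the following uses plain java.util.Properties to show how the four settings and their documented defaults could be read. The class name MetaIndexSettings is hypothetical, and the false default for metaindex.compressed.crop.long is an assumption, since the description does not state one.

```java
import java.util.Properties;

/** Hypothetical holder for the four metaindex settings; not part of Terrier. */
class MetaIndexSettings {
    final int maxDataMbInMem;
    final int maxIndexMbInMem;
    final boolean reverseAllowDuplicates;
    final boolean cropLong;

    MetaIndexSettings(Properties p) {
        // Defaults of 400 and 100 (MB) and false are taken from the class description.
        maxDataMbInMem = Integer.parseInt(
            p.getProperty("metaindex.compressed.max.data.in-mem.mb", "400"));
        maxIndexMbInMem = Integer.parseInt(
            p.getProperty("metaindex.compressed.max.index.in-mem.mb", "100"));
        reverseAllowDuplicates = Boolean.parseBoolean(
            p.getProperty("metaindex.compressed.reverse.allow.duplicates", "false"));
        // Assumption: crop.long defaults to false; the description gives no default.
        cropLong = Boolean.parseBoolean(
            p.getProperty("metaindex.compressed.crop.long", "false"));
    }
}
```

In a Terrier installation the same keys would typically be set in the terrier.properties file rather than built in code.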
-
-
Field Summary
Fields
Modifier and Type / Field / Description
protected java.io.ByteArrayOutputStream
baos
protected byte[]
compressedBuffer
protected boolean
CROP_LONG
protected long
currentIndexOffset
protected long
currentOffset
protected java.io.DataOutputStream
dataOutput
protected int
DOCS_PER_CHECK
protected int
entryCount
protected int
entryLengthBytes
protected IndexOnDisk
index
protected java.io.DataOutputStream
indexOutput
protected gnu.trove.TObjectIntHashMap<java.lang.String>
key2Index
protected int
keyCount
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[]
keyFactories
protected java.lang.String[]
keyNames
protected java.lang.String[]
lastValues
protected org.slf4j.Logger
logger
protected int
MAX_INDEX_MB_IN_MEM_RETRIEVAL
protected int
MAX_MB_IN_MEM_RETRIEVAL
protected MemoryChecker
memCheck
protected boolean
REVERSE_ALLOW_DUPS
protected int
REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
protected java.lang.String[]
reverseKeyNames
protected int[]
reverseKeys
protected FSOrderedMapFile.MapFileWriter[]
reverseWriters
protected byte[]
spaces
protected java.lang.Class<? extends MetaIndex>
structureClass
protected java.lang.Class<? extends java.util.Iterator>
structureInputStreamClass
protected java.lang.String
structureName
protected int[]
valueLensBytes
protected int[]
valueLensChars
protected boolean[]
valuesSorted
-
Constructor Summary
Constructors
Constructor / Description
BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String _structureName, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
-
Method Summary
Modifier and Type / Method / Description
void
close()
void
flush()
protected abstract int
writeData(byte[] data)
void
writeDocumentEntry(java.lang.String[] data)
Write out metadata for current document.
void
writeDocumentEntry(java.util.Map<java.lang.String,java.lang.String> data)
Write out metadata for current document, extracted from the specified map. Typically, the MetaIndexBuilder will know which keys from data it is interested in.
-
Methods inherited from class org.terrier.structures.indexing.MetaIndexBuilder
create
-
Field Detail
-
logger
protected final org.slf4j.Logger logger
-
MAX_MB_IN_MEM_RETRIEVAL
protected final int MAX_MB_IN_MEM_RETRIEVAL
-
MAX_INDEX_MB_IN_MEM_RETRIEVAL
protected final int MAX_INDEX_MB_IN_MEM_RETRIEVAL
-
REVERSE_ALLOW_DUPS
protected final boolean REVERSE_ALLOW_DUPS
-
CROP_LONG
protected final boolean CROP_LONG
-
REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
protected final int REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
- See Also:
- Constant Field Values
-
DOCS_PER_CHECK
protected final int DOCS_PER_CHECK
-
key2Index
protected final gnu.trove.TObjectIntHashMap<java.lang.String> key2Index
-
dataOutput
protected java.io.DataOutputStream dataOutput
-
keyNames
protected final java.lang.String[] keyNames
-
keyCount
protected final int keyCount
-
baos
protected java.io.ByteArrayOutputStream baos
-
indexOutput
protected java.io.DataOutputStream indexOutput
-
compressedBuffer
protected byte[] compressedBuffer
-
index
protected IndexOnDisk index
-
valueLensChars
protected int[] valueLensChars
-
valueLensBytes
protected int[] valueLensBytes
-
spaces
protected byte[] spaces
-
entryLengthBytes
protected int entryLengthBytes
-
currentOffset
protected long currentOffset
-
currentIndexOffset
protected long currentIndexOffset
-
entryCount
protected int entryCount
-
reverseKeys
protected int[] reverseKeys
-
reverseKeyNames
protected java.lang.String[] reverseKeyNames
-
reverseWriters
protected FSOrderedMapFile.MapFileWriter[] reverseWriters
-
valuesSorted
protected boolean[] valuesSorted
-
lastValues
protected java.lang.String[] lastValues
-
memCheck
protected MemoryChecker memCheck
-
keyFactories
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
-
structureName
protected java.lang.String structureName
-
structureClass
protected java.lang.Class<? extends MetaIndex> structureClass
-
structureInputStreamClass
protected java.lang.Class<? extends java.util.Iterator> structureInputStreamClass
-
-
Constructor Detail
-
BaseMetaIndexBuilder
public BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
- Parameters:
_index -
_keyNames -
_valueLens -
_reverseKeys -
-
BaseMetaIndexBuilder
public BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String _structureName, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
- Parameters:
_index -
_structureName -
_keyNames -
_valueLens -
_reverseKeys -
-
-
Method Detail
-
writeDocumentEntry
public void writeDocumentEntry(java.util.Map<java.lang.String,java.lang.String> data) throws java.io.IOException
Write out metadata for current document, extracted from the specified map. Typically, the MetaIndexBuilder will know which keys from data it is interested in.
- Specified by:
writeDocumentEntry
in class MetaIndexBuilder
- Throws:
java.io.IOException
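To illustrate the map-based variant, here is a minimal sketch (not Terrier code) of how values for the configured keys could be picked out of a map in key order, ignoring entries the builder is not interested in. The class and method names are hypothetical, and storing an empty string for a missing key is an assumption rather than documented behaviour.

```java
import java.util.Map;

/** Hypothetical helper; not part of Terrier. */
class MetaEntrySketch {
    /** Picks the value for each configured key, in the order of keyNames. */
    static String[] selectValues(String[] keyNames, Map<String, String> data) {
        String[] values = new String[keyNames.length];
        for (int i = 0; i < keyNames.length; i++) {
            // Assumption: a key missing from the map is stored as an empty string.
            String v = data.get(keyNames[i]);
            values[i] = (v == null) ? "" : v;
        }
        return values;
    }
}
```

The resulting array has one slot per configured key, which is the shape the String[] overload of writeDocumentEntry expects.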
-
writeDocumentEntry
public void writeDocumentEntry(java.lang.String[] data) throws java.io.IOException
Write out metadata for current document. Values for all keys are specified.
- Specified by:
writeDocumentEntry
in class MetaIndexBuilder
- Throws:
java.io.IOException
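The metaindex.compressed.crop.long property describes overlong values being either rejected or cropped. A minimal sketch of that decision for one value follows. It is not Terrier code: the names are hypothetical, padding short values with spaces is only suggested by the field names valueLensChars and spaces, the exception type is an assumption, and the sketch counts characters and ignores multi-byte encodings.

```java
/** Hypothetical helper; not part of Terrier. */
class FixedLengthSketch {
    /**
     * Returns value as exactly maxChars characters: padded with spaces if short,
     * cropped if too long and cropLong is true, rejected otherwise.
     */
    static String fit(String value, int maxChars, boolean cropLong) {
        if (value.length() > maxChars) {
            if (!cropLong) {
                // Assumption: the real builder reports an error here in some form.
                throw new IllegalArgumentException(
                    "value of length " + value.length() + " exceeds limit " + maxChars);
            }
            return value.substring(0, maxChars);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < maxChars) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```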
-
writeData
protected abstract int writeData(byte[] data) throws java.io.IOException
- Throws:
java.io.IOException
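writeData is the hook that the compressed and uncompressed subclasses fill in. The sketch below models only that hook with stand-in classes, one writing bytes as-is and one compressing them with java.util.zip.Deflater. All class names are hypothetical, the assumption that the return value is the number of bytes written is not confirmed by this page, and the actual subclasses may use different codecs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;

/** Hypothetical stand-in for the builder; only the writeData hook is modelled. */
abstract class RecordWriterSketch {
    protected final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    protected final DataOutputStream dataOutput = new DataOutputStream(sink);

    /** Assumption: returns the number of bytes written for this record. */
    protected abstract int writeData(byte[] data) throws IOException;
}

/** Writes the record unchanged. */
class PlainWriterSketch extends RecordWriterSketch {
    @Override
    protected int writeData(byte[] data) throws IOException {
        dataOutput.write(data);
        return data.length;
    }
}

/** Compresses the record with Deflater before writing it. */
class DeflateWriterSketch extends RecordWriterSketch {
    @Override
    protected int writeData(byte[] data) throws IOException {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[256];
        int written = 0;
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            dataOutput.write(buffer, 0, n);
            written += n;
        }
        deflater.end(); // release native resources
        return written;
    }
}
```

Keeping the record layout in the base class and only the byte-level write in subclasses is the usual template-method arrangement, which matches the abstract signature shown above.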
-
flush
public void flush() throws java.io.IOException
- Specified by:
flush
in interface java.io.Flushable
- Throws:
java.io.IOException
-
close
public void close() throws java.io.IOException
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
- Throws:
java.io.IOException
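Because the class implements java.io.Closeable (and therefore java.lang.AutoCloseable), an instance can be managed with try-with-resources. The sketch below shows the pattern with a hypothetical stand-in, DemoBuilder, that only records which calls were made; it is not a Terrier class, and constructing a real builder needs an IndexOnDisk that is out of scope here.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in with the same Closeable/Flushable contract; not Terrier code. */
class DemoBuilder implements Closeable, Flushable {
    final List<String> events = new ArrayList<>();

    void writeDocumentEntry(String[] data) throws IOException {
        events.add("write:" + String.join(",", data));
    }

    @Override
    public void flush() throws IOException {
        events.add("flush");
    }

    @Override
    public void close() throws IOException {
        events.add("close");
    }
}

class DemoBuilderUsage {
    static List<String> run() throws IOException {
        DemoBuilder builder = new DemoBuilder();
        // try-with-resources calls close() on exit, even if a write throws.
        try (DemoBuilder b = builder) {
            b.writeDocumentEntry(new String[] {"doc1", "http://example.org/1"});
            b.flush();
        }
        return builder.events;
    }
}
```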
-
-