Package org.terrier.structures.indexing
Class BaseMetaIndexBuilder
- java.lang.Object
  - org.terrier.structures.indexing.MetaIndexBuilder
    - org.terrier.structures.indexing.BaseMetaIndexBuilder
- All Implemented Interfaces:
java.io.Closeable, java.io.Flushable, java.lang.AutoCloseable
- Direct Known Subclasses:
CompressingMetaIndexBuilder, LZ4MetaIndexBuilder, UncompressedMetaIndexBuilder, ZstdMetaIndexBuilder
public abstract class BaseMetaIndexBuilder extends MetaIndexBuilder implements java.io.Flushable
Abstract base class for compressed and uncompressed metaindex building.
Properties:
- metaindex.compressed.max.data.in-mem.mb - maximum size up to which a meta index .zdata file will be kept in memory. Defaults to 400 (MB).
- metaindex.compressed.max.index.in-mem.mb - maximum size up to which a meta index .idx file will be kept in memory. Defaults to 100 (MB).
- metaindex.compressed.reverse.allow.duplicates - set this property to true to suppress errors when a reverse meta value is not unique. Defaults to false.
- metaindex.compressed.crop.long - set this property to true to suppress errors for overlong document metadata, which will instead be cropped.
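As an illustration of how the four properties and their documented defaults fit together, the following minimal sketch resolves them from a plain java.util.Properties object. The class MetaIndexSettings is hypothetical and not part of Terrier; Terrier's own property lookup may differ, and the false default for metaindex.compressed.crop.long is an assumption, since the description above does not state one.

```java
import java.util.Properties;

/** Hypothetical holder for the four settings; not part of Terrier. */
class MetaIndexSettings {
    final int maxDataMbInMem;
    final int maxIndexMbInMem;
    final boolean reverseAllowDuplicates;
    final boolean cropLong;

    MetaIndexSettings(Properties p) {
        // Documented defaults: 400 MB for the data file, 100 MB for the index file.
        maxDataMbInMem = Integer.parseInt(
            p.getProperty("metaindex.compressed.max.data.in-mem.mb", "400"));
        maxIndexMbInMem = Integer.parseInt(
            p.getProperty("metaindex.compressed.max.index.in-mem.mb", "100"));
        // Documented default: false.
        reverseAllowDuplicates = Boolean.parseBoolean(
            p.getProperty("metaindex.compressed.reverse.allow.duplicates", "false"));
        // Assumption: no default is documented for crop.long; false is assumed here.
        cropLong = Boolean.parseBoolean(
            p.getProperty("metaindex.compressed.crop.long", "false"));
    }
}
```

With an empty Properties object, this sketch yields 400, 100, false and false; setting metaindex.compressed.crop.long to true changes only the last value.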
- Since:
- 3.0
- Author:
- Craig Macdonald & Vassilis Plachouras
-
-
Field Summary
Fields (modifier and type, then field name)
- protected java.io.ByteArrayOutputStream baos
- protected byte[] compressedBuffer
- protected boolean CROP_LONG
- protected long currentIndexOffset
- protected long currentOffset
- protected java.io.DataOutputStream dataOutput
- protected int DOCS_PER_CHECK
- protected int entryCount
- protected int entryLengthBytes
- protected IndexOnDisk index
- protected java.io.DataOutputStream indexOutput
- protected gnu.trove.TObjectIntHashMap<java.lang.String> key2Index
- protected int keyCount
- protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
- protected java.lang.String[] keyNames
- protected java.lang.String[] lastValues
- protected org.slf4j.Logger logger
- protected int MAX_INDEX_MB_IN_MEM_RETRIEVAL
- protected int MAX_MB_IN_MEM_RETRIEVAL
- protected MemoryChecker memCheck
- protected boolean REVERSE_ALLOW_DUPS
- protected int REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
- protected java.lang.String[] reverseKeyNames
- protected int[] reverseKeys
- protected FSOrderedMapFile.MapFileWriter[] reverseWriters
- protected byte[] spaces
- protected java.lang.Class<? extends MetaIndex> structureClass
- protected java.lang.Class<? extends java.util.Iterator> structureInputStreamClass
- protected java.lang.String structureName
- protected int[] valueLensBytes
- protected int[] valueLensChars
- protected boolean[] valuesSorted
-
Constructor Summary
Constructors
- BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys) - constructor
- BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String _structureName, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys) - constructor
-
Method Summary
Methods
- void close()
- void flush()
- protected abstract int writeData(byte[] data)
- void writeDocumentEntry(java.lang.String[] data) - Write out metadata for current document.
- void writeDocumentEntry(java.util.Map<java.lang.String,java.lang.String> data) - Write out metadata for current document, extracted from specified map. Typically, the MetaIndexBuilder will know which keys in data it is interested in.
Methods inherited from class org.terrier.structures.indexing.MetaIndexBuilder
create
-
Field Detail
-
logger
protected final org.slf4j.Logger logger
-
MAX_MB_IN_MEM_RETRIEVAL
protected final int MAX_MB_IN_MEM_RETRIEVAL
-
MAX_INDEX_MB_IN_MEM_RETRIEVAL
protected final int MAX_INDEX_MB_IN_MEM_RETRIEVAL
-
REVERSE_ALLOW_DUPS
protected final boolean REVERSE_ALLOW_DUPS
-
CROP_LONG
protected final boolean CROP_LONG
-
REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
protected final int REVERSE_KEY_LOOKUP_WRITING_BUFFER_SIZE
- See Also:
- Constant Field Values
-
DOCS_PER_CHECK
protected final int DOCS_PER_CHECK
-
key2Index
protected final gnu.trove.TObjectIntHashMap<java.lang.String> key2Index
-
dataOutput
protected java.io.DataOutputStream dataOutput
-
keyNames
protected final java.lang.String[] keyNames
-
keyCount
protected final int keyCount
-
baos
protected java.io.ByteArrayOutputStream baos
-
indexOutput
protected java.io.DataOutputStream indexOutput
-
compressedBuffer
protected byte[] compressedBuffer
-
index
protected IndexOnDisk index
-
valueLensChars
protected int[] valueLensChars
-
valueLensBytes
protected int[] valueLensBytes
-
spaces
protected byte[] spaces
-
entryLengthBytes
protected int entryLengthBytes
-
currentOffset
protected long currentOffset
-
currentIndexOffset
protected long currentIndexOffset
-
entryCount
protected int entryCount
-
reverseKeys
protected int[] reverseKeys
-
reverseKeyNames
protected java.lang.String[] reverseKeyNames
-
reverseWriters
protected FSOrderedMapFile.MapFileWriter[] reverseWriters
-
valuesSorted
protected boolean[] valuesSorted
-
lastValues
protected java.lang.String[] lastValues
-
memCheck
protected MemoryChecker memCheck
-
keyFactories
protected FixedSizeWriteableFactory<org.apache.hadoop.io.Text>[] keyFactories
-
structureName
protected java.lang.String structureName
-
structureClass
protected java.lang.Class<? extends MetaIndex> structureClass
-
structureInputStreamClass
protected java.lang.Class<? extends java.util.Iterator> structureInputStreamClass
-
-
Constructor Detail
-
BaseMetaIndexBuilder
public BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
- Parameters:
_index, _keyNames, _valueLens, _reverseKeys
-
BaseMetaIndexBuilder
public BaseMetaIndexBuilder(IndexOnDisk _index, java.lang.String _structureName, java.lang.String[] _keyNames, int[] _valueLens, java.lang.String[] _reverseKeys)
constructor
- Parameters:
_index, _structureName, _keyNames, _valueLens, _reverseKeys
-
-
Method Detail
-
writeDocumentEntry
public void writeDocumentEntry(java.util.Map<java.lang.String,java.lang.String> data) throws java.io.IOException
Write out metadata for current document, extracted from specified map. Typically, the MetaIndexBuilder will know which keys in data it is interested in.
- Specified by:
writeDocumentEntry in class MetaIndexBuilder
- Throws:
java.io.IOException
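The idea that the builder selects only the keys it is interested in can be sketched as follows. This is a minimal, self-contained illustration rather than Terrier's implementation: the class MetaEntryMapper and its method are hypothetical, and mapping an absent key to the empty string is an assumption made for the example.

```java
import java.util.Map;

/** Hypothetical helper; not part of Terrier. */
class MetaEntryMapper {
    /**
     * Picks the values for keyNames, in key order, from the supplied map.
     * Keys in the map that are not listed in keyNames are ignored.
     * Assumption: a key missing from the map is stored as the empty string.
     */
    static String[] toOrderedValues(String[] keyNames, Map<String, String> data) {
        String[] values = new String[keyNames.length];
        for (int i = 0; i < keyNames.length; i++) {
            String v = data.get(keyNames[i]);
            values[i] = (v == null) ? "" : v;
        }
        return values;
    }
}
```

The resulting array has one value per key, which is the shape accepted by the String[] overload of writeDocumentEntry.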
-
writeDocumentEntry
public void writeDocumentEntry(java.lang.String[] data) throws java.io.IOException
Write out metadata for current document. Values for all keys are specified.
- Specified by:
writeDocumentEntry in class MetaIndexBuilder
- Throws:
java.io.IOException
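The metaindex.compressed.crop.long property in the class description implies a length check on each value before it is written. The sketch below shows one way such a check could look. It is an assumption-laden illustration, not Terrier's code: the class MetaValueCropper is hypothetical, lengths are measured in characters, and IllegalArgumentException stands in for whatever error the real builder reports.

```java
/** Hypothetical helper; not part of Terrier. */
class MetaValueCropper {
    /**
     * Returns a copy of values in which every entry fits its maximum length.
     * If cropLong is false, an overlong value is reported as an error;
     * if cropLong is true, it is truncated to the maximum length instead.
     */
    static String[] checkLengths(String[] values, int[] maxChars, boolean cropLong) {
        if (values.length != maxChars.length) {
            throw new IllegalArgumentException("Expected " + maxChars.length
                + " values but got " + values.length);
        }
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            String v = values[i];
            if (v.length() > maxChars[i]) {
                if (!cropLong) {
                    throw new IllegalArgumentException("Value " + i + " exceeds "
                        + maxChars[i] + " characters");
                }
                v = v.substring(0, maxChars[i]);
            }
            out[i] = v;
        }
        return out;
    }
}
```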
-
writeData
protected abstract int writeData(byte[] data) throws java.io.IOException
- Throws:
java.io.IOException
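Because writeData is the abstract hook that the compressed and uncompressed subclasses supply, a small stand-alone sketch may help show the pattern. Everything here is hypothetical: RecordWriterSketch mirrors only the method's shape, the return value is assumed to be the number of bytes written to the data output (the documentation does not say), and java.util.zip.Deflater is used purely as an example codec.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;

/** Hypothetical stand-in for the abstract base; not part of Terrier. */
abstract class RecordWriterSketch {
    protected final DataOutputStream dataOutput;

    RecordWriterSketch(DataOutputStream dataOutput) {
        this.dataOutput = dataOutput;
    }

    /** Assumed contract: writes one record and returns the bytes written. */
    protected abstract int writeData(byte[] data) throws IOException;
}

/** Writes each record as-is. */
class PlainRecordWriter extends RecordWriterSketch {
    PlainRecordWriter(DataOutputStream out) { super(out); }

    @Override
    protected int writeData(byte[] data) throws IOException {
        dataOutput.write(data);
        return data.length;
    }
}

/** Compresses each record independently with DEFLATE before writing. */
class DeflateRecordWriter extends RecordWriterSketch {
    private final Deflater deflater = new Deflater();
    private final byte[] buffer = new byte[1024];

    DeflateRecordWriter(DataOutputStream out) { super(out); }

    @Override
    protected int writeData(byte[] data) throws IOException {
        deflater.reset();
        deflater.setInput(data);
        deflater.finish();
        int total = 0;
        // Drain the compressor until the whole record has been emitted.
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            dataOutput.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

In this sketch the caller can add the returned count to a running offset, which is one plausible use for fields such as currentOffset, though that link is an assumption.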
-
flush
public void flush() throws java.io.IOException
- Specified by:
flush in interface java.io.Flushable
- Throws:
java.io.IOException
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
-