Terrier IR Platform 2.2.1
java.lang.Object
  uk.ac.gla.terrier.structures.indexing.InvertedIndexBuilder
public class InvertedIndexBuilder
Builds an inverted index. It optionally saves term-field information as well.
Algorithm:
Lexicon term selection:
There are two strategies for selecting the number of terms to read from the lexicon. The trade-off is
to read a small enough number of terms into memory that the occurrences of all those terms in
the direct file can fit in memory. On the other hand, reading fewer terms per iteration means more iterations,
which is I/O expensive, as the entire direct file has to be read on every iteration.
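The trade-off above can be illustrated with a small, self-contained sketch. It is not Terrier's implementation: the class `BatchedInversionSketch`, the `invert` method, the `termsPerPass` parameter and the in-memory `int[][]` stand-in for the direct file are all hypothetical names chosen for illustration. Each pass scans the whole direct data but keeps postings only for the current batch of terms, so a smaller batch lowers memory use at the cost of more passes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BatchedInversionSketch {

    /** Number of full passes over the direct data made by the last call to invert(). */
    static int passes;

    /**
     * Inverts a toy direct index (row = document id, values = term ids in that
     * document) while handling at most termsPerPass terms per pass.
     * Hypothetical helper, not part of Terrier.
     */
    static Map<Integer, List<Integer>> invert(int[][] direct, int numTerms, int termsPerPass) {
        Map<Integer, List<Integer>> inverted = new TreeMap<>();
        passes = 0;
        for (int start = 0; start < numTerms; start += termsPerPass) {
            int end = Math.min(start + termsPerPass, numTerms);
            // Only postings for terms in [start, end) are held in memory during this pass.
            Map<Integer, List<Integer>> batch = new TreeMap<>();
            for (int docId = 0; docId < direct.length; docId++) { // full scan of the direct data
                for (int termId : direct[docId]) {
                    if (termId >= start && termId < end) {
                        batch.computeIfAbsent(termId, k -> new ArrayList<>()).add(docId);
                    }
                }
            }
            inverted.putAll(batch); // stands in for appending the batch to the inverted file
            passes++;
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Three documents over a four-term lexicon (term ids 0..3).
        int[][] direct = { {0, 2}, {1, 2, 3}, {0, 3} };
        for (int termsPerPass : new int[] {1, 2, 4}) {
            Map<Integer, List<Integer>> inverted = invert(direct, 4, termsPerPass);
            System.out.println(termsPerPass + " terms/pass: " + passes + " passes, index " + inverted);
        }
    }
}
```

With the four-term toy lexicon, batches of 1, 2 and 4 terms need 4, 2 and 1 passes respectively, and every batch size produces the same inverted lists.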
The two strategies are:
Properties:
Field Summary | |
---|---|
int | numberOfDocuments: The number of documents in the collection. |
long | numberOfPointers: The number of pointers in the inverted file. |
long | numberOfTokens: The number of tokens in the collection. |
int | numberOfUniqueTerms: The number of unique terms in the vocabulary. |
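To make the four statistics concrete, here is a minimal sketch that computes them for a tiny in-memory collection. It assumes that a "pointer" means one (term, document) posting, which is the usual reading but is not spelled out on this page. The class `CollectionStatsSketch` and its `of` method are hypothetical and only mirror the field names listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CollectionStatsSketch {
    int numberOfDocuments;
    long numberOfTokens;
    long numberOfPointers;
    int numberOfUniqueTerms;

    /** Computes the statistics for documents given as arrays of tokens. Hypothetical helper. */
    static CollectionStatsSketch of(String[][] docs) {
        CollectionStatsSketch s = new CollectionStatsSketch();
        Set<String> vocabulary = new HashSet<>();
        for (String[] doc : docs) {
            s.numberOfDocuments++;
            s.numberOfTokens += doc.length; // every occurrence counts as a token
            Set<String> distinctInDoc = new HashSet<>(Arrays.asList(doc));
            // Assumption: one pointer (posting) per distinct term per document.
            s.numberOfPointers += distinctInDoc.size();
            vocabulary.addAll(distinctInDoc);
        }
        s.numberOfUniqueTerms = vocabulary.size();
        return s;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"to", "be", "or", "not", "to", "be"},
            {"to", "index", "or", "not"}
        };
        CollectionStatsSketch s = of(docs);
        System.out.println("documents=" + s.numberOfDocuments
                + " tokens=" + s.numberOfTokens
                + " pointers=" + s.numberOfPointers
                + " uniqueTerms=" + s.numberOfUniqueTerms);
    }
}
```

For the two sample documents this gives 2 documents, 10 tokens, 8 pointers and 5 unique terms: repeated words within a document add tokens but not pointers.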
Constructor Summary | |
---|---|
InvertedIndexBuilder() | Deprecated. |
InvertedIndexBuilder(Index i) | |
InvertedIndexBuilder(java.lang.String filename) | Deprecated. Use this() or this(String, String) |
InvertedIndexBuilder(java.lang.String Path, java.lang.String Prefix) | Deprecated. |
Method Summary | |
---|---|
void | close(): Closes the underlying bit file. |
void | createInvertedIndex(): Creates the inverted index using the already created direct index, document index and lexicon. |
static void | displayMemoryUsage(java.lang.Runtime r) |
LexiconInputStream | getLexInputStream(java.lang.String filename) |
LexiconOutputStream | getLexOutputStream(java.lang.String filename) |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public int numberOfUniqueTerms
public int numberOfDocuments
public long numberOfTokens
public long numberOfPointers
Constructor Detail |
---|
public InvertedIndexBuilder(java.lang.String Path, java.lang.String Prefix)
public InvertedIndexBuilder(Index i)
public InvertedIndexBuilder()
public InvertedIndexBuilder(java.lang.String filename)
Parameters:
filename - The name of the inverted file
Method Detail |
---|
public void close() throws java.io.IOException
Throws:
java.io.IOException
public void createInvertedIndex()
public static void displayMemoryUsage(java.lang.Runtime r)
public LexiconInputStream getLexInputStream(java.lang.String filename)
public LexiconOutputStream getLexOutputStream(java.lang.String filename)
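This page does not document what `displayMemoryUsage(java.lang.Runtime r)` reports. As a rough sketch only, assuming it reports JVM heap figures from the given `Runtime`, a comparable helper could look like the following. The class `MemoryUsageSketch` and the method `describeMemory` are hypothetical; only the `java.lang.Runtime` calls are standard Java.

```java
class MemoryUsageSketch {

    /** Formats heap figures from the given Runtime. Hypothetical helper, not Terrier's method. */
    static String describeMemory(Runtime r) {
        long total = r.totalMemory(); // heap currently reserved by the JVM
        long free = r.freeMemory();   // unused part of that reserved heap
        long max = r.maxMemory();     // upper bound the JVM will try to use
        long used = total - free;
        return "used=" + (used >> 10) + "KiB"
                + " free=" + (free >> 10) + "KiB"
                + " total=" + (total >> 10) + "KiB"
                + " max=" + (max >> 10) + "KiB";
    }

    public static void main(String[] args) {
        System.out.println(describeMemory(Runtime.getRuntime()));
    }
}
```

Logging figures like these before and after each pass is one way to check whether a chosen batch of lexicon terms actually fits in memory.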