LexiconBuilder (Terrier Information Retrieval Platform version 1.1.1 API Specification)

Terrier IR Platform
1.1.1

uk.ac.gla.terrier.structures.indexing
Class LexiconBuilder

java.lang.Object
  uk.ac.gla.terrier.structures.indexing.LexiconBuilder

Direct Known Subclasses: BlockLexiconBuilder, UTFLexiconBuilder

public class LexiconBuilder
extends java.lang.Object

Builds temporary lexicons while a collection is being indexed, and merges them once the indexing of the collection has finished.

Version: $Revision: 1.36 $
Author: Craig Macdonald & Vassilis Plachouras

Constructor Summary
`LexiconBuilder()` A default constructor of the class.
`LexiconBuilder(java.lang.String pathname, java.lang.String prefix)` Creates an instance of the class, given the path to save the temporary lexicons.

Method Summary
`void`	`addDocumentTerms(DocumentPostingList terms)` Adds the terms of a document to the temporary lexicon in memory.
`void`	`addDocumentTerms(FieldDocumentTreeNode[] terms)` Adds the terms of a document to the temporary lexicon in memory.
`void`	`addTemporaryLexicon(java.lang.String filename)` If the application code generated lexicons itself, use this method to add them to the merge list. Otherwise, do not call this method.
`void`	`createLexiconHash(LexiconInputStream lexStream)`
`static void`	`createLexiconHash(LexiconInputStream lexStream, java.lang.String path, java.lang.String prefix)` This method reads the lexicon and finds the entries at which the initial letter changes.
`void`	`createLexiconIndex(LexiconInputStream lexicon, int lexiconEntries, int lexiconEntrySize)` Creates the lexicon index file that contains a mapping from the given term id to the offset in the lexicon, in order to be able to retrieve the term information according to the term identifier.
`static void`	`createLexiconIndex(LexiconInputStream lexicon, int lexiconEntries, int lexiconEntrySize, java.lang.String path, java.lang.String prefix)`
`void`	`finishedDirectIndexBuild()` Processes the lexicon after the direct and document indexes have been created.
`void`	`finishedInvertedIndexBuild()` Processes the lexicon after the inverted index has been created.
`int`	`getFinalNumberOfTerms()` Returns the number of terms in the final lexicon.
`LexiconInputStream`	`getLexInputStream(java.lang.String filename)`
`LexiconOutputStream`	`getLexOutputStream(java.lang.String filename)`
`static void`	`main(java.lang.String[] args)`
`void`	`merge(java.util.LinkedList<java.lang.String> filesToMerge)` Merges the intermediate lexicon files created during indexing.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

LexiconBuilder

public LexiconBuilder()

A default constructor of the class. The lexicon is built using the default path and file prefix: ApplicationSetup.TERRIER_INDEX_PATH and ApplicationSetup.TERRIER_INDEX_PREFIX respectively.

LexiconBuilder

public LexiconBuilder(java.lang.String pathname,
                      java.lang.String prefix)

Creates an instance of the class, given the path to save the temporary lexicons.

Parameters: pathname - String the path to save the temporary lexicons.

Method Detail

getFinalNumberOfTerms

public int getFinalNumberOfTerms()

Returns the number of terms in the final lexicon. This value is only updated once finishedDirectIndexBuild() has executed.

addTemporaryLexicon

public void addTemporaryLexicon(java.lang.String filename)

If the application code generated lexicons itself, use this method to add them to the merge list. Otherwise, do not call this method.

Parameters: filename - Full path to a lexicon to merge

addDocumentTerms

public void addDocumentTerms(FieldDocumentTreeNode[] terms)

Adds the terms of a document to the temporary lexicon in memory.

Parameters: terms - FieldDocumentTreeNode[] the terms of the document to add to the temporary lexicon in memory.

addDocumentTerms

public void addDocumentTerms(DocumentPostingList terms)

Adds the terms of a document to the temporary lexicon in memory.

Parameters: terms - DocumentPostingList the terms of the document to add to the temporary lexicon
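
As an illustration of what adding a document's terms to an in-memory temporary lexicon can involve, here is a minimal sketch in plain Java. It does not use Terrier's classes: `TemporaryLexiconSketch`, the `Map<String, Integer>` input standing in for `DocumentPostingList`, and the two per-term counters are all assumptions made for this example, not the library's actual data layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a temporary lexicon held in memory.
class TemporaryLexiconSketch {
    // term -> {number of documents containing the term, total occurrences}
    // A TreeMap keeps the terms in lexicographical order.
    private final TreeMap<String, int[]> entries = new TreeMap<>();

    // Adds one document's terms, given as term -> in-document frequency.
    void addDocumentTerms(Map<String, Integer> docTerms) {
        for (Map.Entry<String, Integer> e : docTerms.entrySet()) {
            int[] stats = entries.computeIfAbsent(e.getKey(), k -> new int[2]);
            stats[0] += 1;            // one more document contains this term
            stats[1] += e.getValue(); // add the occurrences in this document
        }
    }

    int documentFrequency(String term) {
        int[] stats = entries.get(term);
        return stats == null ? 0 : stats[0];
    }

    int termFrequency(String term) {
        int[] stats = entries.get(term);
        return stats == null ? 0 : stats[1];
    }

    int numberOfTerms() {
        return entries.size();
    }

    public static void main(String[] args) {
        TemporaryLexiconSketch lexicon = new TemporaryLexiconSketch();
        Map<String, Integer> doc = new HashMap<>();
        doc.put("index", 2);
        doc.put("term", 1);
        lexicon.addDocumentTerms(doc);
        System.out.println(lexicon.numberOfTerms() + " distinct terms");
    }
}
```

Keeping the entries sorted by term means that, under these assumptions, a temporary lexicon could be written out in lexicographical order without a separate sort.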

finishedInvertedIndexBuild

public void finishedInvertedIndexBuild()

Processes the lexicon after the inverted index has been created.

finishedDirectIndexBuild

public void finishedDirectIndexBuild()

Processes the lexicon after the direct and document indexes have been created.

merge

public void merge(java.util.LinkedList<java.lang.String> filesToMerge)
           throws java.io.IOException

Merges the intermediate lexicon files created during indexing.

Parameters: filesToMerge - java.util.LinkedList the list containing the filenames of the temporary files.
Throws: java.io.IOException - an input/output exception is thrown if a problem is encountered.
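
The merging idea can be sketched in plain Java as follows. This is an illustration only, under stated assumptions: it merges in-memory lists rather than files, and the `LexiconMergeSketch` class, its `Entry` type, and the two counters per entry are hypothetical names introduced here, not part of Terrier. Each input list is assumed to be sorted by term, and the statistics of a term that appears in several inputs are summed.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: merging sorted temporary lexicons held in memory.
class LexiconMergeSketch {
    static final class Entry {
        final String term;
        final int docFreq;
        final int termFreq;

        Entry(String term, int docFreq, int termFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.termFreq = termFreq;
        }
    }

    // Merges two lexicons that are each sorted by term.
    static List<Entry> mergeTwo(List<Entry> a, List<Entry> b) {
        List<Entry> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            Entry x = a.get(i), y = b.get(j);
            int cmp = x.term.compareTo(y.term);
            if (cmp < 0) {
                out.add(x);
                i++;
            } else if (cmp > 0) {
                out.add(y);
                j++;
            } else {
                // Same term in both inputs: combine the statistics.
                out.add(new Entry(x.term, x.docFreq + y.docFreq,
                                  x.termFreq + y.termFreq));
                i++;
                j++;
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Merges pairs from the front of the queue until one lexicon remains.
    static List<Entry> mergeAll(LinkedList<List<Entry>> toMerge) {
        LinkedList<List<Entry>> queue = new LinkedList<>(toMerge);
        if (queue.isEmpty()) return new ArrayList<>();
        while (queue.size() > 1) {
            List<Entry> merged = mergeTwo(queue.removeFirst(), queue.removeFirst());
            queue.addLast(merged);
        }
        return queue.getFirst();
    }

    public static void main(String[] args) {
        LinkedList<List<Entry>> parts = new LinkedList<>();
        parts.add(List.of(new Entry("apple", 1, 2), new Entry("cat", 1, 1)));
        parts.add(List.of(new Entry("bat", 1, 1), new Entry("cat", 2, 5)));
        for (Entry e : mergeAll(parts)) {
            System.out.println(e.term + " " + e.docFreq + " " + e.termFreq);
        }
    }
}
```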

createLexiconIndex

public void createLexiconIndex(LexiconInputStream lexicon,
                               int lexiconEntries,
                               int lexiconEntrySize)
                        throws java.io.IOException

Creates the lexicon index file that contains a mapping from the given term id to the offset in the lexicon, in order to be able to retrieve the term information according to the term identifier. This is necessary, because the terms in the lexicon file are saved in lexicographical order, and we also want to have fast access based on their term identifier.

Parameters: lexicon - The input stream of the lexicon that we are creating the lexid file for; lexiconEntries - The number of entries in this lexicon; lexiconEntrySize - The size of one entry in this lexicon
Throws: java.io.IOException - if there is an input/output error.
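
To make the term id to offset mapping concrete, here is a minimal sketch in plain Java. It is not Terrier's implementation: the `LexiconIndexSketch` class and `buildIndex` method are hypothetical, and the sketch assumes fixed-size lexicon entries and term ids that are dense in the range 0 to lexiconEntries - 1.

```java
// Hypothetical sketch of a lexicon index: term id -> byte offset in the lexicon.
class LexiconIndexSketch {
    // termIdsInLexiconOrder[pos] is the term id stored in the entry at
    // position pos of the lexicographically sorted lexicon.
    static long[] buildIndex(int[] termIdsInLexiconOrder, int entrySize) {
        long[] offsets = new long[termIdsInLexiconOrder.length];
        for (int pos = 0; pos < termIdsInLexiconOrder.length; pos++) {
            // Entries have a fixed size, so the offset is position * size.
            offsets[termIdsInLexiconOrder[pos]] = (long) pos * entrySize;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Lexicon order: "apple" (id 2), "banana" (id 0), "cherry" (id 1).
        long[] offsets = buildIndex(new int[] {2, 0, 1}, 10);
        for (int id = 0; id < offsets.length; id++) {
            System.out.println("term id " + id + " -> offset " + offsets[id]);
        }
    }
}
```

With such a table, looking up a term by its identifier is a single array access followed by a seek, instead of a scan over the lexicographically ordered entries.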

createLexiconIndex

public static void createLexiconIndex(LexiconInputStream lexicon,
                                      int lexiconEntries,
                                      int lexiconEntrySize,
                                      java.lang.String path,
                                      java.lang.String prefix)
                               throws java.io.IOException

Throws: java.io.IOException

createLexiconHash

public void createLexiconHash(LexiconInputStream lexStream)

createLexiconHash

public static void createLexiconHash(LexiconInputStream lexStream,
                                     java.lang.String path,
                                     java.lang.String prefix)

This method reads the lexicon and finds the entries at which the initial letter changes. The offsets of these entries are used to speed up the binary search performed during retrieval.
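
The following plain-Java sketch shows one way first-letter boundaries can narrow a binary search. It is an illustration under assumptions, not Terrier's code: `LexiconHashSketch`, `buildHash` and `find` are hypothetical names, the lexicon is represented as a sorted in-memory array of non-empty terms, and array positions stand in for file offsets.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-letter ranges that bound a binary search.
class LexiconHashSketch {
    // Maps each initial letter to {first index, index after the last entry}
    // of the terms starting with that letter. Input must be sorted.
    static Map<Character, int[]> buildHash(String[] sortedTerms) {
        Map<Character, int[]> ranges = new HashMap<>();
        for (int i = 0; i < sortedTerms.length; i++) {
            char first = sortedTerms[i].charAt(0);
            int[] range = ranges.get(first);
            if (range == null) {
                ranges.put(first, new int[] {i, i + 1}); // letter changes here
            } else {
                range[1] = i + 1; // extend the range for this letter
            }
        }
        return ranges;
    }

    // Returns the position of the term, or -1 if it is not in the lexicon.
    static int find(String[] sortedTerms, Map<Character, int[]> ranges, String term) {
        if (term.isEmpty()) return -1;
        int[] range = ranges.get(term.charAt(0));
        if (range == null) return -1; // no term starts with this letter
        // Search only among the terms sharing the same initial letter.
        int pos = Arrays.binarySearch(sortedTerms, range[0], range[1], term);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "avocado", "banana", "cherry", "cranberry"};
        Map<Character, int[]> ranges = buildHash(terms);
        System.out.println("cherry found at position " + find(terms, ranges, "cherry"));
    }
}
```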

main

public static void main(java.lang.String[] args)

getLexInputStream

public LexiconInputStream getLexInputStream(java.lang.String filename)

getLexOutputStream

public LexiconOutputStream getLexOutputStream(java.lang.String filename)
