Terrier Components |
On this page we will give an overview of Terrier's main components and their interaction.
The graphic below gives an overview of the interaction of the main components involved in the indexing process.
The graphic below gives an overview of the interaction of Terrier's components in the retrieval phase.
Here we provide a listing and brief description of Terrier's components.
Name | Description |
---|---|
Collection | This component encapsulates the most fundamental concept to indexing with Terrier - a Collection i.e. a set of documents. See uk.ac.gla.terrier.indexing.Collection for more details. |
Document | This component encapsulates the concept of a document. It is essentially an Iterator over terms in a document. See uk.ac.gla.terrier.indexing.Document for more details. |
TermPipeline | Models the concept of a component in a pipeline of term processors. Classes that implement this interface could be stemming algorithms, stopwords removers, or acronym expansion just to mention few examples. See uk.ac.gla.terrier.terms.TermPipeline for more details. |
Indexer | The component responsible for managing the indexing process. It instantiates TermPipelines and Builders. See uk.ac.gla.terrier.indexing.Indexer for more details. |
Builders | Builders are responsible for writing an index to disk. See uk.ac.gla.terrier.structures.indexing package for more details. |
Name | Description |
---|---|
BitFile | A highly compressed I/O layer using gamma and unary encodings. See uk.ac.gla.terrier.compression.BitFile for more details. |
Direct Index | The direct index stores the identifiers of terms that appear in each document and the corresponding frequencies. It is used for automatic query expansion, but can also be used for user profiling activities. See uk.ac.gla.terrier.structures.DirectIndex for more details. |
Document Index | The document index stores information about each document for example the document length and identifier, and a pointer to the corresponding entry in the direct index. See uk.ac.gla.terrier.structures.DocumentIndex for more details. |
Inverted Index | The inverted index stores the posting lists, i.e. the identifiers of the documents and their corresponding term frequencies. Moreover it is capable of storing the position of terms within a document. See uk.ac.gla.terrier.structures.InvertedIndex for more details. |
Lexicon | The lexicon stores the collection vocabulary and the corresponding document and term frequencies. See uk.ac.gla.terrier.structures.Lexicon for more details. |
Name | Description |
---|---|
Manager |
This component is responsible for handling/coordinating the main high-level
operations of a query. These are:
|
Matching | The matching component is responsible for determining which documents match a specific query and for scoring documents with respect to a query. See uk.ac.gla.terrier.matching.Matching for more details. |
Query | The matching component is responsible for determining which documents match a specific query and for scoring documents with respect to a query. See uk.ac.gla.terrier.querying.parser.Query for more details. |
Weighting Model | The Weighting model represents the retrieval model that is used to weight the terms of a document. See uk.ac.gla.terrier.matching.models.WeightingModel for more details. |
Document Score Modifiers | Responsible for query dependent modification document scores. See uk.ac.gla.terrier.matching.dsms package for more details. |
Term Score Modifiers | Modifies the scores of the documents for a given set of pointers, or postings. See uk.ac.gla.terrier.matching.tsms package for more details. |
Name | Description |
---|---|
Trec Terrier | An application that enables indexing and querying of TREC collections. See TrecTerrier for more details. |
Desktop Terrier | An application that allows for indexing and retrieval of local user content. See uk.ac.gla.terrier.applications.desktop package for more details. |
Copyright © 2015 University of Glasgow | All Rights Reserved