In addition to traditional on-disk indices, Terrier provides both memory-only and hybrid memory+disk index structures that can be updated dynamically with new documents over time. Since Terrier 4.0, the top-level Index class has been abstract, so that different types of indices can be supported. The pre-Terrier 4.0 index functionality is contained within the IndexOnDisk class, while the new index types enable search systems that can be updated in real time, without a lengthy batch indexing process.
To support real-time indexing, two new interfaces have been defined, namely UpdatableIndex and WritableIndex. An index class that implements UpdatableIndex supports the dynamic addition of new documents via an indexDocument() method: the document is added to the index immediately and is searchable as soon as indexDocument() returns. The WritableIndex interface represents an index that can be written to disk. In particular, a class that implements WritableIndex provides a write() method that converts each of its index structures into the equivalent on-disk structure and writes them out to a specified path under a named prefix. An index written in this way can later be loaded as an IndexOnDisk index.
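As a rough illustration of the contract these two interfaces describe, here is a self-contained toy in plain Java. It does not use Terrier: ToyUpdatableIndex, ToyWritableIndex, ToyMemoryIndex, search() and load() are hypothetical names, documents are plain strings, and write(path, prefix) is assumed to produce a single <prefix>.postings text file, which is far simpler than Terrier's real on-disk format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for Terrier's UpdatableIndex.
interface ToyUpdatableIndex {
    // Adds a document; it must be searchable as soon as this call returns.
    int indexDocument(String content);
}

// Hypothetical, simplified stand-in for Terrier's WritableIndex.
interface ToyWritableIndex {
    // Converts the in-memory structures into an on-disk file under path/prefix.
    void write(String path, String prefix) throws IOException;
}

// A memory-only inverted index: term -> sorted set of document ids.
class ToyMemoryIndex implements ToyUpdatableIndex, ToyWritableIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private int numDocs = 0;

    @Override
    public int indexDocument(String content) {
        int docid = numDocs++;
        // Naive tokenisation: lower-case, split on non-alphanumeric characters.
        for (String term : content.toLowerCase().split("[^a-z0-9]+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docid);
        }
        return docid;
    }

    // Returns the ids of the documents containing a single query term.
    List<Integer> search(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.<Integer>emptyList() : new ArrayList<>(docs);
    }

    @Override
    public void write(String path, String prefix) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(Integer.toString(numDocs)); // first line: document count
        for (Map.Entry<String, SortedSet<Integer>> e : postings.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            for (int docid : e.getValue()) sb.append(' ').append(docid);
            lines.add(sb.toString()); // one line per term: "term id id ..."
        }
        Files.write(Paths.get(path, prefix + ".postings"), lines, StandardCharsets.UTF_8);
    }

    // Reloads a written index (playing the role that IndexOnDisk plays in Terrier).
    static ToyMemoryIndex load(String path, String prefix) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path, prefix + ".postings"), StandardCharsets.UTF_8);
        ToyMemoryIndex index = new ToyMemoryIndex();
        index.numDocs = Integer.parseInt(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            String[] parts = line.split(" ");
            SortedSet<Integer> docs = new TreeSet<>();
            for (int i = 1; i < parts.length; i++) docs.add(Integer.parseInt(parts[i]));
            index.postings.put(parts[0], docs);
        }
        return index;
    }
}
```

Because indexDocument() updates the postings map directly, a new document is visible to search() as soon as the call returns; write() followed by load() mirrors the idea that a written-out index can later be reopened from disk.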
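Terrier 4.0 supports two real-time index structures: a memory-only index (MemoryIndex) and a hybrid memory+disk index. The following example indexes a single document into a MemoryIndex and then searches it: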
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.terrier.indexing.FileDocument;
import org.terrier.indexing.tokenisation.Tokeniser;
import org.terrier.matching.ResultSet;
import org.terrier.querying.Manager;
import org.terrier.querying.SearchRequest;
import org.terrier.realtime.memory.MemoryIndex;

// define an example document and query
String docContent = "Real-time indexing and retrieval is easy to use in Terrier";
String query = "Indexing";

// create a new, empty in-memory index
MemoryIndex memIndex = new MemoryIndex();

// get the default tokeniser to break the document down into words
Tokeniser tokeniser = Tokeniser.getTokeniser();

// create a Terrier document from the content string
Reader contentReader = new StringReader(docContent);
Map<String, String> documentProperties = new HashMap<String, String>();
FileDocument document = new FileDocument(contentReader, documentProperties, tokeniser);

// index the document; it is searchable as soon as indexDocument() returns
memIndex.indexDocument(document);

// create a search manager (runs the search process over an index)
Manager queryingManager = new Manager(memIndex);

// a search request represents the search to be carried out
SearchRequest srq = queryingManager.newSearchRequest("query", query);
srq.setOriginalQuery(query);

// define a matching model, in this case the classical BM25 retrieval model
srq.addMatchingModel("Matching", "BM25");

// run the four stages of a Terrier search
queryingManager.runPreProcessing(srq);
queryingManager.runMatching(srq);
queryingManager.runPostProcessing(srq);
queryingManager.runPostFilters(srq);
ResultSet results = srq.getResultSet();
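For readers unfamiliar with the BM25 model selected via addMatchingModel("Matching", "BM25"), the sketch below shows the classic BM25 scoring formula in plain Java. It is an illustration, not Terrier's implementation: the class name Bm25Sketch is hypothetical, k1 = 1.2 and b = 0.75 are common textbook defaults, and the idf uses the common non-negative variant ln(1 + (N - df + 0.5) / (df + 0.5)); Terrier's own BM25 class may differ in these details.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone BM25 scorer; not Terrier's BM25 weighting model class.
final class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalisation

    // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
    static double idf(int numDocs, int df) {
        return Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
    }

    // Scores one tokenised document against the query terms, using the whole
    // collection to derive N, document frequencies and the average length.
    static double score(List<String> query, List<String> doc, List<List<String>> collection) {
        int numDocs = collection.size();
        double avgLen = collection.stream().mapToInt(List::size).average().orElse(0.0);
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) tf.merge(t, 1, Integer::sum);
        double score = 0.0;
        for (String term : query) {
            int f = tf.getOrDefault(term, 0);
            if (f == 0) continue; // terms absent from the document contribute nothing
            int df = 0;
            for (List<String> d : collection) if (d.contains(term)) df++;
            double norm = K1 * (1 - B + B * doc.size() / avgLen);
            score += idf(numDocs, df) * f * (K1 + 1) / (f + norm);
        }
        return score;
    }
}
```

Raising K1 lets repeated occurrences of a term keep adding to the score for longer, while setting B to 0 disables document-length normalisation entirely.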
Copyright © 2014 University of Glasgow | All Rights Reserved