Release Notes - Terrier Core - Version 3.0

Bug

  • [TR-39] - DocIDs are not assigned correctly during the reduce step of MapReduce Indexing.
  • [TR-46] - Multiple reducing ends up with a document index and a metaindex for ALL shards
  • [TR-47] - Inv2DirectMapReduce doesn't work for multiple reducers
  • [TR-48] - Field indexing is reported to not be correct.
  • [TR-52] - FSOrderedMapFile causes seek(-1) when searching for an entry less than the first.
  • [TR-53] - Rounding.toString() doesn't work for 10dp.
  • [TR-54] - Hadoop Indexing MetaIndex finishing made docids out by 2
  • [TR-55] - Singlepass indexing efficiency hindered by getMemoryConsumption() calls
  • [TR-56] - 2-way StructureMerger produces termids that are too large
  • [TR-57] - Inverted2DirectMultiReduce leaves the last document one token short
  • [TR-59] - Reset problem in Terrier evaluation tool
  • [TR-61] - Desktop example app should use MetaIndex
  • [TR-63] - Minor quickstart documentation updates
  • [TR-66] - TRECQuery needs to be refactored
  • [TR-70] - Printing of inverted/direct indices with fields support
  • [TR-72] - FSOrderedMapFile.EntryIterator.skip() breaks FSOrderedMapFile.EntryIterator.hasNext()
  • [TR-77] - MR InputFormat for MetaIndex processes one too many entries on last segment
  • [TR-78] - BitInputFormat: some minor changes
  • [TR-81] - Move a proximity score weighting model to Core
  • [TR-83] - Hadoop indexing: splits are uneven
  • [TR-84] - TRECQuery.hasMoreQueries() returns itself
  • [TR-87] - PorterStemmer doesn't match the expected output given by Porter himself
  • [TR-88] - MultiFileSplit.java from Hadoop 0.18 is in Terrier core
  • [TR-89] - Check all .java and .sh files have Terrier license header
  • [TR-90] - MatchingQueryTerms does not retain query term order
  • [TR-91] - WARC09Collection does not have InputStream constructor for MR indexing
  • [TR-92] - utility.io.CountingInputStream does not count single bytes correctly.
  • [TR-93] - Inv2DirectMultiReduce doesn't handle empty documents on reducer boundaries
  • [TR-94] - BitPostingIndexInputFormat tries to use negative offsets
  • [TR-95] - FSArrayFile.ArrayFileIterator.skip() doesn't update entry index correctly
  • [TR-97] - bin/*.sh scripts are not executable in the .tar.gz file
  • [TR-98] - In MR indexing, termids should be ascending and unique
  • [TR-102] - StructureMerger: new termids for terms only in 2nd lexicon are not used

Improvement

  • [TR-2] - Use ant to build Terrier
  • [TR-13] - Allow fields to contain count information
  • [TR-37] - Full support for direct file generation in Hadoop mode indexing
  • [TR-38] - MapReduce InputFormat for BitPostingIndexInputStream
  • [TR-40] - Enable Hadoop-mode Map Output Compression
  • [TR-41] - Hadoop Indexing loads CompressedMetaIndex into memory during reduce phase
  • [TR-44] - In singlepass indexing, checking for enough free memory is insufficient
  • [TR-50] - In MR indexing, corpus order is not retained.
  • [TR-64] - Examine IterablePosting interface from context of efficiency code
  • [TR-65] - Replace Terrier's Makefile with Ant build.xml
  • [TR-67] - Request object should contain the Index
  • [TR-69] - When indexing, support the ELSE field
  • [TR-71] - Allow BitInputStream structures to be split across multiple files
  • [TR-73] - Models which use FieldPosting.getFieldLength() are slow
  • [TR-74] - Improve efficiency of FieldPosting.getFieldLengths() for field-based models
  • [TR-79] - Refactor Inv2DirectMultiReduce for other purposes
  • [TR-80] - Move code to terrier.org Java package namespaces
  • [TR-85] - Collection should extend Iterator<Document>
  • [TR-86] - Matching should be an interface
  • [TR-100] - Revisit default and sample terrier.properties files
  • [TR-101] - Full pass over documentation

New Feature

  • [TR-15] - Support for per-document term weighting in query expansion
  • [TR-28] - Index WARC collections
  • [TR-36] - Index WARC collections
  • [TR-43] - Super-fields in single index support
  • [TR-45] - Add (read|write)(Delta|Golomb) etc to BitIn/BitOut
  • [TR-49] - Let TRECQuerying filename be predetermined by property
  • [TR-51] - Very large metaindex zdata files in memory
  • [TR-62] - Add caching support to Files API
  • [TR-68] - Implement Multinomial Field-based DFR Model
  • [TR-75] - Terrier should allow us to set the runtag in our runs
  • [TR-82] - Have a simple webapps search results interface
  • [TR-99] - Provide way to integrate static doc prior easily

Task

  • [TR-42] - Improved Index format and class changes
  • [TR-60] - Remove PonteCroft language modelling
  • [TR-96] - Move field-based models to Core

Test

  • [TR-58] - End to end Shakespeare tests don't test empty documents
  • [TR-76] - Bump JUnit to version 4.7