[TR-38] MapReduce InputFormat for BitPostingIndexInputStream Created: 07/May/09  Updated: 05/Mar/10  Resolved: 16/Jul/09

Status: Resolved
Project: Terrier Core
Component/s: .structures
Affects Version/s: 3.0
Fix Version/s: 3.0

Type: Improvement Priority: Major
Reporter: Craig Macdonald Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Attachments: File BitPostingIndexInputFormat.v1.patch, File BitPostingIndexInputFormat.v2.patch

Recent core changes have introduced generic PostingIndex and PostingIndexInputStream objects, which give access to IterablePostings. It would be good to have an InputFormat that splits the reading of a PostingIndex across multiple map tasks.
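
To illustrate the idea, here is a minimal sketch of how the entries of a posting file might be grouped into contiguous splits of roughly equal size. It is not the code in the attached patches: PostingSplitPlanner, Split and plan are hypothetical names, and the sketch assumes each entry starts at a known byte offset (a bit-level index would also have to carry a bit offset within the byte).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: group posting-list entries into contiguous, similarly sized splits. */
final class PostingSplitPlanner {

    /** A contiguous range of entries and the bytes of the posting file it covers. */
    static final class Split {
        final int firstEntry;   // index of the first entry in the split
        final int lastEntry;    // index of the last entry in the split (inclusive)
        final long startOffset; // byte offset where firstEntry's postings begin
        final long length;      // number of bytes covered by the split

        Split(int firstEntry, int lastEntry, long startOffset, long length) {
            this.firstEntry = firstEntry;
            this.lastEntry = lastEntry;
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /**
     * @param entryOffsets     byte offset at which each entry's postings start, ascending (assumed input)
     * @param fileLength       total length of the posting file in bytes
     * @param targetSplitBytes a split is closed once it covers at least this many bytes
     */
    static List<Split> plan(long[] entryOffsets, long fileLength, long targetSplitBytes) {
        if (targetSplitBytes <= 0) {
            throw new IllegalArgumentException("targetSplitBytes must be positive");
        }
        List<Split> splits = new ArrayList<>();
        int first = 0;
        for (int i = 0; i < entryOffsets.length; i++) {
            boolean isLastEntry = (i == entryOffsets.length - 1);
            // An entry ends where the next one begins, or at the end of the file.
            long end = isLastEntry ? fileLength : entryOffsets[i + 1];
            long size = end - entryOffsets[first];
            // Splits never cut an entry in half: they close only on entry boundaries.
            if (size >= targetSplitBytes || isLastEntry) {
                splits.add(new Split(first, i, entryOffsets[first], size));
                first = i + 1;
            }
        }
        return splits;
    }
}
```

Each map task would then open the posting stream at its split's startOffset and read entries firstEntry through lastEntry. Closing splits only on entry boundaries keeps every posting list within a single task, at the cost of splits that are not exactly equal in size.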

Comment by Craig Macdonald [ 07/May/09 ]

This would have benefits in the following scenarios:

  • If PostingIndex is used as a LinkServer, link-analysis processing of the index could easily be split across map tasks.
  • Inversion of indices (i.e. Inverted -> Direct; Direct -> Inverted) could be split to run in parallel.
  • DirectIndex analysis could be parallelised in the same way.

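Because a PostingIndex can be large, the InputFormat should support data locality, so that each map task can be scheduled close to the part of the file it reads.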
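
One way locality could be approximated is to rank hosts by how many bytes of a split they store. In Hadoop, a split reports its preferred hosts through InputSplit.getLocations(); the sketch below only shows the ranking step. SplitLocality and preferredHosts are hypothetical names, and the per-block host lists are assumed to have been obtained from the file system beforehand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: choose the hosts that hold most of a split's bytes. */
final class SplitLocality {

    /**
     * @param start      byte offset where the split begins
     * @param length     number of bytes in the split
     * @param blockSize  size of a file system block in bytes
     * @param blockHosts blockHosts.get(i) lists the hosts storing block i (assumed input)
     * @param maxHosts   maximum number of hosts to return
     */
    static List<String> preferredHosts(long start, long length, long blockSize,
                                       List<List<String>> blockHosts, int maxHosts) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        Map<String, Long> bytesPerHost = new HashMap<>();
        long end = start + length;
        long pos = start;
        while (pos < end) {
            int block = (int) (pos / blockSize);
            // The split's portion of this block stops at the block end or the split end.
            long blockEnd = Math.min(end, (block + 1L) * blockSize);
            long overlap = blockEnd - pos;
            for (String host : blockHosts.get(block)) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
            pos = blockEnd;
        }
        List<String> hosts = new ArrayList<>(bytesPerHost.keySet());
        // Most local bytes first; ties are broken by host name so the order is deterministic.
        hosts.sort((a, b) -> {
            int byBytes = Long.compare(bytesPerHost.get(b), bytesPerHost.get(a));
            return byBytes != 0 ? byBytes : a.compareTo(b);
        });
        return new ArrayList<>(hosts.subList(0, Math.min(maxHosts, hosts.size())));
    }
}
```

For a split that spans several blocks, this favours the hosts that would need to fetch the fewest bytes over the network, rather than simply taking the hosts of the first block.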

Comment by Craig Macdonald [ 07/May/09 ]

Initial version, untested.

Comment by Craig Macdonald [ 08/May/09 ]

Updated version - some files were missing from the patch.

Comment by Craig Macdonald [ 16/Jul/09 ]

Resolved: a final version has been committed to trunk.
