Terrier Core / TR-54

Hadoop Indexing MetaIndex finishing made docids out by 2

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: .indexing, .structures
    • Labels:
      None

      Description

      From Rodrygo's email


      Hello all,

      Today, we came across a problem with the meta indices. Basically, for
      some indices, looking up a docid given a docno produces wrong results.
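The symptom can be detected with a round-trip check: take a docid, look up its docno, then look that docno up again and compare the docids. In the full email (quoted in the change history below), docid 0 maps to clueweb09-enwp00-87-00000, which maps back to -2. The following is a minimal Java sketch of such a check. `DocnoLookup` is a hypothetical two-way lookup interface standing in for the meta index. It is not Terrier's actual MetaIndex API, and the map-backed toy index exists only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundTripCheck {

    /** Hypothetical minimal view of a meta index; NOT Terrier's MetaIndex API. */
    interface DocnoLookup {
        String docnoForDocid(int docid);  // forward lookup: docid -> docno
        int docidForDocno(String docno);  // reverse lookup: docno -> docid, -1 if absent
        int numberOfDocuments();
    }

    /** Returns every docid whose docno does not map back to the same docid. */
    static List<Integer> findMismatches(DocnoLookup index) {
        List<Integer> mismatches = new ArrayList<>();
        for (int docid = 0; docid < index.numberOfDocuments(); docid++) {
            String docno = index.docnoForDocid(docid);
            if (index.docidForDocno(docno) != docid) {
                mismatches.add(docid);
            }
        }
        return mismatches;
    }

    /** Hypothetical toy index whose reverse lookup is shifted by reverseShift,
     *  imitating a reverse lookup that is "out by 2". Illustrative only. */
    static DocnoLookup toyIndex(String[] docnos, int reverseShift) {
        Map<String, Integer> reverse = new HashMap<>();
        for (int docid = 0; docid < docnos.length; docid++) {
            reverse.put(docnos[docid], docid + reverseShift);
        }
        return new DocnoLookup() {
            public String docnoForDocid(int docid) { return docnos[docid]; }
            public int docidForDocno(String docno) { return reverse.getOrDefault(docno, -1); }
            public int numberOfDocuments() { return docnos.length; }
        };
    }

    public static void main(String[] args) {
        String[] docnos = {"doc-a", "doc-b", "doc-c"};
        System.out.println(findMismatches(toyIndex(docnos, 0)));   // prints []
        System.out.println(findMismatches(toyIndex(docnos, -2)));  // prints [0, 1, 2]
    }
}
```

Checking every docid, rather than only docid 0, also catches a shift that affects only later parts of an index.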


          Activity

          craigm Craig Macdonald created issue -
          craigm Craig Macdonald added a comment -

          Fixed this issue with changes to CompressingMetaIndex.RecordReader and .InputStream.

          craigm Craig Macdonald made changes -
          Field        Original Value   New Value
          Status       Open [ 1 ]       Resolved [ 5 ]
          Resolution                    Fixed [ 1 ]
          craigm Craig Macdonald added a comment -

          I added a test case to the Shakespeare Merchant of Venice test suite.

          craigm Craig Macdonald made changes -
          Affects Version/s 3.0 [ 10020 ]
          Description (original value):

          From Rodrygo's email


          Hello all,

          Today, we came across a problem with the meta indices. Basically, for
          some indices, looking up a docid given a docno produces wrong results.
          For example, on
          hdfs://trmaster:9000/Indices/ClueWeb09/TREC-B/classical, 0 maps to
          clueweb09-enwp00-87-00000, which maps back to -2. This problem was
          observed in the following indices:

          hdfs://trmaster:9000/Indices/ClueWeb09/TREC-B/classical
            0 maps to clueweb09-enwp00-87-00000 which maps to -2 : PROBLEM
          hdfs://trmaster:9000/Indices/blogs08/classical_v3
            0 maps to BLOG08-20081222-002-0000000000 which maps to -2 : PROBLEM
          /local/terrier/Indices/Blogs08/classical_v3
            0 maps to BLOG08-20081222-002-0000000000 which maps to -2 : PROBLEM
          /local/tr.clueweb09/rodrygo/Blogs08/classical_v3_blocks
            0 maps to BLOG08-20081222-002-0000000000 which maps to -2 : PROBLEM

          As other (smaller) indices have not been affected, my guess is that it
          could be a merging problem. As for our submitted runs, if I remember
          correctly, only the diversity runs had to convert from docnos to
          docids (when looking up the docids of documents in the sub-rankings
          produced for the suggested queries). However, given that the diversity
          runs relied on a temporary meta index built on the fly (a hack on
          TRECResultMatching) rather than the actual meta index, I think these
          runs are probably safe.
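Rodrygo suspected a merging problem. As a generic illustration of why merging is a plausible place for a constant docid shift: when meta index parts written by separate Hadoop tasks are combined, each part's local docids must be shifted by the number of documents in all earlier parts, and an offset that is wrong by a constant moves every reverse lookup in the affected parts by that constant. This is only a sketch: `MergeOffsets` and `mergeReverseLookup` are hypothetical names, not Terrier code, and the actual fix (per the resolution comment) was made in CompressingMetaIndex.RecordReader and .InputStream, not in anything shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeOffsets {

    /** Hypothetical helper (not Terrier code): merges per-part docno lists
     *  into a single docno -> global docid map. */
    static Map<String, Integer> mergeReverseLookup(List<String[]> parts) {
        Map<String, Integer> global = new HashMap<>();
        int offset = 0; // number of documents in all earlier parts
        for (String[] part : parts) {
            for (int localDocid = 0; localDocid < part.length; localDocid++) {
                // Global docid = local docid + documents in all earlier parts.
                global.put(part[localDocid], offset + localDocid);
            }
            offset += part.length;
        }
        return global;
    }

    public static void main(String[] args) {
        List<String[]> parts = List.of(
                new String[] {"d0", "d1"},         // part 0 -> global docids 0..1
                new String[] {"d2", "d3", "d4"});  // part 1 -> global docids 2..4
        Map<String, Integer> lookup = mergeReverseLookup(parts);
        System.out.println(lookup.get("d0") + " " + lookup.get("d3")); // prints 0 3
    }
}
```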
          Description (new value):

          From Rodrygo's email


          Hello all,

          Today, we came across a problem with the meta indices. Basically, for
          some indices, looking up a docid given a docno produces wrong results.
          Fix Version/s 3.0 [ 10020 ]
          craigm Craig Macdonald made changes -
          Field              Original Value                       New Value
          Project            TREC [ 10010 ]                       Terrier Core [ 10000 ]
          Key                TREC-59                              TR-54
          Workflow           jira [ 10141 ]                       Terrier Open Source [ 10312 ]
          Affects Version/s  3.0 [ 10020 ]                        3.0 [ 10030 ]
          Component/s        Core [ 10020 ], TREC2009 [ 10033 ]   .indexing [ 10002 ], .structures [ 10007 ]
          Fix Version/s      3.0 [ 10020 ]                        3.0 [ 10030 ]

            People

            • Assignee:
              craigm Craig Macdonald
            • Reporter:
              craigm Craig Macdonald
            • Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: