Terrier Core / TR-150

TRECCollection should parse DOCHDR tags, including URLs if they exist

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5
    • Component/s: .indexing
    • Labels:
      None

      Description

      TRECCollection should parse DOCHDR tags, including URLs if they exist.

      TREC Web test collections (WT2G, etc.) have DOCHDR tags, which include the URL. We should parse these out. However, TRECCollection should not fail if the DOCHDR tag does not exist.
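
As an illustration of the behaviour requested above, the following is a minimal sketch and not Terrier's actual code. It assumes the URL is the first whitespace-delimited token inside the DOCHDR block, which should be checked against the collection being indexed. The class name `DocHdrSketch`, the method `extractUrl`, and the sample document are hypothetical. A missing or empty DOCHDR yields an empty result rather than an error.

```java
import java.util.Optional;

/** Hypothetical helper, not part of Terrier: pulls a URL out of a DOCHDR block. */
final class DocHdrSketch {
    private static final String OPEN = "<DOCHDR>";
    private static final String CLOSE = "</DOCHDR>";

    /**
     * Returns the first whitespace-delimited token inside DOCHDR
     * (assumed here to be the URL), or empty if the tag is absent,
     * unclosed, or has no content.
     */
    static Optional<String> extractUrl(String rawDoc) {
        int start = rawDoc.indexOf(OPEN);
        if (start < 0) {
            return Optional.empty(); // no DOCHDR: tolerate it, do not fail
        }
        int end = rawDoc.indexOf(CLOSE, start + OPEN.length());
        if (end < 0) {
            return Optional.empty(); // unclosed tag: treat as missing
        }
        String header = rawDoc.substring(start + OPEN.length(), end).trim();
        if (header.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(header.split("\\s+", 2)[0]);
    }

    public static void main(String[] args) {
        // Made-up sample document in the assumed layout.
        String doc = "<DOC>\n<DOCNO>D-1</DOCNO>\n<DOCHDR>\n"
                + "http://www.example.com/index.html 10.0.0.1 19970101\n"
                + "HTTP/1.0 200 OK\n</DOCHDR>\n<html>body</html>\n</DOC>";
        System.out.println(extractUrl(doc).orElse("(no URL)"));
        System.out.println(extractUrl("<DOC><DOCNO>D-2</DOCNO>text</DOC>").orElse("(no URL)"));
    }
}
```

Returning `Optional` keeps the "tag may not exist" case explicit for the caller, so indexing can continue without a URL for that document.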

            Activity

            craigm Craig Macdonald created issue -
            craigm Craig Macdonald made changes -
            Field Original Value New Value
            Link This issue is related to TREC-200 [ TREC-200 ]
            craigm Craig Macdonald made changes -
            Description
              Original Value: TRECCollection parse DOCHDR tags, including URLs SHOULD they exist
              New Value: TRECCollection parse DOCHDR tags, including URLs SHOULD they exist.

              TREC Web test collections (WT2G etc) have DOCHDR tags, which include the URL. We should parse these out. However, TRECCollection should not bork if the DOCHDR tag does not exist.
            craigm Craig Macdonald added a comment -

            I decided this was easier to implement as a subclass called TRECWebCollection, which knows how to parse the DOCHDR tags of various types of test collection.
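
The subclass approach described in this comment can be sketched roughly as follows. This is a hedged illustration only: `BaseTrecReader` and `WebTrecReader` are hypothetical stand-ins, not Terrier's TRECCollection or TRECWebCollection, and the property keys `docno` and `url` are assumptions. The base class knows nothing about DOCHDR, and the subclass adds URL extraction on top, assuming the URL is the first token inside the tag.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical generic reader: extracts only the document number. */
class BaseTrecReader {
    Map<String, String> readProperties(String rawDoc) {
        Map<String, String> props = new HashMap<>();
        String docno = between(rawDoc, "<DOCNO>", "</DOCNO>");
        if (docno != null) {
            props.put("docno", docno.trim());
        }
        return props;
    }

    /** Returns the text between the first open/close pair, or null if absent. */
    static String between(String text, String open, String close) {
        int s = text.indexOf(open);
        if (s < 0) {
            return null;
        }
        int e = text.indexOf(close, s + open.length());
        return e < 0 ? null : text.substring(s + open.length(), e);
    }
}

/** Hypothetical Web-collection reader: additionally parses DOCHDR when present. */
class WebTrecReader extends BaseTrecReader {
    @Override
    Map<String, String> readProperties(String rawDoc) {
        Map<String, String> props = super.readProperties(rawDoc);
        String header = between(rawDoc, "<DOCHDR>", "</DOCHDR>");
        // A missing or empty DOCHDR simply leaves "url" unset.
        if (header != null && !header.trim().isEmpty()) {
            props.put("url", header.trim().split("\\s+", 2)[0]);
        }
        return props;
    }
}
```

Keeping the DOCHDR logic in the subclass means collections without the tag go through the base reader unchanged.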

            craigm Craig Macdonald added a comment -

            Committed to trunk, including JUnit tests. Tested by indexing WT2G.

            craigm Craig Macdonald made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            craigm Craig Macdonald made changes -
            Project TREC [ 10010 ] Terrier Core [ 10000 ]
            Key TREC-240 TR-150
            Workflow jira [ 10520 ] Terrier Open Source [ 10543 ]
            Component/s .indexing [ 10002 ]
            Component/s Core [ 10020 ]
            Fix Version/s 3.1 [ 10040 ]
            Fix Version/s 3.1 [ 10021 ]

              People

              • Assignee:
                craigm Craig Macdonald
                Reporter:
                craigm Craig Macdonald
              • Watchers:
                0

                Dates

                • Created:
                  Updated:
                  Resolved: