Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.2.1
    • Fix Version/s: 3.0
    • Component/s: None
    • Labels:
      None

      Description

      The documents in the new TREC ClueWeb09 collection are formatted in WARC.
      It would be good if Terrier provided support for this format.

        Attachments

          Activity

          kcho Carlos Lorenzetti added a comment -

          Hi, I'm trying to index the UK2007 Spam collection, which is in WARC format.
          I've patched my current version of Terrier with the file attached here, and there is a problem with the line: logger = Logger.getLogger(WARC018Collection.class), because there isn't a WARC018Collection class. Why is this getLogger call different from the others?

          Thank you.

          craigm Craig Macdonald added a comment -

          Duplicate of TR-36


            People

            • Assignee:
              craigm Craig Macdonald
            • Reporter:
              ounis Iadh Ounis
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: