[TR-28] Index WARC collections Created: 01/May/09  Updated: 08/Mar/10  Resolved: 08/Mar/10

Status: Resolved
Project: Terrier Core
Component/s: None
Affects Version/s: 2.2.1
Fix Version/s: 3.0

Type: New Feature Priority: Minor
Reporter: Iadh Ounis Assignee: Craig Macdonald
Resolution: Duplicate  
Labels: None

Attachments: File TR-28.patch    

The documents in the new TREC ClueWeb09 collection are formatted in WARC.
It would be good if Terrier provided support for this format.

Comment by Carlos Lorenzetti [ 15/Oct/09 ]

Hi, I'm trying to index the UK2007 Spam collection that is in WARC format.
I've patched my current version of Terrier with the file attached here, but there is a problem with the line: logger = Logger.getLogger(WARC018Collection.class), because there isn't a WARC018Collection class. Why is this getLogger call different from the others?

Thank you.

Comment by Craig Macdonald [ 08/Mar/10 ]

Duplicate of TR-36
