Details
-
Type: New Feature
-
Status: Resolved
-
Priority: Minor
-
Resolution: Duplicate
-
Affects Version/s: 2.2.1
-
Fix Version/s: 3.0
-
Component/s: None
-
Labels: None
Description
The documents in the new TREC ClueWeb09 collection are formatted in WARC.
It would be good if Terrier provided support for this format.
Hi, I'm trying to index the UK2007 Spam collection that is in WARC format.
I've patched my current version of Terrier with the file attached here, but there is a problem with the line: logger = Logger.getLogger(WARC018Collection.class), because there isn't a WARC018Collection class. Why is this getLogger call different from the others?
Thank you.