Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0
Fix Version/s: 3.0
Component/s: None
Labels: None
Description
We need to check whether we have support for indexing WARC collections, and add it if not.
This is an initial version of a parser for ClueWeb09. Currently, the class is named WARC0.18 parser. I'm thinking about adding a subclass of it that handles the ClueWeb09-specific bits.
In essence, the changes from a standard WARC v 0.18 parser are:
This successfully indexes the sample collection.
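For illustration only, the base-class/subclass split described above could look roughly like the sketch below. All class and method names here are hypothetical and not taken from the actual patch. The sketch assumes that a WARC 0.18 record starts with a `WARC/0.18` version line followed by `Name: value` header lines up to a blank line, and that one ClueWeb09-specific bit is taking the document ID from the `WARC-TREC-ID` header. The real list of ClueWeb09 changes is not given in this ticket, so treat that choice as an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base class: reads the header block of one WARC 0.18 record. */
class Warc018HeaderParser {
    static final String VERSION_LINE = "WARC/0.18";

    /** Returns the record's headers, or null when the input is exhausted. */
    Map<String, String> readHeaders(BufferedReader in) throws IOException {
        String line = in.readLine();
        // Skip any blank lines that separate consecutive records.
        while (line != null && line.trim().isEmpty()) {
            line = in.readLine();
        }
        if (line == null) {
            return null;
        }
        if (!line.trim().equals(VERSION_LINE)) {
            throw new IOException("Unexpected version line: " + line);
        }
        Map<String, String> headers = new LinkedHashMap<>();
        // Header lines run until the first blank line.
        while ((line = in.readLine()) != null && !line.trim().isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // tolerate a malformed header line instead of failing
            }
            headers.put(line.substring(0, colon).trim(),
                        line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Generic choice of document ID: the standard WARC-Record-ID header. */
    String documentId(Map<String, String> headers) {
        return headers.get("WARC-Record-ID");
    }
}

/** Hypothetical subclass holding only the ClueWeb09-specific behaviour. */
class ClueWeb09HeaderParser extends Warc018HeaderParser {
    @Override
    String documentId(Map<String, String> headers) {
        // Assumption: prefer the ClueWeb09 WARC-TREC-ID header when present.
        String trecId = headers.get("WARC-TREC-ID");
        return trecId != null ? trecId : super.documentId(headers);
    }
}

class WarcSketchDemo {
    /** Parses the headers of one raw record and returns its document ID. */
    static String idOf(String rawRecord) {
        Warc018HeaderParser parser = new ClueWeb09HeaderParser();
        try (BufferedReader in = new BufferedReader(new StringReader(rawRecord))) {
            Map<String, String> headers = parser.readHeaders(in);
            return headers == null ? null : parser.documentId(headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String record = "WARC/0.18\n"
                + "WARC-Type: response\n"
                + "WARC-TREC-ID: clueweb09-en0000-00-00000\n"
                + "Content-Length: 0\n"
                + "\n";
        System.out.println(idOf(record));
    }
}
```

Keeping the generic header reading in the base class means the subclass only overrides what differs for ClueWeb09, which is the kind of split the description proposes.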