[TR-150] TRECCollection parse DOCHDR tags, including URLs SHOULD they exist
Created: 01/Apr/11  Updated: 05/Apr/11  Resolved: 02/Apr/11

Status: Resolved
Project: Terrier Core
Component/s: .indexing
Affects Version/s: None
Fix Version/s: 3.5

Type: New Feature
Priority: Major
Reporter: Craig Macdonald
Assignee: Craig Macdonald
Resolution: Fixed
Labels: None

Issue Links:
Related
is related to TR-140 Indexing support for query-biased sum... Resolved

 Description   
TRECCollection should parse DOCHDR tags, including URLs, should they exist.

TREC Web test collections (WT2G, etc.) have DOCHDR tags, which include the URL. We should parse these out. However, TRECCollection should not break if the DOCHDR tag does not exist.
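
As a rough illustration of the requested behaviour (not the code that was committed), the sketch below pulls a URL out of a DOCHDR block and returns an empty result when the tag is absent. It assumes the URL is the first whitespace-delimited token inside the tag, which may not hold for every collection; the class name DocHdrSketch, the method extractUrl, and the sample documents are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocHdrSketch {
    // Captures the body of a <DOCHDR>...</DOCHDR> block; DOTALL lets '.' span newlines.
    private static final Pattern DOCHDR =
        Pattern.compile("<DOCHDR>(.*?)</DOCHDR>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the URL found in the DOCHDR block, or empty if there is no usable block. */
    public static Optional<String> extractUrl(String doc) {
        Matcher m = DOCHDR.matcher(doc);
        if (!m.find()) {
            return Optional.empty(); // no DOCHDR tag: treated as normal, not as an error
        }
        String body = m.group(1).trim();
        if (body.isEmpty()) {
            return Optional.empty(); // tag present but empty
        }
        // Assumption: the URL is the first whitespace-delimited token of the header.
        return Optional.of(body.split("\\s+", 2)[0]);
    }

    public static void main(String[] args) {
        // Made-up sample documents, for illustration only.
        String withHdr = "<DOC>\n<DOCNO>D-1</DOCNO>\n<DOCHDR>\n"
            + "http://example.com/a.html 19970101 text/html\nHTTP/1.0 200 OK\n"
            + "</DOCHDR>\n<html>body</html>\n</DOC>";
        String withoutHdr = "<DOC>\n<DOCNO>D-2</DOCNO>\n<TEXT>body</TEXT>\n</DOC>";

        System.out.println(extractUrl(withHdr).orElse("(no URL)"));
        System.out.println(extractUrl(withoutHdr).orElse("(no URL)"));
    }
}
```

Returning an Optional keeps the "tag does not exist" case on the ordinary code path, so a collection without DOCHDR tags indexes exactly as before.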

 Comments   
Comment by Craig Macdonald [ 02/Apr/11 ]

I decided this was easier to do with a sub-class called TRECWebCollection, which knows how to parse the DOCHDR tags of various types of test collection.
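
One way such a split between base class and sub-class could look is sketched below. This is only an illustration under stated assumptions: BaseCollection, WebCollection, and the afterDocno hook are hypothetical stand-ins, not Terrier's actual classes or methods, and the URL is again assumed to be the first token inside the tag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SubclassSketch {
    /** Hypothetical stand-in for a generic TREC collection reader (not Terrier's class). */
    public static class BaseCollection {
        /** Builds the document properties, then lets sub-classes add their own. */
        public final Map<String, String> properties(String docno, String rawDoc) {
            Map<String, String> props = new LinkedHashMap<>();
            props.put("docno", docno);
            afterDocno(rawDoc, props);
            return props;
        }

        /** Hook: the base class does nothing, so non-Web collections are unaffected. */
        protected void afterDocno(String rawDoc, Map<String, String> props) { }
    }

    /** Hypothetical Web variant that also understands DOCHDR. */
    public static class WebCollection extends BaseCollection {
        @Override
        protected void afterDocno(String rawDoc, Map<String, String> props) {
            int start = rawDoc.indexOf("<DOCHDR>");
            if (start < 0) {
                return; // no DOCHDR: leave the properties untouched
            }
            int end = rawDoc.indexOf("</DOCHDR>", start);
            if (end < 0) {
                return; // unterminated tag: ignore rather than fail
            }
            String body = rawDoc.substring(start + "<DOCHDR>".length(), end).trim();
            if (!body.isEmpty()) {
                // Assumption: the URL is the first whitespace-delimited token.
                props.put("url", body.split("\\s+", 2)[0]);
            }
        }
    }

    public static void main(String[] args) {
        String doc = "<DOC><DOCNO>D-1</DOCNO><DOCHDR>\nhttp://example.com/ 1\n</DOCHDR></DOC>";
        System.out.println(new BaseCollection().properties("D-1", doc));
        System.out.println(new WebCollection().properties("D-1", doc));
    }
}
```

Keeping the DOCHDR logic in the sub-class means the generic reader never needs to know about Web-specific headers.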

Comment by Craig Macdonald [ 02/Apr/11 ]

Committed to trunk, including JUnit tests. Tested by indexing WT2G.
