[TR-175] Decorate class does not remove field qualifiers when generating query-biased summaries Created: 16/Aug/11  Updated: 17/Apr/12  Resolved: 17/Apr/12

Status: Resolved
Project: Terrier Core
Component/s: .querying
Affects Version/s: 3.5
Fix Version/s: 3.6

Type: Bug Priority: Minor
Reporter: Paul Holmes Assignee: Craig Macdonald
Resolution: Fixed  
Labels: None

Issue Links:
Block
is blocked by TR-253 Decorate & SimpleDecorate needs unit ... Resolved

 Description   
When the filter() method of the PostFilter org.terrier.querying.Decorate is called, the query string is split as follows:

String[] _qTerms = q.getOriginalQuery().replaceAll(" \\w+\\p{Punct}\\w+ "," ").toLowerCase().split(" ");

However, getOriginalQuery() on the SearchRequest q returns the full query, including any field qualifiers (e.g. FIELD:term1 FIELD:term2).

As a result, field-prefixed query terms are used to rank sentences in Decorate's generateQueryBiasedSummary method. Every sentence will almost certainly score 0, unless it happens by chance to contain the literal text 'FIELD:term'. When every sentence scores 0, Decorate falls back to the first 2 sentences of the value of the meta key being decorated.
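The effect can be reproduced outside Terrier with a minimal sketch. The split expression is the one quoted above, applied to a plain string instead of a SearchRequest. The score() method is a hypothetical stand-in that counts exact token matches; it is not Decorate's actual sentence scoring, and the class name, field name TITLE and sample sentence are assumptions made for illustration.

```java
import java.util.Arrays;

final class FieldQualifiedSplitDemo {
    // Same expression as in the report, applied to a plain query string.
    static String[] splitQuery(String originalQuery) {
        return originalQuery.replaceAll(" \\w+\\p{Punct}\\w+ ", " ").toLowerCase().split(" ");
    }

    // Hypothetical stand-in scorer (NOT Terrier's): counts sentence tokens
    // that are exactly equal to one of the query terms.
    static int score(String sentence, String[] qTerms) {
        int hits = 0;
        for (String token : sentence.toLowerCase().split("\\W+")) {
            for (String term : qTerms) {
                if (token.equals(term)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] qTerms = splitQuery("TITLE:terrier TITLE:retrieval");
        // The field qualifier survives the split.
        System.out.println(Arrays.toString(qTerms)); // [title:terrier, title:retrieval]
        // No sentence token equals "title:terrier", so the score is 0.
        System.out.println(score("Terrier is a retrieval platform.", qTerms)); // 0
    }
}
```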

Adding the following to the 'filter' method, immediately after "String[] metadata = getMetadata(metaKeys, docid);", remedies this issue:

for (int p = 0; p < _qTerms.length; p++) {
    if (_qTerms[p].contains(":")) {
        _qTerms[p] = _qTerms[p].substring(_qTerms[p].indexOf(':') + 1);
    }
}
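The loop above can be exercised in isolation. The following is a minimal sketch, assuming the query terms are already available as a lowercased String array; the class and method names are hypothetical and not part of Terrier. As in the proposed loop, everything up to and including the first ':' in a term is dropped, and terms without a ':' are left unchanged.

```java
import java.util.Arrays;

final class StripFieldQualifiers {
    // Hypothetical helper: returns a copy of qTerms in which each term has
    // its field qualifier (text up to and including the first ':') removed.
    static String[] strip(String[] qTerms) {
        String[] out = qTerms.clone();
        for (int p = 0; p < out.length; p++) {
            int colon = out[p].indexOf(':');
            if (colon >= 0) {
                out[p] = out[p].substring(colon + 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] qTerms = {"title:terrier", "title:retrieval", "platform"};
        System.out.println(Arrays.toString(strip(qTerms))); // [terrier, retrieval, platform]
    }
}
```

Returning a copy rather than modifying the array in place is a choice made for this sketch only; the patch proposed in the report updates _qTerms directly.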



 Comments   
Comment by Craig Macdonald [ 17/Apr/12 ]

Committed to r3613. Thanks Paul!
