[TR-334] Terrier cannot parse topic file when it contains only IDs (not English words)
Created: 03/Apr/15  Updated: 01/Dec/15  Resolved: 01/Dec/15

Status: Resolved
Project: Terrier Core
Component/s: .querying
Affects Version/s: 4.0
Fix Version/s: 4.1

Type: Bug
Priority: Major
Reporter: shadi saleh
Assignee: Craig Macdonald
Resolution: Won't Fix  
Labels: None


 Description   
I am indexing medical documents. Instead of indexing the English terms, I annotate the terms in both the documents and the queries with IDs.
So the text becomes a sequence of IDs such as "C092736".

Terrier cannot index those terms. Can I override that?

 Comments   
Comment by Craig Macdonald [ 06/Nov/15 ]

Tagging for 4.1.

Comment by Craig Macdonald [ 01/Dec/15 ]

Hi. This isn't a bug per se; it is by design. You need to override EnglishTokeniser so that the check() method isn't called.
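
To illustrate the suggestion above, here is a minimal standalone sketch. It does not use or reproduce Terrier's classes. The class name IdTokeniserSketch, the method names, and the two thresholds are all hypothetical; the thresholds only assume that a check()-style filter rejects terms with too many digits or too many repeated characters, which should be verified against the EnglishTokeniser source for your Terrier version. The sketch shows why an ID such as "C092736" would be dropped by such a filter, and what a tokeniser that skips the filter would emit instead.

```java
import java.util.ArrayList;
import java.util.List;

public class IdTokeniserSketch {

    // Hypothetical limits; assumed for illustration, not read from Terrier.
    static final int MAX_DIGITS = 4;
    static final int MAX_SAME_CONSECUTIVE = 3;

    /** Returns true if a check()-style filter with the limits above would drop the term. */
    static boolean wouldBeRejected(String term) {
        int digits = 0;   // number of digit characters seen so far
        int run = 0;      // length of the current run of identical characters
        char prev = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (Character.isDigit(c)) {
                digits++;
            }
            run = (i > 0 && c == prev) ? run + 1 : 1;
            prev = c;
            if (digits > MAX_DIGITS || run > MAX_SAME_CONSECUTIVE) {
                return true;
            }
        }
        return false;
    }

    /** Splits on non-alphanumeric characters and keeps every token, with no filtering step. */
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "C092736" contains six digits, which exceeds the assumed limit of four.
        System.out.println(wouldBeRejected("C092736"));
        // Without the filter, both IDs survive tokenisation.
        System.out.println(tokenise("C092736 C001234"));
    }
}
```

In a real setup, the equivalent logic would live in a subclass or replacement of EnglishTokeniser that returns tokens without passing them through check(), and Terrier would need to be configured to use that class for both indexing and querying so that document and topic terms are tokenised the same way.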
