Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.2.1
- Fix Version/s: None
- Component/s: .structures
- Labels: None
Description
The current Lexicon implementations suffer from several disadvantages:
* To store more information in the lexicon, the Lexicon class has to be sub-classed
* LexiconInputStream and LexiconOutputStream do not make it easy to add more information to the Lexicon
* Deprecated methods, e.g. getTF(), should be removed
This issue is to track changes to the Lexicon so that the Lexicon code can be reused without extensive sub-classing.
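As a rough illustration of the direction described above, the sketch below parameterises a lexicon by its entry type, so that extra per-term information lives in a new entry class rather than in a Lexicon subclass. All names here (EntryStats, BasicEntry, FieldEntry, GenericLexicon) are hypothetical and are not taken from the Terrier code base or from the attached patches; the in-memory map merely stands in for the on-disk structure.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these types exist in Terrier under these names.

/** Core per-term statistics that every lexicon entry is assumed to expose. */
interface EntryStats {
    int termId();
    int documentFrequency();
    int termFrequency();
}

/** Entry carrying only the core statistics. */
class BasicEntry implements EntryStats {
    private final int termId;
    private final int df;
    private final int tf;

    BasicEntry(int termId, int df, int tf) {
        this.termId = termId;
        this.df = df;
        this.tf = tf;
    }

    public int termId() { return termId; }
    public int documentFrequency() { return df; }
    public int termFrequency() { return tf; }
}

/** Entry carrying extra information: an assumed per-field frequency array. */
class FieldEntry implements EntryStats {
    private final BasicEntry core;
    private final int[] fieldFrequencies;

    FieldEntry(int termId, int df, int tf, int[] fieldFrequencies) {
        this.core = new BasicEntry(termId, df, tf);
        this.fieldFrequencies = fieldFrequencies.clone(); // defensive copy
    }

    public int termId() { return core.termId(); }
    public int documentFrequency() { return core.documentFrequency(); }
    public int termFrequency() { return core.termFrequency(); }

    /** Frequency of the term in the given field (0-based index). */
    int fieldFrequency(int field) { return fieldFrequencies[field]; }
}

/** Lexicon that is generic in its entry type, so it never needs sub-classing. */
class GenericLexicon<E extends EntryStats> {
    // Sorted map as a stand-in for a term-ordered on-disk lexicon.
    private final Map<String, E> entries = new TreeMap<>();

    void put(String term, E entry) { entries.put(term, entry); }

    /** Returns the entry for the term, or null if the term is unknown. */
    E get(String term) { return entries.get(term); }

    int size() { return entries.size(); }
}
```

With this shape, a caller that needs field frequencies would declare a GenericLexicon<FieldEntry>, while code that only needs the core statistics can keep working against EntryStats.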
Activity
Component/s | .structures [ 10007 ] | |
Affects Version/s | 2.2.1 [ 10010 ] |
Workflow | jira [ 10022 ] | Terrier Open Source [ 10036 ] |
Attachment | TR14-v1.patch [ 10020 ] |
Attachment | TR-14.v2.patch [ 10021 ] |
Attachment | TR-14.v3.svn.patch [ 10035 ] |
Status | Open [ 1 ] | Patch Available [ 10000 ] |
Status | Patch Available [ 10000 ] | Resolved [ 5 ] |
Resolution | Fixed [ 1 ] |
The issue here is whether or not we want to store all or most of the information in a single file. The lexicon contains information about how to find information about the elements of our algebra/probability space. For example, exact matching or k-grams may require a dedicated structure and a dedicated lexicon.
So, suppose that all related issues about unary lexicons have been solved, so that we have all the desiderata: all fields, a field counter, an intelligent merger, etc. In principle we then implicitly have all the information about where in the collection a token occurs and within the scope of which label/tag. Now, we want to store more information. What is this new information?