[TR-230] proximity operator Created: 12/Jun/13 Updated: 01/Apr/14 Resolved: 01/Apr/14 |
Status: | Resolved |
Project: | Terrier Core |
Component/s: | .matching |
Affects Version/s: | 3.5 |
Fix Version/s: | 3.6 |
Type: | Bug | Priority: | Major |
Reporter: | Matteo Catena | Assignee: | Craig Macdonald |
Resolution: | Fixed | ||
Labels: | None |
Attachments: |
Description |
The proximity operator is not implemented. Attached is a proposed implementation (TO BE TESTED).
Comments |
Comment by Richard McCreadie [ 11/Mar/14 ] |
Generated a test case for this issue (TestProximityIterablePosting) that the proposed implementation fails. We need to define what block distance means for proximity. Example: the current implementation would return this document because the window radius is calculated as Window * numQueryTerms (3 * 3 = 9). My expectation is that the window radius should be equal to Window. Is that right?
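To make the two readings of the window concrete, here is a small Python sketch (not Terrier code). It assumes a match means every matched position fits inside `limit` consecutive tokens; the helpers `span` and `matches` and the positions 10, 13 and 17 are hypothetical and are not taken from TestProximityIterablePosting.

```python
def span(positions):
    """Tokens covered from the first to the last matched position, inclusive."""
    return max(positions) - min(positions) + 1


def matches(positions, window, num_query_terms, scale_by_terms):
    """Hypothetical check: do all matched positions fit inside the allowed window?

    scale_by_terms=True models the reading in the attached patch
    (limit = window * num_query_terms); scale_by_terms=False models the
    expected reading (limit = window).
    """
    limit = window * num_query_terms if scale_by_terms else window
    return span(positions) <= limit


# Hypothetical positions of three query terms in one document.
positions = [10, 13, 17]
print(span(positions))                                   # 8
print(matches(positions, 3, 3, scale_by_terms=True))     # True  (8 <= 9)
print(matches(positions, 3, 3, scale_by_terms=False))    # False (8 > 3)
```

With Window = 3 and three query terms, these positions cover 8 tokens, so the scaled reading (limit 9) accepts the document while the unscaled reading (limit 3) rejects it.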
Comment by Matteo Catena [ 11/Mar/14 ] |
Probably, the implementation will be even simpler if you consider just 'window' instead of 'window * num_query_terms'. |
Comment by Richard McCreadie [ 14/Mar/14 ] |
I think the best idea is to define proximity as follows: 'All query terms must be contained within a window of n terms'. In this case, if Window is less than num_query_terms: 1) the window is set to num_query_terms. Per-term radius proximity is different, I think.
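The definition 'all query terms within a window of n terms' can be checked with the classic smallest-covering-range technique: keep one pointer into each term's sorted position list, repeatedly advance the left-most pointer, and track the right-most position seen. This is a Python sketch, not Terrier's Java implementation; the name `all_terms_in_window` is hypothetical, and it assumes distinct query terms, each with an ascending list of positions. It also applies option 1, widening a window smaller than the number of query terms.

```python
import heapq


def all_terms_in_window(position_lists, window):
    """Return True if some run of `window` consecutive positions contains at
    least one occurrence of every query term.

    position_lists: one ascending list of positions per distinct query term.
    """
    k = len(position_lists)
    if k == 0 or any(len(plist) == 0 for plist in position_lists):
        return False
    # Option 1: a window smaller than the number of terms can never hold them
    # all, so widen it to the number of query terms.
    window = max(window, k)
    # Heap entries: (position, term index, index into that term's list).
    heap = [(plist[0], t, 0) for t, plist in enumerate(position_lists)]
    heapq.heapify(heap)
    current_max = max(plist[0] for plist in position_lists)
    while True:
        lo, t, i = heap[0]
        if current_max - lo + 1 <= window:
            return True
        # Only moving the left-most pointer forward can shrink the span.
        if i + 1 == len(position_lists[t]):
            return False
        nxt = position_lists[t][i + 1]
        heapq.heapreplace(heap, (nxt, t, i + 1))
        current_max = max(current_max, nxt)
```

For example, `all_terms_in_window([[1, 20], [21], [5, 22]], 3)` returns `True` because positions 20, 21 and 22 fit in three tokens, while `all_terms_in_window([[10], [13], [17]], 3)` returns `False`. The heap keeps each step at O(log k) for k query terms, so the whole check is linear in the total number of postings up to that log factor.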
Comment by Matteo Catena [ 17/Mar/14 ] |
Option 1 sounds better to me. If you want, I can re-implement the class. Can you attach your test case, please? |
Comment by Craig Macdonald [ 17/Mar/14 ] |
I recall that Richard and I agreed that the Distance class was appropriate to use for this class. |
Comment by Richard McCreadie [ 19/Mar/14 ] |
Committed a patch and a test case for the interpretation 'All query terms must be contained within a window of n terms'. The window is set to num_query_terms if it is less than num_query_terms. Commit 3754. Resolve?
Comment by Richard McCreadie [ 19/Mar/14 ] |
Related note: the patch uses the local isInWindow method rather than the Distance.noTimes method. Either implementation should be valid, but each will be faster in different use cases.
Comment by Craig Macdonald [ 19/Mar/14 ] |
Do you have numbers to prove this? |
Comment by Richard McCreadie [ 21/Mar/14 ] |
Tested Distance.noTimes for proximity, but it fails all of the tests. Looking at the Distance test case, I think that noTimes looks for n-grams, not term sets in windows. If so, it is not applicable to this issue.
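To illustrate why an n-gram-style (ordered) check would fail tests written for unordered term sets, the brute-force Python sketch below contrasts the two. It is not a reconstruction of `Distance.noTimes`, whose exact semantics are not stated here; `in_window` and its `ordered` flag are hypothetical, and enumerating every combination of positions is only suitable for tiny examples.

```python
from itertools import product


def in_window(position_lists, window, ordered):
    """Hypothetical brute-force proximity check over one position per term.

    ordered=False: the terms may appear in any order (term set in a window).
    ordered=True:  the terms must appear in query order (n-gram-like).
    """
    if not position_lists:
        return False
    for combo in product(*position_lists):
        if ordered and any(a >= b for a, b in zip(combo, combo[1:])):
            continue  # skip combinations that break the query order
        if max(combo) - min(combo) + 1 <= window:
            return True
    return False
```

For a document containing `b a` (term `a` at position 1, term `b` at position 0), the query `a b` with a window of 2 matches as an unordered set but not as an ordered sequence: `in_window([[1], [0]], 2, ordered=False)` returns `True`, while `in_window([[1], [0]], 2, ordered=True)` returns `False`.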
Comment by Richard McCreadie [ 01/Apr/14 ] |
Current implementation passes all of the unit tests. Resolving this issue. Query language documentation should be updated to describe what this functionality does and how it differs from proximity score modifiers. |