Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6
    • Component/s: .matching
    • Labels:
      None

      Description

      The proximity operator is not implemented. A proposed implementation is attached (TO BE TESTED).

        Attachments

          Activity

          catena.matteo Matteo Catena created issue -
          craigm Craig Macdonald made changes -
          Field Original Value New Value
          Fix Version/s 3.6 [ 10060 ]
          richardm Richard McCreadie added a comment -

          Generated a test case for this issue (TestProximityIterablePosting), which the proposed implementation fails.

          We need to define what block distance in proximity means.

          Example:
          Document: "Whenever you win a coin flip, put a luck counter on Chance Encounter"
          Query: "coin flip luck"
          Window: 3

          The current implementation would return this document because the window radius is calculated as Window*numQueryTerms (3*3=9).

          My expectation is that the window radius should be equal to Window?
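To make the example concrete, here is a minimal sketch that measures the shortest run of tokens containing all three query terms. The class name, the method names, and the simple lowercase tokenizer are assumptions made for illustration only; they are not part of the attached implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not the class attached to this issue.
class WindowExampleSketch {

    /** Splits on anything that is not a letter or digit, after lowercasing. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * Returns the length, in terms, of the shortest run of consecutive tokens
     * that contains every (distinct) query term, or -1 if no such run exists.
     */
    static int smallestCoveringWindow(List<String> doc, Set<String> query) {
        int best = -1;
        for (int start = 0; start < doc.size(); start++) {
            Set<String> seen = new HashSet<>();
            for (int end = start; end < doc.size(); end++) {
                if (query.contains(doc.get(end))) {
                    seen.add(doc.get(end));
                }
                if (seen.size() == query.size()) {
                    int length = end - start + 1;
                    if (best == -1 || length < best) {
                        best = length;
                    }
                    break; // a longer run from the same start cannot be shorter
                }
            }
        }
        return best;
    }
}
```

With this tokenization, "coin", "flip" and "luck" sit at positions 4, 5 and 8 (counting from 0), so the shortest covering run is 5 terms long. That fits inside a radius of 3*3=9 but not inside a window of 3, which is the discrepancy described above.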

          catena.matteo Matteo Catena added a comment -

          The implementation will probably be even simpler if you consider just 'window' instead of 'window * num_query_terms'.
          But what if, for instance, window is less than num_query_terms? E.g. "coin flip luck counter"~2: should it return no results? Or should it return all the documents such that the distance between consecutive query terms is <= 2?

          richardm Richard McCreadie added a comment -

          I think the best idea is to define proximity as follows: 'All query terms must be contained within a window of n terms'. In this case, if Window is less than num_query_terms, then we have two options:

          1) window is set to num_query_terms
          2) return nothing

          Per-term radius proximity is different, I think.

          catena.matteo Matteo Catena added a comment -

          Option 1 sounds better to me. If you want, I can re-implement the class. Can you attach your test case, please?

          craigm Craig Macdonald added a comment -

          I recall that Richard and I agreed that the Distance class was appropriate to use for this class.

          richardm Richard McCreadie added a comment -

          Committed patch and test case for interpretation 'All query terms must be contained within a window of n terms'. Window is set to num_query_terms if window is less than num_query_terms.

          Commit 3754.

          Resolve?
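The agreed interpretation can be sketched as follows. This is not the code from commit 3754: the class name, the method names and the int[][] representation of per-term positions are all assumptions chosen for illustration. The sketch raises the window to num_query_terms when it is smaller, then checks whether some occurrence of a query term can serve as the left edge of a window that contains every term.

```java
import java.util.Arrays;

// Hypothetical sketch of the agreed rule; not the committed implementation.
class ProximityMatchSketch {

    /**
     * positions[i] holds the sorted positions of query term i in one document.
     * Returns true if all query terms occur within a window of 'window' terms,
     * where the window is raised to the number of query terms if it is smaller.
     */
    static boolean matches(int[][] positions, int window) {
        int numQueryTerms = positions.length;
        int effectiveWindow = Math.max(window, numQueryTerms);

        // A term with no occurrences can never be inside any window.
        for (int[] termPositions : positions) {
            if (termPositions.length == 0) {
                return false;
            }
        }

        // If a valid window exists, its leftmost query-term occurrence is also
        // the left edge of a valid window, so only those edges need checking.
        for (int[] candidateStarts : positions) {
            for (int start : candidateStarts) {
                int lastAllowed = start + effectiveWindow - 1;
                boolean allInside = true;
                for (int[] termPositions : positions) {
                    if (!hasPositionInRange(termPositions, start, lastAllowed)) {
                        allInside = false;
                        break;
                    }
                }
                if (allInside) {
                    return true;
                }
            }
        }
        return false;
    }

    /** True if the sorted array contains a value v with lo <= v <= hi. */
    static boolean hasPositionInRange(int[] sorted, int lo, int hi) {
        int i = Arrays.binarySearch(sorted, lo);
        if (i < 0) {
            i = -i - 1; // insertion point: first index whose value is >= lo
        }
        return i < sorted.length && sorted[i] <= hi;
    }
}
```

For the earlier example, with "coin", "flip" and "luck" at positions 4, 5 and 8, matches(new int[][] {{4}, {5}, {8}}, 3) is false, while a window of 5 or more matches. For "coin flip luck counter"~2 the window is raised to 4, which is still too small for that document because the four terms span 6 positions.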

          richardm Richard McCreadie added a comment -

          Related note:

          The patch uses the local isInWindow method rather than the Distance.noTimes method.

          Either implementation should be valid, but each will be faster in different use cases.
          isInWindow will be faster for long documents in which the query terms appear only rarely. (Complexity: |Q| . window . occurrences(Q,d))
          Distance.noTimes will be faster when the query terms appear often in a document. (Complexity: |Q| . documentLength-window)

          craigm Craig Macdonald added a comment -

          Do you have numbers to prove this?
          Also, does isInWindow() pass similar tests to Distance.noTimes?
          Could the method be moved to the Distance class, to keep everything in the same place?

          richardm Richard McCreadie added a comment -

          Tested Distance.noTimes for the proximity operator, but it fails all of the tests. Looking at the Distance test case, I think that noTimes looks for n-grams, not term sets in windows.

          If so, it is not applicable for this issue.

          richardm Richard McCreadie added a comment -

          Current implementation passes all of the unit tests. Resolving this issue.

          Query language documentation should be updated to describe what this functionality does and how it differs from proximity score modifiers.

          richardm Richard McCreadie made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]

            People

            • Assignee:
              craigm Craig Macdonald
              Reporter:
              catena.matteo Matteo Catena
            • Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: