OK, let's use this issue to discuss all of the proposed operators; implementations are likely to come in separate issues. The discussion should cover the proposed syntax of each operator and the semantics it expresses.
First, it's probably worth reiterating the existing query constructs. The ambiguity here is caused by the use of best-match semantics in combination with constructs which suggest filtering of some form.

| syntax | semantics | scoring terms |
|---|---|---|
| a | retrieve documents containing a | a |
| a b | retrieve documents containing a and/or b | a b |
| +a b c | retrieve documents containing a and possibly containing b and/or c | a b c |
| -a b c | retrieve documents containing b and/or c, but no a | b c |
| f1:a | retrieve documents containing a in field f1 | a |
| f1:a b | retrieve documents containing a in field f1, and possibly containing b | a b |
| -f1:a b | retrieve documents containing b, but where a does not occur in field f1 | b |
| "a b" c | retrieve documents containing a and b as an adjacent phrase, which may or may not contain c | a b c |
| f1:"a b" c | retrieve documents containing a and b as an adjacent phrase within field f1, in a document which may or may not contain c | a b c |
| "a b"~10 | retrieve documents which contain a and b within 10 tokens of each other | a b |
| c -"a b" | retrieve documents which contain c, and which do not contain a and b as an adjacent phrase | c |
| c -(a b) | retrieve documents which contain c, but do not contain a or b | c |
| c -f1:(a b) | retrieve documents which contain c, but which do not contain a or b in field f1 | c |

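To make the interplay between filtering and best-match scoring concrete, here is a minimal Python sketch of the plain-term rows of the table (`a b`, `+a b c`, `-a b c`). The names `Query`, `parse`, `matches` and `scoring_terms` are hypothetical and not taken from any existing code; fields, phrases, proximity and parenthesised groups are deliberately left out, and the behaviour of a query made only of excluded terms is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical parsed form of a whitespace-separated term query."""
    optional: list = field(default_factory=list)
    required: list = field(default_factory=list)
    excluded: list = field(default_factory=list)


def parse(query: str) -> Query:
    """Split a query into required (+), excluded (-) and optional terms."""
    q = Query()
    for token in query.split():
        if token.startswith("+"):
            q.required.append(token[1:])
        elif token.startswith("-"):
            q.excluded.append(token[1:])
        else:
            q.optional.append(token)
    return q


def scoring_terms(q: Query) -> list:
    """Excluded terms only filter; they never contribute to the score."""
    return q.required + q.optional


def matches(q: Query, doc_terms: set) -> bool:
    """Decide whether a document (as a set of terms) is retrieved."""
    if any(t in doc_terms for t in q.excluded):
        return False
    if not all(t in doc_terms for t in q.required):
        return False
    if q.required:
        # Optional terms only affect the score once a required term matched.
        return True
    # Assumption: without required terms, at least one optional term must
    # occur, so a query of only excluded terms retrieves nothing.
    return any(t in doc_terms for t in q.optional)
```

For example, `matches(parse("-a b c"), {"b"})` is true while `matches(parse("-a b c"), {"a", "b"})` is false, and in both cases only `b` and `c` are scoring terms.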
There is also the ^ (hat) operator for controlling the weight of an individual term.
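As an illustration of how a per-term weight could be read from a query token, here is a small Python sketch. It assumes the form `term^weight` (for example `a^2.5`) and a default weight of 1.0 when no hat is present; both the exact syntax and the helper name `parse_weighted_term` are assumptions made for this example.

```python
def parse_weighted_term(token: str, default_weight: float = 1.0):
    """Split an assumed 'term^weight' token into (term, weight)."""
    term, sep, weight = token.partition("^")
    # Without a hat, fall back to the assumed default weight.
    return term, (float(weight) if sep else default_weight)
```

With this sketch, `parse_weighted_term("a^2.5")` yields the term `a` with weight 2.5, and a bare `b` keeps the default weight.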