Commit Graph

19 Commits

Author SHA1 Message Date
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
fixes after review

more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms 2021-04-18 11:24:17 +03:00
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:

1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
   integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
   **always** looking for something starting with “t he” or “th e”
   instead of ignoring the word completely.

For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem were someone mistyped a normal word and
ended up typing a stop_word.

For example imagine someone searching for the music “Won't he do it”.
If that person misplace one space and write “Won' the do it” then we
will loose a part of the request.

One fix would be to update our query tree to something like that:

---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t	    # currently
        EXACT he
      TOLERANT hobbit
---------------------

This would increase drastically the size of our query tree on request
with a lot of stop_words. For example think of “The Lord Of The Rings”.

For now whatsoever we decided we were going to ignore this problem and consider
that it doesn't reduce too much the relevancy of the search to do that
while it improves the performances.
2021-04-12 18:35:33 +02:00
tamo
12fb509d84
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct 2021-03-03 15:45:03 +01:00
Kerollmops
ae4a237e58
Fix the maximum_proximity function 2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion 2021-03-03 15:43:44 +01:00
many
fb7e6df790
add tests on typo criterion 2021-03-03 15:43:43 +01:00
many
a273c46559
clean warnings 2021-03-03 15:43:42 +01:00
many
73286dc8bf
Introduce the query tree data structure 2021-03-03 15:43:40 +01:00
Kerollmops
240b02e175
Remove unused Operation constructors 2021-03-03 13:40:19 +01:00
many
a463ae821e
Add methods optional_words and authorize_typos on the query tree 2021-03-03 13:40:19 +01:00
Kerollmops
6d135beb21
Introduce the maximum_proximity helper function 2021-03-03 13:40:18 +01:00
Kerollmops
6008f528d0
Introduce the maximum_typo helper function 2021-03-03 13:40:18 +01:00
Kerollmops
1dc857a4b2
Fix the query tree optional word generation with phrases 2021-03-03 13:40:18 +01:00
Kerollmops
4f19749252
Introduce the word_documents_count method on the Context trait 2021-03-03 13:40:18 +01:00
Kerollmops
79a143b32f
Introduce the query tree data structure 2021-03-03 13:40:18 +01:00