Commit Graph

9795 Commits

Author SHA1 Message Date
Alexey Shekhirin
9eaf048a06
fix(http): use BTreeMap instead of HashMap to preserve stats order 2021-04-13 11:59:07 +03:00
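As a rough illustration of why the map type matters here: Rust's `HashMap` makes no guarantee about iteration order, while `BTreeMap` always iterates in ascending key order. The sketch below is not the code from this commit; the helper name `ordered_index_names` and the sample index names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not taken from the repository): stores per-index
/// document counts in a BTreeMap and returns the index names in the order
/// the map yields them.
fn ordered_index_names(stats: &[(&str, u64)]) -> Vec<String> {
    // Collecting into a BTreeMap sorts the entries by key.
    let map: BTreeMap<String, u64> = stats
        .iter()
        .map(|(name, count)| (name.to_string(), *count))
        .collect();

    // Iteration over a BTreeMap is always in ascending key order,
    // whatever the insertion order was.
    map.keys().cloned().collect()
}
```

Serializing such a map therefore produces the same key order on every run, which a `HashMap` does not promise.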
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:

1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
   integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
   **always** looking for something starting with “t he” or “th e”
   instead of ignoring the word completely.

For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem where someone mistypes a normal word and
ends up typing a stop_word.

For example, imagine someone searching for the song “Won't he do it”.
If that person misplaces one space and writes “Won' the do it”, then we
will lose a part of the request.

One fix would be to update our query tree to something like this:

---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t     # currently
        EXACT he
      TOLERANT hobbit
---------------------

This would drastically increase the size of our query tree on requests
with a lot of stop_words. For example, think of “The Lord Of The Rings”.

For now, however, we decided to ignore this problem and consider
that doing so doesn't reduce the relevancy of the search too much,
while it improves performance.
2021-04-12 18:35:33 +02:00
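The interim fix described in this revert message (ignore stop words in the query entirely) can be sketched roughly as follows. This is a minimal illustration and not the actual query tree code: the `Node` enum, the function `build_ignoring_stop_words`, and the whitespace-based tokenization are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified query tree node. The message above also mentions
/// OR, CONSECUTIVE and EXACT nodes, which this sketch does not need.
#[derive(Debug, PartialEq)]
enum Node {
    And(Vec<Node>),
    Tolerant(String),
}

/// Builds a query tree that drops every stop word from the query, so that
/// “The hobbit” only searches for a typo-tolerant “hobbit”.
fn build_ignoring_stop_words(query: &str, stop_words: &BTreeSet<String>) -> Node {
    let words = query
        .split_whitespace() // assumption: naive whitespace tokenization
        .map(|word| word.to_lowercase())
        .filter(|word| !stop_words.contains(word))
        .map(Node::Tolerant)
        .collect();
    Node::And(words)
}
```

With `the` as the only stop word, the query “Won' the do it” keeps only `won'`, `do` and `it`, which is the loss of information the message warns about.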
Clément Renault
f9eab6e0de
Merge pull request #151 from meilisearch/release-drafter
Add release drafter files
2021-04-12 10:25:52 +02:00
Clémentine Urquizar
6a128d4ec7
Add release drafter files 2021-04-12 10:18:39 +02:00
Clément Renault
5efe67f375
Merge pull request #154 from shekhirin/shekhirin/fix-settings-serde-tests
test(http): fix and refactor settings assert_(ser|de)_tokens
2021-04-11 10:52:38 +02:00
Alexey Shekhirin
3af8fa194c
test(http): combine settings assert_(ser|de)_tokens into 1 test 2021-04-10 12:13:59 +03:00
Clément Renault
0d09c64dde
Merge pull request #148 from shekhirin/shekhirin/setting-enum
refactor(http, update): introduce setting enum
2021-04-09 22:48:58 +02:00
Alexey Shekhirin
adfdb99abc
feat(http): calculate updates' and uuids' dbs size 2021-04-09 15:59:12 +03:00
Alexey Shekhirin
ae1655586c
fixes after review 2021-04-09 14:40:48 +03:00
Alexey Shekhirin
698a1ea582
feat(http): store processing as RwLock<Option<Uuid>> in index_actor 2021-04-09 14:34:43 +03:00
Alexey Shekhirin
87412f63ef
feat(http): implement is_indexing for stats 2021-04-09 14:34:42 +03:00
Alexey Shekhirin
09d9a29176
test(http): server & index stats 2021-04-09 14:34:42 +03:00
Alexey Shekhirin
dd9eae8c26
feat(http): stats route 2021-04-09 14:34:42 +03:00
bors[bot]
a1d04fbff5
Merge #136
136: Rename update status "pending" into "enqueued" r=curquiza a=curquiza

Closes #107 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 16:46:12 +00:00
bors[bot]
dd1a08087b
Merge #134
134: fix(http, index): init analyzer with optional stop words r=MarinPostma a=shekhirin

Also bump `milli` and `meilisearch-tokenizer` packages versions

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-08 16:13:15 +00:00
Alexey Shekhirin
51ba1bd7d3
fix(http, index): init analyzer with optional stop words
Next release

update tokenizer
2021-04-08 17:16:13 +03:00
Alexey Shekhirin
84c1dda39d
test(http): setting enum serialize/deserialize 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
dc636d190d
refactor(http, update): introduce setting enum 2021-04-08 17:03:40 +03:00
bors[bot]
f881e8691e
Merge #135
135: Add stop words r=curquiza a=irevoire

closes #21 

Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-08 11:29:00 +00:00
bors[bot]
94c0858c27
Merge #1327
1327: Update link after branch renaming r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 05:47:20 +00:00
Clémentine Urquizar
6aaa4a8e19
Update link after branch renaming 2021-04-07 19:47:48 +02:00
Clémentine Urquizar
cb23775d18
Rename pending into enqueued 2021-04-07 19:46:36 +02:00
bors[bot]
0344cf5874
Merge #122
122: Update display r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-07 12:33:25 +00:00
bors[bot]
4a1b033765
Merge #1318
1318: Update README.md for contributions r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-06 23:11:29 +00:00
tamo
dcd60a5b45
add more tests for the stop_words 2021-04-06 18:29:38 +02:00
tamo
b1962c8e02
remove legacy files from meilisearch that have been replaced by a macro in routes/settings/mod.rs 2021-04-06 16:29:04 +02:00
tamo
40ef9a3c6a
push a first implementation of the stop_words 2021-04-06 16:29:04 +02:00
Clément Renault
2bcdd8844c
Merge pull request #141 from meilisearch/reorganize-criterion
reorganize criterion
2021-04-01 19:50:16 +02:00
tamo
0a4bde1f2f
update the default ordering of the criterion 2021-04-01 19:45:31 +02:00
Clément Renault
ee3f93c029
Merge pull request #136 from shekhirin/index-fields-ids-distribution-cache
feat(index): store fields distribution in index
2021-04-01 18:36:21 +02:00
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review

bump the version of the tokenizer

implement a first version of the stop_words

The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests

Integrate the stop_words in the querytree

remove the stop_words from the querytree except if it was a prefix or a typo

more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index 2021-04-01 18:35:19 +03:00
bors[bot]
2206a44baf
Merge #132
132: Next release (alpha2) r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-01 15:25:45 +00:00
Clémentine Urquizar
4ee6ce7871
Next release 2021-04-01 17:16:16 +02:00
bors[bot]
6cb8052d3d
Merge #104
104: Update all the response format (issue #64) r=MarinPostma a=irevoire

closes #64 

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-01 14:22:57 +00:00
tamo
73973e2b9e
fix more settings routes 2021-04-01 15:50:45 +02:00
bors[bot]
89e05fc6c5
Merge #113
113: snapshots r=MarinPostma a=MarinPostma

This PR adds support for snapshotting.

The snapshotting process for an index requires that no other update is processing at the same time. A mutex lock has been added to prevent a snapshot from occurring at the same time as an update, while still permitting updates to be pushed.

The list of the indexes to snapshot is first retrieved from the `UuidResolver`, which also performs its own snapshot.

This list is passed to the update store, which attempts to acquire a lock on the update store while it snapshots itself and its associated index store.

This means that a snapshot can only be completed once all indexes have finished their ongoing update.

This PR also refactors the code to allow unit testing and mocking, and unit tests the snapshot creation.

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-01 13:16:00 +00:00
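The locking scheme described in this PR (updates and snapshots exclude each other, while new updates can still be enqueued) could look roughly like the sketch below. It is only an illustration under stated assumptions: the `UpdateStore` shown here, its fields, and its methods are hypothetical, and it uses the synchronous `std::sync::Mutex`, which may differ from what the real code uses.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical, simplified update store (not the repository's type).
struct UpdateStore {
    /// Queue of enqueued updates, guarded by its own short-lived lock.
    pending: Mutex<VecDeque<String>>,
    /// Held while an update is processed or a snapshot is taken.
    processing: Mutex<()>,
}

impl UpdateStore {
    fn new() -> Self {
        UpdateStore {
            pending: Mutex::new(VecDeque::new()),
            processing: Mutex::new(()),
        }
    }

    /// Enqueuing only touches `pending`, so it never waits for a snapshot.
    fn push(&self, update: &str) {
        self.pending.lock().unwrap().push_back(update.to_string());
    }

    /// Takes the next enqueued update; excludes snapshots while it runs.
    fn process_next(&self) -> Option<String> {
        let _guard = self.processing.lock().unwrap();
        let next = self.pending.lock().unwrap().pop_front();
        next
    }

    /// Copies the current queue; waits until no update is being processed.
    fn snapshot(&self) -> Vec<String> {
        let _guard = self.processing.lock().unwrap();
        let copy = self.pending.lock().unwrap().iter().cloned().collect();
        copy
    }
}
```

Because `push` never takes the `processing` lock, a long snapshot delays processing but not the acceptance of new updates.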
Marin Postma
248e9b3808
Merge remote-tracking branch 'origin/main' into snapshots 2021-04-01 15:10:33 +02:00
Clément Renault
67e25f8724
Merge pull request #128 from meilisearch/stop-words
Stop words
2021-04-01 14:02:37 +02:00
tamo
12fb509d84
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
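The rule stated in this message (an empty set of stop words is stored as `None`) can be sketched in a few lines. The function name `normalize_stop_words` is hypothetical and only illustrates the rule; it is not taken from the commit.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: maps an empty stop-word set to `None`, so that
/// "no stop words" has a single representation, and keeps any non-empty
/// set unchanged.
fn normalize_stop_words(words: BTreeSet<String>) -> Option<BTreeSet<String>> {
    if words.is_empty() {
        None
    } else {
        Some(words)
    }
}
```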
tamo
62a8f1d707
bump the version of the tokenizer 2021-04-01 13:49:22 +02:00
tamo
79c63049d7
update the settings routes 2021-04-01 11:52:26 +02:00
Irevoire
96cffeab1e
update all the response format to be ISO with meilisearch, see #64 2021-04-01 11:43:03 +02:00
Clémentine Urquizar
39a18d4edc
Update README.md 2021-04-01 00:00:21 +02:00
Clément Renault
6e1ddfea5a
Merge pull request #129 from shekhirin/fix-docker-commit-sha
fix(ci, http): commit_sha and commit_date in docker builds
2021-03-31 21:46:17 +02:00
Marin Postma
d8af4a7202
ignore snapshot test (#130) 2021-03-31 20:07:52 +02:00
Clément Renault
56777af8e4
Merge pull request #135 from shekhirin/index-fields-ids-distribution
feat(index): introduce fields_ids_distribution
2021-03-31 17:53:45 +02:00
Alexey Shekhirin
9205b640a4
feat(index): introduce fields_ids_distribution 2021-03-31 18:44:47 +03:00