Commit Graph

2170 Commits

Author SHA1 Message Date
ad hoc
6b2c2509b2
fix bug in exact search 2022-04-04 20:54:03 +02:00
ad hoc
56b4f5dce2
add exact prefix to query_docids 2022-04-04 20:54:03 +02:00
ad hoc
21ae4143b1
add exact_word_prefix to Context 2022-04-04 20:54:03 +02:00
ad hoc
e8f06f6c06
extract exact_word_prefix_docids 2022-04-04 20:54:03 +02:00
ad hoc
6dd2e4ffbd
introduce exact_word_prefix database in index 2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8
refactor WordPrefixDocids to take dbs instead of indexes 2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352
query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5
extract exact word docids 2022-04-04 20:54:02 +02:00
ad hoc
5451c64d5d
increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb
introduce exact attribute setting 2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0
add test for exact words 2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7
disable typos on exact words 2022-04-04 20:54:01 +02:00
bors[bot]
900825bac0
Merge #474
474: Disable typos on exact word r=MarinPostma a=MarinPostma

This PR introduces the `exact_word` setting to disable typo tolerance on custom words.

If a user query contains a word from `exact_words`, no typo derivation will be made for that particular word.

I have chosen to store the words in a FST, to save on deserialization, and allow for fast lookups.

I had some trouble with the `serde` module, and had to rename it `serde_impl`.

## steps:
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return exact match is the current word is part of the word to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473 


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 18:39:43 +00:00
ad hoc
3e67d8818c
fix typo in test comment 2022-04-04 20:34:23 +02:00
ad hoc
284d8a24e0
add intergration test for disabled typon on word 2022-04-04 20:15:51 +02:00
ad hoc
30a2711bac
rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c
fmt 2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e
fix bad rebase bug 2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d
add test for exact words 2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065
disable typos on exact words 2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee
add exact words setting 2022-04-04 20:10:54 +02:00
bors[bot]
48a5ce7434
Merge #473
473: set minimum word len for typos r=MarinPostma a=MarinPostma

this PR allows the configuration on the minimum word length for typos.

The default values are the same as previously.

## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typo`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return correct error message for wrong settings.
- [x] merge #469 

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:53:14 +00:00
bors[bot]
6bf9824fec
Merge #485
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma

I found a bug while working on #473. This pr fixes it and add the missing tests on word derivations.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-04 17:17:53 +00:00
ad hoc
853b4a520f
fmt 2022-04-04 10:41:46 +02:00
ad hoc
2cb71dff4a
add typo integration tests 2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2
implement Copy on Setting 2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2
replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4
refactor typos for readability 2022-04-04 10:41:46 +02:00
ad hoc
66020cd923
rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
ad hoc
4c4b336ecb
rename min word len for typo error 2022-04-01 11:17:03 +02:00
ad hoc
286dd7b2e4
rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c
add tests for min_word_len_for_typo 2022-04-01 11:17:02 +02:00
ad hoc
9102de5500
fix error message 2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9
dynamic minimum word len for typos in query tree builder 2022-04-01 11:17:02 +02:00
ad hoc
5a24e60572
introduce word len for typo setting 2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960
add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
bors[bot]
d2d930dd3f
Merge #469
469: add authorize typo setting r=Kerollmops a=MarinPostma

This PR adds support for an authorize typo settings. This makes is possible to disable typos for a whole index. Typos are enabled by default.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-31 15:18:08 +00:00
ad hoc
3e34981d9b
add test for authorize_typos in update 2022-03-31 14:12:00 +02:00
ad hoc
6ef3bb9d83
fmt 2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test 2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting 2022-03-31 10:05:44 +02:00
bors[bot]
d8dd357326
Merge #480
480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops

This PR fixes the fact that the benchmarks CI on push were [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132) because they reached the default timeout of 6h. This PR changes the timeout to 72h, the same setting as the manually triggered benchmark one.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-03-29 18:13:31 +00:00
Kerollmops
6a77c81a28
Increase benchmarks (push) CI timeout 2022-03-29 09:45:36 -07:00
bors[bot]
e10c26e70d
Merge #479
479: Update version (v0.24.1) r=Kerollmops a=curquiza

From v0.23.1 to v0.24.1 since we had an issue with the versionning for the previous release

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-03-24 20:12:37 +00:00
Clémentine Urquizar
ddf78a735b
Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
bors[bot]
2c7cafbf20
Merge #475
475: Bump tokenizer r=Kerollmops a=irevoire

This PR bump the tokenizer in v0.2.9 which fixes an issue we had with lindera where reqwest was used with openssl (which was breaking our benchmarks).

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-03-23 13:26:44 +00:00
Irevoire
86dd88698d
bump tokenizer 2022-03-23 14:25:58 +01:00