meilisearch/meilisearch/tests
bors[bot] 39407885c2
Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated Languages can share the same characters. Meilisearch detects Languages using `whatlang`, which works well on large texts but fails on short search queries, leading to bad segmentation and normalization of the query.

This PR stores the Languages detected during indexing in order to narrow the list of Languages that can be detected at search time.

## Detail

- Create a 19th database mapping each detected script and Language to the documents in which that Language was detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing only kanji is detected as Japanese and not Chinese
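
The steps above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `Script`, `Language`, `ScriptLanguageDocids`, and all method names below are hypothetical stand-ins, and an in-memory `BTreeMap` takes the place of the real on-disk database. It only shows the idea of recording `(script, Language)` pairs per document during indexing and deriving a per-script allow-list from them.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified stand-ins for the script and Language types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Script {
    Cj,
    Latin,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Language {
    Jpn,
    Cmn,
    Eng,
}

type DocId = u32;

/// In-memory stand-in for the new database:
/// (script, Language) -> ids of the documents where that pair was detected.
#[derive(Default)]
struct ScriptLanguageDocids {
    inner: BTreeMap<(Script, Language), BTreeSet<DocId>>,
}

impl ScriptLanguageDocids {
    /// Called during indexing each time a pair is detected in a document.
    fn record(&mut self, script: Script, language: Language, doc: DocId) {
        self.inner.entry((script, language)).or_default().insert(doc);
    }

    /// Forget a deleted document and drop pairs that no longer occur anywhere.
    fn remove_document(&mut self, doc: DocId) {
        for docs in self.inner.values_mut() {
            docs.remove(&doc);
        }
        self.inner.retain(|_, docs| !docs.is_empty());
    }

    /// Build the allow-list: for each script, the Languages actually
    /// present in the index. A search-time detector would be restricted
    /// to these candidates.
    fn allow_list(&self) -> BTreeMap<Script, Vec<Language>> {
        let mut allow: BTreeMap<Script, Vec<Language>> = BTreeMap::new();
        for (script, language) in self.inner.keys() {
            allow.entry(*script).or_default().push(*language);
        }
        allow
    }
}
```

With an index that contains only Japanese documents, the allow-list for the shared CJ script holds Japanese alone, so a kanji-only query can no longer be classified as Chinese. How the allow-list is actually handed to Charabia is not shown here.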

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
..
assets serde ndjson fix 2022-12-21 11:27:15 +08:00
auth get rids of the whole error_message module since it has been integrated into the last version of deserr 2023-02-14 20:05:27 +01:00
common test various error on the document ressource 2023-02-16 17:37:10 +01:00
dashboard Renames meilisearch-http to meilisearch 2022-12-08 08:22:53 -07:00
documents Merge #3505 2023-02-20 17:01:36 +00:00
dumps fix the import of dump v2 generated by meilisearch v0.22.0 2023-01-31 13:03:28 +01:00
index add tests on the index resource 2023-01-24 13:20:20 +01:00
search Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
settings Fix tests 2023-01-19 15:48:20 +01:00
snapshot Merge --schedule-snapshot and --snapshot-interval-sec options 2023-01-04 14:13:54 +01:00
stats Renames meilisearch-http to meilisearch 2022-12-08 08:22:53 -07:00
swap_indexes Fix tests 2023-01-19 15:48:20 +01:00
tasks Fix non insta tests 2023-01-19 16:10:05 +01:00
content_type.rs Fix non insta tests 2023-01-19 16:10:05 +01:00
integration.rs add functionnal + error tests on the swap_indexes route 2023-01-18 09:36:04 +01:00