Commit Graph

2632 Commits

Author SHA1 Message Date
Louis Dureuil
553440632e
Introduce Setting::some_or_not_set 2024-07-25 12:01:52 +02:00
Louis Dureuil
7a347966da
Allow explicit dimensions for ollama 2024-07-25 12:01:51 +02:00
Louis Dureuil
4654d51e05
Add custom headers for REST embedder 2024-07-25 12:01:51 +02:00
ManyTheFish
a918561ac1
Fix PR comments 2024-07-25 10:52:56 +02:00
ManyTheFish
70d71581ee
fix clippy 2024-07-25 10:52:56 +02:00
ManyTheFish
04fa44e7eb
Implement localized attributes settings 2024-07-25 10:51:27 +02:00
ManyTheFish
90c0a6db7d
Implement localized search 2024-07-25 10:51:27 +02:00
ManyTheFish
cc02920f2b
Update charabia 2024-07-25 10:51:27 +02:00
Tamo
988552e178
add tests on the rest embedder 2024-07-24 14:34:17 +02:00
Louis Dureuil
0d8199f3b7
Change parameters in milli settings 2024-07-24 14:34:17 +02:00
Louis Dureuil
4b74803dae
Change parameters in vector settings 2024-07-24 14:34:17 +02:00
Louis Dureuil
d731fa661b
ollama and openai use new EmbedderOptions 2024-07-24 14:34:17 +02:00
Louis Dureuil
a1beddd5d9
rest embedder: use json_template 2024-07-24 14:34:17 +02:00
Louis Dureuil
4109182ca4
Add json_template module 2024-07-24 14:34:12 +02:00
Louis Dureuil
1a297c048e
Error changes 2024-07-24 14:34:12 +02:00
Louis Dureuil
303e601b87
HuggingFace: Clearer error message when a model is not supported 2024-07-23 15:13:22 +02:00
meili-bors[bot]
ea73615abf
Merge #4804
4804: Implements the experimental contains filter operator r=irevoire a=irevoire

# Pull Request
Related PRD: (private link) https://www.notion.so/meilisearch/Contains-Like-Filter-Operator-0d8ad53c6761466f913432eb1d843f1e
Public usage page: https://meilisearch.notion.site/Contains-filter-operator-usage-3e7421b0aacf45f48ab09abe259a1de6

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3613

## What does this PR do?
- Extract the contains operator from this PR: https://github.com/meilisearch/meilisearch/pull/3751
- Gate it behind a feature flag
- Add tests


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-07-17 15:47:11 +00:00
Tamo
02c61eabfa fix the range reported when the experimental feature has not been set 2024-07-17 16:54:33 +02:00
Tamo
2af9481804 Implements the experimental contains filter operator« 2024-07-17 11:13:37 +02:00
Louis Dureuil
24240934f9
Improve errors when indexing documents with a user provided embedder 2024-07-16 13:39:01 +02:00
Louis Dureuil
f4c94ac57f
manual embedders: limit max size of errors to 250 2024-07-16 13:39:01 +02:00
Louis Dureuil
4087a88dbe
rest|ollama|openai: increase tries to 10 + randomize retry duration 2024-07-16 13:39:00 +02:00
Louis Dureuil
5adacf2f45
OpenAI: embed only the first MAX_TOKENS tokens 2024-07-16 13:39:00 +02:00
Louis Dureuil
65d0c32aa7
Allow overriding OpenAI's url 2024-07-16 13:39:00 +02:00
Louis Dureuil
82647bcded
When retrieveVectors is true, retrieve _vectors.embedder even if there are no vector for that embedder 2024-07-16 13:39:00 +02:00
Louis Dureuil
e83da00446
Milli changes to match to allow for more flexible lifetimes 2024-07-11 16:29:35 +02:00
Louis Dureuil
7fb3e378ff
Do not fail sort comparisons when the field name or target point are different 2024-07-11 16:28:14 +02:00
meili-bors[bot]
29b44e5541
Merge #4626
4626: Edit Documents with Rhai r=ManyTheFish a=Kerollmops

This PR introduces a first version of [the _Update Documents with Function_ (internal)](https://www.notion.so/meilisearch/Update-Documents-by-Function-45f87b13e61c4435b73943768a490808). It uses [the Rhai programming language](https://rhai.rs/) to let users express the modifications they want apply.

You can read more about the way to use this functions on [the Usage PRD Page](https://meilisearch.notion.site/Edit-Documents-with-Rhai-0cff8fea7655436592e7c8a6de932062?pvs=25). The [prototype is available](https://github.com/meilisearch/meilisearch/actions/runs/9038384483) through Docker by using the following command:

```
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-edit-documents-with-rhai-3
```

## TODO
 - [x] Support the `DocumentEdition` task in dumps.
 - [x] Remove the unwraps and panics.
 - [x] Improve error codes for the `function` parameter.
 - [x] [Update Rhai to v1.19.0](https://github.com/rhaiscript/rhai/releases/tag/v1.19.0) 🚀
 - [x] Make it an experimental feature (only restrict the HTTP calls).
 - [x] It must be possible not to send a context.
 - [x] Rebase on main.
 - [x] Check that the script cannot do any io.
 - [x] ~Introduce a `Documents.edit` action or~ require the `Documents.all` action.
 - [x] Change the `editionCode` to the clearer `function` field name in the tasks.
 - [x] Support a user provided context and maybe more (but keep function execution isolated for reproducibility).
 - [x] Support deleting documents when the `doc` is `()` (nil, null).
 - [x] Support canceling document edition.
 - [x] Multithread document edition by using rayon (and [rayon-par-bridge](https://docs.rs/rayon-par-bridge/latest/rayon_par_bridge/)).
 - [x] Limit the number of instruction by function execution.
 - [ ] ~Expose the limit of instructions in the settings.~ Not sure, in fact.
 - [x] Ignore unmodified documents in the tasks count.
 - [x] Make the `filter` field optional (not forced to be `null`).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-07-11 09:02:55 +00:00
Clément Renault
6e80364c50
Apply review comments 2024-07-11 11:00:27 +02:00
Clément Renault
3bac22fd87
We do not do intersections with the universe when it is related to cache 2024-07-10 16:49:36 +02:00
Clément Renault
ce61cb7fe6
Simplify and speedup an intersection pass 2024-07-10 16:49:36 +02:00
Clément Renault
1693d1a311
Simplify the check to decide to stop a loop 2024-07-10 16:49:36 +02:00
Clément Renault
febea735ca
Remove the unused universe parameter from resolve_negative_phrases 2024-07-10 16:49:36 +02:00
Clément Renault
93ba051094
Remove the invalid get_phrases_docids universe parameter 2024-07-10 16:49:35 +02:00
Clément Renault
cd7a20fa32
Make it work by avoid storing invalid stuff in the cache 2024-07-10 16:49:35 +02:00
Clément Renault
41f51adbec
Do less useless intersections 2024-07-10 16:49:35 +02:00
Clément Renault
0ca1a4e805
Always do the intersections with the universe 2024-07-10 16:49:34 +02:00
Clément Renault
50a7393c55
Modify the compute_query_term_subset_docids function to accept the universe 2024-07-10 16:49:34 +02:00
Clément Renault
837274f853
Restrict even more the Rhai engine 2024-07-10 16:30:18 +02:00
Clément Renault
aace587dd1
Create errors for the internal processing ones 2024-07-10 16:29:18 +02:00
Clément Renault
f35d6710f3
Update rhai to v1.19.0 2024-07-10 16:29:17 +02:00
Clément Renault
81ec0abad1
Use the new rayon-par-bridge library 2024-07-10 16:29:04 +02:00
Clément Renault
b67d385cf0
Parallelize the edition functions 2024-07-10 16:28:54 +02:00
Clément Renault
dfecb25814
Disable the time package 2024-07-10 16:28:37 +02:00
Clément Renault
2eae2015d7
Support aborting documents edition by function 2024-07-10 16:28:15 +02:00
Clément Renault
33fa17bf12
Support deleting documents with functions 2024-07-10 16:28:15 +02:00
Clément Renault
400e6b93ce
Support user-provided context for documents edition 2024-07-10 16:28:15 +02:00
Clément Renault
f4add93043
Limit the number of script operations 2024-07-10 16:28:14 +02:00
Clément Renault
2fae96ac14
Show the actual number of actually edited documents 2024-07-10 16:28:14 +02:00
Clément Renault
45af18ae9c
Check the Rhai syntax before accepting the script 2024-07-10 16:28:13 +02:00
Clément Renault
2d97164d9f
It works perfectly with some Rhai 2024-07-10 16:28:13 +02:00
Clément Renault
efc156a4a4
Executing Lua works correctly 2024-07-10 16:27:36 +02:00
meili-bors[bot]
2099b4f0dd
Merge #4786
4786: Update dependencies r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes #4753

## What does this PR do?
- Update all dependencies except rustls
- [x] Release charabia
- [x] Update charabia
- [x] Double check that the docker build works after updating charabia



Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-07-10 13:23:54 +00:00
Clément Renault
9d6885793e
Upgrade dependencies 2024-07-10 13:46:24 +02:00
Clément Renault
5f4530ce57
Remove more unused dependencies 2024-07-10 13:36:34 +02:00
Tamo
4d5005b01a make clippy happy 2024-07-10 10:06:59 +02:00
Tamo
952e742321 update charabia 2024-07-09 23:41:29 +02:00
hanbings
0a40a98bb6
Make milli use edition 2021 (#4770)
* Make milli use edition 2021

* Add lifetime annotations to milli.

* Run cargo fmt
2024-07-09 17:25:39 +02:00
Tamo
cd46ebd6b5 remove insta deprecating 2024-07-08 18:38:05 +02:00
Tamo
6afa578688 update most incompatible dependencies 2024-07-08 18:31:15 +02:00
Tamo
300bdfc2a7 update most dependencies 2024-07-08 18:09:12 +02:00
Louis Dureuil
128e6c7502
Search: spans with a finer granularity 2024-07-02 16:13:53 +02:00
ManyTheFish
015d90a962 merge main 2024-07-01 11:50:36 +02:00
Louis Dureuil
e53de15b8e
Fix behavior of limit and offset for hybrid search when keyword results are returned early
The test is fixed
2024-06-27 14:25:33 +02:00
Tamo
ce08dc509b add more tests and improve the location of the error 2024-06-27 11:51:45 +02:00
Tamo
1daaed163a Make _vectors.:embedding.regenerate mandatory + tests + error messages 2024-06-27 11:04:58 +02:00
meili-bors[bot]
7e3c306c54
Merge #4725
4725: Store primary key as String when Number exceeds i64 range r=irevoire a=JWSong

# Pull Request

## Related issue
Fixes #4696 

## What does this PR do?
- When a Number value exceeding the range of i64 is received as a primary key, it will be stored as a String.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: JWSong <thdwjddn123@gmail.com>
2024-06-26 07:06:04 +00:00
JWSong
dcdc83946f accept large number as string 2024-06-25 21:41:47 +09:00
meili-bors[bot]
3c4c46377b
Merge #4665
4665: Add missing Korean support r=ManyTheFish a=junhochoi

Some configuration is missing `korean` features and add a test case in `milli/src/search/mod.rs`.

# Pull Request

## Related issue

#3443 #3882 

## What does this PR do?
- Improvement on enabling Korean support

Inspired by the work (#3882) I tried to enable Korean features but have found some missing configurations.
This PR is add those missing configs (mostly Cargo.toml) and added one test case.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Junho Choi <jh.choi@catenoid.net>
2024-06-25 11:51:21 +00:00
Louis Dureuil
d75e0098c7
Fixes for Rust v1.79 2024-06-25 11:16:06 +02:00
Junho Choi
2e0ff56f3f Add missing Korean support
Some configuration is missing `korean` features and
add a test case in `milli/src/search/mod.rs`.
2024-06-25 12:45:21 +09:00
Tamo
1693332cab Update arroy and always build the tree that need to be built 2024-06-24 10:14:03 +02:00
meili-bors[bot]
ddd564665b
Merge #4713
4713: Speed up facet distribution r=ManyTheFish a=Kerollmops

This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation.

A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-24 05:23:46 +00:00
Clément Renault
9736e16a88
Make clippy happy 2024-06-20 13:02:44 +02:00
Clément Renault
6fa4da8ae7
Improve facet distribution speed in count mode 2024-06-20 12:58:51 +02:00
Clément Renault
19d7cdc20d
Improve facet distribution speed in lexico mode 2024-06-20 12:57:08 +02:00
Louis Dureuil
a04041c8f2
Only spawn the pool once 2024-06-19 16:25:33 +02:00
meili-bors[bot]
e580d6b98f
Merge #4693
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops

This PR fixes #4611.

### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
Tamo
43875e6758 fix bug around nested fields 2024-06-17 15:59:30 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
0a8f50695e
Fixes for Rust v1.79 2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738
Small changes following review 2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection 2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers 2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7
Update embedder configs with a finer granularity
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
d0b05ae691
Add EmbedderAction to settings 2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100
Reformulate ParsedVectorsDiff in terms of VectorState 2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272
Add EmbedderConfigs::into_inner 2024-06-12 14:11:44 +02:00
meili-bors[bot]
e0eff08095
Merge #4685
4685: Fix ci tests r=dureuill a=ManyTheFish

# Pull Request
Make the all following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091

## Related issue
Fixes #4629

## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
39f60abd7d
Add and modify distinct tests 2024-06-11 17:53:53 -04:00
Clément Renault
1991bd03da
Distinct at search erases the distinct in the settings 2024-06-11 17:02:39 -04:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search 2024-06-11 11:39:35 -04:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
ManyTheFish
57d066595b fix Tests almost all features 2024-06-06 17:24:50 +02:00
Clément Renault
75b2e02cd2
Log more stuff around filtering 2024-06-06 11:00:07 -04:00
Clément Renault
52d0d35b39
Revert "Reduce the universe while exploring the facet tree" because it's slower this way
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
Clément Renault
5432776132
Reduce the universe while exploring the facet tree 2024-06-06 09:17:51 -04:00
Clément Renault
66470b27e6
Use the MultiOps trait for IN operations 2024-06-06 09:17:51 -04:00
Clément Renault
0a9bd398c7
Improve the NOT operator to use the universe when possible 2024-06-06 09:17:51 -04:00
Clément Renault
7967e93c16
Skip evaluating when a universe is empty, nothing can be found 2024-06-06 09:17:51 -04:00
Clément Renault
a6f3a01c6a
Expose the universe to do efficient intersections on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
4ca4a3f954
Make the CboRoaringBitmapCodec support intersection on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
e4a69c5ac3
Introduce the FacetGroupLazyValue type 2024-06-06 09:17:50 -04:00
Clément Renault
531e3d7d6a
MultiOps trait for OR operations 2024-06-06 09:17:50 -04:00
Tamo
2cdcb703d9 fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9 implements mor review comments 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
b867829ef1 remove useless dbg 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6 forward the embedding config to the extractors 2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea wraps the index embedding config in a struct 2024-06-06 11:37:30 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
Tamo
84e498299b Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00
Tamo
7a84697570 never store the _vectors as searchable or faceted fields 2024-06-06 11:36:11 +02:00
Tamo
4148fbbe85 provide a method to get all the nested fields ids from a name 2024-06-06 11:36:11 +02:00
ManyTheFish
2e50c6ec81 Update Charabia 2024-06-06 10:18:43 +02:00
ManyTheFish
30293883e0 Fix condition mistake 2024-06-05 17:30:07 +02:00
ManyTheFish
b833be46b9 Avoid running proximity when only the exact attributes changes 2024-06-05 17:30:07 +02:00
ManyTheFish
0a4118329e Put only_additional_fields to None if the difference gives an empty result. 2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6 Skip iterating over documents when the faceted field list doesn't change 2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1 iterate over the faceted fields instead of over the whole document 2024-06-05 17:30:07 +02:00
Clément Renault
a998b881f6 Cache a lot of operations to know if a field must be indexed 2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d Add a span for the prepare_for_documents_reindexing 2024-06-05 17:30:07 +02:00
Clément Renault
091bb157f1 Add a span for the settings diff creation 2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b Reduce the number of complex calls to settings diff functions 2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94 Introduce a new way to determine the operations to perform on the fields 2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1 Introduce a dedicated function to write proximity entries in database 2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe Give the settings diff to the write_typed_chunk_into_index function 2024-06-05 17:30:07 +02:00
Clément Renault
1ab03c4ede Fix an issue with settings diff and * in the searchable attributes 2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00 Introducing a new into_del_add_obkv_conditional_operation function 2024-06-05 17:30:07 +02:00
Clément Renault
42b3f52ef9 Introduce the SettingDiff only_additional_fields method 2024-06-05 17:30:07 +02:00
meili-bors[bot]
93f5defedc
Merge #4656
4656: Adding a new `searchableAttribute` no longer re-index all the attributes r=ManyTheFish a=Kerollmops

Fixes #4492.

## To Do
 - [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times
 - [ ] Add tests

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-06-05 14:51:14 +00:00
ManyTheFish
33241a6b12 Fix condition mistake 2024-06-05 16:00:24 +02:00
ManyTheFish
ff87b4db26 Avoid running proximity when only the exact attributes changes 2024-06-05 12:48:44 +02:00
ManyTheFish
ba9fadc8f1 Put only_additional_fields to None if the difference gives an empty result. 2024-06-05 10:51:16 +02:00
ManyTheFish
d29d4f88da Skip iterating over documents when the faceted field list doesn't change 2024-06-04 15:31:24 +02:00
ManyTheFish
17c5ceeb9d iterate over the faceted fields instead of over the whole document 2024-06-04 14:04:20 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609

## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review 2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3773

## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy

See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).

[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Clément Renault
b9a0ff0dd6
Cache a lot of operations to know if a field must be indexed 2024-05-30 16:18:23 +02:00
Clément Renault
75496af985
Add a span for the prepare_for_documents_reindexing 2024-05-30 12:14:22 +02:00
Clément Renault
0e9eb9eedb
Add a span for the settings diff creation 2024-05-30 12:08:27 +02:00
ManyTheFish
3f1a510069 Add tests and fix matching strategy 2024-05-30 12:02:42 +02:00
Clément Renault
3a78e988da
Reduce the number of complex calls to settings diff functions 2024-05-30 11:23:07 +02:00
Clément Renault
d9e5074189
Introduce a new way to determine the operations to perform on the fields 2024-05-30 11:23:07 +02:00
Clément Renault
bc210bdc00
Introduce a dedicated function to write proximity entries in database 2024-05-30 11:23:06 +02:00
Clément Renault
4bf83f701c
Give the settings diff to the write_typed_chunk_into_index function 2024-05-30 11:23:06 +02:00
Clément Renault
db3887929f
Fix an issue with settings diff and * in the searchable attributes 2024-05-30 11:22:50 +02:00
Clément Renault
9af103a88e
Introducing a new into_del_add_obkv_conditional_operation function 2024-05-30 11:22:49 +02:00
Clément Renault
99211eb375
Introduce the SettingDiff only_additional_fields method 2024-05-30 11:22:49 +02:00
Louis Dureuil
4f03b0cf5b
Add ranking score threshold to similar 2024-05-30 11:20:50 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API 2024-05-30 10:32:35 +02:00
ManyTheFish
1ab88e10b9 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 16:24:00 +02:00
Louis Dureuil
aac1d769a7
Add ranking_score_threshold to milli 2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca Implement Frequency matching strategy 2024-05-29 13:59:08 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648 merge main into v1.8.1 2024-05-29 11:26:07 +02:00
Louis Dureuil
ca6cc4654b
Add similar route 2024-05-28 15:28:19 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers 2024-05-28 15:27:43 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext 2024-05-28 15:22:12 +02:00
Louis Dureuil
fd2c95999d
Change validate_document_id to public and remove extra layer of result 2024-05-28 15:21:19 +02:00
Clément Renault
dc949ab46a
Remove puffin usage 2024-05-27 15:59:14 +02:00
Clément Renault
7f3e51349e
Remove puffin for the dependencies 2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops

This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff 2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631 2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes 2024-05-22 14:44:42 +02:00
Louis Dureuil
3412e7fbcf
"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0 2024-05-22 12:25:21 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB 2024-05-22 12:24:51 +02:00
Louis Dureuil
8f7c8ca7f0
Remove now unused error variant 2024-05-22 12:23:43 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional 2024-05-21 16:16:36 +02:00
Clément Renault
943f8dba0c
Make clippy happy 2024-05-21 14:58:41 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional 2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838 Fix clippy 2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722 Fixes 2024-05-21 13:39:46 +02:00
Louis Dureuil
b17cb56dee
Test array of vectors 2024-05-20 14:44:10 +02:00
ManyTheFish
fc7e817221 Index geo points based on the settings differences 2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests 2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field 2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB 2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7
Add function to get the embeddings of a document in an index 2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e
Add parsed vectors module 2024-05-20 10:25:59 +02:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d when no searchable attributes are defined, makes all the weight equals to zero 2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5 rename method and variable around the attributes to search on feature 2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1 get back to what we were doingb efore in the DB cache and with the restricted field id 2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3 stops storing the whole fieldids weights map when no searchable are defined 2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9 add a failing test on the attribute ranking rule 2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e stop updating the fields ids map when fields are only swapped 2024-05-14 17:00:02 +02:00
Tamo
9ecde41853 add a test on the current behaviour 2024-05-14 17:00:02 +02:00
Tamo
685f452fb2 Fix the indexing of the searchable 2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
Tamo
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
Clément Renault
ac4bc143c4
Bump ureq to v2.9.7 2024-05-07 10:39:38 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Louis Dureuil
f4dd73ec8c
Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
ManyTheFish
88174b8ae4 Update charabia v0.8.10 2024-04-30 14:30:23 +02:00
meili-bors[bot]
ebca29f3de
Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875 Display set API key for Ollama embedder 2024-04-24 12:33:07 +02:00
Clément Renault
b3173d0423
Remove useless dots in the error messages 2024-04-22 18:09:33 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics 2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026 Add conditions reducing wrok 2024-04-22 14:18:35 +02:00
ManyTheFish
c71b5d09ff Updatre charabia v0.8.9 2024-04-18 11:38:26 +02:00
writegr
ab43a8a949 chore: fix some typos in comments
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
meili-bors[bot]
4a8459b799
Merge #4576
4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes #4575

## What does this PR do?
- increase the default search time budget from 150ms to 1.5s


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-17 16:04:47 +00:00
Clément Renault
c923adf222
Fix facet distribution for alpha on facet numbers 2024-04-17 16:31:16 +02:00
ManyTheFish
df29ba709a Make some cleaning in Arcs 2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7 Fix PR comments 2024-04-17 10:55:51 +02:00
Tamo
19137be0ea increase the default search time budget from 150ms to 1.5s 2024-04-16 18:09:49 +02:00
ManyTheFish
87a93ba47d fix clippy 2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34 Fix wod pair proximity error when nothing has to be extracted 2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae Comeback to sorters in extract_word_docids
using buffers and merge the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4 fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265 finish work 2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6 refactor faceted and searchable pipeline 2024-04-16 14:39:06 +02:00
ManyTheFish
a7e368aaa6 Create InnerIndexSettingsDiffs struct and populate it 2024-04-16 14:39:06 +02:00
ManyTheFish
893200ab87 Avoid clearing documents in transform 2024-04-16 14:39:06 +02:00
ManyTheFish
aabce52b1b Fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
8fff5fc281 update tests 2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e chore: fix some typos in comments
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
89e72fab32
Update grenad to fix rare DB corruption 2024-04-11 21:06:59 +02:00
meili-bors[bot]
b1844b0c27
Merge #4548
4548: v1.8 hybrid search changes r=dureuill a=dureuill

Implements the search changes from the [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#40f24df3da694428a39cc8043c9cfc64)

### ⚠️ Breaking changes in an experimental feature:

- Removed the `_semanticScore`. Use the `_rankingScore` instead.
- Removed `vector` in the response of the search (output was too big).
- Removed all the vectors from the `vectorSort` ranking score details
  - target vector appearing in the name of the rule
  - matched vector appearing in the details of the rule

### Other user-facing changes

- Added `semanticHitCount`, indicating how many hits were returned from the semantic search. This is especially useful in the hybrid search.
- Embed lazily: Meilisearch no longer generates an embedding when the keyword results are "good enough".
- Graceful embedding failure in hybrid search: when doing hybrid search (`semanticRatio in ]0.0, 1.0[`), an embedding failure no longer causes the search request to fail. Instead, only the keyword search is performed. When doing a full vector search (`semanticRatio==1.0`), a failure to embed will still result in failing that search.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 16:00:20 +00:00
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount 2024-04-04 16:04:06 +02:00
Louis Dureuil
3c6e9851a4
Correct error formatting 2024-04-04 15:58:19 +02:00
Louis Dureuil
466d718a05
Fix test 2024-04-04 15:58:19 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure 2024-04-04 15:58:17 +02:00
Louis Dureuil
fabc9cf14a
milli: add Embedder::embed_one 2024-04-04 15:57:29 +02:00
Louis Dureuil
00c4ed3bc2
milli: refactor getting embedder and embedder name 2024-04-04 15:57:29 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details 2024-04-04 15:57:29 +02:00
meili-bors[bot]
339a5e3431
Merge #4549
4549: Hugging Face embedder improvements r=dureuill a=dureuill

Architectural changes/Internal improvements

### 1. Prefer safetensors weights over pytorch weights when available

safetensors weights are memory mapped, which reduces memory usage of supported models.

### 2. Update candle

Updates candle to `0.4.1`, now targeting crates.io and the tokenizers to `v0.15.2` (still on github).

This might fix https://github.com/meilisearch/meilisearch/issues/4399 thanks to the now included https://github.com/huggingface/candle/issues/1454

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 13:47:18 +00:00
meili-bors[bot]
5509bafff8
Merge #4535
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops

This PR fixes #4422 by supporting `-` before any word in the query.

The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D).

It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept:
 - If you input `progamer -progamer`, the engine will still search for `pro gamer`.
 - If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.

## TODO
 - [x] Add analytics
 - [x] Add support to the `-` operator
 - [x] Make sure to support spaces around `-` well
 - [x] Support phrase negation
 - [x] Add tests


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
Louis Dureuil
58cafcc824
Update candle 2024-04-03 13:11:56 +02:00
meili-bors[bot]
56bf8503db
Merge #4537
4537: Expose distribution shift in settings r=ManyTheFish a=dureuill

See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802)

# Changes

- Distribution shift added to all embedders.
- Exposed in settings
- Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-03 09:08:58 +00:00
Louis Dureuil
a1eccc762a
Prefer safetensors to pytorch when both are available 2024-04-03 11:05:59 +02:00
meili-bors[bot]
75f81a0bab
Merge #4547
4547: Fix milli/Cargo.toml for usage as dependency via git r=dureuill a=Toromyx

# Pull Request

## Related issues/discussions
This enables th usage of `milli` [via git repository](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) as mentioned in <https://github.com/meilisearch/meilisearch/issues/3367#issuecomment-1422613815>, <https://github.com/meilisearch/meilisearch/discussions/1523#discussioncomment-1039338>, and <https://github.com/meilisearch/meilisearch/discussions/1981#discussioncomment-1771568>

## What does this PR do?
Trying to depend on `milli` like

```
[dependencies.milli]
git = "https://github.com/meilisearch/meilisearch.git"
tag = "v1.7.4"
```

leads to the following error:

```
error: failed to select a version for the requirement `candle-core = "^0.3.1"`
candidate versions found which didn't match: 0.4.2
location searched: Git repository https://github.com/huggingface/candle.git
required by package `milli v1.7.4 (https://github.com/meilisearch/meilisearch.git?tag=v1.7.4#0259ad60)`
```

because the default branch of <https://github.com/huggingface/candle> does not contain the correct version.

To fix this, i added a `rev="..."` entry in the relevant dependencies, specifiyng the commit already present in the `Cargo.lock` file.
I also updated the version to the one in the Cargo.lock. This also updated `candle-kernels` sub-dependency from 0.3.1 to 0.3.3 which is probably correct?

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Thomas Gauges <thomas.gauges@gmail.com>
2024-04-03 07:31:36 +00:00
Thomas Gauges
d55d496250 Fix milli/Cargo.toml for usage as dependency via git 2024-04-02 15:19:30 +02:00