4352: Restore highlighting when possible for hybrid search r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes #4351
## What does this PR do?
- Use `MatchingWords` from keyword search instead of the one from vector search
- New: When `semanticRatio < 1.0`, all words from the query are now highlighted in all results, regardless of their source (keyword or semantic)
- No change: When `semanticRatio == 1.0`, no highlighting is applied, like before this PR
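The rule in the list above can be sketched as follows. This is a minimal illustration, not the actual `MatchingWords` logic: the function name `words_to_highlight` and the plain whitespace tokenization are assumptions made for the example.

```rust
/// Hypothetical helper (not part of Meilisearch): returns the query words
/// that would be highlighted in every hit, following the rule above.
fn words_to_highlight<'a>(query: &'a str, semantic_ratio: f32) -> Vec<&'a str> {
    if semantic_ratio >= 1.0 {
        // Pure semantic search: no keyword search ran, so nothing is highlighted.
        Vec::new()
    } else {
        // Hybrid or keyword search: every query word is highlighted in every
        // hit, whether the hit came from the keyword or the semantic side.
        // Assumption: words are split on whitespace only.
        query.split_whitespace().collect()
    }
}
```

With `semanticRatio` set to `0.5`, the query `captain marvel` yields both words; with `1.0`, the list is empty.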
## Draft status
Should we merge this in a v1.6.1 version?
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4353: Update version for the next release (v1.6.1) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4318: Hide embedders r=ManyTheFish a=dureuill
Hides `embedders` when it is an empty dictionary.
Manual tests:
- getting settings with empty embedders: not displayed
- getting settings with non-empty embedders: displayed like before
- dump with empty embedders: can be imported
- dump with non-empty embedders: can be imported
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
Reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
combined with `map_leaf_values`, which iterates over all the nested fields, this gives a quadratic complexity:
`d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format.
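As a rough sketch of the idea, the option list can be restricted to the displayed fields that actually need formatting before any per-field work is done. The function `fields_to_format` and its signature are hypothetical and only illustrate the filtering step, not the real formatter code.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch: keep only the displayed fields that also need
/// formatting, so that later per-field work scales with `f` rather than `d`.
fn fields_to_format<'a>(displayed: &[&'a str], to_format: &BTreeSet<&str>) -> Vec<&'a str> {
    displayed
        .iter()
        .copied()
        // Each lookup in the set is logarithmic, not a scan of all fields.
        .filter(|field| to_format.contains(*field))
        .collect()
}
```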
Co-authored-by: ManyTheFish <many@meilisearch.com>
4314: Fix proximity precision telemetry r=Kerollmops a=ManyTheFish
The proximity precision telemetry was partially missing in the global setting route.
This PR adds the missing field and returns the default value when the value is not set.
Co-authored-by: ManyTheFish <many@meilisearch.com>
4311: Limit the number of values returned by the facet search r=dureuill a=Kerollmops
This PR fixes a bug where the number of values per facet returned by the `indexes/{index}/facet-search` route was not taking the `faceting.maxValuesPerFacet` setting into account. It also adds a test.
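The intended behavior can be illustrated with a simplified sketch. The function `facet_search` below is hypothetical, and it assumes plain prefix matching, which is much simpler than the real facet search.

```rust
/// Hypothetical sketch: return the facet values matching the query,
/// capped at the configured maximum number of values per facet.
fn facet_search<'a>(values: &[&'a str], query: &str, max_values_per_facet: usize) -> Vec<&'a str> {
    values
        .iter()
        .copied()
        // Assumption: a value matches when it starts with the query.
        .filter(|value| value.starts_with(query))
        // The cap is applied after matching, so at most `max_values_per_facet`
        // values are returned.
        .take(max_values_per_facet)
        .collect()
}
```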
Co-authored-by: Clément Renault <clement@meilisearch.com>
4308: Fix hang on `/indexes` and `/stats` routes r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes #4218
## Context
- A previous fix added a field to the `IndexScheduler` to remember the `currently_updating_index`, so that accessing it through the search would return the handle without trying to open it. This resolved a hang on the search, but #4218 reported further hangs on the `/indexes` and `/stats` routes.
- These routes were bypassing the `IndexScheduler` and using internal `IndexMapper` logic to access the indexes, again trying to reopen the updating index.
## What does this PR do?
- Moves the logic related to the `currently_updating_index` from the `IndexScheduler` to the `IndexMapper`, so that any index request to the `IndexMapper` can benefit from it.
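A minimal sketch of the resulting lookup order, under simplifying assumptions: the `Index` and `IndexMapper` types below are stand-ins defined for this example, and the real `IndexMapper` opens indexes from disk rather than reading them from a map.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for an opened index handle (hypothetical, for illustration only).
#[derive(Debug)]
struct Index {
    name: String,
}

/// Simplified mapper: it remembers the index being updated so that any lookup
/// can reuse that handle instead of trying to open the index again.
struct IndexMapper {
    currently_updating_index: RwLock<Option<(String, Arc<Index>)>>,
    opened: RwLock<HashMap<String, Arc<Index>>>,
}

impl IndexMapper {
    fn index(&self, name: &str) -> Option<Arc<Index>> {
        // First check the index currently being updated, and return its handle.
        if let Some((updating, handle)) = self.currently_updating_index.read().unwrap().as_ref() {
            if updating.as_str() == name {
                return Some(Arc::clone(handle));
            }
        }
        // Otherwise fall back to the regular lookup (a plain map in this sketch).
        self.opened.read().unwrap().get(name).cloned()
    }
}
```

Because the check lives in the mapper, every caller gets the same behavior, whether it goes through the scheduler or not.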
## Test
1. Follow reproducer from #4218
2. Before this PR, notice a hang on `/stats` and `/indexes`, but not on `/indexes/<updating_index>/search`
3. After this PR, notice no hang on any of `/stats`, `/indexes` or `/indexes/<updating_index>/search`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4303: Display default value when proximityPrecision is not set r=dureuill a=ManyTheFish
# Pull Request
## Related
Issue: #4187
Spec change requests: https://github.com/meilisearch/specifications/pull/261#discussion_r1441725272
## What does this PR do?
- Display the default value when `proximityPrecision` is not set, instead of `null`
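The change amounts to falling back to the default when nothing is stored. In the sketch below, the enum and the function `displayed_precision` are hypothetical, and `ByWord` is assumed to be the default value.

```rust
/// Hypothetical mirror of the setting's two values.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProximityPrecision {
    ByWord,
    ByAttribute,
}

impl Default for ProximityPrecision {
    fn default() -> Self {
        // Assumption: `byWord` is the default precision.
        ProximityPrecision::ByWord
    }
}

/// Returns the value to display: the stored one, or the default when unset.
fn displayed_precision(stored: Option<ProximityPrecision>) -> ProximityPrecision {
    stored.unwrap_or_default()
}
```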
Co-authored-by: ManyTheFish <many@meilisearch.com>
4296: Fix single element search r=irevoire a=dureuill
# Pull Request
Before this PR, indexing a single vector in a single document would result in the vector not being found by the vector search.
This PR adds a test case for this condition, and resolves it by bumping arroy to a version containing the fix.
# Test case
Output of the test before and after this PR:
```diff
diff --git a/meilisearch/tests/search/hybrid.rs b/meilisearch/tests/search/hybrid.rs
index 2cd4b83e7..79819cab2 100644
--- a/meilisearch/tests/search/hybrid.rs on release-v1.6.0
+++ b/meilisearch/tests/search/hybrid.rs on fix-single-element-search
@@ -171,5 +171,5 @@ async fn single_document() {
         .await;
     snapshot!(code, @"200 OK");
-    snapshot!(response["hits"][0], @r###"{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":0.0}"###);
+    snapshot!(response["hits"][0], @r###"{"title":"Shazam!","desc":"a Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":1.0,"_semanticScore":1.0}"###);
 }
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4294: fix compilation warnings for release v1.6 r=curquiza a=irevoire
# Pull Request
## Related issue
Fixes #4292
## What does this PR do?
- Removed unused imports
#4295 fixes the issue on main
Co-authored-by: Tamo <tamo@meilisearch.com>
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill
Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4238: Task queue webhook r=dureuill a=irevoire
# Prototype `prototype-task-queue-webhook-1`
The prototype is available through Docker by using the following command:
```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```
# Pull Request
Implements the task queue webhook.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236
## What does this PR do?
- Provide a new CLI parameter and environment variable for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also support sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks to the webhook, serialized with our public `TaskView` type as a gzip-compressed JSON Lines body.
- Add one test.
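To illustrate the shape of the payload, here is a minimal sketch of a JSON Lines body: one JSON object per finished task, each terminated by a newline. The `TaskSummary` struct is a hypothetical, much reduced stand-in for `TaskView`, the JSON is written by hand for brevity, and the gzip compression step is left out because it needs a crate outside the standard library.

```rust
/// Hypothetical, reduced stand-in for the public `TaskView` type.
struct TaskSummary {
    uid: u32,
    status: &'static str,
}

/// Builds an uncompressed JSON Lines body: one JSON object per line.
/// Assumption: `status` never contains characters that need JSON escaping.
fn to_json_lines(tasks: &[TaskSummary]) -> String {
    let mut body = String::new();
    for task in tasks {
        body.push_str(&format!(
            "{{\"uid\":{},\"status\":\"{}\"}}\n",
            task.uid, task.status
        ));
    }
    body
}
```

A receiver can then decompress the body and parse it line by line, without loading the whole batch as a single JSON document.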
## PR checklist
### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing
### Before merging
- [x] Release a prototype
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4277: Update mini-dashboard to v0.2.12 r=curquiza a=mdubus
# Pull Request
## Related issue
Fixes #4276
## What does this PR do?
Upgrade mini-dashboard to version 0.2.12 ([see changes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.12))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
4275: Flatten settings r=dureuill a=dureuill
# Pull Request
## Related issue
Initial internal feedback seems to indicate that the current shape of the `embedders` setting is undesirable: it has too much depth.
This PR changes this by flattening the structure of the embedders to the following:
```json5
// NEW structure
"embedders": {
  // still starts with the embedder name
  "default": {
    "source": "huggingFace", // now a string
    // properties of the source are all at the same level as the source
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'" // now a string
  }
}
```
By comparison, the old structure was:
```json5
// PREVIOUS version, no longer working with this PR
"embedders": {
  // still starts with the embedder name
  "default": {
    "source": {
      "huggingFace": {
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd"
      }
    },
    "documentTemplate": {
      "template": "A product titled '{{doc.title}}'" // previously wrapped in an object
    }
  }
}
The fields accepted in the new version of the `embedders` setting depend on the value of the `source` field:
```json5
// huggingFace
"embedders": {
  "default": {
    "source": "huggingFace",
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}
// openAi
"embedders": {
  "default": {
    "source": "openAi",
    "model": "text-embedding-ada-002",
    "apiKey": "open_ai_api_key",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}
// userProvided
"embedders": {
  "default": {
    "source": "userProvided",
    "dimensions": 42, // mandatory
  }
}
```
## What does this PR do?
- Flatten the settings structure
- Validate the prompt earlier to return a synchronous error on setting change rather than in the failing task
- Make it an error to pass a field for the wrong source (see above for allowed fields for each source)
- Not changed: It is still an error not to pass `dimensions` to the `userProvided` embedder
- If `source` was specified in the settings, validate the setting early to return a synchronous error in case of a missing mandatory field for the userProvided source (dimensions) or a forbidden field for the specified source.
- If `source` was not specified in the settings, still validate the setting, but only at indexing time, by using the source stored in the DB.
- Resets all values if the source changes, even if the user did not reset them explicitly.
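The early validation described above could look roughly like the following sketch. The `EmbedderSettings` struct and the `validate` function are hypothetical, and the allowed fields per source are read off the examples above rather than taken from the actual implementation.

```rust
/// Hypothetical flattened settings for one embedder.
#[derive(Debug, Default)]
struct EmbedderSettings {
    source: Option<String>,
    model: Option<String>,
    revision: Option<String>,
    api_key: Option<String>,
    dimensions: Option<usize>,
    document_template: Option<String>,
}

/// Synchronous validation sketch: rejects fields that do not belong to the
/// chosen source, and a `userProvided` source without `dimensions`.
fn validate(settings: &EmbedderSettings) -> Result<(), String> {
    let source = match settings.source.as_deref() {
        Some(source) => source,
        // No source in the request: validation is deferred to indexing time.
        None => return Ok(()),
    };
    // Assumption: allowed fields are those shown in the examples above.
    let allowed: &[&str] = match source {
        "huggingFace" => &["model", "revision", "documentTemplate"],
        "openAi" => &["model", "apiKey", "documentTemplate"],
        "userProvided" => &["dimensions"],
        other => return Err(format!("unknown source `{}`", other)),
    };
    let provided = [
        ("model", settings.model.is_some()),
        ("revision", settings.revision.is_some()),
        ("apiKey", settings.api_key.is_some()),
        ("dimensions", settings.dimensions.is_some()),
        ("documentTemplate", settings.document_template.is_some()),
    ];
    for (name, is_set) in provided.iter() {
        if *is_set && !allowed.contains(name) {
            return Err(format!("field `{}` is not allowed for source `{}`", name, source));
        }
    }
    if source == "userProvided" && settings.dimensions.is_none() {
        return Err("`dimensions` is mandatory for source `userProvided`".to_string());
    }
    Ok(())
}
```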
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Change the public facing guide for using the API
- [ ] Change examples of use in the changelog
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes #4271
## What does this PR do?
- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, such that if the user doesn't specify a revision, the head of the model repository is chosen.
- Not changed: If the user specifies a revision, it applies, like previously.
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, like previously.
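The three cases above can be summarized in a small sketch. The function `resolve_model` and the two placeholder constants are hypothetical; the real default model and revision are not reproduced here. The case where only a revision is given is an interpretation of "if the user specifies a revision, it applies".

```rust
// Placeholder values, not the real defaults.
const DEFAULT_MODEL: &str = "default-model";
const DEFAULT_REVISION: &str = "default-revision";

/// Returns the model to load and the revision to pin, if any.
/// A `None` revision means "use the head of the model repository".
fn resolve_model(model: Option<&str>, revision: Option<&str>) -> (String, Option<String>) {
    match (model, revision) {
        // Explicit model: keep the user's revision if given, otherwise no
        // revision at all, instead of the default one.
        (Some(model), revision) => (model.to_string(), revision.map(|r| r.to_string())),
        // No model but an explicit revision: the revision still applies.
        (None, Some(revision)) => (DEFAULT_MODEL.to_string(), Some(revision.to_string())),
        // Nothing specified: default model with default revision.
        (None, None) => (DEFAULT_MODEL.to_string(), Some(DEFAULT_REVISION.to_string())),
    }
}
```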
## Manual testing on a fresh DB
1. Enable experimental feature:
```sh
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
--data-binary '{ "vectorStore": true
}'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
-X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' --data-binary \
'{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled '{{doc.title}}'"} } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'
{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'http://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill
Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.
## Discussion
- This removes a feature that is using a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck for embedding texts using a HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that it has a runtime in the ballpark of what it used to be.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>