30 Commits

Author SHA1 Message Date
meili-bors[bot]
796acd1aee
Merge #5288
Some checks failed
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Tests on ubuntu-20.04 (push) Failing after 13s
Test suite / Run tests in debug (push) Failing after 13s
Test suite / Run Clippy (push) Failing after 19s
Test suite / Tests on windows-2022 (push) Failing after 48s
Test suite / Run Rustfmt (push) Successful in 1m28s
Test suite / Tests on macos-13 (push) Has been cancelled
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
meili-bors[bot]
ede74ccc42
Merge #5306
Some checks failed
Test suite / Tests on ubuntu-20.04 (push) Failing after 2s
Test suite / Tests almost all features (push) Has been skipped
Test suite / Test disabled tokenization (push) Has been skipped
Test suite / Run tests in debug (push) Failing after 2s
Test suite / Tests on windows-2022 (push) Failing after 24s
Test suite / Run Rustfmt (push) Successful in 1m33s
Test suite / Run Clippy (push) Successful in 6m20s
Test suite / Tests on macos-13 (push) Has been cancelled
5306: Fix internal error when passing `documentTemplateMaxBytes` to a source that doesn't support it r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #5305 

## What does this PR do?
- add `DOCUMENT_TEMPLATE_MAX_BYTES` to `allowed_sources_for_field` and `allowed_fields_for_source` to prevent a panic


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-04 08:46:13 +00:00
Kerollmops
7a9382b115
Better document the rayon limitation condition 2025-02-03 10:24:53 +01:00
Kerollmops
62dabeba5f
Do not create too many rayon tasks when processing the settings 2025-02-03 10:24:52 +01:00
Kerollmops
48812229a9
Remove a log that would log too much 2025-02-03 10:24:52 +01:00
Louis Dureuil
96544bfa43
add DOCUMENT_TEMPLATE_MAX_BYTES to allowed_sources_for_field and allowed_fields_for_source 2025-02-03 09:59:17 +01:00
Kerollmops
aaefbfae1f
Do not create too many rayon tasks 2025-01-30 16:36:12 +01:00
Kerollmops
97e17f52a1
Add more logs to see calls to the embedders 2025-01-30 16:36:12 +01:00
Kerollmops
0f8eb3b506
Improve the logs of the search with AI 2025-01-27 14:22:22 +01:00
Louis Dureuil
4709c638ed
Swap implementations of ollama 2025-01-20 22:22:22 +01:00
Louis Dureuil
8b1fcfd7f8
Parse ollama URL to adapt configuration depending on the endpoint 2025-01-13 14:34:11 +01:00
Clément Renault
0ee4671a91
Fix after upgrading candle 2025-01-08 15:59:56 +01:00
Clément Renault
3e3695445f
Fix after upgrading thiserror 2025-01-08 15:58:32 +01:00
Tamo
0bf4157a75
try my best to make the sub-settings routes works, it doesn't 2025-01-07 16:26:06 +01:00
Gnosnay
44eb153619 Replace hardcoded string with constants 2024-12-28 20:35:55 +08:00
Clément Renault
cc4bd54669
Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
Louis Dureuil
e9d17136b2
Add deadline of 3 seconds to embedding requests made in the context of hybrid search 2024-11-18 12:15:11 +01:00
Louis Dureuil
6570da3bcb
Retry in case where the JSON deserialization fails 2024-11-18 11:33:09 +01:00
Louis Dureuil
3b0cb5b487
Fix vector error messages 2024-11-12 23:26:16 +01:00
Louis Dureuil
bfdcd1cf33
Space changes 2024-11-12 22:52:45 +01:00
Louis Dureuil
c4e9f761e9
Emit better error messages when parsing vectors 2024-11-12 22:49:22 +01:00
Louis Dureuil
8a6e61c77f
InvalidVectorsEmbedderConf error takes a String rather than a deserr error 2024-11-12 22:47:57 +01:00
Louis Dureuil
980921e078
Vector fixes 2024-11-12 16:31:22 +01:00
Louis Dureuil
bef8fc6cf1
Fix hf embedder 2024-11-08 13:10:17 +01:00
Louis Dureuil
4706a0eb49
Fix vector parsing 2024-11-07 23:26:20 +01:00
Louis Dureuil
10f49f0d75
Post processing of the merge 2024-11-06 17:50:12 +01:00
Louis Dureuil
ee03743355
Merge branch 'indexer-edition-2024' into indexer-edition-2024-doc-chunks 2024-11-06 15:50:53 +01:00
ManyTheFish
10feeb88f2 Merge branch 'main' into indexer-edition-2024 2024-11-06 15:19:18 +01:00
Tamo
cf6ad1ae5e Merge branch 'main' into tmp-release-v1.11.0 2024-11-04 16:14:44 +01:00
Clément Renault
9c1e54a2c8
Move crates under a sub folder to clean up the code 2024-10-21 08:18:43 +02:00