meili-bors[bot]
049bd45849
Merge #4371
4371: Fixes embedder issues r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes #4361
Fixes #4370
## What does this PR do?
- Truncate tokens to 512 for Hugging Face embedders
- Move the tokio runtime to OpenAI so that we no longer have a thread with rayon -> tokio -> rayon
- Spawn a new reqwest client after each new runtime to avoid spurious runtime errors
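
The first item above can be illustrated with a minimal sketch. This is not the code from this PR: the function name `truncate_tokens` and the constant `MAX_TOKENS` are hypothetical, and the sketch assumes the tokens are already available as a plain vector of ids.

```rust
/// Hypothetical limit for illustration; the PR text only states "512".
const MAX_TOKENS: usize = 512;

/// Truncates `tokens` in place to at most `max_len` entries.
/// Returns `true` if anything was cut, so a caller could log it.
fn truncate_tokens(tokens: &mut Vec<u32>, max_len: usize) -> bool {
    let was_truncated = tokens.len() > max_len;
    // `Vec::truncate` is a no-op when the vector is already short enough.
    tokens.truncate(max_len);
    was_truncated
}
```

Under these assumptions, a 600-token input is cut down to 512 tokens and a shorter input is left untouched.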
## Manual tests
- embedding failing document from `@CaroFG` with hugging face
- embedding movies with hugging face
- embedding and searching movies with openai
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-01-29 11:23:57 +00:00
Louis Dureuil
425bc92ce6
Don't use a runtime in extract_embedder, use it only for OpenAI
2024-01-29 11:23:18 +01:00
Louis Dureuil
cbd065ed46
Truncate HuggingFace vectors that are too long
2024-01-29 11:22:24 +01:00
Tamo
b9f365a965
make clippy happy
2024-01-25 18:57:22 +01:00
Tamo
3f21daf2e7
add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there are malformed documents in the DB
2024-01-25 18:57:21 +01:00
Louis Dureuil
014eaea428
Use MatchingWords from keyword search instead of the one from vector search
2024-01-23 14:47:28 +01:00
meili-bors[bot]
e93d36d5b9
Merge #4313
4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
Reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
combined with `map_leaf_values`, which iterates over all the nested fields, this gives a quadratic complexity:
`d*f`, where `d` is the total number of fields to display and `f` is the total number of fields to format.
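
A rough sketch of why the size of the format list matters. This is not milli's implementation: `count_comparisons` is a hypothetical helper, and the sketch assumes each displayed field is looked up with a linear scan over the format list.

```rust
/// Counts the string comparisons needed to look up every displayed field
/// in `format_list` with a linear scan (hypothetical model of the cost).
fn count_comparisons(displayed: &[&str], format_list: &[&str]) -> usize {
    let mut comparisons = 0;
    for field in displayed {
        for candidate in format_list {
            comparisons += 1;
            if candidate == field {
                break; // stop scanning once the field is found
            }
        }
    }
    comparisons
}
```

In this model, the cost is bounded by `d*f`. When the format list contains every displayed attribute, `f` equals `d` and the bound becomes `d*d`. Restricting the list to the attributes that actually need formatting keeps `f` small.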
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-01-11 14:19:44 +00:00
ManyTheFish
5f5a486895
Reduce formatting time
2024-01-11 11:36:41 +01:00
ManyTheFish
5f4fc6c955
Add timer logs
2024-01-11 09:44:16 +01:00
Clément Renault
3f3462ab62
Limit the number of values returned by the facet search
2024-01-10 16:54:08 +01:00
Tamo
54ae6951eb
fix warning
2024-01-02 15:19:30 +01:00
Louis Dureuil
0bf879fb88
Fix warning on rust stable
2023-12-20 17:48:09 +01:00
Louis Dureuil
6ff81de401
Fix tests
2023-12-20 17:16:46 +01:00
Louis Dureuil
9123370e90
Validate fused settings in settings task after fusing with existing setting
2023-12-20 17:16:46 +01:00
Louis Dureuil
14b396d302
Add new errors
2023-12-20 17:16:45 +01:00
Louis Dureuil
393216bf30
Flatten embedders settings
2023-12-20 17:16:43 +01:00
Louis Dureuil
e249e4db7b
Change Setting::apply function signature
2023-12-20 17:15:24 +01:00
Louis Dureuil
333ce12eb2
Fixed issue where the default revision is always the one we picked for the default model
2023-12-20 10:17:49 +01:00
Louis Dureuil
942d49314c
Remove dependency that requires libstdc++
2023-12-18 22:17:18 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36
Change the naming of attributeScale and wordScale into byAttribute and byWord
2023-12-14 16:31:00 +01:00
Louis Dureuil
eb5cb91da2
Switch default from hf to openai
2023-12-14 16:19:46 +01:00
Louis Dureuil
87bba98bd8
Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da
hybrid search uses semantic ratio, error handling
2023-12-14 16:08:42 +01:00
ManyTheFish
9991152bbe
Add TODOs
2023-12-14 16:08:42 +01:00
Louis Dureuil
a4536b1381
Small adjustments to respect the spec
2023-12-14 16:08:42 +01:00
Louis Dureuil
5b51cb04af
Remove some settings
2023-12-14 16:08:42 +01:00
Louis Dureuil
b8e4709dfa
Remove prompt strategy and fallback
2023-12-14 16:08:41 +01:00
Louis Dureuil
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP)
2023-12-14 16:08:38 +01:00
Louis Dureuil
e56f160032
Actually pass embedders on reindex
2023-12-14 16:07:49 +01:00
Louis Dureuil
687d92f217
prompt bifluor+
2023-12-14 16:07:49 +01:00
Louis Dureuil
fb539f61fe
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
cb4ebe163e
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration
2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
Louis Dureuil
21bcf32109
Add candle and hg_hub, updating a lot of deps in the process
2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
2023-12-13 11:57:01 +01:00
ManyTheFish
467b49153d
Implement proximityPrecision setting on milli side
2023-12-06 15:49:02 +01:00
ManyTheFish
bddc168d83
List TODOs
2023-12-06 14:59:23 +01:00
ManyTheFish
3b3fa38f27
Put the restrict list in a sub-struct
2023-11-28 18:37:57 +01:00
Clément Renault
170e063b80
Remove the actix-web dependency from milli
2023-11-28 17:19:57 +01:00
ManyTheFish
d6c2ee15a9
Filter on attributes before computing the docids when attribute restriction is on
2023-11-28 14:55:29 +01:00
Clément Renault
ec9b52d608
Rename copy_to_path to copy_to_file
2023-11-28 14:32:30 +01:00
Clément Renault
34c67ac389
Remove the possibility to fail fetching the env info
2023-11-28 14:31:23 +01:00