250 Commits

Author SHA1 Message Date
Louis Dureuil
c3cdc407ec
Avoid unnecessary clone() 2024-08-08 14:57:02 +02:00
Louis Dureuil
2f10273d14
Group by normalized values; make sure not to remove a value while at least one remaining value still normalizes to it 2024-08-08 14:02:53 +02:00
Louis Dureuil
d4ea7cc2a9
fix clippy 👉👈 2024-07-25 12:10:32 +02:00
Louis Dureuil
2413592bbf
Display docid when there are documents without manual embeddings for a manual embedder 2024-07-25 12:10:32 +02:00
ManyTheFish
04fa44e7eb
Implement localized attributes settings 2024-07-25 10:51:27 +02:00
ManyTheFish
cc02920f2b
Update charabia 2024-07-25 10:51:27 +02:00
Louis Dureuil
24240934f9
Improve errors when indexing documents with a user provided embedder 2024-07-16 13:39:01 +02:00
hanbings
0a40a98bb6
Make milli use edition 2021 (#4770)
* Make milli use edition 2021

* Add lifetime annotations to milli.

* Run cargo fmt
2024-07-09 17:25:39 +02:00
meili-bors[bot]
ddd564665b
Merge #4713
4713: Speed up facet distribution r=ManyTheFish a=Kerollmops

This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation.

A prospect reported some slow requests when performing facet searches, and I found that the disk-based intersection optimization wasn't applied to the facets.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-24 05:23:46 +00:00
Clément Renault
9736e16a88
Make clippy happy 2024-06-20 13:02:44 +02:00
Louis Dureuil
a04041c8f2
Only spawn the pool once 2024-06-19 16:25:33 +02:00
Louis Dureuil
e35ef31738
Small changes following review 2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
Tamo
2cdcb703d9
fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9
implement more review comments 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12
always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6
forward the embedding config to the extractors 2024-06-06 11:39:28 +02:00
Tamo
84e498299b
Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00
ManyTheFish
b833be46b9
Avoid running proximity when only the exact attributes change 2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6
Skip iterating over documents when the faceted field list doesn't change 2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1
iterate over the faceted fields instead of over the whole document 2024-06-05 17:30:07 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
Clément Renault
dc949ab46a
Remove puffin usage 2024-05-27 15:59:14 +02:00
Louis Dureuil
8a941c0241
Smaller review changes 2024-05-22 14:44:42 +02:00
ManyTheFish
f762307838
Fix clippy 2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722
Fixes 2024-05-21 13:39:46 +02:00
ManyTheFish
fc7e817221
Index geo points based on the settings differences 2024-05-20 12:27:26 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Tamo
c22460045c
Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
meili-bors[bot]
ebca29f3de
Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
a1aa999026
Add conditions reducing work 2024-04-22 14:18:35 +02:00
ManyTheFish
df29ba709a
Make some cleaning in Arcs 2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7
Fix PR comments 2024-04-17 10:55:51 +02:00
ManyTheFish
87a93ba47d
fix clippy 2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34
Fix word pair proximity error when nothing has to be extracted 2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae
Come back to sorters in extract_word_docids
Using buffers and merging the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4
fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265
finish work 2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6
refactor faceted and searchable pipeline 2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e
chore: fix some typos in comments
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
f87747f4d3
Remove unwraps 2024-03-25 11:23:04 +01:00
Louis Dureuil
ac52c857e8
Update ollama and openai impls to use the rest embedder internally 2024-03-25 11:23:03 +01:00
Louis Dureuil
b11df7ec34
Meilisearch: fix some wrong spans 2024-03-05 10:11:43 +01:00
ManyTheFish
3beda8833d
Fix and add logs 2024-02-14 11:46:30 +01:00
Many the fish
e5e811e2c9
Update milli/src/update/index_documents/extract/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:21 +01:00
ManyTheFish
be1b054b05
Compute chunk size based on the input data size and the number of indexing threads 2024-02-08 17:28:37 +01:00