Commit Graph

642 Commits

Author SHA1 Message Date
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection 2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers 2024-06-12 14:50:55 +02:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
Tamo
2cdcb703d9 fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9 implements mor review comments 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6 forward the embedding config to the extractors 2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea wraps the index embedding config in a struct 2024-06-06 11:37:30 +02:00
Tamo
84e498299b Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00
ManyTheFish
b833be46b9 Avoid running proximity when only the exact attributes changes 2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6 Skip iterating over documents when the faceted field list doesn't change 2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1 iterate over the faceted fields instead of over the whole document 2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d Add a span for the prepare_for_documents_reindexing 2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b Reduce the number of complex calls to settings diff functions 2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94 Introduce a new way to determine the operations to perform on the fields 2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1 Introduce a dedicated function to write proximity entries in database 2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe Give the settings diff to the write_typed_chunk_into_index function 2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00 Introducing a new into_del_add_obkv_conditional_operation function 2024-06-05 17:30:07 +02:00
ManyTheFish
1ab88e10b9 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 16:24:00 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers 2024-05-28 15:27:43 +02:00
Clément Renault
dc949ab46a
Remove puffin usage 2024-05-27 15:59:14 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops

This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff 2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631 2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes 2024-05-22 14:44:42 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB 2024-05-22 12:24:51 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional 2024-05-21 16:16:36 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional 2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838 Fix clippy 2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722 Fixes 2024-05-21 13:39:46 +02:00
ManyTheFish
fc7e817221 Index geo points based on the settings differences 2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests 2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field 2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB 2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
Tamo
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
Tamo
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
meili-bors[bot]
ebca29f3de
Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics 2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026 Add conditions reducing wrok 2024-04-22 14:18:35 +02:00
ManyTheFish
df29ba709a Make some cleaning in Arcs 2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7 Fix PR comments 2024-04-17 10:55:51 +02:00
ManyTheFish
87a93ba47d fix clippy 2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34 Fix wod pair proximity error when nothing has to be extracted 2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae Comeback to sorters in extract_word_docids
using buffers and merge the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4 fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265 finish work 2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6 refactor faceted and searchable pipeline 2024-04-16 14:39:06 +02:00
ManyTheFish
893200ab87 Avoid clearing documents in transform 2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e chore: fix some typos in comments
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
466d718a05
Fix test 2024-04-04 15:58:19 +02:00
Louis Dureuil
afd1da5642
Add distribution to all embedders 2024-03-27 11:50:22 +01:00
Louis Dureuil
a1db342f01
Expose REST embedder to the API 2024-03-25 11:23:15 +01:00
Louis Dureuil
f87747f4d3
Remove unwraps 2024-03-25 11:23:04 +01:00
Louis Dureuil
ac52c857e8
Update ollama and openai impls to use the rest embedder internally 2024-03-25 11:23:03 +01:00
Louis Dureuil
b11df7ec34
Meilisearch: fix some wrong spans 2024-03-05 10:11:43 +01:00
ManyTheFish
03bb6372af Change is_batchable_with by mergeable_with 2024-02-14 11:50:22 +01:00
ManyTheFish
3beda8833d Fix and add logs 2024-02-14 11:46:30 +01:00
ManyTheFish
48026aa75c fix PR comments 2024-02-13 15:19:01 +01:00
Many the fish
e5e811e2c9
Update milli/src/update/index_documents/extract/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:21 +01:00
Louis Dureuil
7efb1cae11 yield in loop when the channel is not disconnected 2024-02-12 09:12:54 +01:00
ManyTheFish
be1b054b05
Compute chunk size based on the input data size ant the number of indexing threads 2024-02-08 17:28:37 +01:00
meili-bors[bot]
023c2d755f
Merge #4391
4391: Tracing r=dureuill a=irevoire

# Pull Request

- [ ] Hide the parameters of the process batch
- [x] Make actix-web trace every call on every route
- [x] Remove all `env_logger`/`logs` dependencies
- [x] Be able to enable or disable the memory measurement using the `/logs` route parameters

See the following product discussion: https://github.com/orgs/meilisearch/discussions/721

Supersedes https://github.com/meilisearch/meilisearch/pull/4338

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4317

## What does this PR do?

Update the format of the logs from:
```
[2024-02-06T14:54:11Z INFO  actix_server::builder] starting 10 workers
```

to

```
2024-02-06T13:58:14.710803Z  INFO actix_server::builder: 200: starting 10 workers
```

First, run meilisearch with the route enabled via the feature flag:
- `cargo run --experimental-enable-logs-route`
- Or at runtime by sending the following payload:
```
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json'  \
--data-binary '{
    "logsRoute": true
  }'
```

Then gather data from meilisearch by calling for example:
```
curl \
	-X POST http://localhost:7700/logs \
	-H 'Content-Type: application/json' \
	--data-binary '{
	    "mode": "fmt",
            "target": "milli=trace"
    }'
```

Once your operation is over, tell meilisearch to stop the route:
```
curl \
	-X DELETE http://localhost:7700/logs
```

----

In the case you’re profiling code, you will be interested by the next command that converts the output of the route to a format that the firefox profiler can understand.

```bash
cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json
```

Then go to https://profiler.firefox.com and load it.
Note that we can also share the profiles using the https://share.firefox.dev website.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-08 14:16:56 +00:00
Louis Dureuil
407ad753ed
rust fmt 2024-02-08 15:11:42 +01:00
Tamo
bf43a3f60a
fix typo 2024-02-08 15:04:06 +01:00
Tamo
1502382316
use debug instead of debug_span 2024-02-08 15:04:06 +01:00
Tamo
08af0e690c
Structures a bunch of logs 2024-02-08 15:04:06 +01:00
Louis Dureuil
db722d201a
Write entries into database downgraded to trace level 2024-02-08 15:04:05 +01:00
Tamo
e773dfa9ba
get rids of log in milli and add logs for the bucket sort 2024-02-08 15:04:05 +01:00
Louis Dureuil
5d7061682e
Add tracing to milli 2024-02-08 15:03:31 +01:00
Clément Renault
053306c0e7
Try with 500MiB 2024-02-07 11:24:43 +01:00
Clément Renault
9eeb75d501
Clamp the max memory of the grenad sorters to a reasonable maximum 2024-02-06 10:47:04 +01:00
Louis Dureuil
fbf5f2a392
Don't use a runtime in extract_embedder, use it only for OpenAI 2024-02-01 10:33:27 +01:00
Tamo
9f8f3105d5
make clippy happy 2024-02-01 10:33:27 +01:00
Tamo
318843aacd
add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there is malformed documents in the DB 2024-02-01 10:33:27 +01:00
Tamo
c1bf33a112
Revert "Remove panic on the geosearch" 2024-01-25 18:51:19 +01:00
Tamo
0887186ecf
make clippy happy 2024-01-17 16:07:10 +01:00
Tamo
7d190d8078
add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there is malformed documents in the DB 2024-01-17 15:51:52 +01:00
Clément Renault
01e2c3d6bb
Bump arroy to v0.2.0 2024-01-16 16:45:55 +01:00
Clément Renault
9f9ad4cc05
Fix Clippy warnings 2024-01-16 15:27:24 +01:00
Clément Renault
3ee7682fa7
Fix some integer comparisons 2024-01-16 15:22:23 +01:00
Tamo
54ae6951eb
fix warning 2024-01-02 15:19:30 +01:00
Louis Dureuil
6ff81de401
Fix tests 2023-12-20 17:16:46 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36 Change the naming of attributeScale and wordScale into byAttribute and byWord 2023-12-14 16:31:00 +01:00
Louis Dureuil
87bba98bd8
Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
806e5b6899
Tests pass 2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00