Louis Dureuil
0a8f50695e
Fixes for Rust v1.79
2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738
Small changes following review
2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate
2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection
2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction
2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers
2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7
Update embedder configs with a finer granularity
...
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document
2024-06-11 09:45:08 +02:00
Tamo
2cdcb703d9
fix the deletion of vectors and add a test
2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82
rename all occurences of user_defined to user_provided for consistency
2024-06-06 11:39:29 +02:00
Tamo
b7349910d9
implements mor review comments
2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7
makes clippy and fmt happy
2024-06-06 11:39:29 +02:00
Tamo
5d50850e12
always push the user defined vectors in arroy
2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6
forward the embedding config to the extractors
2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea
wraps the index embedding config in a struct
2024-06-06 11:37:30 +02:00
Tamo
84e498299b
Remove the vectors from the documents database
2024-06-06 11:36:11 +02:00
Tamo
7a84697570
never store the _vectors as searchable or faceted fields
2024-06-06 11:36:11 +02:00
ManyTheFish
30293883e0
Fix condition mistake
2024-06-05 17:30:07 +02:00
ManyTheFish
b833be46b9
Avoid running proximity when only the exact attributes changes
2024-06-05 17:30:07 +02:00
ManyTheFish
0a4118329e
Put only_additional_fields to None if the difference gives an empty result.
2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6
Skip iterating over documents when the faceted field list doesn't change
2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1
iterate over the faceted fields instead of over the whole document
2024-06-05 17:30:07 +02:00
Clément Renault
a998b881f6
Cache a lot of operations to know if a field must be indexed
2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d
Add a span for the prepare_for_documents_reindexing
2024-06-05 17:30:07 +02:00
Clément Renault
091bb157f1
Add a span for the settings diff creation
2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b
Reduce the number of complex calls to settings diff functions
2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94
Introduce a new way to determine the operations to perform on the fields
2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1
Introduce a dedicated function to write proximity entries in database
2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe
Give the settings diff to the write_typed_chunk_into_index function
2024-06-05 17:30:07 +02:00
Clément Renault
1ab03c4ede
Fix an issue with settings diff and * in the searchable attributes
2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00
Introducing a new into_del_add_obkv_conditional_operation function
2024-06-05 17:30:07 +02:00
Clément Renault
42b3f52ef9
Introduce the SettingDiff only_additional_fields method
2024-06-05 17:30:07 +02:00
ManyTheFish
1ab88e10b9
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 16:24:00 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648
merge main into v1.8.1
2024-05-29 11:26:07 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers
2024-05-28 15:27:43 +02:00
Clément Renault
dc949ab46a
Remove puffin usage
2024-05-27 15:59:14 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
...
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops
This PR implements what is described in #4485 . It reduces the number of disk writes and disk usage.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff
2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631
2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes
2024-05-22 14:44:42 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB
2024-05-22 12:24:51 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional
2024-05-21 16:16:36 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional
2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838
Fix clippy
2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722
Fixes
2024-05-21 13:39:46 +02:00
ManyTheFish
fc7e817221
Index geo points based on the settings differences
2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests
2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field
2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB
2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
...
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Tamo
897d25780e
update milli to latest version
2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d
when no searchable attributes are defined, makes all the weight equals to zero
2024-05-16 01:06:33 +02:00
Tamo
ad4d8502b3
stops storing the whole fieldids weights map when no searchable are defined
2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb
apply all style review comments
2024-05-15 15:02:26 +02:00
Tamo
caa6a7149a
make the attribute ranking rule use the weights and fix the tests
2024-05-14 17:36:32 +02:00
Tamo
b0afe0972e
stop updating the fields ids map when fields are only swapped
2024-05-14 17:00:02 +02:00
Tamo
685f452fb2
Fix the indexing of the searchable
2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7
gate a test behind the required feature
2024-05-14 17:00:02 +02:00
Tamo
c22460045c
Stops returning an option in the internal searchable fields
2024-05-14 17:00:02 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
...
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
meili-bors[bot]
ebca29f3de
Merge #4597
...
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish
# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders
## Related issue
Fixes #4585
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper
2024-04-24 16:40:12 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics
2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools
2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026
Add conditions reducing wrok
2024-04-22 14:18:35 +02:00
ManyTheFish
df29ba709a
Make some cleaning in Arcs
2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7
Fix PR comments
2024-04-17 10:55:51 +02:00
ManyTheFish
87a93ba47d
fix clippy
2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34
Fix wod pair proximity error when nothing has to be extracted
2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae
Comeback to sorters in extract_word_docids
...
using buffers and merge the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4
fix test
2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265
finish work
2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6
refactor faceted and searchable pipeline
2024-04-16 14:39:06 +02:00
ManyTheFish
a7e368aaa6
Create InnerIndexSettingsDiffs struct and populate it
2024-04-16 14:39:06 +02:00
ManyTheFish
893200ab87
Avoid clearing documents in transform
2024-04-16 14:39:06 +02:00
ManyTheFish
aabce52b1b
Fix test
2024-04-16 14:39:06 +02:00
ManyTheFish
8fff5fc281
update tests
2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e
chore: fix some typos in comments
...
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
466d718a05
Fix test
2024-04-04 15:58:19 +02:00
meili-bors[bot]
56bf8503db
Merge #4537
...
4537: Expose distribution shift in settings r=ManyTheFish a=dureuill
See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802 )
# Changes
- Distribution shift added to all embedders.
- Exposed in settings
- Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-03 09:08:58 +00:00
redistay
182cb42953
chore: fix some typos in conments
...
Signed-off-by: redistay <wujunjing@outlook.com>
2024-04-02 19:37:55 +08:00
meili-bors[bot]
92a049c2dd
Merge #4543
...
4543: Bring back changes from v1.7.4 into main r=Kerollmops a=dureuill
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2024-03-28 16:53:51 +00:00
Louis Dureuil
796213af9a
Merge branch 'main' into tmp-release-v1.7.4
2024-03-28 10:51:49 +01:00
Louis Dureuil
ee8cbea810
Don't optimize reindexing when fields contain dots
2024-03-27 17:04:45 +01:00
Louis Dureuil
572fb3a51d
Finer granularity for embedder needs reindex
2024-03-27 12:01:34 +01:00
Louis Dureuil
afd1da5642
Add distribution to all embedders
2024-03-27 11:50:22 +01:00
Louis Dureuil
817ccc089a
also allow api_key
2024-03-25 11:50:00 +01:00
Louis Dureuil
4136630ea5
Use constants instead of raw strings in set_*set()
2024-03-25 11:39:33 +01:00
Louis Dureuil
58972f35cb
Allow url
parameter for ollama embedder
2024-03-25 11:32:55 +01:00
Louis Dureuil
dfa5e41ea6
Check validity of the URL setting
2024-03-25 11:23:16 +01:00
Louis Dureuil
a1db342f01
Expose REST embedder to the API
2024-03-25 11:23:15 +01:00
Louis Dureuil
f87747f4d3
Remove unwraps
2024-03-25 11:23:04 +01:00
Louis Dureuil
ac52c857e8
Update ollama and openai impls to use the rest embedder internally
2024-03-25 11:23:03 +01:00
meili-bors[bot]
fc1c3f4a29
Merge #4466
...
4466: Implements the search cutoff r=irevoire a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4488
## What does this PR do?
- Adds a cutoff to the bucket sort after 150ms has been spent
- Adds a new setting to customize the default value of 150ms
- When the time is exceeded, we exit early with what we had the time to sort
- If the cutoff has been reached, the search details are updated with a new `Skip` ranking details for the ranking rules that were skipped
- Adds analytics to measure the total number of degraded search requests
- Adds the number of degraded search requests to the Prometheus metrics and Grafana dashboard
- The cutoff **must not** skip the filters; otherwise, we would leak documents to people who don’t have the right to see them
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-20 13:06:53 +00:00
Tamo
c5322df519
Revert "Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1""
2024-03-20 10:08:28 +01:00
Tamo
567194b925
Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1"
...
This reverts commit bd74cce86a
, reversing
changes made to d2f77e88bd
.
2024-03-19 16:56:21 +01:00
Clément Renault
bd74cce86a
Merge remote-tracking branch 'origin/main' into release-v1.7.1
2024-03-19 13:39:17 +01:00
Tamo
d1db495119
add a settings for the search cutoff
2024-03-19 10:28:23 +01:00
meili-bors[bot]
abd954755d
Merge #4476
...
4476: Make the `/facet-search` route use the `sortFacetValuesBy` setting r=irevoire a=Kerollmops
This PR fixes #4423 by ensuring that the `/facet-search` route uses the `sortFacetValuesBy` setting.
Note for the documentation team (to be moved in the tracking issue): Using the new `sortFacetValuesBy` setting can slow down the facet-search requests as Meilisearch iterates over the whole list of facet values and computes the count of documents on every entry. That is hardly or even impossible to optimize correctly.
### TODO
- [x] Create a custom HashMap wrapper for the facet `OrderBy` settings.
This wrapper will return the `OrderBy` setting of the facet, if not defined will use the default `*` one, and if not there either (strange) will fall back on the lexicographic one.
- [x] Create a `ValuesCollection` wrapper that implements the logic for the lexicographic and count order by.
- [x] Use it when there is no search query.
- [x] Use it when there is a search query with and without allowed typos.
- [x] Do not change the original logic, only use a wrapper.
- [x] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-03-13 14:36:14 +00:00