Clément Renault
9736e16a88
Make clippy happy
2024-06-20 13:02:44 +02:00
Clément Renault
6fa4da8ae7
Improve facet distribution speed in count mode
2024-06-20 12:58:51 +02:00
Clément Renault
19d7cdc20d
Improve facet distribution speed in lexico mode
2024-06-20 12:57:08 +02:00
Louis Dureuil
a04041c8f2
Only spawn the pool once
2024-06-19 16:25:33 +02:00
meili-bors[bot]
e580d6b98f
Merge #4693
...
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops
This PR fixes #4611.
### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
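The behaviour described in #4693 (a `distinct` parameter at search time that must name a filterable attribute and that takes precedence over the distinct attribute from the settings) can be illustrated with a minimal sketch. The `Doc` type and the functions `resolve_distinct` and `apply_distinct` are hypothetical names made up for this illustration, not milli's API, and keeping documents that lack the attribute is an assumption of the sketch.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified document: an id plus the value of one attribute.
#[derive(Debug, Clone, PartialEq)]
pub struct Doc {
    pub id: u32,
    pub distinct_value: Option<String>,
}

/// Chooses the attribute to deduplicate on.
/// The search-time attribute wins over the one from the settings, and it
/// must be listed in the filterable attributes (otherwise an error is returned).
pub fn resolve_distinct<'a>(
    search: Option<&'a str>,
    settings: Option<&'a str>,
    filterable: &[&str],
) -> Result<Option<&'a str>, String> {
    if let Some(attr) = search {
        if !filterable.iter().any(|f| *f == attr) {
            return Err(format!("attribute `{attr}` is not filterable"));
        }
    }
    Ok(search.or(settings))
}

/// Keeps the first (best-ranked) document for each distinct value.
/// Assumption: documents without a value are all kept.
pub fn apply_distinct(ranked: &[Doc]) -> Vec<Doc> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for doc in ranked {
        match doc.distinct_value.as_deref() {
            // `insert` returns false when the value was already seen.
            Some(value) => {
                if seen.insert(value) {
                    out.push(doc.clone());
                }
            }
            None => out.push(doc.clone()),
        }
    }
    out
}
```

Calling `resolve_distinct(Some("color"), Some("brand"), &["color"])` yields `color`, which mirrors the "Distinct at search erases the distinct in the settings" commit listed further down.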
Tamo
43875e6758
fix bug around nested fields
2024-06-17 15:59:30 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
...
4649: Don't store the vectors in the documents database r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607
## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
0a8f50695e
Fixes for Rust v1.79
2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738
Small changes following review
2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate
2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection
2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction
2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers
2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7
Update embedder configs with a finer granularity
...
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
d0b05ae691
Add EmbedderAction to settings
2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100
Reformulate ParsedVectorsDiff in terms of VectorState
2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272
Add EmbedderConfigs::into_inner
2024-06-12 14:11:44 +02:00
meili-bors[bot]
e0eff08095
Merge #4685
...
4685: Fix ci tests r=dureuill a=ManyTheFish
# Pull Request
Make all of the following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091
## Related issue
Fixes #4629
## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
39f60abd7d
Add and modify distinct tests
2024-06-11 17:53:53 -04:00
Clément Renault
1991bd03da
Distinct at search erases the distinct in the settings
2024-06-11 17:02:39 -04:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code
2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search
2024-06-11 11:39:35 -04:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document
2024-06-11 09:45:08 +02:00
ManyTheFish
57d066595b
fix Tests almost all features
2024-06-06 17:24:50 +02:00
Clément Renault
75b2e02cd2
Log more stuff around filtering
2024-06-06 11:00:07 -04:00
Clément Renault
52d0d35b39
Revert "Reduce the universe while exploring the facet tree" because it's slower this way
...
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
Clément Renault
5432776132
Reduce the universe while exploring the facet tree
2024-06-06 09:17:51 -04:00
Clément Renault
66470b27e6
Use the MultiOps trait for IN operations
2024-06-06 09:17:51 -04:00
Clément Renault
0a9bd398c7
Improve the NOT operator to use the universe when possible
2024-06-06 09:17:51 -04:00
Clément Renault
7967e93c16
Skip evaluating when a universe is empty, nothing can be found
2024-06-06 09:17:51 -04:00
Clément Renault
a6f3a01c6a
Expose the universe to do efficient intersections on deserialization
2024-06-06 09:17:51 -04:00
Clément Renault
4ca4a3f954
Make the CboRoaringBitmapCodec support intersection on deserialization
2024-06-06 09:17:51 -04:00
Clément Renault
e4a69c5ac3
Introduce the FacetGroupLazyValue type
2024-06-06 09:17:50 -04:00
Clément Renault
531e3d7d6a
MultiOps trait for OR operations
2024-06-06 09:17:50 -04:00
Tamo
2cdcb703d9
fix the deletion of vectors and add a test
2024-06-06 11:39:29 +02:00
Tamo
31a793d226
fix the regeneration of the embeddings in the search
2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82
rename all occurrences of user_defined to user_provided for consistency
2024-06-06 11:39:29 +02:00
Tamo
b7349910d9
implements more review comments
2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7
makes clippy and fmt happy
2024-06-06 11:39:29 +02:00
Tamo
b867829ef1
remove useless dbg
2024-06-06 11:39:29 +02:00
Tamo
5d50850e12
always push the user defined vectors in arroy
2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6
forward the embedding config to the extractors
2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea
wraps the index embedding config in a struct
2024-06-06 11:37:30 +02:00
Tamo
04f6523f3c
expose a new parameter to retrieve the embedders at search time
2024-06-06 11:36:11 +02:00
Tamo
84e498299b
Remove the vectors from the documents database
2024-06-06 11:36:11 +02:00
Tamo
7a84697570
never store the _vectors as searchable or faceted fields
2024-06-06 11:36:11 +02:00
Tamo
4148fbbe85
provide a method to get all the nested fields ids from a name
2024-06-06 11:36:11 +02:00
ManyTheFish
2e50c6ec81
Update Charabia
2024-06-06 10:18:43 +02:00
ManyTheFish
30293883e0
Fix condition mistake
2024-06-05 17:30:07 +02:00
ManyTheFish
b833be46b9
Avoid running proximity when only the exact attributes change
2024-06-05 17:30:07 +02:00
ManyTheFish
0a4118329e
Put only_additional_fields to None if the difference gives an empty result.
2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6
Skip iterating over documents when the faceted field list doesn't change
2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1
iterate over the faceted fields instead of over the whole document
2024-06-05 17:30:07 +02:00
Clément Renault
a998b881f6
Cache a lot of operations to know if a field must be indexed
2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d
Add a span for the prepare_for_documents_reindexing
2024-06-05 17:30:07 +02:00
Clément Renault
091bb157f1
Add a span for the settings diff creation
2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b
Reduce the number of complex calls to settings diff functions
2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94
Introduce a new way to determine the operations to perform on the fields
2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1
Introduce a dedicated function to write proximity entries in database
2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe
Give the settings diff to the write_typed_chunk_into_index function
2024-06-05 17:30:07 +02:00
Clément Renault
1ab03c4ede
Fix an issue with settings diff and * in the searchable attributes
2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00
Introducing a new into_del_add_obkv_conditional_operation function
2024-06-05 17:30:07 +02:00
Clément Renault
42b3f52ef9
Introduce the SettingDiff only_additional_fields method
2024-06-05 17:30:07 +02:00
meili-bors[bot]
93f5defedc
Merge #4656
...
4656: Adding a new `searchableAttribute` no longer re-index all the attributes r=ManyTheFish a=Kerollmops
Fixes #4492.
## To Do
- [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times
- [ ] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-06-05 14:51:14 +00:00
ManyTheFish
33241a6b12
Fix condition mistake
2024-06-05 16:00:24 +02:00
ManyTheFish
ff87b4db26
Avoid running proximity when only the exact attributes change
2024-06-05 12:48:44 +02:00
ManyTheFish
ba9fadc8f1
Put only_additional_fields to None if the difference gives an empty result.
2024-06-05 10:51:16 +02:00
ManyTheFish
d29d4f88da
Skip iterating over documents when the faceted field list doesn't change
2024-06-04 15:31:24 +02:00
ManyTheFish
17c5ceeb9d
iterate over the faceted fields instead of over the whole document
2024-06-04 14:04:20 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
...
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609
## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
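As a rough illustration of the score threshold search parameter from #4666, the sketch below drops hits whose ranking score falls below a threshold. The `Hit` type and `apply_ranking_score_threshold` are hypothetical names for this example only. Keeping a hit that sits exactly at the threshold is an assumption, and the known limitation mentioned in the PR is not modelled.

```rust
/// Hypothetical search hit: a document id and its ranking score in [0.0, 1.0].
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u32,
    pub ranking_score: f64,
}

/// Drops hits scoring below `threshold`; `None` keeps every hit.
/// Assumption: a hit exactly at the threshold is kept.
pub fn apply_ranking_score_threshold(hits: Vec<Hit>, threshold: Option<f64>) -> Vec<Hit> {
    match threshold {
        Some(min) => hits
            .into_iter()
            .filter(|hit| hit.ranking_score >= min)
            .collect(),
        None => hits,
    }
}
```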
Louis Dureuil
2b6db6541e
Changes after review
2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
...
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #3773
## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy
See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).
[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Clément Renault
b9a0ff0dd6
Cache a lot of operations to know if a field must be indexed
2024-05-30 16:18:23 +02:00
Clément Renault
75496af985
Add a span for the prepare_for_documents_reindexing
2024-05-30 12:14:22 +02:00
Clément Renault
0e9eb9eedb
Add a span for the settings diff creation
2024-05-30 12:08:27 +02:00
ManyTheFish
3f1a510069
Add tests and fix matching strategy
2024-05-30 12:02:42 +02:00
Clément Renault
3a78e988da
Reduce the number of complex calls to settings diff functions
2024-05-30 11:23:07 +02:00
Clément Renault
d9e5074189
Introduce a new way to determine the operations to perform on the fields
2024-05-30 11:23:07 +02:00
Clément Renault
bc210bdc00
Introduce a dedicated function to write proximity entries in database
2024-05-30 11:23:06 +02:00
Clément Renault
4bf83f701c
Give the settings diff to the write_typed_chunk_into_index function
2024-05-30 11:23:06 +02:00
Clément Renault
db3887929f
Fix an issue with settings diff and * in the searchable attributes
2024-05-30 11:22:50 +02:00
Clément Renault
9af103a88e
Introducing a new into_del_add_obkv_conditional_operation function
2024-05-30 11:22:49 +02:00
Clément Renault
99211eb375
Introduce the SettingDiff only_additional_fields method
2024-05-30 11:22:49 +02:00
Louis Dureuil
4f03b0cf5b
Add ranking score threshold to similar
2024-05-30 11:20:50 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API
2024-05-30 10:32:35 +02:00
ManyTheFish
1ab88e10b9
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 16:24:00 +02:00
Louis Dureuil
aac1d769a7
Add ranking_score_threshold to milli
2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca
Implement Frequency matching strategy
2024-05-29 13:59:08 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648
merge main into v1.8.1
2024-05-29 11:26:07 +02:00
Louis Dureuil
ca6cc4654b
Add similar route
2024-05-28 15:28:19 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers
2024-05-28 15:27:43 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext
2024-05-28 15:22:12 +02:00
Louis Dureuil
fd2c95999d
Change `validate_document_id` to public and remove extra layer of result
2024-05-28 15:21:19 +02:00
Clément Renault
dc949ab46a
Remove puffin usage
2024-05-27 15:59:14 +02:00
Clément Renault
7f3e51349e
Remove puffin for the dependencies
2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
...
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops
This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff
2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631
2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes
2024-05-22 14:44:42 +02:00
Louis Dureuil
3412e7fbcf
"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0
2024-05-22 12:25:21 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB
2024-05-22 12:24:51 +02:00
Louis Dureuil
8f7c8ca7f0
Remove now unused error variant
2024-05-22 12:23:43 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional
2024-05-21 16:16:36 +02:00
Clément Renault
943f8dba0c
Make clippy happy
2024-05-21 14:58:41 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional
2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838
Fix clippy
2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722
Fixes
2024-05-21 13:39:46 +02:00
Louis Dureuil
b17cb56dee
Test array of vectors
2024-05-20 14:44:10 +02:00
ManyTheFish
fc7e817221
Index geo points based on the settings differences
2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests
2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field
2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB
2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
...
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7
Add function to get the embeddings of a document in an index
2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e
Add parsed vectors module
2024-05-20 10:25:59 +02:00
Tamo
273c6e8c5c
uses the latest version of heed to get rid of unsafe code
2024-05-16 18:31:32 +02:00
Tamo
897d25780e
update milli to latest version
2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d
when no searchable attributes are defined, make all the weights equal to zero
2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5
rename method and variable around the attributes to search on feature
2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1
get back to what we were doing before in the DB cache and with the restricted field id
2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3
stops storing the whole fieldids weights map when no searchable are defined
2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb
apply all style review comments
2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d
make clippy happy
2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a
make the attribute ranking rule use the weights and fix the tests
2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9
add a failing test on the attribute ranking rule
2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e
stop updating the fields ids map when fields are only swapped
2024-05-14 17:00:02 +02:00
Tamo
9ecde41853
add a test on the current behaviour
2024-05-14 17:00:02 +02:00
Tamo
685f452fb2
Fix the indexing of the searchable
2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7
gate a test behind the required feature
2024-05-14 17:00:02 +02:00
Tamo
c22460045c
Stops returning an option in the internal searchable fields
2024-05-14 17:00:02 +02:00
Clément Renault
ac4bc143c4
Bump ureq to v2.9.7
2024-05-07 10:39:38 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
...
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Louis Dureuil
f4dd73ec8c
Destructure EmbedderOptions so we don't miss some options
2024-05-02 15:39:36 +02:00
ManyTheFish
88174b8ae4
Update charabia v0.8.10
2024-04-30 14:30:23 +02:00
meili-bors[bot]
ebca29f3de
Merge #4597
...
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish
# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders
## Related issue
Fixes #4585
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
meili-bors[bot]
c793b6ef6d
Merge #4600
...
4600: Fix embedders api r=ManyTheFish a=ManyTheFish
# Pull Request
## Related issue
Fixes #4594
Fixes #4595
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper
2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875
Display set API key for Ollama embedder
2024-04-24 12:33:07 +02:00
Clément Renault
b3173d0423
Remove useless dots in the error messages
2024-04-22 18:09:33 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics
2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools
2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026
Add conditions reducing work
2024-04-22 14:18:35 +02:00
ManyTheFish
c71b5d09ff
Update charabia v0.8.9
2024-04-18 11:38:26 +02:00
writegr
ab43a8a949
chore: fix some typos in comments
...
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
meili-bors[bot]
4a8459b799
Merge #4576
...
4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire
# Pull Request
## Related issue
Fixes #4575
## What does this PR do?
- increase the default search time budget from 150ms to 1.5s
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-17 16:04:47 +00:00
Clément Renault
c923adf222
Fix facet distribution for alpha on facet numbers
2024-04-17 16:31:16 +02:00
ManyTheFish
df29ba709a
Make some cleaning in Arcs
2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7
Fix PR comments
2024-04-17 10:55:51 +02:00
Tamo
19137be0ea
increase the default search time budget from 150ms to 1.5s
2024-04-16 18:09:49 +02:00
ManyTheFish
87a93ba47d
fix clippy
2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34
Fix word pair proximity error when nothing has to be extracted
2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae
Come back to sorters in extract_word_docids
...
using buffers and merging the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4
fix test
2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265
finish work
2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6
refactor faceted and searchable pipeline
2024-04-16 14:39:06 +02:00
ManyTheFish
a7e368aaa6
Create InnerIndexSettingsDiffs struct and populate it
2024-04-16 14:39:06 +02:00
ManyTheFish
893200ab87
Avoid clearing documents in transform
2024-04-16 14:39:06 +02:00
ManyTheFish
aabce52b1b
Fix test
2024-04-16 14:39:06 +02:00
ManyTheFish
8fff5fc281
update tests
2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e
chore: fix some typos in comments
...
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
89e72fab32
Update grenad to fix rare DB corruption
2024-04-11 21:06:59 +02:00
meili-bors[bot]
b1844b0c27
Merge #4548
...
4548: v1.8 hybrid search changes r=dureuill a=dureuill
Implements the search changes from the [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#40f24df3da694428a39cc8043c9cfc64)
### ⚠️ Breaking changes in an experimental feature:
- Removed the `_semanticScore`. Use the `_rankingScore` instead.
- Removed `vector` in the response of the search (output was too big).
- Removed all the vectors from the `vectorSort` ranking score details
  - target vector appearing in the name of the rule
  - matched vector appearing in the details of the rule
### Other user-facing changes
- Added `semanticHitCount`, indicating how many hits were returned from the semantic search. This is especially useful in the hybrid search.
- Embed lazily: Meilisearch no longer generates an embedding when the keyword results are "good enough".
- Graceful embedding failure in hybrid search: when doing hybrid search (`semanticRatio in ]0.0, 1.0[`), an embedding failure no longer causes the search request to fail. Instead, only the keyword search is performed. When doing a full vector search (`semanticRatio==1.0`), a failure to embed will still result in failing that search.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 16:00:20 +00:00
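The lazy embedding and graceful failure rules listed in #4548 can be summarised in a small decision function. This is only a sketch under stated assumptions: `SearchPlan` and `plan_search` are hypothetical names, the "good enough" check is reduced to a boolean because the log does not say how it is computed, and a `semanticRatio` of `0.0` is assumed to mean a pure keyword search.

```rust
/// What ends up being executed, in this simplified model.
#[derive(Debug, PartialEq)]
pub enum SearchPlan {
    KeywordOnly,
    Hybrid,
    VectorOnly,
}

/// Decides what to run from the semantic ratio and the outcome of embedding.
/// `embed` is only called when an embedding is actually needed.
pub fn plan_search(
    semantic_ratio: f32,
    keyword_good_enough: bool,
    embed: impl FnOnce() -> Result<Vec<f32>, String>,
) -> Result<SearchPlan, String> {
    if semantic_ratio <= 0.0 {
        // Assumption: a ratio of 0.0 never needs an embedding.
        return Ok(SearchPlan::KeywordOnly);
    }
    if semantic_ratio >= 1.0 {
        // Full vector search: a failure to embed fails the whole request.
        let _embedding = embed()?;
        return Ok(SearchPlan::VectorOnly);
    }
    if keyword_good_enough {
        // Lazy embedding: skip the embedder entirely.
        return Ok(SearchPlan::KeywordOnly);
    }
    // Hybrid search: fall back to keyword search when embedding fails.
    match embed() {
        Ok(_embedding) => Ok(SearchPlan::Hybrid),
        Err(_) => Ok(SearchPlan::KeywordOnly),
    }
}
```

With a ratio of `0.5` and a failing embedder the function returns `KeywordOnly`, while the same failure with a ratio of `1.0` is propagated as an error.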
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount
2024-04-04 16:04:06 +02:00
Louis Dureuil
3c6e9851a4
Correct error formatting
2024-04-04 15:58:19 +02:00
Louis Dureuil
466d718a05
Fix test
2024-04-04 15:58:19 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure
2024-04-04 15:58:17 +02:00
Louis Dureuil
fabc9cf14a
milli: add Embedder::embed_one
2024-04-04 15:57:29 +02:00
Louis Dureuil
00c4ed3bc2
milli: refactor getting embedder and embedder name
2024-04-04 15:57:29 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details
2024-04-04 15:57:29 +02:00
meili-bors[bot]
339a5e3431
Merge #4549
...
4549: Hugging Face embedder improvements r=dureuill a=dureuill
Architectural changes/Internal improvements
### 1. Prefer safetensors weights over pytorch weights when available
safetensors weights are memory mapped, which reduces memory usage of supported models.
### 2. Update candle
Updates candle to `0.4.1`, now targeting crates.io, and the tokenizers to `v0.15.2` (still on GitHub).
This might fix https://github.com/meilisearch/meilisearch/issues/4399 thanks to the now included https://github.com/huggingface/candle/issues/1454
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 13:47:18 +00:00
meili-bors[bot]
5509bafff8
Merge #4535
...
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops
This PR fixes #4422 by supporting `-` before any word in the query.
The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D).
It's important to notice the strange behavior when a query includes and excludes the same word; only the derivatives (synonyms and splits) will be kept:
- If you input `progamer -progamer`, the engine will still search for `pro gamer`.
- If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.
## TODO
- [x] Add analytics
- [x] Add support to the `-` operator
- [x] Make sure to support spaces around `-` well
- [x] Support phrase negation
- [x] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
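To make the `-` operator from #4535 concrete, here is a deliberately naive sketch that separates a query into words to search for and words to exclude. It is not milli's query parser: `split_negative` and `is_excluded` are hypothetical helpers, only the ASCII `-` is recognised, a lone `-` is simply dropped, and phrases, synonyms and word splits are ignored.

```rust
/// Splits a query into (words to search for, words to exclude).
/// Assumptions: whitespace tokenization, ASCII `-` only, no phrase support.
pub fn split_negative(query: &str) -> (Vec<String>, Vec<String>) {
    let mut positive = Vec::new();
    let mut negative = Vec::new();
    for token in query.split_whitespace() {
        match token.strip_prefix('-') {
            Some(word) if !word.is_empty() => negative.push(word.to_lowercase()),
            // A lone "-" carries no word; this sketch drops it.
            Some(_) => {}
            None => positive.push(token.to_lowercase()),
        }
    }
    (positive, negative)
}

/// Returns true when the document contains at least one excluded word.
pub fn is_excluded(document: &str, negative: &[String]) -> bool {
    document
        .split_whitespace()
        .any(|word| negative.iter().any(|n| *n == word.to_lowercase()))
}
```

For the query `progamer -keyboard mouse`, the sketch searches for `progamer` and `mouse` and rejects any document containing `keyboard`.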
Louis Dureuil
58cafcc824
Update candle
2024-04-03 13:11:56 +02:00
meili-bors[bot]
56bf8503db
Merge #4537
...
4537: Expose distribution shift in settings r=ManyTheFish a=dureuill
See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802)
# Changes
- Distribution shift added to all embedders.
- Exposed in settings
- Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key changes
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-03 09:08:58 +00:00
Louis Dureuil
a1eccc762a
Prefer safetensors to pytorch when both are available
2024-04-03 11:05:59 +02:00
meili-bors[bot]
75f81a0bab
Merge #4547
...
4547: Fix milli/Cargo.toml for usage as dependency via git r=dureuill a=Toromyx
# Pull Request
## Related issues/discussions
This enables the usage of `milli` [via git repository](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) as mentioned in <https://github.com/meilisearch/meilisearch/issues/3367#issuecomment-1422613815>, <https://github.com/meilisearch/meilisearch/discussions/1523#discussioncomment-1039338>, and <https://github.com/meilisearch/meilisearch/discussions/1981#discussioncomment-1771568>
## What does this PR do?
Trying to depend on `milli` like
```
[dependencies.milli]
git = "https://github.com/meilisearch/meilisearch.git"
tag = "v1.7.4"
```
leads to the following error:
```
error: failed to select a version for the requirement `candle-core = "^0.3.1"`
candidate versions found which didn't match: 0.4.2
location searched: Git repository https://github.com/huggingface/candle.git
required by package `milli v1.7.4 (https://github.com/meilisearch/meilisearch.git?tag=v1.7.4#0259ad60)`
```
because the default branch of <https://github.com/huggingface/candle> does not contain the correct version.
To fix this, I added a `rev="..."` entry in the relevant dependencies, specifying the commit already present in the `Cargo.lock` file.
I also updated the version to the one in the Cargo.lock. This also updated `candle-kernels` sub-dependency from 0.3.1 to 0.3.3 which is probably correct?
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Thomas Gauges <thomas.gauges@gmail.com>
2024-04-03 07:31:36 +00:00
Thomas Gauges
d55d496250
Fix milli/Cargo.toml for usage as dependency via git
2024-04-02 15:19:30 +02:00
redistay
182cb42953
chore: fix some typos in comments
...
Signed-off-by: redistay <wujunjing@outlook.com>
2024-04-02 19:37:55 +08:00
meili-bors[bot]
92a049c2dd
Merge #4543
...
4543: Bring back changes from v1.7.4 into main r=Kerollmops a=dureuill
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2024-03-28 16:53:51 +00:00
Clément Renault
877f4b1045
Support negative phrases
2024-03-28 15:51:43 +01:00
Louis Dureuil
796213af9a
Merge branch 'main' into tmp-release-v1.7.4
2024-03-28 10:51:49 +01:00
Clément Renault
69f8b2730d
Fix the tests
2024-03-28 10:47:04 +01:00
Louis Dureuil
ee8cbea810
Don't optimize reindexing when fields contain dots
2024-03-27 17:04:45 +01:00
Louis Dureuil
572fb3a51d
Finer granularity for embedder needs reindex
2024-03-27 12:01:34 +01:00
Louis Dureuil
4ff0255783
remove unused function
2024-03-27 11:51:14 +01:00
Louis Dureuil
a25456120d
Expose distribution in settings
2024-03-27 11:51:04 +01:00
Louis Dureuil
168ded3b9d
Deserr for distribution
2024-03-27 11:50:33 +01:00
Louis Dureuil
afd1da5642
Add distribution to all embedders
2024-03-27 11:50:22 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator
2024-03-26 18:01:27 +01:00
Clément Renault
1da9e0f246
Better support space around the negative operator (-)
2024-03-26 17:47:13 +01:00
Clément Renault
e4a3e603b3
Expose a first working version of the negative keyword
2024-03-26 17:47:13 +01:00
Louis Dureuil
817ccc089a
also allow api_key
2024-03-25 11:50:00 +01:00
Louis Dureuil
4136630ea5
Use constants instead of raw strings in set_*set()
2024-03-25 11:39:33 +01:00
Louis Dureuil
58972f35cb
Allow `url` parameter for ollama embedder
2024-03-25 11:32:55 +01:00
Louis Dureuil
dfa5e41ea6
Check validity of the URL setting
2024-03-25 11:23:16 +01:00
Louis Dureuil
a1db342f01
Expose REST embedder to the API
2024-03-25 11:23:15 +01:00
Louis Dureuil
f87747f4d3
Remove unwraps
2024-03-25 11:23:04 +01:00
Louis Dureuil
b6b4b6bab7
Remove the tokio and the reqwests
2024-03-25 11:23:03 +01:00
Louis Dureuil
ac52c857e8
Update ollama and openai impls to use the rest embedder internally
2024-03-25 11:23:03 +01:00
Louis Dureuil
8708cbef25
Add RestEmbedder
2024-03-25 11:23:03 +01:00