meilisearch

Mirror of https://github.com/meilisearch/meilisearch.git, synced 2024-11-22 18:17:39 +08:00

Author	SHA1	Message	Date
ManyTheFish	8bf89ec394	Infer locales from index settings	2024-08-21 10:47:40 +02:00
meili-bors[bot]	ee62d9ce30	Merge #4845 4845: Fix perf regression facet strings r=ManyTheFish a=dureuill Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews). [Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3). - Initial investigation pointed to #4819 as the origin of the regression. - Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids` - Most of the slowdown is in `normalize_facet_strings`, and precisely in `detection.language()`. This PR improves the situation (-10s compared with `main` for hackernews, so only +34s regression compared with `v1.9`) by skipping normalization when it can be skipped. I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish` --- Tentative resolution options: 1. remove locale normalization from facet. I'm not sure why this is required; I believe we weren't doing this before, so maybe we can stop doing that again. 2. don't do language detection when it can be avoided: won't help with the regressions in benchmark, but maybe we can skip language detection when the locales contain only one language? 3. use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang which boasts x10 to x100 throughput compared with whatlang. Should we consider replacing whatlang with whichlang? Now, I understand whichlang supports fewer languages than whatlang, so I also suggest: 4. use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang cannot detect, then use whatlang instead. --- > [!CAUTION] > this PR contains a commit that adds detailed spans that were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s of overhead, it should be removed before landing. Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-08-19 06:29:48 +00:00
ManyTheFish	0f965d3574	Remove hotloop's spans	2024-08-14 14:33:36 +02:00
ManyTheFish	ade54493ab	Only detect language for a facet if several locales have been specified by the user in the settings	2024-08-14 12:03:52 +02:00
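Commit ade54493ab (part of PR #4845) limits facet language detection to the case where the user configured several locales. A minimal sketch of that decision follows; `facet_locale` and the caller-supplied `detect` closure (standing in for the real language detector) are hypothetical names, not Meilisearch's code, and the behavior for an empty locale list is an assumption of this sketch.

```rust
/// Hypothetical sketch: pick the locale used to normalize a facet value.
/// `detect` stands in for an expensive language detector restricted to the
/// candidate locales; it is only invoked when there is a real choice to make.
fn facet_locale<'a, F>(locales: &[&'a str], detect: F) -> Option<&'a str>
where
    F: FnOnce(&[&'a str]) -> Option<&'a str>,
{
    match locales {
        // Assumption: no configured locale means no locale-specific normalization.
        [] => None,
        // Exactly one locale: use it directly and skip detection entirely.
        [only] => Some(*only),
        // Several locales: only now pay the cost of language detection.
        several => detect(several),
    }
}
```

The design point is that the detector, identified in #4845 as the hot spot, is never called when the settings already determine the locale.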
Louis Dureuil	c3cdc407ec	Avoid unnecessary clone()	2024-08-08 14:57:02 +02:00
Louis Dureuil	2f10273d14	Group by normalized values, make sure you don't remove a value when at least one remaining value still normalizes to it	2024-08-08 14:02:53 +02:00
Louis Dureuil	e3ef0ae19e	also intersect the universe for searchOnAttributes	2024-08-06 14:06:56 +02:00
meili-bors[bot]	57f7af77c7	Merge #4846 4846: Add OpenAI tests r=dureuill a=dureuill # Pull Request ## Related issue Part of fixing #4757 ## What does this PR do? - OpenAI embedder: don't pass apiKey when it is empty (slightly improves error messages) - rest embedder and rest-based embedders: specialize the authorization denied error message depending on the configuration source - fix existing tests - Adds assets containing prerecorded texts to embed and the embeddings obtained from OpenAI - Adds an asset containing a tokenized long document and the embedding obtained from OpenAI for this token - Uses the wiremock crate to mock the OpenAI API: parse the openai request, lookup the response in assets, craft an openai response Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-08-05 10:49:28 +00:00
Louis Dureuil	e64d0e0ca8	use insert instead of push for bitmaps	2024-08-01 18:32:45 +02:00
Louis Dureuil	9ef710cad4	Use wrapper that forces the desired date format	2024-07-31 17:12:19 +02:00
Louis Dureuil	5aa6cb3600	Specialize authorized error message depending on config source	2024-07-31 15:03:44 +02:00
Louis Dureuil	9b7764575b	openai: don't pass apiKey when it is empty	2024-07-31 15:03:44 +02:00
Louis Dureuil	0e68718027	Add detailed spans	2024-07-31 13:05:47 +02:00
Louis Dureuil	7c3fc8c655	Split settings and document facet string extractions	2024-07-31 10:57:46 +02:00
Louis Dureuil	8acd3f50bb	skip normalization when the locales and values are the same	2024-07-31 09:53:00 +02:00
Tamo	d262b1df32	craft an API over the Shared Server and Shared index to avoid hard to debug mistakes	2024-07-30 14:24:57 +02:00
meili-bors[bot]	c2c1ba39ee	Merge #4826 4826: Update Charabia v0.9.0 r=dureuill a=ManyTheFish # Pull Request ## Related Changelog https://github.com/meilisearch/charabia/releases/tag/v0.9.0 ## Notable Change for Meilisearch Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list. Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-07-25 14:08:38 +00:00
ManyTheFish	35567b2137	Update Charabia v0.9.0	2024-07-25 16:02:14 +02:00
Louis Dureuil	d4ea7cc2a9	fix clippy 👉👈	2024-07-25 12:10:32 +02:00
Louis Dureuil	2413592bbf	Display docid when there are documents without manual embeddings for a manual embedder	2024-07-25 12:10:32 +02:00
Louis Dureuil	553440632e	Introduce Setting::some_or_not_set	2024-07-25 12:01:52 +02:00
Louis Dureuil	7a347966da	Allow explicit `dimensions` for ollama	2024-07-25 12:01:51 +02:00
Louis Dureuil	4654d51e05	Add custom headers for REST embedder	2024-07-25 12:01:51 +02:00
ManyTheFish	a918561ac1	Fix PR comments	2024-07-25 10:52:56 +02:00
ManyTheFish	70d71581ee	fix clippy	2024-07-25 10:52:56 +02:00
ManyTheFish	04fa44e7eb	Implement localized attributes settings	2024-07-25 10:51:27 +02:00
ManyTheFish	90c0a6db7d	Implement localized search	2024-07-25 10:51:27 +02:00
ManyTheFish	cc02920f2b	Update charabia	2024-07-25 10:51:27 +02:00
Tamo	988552e178	add tests on the rest embedder	2024-07-24 14:34:17 +02:00
Louis Dureuil	0d8199f3b7	Change parameters in milli settings	2024-07-24 14:34:17 +02:00
Louis Dureuil	4b74803dae	Change parameters in vector settings	2024-07-24 14:34:17 +02:00
Louis Dureuil	d731fa661b	ollama and openai use new EmbedderOptions	2024-07-24 14:34:17 +02:00
Louis Dureuil	a1beddd5d9	rest embedder: use json_template	2024-07-24 14:34:17 +02:00
Louis Dureuil	4109182ca4	Add json_template module	2024-07-24 14:34:12 +02:00
Louis Dureuil	1a297c048e	Error changes	2024-07-24 14:34:12 +02:00
Louis Dureuil	303e601b87	HuggingFace: Clearer error message when a model is not supported	2024-07-23 15:13:22 +02:00
meili-bors[bot]	ea73615abf	Merge #4804 4804: Implements the experimental contains filter operator r=irevoire a=irevoire # Pull Request Related PRD: (private link) https://www.notion.so/meilisearch/Contains-Like-Filter-Operator-0d8ad53c6761466f913432eb1d843f1e Public usage page: https://meilisearch.notion.site/Contains-filter-operator-usage-3e7421b0aacf45f48ab09abe259a1de6 ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/3613 ## What does this PR do? - Extract the contains operator from this PR: https://github.com/meilisearch/meilisearch/pull/3751 - Gate it behind a feature flag - Add tests Co-authored-by: Tamo <tamo@meilisearch.com>	2024-07-17 15:47:11 +00:00
Tamo	02c61eabfa	fix the range reported when the experimental feature has not been set	2024-07-17 16:54:33 +02:00
Tamo	2af9481804	Implements the experimental contains filter operator	2024-07-17 11:13:37 +02:00
Louis Dureuil	24240934f9	Improve errors when indexing documents with a user provided embedder	2024-07-16 13:39:01 +02:00
Louis Dureuil	f4c94ac57f	manual embedders: limit max size of errors to 250	2024-07-16 13:39:01 +02:00
Louis Dureuil	4087a88dbe	rest|ollama|openai: increase tries to 10 + randomize retry duration	2024-07-16 13:39:00 +02:00
Louis Dureuil	5adacf2f45	OpenAI: embed only the first MAX_TOKENS tokens	2024-07-16 13:39:00 +02:00
Louis Dureuil	65d0c32aa7	Allow overriding OpenAI's url	2024-07-16 13:39:00 +02:00
Louis Dureuil	82647bcded	When `retrieveVectors` is true, retrieve `_vectors.embedder` even if there are no vectors for that embedder	2024-07-16 13:39:00 +02:00
Louis Dureuil	e83da00446	Milli changes to match to allow for more flexible lifetimes	2024-07-11 16:29:35 +02:00
Louis Dureuil	7fb3e378ff	Do not fail sort comparisons when the field name or target point are different	2024-07-11 16:28:14 +02:00
meili-bors[bot]	29b44e5541	Merge #4626 4626: Edit Documents with Rhai r=ManyTheFish a=Kerollmops This PR introduces a first version of [the _Update Documents with Function_ (internal)](https://www.notion.so/meilisearch/Update-Documents-by-Function-45f87b13e61c4435b73943768a490808). It uses [the Rhai programming language](https://rhai.rs/) to let users express the modifications they want to apply. You can read more about the way to use these functions on [the Usage PRD Page](https://meilisearch.notion.site/Edit-Documents-with-Rhai-0cff8fea7655436592e7c8a6de932062?pvs=25). The [prototype is available](https://github.com/meilisearch/meilisearch/actions/runs/9038384483) through Docker by using the following command: ``` docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-edit-documents-with-rhai-3 ``` ## TODO - [x] Support the `DocumentEdition` task in dumps. - [x] Remove the unwraps and panics. - [x] Improve error codes for the `function` parameter. - [x] [Update Rhai to v1.19.0](https://github.com/rhaiscript/rhai/releases/tag/v1.19.0) 🚀 - [x] Make it an experimental feature (only restrict the HTTP calls). - [x] It must be possible not to send a context. - [x] Rebase on main. - [x] Check that the script cannot do any io. - [x] ~Introduce a `Documents.edit` action or~ require the `Documents.all` action. - [x] Change the `editionCode` to the clearer `function` field name in the tasks. - [x] Support a user provided context and maybe more (but keep function execution isolated for reproducibility). - [x] Support deleting documents when the `doc` is `()` (nil, null). - [x] Support canceling document edition. - [x] Multithread document edition by using rayon (and [rayon-par-bridge](https://docs.rs/rayon-par-bridge/latest/rayon_par_bridge/)). - [x] Limit the number of instructions per function execution. - [ ] ~Expose the limit of instructions in the settings.~ Not sure, in fact. - [x] Ignore unmodified documents in the tasks count. - [x] Make the `filter` field optional (not forced to be `null`). Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-07-11 09:02:55 +00:00
Clément Renault	6e80364c50	Apply review comments	2024-07-11 11:00:27 +02:00
Clément Renault	3bac22fd87	We do not do intersections with the universe when it is related to cache	2024-07-10 16:49:36 +02:00
Clément Renault	ce61cb7fe6	Simplify and speedup an intersection pass	2024-07-10 16:49:36 +02:00
Clément Renault	1693d1a311	Simplify the check to decide to stop a loop	2024-07-10 16:49:36 +02:00
Clément Renault	febea735ca	Remove the unused universe parameter from resolve_negative_phrases	2024-07-10 16:49:36 +02:00
Clément Renault	93ba051094	Remove the invalid get_phrases_docids universe parameter	2024-07-10 16:49:35 +02:00
Clément Renault	cd7a20fa32	Make it work by avoiding storing invalid stuff in the cache	2024-07-10 16:49:35 +02:00
Clément Renault	41f51adbec	Do less useless intersections	2024-07-10 16:49:35 +02:00
Clément Renault	0ca1a4e805	Always do the intersections with the universe	2024-07-10 16:49:34 +02:00
Clément Renault	50a7393c55	Modify the compute_query_term_subset_docids function to accept the universe	2024-07-10 16:49:34 +02:00
Clément Renault	837274f853	Restrict even more the Rhai engine	2024-07-10 16:30:18 +02:00
Clément Renault	aace587dd1	Create errors for the internal processing ones	2024-07-10 16:29:18 +02:00
Clément Renault	f35d6710f3	Update rhai to v1.19.0	2024-07-10 16:29:17 +02:00
Clément Renault	81ec0abad1	Use the new rayon-par-bridge library	2024-07-10 16:29:04 +02:00
Clément Renault	b67d385cf0	Parallelize the edition functions	2024-07-10 16:28:54 +02:00
Clément Renault	dfecb25814	Disable the time package	2024-07-10 16:28:37 +02:00
Clément Renault	2eae2015d7	Support aborting documents edition by function	2024-07-10 16:28:15 +02:00
Clément Renault	33fa17bf12	Support deleting documents with functions	2024-07-10 16:28:15 +02:00
Clément Renault	400e6b93ce	Support user-provided context for documents edition	2024-07-10 16:28:15 +02:00
Clément Renault	f4add93043	Limit the number of script operations	2024-07-10 16:28:14 +02:00
Clément Renault	2fae96ac14	Show the number of actually edited documents	2024-07-10 16:28:14 +02:00
Clément Renault	45af18ae9c	Check the Rhai syntax before accepting the script	2024-07-10 16:28:13 +02:00
Clément Renault	2d97164d9f	It works perfectly with some Rhai	2024-07-10 16:28:13 +02:00
Clément Renault	efc156a4a4	Executing Lua works correctly	2024-07-10 16:27:36 +02:00
meili-bors[bot]	2099b4f0dd	Merge #4786 4786: Update dependencies r=Kerollmops a=irevoire # Pull Request ## Related issue Fixes #4753 ## What does this PR do? - Update all dependencies except rustls - [x] Release charabia - [x] Update charabia - [x] Double check that the docker build works after updating charabia Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-07-10 13:23:54 +00:00
Clément Renault	9d6885793e	Upgrade dependencies	2024-07-10 13:46:24 +02:00
Clément Renault	5f4530ce57	Remove more unused dependencies	2024-07-10 13:36:34 +02:00
Tamo	4d5005b01a	make clippy happy	2024-07-10 10:06:59 +02:00
Tamo	952e742321	update charabia	2024-07-09 23:41:29 +02:00
hanbings	0a40a98bb6	Make milli use edition 2021 (#4770) * Make milli use edition 2021 * Add lifetime annotations to milli. * Run cargo fmt	2024-07-09 17:25:39 +02:00
Tamo	cd46ebd6b5	remove insta deprecating	2024-07-08 18:38:05 +02:00
Tamo	6afa578688	update most incompatible dependencies	2024-07-08 18:31:15 +02:00
Tamo	300bdfc2a7	update most dependencies	2024-07-08 18:09:12 +02:00
Louis Dureuil	128e6c7502	Search: spans with a finer granularity	2024-07-02 16:13:53 +02:00
ManyTheFish	015d90a962	merge main	2024-07-01 11:50:36 +02:00
Louis Dureuil	e53de15b8e	Fix behavior of limit and offset for hybrid search when keyword results are returned early. The test is fixed	2024-06-27 14:25:33 +02:00
Tamo	ce08dc509b	add more tests and improve the location of the error	2024-06-27 11:51:45 +02:00
Tamo	1daaed163a	Make _vectors.:embedding.regenerate mandatory + tests + error messages	2024-06-27 11:04:58 +02:00
meili-bors[bot]	7e3c306c54	Merge #4725 4725: Store primary key as String when Number exceeds i64 range r=irevoire a=JWSong # Pull Request ## Related issue Fixes #4696 ## What does this PR do? - When a Number value exceeding the range of i64 is received as a primary key, it will be stored as a String. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: JWSong <thdwjddn123@gmail.com>	2024-06-26 07:06:04 +00:00
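PR #4725 changes how numeric primary keys outside the `i64` range are handled: instead of failing, they are stored as strings. The sketch below illustrates the idea on the textual form of a JSON number; `DocumentId` and `document_id_from_number` are hypothetical names, and accepting only non-negative digit strings is a simplification of this sketch, not Meilisearch's actual validation rules.

```rust
/// Hypothetical representation of a document id after parsing.
#[derive(Debug, PartialEq)]
enum DocumentId {
    Integer(i64),
    Text(String),
}

/// Sketch: convert the textual form of a JSON integer primary key.
/// Values that fit in `i64` stay integers; larger ones are kept as strings.
/// Anything that is not a plain run of ASCII digits is rejected here
/// (a simplification of this sketch).
fn document_id_from_number(raw: &str) -> Option<DocumentId> {
    if raw.is_empty() || !raw.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match raw.parse::<i64>() {
        Ok(n) => Some(DocumentId::Integer(n)),
        // The input is all digits, so parsing can only fail on overflow.
        Err(_) => Some(DocumentId::Text(raw.to_string())),
    }
}
```

Falling back to a string keeps every digit of the original id, which an `f64` conversion would not guarantee for values this large.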
JWSong	dcdc83946f	accept large number as string	2024-06-25 21:41:47 +09:00
meili-bors[bot]	3c4c46377b	Merge #4665 4665: Add missing Korean support r=ManyTheFish a=junhochoi Some configurations are missing the `korean` features; this adds them and a test case in `milli/src/search/mod.rs`. # Pull Request ## Related issue #3443 #3882 ## What does this PR do? - Improvement on enabling Korean support Inspired by the work in #3882, I tried to enable Korean features but found some missing configurations. This PR adds those missing configs (mostly Cargo.toml) and one test case. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Junho Choi <jh.choi@catenoid.net>	2024-06-25 11:51:21 +00:00
Louis Dureuil	d75e0098c7	Fixes for Rust v1.79	2024-06-25 11:16:06 +02:00
Junho Choi	2e0ff56f3f	Add missing Korean support. Some configurations are missing the `korean` features; this adds them and a test case in `milli/src/search/mod.rs`.	2024-06-25 12:45:21 +09:00
Tamo	1693332cab	Update arroy and always build the trees that need to be built	2024-06-24 10:14:03 +02:00
meili-bors[bot]	ddd564665b	Merge #4713 4713: Speed up facet distribution r=ManyTheFish a=Kerollmops This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation. A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-06-24 05:23:46 +00:00
Clément Renault	9736e16a88	Make clippy happy	2024-06-20 13:02:44 +02:00
Clément Renault	6fa4da8ae7	Improve facet distribution speed in count mode	2024-06-20 12:58:51 +02:00
Clément Renault	19d7cdc20d	Improve facet distribution speed in lexico mode	2024-06-20 12:57:08 +02:00
Louis Dureuil	a04041c8f2	Only spawn the pool once	2024-06-19 16:25:33 +02:00
meili-bors[bot]	e580d6b98f	Merge #4693 4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops This PR fixes #4611. ### To Do - [x] Remove the `distinguishableAttributes` settings (not even a commit about that). - [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search. - [x] Work on the errors and make tests. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2024-06-18 07:45:03 +00:00
Tamo	43875e6758	fix bug around nested fields	2024-06-17 15:59:30 +02:00
meili-bors[bot]	e9bf4c43a4	Merge #4649 4649: Don't store the vectors in the documents database r=dureuill a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4607 ## What does this PR do? - Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable - [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings - [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field. - [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-06-17 12:32:03 +00:00
Louis Dureuil	0a8f50695e	Fixes for Rust v1.79	2024-06-13 17:47:44 +02:00
Louis Dureuil	e35ef31738	Small changes following review	2024-06-13 14:20:48 +02:00
Louis Dureuil	3bc8f81abc	user_provided => regenerate	2024-06-12 18:12:20 +02:00
Louis Dureuil	a89eea233b	Fix vectors injection	2024-06-12 17:10:19 +02:00
Louis Dureuil	f5cf01e7d1	Rework extraction to use EmbedderAction	2024-06-12 14:50:55 +02:00
Louis Dureuil	d1dd7e5d09	In transform for removed embedders, write back their user provided vectors in documents, and clear the writers	2024-06-12 14:50:55 +02:00
Louis Dureuil	d18c1f77d7	Update embedder configs with a finer granularity - no longer clear vector DB between any two embedder changes	2024-06-12 14:50:55 +02:00
Louis Dureuil	d0b05ae691	Add EmbedderAction to settings	2024-06-12 14:50:54 +02:00
Louis Dureuil	e9bf4eb100	Reformulate ParsedVectorsDiff in terms of VectorState	2024-06-12 14:11:44 +02:00
Louis Dureuil	b368105272	Add EmbedderConfigs::into_inner	2024-06-12 14:11:44 +02:00
meili-bors[bot]	e0eff08095	Merge #4685 4685: Fix ci tests r=dureuill a=ManyTheFish # Pull Request Make all of the following CI runs succeed: https://github.com/meilisearch/meilisearch/actions/runs/9477183091 ## Related issue Fixes #4629 ## What does this PR do? - Change the test behavior for `swedish-recomposition` feature flag - Remove the `-v` parameter from grep Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Many the fish <many@meilisearch.com>	2024-06-12 07:58:33 +00:00
Clément Renault	39f60abd7d	Add and modify distinct tests	2024-06-11 17:53:53 -04:00
Clément Renault	1991bd03da	Distinct at search erases the distinct in the settings	2024-06-11 17:02:39 -04:00
Clément Renault	ee39309aae	Improve errors and introduce a new InvalidSearchDistinct error code	2024-06-11 16:03:39 -04:00
Clément Renault	0d31be1494	Make the distinct work at search	2024-06-11 11:39:35 -04:00
Louis Dureuil	7cef2299cf	Fix behavior when removing a document	2024-06-11 09:45:08 +02:00
ManyTheFish	57d066595b	fix Tests almost all features	2024-06-06 17:24:50 +02:00
Clément Renault	75b2e02cd2	Log more stuff around filtering	2024-06-06 11:00:07 -04:00
Clément Renault	52d0d35b39	Revert "Reduce the universe while exploring the facet tree" because it's slower this way. This reverts commit 14026115f21409535772ede0ee4273f37848dd61.	2024-06-06 09:17:51 -04:00
Clément Renault	5432776132	Reduce the universe while exploring the facet tree	2024-06-06 09:17:51 -04:00
Clément Renault	66470b27e6	Use the MultiOps trait for IN operations	2024-06-06 09:17:51 -04:00
Clément Renault	0a9bd398c7	Improve the NOT operator to use the universe when possible	2024-06-06 09:17:51 -04:00
Clément Renault	7967e93c16	Skip evaluating when a universe is empty, nothing can be found	2024-06-06 09:17:51 -04:00
Clément Renault	a6f3a01c6a	Expose the universe to do efficient intersections on deserialization	2024-06-06 09:17:51 -04:00
Clément Renault	4ca4a3f954	Make the CboRoaringBitmapCodec support intersection on deserialization	2024-06-06 09:17:51 -04:00
Clément Renault	e4a69c5ac3	Introduce the FacetGroupLazyValue type	2024-06-06 09:17:50 -04:00
Clément Renault	531e3d7d6a	MultiOps trait for OR operations	2024-06-06 09:17:50 -04:00
Tamo	2cdcb703d9	fix the deletion of vectors and add a test	2024-06-06 11:39:29 +02:00
Tamo	31a793d226	fix the regeneration of the embeddings in the search	2024-06-06 11:39:29 +02:00
Tamo	d85ab23b82	rename all occurrences of user_defined to user_provided for consistency	2024-06-06 11:39:29 +02:00
Tamo	b7349910d9	implement more review comments	2024-06-06 11:39:29 +02:00
Tamo	376b3a19a7	makes clippy and fmt happy	2024-06-06 11:39:29 +02:00
Tamo	b867829ef1	remove useless dbg	2024-06-06 11:39:29 +02:00
Tamo	5d50850e12	always push the user defined vectors in arroy	2024-06-06 11:39:29 +02:00
Tamo	a73ccc78a6	forward the embedding config to the extractors	2024-06-06 11:39:28 +02:00
Tamo	9eb6f522ea	wraps the index embedding config in a struct	2024-06-06 11:37:30 +02:00
Tamo	04f6523f3c	expose a new parameter to retrieve the embedders at search time	2024-06-06 11:36:11 +02:00
Tamo	84e498299b	Remove the vectors from the documents database	2024-06-06 11:36:11 +02:00
Tamo	7a84697570	never store the _vectors as searchable or faceted fields	2024-06-06 11:36:11 +02:00
Tamo	4148fbbe85	provide a method to get all the nested fields ids from a name	2024-06-06 11:36:11 +02:00
ManyTheFish	2e50c6ec81	Update Charabia	2024-06-06 10:18:43 +02:00
ManyTheFish	30293883e0	Fix condition mistake	2024-06-05 17:30:07 +02:00
ManyTheFish	b833be46b9	Avoid running proximity when only the exact attributes change	2024-06-05 17:30:07 +02:00
ManyTheFish	0a4118329e	Put only_additional_fields to None if the difference gives an empty result.	2024-06-05 17:30:07 +02:00
ManyTheFish	261e92d7e6	Skip iterating over documents when the faceted field list doesn't change	2024-06-05 17:30:07 +02:00
ManyTheFish	5cd08979b1	iterate over the faceted fields instead of over the whole document	2024-06-05 17:30:07 +02:00
Clément Renault	a998b881f6	Cache a lot of operations to know if a field must be indexed	2024-06-05 17:30:07 +02:00
Clément Renault	b81953a65d	Add a span for the prepare_for_documents_reindexing	2024-06-05 17:30:07 +02:00
Clément Renault	091bb157f1	Add a span for the settings diff creation	2024-06-05 17:30:07 +02:00
Clément Renault	1b639ce44b	Reduce the number of complex calls to settings diff functions	2024-06-05 17:30:07 +02:00
Clément Renault	87cf8a3c94	Introduce a new way to determine the operations to perform on the fields	2024-06-05 17:30:07 +02:00
Clément Renault	0f578348f1	Introduce a dedicated function to write proximity entries in database	2024-06-05 17:30:07 +02:00
Clément Renault	fad4675abe	Give the settings diff to the write_typed_chunk_into_index function	2024-06-05 17:30:07 +02:00
Clément Renault	1ab03c4ede	Fix an issue with settings diff and * in the searchable attributes	2024-06-05 17:30:07 +02:00
Clément Renault	0c6e4b2f00	Introducing a new into_del_add_obkv_conditional_operation function	2024-06-05 17:30:07 +02:00
Clément Renault	42b3f52ef9	Introduce the SettingDiff only_additional_fields method	2024-06-05 17:30:07 +02:00
meili-bors[bot]	93f5defedc	Merge #4656 4656: Adding a new `searchableAttribute` no longer re-indexes all the attributes r=ManyTheFish a=Kerollmops Fixes #4492. ## To Do - [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times - [ ] Add tests Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-06-05 14:51:14 +00:00
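PR #4656 avoids re-indexing every attribute when the only change is newly added searchable attributes, and commit ba9fadc8f1 sets `only_additional_fields` to `None` when the difference is empty. A hypothetical sketch of such a check over plain field-name sets follows; the real method works on Meilisearch's settings diff and its exact rules may differ, and treating any removed field as disqualifying is an assumption of this sketch.

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of an "only additional fields" check.
/// Returns the newly added searchable fields when the change is a pure
/// addition, and `None` otherwise (including when nothing was added).
fn only_additional_fields(
    old: &BTreeSet<String>,
    new: &BTreeSet<String>,
) -> Option<BTreeSet<String>> {
    // Assumption: if any field was removed, this shortcut does not apply.
    if !old.is_subset(new) {
        return None;
    }
    let added: BTreeSet<String> = new.difference(old).cloned().collect();
    // An empty difference yields None, as described in commit ba9fadc8f1.
    if added.is_empty() {
        None
    } else {
        Some(added)
    }
}
```

With this shape, a caller can index only the returned fields for existing documents and fall back to a full re-index whenever the function returns `None` for a non-trivial change.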
ManyTheFish	33241a6b12	Fix condition mistake	2024-06-05 16:00:24 +02:00
ManyTheFish	ff87b4db26	Avoid running proximity when only the exact attributes change	2024-06-05 12:48:44 +02:00
ManyTheFish	ba9fadc8f1	Put only_additional_fields to None if the difference gives an empty result.	2024-06-05 10:51:16 +02:00
ManyTheFish	d29d4f88da	Skip iterating over documents when the faceted field list doesn't change	2024-06-04 15:31:24 +02:00
ManyTheFish	17c5ceeb9d	iterate over the faceted fields instead of over the whole document	2024-06-04 14:04:20 +02:00
meili-bors[bot]	fc584f1db3	Merge #4666 4666: Add a score threshold search parameter r=ManyTheFish a=dureuill # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4609 ## What does this PR do? - See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17) Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-06-03 08:42:44 +00:00
Louis Dureuil	2b6db6541e	Changes after review	2024-06-03 10:30:00 +02:00
meili-bors[bot]	d6bd88ce4f	Merge #4667 4667: Frequency matching strategy r=Kerollmops a=ManyTheFish # Pull Request ## Related issue Fixes #3773 ## What does this PR do? - add test for matching strategy - implement frequency matching strategy See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00). [Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f). Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-05-30 14:53:31 +00:00
Clément Renault	b9a0ff0dd6	Cache a lot of operations to know if a field must be indexed	2024-05-30 16:18:23 +02:00
Clément Renault	75496af985	Add a span for the prepare_for_documents_reindexing	2024-05-30 12:14:22 +02:00
Clément Renault	0e9eb9eedb	Add a span for the settings diff creation	2024-05-30 12:08:27 +02:00
ManyTheFish	3f1a510069	Add tests and fix matching strategy	2024-05-30 12:02:42 +02:00
Clément Renault	3a78e988da	Reduce the number of complex calls to settings diff functions	2024-05-30 11:23:07 +02:00
Clément Renault	d9e5074189	Introduce a new way to determine the operations to perform on the fields	2024-05-30 11:23:07 +02:00
Clément Renault	bc210bdc00	Introduce a dedicated function to write proximity entries in database	2024-05-30 11:23:06 +02:00
Clément Renault	4bf83f701c	Give the settings diff to the write_typed_chunk_into_index function	2024-05-30 11:23:06 +02:00
Clément Renault	db3887929f	Fix an issue with settings diff and * in the searchable attributes	2024-05-30 11:22:50 +02:00
Clément Renault	9af103a88e	Introducing a new into_del_add_obkv_conditional_operation function	2024-05-30 11:22:49 +02:00
Clément Renault	99211eb375	Introduce the SettingDiff only_additional_fields method	2024-05-30 11:22:49 +02:00
Louis Dureuil	4f03b0cf5b	Add ranking score threshold to similar	2024-05-30 11:20:50 +02:00
Louis Dureuil	c26db7878c	Expose rankingScoreThreshold in API	2024-05-30 10:32:35 +02:00
ManyTheFish	1ab88e10b9	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 16:24:00 +02:00
Louis Dureuil	aac1d769a7	Add ranking_score_threshold to milli	2024-05-29 14:17:09 +02:00
ManyTheFish	abdc4afcca	Implement Frequency matching strategy	2024-05-29 13:59:08 +02:00
Many the fish	e1fbfde6c4	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 11:31:03 +02:00
ManyTheFish	27b75ec648	merge main into v1.8.1	2024-05-29 11:26:07 +02:00
Louis Dureuil	ca6cc4654b	Add similar route	2024-05-28 15:28:19 +02:00
Louis Dureuil	d35278320e	Add support functions for accessing arroy writers and readers	2024-05-28 15:27:43 +02:00
Louis Dureuil	02b3d82c60	filtered_universe accepts index and txn instead of SearchContext	2024-05-28 15:22:12 +02:00
Louis Dureuil	fd2c95999d	Change `validate_document_id` to public and remove extra layer of result	2024-05-28 15:21:19 +02:00
Clément Renault	dc949ab46a	Remove puffin usage	2024-05-27 15:59:14 +02:00
Clément Renault	7f3e51349e	Remove puffin for the dependencies	2024-05-27 15:53:06 +02:00
meili-bors[bot]	19acc65ad2	Merge #4646 4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops This PR implements what is described in #4485. It reduces the number of disk writes and disk usage. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-23 16:06:50 +00:00
Clément Renault	fe17c0f52e	Construct the minimal OBKVs according to the settings diff	2024-05-23 11:23:57 +02:00
Clément Renault	bc5663e673	FieldIdsMap no longer useful thanks to #4631	2024-05-22 16:06:15 +02:00
Louis Dureuil	8a941c0241	Smaller review changes	2024-05-22 14:44:42 +02:00
Louis Dureuil	3412e7fbcf	"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0	2024-05-22 12:25:21 +02:00
Louis Dureuil	16037e2169	Don't remove embedders that are not in the config from the document DB	2024-05-22 12:24:51 +02:00
Louis Dureuil	8f7c8ca7f0	Remove now unused error variant	2024-05-22 12:23:43 +02:00
Clément Renault	500ddc76b5	Make the flattened sorter optional	2024-05-21 16:16:36 +02:00
Clément Renault	943f8dba0c	Make clippy happy	2024-05-21 14:58:41 +02:00
Clément Renault	1aa8ed9ef7	Make the original sorter optional	2024-05-21 14:53:26 +02:00
ManyTheFish	f762307838	Fix clippy	2024-05-21 13:44:20 +02:00
ManyTheFish	3e94a90722	Fixes	2024-05-21 13:39:46 +02:00
Louis Dureuil	b17cb56dee	Test array of vectors	2024-05-20 14:44:10 +02:00
ManyTheFish	fc7e817221	Index geo points based on the settings differences	2024-05-20 12:27:26 +02:00
Louis Dureuil	d05d49ffd8	Fix tests	2024-05-20 10:36:18 +02:00
Louis Dureuil	0462ebbe58	Don't write an empty _vectors field	2024-05-20 10:36:18 +02:00
Louis Dureuil	2f7a8a4efb	Don't write vectors that weren't autogenerated in document DB	2024-05-20 10:36:18 +02:00
Louis Dureuil	52d9cb6e5a	Refactor vector indexing - use the parsed_vectors module - only parse `_vectors` once per document, instead of once per embedder per document	2024-05-20 10:36:17 +02:00
Louis Dureuil	261de888b7	Add function to get the embeddings of a document in an index	2024-05-20 10:36:17 +02:00
Louis Dureuil	98c811247e	Add parsed vectors module	2024-05-20 10:25:59 +02:00
Tamo	273c6e8c5c	uses the latest version of heed to get rid of unsafe code	2024-05-16 18:31:32 +02:00
Tamo	897d25780e	update milli to latest version	2024-05-16 18:31:32 +02:00
Tamo	f2d0a59f1d	when no searchable attributes are defined, makes all the weight equals to zero	2024-05-16 01:06:33 +02:00
Tamo	c78a2fa4f5	rename method and variable around the attributes to search on feature	2024-05-15 18:04:42 +02:00
Tamo	5542f1d9f1	get back to what we were doing before in the DB cache and with the restricted field id	2024-05-15 18:00:39 +02:00
Tamo	ad4d8502b3	stops storing the whole fieldids weights map when no searchable are defined	2024-05-15 17:16:10 +02:00
Tamo	7ec4e2a3fb	apply all style review comments	2024-05-15 15:02:26 +02:00
Tamo	9fffb8e83d	make clippy happy	2024-05-14 17:36:32 +02:00
Tamo	caa6a7149a	make the attribute ranking rule use the weights and fix the tests	2024-05-14 17:36:32 +02:00
Tamo	a0082c4df9	add a failing test on the attribute ranking rule	2024-05-14 17:00:02 +02:00
Tamo	b0afe0972e	stop updating the fields ids map when fields are only swapped	2024-05-14 17:00:02 +02:00
Tamo	9ecde41853	add a test on the current behaviour	2024-05-14 17:00:02 +02:00
Tamo	685f452fb2	Fix the indexing of the searchable	2024-05-14 17:00:02 +02:00
Tamo	4e4a1ddff7	gate a test behind the required feature	2024-05-14 17:00:02 +02:00
Tamo	c22460045c	Stops returning an option in the internal searchable fields	2024-05-14 17:00:02 +02:00
Clément Renault	ac4bc143c4	Bump ureq to v2.9.7	2024-05-07 10:39:38 +02:00
meili-bors[bot]	4d5971f343	Merge #4621 4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-06 13:46:39 +00:00
Louis Dureuil	f4dd73ec8c	Destructure EmbedderOptions so we don't miss some options	2024-05-02 15:39:36 +02:00
ManyTheFish	88174b8ae4	Update charabia v0.8.10	2024-04-30 14:30:23 +02:00
meili-bors[bot]	ebca29f3de	Merge #4597 4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish # Pull Request - add some conditions reducing the work done when changing the settings - add some benchmarks on embedders ## Related issue Fixes #4585 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-04-25 16:37:28 +00:00
meili-bors[bot]	c793b6ef6d	Merge #4600 4600: Fix embedders api r=ManyTheFish a=ManyTheFish # Pull Request ## Related issue Fixes #4594 Fixes #4595 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-04-25 13:16:33 +00:00
Clément Renault	d4aeff92d0	Introduce the ThreadPoolNoAbort wrapper	2024-04-24 16:40:12 +02:00
ManyTheFish	9b76501875	Display set API key for Ollama embedder	2024-04-24 12:33:07 +02:00
Clément Renault	b3173d0423	Remove useless dots in the error messages	2024-04-22 18:09:33 +02:00
Clément Renault	96cc5319c8	Introduce a new internal error type to categorize panics	2024-04-22 18:09:33 +02:00
Clément Renault	0c7003c5df	Introduce an atomic to catch panics in thread pools	2024-04-22 18:09:33 +02:00
ManyTheFish	a1aa999026	Add conditions reducing work	2024-04-22 14:18:35 +02:00
ManyTheFish	c71b5d09ff	Update charabia v0.8.9	2024-04-18 11:38:26 +02:00
writegr	ab43a8a949	chore: fix some typos in comments Signed-off-by: writegr <wellweek@outlook.com>	2024-04-18 14:12:52 +08:00
meili-bors[bot]	4a8459b799	Merge #4576 4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire # Pull Request ## Related issue Fixes #4575 ## What does this PR do? - increase the default search time budget from 150ms to 1.5s Co-authored-by: Tamo <tamo@meilisearch.com>	2024-04-17 16:04:47 +00:00
Clément Renault	c923adf222	Fix facet distribution for alpha on facet numbers	2024-04-17 16:31:16 +02:00
ManyTheFish	df29ba709a	Make some cleaning in Arcs	2024-04-17 12:33:25 +02:00
ManyTheFish	3acfab2eb7	Fix PR comments	2024-04-17 10:55:51 +02:00
Tamo	19137be0ea	increase the default search time budget from 150ms to 1.5s	2024-04-16 18:09:49 +02:00
ManyTheFish	87a93ba47d	fix clippy	2024-04-16 14:39:30 +02:00
ManyTheFish	eaf113ef34	Fix word pair proximity error when nothing has to be extracted	2024-04-16 14:39:30 +02:00
ManyTheFish	e5ae337aae	Comeback to sorters in extract_word_docids using buffers and merge the keys manually is less efficient	2024-04-16 14:39:30 +02:00
ManyTheFish	a489b406b4	fix test	2024-04-16 14:39:06 +02:00
ManyTheFish	02c3d6b265	finish work	2024-04-16 14:39:06 +02:00
ManyTheFish	b5e4a55af6	refactor faceted and searchable pipeline	2024-04-16 14:39:06 +02:00
ManyTheFish	a7e368aaa6	Create InnerIndexSettingsDiffs struct and populate it	2024-04-16 14:39:06 +02:00

2652 Commits