meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-22 18:17:39 +08:00

Author	SHA1	Message	Date
Tamo	2cdcb703d9	fix the deletion of vectors and add a test	2024-06-06 11:39:29 +02:00
Tamo	31a793d226	fix the regeneration of the embeddings in the search	2024-06-06 11:39:29 +02:00
Tamo	d85ab23b82	rename all occurrences of user_defined to user_provided for consistency	2024-06-06 11:39:29 +02:00
Tamo	b7349910d9	implements more review comments	2024-06-06 11:39:29 +02:00
Tamo	376b3a19a7	makes clippy and fmt happy	2024-06-06 11:39:29 +02:00
Tamo	b867829ef1	remove useless dbg	2024-06-06 11:39:29 +02:00
Tamo	5d50850e12	always push the user defined vectors in arroy	2024-06-06 11:39:29 +02:00
Tamo	a73ccc78a6	forward the embedding config to the extractors	2024-06-06 11:39:28 +02:00
Tamo	9eb6f522ea	wraps the index embedding config in a struct	2024-06-06 11:37:30 +02:00
Tamo	04f6523f3c	expose a new parameter to retrieve the embedders at search time	2024-06-06 11:36:11 +02:00
Tamo	84e498299b	Remove the vectors from the documents database	2024-06-06 11:36:11 +02:00
Tamo	7a84697570	never store the _vectors as searchable or faceted fields	2024-06-06 11:36:11 +02:00
Tamo	4148fbbe85	provide a method to get all the nested fields ids from a name	2024-06-06 11:36:11 +02:00
ManyTheFish	2e50c6ec81	Update Charabia	2024-06-06 10:18:43 +02:00
ManyTheFish	30293883e0	Fix condition mistake	2024-06-05 17:30:07 +02:00
ManyTheFish	b833be46b9	Avoid running proximity when only the exact attributes changes	2024-06-05 17:30:07 +02:00
ManyTheFish	0a4118329e	Put only_additional_fields to None if the difference gives an empty result.	2024-06-05 17:30:07 +02:00
ManyTheFish	261e92d7e6	Skip iterating over documents when the faceted field list doesn't change	2024-06-05 17:30:07 +02:00
ManyTheFish	5cd08979b1	iterate over the faceted fields instead of over the whole document	2024-06-05 17:30:07 +02:00
Clément Renault	a998b881f6	Cache a lot of operations to know if a field must be indexed	2024-06-05 17:30:07 +02:00
Clément Renault	b81953a65d	Add a span for the prepare_for_documents_reindexing	2024-06-05 17:30:07 +02:00
Clément Renault	091bb157f1	Add a span for the settings diff creation	2024-06-05 17:30:07 +02:00
Clément Renault	1b639ce44b	Reduce the number of complex calls to settings diff functions	2024-06-05 17:30:07 +02:00
Clément Renault	87cf8a3c94	Introduce a new way to determine the operations to perform on the fields	2024-06-05 17:30:07 +02:00
Clément Renault	0f578348f1	Introduce a dedicated function to write proximity entries in database	2024-06-05 17:30:07 +02:00
Clément Renault	fad4675abe	Give the settings diff to the write_typed_chunk_into_index function	2024-06-05 17:30:07 +02:00
Clément Renault	1ab03c4ede	Fix an issue with settings diff and * in the searchable attributes	2024-06-05 17:30:07 +02:00
Clément Renault	0c6e4b2f00	Introducing a new into_del_add_obkv_conditional_operation function	2024-06-05 17:30:07 +02:00
Clément Renault	42b3f52ef9	Introduce the SettingDiff only_additional_fields method	2024-06-05 17:30:07 +02:00
meili-bors[bot]	93f5defedc	Merge #4656 4656: Adding a new `searchableAttribute` no longer re-index all the attributes r=ManyTheFish a=Kerollmops Fixes #4492. ## To Do - [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times - [ ] Add tests Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-06-05 14:51:14 +00:00
ManyTheFish	33241a6b12	Fix condition mistake	2024-06-05 16:00:24 +02:00
ManyTheFish	ff87b4db26	Avoid running proximity when only the exact attributes changes	2024-06-05 12:48:44 +02:00
ManyTheFish	ba9fadc8f1	Put only_additional_fields to None if the difference gives an empty result.	2024-06-05 10:51:16 +02:00
ManyTheFish	d29d4f88da	Skip iterating over documents when the faceted field list doesn't change	2024-06-04 15:31:24 +02:00
ManyTheFish	17c5ceeb9d	iterate over the faceted fields instead of over the whole document	2024-06-04 14:04:20 +02:00
meili-bors[bot]	fc584f1db3	Merge #4666 4666: Add a score threshold search parameter r=ManyTheFish a=dureuill # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4609 ## What does this PR do? - See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17) Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-06-03 08:42:44 +00:00
Louis Dureuil	2b6db6541e	Changes after review	2024-06-03 10:30:00 +02:00
meili-bors[bot]	d6bd88ce4f	Merge #4667 4667: Frequency matching strategy r=Kerollmops a=ManyTheFish # Pull Request ## Related issue Fixes #3773 ## What does this PR do? - add test for matching strategy - implement frequency matching strategy See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00). [Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f). Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-05-30 14:53:31 +00:00
Clément Renault	b9a0ff0dd6	Cache a lot of operations to know if a field must be indexed	2024-05-30 16:18:23 +02:00
Clément Renault	75496af985	Add a span for the prepare_for_documents_reindexing	2024-05-30 12:14:22 +02:00
Clément Renault	0e9eb9eedb	Add a span for the settings diff creation	2024-05-30 12:08:27 +02:00
ManyTheFish	3f1a510069	Add tests and fix matching strategy	2024-05-30 12:02:42 +02:00
Clément Renault	3a78e988da	Reduce the number of complex calls to settings diff functions	2024-05-30 11:23:07 +02:00
Clément Renault	d9e5074189	Introduce a new way to determine the operations to perform on the fields	2024-05-30 11:23:07 +02:00
Clément Renault	bc210bdc00	Introduce a dedicated function to write proximity entries in database	2024-05-30 11:23:06 +02:00
Clément Renault	4bf83f701c	Give the settings diff to the write_typed_chunk_into_index function	2024-05-30 11:23:06 +02:00
Clément Renault	db3887929f	Fix an issue with settings diff and * in the searchable attributes	2024-05-30 11:22:50 +02:00
Clément Renault	9af103a88e	Introducing a new into_del_add_obkv_conditional_operation function	2024-05-30 11:22:49 +02:00
Clément Renault	99211eb375	Introduce the SettingDiff only_additional_fields method	2024-05-30 11:22:49 +02:00
Louis Dureuil	4f03b0cf5b	Add ranking score threshold to similar	2024-05-30 11:20:50 +02:00
Louis Dureuil	c26db7878c	Expose rankingScoreThreshold in API	2024-05-30 10:32:35 +02:00
ManyTheFish	1ab88e10b9	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 16:24:00 +02:00
Louis Dureuil	aac1d769a7	Add ranking_score_threshold to milli	2024-05-29 14:17:09 +02:00
ManyTheFish	abdc4afcca	Implement Frequency matching strategy	2024-05-29 13:59:08 +02:00
Many the fish	e1fbfde6c4	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 11:31:03 +02:00
ManyTheFish	27b75ec648	merge main into v1.8.1	2024-05-29 11:26:07 +02:00
Louis Dureuil	ca6cc4654b	Add similar route	2024-05-28 15:28:19 +02:00
Louis Dureuil	d35278320e	Add support functions for accessing arroy writers and readers	2024-05-28 15:27:43 +02:00
Louis Dureuil	02b3d82c60	filtered_universe accepts index and txn instead of SearchContext	2024-05-28 15:22:12 +02:00
Louis Dureuil	fd2c95999d	Change `validate_document_id` to public and remove extra layer of result	2024-05-28 15:21:19 +02:00
Clément Renault	dc949ab46a	Remove puffin usage	2024-05-27 15:59:14 +02:00
Clément Renault	7f3e51349e	Remove puffin for the dependencies	2024-05-27 15:53:06 +02:00
meili-bors[bot]	19acc65ad2	Merge #4646 4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops This PR implements what is described in #4485. It reduces the number of disk writes and disk usage. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-23 16:06:50 +00:00
Clément Renault	fe17c0f52e	Construct the minimal OBKVs according to the settings diff	2024-05-23 11:23:57 +02:00
Clément Renault	bc5663e673	FieldIdsMap no longer useful thanks to #4631	2024-05-22 16:06:15 +02:00
Louis Dureuil	8a941c0241	Smaller review changes	2024-05-22 14:44:42 +02:00
Louis Dureuil	3412e7fbcf	"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0	2024-05-22 12:25:21 +02:00
Louis Dureuil	16037e2169	Don't remove embedders that are not in the config from the document DB	2024-05-22 12:24:51 +02:00
Louis Dureuil	8f7c8ca7f0	Remove now unused error variant	2024-05-22 12:23:43 +02:00
Clément Renault	500ddc76b5	Make the flattened sorter optional	2024-05-21 16:16:36 +02:00
Clément Renault	943f8dba0c	Make clippy happy	2024-05-21 14:58:41 +02:00
Clément Renault	1aa8ed9ef7	Make the original sorter optional	2024-05-21 14:53:26 +02:00
ManyTheFish	f762307838	Fix clippy	2024-05-21 13:44:20 +02:00
ManyTheFish	3e94a90722	Fixes	2024-05-21 13:39:46 +02:00
Louis Dureuil	b17cb56dee	Test array of vectors	2024-05-20 14:44:10 +02:00
ManyTheFish	fc7e817221	Index geo points based on the settings differences	2024-05-20 12:27:26 +02:00
Louis Dureuil	d05d49ffd8	Fix tests	2024-05-20 10:36:18 +02:00
Louis Dureuil	0462ebbe58	Don't write an empty _vectors field	2024-05-20 10:36:18 +02:00
Louis Dureuil	2f7a8a4efb	Don't write vectors that weren't autogenerated in document DB	2024-05-20 10:36:18 +02:00
Louis Dureuil	52d9cb6e5a	Refactor vector indexing - use the parsed_vectors module - only parse `_vectors` once per document, instead of once per embedder per document	2024-05-20 10:36:17 +02:00
Louis Dureuil	261de888b7	Add function to get the embeddings of a document in an index	2024-05-20 10:36:17 +02:00
Louis Dureuil	98c811247e	Add parsed vectors module	2024-05-20 10:25:59 +02:00
Tamo	273c6e8c5c	uses the latest version of heed to get rid of unsafe code	2024-05-16 18:31:32 +02:00
Tamo	897d25780e	update milli to latest version	2024-05-16 18:31:32 +02:00
Tamo	f2d0a59f1d	when no searchable attributes are defined, makes all the weights equal to zero	2024-05-16 01:06:33 +02:00
Tamo	c78a2fa4f5	rename method and variable around the attributes to search on feature	2024-05-15 18:04:42 +02:00
Tamo	5542f1d9f1	get back to what we were doing before in the DB cache and with the restricted field id	2024-05-15 18:00:39 +02:00
Tamo	ad4d8502b3	stops storing the whole fieldids weights map when no searchable are defined	2024-05-15 17:16:10 +02:00
Tamo	7ec4e2a3fb	apply all style review comments	2024-05-15 15:02:26 +02:00
Tamo	9fffb8e83d	make clippy happy	2024-05-14 17:36:32 +02:00
Tamo	caa6a7149a	make the attribute ranking rule use the weights and fix the tests	2024-05-14 17:36:32 +02:00
Tamo	a0082c4df9	add a failing test on the attribute ranking rule	2024-05-14 17:00:02 +02:00
Tamo	b0afe0972e	stop updating the fields ids map when fields are only swapped	2024-05-14 17:00:02 +02:00
Tamo	9ecde41853	add a test on the current behaviour	2024-05-14 17:00:02 +02:00
Tamo	685f452fb2	Fix the indexing of the searchable	2024-05-14 17:00:02 +02:00
Tamo	4e4a1ddff7	gate a test behind the required feature	2024-05-14 17:00:02 +02:00
Tamo	c22460045c	Stops returning an option in the internal searchable fields	2024-05-14 17:00:02 +02:00
Clément Renault	ac4bc143c4	Bump ureq to v2.9.7	2024-05-07 10:39:38 +02:00
meili-bors[bot]	4d5971f343	Merge #4621 4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-06 13:46:39 +00:00
Louis Dureuil	f4dd73ec8c	Destructure EmbedderOptions so we don't miss some options	2024-05-02 15:39:36 +02:00
ManyTheFish	88174b8ae4	Update charabia v0.8.10	2024-04-30 14:30:23 +02:00
meili-bors[bot]	ebca29f3de	Merge #4597 4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish # Pull Request - add some conditions reducing the work done when changing the settings - add some benchmarks on embedders ## Related issue Fixes #4585 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-04-25 16:37:28 +00:00
meili-bors[bot]	c793b6ef6d	Merge #4600 4600: Fix embedders api r=ManyTheFish a=ManyTheFish # Pull Request ## Related issue Fixes #4594 Fixes #4595 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-04-25 13:16:33 +00:00
Clément Renault	d4aeff92d0	Introduce the ThreadPoolNoAbort wrapper	2024-04-24 16:40:12 +02:00
ManyTheFish	9b76501875	Display set API key for Ollama embedder	2024-04-24 12:33:07 +02:00
Clément Renault	b3173d0423	Remove useless dots in the error messages	2024-04-22 18:09:33 +02:00
Clément Renault	96cc5319c8	Introduce a new internal error type to categorize panics	2024-04-22 18:09:33 +02:00
Clément Renault	0c7003c5df	Introduce an atomic to catch panics in thread pools	2024-04-22 18:09:33 +02:00
ManyTheFish	a1aa999026	Add conditions reducing work	2024-04-22 14:18:35 +02:00
ManyTheFish	c71b5d09ff	Update charabia v0.8.9	2024-04-18 11:38:26 +02:00
writegr	ab43a8a949	chore: fix some typos in comments Signed-off-by: writegr <wellweek@outlook.com>	2024-04-18 14:12:52 +08:00
meili-bors[bot]	4a8459b799	Merge #4576 4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire # Pull Request ## Related issue Fixes #4575 ## What does this PR do? - increase the default search time budget from 150ms to 1.5s Co-authored-by: Tamo <tamo@meilisearch.com>	2024-04-17 16:04:47 +00:00
Clément Renault	c923adf222	Fix facet distribution for alpha on facet numbers	2024-04-17 16:31:16 +02:00
ManyTheFish	df29ba709a	Make some cleaning in Arcs	2024-04-17 12:33:25 +02:00
ManyTheFish	3acfab2eb7	Fix PR comments	2024-04-17 10:55:51 +02:00
Tamo	19137be0ea	increase the default search time budget from 150ms to 1.5s	2024-04-16 18:09:49 +02:00
ManyTheFish	87a93ba47d	fix clippy	2024-04-16 14:39:30 +02:00
ManyTheFish	eaf113ef34	Fix word pair proximity error when nothing has to be extracted	2024-04-16 14:39:30 +02:00
ManyTheFish	e5ae337aae	Come back to sorters in extract_word_docids: using buffers and merging the keys manually is less efficient	2024-04-16 14:39:30 +02:00
ManyTheFish	a489b406b4	fix test	2024-04-16 14:39:06 +02:00
ManyTheFish	02c3d6b265	finish work	2024-04-16 14:39:06 +02:00
ManyTheFish	b5e4a55af6	refactor faceted and searchable pipeline	2024-04-16 14:39:06 +02:00
ManyTheFish	a7e368aaa6	Create InnerIndexSettingsDiffs struct and populate it	2024-04-16 14:39:06 +02:00
ManyTheFish	893200ab87	Avoid clearing documents in transform	2024-04-16 14:39:06 +02:00
ManyTheFish	aabce52b1b	Fix test	2024-04-16 14:39:06 +02:00
ManyTheFish	8fff5fc281	update tests	2024-04-16 14:39:06 +02:00
yudrywet	cf864a1c2e	chore: fix some typos in comments Signed-off-by: yudrywet <yudeyao@yeah.net>	2024-04-14 20:11:34 +08:00
Louis Dureuil	89e72fab32	Update grenad to fix rare DB corruption	2024-04-11 21:06:59 +02:00
meili-bors[bot]	b1844b0c27	Merge #4548 4548: v1.8 hybrid search changes r=dureuill a=dureuill Implements the search changes from the [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#40f24df3da694428a39cc8043c9cfc64) ### ⚠️ Breaking changes in an experimental feature: - Removed the `_semanticScore`. Use the `_rankingScore` instead. - Removed `vector` in the response of the search (output was too big). - Removed all the vectors from the `vectorSort` ranking score details - target vector appearing in the name of the rule - matched vector appearing in the details of the rule ### Other user-facing changes - Added `semanticHitCount`, indicating how many hits were returned from the semantic search. This is especially useful in the hybrid search. - Embed lazily: Meilisearch no longer generates an embedding when the keyword results are "good enough". - Graceful embedding failure in hybrid search: when doing hybrid search (`semanticRatio in ]0.0, 1.0[`), an embedding failure no longer causes the search request to fail. Instead, only the keyword search is performed. When doing a full vector search (`semanticRatio==1.0`), a failure to embed will still result in failing that search. Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-04-04 16:00:20 +00:00
Louis Dureuil	1ff2a2d6fb	Add semanticHitCount	2024-04-04 16:04:06 +02:00
Louis Dureuil	3c6e9851a4	Correct error formatting	2024-04-04 15:58:19 +02:00
Louis Dureuil	466d718a05	Fix test	2024-04-04 15:58:19 +02:00
Louis Dureuil	6ebb6b55a6	Lazily embed, don't fail hybrid search on embedding failure	2024-04-04 15:58:17 +02:00
Louis Dureuil	fabc9cf14a	milli: add Embedder::embed_one	2024-04-04 15:57:29 +02:00
Louis Dureuil	00c4ed3bc2	milli: refactor getting embedder and embedder name	2024-04-04 15:57:29 +02:00
Louis Dureuil	928e6e4c05	Breaking change: remove vector for score details	2024-04-04 15:57:29 +02:00
meili-bors[bot]	339a5e3431	Merge #4549 4549: Hugging Face embedder improvements r=dureuill a=dureuill Architectural changes/Internal improvements ### 1. Prefer safetensors weights over pytorch weights when available safetensors weights are memory mapped, which reduces memory usage of supported models. ### 2. Update candle Updates candle to `0.4.1`, now targeting crates.io and the tokenizers to `v0.15.2` (still on github). This might fix https://github.com/meilisearch/meilisearch/issues/4399 thanks to the now included https://github.com/huggingface/candle/issues/1454 Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-04-04 13:47:18 +00:00
meili-bors[bot]	5509bafff8	Merge #4535 4535: Support Negative Keywords r=ManyTheFish a=Kerollmops This PR fixes #4422 by supporting `-` before any word in the query. The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D). It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept: - If you input `progamer -progamer`, the engine will still search for `pro gamer`. - If you have the synonym `like = love` and you input `like -like`, it will still search for `love`. ## TODO - [x] Add analytics - [x] Add support to the `-` operator - [x] Make sure to support spaces around `-` well - [x] Support phrase negation - [x] Add tests Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-04-04 13:10:27 +00:00
Louis Dureuil	58cafcc824	Update candle	2024-04-03 13:11:56 +02:00
meili-bors[bot]	56bf8503db	Merge #4537 4537: Expose distribution shift in settings r=ManyTheFish a=dureuill See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802) # Changes - Distribution shift added to all embedders. - Exposed in settings - Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-04-03 09:08:58 +00:00
Louis Dureuil	a1eccc762a	Prefer safetensors to pytorch when both are available	2024-04-03 11:05:59 +02:00
meili-bors[bot]	75f81a0bab	Merge #4547 4547: Fix milli/Cargo.toml for usage as dependency via git r=dureuill a=Toromyx # Pull Request ## Related issues/discussions This enables the usage of `milli` [via git repository](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) as mentioned in <https://github.com/meilisearch/meilisearch/issues/3367#issuecomment-1422613815>, <https://github.com/meilisearch/meilisearch/discussions/1523#discussioncomment-1039338>, and <https://github.com/meilisearch/meilisearch/discussions/1981#discussioncomment-1771568> ## What does this PR do? Trying to depend on `milli` like ``` [dependencies.milli] git = "https://github.com/meilisearch/meilisearch.git" tag = "v1.7.4" ``` leads to the following error: ``` error: failed to select a version for the requirement `candle-core = "^0.3.1"` candidate versions found which didn't match: 0.4.2 location searched: Git repository https://github.com/huggingface/candle.git required by package `milli v1.7.4 (https://github.com/meilisearch/meilisearch.git?tag=v1.7.4#0259ad60)` ``` because the default branch of <https://github.com/huggingface/candle> does not contain the correct version. To fix this, I added a `rev="..."` entry in the relevant dependencies, specifying the commit already present in the `Cargo.lock` file. I also updated the version to the one in the Cargo.lock. This also updated `candle-kernels` sub-dependency from 0.3.1 to 0.3.3 which is probably correct? ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Thomas Gauges <thomas.gauges@gmail.com>	2024-04-03 07:31:36 +00:00
Thomas Gauges	d55d496250	Fix milli/Cargo.toml for usage as dependency via git	2024-04-02 15:19:30 +02:00
redistay	182cb42953	chore: fix some typos in comments Signed-off-by: redistay <wujunjing@outlook.com>	2024-04-02 19:37:55 +08:00
meili-bors[bot]	92a049c2dd	Merge #4543 4543: Bring back changes from v1.7.4 into main r=Kerollmops a=dureuill Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: dureuill <dureuill@users.noreply.github.com>	2024-03-28 16:53:51 +00:00
Clément Renault	877f4b1045	Support negative phrases	2024-03-28 15:51:43 +01:00
Louis Dureuil	796213af9a	Merge branch 'main' into tmp-release-v1.7.4	2024-03-28 10:51:49 +01:00
Clément Renault	69f8b2730d	Fix the tests	2024-03-28 10:47:04 +01:00
Louis Dureuil	ee8cbea810	Don't optimize reindexing when fields contain dots	2024-03-27 17:04:45 +01:00
Louis Dureuil	572fb3a51d	Finer granularity for embedder needs reindex	2024-03-27 12:01:34 +01:00
Louis Dureuil	4ff0255783	remove unused function	2024-03-27 11:51:14 +01:00
Louis Dureuil	a25456120d	Expose distribution in settings	2024-03-27 11:51:04 +01:00
Louis Dureuil	168ded3b9d	Deserr for distribution	2024-03-27 11:50:33 +01:00
Louis Dureuil	afd1da5642	Add distribution to all embedders	2024-03-27 11:50:22 +01:00
Clément Renault	34262c7a0d	Add analytics for the negative operator	2024-03-26 18:01:27 +01:00
Clément Renault	1da9e0f246	Better support space around the negative operator (-)	2024-03-26 17:47:13 +01:00
Clément Renault	e4a3e603b3	Expose a first working version of the negative keyword	2024-03-26 17:47:13 +01:00
Louis Dureuil	817ccc089a	also allow `api_key`	2024-03-25 11:50:00 +01:00
Louis Dureuil	4136630ea5	Use constants instead of raw strings in set_*set()	2024-03-25 11:39:33 +01:00
Louis Dureuil	58972f35cb	Allow `url` parameter for ollama embedder	2024-03-25 11:32:55 +01:00
Louis Dureuil	dfa5e41ea6	Check validity of the URL setting	2024-03-25 11:23:16 +01:00
Louis Dureuil	a1db342f01	Expose REST embedder to the API	2024-03-25 11:23:15 +01:00
Louis Dureuil	f87747f4d3	Remove unwraps	2024-03-25 11:23:04 +01:00
Louis Dureuil	b6b4b6bab7	Remove the tokio and the reqwests	2024-03-25 11:23:03 +01:00
Louis Dureuil	ac52c857e8	Update ollama and openai impls to use the rest embedder internally	2024-03-25 11:23:03 +01:00
Louis Dureuil	8708cbef25	Add RestEmbedder	2024-03-25 11:23:03 +01:00
Louis Dureuil	c3d02f092d	OpenAI sync	2024-03-25 11:23:03 +01:00
Louis Dureuil	bc58e8a310	Documentation for the vector module	2024-03-25 11:23:03 +01:00
meili-bors[bot]	ec81c2bf1a	Merge #4511 4511: Bump charabia to 0.8.8 r=ManyTheFish a=6543 ... and update lock file this will add the fix (https://github.com/meilisearch/charabia/pull/275) to support markdown formatted codeblocks Co-authored-by: 6543 <6543@obermui.de>	2024-03-25 09:26:11 +00:00
meili-bors[bot]	fc1c3f4a29	Merge #4466 4466: Implements the search cutoff r=irevoire a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4488 ## What does this PR do? - Adds a cutoff to the bucket sort after 150ms has been spent - Adds a new setting to customize the default value of 150ms - When the time is exceeded, we exit early with what we had the time to sort - If the cutoff has been reached, the search details are updated with a new `Skip` ranking details for the ranking rules that were skipped - Adds analytics to measure the total number of degraded search requests - Adds the number of degraded search requests to the Prometheus metrics and Grafana dashboard - The cutoff must not skip the filters; otherwise, we would leak documents to people who don’t have the right to see them Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-20 13:06:53 +00:00
6543	4628b7b7bd	bump charabia to 0.8.8 and update lock file	2024-03-20 13:39:00 +01:00
Tamo	c5322df519	Revert "Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1""	2024-03-20 10:08:28 +01:00
Tamo	6079141ea6	snapshot the scores side by side with the score details	2024-03-19 18:30:14 +01:00
Tamo	2c3af8e513	query the detailed score detail in the test	2024-03-19 18:09:02 +01:00
Louis Dureuil	098ab594eb	A score of 0.0 is now lesser than a sort result handles the niche case 🐩 in the hybrid search where: 1. a sort ranking rule is the first rule. 2. the keyword search is skipped at the first rule. 3. the semantic search is not skipped at the first rule. Previously, we would have the skipped search winning, whereas we want the non skipped one winning.	2024-03-19 17:32:32 +01:00
Tamo	567194b925	Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1" This reverts commit `bd74cce86a`, reversing changes made to `d2f77e88bd`.	2024-03-19 16:56:21 +01:00
Tamo	d8fe4fe49d	return the order in the score details	2024-03-19 15:45:04 +01:00
Tamo	7b9e0d2944	forward the degraded parameter to the hybrid search	2024-03-19 15:11:21 +01:00
Tamo	bfec9468d4	Update milli/src/search/mod.rs Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-19 14:49:15 +01:00
Clément Renault	bd74cce86a	Merge remote-tracking branch 'origin/main' into release-v1.7.1	2024-03-19 13:39:17 +01:00
Tamo	b8cda6c300	fix the search cutoff and add a test	2024-03-19 10:35:47 +01:00
Tamo	d1db495119	add a setting for the search cutoff	2024-03-19 10:28:23 +01:00
Tamo	4a467739cd	implements a first version of the cutoff without settings	2024-03-19 10:28:21 +01:00
Louis Dureuil	a302e258bd	Don't display dimensions as 0 when it is not set	2024-03-18 16:10:12 +01:00
shuangcui	5c95b5c933	chore: remove repetitive words Signed-off-by: shuangcui <fliter@qq.com>	2024-03-14 21:28:55 +08:00
meili-bors[bot]	abd954755d	Merge #4476 4476: Make the `/facet-search` route use the `sortFacetValuesBy` setting r=irevoire a=Kerollmops This PR fixes #4423 by ensuring that the `/facet-search` route uses the `sortFacetValuesBy` setting. Note for the documentation team (to be moved in the tracking issue): Using the new `sortFacetValuesBy` setting can slow down the facet-search requests as Meilisearch iterates over the whole list of facet values and computes the count of documents on every entry. That is hardly or even impossible to optimize correctly. ### TODO - [x] Create a custom HashMap wrapper for the facet `OrderBy` settings. This wrapper will return the `OrderBy` setting of the facet, if not defined will use the default `*` one, and if not there either (strange) will fall back on the lexicographic one. - [x] Create a `ValuesCollection` wrapper that implements the logic for the lexicographic and count order by. - [x] Use it when there is no search query. - [x] Use it when there is a search query with and without allowed typos. - [x] Do not change the original logic, only use a wrapper. - [x] Add tests Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-03-13 14:36:14 +00:00
Clément Renault	f3fc2bd01f	Address some issues with preallocations	2024-03-13 15:22:14 +01:00
Clément Renault	e0dac5a22f	Simplify the algorithm by using the new facet values collection wrapper	2024-03-13 11:31:34 +01:00
Clément Renault	b918b55c6b	Introduce a new facet value collection wrapper to simplify the usage	2024-03-13 11:31:34 +01:00
Clément Renault	306b25ad3a	Move the searchForFacetValues struct into a dedicated module	2024-03-13 10:24:21 +01:00
Clément Renault	9f7a4fbfeb	Return the facets of a placeholder facet-search sorted by count	2024-03-13 10:09:01 +01:00
meili-bors[bot]	5ed7b6a0b2	Merge #4456 4456: Add Ollama as an embeddings provider r=dureuill a=jakobklemm # Pull Request ## Related issue [Related Discord Thread](https://discord.com/channels/1006923006964154428/1211977150316683305) ## What does this PR do? - Adds Ollama as a provider of Embeddings besides HuggingFace and OpenAI under the name `ollama` - Adds the environment variable `MEILI_OLLAMA_URL` to set the embeddings URL of an Ollama instance with a default value of `http://localhost:11434/api/embeddings` if no variable is set - Changes some of the structs and functions in `openai.rs` to be public so that they can be shared. - Added more error variants for Ollama specific errors - It uses the model `nomic-embed-text` as default, but any string value is allowed, however it won't automatically check if the model actually exists or is an embedding model Tested against Ollama version `v0.1.27` and the `nomic-embed-text` model. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Jakob Klemm <jakob@jeykey.net> Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>	2024-03-13 08:48:47 +00:00
Louis Dureuil	ae67d5eef0	Update milli/src/vector/error.rs Fix Meilisearch capitalization	2024-03-13 09:45:04 +01:00
Jakob Klemm	88bc9556a9	Add Ollama dimension inference and add clearer errors Instead of the user manually specifying the model dimensions it will now automatically get determined Just like with hf.rs the word "test" gets embedded to determine the dimensions of the output Add a dedicated error type for if the model doesn't exist (don't automatically pull it though) and set the fault of that error to be the user	2024-03-12 19:59:11 +01:00
Clément Renault	ca4876fd10	Do not reindex when modifying unknown faceted field	2024-03-12 16:18:58 +01:00
Clément Renault	d3a95ea2f6	Introduce a new OrderByMap struct to simplify the sort by usage	2024-03-12 13:56:56 +01:00
Clément Renault	69c118ef76	Extract the facet order before extracting the facets values	2024-03-12 10:35:39 +01:00
meili-bors[bot]	ee3076d5ba	Merge #4462 4462: Divide threshold by ten r=dureuill a=ManyTheFish Change the facet incremental vs bulk indexing threshold to better fit our user needs, it might be changed in the future if we have more insights Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-03-06 13:05:38 +00:00
meili-bors[bot]	ab1224bfa7	Merge #4458 4458: Replace logging timer by spans r=Kerollmops a=dureuill - Remove logging timer dependency. - Replace last uses in search by spans Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-05 16:43:23 +00:00
meili-bors[bot]	eefc1c421e	Merge #4459 4459: Put a bound on OpenAI timeout r=dureuill a=dureuill # Pull Request ## Related issue Fixes #4460 ## What does this PR do? - Makes sure that the timeout of the openai embedder is limited to max 1min, rather than the prior 15min+ Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-05 15:18:51 +00:00
Louis Dureuil	0c216048b5	Cap timeout duration	2024-03-05 12:19:25 +01:00
Louis Dureuil	36d17110d8	openai: Handle BAD_GATEWAY, be more resilient to failure	2024-03-05 12:18:54 +01:00
Louis Dureuil	25f64ce7df	Replace logging timer by spans	2024-03-05 11:05:42 +01:00
Louis Dureuil	b11df7ec34	Meilisearch: fix some wrong spans	2024-03-05 10:11:43 +01:00
ManyTheFish	eada6de261	Divide threshold by ten	2024-03-04 18:02:54 +01:00
Jakob Klemm	d3004d8040	Implemented Ollama as an embeddings provider. Initial prototype of Ollama embeddings actually working, error handling / retries still missing. Allow model to be any String and require dimensions parameter. Fixed rustfmt formatting issues: there were some formatting issues in the initial PR, and this should now make the changes comply with the Rust style guidelines. Because I accidentally didn't follow the style guide for commits in my commit messages, I squashed them into one to comply.	2024-03-04 15:09:43 +01:00
Louis Dureuil	452a343a2b	Fix imports	2024-02-28 18:09:40 +01:00
meili-bors[bot]	b87485e80d	Merge #4433 4433: Enhance facet incremental r=Kerollmops a=ManyTheFish # Pull Request ## Related issue Fixes #4367 Fixes #4409 ## What does this PR do? - Add a test reproducing #4409 - Fix #4409 by removing a document from a level only if it is no longer present in any of the linked sub-level nodes - Optimize facet incremental indexing by creating or deleting a complete level once per field id instead of for each facet value - Optimize facet incremental indexing by doing the additions and the deletions in the same process instead of doing them separately Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-02-28 15:28:46 +00:00
ManyTheFish	5e83bac448	Fix PR comments	2024-02-26 15:40:15 +01:00
Louis Dureuil	55796406c5	Add GPU analytics	2024-02-26 10:41:47 +01:00
ManyTheFish	a493a50825	Fix clippy	2024-02-22 14:53:33 +01:00
ManyTheFish	9d1f489a37	Fix facet incremental indexing	2024-02-21 18:42:16 +01:00
meili-bors[bot]	d34692e30b	Merge #4365 4365: Update charabia r=dureuill a=ManyTheFish Update Charabia v0.8.7, - Add Vietnamese Normalization (Ð and Đ into d) Fixes #4357 Charabia versions: - https://github.com/meilisearch/charabia/releases/tag/v0.8.6 - https://github.com/meilisearch/charabia/releases/tag/v0.8.7 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-02-14 16:57:25 +00:00
ManyTheFish	78e04520fc	Update charabia version	2024-02-14 15:16:16 +01:00
ManyTheFish	03bb6372af	Change is_batchable_with by mergeable_with	2024-02-14 11:50:22 +01:00
ManyTheFish	3beda8833d	Fix and add logs	2024-02-14 11:46:30 +01:00
ManyTheFish	55e942cd45	buggy	2024-02-13 15:26:30 +01:00
ManyTheFish	48026aa75c	fix PR comments	2024-02-13 15:19:01 +01:00
Many the fish	e5e811e2c9	Update milli/src/update/index_documents/extract/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-13 14:22:21 +01:00
Many the fish	55de96f74e	Update milli/src/update/facet/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-13 14:22:10 +01:00
ManyTheFish	39c83cb3d9	fix clippy	2024-02-12 09:12:54 +01:00
Louis Dureuil	7efb1cae11	yield in loop when the channel is not disconnected	2024-02-12 09:12:54 +01:00
Louis Dureuil	7877788510	fix logs	2024-02-12 09:12:54 +01:00
ManyTheFish	be1b054b05	Compute chunk size based on the input data size and the number of indexing threads	2024-02-08 17:28:37 +01:00
meili-bors[bot]	023c2d755f	Merge #4391 4391: Tracing r=dureuill a=irevoire # Pull Request - [ ] Hide the parameters of the process batch - [x] Make actix-web trace every call on every route - [x] Remove all `env_logger`/`logs` dependencies - [x] Be able to enable or disable the memory measurement using the `/logs` route parameters See the following product discussion: https://github.com/orgs/meilisearch/discussions/721 Supersedes https://github.com/meilisearch/meilisearch/pull/4338 ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4317 ## What does this PR do? Update the format of the logs from: ``` [2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers ``` to ``` 2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers ``` First, run meilisearch with the route enabled via the feature flag: - `cargo run --experimental-enable-logs-route` - Or at runtime by sending the following payload: ``` curl \ -X PATCH 'http://localhost:7700/experimental-features/' \ -H 'Content-Type: application/json' \ --data-binary '{ "logsRoute": true }' ``` Then gather data from meilisearch by calling for example: ``` curl \ -X POST http://localhost:7700/logs \ -H 'Content-Type: application/json' \ --data-binary '{ "mode": "fmt", "target": "milli=trace" }' ``` Once your operation is over, tell meilisearch to stop the route: ``` curl \ -X DELETE http://localhost:7700/logs ``` ---- In the case you’re profiling code, you will be interested in the next command, which converts the output of the route to a format that the Firefox profiler can understand. ```bash cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json ``` Then go to https://profiler.firefox.com and load it. Note that we can also share the profiles using the https://share.firefox.dev website. Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2024-02-08 14:16:56 +00:00
Louis Dureuil	407ad753ed	rust fmt	2024-02-08 15:11:42 +01:00
Tamo	bf43a3f60a	fix typo	2024-02-08 15:04:06 +01:00
Tamo	1502382316	use debug instead of debug_span	2024-02-08 15:04:06 +01:00
Tamo	08af0e690c	Structures a bunch of logs	2024-02-08 15:04:06 +01:00
Louis Dureuil	db722d201a	Write entries into database downgraded to trace level	2024-02-08 15:04:05 +01:00
Tamo	e773dfa9ba	get rid of log in milli and add logs for the bucket sort	2024-02-08 15:04:05 +01:00
Louis Dureuil	5d7061682e	Add tracing to milli	2024-02-08 15:03:31 +01:00
meili-bors[bot]	72ebac1fbb	Merge #4388 4388: Cap the maximum memory of the grenad sorters r=curquiza a=Kerollmops This PR clamps the memory usage of the grenad sorters to a reasonable maximum. Grenad sorters are opened on multiple threads at a time. This can result in higher memory usage than expected, even though it shouldn't consume more than the memory available. Fixes #4152. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-08 13:19:28 +00:00
Louis Dureuil	a1caac9bfb	Correct distribution shifts for new models	2024-02-07 15:09:16 +01:00
Louis Dureuil	88d03c56ab	Don't accept dimensions of 0 (ever) or dimensions greater than the default dimensions of the model	2024-02-07 11:52:09 +01:00
Louis Dureuil	32ee05ccef	Fix default dimensions for models	2024-02-07 11:52:09 +01:00
Louis Dureuil	74c180267e	pass dimensions only when defined	2024-02-07 11:52:08 +01:00
Louis Dureuil	517f5332d6	Allow actually passing `dimensions` for OpenAI source -> make sure the settings change is rejected or the settings task fails when the specified model doesn't support overriding `dimensions` and the passed `dimensions` differs from the model's default dimensions.	2024-02-07 11:51:44 +01:00
Louis Dureuil	9ac5750096	Retrieve the overridden dimensions from the configuration when fetching settings	2024-02-07 11:51:44 +01:00
Louis Dureuil	7ae4013478	Make sure the overridden dimensions are always used when embedding	2024-02-07 11:51:44 +01:00
Gosti	fb705116a6	feat: add new models and ability to override dimensions	2024-02-07 11:51:42 +01:00
Clément Renault	053306c0e7	Try with 500MiB	2024-02-07 11:24:43 +01:00
Clément Renault	9eeb75d501	Clamp the max memory of the grenad sorters to a reasonable maximum	2024-02-06 10:47:04 +01:00
Louis Dureuil	fbf5f2a392	Don't use a runtime in extract_embedder, use it only for OpenAI	2024-02-01 10:33:27 +01:00
Louis Dureuil	1555870088	Truncate HuggingFace vectors that are too long	2024-02-01 10:33:27 +01:00
Tamo	9f8f3105d5	make clippy happy	2024-02-01 10:33:27 +01:00
Tamo	318843aacd	add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there are malformed documents in the DB	2024-02-01 10:33:27 +01:00
Louis Dureuil	dff2707471	Use MatchingWords from keyword search instead of the one from vector search	2024-02-01 10:33:27 +01:00
Tamo	c1bf33a112	Revert "Remove panic on the geosearch"	2024-01-25 18:51:19 +01:00
Louis Dureuil	f692021bfc	Implement PR comments	2024-01-22 10:25:56 +01:00

2525 Commits