meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-26 20:15:07 +08:00

Author	SHA1	Message	Date
Clément Renault	794ebcd582	Replace grenad with the new grenad various-improvement branch	2024-08-30 11:53:59 +02:00
Clément Renault	b7c77c7a39	Use the latest version of the obkv crate	2024-08-30 11:53:59 +02:00
Clément Renault	0c57cf7565	Replace obkv with the temporary new version of it	2024-08-30 11:53:58 +02:00
Clément Renault	27df9e6c73	Introduce the indexer::index function that runs the indexation	2024-08-30 11:53:58 +02:00
Clément Renault	45c060831e	Introduce typed channels and the merger loop	2024-08-30 11:53:58 +02:00
Clément Renault	874c1ac538	First channels types	2024-08-30 11:53:58 +02:00
Clément Renault	e6ffa4d454	Implement the document merge function for the replace method	2024-08-30 11:53:58 +02:00
Clément Renault	637a9c8bdd	Implement the document merge function for the update method	2024-08-30 11:53:58 +02:00
Louis Dureuil	c683fa98e6	WIP Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-08-30 11:53:57 +02:00
meili-bors[bot]	ee62d9ce30	Merge #4845 4845: Fix perf regression facet strings r=ManyTheFish a=dureuill Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews). [Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3). - Initial investigation pointed to #4819 as the origin of the regression. - Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids` - Most of the slowdown is in `normalize_facet_strings`, and precisely in `detection.language()`. This PR improves the situation (-10s compared with `main` for hackernews, so only +34s regression compared with `v1.9`) by skipping normalization when it can be skipped. I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish` --- Tentative resolution options: 1. remove locale normalization from facet. I'm not sure why this is required, I believe we weren't doing this before, so maybe we can stop doing that again. 2. don't do language detection when it can be helped: won't help with the regressions in benchmark, but maybe we can skip language detection when the locales contain only one language? 3. use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang which bolsters x10 to x100 throughput compared with whatlang. Should we consider replacing whatlang with whichlang? Now I understand whichlang supports fewer languages than whatlang, so I also suggest: 4. use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang cannot detect, then use whatlang instead. --- > [!CAUTION] > this PR contains a commit that adds detailed spans, that were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s overhead, it should be removed before landing. Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-08-19 06:29:48 +00:00
ManyTheFish	0f965d3574	Remove hotloop's spans	2024-08-14 14:33:36 +02:00
ManyTheFish	ade54493ab	Only detect language for a facet if several locales have been specified by the user in the settings	2024-08-14 12:03:52 +02:00
Louis Dureuil	c3cdc407ec	Avoid unnecessary clone()	2024-08-08 14:57:02 +02:00
Louis Dureuil	2f10273d14	Group by normalized values, make sure you don't remove a value where there remains at least one value that normalizes towards it	2024-08-08 14:02:53 +02:00
Louis Dureuil	e64d0e0ca8	use insert instead of push for bitmaps	2024-08-01 18:32:45 +02:00
Louis Dureuil	0e68718027	Add detailed spans	2024-07-31 13:05:47 +02:00
Louis Dureuil	7c3fc8c655	Split settings and document facet string extractions	2024-07-31 10:57:46 +02:00
Louis Dureuil	8acd3f50bb	skip normalization when the locales and values are the same	2024-07-31 09:53:00 +02:00
Louis Dureuil	d4ea7cc2a9	fix clippy 👉👈	2024-07-25 12:10:32 +02:00
Louis Dureuil	2413592bbf	Display docid when there are documents without manual embeddings for a manual embedder	2024-07-25 12:10:32 +02:00
Louis Dureuil	553440632e	Introduce Setting::some_or_not_set	2024-07-25 12:01:52 +02:00
Louis Dureuil	7a347966da	Allow explicit `dimensions` for ollama	2024-07-25 12:01:51 +02:00
Louis Dureuil	4654d51e05	Add custom headers for REST embedder	2024-07-25 12:01:51 +02:00
ManyTheFish	a918561ac1	Fix PR comments	2024-07-25 10:52:56 +02:00
ManyTheFish	04fa44e7eb	Implement localized attributes settings	2024-07-25 10:51:27 +02:00
ManyTheFish	cc02920f2b	Update charabia	2024-07-25 10:51:27 +02:00
Tamo	988552e178	add tests on the rest embedder	2024-07-24 14:34:17 +02:00
Louis Dureuil	0d8199f3b7	Change parameters in milli settings	2024-07-24 14:34:17 +02:00
Louis Dureuil	24240934f9	Improve errors when indexing documents with a user provided embedder	2024-07-16 13:39:01 +02:00
Louis Dureuil	65d0c32aa7	Allow overriding OpenAI's url	2024-07-16 13:39:00 +02:00
Clément Renault	6e80364c50	Apply review comments	2024-07-11 11:00:27 +02:00
Clément Renault	837274f853	Restrict even more the Rhai engine	2024-07-10 16:30:18 +02:00
Clément Renault	aace587dd1	Create errors for the internal processing ones	2024-07-10 16:29:18 +02:00
Clément Renault	81ec0abad1	Use the new rayon-par-bridge library	2024-07-10 16:29:04 +02:00
Clément Renault	b67d385cf0	Parallelize the edition functions	2024-07-10 16:28:54 +02:00
Clément Renault	2eae2015d7	Support aborting documents edition by function	2024-07-10 16:28:15 +02:00
Clément Renault	33fa17bf12	Support deleting documents with functions	2024-07-10 16:28:15 +02:00
Clément Renault	400e6b93ce	Support user-provided context for documents edition	2024-07-10 16:28:15 +02:00
Clément Renault	f4add93043	Limit the number of script operations	2024-07-10 16:28:14 +02:00
Clément Renault	2fae96ac14	Show the actual number of actually edited documents	2024-07-10 16:28:14 +02:00
Clément Renault	45af18ae9c	Check the Rhai syntax before accepting the script	2024-07-10 16:28:13 +02:00
Clément Renault	2d97164d9f	It works perfectly with some Rhai	2024-07-10 16:28:13 +02:00
Clément Renault	efc156a4a4	Executing Lua works correctly	2024-07-10 16:27:36 +02:00
meili-bors[bot]	2099b4f0dd	Merge #4786 4786: Update dependencies r=Kerollmops a=irevoire # Pull Request ## Related issue Fixes #4753 ## What does this PR do? - Update all dependencies except rustls - [x] Release charabia - [x] Update charabia - [x] Double check that the docker build works after updating charabia Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-07-10 13:23:54 +00:00
Tamo	4d5005b01a	make clippy happy	2024-07-10 10:06:59 +02:00
hanbings	0a40a98bb6	Make milli use edition 2021 (#4770 ) * Make milli use edition 2021 * Add lifetime annotations to milli. * Run cargo fmt	2024-07-09 17:25:39 +02:00
Tamo	cd46ebd6b5	remove insta deprecating	2024-07-08 18:38:05 +02:00
Tamo	1693332cab	Update arroy and always build the tree that need to be built	2024-06-24 10:14:03 +02:00
meili-bors[bot]	ddd564665b	Merge #4713 4713: Speed up facet distribution r=ManyTheFish a=Kerollmops This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation. A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-06-24 05:23:46 +00:00
Clément Renault	9736e16a88	Make clippy happy	2024-06-20 13:02:44 +02:00
Louis Dureuil	a04041c8f2	Only spawn the pool once	2024-06-19 16:25:33 +02:00
Louis Dureuil	0a8f50695e	Fixes for Rust v1.79	2024-06-13 17:47:44 +02:00
Louis Dureuil	e35ef31738	Small changes following review	2024-06-13 14:20:48 +02:00
Louis Dureuil	3bc8f81abc	user_provided => regenerate	2024-06-12 18:12:20 +02:00
Louis Dureuil	a89eea233b	Fix vectors injection	2024-06-12 17:10:19 +02:00
Louis Dureuil	f5cf01e7d1	Rework extraction to use EmbedderAction	2024-06-12 14:50:55 +02:00
Louis Dureuil	d1dd7e5d09	In transform for removed embedders, write back their user provided vectors in documents, and clear the writers	2024-06-12 14:50:55 +02:00
Louis Dureuil	d18c1f77d7	Update embedder configs with a finer granularity - no longer clear vector DB between any two embedder changes	2024-06-12 14:50:55 +02:00
Louis Dureuil	7cef2299cf	Fix behavior when removing a document	2024-06-11 09:45:08 +02:00
Tamo	2cdcb703d9	fix the deletion of vectors and add a test	2024-06-06 11:39:29 +02:00
Tamo	d85ab23b82	rename all occurrences of user_defined to user_provided for consistency	2024-06-06 11:39:29 +02:00
Tamo	b7349910d9	implements more review comments	2024-06-06 11:39:29 +02:00
Tamo	376b3a19a7	makes clippy and fmt happy	2024-06-06 11:39:29 +02:00
Tamo	5d50850e12	always push the user defined vectors in arroy	2024-06-06 11:39:29 +02:00
Tamo	a73ccc78a6	forward the embedding config to the extractors	2024-06-06 11:39:28 +02:00
Tamo	9eb6f522ea	wraps the index embedding config in a struct	2024-06-06 11:37:30 +02:00
Tamo	84e498299b	Remove the vectors from the documents database	2024-06-06 11:36:11 +02:00
Tamo	7a84697570	never store the _vectors as searchable or faceted fields	2024-06-06 11:36:11 +02:00
ManyTheFish	30293883e0	Fix condition mistake	2024-06-05 17:30:07 +02:00
ManyTheFish	b833be46b9	Avoid running proximity when only the exact attributes changes	2024-06-05 17:30:07 +02:00
ManyTheFish	0a4118329e	Put only_additional_fields to None if the difference gives an empty result.	2024-06-05 17:30:07 +02:00
ManyTheFish	261e92d7e6	Skip iterating over documents when the faceted field list doesn't change	2024-06-05 17:30:07 +02:00
ManyTheFish	5cd08979b1	iterate over the faceted fields instead of over the whole document	2024-06-05 17:30:07 +02:00
Clément Renault	a998b881f6	Cache a lot of operations to know if a field must be indexed	2024-06-05 17:30:07 +02:00
Clément Renault	b81953a65d	Add a span for the prepare_for_documents_reindexing	2024-06-05 17:30:07 +02:00
Clément Renault	091bb157f1	Add a span for the settings diff creation	2024-06-05 17:30:07 +02:00
Clément Renault	1b639ce44b	Reduce the number of complex calls to settings diff functions	2024-06-05 17:30:07 +02:00
Clément Renault	87cf8a3c94	Introduce a new way to determine the operations to perform on the fields	2024-06-05 17:30:07 +02:00
Clément Renault	0f578348f1	Introduce a dedicated function to write proximity entries in database	2024-06-05 17:30:07 +02:00
Clément Renault	fad4675abe	Give the settings diff to the write_typed_chunk_into_index function	2024-06-05 17:30:07 +02:00
Clément Renault	1ab03c4ede	Fix an issue with settings diff and * in the searchable attributes	2024-06-05 17:30:07 +02:00
Clément Renault	0c6e4b2f00	Introducing a new into_del_add_obkv_conditional_operation function	2024-06-05 17:30:07 +02:00
Clément Renault	42b3f52ef9	Introduce the SettingDiff only_additional_fields method	2024-06-05 17:30:07 +02:00
ManyTheFish	1ab88e10b9	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 16:24:00 +02:00
Many the fish	e1fbfde6c4	Merge branch 'main' into merge-release-v1.8.1-in-main	2024-05-29 11:31:03 +02:00
ManyTheFish	27b75ec648	merge main into v1.8.1	2024-05-29 11:26:07 +02:00
Louis Dureuil	d35278320e	Add support functions for accessing arroy writers and readers	2024-05-28 15:27:43 +02:00
Clément Renault	dc949ab46a	Remove puffin usage	2024-05-27 15:59:14 +02:00
meili-bors[bot]	19acc65ad2	Merge #4646 4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops This PR implements what is described in #4485. It reduces the number of disk writes and disk usage. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-23 16:06:50 +00:00
Clément Renault	fe17c0f52e	Construct the minimal OBKVs according to the settings diff	2024-05-23 11:23:57 +02:00
Clément Renault	bc5663e673	FieldIdsMap no longer useful thanks to #4631	2024-05-22 16:06:15 +02:00
Louis Dureuil	8a941c0241	Smaller review changes	2024-05-22 14:44:42 +02:00
Louis Dureuil	16037e2169	Don't remove embedders that are not in the config from the document DB	2024-05-22 12:24:51 +02:00
Clément Renault	500ddc76b5	Make the flattened sorter optional	2024-05-21 16:16:36 +02:00
Clément Renault	1aa8ed9ef7	Make the original sorter optional	2024-05-21 14:53:26 +02:00
ManyTheFish	f762307838	Fix clippy	2024-05-21 13:44:20 +02:00
ManyTheFish	3e94a90722	Fixes	2024-05-21 13:39:46 +02:00
ManyTheFish	fc7e817221	Index geo points based on the settings differences	2024-05-20 12:27:26 +02:00
Louis Dureuil	d05d49ffd8	Fix tests	2024-05-20 10:36:18 +02:00
Louis Dureuil	0462ebbe58	Don't write an empty _vectors field	2024-05-20 10:36:18 +02:00
Louis Dureuil	2f7a8a4efb	Don't write vectors that weren't autogenerated in document DB	2024-05-20 10:36:18 +02:00
Louis Dureuil	52d9cb6e5a	Refactor vector indexing - use the parsed_vectors module - only parse `_vectors` once per document, instead of once per embedder per document	2024-05-20 10:36:17 +02:00
Tamo	897d25780e	update milli to latest version	2024-05-16 18:31:32 +02:00
Tamo	f2d0a59f1d	when no searchable attributes are defined, make all the weights equal to zero	2024-05-16 01:06:33 +02:00
Tamo	ad4d8502b3	stops storing the whole fieldids weights map when no searchable are defined	2024-05-15 17:16:10 +02:00
Tamo	7ec4e2a3fb	apply all style review comments	2024-05-15 15:02:26 +02:00
Tamo	caa6a7149a	make the attribute ranking rule use the weights and fix the tests	2024-05-14 17:36:32 +02:00
Tamo	b0afe0972e	stop updating the fields ids map when fields are only swapped	2024-05-14 17:00:02 +02:00
Tamo	685f452fb2	Fix the indexing of the searchable	2024-05-14 17:00:02 +02:00
Tamo	4e4a1ddff7	gate a test behind the required feature	2024-05-14 17:00:02 +02:00
Tamo	c22460045c	Stops returning an option in the internal searchable fields	2024-05-14 17:00:02 +02:00
meili-bors[bot]	4d5971f343	Merge #4621 4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-05-06 13:46:39 +00:00
meili-bors[bot]	ebca29f3de	Merge #4597 4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish # Pull Request - add some conditions reducing the work done when changing the settings - add some benchmarks on embedders ## Related issue Fixes #4585 Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-04-25 16:37:28 +00:00
Clément Renault	d4aeff92d0	Introduce the ThreadPoolNoAbort wrapper	2024-04-24 16:40:12 +02:00
Clément Renault	96cc5319c8	Introduce a new internal error type to categorize panics	2024-04-22 18:09:33 +02:00
Clément Renault	0c7003c5df	Introduce an atomic to catch panics in thread pools	2024-04-22 18:09:33 +02:00
ManyTheFish	a1aa999026	Add conditions reducing work	2024-04-22 14:18:35 +02:00
ManyTheFish	df29ba709a	Make some cleaning in Arcs	2024-04-17 12:33:25 +02:00
ManyTheFish	3acfab2eb7	Fix PR comments	2024-04-17 10:55:51 +02:00
ManyTheFish	87a93ba47d	fix clippy	2024-04-16 14:39:30 +02:00
ManyTheFish	eaf113ef34	Fix word pair proximity error when nothing has to be extracted	2024-04-16 14:39:30 +02:00
ManyTheFish	e5ae337aae	Comeback to sorters in extract_word_docids using buffers and merge the keys manually is less efficient	2024-04-16 14:39:30 +02:00
ManyTheFish	a489b406b4	fix test	2024-04-16 14:39:06 +02:00
ManyTheFish	02c3d6b265	finish work	2024-04-16 14:39:06 +02:00
ManyTheFish	b5e4a55af6	refactor faceted and searchable pipeline	2024-04-16 14:39:06 +02:00
ManyTheFish	a7e368aaa6	Create InnerIndexSettingsDiffs struct and populate it	2024-04-16 14:39:06 +02:00
ManyTheFish	893200ab87	Avoid clearing documents in transform	2024-04-16 14:39:06 +02:00
ManyTheFish	aabce52b1b	Fix test	2024-04-16 14:39:06 +02:00
ManyTheFish	8fff5fc281	update tests	2024-04-16 14:39:06 +02:00
yudrywet	cf864a1c2e	chore: fix some typos in comments Signed-off-by: yudrywet <yudeyao@yeah.net>	2024-04-14 20:11:34 +08:00
Louis Dureuil	466d718a05	Fix test	2024-04-04 15:58:19 +02:00
meili-bors[bot]	56bf8503db	Merge #4537 4537: Expose distribution shift in settings r=ManyTheFish a=dureuill See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802) # Changes - Distribution shift added to all embedders. - Exposed in settings - Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-04-03 09:08:58 +00:00
redistay	182cb42953	chore: fix some typos in comments Signed-off-by: redistay <wujunjing@outlook.com>	2024-04-02 19:37:55 +08:00
meili-bors[bot]	92a049c2dd	Merge #4543 4543: Bring back changes from v1.7.4 into main r=Kerollmops a=dureuill Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: dureuill <dureuill@users.noreply.github.com>	2024-03-28 16:53:51 +00:00
Louis Dureuil	796213af9a	Merge branch 'main' into tmp-release-v1.7.4	2024-03-28 10:51:49 +01:00
Louis Dureuil	ee8cbea810	Don't optimize reindexing when fields contain dots	2024-03-27 17:04:45 +01:00
Louis Dureuil	572fb3a51d	Finer granularity for embedder needs reindex	2024-03-27 12:01:34 +01:00
Louis Dureuil	afd1da5642	Add distribution to all embedders	2024-03-27 11:50:22 +01:00
Louis Dureuil	817ccc089a	also allow `api_key`	2024-03-25 11:50:00 +01:00
Louis Dureuil	4136630ea5	Use constants instead of raw strings in set_*set()	2024-03-25 11:39:33 +01:00
Louis Dureuil	58972f35cb	Allow `url` parameter for ollama embedder	2024-03-25 11:32:55 +01:00
Louis Dureuil	dfa5e41ea6	Check validity of the URL setting	2024-03-25 11:23:16 +01:00
Louis Dureuil	a1db342f01	Expose REST embedder to the API	2024-03-25 11:23:15 +01:00
Louis Dureuil	f87747f4d3	Remove unwraps	2024-03-25 11:23:04 +01:00
Louis Dureuil	ac52c857e8	Update ollama and openai impls to use the rest embedder internally	2024-03-25 11:23:03 +01:00
meili-bors[bot]	fc1c3f4a29	Merge #4466 4466: Implements the search cutoff r=irevoire a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4488 ## What does this PR do? - Adds a cutoff to the bucket sort after 150ms has been spent - Adds a new setting to customize the default value of 150ms - When the time is exceeded, we exit early with what we had the time to sort - If the cutoff has been reached, the search details are updated with a new `Skip` ranking details for the ranking rules that were skipped - Adds analytics to measure the total number of degraded search requests - Adds the number of degraded search requests to the Prometheus metrics and Grafana dashboard - The cutoff must not skip the filters; otherwise, we would leak documents to people who don’t have the right to see them Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-20 13:06:53 +00:00
Tamo	c5322df519	Revert "Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1""	2024-03-20 10:08:28 +01:00
Tamo	567194b925	Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1" This reverts commit `bd74cce86a`, reversing changes made to `d2f77e88bd`.	2024-03-19 16:56:21 +01:00
Clément Renault	bd74cce86a	Merge remote-tracking branch 'origin/main' into release-v1.7.1	2024-03-19 13:39:17 +01:00
Tamo	d1db495119	add a settings for the search cutoff	2024-03-19 10:28:23 +01:00
meili-bors[bot]	abd954755d	Merge #4476 4476: Make the `/facet-search` route use the `sortFacetValuesBy` setting r=irevoire a=Kerollmops This PR fixes #4423 by ensuring that the `/facet-search` route uses the `sortFacetValuesBy` setting. Note for the documentation team (to be moved in the tracking issue): Using the new `sortFacetValuesBy` setting can slow down the facet-search requests as Meilisearch iterates over the whole list of facet values and computes the count of documents on every entry. That is hardly or even impossible to optimize correctly. ### TODO - [x] Create a custom HashMap wrapper for the facet `OrderBy` settings. This wrapper will return the `OrderBy` setting of the facet, if not defined will use the default `*` one, and if not there either (strange) will fall back on the lexicographic one. - [x] Create a `ValuesCollection` wrapper that implements the logic for the lexicographic and count order by. - [x] Use it when there is no search query. - [x] Use it when there is a search query with and without allowed typos. - [x] Do not change the original logic, only use a wrapper. - [x] Add tests Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-03-13 14:36:14 +00:00
meili-bors[bot]	5ed7b6a0b2	Merge #4456 4456: Add Ollama as an embeddings provider r=dureuill a=jakobklemm # Pull Request ## Related issue [Related Discord Thread](https://discord.com/channels/1006923006964154428/1211977150316683305) ## What does this PR do? - Adds Ollama as a provider of Embeddings besides HuggingFace and OpenAI under the name `ollama` - Adds the environment variable `MEILI_OLLAMA_URL` to set the embeddings URL of an Ollama instance with a default value of `http://localhost:11434/api/embeddings` if no variable is set - Changes some of the structs and functions in `openai.rs` to be public so that they can be shared. - Added more error variants for Ollama specific errors - It uses the model `nomic-embed-text` as default, but any string value is allowed, however it won't automatically check if the model actually exists or is an embedding model Tested against Ollama version `v0.1.27` and the `nomic-embed-text` model. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Jakob Klemm <jakob@jeykey.net> Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>	2024-03-13 08:48:47 +00:00
Jakob Klemm	88bc9556a9	Add Ollama dimension inference and add clearer errors Instead of the user manually specifying the model dimensions it will now automatically get determined Just like with hf.rs the word "test" gets embedded to determine the dimensions of the output Add a dedicated error type for if the model doesn't exist (don't automatically pull it though) and set the fault of that error to be the user	2024-03-12 19:59:11 +01:00
Clément Renault	ca4876fd10	Do not reindex when modifying unknown faceted field	2024-03-12 16:18:58 +01:00
Clément Renault	d3a95ea2f6	Introduce a new OrderByMap struct to simplify the sort by usage	2024-03-12 13:56:56 +01:00
meili-bors[bot]	ee3076d5ba	Merge #4462 4462: Divide threshold by ten r=dureuill a=ManyTheFish Change the facet incremental vs bulk indexing threshold to better fit our user needs, it might be changed in the future if we have more insights Co-authored-by: ManyTheFish <many@meilisearch.com>	2024-03-06 13:05:38 +00:00
Louis Dureuil	b11df7ec34	Meilisearch: fix some wrong spans	2024-03-05 10:11:43 +01:00
ManyTheFish	eada6de261	Divide threshold by ten	2024-03-04 18:02:54 +01:00
Jakob Klemm	d3004d8040	Implemented Ollama as an embeddings provider Initial prototype of Ollama embeddings actually working, error handling / retries still missing. Allow model to be any String and require dimensions parameter Fixed rustfmt formatting issues There were some formatting issues in the initial PR and this should now make the changes comply with the Rust style guidelines Because I accidentally didn't follow the style guide for commits in my commit messages I squashed them into one to comply	2024-03-04 15:09:43 +01:00
ManyTheFish	5e83bac448	Fix PR comments	2024-02-26 15:40:15 +01:00
ManyTheFish	a493a50825	Fix clippy	2024-02-22 14:53:33 +01:00
ManyTheFish	9d1f489a37	Fix facet incremental indexing	2024-02-21 18:42:16 +01:00
ManyTheFish	03bb6372af	Change is_batchable_with by mergeable_with	2024-02-14 11:50:22 +01:00
ManyTheFish	3beda8833d	Fix and add logs	2024-02-14 11:46:30 +01:00
ManyTheFish	48026aa75c	fix PR comments	2024-02-13 15:19:01 +01:00
Many the fish	e5e811e2c9	Update milli/src/update/index_documents/extract/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-13 14:22:21 +01:00
Many the fish	55de96f74e	Update milli/src/update/facet/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-13 14:22:10 +01:00
ManyTheFish	39c83cb3d9	fix clippy	2024-02-12 09:12:54 +01:00
Louis Dureuil	7efb1cae11	yield in loop when the channel is not disconnected	2024-02-12 09:12:54 +01:00
Louis Dureuil	7877788510	fix logs	2024-02-12 09:12:54 +01:00
ManyTheFish	be1b054b05	Compute chunk size based on the input data size and the number of indexing threads	2024-02-08 17:28:37 +01:00
meili-bors[bot]	023c2d755f	Merge #4391 4391: Tracing r=dureuill a=irevoire # Pull Request - [ ] Hide the parameters of the process batch - [x] Make actix-web trace every call on every route - [x] Remove all `env_logger`/`logs` dependencies - [x] Be able to enable or disable the memory measurement using the `/logs` route parameters See the following product discussion: https://github.com/orgs/meilisearch/discussions/721 Supersedes https://github.com/meilisearch/meilisearch/pull/4338 ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4317 ## What does this PR do? Update the format of the logs from: ``` [2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers ``` to ``` 2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers ``` First, run meilisearch with the route enabled via the feature flag: - `cargo run --experimental-enable-logs-route` - Or at runtime by sending the following payload: ``` curl \ -X PATCH 'http://localhost:7700/experimental-features/' \ -H 'Content-Type: application/json' \ --data-binary '{ "logsRoute": true }' ``` Then gather data from meilisearch by calling for example: ``` curl \ -X POST http://localhost:7700/logs \ -H 'Content-Type: application/json' \ --data-binary '{ "mode": "fmt", "target": "milli=trace" }' ``` Once your operation is over, tell meilisearch to stop the route: ``` curl \ -X DELETE http://localhost:7700/logs ``` ---- In the case you’re profiling code, you will be interested by the next command that converts the output of the route to a format that the firefox profiler can understand. ```bash cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json ``` Then go to https://profiler.firefox.com and load it. Note that we can also share the profiles using the https://share.firefox.dev website. Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2024-02-08 14:16:56 +00:00
Louis Dureuil	407ad753ed	rust fmt	2024-02-08 15:11:42 +01:00
Tamo	bf43a3f60a	fix typo	2024-02-08 15:04:06 +01:00
Tamo	1502382316	use debug instead of debug_span	2024-02-08 15:04:06 +01:00
Tamo	08af0e690c	Structures a bunch of logs	2024-02-08 15:04:06 +01:00
Louis Dureuil	db722d201a	Write entries into database downgraded to trace level	2024-02-08 15:04:05 +01:00
Tamo	e773dfa9ba	get rid of logs in milli and add logs for the bucket sort	2024-02-08 15:04:05 +01:00
Louis Dureuil	5d7061682e	Add tracing to milli	2024-02-08 15:03:31 +01:00
meili-bors[bot]	72ebac1fbb	Merge #4388 4388: Cap the maximum memory of the grenad sorters r=curquiza a=Kerollmops This PR clamps the memory usage of the grenad sorters to a reasonable maximum. Grenad sorters are opened on multiple threads at a time. This can result in higher memory usage than expected, even though it shouldn't consume more than the memory available. Fixes #4152. Co-authored-by: Clément Renault <clement@meilisearch.com>	2024-02-08 13:19:28 +00:00
Louis Dureuil	88d03c56ab	Don't accept dimensions of 0 (ever) or dimensions greater than the default dimensions of the model	2024-02-07 11:52:09 +01:00
Louis Dureuil	517f5332d6	Allow actually passing `dimensions` for OpenAI source -> make sure the settings change is rejected or the settings task fails when the specified model doesn't support overriding `dimensions` and the passed `dimensions` differs from the model's default dimensions.	2024-02-07 11:51:44 +01:00
Clément Renault	053306c0e7	Try with 500MiB	2024-02-07 11:24:43 +01:00
Clément Renault	9eeb75d501	Clamp the max memory of the grenad sorters to a reasonable maximum	2024-02-06 10:47:04 +01:00
Louis Dureuil	fbf5f2a392	Don't use a runtime in extract_embedder, use it only for OpenAI	2024-02-01 10:33:27 +01:00
Tamo	9f8f3105d5	make clippy happy	2024-02-01 10:33:27 +01:00
Tamo	318843aacd	add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there is malformed documents in the DB	2024-02-01 10:33:27 +01:00
Tamo	c1bf33a112	Revert "Remove panic on the geosearch"	2024-01-25 18:51:19 +01:00
Tamo	0887186ecf	make clippy happy	2024-01-17 16:07:10 +01:00
Tamo	7d190d8078	add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there is malformed documents in the DB	2024-01-17 15:51:52 +01:00
Clément Renault	01e2c3d6bb	Bump arroy to v0.2.0	2024-01-16 16:45:55 +01:00
Clément Renault	9f9ad4cc05	Fix Clippy warnings	2024-01-16 15:27:24 +01:00
Clément Renault	3ee7682fa7	Fix some integer comparisons	2024-01-16 15:22:23 +01:00
Tamo	54ae6951eb	fix warning	2024-01-02 15:19:30 +01:00
Louis Dureuil	6ff81de401	Fix tests	2023-12-20 17:16:46 +01:00
Louis Dureuil	9123370e90	Validate fused settings in settings task after fusing with existing setting	2023-12-20 17:16:46 +01:00
Louis Dureuil	e249e4db7b	Change Setting::apply function signature	2023-12-20 17:15:24 +01:00
Many the fish	9e1b458010	Merge branch 'main' into change-proximity-precision-settings	2023-12-18 09:08:47 +01:00
ManyTheFish	6425996e36	Change the naming of attributeScale and wordScale into byAttribute and byWord	2023-12-14 16:31:00 +01:00
Louis Dureuil	87bba98bd8	Various changes - fixed seed for arroy - check vector dimensions as soon as it is provided to search - don't embed whitespace	2023-12-14 16:08:42 +01:00


1095 Commits