meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-26 03:55:07 +08:00

Author	SHA1	Message	Date
writegr	ab43a8a949	chore: fix some typos in comments Signed-off-by: writegr <wellweek@outlook.com>	2024-04-18 14:12:52 +08:00
ManyTheFish	a1ea224da9	Fix tests	2024-04-16 17:29:34 +02:00
ManyTheFish	5ab901dd30	Fix tests	2024-04-16 14:39:30 +02:00
ManyTheFish	bad46f88d6	Fix embedder test	2024-04-16 14:39:30 +02:00
meili-bors[bot]	56bf8503db	Merge #4537 4537: Expose distribution shift in settings r=ManyTheFish a=dureuill See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802) # Changes - Distribution shift added to all embedders. - Exposed in settings - Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-04-03 09:08:58 +00:00
redistay	182cb42953	chore: fix some typos in conments Signed-off-by: redistay <wujunjing@outlook.com>	2024-04-02 19:37:55 +08:00
meili-bors[bot]	78668584cd	Merge #4533 4533: Hide api key in settings and task queue r=dureuill a=dureuill # Pull Request See [Usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#117f5ff7b19f4d95bb3ae0005f6c6633) ## Motivation See [slack discussion (internal link)](https://meilisearch.slack.com/archives/C06GQP7FQ6P/p1709804022298749) ## Changes - The value of the `apiKey` parameter is now hidden in the settings and the details of the task queue. Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2024-03-28 16:02:53 +00:00
Bruno Casali	8f2606d79d	fixes typos	2024-03-27 14:26:47 -03:00
Louis Dureuil	92224f109a	Fix tests	2024-03-27 12:19:10 +01:00
Louis Dureuil	9a95ed619d	Add tests	2024-03-26 10:36:56 +01:00
Louis Dureuil	f82d056072	Hide secrets in settings and task queue	2024-03-26 10:36:24 +01:00
Tamo	f2f1367ec3	add a timeout to the webhook	2024-03-20 13:59:43 +01:00
Tamo	b130917933	add the content type in the webhook + improve the test	2024-03-05 11:22:29 +01:00
Louis Dureuil	452a343a2b	Fix imports	2024-02-28 18:09:40 +01:00
meili-bors[bot]	b005eb3289	Merge #4435 4435: Make update file deletion atomic r=Kerollmops a=irevoire # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/4432 Fixes https://github.com/meilisearch/meilisearch/issues/4438 by adding the logs the user asked ## What does this PR do? - Adds a bunch of logs to help debug this kind of issue in the future - Delete the update files AFTER committing the update in the `index-scheduler` (thus, if a restart happens, we are able to re-process the batch successfully) - Multi-thread the deletion of all update files. Co-authored-by: Tamo <tamo@meilisearch.com>	2024-02-26 17:54:40 +00:00
Tamo	0562818c2a	fix and remove the file-store hack of /dev/null	2024-02-26 13:59:41 +01:00
Tamo	36c27a18a1	implement the dry run ha parameter	2024-02-26 13:58:04 +01:00
Tamo	1eb1c043b5	disable the auto deletion of tasks when the ha mode is enabled	2024-02-26 13:58:04 +01:00
Tamo	eb25b07390	let you specify your task id	2024-02-26 13:56:31 +01:00
Tamo	066a7a3cde	takes only one read transaction per thread	2024-02-26 10:43:04 +01:00
Tamo	91cdd502f8	When processing tasks, make the update file deletion atomic	2024-02-22 14:56:22 +01:00
Tamo	3b6544db6d	Implement the experimental log mode cli flag	2024-02-13 18:09:15 +01:00
Louis Dureuil	ef994d84d0	Change error messages and fix tests	2024-02-08 15:04:06 +01:00
Tamo	f70a615ed9	update the github discussion links	2024-02-08 15:04:05 +01:00
Tamo	7ff722b72e	get rids of the log dependencies everywhere	2024-02-08 15:04:05 +01:00
Tamo	e23ec4886d	fix the tests and add tests on the experimental features	2024-02-08 15:04:03 +01:00
Tamo	7793ba67a4	hide the route logs behind a feature flag	2024-02-08 15:03:33 +01:00
Louis Dureuil	02e6c8a440	Add tracing to index-scheduler	2024-02-08 15:03:31 +01:00
Louis Dureuil	05edd85d75	Stabilize scoreDetails	2024-02-06 11:15:19 +01:00
meili-bors[bot]	1ccde9bf0b	Merge #4316 4316: Autobatch the task deletions r=curquiza a=irevoire # Pull Request ## Related issue Fix part of https://github.com/meilisearch/meilisearch-support/issues/69 Fix #4315 ## What does this PR do? - Autobatch the task deletions Co-authored-by: Tamo <tamo@meilisearch.com>	2024-01-15 17:54:50 +00:00
Tamo	b4d7d80ad9	autobatch the task deletions	2024-01-11 14:58:07 +01:00
Louis Dureuil	97bb1ff9e2	Move `currently_updating_index` to IndexMapper	2024-01-09 15:37:27 +01:00
meili-bors[bot]	658ec6e0a4	Merge #4279 4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task. Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-12-22 11:36:12 +00:00
Louis Dureuil	ee54d3171e	Check experimental feature at query time	2023-12-21 15:26:12 +01:00
Clément Renault	fa2b96b9a5	Add an Authorization Header along with the webhook calls	2023-12-19 12:18:45 +01:00
Tamo	4fb25b8782	fix clippy	2023-12-19 10:35:51 +01:00
Tamo	c83a33017e	stream and chunk the data	2023-12-19 10:35:51 +01:00
Tamo	be72326c0a	gzip the tasks	2023-12-19 10:35:51 +01:00
Tamo	0b2fff27f2	update and fix the test	2023-12-19 10:35:51 +01:00
Tamo	3adbc2b942	return a task view instead of a task	2023-12-19 10:35:51 +01:00
Tamo	fbea721378	add a first working test with actixweb	2023-12-19 10:35:51 +01:00
Tamo	d78ad51082	Implement the webhook	2023-12-19 10:35:50 +01:00
Many the fish	9e1b458010	Merge branch 'main' into change-proximity-precision-settings	2023-12-18 09:08:47 +01:00
Louis Dureuil	e0cc775dc4	Various changes - DistributionShift in Search object (to be set from model in embed?) - Fix issue where embedder index wasn't computed at search time - Accept as default embedder either the "default" one, or the only embedder when there is only one	2023-12-14 16:08:41 +01:00
Louis Dureuil	922a640188	WIP multi embedders fixed template bugs	2023-12-14 16:08:41 +01:00
Louis Dureuil	abbe131084	Cosmetic change	2023-12-14 16:08:41 +01:00
Louis Dureuil	13c2c6c16b	Small commit to add hybrid search and autoembedding	2023-12-14 16:07:48 +01:00
ManyTheFish	35e1981488	Remove proximityPrecision form the experimental feature	2023-12-14 15:52:42 +01:00
Clément Renault	7e259cb0d2	Expose the --max-number-of-batched-tasks argument	2023-12-11 16:08:39 +01:00
ManyTheFish	1f4fc9c229	Make the feature experimental	2023-12-06 15:49:05 +01:00
Clément Renault	ec9b52d608	Rename copy_to_path to copy_to_file	2023-11-28 14:32:30 +01:00
Clément Renault	34c67ac389	Remove the possibility to fail fetching the env info	2023-11-28 14:31:23 +01:00
Clément Renault	0dbf1a16ff	Make clippy happy	2023-11-23 14:11:38 +01:00
Clément Renault	462b4c0080	Fix the tests	2023-11-23 12:07:35 +01:00
Clément Renault	0d4482625a	Make the changes to use heed v0.20-alpha.6	2023-11-23 11:43:58 +01:00
Clément Renault	7cb7e37ba8	Merge branch 'main' into tmp-release-v1.5.0	2023-11-21 16:30:46 +01:00
meili-bors[bot]	33b7c574ea	Merge #4090 4090: Diff indexing r=ManyTheFish a=ManyTheFish This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB. ## Why focus on reducing the writings in LMDB? The indexing in Meilisearch is split into 3 main phases: 1) The computing or the extraction of the data (Multi-threaded) 2) The writing of the data in LMDB (Mono-threaded) 3) The processing of the prefix databases (Mono-threaded) see below: ![Capture d’écran 2023-09-28 à 20 01 45](https://github.com/meilisearch/meilisearch/assets/6482087/51513162-7c39-4244-978b-2c6b60c43a56) Because the writing is mono-threaded, it represents a bottleneck in the indexing, reducing the number of writes in LMDB will reduce the pressure on the main thread and should reduce the global time spent on the indexing. ## Give Feedback We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues. ## Technical approach ### Part 1: merge the addition and the deletion process This part: a) Aims to reduce the time spent on indexing only the filterable/sortable fields of documents, for example: - Updating the number of "likes" or "stars" of a song or a movie - Updating the "stock count" or the "price" of a product b) Aims to reduce the time spent on writing in LMDB which should reduce the global indexing time for the highly multi-threaded machines by reducing the writing bottleneck. c) Aims to reduce the average time spent to delete documents without having to keep the soft-deleted documents implementation - [x] Create a preprocessing function that creates the diff-based documents chuck (`OBKV<fid, OBKV<AddDel, value>>`) - [x] and clearly separate the faceted fields and the searchable fields in two different chunks - Change the parameters of the input extractor by taking an `OBKV<fid, OBKV<AddDel, value>>` instead of `OBKV<fid, value>`. - [x] extract_docid_word_positions - [x] extract_geo_points - [x] extract_vector_points - [x] extract_fid_docid_facet_values - Adapt the searchable extractors to the new diff-chucks - [x] extract_fid_word_count_docids - [x] extract_word_pair_proximity_docids - [x] extract_word_position_docids - [x] extract_word_docids - Adapt the facet extractors to the new diff-chucks - [x] extract_facet_number_docids - [x] extract_facet_string_docids - [x] extract_fid_docid_facet_values - [x] FacetsUpdate - [x] Adapt the prefix database extractors ⚠️ ⚠️ - [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added - [x] Remove document deletion pipeline - [x] remove `new_documents_ids` entirely and `replaced_documents_ids` - [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents` - [x] Remove deletion pipeline after autobatcher - [x] remove autobatcher deletion pipeline - [x] everything uses `IndexOperation::DocumentOperation` - [x] repair deletion by internal id for filter by delete - [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids. - [x] Remove soft-deleted documents #### FIXME - [x] field distribution is not correctly updated after deletion - [x] missing documents in the tests of tokenizer_customization ### Part 2: Only compute the documents field by field This part aims to reduce the global indexing time for any kind of partial document modification on any size of machine from the mono-threaded one to the highly multi-threaded one. - [ ] Make the preprocessing function only send the fields that changed to the extractors - [ ] remove the `word_docids` and `exact_word_docids` database and adapt the search (⚠️ could impact the search performances) - [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact the search performances) - [ ] Adapt the prefix database extractors ⚠️ ⚠️ ## Technical Concerns - The part 1 implementation could increase the indexing time for the smallest machines (with few threads) by increasing the extracting time (multi-threaded) more than the writing time (mono-threaded) - The part 2 implementation needs to change the databases which could have a significant impact on the search performances - The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-11-21 09:44:38 +00:00
Tamo	5b57fbab08	makes the dump cancellable	2023-11-14 11:23:13 +01:00
Louis Dureuil	a2d6dc8571	Fix typo, remove caching for the change of index	2023-11-13 10:44:36 +01:00
Louis Dureuil	492fc086f0	cargo fmt	2023-11-12 21:53:11 +01:00
Louis Dureuil	a2d0c73b41	Save the currently updating index so that the search can access it at all times	2023-11-10 10:52:03 +01:00
Louis Dureuil	f8289cd974	Use it from delete-by-filter	2023-11-09 14:23:15 +01:00
Louis Dureuil	ef6fa10f7a	Remove `IndexOperation::DocumentDeletion`	2023-11-06 12:16:15 +01:00
Louis Dureuil	cbaa54cafd	Fix clippy issues	2023-11-06 11:19:31 +01:00
Clément Renault	e507ef5932	Slow the logging down	2023-11-01 13:49:32 +01:00
Clément Renault	13416ccbf7	Introduce a new meilitool to help the cloud team	2023-10-30 14:30:20 +01:00
Clément Renault	dfab6293c9	Use an LMDB database to store the external documents ids	2023-10-30 11:41:23 +01:00
Louis Dureuil	652ac3052d	use new iterator in batch	2023-10-30 11:41:22 +01:00
Louis Dureuil	c534a1b687	Stop using delete documents pipeline in batch runner	2023-10-30 11:41:22 +01:00
Louis Dureuil	cf8dad1ca0	index_scheduler.features() is no longer fallible	2023-10-23 10:38:56 +02:00
bwbonanno	dd619913da	Use RwLock to never persist cli state to db	2023-10-19 12:45:57 -07:00
bwbonanno	d8c649b3cd	Return recoverable error if we fail to retrieve metrics state	2023-10-18 08:28:24 -07:00
bwbonanno	12fc878640	Merge remote-tracking branch 'origin/main' into enable-metrics-http	2023-10-16 13:48:01 -07:00
bwbonanno	689ec7c7ad	Make the experimental route /metrics activable via HTTP	2023-10-13 22:12:54 +00:00
Clément Renault	3655d4bdca	Move the puffin file export logic into the run function	2023-10-13 13:11:30 +02:00
Clément Renault	055ca3935b	Update index-scheduler/src/batch.rs Co-authored-by: Tamo <tamo@meilisearch.com>	2023-10-13 13:11:30 +02:00
Kerollmops	bf8fac6676	Fix the tests	2023-10-13 13:11:30 +02:00
Kerollmops	f2a9e1ebbb	Improve the debugging experience in the puffin reports	2023-10-13 13:11:30 +02:00
Kerollmops	513e61e9a3	Remove the experimental CLI flag	2023-10-13 13:11:29 +02:00
Kerollmops	90a626bf80	Use the runtime feature to enable puffin report exporting	2023-10-13 13:11:29 +02:00
Kerollmops	0d4acf2daa	Fix the metrics product URL	2023-10-13 13:11:29 +02:00
Kerollmops	58db8d85ec	Add the `exportPuffinReports` option to the runtime features route	2023-10-13 13:11:29 +02:00
Clément Renault	656dadabea	Expose an experimental flag to write the puffin reports to disk	2023-10-13 13:11:09 +02:00
Tamo	34fac115d5	fix clippy	2023-09-11 17:15:57 +02:00
Tamo	9258e5b5bf	Fix the stats of the documents deletion by filter The issue was that the operation « DocumentDeletionByFilter » was not declared as an index operation. That means the indexes stats were not reprocessed after the application of the operation.	2023-09-11 14:04:10 +02:00
meili-bors[bot]	e4e49e63d0	Merge #3993 3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza Co-authored-by: irevoire <irevoire@users.noreply.github.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-08-10 14:30:02 +00:00
Tamo	fe819a9d80	fix the get stats method It was not taking into account the processing tasks at all	2023-08-08 13:21:15 +02:00
ManyTheFish	b45c36cd71	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
Kerollmops	eef95de30e	First iteration on exposing puffin profiling	2023-07-18 17:38:13 +02:00
Clément Renault	22762808ab	Fix the tests	2023-07-06 12:13:29 +02:00
Clément Renault	86b834c9e4	Display the total number of tasks in the tasks route	2023-07-06 10:05:18 +02:00
meili-bors[bot]	aae099e330	Merge #3851 3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys # Pull Request ## Related issue Fixes #3843 ## What does this PR do? - expose lastUpdate in `/stats` endpoint - expose isIndex in `stats` endpoint - add a method `is_task_processing` in index-scheduler/src/lib.rs. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Cong Chen <cong.chen@ocrlabs.com> Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-07-03 13:41:04 +00:00
ManyTheFish	71500a4e15	Update tests	2023-07-03 11:20:43 +02:00
Louis Dureuil	324d448236	Format let-else ❤️ 🎉	2023-07-03 10:20:28 +02:00
Cong Chen	9859e65d2f	fix tests	2023-07-01 09:32:50 +08:00
Cong Chen	3bdf01bc1c	Fix failed test	2023-06-30 17:39:23 +08:00
Cong Chen	a5a31667b0	fix converse result of is_task_processing()	2023-06-30 11:28:18 +08:00
Cong Chen	e3fc7112bc	use `RoaringBitmap::is_empty` instead	2023-06-29 11:46:47 +08:00
Kerollmops	816d7ed174	Update the Vector Store product feature link	2023-06-27 12:32:42 +02:00
Louis Dureuil	13e9b4c2e5	Add dump support	2023-06-26 16:29:43 +02:00

516 Commits
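
The description of #4090 (Diff indexing) in the log above talks about handing the extractors an `OBKV<fid, OBKV<AddDel, value>>` instead of an `OBKV<fid, value>`. The sketch below illustrates only that shape. It is not code from the repository: `DelAdd`, `Document`, `DiffDocument` and `diff_document` are hypothetical names, plain `BTreeMap`s stand in for the OBKV buffers, and values are simplified to strings. It also skips unchanged fields, which corresponds to the still-unchecked Part 2 item rather than to what Part 1 shipped.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the `AddDel` marker named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DelAdd {
    Deletion,
    Addition,
}

/// Field ids are assumed to be small integers; values are simplified to strings.
type FieldId = u16;
type Document = BTreeMap<FieldId, String>;
/// Per field: the value to delete and/or the value to add.
type DiffDocument = BTreeMap<FieldId, BTreeMap<DelAdd, String>>;

/// Compare the stored version of a document with its replacement and keep,
/// for each field that differs, the old value (Deletion) and the new one (Addition).
fn diff_document(old: &Document, new: &Document) -> DiffDocument {
    let mut diff = DiffDocument::new();
    // Fields that disappeared or changed: record the old value as a deletion.
    for (fid, old_value) in old {
        if new.get(fid) != Some(old_value) {
            diff.entry(*fid)
                .or_default()
                .insert(DelAdd::Deletion, old_value.clone());
        }
    }
    // Fields that appeared or changed: record the new value as an addition.
    for (fid, new_value) in new {
        if old.get(fid) != Some(new_value) {
            diff.entry(*fid)
                .or_default()
                .insert(DelAdd::Addition, new_value.clone());
        }
    }
    diff
}

fn main() {
    // Field 0 is a title, field 1 a "likes" counter (both made up for the example).
    let old: Document =
        BTreeMap::from([(0, "Blue Train".to_string()), (1, "10".to_string())]);
    let new: Document =
        BTreeMap::from([(0, "Blue Train".to_string()), (1, "11".to_string())]);

    let diff = diff_document(&old, &new);

    // Only the counter changed, so only field 1 appears in the diff.
    assert_eq!(diff.len(), 1);
    assert_eq!(diff[&1][&DelAdd::Deletion], "10");
    assert_eq!(diff[&1][&DelAdd::Addition], "11");
    println!("{diff:?}");
}
```

With a diff in this form, a single writer pass can remove the old entries and insert the new ones together, which is the reduction in LMDB writes the PR description argues for.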