meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-30 00:55:00 +08:00

Author	SHA1	Message	Date
Tamo	2acc3ec5ee	fix the type of the document deletion by filter tasks	2023-05-30 15:18:52 +02:00
Tamo	c9b65677bf	return the on disk size actually used by meilisearch	2023-05-25 18:30:30 +02:00
Tamo	c433bdd1cd	add a view for the task queue in the metrics	2023-05-25 12:58:13 +02:00
Tamo	4391cba6ca	fix the addition + deletion bug	2023-05-17 18:28:57 +02:00
Tamo	d7ddf4925e	Revert "Disable autobatching of additions and deletions" This reverts commit `a94e78ffb0`.	2023-05-17 14:25:50 +02:00
Tamo	96da5130a4	fix the error code in case of not filterable attributes on the get / delete documents by filter routes	2023-05-16 13:56:18 +02:00
Clément Renault	13f870e993	Fix typos and documentation issues	2023-05-15 15:11:45 +02:00
Kerollmops	f759ec7fad	Expose a flag to enable the MDB_WRITEMAP flag	2023-05-15 11:38:43 +02:00
Kerollmops	c4a40e7110	Use the writemap flag to reduce the memory usage	2023-05-15 10:15:33 +02:00
meili-bors[bot]	a95128df6b	Merge #3550 3550: Delete documents by filter r=irevoire a=dureuill # Prototype `prototype-delete-by-filter-0` Usage: A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter. The expected payload looks like that: ```json { "filter": "doggo = bernese", } ``` It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed. Here is an example of the associated details; ```json "details": { "deletedDocuments": 53, "originalFilter": "\"doggo = bernese\"" } ``` ---------- # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/issues/3477 ## What does this PR do? ### User standpoint - Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted. ### Implementation standpoint - (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter - Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail. 
## Example <details> <summary>Sample request, response and task result</summary> Request: ``` curl \ -X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \ -H 'Content-Type: application/json' \ --data-binary '{ "filter" : "mass = 600"}' ``` Response: ``` { "taskUid": 3902, "indexUid": "index-10", "status": "enqueued", "type": "documentDeletion", "enqueuedAt": "2023-02-28T20:50:31.667502Z" } ``` Task log: ```json { "uid": 3906, "indexUid": "index-12", "status": "succeeded", "type": "documentDeletion", "canceledBy": null, "details": { "deletedDocuments": 3, "originalFilter": "\"mass = 600\"" }, "error": null, "duration": "PT0.001819S", "enqueuedAt": "2023-03-07T08:57:20.11387Z", "startedAt": "2023-03-07T08:57:20.115895Z", "finishedAt": "2023-03-07T08:57:20.117714Z" } ``` </details> ## Draft status - [ ] Error handling - [ ] Analytics - [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead? - [ ] Should the filter be applied at request time or when the deletion task is processed? - The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update. - The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time. - [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli... 
- [ ] Security consideration, if any - [ ] Fix the tests (but waiting until product questions are resolved) - [ ] Add delete by filter specific tests Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2023-05-04 10:44:41 +00:00
meili-bors[bot]	da220294f6	Merge #3639 3639: Add a dedicated error variant for planned failures in index scheduler tests r=Kerollmops a=Sufflope # Pull Request ## Related issue Fixes #3086 ## What does this PR do? - Add a dedicated test variant in test cfg to avoid reusing a misleading existing error ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Jean-Sébastien Bour <jean-sebastien@bour.name>	2023-05-04 09:33:57 +00:00
Louis Dureuil	d8381eb790	Fix originalFilter	2023-05-04 10:07:59 +02:00
Louis Dureuil	b212aef5db	add one nanosecond to generated filter so as to generate a filter that would have matched the last task to delete	2023-05-04 09:56:48 +02:00
Louis Dureuil	52ab114f6c	Fix test on macOS: 50 tasks would result in the test consistently failing on a local macOS	2023-05-04 00:06:49 +02:00
Tamo	dcbfecf42c	make the generated filter valid	2023-05-04 00:06:49 +02:00
Tamo	9ca6f59546	Update index-scheduler/src/lib.rs Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-05-04 00:06:49 +02:00
Tamo	aa7537a11e	make the autodeletion work with a fixed number of tasks and update the tests	2023-05-04 00:06:49 +02:00
Tamo	972bb2831c	log when meilisearch need to delete tasks	2023-05-04 00:06:49 +02:00
Tamo	f9ddd32545	implement the auto-deletion of tasks	2023-05-04 00:06:49 +02:00
Tamo	0f0cd2d929	handle the array of array form of filter in the dumps	2023-05-03 17:41:50 +02:00
Tamo	6df2ba93a9	remove one useless txn	2023-05-03 17:41:49 +02:00
Louis Dureuil	3680a6bf1e	extract impl to a function	2023-05-03 17:41:49 +02:00
Louis Dureuil	732c52093d	Processing time without autobatching implementation	2023-05-03 17:41:48 +02:00
Jean-Sébastien Bour	d09b771bce	Add a dedicated error variant for planned failures in index scheduler tests Fixes #3086	2023-05-02 14:37:20 +02:00
Tamo	0b2200e6e7	remove the unused snapshot files	2023-04-25 17:55:27 +02:00
Kerollmops	a109802d45	Upgrade the incompatible versions of the dependencies	2023-04-24 17:50:57 +02:00
Kerollmops	47b66e49b8	Upgrade the compatible versions of the dependencies	2023-04-24 17:50:52 +02:00
bors[bot]	654a3a9e19	Merge #3688 3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza `@meilisearch/engine-team` ensure the changes we bring to `main` are the ones you want Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: dureuill <dureuill@users.noreply.github.com>	2023-04-24 11:38:23 +00:00
Louis Dureuil	fd583501d7	Use non_free_pages_size instead of real_disk_size to check task db space taken	2023-04-13 17:07:44 +02:00
bors[bot]	f9960be115	Merge #3659 3659: stops receiving tasks once the task queue is full r=Kerollmops a=irevoire Give 20GiB to the task queue + once 50% of the task queue is used, it blocks itself and only receives task deletion requests to ensure we never get in a state where we can’t do anything. Also, create a new error message when we reach this case: ``` Meilisearch cannot receive write operations because the size limit of the tasks database has been reached. Please delete tasks to continue performing write operations. ``` Co-authored-by: Tamo <tamo@meilisearch.com>	2023-04-13 09:11:12 +00:00
Tamo	b4fabce36d	update the error message + update the task db size to 20GiB with a limit at 50%	2023-04-12 18:54:11 +02:00
Tamo	be69ab320d	stops receiving tasks once the task queue is full	2023-04-12 18:54:11 +02:00
Louis Dureuil	a94e78ffb0	Disable autobatching of additions and deletions	2023-04-12 10:53:00 +02:00
Tamo	4d308d5237	Improve the health route by ensuring lmdb is not down And refactorize slightly the auth controller.	2023-04-06 15:31:42 +02:00
Tamo	597d57bf1d	Merge branch 'main' into bring-back-changes-v1.1.0	2023-04-05 11:32:14 +02:00
Tamo	3fb67f94f7	Reduce the time to import a dump by caching some data With this commit, for a dump containing 1M tasks we went from 1m02 to 6s	2023-03-29 14:44:15 +02:00
Tamo	cf5145b542	Reduce the time to import a dump With this commit, for a dump containing 1M tasks we went from 3m36s to import the task queue down to 1m02s	2023-03-29 14:27:40 +02:00
Tamo	a2b151e877	ensure that the task queue is correctly imported reduce the size of the snapshots file	2023-03-21 14:41:46 +01:00
bors[bot]	667bb87e35	Merge #3541 3541: Add cache on the indexes stats r=dureuill a=irevoire Fix https://github.com/meilisearch/meilisearch/issues/3540 Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-03-09 13:32:52 +00:00
Louis Dureuil	7faa9a22f6	Pass IndexStat by ref in store_stats_of	2023-03-07 14:00:54 +01:00
Louis Dureuil	76288fad72	Fix snapshots	2023-03-06 16:57:31 +01:00
Louis Dureuil	076a3d371c	Eagerly compute stats as fallback to the cache. - Refactor all around to avoid spawning indexes more times than necessary	2023-03-06 16:57:31 +01:00
Tamo	3bbf760542	update most snapshots	2023-03-06 16:57:31 +01:00
Tamo	fd5c48941a	Add cache on the indexes stats	2023-03-06 16:57:31 +01:00
Tamo	e704728ee7	fix the snapshots permissions on unix system	2023-03-06 16:28:40 +01:00
Louis Dureuil	0202ff8ab4	Attempt to use default budget for faster startup	2023-02-28 10:55:43 +01:00
Louis Dureuil	71e7900c67	move index_map to file	2023-02-23 11:29:11 +01:00
Louis Dureuil	431782f3ee	Move index_mapper to mod.rs	2023-02-23 11:29:11 +01:00
Louis Dureuil	3db613ff77	Don't iterate all indexes manually	2023-02-23 11:29:09 +01:00
Louis Dureuil	5822764be9	Skip computing index budget in tests	2023-02-23 11:23:39 +01:00
Louis Dureuil	a529bf160c	Compute budget	2023-02-23 11:23:39 +01:00
Louis Dureuil	f1119f2dc2	Add dichotomic search to utils	2023-02-23 11:23:39 +01:00
Louis Dureuil	1db7d5d851	Add basic tests for index eviction and resize	2023-02-23 11:23:39 +01:00
Louis Dureuil	80b060f920	Use LRU cache	2023-02-23 11:23:39 +01:00
Louis Dureuil	fdf043580c	Add LruMap	2023-02-23 11:23:38 +01:00
Louis Dureuil	42577403d8	Authentication: Directly pass the authfilter to the index scheduler	2023-02-22 16:35:52 +01:00
bors[bot]	b08a49a16e	Merge #3319 #3470 3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760 ## What does this PR do? ### User standpoint - Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error. - Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred. ### Implementation standpoint - Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one - `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field - `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick. - Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again. - Add `IndexMapper::resize_index` to increase the size of an index. - In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task. 
## Testing the PR The following diff can be applied to this branch to make testing the PR easier: <details> ```diff diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs index 553ab45a..022b2f00 100644 --- a/index-scheduler/src/index_mapper.rs +++ b/index-scheduler/src/index_mapper.rs `@@` -228,13 +228,15 `@@` impl IndexMapper { drop(lock); + std::thread::sleep_ms(2000); + let current_size = index.map_size()?; let closing_event = index.prepare_for_closing(); - log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); closing_event.wait(); - log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); let index_path = self.base_path.join(uuid.to_string()); let index = self.create_or_open_index(&index_path, None, 2 * current_size)?; `@@` -268,8 +270,10 `@@` impl IndexMapper { match index { Some(Available(index)) => break index, Some(BeingResized(ref resize_operation)) => { + log::error!("waiting for resize end"); // Deadlock: no lock taken while doing this operation. resize_operation.wait(); + log::error!("trying our luck again!"); continue; } Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())), diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs index 11b17d05..242dc095 100644 --- a/index-scheduler/src/lib.rs +++ b/index-scheduler/src/lib.rs `@@` -908,6 +908,7 `@@` impl IndexScheduler { /// /// Returns the number of processed tasks. 
fn tick(&self) -> Result<TickOutcome> { + log::error!("ticking!"); #[cfg(test)] { *self.run_loop_iteration.write().unwrap() += 1; diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs index 050c825a..63f312f6 100644 --- a/meilisearch/src/main.rs +++ b/meilisearch/src/main.rs `@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> { #[actix_web::main] async fn main() -> anyhow::Result<()> { - let (opt, config_read_from) = Opt::try_build()?; + let (mut opt, config_read_from) = Opt::try_build()?; setup(&opt)?; `@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token): _ => (), } + opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap(); + let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?; #[cfg(all(not(debug_assertions), feature = "analytics"))] ``` </details> Mainly, these debug changes do the following: - Set the default index size to 1MiB so that index resizes are initially frequent - Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos) - Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests) ## Open questions - Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB. ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! 
3470: Autobatch addition and deletion r=irevoire a=irevoire This PR adds the capability to meilisearch to batch document addition and deletion together. Fix https://github.com/meilisearch/meilisearch/issues/3440 -------------- Things to check before merging; - [x] What happens if we delete multiple time the same documents -> add a test - [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:00:19 +00:00
Louis Dureuil	35f6c624bc	Make sure we don't leave the in memory hashmap in an inconsistent state	2023-02-20 13:55:32 +01:00
Louis Dureuil	1116788475	Resize indexes when they're full	2023-02-20 13:55:32 +01:00
Louis Dureuil	951a5b5832	Add IndexMapper::resize_index fn	2023-02-20 13:55:32 +01:00
Louis Dureuil	1c670d7fa0	Add IndexStatus::BeingResized	2023-02-20 13:55:32 +01:00
Louis Dureuil	6cc3797aa1	IndexScheduler::tick returns a TickOutcome	2023-02-20 13:55:31 +01:00
Louis Dureuil	faf1e17a27	`create_or_open_index` takes a `map_size` argument	2023-02-20 13:55:31 +01:00
Louis Dureuil	4c519c2ab3	Add Batch::index_uid	2023-02-20 13:55:31 +01:00
Tamo	74d1a67a99	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
Tamo	29d14bed90	get rids of the let/else syntax	2023-02-14 17:45:46 +01:00
Clément Renault	4570d5bf3a	Merge remote-tracking branch 'origin/main' into temp-wildcard	2023-02-09 13:14:05 +01:00
Tamo	eaad84bd1d	fix the test to handle the document deletion correctly	2023-02-09 11:29:13 +01:00
Tamo	ea9ac46f28	stop autobatching the deletion without the index creation right with the addition	2023-02-08 21:24:27 +01:00
Tamo	93f130a400	fix all warnings	2023-02-08 20:57:35 +01:00
Tamo	860c993ef7	Handle the autobatching of deletion and addition in the scheduler	2023-02-08 20:53:19 +01:00
Tamo	67dda0678f	cleanup the autobatcher a little bit	2023-02-08 18:10:59 +01:00
Tamo	2db6347686	update the autobatcher to batch the addition and deletion together	2023-02-08 18:07:59 +01:00
Kerollmops	a36b1dbd70	Fix the tasks with the new patterns	2023-02-01 18:21:45 +01:00
Louis Dureuil	924d5d4c11	clippy: remove needless lifetimes	2023-01-31 10:40:48 +01:00
Tamo	a858531574	apply review comments	2023-01-25 14:51:36 +01:00
Tamo	bf94f89035	Update index-scheduler/src/lib.rs Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-01-25 11:31:50 +01:00
Tamo	3bcff60d1c	makes clippy happy	2023-01-25 11:31:48 +01:00
Tamo	c92948b143	Compute the size of the auth-controller, index-scheduler and all update files in the global stats	2023-01-25 11:25:02 +01:00
Tamo	c7b2e3be87	apply review comments	2023-01-24 17:54:43 +01:00
Tamo	ea3b269b77	reformat	2023-01-23 23:59:34 +01:00
Tamo	a4be4c49e8	Update index-scheduler/src/batch.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-01-23 23:58:03 +01:00
Tamo	7d1ebb7295	add test on the autobatcher layer	2023-01-23 20:56:12 +01:00
Tamo	767cb725a5	reimplement the batching of task with or without primary key in the autobatcher	2023-01-23 20:18:22 +01:00
Tamo	5672118bfa	When adding documents, trying to update the primary-key now throw an error While updating the test suite I also noticed an issue with the indexed_documents value of failed task and had to update it. I also named a bunch of snapshots that had no name sorry 😬	2023-01-23 17:32:13 +01:00
Louis Dureuil	72e2b220ed	Fix tests	2023-01-19 15:48:20 +01:00
Tamo	e8e7070cc6	improve the error message when no task filter are specified for the cancelation or deletion of tasks	2023-01-19 12:42:08 +01:00
bors[bot]	3e5b3df487	Merge #3370 #3373 #3375 3370: make the swap indexes not found errors return an IndexNotFound error-code r=irevoire a=irevoire Fix https://github.com/meilisearch/meilisearch/issues/3368 3373: fix a wrong error code and add tests on the document resource r=irevoire a=irevoire Fix https://github.com/meilisearch/meilisearch/issues/3371 3375: Avoid deleting all task invalid canceled by r=irevoire a=Kerollmops Fixes #3369 by making sure that at least one `canceledBy` task filter parameter matches something. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-01-18 15:21:11 +00:00
Kerollmops	e89973f1bf	Do not delete all tasks when no canceled-by matches	2023-01-18 15:50:46 +01:00
Tamo	57da80900d	make the swap indexes not found errors return an IndexNotFound error code	2023-01-18 14:16:00 +01:00
Loïc Lecrenier	2bc2e99ff3	Simplify declaration of the error codes	2023-01-11 19:08:39 +01:00
Tamo	e706628bb1	fix the error code of the swap index route	2023-01-06 14:48:25 +01:00
Tamo	50ce0409bc	Integrate deserr on the most important routes	2023-01-05 20:48:29 +01:00
Loïc Lecrenier	2d74678b51	Replace underscores with hyphens in doc link to error code	2023-01-05 10:09:02 +01:00
Louis Dureuil	233372abea	Remove `--max-index-size` and `--max-task-db-size`	2023-01-04 17:20:01 +01:00
amab8901	9a39c4e40d	Get date from IndexMetaData	2022-12-22 11:46:17 +01:00
amab8901	0893b175dc	Merge branch 'main' into 2983-forward-date-to-milli	2022-12-21 14:31:19 +01:00
amab8901	d5978d11e1	Refactor	2022-12-21 14:28:00 +01:00
Tamo	d8fb506c92	handle most io error instead of tagging everything as an internal	2022-12-19 20:50:40 +01:00
amab8901	aa03e02fdc	Apply Rustfmt	2022-12-19 19:24:56 +01:00

417 Commits