meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-26 12:05:05 +08:00

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

bors[bot] b08a49a16e Merge #3319 #3470	2023-02-20 15:00:19 +00:00

3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760

## What does this PR do?

### User standpoint

- Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error.
- Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred.

### Implementation standpoint

- Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one
- `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field
- `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick.
- Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again.
- Add `IndexMapper::resize_index` to increase the size of an index.
- In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task.

## Testing the PR

The following diff can be applied to this branch to make testing the PR easier:

<details>

```diff
diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs
index 553ab45a..022b2f00 100644
--- a/index-scheduler/src/index_mapper.rs
+++ b/index-scheduler/src/index_mapper.rs
@@ -228,13 +228,15 @@ impl IndexMapper {
 
         drop(lock);
 
+        std::thread::sleep_ms(2000);
+
         let current_size = index.map_size()?;
         let closing_event = index.prepare_for_closing();
-        log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         closing_event.wait();
 
-        log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         let index_path = self.base_path.join(uuid.to_string());
         let index = self.create_or_open_index(&index_path, None, 2 * current_size)?;
@@ -268,8 +270,10 @@ impl IndexMapper {
                 match index {
                     Some(Available(index)) => break index,
                     Some(BeingResized(ref resize_operation)) => {
+                        log::error!("waiting for resize end");
                         // Deadlock: no lock taken while doing this operation.
                         resize_operation.wait();
+                        log::error!("trying our luck again!");
                         continue;
                     }
                     Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs
index 11b17d05..242dc095 100644
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
@@ -908,6 +908,7 @@ impl IndexScheduler {
     ///
     /// Returns the number of processed tasks.
     fn tick(&self) -> Result<TickOutcome> {
+        log::error!("ticking!");
         #[cfg(test)]
         {
             *self.run_loop_iteration.write().unwrap() += 1;
diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs
index 050c825a..63f312f6 100644
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
@@ -25,7 +25,7 @@ fn setup(opt: &Opt) -> anyhow::Result<()> {
 
 #[actix_web::main]
 async fn main() -> anyhow::Result<()> {
-    let (opt, config_read_from) = Opt::try_build()?;
+    let (mut opt, config_read_from) = Opt::try_build()?;
 
     setup(&opt)?;
 
@@ -56,6 +56,8 @@ We generated a secure master key for you (you can safely copy this token):
         _ => (),
     }
 
+    opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap();
+
     let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?;
 
     #[cfg(all(not(debug_assertions), feature = "analytics"))]
```

</details>

Mainly, these debug changes do the following:

- Set the default index size to 1MiB so that index resizes are initially frequent
- Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos)
- Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests)

## Open questions

- Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB.

## PR checklist

Please check if your PR fulfills the following requirements:

- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

3470: Autobatch addition and deletion r=irevoire a=irevoire

This PR adds the capability to meilisearch to batch document addition and deletion together.

Fix https://github.com/meilisearch/meilisearch/issues/3440

--------------

Things to check before merging;

- [x] What happens if we delete multiple time the same documents -> add a test
- [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
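
The open question raised in #3319 (doubling versus adding at most a fixed amount per resize) can be made concrete with a small sketch. This is only an illustration of the capped-growth idea mentioned in the PR description, not the code that was merged: the function name `next_index_size` and the `max_step` parameter are hypothetical.

```rust
/// One gibibyte in bytes.
const GIB: u64 = 1024 * 1024 * 1024;

/// Hypothetical resize policy: double the current map size, but never
/// grow by more than `max_step` bytes in a single resize operation.
/// Saturating arithmetic avoids overflow for very large sizes.
fn next_index_size(current: u64, max_step: u64) -> u64 {
    let doubled = current.saturating_mul(2);
    let capped = current.saturating_add(max_step);
    // Small indexes double; large indexes grow linearly by `max_step`.
    doubled.min(capped)
}
```

With a 100GiB cap, `next_index_size(GIB, 100 * GIB)` still doubles a 1GiB index to 2GiB, while `next_index_size(500 * GIB, 100 * GIB)` yields 600GiB instead of jumping to 1TiB.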
.github	Merge #3467	2023-02-20 09:27:51 +00:00
assets	Add a README to the milli crate	2023-01-16 16:25:12 +01:00
benchmarks	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
dump	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
file-store	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
filter-parser	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
flatten-serde-json	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
grafana-dashboards	Add suffix describing the unit when needed; Replace MeiliSearch by Meilisearch; Precised some metrics name	2022-08-23 17:09:27 +02:00
index-scheduler	Merge #3319 #3470	2023-02-20 15:00:19 +00:00
json-depth-checker	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
meili-snap	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
meilisearch	Merge #3515	2023-02-20 14:12:55 +00:00
meilisearch-auth	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
meilisearch-types	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
milli	Merge #3319 #3470	2023-02-20 15:00:19 +00:00
permissive-json-pointer	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
.dockerignore	import .git to docker to fix vergen	2021-07-28 19:12:40 +02:00
.gitignore	edit gitignore to ignore .idea and .vscode folders	2023-02-10 11:42:19 +04:00
.rustfmt.toml	Introduce a rustfmt file	2022-10-27 11:35:05 +02:00
bors.toml	Remove macos-latest and windows-latest usages	2022-12-20 11:10:09 +01:00
Cargo.lock	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
Cargo.toml	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
CODE_OF_CONDUCT.md	Create CODE_OF_CONDUCT.md	2020-04-30 20:16:02 +02:00
config.toml	Fixup dumps-destination -> dump-directory section header in help link	2023-01-09 13:31:57 +01:00
CONTRIBUTING.md	Update contributing.md	2023-02-16 10:53:14 +01:00
Cross.toml	Cross build with action-rs	2021-10-10 02:21:30 +08:00
Dockerfile	Change Dockerfile to also pass the VERGEN_GIT_SEMVER_LIGHTWEIGHT when building	2023-02-16 10:53:14 +01:00
download-latest.sh	Update download-latest.sh	2022-11-30 16:55:32 +01:00
LICENSE	Update LICENSE	2022-02-15 15:54:45 +01:00
README.md	Merge #3399	2023-02-01 14:34:55 +00:00
SECURITY.md	docs(security): Fix `Supported`	2022-05-31 14:21:34 -05:00

README.md

Website | Roadmap | Blog | Documentation | FAQ | Discord

⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍

Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.

🔥 Try it! 🔥

✨ Features

Search-as-you-type: find search results in less than 50 milliseconds
Typo tolerance: get relevant matches even when queries contain typos and misspellings
Filtering and faceted search: enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
Sorting: sort results based on price, date, or pretty much anything else your users need
Synonym support: configure synonyms to include more relevant content in your search results
Geosearch: filter and sort documents based on geographic data
Extensive language support: search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
Security management: control which users can access what data with API keys that allow fine-grained permissions handling
Multi-Tenancy: personalize search results for any number of application tenants
Highly Customizable: customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
RESTful API: integrate Meilisearch in your technical stack with our plugins and SDKs
Easy to install, deploy, and maintain
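
To give a rough idea of what typo tolerance means in practice, here is a minimal sketch based on plain Levenshtein (edit) distance. It is an illustration only: this README does not describe how Meilisearch implements typo tolerance internally, and the names `edit_distance` and `matches_with_typos` as well as the `max_typos` threshold are assumptions made for the example.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar values.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        // Distance from a[..=i] to the empty prefix of `b` is i + 1 deletions.
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Hypothetical matcher: accept `word` if it is within `max_typos` edits of `query`.
fn matches_with_typos(query: &str, word: &str, max_typos: usize) -> bool {
    edit_distance(query, word) <= max_typos
}
```

For instance, `matches_with_typos("serch", "search", 1)` accepts the misspelled query because a single insertion turns it into the indexed word.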

📖 Documentation

You can consult Meilisearch's documentation at https://docs.meilisearch.com.

🚀 Getting started

For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our Quick Start guide.

You may also want to check out Meilisearch 101 for an introduction to some of Meilisearch's most popular features.

☁️ Meilisearch Cloud

Let us manage your infrastructure so you can focus on integrating a great search experience. Try Meilisearch Cloud today.

🧰 SDKs & integration tools

Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!

Take a look at the complete Meilisearch integration list.

⚙️ Advanced usage

Experienced users will want to keep our API Reference close at hand.

We also offer a wide range of dedicated guides to all Meilisearch features, such as filtering, sorting, geosearch, API keys, and tenant tokens.

Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as documents and indexes.

📊 Telemetry

Meilisearch collects anonymized data from users to help us improve our product. You can deactivate this whenever you want.

To request deletion of collected data, please write to us at privacy@meilisearch.com. Don't forget to include your Instance UID in the message, as this helps us quickly find and delete your data.

If you want to know more about the kind of data we collect and what we use it for, check the telemetry section of our documentation.

📫 Get in touch!

Meilisearch is a search engine created by Meili, a software development company based in France with team members all over the world. Want to know more about us? Check out our blog!

🗞 Subscribe to our newsletter if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.

💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:

For feature requests, please visit our product repository
Found a bug? Open an issue!
Want to be part of our Discord community? Join us!
For everything else, please check this page listing some of the other places where you can find us

Thank you for your support!

👩‍💻 Contributing

Meilisearch is, and will always be, open-source! If you want to contribute to the project, please take a look at our contribution guidelines.

📦 Versioning

Meilisearch releases and their associated binaries are available on this GitHub page.

The binaries are versioned following SemVer conventions. To learn more, read our versioning policy.

Unlike the binaries, the crates in this repository are not currently available on crates.io and do not follow SemVer conventions.