meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-20 01:27:52 +08:00

Author	SHA1	Message	Date
Clémentine Urquizar - curqui	94d8484581	Update Dockerfile Co-authored-by: Markus Machatschek <markus.machatschek@hey.com>	2023-03-06 14:39:20 +01:00
Clémentine Urquizar - curqui	5333edd1db	Update Dockerfile Co-authored-by: Markus Machatschek <markus.machatschek@hey.com>	2023-02-23 17:39:31 +01:00
curquiza	bddf3f96e6	Use Debian instead of Alpine in Dockerfile prototype-debian-docker-image-0	2023-02-20 19:38:22 +01:00
bors[bot]	1e9ac00800	Merge #3505 3505: Csv delimiter r=irevoire a=irevoire Fixes https://github.com/meilisearch/meilisearch/issues/3442 Closes https://github.com/meilisearch/meilisearch/pull/2803 Specified in https://github.com/meilisearch/specifications/pull/221 This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime. Attention to reviewer; I had to update deserr to implement the support of deserializing `char`s ------- It introduces four new error messages; - Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string - Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo - csv delimiter must be an ascii character. Found: 🍰 - The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv. And one error code; - `invalid_index_csv_delimiter` The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type. Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 17:01:36 +00:00
bors[bot]	b08a49a16e	Merge #3319 #3470 3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760 ## What does this PR do? ### User standpoint - Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error. - Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred. ### Implementation standpoint - Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one - `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field - `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick. - Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again. - Add `IndexMapper::resize_index` to increase the size of an index. - In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task. 
## Testing the PR The following diff can be applied to this branch to make testing the PR easier: <details> ```diff diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs index 553ab45a..022b2f00 100644 --- a/index-scheduler/src/index_mapper.rs +++ b/index-scheduler/src/index_mapper.rs `@@` -228,13 +228,15 `@@` impl IndexMapper { drop(lock); + std::thread::sleep_ms(2000); + let current_size = index.map_size()?; let closing_event = index.prepare_for_closing(); - log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); closing_event.wait(); - log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); let index_path = self.base_path.join(uuid.to_string()); let index = self.create_or_open_index(&index_path, None, 2 * current_size)?; `@@` -268,8 +270,10 `@@` impl IndexMapper { match index { Some(Available(index)) => break index, Some(BeingResized(ref resize_operation)) => { + log::error!("waiting for resize end"); // Deadlock: no lock taken while doing this operation. resize_operation.wait(); + log::error!("trying our luck again!"); continue; } Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())), diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs index 11b17d05..242dc095 100644 --- a/index-scheduler/src/lib.rs +++ b/index-scheduler/src/lib.rs `@@` -908,6 +908,7 `@@` impl IndexScheduler { /// /// Returns the number of processed tasks. 
fn tick(&self) -> Result<TickOutcome> { + log::error!("ticking!"); #[cfg(test)] { *self.run_loop_iteration.write().unwrap() += 1; diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs index 050c825a..63f312f6 100644 --- a/meilisearch/src/main.rs +++ b/meilisearch/src/main.rs `@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> { #[actix_web::main] async fn main() -> anyhow::Result<()> { - let (opt, config_read_from) = Opt::try_build()?; + let (mut opt, config_read_from) = Opt::try_build()?; setup(&opt)?; `@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token): _ => (), } + opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap(); + let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?; #[cfg(all(not(debug_assertions), feature = "analytics"))] ``` </details> Mainly, these debug changes do the following: - Set the default index size to 1MiB so that index resizes are initially frequent - Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos) - Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests) ## Open questions - Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB. ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! 
3470: Autobatch addition and deletion r=irevoire a=irevoire This PR adds the capability to meilisearch to batch document addition and deletion together. Fix https://github.com/meilisearch/meilisearch/issues/3440 -------------- Things to check before merging; - [x] What happens if we delete multiple time the same documents -> add a test - [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:00:19 +00:00
bors[bot]	a8f6f108e0	Merge #3515 3515: Consider null as a valid geo field r=irevoire a=irevoire Fix #3497 Associated spec; https://github.com/meilisearch/specifications/pull/222 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 14:12:55 +00:00
Tamo	1479050f7a	apply review suggestions	2023-02-20 14:53:37 +01:00
bors[bot]	97b8c32e22	Merge #3514 3514: Bump version of mini-dashboard to v0.2.6 r=irevoire a=bidoubiwa Update the version of the mini-dashboard to v0.2.6. See [release notes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.6). Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>	2023-02-20 13:21:00 +00:00
Louis Dureuil	35f6c624bc	Make sure we don't leave the in memory hashmap in an inconsistent state	2023-02-20 13:55:32 +01:00
Louis Dureuil	1116788475	Resize indexes when they're full	2023-02-20 13:55:32 +01:00
Louis Dureuil	951a5b5832	Add IndexMapper::resize_index fn	2023-02-20 13:55:32 +01:00
Louis Dureuil	1c670d7fa0	Add IndexStatus::BeingResized	2023-02-20 13:55:32 +01:00
Louis Dureuil	6cc3797aa1	IndexScheduler::tick returns a TickOutcome	2023-02-20 13:55:31 +01:00
Louis Dureuil	faf1e17a27	`create_or_open_index` takes a `map_size` argument	2023-02-20 13:55:31 +01:00
Louis Dureuil	4c519c2ab3	Add Batch::index_uid	2023-02-20 13:55:31 +01:00
Charlotte Vermandel	dd120e0e16	Bump version of mini-dashboard to v0.2.6	2023-02-20 13:45:57 +01:00
Tamo	18796d6e6a	Consider null as a valid geo object	2023-02-20 13:45:51 +01:00
bors[bot]	c91bfeaf15	Merge #3467 3467: Identify builds git tagged with `prototype-...` in CLI and analytics r=curquiza a=dureuill # Pull Request ## What does this PR do? - Parses the last git tag to extract a prototype name if: - Current build uses the prototype tag (not after the tag) precisely - The prototype tag name respects the following conditions: 1. starts with `prototype-` 2. ends with a number 3. the hyphen-separated segment right before the number is not a number (required to reject commits after the tag). - Display the prototype name in the launch summary in the CLI - Send the prototype name to analytics if any - Update prototypes instructions in CONTRIBUTING.md \|`VERGEN_GIT_SEMVER_LIGHTWEIGHT` value \| Prototype \| \|---\|---\| \| `Some("prototype-geo-bounding-box-0-139-gcde89018")` \| `None` (does not end with a number) \| \| `Some("prototype-geo-bounding-box-0-139-89018")` \| `None` (before the last segment is a number) \| \| `Some("prototype-geo-bounding-box-0")` \| `Some("prototype-geo-bounding-box-0")` \| \| `Some("prototype-geo-bounding-box")` \| `None` (does not end with a number") \| \| `Some("geo-bounding-box-0")` \| `None` (does not start with "prototype") \| \| `None` \| `None` \| Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-02-20 09:27:51 +00:00
bors[bot]	28961b2ad1	Merge #3499 3499: Use the workspace inheritance r=Kerollmops a=irevoire Use the workspace inheritance [introduced in rust 1.64](https://blog.rust-lang.org/2022/09/22/Rust-1.64.0.html#cargo-improvements-workspace-inheritance-and-multi-target-builds). It allows us to define the version of meilisearch once in the main `Cargo.toml` and let all the other `Cargo.toml` uses this version. `@curquiza` I added you as a reviewer because I had to patch some CI scripts And `@Kerollmops,` I had to bump the `cargo_toml` crates because our version was getting old and didn't support the feature yet. Also, in another PR, I would like to unify some of our dependencies to ensure we always stay in sync between all our crates. Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-17 09:52:29 +00:00
Tamo	895ab2906c	apply review suggestions	2023-02-16 18:42:47 +01:00
Tamo	f11c7d4b62	cargo run execute meilisearch by default	2023-02-16 18:03:45 +01:00
Tamo	e79f6f87f6	make cargo fmt&clippy happy	2023-02-16 18:00:40 +01:00
Tamo	5367d8f05a	add two tests on the indexing of csvs	2023-02-16 17:37:11 +01:00
Tamo	52686da028	test various error on the document ressource	2023-02-16 17:37:10 +01:00
Tamo	8c074f5028	implements the csv delimiter without tests Co-authored-by: Maxi Barmetler <maxi.barmetler@gmail.com>	2023-02-16 17:35:36 +01:00
Louis Dureuil	49e18da23e	Do not escape tag name $() syntax is not interpreted by the Dockerfile	2023-02-16 10:53:14 +01:00
Louis Dureuil	54240db495	Add note in code so one does not forget next time	2023-02-16 10:53:14 +01:00
Louis Dureuil	e1ed4bc750	Change Dockerfile to also pass the VERGEN_GIT_SEMVER_LIGHTWEIGHT when building	2023-02-16 10:53:14 +01:00
Louis Dureuil	9bd1cfb3a3	Ignore -dirty flag	2023-02-16 10:53:14 +01:00
Louis Dureuil	a341c94871	Update contributing.md	2023-02-16 10:53:14 +01:00
Louis Dureuil	f46cf46b8c	Add prototype to analytics if any	2023-02-16 10:53:14 +01:00
Louis Dureuil	c3a30a5a91	If using a prototype, display its name at Meilisearch startup	2023-02-16 10:53:14 +01:00
bors[bot]	143e3cf948	Merge #3490 3490: Fix attributes set candidates r=curquiza a=ManyTheFish # Pull Request Fix attributes set candidates for v1.1.0 ## details The attribute criterion was not returning the remaining candidates when its internal algorithm was been exhausted. We had a loss of candidates by the attribute criterion leading to the bug reported in the issue linked below. After some investigation, it seems that it was the only criterion that had this behavior. We are now returning the remaining candidates instead of an empty bitmap. ## Related issue Fixes #3483 PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777 Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-02-15 17:38:07 +00:00
Tamo	ab2adba183	update our CI scripts accordingly	2023-02-15 13:56:24 +01:00
Tamo	74d1a67a99	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
bors[bot]	91ce8a5e67	Merge #3492 3492: Bump deserr r=Kerollmops a=irevoire Bump deserr to the latest version; - We now use the default actix-web extractors that deserr provides (which were copy/pasted from meilisearch) - We also use the default `JsonError` message provided by deserr instead of defining our own in meilisearch - Finally, we get the new `did you mean?` error message. Fix #3493 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-15 10:05:05 +00:00
bors[bot]	fd7ae1883b	Merge #3495 3495: Add tests with rust nightly in CI r=curquiza a=ztkmkoo # Pull Request ## Related issue Fixes #3402 ## What does this PR do? - add ci test with rust nightly - make test with rust stable not run on schedule event ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Kebron <ztkmkoo@gmail.com>	2023-02-15 07:53:17 +00:00
Tamo	42a3cdca66	get rids of the unwrap_any function in favor of take_cf_content	2023-02-14 20:06:31 +01:00
Tamo	a43765d454	use the pre-defined deserr extractors	2023-02-14 20:05:30 +01:00
Tamo	769576fd94	get rids of the whole error_message module since it has been integrated into the last version of deserr	2023-02-14 20:05:27 +01:00
Tamo	8fb7b1d10f	bump deserr	2023-02-14 20:04:30 +01:00
bors[bot]	d494c29768	Merge #3479 3479: Unify "Bad latitude" & "Bad longitude" errors r=irevoire a=cymruu # Pull Request ## Related issue Fix part of #3006 ## What does this PR do? - Moved out `BadGeoLat`, `BadGeoLng`, `BadGeoBoundingBoxTopIsBelowBottom` from `FilterError` into newly introduced error type `ParseGeoError`. - Renamed `BadGeo` error to `ReservedGeo` - Used new `ParseGeoError` type in `FilterError` and `AscDescError` Screenshot: ![image](https://user-images.githubusercontent.com/2981598/217927231-fe23b6a3-2ea8-4145-98af-38eb61c4ff16.png) I ran `cargo test --package milli -- --test-threads 1` and tests passed. `--test-threads` was set to 1 because my OS complained about too many opened files. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Filip Bachul <filipbachul@gmail.com> Co-authored-by: filip <filipbachul@gmail.com>	2023-02-14 18:35:51 +00:00
Tamo	74dcfe9676	Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform	2023-02-14 19:09:40 +01:00
Tamo	1b1703a609	make a small optimization to merge obkvs a little bit faster	2023-02-14 18:32:41 +01:00
Tamo	fb5e4957a6	fix and test the early exit in case a grenad ends with a deletion	2023-02-14 18:23:57 +01:00
Tamo	8de3c9f737	Update milli/src/update/index_documents/transform.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-02-14 17:57:14 +01:00
Tamo	43a19d0709	document the operation enum + the grenads	2023-02-14 17:55:26 +01:00
Tamo	29d14bed90	get rids of the let/else syntax	2023-02-14 17:45:46 +01:00
bors[bot]	f3b54337f9	Merge #3174 3174: Allow wildcards at the end of index names for API Keys and Tenant tokens r=irevoire a=Kerollmops This PR introduces the wildcards at the end of the index names when identifying indexes in the API Keys and tenant tokens. It fixes #2788 and fixes #2908. This PR is based on `@akhildevelops'` work. Note that when a tenant token filter is chosen to restrict a search, it is always the most restrictive pattern that is chosen. If we have an index pattern _prod*_ that defines _filter1_ and _p*_ that defines _filter2_, the engine will choose _filter1_ over _filter2_ as it is defined for a most restrictive pattern, _prod*_. This restrictiveness is defined by 1. is it exact, without _*_ 2. the length of the pattern. It is a continuation of work that has already started and should close #2869. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-02-14 16:12:01 +00:00
Clément Renault	7f3ae40204	Remove a useless comment regarding the index pattern error code	2023-02-14 17:09:20 +01:00
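
The prototype-tag rules listed in the description of #3467 (commit c91bfeaf15) can be sketched in Rust as below. This is only an illustration of the three stated conditions, checked against the table in that description. It is not the code merged in the PR: the names `prototype_name` and `is_number` are hypothetical, and the `-dirty` handling mentioned in commit 9bd1cfb3a3 is not covered.

```rust
/// Hypothetical helper (not the function from the PR): returns the prototype
/// name when `describe`, a value shaped like `VERGEN_GIT_SEMVER_LIGHTWEIGHT`,
/// satisfies the three conditions listed in the PR description.
pub fn prototype_name(describe: Option<&str>) -> Option<&str> {
    let tag = describe?;

    // Condition 1: the tag starts with `prototype-`.
    if !tag.starts_with("prototype-") {
        return None;
    }

    // Walk the hyphen-separated segments from the end.
    let mut segments = tag.rsplit('-');

    // Condition 2: the tag ends with a number.
    let last = segments.next()?;
    if !is_number(last) {
        return None;
    }

    // Condition 3: the segment right before the number is not itself a
    // number, which rejects builds made some commits after the tag.
    let before_last = segments.next()?;
    if is_number(before_last) {
        return None;
    }

    Some(tag)
}

/// True when `s` is a non-empty run of ASCII digits.
fn is_number(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}
```

Under these assumptions, `prototype_name(Some("prototype-geo-bounding-box-0"))` yields the tag itself, while the other rows of the table in the PR description yield `None`.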

7425 Commits