meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
Loïc Lecrenier	dd12d44134	Support swapped word pairs in new proximity ranking rule impl	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	c8e251bf24	Remove noise in codebase	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	a938fbde4a	Use a cache when resolving the query graph	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	dcf3f1d18a	Remove EdgeIndex and NodeIndex types, prefer u32 instead	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	66d0c63694	Add some documentation and use bitmaps instead of hashmaps when possible	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	132191360b	Introduce the sort ranking rule working with the new search structures	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	345c99d5bd	Introduce the words ranking rule working with the new search structures	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	89d696c1e3	Introduce the proximity ranking rule as a graph-based ranking rule	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	c645853529	Introduce a generic graph-based ranking rule	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	a70ab8b072	Introduce a function to find the K shortest paths in a graph	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	48aae76b15	Introduce a function to find the docids of a set of paths in a graph	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	23bf572dea	Introduce cache structures used with ranking rule graphs	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	864f6410ed	Introduce a structure to represent a set of graph paths efficiently	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	c9bf6bb2fa	Introduce a structure to implement ranking rules with graph algorithms	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	46249ea901	Implement a function to find a QueryGraph's docids	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	ce0d1e0e13	Introduce a common way to manage the coordination between ranking rules	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	5065d8b0c1	Introduce a DatabaseCache to memorize the addresses of LMDB values	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	a83007c013	Introduce structure to represent search queries as graphs	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	79e0a6dd4e	Introduce a new search module, eventually meant to replace the old one The code here does not compile, because I am merely splitting one giant commit into smaller ones where each commit explains a single file.	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	2d88089129	Remove unused term matching strategies	2023-03-20 09:41:55 +01:00
Loïc Lecrenier	6c659dc12f	Use MiMalloc in milli tests	2023-03-20 09:41:37 +01:00
Tamo	0f33a65468	makes kero happy	2023-03-13 16:51:11 +01:00
Tamo	eddefb0e0f	refactor the error type of the milli::document thing silence a warning	2023-03-09 13:03:14 +01:00
Tamo	c5f22be6e1	add boolean support for csv documents	2023-03-09 11:12:49 +01:00
bors[bot]	4f1ccbc495	Merge #3525 3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish # Summary A search with a phrase containing only stop words was returning an HTTP error 500, this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search. fixes https://github.com/meilisearch/meilisearch/issues/3521 related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779 Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-03-02 10:55:37 +00:00
ManyTheFish	37489fd495	Return an internal error in the case of matching word is invalid	2023-03-01 19:05:16 +01:00
Louis Dureuil	5822764be9	Skip computing index budget in tests	2023-02-23 11:23:39 +01:00
bors[bot]	ac5a1e4c4b	Merge #3423 3423: Add min and max facet stats r=dureuill a=dureuill # Pull Request ## Related issue Fixes #3426 ## What does this PR do? ### User standpoint - When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search. <details> <summary> Sample request/response </summary> ```json ❯ curl \ -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \ -H 'Content-Type: application/json' \ --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' \| jsonxf { "hits": [ { "name": "Niger (LL6)", "id": "16975", "nametype": "Valid", "recclass": "LL6", "mass": 3.3, "fall": "Fell" }, { "name": "Appley Bridge", "id": "2318", "nametype": "Valid", "recclass": "LL6", "mass": 15000, "fall": "Fell", "_geo": { "lat": 53.58333, "lng": -2.71667 } }, { "name": "Athens", "id": "4885", "nametype": "Valid", "recclass": "LL6", "mass": 265, "fall": "Fell", "_geo": { "lat": 34.75, "lng": -87.0 } }, { "name": "Bandong", "id": "4935", "nametype": "Valid", "recclass": "LL6", "mass": 11500, "fall": "Fell", "_geo": { "lat": -6.91667, "lng": 107.6 } }, { "name": "Benguerir", "id": "30443", "nametype": "Valid", "recclass": "LL6", "mass": 25000, "fall": "Fell", "_geo": { "lat": 32.25, "lng": -8.15 } } ], "query": "LL6", "processingTimeMs": 15, "limit": 5, "offset": 0, "estimatedTotalHits": 42, "facetDistribution": { "mass": { "110000": 1, "11500": 1, "1161": 1, "12000": 1, "1215.5": 1, "127000": 1, "15000": 1, "1676": 1, "1700": 1, "1710.5": 1, "18000": 1, "19000": 1, "220000": 1, "2220": 1, "22300": 1, "25000": 2, "265": 1, "271000": 1, "2840": 1, "3.3": 1, "3000": 1, "303": 1, "32000": 1, "34000": 1, "36.1": 1, "45000": 1, "460": 1, "478": 1, "483": 1, "5500": 2, "600": 1, "6000": 1, "67.8": 1, "678": 1, "680.5": 1, "6930": 1, "8": 1, "8300": 1, "840": 1, "8400": 1 }, "recclass": { "L/LL6": 3, "LL6": 39 } }, "facetStats": { "mass": { "min": 3.3, "max": 271000.0 } } } ``` </details> ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-02-22 13:06:43 +00:00
ManyTheFish	900bae3d9d	keep phrases that has at least one word	2023-02-21 18:16:51 +01:00
ManyTheFish	bbecab8948	fix clippy	2023-02-21 10:18:44 +01:00
ManyTheFish	8aa808d51b	Merge branch 'main' into enhance-language-detection	2023-02-20 18:14:34 +01:00
bors[bot]	b08a49a16e	Merge #3319 #3470 3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760 ## What does this PR do? ### User standpoint - Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error. - Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred. ### Implementation standpoint - Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one - `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field - `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick. - Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again. - Add `IndexMapper::resize_index` to increase the size of an index. - In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task. ## Testing the PR The following diff can be applied to this branch to make testing the PR easier: <details> ```diff diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs index 553ab45a..022b2f00 100644 --- a/index-scheduler/src/index_mapper.rs +++ b/index-scheduler/src/index_mapper.rs `@@` -228,13 +228,15 `@@` impl IndexMapper { drop(lock); + std:🧵:sleep_ms(2000); + let current_size = index.map_size()?; let closing_event = index.prepare_for_closing(); - log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); closing_event.wait(); - log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); let index_path = self.base_path.join(uuid.to_string()); let index = self.create_or_open_index(&index_path, None, 2 * current_size)?; `@@` -268,8 +270,10 `@@` impl IndexMapper { match index { Some(Available(index)) => break index, Some(BeingResized(ref resize_operation)) => { + log::error!("waiting for resize end"); // Deadlock: no lock taken while doing this operation. resize_operation.wait(); + log::error!("trying our luck again!"); continue; } Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())), diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs index 11b17d05..242dc095 100644 --- a/index-scheduler/src/lib.rs +++ b/index-scheduler/src/lib.rs `@@` -908,6 +908,7 `@@` impl IndexScheduler { /// /// Returns the number of processed tasks. fn tick(&self) -> Result<TickOutcome> { + log::error!("ticking!"); #[cfg(test)] { *self.run_loop_iteration.write().unwrap() += 1; diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs index 050c825a..63f312f6 100644 --- a/meilisearch/src/main.rs +++ b/meilisearch/src/main.rs `@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> { #[actix_web::main] async fn main() -> anyhow::Result<()> { - let (opt, config_read_from) = Opt::try_build()?; + let (mut opt, config_read_from) = Opt::try_build()?; setup(&opt)?; `@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token): _ => (), } + opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap(); + let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?; #[cfg(all(not(debug_assertions), feature = "analytics"))] ``` </details> Mainly, these debug changes do the following: - Set the default index size to 1MiB so that index resizes are initially frequent - Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos) - Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests) ## Open questions - Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB. ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! 3470: Autobatch addition and deletion r=irevoire a=irevoire This PR adds the capability to meilisearch to batch document addition and deletion together. Fix https://github.com/meilisearch/meilisearch/issues/3440 -------------- Things to check before merging; - [x] What happens if we delete multiple time the same documents -> add a test - [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:00:19 +00:00
Many the fish	119e6d8811	Update milli/src/search/mod.rs Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:33:10 +01:00
Louis Dureuil	eb28d4c525	add facet test	2023-02-20 13:52:28 +01:00
Louis Dureuil	9ac981d025	Remove some clippy type complexity warns by deboxing iters	2023-02-20 13:52:27 +01:00
Louis Dureuil	74859ecd61	Add min and max facet stats	2023-02-20 13:52:27 +01:00
Louis Dureuil	8ae441a4db	Update usage of iterators	2023-02-20 13:52:27 +01:00
Louis Dureuil	042d86cbb3	facet sort ascending/descending now also return the values	2023-02-20 13:52:27 +01:00
Tamo	18796d6e6a	Consider null as a valid geo object	2023-02-20 13:45:51 +01:00
Tamo	895ab2906c	apply review suggestions	2023-02-16 18:42:47 +01:00
bors[bot]	143e3cf948	Merge #3490 3490: Fix attributes set candidates r=curquiza a=ManyTheFish # Pull Request Fix attributes set candidates for v1.1.0 ## details The attribute criterion was not returning the remaining candidates when its internal algorithm was been exhausted. We had a loss of candidates by the attribute criterion leading to the bug reported in the issue linked below. After some investigation, it seems that it was the only criterion that had this behavior. We are now returning the remaining candidates instead of an empty bitmap. ## Related issue Fixes #3483 PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777 Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-02-15 17:38:07 +00:00
bors[bot]	91ce8a5e67	Merge #3492 3492: Bump deserr r=Kerollmops a=irevoire Bump deserr to the latest version; - We now use the default actix-web extractors that deserr provides (which were copy/pasted from meilisearch) - We also use the default `JsonError` message provided by deserr instead of defining our own in meilisearch - Finally, we get the new `did you mean?` error message. Fix #3493 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-15 10:05:05 +00:00
Tamo	8fb7b1d10f	bump deserr	2023-02-14 20:04:30 +01:00
Tamo	74dcfe9676	Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform	2023-02-14 19:09:40 +01:00
Tamo	1b1703a609	make a small optimization to merge obkvs a little bit faster	2023-02-14 18:32:41 +01:00
Tamo	fb5e4957a6	fix and test the early exit in case a grenad ends with a deletion	2023-02-14 18:23:57 +01:00
Tamo	8de3c9f737	Update milli/src/update/index_documents/transform.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-02-14 17:57:14 +01:00
Tamo	43a19d0709	document the operation enum + the grenads	2023-02-14 17:55:26 +01:00
Filip Bachul	a53536836b	fmt	2023-02-14 17:04:22 +01:00
Filip Bachul	d7ad39ad77	fix: clippy error	2023-02-14 00:15:35 +01:00

1 2 3 4 5 ...

1168 Commits