Commit Graph

10451 Commits

Author SHA1 Message Date
meili-bors[bot]
d963b5f85a
Merge #3792
3792: fix the type of the document deletion by filter tasks r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3791

## What does this PR do?
- Hide the deleteDocumentByFilter internal type from the users.


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-30 18:20:28 +00:00
Tamo
2acc3ec5ee
fix the type of the document deletion by filter tasks 2023-05-30 15:18:52 +02:00
Kerollmops
da04edff8c
Better use deserialize_unchecked_from to reduce the deserialization time 2023-05-30 14:58:30 +02:00
Tamo
85a80f4f4c
move the grafana dashboard to the assets directory and upload a basic prometheus scraper to help new users 2023-05-29 18:39:34 +02:00
Tamo
1213ec7164
update the dashboard once again 2023-05-29 18:37:55 +02:00
Tamo
f03d99690d
run the indexing fuzzer on every merge for as long as possible 2023-05-29 14:56:15 +02:00
meili-bors[bot]
0a7817a002
Merge #3786
3786: Consistently use wrapping add to avoid overflow in debug when query s… r=dureuill a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3785

## What does this PR do?
- Some of the code paths would erroneously use the default addition operator, whose semantics are "overflow is an error, checked at runtime in debug", instead of the intended "overflow is expected" semantics that this code relies on (it uses `u16::MAX` as a sentinel). This PR switches to the wrapping add operator everywhere.
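
For illustration, a minimal sketch (not the actual milli code) of the difference between the two operators when `u16::MAX` is used as a sentinel:

```rust
fn main() {
    // Hypothetical sentinel value, as described above: u16::MAX means "no position".
    let sentinel: u16 = u16::MAX;

    // `sentinel + 1` would panic in a debug build ("attempt to add with overflow"),
    // because the default `+` treats overflow as an error when debug assertions are on.
    // `wrapping_add` makes the wrap-around explicit and intentional:
    let next = sentinel.wrapping_add(1);
    assert_eq!(next, 0); // wraps around to 0 instead of panicking
}
```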

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-29 12:39:54 +00:00
Tamo
23a5b45ebf
drop the old fuzz file 2023-05-29 14:02:37 +02:00
Tamo
46fa99f486
make the fuzzer stop if an error occurs 2023-05-29 13:44:32 +02:00
Tamo
67a583bedf
handle the panic happening in milli 2023-05-29 13:39:26 +02:00
Tamo
99e9057684
rename the indexing fuzzer to fuzz-indexing so it doesn't collide with other binary names when called from the root of the workspace 2023-05-29 13:07:06 +02:00
Tamo
8d40d300a5
rename the fuzzer to indexing 2023-05-29 12:37:24 +02:00
Tamo
6c6387d05e
move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
Louis Dureuil
1dfc4038ab
Add test that fails before PR and passes now 2023-05-29 11:58:26 +02:00
Louis Dureuil
73198179f1
Consistently use wrapping add to avoid overflow in debug when query starts with a separator 2023-05-29 11:54:12 +02:00
Tamo
51dce9e9d1
improve the dashboard slightly 2023-05-25 18:33:01 +02:00
Tamo
c9b65677bf
return the on disk size actually used by meilisearch 2023-05-25 18:30:30 +02:00
Tamo
35d5556f1f
prefix all the metrics by meilisearch_ 2023-05-25 17:41:53 +02:00
Tamo
c433bdd1cd add a view for the task queue in the metrics 2023-05-25 12:58:13 +02:00
curquiza
2db09725f8 Improve SDK CI to choose the Docker image 2023-05-25 12:22:35 +02:00
meili-bors[bot]
fdb23132d4
Merge #3781
3781: Revert "Improve docker cache" r=Kerollmops a=curquiza

Reverts meilisearch/meilisearch#3566 because it does not work as expected, and so I want to remove useless complexity from the CI and Dockerfile

Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
2023-05-25 09:57:40 +00:00
Clémentine U. - curqui
11b95284cd
Revert "Improve docker cache" 2023-05-25 11:48:26 +02:00
Tamo
1b601f70c6 increase the bucketing of requests 2023-05-25 11:08:16 +02:00
meili-bors[bot]
8185731bbf
Merge #3779
3779: Add a cron test with disabled tokenization (with @roy9495) r=Kerollmops a=curquiza

Replaces https://github.com/meilisearch/meilisearch/pull/3746 because of a bors issue

Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>
Co-authored-by: Clémentine U. - curqui <clementine@meilisearch.com>
2023-05-25 08:13:14 +00:00
Clémentine U. - curqui
840727d76f Update .github/workflows/test-suite.yml 2023-05-25 10:07:59 +02:00
Clémentine U. - curqui
ead07d0b9d Update .github/workflows/test-suite.yml 2023-05-25 10:07:52 +02:00
Clémentine U. - curqui
44f231d41e Update .github/workflows/test-suite.yml 2023-05-25 10:07:45 +02:00
TATHAGATA ROY
3c5d1c93de Added a cron test for disabled all-tokenization 2023-05-25 10:07:32 +02:00
meili-bors[bot]
087866d59f
Merge #3775
3775: Last error code changes on the new get/delete documents routes r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes #3774

## What does this PR do?
Following the specification: https://github.com/meilisearch/specifications/pull/236

1. Get rid of the `invalid_document_delete_filter` and always use the `invalid_document_filter`
2. Introduce a new `missing_document_filter` instead of returning `invalid_document_delete_filter` (that’s consistent with all the other routes that have a mandatory parameter)
3. Always return the `original_filter` in the details (potentially set to `null`) instead of hiding it if it wasn’t used
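
As a purely illustrative sketch of that intent (hypothetical payloads built with `serde_json`; only the error code and the `originalFilter` field name come from this PR, the rest is assumed):

```rust
use serde_json::json;

fn main() {
    // Hypothetical error body when the mandatory filter is missing (point 2 above);
    // the message and link are assumptions, only the code is taken from this PR.
    let missing_filter_error = json!({
        "message": "A filter is required for this route.",
        "code": "missing_document_filter",
        "type": "invalid_request",
        "link": "https://docs.meilisearch.com/errors#missing_document_filter"
    });

    // Hypothetical task details when no filter was used (point 3 above):
    // `originalFilter` is returned explicitly as null instead of being omitted.
    let details = json!({ "deletedDocuments": 2, "originalFilter": null });

    println!("{missing_filter_error}\n{details}");
}
```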


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-24 10:07:41 +00:00
Tamo
9111f5176f get rid of the invalid document delete filter in favor of the invalid document filter 2023-05-24 11:53:16 +02:00
Tamo
b9dd092a62 make the details return null in the originalFilter field if no filter was provided + add a big test on the details 2023-05-24 11:48:22 +02:00
Tamo
ca99bc3188 implement the missing document filter error code when deleting documents 2023-05-24 11:29:20 +02:00
Tamo
57d53de402 Increase the number of buckets 2023-05-24 10:47:15 +02:00
meili-bors[bot]
2e49d6aec1
Merge #3768
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the only purpose of `words` is to remove nodes from the query graph, so it would never do any useful work when the matching strategy is `All`. Remember that the universe was already computed beforehand, from all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule. 

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.

Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.
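
A rough sketch of that idea, with hypothetical types and names (the real `traverse_breadth_first_backward` in milli is more involved):

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical sketch: walk the graph backwards from `start`, recomputing a
/// node's cost only once every affected successor has already been recomputed.
/// Assumes an acyclic graph in which every affected node is reachable backwards
/// from `start`; nodes that are not ready yet are pushed back and retried.
fn traverse_breadth_first_backward_sketch(
    predecessors: &[Vec<usize>],         // predecessors[n]: nodes with an edge into n
    successors: &[Vec<usize>],           // successors[n]: nodes that n has an edge into
    start: usize,
    is_affected: impl Fn(usize) -> bool, // was this node's cost invalidated?
    mut update_cost: impl FnMut(usize),
) {
    let mut done: HashSet<usize> = HashSet::from([start]);
    let mut queue: VecDeque<usize> = predecessors[start].iter().copied().collect();

    while let Some(node) = queue.pop_front() {
        if done.contains(&node) {
            continue;
        }
        // Only update this node once all of its affected successors are done.
        let ready = successors[node].iter().all(|&s| done.contains(&s) || !is_affected(s));
        if !ready {
            queue.push_back(node);
            continue;
        }
        update_cost(node);
        done.insert(node);
        queue.extend(predecessors[node].iter().copied());
    }
}
```

With this readiness check, a node like `start` is deferred until its successor `the` has been updated, which yields the correct order from the example above.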


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
Louis Dureuil
51043f78f0
Remove trailing whitespace 2023-05-23 15:27:25 +02:00
Louis Dureuil
a490a11325
Add explanatory comment on the way we're recomputing costs 2023-05-23 15:24:24 +02:00
Tamo
002f42875f fix the fuzzer 2023-05-23 11:42:40 +02:00
Tamo
22213dc604
push the fuzzer 2023-05-23 09:14:26 +02:00
Tamo
602ad98cb8 improve the way we handle the fsts 2023-05-22 11:15:14 +02:00
Tamo
7f619ff0e4 get rids of the now unused soft_deletion_used parameter 2023-05-22 10:33:49 +02:00
Tamo
4391cba6ca
fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Tamo
d7ddf4925e
Revert "Disable autobatching of additions and deletions"
This reverts commit a94e78ffb0.
2023-05-17 14:25:50 +02:00
meili-bors[bot]
101f5a20d2
Merge #3757
3757: Adjust the cost of edges in the `position` ranking rule by bucketing positions more aggressively r=loiclec a=loiclec

This PR significantly improves the performance of the `position` ranking rule when:
1. a query contains many words
2. the `position` ranking rule needs to be called many times
3. the score of the documents according to `position` is high

These conditions greatly increase:
1. the number of edge traversals that are needed to find a valid path from the `start` node to the `end` node
2. the number of edges that need to be deleted from the graph, and therefore the number of times that we need to recompute all the possible costs from START to END

As a result, a majority of the search time is spent in `visit_condition`, `visit_node`, and `update_all_costs_before_node`. This is frustrating because it often happens when the "universe" given to the rule consists of only a handful of document ids.

By limiting the number of possible edges between two nodes from `20` to `10`, we:
1. reduce the number of possible costs from START to END
2. reduce the number of edges that will be deleted 
3. make it faster to update the costs after deleting an edge
4. reduce the number of buckets that need to be computed

In terms of relevancy, I don't think we lose or gain much. We still prefer terms that are in lower positions, with decreasing precision as we go further. The previous bucketing wasn't chosen in a principled way, and neither is this one. They both "feel" right to me.
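
For illustration only, a hypothetical bucketing function in this spirit (the exact thresholds used by milli are not shown here): low positions keep full precision and higher positions collapse into coarser buckets, so at most ten distinct edge costs remain between two nodes.

```rust
/// Hypothetical position bucketing: precise for low positions, increasingly
/// coarse further away, yielding at most 10 distinct buckets (costs).
fn position_to_bucket(position: u16) -> u16 {
    match position {
        0..=3 => position,                // buckets 0..=3: full precision
        4..=7 => 4 + (position - 4) / 2,  // buckets 4..=5
        8..=15 => 6 + (position - 8) / 4, // buckets 6..=7
        16..=31 => 8,                     // bucket 8
        _ => 9,                           // bucket 9: everything further away
    }
}

fn main() {
    for p in [0u16, 3, 5, 9, 14, 20, 1000] {
        println!("position {p:4} -> bucket {}", position_to_bucket(p));
    }
}
```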


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
2023-05-17 11:43:59 +00:00
meili-bors[bot]
6ce1ce77e6
Merge #3738
3738: Add analytics on the get documents resource r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234

## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`

These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`

Both events share the same payload:

| Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of requests received in this batch | 325 |
| `per_document_id` | `false` | false |
| `per_filter` | `true` if `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |
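
A minimal sketch of how such a batch aggregate could be folded together (hypothetical struct and field names mirroring the table above, not the actual analytics types):

```rust
/// Hypothetical aggregate for one batch of "Documents Fetched" events.
#[derive(Debug, Default)]
struct DocumentsFetchedAggregate {
    total_received: usize,  // requests.total_received
    per_document_id: bool,  // assumed: true if a fetch by document id happened in the batch
    per_filter: bool,       // true if a filter was used on the POST fetch route
    max_limit: usize,       // pagination.max_limit
    max_offset: usize,      // pagination.max_offset
}

impl DocumentsFetchedAggregate {
    /// Fold one request's parameters into the running batch.
    fn aggregate(&mut self, by_document_id: bool, with_filter: bool, limit: usize, offset: usize) {
        self.total_received += 1;
        self.per_document_id |= by_document_id;
        self.per_filter |= with_filter;
        self.max_limit = self.max_limit.max(limit);
        self.max_offset = self.max_offset.max(offset);
    }
}

fn main() {
    let mut batch = DocumentsFetchedAggregate::default();
    batch.aggregate(false, true, 60, 1000);
    batch.aggregate(true, false, 20, 0);
    println!("{batch:?}");
}
```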

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-16 19:37:41 +00:00
Loïc Lecrenier
ec8f685d84 Fix bug in cheapest path algorithm 2023-05-16 17:01:30 +02:00
Loïc Lecrenier
5758268866 Don't compute split_words for phrases 2023-05-16 17:01:18 +02:00
meili-bors[bot]
4d037e6693
Merge #3759
3759: Invalid error code when parsing filters r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3753

## What does this PR do?
Fix the error code when the error comes from evaluating the filter on the get, fetch, and delete documents routes.


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-16 12:55:06 +00:00
Tamo
96da5130a4
fix the error code in case of non-filterable attributes on the get / delete documents by filter routes 2023-05-16 13:56:18 +02:00
Loïc Lecrenier
3e19702de6 Update snapshot tests 2023-05-16 12:22:46 +02:00
meili-bors[bot]
1e762d151f
Merge #3755
3755: Re-add final dot r=curquiza a=ManyTheFish

I removed the final dot of the error message in my last PR; this one re-adds it.

related to https://github.com/meilisearch/meilisearch/pull/3749

> Oops 😬

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-05-16 10:10:58 +00:00