meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-03-14 04:41:32 +08:00

Author	SHA1	Message	Date
meili-bors[bot]	e0537c3870	Merge #3720 3720: Change links of docs everywhere r=curquiza a=curquiza Completely fixes #3668 Co-authored-by: curquiza <clementine@meilisearch.com>	2023-05-04 10:07:41 +00:00
meili-bors[bot]	da220294f6	Merge #3639 3639: Add a dedicated error variant for planned failures in index scheduler tests r=Kerollmops a=Sufflope # Pull Request ## Related issue Fixes #3086 ## What does this PR do? - Add a dedicated test variant in test cfg to avoid reusing a misleading existing error ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Jean-Sébastien Bour <jean-sebastien@bour.name>	2023-05-04 09:33:57 +00:00
meili-bors[bot]	78e611f282	Merge #3693 3693: Implement the auto deletion of tasks r=dureuill a=irevoire Fixes https://github.com/meilisearch/meilisearch/issues/3622 This PR should be the definitive fix for #3622. It adds a limit (1M) to the maximum number of tasks the task queue can hold. Once the task queue reaches this limit (1M tasks in the task queue, whatever their status), Meilisearch will schedule a task deletion that tries to delete the oldest 100k tasks. If Meilisearch can't delete 100k tasks because some of them are not yet finished, it will delete as many tasks as possible. Once the limit is reached, you're still able to register new tasks. The engine will only stop you from adding new tasks once [the other hard limit](https://github.com/meilisearch/meilisearch/pull/3659) of 10GiB of tasks is reached (that's between 5M and 15M tasks depending on your workflow). ------- Technically: - We only try to schedule our task deletion when calling the tick function but before creating a new batch. This means we never enqueue a task we're not going to process ~right away. - If our task deletion doesn't delete anything, we don't enqueue it and instead log a warning telling the user that the engine is not working properly. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-05-04 08:30:22 +00:00
Louis Dureuil	d8381eb790	Fix originalFilter	2023-05-04 10:07:59 +02:00
Louis Dureuil	b212aef5db	add one nanosecond to generated filter so as to generate a filter that would have matched the last task to delete	2023-05-04 09:56:48 +02:00
meili-bors[bot]	6bf66f35be	Merge #3721 3721: Use new bors URL of our self hosted bors instance r=curquiza a=curquiza Co-authored-by: curquiza <clementine@meilisearch.com>	2023-05-04 07:53:39 +00:00
Louis Dureuil	52ab114f6c	Fix test on macOS: 50 tasks would result in the test consistently failing on a local macOS machine	2023-05-04 00:06:49 +02:00
Tamo	dcbfecf42c	make the generated filter valid	2023-05-04 00:06:49 +02:00
Tamo	9ca6f59546	Update index-scheduler/src/lib.rs Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-05-04 00:06:49 +02:00
Tamo	aa7537a11e	make the autodeletion work with a fixed number of tasks and update the tests	2023-05-04 00:06:49 +02:00
Tamo	972bb2831c	log when meilisearch needs to delete tasks	2023-05-04 00:06:49 +02:00
Tamo	f9ddd32545	implement the auto-deletion of tasks	2023-05-04 00:06:49 +02:00
curquiza	30edba3497	Update links of the docs	2023-05-03 19:14:57 +02:00
meili-bors[bot]	1afde4fea5	Merge #3542 3542: Refactor of the search algorithms r=dureuill a=loiclec This PR refactors a large part of the search logic (related to https://github.com/meilisearch/meilisearch/issues/3547) - The "query tree" is replaced by a "query graph", which describes the different ways in which the search query can be interpreted and precomputes the word derivations for each query term. - The control flow between the ~criterions~ ranking rules is managed in a single place instead of being independently implemented by each ranking rule. - The set of document candidates is determined greedily from the beginning. It is often referred to as the "universe" in the code. - The ranking rules `proximity`, `attribute`, `typo`, and (maybe) `exactness` are or will be implemented using a K-shortest path graph algorithm. This minimises the number of database and bitmap operations we need to do to compute each ranking rule bucket. It also simplifies the code a lot since a lot of ranking rules will share a large part of their implementation. - Pointers to database values are stored in a cache to avoid searching in the LMDB databases needlessly. - The results of some roaring bitmap operations are also stored in a cache, although we'll need to measure the memory pressure this puts on the system and maybe deactivate this cache later on. - Search requests can be visually logged and debugged in tests. TODO: - [ ] Reintroduce search benchmarks - [x] Implement `disableOnWords` and `disableOnAttributes` settings of typo tolerance - [x] Implement "exhaustive number of hits" - [x] Implement `attribute` ranking rule - [x] Indexing changes: split into `word_fid_docids` and `word_position_docids` (with bucketed position) - [x] Ranking rule implementations - [ ] Implement `exactness` ranking rule - [x] Initial implementation - [ ] Correct implementation when followed by `Words` - [ ] Implement `geosort` ranking rule - [ ] Add tests - [x] Typo tolerance `disableOnWords`/`disableOnAttributes` - [ ] Geosort - [x] Exactness - [ ] Attribute/Position - [ ] Interactions between ranking rules: - [x] Typo/Proximity/Attribute not preceded by Words - [x] Exactness not preceded by Words - [x] Exactness -> Words (+ check universe correctness) - [x] Exactness -> Typo, etc. - [ ] Sort -> Words (performance tests) - [ ] Attribute/Position -> Typo - [ ] Attribute/Position -> Proximity - [x] Typo -> Exactness - [x] Typo -> Proximity - [x] Proximity -> Typo - [x] Words - [x] Typo - [x] Proximity - [x] Sort - [x] Ngrams - [x] Split words - [x] Ngram + Split Words - [x] Term matching strategy - [x] Distinct attribute - [x] Phrase Search - [x] Placeholder search - [x] Highlighter - [x] Limit the number of word derivations in a search query - [x] Compute the initial universe correctly according to the terms matching strategy - [x] Implement placeholder search - [x] Get the list of ranking rules from the settings - [x] Implement `distinct` - [x] Determine what to do when one of `attribute`, `proximity`, `typo`, or `exactness` is placed before `words` - [x] Make sure the correct number of allowed typos is used for each word, including the prefix one - [x] Make sure stop words are treated correctly (e.g. correct position in query graph), including in phrases - [x] Support phrases correctly - [x] Support synonyms - [x] Support split words - [x] Support combination of ngram + split-words (e.g. `whiteh orse` -> `"white horse"`) - [x] Implement `typo` ranking rule - [x] Implement `sort` ranking rule - [x] Use existing `Search` interface to use the new search algorithms - [x] Remove old code Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-03 13:42:51 +00:00
Louis Dureuil	f8f190cd40	Update exactness tests following charabia camelCase tokenization	2023-05-03 14:45:09 +02:00
Louis Dureuil	3a408e8287	Increase map size for tests following charabia camelCase tokenization	2023-05-03 14:44:48 +02:00
Louis Dureuil	d3e5b10e23	fix nb of dbs	2023-05-03 14:11:20 +02:00
Louis Dureuil	1aaf24ccbf	Cargo fmt	2023-05-03 12:21:58 +02:00
Louis Dureuil	90bc230820	Merge remote-tracking branch 'origin/main' into search-refactor Conflicts and their resolution: Cargo.lock: added mimalloc; Cargo.toml: took origin/main version; milli/src/search/criteria/exactness.rs: deleted after checking it was only clippy changes; milli/src/search/query_tree.rs: deleted after checking it was only clippy changes	2023-05-03 12:19:06 +02:00
Louis Dureuil	342c4ff85d	geosort: Remove rtree unwrap	2023-05-03 09:52:16 +02:00
Tamo	c85392ce40	make the descendent geosort fast	2023-05-03 09:13:12 +02:00
Tamo	8875d24a48	deserialize the rtree only when it's needed, and keep it in memory once it has been deserialized	2023-05-03 09:13:12 +02:00
Tamo	c470b67fa2	revamp the test to use execute_iterative_and_rtree_returns_the_same	2023-05-03 09:13:12 +02:00
meili-bors[bot]	c0e081cd98	Merge #3702 #3710 3702: Update charabia v0.7.2 r=curquiza a=ManyTheFish fixes #3701 fixes #3689 fixes #3285 3710: Updated messages pointing to the docs website r=curquiza a=roy9495 # Pull Request Partially fixes #3668 ## What does this PR do? - ...Any messages referencing the docs site https://docs.meilisearch.com have been changed to point to https://meilisearch.com/docs. Thanks. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>	2023-05-02 17:27:57 +00:00
Louis Dureuil	b60840ebff	Remove self.iterating from words	2023-05-02 18:54:23 +02:00
Louis Dureuil	fdc1763838	Use MultiOps for resolve_query_graph	2023-05-02 18:54:09 +02:00
Louis Dureuil	75819bc940	Remove too many arguments on resolve_maximally_reduced_query_graph	2023-05-02 18:53:40 +02:00
Louis Dureuil	7b8cc25625	rename located_query_terms_from_string -> located_query_terms_from_tokens	2023-05-02 18:53:01 +02:00
meili-bors[bot]	2be641f373	Merge #3718 3718: Fix broken README links r=curquiza a=Kerollmops This PR fixes #3708 by changing the link to the new SDKs and API Reference pages. I would like to thank `@Tommy-42,` who also found the issue. Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-05-02 16:23:38 +00:00
curquiza	ddcb661c19	Use new bors URL of our self hosted instance	2023-05-02 18:20:12 +02:00
Jean-Sébastien Bour	d09b771bce	Add a dedicated error variant for planned failures in index scheduler tests Fixes #3086	2023-05-02 14:37:20 +02:00
Clément Renault	d89d2efb7e	Change the text of a link	2023-05-02 13:53:36 +02:00
Clément Renault	f284a9c0dd	Fix the README.md broken links	2023-05-02 13:51:50 +02:00
bors[bot]	134e7fc433	Merge #3709 3709: Add SDKs test in a CI r=Kerollmops a=curquiza Add a CI running every week to run the `nightly` docker image of Meilisearch with the most "strategic" SDKs (most used, well tested, strongly typed SDK) - meilisearch-js - instant-meilisearch - meilisearch-php - meilisearch-python - meilisearch-go - meilisearch-ruby - meilisearch-rust Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2023-05-02 11:22:09 +00:00
Clémentine Urquizar	0cba919228	Add SDKs test in a CI	2023-05-02 11:53:28 +02:00
Loïc Lecrenier	aa63091752	Fix bug in exact_attribute	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	58735d6d8f	Fix outdated relevancy test	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	1b514517f5	Fix bug in computation of query term at a position	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	11f814821d	Minor cleanup	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	30fb1153cc	Speed up graph based ranking rule when a lot of different costs exist	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	3b2c8b9f25	Improve performance of position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	2a7f9adf78	Build query graph more correctly from paths Update snapshots	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	608ceea440	Fix bug in position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	79001b9c97	Improve performance of the cheapest path finder algorithm	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	59b12fca87	Fix errors, clippy warnings, and add review comments	2023-04-29 11:48:11 +02:00
Loïc Lecrenier	48f5bb1693	Implements the geo-sort ranking rule	2023-04-29 11:02:16 +02:00
Loïc Lecrenier	93188b3c88	Fix indexing of word_prefix_fid_docids	2023-04-29 10:56:48 +02:00
Loïc Lecrenier	bc4efca611	Add more tests for the attribute ranking rule	2023-04-29 10:56:48 +02:00
TATHAGATA ROY	feaf25a95d	Updated messages pointing to the docs website	2023-04-28 20:52:03 +00:00
bors[bot]	414b3fae89	Merge #3571 3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops # Pull Request ## Related issue This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) describes in detail. ## What does this PR do? ### `IS NULL` and `IS NOT NULL` This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter. - `IS NULL` matches fields that `EXISTS` AND `= IS NULL` - `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL` 1. `{"name": "A", "price": null}` 2. `{"name": "A", "price": 10}` 3. `{"name": "A"}` `price IS NULL` would match 1 `price IS NOT NULL` or `NOT price IS NULL` would match 2,3 `price EXISTS` would match 1, 2 `price NOT EXISTS` or `NOT price EXISTS` would match 3 Common query: `(price EXISTS) AND (price IS NOT NULL)` would match 2 ### `IS EMPTY` and `IS NOT EMPTY` - `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty - `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty. 1. `{"name": "A", "tags": null}` 2. `{"name": "A", "tags": [null]}` 3. `{"name": "A", "tags": []}` 4. `{"name": "A", "tags": ["hello","world"]}` 5. `{"name": "A", "tags": [""]}` 6. `{"name": "A"}` 7. `{"name": "A", "tags": {}}` 8. `{"name": "A", "tags": {"t1":"v1"}}` 9. `{"name": "A", "tags": {"t1":""}}` 10. `{"name": "A", "tags": ""}` `tags IS EMPTY` would match 3,7,10 `tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9 `tags IS NULL` would match 1 `tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10 `tags EXISTS` would match 1,2,3,4,5,7,8,9,10 `tags NOT EXISTS` or `NOT tags EXISTS` would match 6 Common query: `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9 ## What should the reviewer do? - Check that I tested the filters - Check that I deleted the ids of the documents when deleting documents Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-04-27 13:14:00 +00:00
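The `null`/empty semantics spelled out in #3571 form a small truth table that can be checked mechanically. The Rust sketch below restates that table. It is not the filter implementation from the PR.

The model makes a few assumptions. A document field is represented as `Option<&Value>`, where `None` means the key is absent. `Value` is a hypothetical minimal enum, not serde_json's or Meilisearch's type. `exists`, `is_null`, `is_empty` and `matching` are illustrative names.

```rust
/// Minimal stand-in for a JSON value (hypothetical; not serde_json's or Meilisearch's type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>), // key/value pairs; a map type is not needed here
}

/// `field EXISTS`: the key is present, whatever its value (including `null`).
fn exists(field: Option<&Value>) -> bool {
    field.is_some()
}

/// `field IS NULL`: the key is present and its value is `null`.
fn is_null(field: Option<&Value>) -> bool {
    matches!(field, Some(Value::Null))
}

/// `field IS EMPTY`: the key is present and holds `[]`, `{}` or `""`.
fn is_empty(field: Option<&Value>) -> bool {
    match field {
        Some(Value::Array(items)) => items.is_empty(),
        Some(Value::Object(entries)) => entries.is_empty(),
        Some(Value::String(s)) => s.is_empty(),
        _ => false, // absent, null and numbers are never "empty"
    }
}

/// 1-based ids of the documents whose field satisfies `pred`.
/// `NOT` variants are plain negations, e.g. `|f| !is_null(f)` for `IS NOT NULL`.
fn matching(docs: &[Option<Value>], pred: impl Fn(Option<&Value>) -> bool) -> Vec<usize> {
    let mut ids = Vec::new();
    for (i, doc) in docs.iter().enumerate() {
        if pred(doc.as_ref()) {
            ids.push(i + 1);
        }
    }
    ids
}
```

Applying `matching` to the ten `tags` documents from the PR reproduces the lists quoted there. For example, `is_empty` yields `[3, 7, 10]`, and `|f| exists(f) && !is_null(f) && !is_empty(f)` yields `[2, 4, 5, 8, 9]`. The point of the model is that `IS NOT NULL` and `IS NOT EMPTY` also match absent fields, which is why the spec's "common query" combines them with `EXISTS`.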


7937 Commits
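The task auto-deletion policy described in #3693 (merge 78e611f282) can be sketched as a pure function. This is a simplified model, not the index-scheduler code. `Status`, `Task`, `plan_auto_deletion` and `plan_with_default_limits` are hypothetical names, and the sketch assumes the following:

- Task uids increase with enqueue order, so a lower uid means an older task.
- "Delete as many tasks as possible" means taking the oldest finished tasks, up to the batch size.

```rust
/// Simplified task status (hypothetical model using Meilisearch's public status names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Enqueued,
    Processing,
    Succeeded,
    Failed,
    Canceled,
}

/// Hypothetical task record holding only what the policy needs.
#[derive(Debug, Clone)]
struct Task {
    uid: u32, // assumed to increase with enqueue order
    status: Status,
}

/// Limits quoted in PR #3693.
const MAX_TASKS: usize = 1_000_000;
const TASKS_TO_DELETE: usize = 100_000;

/// Returns the uids an automatic deletion would target, or `None` when no
/// deletion should be enqueued.
fn plan_auto_deletion(queue: &[Task], max_tasks: usize, batch: usize) -> Option<Vec<u32>> {
    // The limit counts tasks of every status; below it, nothing happens.
    if queue.len() < max_tasks {
        return None;
    }
    // Only finished tasks can be deleted.
    let mut finished: Vec<&Task> = queue
        .iter()
        .filter(|t| matches!(t.status, Status::Succeeded | Status::Failed | Status::Canceled))
        .collect();
    // Oldest first, then keep at most `batch` of them.
    finished.sort_by_key(|t| t.uid);
    let uids: Vec<u32> = finished.into_iter().take(batch).map(|t| t.uid).collect();
    if uids.is_empty() {
        // The PR logs a warning instead of enqueuing an empty deletion;
        // eprintln! stands in for the real logger here.
        eprintln!("warning: task queue is full but no finished task can be deleted");
        return None;
    }
    Some(uids)
}

/// Convenience wrapper using the limits quoted in the PR.
fn plan_with_default_limits(queue: &[Task]) -> Option<Vec<u32>> {
    plan_auto_deletion(queue, MAX_TASKS, TASKS_TO_DELETE)
}
```

The limit counts every task, but only finished tasks can be deleted. A queue at the limit that holds no finished task therefore yields `None`, which mirrors the PR's rule of never enqueuing a deletion that would delete nothing.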