Commit Graph

7923 Commits

Author SHA1 Message Date
Louis Dureuil
05cc463fbc
Draft implementation of filter support for /delete-by-batch route 2023-05-03 17:41:48 +02:00
meili-bors[bot]
1afde4fea5
Merge #3542
3542: Refactor of the search algorithms r=dureuill a=loiclec

This PR refactors a large part of the search logic (related to https://github.com/meilisearch/meilisearch/issues/3547)

- The "query tree" is replaced by a "query graph", which describes the different ways in which the search query can be interpreted and precomputes the word derivations for each query term. Example:

<img width="1162" alt="Screenshot 2023-02-27 at 10 26 50" src="https://user-images.githubusercontent.com/6040237/221525270-87917cc0-60d1-473f-847f-2c5a7de9e370.png">

- The control flow between the ~criterions~ ranking rules is managed in a single place instead of being independently implemented by each ranking rule.

- The set of document candidates is determined greedily from the beginning. It is often referred as the "universe" in the code.

- The ranking rules  `proximity`, `attribute`, `typo`, and (maybe) `exactness` are or will be implemented using a K-shortest path graph algorithm. This minimises the number of database and bitmap operations we need to do to compute each ranking rule bucket. It also simplifies the code a lot since a lot of ranking rules will share a large part of their implementation.

- Pointers to database values are stored in a cache to avoid searching in the LMDB databases needlessly.

- The result of some roaring bitmap operations are also stored in a cache, although we'll need to measure the memory pressure this puts on the system and maybe deactivate this cache later on.

- Search requests can be visually logged and debugged in tests.

TODO:
- [ ] Reintroduce search benchmarks
- [x] Implement `disableOnWords` and `disableOnAttributes` settings of typo tolerance
- [x] Implement "exhaustive number of hits
- [x] Implement `attribute` ranking rule
   - [x] Indexing changes: split into `word_fid_docids` and `word_position_docids` (with bucketed position)
   - [x] Ranking rule implementations
- [ ] Implement `exactness` ranking rule
  - [x] Initial implementation
  - [ ] Correct implementation when followed by `Words`
- [ ] Implement `geosort` ranking rule
- [ ] Add tests
   - [x] Typo tolerance `disableOnWords`/`disableOnAttributes`
   - [ ] Geosort
   - [x] Exactness
   - [ ] Attribute/Position
   - [ ] Interactions between ranking rules:
     - [x] Typo/Proximity/Attribute not preceded by Words
     - [x] Exactness not preceded by Words
     - [x] Exactness -> Words (+ check universe correctness)
     - [x] Exactness -> Typo, etc.
     - [ ] Sort -> Words (performance tests)
     - [ ] Attribute/Position -> Typo
     - [ ] Attribute/Position -> Proximity
     - [x] Typo -> Exactness 
     - [x] Typo -> Proximity
     - [x] Proximity -> Typo
   - [x] Words 
   - [x] Typo
   - [x] Proximity
   - [x] Sort
   - [x] Ngrams
   - [x] Split words
   - [x] Ngram + Split Words
   - [x] Term matching strategy
   - [x] Distinct attribute
   - [x] Phrase Search
   - [x] Placeholder search
   - [x] Highlighter 
- [x] Limit the number of word derivations in a search query
- [x] Compute the initial universe correctly according to the terms matching strategy
- [x] Implement placeholder search
- [x] Get the list of ranking rules from the settings 
- [x] Implement `distinct`
- [x] Determine what to do when one of `attribute`, `proximity`, `typo`, or `exactness` is placed before `words`
- [x] Make sure the correct number of allowed typos is used for each word, including the prefix one
- [x] Make sure stop words are treated correctly (e.g. correct position in query graph), including in phrases
- [x] Support phrases correctly
- [x] Support synonyms
- [x] Support split words
- [x] Support combination of ngram + split-words (e.g. `whiteh orse` -> `"white horse"`)
- [x] Implement `typo` ranking rule
- [x] Implement `sort` ranking rule
- [x] Use existing `Search` interface to use the new search algorithms
- [x] Remove old code


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-03 13:42:51 +00:00
Louis Dureuil
f8f190cd40
Update exactness tests following charabia camelCase tokenization 2023-05-03 14:45:09 +02:00
Louis Dureuil
3a408e8287
Increase map size for tests following charabia camelCase tokenization 2023-05-03 14:44:48 +02:00
Louis Dureuil
d3e5b10e23
fix nb of dbs 2023-05-03 14:11:20 +02:00
Louis Dureuil
1aaf24ccbf
Cargo fmt 2023-05-03 12:21:58 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
Louis Dureuil
342c4ff85d
geosort: Remove rtree unwrap 2023-05-03 09:52:16 +02:00
Tamo
c85392ce40
make the descendent geosort fast 2023-05-03 09:13:12 +02:00
Tamo
8875d24a48
deserialize the rtree only when its needed, and keep it in memory once it has been deserialized 2023-05-03 09:13:12 +02:00
Tamo
c470b67fa2
revamp the test to use execute_iterative_and_rtree_returns_the_same 2023-05-03 09:13:12 +02:00
meili-bors[bot]
c0e081cd98
Merge #3702 #3710
3702: Update charabia v0.7.2 r=curquiza a=ManyTheFish

fixes #3701
fixes #3689
fixes #3285 

3710: Updated messages pointing to the docs website r=curquiza a=roy9495

# Pull Request

Fixes partially #3668

## What does this PR do?
- ...Any messages referencing this docs site https://docs.meilisearch.com has been changed to this docs site https://meilisearch.com/docs .
 Thanks.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>
2023-05-02 17:27:57 +00:00
Louis Dureuil
b60840ebff
Remove self.iterating from words 2023-05-02 18:54:23 +02:00
Louis Dureuil
fdc1763838
Use MultiOps for resolve_query_graph 2023-05-02 18:54:09 +02:00
Louis Dureuil
75819bc940
Remove too many arguments on resolve_maximally_reduced_query_graph 2023-05-02 18:53:40 +02:00
Louis Dureuil
7b8cc25625
rename located_query_terms_from_string -> located_query_terms_from_tokens 2023-05-02 18:53:01 +02:00
meili-bors[bot]
2be641f373
Merge #3718
3718: Fix broken README links r=curquiza a=Kerollmops

This PR fixes #3708 by changing the link to the new SDKs and API Reference pages. I would like to thank `@Tommy-42,` who also found the issue.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-05-02 16:23:38 +00:00
Clément Renault
d89d2efb7e
Change a the text of a link 2023-05-02 13:53:36 +02:00
Clément Renault
f284a9c0dd
Fix the README.md broken links 2023-05-02 13:51:50 +02:00
bors[bot]
134e7fc433
Merge #3709
3709: Add SDKs test in a CI r=Kerollmops a=curquiza

Add a CI running every week to run the `nightly` docker image of Meilisearch with the most "strategic" SDKs (most used, well tested, strongly typed SDK)
- meilisearch-js
- instant-meilisearch
- meilisearch-php
- meilisearch-python
- meilisearch-go
- meilisearch-ruby
- meilisearch-rust

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2023-05-02 11:22:09 +00:00
Clémentine Urquizar
0cba919228 Add SDKs test in a CI 2023-05-02 11:53:28 +02:00
Loïc Lecrenier
aa63091752 Fix bug in exact_attribute 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
58735d6d8f Fix outdated relevancy test 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
1b514517f5 Fix bug in computation of query term at a position 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
11f814821d Minor cleanup 2023-05-02 10:48:32 +02:00
Loïc Lecrenier
30fb1153cc Speed up graph based ranking rule when a lot of different costs exist 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
3b2c8b9f25 Improve performance of position rr 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
2a7f9adf78 Build query graph more correctly from paths
Update snapshots
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
608ceea440 Fix bug in position rr 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
79001b9c97 Improve performance of the cheapest path finder algorithm 2023-05-02 09:59:42 +02:00
Loïc Lecrenier
59b12fca87 Fix errors, clippy warnings, and add review comments 2023-04-29 11:48:11 +02:00
Loïc Lecrenier
48f5bb1693 Implements the geo-sort ranking rule 2023-04-29 11:02:16 +02:00
Loïc Lecrenier
93188b3c88 Fix indexing of word_prefix_fid_docids 2023-04-29 10:56:48 +02:00
Loïc Lecrenier
bc4efca611 Add more tests for the attribute ranking rule 2023-04-29 10:56:48 +02:00
TATHAGATA ROY
feaf25a95d Updated messages pointing to the docs website 2023-04-28 20:52:03 +00:00
bors[bot]
414b3fae89
Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Loïc Lecrenier
899baa0ea5 Update forgotten snapshot from previous commit 2023-04-27 13:43:04 +02:00
Loïc Lecrenier
374095d42c Add tests for stop words and fix a couple of bugs 2023-04-27 13:30:09 +02:00
Loïc Lecrenier
dd007dceca
Merge pull request #3703 from meilisearch/search-refactor-test-typo-tolerance
Search refactor test typo tolerance + some bugfixes
2023-04-27 11:01:35 +02:00
bors[bot]
3ae587205c
Merge #3464
3464: Remove CLI changes for clippy r=curquiza a=dureuill

# Pull Request

## Related issue

Reverts #3434, which was linked to https://github.com/rust-lang/rust-clippy/issues/10087, as putting the lint in the pedantic group [is being uplifted to Rust 1.67.1](https://github.com/rust-lang/rust/pull/107743#issue-1573438821) (my thanks to everyone involved in this work 🎉).

## Motivation

- Using "standard issue" clippy in the CI spares our contributors and us from knowing/remembering to add the lint when running clippy locally
- In particular, spares us from configuring tools like rust-analyzer to take the lint into account.
- Should this lint come back in another form in the future, we won't blindly ignore it, and we will be able to reassess it, which will be good wrt writing idiomatic Rust. By the time this occurs, lints might be configurable through `clippy.toml` too, which would make disabling one globally much more convenient if needs be.

## Note

We should wait for the release of Rust 1.67.1 and its propagation to our CI before merging this. The PR won't pass CI before this.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-04-26 17:36:56 +00:00
ManyTheFish
1bf2694604 Update cargo lock 2023-04-26 17:41:29 +02:00
Louis Dureuil
ed9cc1af55
Remove CLI changes for clippy 2023-04-26 17:04:09 +02:00
Louis Dureuil
b41a6cbd7a
Check sort criteria also in placeholder search 2023-04-26 16:28:17 +02:00
Louis Dureuil
c8af572697
Add tests for exact words and exact attributes 2023-04-26 16:13:01 +02:00
ManyTheFish
249053e514 Update feature flags 2023-04-26 14:59:25 +02:00
ManyTheFish
ff2cf2a5ae Update charabia in milli 2023-04-26 14:56:54 +02:00
Loïc Lecrenier
b448aca49c Add more tests for exactness rr 2023-04-26 11:04:18 +02:00
Loïc Lecrenier
55bad07c16 Fix bug in exact_attribute rr implementation 2023-04-26 10:40:05 +02:00
bors[bot]
380469665f
Merge #3696
3696: Remove the unused snapshot files r=dureuill a=irevoire

While « reverting by hand » the PR about the auto batching of the addition&deletion of documents, we forgot to remove the associated snapshot files.

Here is the command I used to generate this PR: `cargo insta test --delete-unreferenced-snapshots`

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-04-26 07:18:29 +00:00
Loïc Lecrenier
3421125a55 Prevent the exactness ranking rule from removing random words
Make it strictly follow the term matching strategy
2023-04-26 09:09:19 +02:00