Jakub Jirutka
13f1277637
Allow to disable specialized tokenizations (again)
...
In PR #2773 , I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
Louis Dureuil
732c52093d
Processing time without autobatching implementation
2023-05-03 17:41:48 +02:00
Louis Dureuil
f8f190cd40
Update exactness tests following charabia camelCase tokenization
2023-05-03 14:45:09 +02:00
Louis Dureuil
1aaf24ccbf
Cargo fmt
2023-05-03 12:21:58 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
...
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml | took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
Louis Dureuil
342c4ff85d
geosort: Remove rtree unwrap
2023-05-03 09:52:16 +02:00
Tamo
c85392ce40
make the descendent geosort fast
2023-05-03 09:13:12 +02:00
Tamo
8875d24a48
deserialize the rtree only when its needed, and keep it in memory once it has been deserialized
2023-05-03 09:13:12 +02:00
Tamo
c470b67fa2
revamp the test to use execute_iterative_and_rtree_returns_the_same
2023-05-03 09:13:12 +02:00
Louis Dureuil
b60840ebff
Remove self.iterating from words
2023-05-02 18:54:23 +02:00
Louis Dureuil
fdc1763838
Use MultiOps for resolve_query_graph
2023-05-02 18:54:09 +02:00
Louis Dureuil
75819bc940
Remove too many arguments on resolve_maximally_reduced_query_graph
2023-05-02 18:53:40 +02:00
Louis Dureuil
7b8cc25625
rename located_query_terms_from_string -> located_query_terms_from_tokens
2023-05-02 18:53:01 +02:00
Loïc Lecrenier
aa63091752
Fix bug in exact_attribute
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
1b514517f5
Fix bug in computation of query term at a position
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
11f814821d
Minor cleanup
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
30fb1153cc
Speed up graph based ranking rule when a lot of different costs exist
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
3b2c8b9f25
Improve performance of position rr
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
2a7f9adf78
Build query graph more correctly from paths
...
Update snapshots
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
608ceea440
Fix bug in position rr
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
79001b9c97
Improve performance of the cheapest path finder algorithm
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
59b12fca87
Fix errors, clippy warnings, and add review comments
2023-04-29 11:48:11 +02:00
Loïc Lecrenier
48f5bb1693
Implements the geo-sort ranking rule
2023-04-29 11:02:16 +02:00
Loïc Lecrenier
bc4efca611
Add more tests for the attribute ranking rule
2023-04-29 10:56:48 +02:00
bors[bot]
414b3fae89
Merge #3571
...
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops
# Pull Request
## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884 ) is describing in a very detailed manner.
## What does this PR do?
### `IS NULL` and `IS NOT NULL`
This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.
- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`
1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`
`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3
common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2
### `IS EMPTY` and `IS NOT EMPTY`
- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.
1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`
`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6
common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9
## What should the reviewer do?
- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Loïc Lecrenier
899baa0ea5
Update forgotten snapshot from previous commit
2023-04-27 13:43:04 +02:00
Loïc Lecrenier
374095d42c
Add tests for stop words and fix a couple of bugs
2023-04-27 13:30:09 +02:00
Louis Dureuil
b41a6cbd7a
Check sort criteria also in placeholder search
2023-04-26 16:28:17 +02:00
Louis Dureuil
c8af572697
Add tests for exact words and exact attributes
2023-04-26 16:13:01 +02:00
Loïc Lecrenier
b448aca49c
Add more tests for exactness rr
2023-04-26 11:04:18 +02:00
Loïc Lecrenier
55bad07c16
Fix bug in exact_attribute rr implementation
2023-04-26 10:40:05 +02:00
Loïc Lecrenier
3421125a55
Prevent the exactness
ranking rule from removing random words
...
Make it strictly follow the term matching strategy
2023-04-26 09:09:19 +02:00
Clément Renault
14293f6c8f
Make rustfmt happy
2023-04-25 16:55:39 +02:00
Loïc Lecrenier
d3a94e8b25
Fix bugs and add tests to exactness ranking rule
2023-04-25 16:49:08 +02:00
Clément Renault
cfd1b2cc97
Fix the clippy warnings
2023-04-25 16:40:32 +02:00
Loïc Lecrenier
8f2e971879
Add tests for "exactness" rr, make correct universe computation
2023-04-24 16:57:34 +02:00
Loïc Lecrenier
d1fdbb63da
Make all search tests pass, fix distinctAttribute bug
2023-04-24 12:12:08 +02:00
Loïc Lecrenier
84d9c731f8
Fix bug in encoding of word_position_docids and word_fid_docids
2023-04-24 09:59:30 +02:00
Loïc Lecrenier
bd9aba4d77
Add "position" part of the attribute ranking rule
2023-04-13 10:46:09 +02:00
Loïc Lecrenier
8edad8291b
Add logger to attribute rr, fix a bug
2023-04-13 10:25:00 +02:00
Kerollmops
d9cebff61c
Add a simple test to check that attributes are ranking correctly
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
30f7bd03f6
Fix compiler warning/errors caused by previous merge
2023-04-13 08:27:09 +02:00
Kerollmops
df0d9bb878
Introduce the attribute ranking rule in the list of ranking rules
2023-04-13 08:27:09 +02:00
Kerollmops
5230ddb3ea
Resolve the attribute ranking rule conditions
2023-04-13 08:27:09 +02:00
Kerollmops
d6a7c28e4d
Implement the attribute ranking rule edge computation
2023-04-13 08:27:09 +02:00
Kerollmops
e55efc419e
Introduce a new cache for the words fids
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
644e136aee
Merge branch 'search-refactor-typo-attributes' into search-refactor
2023-04-13 08:26:56 +02:00
Louis Dureuil
38b7b31beb
Decide to use prefix DB if the word is not an ngram
2023-04-12 16:45:38 +02:00
Louis Dureuil
7a01f20df7
Use word_prefix_docids, make get_word_prefix_docids private
2023-04-12 16:45:38 +02:00
Louis Dureuil
c20c38a7fa
Add SearchContext::word_prefix_docids() method
2023-04-12 16:44:43 +02:00
Louis Dureuil
5ab46324c4
Everyone uses the SearchContext::word_docids instead of get_db_word_docids
...
make get_db_word_docids private
2023-04-12 16:44:43 +02:00
Louis Dureuil
325f17488a
Add SearchContext::word_docids() method
2023-04-12 16:37:05 +02:00
Louis Dureuil
e7ff987c46
Update call sites
2023-04-12 16:36:38 +02:00
Louis Dureuil
244003e36f
Refactor DB cache to return Roaring Bitmaps directly instead of byte slices
2023-04-12 16:35:48 +02:00
Loïc Lecrenier
1f813a6f3b
Simplify implementation of the detailed (=visual) logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
96183e804a
Simplify the logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
7ab48ed8c7
Matching words fixes
2023-04-12 16:21:43 +02:00
Loïc Lecrenier
e7bb8c940f
Merge branch 'search-refactor-highlighter' into search-refactor-highlighter-merged
2023-04-11 12:22:34 +02:00
Loïc Lecrenier
d0e9d65025
Fix distinct attribute bugs
2023-04-07 11:09:01 +02:00
Loïc Lecrenier
a81165f0d8
Merge remote-tracking branch 'origin/main' into search-refactor
2023-04-07 10:15:55 +02:00
Loïc Lecrenier
d6585eb10b
Avoid splitting ngrams into their original component words
2023-04-07 10:13:49 +02:00
Loïc Lecrenier
f7d90ad19f
Merge remote-tracking branch 'origin/search-refactor-tests-doc' into search-refactor
2023-04-07 10:13:18 +02:00
Louis Dureuil
31630c85d0
exactness graph rr: Add important TODO/FIXME after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
ab09dc0167
exact_attributes: Add TODOs and additional check after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
618c54915d
exact_attribute: dedup nodes after sorting them
2023-04-06 17:50:39 +02:00
Louis Dureuil
90a6c01495
Use correct codec in proximity
2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a
Fix panics and issues in exactness graph ranking rule
2023-04-06 17:50:39 +02:00
Louis Dureuil
f513cf930a
Exact attribute with state
2023-04-06 17:50:39 +02:00
Louis Dureuil
8a13ed7e3f
Add exactness ranking rules
2023-04-06 17:50:39 +02:00
Louis Dureuil
1b8e4d0301
Add ExactTerm and helper method
2023-04-06 17:50:39 +02:00
Louis Dureuil
996619b22a
Increase position by 8 on hard separator when building query terms
2023-04-06 17:50:39 +02:00
Louis Dureuil
2c9822a337
Rename is_multiple_words
to is_ngram
and zero_typo
to exact
2023-04-06 17:50:39 +02:00
Louis Dureuil
7276deee0a
Add new db caches
2023-04-06 17:50:39 +02:00
ManyTheFish
f7e7f438f8
Patch prefix match
2023-04-06 17:22:31 +02:00
ManyTheFish
ba8dcc2d78
Fix clippy
2023-04-06 15:50:47 +02:00
Loïc Lecrenier
7ca91ebb71
Merge branch 'search-refactor-exactness' into search-refactor-tests-doc
2023-04-06 15:16:35 +02:00
ManyTheFish
47f6a3ad3d
Take into account that a logger need the search context
2023-04-06 15:02:23 +02:00
bors[bot]
b4c01581cd
Merge #3641
...
3641: Bring back changes from `release v1.1.0` into `main` after v1.1.0 release r=curquiza a=curquiza
Replace https://github.com/meilisearch/meilisearch/pull/3637 since we don't want to pull commits from `main` into `release-v1.1.0` when fixing git conflicts
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-04-06 12:37:54 +00:00
ManyTheFish
ae17c62e24
Remove warnings
2023-04-06 14:07:18 +02:00
ManyTheFish
a1148c09c2
remove old matcher
2023-04-06 14:00:21 +02:00
ManyTheFish
9c5f64769a
Integrate the new Highlighter in the search
2023-04-06 13:58:56 +02:00
ManyTheFish
ebe23b04c9
Make the matcher consume the search context
2023-04-06 12:28:28 +02:00
ManyTheFish
13b7c826c1
add new highlighter
2023-04-06 12:15:37 +02:00
Louis Dureuil
d1ddaa223d
Use correct codec in proximity
2023-04-05 18:14:00 +02:00
Louis Dureuil
f7ecea142e
Fix panics and issues in exactness graph ranking rule
2023-04-05 18:13:46 +02:00
Louis Dureuil
337e75b0e4
Exact attribute with state
2023-04-05 18:12:46 +02:00
Loïc Lecrenier
b5691802a3
Add new tests and fix construction of query graph from paths
2023-04-05 16:31:10 +02:00
Loïc Lecrenier
6e50f23896
Add more search tests
2023-04-05 13:33:23 +02:00
Loïc Lecrenier
4c8a0179ba
Add more search tests
2023-04-05 11:30:49 +02:00
Loïc Lecrenier
c69cbec64a
Add more search tests
2023-04-05 11:20:04 +02:00
Loïc Lecrenier
ce328c329d
Move bucket sort function to its own module and fix a bug
2023-04-04 18:03:08 +02:00
Loïc Lecrenier
959e4607bb
Add more search tests
2023-04-04 18:02:46 +02:00
Louis Dureuil
4b4ffb8ec9
Add exactness ranking rules
2023-04-04 17:12:07 +02:00
Louis Dureuil
3951fe22ab
Add ExactTerm and helper method
2023-04-04 17:09:32 +02:00
Louis Dureuil
4d5bc9df4c
Increase position by 8 on hard separator when building query terms
2023-04-04 17:07:26 +02:00
Louis Dureuil
ec2f8e8040
Rename is_multiple_words
to is_ngram
and zero_typo
to exact
2023-04-04 17:06:07 +02:00
Louis Dureuil
406b8bd248
Add new db caches
2023-04-04 17:04:46 +02:00
Loïc Lecrenier
62b9c6fbee
Add search tests
2023-04-04 16:18:22 +02:00
Loïc Lecrenier
b439d36807
Split query_term module into multiple submodules
2023-04-04 15:38:30 +02:00
Loïc Lecrenier
faceb661e3
Add note that a part of the code needs fixing
2023-04-04 15:02:01 +02:00
Loïc Lecrenier
4129d657e2
Simplify query_term module a bit
2023-04-04 15:01:42 +02:00
Filip Bachul
1e6fe71a67
fix clippy warning
2023-04-03 20:18:26 +02:00
Filip Bachul
fddfb37f1f
remove unnecessary FilterError:ReservedGeo and FilterError:ReservedGeo
2023-04-03 20:18:26 +02:00
Loïc Lecrenier
3f13608002
Fix computation of ngram derivations
2023-04-03 15:27:49 +02:00
Loïc Lecrenier
4708d9b016
Fix compiler warnings/errors
2023-04-03 10:09:27 +02:00
Clément Renault
0d2e7bcc13
Implement the previous way for the exhaustive distinct candidates
2023-04-03 10:08:10 +02:00
Loïc Lecrenier
55fbfb6124
Merge branch 'search-refactor-located-query-terms' into search-refactor
2023-04-03 10:04:36 +02:00
Loïc Lecrenier
58fe260c72
Allow removing all the terms from a query if it contains a phrase
2023-04-03 09:18:02 +02:00
Loïc Lecrenier
24e5f6f7a9
Don't remove phrases with "last" term matching strategy
2023-04-03 09:17:33 +02:00
Louis Dureuil
9b87c36200
Limit the number of derivations for a single word.
2023-03-31 09:19:18 +02:00
Loïc Lecrenier
12b26cd54e
Don't remove phrases from the query with term matching strategy Last
2023-03-30 14:54:08 +02:00
Loïc Lecrenier
061b1e6d7c
Tiny refactor of query graph remove_nodes method
2023-03-30 14:49:25 +02:00
Loïc Lecrenier
0d6e8b5c31
Fix phrase search bug when the phrase has only one word
2023-03-30 14:48:12 +02:00
Loïc Lecrenier
d48cdc67a0
Fix term matching strategy bugs
2023-03-30 14:01:52 +02:00
Loïc Lecrenier
35c16ad047
Use new term matching strategy logic in words ranking rule
2023-03-30 13:15:43 +02:00
Loïc Lecrenier
2997d1f186
Use new term matching strategy logic in resolve_maximally_reduced_...
2023-03-30 13:12:51 +02:00
Loïc Lecrenier
2a5997fb20
Avoid expensive assert! in bucket sort function
2023-03-30 13:07:17 +02:00
Loïc Lecrenier
ee8a9e0bad
Remove outdated sentence in documentation
2023-03-30 12:22:24 +02:00
Loïc Lecrenier
3b0737a092
Fix detailed logger
2023-03-30 12:20:44 +02:00
Loïc Lecrenier
fdd02105ac
Graph-based ranking rule + term matching strategy support
2023-03-30 12:19:21 +02:00
Loïc Lecrenier
aa9592455c
Refactor the paths_of_cost algorithm
...
Support conditions that require certain nodes to be skipped
2023-03-30 12:11:11 +02:00
Loïc Lecrenier
01e24dd630
Rewrite proximity ranking rule
2023-03-30 11:59:06 +02:00
Loïc Lecrenier
ae6bb1ce17
Update the ConditionDocidsCache after change to RankingRuleGraphTrait
2023-03-30 11:41:20 +02:00
Loïc Lecrenier
5fd28620cd
Build ranking rule graph correctly after changes to trait definition
2023-03-30 11:32:55 +02:00
Loïc Lecrenier
728710d63a
Update typo ranking rule to use new query term structure
2023-03-30 11:32:19 +02:00
Loïc Lecrenier
fa81381865
Update the trait requirements of ranking-rule graphs
2023-03-30 11:19:45 +02:00
Loïc Lecrenier
b96a682f16
Update resolve_graph module to work with lazy query terms
2023-03-30 11:10:38 +02:00
Loïc Lecrenier
d0f048c068
Simplify the API of the DatabaseCache
2023-03-30 11:08:17 +02:00
Loïc Lecrenier
223e82a10d
Update QueryGraph to use new lazy query terms + build from paths
2023-03-30 11:06:02 +02:00
Loïc Lecrenier
9507ff5e31
Update query term structure to allow for laziness
2023-03-30 11:06:02 +02:00
Louis Dureuil
c2b025946a
located_query_terms_from_string
: use u16 for positions, hard limit number of iterated tokens.
...
- Refactor phrase logic to reduce number of possible states
2023-03-30 11:04:14 +02:00
Loïc Lecrenier
3a818c5e87
Add more functionality to interners
2023-03-30 09:56:23 +02:00
Louis Dureuil
d74134ce3a
Check sort criteria
2023-03-29 15:21:54 +02:00
Louis Dureuil
5ac129bfa1
Mark geosearch as currently unimplemented for sort rule
2023-03-29 15:20:42 +02:00
ManyTheFish
efea1e5837
Fix facet normalization
2023-03-29 12:02:24 +02:00
Louis Dureuil
abb4522f76
Small comment on ignored rules for placeholder search
2023-03-29 09:11:06 +02:00
Louis Dureuil
ef084ef042
SmallBitmap: Consistently panic on incoherent universe lengths
2023-03-29 08:45:38 +02:00
Louis Dureuil
3524bd1257
SmallBitmap: Add documentation
2023-03-29 08:44:11 +02:00
Tamo
a50b058557
update the geoBoundingBox feature
...
Now instead of using the (top_left, bottom_right) corners of the bounding box it s using the (top_right, bottom_left) corners.
2023-03-28 18:26:18 +02:00
Louis Dureuil
d4f6216966
Resolve rule time sort criteria
2023-03-28 16:42:02 +02:00
Louis Dureuil
77acafe534
Resolve search time sort criteria for placeholder search
2023-03-28 16:41:03 +02:00
Louis Dureuil
abb19d368d
Initialize query time ranking rule for query search
2023-03-28 12:40:52 +02:00
Louis Dureuil
b4a52a622e
BoxRankingRule
2023-03-28 12:39:42 +02:00
Louis Dureuil
e9eb271499
Remove empty attribute_rule mod
2023-03-27 11:08:03 +02:00
Louis Dureuil
3281a88d08
SmallBitmap: don't expose internal items
2023-03-27 11:04:43 +02:00
Louis Dureuil
5a644054ab
Removed unused search impl
2023-03-27 11:04:27 +02:00
Louis Dureuil
16fefd364e
Add TODO notes
2023-03-27 11:04:04 +02:00
Loïc Lecrenier
00bad8c716
Add comments suggesting performance improvements
2023-03-23 10:18:24 +01:00
Loïc Lecrenier
862714a18b
Remove criterion_implementation_strategy param of Search
2023-03-23 09:44:12 +01:00
Loïc Lecrenier
d18ebe4f3a
Remove more warnings
2023-03-23 09:41:18 +01:00