Commit Graph

1046 Commits

Author SHA1 Message Date
Loïc Lecrenier
6c3a5d69e1 Update snapshots 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
a7de4f5b85 Don't add swapped word pairs to the word_pair_proximity_docids db 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
264a04922d Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
1dbbd8694f Rename StrStrU8Codec to U8StrStrCodec and reorder its fields 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
Many the fish
81919a35a2
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:20 +02:00
Many the fish
516e838eb4
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:15 +02:00
ManyTheFish
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
ManyTheFish
cf203b7fde Take filter in account when computing the pages candidates 2022-10-17 14:13:44 +02:00
ManyTheFish
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
Loïc Lecrenier
4c481a8947 Upgrade all dependencies 2022-10-17 13:05:56 +02:00
bors[bot]
f30979d021
Merge #662
662: Enhance word splitting strategy r=ManyTheFish a=akki1306

# Pull Request

## Related issue
Fixes #648 

## What does this PR do?
- [split_best_frequency](55d889522b/milli/src/search/query_tree.rs (L282-L301)) to use frequency of word pairs near together with proximity value of 1 instead of considering the frequency of individual words. Word pairs having max frequency are considered.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
2022-10-13 08:14:22 +00:00
Akshay Kulkarni
85f3028317
remove underscore and introduce back word_documents_count 2022-10-13 13:21:59 +05:30
Akshay Kulkarni
8195fc6141
revert removal of word_documents_count method 2022-10-13 13:14:27 +05:30
Akshay Kulkarni
32f825d442
move default implementation of word_pair_frequency to TestContext 2022-10-13 12:57:50 +05:30
Akshay Kulkarni
ff8b2d4422
formatting 2022-10-13 12:44:08 +05:30
Akshay Kulkarni
6cb8b46900
use word_pair_frequency and remove word_documents_count 2022-10-13 12:43:11 +05:30
Akshay Kulkarni
8c9245149e
format file 2022-10-12 15:27:56 +05:30
Akshay Kulkarni
63e79a9039
update comment 2022-10-12 13:36:48 +05:30
Akshay Kulkarni
7f9680f0a0
Enhance word splitting strategy 2022-10-12 13:18:23 +05:30
Loïc Lecrenier
6fbf5dac68 Simplify documents! macro to reduce compile times 2022-10-12 09:22:05 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
bors[bot]
d94339a858
Merge #636
636: Remove unused `infos`, `http-ui`, and `milli/fuzz`, crates r=ManyTheFish a=loiclec

We haven't used the `infos/`, `http-ui/` and `milli/fuzz/` crates in a long time. They are not properly maintained and probably do not work correctly anymore.

This PR removes these crates entirely from the workspace to reduce the amount of code we need to maintain.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:39:57 +00:00
bors[bot]
15d478cf4d
Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
Loïc Lecrenier
add96f921b Remove unused infos/ http-ui/ and fuzz/ crates 2022-09-14 06:55:01 +02:00
curquiza
753e76d451 Update version for the next release (v0.33.4) in Cargo.toml files 2022-09-13 13:55:50 +00:00
Loïc Lecrenier
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
Kerollmops
d4d7c9d577
We avoid skipping errors in the indexing pipeline 2022-09-13 14:03:00 +02:00
Vincent Herlemont
8cd5200f48 Make charabia languages configurable 2022-09-08 12:21:43 +02:00
Vincent Herlemont
5e07ea79c2 Make charabia default feature optional 2022-09-07 20:54:31 +02:00
curquiza
077dcd2002 Update version for the next release (v0.33.3) in Cargo.toml files 2022-09-07 15:48:53 +00:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped 2022-09-07 15:03:32 +02:00
Kerollmops
c83c3cd796
Add a test to make sure that long words are correctly skipped 2022-09-07 14:12:36 +02:00
ManyTheFish
bf750e45a1 Fix word removal issue 2022-09-01 12:10:47 +02:00
ManyTheFish
a38608fe59 Add test mixing phrased and no-phrased words 2022-09-01 12:02:10 +02:00
ManyTheFish
97a04887a3 Update version for next release (v0.33.2) in Cargo.toml 2022-09-01 11:47:23 +02:00
bors[bot]
17d020e996
Merge #618
618: Update version for next release (v0.33.1) in Cargo.toml r=Kerollmops a=curquiza

No breaking for this release

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-08-31 10:43:45 +00:00
Clémentine Urquizar
c3363706c5
Update version for next release (v0.33.1) in Cargo.toml 2022-08-31 11:37:27 +02:00
Clément Renault
7f92116b51
Accept again integers as document ids 2022-08-31 10:56:39 +02:00
Irevoire
f6024b3269
Remove the artifacts of the past 2022-08-23 16:10:38 +02:00
bors[bot]
a79ff8a1a9
Merge #611
611: Upgrade charabia v0.6.0 r=curquiza a=ManyTheFish

# Pull Request

## What does this PR do?

- Update `log`
- Upgrade `charabia`

related to https://github.com/meilisearch/meilisearch/issues/2686


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-23 10:17:29 +00:00
Clémentine Urquizar
9ed7324995
Update version for next release (v0.33.0) 2022-08-23 11:47:48 +02:00
bors[bot]
18886dc6b7
Merge #598
598: Matching query terms policy r=Kerollmops a=ManyTheFish

## Summary

Implement several optional words strategy.

## Content

Replace `optional_words` boolean with an enum containing several term matching strategies:
```rust
pub enum TermsMatchingStrategy {
    // remove last word first
    Last,
    // remove first word first
    First,
    // remove more frequent word first
    Frequency,
    // remove smallest word first
    Size,
    // only one of the word is mandatory
    Any,
    // all words are mandatory
    All,
}
```

All strategies implemented during the prototype are kept, but only `Last` and `All` will be published by Meilisearch in the `v0.29.0` release.

## Related

spec: https://github.com/meilisearch/specifications/pull/173
prototype discussion: https://github.com/meilisearch/meilisearch/discussions/2639#discussioncomment-3447699


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-22 15:51:37 +00:00
ManyTheFish
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish
ba5ca8a362 Upgrade charabia v0.6.0 2022-08-22 14:38:00 +02:00
Irevoire
e7624abe63
share heed between all sub-crates 2022-08-19 11:23:41 +02:00
ManyTheFish
993aa1321c Fix query tree building 2022-08-18 17:56:06 +02:00
ManyTheFish
bff9653050 Fix remove count 2022-08-18 17:36:30 +02:00