Loïc Lecrenier
3baa34d842
Fix compiler errors/warnings
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
86d9f50b9c
Fix bugs in incremental facet indexing with variable parameters
...
e.g. add one facet value incrementally with a group_size = X and then
add another one with group_size = Y
It is not actually possible to do so with the public API of milli,
but I wanted to make sure the algorithm worked well in those cases
anyway.
The bugs were found by fuzzing the code with fuzzcheck, which I've added
to milli as a conditional dev-dependency. But it can be removed later.
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
985a94adfc
cargo fmt
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b1ab09196c
Remove outdated TODOs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
27454e9828
Document and refine facet indexing algorithms
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
bee3c23b45
Add comparison benchmark between bulk and incremental facet indexing
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b2f01ad204
Refactor facet database tests
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9026867d17
Give same interface to bulk and incremental facet indexing types
...
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
330c9eb1b2
Rename facet codecs and refine FacetsUpdate API
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
485a72306d
Refactor facet-related codecs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9b55e582cd
Add FacetsUpdate type that wraps incremental and bulk indexing methods
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
3d145d7f48
Merge the two <facetttype>_faceted_documents_ids methods into one
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
079ed4a992
Add more snapshots
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
afdf87f6f7
Fix bugs in asc/desc criterion and facet indexing
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
a7201ece04
cargo fmt
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
36296bbb20
Add facet incremental indexing snapshot tests + fix bug
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
07ff92c663
Add more snapshots from facet tests
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
61252248fb
Fix some facet indexing bugs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
68cbcdf08b
Fix compile errors/warnings in http-ui and infos
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
85824ee203
Try to make facet indexing incremental
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
d30c89e345
Fix compile error+warnings in new tests
2022-10-26 13:46:46 +02:00
Loïc Lecrenier
e8a156d682
Reorganise facets database indexing code
2022-10-26 13:46:46 +02:00
Loïc Lecrenier
bd2c0e1ab6
Remove unused code
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
39a4a0a362
Reintroduce filter range search and facet extractors
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
22d80eeaf9
Reintroduce facet deletion functionality
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
6cc91824c1
Remove unused heed codec files
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
63ef0aba18
Start porting facet distribution and sort to new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
7913d6365c
Update Facets indexing to be compatible with new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d
Prepare refactor of facets database
...
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
9a569d73d1
Minor code style change
2022-10-24 15:30:43 +02:00
Loïc Lecrenier
d76d0cb1bf
Merge branch 'main' into word-pair-proximity-docids-refactor
2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a983129613
Apply suggestions from code review
2022-10-20 09:49:37 +02:00
Loïc Lecrenier
ab2f6f3aa4
Refine some details in word_prefix_pair_proximity indexing code
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
178d00f93a
Cargo fmt
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
072b576514
Fix proximity value in keys of prefix_word_pair_proximity_docids
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
6c3a5d69e1
Update snapshots
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
a7de4f5b85
Don't add swapped word pairs to the word_pair_proximity_docids db
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
264a04922d
Add prefix_word_pair_proximity database
...
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
1dbbd8694f
Rename StrStrU8Codec to U8StrStrCodec and reorder its fields
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e
Change encoding of word_pair_proximity DB to (proximity, word1, word2)
...
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
Ewan Higgs
beb987d3d1
Fixing piles of clippy errors.
...
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word
2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3
Add missing logging timer to extractors
2022-09-30 22:17:06 +05:30
bors[bot]
15d478cf4d
Merge #635
...
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.
In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
Loïc Lecrenier
3794962330
Use an unstable algorithm for grenad::Sorter when possible
2022-09-13 14:49:53 +02:00
Kerollmops
d4d7c9d577
We avoid skipping errors in the indexing pipeline
2022-09-13 14:03:00 +02:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped
2022-09-07 15:03:32 +02:00
Kerollmops
c83c3cd796
Add a test to make sure that long words are correctly skipped
2022-09-07 14:12:36 +02:00
ManyTheFish
5391e3842c
replace optional_words by term_matching_strategy
2022-08-22 17:47:19 +02:00
ManyTheFish
9640976c79
Rename TermMatchingPolicies
2022-08-18 17:36:08 +02:00