Loïc Lecrenier
14ca8048a8
Add some documentation on how to run the facet db fuzzer
2022-10-26 13:48:01 +02:00
Loïc Lecrenier
206a3e00e5
cargo fmt
2022-10-26 13:48:01 +02:00
Loïc Lecrenier
f198b20c42
Add facet deletion tests that use both the incremental and bulk methods
+ update deletion snapshots to the new database format
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
e3ba1fc883
Make deletion tests for both soft-deletion and hard-deletion
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
ab5e56fd16
Add document deletion snapshot tests and tests for hard-deletion
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
d885de1600
Add option to avoid soft deletion of documents
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
2295e0e3ce
Use real delete function in facet indexing fuzz tests
By deleting multiple docids at once instead of one-by-one
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
acc8caebe6
Add link to GitHub PR to the documentation of the update/facet module
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
a034a1e628
Move StrRefCodec and ByteSliceRefCodec to their own files
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
1165ba2171
Make facet deletion incremental
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
0ade699873
Don't crash when failing to decode using StrRef codec
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
d0109627b9
Fix a bug in facet_range_search and add documentation
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
1ecd3bb822
Fix bug in FieldDocIdFacetCodec
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
51961e1064
Polish some details
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
cb8442a119
Further unify facet databases of f64s and strings
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
3baa34d842
Fix compiler errors/warnings
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
86d9f50b9c
Fix bugs in incremental facet indexing with variable parameters
e.g. add one facet value incrementally with a group_size = X and then
add another one with group_size = Y
It is not actually possible to do so with the public API of milli,
but I wanted to make sure the algorithm worked well in those cases
anyway.
The bugs were found by fuzzing the code with fuzzcheck, which I've added
to milli as a conditional dev-dependency. But it can be removed later.
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
de52a9bf75
Improve documentation of some facet-related algorithms
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
985a94adfc
cargo fmt
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b1ab09196c
Remove outdated TODOs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
3d7ed3263f
Fix bug in string facet distribution with few candidates
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
fca4577e23
Return original string in facet distributions, work on facet tests
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
27454e9828
Document and refine facet indexing algorithms
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
bee3c23b45
Add comparison benchmark between bulk and incremental facet indexing
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b2f01ad204
Refactor facet database tests
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9026867d17
Give same interface to bulk and incremental facet indexing types
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
330c9eb1b2
Rename facet codecs and refine FacetsUpdate API
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
485a72306d
Refactor facet-related codecs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9b55e582cd
Add FacetsUpdate type that wraps incremental and bulk indexing methods
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
3d145d7f48
Merge the two <facettype>_faceted_documents_ids methods into one
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
982efab88f
Fix encoding bugs in facet databases
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
079ed4a992
Add more snapshots
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
afdf87f6f7
Fix bugs in asc/desc criterion and facet indexing
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
a7201ece04
cargo fmt
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
36296bbb20
Add facet incremental indexing snapshot tests + fix bug
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
07ff92c663
Add more snapshots from facet tests
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
61252248fb
Fix some facet indexing bugs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
68cbcdf08b
Fix compile errors/warnings in http-ui and infos
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
85824ee203
Try to make facet indexing incremental
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
d30c89e345
Fix compile error+warnings in new tests
2022-10-26 13:46:46 +02:00
Loïc Lecrenier
e8a156d682
Reorganise facets database indexing code
2022-10-26 13:46:46 +02:00
Loïc Lecrenier
fb8d23deb3
Reintroduce db_snap! for facet databases
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
e570c23153
Reintroduce asc/desc functionality
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
bd2c0e1ab6
Remove unused code
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
39a4a0a362
Reintroduce filter range search and facet extractors
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
22d80eeaf9
Reintroduce facet deletion functionality
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
6cc91824c1
Remove unused heed codec files
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
5a904cf29d
Reintroduce facet distribution functionality
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
b8a1caad5e
Add range search and incremental indexing algorithm
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
63ef0aba18
Start porting facet distribution and sort to new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
7913d6365c
Update Facets indexing to be compatible with new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
bors[bot]
c8f16530d5
Merge #616
616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-26 11:41:18 +00:00
Ewan Higgs
9d27ac8a2e
Ignore too many arguments to functions.
2022-10-25 21:22:53 +02:00
Ewan Higgs
42cdc38c7b
Allow weird ranges like 1..=0 to pass clippy.
Everything else is just a warning and exit code will be 0.
2022-10-25 21:12:59 +02:00
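For context on 42cdc38c7b: a range such as `1..=0` is valid Rust but yields no items, which is why clippy flags it. The sketch below assumes the lint involved is `clippy::reversed_empty_ranges` (the commit message does not name it), and `reversed_range_len` is a hypothetical helper used only for illustration.

```rust
/// Hypothetical helper: counts the items produced by the reversed range `1..=0`.
/// The `allow` attribute silences the lint that is assumed to be involved here.
#[allow(clippy::reversed_empty_ranges)]
pub fn reversed_range_len() -> usize {
    // The start is greater than the end, so the range is empty.
    (1..=0).count()
}
```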
Ewan Higgs
2ce025a906
Fixes after rebase to fix new issues.
2022-10-25 20:58:31 +02:00
Ewan Higgs
17f7922bfc
Remove unneeded lifetimes.
2022-10-25 20:49:04 +02:00
Ewan Higgs
6b2fe94192
Fixes for clippy bringing us down to 18 remaining issues.
This brings us a step closer to enforcing clippy on each build.
2022-10-25 20:49:02 +02:00
Loïc Lecrenier
36bd66281d
Add method to create a new Index with specific creation dates
2022-10-25 14:37:56 +02:00
Loïc Lecrenier
9a569d73d1
Minor code style change
2022-10-24 15:30:43 +02:00
Loïc Lecrenier
be302fd250
Remove outdated workaround for duplicate words in phrase search
2022-10-24 15:27:06 +02:00
Loïc Lecrenier
d76d0cb1bf
Merge branch 'main' into word-pair-proximity-docids-refactor
2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a983129613
Apply suggestions from code review
2022-10-20 09:49:37 +02:00
bors[bot]
f11a4087da
Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs
## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.
## What does this PR do?
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to name fresh variables.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
Loïc Lecrenier
176ffd23f5
Fix compile error after rebasing wppd-refactor
2022-10-18 10:40:26 +02:00
Loïc Lecrenier
ab2f6f3aa4
Refine some details in word_prefix_pair_proximity indexing code
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
e6e76fbefe
Improve performance of resolve_phrase at the cost of some relevancy
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
178d00f93a
Cargo fmt
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
830a7c0c7a
Use resolve_phrase function for exactness criteria as well
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
18d578dfc4
Adjust some algorithms using DBs of word pair proximities
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
072b576514
Fix proximity value in keys of prefix_word_pair_proximity_docids
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
6c3a5d69e1
Update snapshots
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
a7de4f5b85
Don't add swapped word pairs to the word_pair_proximity_docids db
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
264a04922d
Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
1dbbd8694f
Rename StrStrU8Codec to U8StrStrCodec and reorder its fields
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e
Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
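The `(proximity, word1, word2)` key order from bdeb47305e can be illustrated with a minimal sketch. The byte layout below (one proximity byte, then `word1`, a NUL separator, then `word2`) is an assumption made for this example, and `encode_key` / `decode_key` are hypothetical names, not the actual codec. The point it shows is that putting the proximity first makes all keys with the same proximity sort next to each other.

```rust
/// Encodes a key as: [proximity] ++ word1 ++ [0] ++ word2.
/// The layout is assumed for illustration only.
pub fn encode_key(proximity: u8, word1: &str, word2: &str) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(1 + word1.len() + 1 + word2.len());
    bytes.push(proximity);
    bytes.extend_from_slice(word1.as_bytes());
    bytes.push(0); // separator between the two words
    bytes.extend_from_slice(word2.as_bytes());
    bytes
}

/// Decodes a key produced by `encode_key`.
/// Returns `None` instead of panicking when the bytes are malformed.
pub fn decode_key(bytes: &[u8]) -> Option<(u8, &str, &str)> {
    let (&proximity, rest) = bytes.split_first()?;
    let sep = rest.iter().position(|&b| b == 0)?;
    let word1 = std::str::from_utf8(&rest[..sep]).ok()?;
    let word2 = std::str::from_utf8(&rest[sep + 1..]).ok()?;
    Some((proximity, word1, word2))
}
```

Because byte strings compare lexicographically, every key with proximity 1 sorts before every key with proximity 2, whatever the words are.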
Many the fish
81919a35a2
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:20 +02:00
Many the fish
516e838eb4
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:15 +02:00
Clément Renault
fc03e53615
Add a test to check that we can abort an indexation
2022-10-17 17:28:03 +02:00
Kerollmops
6603437cb1
Introduce an indexation abortion function when indexing documents
2022-10-17 17:28:03 +02:00
ManyTheFish
6f55e7844c
Add some code comments
2022-10-17 14:41:57 +02:00
ManyTheFish
cf203b7fde
Take filter into account when computing the pages candidates
2022-10-17 14:13:44 +02:00
ManyTheFish
d71bc1e69f
Compute an exact count when using distinct
2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343
Add settings to force milli to exhaustively compute the total number of hits
2022-10-17 14:13:44 +02:00
Ewan Higgs
beb987d3d1
Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
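The two kinds of fix described in beb987d3d1 can be sketched as follows. The types `DocId` and `Counter` are hypothetical and exist only for this example. The comments show the pattern clippy would typically flag (presumably via `clippy::clone_on_copy` and `clippy::needless_borrow`, which the commit message does not name) next to the cleaned-up form.

```rust
/// Hypothetical `Copy` type standing in for the structs mentioned in the commit.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct DocId(pub u32);

/// Hypothetical type with methods that borrow `self`.
#[derive(Debug, Default)]
pub struct Counter {
    last: DocId,
    count: u32,
}

impl Counter {
    pub fn record(&mut self, id: DocId) {
        // Flagged form: `self.last = id.clone();`
        // `DocId` is `Copy`, so a plain copy is enough.
        self.last = id;
        // Flagged form: `(&mut *self).bump();`
        // `self` is already a mutable borrow, so no extra `&mut` is needed.
        self.bump();
    }

    fn bump(&mut self) {
        self.count += 1;
    }

    pub fn last(&self) -> DocId {
        self.last
    }

    pub fn count(&self) -> u32 {
        self.count
    }
}
```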
bors[bot]
f30979d021
Merge #662
662: Enhance word splitting strategy r=ManyTheFish a=akki1306
# Pull Request
## Related issue
Fixes #648
## What does this PR do?
- Change `split_best_frequency` (55d889522b/milli/src/search/query_tree.rs, L282-L301) to use the frequency of adjacent word pairs (proximity value of 1) instead of the frequency of the individual words. The word pair with the maximum frequency is selected.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
2022-10-13 08:14:22 +00:00
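The splitting strategy described in #662 can be sketched as below. This is a minimal illustration, not the code from the PR: `best_split_by_pair_frequency` and the `pair_frequency` callback are hypothetical names, and the callback is assumed to return how many documents contain the two halves next to each other (proximity 1), with 0 meaning the pair is unknown.

```rust
/// Tries every split of `word` into two non-empty halves and returns the
/// split whose (left, right) pair has the highest frequency.
/// Returns `None` when no split has a non-zero frequency.
pub fn best_split_by_pair_frequency<F>(word: &str, pair_frequency: F) -> Option<(&str, &str)>
where
    F: Fn(&str, &str) -> u64,
{
    let mut best: Option<(u64, &str, &str)> = None;
    // `char_indices` yields valid split points; skipping the first one
    // avoids an empty left half.
    for (index, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(index);
        let frequency = pair_frequency(left, right);
        if frequency > 0 && best.map_or(true, |(best_frequency, _, _)| frequency > best_frequency) {
            best = Some((frequency, left, right));
        }
    }
    best.map(|(_, left, right)| (left, right))
}
```

Scoring the pair rather than each half separately avoids splits where both halves are frequent words that never occur next to each other.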
Akshay Kulkarni
85f3028317
remove underscore and introduce back word_documents_count
2022-10-13 13:21:59 +05:30
Akshay Kulkarni
8195fc6141
revert removal of word_documents_count method
2022-10-13 13:14:27 +05:30
Akshay Kulkarni
32f825d442
move default implementation of word_pair_frequency to TestContext
2022-10-13 12:57:50 +05:30
Akshay Kulkarni
ff8b2d4422
formatting
2022-10-13 12:44:08 +05:30
Akshay Kulkarni
6cb8b46900
use word_pair_frequency and remove word_documents_count
2022-10-13 12:43:11 +05:30
Akshay Kulkarni
8c9245149e
format file
2022-10-12 15:27:56 +05:30
Akshay Kulkarni
63e79a9039
update comment
2022-10-12 13:36:48 +05:30
Akshay Kulkarni
7f9680f0a0
Enhance word splitting strategy
2022-10-12 13:18:23 +05:30
Loïc Lecrenier
6fbf5dac68
Simplify documents! macro to reduce compile times
2022-10-12 09:22:05 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word
2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3
Add missing logging timer to extractors
2022-09-30 22:17:06 +05:30
bors[bot]
15d478cf4d
Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.
In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
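The reasoning behind #635 can be illustrated with a small sketch. It does not use `grenad` or `RoaringBitmap`: a `BTreeSet<u32>` stands in for the bitmap, and both function names are hypothetical. When values for equal keys are merged by set union, their relative order does not affect the result, so an unstable sort is safe. A merge such as keep-first depends on insertion order, so it needs a stable sort.

```rust
use std::collections::BTreeSet;

/// Merge by union: the order of entries with equal keys is irrelevant,
/// so the faster unstable sort can be used.
pub fn sort_and_union(mut entries: Vec<(String, u32)>) -> Vec<(String, BTreeSet<u32>)> {
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    let mut merged: Vec<(String, BTreeSet<u32>)> = Vec::new();
    for (key, docid) in entries {
        if let Some((last_key, set)) = merged.last_mut() {
            if *last_key == key {
                set.insert(docid);
                continue;
            }
        }
        merged.push((key, BTreeSet::from([docid])));
    }
    merged
}

/// Keep-first merge: the result depends on which entry was inserted first,
/// so a stable sort is required to preserve insertion order among equal keys.
pub fn sort_and_keep_first(mut entries: Vec<(String, u32)>) -> Vec<(String, u32)> {
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    // `dedup_by` removes the later of two adjacent equal-key entries.
    entries.dedup_by(|later, earlier| later.0 == earlier.0);
    entries
}
```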
Loïc Lecrenier
3794962330
Use an unstable algorithm for grenad::Sorter when possible
2022-09-13 14:49:53 +02:00
Kerollmops
d4d7c9d577
We avoid skipping errors in the indexing pipeline
2022-09-13 14:03:00 +02:00