tamo
5d5d115608
reformat all the files
2021-06-02 11:05:07 +02:00
tamo
7086009f93
improve the base search
2021-06-02 11:05:07 +02:00
tamo
d0b44c380f
add benchmarks on a wiki dataset
2021-06-02 11:05:07 +02:00
tamo
beae843766
add a missing space
2021-06-02 11:05:07 +02:00
tamo
5132a106a1
refactor everything related to the songs dataset into a songs benchmark file
2021-06-02 11:05:07 +02:00
tamo
136efd6b53
fix the benches
2021-06-02 11:05:07 +02:00
tamo
4b78ef31b6
add the configuration of the searchable fields and displayed fields and a default configuration for the songs
2021-06-02 11:05:07 +02:00
tamo
ea0c6d8c40
add a bunch of queries and start the introduction of the filters and the new dataset
2021-06-02 11:05:07 +02:00
tamo
3def42abd8
merge all the criterion only benchmarks in one file
2021-06-02 11:05:07 +02:00
tamo
a2bff68c1a
remove the optional words for the typo criterion
2021-06-02 11:05:07 +02:00
tamo
aee49bb3cd
add the proximity criterion
2021-06-02 11:05:07 +02:00
tamo
49e4cc3daf
add the words criterion to the bench
2021-06-02 11:05:07 +02:00
tamo
15cce89a45
update the README with instructions to download the dataset
2021-06-02 11:05:07 +02:00
tamo
e425f70ef9
let criterion decide how many iterations it wants to run in 10s
2021-06-02 11:05:07 +02:00
tamo
4fdbfd6048
push a first version of the benchmark for the typo
2021-06-02 11:05:07 +02:00
bors[bot]
270da98c46
Merge #202
202: Add field id word count docids database r=Kerollmops a=LegendreM
This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute to a list of document ids. This relation is limited to attributes that contain fewer than 11 words.
This database is used by the exactness criterion to know if a document has an attribute that contains exactly the query without any additional word.
Fix #165
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)
Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
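The word-count relation of PR #202 can be pictured with a minimal in-memory sketch. It assumes a `BTreeMap` keyed by (field id, word count) in place of the real LMDB database, whitespace splitting in place of milli's tokenizer, and the hypothetical helper names `index_document` and `same_length_candidates`. The lookup only narrows candidates to attributes with exactly as many words as the query; the exactness criterion still has to check that those words are the query words.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for `field_id_word_count_docids`:
/// (field id, number of words in the attribute) -> document ids.
type FieldIdWordCountDocids = BTreeMap<(u8, u8), BTreeSet<u32>>;

/// Per the PR text, only attributes with fewer than 11 words are recorded.
const MAX_COUNTED_WORDS: usize = 10;

/// Records the word count of every (field id, text) attribute of a document.
fn index_document(db: &mut FieldIdWordCountDocids, docid: u32, fields: &[(u8, &str)]) {
    for &(field_id, text) in fields {
        // Assumption: whitespace splitting stands in for the real tokenizer.
        let count = text.split_whitespace().count();
        if count <= MAX_COUNTED_WORDS {
            db.entry((field_id, count as u8)).or_default().insert(docid);
        }
    }
}

/// Documents whose `field_id` attribute has exactly as many words as the query,
/// i.e. the only ones that could contain the query with no additional word.
fn same_length_candidates(db: &FieldIdWordCountDocids, field_id: u8, query: &str) -> BTreeSet<u32> {
    let count = query.split_whitespace().count();
    if count > MAX_COUNTED_WORDS {
        return BTreeSet::new();
    }
    db.get(&(field_id, count as u8)).cloned().unwrap_or_default()
}
```

Capping the count keeps the database small, since long attributes are unlikely to be an exact match for a short query anyway.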
many
e857ca4d7d
Fix PR comments
2021-06-01 18:06:46 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
bors[bot]
7d36d664a7
Merge #203
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM
Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.
Fix #138
Co-authored-by: many <maxime@meilisearch.com>
2021-06-01 12:00:33 +00:00
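One way to get "the number of matching bytes" from a Levenshtein computation is to keep the whole last row of the dynamic-programming table, which holds the distance between the query word and every prefix of the document word. The sketch below does that; it is not milli's custom algorithm, and the function name `matching_bytes`, its parameters, and the tie-breaking rule (longest closest prefix) are assumptions for illustration. Byte offsets are computed from chars so that non-ASCII words such as `café` are measured in UTF-8 bytes.

```rust
use std::cmp::Reverse;

/// Hypothetical helper (not milli's API): how many bytes of `word` are matched
/// by `query`, allowing up to `max_typos` edits. With `is_prefix`, `query` may
/// match only the beginning of `word`.
fn matching_bytes(query: &str, word: &str, max_typos: usize, is_prefix: bool) -> Option<usize> {
    let q: Vec<char> = query.chars().collect();
    let w: Vec<char> = word.chars().collect();

    // offsets[j] = byte length of the prefix made of the first j chars of `word`.
    let mut offsets = vec![0];
    for c in &w {
        let last = *offsets.last().unwrap();
        offsets.push(last + c.len_utf8());
    }

    // Classic Levenshtein DP, one row at a time:
    // after the i-th query char, row[j] = distance(query[..i], word[..j]).
    let mut row: Vec<usize> = (0..=w.len()).collect();
    for i in 1..=q.len() {
        let mut next = vec![i; w.len() + 1];
        for j in 1..=w.len() {
            let cost = if q[i - 1] == w[j - 1] { 0 } else { 1 };
            next[j] = (row[j] + 1).min(next[j - 1] + 1).min(row[j - 1] + cost);
        }
        row = next;
    }

    // The last row compares the whole query with every prefix of `word`.
    let j = if is_prefix {
        // Closest prefix of `word`; ties go to the longest prefix.
        let (j, _) = row.iter().enumerate().min_by_key(|&(j, &d)| (d, Reverse(j)))?;
        j
    } else {
        w.len()
    };
    (row[j] <= max_typos).then(|| offsets[j])
}
```

For example, a prefix query `hel` against `hello` matches 3 bytes, which is what a highlighter needs to wrap only the typed part.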
many
225ae6fd25
Resolve PR comments
2021-06-01 11:53:09 +02:00
Marin Postma
984dc7c1ed
rewrite roaring codec without byteorder.
2021-05-31 22:15:39 +02:00
Marin Postma
1373637da1
optimize roaring codec
2021-05-31 22:15:35 +02:00
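Writing integers without the `byteorder` crate only needs the standard library's `to_le_bytes`/`from_le_bytes`. The sketch below assumes a plain sorted list of `u32` ids standing in for a `RoaringBitmap` and picks little-endian encoding; both choices, and the names `encode_ids` and `decode_ids`, are illustrative assumptions rather than the codec in these commits.

```rust
use std::convert::TryInto;

/// Encodes document ids as consecutive 4-byte little-endian integers.
fn encode_ids(ids: &[u32]) -> Vec<u8> {
    // Pre-size the buffer: exactly 4 bytes per id.
    let mut bytes = Vec::with_capacity(ids.len() * 4);
    for id in ids {
        bytes.extend_from_slice(&id.to_le_bytes());
    }
    bytes
}

/// Decodes the format above; returns None if the length is not a multiple of 4.
fn decode_ids(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            // Each chunk is exactly 4 bytes long, so the conversion cannot fail.
            .map(|chunk| u32::from_le_bytes(chunk.try_into().unwrap()))
            .collect(),
    )
}
```

Pre-allocating the output and converting fixed-size chunks avoids the per-call overhead of a generic reader/writer.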
many
1df68d342a
Make the MatchingWords return the number of matching bytes
2021-05-31 18:22:29 +02:00
many
c701f8bf36
Use field id word count database in exactness criterion
2021-05-31 16:27:28 +02:00
many
4ddf008be2
add field id word count database
2021-05-31 16:27:28 +02:00
bors[bot]
2f5e61bacb
Merge #184
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops
This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facet values: numbers and strings are now stored in dedicated databases, and the user no longer needs to define the type of the fields. No conversion between the two types is done anymore: numbers (floats, and integers converted to f64) go to the facet float database and strings go to the string facet database.
There is one related issue that I found regarding CSVs: the values in a CSV are always considered to be strings. [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the field types using `:` in the "CSV Formatting Rules" section.
All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!
Fixes #192.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00
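The routing rule described in PR #184 can be sketched as a match on the value's runtime type. This is a minimal in-memory model, assuming a hypothetical `Value` enum and `FacetDatabases` struct in place of milli's LMDB databases; only the rule itself (integers and floats as f64 in one store, strings in another, no user-declared type) comes from the PR text.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a document field value.
enum Value {
    Integer(i64),
    Float(f64),
    String(String),
}

/// Two separate facet stores, mirroring the number/string split.
#[derive(Default)]
struct FacetDatabases {
    /// (field name, value as f64, document id)
    numbers: Vec<(String, f64, u32)>,
    /// (field name, string value) -> document ids
    strings: BTreeMap<(String, String), BTreeSet<u32>>,
}

impl FacetDatabases {
    /// Routes a value by its runtime type; no facet type has to be declared.
    fn insert(&mut self, field: &str, value: &Value, docid: u32) {
        match value {
            // Integers are converted to f64 so that all numbers share one store.
            Value::Integer(i) => self.numbers.push((field.to_string(), *i as f64, docid)),
            Value::Float(f) => self.numbers.push((field.to_string(), *f, docid)),
            Value::String(s) => {
                self.strings.entry((field.to_string(), s.clone())).or_default().insert(docid);
            }
        }
    }
}
```

This also shows the CSV caveat: a cell such as `1999` read from a CSV arrives as a string and therefore lands in the string store unless the field type is declared.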
Kerollmops
1c0a5cd136
Resolve code modification suggestions
2021-05-31 15:22:50 +02:00
many
a5e98cf46d
Fix plane sweep algorithm
2021-05-25 18:21:55 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings
2021-05-25 11:31:06 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases
2021-05-25 11:30:00 +02:00
Clément Renault
79efded841
Refine the FacetCondition from_array constructor
2021-05-25 11:30:00 +02:00
Clément Renault
f7efde11d9
Refine the facet condition to use both facet databases
2021-05-25 11:30:00 +02:00
Clément Renault
e62b89a2ed
Make the facet distinct work with the new split facets
2021-05-25 11:30:00 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases
2021-05-25 11:30:00 +02:00
Clément Renault
038e03a4e4
Use both facet databases in the FacetIter type
2021-05-25 11:30:00 +02:00
Clément Renault
597144b0b9
Use both number and string facet databases in the distinct system
2021-05-25 11:29:59 +02:00
Clément Renault
837c1041c7
Clear and delete the documents from the facet database
2021-05-25 11:28:36 +02:00
Clément Renault
a56c46b6f1
Explode the string and f64 facet databases into two
2021-05-25 11:28:36 +02:00
Clément Renault
df7a32e3d0
Move the creation date initialization into a function
2021-05-25 11:28:35 +02:00
many
a3944a7083
Introduce a filtered_candidates field
2021-05-11 11:37:40 +02:00
many
efba662ca6
Fix clippy warnings in criteria
2021-05-10 10:27:18 +02:00
many
e923d51b8f
Make bucket candidates optional
2021-05-10 10:27:04 +02:00
Marin Postma
eeb0c70ea2
meilisearch compatible primary key inference
2021-05-06 22:42:32 +02:00
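A minimal sketch of MeiliSearch-style primary key inference, assuming the rule MeiliSearch documented at the time: take the first field of the first document whose name contains `id`, case-insensitively. The helper name `infer_primary_key` is hypothetical.

```rust
/// Hypothetical helper: the first field name containing "id", ignoring case,
/// or None when no field qualifies.
fn infer_primary_key<'a>(field_names: &[&'a str]) -> Option<&'a str> {
    field_names
        .iter()
        .copied()
        .find(|name| name.to_lowercase().contains("id"))
}
```

Because the rule is a substring match, a field such as `video` would also qualify, so field order in the first document matters.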
Marin Postma
313c362461
early return on empty document addition
2021-05-06 18:14:16 +02:00
Many
44b6843de7
Fix pull request reviews
Update milli/src/fields_ids_map.rs
Update milli/src/search/criteria/exactness.rs
Update milli/src/search/criteria/mod.rs
2021-05-06 14:31:03 +02:00
many
c1ce4e4ca9
Introduce mocked ExactAttribute step in exactness criterion
2021-05-06 14:28:31 +02:00
many
a3f8686fbf
Introduce exactness criterion
2021-05-06 14:28:30 +02:00
bors[bot]
25f75d4d03
Merge #189
189: Update version for the next release (v0.2.1) r=Kerollmops a=curquiza
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-05 15:28:56 +00:00
Clémentine Urquizar
1e11578ef0
Update version for the next release (v0.2.1)
2021-05-05 14:57:34 +02:00
Alexey Shekhirin
f8d0f5265f
fix(update): fields distribution after documents merge
2021-05-04 22:12:20 +03:00
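The fields distribution counts how many documents contain each field, so merging an update into an existing document must remove the old version's contribution before counting the merged one; otherwise shared fields are counted twice. The sketch below assumes documents are reduced to their sets of field names and uses the hypothetical helper `apply_merge`; it illustrates the bookkeeping, not the exact code of the fix.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// field name -> number of documents containing it
type FieldsDistribution = BTreeMap<String, u64>;

/// Merges `new_fields` into an optional existing document and keeps the
/// distribution in sync. Returns the merged document's fields.
fn apply_merge(
    dist: &mut FieldsDistribution,
    old_fields: Option<&BTreeSet<String>>,
    new_fields: &BTreeSet<String>,
) -> BTreeSet<String> {
    // First remove the old version's contribution.
    if let Some(old) = old_fields {
        for field in old {
            let now_zero = match dist.get_mut(field) {
                Some(count) => {
                    *count -= 1;
                    *count == 0
                }
                None => false,
            };
            if now_zero {
                dist.remove(field);
            }
        }
    }
    // The merged document holds the union of the old and new fields.
    let merged: BTreeSet<String> = old_fields.into_iter().flatten().chain(new_fields).cloned().collect();
    for field in &merged {
        *dist.entry(field.clone()).or_insert(0) += 1;
    }
    merged
}
```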
tamo
d61566787e
provide an iterator over all the documents in a milli index
2021-05-04 11:23:51 +02:00
Clémentine Urquizar
a8680887d8
Upgrade Milli version (v0.2.0)
2021-05-03 14:50:47 +02:00
Clémentine Urquizar
34e02aba42
Upgrade Tokenizer version (v0.2.2)
2021-05-03 10:55:55 +02:00
Alexey Shekhirin
d81c0e8bba
feat(update): disable autogenerate_docids by default
2021-04-30 21:41:34 +03:00
Marin Postma
e8e32e0ba1
make document addition number visible
2021-04-29 20:05:07 +02:00
many
ee09e50e7f
Remove excluded document in criteria iterations
- pass excluded documents to the criteria to remove them in higher levels of the bucket-sort
- merge already returned documents with excluded documents to avoid duplicates
Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
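The idea of this commit can be sketched with ranked buckets represented as sets of document ids: excluded ids are subtracted from each bucket before use, and every returned id is merged into the excluded set so it cannot resurface on a later page or in a later bucket. The function `next_page` and the flat bucket list are hypothetical simplifications of the criteria iteration.

```rust
use std::collections::BTreeSet;

/// Hypothetical paging helper: walks ranked buckets, skipping every id in
/// `excluded`, and adds each returned id to `excluded`.
fn next_page(buckets: &[BTreeSet<u32>], excluded: &mut BTreeSet<u32>, limit: usize) -> Vec<u32> {
    let mut page = Vec::new();
    for bucket in buckets {
        // Remove excluded documents before the bucket is used.
        let remaining: Vec<u32> = bucket.difference(excluded).copied().collect();
        for id in remaining {
            if page.len() == limit {
                return page;
            }
            page.push(id);
            // Already returned documents join the excluded set: no duplicates later.
            excluded.insert(id);
        }
    }
    page
}
```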
many
31607bf9cd
Add a threshold on proximity when choosing between linear/set algorithm
2021-04-28 14:57:22 +02:00
many
3b7e6afb55
Make some refacto and add documentation
2021-04-28 13:53:27 +02:00
Many
0add4d735c
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:40:34 +02:00
Many
3794ffc952
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:23 +02:00
Many
329bd4a1bb
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:03 +02:00
Many
3b1358b62f
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:19 +02:00
Many
c862b1bc6b
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:10 +02:00
Many
e92d137676
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:42 +02:00
Many
b3d6c6a9a0
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:13 +02:00
Many
498c2b298c
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:30:02 +02:00
Many
0e4e6dfada
Update milli/src/search/criteria/proximity.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:29:52 +02:00
Many
47d780b8ce
Update milli/src/search/criteria/mod.rs
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-04-27 14:39:53 +02:00
Many
0daa0e170a
Fix PR comments
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 14:39:53 +02:00
many
0d7d3ce802
Update roaring package
2021-04-27 14:39:53 +02:00
many
71740805a7
Fix forgotten typo tests
2021-04-27 14:39:53 +02:00
many
e77291a6f3
Optimize Attribute criterion on big requests
2021-04-27 14:39:53 +02:00
many
716c8e22b0
Add style and comments
2021-04-27 14:39:52 +02:00
many
f853790016
Use the LCM of 10 first numbers to compute attribute rank
2021-04-27 14:39:52 +02:00
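The LCM of the first 10 positive integers is 2520, and it is divisible by every integer from 1 to 10. A plausible reason to use it for a rank, stated here as an assumption rather than the commit's documented intent, is that dividing it by any such number yields an exact integer, so ranks can be compared without floating-point error. The sketch computes it with a fold over `lcm(a, b) = a / gcd(a, b) * b`; the function names are illustrative.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

/// Least common multiple of 1..=n, folding lcm(acc, k) = acc / gcd(acc, k) * k.
fn lcm_up_to(n: u64) -> u64 {
    (1..=n).fold(1, |acc, k| acc / gcd(acc, k) * k)
}
```

For example, `lcm_up_to(10)` is 2520, and 2520 / 7 = 360 exactly.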
many
2b036449be
Fix the return of equal candidates in different pages
2021-04-27 14:39:52 +02:00
many
0efa011e09
Make a small code clean-up
2021-04-27 14:39:52 +02:00
many
17c8c6f945
Make set algorithm return None when nothing can be returned
2021-04-27 14:39:52 +02:00
many
b3e2280bb9
Debug attribute criterion
* debug folding when initializing iterators
2021-04-27 14:39:52 +02:00
many
1eee0029a8
Make attribute criterion typo/prefix tolerant
2021-04-27 14:39:52 +02:00
many
59f58c15f7
Implement attribute criterion
* Implement WordLevelIterator
* Implement QueryLevelIterator
* Implement set algorithm based on iterators
Not tested, and some TODOs remain to be fixed
2021-04-27 14:39:52 +02:00
Clément Renault
361193099f
Reduce the amount of branches when query tree flattened
2021-04-27 14:39:52 +02:00
Kerollmops
e65bad16cc
Compute the words prefixes at the end of an update
2021-04-27 14:39:52 +02:00
many
ab92c814c3
Fix attributes score
2021-04-27 14:35:43 +02:00
Clément Renault
0ad9499b93
Fix an indexing bug in the words level positions
2021-04-27 14:35:43 +02:00
Clément Renault
7aa5753ed2
Make the attribute positions range bounds to be fixed
2021-04-27 14:35:43 +02:00
Clément Renault
658f316511
Introduce the Initial Criterion
2021-04-27 14:35:43 +02:00
Kerollmops
89ee2cf576
Introduce the TreeLevel struct
2021-04-27 14:25:35 +02:00
Kerollmops
bd1a371c62
Compute the WordsLevelPositions only once
2021-04-27 14:25:34 +02:00
Kerollmops
8bd4f5d93e
Compute the biggest values of the words_level_positions_docids
2021-04-27 14:25:34 +02:00
Kerollmops
f713828406
Implement the clear and delete documents for the word-level-positions database
2021-04-27 14:25:34 +02:00
Kerollmops
3069bf4f4a
Fix and improve the words-level-positions computation
2021-04-27 14:25:34 +02:00
Kerollmops
3a25137ee4
Expose and use the WordsLevelPositions update
2021-04-27 14:25:34 +02:00
Kerollmops
c765f277a3
Introduce the WordsLevelPositions update
2021-04-27 14:25:34 +02:00
Kerollmops
9242f2f1d4
Store the first word positions levels
2021-04-27 14:25:34 +02:00
Kerollmops
b0a417f342
Introduce the word_level_position_docids Index database
2021-04-27 14:25:34 +02:00
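A word-level-positions database stores, for each word, the document ids at increasingly coarse position ranges, so a search can test a wide range first and only descend into finer levels where needed. The sketch below is an assumption-based illustration: the key layout (word, level, left bound, right bound), the `group_size` parameter, and the helper names are hypothetical, chosen so that level 0 holds exact positions and each further level groups `group_size` ranges of the level below.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical key: (word, level, left bound, right bound) -> document ids.
type WordLevelPositionDocids = BTreeMap<(String, u8, u32, u32), BTreeSet<u32>>;

/// Position range covering `position` at `level`; its width is group_size^level.
fn level_range(position: u32, level: u8, group_size: u32) -> (u32, u32) {
    let width = group_size.pow(level as u32);
    let left = position / width * width;
    (left, left + width - 1)
}

/// Inserts the document under every level for each (word, position) pair.
fn index_word_positions(
    db: &mut WordLevelPositionDocids,
    docid: u32,
    words: &[(&str, u32)],
    levels: u8,
    group_size: u32,
) {
    for &(word, position) in words {
        for level in 0..levels {
            let (left, right) = level_range(position, level, group_size);
            db.entry((word.to_string(), level, left, right)).or_default().insert(docid);
        }
    }
}
```

With a group size of 4, position 13 falls in `(13, 13)` at level 0, `(12, 15)` at level 1 and `(0, 15)` at level 2.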
many
75e7b1e3da
Implement test Context methods
2021-04-27 14:25:34 +02:00
many
4ff67ec2ee
Implement attribute criterion for small amounts of candidates
2021-04-27 14:25:34 +02:00
Kerollmops
0f4c0beffd
Introduce the Attribute criterion
2021-04-27 14:25:34 +02:00
tamo
f8dee1b402
[makes clippy happy] search/criteria/proximity.rs
2021-04-21 12:36:45 +02:00