Commit Graph

645 Commits

Author SHA1 Message Date
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable 2021-09-22 16:37:41 +02:00
Tamo
1e5e3d57e2
auto convert AscDescError into CriterionError 2021-09-22 16:37:41 +02:00
Tamo
023446ecf3
create a smaller and easier to maintain CriterionError type 2021-09-22 16:37:41 +02:00
Tamo
86e272856a
create an asc_desc error type that is never supposed to be returned to the end user 2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module 2021-09-22 16:37:41 +02:00
Tamo
113a061bee
fix the error handling on the criterion side 2021-09-22 15:09:07 +02:00
Tamo
78b0bce9a1
fix the returned error when asc desc fails to be parsed 2021-09-22 11:37:05 +02:00
Clémentine Urquizar
f8ecbc28e2
Update version for the next release (v0.15.0) 2021-09-21 18:09:14 +02:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
94764e5c7c
Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot]
31c8de1cca
Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.

### What we will have to do on the indexing part:
 - [x] Index the `_geo` fields from the documents.
   - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
   - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
   - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.

-----------

- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Irevoire
0d104a0fce
Update milli/src/criterion.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 18:13:17 +02:00
Clémentine Urquizar
3f1453f470
Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Tamo
f4b8e5675d
move the reserved keyword logic for the criterion and sort + add test 2021-09-20 17:21:02 +02:00
Irevoire
3b7a2cdbce
fix typo
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 16:10:39 +02:00
Tamo
c695a1ffd2
add the possibility to sort by descending order on geoPoint 2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721
Stop iterating through the whole list of points
We stop when there is no possible candidates left
2021-09-15 11:49:58 +02:00
Clémentine Urquizar
f167f7b412
Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
Tamo
cfc62a1c15
use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
many
26deeb45a3
Add lacking parameter to word level position builder 2021-09-09 17:49:04 +02:00
Tamo
3fc145c254
if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
c81ff22c5b
delete the invalid criterion name error in favor of invalid ranking rule name 2021-09-08 19:17:00 +02:00
Tamo
bad8ea47d5
edit the two lasts TODO comments 2021-09-08 18:24:09 +02:00
Tamo
b15c77ebc4
return an error in case a user try to sort with :desc 2021-09-08 18:24:09 +02:00
Tamo
4b618b95e4
rebase on main 2021-09-08 18:24:09 +02:00
Tamo
2988d3c76d
tests the geo filters 2021-09-08 18:24:09 +02:00
Tamo
e5ef0cad9a
use meters in the filters 2021-09-08 18:24:09 +02:00
Tamo
4f69b190bc
remove the distance from the search, the computation of the distance will be made on meilisearch side 2021-09-08 18:24:09 +02:00
Tamo
7ae2a7341c
introduce the reserved keywords in the filters 2021-09-08 18:24:09 +02:00
Tamo
6d5762a6c8
handle the case where you forgot entirely the parenthesis 2021-09-08 18:24:09 +02:00
Tamo
ebf82ac28c
improve the error messages and add tests for the filters 2021-09-08 18:24:09 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
Tamo
e8c093c1d0
fix the error handling in the filters 2021-09-08 18:24:09 +02:00
Tamo
f0b74637dc
fix all the tests 2021-09-08 18:24:09 +02:00
Tamo
b1bf7d4f40
reformat 2021-09-08 18:24:09 +02:00
Tamo
aca707413c
remove the memory leak 2021-09-08 18:24:09 +02:00
Tamo
a8a1f5bd55
move the geosearch criteria out of asc_desc.rs 2021-09-08 18:24:09 +02:00
Tamo
dc84ecc40b
fix a bug 2021-09-08 18:24:09 +02:00
Tamo
4820ac71a6
allow spaces in a geoRadius 2021-09-08 18:24:09 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable 2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90
only index _geo if it's set as sortable OR filterable
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Tamo
f73273d71c
only call the extractor if needed 2021-09-08 17:51:08 +02:00
Irevoire
ea2f2ecf96
create a new database containing all the documents that were geo-faceted 2021-09-08 17:51:08 +02:00
Irevoire
4b459768a0
create the _geoRadius filter 2021-09-08 17:51:07 +02:00
Irevoire
6d70978edc
update the facet filter grammar 2021-09-08 17:51:07 +02:00
Irevoire
216a8aa3b2
add a tests for the indexation of the geosearch 2021-09-08 17:51:07 +02:00
Irevoire
a21c854790
handle errors 2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5
remove multiple bugs 2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82
rename all the ’long’ into ’lng’ like written in the specification 2021-09-08 17:51:07 +02:00
Irevoire
3b9f1db061
implement the clear of the rtree 2021-09-08 17:51:07 +02:00
Irevoire
d344489c12
implement the deletion of geo points 2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e
Index the geo points 2021-09-08 17:51:07 +02:00
Irevoire
8d9c2c4425
create a new db with getters and setters 2021-09-08 17:51:07 +02:00
bors[bot]
b22aac92ac
Merge #342
342: Let the caller decide what kind of error they want to returns when parsing `AscDesc` r=Kerollmops a=irevoire

This is one possible fix for #339 
We would then need to patch these lines https://github.com/meilisearch/MeiliSearch/blob/main/meilisearch-http/src/index/search.rs#L110-L114 to return the error we want.

Another solution would be to add a parameter to the `from_str` to specify which context we are in.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-08 14:18:57 +00:00
Tamo
932998f5cc
let the caller decide if they want to return an invalidSortName or an
invalidCriterionName error
2021-09-08 16:17:31 +02:00
bors[bot]
86c3b0c8c2
Merge #350
350: Fix mdb val size error r=Kerollmops a=ManyTheFish

Related to [#1677](https://github.com/meilisearch/MeiliSearch/issues/1677)

Co-authored-by: many <maxime@meilisearch.com>
2021-09-08 13:32:15 +00:00
many
e54280fbfc
Skip empty normalized words 2021-09-08 15:25:23 +02:00
many
d18ee58ab9
Check if key are not empty in validator 2021-09-08 15:25:23 +02:00
Kerollmops
8a088fb99e
Bump grenad to v0.3.1 2021-09-08 14:08:55 +02:00
Kerollmops
20ad43b908
Enable the grenad tempfile feature back 2021-09-08 14:06:28 +02:00
bors[bot]
772e55d174
Merge #347
347: Update version for the next release (v0.13.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-08 11:41:15 +00:00
many
9961b78b06
Drop sorter before creating a new one 2021-09-08 13:30:26 +02:00
Clémentine Urquizar
eb7b9d9dbf
Update version for the next release (v0.13.0) 2021-09-08 10:59:30 +02:00
bors[bot]
48d211b8b0
Merge #344
344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops

This PR moves the sort ranking rule at the 5th position by default, right before the exactness one.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-07 15:47:15 +00:00
bors[bot]
720becb5e8
Merge #341
341: Throw a query time error when a sort parameter is used but the sort ranking rule is missing r=Kerollmops a=Kerollmops

This PR makes the engine throw an error for when the ranking rules don't contain the `sort` rule, the `sortable_fields` are correctly set but the user tries to use the `sort` query parameter. Doing so will have no effect on the returned documents so we preferred returning an error to help debug this.

That's breaking on the MeiliSearch side as we added a new variant to the `UserError` enum.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-07 14:45:05 +00:00
Kerollmops
e2cefc9b4f
Move the sort ranking rule before the exactness ranking rule 2021-09-07 16:41:33 +02:00
mpostma
cd043d4461 remove unused grenad default features 2021-09-07 16:21:46 +02:00
Kerollmops
5989528833
Add a test to make sure we throw the right error message 2021-09-07 11:02:00 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing 2021-09-07 11:02:00 +02:00
Kerollmops
8dca36433c
Introduce the new SortRankingRuleMissing user error variant 2021-09-07 11:01:59 +02:00
Alexey Shekhirin
0be09555f1
test(search): asc/desc criteria for large datasets 2021-09-03 18:00:08 +03:00
Alexey Shekhirin
c2517e7d5f
fix(facet): string fields sorting 2021-09-03 11:58:26 +03:00
bors[bot]
5cbe879325
Merge #308
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish

Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases


Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
many
741a4444a9
Remove log in chunk generator 2021-09-02 16:57:46 +02:00
many
7f7fafb857
Make document_chunk_size settable from update builder 2021-09-02 15:25:39 +02:00
many
db0c681bae
Fix Pr comments 2021-09-02 15:17:52 +02:00
Clémentine Urquizar
285849e3a6
Update version for the next release (v0.12.0) 2021-09-02 10:08:41 +02:00
many
4860fd4529
Ignore empty facet values 2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consuption in word pair proximity extractor 2021-09-01 16:48:40 +02:00
many
9452fabfb2
Optimize cbo roaring bitmaps merge 2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors comming from crossbeam channel senders 2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators 2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers 2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel 2021-09-01 16:48:40 +02:00
many
5c962c03dd
Fix and optimize word_prefix_pair_proximity_docids database 2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop word in account 2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback 2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer 2021-09-01 16:48:36 +02:00
many
3aaf1d62f3
Publish grenad CompressionType type in milli 2021-09-01 16:42:08 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present 2021-08-31 21:35:41 +03:00
bors[bot]
d6bba0663a
Merge #334
334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/milli/issues/263

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-31 17:38:54 +00:00
Alexey Shekhirin
0b02eb456c
chore(update): wrap long values into BStr for warn logs 2021-08-31 20:28:16 +03:00
Kerollmops
f230ae6fd5
Introduce the reset_sortable_fields Settings method 2021-08-25 17:44:16 +02:00
Kerollmops
af65485ba7
Reexport the grenad CompressionType from milli 2021-08-24 18:15:31 +02:00
Kerollmops
f2e1591826
Remove the unused tinytemplate dependency 2021-08-24 18:10:58 +02:00
Kerollmops
2f20257070
Update milli to the v0.11.0 2021-08-24 18:10:11 +02:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
Clémentine Urquizar
88f6c18665
Update version for the next release (v0.10.2) 2021-08-23 11:33:30 +02:00
Clément Renault
c084f7f731
Fix the facet string docids filterable deletion bug 2021-08-23 10:50:39 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
bors[bot]
41fc0dcb62
Merge #309
309: Sort at query time r=Kerollmops a=Kerollmops

This PR:
 - Makes the `Asc/Desc` criteria work with strings too, it first returns documents ordered by numbers then by strings, and finally the documents that can't be ordered. Note that it is lexicographically ordered and not ordered by character, which means that it doesn't know about wide and short characters i.e. `a`, `丹`, `▲`.
 - Changes the syntax for the `Asc/Desc` criterion by now using a colon to separate the name and the order i.e. `title:asc`, `price:desc`.
 - Add the `Sort` criterion at the third position in the ranking rules by default.
 - Add the `sort_criteria` method to the `Search` builder struct to let the users define the `Asc/Desc` sortable attributes they want to use at query time. Note that we need to check that the fields are registered in the sortable attributes before performing the search.
 - Introduce a new `InvalidSortableAttribute` user error that is raised when the sort criteria declared at query time are not part of the sortable attributes.
 - `@ManyTheFish` introduced integration tests for the dynamic Sort criterion.

Fixes #305.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-08-18 16:55:32 +00:00
many
d1df0d20f9
Add integration test of SortBy criterion 2021-08-18 16:21:51 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criteria is not sortable 2021-08-18 15:04:07 +02:00
Kerollmops
71602e0f1b
Add the sortable fields into the settings and in the index 2021-08-18 15:04:07 +02:00
Kerollmops
407f53872a
Add a sort_criteria method to the Search builder struct 2021-08-18 15:04:07 +02:00
Kerollmops
687cd2e205
Introduce the new Sort criterion and AscDesc enum 2021-08-18 15:04:07 +02:00
bors[bot]
198c416bd8
Merge #312
312: Update milli version to v0.10.1 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-18 12:08:04 +00:00
Clémentine Urquizar
6cb9c3b81f
Update milli version to v0.10.1 2021-08-18 13:46:27 +02:00
Clémentine Urquizar
42cf847a63
Update tokenizer version to v0.2.5 2021-08-18 13:37:41 +02:00
Kerollmops
5b88df508e
Use the new Asc/Desc syntax everywhere 2021-08-17 14:15:22 +02:00
Kerollmops
fcedff95e8
Change the Asc/Desc criterion syntax to use a colon (:) 2021-08-17 14:03:21 +02:00
Kerollmops
e9ada44509
AscDesc criterion returns documents ordered by numbers then by strings 2021-08-17 13:21:31 +02:00
Kerollmops
110bf6b778
Make the FacetStringIter work in both, ascending and descending orders 2021-08-17 11:18:40 +02:00
Kerollmops
22ebd2658f
Introduce the EitherString/RevRange private aliases 2021-08-17 10:47:15 +02:00
Kerollmops
7a5889bc5a
Introduce the highest_reverse_iter private method 2021-08-17 10:45:26 +02:00
Kerollmops
ad0d311f8a
Introduce the FacetStringLevelZeroRevRange struct 2021-08-17 10:44:43 +02:00
Kerollmops
6214c38da9
Introduce the FacetStringGroupRevRange struct 2021-08-17 10:44:27 +02:00
Kerollmops
1c604de158
Introduce the highest_iter private method on the FacetStringIter struct 2021-08-17 10:41:11 +02:00
Kerollmops
64df159057
Introduce the new_reducing constructor on the FacetStringIter struct 2021-08-17 10:35:06 +02:00
Kerollmops
01a4052828
Move the FacetStringIter creation logic into a private new method 2021-08-17 10:29:43 +02:00
bors[bot]
51581d14f8
Merge #307
307: Update version for the next release (v0.10.0) r=Kerollmops a=curquiza

Replaces https://github.com/meilisearch/milli/pull/304

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-16 10:33:53 +00:00
Clémentine Urquizar
fcc520e49a
Update version for the next release (v0.10.0) 2021-08-16 12:00:28 +02:00
many
7dbefae1e3
Make facet string iterator non reducing 2021-08-12 17:23:39 +02:00
many
8fdf860c17
Remove max values by facet limit for facet distribution 2021-08-12 11:29:20 +02:00
bors[bot]
2102e0da6b
Merge #302
302: Update milli to v0.9.0 r=curquiza a=curquiza

Updating the minor and not patch since #300 seems to be breaking: it involves a re-indexation to get the fix, so it involves an additional step from the users, not only downloading the latest version.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-05 08:38:15 +00:00
bors[bot]
89b9b61840
Merge #300
300: Fix prefix level position docids database r=curquiza a=ManyTheFish

The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.

Fix #299

Co-authored-by: many <maxime@meilisearch.com>
2021-08-04 16:52:09 +00:00
Clémentine Urquizar
7f26c75610
Update milli to v0.9.0 2021-08-04 16:04:55 +02:00
many
cdeb07f0fd
Fix prefix level position docids database
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.

Fix #299
2021-08-04 14:11:49 +02:00
bors[bot]
1290edd58a
Merge #297
297: Bump milli to v0.8.1 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-29 14:19:41 +00:00
Kerollmops
341c244965
Bump milli to v0.8.1 2021-07-29 15:56:36 +02:00
Kerollmops
90514e03d1
Fix invalid faceted documents ids buffer size 2021-07-29 15:49:23 +02:00
bors[bot]
200e98c211
Merge #293
293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops

Fix https://github.com/meilisearch/meilisearch/issues/1505.

fix https://github.com/meilisearch/MeiliSearch/issues/1529

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-27 16:04:52 +00:00
Clémentine Urquizar
6a141694da
Update version for the next release (v0.8.0) 2021-07-27 16:38:42 +02:00
Kerollmops
dc2b63abdf
Introduce an empty FilterCondition variant to support unknown fields 2021-07-27 16:34:04 +02:00
Kerollmops
b12738cfe9
Use the right DB prefixes to store the faceted fields 2021-07-22 19:18:22 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings 2021-07-22 18:40:12 +02:00
bors[bot]
ee3a49cfba
Merge #291
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops

Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.

It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏

----

This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.

As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.

I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.

<details>
  <summary>The backtrace leading us to a panic in grenad.</summary>

```
meilisearch_1    | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    | stack backtrace:
meilisearch_1    |    0: rust_begin_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1    |    1: core::panicking::panic_fmt
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1    |    2: core::panicking::panic
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1    |    3: grenad::block_builder::BlockBuilder::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    |    4: grenad::writer::Writer<W>::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1    |    5: milli::update::words_level_positions::write_level_entry
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1    |    6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1    |    7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1    |    8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1    |    9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1    |   10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1    |   11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1    |   12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1    |   13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1    |   14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1    |   15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1    |   16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1    |   17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1    |   18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1    |   19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1    |   20: std::panicking::try::do_call
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1    |   21: std::panicking::try
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1    |   22: std::panic::catch_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1    |   23: tokio::runtime::task::harness::poll_future
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1    |   24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1    |   25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1    |   26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1    |   27: tokio::runtime::task::Notified<S>::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1    |   28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1    |   29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1    | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-22 16:14:35 +00:00
Kerollmops
0353fbb5df
Bump the tokenizer version to v0.2.4 2021-07-22 17:14:45 +02:00
Kerollmops
92c0a2cdc1
Add a test that triggers a panic when indexing zeroes 2021-07-22 17:14:44 +02:00
Kerollmops
aa02a7fdd8
Add a test to check that we indeed impact the relevancy 2021-07-22 17:04:38 +02:00
Clément Renault
0227254a65
Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
Kerollmops
03a01166ba
Display the original facet string value from the linear facet database 2021-07-21 16:59:39 +02:00
Clément Renault
d23c250ad5
Fix a bound error in the facet string range construction 2021-07-21 16:59:39 +02:00
Clément Renault
081278dfd6
Use the facet string levels when computing the facet distribution 2021-07-21 16:59:39 +02:00
Clément Renault
5676b204dd
Fix the facet string levels codecs 2021-07-21 16:59:38 +02:00
Kerollmops
8c86348119
Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
Kerollmops
a7ae552ba7
Fix the FacetStringLevelZeroRange range when unbounded 2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a
Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
Kerollmops
adfd4da24c
Introduce the FacetStringIter iterator 2021-07-21 16:59:38 +02:00
Kerollmops
a79661c6dc
Introduce a lot of facet string helper iterators 2021-07-21 16:59:38 +02:00
Kerollmops
851f979039
Describe the way we want to group the facet strings 2021-07-21 16:59:38 +02:00
Kerollmops
f858f64b1f
Move the facet number iterators into their own module 2021-07-21 16:59:37 +02:00
Kerollmops
9f8095c069
Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
Kerollmops
a9553af635
Add a test to check that we can index more that 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32
Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
Kerollmops
91c5d0c042
Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
Kerollmops
a6b4069172
Bump to v0.7.2 2021-07-05 10:54:53 +02:00
many
9f62149b94
Fix matching lenghth in matching_words 2021-07-01 19:03:28 +02:00
Clémentine Urquizar
3c149d8a43
Update tokenizer version to v0.2.3 2021-06-30 18:41:35 +02:00
bors[bot]
b4dcdbf00d
Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.

The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
Kerollmops
c92ef54466
Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Kerollmops
28782ff99d
Fix ExternalDocumentsIds struct when inserting previously deleted ids 2021-06-30 14:00:01 +02:00
Clémentine Urquizar
b489515f4d
Update milli version to v0.7.1 2021-06-30 13:52:46 +02:00
Kerollmops
54889813ce
Implement some debug functions on the ExternalDocumentsIds struct 2021-06-30 11:29:41 +02:00
Kerollmops
4bce66d5ff
Make the Index::delete_* method private 2021-06-30 10:07:31 +02:00
Irevoire
6044b80362
Update milli/src/search/matching_words.rs
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1
add more tests 2021-06-29 16:24:58 +02:00
Tamo
56fceb1928
re-implement the Damerau-Levenshtein used for the highlighting 2021-06-29 15:36:03 +02:00
Clément Renault
80c6aaf1fd
Bump milli to 0.7.0 2021-06-28 18:31:56 +02:00
Clément Renault
bdc5599b73
Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
0013236e5d
Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10
Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
Kerollmops
98285b4b18
Bump milli to 0.6.0 2021-06-23 17:30:26 +02:00
Kerollmops
4fc8f06791
Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f
Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
2ab24c4f49
Merge #256
256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 12:29:57 +00:00
Clémentine Urquizar
9885fb4159
Update version for the next release (v0.5.1) 2021-06-23 14:05:20 +02:00
Kerollmops
a6218a20ae
Introduce a new InvalidFacetsDistribution user error 2021-06-23 13:56:19 +02:00
Kerollmops
2364777838
Return an error for when a field distribution cannot be done 2021-06-23 11:50:49 +02:00
Kerollmops
aeaac743ff
Replace an if let some by a match 2021-06-23 11:33:30 +02:00
Tamo
8d2a0b43ff
run the formatter on the whole project a second time 2021-06-22 15:36:22 +02:00
Tamo
3d90b03d7b
fix the limit
There was no check on the limit and thus, if a user especified a very large number this line could causes a panic
2021-06-22 14:52:13 +02:00
bors[bot]
5b6adc6d96
Merge #245
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops

Closes #191, and resolves #140.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-22 12:10:52 +00:00
Kerollmops
51dbb2e06d
Warn for when a key is too large for LMDB 2021-06-22 11:51:36 +02:00
Kerollmops
aecbd14761
Improve the error message for InvalidDocumentId 2021-06-22 11:31:58 +02:00
Kerollmops
0cca2ea24f
Return a MissingDocumentId when a document doesn't have one 2021-06-22 11:22:33 +02:00
Kerollmops
481b0bf277
Warn for when a facet key is too large for LMDB 2021-06-22 10:57:46 +02:00
bors[bot]
b073fd49ea
Merge #244
244: Update version for the next release (v0.5.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-21 14:27:10 +00:00
Clémentine Urquizar
320670f8fe
Update version for the next release (v0.5.0) 2021-06-21 15:59:17 +02:00
Clémentine Urquizar
daef43f504
Rename FieldsDistribution into FieldDistribution 2021-06-21 15:57:41 +02:00
Clémentine Urquizar
35fcc351a0
Update version for the next release (v0.4.2) 2021-06-20 17:37:24 +02:00
bors[bot]
5b19dd23d9
Merge #240
240: Field distribution r=Kerollmops a=irevoire

closes #199
closes #198 


Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-19 10:14:25 +00:00
Tamo
d08cfda796
convert the field_distribution to a BTreeMap and avoid counting twice the same documents 2021-06-17 18:31:54 +02:00
bors[bot]
a9e552ab18
Merge #238
238: Integration tests on filters and distinct r=Kerollmops a=ManyTheFish

Fix #216 
Fix #120 

Co-authored-by: many <maxime@meilisearch.com>
2021-06-17 15:00:51 +00:00
many
6cb1102bdb
Fix PR comments 2021-06-17 15:19:03 +02:00
Tamo
969adaefdf
rename fields_distribution in field_distribution 2021-06-17 15:16:20 +02:00
Kerollmops
ccd6f13793
Update version to the next release (0.4.1) 2021-06-17 15:01:20 +02:00
many
f496cd320d
Add distinct integration tests 2021-06-17 14:33:18 +02:00