Commit Graph

540 Commits

Author SHA1 Message Date
Tamo
113a061bee
fix the error handling on the criterion side 2021-09-22 15:09:07 +02:00
Tamo
78b0bce9a1
fix the returned error when asc desc fails to be parsed 2021-09-22 11:37:05 +02:00
Clémentine Urquizar
f8ecbc28e2
Update version for the next release (v0.15.0) 2021-09-21 18:09:14 +02:00
mpostma
aa6c5df0bc
Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
94764e5c7c
Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot]
31c8de1cca
Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), making the engine able to index, filter, and sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query the tree, as it consumes a lot of CPU and memory on every search, but it is an easy first approach.

### What we will have to do on the indexing part:
- [x] Index the `_geo` fields from the documents.
  - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further processing.
  - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
  - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`.
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure, removing the points from it, and inserting it back after the modification.
- [x] Clear the `RTree` entirely when we clear the documents from the database; everything happens in the `clear_documents.rs` file.
- [x] Save a roaring bitmap of all documents containing the `_geo` field.
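The indexing checklist above can be illustrated with an in-memory stand-in. This is a minimal sketch under stated assumptions, not milli's code: `GeoIndex` is a hypothetical type, a flat `Vec` of `(docid, [lat, lng])` pairs replaces the rstar `RTree`, and a `BTreeSet<u32>` replaces the roaring bitmap of geo-faceted documents. It only shows the insert, delete, and clear bookkeeping, not the grenad-based extraction.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the geo index: a flat list of points instead of
/// an rstar `RTree`, and a `BTreeSet` instead of a roaring bitmap.
#[derive(Default, Debug)]
pub struct GeoIndex {
    /// (document id, [lat, lng]) pairs.
    points: Vec<(u32, [f64; 2])>,
    /// Ids of every document that has a `_geo` field.
    geo_faceted: BTreeSet<u32>,
}

impl GeoIndex {
    /// Index the `_geo` point of a document, replacing any previous point.
    pub fn insert(&mut self, docid: u32, lat: f64, lng: f64) {
        self.delete(&[docid]);
        self.points.push((docid, [lat, lng]));
        self.geo_faceted.insert(docid);
    }

    /// Remove the points of the deleted documents and unmark them as geo-faceted.
    pub fn delete(&mut self, docids: &[u32]) {
        self.points.retain(|(id, _)| !docids.contains(id));
        for id in docids {
            self.geo_faceted.remove(id);
        }
    }

    /// Drop every point, as when all the documents are cleared.
    pub fn clear(&mut self) {
        self.points.clear();
        self.geo_faceted.clear();
    }

    pub fn geo_faceted(&self) -> &BTreeSet<u32> {
        &self.geo_faceted
    }

    pub fn point_count(&self) -> usize {
        self.points.len()
    }
}
```

Re-inserting a document first deletes its old point, so each document contributes at most one point.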

### What we will have to do on the query part:
- [x] Filter the documents within a certain distance around a point; this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants, where the `geoGreaterThan` variant is implemented by executing `geoLowerThan` and removing the results found from the whole list of geo-faceted documents.
  - [x] Add the `_geoRadius` function to the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point as a parameter. ~~This function must keep the iterator on the `RTree` and make it peekable.~~ This was not possible for now; we had to collect the whole iterator. Also, only the documents that are part of the candidates must be returned.
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
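The filter items above can be sketched as follows. This is a minimal illustration, not milli's code: it assumes a spherical Earth with a mean radius of 6,371 km and the haversine formula (the log later shows a switch from haversine to the geoutils crate), uses a linear scan in place of `RTree::nearest_neighbor_iter`, and the function names are hypothetical. Distances are in meters, matching the "use meters in the filters" commit, and the negation subtracts the in-radius set from all geo-faceted documents.

```rust
use std::collections::BTreeSet;

/// Mean Earth radius in meters (assumption: spherical Earth model).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in meters between two [lat, lng] points given in degrees.
pub fn haversine_m(a: [f64; 2], b: [f64; 2]) -> f64 {
    let (lat1, lat2) = (a[0].to_radians(), b[0].to_radians());
    let dlat = (b[0] - a[0]).to_radians();
    let dlng = (b[1] - a[1]).to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    // Clamp to guard against rounding pushing `h` slightly above 1.
    2.0 * EARTH_RADIUS_M * h.sqrt().min(1.0).asin()
}

/// Hypothetical `geoLowerThan`: documents whose point lies within `radius_m` meters.
pub fn geo_lower_than(points: &[(u32, [f64; 2])], center: [f64; 2], radius_m: f64) -> BTreeSet<u32> {
    points
        .iter()
        .filter(|(_, p)| haversine_m(center, *p) <= radius_m)
        .map(|(id, _)| *id)
        .collect()
}

/// Hypothetical `geoGreaterThan`: all geo-faceted documents minus those inside the radius.
pub fn geo_greater_than(
    points: &[(u32, [f64; 2])],
    geo_faceted: &BTreeSet<u32>,
    center: [f64; 2],
    radius_m: f64,
) -> BTreeSet<u32> {
    let inside = geo_lower_than(points, center, radius_m);
    geo_faceted.difference(&inside).copied().collect()
}
```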

-----------

- On the Meilisearch side: we must introduce a new concept, returning the documents with a new `_geoDistance` field when they went through the `_geo` ranking rule; this has never been done before. We could do it afterward, once the documents have been retrieved from the database, by computing the distance between the `_geoPoint` and each of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Irevoire
0d104a0fce
Update milli/src/criterion.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 18:13:17 +02:00
Clémentine Urquizar
3f1453f470
Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Tamo
f4b8e5675d
move the reserved keyword logic for the criterion and sort + add test 2021-09-20 17:21:02 +02:00
Irevoire
3b7a2cdbce
fix typo
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 16:10:39 +02:00
Tamo
c695a1ffd2
add the possibility to sort by descending order on geoPoint 2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721
Stop iterating through the whole list of points
We stop when there are no possible candidates left
2021-09-15 11:49:58 +02:00
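The two commits above (a descending `_geoPoint` sort, and stopping once no candidates are left) can be illustrated with the sketch below. It is not milli's implementation: `geo_sort` is a hypothetical function, the points are sorted up front rather than streamed from an `RTree`, and a squared planar distance on (lat, lng) is used as the ordering key for simplicity.

```rust
use std::collections::BTreeSet;

/// Squared planar distance on (lat, lng): a simple ordering key for this sketch only.
fn dist2(a: [f64; 2], b: [f64; 2]) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Hypothetical geo sort: return the candidate documents ordered by distance
/// to `center` (ascending or descending), stopping as soon as every
/// candidate has been emitted.
pub fn geo_sort(
    points: &[(u32, [f64; 2])],
    candidates: &BTreeSet<u32>,
    center: [f64; 2],
    ascending: bool,
) -> Vec<u32> {
    let mut sorted: Vec<&(u32, [f64; 2])> = points.iter().collect();
    sorted.sort_by(|a, b| dist2(center, a.1).total_cmp(&dist2(center, b.1)));
    if !ascending {
        sorted.reverse();
    }

    let mut remaining = candidates.clone();
    let mut out = Vec::with_capacity(candidates.len());
    for (docid, _) in sorted {
        if remaining.is_empty() {
            break; // no possible candidates left: stop iterating
        }
        // Only documents that are part of the candidates are returned.
        if remaining.remove(docid) {
            out.push(*docid);
        }
    }
    out
}
```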
Clémentine Urquizar
f167f7b412
Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
Tamo
cfc62a1c15
use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
many
26deeb45a3
Add lacking parameter to word level position builder 2021-09-09 17:49:04 +02:00
Tamo
3fc145c254
if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
c81ff22c5b
delete the invalid criterion name error in favor of invalid ranking rule name 2021-09-08 19:17:00 +02:00
Tamo
bad8ea47d5
edit the last two TODO comments 2021-09-08 18:24:09 +02:00
Tamo
b15c77ebc4
return an error in case a user tries to sort with :desc 2021-09-08 18:24:09 +02:00
Tamo
4b618b95e4
rebase on main 2021-09-08 18:24:09 +02:00
Tamo
2988d3c76d
tests the geo filters 2021-09-08 18:24:09 +02:00
Tamo
e5ef0cad9a
use meters in the filters 2021-09-08 18:24:09 +02:00
Tamo
4f69b190bc
remove the distance from the search, the computation of the distance will be made on meilisearch side 2021-09-08 18:24:09 +02:00
Tamo
7ae2a7341c
introduce the reserved keywords in the filters 2021-09-08 18:24:09 +02:00
Tamo
6d5762a6c8
handle the case where the parentheses are entirely missing 2021-09-08 18:24:09 +02:00
Tamo
ebf82ac28c
improve the error messages and add tests for the filters 2021-09-08 18:24:09 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
Tamo
e8c093c1d0
fix the error handling in the filters 2021-09-08 18:24:09 +02:00
Tamo
f0b74637dc
fix all the tests 2021-09-08 18:24:09 +02:00
Tamo
b1bf7d4f40
reformat 2021-09-08 18:24:09 +02:00
Tamo
aca707413c
remove the memory leak 2021-09-08 18:24:09 +02:00
Tamo
a8a1f5bd55
move the geosearch criteria out of asc_desc.rs 2021-09-08 18:24:09 +02:00
Tamo
dc84ecc40b
fix a bug 2021-09-08 18:24:09 +02:00
Tamo
4820ac71a6
allow spaces in a geoRadius 2021-09-08 18:24:09 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable 2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90
only index _geo if it's set as sortable OR filterable
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Tamo
f73273d71c
only call the extractor if needed 2021-09-08 17:51:08 +02:00
Irevoire
ea2f2ecf96
create a new database containing all the documents that were geo-faceted 2021-09-08 17:51:08 +02:00
Irevoire
4b459768a0
create the _geoRadius filter 2021-09-08 17:51:07 +02:00
Irevoire
6d70978edc
update the facet filter grammar 2021-09-08 17:51:07 +02:00
Irevoire
216a8aa3b2
add tests for the indexing of the geosearch 2021-09-08 17:51:07 +02:00
Irevoire
a21c854790
handle errors 2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5
remove multiple bugs 2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82
rename all the `long` to `lng`, as written in the specification 2021-09-08 17:51:07 +02:00
Irevoire
3b9f1db061
implement the clear of the rtree 2021-09-08 17:51:07 +02:00
Irevoire
d344489c12
implement the deletion of geo points 2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e
Index the geo points 2021-09-08 17:51:07 +02:00
Irevoire
8d9c2c4425
create a new db with getters and setters 2021-09-08 17:51:07 +02:00
bors[bot]
b22aac92ac
Merge #342
342: Let the caller decide what kind of error they want to return when parsing `AscDesc` r=Kerollmops a=irevoire

This is one possible fix for #339 
We would then need to patch these lines https://github.com/meilisearch/MeiliSearch/blob/main/meilisearch-http/src/index/search.rs#L110-L114 to return the error we want.

Another solution would be to add a parameter to the `from_str` to specify which context we are in.
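One way to let the caller pick the error is sketched below with hypothetical types (`AscDesc`, `AscDescError`, `UserError`, and the two `parse_*` functions are illustrative, not necessarily milli's definitions): `FromStr` returns a context-free error, and each call site converts it into the variant that fits its context, such as an invalid criterion name for ranking rules or an invalid sort name for the `sort` parameter.

```rust
use std::str::FromStr;

/// A parsed sort/ranking expression such as `price:asc`.
#[derive(Debug, PartialEq)]
pub enum AscDesc {
    Asc(String),
    Desc(String),
}

/// Context-free parse error: it doesn't know whether the text came from the
/// ranking rules or from a `sort` query parameter.
#[derive(Debug, PartialEq)]
pub enum AscDescError {
    InvalidSyntax { name: String },
}

impl FromStr for AscDesc {
    type Err = AscDescError;

    /// Parses `field:asc` or `field:desc`.
    fn from_str(text: &str) -> Result<Self, Self::Err> {
        match text.rsplit_once(':') {
            Some((field, "asc")) if !field.is_empty() => Ok(AscDesc::Asc(field.to_string())),
            Some((field, "desc")) if !field.is_empty() => Ok(AscDesc::Desc(field.to_string())),
            _ => Err(AscDescError::InvalidSyntax { name: text.to_string() }),
        }
    }
}

/// Hypothetical user-facing errors; each caller picks its own variant.
#[derive(Debug, PartialEq)]
pub enum UserError {
    InvalidCriterionName { name: String },
    InvalidSortName { name: String },
}

/// Settings side: a malformed ranking rule is an invalid criterion name.
pub fn parse_ranking_rule(text: &str) -> Result<AscDesc, UserError> {
    text.parse::<AscDesc>().map_err(|e| match e {
        AscDescError::InvalidSyntax { name } => UserError::InvalidCriterionName { name },
    })
}

/// Search side: a malformed `sort` parameter is an invalid sort name.
pub fn parse_sort_param(text: &str) -> Result<AscDesc, UserError> {
    text.parse::<AscDesc>().map_err(|e| match e {
        AscDescError::InvalidSyntax { name } => UserError::InvalidSortName { name },
    })
}
```

Since `FromStr::from_str` has a fixed signature, it cannot take the extra context parameter mentioned in the alternative solution; mapping the error at the call site avoids that limitation.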

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-08 14:18:57 +00:00
Tamo
932998f5cc
let the caller decide if they want to return an invalidSortName or an
invalidCriterionName error
2021-09-08 16:17:31 +02:00
bors[bot]
86c3b0c8c2
Merge #350
350: Fix mdb val size error r=Kerollmops a=ManyTheFish

Related to [#1677](https://github.com/meilisearch/MeiliSearch/issues/1677)

Co-authored-by: many <maxime@meilisearch.com>
2021-09-08 13:32:15 +00:00
many
e54280fbfc
Skip empty normalized words 2021-09-08 15:25:23 +02:00
many
d18ee58ab9
Check that keys are not empty in the validator 2021-09-08 15:25:23 +02:00
Kerollmops
8a088fb99e
Bump grenad to v0.3.1 2021-09-08 14:08:55 +02:00
Kerollmops
20ad43b908
Enable the grenad tempfile feature back 2021-09-08 14:06:28 +02:00
bors[bot]
772e55d174
Merge #347
347: Update version for the next release (v0.13.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-08 11:41:15 +00:00
many
9961b78b06
Drop sorter before creating a new one 2021-09-08 13:30:26 +02:00
Clémentine Urquizar
eb7b9d9dbf
Update version for the next release (v0.13.0) 2021-09-08 10:59:30 +02:00
bors[bot]
48d211b8b0
Merge #344
344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops

This PR moves the sort ranking rule to the 5th position by default, right before the exactness one.
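The resulting default order can be written down as a plain list. This assumes the four rules before `sort` are `words`, `typo`, `proximity`, and `attribute`, as in Meilisearch's documented defaults; the constant and helper are hypothetical, not milli's `Criterion` type.

```rust
/// Hypothetical list of the default ranking rules after this change, in order.
pub const DEFAULT_RANKING_RULES: [&str; 6] =
    ["words", "typo", "proximity", "attribute", "sort", "exactness"];

/// 1-based position of a rule in the default order, if it is present.
pub fn default_position(rule: &str) -> Option<usize> {
    DEFAULT_RANKING_RULES.iter().position(|r| *r == rule).map(|i| i + 1)
}
```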

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-07 15:47:15 +00:00
bors[bot]
720becb5e8
Merge #341
341: Throw a query time error when a sort parameter is used but the sort ranking rule is missing r=Kerollmops a=Kerollmops

This PR makes the engine throw an error when the ranking rules don't contain the `sort` rule and the `sortable_fields` are correctly set, but the user tries to use the `sort` query parameter. Sorting would have no effect on the returned documents in that case, so we prefer returning an error to help debug it.

This is a breaking change on the MeiliSearch side, as we added a new variant to the `UserError` enum.
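A sketch of the query-time validation described above, assuming a hypothetical `check_sort` function and `SearchError` enum. The `SortRankingRuleMissing` name mirrors the variant introduced in commit 8dca36433c and `InvalidSortableAttribute` mirrors the error from PR #309; checking the sortable fields before the ranking rule is an assumption, not necessarily milli's order.

```rust
use std::collections::HashSet;

/// Hypothetical query-time errors.
#[derive(Debug, PartialEq)]
pub enum SearchError {
    /// The `sort` parameter is used but the `sort` ranking rule is missing.
    SortRankingRuleMissing,
    /// A field used in `sort` is not part of the sortable fields.
    InvalidSortableAttribute { field: String },
}

/// Validate the fields of a query's `sort` parameter before running the search.
pub fn check_sort(
    ranking_rules: &[&str],
    sortable_fields: &HashSet<String>,
    sort_fields: &[&str],
) -> Result<(), SearchError> {
    if sort_fields.is_empty() {
        return Ok(()); // no sort parameter: nothing to validate
    }
    for field in sort_fields {
        if !sortable_fields.contains(*field) {
            return Err(SearchError::InvalidSortableAttribute { field: field.to_string() });
        }
    }
    // Sorting without the `sort` rule would silently do nothing, so fail loudly.
    if !ranking_rules.contains(&"sort") {
        return Err(SearchError::SortRankingRuleMissing);
    }
    Ok(())
}
```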

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-07 14:45:05 +00:00
Kerollmops
e2cefc9b4f
Move the sort ranking rule before the exactness ranking rule 2021-09-07 16:41:33 +02:00
mpostma
cd043d4461
remove unused grenad default features 2021-09-07 16:21:46 +02:00
Kerollmops
5989528833
Add a test to make sure we throw the right error message 2021-09-07 11:02:00 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing 2021-09-07 11:02:00 +02:00
Kerollmops
8dca36433c
Introduce the new SortRankingRuleMissing user error variant 2021-09-07 11:01:59 +02:00
Alexey Shekhirin
0be09555f1
test(search): asc/desc criteria for large datasets 2021-09-03 18:00:08 +03:00
Alexey Shekhirin
c2517e7d5f
fix(facet): string fields sorting 2021-09-03 11:58:26 +03:00
bors[bot]
5cbe879325
Merge #308
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish

Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases
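The parallel layout described above, with extractors running concurrently and a single writer consuming their output, can be sketched with the standard library alone. This is not milli's indexer: `std::thread::scope` and `std::sync::mpsc` stand in for rayon and crossbeam channels, and `TypedChunk`, `Doc`, and `index_documents` are hypothetical simplifications that only produce word and geo chunks. `chunk_size` plays a role similar to the `document_chunk_size` setting from a later commit, but that link is an assumption.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical, simplified chunk types sent from the extractors to the writer.
enum TypedChunk {
    WordDocids(Vec<(String, u32)>),
    GeoPoints(Vec<(u32, [f64; 2])>),
}

/// A document: (id, text, optional [lat, lng]).
type Doc<'a> = (u32, &'a str, Option<[f64; 2]>);

/// Extract chunks of documents in parallel while the calling thread acts as
/// the single writer. Returns the word -> docids map and the number of geo
/// points. Panics if `chunk_size` is 0, as `slice::chunks` does.
pub fn index_documents(docs: &[Doc<'_>], chunk_size: usize) -> (BTreeMap<String, Vec<u32>>, usize) {
    let (tx, rx) = mpsc::channel::<TypedChunk>();
    let mut word_docids: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    let mut geo_points = 0;

    thread::scope(|s| {
        // One extractor thread per chunk of documents.
        for chunk in docs.chunks(chunk_size) {
            let tx = tx.clone();
            s.spawn(move || {
                let words = chunk
                    .iter()
                    .flat_map(|&(id, text, _)| text.split_whitespace().map(move |w| (w.to_lowercase(), id)))
                    .collect();
                let geo = chunk.iter().filter_map(|&(id, _, geo)| geo.map(|p| (id, p))).collect();
                // Ignore send errors: they only happen if the writer has stopped.
                let _ = tx.send(TypedChunk::WordDocids(words));
                let _ = tx.send(TypedChunk::GeoPoints(geo));
            });
        }
        // Drop the original sender so the loop below ends once all extractors finish.
        drop(tx);
        // Single writer: consume the chunks as soon as they arrive.
        for typed_chunk in rx {
            match typed_chunk {
                TypedChunk::WordDocids(pairs) => {
                    for (word, id) in pairs {
                        word_docids.entry(word).or_default().push(id);
                    }
                }
                TypedChunk::GeoPoints(points) => geo_points += points.len(),
            }
        }
    });

    // Chunks arrive in any order, so normalize the posting lists.
    for ids in word_docids.values_mut() {
        ids.sort_unstable();
        ids.dedup();
    }
    (word_docids, geo_points)
}
```

Keeping a single consumer fits LMDB, which milli uses through heed and which allows only one write transaction at a time, while the extraction work scales with the number of chunks.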


Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
many
741a4444a9
Remove log in chunk generator 2021-09-02 16:57:46 +02:00
many
7f7fafb857
Make document_chunk_size settable from update builder 2021-09-02 15:25:39 +02:00
many
db0c681bae
Fix PR comments 2021-09-02 15:17:52 +02:00
Clémentine Urquizar
285849e3a6
Update version for the next release (v0.12.0) 2021-09-02 10:08:41 +02:00
many
4860fd4529
Ignore empty facet values 2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consumption in word pair proximity extractor 2021-09-01 16:48:40 +02:00
many
9452fabfb2
Optimize cbo roaring bitmaps merge 2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors coming from crossbeam channel senders 2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators 2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers 2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel 2021-09-01 16:48:40 +02:00
many
5c962c03dd
Fix and optimize word_prefix_pair_proximity_docids database 2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop words into account 2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback 2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer 2021-09-01 16:48:36 +02:00
many
3aaf1d62f3
Publish grenad CompressionType type in milli 2021-09-01 16:42:08 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present 2021-08-31 21:35:41 +03:00
bors[bot]
d6bba0663a
Merge #334
334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/milli/issues/263

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-31 17:38:54 +00:00
Alexey Shekhirin
0b02eb456c
chore(update): wrap long values into BStr for warn logs 2021-08-31 20:28:16 +03:00
Kerollmops
f230ae6fd5
Introduce the reset_sortable_fields Settings method 2021-08-25 17:44:16 +02:00
Kerollmops
af65485ba7
Reexport the grenad CompressionType from milli 2021-08-24 18:15:31 +02:00
Kerollmops
f2e1591826
Remove the unused tinytemplate dependency 2021-08-24 18:10:58 +02:00
Kerollmops
2f20257070
Update milli to the v0.11.0 2021-08-24 18:10:11 +02:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
Clémentine Urquizar
88f6c18665
Update version for the next release (v0.10.2) 2021-08-23 11:33:30 +02:00
Clément Renault
c084f7f731
Fix the facet string docids filterable deletion bug 2021-08-23 10:50:39 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
bors[bot]
41fc0dcb62
Merge #309
309: Sort at query time r=Kerollmops a=Kerollmops

This PR:
 - Makes the `Asc/Desc` criteria work with strings too: it first returns documents ordered by numbers, then by strings, and finally the documents that can't be ordered. Note that strings are ordered lexicographically rather than by character, which means that the ordering doesn't know about wide and short characters, e.g. `a`, `丹`, `▲`.
 - Changes the syntax of the `Asc/Desc` criterion to use a colon to separate the name and the order, e.g. `title:asc`, `price:desc`.
 - Adds the `Sort` criterion at the third position in the ranking rules by default.
 - Adds the `sort_criteria` method to the `Search` builder struct to let users define the `Asc/Desc` sortable attributes they want to use at query time. Note that we need to check that the fields are registered in the sortable attributes before performing the search.
 - Introduces a new `InvalidSortableAttribute` user error that is raised when the sort criteria declared at query time are not part of the sortable attributes.
 - `@ManyTheFish` introduced integration tests for the dynamic Sort criterion.
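The ordering from the first bullet can be sketched as a comparator. `Value`, `asc_cmp`, and `sort_asc` are hypothetical names; strings are compared byte by byte (UTF-8), which is one reading of the note that strings are ordered lexicographically rather than by character, so `a` sorts before `丹`. Only ascending order is shown.

```rust
use std::cmp::Ordering;

/// Hypothetical field value as seen by the sort.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    /// Absent or not orderable (e.g. an object).
    Missing,
}

/// Group rank: numbers first, then strings, then documents that can't be ordered.
fn group(v: &Value) -> u8 {
    match v {
        Value::Number(_) => 0,
        Value::String(_) => 1,
        Value::Missing => 2,
    }
}

/// Ascending comparison: by group, then numerically or byte-wise within a group.
pub fn asc_cmp(a: &Value, b: &Value) -> Ordering {
    match (a, b) {
        (Value::Number(x), Value::Number(y)) => x.total_cmp(y),
        (Value::String(x), Value::String(y)) => x.as_bytes().cmp(y.as_bytes()),
        _ => group(a).cmp(&group(b)),
    }
}

/// Stable ascending sort of (docid, value) pairs.
pub fn sort_asc(docs: &mut [(u32, Value)]) {
    docs.sort_by(|(_, a), (_, b)| asc_cmp(a, b));
}
```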

Fixes #305.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-08-18 16:55:32 +00:00
many
d1df0d20f9
Add integration test of SortBy criterion 2021-08-18 16:21:51 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criteria is not sortable 2021-08-18 15:04:07 +02:00
Kerollmops
71602e0f1b
Add the sortable fields into the settings and in the index 2021-08-18 15:04:07 +02:00