1054 Commits

Author SHA1 Message Date
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref 2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
Irevoire
48542ac8fd
get rid of chrono in favor of time 2022-02-15 11:41:55 +01:00
bors[bot]
5d58cb7449
Merge #442
442: fix phrase search r=curquiza a=MarinPostma

Run the exact match search on 7-word windows instead of only two. This makes false positives very, very unlikely, and impossible on phrase queries of fewer than seven words (a sketch of the windowed check follows this entry).


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-07 16:18:20 +00:00
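
The window check described in #442 could look like the following minimal sketch (not milli's actual code): `phrase_matches_in_window`, the per-word position lists and the anchoring strategy are illustrative assumptions; only the 7-word window comes from the PR.

```rust
/// Return true when every phrase word appears, in order, inside a window of
/// `window` consecutive token positions. `positions[i]` holds the sorted
/// token positions of the i-th phrase word in the candidate document.
fn phrase_matches_in_window(positions: &[Vec<u32>], window: u32) -> bool {
    let first = match positions.first() {
        Some(first) => first,
        None => return false,
    };
    // Anchor the window on every occurrence of the first phrase word.
    'anchors: for &start in first {
        let mut previous = start;
        for word_positions in &positions[1..] {
            // The next word must appear after the previous one and still
            // fall inside the window that starts at `start`.
            match word_positions
                .iter()
                .copied()
                .find(|&p| p > previous && p < start + window)
            {
                Some(p) => previous = p,
                None => continue 'anchors,
            }
        }
        return true;
    }
    false
}

fn main() {
    // The three phrase words appear at positions 3, 5 and 8: they all fit in
    // a 7-position window starting at 3, so the document is an exact match.
    assert!(phrase_matches_in_window(&[vec![3], vec![5], vec![8]], 7));
    // A tighter 2-position window rejects the same document.
    assert!(!phrase_matches_in_window(&[vec![3], vec![5], vec![8]], 2));
    // The last word is too far away, so even the 7-position window fails.
    assert!(!phrase_matches_in_window(&[vec![3], vec![5], vec![20]], 7));
}
```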
ad hoc
bd2262ceea
allow null values in csv 2022-02-03 16:03:01 +01:00
ad hoc
13de251047
rewrite word pair distance gathering 2022-02-03 15:57:20 +01:00
Many
d59bcea749 Revert "Revert "Change chunk size to 4MiB to fit more the end user usage"" 2022-02-02 17:01:13 +01:00
mpostma
7541ab99cd
review changes 2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case 2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on the first letter count as 2 2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc
set max typos on ngram to 1 2022-02-02 12:56:08 +01:00
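
Taken together, the three commits above describe a typo budget. Below is a hedged sketch of such a policy: only the "at most one typo on an ngram" and "first-letter typo counts as two" rules come from the commits; the word-length thresholds and function names are illustrative, not milli's defaults.

```rust
/// Illustrative typo budget: how many typos a query word may tolerate.
/// The 0/1/2 length thresholds below are assumptions, not milli's defaults.
fn max_typos_allowed(word: &str, is_ngram: bool) -> u8 {
    let budget = match word.chars().count() {
        0..=4 => 0,
        5..=8 => 1,
        _ => 2,
    };
    // An ngram (several query words glued together) allows at most one typo.
    if is_ngram { budget.min(1) } else { budget }
}

/// Cost of a single typo depending on where it occurs in the word.
fn typo_cost(position_in_word: usize) -> u8 {
    // A typo on the very first letter counts as two typos, so words that
    // start differently are much less likely to match.
    if position_in_word == 0 { 2 } else { 1 }
}

fn main() {
    assert_eq!(max_typos_allowed("keyboard", false), 1);
    assert_eq!(max_typos_allowed("keyboardist", false), 2);
    // The same long word used as an ngram only tolerates one typo.
    assert_eq!(max_typos_allowed("keyboardist", true), 1);
    // A first-letter typo alone already exceeds a one-typo budget.
    assert!(typo_cost(0) > max_typos_allowed("keyboard", false));
}
```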
ad hoc
d852dc0d2b
fix phrase search 2022-02-01 20:21:33 +01:00
Kerollmops
fb79c32430
Compute the new, common, and deleted prefix words FST once 2022-01-27 11:00:18 +01:00
Clément Renault
51d1e64b23
Remove the now-useless WriteMethod enum 2022-01-27 10:08:35 +01:00
Clément Renault
e9c02173cf
Rework the WordsPrefixPositionDocids update to compute a subset of the database 2022-01-27 10:08:35 +01:00
Clément Renault
dbba5fd461
Create a function to simplify the word prefix pair proximity docids compute 2022-01-27 10:08:35 +01:00
Clément Renault
e760e02737
Fix the computation of the newly added and common prefix pair proximity words 2022-01-27 10:08:35 +01:00
Clément Renault
d59e559317
Fix the computation of the newly added and common prefix words 2022-01-27 10:08:34 +01:00
Clément Renault
2ec8542105
Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
28692f65be
Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
5404bc02dd
Move the fst_stream_into_hashset method in the helper methods 2022-01-27 10:06:00 +01:00
Clément Renault
c90fa95f93
Only compute the word prefix pairs on the created word pair proximities 2022-01-27 10:06:00 +01:00
Clément Renault
822f67e9ad
Bring the newly created word pair proximity docids 2022-01-27 10:06:00 +01:00
Clément Renault
d28f18658e
Retrieve the previous version of the words prefixes FST 2022-01-27 10:05:59 +01:00
Clément Renault
f9b214f34e
Apply suggestions from code review
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2022-01-26 11:28:11 +01:00
Clément Renault
f04cd19886
Introduce a max prefix length parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
1514dfa1b7
Introduce a max proximity parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
23ea3ad738
Remove the useless threshold when computing the word prefix pair proximity 2022-01-25 17:04:23 +01:00
Clément Renault
e3c34684c6
Fix a bug where we were skipping most of the prefix pairs 2022-01-25 17:04:23 +01:00
bors[bot]
fd177b63f8
Merge #423
423: Remove an unused file r=irevoire a=irevoire

This empty file is not included anywhere

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
Marin Postma
0c84a40298 document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7
ensure we get no documents and no error when filtering on an empty db 2022-01-18 11:40:30 +01:00
bors[bot]
8f4499090b
Merge #433
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire

- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, thus we return an empty RoaringBitmap
  instead of throwing an internal error (both fixes are sketched after this
  entry)

Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
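
A minimal sketch of the two behaviours described in #433, assuming the `roaring` crate; `FieldIdMap`, `facet_docids`, and `docids_for_filtered_field` are hypothetical stand-ins for milli's internals.

```rust
use std::collections::HashMap;

use roaring::RoaringBitmap; // external `roaring` crate

/// Hypothetical stand-in for milli's fields ids map.
type FieldIdMap = HashMap<String, u16>;

/// Resolve the documents matching a filter on `field`.
fn docids_for_filtered_field(
    fields_ids_map: &FieldIdMap,
    facet_docids: &HashMap<u16, RoaringBitmap>,
    field: &str,
) -> RoaringBitmap {
    // The field name is looked up as-is: no lowercasing before the lookup.
    match fields_ids_map.get(field) {
        Some(fid) => facet_docids.get(fid).cloned().unwrap_or_default(),
        // An unknown field simply matches zero documents instead of being
        // reported as an internal error.
        None => RoaringBitmap::new(),
    }
}

fn main() {
    let fields_ids_map: FieldIdMap = [("Genre".to_string(), 0u16)].into_iter().collect();
    let facet_docids: HashMap<u16, RoaringBitmap> =
        [(0u16, [1u32, 2, 3].into_iter().collect())].into_iter().collect();

    // The exact casing is respected, and an unknown field yields no documents.
    assert_eq!(docids_for_filtered_field(&fields_ids_map, &facet_docids, "Genre").len(), 3);
    assert!(docids_for_filtered_field(&fields_ids_map, &facet_docids, "genre").is_empty());
}
```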
Tamo
d1ac40ea14
fix(filter): Fix two bugs.
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, thus we return an empty RoaringBitmap
  instead of throwing an internal error
2022-01-17 13:51:46 +01:00
Samyak S Sarnayak
2d7607734e
Run cargo fmt on matching_words.rs 2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33
Fix highlight by replacing num_graphemes_from_bytes
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.

Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
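
A hedged sketch of the HTTP UI side of this change, using the `unicode-segmentation` crate mentioned above; `highlight_token` and its `matched_graphemes` parameter are illustrative stand-ins for the value computed by `matching_bytes`.

```rust
use unicode_segmentation::UnicodeSegmentation;

/// Wrap the matched prefix of a token in `<mark>`, counting in grapheme
/// clusters so multi-byte characters are never split in the middle.
fn highlight_token(token: &str, matched_graphemes: usize) -> String {
    let mut graphemes = token.graphemes(true);
    // Take the matched prefix as whole grapheme clusters.
    let matched: String = graphemes.by_ref().take(matched_graphemes).collect();
    let rest: String = graphemes.collect();
    if matched.is_empty() {
        token.to_string()
    } else {
        format!("<mark>{matched}</mark>{rest}")
    }
}

fn main() {
    // Only the part of the word that matched the query is highlighted.
    assert_eq!(highlight_token("highlighting", 9), "<mark>highlight</mark>ing");
    // Grapheme-aware: the accented characters count as single clusters.
    assert_eq!(highlight_token("café-crème", 4), "<mark>café</mark>-crème");
}
```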
Tamo
98a365aaae
store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
Tamo
d671d6f0f1
remove an unused file 2021-12-13 19:27:34 +01:00
Clément Renault
25faef67d0
Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b
Test that empty filters return a None 2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e
Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46
Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f
Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d
Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
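
A hedged sketch of the depth check described by the three commits above; the `FilterAst` shape, `check_depth`, and the error type are illustrative, only the 2000 limit comes from the commit messages.

```rust
/// Illustrative maximum nesting depth; the 2000 value comes from the commits.
const MAX_FILTER_DEPTH: usize = 2000;

/// Hypothetical filter AST, standing in for milli's real filter type.
enum FilterAst {
    Condition(String),
    And(Vec<FilterAst>),
    Or(Vec<FilterAst>),
    Not(Box<FilterAst>),
}

/// Reject filters that nest deeper than `MAX_FILTER_DEPTH` before trying
/// to evaluate them.
fn check_depth(filter: &FilterAst, depth: usize) -> Result<(), String> {
    if depth > MAX_FILTER_DEPTH {
        return Err(format!(
            "the filter exceeds the maximum depth of {MAX_FILTER_DEPTH}"
        ));
    }
    match filter {
        FilterAst::Condition(_) => Ok(()),
        FilterAst::Not(inner) => check_depth(inner, depth + 1),
        FilterAst::And(children) | FilterAst::Or(children) => children
            .iter()
            .try_for_each(|child| check_depth(child, depth + 1)),
    }
}

fn main() {
    // A reasonably nested filter passes the check.
    let flat = FilterAst::Or(vec![
        FilterAst::Condition("genre = horror".to_string()),
        FilterAst::And(vec![
            FilterAst::Condition("release_date > 795484800".to_string()),
            FilterAst::Not(Box::new(FilterAst::Condition("genre = comedy".to_string()))),
        ]),
    ]);
    assert!(check_depth(&flat, 0).is_ok());

    // A pathologically nested chain of NOTs is rejected with an error
    // instead of overflowing the stack during evaluation.
    let mut deep = FilterAst::Condition("genre = horror".to_string());
    for _ in 0..3000 {
        deep = FilterAst::Not(Box::new(deep));
    }
    assert!(check_depth(&deep, 0).is_err());
}
```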
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
Marin Postma
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
many
35f9499638
Export tokenizer from milli 2021-11-18 16:57:12 +01:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00