A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
bors[bot] d11a6e187f
Merge #639
639: Reduce the size of the word_pair_proximity database  r=loiclec a=loiclec

# Pull Request

## What does this PR do?
Fixes #634 

With this PR, the value stored under the key `prox word1 word2` in the `word_pair_proximity_docids` database contains the ids of the documents in which:
- `word1` is followed by `word2`, and
- the minimum number of words between `word1` and `word2` is `prox-1`
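To make the new layout concrete, here is a minimal in-memory sketch of what the database holds under it. It is not milli's indexer: `index_word_pair_proximities` and `MAX_PROXIMITY` are hypothetical, proximity is assumed to be the plain difference between word positions, documents are assumed to be already tokenized, and `BTreeSet<u32>` stands in for the `RoaringBitmap` values.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical cap on stored proximities (an assumption, not milli's actual value).
const MAX_PROXIMITY: usize = 7;

/// New key layout: (prox, word1, word2), where `word1` comes before `word2`.
type Key = (u8, String, String);

/// Builds an in-memory stand-in for `word_pair_proximity_docids`.
/// Each document is given as its id and its words in order.
fn index_word_pair_proximities(docs: &[(u32, Vec<&str>)]) -> BTreeMap<Key, BTreeSet<u32>> {
    let mut db: BTreeMap<Key, BTreeSet<u32>> = BTreeMap::new();
    for (docid, words) in docs {
        // Minimum proximity of each ordered pair (word1 before word2) in this document.
        let mut min_prox: HashMap<(&str, &str), usize> = HashMap::new();
        for (i, &w1) in words.iter().enumerate() {
            for (j, &w2) in words.iter().enumerate().skip(i + 1) {
                let prox = j - i;
                if prox > MAX_PROXIMITY {
                    break;
                }
                let best = min_prox.entry((w1, w2)).or_insert(prox);
                *best = (*best).min(prox);
            }
        }
        // Only the minimum proximity of each pair is recorded for the document.
        for ((w1, w2), prox) in min_prox {
            db.entry((prox as u8, w1.to_string(), w2.to_string()))
                .or_default()
                .insert(*docid);
        }
    }
    db
}

fn main() {
    let docs: Vec<(u32, Vec<&str>)> = vec![
        (0, vec!["the", "quick", "fox"]),
        (1, vec!["fox", "the", "quick"]),
    ];
    for (key, docids) in index_word_pair_proximities(&docs) {
        println!("{key:?} -> {docids:?}");
    }
}
```

Note that `the quick` and `quick the` would land under different keys: each key only describes one word order.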

Before this PR, the `word_pair_proximity_docids` database had keys with the format `word1 word2 prox`, and the value contained the ids of the documents in which either:
- `word1` is followed by `word2` after a minimum of `prox-1` words in between them, or
- `word2` is followed by `word1` after a minimum of `prox-2` words in between them

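As a consequence of this change, calls such as:
```rust
// Old layout: (word1, word2, prox)
let docids = word_pair_proximity_docids.get(rtxn, &(word1, word2, prox))?.unwrap_or_default();
```
have to be replaced with:
```rust
// New layout: (prox, word1, word2); the reverse order is looked up at `prox - 1`
let docids1 = word_pair_proximity_docids.get(rtxn, &(prox, word1, word2))?.unwrap_or_default();
let docids2 = word_pair_proximity_docids.get(rtxn, &(prox - 1, word2, word1))?.unwrap_or_default();
let docids = docids1 | docids2;
```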
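The two-lookup replacement can be wrapped in a small helper. The sketch below runs against an in-memory `BTreeMap` standing in for the database, with `BTreeSet<u32>` in place of `RoaringBitmap`; `old_style_lookup` and `get_docids` are hypothetical names, not part of milli. It uses `checked_sub` so that a `prox` of 0 cannot underflow the `u8`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// New key layout: (prox, word1, word2).
type Key = (u8, String, String);

/// Reads one entry, treating a missing key as an empty set (like `unwrap_or_default()`).
fn get_docids(db: &BTreeMap<Key, BTreeSet<u32>>, prox: u8, word1: &str, word2: &str) -> BTreeSet<u32> {
    db.get(&(prox, word1.to_string(), word2.to_string()))
        .cloned()
        .unwrap_or_default()
}

/// Hypothetical helper returning what the old `(word1, word2, prox)` key used to hold.
fn old_style_lookup(db: &BTreeMap<Key, BTreeSet<u32>>, word1: &str, word2: &str, prox: u8) -> BTreeSet<u32> {
    // `word1` followed by `word2` at proximity `prox`.
    let forward = get_docids(db, prox, word1, word2);
    // `word2` followed by `word1` at proximity `prox - 1`; skipped when `prox` is 0.
    let backward = match prox.checked_sub(1) {
        Some(p) => get_docids(db, p, word2, word1),
        None => BTreeSet::new(),
    };
    // Union of both directions.
    &forward | &backward
}

fn main() {
    let mut db: BTreeMap<Key, BTreeSet<u32>> = BTreeMap::new();
    db.insert((1, "fox".to_string(), "dog".to_string()), BTreeSet::from([1]));
    db.insert((2, "fox".to_string(), "dog".to_string()), BTreeSet::from([2]));
    db.insert((1, "dog".to_string(), "fox".to_string()), BTreeSet::from([3]));
    println!("{:?}", old_style_lookup(&db, "fox", "dog", 1)); // {1}
    println!("{:?}", old_style_lookup(&db, "fox", "dog", 2)); // {2, 3}
}
```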

## Phrase search

The PR also fixes two bugs in the `resolve_phrase` function. The first bug is that a phrase containing the same word twice (e.g. `"dog eats dog"`) would always return zero documents.

The second bug occurs with a phrase such as `"fox is smarter than a dog"` and a document containing the text:
```
fox or dog? a fox is smarter than a dog
```
In that case, the phrase search would not return the document, because:
* `word_pair_proximity_docids` only stores the minimum proximity of each word pair in a document, so the only key for this pair is `fox dog 2`
* but the implementation of `resolve_phrase` looks for `fox dog 5`, which returns 0 documents
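The mismatch can be reproduced by computing the minimum proximity of `fox` followed by `dog`, once in the document and once in the phrase. This is a simplified sketch: `min_proximity` is a hypothetical helper, punctuation is assumed to be stripped during tokenization, and no maximum proximity is applied.

```rust
/// Smallest position difference at which `word1` is followed by `word2`, if any.
fn min_proximity(words: &[&str], word1: &str, word2: &str) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &w1) in words.iter().enumerate() {
        if w1 != word1 {
            continue;
        }
        // The first `word2` after position `i` is the closest one for this `word1`.
        if let Some(offset) = words[i + 1..].iter().position(|&w| w == word2) {
            let prox = offset + 1;
            best = Some(best.map_or(prox, |b| b.min(prox)));
        }
    }
    best
}

fn main() {
    // "fox or dog? a fox is smarter than a dog", tokenized without punctuation.
    let document = ["fox", "or", "dog", "a", "fox", "is", "smarter", "than", "a", "dog"];
    let phrase = ["fox", "is", "smarter", "than", "a", "dog"];
    // Only the minimum is indexed for the document, hence the single key `fox dog 2`...
    println!("{:?}", min_proximity(&document, "fox", "dog")); // Some(2)
    // ...while the phrase itself puts `dog` 5 positions after `fox`, hence the lookup of `fox dog 5`.
    println!("{:?}", min_proximity(&phrase, "fox", "dog")); // Some(5)
}
```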

### New implementation of `resolve_phrase`
Given the phrase:
```
fox is smarter than a dog
```
We intersect the document ids stored under all of the following keys in `word_pair_proximity_docids`, where `A OR B` means the union of the document ids of `A` and `B`:
- `1 fox is`
- `1 is smarter`
- `1 smarter than`
- ...
- `1 fox smarter` OR `2 fox smarter`
- `1 is than` OR `2 is than`
- ...
- `1 than dog` OR `2 than dog`
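A minimal in-memory sketch of this strategy, assuming a `BTreeMap` with `(prox, word1, word2)` keys and `BTreeSet<u32>` values as a stand-in for the database. This `resolve_phrase` is a simplified, hypothetical version rather than the code from the PR: it assumes the phrase has at least two words, and the hand-entered sample data assumes that pairs of identical words such as `dog dog` are indexed like any other pair, which is what lets `"dog eats dog"` match.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Key layout after this PR: (prox, word1, word2).
type Db = BTreeMap<(u8, String, String), BTreeSet<u32>>;

/// Reads one entry, treating a missing key as an empty set.
fn get_docids(db: &Db, prox: u8, word1: &str, word2: &str) -> BTreeSet<u32> {
    db.get(&(prox, word1.to_string(), word2.to_string()))
        .cloned()
        .unwrap_or_default()
}

/// Simplified, hypothetical version of the new `resolve_phrase` strategy.
fn resolve_phrase(db: &Db, phrase: &[&str]) -> BTreeSet<u32> {
    let mut candidates: Vec<BTreeSet<u32>> = Vec::new();
    for (i, &word1) in phrase.iter().enumerate() {
        // Consecutive words: `1 word1 word2`.
        if let Some(&word2) = phrase.get(i + 1) {
            candidates.push(get_docids(db, 1, word1, word2));
        }
        // Words with one word in between: `1 word1 word3` OR `2 word1 word3`.
        if let Some(&word3) = phrase.get(i + 2) {
            candidates.push(&get_docids(db, 1, word1, word3) | &get_docids(db, 2, word1, word3));
        }
    }
    // A document matches only if it satisfies every constraint.
    let mut sets = candidates.into_iter();
    let first = sets.next().unwrap_or_default();
    sets.fold(first, |acc, docids| &acc & &docids)
}

/// Hand-entered sample: doc 0 is "dog eats dog", doc 1 is "dog eats cat".
fn sample_db() -> Db {
    let entries = vec![
        (1u8, "dog", "eats", vec![0u32, 1]),
        (1, "eats", "dog", vec![0]),
        (2, "dog", "dog", vec![0]),
        (1, "eats", "cat", vec![1]),
        (2, "dog", "cat", vec![1]),
    ];
    entries
        .into_iter()
        .map(|(prox, w1, w2, docids)| {
            ((prox, w1.to_string(), w2.to_string()), docids.into_iter().collect::<BTreeSet<u32>>())
        })
        .collect()
}

fn main() {
    let db = sample_db();
    println!("{:?}", resolve_phrase(&db, &["dog", "eats", "dog"])); // {0}
    println!("{:?}", resolve_phrase(&db, &["dog", "eats", "cat"])); // {1}
}
```

Allowing proximity 1 for words two positions apart covers documents where the same pair also appears adjacent elsewhere, since only the minimum proximity is stored.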

## Benchmark Results

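Indexing (in each column, the first number is the time relative to the faster of the two branches, so `1.00` marks the faster one and lower is better):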
```
group                                                                     indexing_main_d94339a8                 indexing_word-pair-proximity-docids-refactor_2983dd8e
-----                                                                     ----------------------                 -----------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.19    40.7±11.28ms        ? ?/sec    1.00     34.3±4.16ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.62     11.3±3.77ms        ? ?/sec    1.00      7.0±1.56ms        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     12.5±2.62ms        ? ?/sec    1.07     13.4±4.24ms        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.26    50.2±12.63ms        ? ?/sec    1.00    39.8±20.25ms        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.83   269.1±16.11ms        ? ?/sec    1.00    146.8±6.12ms        ? ?/sec
indexing/Indexing geo_point                                               1.00      47.2±0.46s        ? ?/sec    1.00      47.3±0.56s        ? ?/sec
indexing/Indexing movies in three batches                                 1.42      12.7±0.13s        ? ?/sec    1.00       9.0±0.07s        ? ?/sec
indexing/Indexing movies with default settings                            1.40      10.2±0.07s        ? ?/sec    1.00       7.3±0.06s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.22       7.8±0.11s        ? ?/sec    1.00       6.4±0.13s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.24       7.3±0.07s        ? ?/sec    1.00       5.9±0.06s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.14      47.6±0.67s        ? ?/sec    1.00      41.8±0.63s        ? ?/sec
indexing/Indexing songs with default settings                             1.13      44.1±0.74s        ? ?/sec    1.00      38.9±0.76s        ? ?/sec
indexing/Indexing songs without any facets                                1.19      42.0±0.66s        ? ?/sec    1.00      35.2±0.48s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.20      44.3±1.40s        ? ?/sec    1.00      37.0±0.48s        ? ?/sec
indexing/Indexing wiki                                                    1.39     862.9±9.95s        ? ?/sec    1.00    622.6±27.11s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.40     934.4±5.97s        ? ?/sec    1.00     665.7±4.72s        ? ?/sec
indexing/Reindexing geo_point                                             1.01      15.9±0.39s        ? ?/sec    1.00      15.7±0.28s        ? ?/sec
indexing/Reindexing movies with default settings                          1.15   288.8±25.03ms        ? ?/sec    1.00    250.4±2.23ms        ? ?/sec
indexing/Reindexing songs with default settings                           1.01       4.1±0.06s        ? ?/sec    1.00       4.1±0.03s        ? ?/sec
indexing/Reindexing wiki                                                  1.41   1484.7±20.59s        ? ?/sec    1.00   1052.0±19.89s        ? ?/sec
```

Search wiki:
<details>
<pre>
group                                                                                    search_wiki_main_d94339a8              search_wiki_word-pair-proximity-docids-refactor_2983dd8e
-----                                                                                    -------------------------              --------------------------------------------------------
smol-wiki-articles.csv: basic placeholder/                                               1.02     25.8±0.21µs        ? ?/sec    1.00     25.4±0.19µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"film"                                          1.00    441.7±2.57µs        ? ?/sec    1.00    442.3±2.41µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"france"                                        1.00    357.0±2.63µs        ? ?/sec    1.00    358.3±2.65µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan"                                         1.00    239.4±2.24µs        ? ?/sec    1.00    240.2±1.82µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine"                                       1.00    180.3±2.40µs        ? ?/sec    1.00    180.0±1.08µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis"                                 1.00      9.1±0.03ms        ? ?/sec    1.03      9.3±0.04ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus"                                        1.00      3.6±0.01ms        ? ?/sec    1.03      3.7±0.02ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"                             1.00     34.0±0.11ms        ? ?/sec    1.03     35.1±0.13ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain"                                         1.00    162.0±0.88µs        ? ?/sec    1.00    161.9±0.98µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/film                                         1.01    164.4±1.46µs        ? ?/sec    1.00    163.1±1.58µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/france                                       1.00   1698.3±7.37µs        ? ?/sec    1.00  1697.7±11.53µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/japan                                        1.00  1154.0±23.61µs        ? ?/sec    1.00   1150.7±9.27µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/machine                                      1.00    524.6±3.45µs        ? ?/sec    1.01    528.1±4.56µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis                                  1.00     13.5±0.05ms        ? ?/sec    1.02     13.8±0.05ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus                                       1.00      4.1±0.02ms        ? ?/sec    1.03      4.2±0.01ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll                                1.00     49.0±0.19ms        ? ?/sec    1.03     50.4±0.22ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/spain                                        1.00    412.2±3.35µs        ? ?/sec    1.00    412.9±2.81µs        ? ?/sec
smol-wiki-articles.csv: prefix search/c                                                  1.00    383.9±2.53µs        ? ?/sec    1.00    383.4±2.44µs        ? ?/sec
smol-wiki-articles.csv: prefix search/g                                                  1.00    433.4±2.53µs        ? ?/sec    1.00    432.8±2.52µs        ? ?/sec
smol-wiki-articles.csv: prefix search/j                                                  1.00    424.3±2.05µs        ? ?/sec    1.00    424.0±2.15µs        ? ?/sec
smol-wiki-articles.csv: prefix search/q                                                  1.00    154.0±1.93µs        ? ?/sec    1.00    153.5±1.04µs        ? ?/sec
smol-wiki-articles.csv: prefix search/t                                                  1.04   658.5±91.93µs        ? ?/sec    1.00    631.4±3.89µs        ? ?/sec
smol-wiki-articles.csv: prefix search/x                                                  1.00    446.2±2.09µs        ? ?/sec    1.00    445.6±3.13µs        ? ?/sec
smol-wiki-articles.csv: proximity/april paris                                            1.02      3.4±0.39ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine                                          1.00  1022.1±17.52µs        ? ?/sec    1.00   1017.7±8.16µs        ? ?/sec
smol-wiki-articles.csv: proximity/herald sings                                           1.01  1872.5±97.70µs        ? ?/sec    1.00   1862.2±8.57µs        ? ?/sec
smol-wiki-articles.csv: proximity/tea two                                                1.00   295.2±34.91µs        ? ?/sec    1.00    296.6±4.08µs        ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande                                                 1.00      3.4±0.51ms        ? ?/sec    1.04      3.5±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/aritmetric                                                  1.00      3.6±0.01ms        ? ?/sec    1.00      3.7±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/linax                                                       1.00    167.5±1.28µs        ? ?/sec    1.00    167.1±2.65µs        ? ?/sec
smol-wiki-articles.csv: typo/migrosoft                                                   1.01    217.9±1.84µs        ? ?/sec    1.00    216.2±1.61µs        ? ?/sec
smol-wiki-articles.csv: typo/nympalidea                                                  1.00      2.9±0.01ms        ? ?/sec    1.10      3.1±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/phytogropher                                                1.00      3.0±0.23ms        ? ?/sec    1.08      3.3±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/sisan                                                       1.00    234.6±1.38µs        ? ?/sec    1.01    235.8±1.67µs        ? ?/sec
smol-wiki-articles.csv: typo/the fronce                                                  1.00    104.4±0.84µs        ? ?/sec    1.00    103.9±0.81µs        ? ?/sec
smol-wiki-articles.csv: words/Abraham machin                                             1.02    675.5±4.74µs        ? ?/sec    1.00    662.1±5.13µs        ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza                                       1.02  1004.5±11.07µs        ? ?/sec    1.00   989.5±13.08µs        ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk                                1.00  1650.8±10.92µs        ? ?/sec    1.00  1643.2±10.77µs        ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli                            1.00      5.4±0.03ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo     1.00     32.9±0.10ms        ? ?/sec    1.00     32.8±0.10ms        ? ?/sec
</pre>
</details>

Search songs:
<details>
<pre>
group                                                                                                    search_songs_main_d94339a8             search_songs_word-pair-proximity-docids-refactor_2983dd8e
-----                                                                                                    --------------------------             ---------------------------------------------------------
smol-songs.csv: asc + default/Notstandskomitee                                                           1.00      3.0±0.01ms        ? ?/sec    1.01      3.0±0.04ms        ? ?/sec
smol-songs.csv: asc + default/charles                                                                    1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles mingus                                                             1.00      3.1±0.01ms        ? ?/sec    1.01      3.1±0.01ms        ? ?/sec
smol-songs.csv: asc + default/david                                                                      1.00      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: asc + default/david bowie                                                                1.00      4.5±0.02ms        ? ?/sec    1.00      4.5±0.02ms        ? ?/sec
smol-songs.csv: asc + default/john                                                                       1.00      3.1±0.01ms        ? ?/sec    1.01      3.2±0.01ms        ? ?/sec
smol-songs.csv: asc + default/marcus miller                                                              1.00      5.0±0.02ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
smol-songs.csv: asc + default/michael jackson                                                            1.00      4.7±0.02ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
smol-songs.csv: asc + default/tamo                                                                       1.00  1463.4±12.17µs        ? ?/sec    1.01   1481.5±8.83µs        ? ?/sec
smol-songs.csv: asc + default/thelonious monk                                                            1.00      4.4±0.01ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
smol-songs.csv: asc/Notstandskomitee                                                                     1.01      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: asc/charles                                                                              1.00    473.6±3.70µs        ? ?/sec    1.01   476.8±22.17µs        ? ?/sec
smol-songs.csv: asc/charles mingus                                                                       1.01    780.1±3.90µs        ? ?/sec    1.00    773.6±4.60µs        ? ?/sec
smol-songs.csv: asc/david                                                                                1.00    757.6±4.50µs        ? ?/sec    1.00    760.7±5.20µs        ? ?/sec
smol-songs.csv: asc/david bowie                                                                          1.00   1131.2±8.68µs        ? ?/sec    1.00   1130.7±8.36µs        ? ?/sec
smol-songs.csv: asc/john                                                                                 1.00    668.9±6.48µs        ? ?/sec    1.00    669.9±2.78µs        ? ?/sec
smol-songs.csv: asc/marcus miller                                                                        1.00    959.8±7.10µs        ? ?/sec    1.00    958.9±4.72µs        ? ?/sec
smol-songs.csv: asc/michael jackson                                                                      1.01  1076.7±16.73µs        ? ?/sec    1.00   1070.8±7.34µs        ? ?/sec
smol-songs.csv: asc/tamo                                                                                 1.00     70.4±0.55µs        ? ?/sec    1.00     70.5±0.51µs        ? ?/sec
smol-songs.csv: asc/thelonious monk                                                                      1.01      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee                                                        1.00    162.0±0.91µs        ? ?/sec    1.01    163.6±1.72µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles                                                                 1.00     38.3±0.24µs        ? ?/sec    1.01     38.7±0.31µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus                                                          1.01     85.3±0.44µs        ? ?/sec    1.00     84.6±0.47µs        ? ?/sec
smol-songs.csv: basic filter: <=/david                                                                   1.01     32.4±0.25µs        ? ?/sec    1.00     32.1±0.24µs        ? ?/sec
smol-songs.csv: basic filter: <=/david bowie                                                             1.00     68.6±0.99µs        ? ?/sec    1.01     68.9±0.88µs        ? ?/sec
smol-songs.csv: basic filter: <=/john                                                                    1.04     26.1±0.37µs        ? ?/sec    1.00     25.1±0.22µs        ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller                                                           1.00     76.7±0.39µs        ? ?/sec    1.01     77.3±0.61µs        ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson                                                         1.00     95.5±0.66µs        ? ?/sec    1.01     96.3±0.79µs        ? ?/sec
smol-songs.csv: basic filter: <=/tamo                                                                    1.03     26.2±0.36µs        ? ?/sec    1.00     25.3±0.23µs        ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk                                                         1.00    140.7±1.36µs        ? ?/sec    1.01    142.7±0.88µs        ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee                                                        1.00    165.4±1.25µs        ? ?/sec    1.00    165.7±1.72µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles                                                                 1.01     40.6±0.57µs        ? ?/sec    1.00     40.1±0.54µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus                                                          1.01     87.1±0.80µs        ? ?/sec    1.00     86.3±0.61µs        ? ?/sec
smol-songs.csv: basic filter: TO/david                                                                   1.02     34.5±0.26µs        ? ?/sec    1.00     33.7±0.24µs        ? ?/sec
smol-songs.csv: basic filter: TO/david bowie                                                             1.00     70.6±0.38µs        ? ?/sec    1.00     70.6±0.68µs        ? ?/sec
smol-songs.csv: basic filter: TO/john                                                                    1.02     27.5±0.77µs        ? ?/sec    1.00     26.9±0.21µs        ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller                                                           1.01     79.8±0.76µs        ? ?/sec    1.00     79.3±1.27µs        ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson                                                         1.00     98.3±0.54µs        ? ?/sec    1.00     98.0±0.88µs        ? ?/sec
smol-songs.csv: basic filter: TO/tamo                                                                    1.03     27.9±0.23µs        ? ?/sec    1.00     27.1±0.32µs        ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk                                                         1.00    142.5±1.36µs        ? ?/sec    1.02    145.2±0.98µs        ? ?/sec
smol-songs.csv: basic placeholder/                                                                       1.00     49.4±0.34µs        ? ?/sec    1.00     49.3±0.45µs        ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee"                                                      1.00    190.5±1.60µs        ? ?/sec    1.01    191.8±2.10µs        ? ?/sec
smol-songs.csv: basic with quote/"charles"                                                               1.00    165.0±1.13µs        ? ?/sec    1.01    166.0±1.39µs        ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus"                                                      1.00  1149.4±15.78µs        ? ?/sec    1.02   1171.1±9.95µs        ? ?/sec
smol-songs.csv: basic with quote/"david"                                                                 1.00    236.5±1.61µs        ? ?/sec    1.00    236.9±1.73µs        ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie"                                                         1.00   1384.8±9.02µs        ? ?/sec    1.01  1393.8±11.39µs        ? ?/sec
smol-songs.csv: basic with quote/"john"                                                                  1.00    358.3±4.85µs        ? ?/sec    1.00    358.9±1.75µs        ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller"                                                       1.00    281.4±1.79µs        ? ?/sec    1.01    285.6±3.24µs        ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson"                                                     1.00   1328.4±8.01µs        ? ?/sec    1.00   1334.6±8.00µs        ? ?/sec
smol-songs.csv: basic with quote/"tamo"                                                                  1.00    528.7±3.72µs        ? ?/sec    1.01    533.4±5.31µs        ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk"                                                     1.00   1223.0±7.24µs        ? ?/sec    1.02  1245.7±12.04µs        ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee                                                     1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/charles                                                              1.00    273.3±2.06µs        ? ?/sec    1.01    275.9±1.76µs        ? ?/sec
smol-songs.csv: basic without quote/charles mingus                                                       1.00      2.3±0.01ms        ? ?/sec    1.02      2.4±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/david                                                                1.00    434.3±3.86µs        ? ?/sec    1.01    436.7±2.47µs        ? ?/sec
smol-songs.csv: basic without quote/david bowie                                                          1.00      5.6±0.02ms        ? ?/sec    1.01      5.7±0.02ms        ? ?/sec
smol-songs.csv: basic without quote/john                                                                 1.00   1322.5±9.98µs        ? ?/sec    1.00  1321.2±17.40µs        ? ?/sec
smol-songs.csv: basic without quote/marcus miller                                                        1.02      2.4±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/michael jackson                                                      1.00      3.8±0.02ms        ? ?/sec    1.01      3.9±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/tamo                                                                 1.00    809.0±4.01µs        ? ?/sec    1.01    819.0±6.22µs        ? ?/sec
smol-songs.csv: basic without quote/thelonious monk                                                      1.00      3.8±0.02ms        ? ?/sec    1.02      3.9±0.02ms        ? ?/sec
smol-songs.csv: big filter/Notstandskomitee                                                              1.00      2.7±0.01ms        ? ?/sec    1.01      2.8±0.01ms        ? ?/sec
smol-songs.csv: big filter/charles                                                                       1.00    266.5±1.34µs        ? ?/sec    1.01    270.1±8.17µs        ? ?/sec
smol-songs.csv: big filter/charles mingus                                                                1.00    651.0±5.40µs        ? ?/sec    1.00    651.0±2.73µs        ? ?/sec
smol-songs.csv: big filter/david                                                                         1.00  1018.1±11.16µs        ? ?/sec    1.00   1022.3±8.94µs        ? ?/sec
smol-songs.csv: big filter/david bowie                                                                   1.00  1912.2±11.13µs        ? ?/sec    1.00   1919.8±8.30µs        ? ?/sec
smol-songs.csv: big filter/john                                                                          1.00    867.2±6.66µs        ? ?/sec    1.01    873.3±3.44µs        ? ?/sec
smol-songs.csv: big filter/marcus miller                                                                 1.00    717.7±2.86µs        ? ?/sec    1.01    721.5±3.89µs        ? ?/sec
smol-songs.csv: big filter/michael jackson                                                               1.00  1668.4±16.76µs        ? ?/sec    1.00  1667.9±10.11µs        ? ?/sec
smol-songs.csv: big filter/tamo                                                                          1.01    136.7±0.88µs        ? ?/sec    1.00    135.5±1.22µs        ? ?/sec
smol-songs.csv: big filter/thelonious monk                                                               1.03      3.1±0.02ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee                                                          1.00      3.0±0.01ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
smol-songs.csv: desc + default/charles                                                                   1.00  1599.5±13.07µs        ? ?/sec    1.01  1622.9±22.43µs        ? ?/sec
smol-songs.csv: desc + default/charles mingus                                                            1.00      2.3±0.01ms        ? ?/sec    1.01      2.4±0.03ms        ? ?/sec
smol-songs.csv: desc + default/david                                                                     1.00      5.7±0.02ms        ? ?/sec    1.00      5.7±0.02ms        ? ?/sec
smol-songs.csv: desc + default/david bowie                                                               1.00      9.0±0.04ms        ? ?/sec    1.00      9.0±0.03ms        ? ?/sec
smol-songs.csv: desc + default/john                                                                      1.00      4.5±0.01ms        ? ?/sec    1.00      4.5±0.02ms        ? ?/sec
smol-songs.csv: desc + default/marcus miller                                                             1.00      3.9±0.01ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
smol-songs.csv: desc + default/michael jackson                                                           1.00      6.6±0.03ms        ? ?/sec    1.00      6.6±0.03ms        ? ?/sec
smol-songs.csv: desc + default/tamo                                                                      1.00  1472.4±10.38µs        ? ?/sec    1.01   1484.2±8.07µs        ? ?/sec
smol-songs.csv: desc + default/thelonious monk                                                           1.00      4.4±0.02ms        ? ?/sec    1.00      4.4±0.05ms        ? ?/sec
smol-songs.csv: desc/Notstandskomitee                                                                    1.01      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc/charles                                                                             1.00    475.9±3.38µs        ? ?/sec    1.00    475.9±2.64µs        ? ?/sec
smol-songs.csv: desc/charles mingus                                                                      1.00    775.3±4.30µs        ? ?/sec    1.00    778.9±3.52µs        ? ?/sec
smol-songs.csv: desc/david                                                                               1.00    757.9±4.10µs        ? ?/sec    1.01    763.4±3.27µs        ? ?/sec
smol-songs.csv: desc/david bowie                                                                         1.00  1129.0±11.87µs        ? ?/sec    1.01   1135.1±8.86µs        ? ?/sec
smol-songs.csv: desc/john                                                                                1.00    670.2±4.38µs        ? ?/sec    1.00    670.2±3.46µs        ? ?/sec
smol-songs.csv: desc/marcus miller                                                                       1.00    961.2±4.47µs        ? ?/sec    1.00    961.9±4.03µs        ? ?/sec
smol-songs.csv: desc/michael jackson                                                                     1.00   1076.5±6.61µs        ? ?/sec    1.00   1077.9±7.11µs        ? ?/sec
smol-songs.csv: desc/tamo                                                                                1.00     70.6±0.57µs        ? ?/sec    1.01     71.3±0.48µs        ? ?/sec
smol-songs.csv: desc/thelonious monk                                                                     1.01      2.9±0.01ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: prefix search/a                                                                          1.00   1236.2±9.43µs        ? ?/sec    1.00  1232.0±12.07µs        ? ?/sec
smol-songs.csv: prefix search/b                                                                          1.00   1090.8±9.89µs        ? ?/sec    1.00   1090.8±9.43µs        ? ?/sec
smol-songs.csv: prefix search/i                                                                          1.00   1333.9±8.28µs        ? ?/sec    1.00  1334.2±11.21µs        ? ?/sec
smol-songs.csv: prefix search/s                                                                          1.00    810.5±3.69µs        ? ?/sec    1.00    806.6±3.50µs        ? ?/sec
smol-songs.csv: prefix search/x                                                                          1.00    290.5±1.88µs        ? ?/sec    1.00    291.0±1.85µs        ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie                                             1.00      4.7±0.02ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus                                               1.01      5.6±0.02ms        ? ?/sec    1.00      5.6±0.03ms        ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights                                                    1.00      2.5±0.01ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
smol-songs.csv: proximity/black saint sinner lady                                                        1.00      4.8±0.02ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960                                                          1.00      3.2±0.01ms        ? ?/sec    1.01      3.2±0.01ms        ? ?/sec
smol-songs.csv: typo/Arethla Franklin                                                                    1.00    388.7±5.16µs        ? ?/sec    1.00    390.0±2.11µs        ? ?/sec
smol-songs.csv: typo/Disnaylande                                                                         1.01      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: typo/dire straights                                                                      1.00    125.9±1.22µs        ? ?/sec    1.00    126.0±0.71µs        ? ?/sec
smol-songs.csv: typo/fear of the duck                                                                    1.00    373.7±4.25µs        ? ?/sec    1.01   375.7±14.17µs        ? ?/sec
smol-songs.csv: typo/indochie                                                                            1.00    103.6±0.94µs        ? ?/sec    1.00    103.4±0.74µs        ? ?/sec
smol-songs.csv: typo/indochien                                                                           1.00    155.6±1.14µs        ? ?/sec    1.01    157.5±1.75µs        ? ?/sec
smol-songs.csv: typo/klub des loopers                                                                    1.00    160.6±2.98µs        ? ?/sec    1.01    161.7±1.96µs        ? ?/sec
smol-songs.csv: typo/michel depech                                                                       1.00     79.4±0.54µs        ? ?/sec    1.01     79.9±0.60µs        ? ?/sec
smol-songs.csv: typo/mongus                                                                              1.00    126.7±1.85µs        ? ?/sec    1.00    126.1±0.74µs        ? ?/sec
smol-songs.csv: typo/stromal                                                                             1.01    132.9±0.99µs        ? ?/sec    1.00    131.9±1.09µs        ? ?/sec
smol-songs.csv: typo/the white striper                                                                   1.00    287.8±2.88µs        ? ?/sec    1.00    286.5±1.91µs        ? ?/sec
smol-songs.csv: typo/thelonius monk                                                                      1.00    304.2±1.49µs        ? ?/sec    1.01    306.5±1.50µs        ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots                                     1.01     20.9±0.08ms        ? ?/sec    1.00     20.7±0.07ms        ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title         1.00     48.9±0.13ms        ? ?/sec    1.00     48.9±0.11ms        ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song                                          1.01     13.9±0.06ms        ? ?/sec    1.00     13.8±0.07ms        ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793                                                     1.01      3.7±0.01ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
smol-songs.csv: words/seven nation mummy                                                                 1.00  1054.2±14.49µs        ? ?/sec    1.00  1056.6±10.53µs        ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo                             1.00     58.2±0.29ms        ? ?/sec    1.00     57.9±0.21ms        ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one     1.00     66.1±0.21ms        ? ?/sec    1.00     66.0±0.24ms        ? ?/sec
</code>
</details>

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-10-25 10:42:04 +00:00
.github Upgrade ubuntu-18.04 to 20.04 2022-09-08 14:58:06 +02:00
assets chore: move logo to (new) assets folder 2022-10-04 12:20:24 +02:00
benchmarks Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
cli Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
filter-parser Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
flatten-serde-json Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
json-depth-checker Update version for the next release (v0.34.0) in Cargo.toml files 2022-10-24 10:13:25 +00:00
milli Minor code style change 2022-10-24 15:30:43 +02:00
script format the whole project 2021-06-16 18:33:33 +02:00
.gitignore chore: move logo to (new) assets folder 2022-10-04 12:20:24 +02:00
.rustfmt.toml format the whole project 2021-06-16 18:33:33 +02:00
bors.toml Upgrade ubuntu-18.04 to 20.04 2022-09-08 14:58:06 +02:00
Cargo.toml Optimize a few performance sensitive dependencies on debug builds 2022-10-12 09:22:05 +02:00
CONTRIBUTING.md Update CONTRIBUTING.md 2022-10-13 13:46:18 +02:00
LICENSE Update LICENSE 2022-02-15 15:52:50 +01:00
README.md chore: move logo to (new) assets folder 2022-10-04 12:20:24 +02:00


A concurrent indexer combined with fast and relevant search algorithms.

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages one and only one index; Meilisearch itself handles multiple indexes. Milli cannot persist pending updates: that is the job of a layer above it, which is why it can only process one update at a time.
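Because milli applies one update at a time and does not store pending updates, the layer above it has to queue updates and feed them in order. The sketch below shows one way such a layer could look, assuming a single worker thread that owns the index and a standard `std::sync::mpsc` channel as an in-memory queue. `Update`, `ToyIndex`, and `spawn_update_worker` are hypothetical names, not part of milli or Meilisearch.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical update type; a real engine also handles settings changes, etc.
#[derive(Debug)]
enum Update {
    AddDocument { id: u32, title: String },
    DeleteDocument { id: u32 },
}

/// Hypothetical stand-in for a single index. Only the worker thread touches it,
/// mirroring the "one update at a time" constraint.
#[derive(Default, Debug)]
struct ToyIndex {
    documents: BTreeMap<u32, String>,
}

impl ToyIndex {
    fn apply(&mut self, update: Update) {
        match update {
            Update::AddDocument { id, title } => {
                self.documents.insert(id, title);
            }
            Update::DeleteDocument { id } => {
                self.documents.remove(&id);
            }
        }
    }
}

/// Spawns a worker that owns the index and applies queued updates strictly in
/// the order they were sent. The join handle yields the final index once every
/// sender has been dropped.
fn spawn_update_worker() -> (mpsc::Sender<Update>, thread::JoinHandle<ToyIndex>) {
    let (sender, receiver) = mpsc::channel::<Update>();
    let handle = thread::spawn(move || {
        let mut index = ToyIndex::default();
        // Iterating the receiver blocks until an update arrives and stops
        // once all senders are gone.
        for update in receiver {
            index.apply(update);
        }
        index
    });
    (sender, handle)
}

fn main() {
    let (queue, worker) = spawn_update_worker();
    queue.send(Update::AddDocument { id: 2, title: "Pride and Prejudice".into() }).unwrap();
    queue.send(Update::AddDocument { id: 456, title: "Le Petit Prince".into() }).unwrap();
    queue.send(Update::DeleteDocument { id: 2 }).unwrap();
    drop(queue); // closing the queue lets the worker finish
    let index = worker.join().unwrap();
    println!("{:?}", index.documents);
}
```

Serializing through one owner keeps the index free of concurrent writers; a production layer would additionally persist the queue so that pending updates survive a restart.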

This repository also contains the following crates, some of which help debug the engine quickly:

  • There are benchmarks located in the benchmarks crate.
  • The cli crate is a simple command-line interface that makes it easy to run flamegraph on top of the engine.
  • The filter-parser crate contains the parser for the Meilisearch filter syntax.
  • The flatten-serde-json crate contains the library that flattens serde-json Value objects like Elasticsearch does.
  • The json-depth-checker crate indicates whether a JSON document must be flattened.
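To make the last two crates more concrete, here is a minimal sketch of flattening nested JSON into dotted keys in the Elasticsearch style, together with a check that decides whether flattening is needed at all. It uses a hypothetical hand-rolled `Value` type instead of serde_json so it has no dependencies, and it does not claim to reproduce the exact rules of flatten-serde-json or json-depth-checker: for instance, here an array holding a single scalar collapses to that scalar, and empty arrays are dropped.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value, used instead of serde_json.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Depth check: flattening is needed as soon as a field holds an object,
/// directly or somewhere inside an array.
fn must_be_flattened(document: &BTreeMap<String, Value>) -> bool {
    fn contains_object(value: &Value) -> bool {
        match value {
            Value::Object(_) => true,
            Value::Array(items) => items.iter().any(contains_object),
            _ => false,
        }
    }
    document.values().any(contains_object)
}

/// Flattens nested objects into dotted keys:
/// {"a": {"b": 1}} becomes {"a.b": 1} and
/// {"a": [{"b": 1}, {"b": 2}]} becomes {"a.b": [1, 2]}.
fn flatten(document: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    let mut flat = BTreeMap::new();
    for (key, value) in document {
        insert_flattened(&mut flat, key.clone(), value);
    }
    flat
}

fn insert_flattened(flat: &mut BTreeMap<String, Value>, key: String, value: &Value) {
    match value {
        // Objects contribute their fields under "parent.child" keys.
        Value::Object(fields) => {
            for (child, child_value) in fields {
                insert_flattened(flat, format!("{key}.{child}"), child_value);
            }
        }
        // Array elements are flattened under the same key and merged.
        Value::Array(items) => {
            for item in items {
                insert_flattened(flat, key.clone(), item);
            }
        }
        scalar => push_value(flat, key, scalar.clone()),
    }
}

/// Stores a scalar; a second value under the same key turns the entry into an array.
fn push_value(flat: &mut BTreeMap<String, Value>, key: String, value: Value) {
    let merged = match flat.remove(&key) {
        None => value,
        Some(Value::Array(mut items)) => {
            items.push(value);
            Value::Array(items)
        }
        Some(existing) => Value::Array(vec![existing, value]),
    };
    flat.insert(key, merged);
}

/// Builds an object from (key, value) pairs.
fn object(pairs: Vec<(&str, Value)>) -> BTreeMap<String, Value> {
    pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect()
}

fn main() {
    let book = object(vec![
        ("title", Value::String("Le Petit Prince".into())),
        ("author", Value::Object(object(vec![
            ("first", Value::String("Antoine".into())),
            ("last", Value::String("de Saint-Exupéry".into())),
        ]))),
        ("editions", Value::Array(vec![
            Value::Object(object(vec![("year", Value::Number(1943.0))])),
            Value::Object(object(vec![("year", Value::Number(1946.0))])),
        ])),
        ("translator", Value::Null),
    ]);
    assert!(must_be_flattened(&book));
    let flat = flatten(&book);
    assert!(!must_be_flattened(&flat));
    // Keys: author.first, author.last, editions.year, title, translator
    for (key, value) in &flat {
        println!("{key}: {value:?}");
    }
}
```

Running the cheap depth check first lets an indexer skip the flattening pass entirely for documents that are already flat.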

How to use it?

Milli is a search library; it must be embedded in a program. You can generate its documentation with cargo doc --open.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

```rust
use heed::EnvOpenOptions;
use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig};
use milli::{Index, Search};

// Create a temporary on-disk environment to hold the index.
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();

// Index a few documents inside a single write transaction.
let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price$": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price$": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price$": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config, |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
wtxn.commit().unwrap();

// You can search in the index now!
// The typo in "horry" is tolerated, so the Harry Potter document is found.
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);
```
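The query "horry" returns the Harry Potter document because the search is typo tolerant. The sketch below illustrates the idea with a plain dynamic-programming Levenshtein distance and word-length thresholds assumed to match Meilisearch's documented defaults (no typo under 5 characters, one typo from 5, two from 9). It is only an illustration, not milli's implementation: `levenshtein`, `allowed_typos`, and `is_typo_match` are hypothetical helpers, and words are assumed to be lowercased already.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of a and first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting all i chars of a's prefix.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = curr[j - 1] + 1;
            curr[j] = substitution.min(deletion).min(insertion);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Assumed thresholds: 0 typos below 5 chars, 1 from 5 to 8, 2 from 9 upwards.
fn allowed_typos(query_word: &str) -> usize {
    match query_word.chars().count() {
        0..=4 => 0,
        5..=8 => 1,
        _ => 2,
    }
}

/// A query word matches an indexed word if it is within its typo budget.
fn is_typo_match(query_word: &str, indexed_word: &str) -> bool {
    levenshtein(query_word, indexed_word) <= allowed_typos(query_word)
}

fn main() {
    // Hypothetical tokenized, lowercased title.
    let words = ["harry", "potter", "and", "the", "half", "blood", "prince"];
    let hits: Vec<&str> = words.iter().copied().filter(|w| is_typo_match("horry", w)).collect();
    println!("{hits:?}"); // ["harry"]
}
```

Tying the typo budget to word length keeps short words strict, where a single edit would match too many unrelated words.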

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue and to ask any questions you have. Some points might not be clear, and we are available to help you!

We also recommend following CONTRIBUTING.md when creating your PR.