meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-03-03 04:14:15 +08:00

A concurrent indexer combined with fast and relevant search algorithms.

Introduction

This engine is a prototype; do not use it in production. It is one of the most advanced search engines I have worked on. It currently supports only the proximity criterion.
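To give an idea of what a proximity criterion measures, here is a minimal sketch that computes the shortest distance between two words in a single document. It assumes that a position is simply the index of a word in the whitespace-split text and that proximity is the absolute difference between two positions. The names `word_positions` and `shortest_proximity` are hypothetical and are not taken from the engine, whose actual definition of proximity may differ.

```rust
use std::collections::HashMap;

/// Maps every lowercased word of `text` to the sorted list of its positions.
/// Assumption: a position is the index of the word in the whitespace-split text.
fn word_positions(text: &str) -> HashMap<String, Vec<u32>> {
    let mut positions: HashMap<String, Vec<u32>> = HashMap::new();
    for (index, word) in text.split_whitespace().enumerate() {
        positions
            .entry(word.to_lowercase())
            .or_default()
            .push(index as u32);
    }
    positions
}

/// Returns the smallest distance between any occurrence of one word and any
/// occurrence of the other, or `None` if one of the lists is empty.
/// Both slices must be sorted in ascending order.
fn shortest_proximity(a: &[u32], b: &[u32]) -> Option<u32> {
    let (mut i, mut j) = (0, 0);
    let mut best: Option<u32> = None;
    while i < a.len() && j < b.len() {
        let distance = a[i].abs_diff(b[j]);
        best = Some(best.map_or(distance, |current| current.min(distance)));
        // Advance the cursor that points at the smaller position:
        // moving the other one could only increase the distance.
        if a[i] < b[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    best
}
```

For the text "the best of the best songs", `word_positions` gives "the" the positions 0 and 3 and "best" the positions 1 and 4, so `shortest_proximity` returns `Some(1)` for that pair. Walking both sorted lists with two cursors keeps the cost linear in the number of occurrences instead of comparing every pair of positions.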

Compile all the binaries

cargo build --release --bins

Indexing

It can index a large number of documents in very little time. I have already managed to index:

109m songs (song and artist name) in 21min, taking 29GB on disk.
12m cities (name, timezone and country ID) in 3min13s, taking 3.3GB on disk.

All of that on a $39/month machine with 4 cores.
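The figures above can be turned into rough throughput and disk usage numbers. The sketch below only does that arithmetic; it assumes that "m" means one million documents and that 1GB is 10^9 bytes, and the helper names are hypothetical.

```rust
/// Average number of documents indexed per second.
fn docs_per_second(documents: u64, seconds: u64) -> f64 {
    documents as f64 / seconds as f64
}

/// Average number of bytes used on disk per indexed document.
fn bytes_per_document(bytes_on_disk: u64, documents: u64) -> f64 {
    bytes_on_disk as f64 / documents as f64
}
```

With these assumptions, `docs_per_second(109_000_000, 21 * 60)` is about 86,500 documents per second for the songs dataset and `docs_per_second(12_000_000, 193)` is about 62,200 for the cities dataset. On disk, `bytes_per_document` gives roughly 266 bytes per song and 275 bytes per city.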

Index your documents

You can feed the engine with your CSV data:

./target/release/indexer --db my-data.mmdb ../my-data.csv

Querying

The engine is designed to handle very frequent words just like words of any other frequency. This is why you can search for "asia dubai" (the most common timezone) in the countries dataset in no time (59ms), even with 12m documents.

We haven't yet modified the algorithm to handle queries that are scattered over multiple attributes; this is an open issue (#4).

Exposing a website to request the database

Once you've indexed the dataset, you will be able to access it with your browser.

./target/release/serve -l 0.0.0.0:8700 --db my-data.mmdb

Gaps

There are many ways to make the engine search for too long and consume too much CPU. One example is querying the engine for "the best of the do" on the songs and subreddits datasets.

There are plenty of ways to improve the algorithms, and there are and will be new issues explaining potential improvements.