From a43f99c600066452092eee35ab396bab1c69bafc Mon Sep 17 00:00:00 2001
From: Kerollmops
Date: Mon, 13 Sep 2021 14:00:56 +0200
Subject: [PATCH 1/2] Inform the users that documents must have an id in their
 documents

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 58e781d83..674c1919e 100644
--- a/README.md
+++ b/README.md
@@ -40,13 +40,12 @@ All of that on a 39$/month machine with 4cores.
 You can feed the engine with your CSV (comma-seperated, yes) data like this:
 
 ```bash
-printf "name,age\nhello,32\nkiki,24\n" | http POST 127.0.0.1:9700/documents content-type:text/csv
+printf "id,name,age\n1,hello,32\n2,kiki,24\n" | http POST 127.0.0.1:9700/documents content-type:text/csv
 ```
 
-Here ids will be automatically generated as UUID v4 if they doesn't exist in some or every documents.
-
-Note that it also support JSON and JSON streaming, you can send them to the engine by using
-the `content-type:application/json` and `content-type:application/x-ndjson` headers respectively.
+Don't forget to specify the `id` of the documents. Also Note that it also support JSON and
+JSON streaming, you can send them to the engine by using the `content-type:application/json`
+and `content-type:application/x-ndjson` headers respectively.
 
 ### Querying the engine via the website
 

From 2741aa8589cc69bb51a6e7f530f52a276b60366a Mon Sep 17 00:00:00 2001
From: Kerollmops
Date: Mon, 13 Sep 2021 16:06:45 +0200
Subject: [PATCH 2/2] Update the indexing timings in the README

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 674c1919e..07071183e 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,10 @@ cargo run --release -- --db my-database.mdb -vvv --indexing-jobs 8
 ### Index your documents
 
 It can index a massive amount of documents in not much time, I already achieved to index:
- - 115m songs (song and artist name) in ~1h and take 107GB on disk.
- - 12m cities (name, timezone and country ID) in 15min and take 10GB on disk.
+ - 115m songs (song and artist name) in \~48min and take 81GiB on disk.
+ - 12m cities (name, timezone and country ID) in \~4min and take 6GiB on disk.
 
-All of that on a 39$/month machine with 4cores.
+These metrics are done on a MacBook Pro with the M1 processor.
 
 You can feed the engine with your CSV (comma-seperated, yes) data like this:
 
@@ -43,9 +43,9 @@ You can feed the engine with your CSV (comma-seperated, yes) data like this:
 printf "id,name,age\n1,hello,32\n2,kiki,24\n" | http POST 127.0.0.1:9700/documents content-type:text/csv
 ```
 
-Don't forget to specify the `id` of the documents. Also Note that it also support JSON and
-JSON streaming, you can send them to the engine by using the `content-type:application/json`
-and `content-type:application/x-ndjson` headers respectively.
+Don't forget to specify the `id` of the documents. Also, note that it supports JSON and JSON
+streaming: you can send them to the engine by using the `content-type:application/json` and
+`content-type:application/x-ndjson` headers respectively.
 
 ### Querying the engine via the website
 