meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 04:25:06 +08:00

Author	SHA1	Message	Date
Clément Renault	5b4eda670b	Add two tests for the UpdateStore	2020-10-18 18:55:09 +02:00
Clément Renault	edb8c99fbe	Introduce a method to get the meta of an update on the UpdateStore	2020-10-18 17:19:04 +02:00
Clément Renault	eca49e3a03	Introduce a notification channel for the UpdateStore	2020-10-18 16:37:37 +02:00
Clément Renault	83c1db8763	Introduce the UpdateStore	2020-10-18 15:26:57 +02:00
Clément Renault	90d4c1d153	Simplify the words pair proximity computation	2020-10-15 16:18:43 +02:00
Clément Renault	9021b2dba6	Introduce the enable-chunk-fusing flag	2020-10-14 18:44:59 +02:00
Kerollmops	f980422c57	Move from oxidized-mtbl to grenad	2020-10-14 12:47:32 +02:00
Clément Renault	b342a86c15	Divide the max-memory parameter by the number of sorters in the store	2020-10-08 17:27:53 +02:00
Kerollmops	fb2c402ae1	Split the max-memory by the number of jobs	2020-10-07 14:23:22 +02:00
Kerollmops	38820bc75c	Improve and simplify the query tokenizer	2020-10-07 14:23:22 +02:00
Kerollmops	a00f5850ee	Add support for placeholder search for empty queries	2020-10-06 20:19:50 +02:00
Kerollmops	433d9bbc6e	Use CompressionType::from_str rather than a custom function	2020-10-06 13:50:34 +02:00
Clément Renault	a2182e68a6	Rewrite the parallel merge indexing part	2020-10-05 20:54:06 +02:00
Kerollmops	e9e03259c1	Improve the mDFS performance and return the proximity	2020-10-05 18:13:56 +02:00
Kerollmops	bb15f16d8c	Merge other databases content while writing into LMDB at the same time	2020-10-05 16:35:10 +02:00
Clément Renault	9af946a306	Merging the main, word docids and words pairs proximity docids in parallel	2020-10-04 18:40:34 +02:00
Clément Renault	99705deb7d	Directly use a writer for the docid word positions	2020-10-04 18:17:53 +02:00
Clément Renault	67577a3760	It is an error to merge docid word positions	2020-10-04 17:31:12 +02:00
Clément Renault	ce8e56ee18	Rewrite the indexer to use one MTBL by database. This allows us to avoid prefixing keys and appending into LMDB databases	2020-10-04 17:04:33 +02:00
Clément Renault	acd2a63879	Introduce a simple FST based chinese word segmenter	2020-10-04 17:04:33 +02:00
Clément Renault	6cc6addc2f	Increase the CboRoaringBitmapCodec threshold	2020-10-02 17:06:17 +02:00
Clément Renault	e41a3822a6	Add a simple test for the CboRoaringBitmapCodec	2020-10-02 16:52:36 +02:00
Clément Renault	c4b0c57059	Reduce the default indexer max-memory parameter	2020-10-02 16:47:41 +02:00
Kerollmops	007e647462	Introduce the Mdfs Iterator that explore the proximity graph using a mana DFS	2020-10-02 16:46:07 +02:00
Kerollmops	d4e80407e5	Introduce the mana depth first search algorithm	2020-10-02 16:46:07 +02:00
Kerollmops	f6a8096720	Rename the quartile as percentiles 25th, 50th and 75th	2020-10-02 16:46:07 +02:00
Kerollmops	891e0188dd	Introduce the database-stats infos subcommand	2020-10-02 16:46:07 +02:00
Kerollmops	079742b4d3	Clean up the stats and size of database infos subcommands	2020-10-02 16:46:06 +02:00
Kerollmops	d0c73564b1	Use the CboRoaringBitmapCodec for the word pair proximity docids	2020-10-02 16:46:06 +02:00
Kerollmops	5a6a698e1d	Introduce the CboRoaringBitmapCodec	2020-10-02 16:46:06 +02:00
Kerollmops	4eda149ffa	Rename the BoRoaringBitmap codec	2020-10-02 16:46:06 +02:00
Clément Renault	ac84db2506	Move the words pairs proximities average into the stats infos subcommand	2020-10-02 16:46:06 +02:00
Kerollmops	30755e31e7	Introduce the words pairs proximities stats info subcommand	2020-10-02 16:46:06 +02:00
Clément Renault	bc35c9a598	Introduce the size_of_database infos subcommand	2020-10-02 16:46:05 +02:00
Kerollmops	c6b883289c	Remove the unused fetch_keywords function	2020-09-30 15:41:23 +02:00
Kerollmops	58237bd67f	Introduce the average-number-of-document-by-word-pair-proximity infos subcommand	2020-09-29 18:32:48 +02:00
Kerollmops	991be8950e	Rename the subcommand into average-number-of-positions-by-word-by-doc	2020-09-29 18:15:44 +02:00
Kerollmops	54370e228a	Search for documents with longer proximities until we find enough	2020-09-29 17:37:14 +02:00
Kerollmops	f277ea134f	Simplify some search function by reducing the number of parameters	2020-09-29 16:08:58 +02:00
Kerollmops	68f4af7d2e	Improve the display of the number of processed documents	2020-09-29 16:08:58 +02:00
Kerollmops	59a127d022	Improve the indexing process. We now store the words pairs proximity in a cache and only compute the shortest proximity between pairs of words in a document.	2020-09-29 15:09:18 +02:00
Kerollmops	6ddb3e722c	Depth-first search cache the docids unions	2020-09-28 16:55:21 +02:00
Kerollmops	a3821a0b33	Introduce the depth_first_search path resolution function	2020-09-28 16:34:12 +02:00
Clément Renault	d8354f6f02	Fix the word_docids capacity limit detection	2020-09-27 11:52:05 +02:00
Clément Renault	25b2853b70	Move the words pairs proximities compute into the write document function	2020-09-23 15:02:40 +02:00
Clément Renault	ed05999f63	Replace the arc cache by a simple linked hash map	2020-09-23 14:50:52 +02:00
Clément Renault	4d22d80281	Display only the key on heed error	2020-09-23 14:13:51 +02:00
Clément Renault	5178b3d59d	Make the search system be aware of query words typos	2020-09-23 12:01:39 +02:00
Clément Renault	b597a92487	Add a default max-memory value to the indexer	2020-09-23 12:00:36 +02:00
Clément Renault	1f6e00878d	Use the words pair proximities in the search algorithm	2020-09-22 18:47:55 +02:00
Clément Renault	31224a8425	Index the word pair proximities for both orders of the pair	2020-09-22 14:49:22 +02:00
Clément Renault	a58ae5eb2a	Introduce the word-pair-proximities-docids infos subcommand	2020-09-22 14:04:34 +02:00
Clément Renault	d6fa9c0414	Index the intra documents word pair proximities	2020-09-22 14:04:33 +02:00
Clément Renault	7b67ae6972	Introduce the StrStrU8 heed codec	2020-09-22 12:44:17 +02:00
Clément Renault	e34437b2d7	Move the proximity function to a module	2020-09-22 10:54:59 +02:00
Clément Renault	15208c7d3d	Simplify the indexer record loop	2020-09-22 10:33:30 +02:00
Clément Renault	e5adfaade0	Replace the token filter by a filter mapper	2020-09-22 10:24:31 +02:00
Clément Renault	d21c80b865	Apply the chunk compression parameters on all the MTBL writers	2020-09-21 18:30:54 +02:00
Clément Renault	944df52e2a	Simplify the indexer main loop	2020-09-21 14:59:48 +02:00
Kerollmops	3ded98e5fa	Bump the roaring version that fix a deserialization bug	2020-09-10 22:37:51 +02:00
Kerollmops	d5e5baa20f	Bump the oxidized-mtbl dependency	2020-09-10 13:29:12 +02:00
Kerollmops	aed0704404	Remove the temporary optimisation	2020-09-08 14:48:33 +02:00
Kerollmops	072382fa61	Sort the word docids to make intersections much faster	2020-09-07 22:38:49 +02:00
Kerollmops	ad11c5fb3f	Introduce the words-docids command for the infos binary	2020-09-07 22:36:35 +02:00
Kerollmops	5664c37539	Introduce an heed codec that reduce the size of small amount of serialized integers	2020-09-07 20:06:23 +02:00
Kerollmops	3e2250423c	Introduce the average-number-of-positions infos subcommand	2020-09-07 15:26:42 +02:00
Kerollmops	ea605b499c	Introduce two new infos subcommands	2020-09-07 14:56:48 +02:00
Clément Renault	bb1ab428db	Use another function to define the proximity	2020-09-06 17:55:07 +02:00
Clément Renault	dec460ce52	Fix the infos binary and add commands	2020-09-06 17:14:20 +02:00
Clément Renault	daa3673c1c	Invert the word docid positions key order	2020-09-06 10:30:53 +02:00
Clément Renault	c2405bcae2	Prefer using the word_docids db to create the words-fst	2020-09-06 10:23:56 +02:00
Kerollmops	4ca9472e02	Fix the minimum proximity len	2020-09-06 10:19:34 +02:00
Clément Renault	1c504471d3	Introduce the plane-sweep algorithm	2020-09-05 18:25:27 +02:00
Clément Renault	dc88a86259	Store the word positions under the documents	2020-09-05 18:03:06 +02:00
Kerollmops	580ed1119a	Make the engine to return csv string records as documents and headers	2020-08-31 19:02:00 +02:00
Clément Renault	bad0663138	Come back to the old tokenizer	2020-08-31 13:34:38 +02:00
Clément Renault	4afc4d0751	Use the groups of four positions to speed up disjunctions tests	2020-08-30 16:25:11 +02:00
Clément Renault	605f75b56f	Add the words grouped by four positions in the infos binary	2020-08-29 18:23:33 +02:00
Clément Renault	ad5cafbfed	Introduce a database to store docids in groups of four positions	2020-08-29 17:42:55 +02:00
Clément Renault	3db517548d	Move the documents back into the LMDB database	2020-08-29 15:14:04 +02:00
Clément Renault	816db7a0aa	Improve the RoaringBitmap codec to reserve enough vector space	2020-08-29 11:21:30 +02:00
Clément Renault	3fe497e129	Improve the Mtbl heed codec to only encode MTBL databases	2020-08-29 11:20:39 +02:00
Clément Renault	21aafd603c	Make sure the first document is associated to the document id 0	2020-08-29 10:56:40 +02:00
Clément Renault	0a44ff86ab	Put the documents MTBL back into LMDB. We makes sure to write the documents into a file before memory mapping it and putting it into LMDB, this way we avoid moving it to RAM	2020-08-28 15:43:24 +02:00
Clément Renault	d784d87880	Remove the prefix LMDB databases	2020-08-28 14:41:43 +02:00
Clément Renault	7cde312f14	Introduce the StrBEU32Codec heed codec	2020-08-28 14:16:37 +02:00
Clément Renault	34db376ae5	Rename the RoaringBitmapCodec module	2020-08-28 13:31:16 +02:00
Kerollmops	38ddc71b83	Simplify the search algorithm	2020-08-26 15:16:41 +02:00
Kerollmops	ba2eb0d7ad	Take the words-fst into account when retrieving the biggests values	2020-08-26 14:36:22 +02:00
Clément Renault	32da07ccee	Introduce the word-positions-doc-ids and words-positions infos commands	2020-08-23 10:52:47 +02:00
Clément Renault	d19f394630	Make the indexer support gzipped CSV as input	2020-08-21 18:10:24 +02:00
Clément Renault	ff479c865d	Replace pipe by ringtail to improve stdin read performances	2020-08-21 17:45:52 +02:00
Clément Renault	ada30c2789	Introducing more arguments to specify the different compression algorithms	2020-08-21 16:41:26 +02:00
Clément Renault	02335ee72d	Introduce the biggest-value-sizes command on the infos binary	2020-08-21 14:44:42 +02:00
Clément Renault	1e3e756c19	Introduce the words-frequencies command on the infos binary	2020-08-21 14:44:42 +02:00
Kerollmops	6a230fe803	Move the contains_documents logic to a function	2020-08-21 14:44:42 +02:00
Kerollmops	e55a569629	Compress much more the documents database	2020-08-21 14:44:42 +02:00
Kerollmops	962bad3cea	Introduce an infos binary to fetch stats	2020-08-17 19:41:49 +02:00
Clément Renault	8806fcd545	Introduce a better query and document lexer	2020-08-16 14:36:54 +02:00
Clément Renault	1e358e3ae8	Introduce the AstarBagIter that iterates through best paths	2020-08-15 16:24:06 +02:00
Clément Renault	7dc594ba4d	Introduce the Search builder struct	2020-08-13 14:27:51 +02:00
Clément Renault	bfb46cbfbe	Introduce the Crtierion enum	2020-08-12 10:43:02 +02:00
Clément Renault	6d04a285dc	Retrieve and display the distances of the words found	2020-08-11 15:18:02 +02:00
Clément Renault	1bd37d213a	Lowercase quoted words	2020-08-10 14:49:09 +02:00
Clément Renault	883a8109c8	Show both database and documents database sizes	2020-08-10 14:37:18 +02:00
Clément Renault	a4e0f3f724	Remove the useless TransitiveArc from the serve binary	2020-08-10 14:06:27 +02:00
Clément Renault	edc06a97d6	Remove the useless stats binary	2020-08-10 13:55:02 +02:00
Clément Renault	ae77fe5a69	Introduce an option to specify the maximum database size	2020-08-10 13:53:53 +02:00
Clément Renault	394844062f	Move the documents MTBL database inside the Index	2020-08-10 13:47:19 +02:00
Clément Renault	ecd2b2f217	Make the final merge done in parallel	2020-08-07 15:44:04 +02:00
Clément Renault	91282c8b6a	Move the documents into another file	2020-08-07 13:11:31 +02:00
Clément Renault	fae694a102	Put the documents into an MTBL database	2020-08-07 12:14:40 +02:00
Clément Renault	405a71d3a4	Accept csv from stdin	2020-08-06 13:38:21 +02:00
Clément Renault	d3b1096510	Compute the word attribute postings lists on each threads	2020-08-06 11:50:27 +02:00
Clément Renault	8d734941af	Clean up some lines	2020-08-06 10:20:26 +02:00
Clément Renault	6508d497ce	Replace the regex highlighting by a simple algorithm	2020-08-05 13:52:27 +02:00
Clément Renault	4873abe145	Introduce option flags to toggle the indexing engine	2020-08-05 12:10:41 +02:00
Clément Renault	bd4b18541c	Introduce a new indexer which uses an MTBL sorter	2020-08-04 15:44:37 +02:00
Kerollmops	ee305c9284	Replace the title by the milli logo	2020-07-15 23:55:28 +02:00
Kerollmops	9ade00e27b	Highlight all the matching words	2020-07-14 11:53:21 +02:00
Kerollmops	085c376655	Use the regex crate to highlight "hello"	2020-07-14 11:28:40 +02:00
Kerollmops	aa92311d4e	Add a dark theme to the dashboard	2020-07-13 23:51:41 +02:00
Kerollmops	3d144e62c4	Search for best proximities in multiple attributes	2020-07-13 19:06:56 +02:00
Kerollmops	576dd011a1	Compute the candidates but not by attribute	2020-07-13 18:16:05 +02:00
Kerollmops	6b14b20369	Introduce a method to retrieve the number of attributes of the documents	2020-07-13 17:50:16 +02:00
Kerollmops	92c2b1dd2d	Refine the help message of the binaries	2020-07-12 11:06:45 +02:00
Kerollmops	f757df5dfd	Introduce the stderr logger to the project	2020-07-12 11:04:35 +02:00
Kerollmops	12358476da	Use the log crate instead of stderr	2020-07-12 10:55:09 +02:00
Kerollmops	2c62eeea3c	Rename the project milli	2020-07-12 00:16:41 +02:00
Kerollmops	d31da26a51	Avoid cloning RoraringBitmaps when unecessary	2020-07-11 23:51:32 +02:00
Kerollmops	b8a1fc0126	Clean up the CSS style custom bulma rules	2020-07-11 14:51:59 +02:00
Kerollmops	f6eae91c7d	Pretty print the new dashboard numbers	2020-07-11 14:17:37 +02:00
Kerollmops	d44428fa90	Display more informations on the dashboard	2020-07-11 11:51:56 +02:00
Kerollmops	11c7fef80a	Implement a memory dumper. It moves the in memory HashMaps used when indexing to a disk based MTBL file	2020-07-07 16:48:49 +02:00
Kerollmops	b12bfcb03b	Reduce the deepness of the word position document ids. This helps reduce the number of allocations.	2020-07-07 12:30:05 +02:00
Kerollmops	7178b6c2c4	First basic version using MTBL again	2020-07-07 11:32:33 +02:00
Kerollmops	adb1038b26	Add a `jobs` parameter to set the number of threads the indexer uses	2020-07-06 12:17:17 +02:00
Kerollmops	ec1023e790	Intersect document ids by inverse popularity of the words. This reduces the worst request we had which took 56s to now took 3s ("the best of the do").	2020-07-05 19:33:51 +02:00
Kerollmops	cd7e64b2b3	Allow users to set the arc cache size when indexing	2020-07-04 18:12:41 +02:00
Kerollmops	ac8353a64f	Merge pre-computed word attribute documents ids	2020-07-04 17:02:27 +02:00
Kerollmops	fea7cac206	Display the time it took to compute the word attribute documents ids	2020-07-04 15:18:38 +02:00
Kerollmops	46ced5c828	Introduce the RwIter append heed API	2020-07-04 12:34:10 +02:00
Kerollmops	7e7440c431	Finalize the LMDB indexing design	2020-07-01 22:45:43 +02:00
Kerollmops	2ae3f40971	Make the indexer ignore certain words. This is a preparation for making the indexing fully parallel by making the indexer only be aware of certain words for each threads to avoid postings lists conflicts for each words	2020-07-01 17:49:46 +02:00
Kerollmops	a3ac2623d5	Introduce multiple functions to clean up the code	2020-07-01 17:24:55 +02:00
Kerollmops	ac5cc7ddad	Introduce an Iterator yielding owned entries for the LruCache	2020-07-01 17:21:52 +02:00
Kerollmops	014a25697d	Use only one ARC cache based on the words	2020-07-01 12:03:18 +02:00
Kerollmops	fc4013a43f	Fix the ARC cache	2020-07-01 10:35:07 +02:00
Kerollmops	2fcae719ad	Use another LRU impl which uses hashbrown	2020-06-29 22:26:06 +02:00
Kerollmops	f98b615bf3	Replace the LRU by an Arc cache	2020-06-29 20:48:57 +02:00

317 Commits