Indexing and Query Performance
Loki.js has always been a fast, in-memory database solution. Recent benchmarks indicate that its primary get() operation runs at about 1.4 million operations per second on a mid-range Core i5 under node.js. The get() operation uses an auto-generated 'id' column with its own auto-generated binary index, so single-object lookups by id get this performance out of the gate.
A more versatile way to query is to use collection.find(), which accepts a mongo-style query object. If you do not index the column you are searching against, you can expect about 20k ops/sec under node.js (browser performance may vary, but this serves as a good order of magnitude). For most purposes that is probably more performance than is needed, but you can also apply loki.js binary indexes to your own object properties. The collection.ensureBinaryIndex(propertyName) method creates an index that collection.find() can use when the query targets that property. In our test benchmark, this increased performance to about 500k ops/sec.
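To illustrate why an indexed find() can be so much faster than an unindexed one, here is a minimal sketch in plain JavaScript. It is not LokiJS's internal code: buildBinaryIndex, lowerBound, findEqIndexed, and findEqScan are hypothetical helpers, and the sketch assumes a numeric property. The idea is that a binary index is an array of document positions sorted by one property, so a lookup can binary search instead of scanning every document.

```javascript
// Simplified illustration of a binary index; not LokiJS's internal code.
// The index is an array of document positions sorted by one numeric property.
function buildBinaryIndex(docs, prop) {
  return docs
    .map((_, position) => position)
    .sort((a, b) => docs[a][prop] - docs[b][prop]);
}

// Binary search for the first index slot whose value is >= target.
function lowerBound(docs, index, prop, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (docs[index[mid]][prop] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Indexed equality lookup: binary search to the first match, then collect the run.
function findEqIndexed(docs, index, prop, value) {
  const results = [];
  for (
    let i = lowerBound(docs, index, prop, value);
    i < index.length && docs[index[i]][prop] === value;
    i++
  ) {
    results.push(docs[index[i]]);
  }
  return results;
}

// Unindexed equality lookup: every document is inspected.
function findEqScan(docs, prop, value) {
  return docs.filter((doc) => doc[prop] === value);
}

const users = [
  { name: 'odin', age: 50 },
  { name: 'thor', age: 35 },
  { name: 'loki', age: 30 },
  { name: 'freya', age: 35 },
];
const ageIndex = buildBinaryIndex(users, 'age');
const indexed = findEqIndexed(users, ageIndex, 'age', 35);
const scanned = findEqScan(users, 'age', 35);
```

Both lookups return the same documents; the indexed version simply touches far fewer of them as the collection grows. The cost is that the index has to be kept in sorted order as documents change.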
Where filters should be used sparingly if performance is a concern. They cannot use indexes, so performance will be no better than an unindexed find, and may be worse depending on the complexity of your filter function. Unindexed queries and where filters always require a full array scan, but they can still be useful if thousands of ops/sec are sufficient, or if they are used later in a query chain or dynamic view filter pipeline, where the penalty is smaller.
The Resultset class introduced method chaining as an option for querying. You might use method chaining to apply several find operations in succession, or to mix find(), where(), and sort() operations into a sequential chained pipe. A simple example (where users is a collection object) might be:
users.chain().find(queryObj).where(queryFunc).simplesort('name').data();
Examining this statement, if queryObj (a mongo-style query object) were { 'age': { '$gt': 30 } }, then age would be the best column to index, and that find() operation should come first in the chain. In chained operations, only the first operation can use indexes for filtering. If it filters out a sufficient number of records, the impact of the where() query function will be smaller. The overhead of maintaining the filtered result set reduces performance by about 20% compared with collection.find(), but chaining enables much more versatility. In our benchmarks this is still about 400k ops/sec.
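The effect of ordering can be sketched with a small stand-in for a chained resultset. This is not the LokiJS Resultset class: MiniChain, findGt, and countingFilter are hypothetical names, and findGt here is a plain filter rather than an indexed lookup. The sketch only counts how often the where() predicate runs, depending on whether the selective find comes before or after it.

```javascript
// Simplified stand-in for a chained resultset; not the LokiJS Resultset class.
class MiniChain {
  constructor(docs) {
    this.rows = docs;
  }
  // Keep documents whose property exceeds a threshold (stands in for find with $gt).
  findGt(prop, threshold) {
    this.rows = this.rows.filter((doc) => doc[prop] > threshold);
    return this;
  }
  // Apply an arbitrary predicate to every remaining row (stands in for where()).
  where(predicate) {
    this.rows = this.rows.filter(predicate);
    return this;
  }
  data() {
    return this.rows;
  }
}

// 1000 hypothetical users with ages cycling through 0..99.
const people = Array.from({ length: 1000 }, (_, i) => ({
  name: 'user' + i,
  age: i % 100,
}));

let whereCalls = 0;
const countingFilter = (doc) => {
  whereCalls += 1;
  return doc.name.endsWith('7');
};

// find first: where() only sees the rows that survived the find.
whereCalls = 0;
const findFirst = new MiniChain(people).findGt('age', 89).where(countingFilter).data();
const callsFindFirst = whereCalls;

// where first: the predicate runs against the whole collection.
whereCalls = 0;
const whereFirst = new MiniChain(people).where(countingFilter).findGt('age', 89).data();
const callsWhereFirst = whereCalls;
```

Both orderings return the same ten documents, but the predicate runs 100 times in the first case and 1000 times in the second. In loki.js the gap would be wider still, since the leading find() can also use an index.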
Dynamic Views behave similarly to resultsets: if you want to utilize an index, your first filter must be applied using applyFind():
var userview = users.addDynamicView("over30");
userview.applyFind(queryObj);
That query object should refer to a field to which you have applied an index. Dynamic Views run their filters over the whole collection only once, however, so even non-performant query pipelines are fast after they are set up. This is because the filters are re-evaluated on single objects as they are inserted, updated, or deleted from the collection. Because these are single-object evaluations, they avoid the array scan penalty incurred during the first evaluation. The overhead of dynamic views, which ride on top of the resultset, reduces performance of the first evaluation by about 40%; however, subsequent queries are highly optimized (faster than collection.find()). Even with that overhead, our benchmarks show roughly 300k ops/sec on initial evaluation. Depending on update frequency, subsequent evaluations can scale up to over 1 million ops/sec.
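The single-object re-evaluation described above can be sketched as follows. This is a simplified illustration, not the LokiJS DynamicView class: MiniView, evaluateDocument, and removeDocument are hypothetical names, and the caller is assumed to notify the view of each change. The evaluations counter shows that after the initial full pass, each insert or update costs one filter evaluation.

```javascript
// Simplified illustration of incremental view maintenance; not the LokiJS DynamicView class.
class MiniView {
  constructor(docs, filter) {
    this.filter = filter;
    this.evaluations = 0;
    // Initial evaluation: one full pass over the existing documents.
    this.results = new Set(docs.filter((doc) => this.test(doc)));
  }
  // Run the filter on one document and count the evaluation.
  test(doc) {
    this.evaluations += 1;
    return this.filter(doc);
  }
  // Called for an inserted or updated document: only that document is re-tested.
  evaluateDocument(doc) {
    if (this.test(doc)) this.results.add(doc);
    else this.results.delete(doc);
  }
  // Called for a removed document: no filter evaluation is needed.
  removeDocument(doc) {
    this.results.delete(doc);
  }
  data() {
    return Array.from(this.results);
  }
}

const members = [
  { name: 'odin', age: 50 },
  { name: 'thor', age: 35 },
  { name: 'loki', age: 30 },
];
const over30 = new MiniView(members, (doc) => doc.age > 30);
const evaluationsAfterSetup = over30.evaluations; // one per existing document

// Insert: a single evaluation for the new document.
const freya = { name: 'freya', age: 41 };
members.push(freya);
over30.evaluateDocument(freya);

// Update: loki now passes the filter, again with a single evaluation.
members[2].age = 31;
over30.evaluateDocument(members[2]);

const evaluationsAfterChanges = over30.evaluations;
const viewNames = over30.data().map((doc) => doc.name);
```

The two changes add two evaluations in total, regardless of how many documents the collection holds.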
In loki.js, Dynamic Views have an option called persistent. This is not the same as serialized; all created dynamic views will be serialized. A good use of persistent views might be data binding applications. A persistent Dynamic View keeps a copy of the results in its own internal array, filtered and sorted according to your specifications. Results are copied into the internal array during the first data() evaluation, or whenever filters or sorts are dirty (documents inserted, updated, or removed from the view). Once your data is initially populated, consider how many inserts/updates/deletes will be done on the collection between .data() calls. If it is well over 10 or 100, then it might be worth the trade-off in memory overhead associated with persistent dynamic views.
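The copy-on-dirty behavior can be sketched as a cached result array guarded by a dirty flag. This is an assumption-laden illustration rather than LokiJS internals: CachedView, markDirty, and the rebuilds counter are hypothetical, and the caller is assumed to flag each change.

```javascript
// Simplified illustration of a persistent result cache; not LokiJS internals.
class CachedView {
  constructor(source, filter, compare) {
    this.source = source;
    this.filter = filter;
    this.compare = compare;
    this.cache = null; // internal copy of the filtered, sorted results
    this.dirty = true; // true until the first data() call and after each change
    this.rebuilds = 0;
  }
  // Call after any insert, update, or delete that affects the view.
  markDirty() {
    this.dirty = true;
  }
  data() {
    if (this.dirty) {
      // Costs time and memory: filter and sort into a fresh internal array.
      this.cache = this.source.filter(this.filter).sort(this.compare);
      this.rebuilds += 1;
      this.dirty = false;
    }
    return this.cache;
  }
}

const scores = [
  { id: 1, score: 70 },
  { id: 2, score: 95 },
  { id: 3, score: 82 },
];
const topScores = new CachedView(
  scores,
  (doc) => doc.score >= 80,
  (a, b) => b.score - a.score
);

topScores.data();
topScores.data(); // served from the cache, no rebuild
const rebuildsWhenIdle = topScores.rebuilds;

scores.push({ id: 4, score: 99 });
topScores.markDirty();
const idsAfterInsert = topScores.data().map((doc) => doc.id);
const rebuildsAfterInsert = topScores.rebuilds;
```

Repeated data() calls with no intervening changes reuse the same array, which is what makes this pattern attractive for data binding; the price is the extra memory held by the cached copy.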