{"id":106842,"date":"2023-05-18T14:59:03","date_gmt":"2023-05-18T14:59:03","guid":{"rendered":"https:\/\/www.dash.org\/?p=106842"},"modified":"2023-05-19T06:17:03","modified_gmt":"2023-05-19T06:17:03","slug":"grovedb-secondary-indexes","status":"publish","type":"post","link":"https:\/\/www.dash.org\/blog\/grovedb-secondary-indexes\/","title":{"rendered":"GroveDB Secondary Indexes"},"content":{"rendered":"

Secondary indexes are an essential part of most database use cases. They enable massive efficiency gains for almost all queries more complex than single-key retrievals. Yet, until now, there have been essentially no databases which have enabled <\/span>cryptographic proofs<\/span><\/i> for queries on secondary indexes. The engineers at Dash Core Group, in pursuit of greater decentralization, felt that this was an essential feature for Dash Platform, so we created our own solution in the form of a multilayered, specialized, provable database: GroveDB.<\/span><\/p>\n

This article will discuss the applications of secondary index query proofs and why they will make Dash Platform very attractive for dApp developers before diving into the background, architecture, and some implementation details of secondary indexes built on top of GroveDB.<\/span><\/p>\n

Special thanks to Ivan Shumkov, Samuel Westrich, ThePhez, and Virgile Bartolo for their contributions to this article.<\/span><\/p>\n

 <\/p>\n

Applications<\/span><\/h1>\n

Secondary index query proofs will become an invaluable tool for systems that handle sensitive data, as they are the only true method of enabling users to have absolute certainty that the data they receive hasn’t been tampered with, while at the same time allowing for efficient complex queries. Use cases can be imagined for many industries, but in the blockchain space, an industry where trustlessness is a major focus, their need is especially clear.<\/span><\/p>\n

 <\/p>\n

dApps<\/span><\/h4>\n

Many popular blockchain dApps, including MetaMask, Uniswap, and OpenSea, currently rely on external services like The Graph for indexing. However, these services don’t offer any query proofs to ensure the authenticity of query results. Instead, users are forced to rely on economic incentives and reputation systems which <\/span>try<\/span><\/i> to ensure nodes supply authentic data but ultimately provide no guarantees or means of verification. GroveDB’s secondary index query proofs <\/span>can<\/span><\/i> provide such means and guarantees, so many applications are likely to want to use GroveDB.<\/span><\/p>\n

 <\/p>\n

Dash Platform<\/span><\/h4>\n

While GroveDB was built to be a standalone product which any system using RocksDB can integrate, Dash Platform will be the first to do so. Developers who deploy their dApps on Dash Platform will thus have an edge over dApps deployed on other platforms, as they will have security guarantees and query capabilities enabled by GroveDB on a level that can\u2019t be found elsewhere.<\/span><\/p>\n

We at DCG are very excited to see the impact GroveDB\u2019s complex query proofs will have on the blockchain database field, and invite any interested readers, developers, and product owners to reach out with questions. For more information, check out the <\/span>GitHub<\/span><\/a> repository, and keep an eye out for the release of the official website coming soon.<\/span><\/p>\n

 <\/p>\n

Background<\/span><\/h1>\n

GroveDB Documents<\/span><\/h4>\n

GroveDB stores its data as key-value pairs, which is the default method for storing data on blockchain databases. However, it also contains logic that allows the key-value pairs to be combined into a hybrid form of data somewhere between a relational database table record and a NoSQL document. Because they\u2019re handled as JSON objects and even called documents in Dash Platform, we usually just refer to them as documents in the context of GroveDB as well.<\/span><\/p>\n

 <\/p>\n\n\n\n
{<\/span>
\n<\/span>\u00a0 <\/span>“id”<\/span>: <\/span>“EgHsMrtzbMrJSaxixRWSNqIbXShASteoJxUBkAwMFcSveMPfLKLTyyMbwuMDXkl”<\/span>,<\/span>
\n<\/span>\u00a0 <\/span>“name”<\/span>: <\/span>“Alice”<\/span>,<\/span>
\n<\/span>\u00a0 <\/span>“city”<\/span>: <\/span>“New York”<\/span>,<\/span>
\n<\/span>\u00a0 <\/span>“cuisine”<\/span>: <\/span>“Italian”<\/span>,<\/span>
\n<\/span>\u00a0 “acceptsDash”<\/span>: <\/span>true<\/span>
\n<\/span>}<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

A NoSQL document with five key-value fields.<\/span><\/i><\/p>\n

 <\/p>\n

Secondary Indexes<\/span><\/h4>\n

Documents in document-oriented databases and records in relational databases require unique identifiers so they can be distinguished from other documents\/records that may otherwise be identical. Usually, an array of random bytes is used under the \u201cid\u201d field. The ID serves as the <\/span>primary index<\/span><\/i>, so that specific documents\/records can easily be retrieved by querying their ID. However, querying by ID often isn\u2019t very useful. Most applications need to be able to query by specific fields, such as name, city, cuisine, acceptsDash, etc. Indexes built on such fields are <\/span>secondary indexes<\/span><\/i>.<\/span><\/p>\n

With secondary indexes, clients can execute queries such as retrieving the names of all the restaurants within a given city, provided that a secondary index has been created for the <\/span>city<\/span><\/i> field. The database would simply have to navigate to the <\/span>city<\/span><\/i> index and iterate over the documents\/records for the given city. If a client wanted to perform the same query <\/span>without<\/span><\/i> secondary indexes, they would need to iterate through <\/span>every single document\/record<\/span><\/i> in the database, check whether each restaurant is in that city, and then return its name. Through this example it should be clear that secondary indexes enable massive efficiency gains for most database use cases.<\/span><\/p>\n
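<p>The difference between the two query paths can be sketched in a few lines of Python. This is only an illustration of the general idea, not GroveDB code: the sample documents, the shortened IDs, and the function names are all assumptions made for the example, and the index is a plain in-memory dictionary.<\/p>\n

```python
from collections import defaultdict

# Hypothetical sample documents keyed by primary ID (all values are illustrative).
documents = {
    'id1': {'name': 'Alice', 'city': 'New York', 'cuisine': 'Italian', 'acceptsDash': True},
    'id2': {'name': 'Bob', 'city': 'Lisbon', 'cuisine': 'Seafood', 'acceptsDash': False},
    'id3': {'name': 'Carol', 'city': 'New York', 'cuisine': 'Thai', 'acceptsDash': True},
}


def names_by_city_scan(docs, city):
    # Without an index: visit every document and test its city field.
    return sorted(doc['name'] for doc in docs.values() if doc['city'] == city)


def build_index(docs, field):
    # Secondary index: maps each value of the field to the IDs of matching documents.
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        index[doc[field]].append(doc_id)
    return {value: sorted(ids) for value, ids in index.items()}


def names_by_city_indexed(docs, index, city):
    # With an index: go straight to the city's entry and fetch only those documents.
    return sorted(docs[doc_id]['name'] for doc_id in index.get(city, []))


city_index = build_index(documents, 'city')
```

<p>Both functions return the same names for a given city, but the indexed version only touches the documents listed under that city, while the scan touches every document in the database.<\/p>\n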

 <\/p>\n

Cryptographic Proofs<\/span><\/h4>\n

Cryptographic proofs are the lifeblood of trustless databases. If you\u2019re given a cryptographic proof of the results of your query, you can verify with certainty that those results are what\u2019s really stored in the database. This is a very important feature when dealing with, for example, financial applications distributed across a large number of anonymously hosted servers. Without cryptographic proofs, the hosts could easily manipulate the data and clients would have no way of verifying whether the returned data is authentic. Somewhat shockingly, this is how almost every database in the world works today. Every time a user queries a database without proofs, they are <\/span>trusting<\/span><\/i> that the hosts return accurate information. To see exactly how proofs work in GroveDB, see our Merk implementation docs<\/a>.<\/span><\/p>\n
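<p>To make the idea concrete, here is a minimal sketch of a Merkle inclusion proof in Python. It is a deliberately simplified stand-in and not how GroveDB or Merk actually build their proofs: it assumes a plain binary hash tree over a list of values, uses SHA-256 from the standard library, and every function name is hypothetical. The client is assumed to already hold a trusted root hash.<\/p>\n

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def leaf_hash(value):
    # Prefix byte 0 marks a leaf so it cannot be confused with an inner node.
    return h(bytes([0]) + value)


def node_hash(left, right):
    # Prefix byte 1 marks an inner node.
    return h(bytes([1]) + left + right)


def build_levels(values):
    # Level 0 holds the leaf hashes; the last level holds the single root hash.
    # Assumes at least one value.
    levels = [[leaf_hash(v) for v in values]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(node_hash(prev[i], prev[i + 1]))
            else:
                nxt.append(prev[i])  # an unpaired node is promoted unchanged
        levels.append(nxt)
    return levels


def make_proof(levels, index):
    # Server side: collect the sibling hashes on the path from leaf to root.
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, value, proof):
    # Client side: recompute the root from the value and the sibling hashes.
    current = leaf_hash(value)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            current = node_hash(sibling, current)
        else:
            current = node_hash(current, sibling)
    return current == root


values = [b'Alice', b'Bob', b'Carol']
levels = build_levels(values)
root = levels[-1][0]
```

<p>A host that returns a value together with <code>make_proof(levels, i)<\/code> lets the client call <code>verify<\/code> against the trusted root. If the host substitutes a different value, the recomputed hash no longer matches the root and the check fails.<\/p>\n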

 <\/p>\n

Prior Work<\/span><\/h1>\n

While no previous solution has enabled secondary index query <\/span>proofs<\/span><\/i> (at least none that is open source), we can look at how secondary indexes alone have been implemented in other key-value databases to show where GroveDB\u2019s secondary index architecture fits into the grand scheme of things. There are two main categories of secondary index structures for key-value databases: inverted indexes and B-trees:<\/span><\/p>\n