4. Partitioned Databases

A partitioned database forms documents into logical partitions by using a partition key. All documents are assigned to a partition, and many documents are typically given the same partition key. The benefit of partitioned databases is that secondary indices can be significantly more efficient when locating matching documents since their entries are contained within their partition. This means a given secondary index read will only scan a single partition range instead of having to read from a copy of every shard.

As a means to introducing partitioned databases, we’ll consider a motivating use case to describe the benefits of this feature. For this example, we’ll consider a database that stores readings from a large network of soil moisture sensors.

Note

Before reading this document you should be familiar with the theory of sharding in CouchDB.

Traditionally, a document in this database may have something like the following structure:

{
    "_id": "sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
    "_rev":"1-14e8f3262b42498dbd5c672c9d461ff0",
    "sensor_id": "sensor-260",
    "location": [41.6171031, -93.7705674],
    "field_name": "Bob's Corn Field #5",
    "readings": [
        ["2019-01-21T00:00:00", 0.15],
        ["2019-01-21T06:00:00", 0.14],
        ["2019-01-21T12:00:00", 0.16],
        ["2019-01-21T18:00:00", 0.11]
    ]
}

Note

While this example uses IoT sensors, the main thing to consider is that there is a logical grouping of documents. Similar use cases might be documents grouped by user or scientific data grouped by experiment.

So we’ve got a bunch of sensors, all grouped by the field they monitor along with their readouts for a given day (or other appropriate time period).

Along with our documents, we might expect to have two secondary indexes for querying our database that might look something like:

function(doc) {
    if(doc._id.indexOf("sensor-reading-") != 0) {
        return;
    }
    for(var r in doc.readings) {
        emit([doc.sensor_id, r[0]], r[1])
    }
}

and:

function(doc) {
    if(doc._id.indexOf("sensor-reading-") != 0) {
        return;
    }
    emit(doc.field_name, doc.sensor_id)
}

With these two indexes defined, we can easily find all readings for a given sensor, or list all sensors in a given field.

Unfortunately, in CouchDB, when we read from either of these indexes, it requires finding a copy of every shard and asking for any documents related to the particular sensor or field. This means that as our database scales up the number of shards, every index request must perform more work, which is unnecessary since we are only interested in a small number of documents. Fortunately for you, dear reader, partitioned databases were created to solve this precise problem.

4.1. What is a partition?

In the previous section, we introduced a hypothetical database that contains sensor readings from an IoT field monitoring service. In this particular use case, it’s quite logical to group all documents by their sensor_id field. In this case, we would call the sensor_id the partition key.

A good partition has two basic properties. First, it should have a high cardinality. That is, a large partitioned database should have many more partitions than documents in any single partition. A database that has a single partition would be an anti-pattern for this feature. Secondly, the amount of data per partition should be “small”. The general recommendation is to limit individual partitions to less than ten gigabytes (10 GB) of data. Which, for the example of sensor documents, equates to roughly 60,000 years of data.

Note

The max_partition_size under CouchDB dictates the partition limit. The default value for this option is 10GiB but can be changed accordingly. Setting the value for this option to 0 disables the partition limit.

4.2. Why use partitions?

The primary benefit of using partitioned databases is for the performance of partitioned queries. Large databases with lots of documents often have a similar pattern where there are groups of related documents that are queried together.

By using partitions, we can execute queries against these individual groups of documents more efficiently by placing the entire group within a specific shard on disk. Thus, the view engine only has to consult one copy of the given shard range when executing a query instead of executing the query across all q shards in the database. This mean that you do not have to wait for all q shards to respond, which is both efficient and faster.

4.3. Partitions By Example

To create a partitioned database, we simply need to pass a query string parameter:

shell> curl -X PUT 'http://adm:pass@127.0.0.1:5984/my_new_db?partitioned=true'
{"ok":true}

To see that our database is partitioned, we can look at the database information:

shell> curl http://adm:pass@127.0.0.1:5984/my_new_db
{
  "cluster": {
    "n": 3,
    "q": 8,
    "r": 2,
    "w": 2
  },
  "compact_running": false,
  "db_name": "my_new_db",
  "disk_format_version": 7,
  "doc_count": 0,
  "doc_del_count": 0,
  "instance_start_time": "0",
  "props": {
    "partitioned": true
  },
  "purge_seq": "0-g1AAAAFDeJzLYWBg4M...",
  "sizes": {
    "active": 0,
    "external": 0,
    "file": 66784
  },
  "update_seq": "0-g1AAAAFDeJzLYWBg4M..."
}

You’ll now see that the "props" member contains "partitioned": true.

Note

Every document in a partitioned database (except _design and _local documents) must have the format “partition:docid”. More specifically, the partition for a given document is everything before the first colon. The document id is everything after the first colon, which may include more colons.

Note

System databases (such as _users) are not allowed to be partitioned. This is due to system databases already having their own incompatible requirements on document ids.

Now that we’ve created a partitioned database, it’s time to add some documents. Using our earlier example, we could do this as such:

shell> cat doc.json
{
    "_id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
    "sensor_id": "sensor-260",
    "location": [41.6171031, -93.7705674],
    "field_name": "Bob's Corn Field #5",
    "readings": [
        ["2019-01-21T00:00:00", 0.15],
        ["2019-01-21T06:00:00", 0.14],
        ["2019-01-21T12:00:00", 0.16],
        ["2019-01-21T18:00:00", 0.11]
    ]
}
shell> $ curl -X POST -H "Content-Type: application/json" \
            http://adm:pass@127.0.0.1:5984/my_new_db -d @doc.json
{
    "ok": true,
    "id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
    "rev": "1-05ed6f7abf84250e213fcb847387f6f5"
}

The only change required to the first example document is that we are now including the partition name in the document id by prepending it to the old id separated by a colon.

Note

The partition name in the document id is not magical. Internally, the database is simply using only the partition for hashing the document to a given shard, instead of the entire document id.

Working with documents in a partitioned database is no different than a non-partitioned database. All APIs are available, and existing client code will all work seamlessly.

Now that we have created a document, we can get some info about the partition containing the document:

shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260
{
  "db_name": "my_new_db",
  "doc_count": 1,
  "doc_del_count": 0,
  "partition": "sensor-260",
  "sizes": {
    "active": 244,
    "external": 347
  }
}

And we can also list all documents in a partition:

shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_all_docs
{"total_rows": 1, "offset": 0, "rows":[
    {
        "id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
        "key":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
        "value": {"rev": "1-05ed6f7abf84250e213fcb847387f6f5"}
    }
]}

Note that we can use all of the normal bells and whistles available to _all_docs requests. Accessing _all_docs through the /dbname/_partition/name/_all_docs endpoint is mostly a convenience so that requests are guaranteed to be scoped to a given partition. Users are free to use the normal /dbname/_all_docs to read documents from multiple partitions. Both query styles have the same performance.

Next, we’ll create a design document containing our index for getting all readings from a given sensor. The map function is similar to our earlier example except we’ve accounted for the change in the document id.

function(doc) {
    if(doc._id.indexOf(":sensor-reading-") < 0) {
        return;
    }
    for(var r in doc.readings) {
        emit([doc.sensor_id, r[0]], r[1])
    }
}

After uploading our design document, we can try out a partitioned query:

shell> cat ddoc.json
{
    "_id": "_design/sensor-readings",
    "views": {
        "by_sensor": {
            "map": "function(doc) { ... }"
        }
    }
}
shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc.json
{
    "ok": true,
    "id": "_design/sensor-readings",
    "rev": "1-13859808da293bd72fde3b31be97372a"
}
shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
{"total_rows":4,"offset":0,"rows":[
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","1"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","2"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","3"],"value":null}
]}

Hooray! Our first partitioned query. For experienced users, that may not be the most exciting development, given that the only things that have changed are a slight tweak to the document id, and accessing views with a slightly different path. However, for anyone who likes performance improvements, it’s actually a big deal. By knowing that the view results are all located within the provided partition name, our partitioned queries now perform nearly as fast as document lookups!

The last thing we’ll look at is how to query data across multiple partitions. For that, we’ll implement the example sensors by field query from our initial example. The map function will use the same update to account for the new document id format, but is otherwise identical to the previous version:

function(doc) {
    if(doc._id.indexOf(":sensor-reading-") < 0) {
        return;
    }
    emit(doc.field_name, doc.sensor_id)
}

Next, we’ll create a new design doc with this function. Be sure to notice that the "options" member contains "partitioned": false.

shell> cat ddoc2.json
{
  "_id": "_design/all_sensors",
  "options": {
    "partitioned": false
  },
  "views": {
    "by_field": {
      "map": "function(doc) { ... }"
    }
  }
}
shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc2.json
{
    "ok": true,
    "id": "_design/all_sensors",
    "rev": "1-4a8188d80fab277fccf57bdd7154dec1"
}

Note

Design documents in a partitioned database default to being partitioned. Design documents that contain views for queries across multiple partitions must contain the "partitioned": false member in the "options" object.

Note

Design documents are either partitioned or global. They cannot contain a mix of partitioned and global indexes.

And to see a request showing us all sensors in a field, we would use a request like:

shell> curl -u adm:pass http://adm:pass@127.0.0.1:15984/my_new_db/_design/all_sensors/_view/by_field
{"total_rows":1,"offset":0,"rows":[
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":"Bob's Corn Field #5","value":"sensor-260"}
]}

Notice that we’re not using the /dbname/_partition/... path for global queries. This is because global queries, by definition, do not cover individual partitions. Other than having the "partitioned": false parameter in the design document, global design documents and queries are identical in behavior to design documents on non-partitioned databases.

Warning

To be clear, this means that global queries perform identically to queries on non-partitioned databases. Only partitioned queries on a partitioned database benefit from the performance improvements.