4. Partitioned Databases¶
A partitioned database forms documents into logical partitions by using a partition key. All documents are assigned to a partition, and many documents are typically given the same partition key. The benefit of partitioned databases is that secondary indices can be significantly more efficient when locating matching documents since their entries are contained within their partition. This means a given secondary index read will only scan a single partition range instead of having to read from a copy of every shard.
As a means to introducing partitioned databases, we’ll consider a motivating use case to describe the benefits of this feature. For this example, we’ll consider a database that stores readings from a large network of soil moisture sensors.
Traditionally, a document in this database may have something like the following structure:
{
"_id": "sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
"_rev":"1-14e8f3262b42498dbd5c672c9d461ff0",
"sensor_id": "sensor-260",
"location": [41.6171031, -93.7705674],
"field_name": "Bob's Corn Field #5",
"readings": [
["2019-01-21T00:00:00", 0.15],
["2019-01-21T06:00:00", 0.14],
["2019-01-21T12:00:00", 0.16],
["2019-01-21T18:00:00", 0.11]
]
}
Note
While this example uses IoT sensors, the main thing to consider is that there is a logical grouping of documents. Similar use cases might be documents grouped by user or scientific data grouped by experiment.
So we’ve got a bunch of sensors, all grouped by the field they monitor along with their readouts for a given day (or other appropriate time period).
Along with our documents, we might expect to have two secondary indexes for querying our database that might look something like:
function(doc) {
if(doc._id.indexOf("sensor-reading-") != 0) {
return;
}
for(var r in doc.readings) {
emit([doc.sensor_id, r[0]], r[1])
}
}
and:
function(doc) {
if(doc._id.indexOf("sensor-reading-") != 0) {
return;
}
emit(doc.field_name, doc.sensor_id)
}
With these two indexes defined, we can easily find all readings for a given sensor, or list all sensors in a given field.
Unfortunately, in CouchDB, when we read from either of these indexes, it requires finding a copy of every shard and asking for any documents related to the particular sensor or field. This means that as our database scales up the number of shards, every index request must perform more work, which is unnecessary since we are only interested in a small number of documents. Fortunately for you, dear reader, partitioned databases were created to solve this precise problem.
4.1. What is a partition?¶
In the previous section, we introduced a hypothetical database that contains
sensor readings from an IoT field monitoring service. In this particular
use case, it’s quite logical to group all documents by their sensor_id
field. In this case, we would call the sensor_id
the partition key.
A good partition has two basic properties. First, it should have a high cardinality. That is, a large partitioned database should have many more partitions than documents in any single partition. A database that has a single partition would be an anti-pattern for this feature. Secondly, the amount of data per partition should be “small”. The general recommendation is to limit individual partitions to less than ten gigabytes (10 GB) of data. Which, for the example of sensor documents, equates to roughly 60,000 years of data.
Note
The max_partition_size
under CouchDB dictates the partition limit.
The default value for this option is 10GiB but can be changed accordingly.
Setting the value for this option to 0 disables the partition limit.
4.2. Why use partitions?¶
The primary benefit of using partitioned databases is for the performance of partitioned queries. Large databases with lots of documents often have a similar pattern where there are groups of related documents that are queried together.
By using partitions, we can execute queries against these individual groups
of documents more efficiently by placing the entire group within a specific
shard on disk. Thus, the view engine only has to consult one copy of the
given shard range when executing a query instead of executing
the query across all q
shards in the database. This mean that you do
not have to wait for all q
shards to respond, which is both
efficient and faster.
4.3. Partitions By Example¶
To create a partitioned database, we simply need to pass a query string parameter:
shell> curl -X PUT 'http://adm:pass@127.0.0.1:5984/my_new_db?partitioned=true'
{"ok":true}
To see that our database is partitioned, we can look at the database information:
shell> curl http://adm:pass@127.0.0.1:5984/my_new_db
{
"cluster": {
"n": 3,
"q": 8,
"r": 2,
"w": 2
},
"compact_running": false,
"db_name": "my_new_db",
"disk_format_version": 7,
"doc_count": 0,
"doc_del_count": 0,
"instance_start_time": "0",
"props": {
"partitioned": true
},
"purge_seq": "0-g1AAAAFDeJzLYWBg4M...",
"sizes": {
"active": 0,
"external": 0,
"file": 66784
},
"update_seq": "0-g1AAAAFDeJzLYWBg4M..."
}
You’ll now see that the "props"
member contains "partitioned": true
.
Note
Every document in a partitioned database (except _design and _local documents) must have the format “partition:docid”. More specifically, the partition for a given document is everything before the first colon. The document id is everything after the first colon, which may include more colons.
Note
System databases (such as _users) are not allowed to be partitioned. This is due to system databases already having their own incompatible requirements on document ids.
Now that we’ve created a partitioned database, it’s time to add some documents. Using our earlier example, we could do this as such:
shell> cat doc.json
{
"_id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
"sensor_id": "sensor-260",
"location": [41.6171031, -93.7705674],
"field_name": "Bob's Corn Field #5",
"readings": [
["2019-01-21T00:00:00", 0.15],
["2019-01-21T06:00:00", 0.14],
["2019-01-21T12:00:00", 0.16],
["2019-01-21T18:00:00", 0.11]
]
}
shell> $ curl -X POST -H "Content-Type: application/json" \
http://adm:pass@127.0.0.1:5984/my_new_db -d @doc.json
{
"ok": true,
"id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
"rev": "1-05ed6f7abf84250e213fcb847387f6f5"
}
The only change required to the first example document is that we are now including the partition name in the document id by prepending it to the old id separated by a colon.
Note
The partition name in the document id is not magical. Internally, the database is simply using only the partition for hashing the document to a given shard, instead of the entire document id.
Working with documents in a partitioned database is no different than a non-partitioned database. All APIs are available, and existing client code will all work seamlessly.
Now that we have created a document, we can get some info about the partition containing the document:
shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260
{
"db_name": "my_new_db",
"doc_count": 1,
"doc_del_count": 0,
"partition": "sensor-260",
"sizes": {
"active": 244,
"external": 347
}
}
And we can also list all documents in a partition:
shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_all_docs
{"total_rows": 1, "offset": 0, "rows":[
{
"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
"key":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
"value": {"rev": "1-05ed6f7abf84250e213fcb847387f6f5"}
}
]}
Note that we can use all of the normal bells and whistles available to
_all_docs
requests. Accessing _all_docs
through the
/dbname/_partition/name/_all_docs
endpoint is mostly a convenience
so that requests are guaranteed to be scoped to a given partition. Users
are free to use the normal /dbname/_all_docs
to read documents from
multiple partitions. Both query styles have the same performance.
Next, we’ll create a design document containing our index for getting all readings from a given sensor. The map function is similar to our earlier example except we’ve accounted for the change in the document id.
function(doc) {
if(doc._id.indexOf(":sensor-reading-") < 0) {
return;
}
for(var r in doc.readings) {
emit([doc.sensor_id, r[0]], r[1])
}
}
After uploading our design document, we can try out a partitioned query:
shell> cat ddoc.json
{
"_id": "_design/sensor-readings",
"views": {
"by_sensor": {
"map": "function(doc) { ... }"
}
}
}
shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc.json
{
"ok": true,
"id": "_design/sensor-readings",
"rev": "1-13859808da293bd72fde3b31be97372a"
}
shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
{"total_rows":4,"offset":0,"rows":[
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","1"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","2"],"value":null},
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","3"],"value":null}
]}
Hooray! Our first partitioned query. For experienced users, that may not be the most exciting development, given that the only things that have changed are a slight tweak to the document id, and accessing views with a slightly different path. However, for anyone who likes performance improvements, it’s actually a big deal. By knowing that the view results are all located within the provided partition name, our partitioned queries now perform nearly as fast as document lookups!
The last thing we’ll look at is how to query data across multiple partitions. For that, we’ll implement the example sensors by field query from our initial example. The map function will use the same update to account for the new document id format, but is otherwise identical to the previous version:
function(doc) {
if(doc._id.indexOf(":sensor-reading-") < 0) {
return;
}
emit(doc.field_name, doc.sensor_id)
}
Next, we’ll create a new design doc with this function. Be sure to notice
that the "options"
member contains "partitioned": false
.
shell> cat ddoc2.json
{
"_id": "_design/all_sensors",
"options": {
"partitioned": false
},
"views": {
"by_field": {
"map": "function(doc) { ... }"
}
}
}
shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc2.json
{
"ok": true,
"id": "_design/all_sensors",
"rev": "1-4a8188d80fab277fccf57bdd7154dec1"
}
Note
Design documents in a partitioned database default to being
partitioned. Design documents that contain views for queries
across multiple partitions must contain the "partitioned": false
member in the "options"
object.
Note
Design documents are either partitioned or global. They cannot contain a mix of partitioned and global indexes.
And to see a request showing us all sensors in a field, we would use a request like:
shell> curl -u adm:pass http://adm:pass@127.0.0.1:15984/my_new_db/_design/all_sensors/_view/by_field
{"total_rows":1,"offset":0,"rows":[
{"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":"Bob's Corn Field #5","value":"sensor-260"}
]}
Notice that we’re not using the /dbname/_partition/...
path for global
queries. This is because global queries, by definition, do not cover individual
partitions. Other than having the "partitioned": false
parameter in the
design document, global design documents and queries are identical in
behavior to design documents on non-partitioned databases.
Warning
To be clear, this means that global queries perform identically to queries on non-partitioned databases. Only partitioned queries on a partitioned database benefit from the performance improvements.