.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
.. use this file except in compliance with the License. You may obtain a copy of
.. the License at
..
..   http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
.. License for the specific language governing permissions and limitations under
.. the License.

.. _partitioned-dbs:

=====================
Partitioned Databases
=====================

A partitioned database forms documents into logical partitions by using
a partition key. All documents are assigned to a partition, and many documents
are typically given the same partition key. The benefit of partitioned databases
is that secondary indices can be significantly more efficient when locating
matching documents since their entries are contained within their partition.
This means a given secondary index read will only scan a single partition
range instead of having to read from a copy of every shard.

As a means to introducing partitioned databases, we'll consider a motivating
use case to describe the benefits of this feature. For this example, we'll
consider a database that stores readings from a large network of soil
moisture sensors.

.. note::
    Before reading this document you should be familiar with the
    :ref:`theory <cluster/theory>` of :ref:`sharding <cluster/sharding>`
    in CouchDB.

Traditionally, a document in this database may have something like the
following structure:

.. code-block:: javascript

    {
        "_id": "sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
        "_rev":"1-14e8f3262b42498dbd5c672c9d461ff0",
        "sensor_id": "sensor-260",
        "location": [41.6171031, -93.7705674],
        "field_name": "Bob's Corn Field #5",
        "readings": [
            ["2019-01-21T00:00:00", 0.15],
            ["2019-01-21T06:00:00", 0.14],
            ["2019-01-21T12:00:00", 0.16],
            ["2019-01-21T18:00:00", 0.11]
        ]
    }

.. note::
    While this example uses IoT sensors, the main thing to consider is that
    there is a logical grouping of documents. Similar use cases might be
    documents grouped by user or scientific data grouped by experiment.

So we've got a bunch of sensors, all grouped by the field they monitor
along with their readouts for a given day (or other appropriate time period).

Along with our documents, we might expect to have two secondary indexes
for querying our database that might look something like:

.. code-block:: javascript

    function(doc) {
        if(doc._id.indexOf("sensor-reading-") != 0) {
            return;
        }
        for(var r in doc.readings) {
            emit([doc.sensor_id, r[0]], r[1])
        }
    }

and:

.. code-block:: javascript

    function(doc) {
        if(doc._id.indexOf("sensor-reading-") != 0) {
            return;
        }
        emit(doc.field_name, doc.sensor_id)
    }

With these two indexes defined, we can easily find all readings for a given
sensor, or list all sensors in a given field.

Unfortunately, in CouchDB, when we read from either of these indexes, it
requires finding a copy of every shard and asking for any documents related
to the particular sensor or field. This means that as our database scales
up the number of shards, every index request must perform more work,
which is unnecessary since we are only interested in a small number of documents.
Fortunately for you, dear reader, partitioned databases were created to solve
this precise problem.

What is a partition?
====================

In the previous section, we introduced a hypothetical database that contains
sensor readings from an IoT field monitoring service. In this particular
use case, it's quite logical to group all documents by their ``sensor_id``
field. In this case, we would call the ``sensor_id`` the partition key.

A good partition has two basic properties. First, it should have a high
cardinality. That is, a large partitioned database should have many more
partitions than documents in any single partition. A database that has
a single partition would be an anti-pattern for this feature. Secondly,
the amount of data per partition should be "small". The general
recommendation is to limit individual partitions to less than ten
gigabytes (10 GB) of data. Which, for the example of sensor documents,
equates to roughly 60,000 years of data.

.. note::

    The ``max_partition_size`` under CouchDB dictates the partition limit.
    The default value for this option is 10GiB but can be changed accordingly.
    Setting the value for this option to 0 disables the partition limit.

Why use partitions?
===================

The primary benefit of using partitioned databases is for the performance
of partitioned queries. Large databases with lots of documents often
have a similar pattern where there are groups of related documents that
are queried together.

By using partitions, we can execute queries against these individual groups
of documents more efficiently by placing the entire group within a specific
shard on disk. Thus, the view engine only has to consult one copy of the
given shard range when executing a query instead of executing
the query across all ``q`` shards in the database. This mean that you do
not have to wait for all ``q`` shards to respond, which is both
efficient and faster.

Partitions By Example
=====================

To create a partitioned database, we simply need to pass a query string
parameter:

.. code-block:: bash

    shell> curl -X PUT 'http://adm:pass@127.0.0.1:5984/my_new_db?partitioned=true'
    {"ok":true}

To see that our database is partitioned, we can look at the database
information:

.. code-block:: bash

    shell> curl http://adm:pass@127.0.0.1:5984/my_new_db
    {
      "cluster": {
        "n": 3,
        "q": 8,
        "r": 2,
        "w": 2
      },
      "compact_running": false,
      "db_name": "my_new_db",
      "disk_format_version": 7,
      "doc_count": 0,
      "doc_del_count": 0,
      "instance_start_time": "0",
      "props": {
        "partitioned": true
      },
      "purge_seq": "0-g1AAAAFDeJzLYWBg4M...",
      "sizes": {
        "active": 0,
        "external": 0,
        "file": 66784
      },
      "update_seq": "0-g1AAAAFDeJzLYWBg4M..."
    }

You'll now see that the ``"props"`` member contains ``"partitioned": true``.

.. note::

    Every document in a partitioned database (except _design
    and _local documents) must have the format “partition:docid”.
    More specifically, the partition for a given document is
    everything before the first colon. The document id is everything
    after the first colon, which may include more colons.

.. note::

    System databases (such as _users) are *not* allowed to be partitioned. This is
    due to system databases already having their own incompatible
    requirements on document ids.

Now that we've created a partitioned database, it's time to add some documents.
Using our earlier example, we could do this as such:

.. code-block:: bash

    shell> cat doc.json
    {
        "_id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
        "sensor_id": "sensor-260",
        "location": [41.6171031, -93.7705674],
        "field_name": "Bob's Corn Field #5",
        "readings": [
            ["2019-01-21T00:00:00", 0.15],
            ["2019-01-21T06:00:00", 0.14],
            ["2019-01-21T12:00:00", 0.16],
            ["2019-01-21T18:00:00", 0.11]
        ]
    }
    shell> $ curl -X POST -H "Content-Type: application/json" \
                http://adm:pass@127.0.0.1:5984/my_new_db -d @doc.json
    {
        "ok": true,
        "id": "sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
        "rev": "1-05ed6f7abf84250e213fcb847387f6f5"
    }

The only change required to the first example document is that we are now
including the partition name in the document id by prepending it to the
old id separated by a colon.

.. note::

    The partition name in the document id is not magical. Internally,
    the database is simply using only the partition for hashing
    the document to a given shard, instead of the entire document id.

Working with documents in a partitioned database is no different than
a non-partitioned database. All APIs are available, and existing client
code will all work seamlessly.

Now that we have created a document, we can get some info about the partition
containing the document:

.. code-block:: bash

    shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260
    {
      "db_name": "my_new_db",
      "doc_count": 1,
      "doc_del_count": 0,
      "partition": "sensor-260",
      "sizes": {
        "active": 244,
        "external": 347
      }
    }

And we can also list all documents in a partition:

.. code-block:: bash

    shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_all_docs
    {"total_rows": 1, "offset": 0, "rows":[
        {
            "id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
            "key":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf",
            "value": {"rev": "1-05ed6f7abf84250e213fcb847387f6f5"}
        }
    ]}

Note that we can use all of the normal bells and whistles available to
``_all_docs`` requests. Accessing ``_all_docs`` through the
``/dbname/_partition/name/_all_docs`` endpoint is mostly a convenience
so that requests are guaranteed to be scoped to a given partition. Users
are free to use the normal ``/dbname/_all_docs`` to read documents from
multiple partitions. Both query styles have the same performance.

Next, we'll create a design document containing our index for
getting all readings from a given sensor. The map function is similar to
our earlier example except we've accounted for the change in the document
id.

.. code-block:: javascript

    function(doc) {
        if(doc._id.indexOf(":sensor-reading-") < 0) {
            return;
        }
        for(var r in doc.readings) {
            emit([doc.sensor_id, r[0]], r[1])
        }
    }

After uploading our design document, we can try out a partitioned query:

.. code-block:: bash

    shell> cat ddoc.json
    {
        "_id": "_design/sensor-readings",
        "views": {
            "by_sensor": {
                "map": "function(doc) { ... }"
            }
        }
    }
    shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc.json
    {
        "ok": true,
        "id": "_design/sensor-readings",
        "rev": "1-13859808da293bd72fde3b31be97372a"
    }
    shell> curl http://adm:pass@127.0.0.1:5984/my_new_db/_partition/sensor-260/_design/sensor-readings/_view/by_sensor
    {"total_rows":4,"offset":0,"rows":[
    {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","0"],"value":null},
    {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","1"],"value":null},
    {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","2"],"value":null},
    {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":["sensor-260","3"],"value":null}
    ]}

Hooray! Our first partitioned query. For experienced users, that may not
be the most exciting development, given that the only things that have
changed are a slight tweak to the document id, and accessing views with
a slightly different path. However, for anyone who likes performance
improvements, it's actually a big deal. By knowing that the view results
are all located within the provided partition name, our partitioned
queries now perform nearly as fast as document lookups!

The last thing we'll look at is how to query data across multiple partitions.
For that, we'll implement the example sensors by field query from our
initial example. The map function will use the same update to account
for the new document id format, but is otherwise identical to the previous
version:

.. code-block:: javascript

    function(doc) {
        if(doc._id.indexOf(":sensor-reading-") < 0) {
            return;
        }
        emit(doc.field_name, doc.sensor_id)
    }

Next, we'll create a new design doc with this function. Be sure to notice
that the ``"options"`` member contains ``"partitioned": false``.

.. code-block:: bash

    shell> cat ddoc2.json
    {
      "_id": "_design/all_sensors",
      "options": {
        "partitioned": false
      },
      "views": {
        "by_field": {
          "map": "function(doc) { ... }"
        }
      }
    }
    shell> $ curl -X POST -H "Content-Type: application/json" http://adm:pass@127.0.0.1:5984/my_new_db -d @ddoc2.json
    {
        "ok": true,
        "id": "_design/all_sensors",
        "rev": "1-4a8188d80fab277fccf57bdd7154dec1"
    }

.. note::

    Design documents in a partitioned database default to being
    partitioned. Design documents that contain views for queries
    across multiple partitions must contain the ``"partitioned": false``
    member in the ``"options"`` object.

.. note::

    Design documents are either partitioned or global. They cannot
    contain a mix of partitioned and global indexes.

And to see a request showing us all sensors in a field, we would use a
request like:

.. code-block:: bash

    shell> curl -u adm:pass http://adm:pass@127.0.0.1:15984/my_new_db/_design/all_sensors/_view/by_field
    {"total_rows":1,"offset":0,"rows":[
    {"id":"sensor-260:sensor-reading-ca33c748-2d2c-4ed1-8abf-1bca4d9d03cf","key":"Bob's Corn Field #5","value":"sensor-260"}
    ]}

Notice that we're not using the ``/dbname/_partition/...`` path for global
queries. This is because global queries, by definition, do not cover individual
partitions. Other than having the ``"partitioned": false`` parameter in the
design document, global design documents and queries are identical in
behavior to design documents on non-partitioned databases.

.. warning::

    To be clear, this means that global queries perform identically to
    queries on non-partitioned databases. Only partitioned queries
    on a partitioned database benefit from the performance improvements.