structuring_for_strong_consistency

Structuring Data for Strong Consistency

This API is supported for first-generation runtimes and can be used when upgrading to corresponding second-generation runtimes. If you are updating to the App Engine Java 11/17 Python 3 PHP 7/8 Go 1.12+ runtime, refer to the migration guide to learn about your migration options for App Engine.

Cloud Datastore provides high availability, scalability and durability by distributing data over many machines and using synchronous replication over a wide geographic area. However, there is a tradeoff in this design, which is that the write throughput for any single entity group is limited to about one commit per second, and there are limitations on queries or transactions that span multiple entity groups. This page describes these limitations in more detail and discusses best practices for structuring your data to support strong consistency while still meeting your application's write throughput requirements.

Strongly-consistent reads always return current data, and, if performed within a transaction, will appear to come from a single, consistent snapshot. However, queries must specify an ancestor filter in order to be strongly-consistent or participate in a transaction, and transactions can involve at most 25 entity groups. Eventually-consistent reads do not have those limitations, and are adequate in many cases. Using eventually-consistent reads can allow you to distribute your data among a larger number of entity groups, enabling you to obtain greater write throughput by executing commits in parallel on the different entity groups. But, you need to understand the characteristics of eventually-consistent reads in order to determine whether they are suitable for your application:

The results from these reads might not reflect the latest transactions. This can occur because these reads do not ensure that the replica they are running on is up-to-date. Instead, they use whatever data is available on that replica at the time of query execution. Replication latency is almost always less than a few seconds.
A committed transaction that spanned multiple entities might appear to have been applied to some of the entities and not others. Note, though, that a transaction will never appear to have been partially applied within a single entity.
The query results can include entities that should not have been included according to the filter criteria, and might exclude entities that should have been included. This can occur because indexes might be read at a different version than the entity itself is read at.

To understand how to structure your data for strong consistency, compare two different approaches for a simple guestbook application. The first approach creates a new root entity for each entity that is created:

View Guestbook.java on GitHub

It then queries on the entity kind Greeting for the ten most recent greetings.

View Guestbook.java on GitHub

However, because you are using a non-ancestor query, the replica used to perform the query in this scheme might not have seen the new greeting by the time the query is executed. Nonetheless, nearly all writes will be available for non-ancestor queries within a few seconds of commit. For many applications, a solution that provides the results of a non-ancestor query in the context of the current user's own changes will usually be sufficient to make such replication latencies completely acceptable.

If strong consistency is important to your application, an alternate approach is to write entities with an ancestor path that identifies the same root entity across all entities that must be read in a single, strongly-consistent ancestor query:

View GuestbookStrong.java on GitHub

You will then be able to perform a strongly-consistent ancestor query within the entity group identified by the common root entity:

View GuestbookStrong.java on GitHub

This approach achieves strong consistency by writing to a single entity group per guestbook, but it also limits changes to the guestbook to no more than 1 write per second (the supported limit for entity groups). If your application is likely to encounter heavier write usage, you might need to consider using other means: for example, you might put recent posts in a memcache with an expiration and display a mix of recent posts from the memcache and Datastore, or you might cache them in a cookie, put some state in the URL, or something else entirely. The goal is to find a caching solution that provides the data for the current user for the period of time in which the user is posting to your application. Remember, if you do a get, an ancestor query, or any operation within a transaction, you will always see the most recently written data.

structuring_for_strong_consistency

Structuring Data for Strong Consistency

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!