Data Consistency: the Don Quixote of Sagas

Antonio Alexander
5 min read · Feb 10, 2023

I had a kind of overall problem: in developing microservices (and applications in general) I heard about sagas a lot, and they sounded like they could be the solution. My problem was ensuring data consistency — I wanted to be sure that when data was mutated, especially in a concurrent environment, changes didn’t devolve into read/write race conditions, since MOST mutations require a read followed by a write, with an unknown amount of time in between. I think that sagas are a red herring here: they don’t solve the problem of data consistency, they solve the problem of logical consistency. My goal with this article is not only to define what ‘data consistency’ means, but to show you what sagas aren’t meant to solve and, by extension, the incredibly narrow use case that they DO solve.

Data consistency is the idea that if data is mutated by multiple entities, each of those entities is acting upon the most “correct” version of the data. I think when we write CRUD APIs, or any kind of API that mutates data, we ignore the hidden requirement of a feedback loop. SOME API requests are inherently consistent in terms of logic and data, like creating a new entity or incrementing a field by one. These work because they’re eventually consistent by construction: (1) with a valid primary key, creation will succeed in the sense that the entity will either be created or already exist, and (2) the field is incremented by one rather than set to a specific amount, so the order of concurrent increments doesn’t matter. Trickier operations, like updating a field with a user-supplied value or mutating multiple objects within a hierarchy, can get hairy.
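To make that difference concrete, here’s a minimal Go sketch (assuming a hypothetical counters table and Postgres-style placeholders; none of this comes from the bludgeon code) contrasting an increment that’s consistent by construction with a read-then-write that isn’t:

```go
package consistency

import (
	"context"
	"database/sql"
)

// incrementAtomic is consistent by construction: the database applies "+1"
// to whatever the current value is, so concurrent calls can't overwrite
// one another.
func incrementAtomic(ctx context.Context, db *sql.DB, id string) error {
	_, err := db.ExecContext(ctx,
		`UPDATE counters SET value = value + 1 WHERE id = $1`, id)
	return err
}

// incrementReadThenWrite is NOT consistent: between the SELECT and the
// UPDATE another writer can change the row, and that change is silently
// lost when we write back our stale value + 1.
func incrementReadThenWrite(ctx context.Context, db *sql.DB, id string) error {
	var value int
	if err := db.QueryRowContext(ctx,
		`SELECT value FROM counters WHERE id = $1`, id).Scan(&value); err != nil {
		return err
	}
	_, err := db.ExecContext(ctx,
		`UPDATE counters SET value = $1 WHERE id = $2`, value+1, id)
	return err
}
```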

A list of the problems I wanted to solve is below; I found that the core issue in each of these is ‘data consistency’:

· When an entity could be created concurrently, how can we ensure that the “same” object isn’t created twice?

· If there are two instances of a given application (e.g. horizontal scaling) and the same endpoint is executed on the same entity at the same time, how can we ensure data consistency?

· If I have a service with a single responsibility, but its duties require multiple tables within its own database, how do I ensure data consistency for endpoints that have to mutate multiple tables?

· What happens when downstream services are unavailable and you’ve mutated data?

Although it’s underwhelming, the solutions to most of these problems live at the database level; unfortunately, the most scalable data-storage option for MOST microservices is still a database, of which there’s only one per service (if you’re sane).

I have a not-so-hypothetical application, bludgeon, in which there’s a timer object, and this timer object has a comment field. This hypothetical application will [eventually] have an endpoint that lets you update the comment for the Timer. Obviously, if you update the comment and then read the timer, it should return the timer object with your new comment. Not so obviously, if you update the comment and, right before you read it, someone else ALSO updates the comment, the timer you read won’t contain the comment you just edited.

Yes, you could use a RETURNING clause and/or have the API write and read atomically to avoid this SNAFU…but that doesn’t prove my point (haha). This is an example of a super simple race condition: although the mutation was successful, for practical purposes your specific mutation was lost — quietly, I might add. In this case, the data is NOT consistent because there’s no feedback when you mutate an older version of the data. Like read race conditions, this CAN be super benign, but think of more complex mutations; consider the implications of two people concurrently editing an entity over and over again.
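For reference, a minimal sketch of the write-and-read-atomically approach, assuming a Postgres-style timers table with id and comment columns (table and column names are mine, not from the bludgeon schema):

```go
package consistency

import (
	"context"
	"database/sql"
)

// UpdateComment writes the new comment and reads back the resulting row in a
// single statement (Postgres-style RETURNING), so the caller sees exactly the
// row its own write produced, even if another write lands immediately after.
func UpdateComment(ctx context.Context, db *sql.DB, id, comment string) (string, error) {
	var current string
	err := db.QueryRowContext(ctx,
		`UPDATE timers SET comment = $1 WHERE id = $2 RETURNING comment`,
		comment, id).Scan(&current)
	return current, err
}
```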

One way to ensure data consistency is versioning: a single field, often an integer, that is atomically incremented each time the data is mutated. Versioning provides a feedback loop — the ability to answer the question “Has the data changed since I last read it?” — which allows you to make additional decisions (a sketch of the version check follows this list):

  • For a UI, you could re-read the data, display the most recent data and then allow the user to attempt to mutate the data again (ensuring consistency)
  • For a complex process, you could perform the version check to ensure that your logic is working with the most recent version of the data
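Here’s a minimal sketch of that version check, assuming a timers table with a version column (again, the table and column names are illustrative, not the bludgeon schema):

```go
package consistency

import (
	"context"
	"database/sql"
	"errors"
)

// ErrVersionConflict signals that the row changed since it was last read.
var ErrVersionConflict = errors.New("data has changed since it was last read")

// UpdateCommentVersioned only applies the mutation if the row still has the
// version the caller read; otherwise it reports a conflict so the caller can
// re-read and decide what to do (retry, merge, or give up).
func UpdateCommentVersioned(ctx context.Context, db *sql.DB, id, comment string, readVersion int64) error {
	result, err := db.ExecContext(ctx,
		`UPDATE timers SET comment = $1, version = version + 1
		  WHERE id = $2 AND version = $3`,
		comment, id, readVersion)
	if err != nil {
		return err
	}
	affected, err := result.RowsAffected()
	if err != nil {
		return err
	}
	if affected == 0 {
		return ErrVersionConflict
	}
	return nil
}
```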

Yes, I agree that this seems a bit overkill, and no, not every field requires this level of data consistency, but the most difficult thing about race conditions and data consistency is identifying that you’ve lost it. In a monolith, where there’s a single database, you can create atomic queries across tables, using foreign keys and the like, to ensure data consistency; but when your databases sprawl (as they do under the single-responsibility principle of microservices), data consistency must be implemented at the application level.

Data consistency is NOT logical consistency

This is the TLDR of the entire article: I thought that sagas would solve my data consistency problem(s), and they don’t; their job is to ensure logical consistency, NOT data consistency. All of my problems stem from data consistency rather than logical consistency. Race conditions, specifically, can create logical inconsistencies, but generally as a result of data inconsistencies…first. Here are some two-sentence solutions to the problems mentioned above:

  • You can ensure the “same” object isn’t created twice by ensuring that the object itself has a candidate (or natural) key
  • You can ensure data consistency when two instances of an application are modifying the same object by comparing the version of the data you’ve read to the version being modified; this lets you identify data inconsistency…and do something (or nothing)
  • If your microservice needs to ensure data consistency between multiple tables within its own database, you have to use some combination of foreign key constraints and transactions; use database normalization
  • When downstream services are unavailable and you have to mutate data (or already have), you have to use compensating transactions to undo the dependent mutation; this is the basic use case for a saga (see the sketch after this list). In addition, you should implement some kind of circuit breaker pattern so you don’t unnecessarily overwhelm that downstream service.
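As a rough illustration of that last point, here’s a minimal sketch of a compensating transaction; the TimerClient interface and its method names are hypothetical, not from the bludgeon API. We mutate local data first and, if the downstream call fails, undo the dependent mutation:

```go
package consistency

import (
	"context"
	"fmt"
)

// TimerClient is a hypothetical interface over the local store and a
// downstream service; the method names are illustrative only.
type TimerClient interface {
	CreateTimer(ctx context.Context, id string) error
	DeleteTimer(ctx context.Context, id string) error         // compensating transaction
	RegisterDownstream(ctx context.Context, id string) error  // call to the downstream service
}

// CreateAndRegister mutates local data first and, if the downstream call
// fails, applies the compensating transaction so the system stays logically
// consistent — the narrow problem sagas actually solve.
func CreateAndRegister(ctx context.Context, client TimerClient, id string) error {
	if err := client.CreateTimer(ctx, id); err != nil {
		return err
	}
	if err := client.RegisterDownstream(ctx, id); err != nil {
		// undo the dependent mutation rather than leave a half-finished state
		if deleteErr := client.DeleteTimer(ctx, id); deleteErr != nil {
			return fmt.Errorf("downstream failed (%v) and compensation failed: %w", err, deleteErr)
		}
		return fmt.Errorf("downstream unavailable, compensated: %w", err)
	}
	return nil
}
```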

Again, data consistency is NOT logical consistency, and like most things I write about, the best way to solve your problems is to be honest and understand the problem just as much as the solution. In my case, I realized that the root of the problem I was trying to solve was data consistency: I wanted to ensure that if I was editing a version of the data different from what I’d originally read, I’d be able to know and make a decision one way or another.

And lastly…you’re in luck: I hate to just theory fight (an FGC reference), so I put together some test code that attempts to prove these ideas and provide a proof of concept in Go. The repo is available at https://github.com/antonio-alexander/go-blog-data-consistency. Try the docker-compose and read through the README.md for more information.

