Data Trinity: Data Changes and Eventual Inconsistency (Part 2.2)

Eventual Inconsistency between Real World and Software Systems

Aug 27, 2020

In part 1 we modeled our domain via attributes and entities. Now it's time to fill it with life. Life is all about change. That's true for humans as well as for software systems.

Use events or transactions to model change
Eventual Inconsistency: Mind the gap between the real world and software systems

https://pixabay.com/get/57e4d1414d53a914f6d1867dda35367b1c37dbe655577348_1920.jpg

Cockpit at night, pixabay.com

Data Changes

There are 3 ways to model changes to our application state: mutation, events, and transactions.

Mutation is the worst kind of change. On the CPU we have to mutate the values inside the CPU's registers because we have only a few of them. As we move up the abstraction layers to high-level programming languages, we need less and less mutation in our code - the mutating parts are hidden inside abstractions.

Mutation is the root of most of our bugs, I would guess 90% of them. Some value changed, but we did not expect it to change. Maybe we do not even know that something was changed. The problem gets worse with parallel processing and multi-core CPUs. Sometimes a bug shows up and sometimes it does not, depending on how the OS scheduled the threads. These kinds of bugs are super hard to catch. Use immutable values wherever possible and only as much mutation as necessary.

A typical Clojure application has mutation in 1 or 2 places. The rest is immutable. Compare this to a medium-sized Java application, where tens of thousands of objects are mutable. Where is it easier to catch a race condition bug?
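To make the "value changed but we did not expect it" problem concrete, here is a minimal sketch in Python (chosen only for illustration, the article itself is language-agnostic). The names `stock`, `report` and `Item` are made up for this example.

```python
from dataclasses import dataclass, replace

# Mutable version: two names share one list, so a change made through
# one name silently shows up through the other.
stock = ["book", "pen"]
report = stock            # no copy, just a second reference
report.append("lamp")     # mutates the shared list
# `stock` now also contains "lamp", although nobody touched `stock`.


# Immutable version: a frozen dataclass cannot be changed in place.
@dataclass(frozen=True)
class Item:
    name: str
    quantity: int


books = Item("book", 10)
# "Changing" a value means creating a new one; the old one stays intact.
fewer_books = replace(books, quantity=8)
```

Anyone holding a reference to `books` can rely on it never changing, which is the property that makes concurrent code easier to reason about.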

Events are immutable because they describe something that happened in the past. We can keep a history of all events that happened in our application. When we want to know the state of our application, we calculate it by aggregating all events that happened until now. The result is our application state as of now. (See the Redux pattern [2].)
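A minimal sketch of deriving state from events, assuming a toy warehouse domain. The event names `goods_received` and `goods_shipped` and the function `apply_event` are invented for this example; they are not part of Redux or any library.

```python
from functools import reduce

# Hypothetical event history: each event is an immutable tuple of
# (event name, item, quantity).
events = (
    ("goods_received", "book", 10),
    ("goods_shipped", "book", 3),
    ("goods_received", "pen", 5),
)


def apply_event(state, event):
    """Return a new state for one event; the old state is never modified."""
    name, item, quantity = event
    current = state.get(item, 0)
    if name == "goods_received":
        return {**state, item: current + quantity}
    if name == "goods_shipped":
        return {**state, item: current - quantity}
    return state  # unknown events leave the state unchanged


# The application state "as of now" is the aggregation of all events.
state_now = reduce(apply_event, events, {})
```

Reducing over only a prefix of `events` gives the state as of an earlier point in time, which is what keeping the full history buys us.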

One gigantic drawback of events is that they need a name. Without a name we cannot distinguish one event from another. But naming is hard! Choosing a good name is very hard, and in large applications you run out of good names very fast. Secondly, names tend to get associated with a type hierarchy, which can make the software architecture too rigid.

Database-like transactions are preferable. Transactions do not need to be named, and they can be verified against a database schema, which makes them safer to use.

Modern databases use a transaction log to store transactions (events) in a fail-safe way. When the database crashes, it can look into the transaction log on restart and replay the transactions that were logged but had not yet been written to the data in the database. [3]
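The recovery idea can be sketched in a few lines. This is a heavily simplified illustration, not how Postgres or any real database implements its log; the structure of `log`, the field names `id` and `set`, and the function `recover` are assumptions made for this example.

```python
# Hypothetical transaction log: every transaction is appended here
# before it is applied to the data.
log = [
    {"id": 1, "set": {"book": 10}},
    {"id": 2, "set": {"pen": 5}},
    {"id": 3, "set": {"book": 7}},
]

# The data as found after a crash: only transaction 1 had reached it.
data = {"book": 10}
last_applied_id = 1


def recover(data, log, last_applied_id):
    """Replay every logged transaction newer than last_applied_id."""
    recovered = dict(data)  # work on a copy, leave the input untouched
    for tx in log:
        if tx["id"] > last_applied_id:
            recovered.update(tx["set"])
            last_applied_id = tx["id"]
    return recovered, last_applied_id


recovered, last_id = recover(data, log, last_applied_id)
```

Because the log is append-only and ordered, replaying it from the last applied position brings the data back to the state it would have had without the crash.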

When we give the database a schema, the database can verify whether a transaction is valid with respect to that schema. The schema describes declaratively what the shape of the data should look like, whereas the events / Redux pattern [2] describes how to process events to derive a data shape. The schema is transparent, the Redux pattern is opaque.
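As a rough illustration of what "verify a transaction against a schema" means, here is a small sketch. The schema format, the attribute names `item/name` and `item/quantity`, and the function `validate` are hypothetical and do not correspond to any specific database.

```python
# Hypothetical schema: attribute name -> required Python type.
# It is plain data, so anyone can read what shape the data must have.
SCHEMA = {"item/name": str, "item/quantity": int}


def validate(transaction, schema=SCHEMA):
    """Return a list of problems; an empty list means the transaction is valid."""
    problems = []
    for attribute, value in transaction.items():
        if attribute not in schema:
            problems.append(f"unknown attribute: {attribute}")
        elif not isinstance(value, schema[attribute]):
            problems.append(f"wrong type for {attribute}")
    return problems


ok = validate({"item/name": "book", "item/quantity": 10})
bad = validate({"item/name": "book", "item/quantity": "ten", "item/color": "red"})
```

Note that the transactions themselves carry no name. The check works purely on the shape of the data.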

Eventual Inconsistency

Imagine we manage a warehouse. Our database tracks what we store in the warehouse. It is updated when we receive deliveries and when we send goods to customers. Now let's imagine one of our employees steals a few books. Our database will not know about this and will be inconsistent with the real world.

This simple story from life reveals a very nasty truth about our systems. The state of our system will eventually not be in line with reality. Think about the crashes of Boeing's 737 MAX 8. One (of two) angle-of-attack sensors sent wrong data to the plane's MCAS flight control software. The software was reportedly badly programmed by low-paid, outsourced programmers. [1] It took only one sensor's data as input - the wrong one. Result: 300+ dead.

Our software systems will eventually be inconsistent with the real world. The questions are: Is being inconsistent acceptable? How do we know we are inconsistent? How do we get consistent again? There is no single answer. It depends. For sure the answer for MCAS is different from the answer for a warehouse database.
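For the warehouse, one possible answer to "how do we know?" is a regular physical stock count that is compared with the database. The sketch below assumes such a count exists; the function `find_discrepancies` and the sample numbers are invented for this example.

```python
def find_discrepancies(database, counted):
    """Compare recorded stock with a physical count.

    Returns {item: (recorded, counted)} for every item where the two differ.
    """
    items = set(database) | set(counted)
    return {
        item: (database.get(item, 0), counted.get(item, 0))
        for item in sorted(items)
        if database.get(item, 0) != counted.get(item, 0)
    }


database = {"book": 10, "pen": 5}
counted = {"book": 7, "pen": 5}  # three books were stolen

discrepancies = find_discrepancies(database, counted)

# One way to get consistent again: trust the physical count.
corrected = {
    **database,
    **{item: real for item, (_, real) in discrepancies.items()},
}
```

Whether overwriting the database with the count is acceptable is exactly the "it depends" part: for a warehouse it may be fine, for flight control software the disagreement itself has to be handled safely.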

[1] MCAS development was outsourced to save a few dollars: https://www.onlinecitizenasia.com/2019/06/29/software-used-in-737-max-crashes-linked-to-indian-software-companies/

[2] Redux pattern: https://redux.js.org/introduction/core-concepts

[3] Postgres transaction log: http://www.interdb.jp/pg/pgsql09.html

Data Driver
