Time is essential to the modern web. From Facebook's news feed to financial price data, everything depends on it. Yet in our programming languages and software systems, time seldom gets the attention it needs.
I have two quizzes for you:
At the same Time at Different Locations
Clock Drift
Clocks of different time zones, pixabay.com
Imagine we stop time. The CPU cannot process any data because it is stuck in a cycle. Our software systems do nothing. The state of our system is frozen and cannot change anymore.
Constructs for time in modern programming languages are still in their infancy. In the early days of Java, time constructs were not even thread safe (java.text.SimpleDateFormat, for example, must not be shared between threads without synchronization), which shows how little importance time was given during the development of Java. A good overview of the obstacles and falsehoods around dates and time can be found in [2].
Quiz 1: At the same Time at Different Locations
The problem: In my first startup I had to hunt down a weird bug in our system. It took me days to uncover the following. Multiple microservices connected via JDBC to the same PostgreSQL database and ran the same query against the same unchanged data set, yet each microservice got a different result set back. Can you guess the problem?
The kicker: I discovered that the Java virtual machine and JDBC use the time zone of the local machine by default, not UTC (Coordinated Universal Time). Each microservice was deployed in a different data center around the world. When querying the data, the dates in the result set were localized for each microservice, which yielded slightly different results for each one.
The solution: always default to UTC everywhere. Aviation does this right, and our cloud services should too. The only place where time zones should matter is the customer-facing front end, with a few exceptions (e.g. statistics that are calculated in the backend and shown to the customer).
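To make the effect concrete, here is a minimal sketch of how one and the same instant lands on different calendar dates depending on the zone it is rendered in. It is not the original service code: the class name UtcEverywhere, the helper dateIn and the sample timestamp are made up for illustration, and pinning the default zone at startup is only one possible way to apply the "UTC everywhere" rule.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.TimeZone;

class UtcEverywhere {
    // Calendar date on which the given instant falls in the given zone.
    static LocalDate dateIn(Instant instant, ZoneId zone) {
        return instant.atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        // Pin the JVM default zone at startup so that APIs which silently
        // use the default zone behave the same on every host.
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));

        // Hypothetical event timestamp, stored as an absolute instant.
        Instant event = Instant.parse("2024-01-15T23:30:00Z");

        System.out.println(dateIn(event, ZoneId.of("Europe/Berlin")));       // 2024-01-16
        System.out.println(dateIn(event, ZoneId.of("America/Los_Angeles"))); // 2024-01-15
        System.out.println(dateIn(event, ZoneOffset.UTC));                   // 2024-01-15
    }
}
```

A service in Berlin and a service in Los Angeles would therefore put this event on different days unless both agree on UTC. Starting the JVM with -Duser.timezone=UTC is an alternative to calling TimeZone.setDefault in code.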
Quiz 2: Clock Drift (and the Google Solution)
The problem: A customer used a decades-old computer to manage the vehicle traffic in a city (a traffic light control server). When watching the live stream of vehicle flow data, we saw every other day that traffic events arrived out of time order or that there were gaps of a few minutes between events. We assumed that some traffic events were buffered and sent later, or that somebody had restarted the server. Oh boy, were we wrong…
The kicker: The events were all in order, but the timestamps in the traffic events were highly inaccurate. The clock in this decades-old computer drifted by a couple of seconds per hour! Per day the drift added up to about +/- 1 to 5 minutes relative to the actual time. At midnight every other night the clock was synchronized, and the drifted clock was set back to the actual time. Just looking at the event timestamps led to the false belief that the events had discontinuities. Instead, it was the event time that had discontinuities.
The solution: We started to add a second timestamp to each event when receiving it in our cloud microservice. We knew our processing and transmission delay would be smaller than 100 ms. The clocks in the cloud are quite accurate, and you can synchronize them as often as you want. We also added latency estimates to the events to guesstimate whether the transmission was a problem and at which hop.
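The idea of stamping events on receipt can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the class ReceiveTimeStamping, the type StampedEvent, the helper clockJumps, the sample data and the one-second threshold are all hypothetical. The sketch compares the device timestamp with the receive timestamp and flags the points where that offset suddenly changes, which hints at a clock reset rather than at missing or reordered events.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReceiveTimeStamping {
    // One traffic event with the device's own timestamp and our receive time.
    static final class StampedEvent {
        final Instant deviceTime; // set by the (drifting) traffic computer
        final Instant receivedAt; // set by the cloud service on arrival

        StampedEvent(String deviceTime, String receivedAt) {
            this.deviceTime = Instant.parse(deviceTime);
            this.receivedAt = Instant.parse(receivedAt);
        }

        // Apparent offset of the device clock (includes the transit delay).
        Duration offset() {
            return Duration.between(receivedAt, deviceTime);
        }
    }

    // Indices of events whose offset differs from the previous event's
    // offset by more than the threshold: a hint that the clock was reset.
    static List<Integer> clockJumps(List<StampedEvent> events, Duration threshold) {
        List<Integer> jumps = new ArrayList<>();
        for (int i = 1; i < events.size(); i++) {
            Duration change = events.get(i).offset()
                    .minus(events.get(i - 1).offset()).abs();
            if (change.compareTo(threshold) > 0) {
                jumps.add(i);
            }
        }
        return jumps;
    }

    // Made-up data: the device runs 3 minutes fast, then gets synchronized.
    static List<StampedEvent> sample() {
        return Arrays.asList(
            new StampedEvent("2024-01-16T00:02:00Z", "2024-01-15T23:59:00Z"),
            new StampedEvent("2024-01-16T00:02:30Z", "2024-01-15T23:59:30Z"),
            new StampedEvent("2024-01-16T00:00:30Z", "2024-01-16T00:00:30Z"));
    }

    public static void main(String[] args) {
        System.out.println(clockJumps(sample(), Duration.ofSeconds(1))); // [2]
    }
}
```

Sorted by device time, the third event appears to come before the first two. Sorted by receive time, the stream is in order and only the offset jumps.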
The Google solution [1] to this problem is to put synchronized atomic clocks (alongside GPS receivers) into its data centers around the world. Because these clocks are highly accurate and the remaining uncertainty of each one is known at any given moment, Google could implement a distributed, highly consistent database. If the atomic clock solution is too expensive, you can fall back to the network latency between your database shards. The network latency is then the lower bound on the inconsistency between the shards!
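The role of clock uncertainty can be illustrated with a small sketch that is loosely modeled on the interval idea described in [1]. It is not Spanner's API: the names UncertainClock, Interval, around and definitelyBefore, as well as the 5 ms and 100 ms error bounds, are assumptions chosen for this example. Each clock reading is treated as an interval that contains the true time, and two events can only be ordered safely when their intervals do not overlap.

```java
import java.time.Duration;
import java.time.Instant;

class UncertainClock {
    // Interval that is assumed to contain the true time.
    static final class Interval {
        final Instant earliest;
        final Instant latest;

        Interval(Instant earliest, Instant latest) {
            this.earliest = earliest;
            this.latest = latest;
        }
    }

    private final Duration epsilon; // assumed upper bound on the clock error

    UncertainClock(Duration epsilon) {
        this.epsilon = epsilon;
    }

    // Wraps a local clock reading in its uncertainty bounds.
    Interval around(Instant localReading) {
        return new Interval(localReading.minus(epsilon), localReading.plus(epsilon));
    }

    // True only if a certainly happened before b (intervals do not overlap).
    static boolean definitelyBefore(Interval a, Interval b) {
        return a.latest.isBefore(b.earliest);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2024-01-15T12:00:00Z");
        UncertainClock tight = new UncertainClock(Duration.ofMillis(5));   // well-synchronized clocks
        UncertainClock loose = new UncertainClock(Duration.ofMillis(100)); // network-latency fallback

        // Two events whose readings are 20 ms apart.
        System.out.println(definitelyBefore(tight.around(t), tight.around(t.plusMillis(20)))); // true
        System.out.println(definitelyBefore(loose.around(t), loose.around(t.plusMillis(20)))); // false
    }
}
```

With a 5 ms bound, events that are 20 ms apart can be ordered. With a 100 ms bound they cannot, so a system has to wait out the uncertainty or accept that the order is unknown.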
[1]: Google Spanner: https://static.googleusercontent.com/media/research.google.com/de//archive/spanner-osdi2012.pdf
[2]: Date and time falsehoods: https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca