Have you ever wondered how Google manages to cope with the ever increasing volumes of data? Well, we have some new insights. Google has published a paper about Spanner, their scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system (so they claim) to distribute data at global scale and support externally-consistent distributed transactions.
The high-level details have been published in a research paper (open access) detailing it all. Google says that this is the first database that can quickly store and retrieve information across a worldwide network of data centers while keeping that information “consistent” — meaning all users see the same collection of information at all times.
Spanner has of course built upon techniques from some of their other massive software platforms that they have built, but at its heart is something completely new. It plugs into a network of servers equipped with super-precise atomic clocks and GPS antennas and uses these time keepers to more accurately synchronize the distribution of data across their now global network.
“If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now,” Mike Olson, the CEO of Hadoop specialist Cloudera, said at recent event in Silicon Valley.
Google is not alone, others in the “Big Data” area are also attempting to crack the same problem. Facebook is already building a system (called Project Prism) that has goals that are similar to Spanner. Facebook has already developed what they called Hadoop, a platform that can juggle as much as 100 petabytes of data — aka hundreds of millions of gigabytes, but this is no longer enough so they now must scale that up and that is precisely what Prism will do for them. In effect it is a way of running a Hadoop cluster so large that it spans multiple data centers across the globe, in other words, the Facebook Prism will be what Spanner already is.
The secret sauce for Spanner is what they are calling the TrueTime API (application programming interface). The problem with a Global database is that time is a rather relative concept, and that can really scupper things when you need to work out the correct order in which to process transactions and so keep it all in step. To solve that problem all their servers plug into TrueTime, and that keeps all the transactional updates happening in the right order on a global scale. This is how they describe it within the Conclusion section at the end of their paper …
One aspect of our design stands out: the linchpin of Spanner’s feature set is TrueTime. We have shown that reifying clock uncertainty in the time API makes it possible to build distributed systems with much stronger time semantics. In addition, as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.
To understand TrueTime, you have to understand the limits of existing databases. Today, there are many databases designed to store data across thousands of servers. Most were inspired either by Google’s BigTable database or a similar storage system built by Amazon known as Dynamo. They work well enough, but they aren’t designed to juggle information across multiple data centers — at least not in a way that keeps the information consistent at all times.
According to Andy Gross — the principal architect at Basho, whose Riak database is based on Amazon Dynamo — the problem is that servers must constantly communicate to ensure they correctly store and retrieve data, and all this back-and-forth ends up bogging down the system if you spread it across multiple geographic locations.
“You have to a do a whole lot of communication to decide the correct order for all the transactions,” Gross says, “and the latencies you get are typically prohibitive for a fast database.”
That of course is where TrueTime comes it, it cracks that problem, now that is indeed truly neat.
- The Google paper – Spanner: Google’s Globally-Distributed Database