Saturday, August 19, 2017

Kinetics of Large Clusters

Summary

  1. Martin Kleppmann's fatal mistake.
  2. Physicochemical kinetics does mathematics.
  3. The half-life of the cluster.
  4. We solve nonlinear differential equations without solving them.
  5. Nodes as a catalyst.
  6. The predictive power of graphs.
  7. 100 million years.
  8. Synergy.

In the previous article, we discussed Brewer's article and Brewer's theorem in detail. This time we will analyze Martin Kleppmann's post "The probability of data loss in large clusters".

In that post, the author attempts to model the following problem. To keep data safe, replication is usually used; whether erasure coding is used or not does not actually matter here. The author fixes the probability of a single node failing and then asks: what happens to the probability of data loss as the number of nodes increases?

The answer is shown in this picture:

[Figure: Data loss]
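To make the question concrete, here is a minimal sketch under simplifying assumptions of our own, not the model from Kleppmann's post: nodes fail independently with the same probability `p`, and data can only be lost if at least `r` nodes (the replication factor) are down at once. The probability of that event is an upper bound on the probability of data loss. The function name and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not taken from either post.
// Returns P(at least r of n nodes are down) when each node is down
// independently with probability p, assuming 0 < p < 1 and 0 <= r <= n.
double prob_at_least_r_down(int n, int r, double p) {
    double total = 0.0;
    for (int k = r; k <= n; ++k) {
        // Logarithm of the binomial coefficient C(n, k), computed with
        // lgamma so that large n does not overflow.
        double log_choose = std::lgamma(n + 1.0) - std::lgamma(k + 1.0)
                          - std::lgamma(n - k + 1.0);
        // Binomial term C(n, k) * p^k * (1 - p)^(n - k), evaluated in log space.
        total += std::exp(log_choose + k * std::log(p)
                          + (n - k) * std::log1p(-p));
    }
    return total;
}
```

For example, `prob_at_least_r_down(3, 3, 0.001)` is `1e-9`, and with `r` and `p` fixed the value grows as `n` increases, which is the effect the question is about.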

Sunday, August 13, 2017

Latency of Geo-Distributed Databases

Theorem 0. The minimum guaranteed latency for a globally highly available, strongly consistent database is 133 ms.

[Figure: Earth]
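The excerpt states the bound without its derivation. One back-of-the-envelope reading that lands on a figure of the same size, offered here as our assumption and not as the proof from the article: a signal travels along the Earth's surface to the antipodal point and back at the speed of light in vacuum. The function name and the default constants below are illustrative.

```cpp
#include <cassert>

// Hypothetical helper: round-trip time in milliseconds to the antipode,
// assuming the signal follows the surface at the vacuum speed of light.
// Defaults: equatorial circumference of about 40075 km and
// c = 299792.458 km/s.
double antipodal_round_trip_ms(double circumference_km = 40075.0,
                               double light_speed_km_per_s = 299792.458) {
    // The antipode is half a circumference away, so the round trip
    // covers one full circumference.
    double one_way_km = circumference_km / 2.0;
    double round_trip_s = 2.0 * one_way_km / light_speed_km_per_s;
    return round_trip_s * 1000.0;
}
```

With the default constants the result falls between 133 and 134 ms. Real networks are slower, since light in optical fiber travels at roughly two thirds of the vacuum speed and cables do not follow great circles.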

1 Abstract

The article introduces, step by step, the formal vocabulary and the auxiliary lemmas and theorems needed to prove the main Theorem 0. Finally, the CAL theorem is formulated as a consequence.

2 Introduction

Modern applications require intensive work with huge amounts of data, including both massive transactional processing and analytical research. In response to this demand, a new generation of databases has appeared: NewSQL databases. These databases provide the following important characteristics: horizontal scalability, geo-availability, and strong consistency.

The NewSQL era opens new possibilities for storing and processing so-called Big Data. At the same time, an important question arises: how fast can these databases be? Improving performance and latency is a very challenging task because it involves almost every layer of a database: from hardware questions about data center connectivity and availability to sophisticated software algorithms and architectural design.

Thus, we need to understand how far latency can be optimized and which limitations we have to deal with. The article tries to answer that question.

Saturday, March 4, 2017

CAP Theorem Myths

Introduction

[Figure: CAP]

This article explains the most widespread myths about the CAP theorem. One of the reasons for writing it is to analyze the recent article "Spanner, TrueTime & The CAP Theorem" and to clarify the terms that the theorem involves and that are widely discussed in different contexts.

We return to that article closer to the end, armed with the necessary concepts and knowledge. Before that, we analyze the most common myths associated with the CAP theorem.

Sunday, May 8, 2016

Replicated Object. Part 7: Masterless Consensus Algorithm

1 Abstract

The article introduces a new generation of consensus algorithms: the masterless consensus algorithm. The core part consists of less than 30 lines of C++ code. Thus it is the simplest consensus algorithm that has several outstanding features, which make it easy to develop complex fault-tolerant distributed services.

2 Introduction

There are only two hard problems in distributed systems:
2. Exactly-once delivery.
1. Guaranteed order of messages.
2. Exactly-once delivery.

Mathias Verraes.

Distributed programming is hard. The main reason is that you cannot rely on common assumptions about timing, possible failures, device reliability, and the order of operations.

Tuesday, November 10, 2015

Replicated Object. Part 2: God Adapter

1 Abstract

The article introduces a special adapter that allows developers to wrap any object into another one with whatever additional features they want to include. Adapted objects have the same interface, so they are completely transparent from the usage point of view. The generic concept will be introduced step by step using simple but powerful examples.

2 Introduction

Disclaimer. If you cannot tolerate C++ perversions, please stop reading this article.

The term god adapter originates from god object, an object that implements many features. The same idea applies to the god adapter as well. Such an adapter has outstanding responsibility and includes features that you can or even cannot imagine.
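As an illustration of the general idea of wrapping an object with an extra feature while keeping its interface, here is a minimal sketch. It is not the god adapter from the article: it uses the well-known technique of overloading `operator->` to return a temporary proxy, and the single added feature is a mutex around every call. The names `LockedAdapter` and `Proxy` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical adapter: wraps any T and serializes every call made
// through operator-> with a mutex. The wrapped object's own methods
// are used unchanged.
template <typename T>
class LockedAdapter {
public:
    // Forward constructor arguments to the wrapped object.
    template <typename... Args>
    explicit LockedAdapter(Args&&... args)
        : object_(std::forward<Args>(args)...) {}

    // Temporary proxy that holds the lock until the end of the
    // full expression in which the call is made.
    class Proxy {
    public:
        Proxy(T& object, std::mutex& mutex) : object_(object), lock_(mutex) {}
        T* operator->() { return &object_; }

    private:
        T& object_;
        std::unique_lock<std::mutex> lock_;
    };

    // Caution: using the same adapter twice in one expression creates
    // two proxies and deadlocks, because std::mutex is not recursive.
    Proxy operator->() { return Proxy(object_, mutex_); }

private:
    T object_;
    std::mutex mutex_;
};
```

With `LockedAdapter<std::vector<int>> v;`, a call such as `v->push_back(1)` looks like a call on a pointer to the vector, but runs under the lock.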

Sunday, September 20, 2015

Replicated Object. Part 1: Introduction

1 Abstract

The present article explains an early prototype that introduces the concept of a replicated object, or replob. Such an object is a further rethinking of how to deal with the complexity of distributed systems development. A replob eliminates the dependency on an external reliable service and incorporates consistent data manipulation into user-defined objects that represent data and related functionality. The idea relies on the power of the C++ language and object-oriented programming, which makes it possible to use complex logic within distributed transactions and significantly simplifies the development of reliable applications and services. Subsequent articles will explain the presented approach in detail, step by step.

2 Introduction

Disclaimer. Almost all methods described in the article contain dirty memory hacks and abnormal usage of the C++ language. So if you cannot tolerate system and C++ perversions, please stop reading this article.

Today, topics related to distributed systems are among the most interesting and attract many people, including developers and computer scientists. The popularity is easy to explain: we need to create robust fault-tolerant systems that provide a safe environment for executing operations and storing data.

Along with that, the consistency of a distributed system plays an important role. A stronger level of consistency comes at a price. A number of systems provide only the weakest form of consistency, so-called eventual consistency. While those systems have relatively good performance, they cannot be used in many areas where you need transactional semantics for your operations. The thing is that it is much simpler to think and reason about a system under one of the strong forms of consistency, such as strict consistency or linearizability. With those consistency levels, it is much easier to develop reliable applications with safe operation semantics.

Wednesday, August 12, 2015

New Language Paradigm Philosophy

Rules

  1. Simple things must be simple.
  2. Complex things must be as simple as possible.

This means that the implementation should be the simplest possible.

Four Noble Truths

The approach resembles Buddhism:

  1. There is a complexity.
  2. There is a root cause of the complexity.
  3. There is an absence of complexity.
  4. There is a way to avoid complexity.

Complexity

The API depends on the model you choose: actor, callback-style, subscription, future/promise, RPC-style, etc. But the model should be an implementation detail: you should be able to change it if you wish. Currently, that is not the case.

The idea is to transform the code in such a way that the model and the approaches stay flexible. One should consider the model as low-level architecture, that is, as an implementation detail.

Building Blocks

You should build the application from top to bottom (from architecture to implementation), not from bottom to top (from classes and libraries up to the requirements and architecture).