C++ and Distributed Systems: C++ stuff and distributed systems analysis and algorithms. By Grigory Demchenko.

Exactly Once is NOT Exactly the Same: Article Analysis (2018-09-10)

<h2 id="introduction">Introduction</h2>
<p>I decided to analyze an article <a href="https://streaml.io/blog/exactly-once">describing some interesting details of exactly-once stream processing</a>. The trouble is that the authors sometimes misunderstand important terms. Walking through the article and pointing out its illogicalities and oddities will clarify many aspects and details of the underlying concepts.</p>
<p>Let’s get started.</p>
<h2 id="analysis">Analysis</h2>
<p>Everything starts very well:</p>
<blockquote>
<p>Distributed event stream processing has become an increasingly hot topic in the area of Big Data. Notable Stream Processing Engines (SPEs) include Apache Storm, Apache Flink, Heron, Apache Kafka (Kafka Streams), and Apache Spark (Spark Streaming). One of the most notable and widely discussed features of SPEs is their processing semantics, with “exactly-once” being one of the most sought after and many SPEs claiming to provide “exactly-once” processing semantics.</p>
</blockquote>
<p>Meaning that data processing is extremely important, bla-bla-bla, and the topic under discussion is <em>exactly-once processing</em>. Let us discuss it.</p>
<blockquote>
<p>There exists a lot of misunderstanding and ambiguity, however, surrounding what exactly “exactly-once” is, what it entails, and what it really means when individual SPEs claim to provide it.</p>
</blockquote>
<p>Indeed, it is very important to understand what it is. To do this, it would be nice to give a correct definition before the lengthy reasoning. And who am I to give such damn sensible advice?</p>
<a name='more'></a>
<blockquote>
<p>I’ll discuss how “exactly-once” processing semantics differ across many popular SPEs and why “exactly-once” can be better described as effectively-once</p>
</blockquote>
<p>Inventing new terms is certainly an important task. I love this business. But it requires justification and strong reasoning. Let us try to find them.</p>
<p>I will not describe the obvious things such as directed processing graphs and the like. Readers can consult the original article on their own; moreover, these details are not very significant for the analysis. I will give only a picture:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIK36QQ1OxOO6tzT_L0dWbBUe0hsijy1n72AUp1VJCv0hVVe6TnCoguX5J67a2gDvIBGRQKU4v2qcCsZp0KL6gTJHUJD06vg5AKVA4w2DbazEGB1Yu-riU6zAsxJUsa3qiQ4cjv0wtRlQ/s720/processing.png" alt="Processing"></p>
<p>Further, a description of the semantics follows:</p>
<ul>
<li><em>At-most-once</em>, i.e. not more than once. In some complex cases such behavior is extremely difficult to guarantee: specific failure scenarios, network splits, and so on. But the author provides a simple solution:</li>
</ul>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_6KPe880fa_DMr_QaPwFW44odF9XtA4JBwlQ44VW6J91FBI_YxRH0jdAX2tKvx5k3TfL0-B0HAIizdL5m5heaxWJRqWkLjIF_WGBt0gEUaw-o5zJuQDQv2GbVBAPtgFoCKGoF_Fg3kdw/s860/at_most.png" alt="At-most-once"></p>
<ul>
<li><em>At-least-once</em>, i.e. not less than once. The scheme is more complicated:</li>
</ul>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkBZlzMJd_QMcy81uV6d-2uJ4VUTMzst0FalYj85B_Stu38DkPU4Va2Cmq1eu0A8h33bqXdexKhvoW_Ch8szRZ5NA5wLcfGfKDHOFkNjOqj725NVAaGL2vfhLhlPgKrdDU6_wax3J2NSA/s860/at_least.png" alt="At-least-once"></p>
<ul>
<li><em>Exactly-once</em>, finally. What exactly is once?</li>
</ul>
<blockquote>
<p>Events are guaranteed to be processed “exactly once” by all operators in the stream application, even in the event of various failures.</p>
</blockquote>
<p>So the guarantee of processing exactly-once is when the processing “exactly once” occurred.</p>
<p>Can you feel the power of the definition? Let me rephrase: processing happens exactly once when processing occurs “exactly once”. Well, yes, exactly-once processing must hold even in case of failures, but for distributed systems that part is obvious. And the quotes hint that something is wrong here. Giving a definition in quotes, without explaining what the quotes mean, is the mark of a deep and thoughtful approach.</p>
<p>The next part describes ways of implementing such semantics, and here I would like to look at them in more detail.</p>
<blockquote>
<p>Two popular mechanisms are typically used to achieve “exactly-once” processing semantics:</p>
<ol>
<li>Distributed snapshot/state checkpointing</li>
<li>At-least-once event delivery plus message deduplication</li>
</ol>
</blockquote>
<p>The first mechanism, snapshots and checkpoints, raises no questions, except perhaps some details such as efficiency. The second one, however, has problems which the author has left out.</p>
<p>For some reason, it is implied that the handler can only be deterministic. In the case of a nondeterministic handler, each restart generally produces different output values and states, so deduplication will not work: the output values will differ from run to run. Thus, the general mechanism must be much more complicated than the one described in the article. Or, to put it bluntly, such a mechanism is incorrect.</p>
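<p>A tiny sketch (hypothetical code, not from the article) makes the problem with nondeterministic handlers concrete: on replay the “same” event produces a different output, so any deduplication based on the produced values cannot recognize the replay as a duplicate.</p>

```python
import random

def nondeterministic_handler(event):
    # a time window or a merge of two streams makes the output depend on
    # run timing; random jitter stands in for that nondeterminism here
    return event + random.random()

# first attempt: the handler emits a value, then the node crashes
# before the output position is recorded
first_run = nondeterministic_handler(42)

# replay after the crash: the same input event yields a different output,
# so value-based deduplication fails to detect the duplicate
second_run = nondeterministic_handler(42)

print(first_run != second_run)  # almost certainly True
```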
<p>However, we pass to the most delicious:</p>
<blockquote>
<h2 id="is-exactly-once-really-exactly-once">Is exactly-once really exactly-once?</h2>
<p>Now let’s reexamine what the “exactly-once” processing semantics really guarantees to the end user. The label “exactly-once” is misleading in describing what is done exactly once.</p>
</blockquote>
<p>It is said that it is time to revise this concept, because there are some inconsistencies. Okay.</p>
<blockquote>
<p>Some might think that “exactly-once” describes the guarantee to event processing in which each event in the stream is processed only once. In reality, there is no SPE that can guarantee exactly-once processing. To guarantee that the user-defined logic in each operator only executes once per event is impossible in the face of arbitrary failures, because partial execution of user code is an ever-present possibility.</p>
</blockquote>
<p>The dear author should be reminded how modern CPUs work. Each CPU performs a large number of steps in parallel during processing. Moreover, there are branch predictors, and when a predictor guesses wrong the processor starts executing the wrong instructions; those actions and side effects are then rolled back. Thus, the same piece of code can be executed twice even if no failures have occurred!</p>
<p>The attentive reader will immediately exclaim: it is the observable output that matters, not how it was produced. Precisely! What is important is what happened as a result, not how it actually happened. If the result is as if processing happened exactly once, then it happened exactly once. Everything else is irrelevant detail. Systems are complex, and their abstractions merely create the illusion that things are done in a certain way. It seems to us that code executes sequentially, instruction after instruction: first a read, then a write, then the next instruction. But this is not so; everything is much more complicated. The essence of correct abstractions is to maintain the illusion of simple and understandable guarantees, without digging into the internals every time you need to assign a value to a variable.</p>
<p>Simply put, the main issue with the article is that exactly-once is an abstraction that allows you to build applications without thinking about duplicated or lost values, trusting that everything will be fine even in case of a failure. There is no need to invent new terms for this.</p>
<p>An example of the same code in the article clearly demonstrates the lack of understanding of how to write handlers:</p>
<pre><code>Map (Event event) {
Print "Event ID: " + event.getId()
Return event
}
</code></pre>
<p>The reader is invited to rewrite the code himself, so as not to repeat the mistakes made by the author.</p>
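<p>One possible rewrite (a hypothetical sketch, with invented <code>Event</code>/<code>State</code> names): instead of printing inside the handler, which is a side effect that duplicates on every replay, record the effect in state that the engine checkpoints atomically with the stream positions, keyed by event ID so replays are idempotent.</p>

```python
class Event:
    def __init__(self, event_id):
        self.event_id = event_id
    def get_id(self):
        return self.event_id

class State:
    # assumed to be checkpointed atomically with the stream positions,
    # so on replay the recorded effects are rolled back together with them
    def __init__(self):
        self.printed_ids = set()
        self.log_lines = []

def map_event(event, state):
    # the side effect is keyed by event ID: replaying the same event
    # does not duplicate the log line
    if event.get_id() not in state.printed_ids:
        state.printed_ids.add(event.get_id())
        state.log_lines.append("Event ID: " + str(event.get_id()))
    return event

state = State()
e = Event(7)
map_event(e, state)
map_event(e, state)  # replay after a simulated failure
print(state.log_lines)  # ["Event ID: 7"], exactly one line
```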
<blockquote>
<p>So what does SPEs guarantee when they claim “exactly-once” processing semantics? If user logic cannot be guaranteed to be executed exactly once then what is executed exactly once? When SPEs claim “exactly-once” processing semantics, what they’re actually saying is that they can guarantee that updates to state managed by the SPE are committed only once to a durable backend store.</p>
</blockquote>
<p>The user does not need a guarantee of physical execution of the code; knowing how the CPU works, it is easy to conclude that such a guarantee is impossible. The goal is logical exactly-once execution, as if there were no failures at all. Dragging in the notion of a “commit to a durable store” only deepens the misunderstanding of the basics, because there are implementations of these semantics that need no commit at all.</p>
<p>More information can be easily found in my article: <a href="http://gridem.blogspot.com/2018/07/heterogeneous-concurrent-exactly-once.html">Heterogeneous Concurrent Exactly-Once Real-Time Processing</a>.</p>
<blockquote>
<p>In other words, the processing of an event can happen more than once but the effect of that processing is only reflected once in the durable backend state store.</p>
</blockquote>
<p>The user doesn’t care that there is a “durable backend state store” at all. Only the effect of execution matters, i.e. the consistency and the result of the entire processing run. It is worth noting that for some tasks there is no need for a durable backend state store, yet the exactly-once guarantee is still required.</p>
<blockquote>
<p>Here at Streamlio, we’ve decided that effectively-once is the best term for describing these processing semantics.</p>
</blockquote>
<p>A typical example of careless introduction of concepts: first an example and a whole paragraph of lengthy argument, and only at the end the remark that “we define this concept this way”. Such accuracy and clarity of definitions evokes a really bright and emotional response.</p>
<h2 id="conclusions">Conclusions</h2>
<p>Misunderstanding of the essence of abstractions leads to a distortion of the original meaning of existing concepts and the subsequent invention of new terms from scratch.</p>
<p>[1] <a href="https://streaml.io/blog/exactly-once">Exactly once is NOT exactly the same</a>.<br>
[2] <a href="http://gridem.blogspot.com/2018/07/heterogeneous-concurrent-exactly-once.html">Heterogeneous Concurrent Exactly-Once Real-Time Processing</a>.</p>
Heterogeneous Concurrent Exactly-Once Real-Time Processing (2018-07-19)

<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIoyb47HdzxfI9hH1DfG3w6zQQQ0xFRpgtiSHJ6WNqBFC-3a93f5_CP6CEU2ENHoJP9gkQNE5tOxzcDmGH3rwRyDJEqd2a1bWkAkhvB0Rd3yWuI2jo0MIhVWJQJyeH_Ig5rsDbzdbG8A0/s480/sausage2.jpg" alt="Concurrent sausage"></p>
<h2 class="mume-header" id="abstract">Abstract</h2>
<p><em>Exactly-once</em> data processing in real time is an extremely non-trivial task and requires a serious and thoughtful approach across the entire pipeline. Some even believe that such a <a href="https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/">task is impossible</a>. In reality, one wants an approach that provides generic fault-tolerant processing without any delays while using different data storages, which puts forward an even stronger requirement for the system: <em>concurrent exactly-once</em> together with heterogeneity of the persistence layer. To date, no existing system supports this requirement.</p>
<p>The proposed approach gradually reveals the secret ingredients and necessary concepts that make it possible to implement heterogeneous <em>concurrent exactly-once</em> processing relatively easily, using literally two components.</p>
<h2 class="mume-header" id="introduction">Introduction</h2>
<p>The developer of distributed systems passes several stages:</p>
<p><em>Stage 1: Algorithms</em>. Here we study the basic algorithms and data structures, approaches to programming such as OOP, and so on. The code is strictly single-threaded. This is the initial phase of entering the profession; nevertheless, it is rather difficult and can last for years.</p>
<p><em>Stage 2: Multithreading</em>. Then come the questions of extracting maximum efficiency from the hardware: multithreading, asynchrony, races, debugging, stracing, sleepless nights... Many get stuck at this stage and at some point even start to catch an inexplicable buzz from it. But only a few come to understand the architecture of virtual memory and memory models, lock-free/wait-free algorithms, and the various asynchronous models. And almost no one ever gets to the verification of multithreaded code.</p>
<p><em>Stage 3: Distributed programming</em>. Here such shit happens that words cannot describe it.</p>
<a name='more'></a>
<p>It would seem that there is nothing complex here. We make the transformation: many threads → many processes → many servers. But each step of this transformation brings qualitative changes, and they all pile onto the system, crushing it and grinding it into dust.</p>
<p>The difference is in the domain of error handling and in the presence of shared memory. Earlier there was always a piece of memory available in every thread and, if desired, in every process; now such a piece does not and cannot exist. Everyone is on their own, independent and proud.</p>
<p>Earlier, in multithreaded systems, a crashed thread terminated the whole process, and that was good, because it did not lead to partial failures. Now partial failures are the norm, and every time, before each action, you think: “what if?” This is so annoying and so distracting from writing the actual logic that the code size grows enormously. Everything turns into spaghetti: error handling, state transitions, switching and saving context, recovering from the failure of one component or another, the unavailability of some services, and so on. Screw monitoring on top of that and you get the perfect opportunity to spend endless nights with your favorite laptop.</p>
<p>Meanwhile the multithreaded approach offers the possibility of taking a mutex and happily shredding the shared memory. An amazing possibility!</p>
<p>As a result, the key battle-tested patterns were taken away, and new ones were, for some reason, not supplied in exchange. It is like the anecdote where the fairy waved her wand and the tank’s turret fell off.</p>
<p>Nevertheless, there is a set of proven practices and algorithms for distributed systems. However, every self-respecting developer considers it his duty to reject the known achievements and push his own approach, despite the accumulated experience and a considerable number of scientific articles and academic studies. After all, if you made it through algorithms and multithreading, how could you possibly get into a mess with distributed programming? There can be no two opinions here!</p>
<p>As a result, the systems are buggy, the data diverges and becomes corrupted, and the services periodically become unavailable for writing, or even completely unavailable: suddenly a node crashed, network connectivity was lost, Java ate a lot of memory and the GC got stuck, and many other reasons that let you postpone your dismissal a little longer.</p>
<p>However, even with known and proven approaches, life does not become easier, because reliable distributed primitives are heavyweight, with serious requirements imposed on the logic of the executed code. Therefore corners are cut wherever possible. And, as often happens, with hastily cut corners you gain simplicity and some scalability, but lose the reliability, availability, and consistency of the distributed system.</p>
<p>Ideally, one would not want to think about the system being distributed and multithreaded at all, i.e. to work at the 1st stage (algorithms) without thinking about the 2nd (multithreading + asynchrony) and the 3rd (distribution). Such isolation of abstractions would greatly enhance simplicity and reliability and speed up writing the code. Unfortunately, at the moment this is possible only in dreams.</p>
<p>Nevertheless, separate abstractions make it possible to achieve relative isolation. One of the typical examples is the <a href="http://gridem.blogspot.com/2017/11/replicated-object-3-subjector-model.html">use of coroutines</a>, where instead of an asynchronous code, we get a synchronous one, i.e. go from the 2nd stage to the 1st stage, which makes it much easier to write and maintain the code.</p>
<p>In this article, the use of lock-free algorithms for constructing a reliable, consistent, distributed, scalable real-time system is disclosed step by step: how the lock-free achievements of the 2nd stage help in the implementation of the 3rd, reducing the problem to single-threaded algorithms of the 1st stage.</p>
<h2 class="mume-header" id="problem-statement">Problem Statement</h2>
<p>This task merely illustrates some important approaches and serves as an introduction to the problem domain. It can easily be generalized to more complex cases, which will be done in the future.</p>
<p><strong>Task: real-time streaming data processing</strong>.</p>
<p>There are two streams of numbers. The handler reads the data of these input streams and keeps the numbers received within a certain period, i.e. in a sliding time window of the specified duration. The numbers in the window are averaged, and the resulting average value must be written to the output stream for subsequent processing. In addition, if the count of numbers in the window exceeds a certain threshold, the counter in an external transactional database is incremented by one.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEideQI39Y7l3L6EXcVA1G560lt0c0JcWdXkDLNrEiquIS8Gz8XwsGjieksWK2-ZQNQM_uwvycMSajBaOeymNzs9Zz-96qmQ9FsVdc73L33FlEfsVS0efvc_6uo3iaC8_dWKNs73x4mtFec/s1600/scheme_initial.png" alt="Initial"></p>
<p>Let’s note several specific items:</p>
<ol>
<li><em>Nondeterminism</em>. There are two sources of nondeterministic behavior: reading from two streams and the time window. Clearly, reads can interleave in different ways, and the final result depends on the order in which the data is extracted. The time window also changes the result from run to run, because the amount of data within the window depends on execution speed.</li>
<li><em>Statefulness</em>. The handler is stateful: it stores a set of numbers.</li>
<li><em>Interaction with external storage</em>. We must update the counter value in the external database. The crucial point is that the type of the external storage differs from the storage of the handler state and the streams.</li>
</ol>
<p>All this, as will be shown below, seriously affects the used approaches and the possible ways of implementation.</p>
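<p>The handler state from the task can be sketched as a timestamped queue (a hypothetical <code>TimeWindow</code> class; the names mirror the pseudocode used later, but the timestamps are passed in explicitly here for clarity):</p>

```python
from collections import deque

class TimeWindow:
    # keeps (timestamp, value) pairs no older than `duration`
    def __init__(self, duration):
        self.duration = duration
        self.items = deque()

    def push(self, now, value):
        self.items.append((now, value))

    def trim(self, now):
        # drop values that fell out of the sliding window
        while self.items and self.items[0][0] <= now - self.duration:
            self.items.popleft()

    def avg(self):
        if not self.items:
            return 0.0
        return sum(v for _, v in self.items) / len(self.items)

    def size(self):
        return len(self.items)

w = TimeWindow(duration=10)
w.push(0, 1.0)
w.push(5, 3.0)
w.push(12, 5.0)
w.trim(now=12)          # drops the value pushed at t=0
print(w.avg())          # (3.0 + 5.0) / 2 = 4.0
print(w.size() > 2)     # threshold check as in the task: False here
```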
<p>It remains to add a small twist to the task, which immediately moves the problem from the area of extreme complexity to impossible: a <em>concurrent exactly-once</em> guarantee is required.</p>
<h2 class="mume-header" id="exactly-once">Exactly-Once</h2>
<p><em>Exactly-once</em> is often interpreted too broadly, which emasculates the term itself so that it ceases to meet the original requirements of the task. If we are talking about a system that works locally on one computer, everything is simple: just process each value as it arrives. But here we are talking about a distributed system in which:</p>
<ol>
<li>The number of handlers can be large: each handler works with its own piece of data. The results may be written to different places, for example, to an external, possibly sharded, database.</li>
<li>Each handler can suddenly stop processing. Fault tolerance means that work continues even when different parts of the system fail.</li>
</ol>
<p>So you need to be prepared for a handler to fail, in which case another handler must pick up the work already done and continue processing.</p>
<p>Here a question immediately arises: what does <em>exactly-once</em> mean in the case of a nondeterministic handler? After all, in general, every restart yields different results. The answer is simple: <em>exactly-once</em> means that there exists a system execution in which each input value is processed exactly once, producing the corresponding output. This execution does not have to happen physically on one node, but the result must be as if everything had been processed on a single logical node <em>without any failures</em>.</p>
<h2 class="mume-header" id="concurrent-exactly-once">Concurrent Exactly-Once</h2>
<p>For more sophisticated requirements we introduce a new concept: <em>concurrent exactly-once</em>. The fundamental difference from "plain" <em>exactly-once</em> is the absence of pauses in processing: as if everything were processed on a single node <strong>without any failures</strong> and <strong>without delays</strong>. In our task we require precisely <em>concurrent exactly-once</em>; for simplicity, this lets us avoid comparisons with existing systems, none of which provide such a guarantee today.</p>
<p>The consequences of having such a requirement will be discussed below.</p>
<h2 class="mume-header" id="transactions">Transactions</h2>
<p>So that the reader can appreciate the full depth of the difficulty, let's look at the various bad scenarios that must be taken into account when developing such a system. We will also try to find a generic approach that solves the problem under our requirements.</p>
<p>The first thing that comes to mind is the need to save the state of the handler and of the input and output streams. The state of an output stream is described by a simple queue of numbers, and the state of an input stream by an index position: a stream is in fact an infinite queue, and a position in the queue uniquely identifies a location in it.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoleXuBosex9PFBwkazIsHgmzs9NZNFFOkDu9U9iF7axIOJs0YW8_F3w0tH_8mKi4cDUADmUjUL0ePhxbJ72Ckg9XU9nbRw5N3NKmM8A3-s2ZNrnf14RXCCPS48T_ouAj_DjlaMv16SwU/s1600/idea.jpg" alt="Idea"></p>
<p>Here is a naive implementation of the processor using a data storage. At this stage, the specific properties of the storage are not important. We will use the Pseco language to illustrate the ideas (Pseco ≡ pseudocode):</p>
<pre data-role="codeBlock" data-info="py" class="language-python">handle(input_queues, output_queues, state):
    # restore stream positions
    input_indexes = storage.get_input_indexes()
    # process the incoming streams in an infinite loop
    while true:
        # load data from the queues starting at the current positions
        items, new_input_indexes = input_queues.get_from(input_indexes)
        # add the items to the handler queue
        state.queue.push(items)
        # and trim the window according to its duration
        state.queue.trim_time_window(duration)
        avg = state.queue.avg()
        need_update_counter = state.queue.size() > size_boundary
        # (A) add the average to the output queue
        output_queues[0].push(avg)
        if need_update_counter:
            # (B) increment the counter in the external database
            db.increment_counter()
        # (C) save the state in the storage
        storage.save_state(state)
        # (D) save the indexes
        storage.save_queue_indexes(new_input_indexes)
        # (E) update the current indexes
        input_indexes = new_input_indexes
</pre><p>This simple single-threaded algorithm reads data from the input streams and writes the desired values according to the task described above.</p>
<p>Let's see what happens if the node crashes at various points. Clearly, in the case of a crash at points <code>(A)</code> and <code>(E)</code> everything will be fine: either the data has not been saved yet, in which case we simply restore the state and continue processing on another node, or all the necessary data has already been saved and the handler just continues with the next step.</p>
<p>However, a crash at any of the other points brings unexpected trouble. If there is a failure at point <code>(B)</code>, then on restart the handler restores the state and pushes the same average value again. A crash at point <code>(C)</code> adds a duplicated increment on top of the duplicated average. And a failure at <code>(D)</code> leaves the handler in an inconsistent state: the state corresponds to the new point in time while we read values from the previous input stream positions.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinlzwx2spsNuIVGXmAYZYTJUxNOK0_ZDHJ-qYp3aghmMjcX7HcxiCuei_v9Kf-eFvOUyJZGLW-LJLcyGB8xWa4cCm2ijaXhoQ7IJZwZ1SPXs6_49O2py8iSd2aRkYOitq7ubrCtagxbis/s1600/troubles.jpg" alt="Troubles"></p>
<p>Permuting the write operations does not change the situation fundamentally: inconsistency and duplicates remain. We conclude that all actions changing the state of the handler in the storage, the output queue, and the database must be performed transactionally, i.e. simultaneously and atomically, all-or-nothing.</p>
<p>Accordingly, it is necessary to develop a mechanism so that the different storages can change their states transactionally, not independently within each one, but transactionally across all the storages at once. Of course, one could put our storage inside the external database, but the task assumed that the database engine and the stream processing engine are separate and work independently of each other. Here I want to consider the hardest case, because the simple cases are not interesting.</p>
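<p>The failure at point <code>(C)</code> can be simulated in a few lines (a hypothetical sketch with invented stand-ins for the queue and the database): the output value and the counter increment are already written, but the state and the stream indexes are not, so the restarted handler repeats the whole step.</p>

```python
class FakeDb:
    # stands in for the external transactional database
    def __init__(self):
        self.counter = 0
    def increment_counter(self):
        self.counter += 1

output = []
db = FakeDb()

def process_step(crash_at_c):
    output.append("avg")      # (A) output value written
    db.increment_counter()    # (B) counter incremented
    if crash_at_c:
        raise RuntimeError("node crashed at (C)")
    # (C)/(D): state and indexes would be saved here

try:
    process_step(crash_at_c=True)    # first attempt crashes at (C)
except RuntimeError:
    pass
process_step(crash_at_c=False)       # replay from the old indexes

print(output)      # ['avg', 'avg'], a duplicated output value
print(db.counter)  # 2, a duplicated increment
```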
<h2 class="mume-header" id="concurrent-responsiveness">Concurrent Responsiveness</h2>
<p>Let us consider concurrent exactly-once in more detail. In a fault-tolerant system we require processing to resume from a certain point. That point is necessarily a moment in the past: to preserve scalability, we cannot store every moment of state change in the storage; only the last result of the operations or a batch of changes is saved. This immediately means that after restoring the handler state there will be some lag in the results, and it grows with the size of the batch and the size of the state.</p>
<p>In addition to this lag, there are delays related to loading the state onto another node. Moreover, detecting a failed node also takes time, often a nonnegligible amount: if we set a short detection timeout, frequent false positives become possible, which leads to various unpleasant side effects.</p>
<p>On top of that, as the number of handlers executing in parallel grows, it suddenly turns out that not all of them process their streams equally well, even in the absence of failures. Sometimes latency spikes occur, leading to processing delays. The reasons for such spikes vary:</p>
<ol>
<li><em>Software</em>: GC delays, memory fragmentation, allocator delays, kernel interrupts, and task scheduling, problems with device drivers.</li>
<li><em>Hardware</em>: disk or network high load, CPU throttling due to cooling problems, overload, etc., disk slowdown due to technical problems.</li>
</ol>
<p>And this is far from an exhaustive list of problems that can lead to the slowdown of handlers.</p>
<p>Accordingly, slowdown of data processing is a reality we have to deal with. Sometimes this is not a serious problem, and sometimes it is extremely important to maintain high processing speed despite failures or latency spikes.</p>
<p>Immediately the idea of redundancy appears: for the same input dataset, let's execute several handlers at once, concurrently. The problem is that duplicates and inconsistent system behavior can then easily occur. Typically, frameworks are not designed for this mode and assume that the number of handlers at each point in time does not exceed one. Systems that allow such duplicated execution without violating consistency are called <em>concurrent exactly-once</em> engines.</p>
<p>This architecture allows solving several problems at once:</p>
<ol>
<li>Resilience: if a node fails, the other node just continues to work as if nothing happened. No additional coordination is needed, because the second handler executes regardless of the state of the first one.</li>
<li>Latency spike elimination: the first handler to complete wins and provides the final result. The other handler then has to pick up the new state and continue processing.</li>
</ol>
<p>In particular, this approach allows completing a difficult, long computation in a more predictable time, because the probability that both concurrent handlers will fail is significantly lower.</p>
<h2 class="mume-header" id="probabilistic-estimation">Probabilistic Estimation</h2>
<p>Let's try to evaluate the benefit of concurrency. Suppose that on average something happens to a handler once a day: either a processing delay or a failure of the corresponding node. Suppose also that we prepare a batch of data in 10 seconds.</p>
<p>Then the probability that something will happen during the batch creation is <code>10 / (24 · 3600) ≃ 1e-4</code>.</p>
<p>If you run two handlers concurrently, the probability that both fail during the same batch is <code>≃ 1e-8</code>. Such an event is expected once in about 23 years! The system will not live that long, which means it will effectively never happen!</p>
<p>And if the time to create a batch is even smaller and/or delays occur even more rarely, this figure only grows.</p>
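<p>The arithmetic above can be checked with a few lines of Python (a sketch; the failure rate of one incident per handler per day and the 10-second batch time are the assumptions stated in the text):</p>

```python
# Sketch checking the probabilistic estimate above.
SECONDS_PER_DAY = 24 * 3600
BATCH_SECONDS = 10

# Probability that an incident hits one handler during one batch.
p_single = BATCH_SECONDS / SECONDS_PER_DAY        # ~1e-4

# Two independent handlers must both fail during the same batch.
p_both = p_single ** 2                            # ~1e-8

# Expected time until such a double failure, in days.
batches_per_day = SECONDS_PER_DAY / BATCH_SECONDS
days_to_failure = 1 / (p_both * batches_per_day)  # 8640 days ≈ 23.7 years
```

<p>With the exact numbers the expected time is 8640 days, i.e. about 23.7 years, which matches the claim above.</p>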
<p>Thus, we come to the conclusion that this approach substantially increases the reliability of the entire system. Only one small question remains: where can one read about a recipe for building a <em>concurrent exactly-once</em> system? The answer is simple: right here!</p>
<h2 class="mume-header" id="semi-transactions">Semi-Transactions</h2>
<p>For the sake of further discussion, we need to introduce the notion of a <em>semi-transaction</em>. The easiest way to explain it is by example.</p>
<p>Consider transferring funds from one bank account to another. A traditional approach using transactions can be described in the Pseco language as follows:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">transfer(from, to, amount):
    tx = db.begin_transaction()
    amount_from = tx.get(from)
    if amount_from < amount:
        tx.rollback()
        return error.insufficient_funds
    tx.set(from, amount_from - amount)
    tx.set(to, tx.get(to) + amount)
    tx.commit()
    return ok
</pre><p>But what if you do not have the ability to use transactions? You can achieve the same result using locks:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">transfer(from, to, amount):
    # locks are automatically released on scope exit
    lock_from = db.lock(from)
    lock_to = db.lock(to)
    amount_from = db.get(from)
    if amount_from < amount:
        return error.insufficient_funds
    db.set(from, amount_from - amount)
    db.set(to, db.get(to) + amount)
    return ok
</pre><p>This approach can lead to deadlocks, because parallel transfers can take the locks in different orders. To fix this, it is enough to introduce a function that takes several locks at once in a deterministic order (for example, sorted by key), which completely eliminates the deadlocks.</p>
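<p>Such a lock-ordering helper can be sketched as follows (a sketch in Python; the <code>FakeDB</code> class is a made-up stand-in that only records the order in which locks are taken):</p>

```python
class FakeDB:
    """Made-up stand-in for the storage: records lock acquisition order."""
    def __init__(self):
        self.order = []

    def lock(self, key):
        self.order.append(key)
        return key  # a real lock object in practice


def lock_all(db, *keys):
    # Sorting the keys gives a global acquisition order: any two
    # transfers touching the same accounts take their locks in the
    # same sequence, so a circular wait is impossible.
    return [db.lock(key) for key in sorted(keys)]


db = FakeDB()
lock_all(db, "bob", "alice")
lock_all(db, "alice", "bob")
```

<p>Both calls acquire the locks in the order <code>alice</code>, <code>bob</code> regardless of the argument order, so no cycle of waiting transfers can form.</p>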
<p>Nevertheless, the implementation can be somewhat simplified:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">transfer(from, to, amount):
    # automatically releases the lock on scope exit
    lock_from = db.lock(from)
    amount_from = db.get(from)
    if amount_from < amount:
        return error.insufficient_funds
    db.set(from, amount_from - amount)
    lock_from.release()
    # the lock is necessary because
    # db.set(db.get(to) + amount) is not atomic
    lock_to = db.lock(to)
    db.set(to, db.get(to) + amount)
    return ok
</pre><p>This approach also leaves the final state consistent, preserving invariants such as never allowing insufficient funds. The main difference from the previous approach is that now there is a window of time during which the accounts are in an inconsistent state. A transfer implies that the total amount of funds across the accounts does not change, yet between <code>lock_from.release()</code> and <code>db.lock(to)</code> the database can produce an inconsistent result: the total amount may differ from the correct one.</p>
<p>In fact, we split one transaction into two semi-transactions:</p>
<ol>
<li>The first semi-transaction does a check and withdraws the required amount from the account.</li>
<li>The second semi-transaction deposits the withdrawn amount to another account.</li>
</ol>
<p>It is clear that splitting a transaction into smaller ones, generally speaking, violates transactional behavior, and the example above is no exception. However, if all the semi-transactions in the chain are executed to completion, the result is consistent and preserves all the invariants. This is the important property of a semi-transaction chain.</p>
<p>By temporarily losing some consistency we acquire another useful property: independence of operations and, as a consequence, better scalability. Independence manifests itself in the fact that each semi-transaction works with only one entity, reading, verifying, and changing its data without touching other data. This way you can work with storage that doesn't support distributed transactions. Moreover, this approach works with heterogeneous data storage, i.e. a chain of semi-transactions can start on one type of storage and end on another. These useful properties will be used below.</p>
<p>A logical question arises: how can semi-transactions be implemented in a distributed system without getting burned? To answer it, we need to consider a lock-free approach.</p>
<h2 class="mume-header" id="lock-free">Lock-Free</h2>
<p>As is well known, lock-free approaches sometimes improve the performance of multithreaded systems, especially under concurrent access to a shared resource. However, it is far from obvious that such an approach can be used in distributed systems. Let's dig deeper and consider what lock-free is and why this property will be useful in solving our problem.</p>
<p>Some developers do not quite understand what lock-free is. The philistine view suggests that it is something related to atomic processor instructions. It is important to understand that lock-free implies the use of atomic operations, while the converse is not true: not every algorithm built on atomic operations is lock-free.</p>
<p>An important property of a lock-free algorithm is that at least one thread in the system makes progress. For some reason this property alone is very often given as the definition (such a stupid definition can be found, for example, in <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm">Wikipedia</a>). One important nuance must be added: progress is made even when one or several threads are delayed. This is a crucial point, often overlooked, with serious consequences for a distributed system.</p>
<p>Why does the absence of the progress-despite-delays condition nullify the concept of a lock-free algorithm? Because without it, an ordinary spinlock would also be lock-free. Indeed, the thread that took the lock makes progress. There is a thread with progress ⇒ lock-free?</p>
<p>Obviously, lock-free means the absence of locks, while the very name of a spinlock indicates that it is a real lock. That is why it is important to add the condition about progress even under delays. After all, these delays can last indefinitely: the definition says nothing about an upper time bound. Such delays are therefore equivalent, in a sense, to thread termination, and lock-free algorithms make progress even in this case.</p>
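<p>The difference is easy to demonstrate with a single-threaded Python model (a sketch; the <code>Cell</code> class imitates an atomic variable, and the "participants" are simulated rather than real threads). One participant reads the value and then stalls forever; the other still completes its CAS loop, so the system as a whole makes progress, which would not be the case with a spinlock held by the stalled participant:</p>

```python
class Cell:
    """Sketch of an atomic cell; cas() is atomic in a real implementation."""
    def __init__(self, value=0):
        self.value = value

    def cas(self, expected, new):
        if self.value != expected:
            return False
        self.value = new
        return True


counter = Cell(0)

# Participant A reads the value and then stalls indefinitely:
# it never gets to issue its CAS.
stalled_snapshot = counter.value

# Participant B runs a normal CAS loop and makes progress anyway,
# because CAS never waits on the stalled participant.
while True:
    current = counter.value
    if counter.cas(current, current + 1):
        break
```

<p>Had A instead stalled while holding a spinlock, B would spin forever; with CAS, B's increment goes through.</p>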
<p>But who said that lock-free approaches apply only to multithreaded systems? Replace the threads of a single process on one node with processes on different nodes, and the threads' shared memory with distributed storage, and you get a lock-free distributed algorithm.</p>
<p>A node failure in such a system is equivalent to delaying the execution for a while, because the recovery procedure takes some time to complete. In this case, the lock-free approach allows the other participants in the distributed system to continue working. Moreover, special lock-free algorithms can be run in parallel, detecting concurrent changes and eliminating duplicates.</p>
<p>The <em>exactly-once</em> approach implies having consistent distributed storage. Such storage is typically a huge persistent key-value table supporting the operations <code>set</code>, <code>get</code>, and <code>del</code>. However, a lock-free approach requires a more complex operation: CAS, or compare-and-swap. Let's consider this operation in more detail, along with its possible uses and consequences.</p>
<h3 class="mume-header" id="cas">CAS</h3>
<p>CAS, or compare-and-swap, is the primary synchronization primitive for lock-free and wait-free algorithms. Its essence can be illustrated by the following Pseco:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">CAS(var, expected, new):
    # everything inside the scope is performed atomically
    atomic:
        if var.get() != expected:
            return false
        var.set(new)
        return true
</pre><p>A common optimization is to return not <code>true</code> or <code>false</code> but the previous value, because such operations are very often performed in a loop, and the expected value must otherwise be read before each attempt:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">CAS_optimized(var, expected, new):
    # everything inside the scope is performed atomically
    atomic:
        current = var.get()
        if current == expected:
            var.set(new)
        return current

# then CAS is expressed via CAS_optimized
CAS(var, expected, new):
    return var.CAS_optimized(expected, new) == expected
</pre><p>This approach saves one read per retry. In what follows we will use the simple <code>CAS</code> form; the optimization is easy to apply yourself if needed.</p>
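<p>In Pseco-style Python the saved read looks like this (a sketch; the <code>Var</code> class is a hypothetical in-memory stand-in): the value returned by a failed <code>CAS_optimized</code> becomes the <code>expected</code> of the next attempt, so the loop performs a single unconditional read.</p>

```python
class Var:
    """Hypothetical variable supporting the optimized CAS form."""
    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value

    def cas_optimized(self, expected, new):
        # Atomically: return the previous value; update only on match.
        current = self.value
        if current == expected:
            self.value = new
        return current


def increment(var):
    expected = var.get()            # the only unconditional read
    while True:
        current = var.cas_optimized(expected, expected + 1)
        if current == expected:     # CAS succeeded
            return
        expected = current          # reuse the returned value: no re-read


v = Var(41)
increment(v)
```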
<p>In distributed systems, each change is versioned. First we read the value from the consistent storage, obtaining the current version of the data. Then we try to write a new value, expecting the version to be unchanged. The version is incremented on every update:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">CAS_versioned(var, expected_version, new):
    atomic:
        if var.get_version() != expected_version:
            return false
        var.set(new, expected_version + 1)
        return true
</pre><p>This approach allows more precise control over updates and avoids the <a href="https://en.wikipedia.org/wiki/ABA_problem">ABA problem</a>. In particular, versioning is supported by etcd and ZooKeeper.</p>
<p>Note an important property that <code>CAS_versioned</code> provides: the operation can be repeated without violating the higher-level logic. In multithreaded programming this property has no special value, because there, if the operation fails, we know for sure it was not applied. In distributed systems this invariant is violated: the request may reach the recipient while the successful reply is lost. Therefore it is important to be able to retry requests without fear of breaking the invariants of the high-level logic.</p>
<p>This is exactly what <code>CAS_versioned</code> provides. In fact, you can repeat the operation indefinitely until a real answer from the recipient arrives. This, in turn, eliminates a whole class of errors related to network interaction, significantly simplifying the distributed system.</p>
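<p>A retry wrapper can be sketched as follows (the lossy channel is simulated, and all names are made up). The point is that repeating <code>CAS_versioned</code> is safe: a duplicate of an already-applied update simply fails the version check and, assuming a single writer per version, an advanced version confirms that our update went through.</p>

```python
class VersionedVar:
    """Sketch of a versioned cell, as provided by e.g. etcd or ZooKeeper."""
    def __init__(self, value):
        self.value, self.version = value, 0

    def cas_versioned(self, expected_version, new):
        if self.version != expected_version:
            return False
        self.value, self.version = new, expected_version + 1
        return True


class LostReply(Exception):
    """The request was applied, but the reply never arrived."""


def flaky_cas(var, expected_version, new, lose_reply):
    # Models a network where the reply can be lost AFTER the
    # update has already been applied on the server side.
    applied = var.cas_versioned(expected_version, new)
    if lose_reply.pop(0):
        raise LostReply()
    return applied


def cas_with_retries(var, expected_version, new, lose_reply):
    while True:
        try:
            return flaky_cas(var, expected_version, new, lose_reply)
        except LostReply:
            # Assuming a single writer per version: if the version
            # advanced, our update must have been applied.
            if var.version > expected_version:
                return True


v = VersionedVar(10)
# The first attempt is applied but its reply is lost;
# the retry confirms success via the advanced version.
ok = cas_with_retries(v, 0, 11, [True, False])
```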
<h3 class="mume-header" id="example">Example</h3>
<p>Let's take a look at how to transfer money from one account to another using the <code>CAS_versioned</code> operation and semi-transactions, where the accounts belong, for example, to different instances of etcd. Here I assume that <code>CAS_versioned</code> is already implemented appropriately on top of the provided API.</p>
<pre data-role="codeBlock" data-info="py" class="language-python">withdraw(from, amount):
    # CAS-loop
    while true:
        # obtain the version and the content
        version_from, amount_from = from.get_versioned()
        if amount_from < amount:
            return error.insufficient_funds
        if from.CAS_versioned(version_from, amount_from - amount):
            break
    return ok

deposit(to, amount):
    # CAS-loop
    while true:
        version_to, amount_to = to.get_versioned()
        if to.CAS_versioned(version_to, amount_to + amount):
            break
    return ok

transfer(from, to, amount):
    # 1st semi-transaction
    if withdraw(from, amount) is ok:
        # if the first semi-transaction succeeds
        # then perform the 2nd semi-transaction
        deposit(to, amount)
</pre><p>Here we split the operation into semi-transactions and perform each of them using <code>CAS_versioned</code>. This allows working with each account independently, which makes it possible to use heterogeneous storages not related to each other. The only problem awaiting us here is the loss of money if the current process fails between the semi-transactions.</p>
<h2 class="mume-header" id="queue">Queue</h2>
<p>To go further, we need to implement a queue. The idea is that communicating handlers need an ordered message queue that avoids data loss and duplication. Accordingly, all interaction in the chain of handlers will be built on top of this primitive. The queue is also a useful tool for analyzing and auditing incoming and outgoing data streams. In addition, mutations of the handler states can be performed through the queue as well.</p>
<p>The queue consists of a pair of operations:</p>
<ol>
<li>Adding a message to the end of the queue.</li>
<li>Extracting a message from the queue at a given index.</li>
</ol>
<p>In this context, I do not consider deleting messages from the queue for several reasons:</p>
<ol>
<li>Several handlers can read from the same queue. Synchronizing the removal would be a non-trivial, though not impossible, task.</li>
<li>It is useful to keep the queue for a relatively long interval (day or week) for the possibility of debugging and auditing. The usefulness of such a property is difficult to overestimate.</li>
<li>Old items can be deleted either on a periodic schedule or by using a TTL on the queue elements. It is important to make sure the handlers manage to process the data before the broom comes and cleans everything up. If the processing time is on the order of seconds and the TTL on the order of days, nothing like this should happen.</li>
</ol>
<p>To store the elements and effectively implement the addition, we need:</p>
<ol>
<li>A value storing the current index, which points to the end of the queue.</li>
<li>Elements of the queue, starting with the zero index.</li>
</ol>
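<p>On a plain key-value store this layout can be sketched with two kinds of keys (the naming scheme here is invented purely for illustration):</p>

```python
# Sketch of the queue layout on a plain key-value store.
# Key names are hypothetical; any unambiguous scheme works.
def index_key(queue_name):
    return f"{queue_name}/index"        # current end-of-queue index

def item_key(queue_name, i):
    return f"{queue_name}/item/{i}"     # the i-th element, starting at 0


kv = {}                                 # stand-in for the storage
kv[index_key("events")] = 2             # two elements have been pushed
kv[item_key("events", 0)] = "first"
kv[item_key("events", 1)] = "second"
```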
<h3 class="mume-header" id="quasi-lock-free-queue">Quasi Lock-Free Queue</h3>
<p>To insert an item into the queue, we need to update two keys: the current index and the element at that index. An obvious idea is to do it in the following order:</p>
<ol>
<li>First, increase the current index by one atomically using CAS.</li>
<li>Then, write the inserted element at the index from the previous step.</li>
</ol>
<p>However, this approach, oddly enough, has two fatal flaws:</p>
<ol>
<li><strong>This implementation is not lock-free</strong>. It would seem that if we insert several elements in parallel, at least one insertion succeeds. Lock-free? No! We have two operations: inserting and reading. And although the insert itself is lock-free, the combination of inserting and reading is not. This is easy to see if we assume that immediately after the atomic update of the index a delay of eternal length occurs. Then nobody will ever be able to read this and the subsequent elements: readers are locked forever. This poses a serious problem for the availability of our queue, because if a handler fails at this point, other handlers get stuck reading the value at this position.</li>
<li><strong>Problems with the interaction of several queues</strong>. If the handler fails after updating the index, we do not know which index to use for writing the value when we resume after a checkpoint. The index is lost forever.</li>
</ol>
<p>Thus, it is extremely important to remain lock-free with respect to all operations in order to preserve the high availability and fault tolerance of the system.</p>
<h3 class="mume-header" id="lock-free-queue">Lock-free Queue</h3>
<p>Accordingly, arguing logically, only one other order of implementation remains: the reverse one. We first add the element to the end of the queue and only then update the index:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">push(queue, value):
    index = queue.get_current_index()
    while true:
        # get a variable pointing to the queue item
        var = queue.at(index)
        # version = 0 corresponds to a new value,
        # meaning the queue item must be empty at the time of writing
        if var.CAS_versioned(0, value):
            # CAS succeeded => update the index
            queue.update_index(index + 1)
            break
        # here is a tricky moment, see description below
        index = max(queue.get_current_index(), index + 1)

update_index(queue, index):
    while true:
        # get the current versioned value
        cur_index, version = queue.get_current_index_versioned()
        # the current index may suddenly turn out to be larger,
        # see description below
        if cur_index >= index:
            # someone proactively updated the index to a more recent one,
            # so the work is done
            break
        if queue.current_index_var().CAS_versioned(version, index):
            # index updated, the work is completed
            break
        # the index has been updated by someone else,
        # but it is still outdated, so try again
</pre><p>The tricky moment is worth clarifying. After the successful execution of the first semi-transaction (writing the element), the handler may fail or be delayed (a handler failure is, generally speaking, a special case of an infinite delay). We want to preserve the lock-free property of our system. What happens in this case?</p>
<p>Without the <code>max</code> trick, the next <code>push</code> operation would spin in its loop endlessly, because the failed handler will never update the current index. Therefore updating the index becomes our task, and we must do it proactively, looking for the next free element of the queue on our own.</p>
<p>After writing the value, we now need to update the current index. However, pitfalls await us here as well: we cannot just overwrite the value. If the handler is delayed between the semi-transactions, someone else could in the meantime add an element to the queue and update the current index. A blind update would then overwrite the newer index with an older one. Hence we must update the index only to a more recent value. Which value is more recent? The one with the higher index, because the index corresponds to a position in the queue, and the higher the position, the more recent the data written to the queue.</p>
<p>Note that we store the index only to be able to find the end of the queue quickly for further additions. It is just an optimization: without the index the queue also works correctly, only much more slowly, degrading linearly with the growth of the queue. Therefore a lagging index value does not break consistency; it only affects the performance of queue operations.</p>
<p>Moreover, some storages provide a way to iterate through the records. By organizing the keys of the queue elements in a suitable way, you can find the last element immediately, without scanning the previous ones. This would expand the requirements on the storage, so it will not be considered here; we confine ourselves to the most common approach, which works everywhere.</p>
<h2 class="mume-header" id="interaction-of-queues">Interaction of Queues</h2>
<p>In order to proceed, consider the following problem, which will be useful later.</p>
<p><em>Task</em>. Transfer values from one queue to another.</p>
<p>This is the simplest task that can occur when processing data:</p>
<ol>
<li>There is no state, i.e. stateless handler.</li>
<li>No transformations: the value read and the value written are identical.</li>
</ol>
<p>I think it goes without saying that we want a fault-tolerant solution with the <em>concurrent exactly-once</em> guarantee.</p>
<p>Without this requirement, processing would look like this:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">handle<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token punctuation">,</span> output<span class="token punctuation">)</span><span class="token punctuation">:</span>
index <span class="token operator">=</span> <span class="token number">0</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
value <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">.</span>get<span class="token punctuation">(</span>index<span class="token punctuation">)</span>
output<span class="token punctuation">.</span>push<span class="token punctuation">(</span>value<span class="token punctuation">)</span>
index <span class="token operator">+=</span> <span class="token number">1</span>
</pre><p>Let's add just a little resilience. To do this, we need to load and save the state of the handler:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">handle<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token punctuation">,</span> output<span class="token punctuation">,</span> state<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># state is represented by the index</span>
index <span class="token operator">=</span> state<span class="token punctuation">.</span>get<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
value <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">.</span>get<span class="token punctuation">(</span>index<span class="token punctuation">)</span>
output<span class="token punctuation">.</span>push<span class="token punctuation">(</span>value<span class="token punctuation">)</span>
index <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token comment"># save the index in the state</span>
state<span class="token punctuation">.</span><span class="token builtin">set</span><span class="token punctuation">(</span>index<span class="token punctuation">)</span>
</pre><p>This implementation is not <em>exactly-once</em>: if the handler crashes immediately after adding an element to the output queue, but before saving the position, we get a duplicate on restart.</p>
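<p>This failure can be reproduced with a tiny simulation (a hypothetical in-memory model; the names are illustrative, not a real storage API). We run one step of the naive handler, "crash" it before the index is saved, restart, and observe the duplicate:</p>

```python
def naive_transfer_step(input_q, output_q, state, crash_before_save=False):
    # one iteration of the naive handler: read, push, save the position
    index = state["index"]
    output_q.append(input_q[index])
    if crash_before_save:
        return  # simulated crash: the new index is never persisted
    state["index"] = index + 1

input_q, output_q, state = ["a"], [], {"index": 0}
naive_transfer_step(input_q, output_q, state, crash_before_save=True)
naive_transfer_step(input_q, output_q, state)  # restart re-reads index 0
# output_q is now ["a", "a"]: the element was pushed twice
```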
<p>To achieve the <em>exactly-once</em> guarantee, the index must be stored and the value written to the queue transactionally. Since, generally speaking, the queues and the state can live in different stores with no distributed transactions between them, the only remaining option is to break this transaction into semi-transactions:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># returns the smallest possible index to insert a new value</span>
get_next_index<span class="token punctuation">(</span>queue<span class="token punctuation">)</span><span class="token punctuation">:</span>
index <span class="token operator">=</span> queue<span class="token punctuation">.</span>get_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment"># try to find an empty item</span>
<span class="token keyword">while</span> queue<span class="token punctuation">.</span>has<span class="token punctuation">(</span>index<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># update the index similar to queue.push</span>
index <span class="token operator">=</span> <span class="token builtin">max</span><span class="token punctuation">(</span>index <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">,</span> queue<span class="token punctuation">.</span>get_index<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">return</span> index
<span class="token comment"># write the value at the specified index</span>
<span class="token comment"># returns true on success</span>
push_at<span class="token punctuation">(</span>queue<span class="token punctuation">,</span> value<span class="token punctuation">,</span> index<span class="token punctuation">)</span><span class="token punctuation">:</span>
var <span class="token operator">=</span> queue<span class="token punctuation">.</span>at<span class="token punctuation">(</span>index<span class="token punctuation">)</span>
<span class="token keyword">if</span> var<span class="token punctuation">.</span>CAS_versioned<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> value<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># update the index</span>
queue<span class="token punctuation">.</span>update_index<span class="token punctuation">(</span>index <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span>
<span class="token keyword">return</span> true
<span class="token keyword">return</span> false
handle<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token punctuation">,</span> output<span class="token punctuation">,</span> state<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># load a state</span>
<span class="token comment"># initially {PREPARING, 0}</span>
fsm_state <span class="token operator">=</span> state<span class="token punctuation">.</span>get<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
switch fsm_state<span class="token punctuation">:</span>
case <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> input_index<span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># prepare for writing: save the index,</span>
<span class="token comment"># that will be used for writing</span>
output_index <span class="token operator">=</span> output<span class="token punctuation">.</span>get_next_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> input_index<span class="token punctuation">,</span> output_index<span class="token punctuation">}</span>
case <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> input_index<span class="token punctuation">,</span> output_index<span class="token punctuation">}</span><span class="token punctuation">:</span>
value <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">.</span>get<span class="token punctuation">(</span>input_index<span class="token punctuation">)</span>
<span class="token keyword">if</span> output<span class="token punctuation">.</span>push_at<span class="token punctuation">(</span>value<span class="token punctuation">,</span> output_index<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># push succeeds, goto next item</span>
input_index <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token comment"># if the item was not empty push_at returns false,</span>
<span class="token comment"># and we need to retry using the same input_index</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> input_index<span class="token punctuation">}</span>
state<span class="token punctuation">.</span><span class="token builtin">set</span><span class="token punctuation">(</span>fsm_state<span class="token punctuation">)</span>
</pre><p>What are the cases when <code>push_at</code> returns <code>false</code>? After all, at the previous step we checked that the cell at the queue index is free. The point is that, generally speaking, several handlers can write to the output queue. If so, while the finite state machine moves to the next step, this cell may already have been written by another handler. In that case we simply repeat the process with the same <code>input_index</code>. Such a conflict can occur only if some other handler succeeds, so we obtain lock-free behavior.</p>
<p>In fact, we split the operation into two semi-transactions:</p>
<ol>
<li>Prepare for writing: save the output index to avoid the duplicates.</li>
<li>Write the desired value using the saved index.</li>
</ol>
<p>The only thing left is to add the <em>concurrent</em> property to the <em>exactly-once</em> guarantee.</p>
<p>What are the problems with the code above? There are two:</p>
<ol>
<li>At the moment of writing to the queue, it may turn out that another handler, processing the same input, has already written exactly the same value, so <code>push_at</code> returns <code>false</code>. We would then return to the previous step and push the same value a second time.</li>
<li>The state can be updated by two different handlers, which would overwrite each other's data. This, in turn, can lead to all sorts of race conditions.</li>
</ol>
<p>Why is it important to support precisely <em>concurrent exactly-once</em> in this case? The point is that a distributed system cannot guarantee that at every moment there is at most one instance of an equivalent handler running, because it is impossible to guarantee the termination of a handler in the event of a network split. Therefore, for any split of the transaction into parts, concurrent processing must be assumed.</p>
<p>The following code demonstrates the final solution of the task, taking into account the above issues:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># either write to an empty cell, or check that the value is already written</span>
<span class="token comment"># i.e. if the function returns true,</span>
<span class="token comment"># subsequent calls will also return true.</span>
<span class="token comment"># the same property holds for false</span>
push_at_idempotent<span class="token punctuation">(</span>queue<span class="token punctuation">,</span> value<span class="token punctuation">,</span> index<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token keyword">return</span> queue<span class="token punctuation">.</span>push_at<span class="token punctuation">(</span>value<span class="token punctuation">,</span> index<span class="token punctuation">)</span> <span class="token operator">or</span> queue<span class="token punctuation">.</span>get<span class="token punctuation">(</span>index<span class="token punctuation">)</span> <span class="token operator">==</span> value
handle<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token punctuation">,</span> output<span class="token punctuation">,</span> state<span class="token punctuation">)</span><span class="token punctuation">:</span>
version<span class="token punctuation">,</span> fsm_state <span class="token operator">=</span> state<span class="token punctuation">.</span>get_versioned<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
switch fsm_state<span class="token punctuation">:</span>
case <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> input_index<span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># prepare for writing: save the index,</span>
<span class="token comment"># that will be used for writing</span>
output_index <span class="token operator">=</span> output<span class="token punctuation">.</span>get_next_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> input_index<span class="token punctuation">,</span> output_index<span class="token punctuation">}</span>
case <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> input_index<span class="token punctuation">,</span> output_index<span class="token punctuation">}</span><span class="token punctuation">:</span>
value <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">.</span>get<span class="token punctuation">(</span>input_index<span class="token punctuation">)</span>
<span class="token comment"># use idempotent function</span>
<span class="token comment"># thus the entire step becomes idempotent</span>
<span class="token keyword">if</span> output<span class="token punctuation">.</span>push_at_idempotent<span class="token punctuation">(</span>value<span class="token punctuation">,</span> output_index<span class="token punctuation">)</span><span class="token punctuation">:</span>
input_index <span class="token operator">+=</span> <span class="token number">1</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> input_index<span class="token punctuation">}</span>
<span class="token comment"># try to atomically change the state</span>
<span class="token keyword">if</span> state<span class="token punctuation">.</span>CAS_versioned<span class="token punctuation">(</span>version<span class="token punctuation">,</span> fsm_state<span class="token punctuation">)</span><span class="token punctuation">:</span>
version <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token keyword">else</span><span class="token punctuation">:</span>
<span class="token comment"># was a concurrent mutation, restore the state</span>
version<span class="token punctuation">,</span> fsm_state <span class="token operator">=</span> state<span class="token punctuation">.</span>get_versioned<span class="token punctuation">(</span><span class="token punctuation">)</span>
</pre><p>The corresponding state diagram is represented here:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihfsia-fJ1JSgirGfouvSUtiyPonRu65QzgYAHqlCfkiUiAjGIecAnJdOeiatdrQEFz2ui5qMWOuPRXTEhbV5BMR0Gil6eF-NlzREskqDhriWyNzV5J2_HWUmoeiUTNiHjlBEf_SU7V6Y/s1600/simple.png" alt="Simple"></p>
<p>The basic idea is to make each action idempotent. This is necessary both for concurrency and for the correct continuation of execution after the failure and subsequent state recovery.</p>
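<p>The whole construction rests on the versioned compare-and-swap over the state. A minimal in-memory model of its semantics might look as follows (the class and method names mirror the pseudocode but are otherwise my own; a real storage would implement this atomically on the server side):</p>

```python
class VersionedCell:
    """In-memory model of a versioned value with CAS semantics."""

    def __init__(self, value):
        self._version = 0
        self._value = value

    def get_versioned(self):
        # return the current version together with the value
        return self._version, self._value

    def CAS_versioned(self, expected_version, new_value):
        # install new_value only if nobody has changed the cell
        # since the caller observed expected_version
        if self._version != expected_version:
            return False
        self._version += 1
        self._value = new_value
        return True
```

<p>A handler that loses the CAS race simply re-reads the cell and replays its idempotent step, which is exactly the <code>else</code> branch of the handler.</p>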
<p>At the same time, such an algorithm is not afraid of any external or internal factors: kernel panics, sudden application crashes, network timeouts, and so on. You can always restart from the very beginning and continue as if nothing had happened. A hard termination will not lose any data and will not lead to duplicates or inconsistencies. You can even update the application without stopping processing: launch the new version alongside the old one and then terminate the old one. Of course, the new and old versions must be compatible with each other.</p>
<p>Thus, such a handler provides <strong>absolute stability</strong> with respect to failures, delays, and concurrent executions.</p>
<h2 class="mume-header" id="solution-of-the-initial-task">Solution of the Initial Task</h2>
<p>Now we are ready to solve our initial task: the implementation of a stateful handler with the specific logic.</p>
<p>For this, we solve a slightly more general task: a user-specific handler reads from the input queues and produces a changed state together with output values to be pushed to the output queues:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># input parameters:</span>
<span class="token comment"># - input_queues - input queues</span>
<span class="token comment"># - output_queues - output queues</span>
<span class="token comment"># - state - the current state of the handler</span>
<span class="token comment"># - handler - user handler with the type: state, inputs -> state, outputs</span>
handle<span class="token punctuation">(</span>input_queues<span class="token punctuation">,</span> output_queues<span class="token punctuation">,</span> state<span class="token punctuation">,</span> handler<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># get the current FSM state and its version</span>
version<span class="token punctuation">,</span> fsm_state <span class="token operator">=</span> state<span class="token punctuation">.</span>get_versioned<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
switch fsm_state<span class="token punctuation">:</span>
<span class="token comment"># input_indexes contains a list of current indexes of the input queues</span>
case <span class="token punctuation">{</span>HANDLING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># read values from each input queue</span>
inputs <span class="token operator">=</span> <span class="token punctuation">[</span>queue<span class="token punctuation">.</span>get<span class="token punctuation">(</span>index<span class="token punctuation">)</span> <span class="token keyword">for</span> queue<span class="token punctuation">,</span> index
<span class="token keyword">in</span> <span class="token builtin">zip</span><span class="token punctuation">(</span>input_queues<span class="token punctuation">,</span> input_indexes<span class="token punctuation">)</span><span class="token punctuation">]</span>
<span class="token comment"># calculate next indexes by increasing the current values</span>
next_indexes <span class="token operator">=</span> <span class="token builtin">next</span><span class="token punctuation">(</span>inputs<span class="token punctuation">,</span> input_indexes<span class="token punctuation">)</span>
<span class="token comment"># invoke user handler obtaining output values</span>
user_state<span class="token punctuation">,</span> outputs <span class="token operator">=</span> handler<span class="token punctuation">(</span>user_state<span class="token punctuation">,</span> inputs<span class="token punctuation">)</span>
<span class="token comment"># proceed to prepare for writing the results,</span>
<span class="token comment"># starting at zero position</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> next_indexes<span class="token punctuation">,</span> outputs<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">}</span>
case <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">,</span> outputs<span class="token punctuation">,</span> output_pos<span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># get the index to write the value</span>
output_index <span class="token operator">=</span> output_queues<span class="token punctuation">[</span>output_pos<span class="token punctuation">]</span><span class="token punctuation">.</span>get_next_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment"># switch to next step for writing</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>
WRITING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">,</span>
outputs<span class="token punctuation">,</span> output_pos<span class="token punctuation">,</span> output_index
<span class="token punctuation">}</span>
case <span class="token punctuation">{</span>
WRITING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">,</span>
outputs<span class="token punctuation">,</span> output_pos<span class="token punctuation">,</span> output_index
<span class="token punctuation">}</span><span class="token punctuation">:</span>
value <span class="token operator">=</span> outputs<span class="token punctuation">[</span>output_pos<span class="token punctuation">]</span>
<span class="token comment"># write the value to the output queue</span>
<span class="token keyword">if</span> output_queues<span class="token punctuation">[</span>output_pos<span class="token punctuation">]</span><span class="token punctuation">.</span>push_at_idempotent<span class="token punctuation">(</span>
value<span class="token punctuation">,</span> output_index
<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># goto next value on success</span>
output_pos <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token comment"># otherwise just goto PREPARING without position update</span>
<span class="token comment"># in case of increasing the output_pos</span>
<span class="token comment"># it's necessary to break the loop</span>
fsm_state <span class="token operator">=</span> <span class="token keyword">if</span> output_pos <span class="token operator">==</span> <span class="token builtin">len</span><span class="token punctuation">(</span>outputs<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># all results have been written</span>
<span class="token comment"># goto handling phase</span>
<span class="token punctuation">{</span>HANDLING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">}</span>
<span class="token keyword">else</span><span class="token punctuation">:</span>
<span class="token comment"># go here if necessary to write next output value,</span>
<span class="token comment"># or to repeat the preparation step</span>
<span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> user_state<span class="token punctuation">,</span> input_indexes<span class="token punctuation">,</span> outputs<span class="token punctuation">,</span> output_pos<span class="token punctuation">}</span>
<span class="token keyword">if</span> state<span class="token punctuation">.</span>CAS_versioned<span class="token punctuation">(</span>version<span class="token punctuation">,</span> fsm_state<span class="token punctuation">)</span><span class="token punctuation">:</span>
version <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token keyword">else</span><span class="token punctuation">:</span>
<span class="token comment"># was a concurrent mutation, restore the state</span>
version<span class="token punctuation">,</span> fsm_state <span class="token operator">=</span> state<span class="token punctuation">.</span>get_versioned<span class="token punctuation">(</span><span class="token punctuation">)</span>
</pre><p>The state diagram looks like this:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLnf_xtVB3N6qGl51mHHxu70gTv65zlAjrolC1GIuIpkgjSCITOGQx3568Z9kwgaRgzA3B8DuFlDQPBWtVQHA79v-_rAbgvrmwB87J4GQbJYl9fj8zTvJVh0zW12YNWNOFl-O-DUTG_JA/s1600/complex.png" alt="final"></p>
<p>Here we have a new state: <code>HANDLING</code>. This state is necessary for committing the execution results of our handler, since, generally speaking, it can contain nondeterministic actions. Moreover, this is just our case. In addition to this, it can be seen that the <code>PREPARING</code> and <code>WRITING</code> phases are repeated several times until all the values have been written to the output queue. Once all the values are written, then the handler immediately goes to the <code>HANDLING</code> phase.</p>
<p>It is worth noting that I did not handle the situations where the input queues have no values, or where the handler returns empty values for the output queues. This is intentional: it avoids unnecessary complexity and keeps the resulting code readable. I think the reader can cope with such situations independently and process them correctly.</p>
<p>Another nuance is also worth noting: writing to the database happens through the output queue. This allows us to write generic code and to separate the processing from the writing to an external database.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqG9U7VIGnrPcYsqdkpMh95dBwWDZRV-RJLBfD8M2HzJ7_Qukd2luoVdI6OrCuAbe_flYaB0jiXKwHR4ul8aSp_M5G0rpxYtH98SypiiHc5U7FWIydF_4r2Pxp_VAEwZls4dpAe-Y-4nw/s1600/scheme_final.png" alt="final"></p>
<p>Now we can write our handler containing specific logic solving our task:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">my_handler<span class="token punctuation">(</span>state<span class="token punctuation">,</span> inputs<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># add values from input streams</span>
state<span class="token punctuation">.</span>queue<span class="token punctuation">.</span>push<span class="token punctuation">(</span>inputs<span class="token punctuation">)</span>
<span class="token comment"># update the window according to duration</span>
state<span class="token punctuation">.</span>queue<span class="token punctuation">.</span>trim_time_window<span class="token punctuation">(</span>duration<span class="token punctuation">)</span>
<span class="token comment"># calculate the average</span>
avg <span class="token operator">=</span> state<span class="token punctuation">.</span>queue<span class="token punctuation">.</span>avg<span class="token punctuation">(</span><span class="token punctuation">)</span>
need_update_counter <span class="token operator">=</span> state<span class="token punctuation">.</span>queue<span class="token punctuation">.</span>size<span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">></span> size_boundary
<span class="token keyword">return</span> state<span class="token punctuation">,</span> <span class="token punctuation">[</span>
avg<span class="token punctuation">,</span>
<span class="token keyword">if</span> need_update_counter<span class="token punctuation">:</span>
true
<span class="token keyword">else</span><span class="token punctuation">:</span>
<span class="token comment"># none means there is no need to add an element</span>
none
<span class="token punctuation">]</span>
</pre><p>As you can see, the handler just does its job, while the complexity of manipulating queues and implementing a <em>concurrent exactly-once</em> guarantee is encapsulated inside the function <code>handle</code>.</p>
<p>Now you just need to add the interaction with the database:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">handle_db<span class="token punctuation">(</span>input_queue<span class="token punctuation">,</span> db<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
<span class="token comment"># at the very beginning, we create a transaction</span>
tx <span class="token operator">=</span> db<span class="token punctuation">.</span>begin_transaction<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment"># read the current index inside the transaction.</span>
<span class="token comment"># the current index is stored in the database,</span>
<span class="token comment"># allowing updating state transactionally</span>
index <span class="token operator">=</span> tx<span class="token punctuation">.</span>get_current_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment"># write the increased index</span>
tx<span class="token punctuation">.</span>write_current_index<span class="token punctuation">(</span>index <span class="token operator">+</span> <span class="token number">1</span><span class="token punctuation">)</span>
<span class="token comment"># get the value from the input queue</span>
value <span class="token operator">=</span> input_queue<span class="token punctuation">.</span>get<span class="token punctuation">(</span>index<span class="token punctuation">)</span>
<span class="token keyword">if</span> value<span class="token punctuation">:</span>
<span class="token comment"># increase the counter</span>
tx<span class="token punctuation">.</span>increment_counter<span class="token punctuation">(</span><span class="token punctuation">)</span>
tx<span class="token punctuation">.</span>commit<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token comment"># either the transaction is successful,</span>
<span class="token comment"># and the counter is updated together with the index,</span>
<span class="token comment"># or the transaction is aborted and we just repeat it again</span>
</pre><p>There are no surprises here. Since all the state is updated within a single transaction, this handler can run in parallel with itself and therefore provides the <em>concurrent exactly-once</em> guarantee by definition. This implementation immediately demonstrates the usefulness of transactions.</p>
<h2 class="mume-header" id="out-of-the-scope">Out of the Scope</h2>
<p>The above algorithm is only the first step on the way to efficient and transactional data processing. Below I will give a list of possible optimizations and improvements that are useful in some cases, with minimal comments and considerations.</p>
<h3 class="mume-header" id="storage-optimizations">Storage Optimizations</h3>
<p>Consistent storages typically support richer functionality, such as transactional behavior over a limited key set, batched atomic actions, and range scans. I have considered only the most general storage with the simplest primitives and have shown that even in this case it is possible to build a transactional, scalable system.</p>
<h3 class="mume-header" id="asynchronous-publishing">Asynchronous Publishing</h3>
<p>After the input streams are processed, the results are published to the output queues. This publication is performed sequentially: each subsequent record waits for the previous one. Because the data itself is already saved and ready for pushing, the idea arises to parallelize this step. Here it is unimaginably easy to shoot yourself in both legs and both hands at once, so I leave it as homework.</p>
<h3 class="mume-header" id="batching">Batching</h3>
<p>An obvious optimization for increasing the throughput of queues is the batching of messages. Indeed, in the queue, it is possible to push not the values themselves, but references to groups of values. The values themselves can be prepared little by little, with the possibility of storing them on separate shards, storages or even files. In this case, the pointer to the element of the queue will become a composite pointer. In addition to the index in the queue, it should also contain a position in the batch.</p>
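<p>A sketch of such a composite pointer (the names <code>BatchPointer</code>, <code>advance</code>, and <code>read</code> are illustrative): iteration walks through the current batch first, then steps to the next element of the queue.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BatchPointer:
    """Composite pointer: position in the queue plus position in the batch."""
    queue_index: int
    batch_offset: int

def read(ptr, batches):
    return batches[ptr.queue_index][ptr.batch_offset]

def advance(ptr, batches):
    """Move to the next value, stepping into the next batch when the
    current one is exhausted. Returns None at the end of the queue."""
    batch = batches[ptr.queue_index]
    if ptr.batch_offset + 1 < len(batch):
        return BatchPointer(ptr.queue_index, ptr.batch_offset + 1)
    if ptr.queue_index + 1 < len(batches):
        return BatchPointer(ptr.queue_index + 1, 0)
    return None

# the queue stores references to groups of values, not the values themselves
batches = [["a", "b"], ["c"], ["d", "e"]]
ptr = BatchPointer(0, 0)
seen = []
while ptr is not None:
    seen.append(read(ptr, batches))
    ptr = advance(ptr, batches)
```

In a real system the batches themselves could live on separate shards or files; only the small composite pointers travel through the queue.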
<h3 class="mume-header" id="double-sharding">Double Sharding</h3>
<p>To parallelize processing, sharding is often used. However, if there are a large number of handlers, they all end up writing to the same queue. To avoid unnecessary contention, you can shard the queue once again, this time for writing, in addition to the sharding of reads.</p>
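<p>A minimal illustration of the idea (the class and its methods are hypothetical): writers are spread across several sub-queues, and the reader side merges them, trading cross-shard ordering for reduced write contention.</p>

```python
class ShardedQueue:
    """Queue sharded for writing: each writer picks a sub-queue,
    so concurrent writers do not contend on a single tail."""
    def __init__(self, write_shards):
        self.shards = [[] for _ in range(write_shards)]

    def push(self, writer_id, value):
        # writers with different ids append to different shards
        self.shards[writer_id % len(self.shards)].append(value)

    def drain(self):
        # the reader merges all shards; ordering inside a shard is
        # preserved, ordering across shards is not guaranteed
        out = []
        for shard in self.shards:
            out.extend(shard)
        return out

q = ShardedQueue(write_shards=4)
for writer in range(8):
    q.push(writer, f"msg-{writer}")
drained = q.drain()
```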
<h2 class="mume-header" id="fundamentality">Fundamentality</h2>
<p>Let us discuss the basis for applying this approach. Clearly, not every transaction can be broken into a set of atomic operations. The reason is trivial: if this could be done in every situation, it would always be done this way. Therefore, it is important to outline the class of problems that can be solved using this approach.</p>
<p>If we carefully look at the actions that we are taking, we can see a number of characteristic features:</p>
<ol>
<li>Transactions are split into semi-transactions, which are executed sequentially. The total effect of all semi-transactions is exactly the same as the effect of the entire transaction.</li>
<li>Isolation is not an important requirement: the client may observe the intermediate actions of the transaction, as if they were visible to everyone.</li>
<li>The first, and only the first, semi-transaction can verify the validity of the subsequent actions. If the validation fails, we simply do not start the follow-up actions. However, once the transaction has been started by applying the first semi-transaction, there is no possibility to terminate the execution. So the subsequent semi-transactions only apply the subsequent actions, moving the execution forward. This is due to a simple fact: every mutation is visible to the client.</li>
</ol>
<p>The latter property can be slightly weakened, however, firstly, this is not always possible, and secondly, it can greatly complicate the code.</p>
<p>Let's look at an example of why the separation into semi-transactions is possible:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">transfer<span class="token punctuation">(</span><span class="token keyword">from</span><span class="token punctuation">,</span> to<span class="token punctuation">,</span> amount<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># 1st semi-transaction</span>
<span class="token keyword">if</span> withdraw<span class="token punctuation">(</span><span class="token keyword">from</span><span class="token punctuation">,</span> amount<span class="token punctuation">)</span> <span class="token keyword">is</span> ok<span class="token punctuation">:</span>
<span class="token comment"># if the first semi-transaction succeeds</span>
<span class="token comment"># then perform the 2nd semi-transaction</span>
deposit<span class="token punctuation">(</span>to<span class="token punctuation">,</span> amount<span class="token punctuation">)</span>
</pre><p>Here, the <code>withdraw</code> check may not pass, while <code>deposit</code> never refuses: who refuses extra money? However, if the function <code>deposit</code> can for some reason return a failure (for example, the account was blocked, or there is an upper limit on the amount of funds), then there are problems. It would seem that they could be solved by transferring the funds back, but who said that at that moment the original account was not blocked? You can easily end up in a situation where the transaction hangs and the funds have to be redirected somewhere else, but in manual mode.</p>
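<p>As a toy in-memory sketch (no persistence, queues, or retries, so this shows only the shape of the split, not the reliable queue-based version from the challenges), the transfer above can be written as:</p>

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

def withdraw(acc, amount):
    # 1st semi-transaction: the only place where validation can fail
    if acc.balance < amount:
        return False
    acc.balance -= amount
    return True

def deposit(acc, amount):
    # 2nd semi-transaction: never refuses, only moves execution forward
    acc.balance += amount

def transfer(src, dst, amount):
    # if the first semi-transaction fails, nothing has happened yet;
    # once it succeeds, the second one is unconditionally applied
    if withdraw(src, amount):
        deposit(dst, amount)
        return True
    return False

a, b = Account(100), Account(0)
ok = transfer(a, b, 30)
bad = transfer(a, b, 1000)
```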
<p>Real-time data processing is, in my opinion, a reference example where this approach works perfectly. Indeed, at the very beginning we check whether there is any data that needs to be processed. If there is none, we do not start anything. If there is data, we run the handler, save the result, and then write it to the output queues sequentially. Because the queues are unbounded, writing to them always eventually succeeds, and hence the transactional behavior will sooner or later be completed. An intermediate state can be observed along the way, but this will not cause any dissatisfaction: complaining about inconsistency between shards that have no distributed transactions is an oxymoron.</p>
<h3 class="mume-header" id="two-phase-lock-free-commit">Two-Phase Lock-Free Commit</h3>
<p>Since we are discussing transactional behavior, it would be a good idea to examine the two-phase commit.</p>
<p>It usually consists of two phases: first, we lock the records and check whether the commit can be executed; then, if the previous phase passed successfully, we apply the changes, simultaneously unlocking the records. In this sense, transactions based on a two-phase commit implement an optimistic-pessimistic locking scheme:</p>
<ol>
<li><em>Optimism</em>. From the client's point of view, during the execution of the transaction, we do not lock the records but only save, for example, the versions or timestamps, for subsequent validation on the commit.</li>
<li><em>Pessimism</em>. During the distributed transaction commit, we begin to lock the records.</li>
</ol>
<p>Some additional details can be read, for example, <a href="http://gridem.blogspot.com/2018/04/attainability-of-lower-bound-of.html">here</a>.</p>
<p>Of course, this is a somewhat voluntaristic explanation of the concepts of optimism and pessimism; they can only be applied to the transaction itself, but not to its individual parts, such as a commit. However, the commit phase can be viewed as a separate transaction, returning these concepts to their original meaning.</p>
<p>A pessimistic scheme of two-phase commit hints at a simple fact: this action is not lock-free by definition, which can significantly reduce the processing speed in case of random delays or failures. Moreover, such transactions cannot be executed concurrently, because they will only interfere with each other, producing conflicts instead of a boost.</p>
<p>In the case of semi-transactions based on CAS operations, you can also see a number of similar features. Recall how the transactional write occurs to a queue:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># here is only the fragments of code that we are interested in</span>
handle<span class="token punctuation">(</span><span class="token builtin">input</span><span class="token punctuation">,</span> output<span class="token punctuation">,</span> state<span class="token punctuation">)</span><span class="token punctuation">:</span>
<span class="token comment"># ...</span>
<span class="token keyword">while</span> true<span class="token punctuation">:</span>
switch fsm_state<span class="token punctuation">:</span>
case <span class="token punctuation">{</span>HANDLING<span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># handle the data and save the result</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">}</span>
case <span class="token punctuation">{</span>PREPARING<span class="token punctuation">,</span> input_index<span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># prepare for writing...</span>
output_index <span class="token operator">=</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>get_next_index<span class="token punctuation">(</span><span class="token punctuation">)</span>
fsm_state <span class="token operator">=</span> <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> output_index<span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">}</span>
case <span class="token punctuation">{</span>WRITING<span class="token punctuation">,</span> output_index<span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">}</span><span class="token punctuation">:</span>
<span class="token comment"># actual write that uses output_index</span>
</pre><p>In fact, here we have the following. After processing the data, we want to commit the result to the output queues. The commit process takes place in two phases:</p>
<ol>
<li><strong>PREPARING</strong>. Obtain an index that will be used to write the result.</li>
<li><strong>WRITING</strong>. Store the result at the obtained index. On conflict, the transaction will be repeated starting from the <strong>PREPARING</strong> phase.</li>
</ol>
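<p>The two phases can be modeled with a toy single-threaded queue (illustrative names; a real storage would execute the slot write as an atomic CAS): on a conflict nothing is locked, the writer simply repeats from <strong>PREPARING</strong>.</p>

```python
class CASQueue:
    """Append-only queue where a slot can be written at most once."""
    def __init__(self):
        self.slots = {}

    def get_next_index(self):
        # PREPARING: a non-intrusive read, does not lock or change the queue
        return len(self.slots)

    def try_write(self, index, value):
        # WRITING: succeeds only if the slot is still empty (CAS semantics)
        if index in self.slots:
            return False
        self.slots[index] = value
        return True

def publish(queue, value):
    """Lock-free two-phase commit of a value to the queue:
    prepare (obtain an index), then write; on conflict, repeat
    from the PREPARING phase."""
    while True:
        index = queue.get_next_index()      # phase 1: PREPARING
        if queue.try_write(index, value):   # phase 2: WRITING
            return index
        # another writer took the slot: retry; nothing was locked

q = CASQueue()
indices = [publish(q, v) for v in ["x", "y", "z"]]
```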
<p>This is very similar to what happens during a two-phase commit. Indeed, during the first phase we prepare the necessary data for writing, and during the second phase we perform the write. However, there are fundamental differences:</p>
<ol>
<li>Obtaining an index during the first phase does not lock the output queue. Moreover, this phase is non-intrusive because it generally does not change the state of the queue; all actions occur in the second phase.</li>
<li>In the classical two-phase commit, after a successful first phase the second phase is applied unconditionally, i.e. the second phase does not have the possibility to fail. In our case, however, the second phase may fail, and the action must then be repeated.</li>
</ol>
<p>Thus, in the lock-free version of the two-phase commit, the first action does not lock the state, which makes it possible to perform the transaction actions completely optimistically, increasing the availability of the data for change.</p>
<h2 class="mume-header" id="consistency-requirements">Consistency Requirements</h2>
<p>Let's discuss the required consistency levels of the storage. An interesting point is that the safety of the algorithm is not violated even in the case of <em>Stale Reads</em>. The most important thing is to write the data correctly through the CAS operation: between the read of the value and its write there must be no intermediate changes. This leads us to the following possible consistency levels and storages:</p>
<ul>
<li><em>Distributed single register</em>: storages based on atomic register change (for example, Etcd and Zookeeper):
<ol>
<li>Linearizability</li>
<li>Sequential consistency</li>
</ol>
</li>
<li><em>Transactional</em>: storages with transactional behavior (for example, MySQL, PostgreSQL, etc.):
<ol>
<li>Serializability</li>
<li>Snapshot Isolation</li>
<li>Repeatable Read</li>
<li>Read Committed</li>
</ol>
</li>
<li><em>Distributed Transactional</em>: NewSQL storage:
<ol>
<li>Strict Consistency</li>
<li>Any of the above</li>
</ol>
</li>
</ul>
<p>However, the question arises: how does the consistency level affect the system? The answer is simple: only performance is affected. If we read stale data, then during the CAS operation we immediately obtain a conflict and all the work has to be thrown away. Therefore, it makes sense to consider stricter consistency levels, for example, at least <em>Read My Writes</em>.</p>
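<p>A tiny sketch of why stale reads cost only performance (a toy versioned register, not a real storage API): a CAS based on a stale version simply fails, and safety is preserved at the price of a retry.</p>

```python
class Register:
    """Single versioned register with CAS writes."""
    def __init__(self, value):
        self.value, self.version = value, 0

    def cas(self, expected_version, new_value):
        if self.version != expected_version:
            return False  # the read was stale: conflict, retry needed
        self.value, self.version = new_value, self.version + 1
        return True

reg = Register(10)
# a stale replica still remembers version 0 and value 10
stale_version, stale_value = 0, 10
# meanwhile someone commits through the current version
first_ok = reg.cas(0, 11)
# the stale reader's CAS fails: no incorrect write ever lands
stale_ok = reg.cas(stale_version, stale_value + 1)
# re-reading the fresh state and retrying succeeds
fresh_ok = reg.cas(reg.version, reg.value + 1)
```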
<h2 class="mume-header" id="conclusion">Conclusion</h2>
<p>Transactional behavior during data processing makes it possible to achieve the <em>exactly-once</em> guarantee. However, such a solution does not scale, because transactional processing is based on a two-phase commit, which locks the corresponding records. Adding the requirement of concurrent execution to avoid pauses, as well as the requirement of heterogeneity, raises the bar to a hitherto unattainable level, because distributed transactions under high concurrency lead to conflicts, dramatically reducing the processing throughput.</p>
<p>Separation of transactions into semi-transactions and the use of a lock-free approach can significantly improve scalability and heterogeneity.</p>
<p>The important advantages of the approach are:</p>
<ol>
<li><strong>Heterogeneity</strong>: a single abstraction for different types of storages.</li>
<li><strong>Atomicity</strong>: each action is an atomic mutation of the persistent state.</li>
<li><strong>Safety</strong>: the approach implements the strictest guarantee of real-time processing: <em>exactly-once</em>.</li>
<li><strong>Concurrent</strong>: concurrent execution completely eliminates the processing delays.</li>
<li><strong>Real-time</strong>: real-time data processing.</li>
<li><strong>Lock-free</strong>: at any stage, data is not locked, there is always progress in the system.</li>
<li><strong>Deadlock free</strong>: the system will never come to a state from which it can not make progress.</li>
<li><strong>Race condition free</strong>: the system does not contain race conditions.</li>
<li><strong>Hot-hot</strong>: there are no delays to restore the system from failures.</li>
<li><strong>Hard stop</strong>: you can hard-stop the system at any time without implications.</li>
<li><strong>No failover</strong>: the algorithm loads the current state and immediately makes the progress in the system without having to restore the correctness of the previous state.</li>
<li><strong>No downtime</strong>: updates occur without downtime.</li>
<li><strong>Absolute stability</strong>: resilience to failures, delays and concurrent execution.</li>
<li><strong>Scalability</strong>: sharding the queues and corresponding handlers allows you to scale the system horizontally.</li>
<li><strong>Flexibility</strong>: allows you to flexibly configure the pipeline and the corresponding system parameters.</li>
<li><strong>Fundamental</strong>: semi-transactions solve a wide class of problems.</li>
</ol>
<p>It is worth noting that there is an even more fundamental and performant approach. But it is another story.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijjI-QwNjpJ4qvfGS5f1IXKIyGEnhvBK3A3JWL54aEBaHY5vv7KzeKKqCEjhCLlJVZ902Ob3DT7rCWElrbRo_Y-9jnqYzTJGEfhkB_8gaCDJA0UAo8JBtE3uLrumFwXVsMrwwmtEA6u0M/s1600/light.jpg" alt="Light"></p>
<h2 class="mume-header" id="newly-introduced-concepts">Newly Introduced Concepts</h2>
<p>It is useless to try to find information related to the following terms:</p>
<ol>
<li>Concurrent exactly-once.</li>
<li>Semi-transactions.</li>
<li>Lock-free two-phase commit, optimistic two-phase, or two-phase commit without locks.</li>
</ol>
<h2 class="mume-header" id="challenges">Challenges</h2>
<ol>
<li>Implement asynchronous writes to the output queues.</li>
<li>Implement reliable lock-free funds transfer based on semi-transactions and queues.</li>
<li>Find a stupid mistake in the handler.</li>
</ol>
<h2 class="mume-header" id="references">References</h2>
<p>[1] <a href="https://en.wikipedia.org/wiki/ABA_problem">Wikipedia: ABA problem.</a><br>
[2] <a href="https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/">Blog: You Cannot Have Exactly-Once Delivery.</a><br>
[3] <a href="http://gridem.blogspot.com/2018/04/attainability-of-lower-bound-of.html">Blog: Attainability of the Lower Bound of the Processing Time of Highly Available Distributed Transactions.</a><br>
[4] <a href="http://gridem.blogspot.com/2017/11/replicated-object-3-subjector-model.html">Blog: Replicated Object. Part 3: Subjector Model.</a><br>
[5] <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm">Wikipedia: Non-blocking algorithm.</a></p>
<hr>
<h1>Attainability of the Lower Bound of the Processing Time of Highly Available Distributed Transactions</h1>
<p><em>Grigory Demchenko, 2018-04-26</em></p>
<h2 class="mume-header" id="introduction">Introduction</h2>
<p>Recently I've read another article from the series "we are better than a two-phase commit". I will not analyze the contents of that article here (although I'm thinking about giving a detailed analysis). The task of this opus is to offer the most efficient version of a distributed commit in terms of the number of round trips. Of course, such a commit comes at a price. However, the goal is to assess that price and to show that the two-phase commit is not the drag that many believe it to be.</p>
<p>It should also be noted that there will be no full-scale experiments or fake comparisons. Only the algorithms and theoretical analysis will be given. If desired, you can independently implement and test them in practice. Of course, it would be much better if that were presented in the current article, but it all depends on free time and motivation. In my opinion, describing the algorithms is more important than presenting charts, because almost anyone can draw charts based on the algorithms, while the opposite is not true.<br></p>
<a name='more'></a>
<p>After such an introduction, let's begin.</p>
<h2 class="mume-header" id="two-phase-commit">Two-Phase Commit</h2>
<p><strong>Definition</strong>. <em>RTT</em> (round-trip time) is the time for a message to travel to its destination and back.<br>
<strong>Definition</strong>. A <em>hop</em> is the time of a single one-way message.</p>
<p><strong>Theorem</strong>. 1 RTT is equal to two hops.<br>
<em>Proof</em>. It is obvious.</p>
<p><strong>Definition</strong>. <em>Distributed commit</em> is the process of making atomic changes between at least two distributed participants.</p>
<p><strong>Definition</strong>. A <em>two-phase commit</em> is a commit consisting of two phases. The first phase is an atomic operation to verify the possibility of initiating a transaction and blocking the participants to perform the commit. The second phase is the collection of responses from participants and the further transaction processing with the releasing of the locks.</p>
<p><strong>Theorem</strong>. A two-phase distributed commit can not be made faster than 1 RTT.<br>
<em>Proof</em>. To perform a two-phase commit, it is necessary, as a minimum, to send a request from the client to all participants and receive a response on completion. This requires 2 hops or 1 RTT.</p>
<p><strong>Definition</strong>. A <em>highly available transaction commit</em> is a commit that continues execution even if one or more involved participants are failed.</p>
<p>Here we assume fail-stop model for simplicity. The algorithm described below however can be easily generalized to cover other models.</p>
<p><strong>Theorem</strong>. A two-phase highly available distributed commit for 1 RTT is possible.</p>
<p>To prove this theorem, it is sufficient to provide a method and the conditions under which it is possible. It is clear that this is not always possible, because in the case of concurrent access to the same resource, the involved transactions have to be serialized on that resource. Thus, they will be executed sequentially, and talking about 1 RTT there would be somewhat funny. Nevertheless, even under good conditions the usual algorithms yield timings much greater than 1 RTT.</p>
<p>The remainder of this article will be devoted to the proof of this theorem.</p>
<h2 class="mume-header" id="two-phase-commit-1">Coordination</h2>
<p>Consider the classical scheme of a two-phase commit with the Transaction Coordinator.</p>
<p><strong>Definition</strong>. <em>Transaction Coordinator</em> coordinates the distributed transaction making final decision to commit or abort the transaction based on the responses from participants.</p>
<p>The sequence is as follows:</p>
<p><em>1st hop</em>. The client sends the request to the Transaction Coordinator.<br>
<em>2nd hop</em>. The Transaction Coordinator sends the request to the participants to prepare for the transaction: the 1st phase.<br>
<em>3rd hop</em>. The participants successfully perform the preparation and send a response that they are ready to execute the transaction.<br>
<em>4th hop</em>. The Transaction Coordinator sends a message to all participants about the execution of the transaction: the 2nd phase.<br>
<em>5th hop</em>. Participants send back the success of the transaction execution to the coordinator.<br>
<em>6th hop</em>. The coordinator responds to the client.</p>
<p>Total 3 RTT.</p>
<p>Now we add fault tolerance to achieve high availability. We will assume that the coordinator and the participants belong to the corresponding consensus groups. We will also assume favorable conditions, i.e. we have stable leaders of the groups and the consensus terminates. Let us prove the lemma:</p>
<p><strong>Lemma</strong>. Distributed consensus based on the leader can not be done faster than 1 RTT.<br>
<em>Proof</em>. To achieve consensus, the request should be directed to the leader. Wherein:</p>
<p><em>1st hop</em>. The leader sends the request to the other participants of the consensus, usually known as <em>followers</em>.<br>
<em>2nd hop</em>. Participants send confirmation to the leader.</p>
<p>Without these hops consensus is impossible.</p>
<p><strong>Lemma</strong>. 1 RTT consensus is possible.<br>
<em>Proof</em>: Consider Raft algorithm. In the case of a stable leader and the presence of majority of consensus participants, an agreement on the leader takes place after receiving responses from the participants, i.e. after 1 RTT.</p>
<p>It is worth noting that after this the system guarantees that this agreement will remain in the system, even though at this point it has not yet been reached by the other participants. If the leader fails, a failover occurs and the new leader is responsible for reconciling these changes. However, this is not the subject of the lemma; we are considering a potential opportunity, i.e. certain ideal conditions that can lead to the desired result: the achievement of consensus. Why do we not consider all possible conditions? The reason is that there is a theorem that <a href="https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf">consensus in an asynchronous system is impossible</a>. Therefore, it is important to understand the minimum possible delay in the most favorable situations, without violating the correctness of the algorithm, which must maintain its invariants if these favorable conditions are violated at any stage. The two lemmas together give an exhaustive answer, showing that the minimum possible time to reach a distributed agreement is attainable.</p>
<p>This result can be generalized by proving that it is impossible to reach consensus faster than 1 RTT even without the condition of having a stable leader. However, this is beyond the scope of this article (the idea can be taken from the article <a href="http://gridem.blogspot.com/2017/08/latency-of-geo-distributed-databases.html">"Latency of Geo-Distributed Databases"</a>). The idea of the proof is to track the spreading of knowledge about the other participants in the system: using 1 hop you can only send data, but you cannot know whether it was received or what state the recipient was in.</p>
<p>So, for fault tolerance, let’s consider a consensus with 1 RTT and add it to our two-phase commit:</p>
<p><em>1st hop</em>. The client sends a request to the leader of the coordinator.<br>
<em>2nd and 3rd hop</em>. The coordinator leader coordinates the beginning of the transaction.<br>
<em>4th hop</em>. The Transaction Coordinator sends a request to the leaders of the participants: the 1st phase.<br>
<em>5th and 6th hop</em>. The participants successfully prepare, preserving the decision in their consensus groups.<br>
<em>7th hop</em>. Leaders of participants send the answer that they are ready to execute transaction.<br>
<em>8th and 9th hop</em>. The coordinator's leader performs consensus agreement.<br>
<em>10th hop</em>. The leader of the coordinator sends out a message to all the participants' leaders about the execution of the transaction: the 2nd phase.<br>
<em>11th and 12th hop</em>. Leaders agree on the commit and apply the changes.<br>
<em>13th hop</em>. Participants send the success to the coordinator's leader.<br>
<em>14th hop</em>. The coordinator responds to the client.</p>
<p>Total 7 RTT. Not bad. Fault tolerance costs "only" 4 RTT. The reason is that the coordinator and the participants each reach their own consensus 2 times.</p>
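<p>The hop arithmetic can be checked with a few lines (the constants simply mirror the enumerations above; by the theorem, 1 RTT equals 2 hops):</p>

```python
def rtt(hops):
    """By the theorem above, 1 RTT equals two hops."""
    return hops // 2

# classical two-phase commit with a coordinator: hops 1-6 from the text
PLAIN_2PC_HOPS = 6

# the fault-tolerant variant: the coordinator reaches consensus twice
# (hops 2-3 and 8-9) and the participants twice (hops 5-6 and 11-12),
# each agreement costing 1 RTT, i.e. 2 extra hops
CONSENSUS_ROUNDS = 4
HOPS_PER_CONSENSUS = 2

ft_2pc_hops = PLAIN_2PC_HOPS + CONSENSUS_ROUNDS * HOPS_PER_CONSENSUS
```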
<p>In the above scheme, you can see some non-optimality. Let's fix it.</p>
<h2 class="mume-header" id="commit-optimization">Commit Optimization</h2>
<p>The first obvious optimization is sending a response to the client immediately after collecting the responses of successful preparation from the participants. Because these responses are fault-tolerant, then the participants will never forget about them, which means that the transaction will sooner or later be executed even if the nodes fail, the leader crashes, etc. However, there is one slippery moment.</p>
<p>In fact, the coordinator makes the final decision on whether to commit the transaction or not. This means that even if all the participants returned OK, but some participant stalled because of, for example, a leader election, the coordinator can still roll back the transaction. And if so, we can remove only the 10th-13th hops, but not the 8th and 9th. That's not bad either, since we get a decrease of 2 RTT, i.e. 5 RTT instead of 7.</p>
<p>At the same time, the 10th-13th hops do not disappear anywhere; the client simply does not need to wait for them. The coordinator and the participants finish their processing in parallel with the client, and the client receives its confirmation a little earlier. The commit will still be performed in the system, just a little later. Here we use the magic of asynchrony, consensus, and the inability to prove to an external observer that we have slightly cheated and cut a corner. If the client suddenly wants to immediately read the data it has just committed and goes directly to a participant, it will wait on the lock (if the 2nd phase has not removed it by that time), and the request will hang until the lock is released. However, within the framework of our theoretical investigation this fact is absolutely unimportant, because we prepare ideal conditions for ourselves. And in the case of non-ideal conditions, as already mentioned above, we will wait for several eternities (since each consensus requires an eternity, and we need several of them, sequentially).</p>
<p>The next move is a bit more complicated and elegant.</p>
<p>Let's consider the very beginning of the transaction. The client sends a request to the coordinator, which then initiates a two-phase commit by sending requests to the other participants. A simple idea is to execute these requests simultaneously, i.e. send the request to both the coordinator and the participants in parallel. Here we can fall into a trap.</p>
<p>The matter is that the client is not a fault-tolerant entity, i.e. it can fail. Imagine that it sent a request to the participants, they took the locks and waited, while the request to the coordinator for some reason did not arrive and the client failed. There is then no one to start the two-phase commit and no one to roll it back in case of conflicts or problems. The participants will block the records forever and no one will help them. Therefore, such an optimization is incorrect. Participants have the right to commit only after the decision of the coordinator, who is responsible for the transaction and rolls it back if necessary.</p>
<p>To go further, we need to take a completely different look at the problem. And for this we begin, oddly enough, with consensus.</p>
<h2 class="mume-header" id="consensus-optimization">Consensus Optimization</h2>
<p>It seems that there is nothing to do. After all, Raft achieves the minimum possible execution time - 1 RTT. However, it can be done faster - for 0 RTT.</p>
<p>Let’s recall that in addition to the consensus itself, another 1 RTT is required to send a request from the client to the leader and receive a response. So for a remote consensus group, 2 RTT is required for this case, which we see in the two-phase commit on 2 examples: sending and committing to the coordinator, sending and committing to the participants. A total of 4 RTTs at once, and another 1 RTT - to the second phase commit on the coordinator.</p>
<p>It is clear that a leader-based consensus for a remote client cannot be faster than 2 RTT. Indeed, first we need to deliver the message to the leader, and then the leader must execute the agreement by sending the message to the participants of the consensus group and getting a response from them. There is no way around it.</p>
<p>The only option is to get rid of the weak entity: the leader itself. Indeed, not only must all writes pass through it, but in case of its failure the group becomes inaccessible for a relatively long time. The leader is the weakest part of a consensus, and the leader election is its most fragile and nontrivial part. So we just need to get rid of it.</p>
<p><strong>Definition</strong>. <em>Message broadcast</em> is the sending of the same message to all the participants of the group.</p>
<p>To do this, let's take the <a href="http://gridem.blogspot.com/2016/05/replicated-object-part-7-masterless.html">masterless consensus</a>, well known in narrow circles. The main idea is to reach the same state on all the participants. To do this, it is sufficient to perform 2 broadcasts, i.e. just 1 RTT. The first broadcast to the participants can be made by the client itself, and the responses to the broadcast can be sent by the participants directly to the client. If the client receives the same state from everyone (which happens, for example, in the absence of concurrent requests), then, by analyzing the content of the broadcast responses, it can conclude that its request will be executed sooner or later. In fact, using the described algorithm, all participants in the consensus, including the client, realize simultaneously that the agreement has happened. And this happens after 2 broadcasts, i.e. 1 RTT. Because the client has to spend 1 RTT anyway on sending the message to the group and receiving the answer, we arrive at the paradoxical conclusion that the consensus was effectively performed at 0 RTT.</p>
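<p>A toy model of the conflict-free fast path only (the real masterless algorithm must also resolve concurrent proposals and failures; all names here are illustrative): two broadcasts, and the client learns about the agreement at the same moment as the replicas.</p>

```python
def masterless_round(replicas, proposal):
    """Model of one conflict-free round: two broadcasts = 1 RTT.
    Returns what the client learns from the replicas' responses."""
    # broadcast 1: the client itself sends the proposal to every replica
    for r in replicas:
        r["pending"] = proposal
    # broadcast 2: replicas exchange their states and answer the client
    responses = [r["pending"] for r in replicas]
    # without concurrent proposals all states are identical, so the
    # client knows the agreement together with the replicas: from its
    # point of view the consensus itself cost 0 extra RTT
    agreed = all(resp == responses[0] for resp in responses)
    return agreed, responses[0]

replicas = [{"pending": None} for _ in range(3)]
agreed, value = masterless_round(replicas, "tx-42")
```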
<h2 class="mume-header" id="analogy">Analogy</h2>
<p>To go further, we will use a powerful analysis tool: analogy. Let's return to the Raft algorithm. What happens there? It consists of two phases:</p>
<p><em>1st Phase</em>: The leader sends a request to the participants and waits for their responses.<br>
<em>2nd Phase</em>: After receiving the responses, the leader records the agreement on its own and sends it to the participants of the system.</p>
<p>Does it remind you of anything? That's right: this is a two-phase commit, only with some caveats:</p>
<ol>
<li>The Raft algorithm does not wait for responses from all participants; a majority is enough. In a two-phase commit, a successful transaction requires a successful response from every participant.</li>
<li>The participants in the Raft algorithm cannot say <code>notOK</code>. In theory a participant can do so (for example, when it runs out of disk space), but such a <code>notOK</code> is equivalent to the absence of a response. In a two-phase commit everything is stricter: if at least one participant decides <code>notOK</code>, the entire transaction must be aborted and rolled back. This is the very essence of two-phase commit: first we ask everyone for agreement, and only after unanimous agreement do we apply the changes. Consensus in this sense is more democratic, because it requires only majority agreement.</li>
</ol>
<p>At the same time, they have in common a dedicated decision driver (the leader or the coordinator) and two phases: a preliminary one and a final one.</p>
<p>Accordingly, all we need is to drop the coordinator from the two-phase commit, i.e. to do exactly the same thing we did for consensus when we gave up the leader.</p>
<p>Let's forget about fault tolerance for a while and see how the commit looks in this case.</p>
<h2 class="mume-header" id="self-coordination">Self-Coordination</h2>
<p><strong>Definition</strong>. A <em>two-phase commit without a coordinator</em> consists of 2 phases:</p>
<ol>
<li>All participants send their decision to all other participants: <code>OK</code> or <code>notOK</code>.</li>
<li>Each participant, after receiving <code>OK</code> from everyone, commits the changes, or rolls them back if at least one participant responded <code>notOK</code>.</li>
</ol>
<p>After that, for reliability, each participant can broadcast to everyone else the information that a commit has occurred and that the locks can be removed, but this is not necessary.</p>
<p>Why did the coordinator suddenly become unnecessary? The point is that the coordinator watched over the transactional process, including whether the nodes were alive, and in case of problems with the participants it rolled back the transaction. The problem was only in the coordinator itself, because nobody could watch over it: if the coordinator fails, the participants are stuck holding their locks. That is why the two-phase commit is often called a blocking commit.</p>
<p><strong>Definition</strong>. <em>Self-coordinating transactions</em> are transactions that do not require a dedicated coordinator.</p>
<p>However, once we add fault tolerance, the role of the coordinator becomes unnecessary: every participant, being represented by a consensus group, can stand up for itself. Thus, we arrive at self-coordinating transactions that need no dedicated coordinator. An important difference from the usual two-phase commit with a coordinator is that the coordinator may at any time decide to roll back the transaction, even if all the participants gave a positive response. In self-coordinating transactions such nondeterministic behavior is unacceptable: each participant makes its decision based on the responses of the other participants, and this decision must be the same everywhere.</p>
<p><strong>Theorem</strong>. Self-coordinating transactions provide strict consistency (linearizability + serializability).<br>
<em>Proof</em>. The proof rests on the simple fact that the two-phase commit provides the same guarantee. Indeed, in the scheme without a coordinator, each participant acts as a coordinator itself, running the two-phase commit as if it were the only one. This means that all the invariants of the two-phase commit are preserved. It is easy to verify this if we recall that each participant broadcasts its response to everyone else: each one therefore receives the <code>OK</code> responses from all the others and, acting as a coordinator, performs the transaction commit.</p>
<p>Let's count the minimum number of hops under favorable conditions:</p>
<p><em>1st hop</em>. The client sends a message to all participants in the transaction.<br>
<em>2nd hop</em>. All participants send a reply to the client and to each other.</p>
<p>After the 2nd hop, the client has all the necessary information to make a decision about the commit. This requires only 1 RTT.</p>
<h2 class="mume-header" id="fault-tolerance-and-availability">Fault Tolerance and Availability</h2>
<p>An attentive reader may ask: what do we do in case of a client failure? After all, while the participants of the system can be made fault-tolerant, we cannot impose such a requirement on the client: it may fail at any moment. Clearly, once the client has sent its requests to all participants of the system, the distributed commit can be completed without it. But what if the client managed to send the request only to some of them before failing?</p>
<p>In this case, we oblige the client to forward to each participant the information about all the other participants in the transaction. Thus, each participant knows all the others and sends them its result. Any participant that did not receive a request from the client can then choose one of the following behaviors:</p>
<ol>
<li>Immediately reply that it does not accept the transaction, i.e. send <code>notOK</code>. In this case the locks are rolled back. As always, the participant broadcasts its response to the other participants.</li>
<li>If the request from another participant contains all the information this participant needs to execute the transaction commit, it can decide to successfully lock the corresponding records (1st phase) and send <code>OK</code>. To make this possible, the client must send to each participant of the transaction the information about all the other participants together with all the data needed to execute the distributed commit.</li>
</ol>
<p>In either case, all participants either receive <code>OK</code> from everyone, or, in the absence of the necessary information, someone responds <code>notOK</code> and the transaction is rolled back. So in the event of a client failure each participant is able either to complete the initiated transaction or to correctly roll back the client's actions.</p>
<p>It remains to make the participants of the distributed system fault-tolerant. To do this, we place each of them into a consensus group without a dedicated leader. Thus each participant is represented not by a single node but by a set of nodes forming a consensus group.</p>
<p>The commit algorithm will look like this:</p>
<ol>
<li>The client sends its request to each node of each consensus group participating in the transaction.</li>
<li>Each node sends to all other nodes and to the client a reply about the speculative execution of the first phase of the commit, as if it were executed at the current consensus step. In reality we do not yet know whether this will actually happen, because concurrent requests from other clients may cause the consensus to reorder the currently unapplied actions.</li>
<li>The client receives the replies from all nodes of all participants. If every node reported <code>OK</code> for the speculative execution, and the consensus step was the same for every node within each group, then the speculative execution of the first phase will actually take place and the client is able to make a decision about the commit.</li>
</ol>
<p>In fact, the requirement to obtain a response from all nodes of each group is redundant. However, a more detailed discussion of relaxing this requirement is beyond the scope of this article.</p>
<h2 class="mume-header" id="conclusion">Conclusion</h2>
<p>In total we obtain 2 hops, or 1 RTT. Given that the communication between the client and the server cannot be removed, the effective processing time of the commit on the server side is zero, i.e. it is as if the server instantly processed a distributed, highly available, fault-tolerant transaction and sent the response to the client.</p>
<p>Thus, we have an important theoretical and practical result: the lower bound on the execution time of a distributed, fault-tolerant, highly available commit is attainable.</p>
<h2 class="mume-header" id="references">References</h2>
<p><a href="https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf">Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson, 1985, <em>Impossibility of Distributed Consensus with One Faulty Process</em></a></p>
<p><a href="http://gridem.blogspot.com/2017/08/latency-of-geo-distributed-databases.html">G. Demchenko, 2017, <em>Latency of Geo-Distributed Databases</em></a></p>
<p><a href="http://gridem.blogspot.com/2016/05/replicated-object-part-7-masterless.html">G. Demchenko, 2016, <em>Masterless Consensus Algorithm</em></a></p>
<p><a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2003-96.pdf">Jim Gray, Leslie Lamport, 2003, <em>Consensus on Transaction Commit</em></a></p>
<!doctype html><html><head><meta charset='utf-8'>
</head><body class="markdown-body">
<h1>Replicated Object. Part 3: Subjector Model</h1>
<p><em>Grigory Demchenko, 2017-11-16</em></p>
<p data-line="2" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLtIJdkPVM8LGti94crTOjgk_7pS0ay0VDToXKSsvuU8bopmSEyTqwpmOqxgJjwJC6vJel6OlwlTdYTqfY77jCJfWiB6nM7x21d5gquCpDsxhjF2dVV9LK9-6CP3wM1nqfumWS6UWX9Qg/s600/iz_larca_eng.jpg" alt="Parallel execution"></p>
<h2 data-line="4" class="code-line" id="preface">Preface</h2>
<p data-line="6" class="code-line">This article is a continuation of the series of articles about asynchrony:</p>
<ol>
<li data-line="8" class="code-line"><a href="https://kukuruku.co/post/asynchronous-programming-back-to-the-future/">Asynchronous Programming: Back to the Future.</a></li>
<li data-line="9" class="code-line"><a href="https://kukuruku.co/post/asynchronous-programming-part-2-teleportation-through-portals/">Asynchronous Programming Part 2: Teleportation through Portals.</a></li>
</ol>
<p data-line="11" class="code-line">After 3 years, I have decided to expand and generalize the available spectrum of asynchronous interaction based on coroutines. In addition to these articles, it is also recommended to read the article about the <em>god adapter</em>:</p>
<ol start="3">
<li data-line="13" class="code-line"><a href="http://gridem.blogspot.com/2015/11/replicated-object-part-2-god-adapter.html">Replicated Object. Part 2: God Adapter.</a></li>
</ol>
<h2 data-line="15" class="code-line" id="introduction">Introduction</h2>
<p data-line="17" class="code-line">Consider an electron. What do we know about it? A negatively charged elementary particle, a lepton having some mass. This means that it can participate in at least electromagnetic and gravitational interactions.</p>
<a name='more'></a>
<p data-line="20" class="code-line">If we place a spherical electron in a vacuum, then all it will be able to do is move in a straight line. 3 degrees of freedom plus spin, and only uniform rectilinear motion. Nothing interesting or unusual.</p>
<p data-line="22" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDJjdaOyMixLoLqXIFbEU3KHlWQ3ojx6FHdRBQDai7fuqtkSco66sZAeFzN8FUShx4P5L-cQAwhfbB2sDTgzYBGhibXxWLUX3cYAMObyKN9vuhU9TEM-CrvdMyipGE880tmxT6keArl_I/s1600/electron.png" alt="Electron"></p>
<p data-line="24" class="code-line">Everything changes in quite an amazing way if other particles are nearby. For example, a proton. What do we know about it? Many things. We will be interested in its mass and in its positive charge, equal in magnitude to the electron's charge but opposite in sign. This means that the electron and the proton will interact with each other electromagnetically.</p>
<p data-line="26" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOqgmJY1aqL2g_Wq30bd_-NR-RWvCIjxLFTqAQt3OwlqwJWeHBfq8KFKat2QmhJlZ0Y0VZdtBipFSh7ObAJRDJwAbkoog7beOfZQFvxo4tiHnk5WyZlCZRc5tyjtBHE2IHjfOT2iWqOTU/s1600/e_p.png" alt="Electron Proton"></p>
<p data-line="28" class="code-line">The result of this interaction is the bending of the straight trajectory under the action of electromagnetic forces. But that is only half the trouble. The electron, moving near the proton, experiences acceleration. The electron-proton system becomes a dipole, which suddenly starts producing bremsstrahlung: electromagnetic waves propagating through the vacuum.</p>
<p data-line="30" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJmCoW3C7-BUiIA5tFoyFCfTQtcXPspu5oEMEV4di3Nt_Yi745MUkeNh0md-7Y23aDRsh5Ujw4WSB5rmr3qwzsiwNJP55hMwuhLmcSneoIvZFacODcBd14y6Il6-XW_UQeQG6SpsG1wes/s1600/brem.png" alt="Bremsstrahlung"></p>
<p data-line="32" class="code-line">But that's not all. Under certain circumstances, the electron is captured into an orbit around the proton, and a well-known system appears: the hydrogen atom. What do we know about this system? Quite a lot. In particular, it has a discrete set of energy levels and a line emission spectrum formed by transitions between pairs of stationary states.</p>
<p data-line="34" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgG_HHzkelYwWrI6xQ_DN9tEBO9hA_De7k_nXvENOK5mohe1yujcchGpQWSV8O-rFxeamKNBsoUKfjDJ9itx9t4tp_BxogXA-Lv2W2ggFwG32O9jI_Cszd9iwTAqNFoUNHrkyHhqz2EbpU/s1600/atom.png" alt="Hydrogen Atom"></p>
<p data-line="36" class="code-line">And now let's look at this picture from a different angle. Initially we had two particles: a proton and an electron. The particles themselves do not radiate (they simply cannot), exhibit no discreteness whatsoever, and generally behave in a relaxed manner. But the picture changes dramatically when they find each other. New, completely unique properties appear: continuous and discrete spectra, stationary states, a minimum energy level of the system. In reality, of course, everything is much more complicated and interesting:</p>
<p data-line="38" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSgGIo04RGGvT9ultIis_Z028OspN9cUIP_te2sbcXDydbNsr8hIKu5sxyGAUSLwrCnJKQhJqZfxir7UtmlK0p6c0Y7OY0r2f56G11FGUspshsCLQjkVj26ZsUtWW6osjzJYB5htsb5Dw/s1600/h_betta.png" alt="Hydrogen H-beta"></p>
<p data-line="40" class="code-line"><sup><em>Asymmetric Stark broadening of the Hβ, hydrogen atom, <a href="https://www.researchgate.net/profile/Dragan_Nikolic2/publication/252317673_Experimental_and_Theoretical_Analysis_of_Central_Hb_Asymmetry/links/5658034308ae1ef9297bf662/Experimental-and-Theoretical-Analysis-of-Central-Hb-Asymmetry.pdf">Phys. Rev. E 79 (2009)</a></em></sup></p>
<p data-line="42" class="code-line">These arguments can be continued. For example, if you put two hydrogen atoms close together, you get a stable configuration called a <em>hydrogen molecule</em>. Here electronic-vibrational-rotational energy levels appear, with specific changes in the spectrum, the appearance of P, Q, R branches, and much more.</p>
<p data-line="44" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoA3uHXD0ZtJfY_vTXGWms-kJNAagSU3BSuukV3QS0JEfGI1oWh6EbKeVl3f3G9Wip0A9d3NXcNqt8bZtYMWy1Hm2ZRJ_i6UrRx0IIp-RXMb72otCtdu-Z2PSR6G7RrxJR1JgWXwC9qdI/s1600/molecule.png" alt="Hydrogen Molecule"></p>
<p data-line="46" class="code-line">How so? Is the system not described by its parts? No! This is the essence of the matter: when a physical system becomes more complex, qualitative changes occur that are not described by any part separately.</p>
<p data-line="48" class="code-line">The synergy of interaction manifests itself in many areas of scientific knowledge. That is why chemistry is not reduced to physics, nor biology to chemistry. Despite the most powerful achievements of quantum mechanics, chemistry as a branch of scientific knowledge still exists separately from physics. It is interesting to note that there are areas of knowledge at the intersection of sciences, for example, <a href="https://en.wikipedia.org/wiki/Quantum_chemistry"><em>quantum chemistry</em></a>. What does this tell us? As a system becomes more complex, new areas of research appear that did not exist at the previous level. We have to take new circumstances into account and introduce additional qualitative factors, each time further complicating an already difficult model of the quantum mechanical system.</p>
<p data-line="50" class="code-line">The described metamorphoses can be reversed: if we need to build a complex system from the simplest components, these components should obey a synergistic principle. In particular, we all know that any problem can be solved by introducing an additional level of abstraction, except for the problem of the number of abstractions and the resulting complexity. And only the synergy of abstractions makes it possible to reduce their number.</p>
<p data-line="52" class="code-line">Unfortunately, quite often our programs do not exhibit the described synergistic properties. For bugs, sadly, the opposite is true: new, hitherto unseen glitches appear that did not exist at the previous stage. And I would like the application not to be merely described by the set of its parts and libraries, but to be something unique and grandiose.</p>
<p data-line="54" class="code-line">Let's now try to get into the essence of OOP and coroutines for obtaining new and surprising properties of their synthesis with the purpose of creating a generalized interaction model.</p>
<h2 data-line="56" class="code-line" id="object-oriented-programming">Object-Oriented Programming</h2>
<p data-line="58" class="code-line">Let's consider OOP. What do we know about it? Encapsulation, inheritance, polymorphism? <a href="https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)">SOLID principles</a>? And let's ask Alan Kay, who introduced this concept:</p>
<blockquote data-line="60" class="code-line">
<p data-line="60" class="code-line">Actually I made up the term "object-oriented", and I can tell you I did not have C++ in mind.</p>
<p data-line="62" class="code-line"><em>Alan Kay</em>.</p>
</blockquote>
<p data-line="64" class="code-line">This is a serious blow for C++ programmers; I felt sad for the language. But what did he mean? Let's sort it out.</p>
<p data-line="66" class="code-line">The concept of objects was introduced in the mid-1960s with the appearance of the <a href="https://en.wikipedia.org/wiki/Simula">Simula-67</a> language, which introduced such concepts as objects, virtual methods, and coroutines (!). Then, in the 1970s, the <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a> language, influenced by Simula-67, developed the idea of objects further and coined the term <em>object-oriented programming</em>. It was there that the foundations of what we now call OOP were laid. Alan Kay himself later commented on his statement:</p>
<blockquote data-line="68" class="code-line">
<p data-line="68" class="code-line">I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea. The big idea is "messaging".</p>
<p data-line="70" class="code-line"><em>Alan Kay</em>.</p>
</blockquote>
<p data-line="72" class="code-line">If you remember Smalltalk, it becomes clear what he meant. That language is based on sending messages (see also <a href="https://en.wikipedia.org/wiki/Objective-C">Objective-C</a>). This mechanism worked, but it was rather slow. Therefore, mainstream languages later followed the path of Simula and replaced message sending with ordinary function calls, and with virtual function calls dispatched through a table of virtual functions to support runtime binding.</p>
<p data-line="74" class="code-line">To return to the origins of OOP, let's take a fresh look at classes and methods in C++. To do this, let's consider the <code>Reader</code> class that reads data from a source and returns a <code>Buffer</code> object:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">class</span> Reader
{
<span class="hljs-keyword">public</span>:
<span class="hljs-function">Buffer <span class="hljs-title">read</span><span class="hljs-params">(Range range, <span class="hljs-keyword">const</span> Options& options)</span></span>;
<span class="hljs-comment">// and other methods...</span>
};
</div></code></pre>
<p data-line="86" class="code-line">Here I will only be interested in the <code>read</code> method. This method can be converted into the following, almost equivalent, free function:</p>
<pre class="hljs"><code><div><span class="hljs-function">Buffer <span class="hljs-title">read</span><span class="hljs-params">(Reader* <span class="hljs-keyword">this</span>,
Range range,
<span class="hljs-keyword">const</span> Options& options)</span></span>;
</div></code></pre>
<p data-line="94" class="code-line">We simply turned a method call on an object into a standalone function call. This is what the compiler does when it translates our code into machine code. However, this path leads us in the opposite direction; to be precise, toward the C language. There is no trace of OOP here, so let's go the other way.</p>
<p data-line="96" class="code-line">How do we call the <code>read</code> method? For example:</p>
<pre class="hljs"><code><div>Reader reader;
<span class="hljs-keyword">auto</span> buffer = reader.read(range, options);
</div></code></pre>
<p data-line="103" class="code-line">Let's transform the <code>read</code> method call as follows:</p>
<pre class="hljs"><code><div>reader
<- read(range, options)
-> buffer;
</div></code></pre>
<p data-line="111" class="code-line">This code means the following: the object named <code>reader</code> receives <code>read(range, options)</code> as input, and in response the <code>reader</code> produces the object <code>buffer</code> as output.</p>
<p data-line="113" class="code-line">What can be <code>read(range, options)</code> and <code>buffer</code>? Some input and output messages:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> InReadMessage
{
Range range;
Options options;
};
<span class="hljs-keyword">struct</span> OutReadMessage
{
Buffer buffer;
};
reader
<- InReadMessage{range, options}
-> OutReadMessage;
</div></code></pre>
<p data-line="132" class="code-line">This transformation gives us a slightly different understanding of what is happening: instead of calling a function, we <em>synchronously</em> send an <code>InReadMessage</code> message and then wait for the <code>OutReadMessage</code> response message. Why synchronously? Because the semantics of the call imply that we wait for the answer at the point of invocation. Generally speaking, however, we do not have to wait for the response message at the call site; in that case it becomes an <em>asynchronous</em> message send.</p>
<p data-line="134" class="code-line">Thus, all methods can be represented as handlers of different types of messages. Our object automatically dispatches received messages, performing static pattern matching through the declaration of different methods and the overloading of the same method with different types of input parameters.</p>
<h3 data-line="136" class="code-line" id="message-interception-and-action-transformation">Message Interception and Action Transformation</h3>
<p data-line="138" class="code-line">Now let's work with our messages. How can we pack a message for subsequent transformation? For that purpose we will use an adapter:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span><<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> ReaderAdapter : T_base
{
<span class="hljs-function">Buffer <span class="hljs-title">read</span><span class="hljs-params">(Range range, <span class="hljs-keyword">const</span> Options& options)</span>
</span>{
<span class="hljs-keyword">return</span> T_base::call([range, options](Reader& reader) {
<span class="hljs-keyword">return</span> reader.read(range, options);
});
}
};
</div></code></pre>
<p data-line="153" class="code-line">Now, when the <code>read</code> method is called, the call is wrapped in a lambda and forwarded to <code>T_base::call</code> of the base class. The lambda here is a function object that carries the call and its closure to the base class <code>T_base</code>, which dispatches it automatically. This lambda is our message, which we pass on while transforming the actions.</p>
<p data-line="155" class="code-line">The simplest way to apply the transformation is to synchronize access to the object:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span><<span class="hljs-keyword">typename</span> T_base, <span class="hljs-keyword">typename</span> T_locker>
<span class="hljs-keyword">struct</span> BaseLocker : <span class="hljs-keyword">private</span> T_base
{
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span><<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-built_in">std</span>::unique_lock<T_locker> <span class="hljs-number">_</span>{lock_};
<span class="hljs-keyword">return</span> f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
}
<span class="hljs-keyword">private</span>:
T_locker lock_;
};
</div></code></pre>
<p data-line="174" class="code-line">Inside the <code>call</code> method, the <code>lock_</code> is acquired, and the lambda is then invoked on an instance of the base class <code>T_base</code>, which allows further transformations if necessary.</p>
<p data-line="176" class="code-line">Let's try to use this functionality:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// create instance</span>
ReaderAdapter<BaseLocker<Reader, <span class="hljs-built_in">std</span>::mutex>> reader;
<span class="hljs-keyword">auto</span> buffer = reader.read(range, options);
</div></code></pre>
<p data-line="185" class="code-line">What's going on here? Instead of using <code>Reader</code> directly, we wrap the object in the <code>ReaderAdapter</code> template. When the <code>read</code> method is called, this adapter creates a message in the form of a lambda and passes it on, where the lock is automatically acquired and released strictly for the duration of the operation. At the same time, we preserve the original interface of the <code>Reader</code> class exactly!</p>
<p data-line="187" class="code-line">This approach can be generalized by using a <a href="http://gridem.blogspot.com/2015/11/replicated-object-part-2-god-adapter.html"><em>god adapter</em></a>.</p>
<p data-line="189" class="code-line">The corresponding code with the <em>god adapter</em> will look like this:</p>
<pre class="hljs"><code><div>DECL_ADAPTER(Reader, read)
AdaptedLocked<Reader, <span class="hljs-built_in">std</span>::mutex> reader;
</div></code></pre>
<p data-line="197" class="code-line">Here the adapter intercepts each method of the <code>Reader</code> class listed in <code>DECL_ADAPTER</code> (in this case, the <code>read</code> method), and <code>AdaptedLocked</code> then adapts the intercepted message by applying lock-based synchronization with <code>std::mutex</code>. This is described in more detail in the above-mentioned article, so I will not consider the approach in depth here.</p>
<h2 data-line="199" class="code-line" id="coroutines">Coroutines</h2>
<p data-line="201" class="code-line">We have considered OOP and obtained a little understanding. Now let's go from the other side and talk about coroutines.</p>
<p data-line="203" class="code-line">What are coroutines? In short, they are functions that can be interrupted at any place and then continued from that place, i.e. we can freeze the current execution and later restore it from the suspension point. In this sense they are very similar to threads: the operating system can also freeze a thread at any time and switch to another one, for example because it has consumed too much CPU time.</p>
<p data-line="205" class="code-line">But what, then, is the difference between threads and coroutines? The difference is that we can switch between our coroutines in user space at any time by ourselves, without involving the kernel. Firstly, this increases performance, because there is no need to switch processor rings, contexts, etc.; secondly, it allows more interesting ways of interaction, which will be discussed below in detail.</p>
<p data-line="207" class="code-line">Some interesting ways of interaction can be found in <a href="https://kukuruku.co/post/asynchronous-programming-back-to-the-future/">my previous articles</a> about <a href="https://kukuruku.co/post/asynchronous-programming-part-2-teleportation-through-portals/">asynchrony</a>.</p>
<h3 data-line="209" class="code-line" id="cospinlock">CoSpinLock</h3>
<p data-line="211" class="code-line">Consider the following piece of code:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">namespace</span> synca {
<span class="hljs-keyword">struct</span> Spinlock
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">lock</span><span class="hljs-params">()</span>
</span>{
<span class="hljs-keyword">while</span> (lock_.test_and_set(<span class="hljs-built_in">std</span>::memory_order_acquire)) {
reschedule();
}
}
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">unlock</span><span class="hljs-params">()</span>
</span>{
lock_.clear(<span class="hljs-built_in">std</span>::memory_order_release);
}
<span class="hljs-keyword">private</span>:
<span class="hljs-built_in">std</span>::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
} <span class="hljs-comment">// namespace synca</span>
</div></code></pre>
<p data-line="237" class="code-line">The code above looks like an ordinary spinlock. Indeed, inside the <code>lock</code> method we try to atomically switch the flag value from <code>false</code> to <code>true</code>. If we succeed, the lock is taken, and it was taken by the current execution, so the necessary atomic actions can be performed under the obtained lock. On unlocking, we simply reset the flag back to its initial value <code>false</code>.</p>
<p data-line="239" class="code-line">The whole difference lies in the implementation of the backoff strategy. Commonly, either an exponential randomized backoff or a transfer of control to the operating system via <code>std::this_thread::yield()</code> is used. The code above is a bit trickier: instead of warming up the processor or handing control to the operating system's scheduler, I simply reschedule our coroutine for later execution via the <code>synca::reschedule</code> invocation. The current execution is frozen, and the scheduler launches another coroutine that is ready to run. This is very similar to <code>std::this_thread::yield()</code>, except that instead of switching to kernel space we keep execution in user space and continue to do meaningful work, without increasing the entropy of the universe.</p>
<p data-line="241" class="code-line">Adapter application is the following:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T>
<span class="hljs-keyword">using</span> CoSpinlock = AdaptedLocked<T, synca::Spinlock>;
CoSpinlock<Reader> reader;
<span class="hljs-keyword">auto</span> buffer = reader.read(range, options);
</div></code></pre>
<p data-line="251" class="code-line">As you can see, the usage and semantics of the code have not changed, but its behavior has.</p>
<h3 data-line="253" class="code-line" id="comutex">CoMutex</h3>
<p data-line="255" class="code-line">The same trick can be performed with an ordinary mutex, turning it into an asynchronous one based on coroutines. To do this, you should add a waiting queue of coroutines and resume them sequentially as the lock is released. This can be illustrated by the following scheme:</p>
<p data-line="257" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPA7X6gIr8A5OWdY0hNVyC3CNphndHc-m8JoS7OXQDaKOuK4QrOm2_5wXpMFldomek_qcPI_ToEoq8_Brc0pIDNRHvF0SiLNHeqEsXp9ED4696bIsst6U-c_LJ0ZOBQr3Rkh0zTO6LlMg/s1600/mutex_alt.png" alt="Mutex"></p>
<p data-line="259" class="code-line">I'm not going to provide the full implementation code here. Those who wish can read it independently. I will give only the usage example:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T>
<span class="hljs-keyword">using</span> CoMutex = AdaptedLocked<T, synca::Mutex>;
CoMutex<Reader> reader;
<span class="hljs-keyword">auto</span> buffer = reader.read(range, options);
</div></code></pre>
<p data-line="269" class="code-line">Such a mutex has the semantics of a regular mutex, but it does not block thread execution, allowing the coroutine scheduler to perform useful work without switching to kernel space. <code>CoMutex</code>, unlike <code>CoSpinlock</code>, provides a FIFO guarantee, i.e. fair concurrent access to the object.</p>
<h3 data-line="271" class="code-line" id="coserializedportal">CoSerializedPortal</h3>
<p data-line="273" class="code-line">In the article <a href="https://kukuruku.co/post/asynchronous-programming-part-2-teleportation-through-portals/">Asynchrony 2: Teleportation Through Portals</a>, the task of switching context between different schedulers through the use of teleportation and portals was considered in detail. I will briefly recap that approach here.</p>
<p data-line="275" class="code-line">Consider an example where we need to switch the coroutine from one thread to another. For this, we can freeze the current state of our coroutine in the source thread, and then schedule the coroutine by resuming it in another thread:</p>
<p data-line="277" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNzYTIYJ5C-Jh520Dv-MXyW8dgc4yivS1sgygqoytDB46wZ0OKA_DMV8WBNdq2N15CXUO9oryoAF3UDw_J_XqWZf-LX5JpGmT_bbrIfjZfMVG1HE57tXyYS14jm4IwLkhKe5O58my4gi0/s1600/teleport.png" alt="Teleport"></p>
<p data-line="279" class="code-line">This corresponds exactly to switching execution from one thread to another. The program provides an additional level of abstraction between the code and the thread, allowing you to manipulate the current execution and perform various tricks. Switching between different schedulers is known as <em>teleportation</em>.</p>
<p data-line="281" class="code-line">If we need to switch to another scheduler first and then go back, a <em>portal</em> appears. The portal constructor teleports to the destination scheduler, and the destructor teleports back to the original one. Thanks to RAII semantics, the portal object guarantees a return to the original execution context even when an exception is thrown.</p>
<p data-line="283" class="code-line">Accordingly, there is a simple idea: create a single-threaded scheduler and reschedule our coroutines through portals:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BaseSerializedPortal : T_base
{
<span class="hljs-comment">// create thread pool with single thread</span>
BaseSerializedPortal() : tp_(<span class="hljs-number">1</span>) {}
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-comment">// Portal constructor teleports to created scheduler</span>
synca::Portal <span class="hljs-number">_</span>{tp_};
<span class="hljs-keyword">return</span> f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
<span class="hljs-comment">// Portal destructor returns back to original scheduler</span>
}
<span class="hljs-keyword">private</span>:
mt::ThreadPool tp_;
};
CoSerializedPortal<Reader> reader;
</div></code></pre>
<p data-line="309" class="code-line">It is clear that this scheduler will serialize our actions and therefore synchronize them with each other. If the thread pool provides a FIFO guarantee, then <code>CoSerializedPortal</code> will have the same guarantee.</p>
<h3 data-line="311" class="code-line" id="coalone">CoAlone</h3>
<p data-line="313" class="code-line">The previous approach with portals can be used somewhat differently. To do this, we will use another scheduler: <code>synca::Alone</code>.</p>
<p data-line="315" class="code-line">This scheduler has the following wonderful property: at any time, no more than one task of this scheduler can be executed. Thus, <code>synca::Alone</code> guarantees that no handler will be started in parallel with another. If there are pending tasks, only one of them is executed at a time; if there are no tasks, nothing happens. Clearly, this approach serializes the actions, which means that access through this scheduler is synchronized. Semantically, it is very similar to <code>CoSerializedPortal</code>. Note, however, that this scheduler runs its tasks on a given thread pool, i.e. it does not create any new threads on its own but works on existing ones.</p>
<p data-line="317" class="code-line">For more details, I recommend the reader to look through the original article <a href="https://kukuruku.co/post/asynchronous-programming-part-2-teleportation-through-portals/">Asynchrony 2: Teleportation Through Portals</a>.</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BaseAlone : T_base
{
BaseAlone(mt::IScheduler& scheduler)
: alone_{scheduler} {}
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-comment">// Alone is scheduler thus we reuse Portal</span>
synca::Portal <span class="hljs-number">_</span>{alone_};
<span class="hljs-keyword">return</span> f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
}
<span class="hljs-keyword">private</span>:
synca::Alone alone_;
};
CoAlone<Reader> reader;
</div></code></pre>
<p data-line="342" class="code-line">The only difference in the implementation compared to <code>CoSerializedPortal</code> is the replacement of <code>mt::ThreadPool</code> by <code>synca::Alone</code>.</p>
<h3 data-line="344" class="code-line" id="cochannel">CoChannel</h3>
<p data-line="346" class="code-line">Let's introduce the concept of a channel based on coroutines. Conceptually, it is similar to <a href="https://gobyexample.com/channels">channels in the Go language</a>, i.e. it is a queue (not necessarily bounded, by the way, as it is in Go), into which multiple producers can put data and from which multiple consumers can extract data simultaneously without additional synchronization. Simply put, the channel is a pipe into which you can add messages and then extract them without race conditions.</p>
<p data-line="348" class="code-line">The idea of using the channel is that users of our objects write messages to the channel, while the consumer is a specially created coroutine that reads out messages in an infinite loop and dispatches them to the appropriate method.</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BaseChannel : T_base
{
BaseChannel()
{
<span class="hljs-comment">// create coroutine and run message loop</span>
synca::go([&] { loop(); });
}
<span class="hljs-keyword">private</span>:
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">loop</span><span class="hljs-params">()</span>
</span>{
<span class="hljs-comment">// message loop,</span>
<span class="hljs-comment">// it automatically breaks on channel closing</span>
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">auto</span>&& action : channel_) {
action();
}
}
synca::Channel<Handler> channel_;
};
CoChannel<Reader> reader;
</div></code></pre>
<p data-line="376" class="code-line">Two questions arise:</p>
<ol>
<li data-line="378" class="code-line">What is <code>Handler</code>?</li>
<li data-line="379" class="code-line">Where are dispatching and pattern matching?</li>
</ol>
<p data-line="381" class="code-line"><code>Handler</code> is just <code>std::function<void()></code>. The magic happens not here, but in how this <code>Handler</code> is created for automatic dispatching.</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BaseChannel : T_base
{
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-comment">// intercept the call and write it to fun</span>
<span class="hljs-keyword">auto</span> fun = [&] { <span class="hljs-keyword">return</span> f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>)); };
<span class="hljs-comment">// crutched result of a function call</span>
WrappedResult<<span class="hljs-keyword">decltype</span>(fun())> result;
channel_.put([&] {
<span class="hljs-keyword">try</span> {
<span class="hljs-comment">// write the result in case of no exceptions</span>
result.<span class="hljs-built_in">set</span>(wrap(fun));
} <span class="hljs-keyword">catch</span> (<span class="hljs-built_in">std</span>::exception&) {
<span class="hljs-comment">// otherwise write the catched exception</span>
result.setCurrentError();
}
<span class="hljs-comment">// wake up the suspended coroutine</span>
synca::done();
});
<span class="hljs-comment">// suspend to wait for the result</span>
synca::wait();
<span class="hljs-comment">// either return the result or throw catched exception</span>
<span class="hljs-keyword">return</span> result.get().unwrap();
}
};
</div></code></pre>
<p data-line="414" class="code-line">Fairly simple actions occur here: the method call intercepted in the <code>f</code> functor is wrapped in <code>WrappedResult</code>, the call is put into the channel, and the current coroutine freezes. This pending call is then invoked inside the <code>BaseChannel::loop</code> method, thereby filling the result and resuming the suspended coroutine.</p>
<p data-line="416" class="code-line">It is worth saying a few words about the <code>WrappedResult</code> class. This class serves several purposes:</p>
<ol>
<li data-line="418" class="code-line">It allows you to store either the result of the call or the caught exception.</li>
<li data-line="419" class="code-line">In addition, it solves the following problem. If the function does not return any value (that is, returns <code>void</code>), then assigning the result without the wrapper would be ill-formed: you cannot just write <code>void</code> into a <code>void</code> variable. However, it is allowed to use an expression of type <code>void</code> together with <code>return</code>, which the <code>WrappedResult<void></code> specialization exploits through the <code>.get().unwrap()</code> invocation.</li>
</ol>
<p data-line="421" class="code-line">As a result, we have synchronized access to any object method through the channel handler with captured method arguments. All methods are processed in a separate, isolated coroutine, which ensures serialized execution of the handlers mutating the object state.</p>
<h3 data-line="423" class="code-line" id="ordinary-asynchrony">Ordinary Asynchrony</h3>
<p data-line="425" class="code-line">Let's try, for the sake of interest, to implement the same behavior without the adapter and coroutines, in order to demonstrate most clearly the power of the applied abstractions.</p>
<p data-line="427" class="code-line">To do this, consider the implementation of an asynchronous spinlock:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> AsyncSpinlock
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">lock</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::function<<span class="hljs-keyword">void</span>()</span>> cb)
</span>{
<span class="hljs-keyword">if</span> (lock_.test_and_set(<span class="hljs-built_in">std</span>::memory_order_acquire)) {
<span class="hljs-comment">// lock was not granted => reschedule it</span>
currentScheduler().schedule(
[<span class="hljs-keyword">this</span>, cb = <span class="hljs-built_in">std</span>::move(cb)]() <span class="hljs-keyword">mutable</span> {
lock(<span class="hljs-built_in">std</span>::move(cb));
});
} <span class="hljs-keyword">else</span> {
cb();
}
}
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">unlock</span><span class="hljs-params">()</span>
</span>{
lock_.clear(<span class="hljs-built_in">std</span>::memory_order_release);
}
<span class="hljs-keyword">private</span>:
<span class="hljs-built_in">std</span>::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
</div></code></pre>
<p data-line="455" class="code-line">Here the standard interface of the spinlock has changed: it has become more cumbersome and less pleasant to use.</p>
<p data-line="457" class="code-line">Now implement the <code>AsyncSpinlockReader</code> class, which will use our asynchronous spinlock:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> AsyncSpinlockReader
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">read</span><span class="hljs-params">(Range range, <span class="hljs-keyword">const</span> Options& options,
<span class="hljs-built_in">std</span>::function<<span class="hljs-keyword">void</span>(<span class="hljs-keyword">const</span> Buffer&)</span>> cbBuffer)
</span>{
spinlock_.lock(
[<span class="hljs-keyword">this</span>, range, options, cbBuffer = <span class="hljs-built_in">std</span>::move(cbBuffer)] {
<span class="hljs-keyword">auto</span> buffer = reader_.read(range, options);
<span class="hljs-comment">// it's cool that unlock is synchronous</span>
<span class="hljs-comment">// otherwise we could see funny ladder of lambdas</span>
spinlock_.unlock();
cbBuffer(buffer);
});
}
<span class="hljs-keyword">private</span>:
AsyncSpinlock spinlock_;
Reader reader_;
};
</div></code></pre>
<p data-line="481" class="code-line">As we see from the <code>read</code> method, the asynchronous spinlock <code>AsyncSpinlock</code> will necessarily break the existing interfaces of our classes.</p>
<p data-line="483" class="code-line">And now consider the usage:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// instead of</span>
<span class="hljs-comment">// CoSpinlock<Reader> reader;</span>
<span class="hljs-comment">// auto buffer = reader.read(range, options);</span>
AsyncSpinlockReader reader;
reader.read(buffer, options, [](<span class="hljs-keyword">const</span> Buffer& buffer) {
<span class="hljs-comment">// buffer is transferred as an input parameter</span>
<span class="hljs-comment">// we need to carefully transfer the execution context here</span>
});
</div></code></pre>
<p data-line="497" class="code-line">Let's assume for a minute that <code>Spinlock::unlock</code> and the <code>Reader::read</code> method are also asynchronous. This is easy enough to believe if we assume that the <code>Reader</code> pulls data over the network, and that, for example, portals are used instead of <code>Spinlock</code>. Then:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> SuperAsyncSpinlockReader
{
<span class="hljs-comment">// error handling is deliberately omitted here,</span>
<span class="hljs-comment">// otherwise the brain will change its state of aggregation</span>
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">read</span><span class="hljs-params">(Range range, <span class="hljs-keyword">const</span> Options& options,
<span class="hljs-built_in">std</span>::function<<span class="hljs-keyword">void</span>(<span class="hljs-keyword">const</span> Buffer&)</span>> cb)
</span>{
spinlock_.lock(
[<span class="hljs-keyword">this</span>, range, options, cb = <span class="hljs-built_in">std</span>::move(cb)]() <span class="hljs-keyword">mutable</span> {
<span class="hljs-comment">// the first fail: read is asynchronous</span>
reader_.read(range, options,
[<span class="hljs-keyword">this</span>, cb = <span class="hljs-built_in">std</span>::move(cb)](<span class="hljs-keyword">const</span> Buffer& buffer) <span class="hljs-keyword">mutable</span> {
<span class="hljs-comment">// the second fail: spinlock is asynchronous</span>
spinlock_.unlock(
[buffer, cb = <span class="hljs-built_in">std</span>::move(cb)] {
<span class="hljs-comment">// the end of cool ladder</span>
cb(buffer);
});
});
});
}
<span class="hljs-keyword">private</span>:
AsyncSpinlock spinlock_;
AsyncNetworkReader reader_;
};
</div></code></pre>
<p data-line="528" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieXnAMNp12ZjPD7YkrhO4Cspzh3CACckJLnNeda-41_0gFJYOSHKdgfjR2gYIWzik9_05WF7WkdP_n8K-9aB5CZPSz9oVv9VK7DSbKozSvJtQ6AZkTyjbl-iLm-4vS5yI6TbVLGFBjHLY/s600/i_tak_eng.jpg" alt="okay"></p>
<p data-line="530" class="code-line">Such a straightforward approach seems to hint that things will only get worse, because working code tends to grow and become more complicated.</p>
<p data-line="532" class="code-line">Naturally, the correct approach using coroutines makes such a synchronization scheme simple and understandable.</p>
<h3 data-line="534" class="code-line" id="non-invasive-asynchrony">Non-Invasive Asynchrony</h3>
<p data-line="536" class="code-line">All the considered synchronization primitives are <em>implicitly asynchronous</em>. The point is that, in case of concurrent access to an already locked resource, our coroutine suspends and wakes up at the moment the lock is released by another coroutine. If we used the so-called <em>stackless coroutines</em>, which are still being marinated for the new standard, we would have to use the keyword <code>co_await</code>. And this, in turn, means that each (!) call of any method wrapped by the synchronization adapter would have to add <code>co_await</code>, changing the semantics and the interfaces:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// no synchronization</span>
<span class="hljs-function">Buffer <span class="hljs-title">baseRead</span><span class="hljs-params">()</span>
</span>{
Reader reader;
<span class="hljs-keyword">return</span> reader.read(range, options);
}
<span class="hljs-comment">// callback-style</span>
<span class="hljs-comment">// interface and semantics are changed</span>
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">baseRead</span><span class="hljs-params">(<span class="hljs-built_in">std</span>::function<<span class="hljs-keyword">void</span>(<span class="hljs-keyword">const</span> Buffer& buffer)</span>> cb)
</span>{
AsyncReader reader;
reader.read(range, options, cb);
}
<span class="hljs-comment">// stackless coroutines</span>
<span class="hljs-comment">// interface is changed, asynchronous behavior is added explicitly</span>
<span class="hljs-keyword">future_t</span><Buffer> standardPlannedRead()
{
CoMutex<Reader> reader;
<span class="hljs-keyword">return</span> co_await reader.read(range, options);
}
<span class="hljs-comment">// stackful coroutines</span>
<span class="hljs-comment">// no interface changes</span>
<span class="hljs-function">Buffer <span class="hljs-title">myRead</span><span class="hljs-params">()</span>
</span>{
CoMutex<Reader> reader;
<span class="hljs-keyword">return</span> reader.read(range, options);
}
</div></code></pre>
<p data-line="571" class="code-line">Here, when using the <em>stackless</em> approach, all interfaces in the call chain become broken. There can be no transparency in this case, because you cannot simply replace <code>Reader</code> with <code>CoMutex<Reader></code>. This invasive approach significantly limits the scope and applicability of <em>stackless coroutines</em>.</p>
<p data-line="573" class="code-line">At the same time, the approach of <em>stackful coroutines</em> completely eliminates the issue mentioned above.</p>
<p data-line="575" class="code-line">Thus you have a unique choice:</p>
<ol>
<li data-line="577" class="code-line">Use the invasive, breaking approach tomorrow (in 3 years, perhaps).</li>
<li data-line="578" class="code-line">Use the non-invasive and clear approach today (or rather, yesterday).</li>
</ol>
<h3 data-line="580" class="code-line" id="hybrid-approaches">Hybrid Approaches</h3>
<p data-line="582" class="code-line">In addition to the above methods of synchronization, you can use so-called hybrid approaches. The point is that some of the synchronization primitives are based on a scheduler, which can be combined with a thread pool for additional isolation of execution.</p>
<p data-line="584" class="code-line">Consider the synchronization through the portal:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BasePortal : T_base, <span class="hljs-keyword">private</span> synca::SchedulerRef
{
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span>... V>
BasePortal(mt::IScheduler& scheduler, V&&... v)
: T_base{<span class="hljs-built_in">std</span>::forward<V>(v)...}
, synca::SchedulerRef{scheduler} <span class="hljs-comment">// remember the scheduler</span>
{
}
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-comment">// reschedule f(...) through the saved scheduler</span>
synca::Portal <span class="hljs-number">_</span>{scheduler()};
<span class="hljs-keyword">return</span> f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
}
<span class="hljs-keyword">using</span> synca::SchedulerRef::scheduler;
};
</div></code></pre>
<p data-line="610" class="code-line">In the constructor of the adapter's base class, we store the scheduler <code>mt::IScheduler</code>, and then reschedule our call <code>f(static_cast<T_base&>(*this))</code> through a portal to the stored scheduler. To use this approach, we must first create a single-threaded scheduler to synchronize our execution:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// create single thread in thread pool to synchronize access</span>
mt::ThreadPool serialized{<span class="hljs-number">1</span>};
CoPortal<Reader> reader1{serialized};
CoPortal<Reader> reader2{serialized};
</div></code></pre>
<p data-line="619" class="code-line">Thus, both <code>Reader</code> instances are serialized through the same thread belonging to the <code>serialized</code> pool.</p>
<p data-line="621" class="code-line">You can use a similar approach for the isolation of execution for <code>CoAlone</code> and <code>CoChannel</code>:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// because CoAlone and CoChannel synchronize the execution,</span>
<span class="hljs-comment">// thus the number of threads may be arbitrary</span>
mt::ThreadPool isolated{<span class="hljs-number">3</span>};
<span class="hljs-comment">// the synchronization will take place</span>
<span class="hljs-comment">// inside isolated thread pool</span>
CoAlone<Reader> reader1{isolated};
<span class="hljs-comment">// to read from the channel the coroutine will be created</span>
<span class="hljs-comment">// inside isolated thread pool</span>
CoChannel<Reader> reader2{isolated};
</div></code></pre>
<h2 data-line="637" class="code-line" id="subjector">Subjector</h2>
<p data-line="639" class="code-line">So, we have 5 different ways of efficient, non-blocking synchronization of object operations in user space:</p>
<ol>
<li data-line="641" class="code-line"><code>CoSpinlock</code>.</li>
<li data-line="642" class="code-line"><code>CoMutex</code>.</li>
<li data-line="643" class="code-line"><code>CoSerializedPortal</code>.</li>
<li data-line="644" class="code-line"><code>CoAlone</code>.</li>
<li data-line="645" class="code-line"><code>CoChannel</code>.</li>
</ol>
<p data-line="647" class="code-line">All of these methods provide uniform access to the object. Let's take the final step to generalize the resulting code:</p>
<pre class="hljs"><code><div><span class="hljs-meta">#<span class="hljs-meta-keyword">define</span> BIND_SUBJECTOR(D_type, D_subjector, ...) \
template <span class="hljs-meta-string"><></span> \
struct subjector::SubjectorPolicy<span class="hljs-meta-string"><D_type></span> \
{ \
using Type = D_subjector<span class="hljs-meta-string"><D_type, ##__VA_ARGS__></span>; \
};</span>
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T>
<span class="hljs-keyword">struct</span> SubjectorPolicy
{
<span class="hljs-keyword">using</span> Type = CoMutex<T>;
};
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T>
<span class="hljs-keyword">using</span> Subjector = <span class="hljs-keyword">typename</span> SubjectorPolicy<T>::Type;
</div></code></pre>
<p data-line="667" class="code-line">Here we create a type <code>Subjector<T></code> that can later be specialized to use one of the 5 behaviors. For example:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// assume that Reader has 3 methods: read, open, close</span>
<span class="hljs-comment">// create adapter to intercept those methods</span>
DECL_ADAPTER(Reader, read, open, close)
<span class="hljs-comment">// then define that Reader must use CoChannel</span>
<span class="hljs-comment">// if we omit this line CoMutex will be used by default</span>
<span class="hljs-comment">// thus this line is optional</span>
BIND_SUBJECTOR(Reader, CoChannel)
<span class="hljs-comment">// here we use an already configured subjector -</span>
<span class="hljs-comment">// universal synchronization object</span>
Subjector<Reader> reader;
</div></code></pre>
<p data-line="684" class="code-line">If we want to use <code>Reader</code>, for example, in an isolated thread, then we only need to change one line:</p>
<pre class="hljs"><code><div>BIND_SUBJECTOR(Reader, CoSerializedPortal)
</div></code></pre>
<p data-line="690" class="code-line">This approach makes it possible to fine-tune the method of interaction after the code has been written and completed, and allows you to concentrate on the task at hand.</p>
<blockquote data-line="692" class="code-line">
<p data-line="692" class="code-line">If you’re using early-binding languages as most people do, rather than late-binding languages, then you really start getting locked in to stuff that you’ve already done. You can’t reformulate things that easily.</p>
<p data-line="694" class="code-line"><em>Alan Kay</em>.</p>
</blockquote>
<h2 data-line="696" class="code-line" id="asynchronous-call">Asynchronous Call</h2>
<p data-line="698" class="code-line">The synchronization primitives above used nonblocking synchronous invocation: every time, we wait for the task to complete and obtain its result. This corresponds to the usual semantics of object method calls. However, in some scenarios it is useful to explicitly start a task asynchronously, without waiting for the result, in order to parallelize execution.</p>
<p data-line="700" class="code-line">Consider the following example:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">class</span> Network
{
<span class="hljs-keyword">public</span>:
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">send</span><span class="hljs-params">(<span class="hljs-keyword">const</span> Packet& packet)</span></span>;
};
DECL_ADAPTER(Network, send)
BIND_SUBJECTOR(Network, CoChannel)
</div></code></pre>
<p data-line="712" class="code-line">If we use the code:</p>
<pre class="hljs"><code><div><span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">sendPacket</span><span class="hljs-params">(<span class="hljs-keyword">const</span> Packet& packet)</span>
</span>{
Subjector<Network> network;
network.send(myPacket);
<span class="hljs-comment">// the next action will not start</span>
<span class="hljs-comment">// until the previous one is completed</span>
doSomeOtherStuff();
}
</div></code></pre>
<p data-line="725" class="code-line">then the action <code>doSomeOtherStuff()</code> does not start until <code>network.send()</code> completes. The following code can be used to send a message asynchronously:</p>
<pre class="hljs"><code><div><span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">sendPacket</span><span class="hljs-params">(<span class="hljs-keyword">const</span> Packet& packet)</span>
</span>{
Subjector<Network> network;
<span class="hljs-comment">// call using .async()</span>
network.async().send(myPacket);
<span class="hljs-comment">// next action will execute in parallel</span>
<span class="hljs-comment">// with the previous one</span>
doSomeOtherStuff();
}
</div></code></pre>
<p data-line="741" class="code-line">And voila - the synchronous code has turned into asynchronous!</p>
<p data-line="743" class="code-line">It works like this. First, a special asynchronous wrapper for the adapter, <code>BaseAsyncWrapper</code>, is created using the <a href="https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern">curiously recurring template pattern</a> (CRTP):</p>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_derived>
<span class="hljs-keyword">struct</span> BaseAsyncWrapper
{
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">call</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-keyword">return</span> <span class="hljs-keyword">static_cast</span><T_derived&>(*<span class="hljs-keyword">this</span>).asyncCall(<span class="hljs-built_in">std</span>::forward<F>(f));
}
};
</div></code></pre>
<p data-line="758" class="code-line">The call <code>.async()</code> is redirected to <code>BaseAsyncWrapper</code>, which forwards the call back to the child class <code>T_derived</code>, but through the method <code>asyncCall</code> instead of <code>call</code>. Thus, for our <code>Co</code>-objects, it is sufficient to implement <code>asyncCall</code> in addition to <code>call</code> to obtain asynchronous functionality automatically.</p>
<p data-line="760" class="code-line">To implement <code>asyncCall</code>, all the synchronization methods can be divided into two classes:</p>
<ul>
<li data-line="762" class="code-line">Initially <em>synchronous</em> calls: <code>CoSpinlock</code>, <code>CoMutex</code>, <code>CoSerializedPortal</code>, <code>CoAlone</code>. For these, we simply create a new coroutine and run our action on the given scheduler.</li>
</ul>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> Go : T_base
{
<span class="hljs-keyword">protected</span>:
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">asyncCall</span><span class="hljs-params">(F&& f)</span>
</span>{
<span class="hljs-keyword">return</span> synca::go(
[ f = <span class="hljs-built_in">std</span>::move(f), <span class="hljs-keyword">this</span> ]() {
f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
},
T_base::scheduler());
}
};
</div></code></pre>
<ul>
<li data-line="781" class="code-line">Initially <em>asynchronous</em> calls: <code>CoChannel</code>. Here it is sufficient to remove the suspend/resume pair and keep the original asynchronous call.</li>
</ul>
<pre class="hljs"><code><div><span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> T_base>
<span class="hljs-keyword">struct</span> BaseChannel : T_base
{
<span class="hljs-keyword">template</span> <<span class="hljs-keyword">typename</span> F>
<span class="hljs-function"><span class="hljs-keyword">auto</span> <span class="hljs-title">asyncCall</span><span class="hljs-params">(F&& f)</span>
</span>{
channel_.put([ f = <span class="hljs-built_in">std</span>::move(f), <span class="hljs-keyword">this</span> ] {
<span class="hljs-keyword">try</span> {
f(<span class="hljs-keyword">static_cast</span><T_base&>(*<span class="hljs-keyword">this</span>));
} <span class="hljs-keyword">catch</span> (<span class="hljs-built_in">std</span>::exception&) {
<span class="hljs-comment">// do nothing due to async call</span>
}
});
}
};
</div></code></pre>
<h2 data-line="801" class="code-line" id="characteristics">Characteristics</h2>
<p data-line="803" class="code-line">Various characteristics of these approaches are summarized in the following table:</p>
<p data-line="805" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg23BMxj-Gx6rFu9QdbeVqaT-sOcZy1MU6Dll4EIYyWGOYn3Tuf0fkDDLCNFpoOP8QzF8UC6NUF1sWiKqQQnqTFoAf4XwbQI4PcAgCE3Cew2tRw7bZkUzj5YTX9Of0CQ42tOiiy2nHIv_Q/s1600/table_eng_resized.png" alt="Table"></p>
<p data-line="807" class="code-line"><sup>1</sup><em>Using asynchronous call simultaneously with the hybrid approach</em>.<br/>
<sup>2</sup><em>Using the hybrid approach</em>.</p>
<p data-line="810" class="code-line">Let's consider each column in detail.</p>
<h3 data-line="812" class="code-line" id="lightness">Lightness</h3>
<p data-line="814" class="code-line"><code>CoSpinlock</code> is definitely the most lightweight entity under consideration: it contains only atomic instructions plus coroutine rescheduling when the resource is locked. <code>CoSpinlock</code> makes sense in situations with short locking times, because otherwise it starts loading the scheduler with useless work: checking the atomic variable and rescheduling again. The other synchronization primitives use heavier implementations but do not load the scheduler in case of contention.</p>
<h3 data-line="816" class="code-line" id="fifo">FIFO</h3>
<p data-line="818" class="code-line">FIFO, or the first-in-first-out guarantee, is the guarantee of a queue. It is worth mentioning that even if the application has only a single scheduler and that scheduler provides a FIFO guarantee, <code>CoSpinlock</code> still does not provide the FIFO guarantee.</p>
<h3 data-line="820" class="code-line" id="deadlock-free">Deadlock-free</h3>
<p data-line="822" class="code-line">As the name implies, a deadlock-free synchronization primitive never leads to deadlocks. This guarantee is provided by the scheduler-based primitives.</p>
<h3 data-line="824" class="code-line" id="continuity">Continuity</h3>
<p data-line="826" class="code-line">By this notion, I mean the continuity of the taken lock, as if the synchronization were done using a mutex. It turns out that continuity is closely connected with the deadlock-free property. I describe this topic in more detail below, as it is important for a deep understanding of the synchronization methods and is of particular practical interest.</p>
<h3 data-line="828" class="code-line" id="isolation">Isolation</h3>
<p data-line="830" class="code-line">The isolation property has already been used partially when <code>CoPortal</code> was considered. Isolation is the ability to execute a method in an isolated thread pool. Only <code>CoSerializedPortal</code> uses this property by default, since it creates a thread pool with a single thread to serialize execution. For synchronous execution, as discussed previously, the scheduler-based primitives <code>CoAlone</code> and <code>CoChannel</code> can also have this property. For asynchronous invocation, the execution is forked onto a scheduler, which again makes it possible to isolate the code from other methods.</p>
<h3 data-line="832" class="code-line" id="asynchrony">Asynchrony</h3>
<p data-line="834" class="code-line">All methods except <code>CoChannel</code> use the current coroutine to run a synchronized action. Only <code>CoChannel</code> runs the operation in parallel and waits for the result, so the native mode of execution for this synchronization primitive is an asynchronous task invocation. It means that <code>CoChannel</code> provides a better opportunity for:</p>
<ol>
<li data-line="836" class="code-line"><em>Parallelism</em>: the various processing steps are executed effectively.</li>
<li data-line="837" class="code-line"><em>Maintaining the object context</em>: a minimum of context switches for the synchronized object; object data is not flushed out of the processor cache, speeding up processing.</li>
</ol>
<h2 data-line="839" class="code-line" id="deadlocks-and-race-conditions">Deadlocks and Race Conditions</h2>
<p data-line="841" class="code-line">I like to pose the following problem.</p>
<p data-line="843" class="code-line"><em>Problem 1</em>. Suppose that all the methods of our class are synchronized via a mutex. The question immediately appears: is it possible to obtain a race condition?</p>
<p data-line="845" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1ouw_k4-KyiYfSR8wSkqLLhwnPkyFkd59pYC4lTBlH3ijDS-dOO7UM897CQuAmX2h4jbBR4Cl3e99AAJMQBaTNM5jUDJwGeaezvuso3kHnU2jfZrnfDnDlTLtwj5cwKw7Ci7mFMf_R58/s600/ne_znau.jpg" alt="Task race condition"></p>
<p data-line="847" class="code-line">The obvious answer is no. However, there is a certain catch. The idea starts spinning in the head, and the brain begins to offer crazy options that do not meet the conditions of the problem. As a result, everything turns to ashes and hopelessness sets in.</p>
<p data-line="849" class="code-line">I advise you to think carefully before seeing the answer. But to keep your brain intact, I provide a solution for this problem below.</p>
<p data-line="851" class="code-line">Consider the following class:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> Counter
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">set</span><span class="hljs-params">(<span class="hljs-keyword">int</span> value)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">get</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
<span class="hljs-keyword">private</span>:
<span class="hljs-keyword">int</span> value_ = <span class="hljs-number">0</span>;
};
</div></code></pre>
<p data-line="864" class="code-line">Wrap it:</p>
<pre class="hljs"><code><div>DECL_ADAPTER(Counter, <span class="hljs-built_in">set</span>, get)
Subjector<Counter> counter;
</div></code></pre>
<p data-line="872" class="code-line">The methods <code>get</code> and <code>set</code> will be wrapped by the asynchronous mutex, although asynchrony is not essential here. Synchronization is what matters.</p>
<p data-line="874" class="code-line">And now we want to solve the problem:</p>
<p data-line="876" class="code-line"><em>Problem 2</em>. Atomically increment the counter.</p>
<p data-line="878" class="code-line">And then many suddenly realize:</p>
<pre class="hljs"><code><div>counter.<span class="hljs-built_in">set</span>(counter.get() + <span class="hljs-number">1</span>);
</div></code></pre>
<p data-line="884" class="code-line">This code contains a race condition, despite the fact that each call is individually synchronized!</p>
<p data-line="886" class="code-line">To understand the diversity of different race conditions it makes sense to introduce the following categories.</p>
<h3 data-line="888" class="code-line" id="race-condition-of-the-first-kind">Race Condition of the First Kind</h3>
<p data-line="890" class="code-line">Or <em>data race</em>, as described in the standard:</p>
<blockquote data-line="892" class="code-line">
<p data-line="892" class="code-line">The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.</p>
<p data-line="894" class="code-line"><em><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf">C++17 Standard N4659, §4.7.1 (20.2)</a></em></p>
</blockquote>
<p data-line="896" class="code-line">A typical example of this case: two actions performed in two different threads change the object state without any synchronization, for example <code>std::vector::push_back(value)</code>. In the best case, the program will crash; in the worst case, it silently corrupts the data (yes, in this case, the early crash is the better option). To catch such hard problems, there are special tools:</p>
<ol>
<li data-line="898" class="code-line"><a href="https://clang.llvm.org/docs/ThreadSanitizer.html">ThreadSanitizer</a>: detects problems at runtime.</li>
<li data-line="899" class="code-line"><a href="http://valgrind.org/docs/manual/hg-manual.html">Helgrind: a thread error detector</a>: Valgrind tool for detecting synchronization errors.</li>
<li data-line="900" class="code-line"><a href="https://github.com/dvyukov/relacy">Relacy race detector</a>: verifies multithreaded <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm">lock-free/wait-free</a> algorithms against the memory model.</li>
</ol>
<h3 data-line="902" class="code-line" id="race-condition-of-the-second-kind">Race Condition of the Second Kind</h3>
<p data-line="904" class="code-line">Any race condition which does not fall under the first category is a race condition of the second kind. These are higher-level conditions described by the logic of the program and its invariants, so they cannot be detected by the low-level tools and verifiers mentioned above. They typically manifest as an atomicity break, as in the counter example above: <code>counter.set(counter.get() + 1)</code>. This happens because <code>.get()</code> and <code>.set()</code> are synchronized separately.</p>
<p data-line="906" class="code-line">At the moment, the tools for analyzing such issues are at an early stage of development. Below is a short list with minimal comments, because a detailed description is beyond the scope of this article:</p>
<ol>
<li data-line="908" class="code-line"><a href="https://people.cs.vt.edu/~dongyoon/papers/EUROSYS-17-NodeFz.pdf">Node.fz: fuzzing the server-side event-driven architecture</a>: researchers found bugs related to the concurrent interaction in single-threaded asynchronous code!</li>
<li data-line="909" class="code-line"><a href="https://homes.cs.washington.edu/~pfonseca/papers/eurosys2017-dsbugs.pdf">An empirical study on the correctness of formally verified distributed systems</a>: the study of the correctness of formal verified systems. Authors found bugs where they should not be in principle!</li>
</ol>
<h3 data-line="911" class="code-line" id="continuity-and-asynchrony">Continuity and Asynchrony</h3>
<p data-line="913" class="code-line">Adding asynchronous behavior can produce unexpected program execution, which gives a tremendous opportunity to wonder at incorrect behavior once again. To see this, consider synchronization via <code>CoAlone</code>:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> User
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">setName</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& name)</span></span>;
<span class="hljs-built_in">std</span>::<span class="hljs-function"><span class="hljs-built_in">string</span> <span class="hljs-title">getName</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">setAge</span><span class="hljs-params">(<span class="hljs-keyword">int</span> age)</span></span>;
<span class="hljs-function"><span class="hljs-keyword">int</span> <span class="hljs-title">getAge</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
};
DECL_ADAPTER(User, setName, getName, setAge, getAge)
BIND_SUBJECTOR(User, CoAlone)
<span class="hljs-keyword">struct</span> UserManager
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">increaseAge</span><span class="hljs-params">()</span>
</span>{
user_.setAge(user_.getAge() + <span class="hljs-number">1</span>);
}
<span class="hljs-keyword">private</span>:
Subjector<User> user_;
};
UserManager manager;
<span class="hljs-comment">// race condition of the 2nd kind</span>
manager.increaseAge();
</div></code></pre>
<p data-line="943" class="code-line">Here the line <code>manager.increaseAge()</code> contains the already familiar race condition of the 2nd kind, leading to inconsistency when the method <code>increaseAge()</code> is called concurrently from two different threads.</p>
<p data-line="945" class="code-line">You can try to fix this behavior:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> UserManager
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">increaseAge</span><span class="hljs-params">()</span>
</span>{
user_.setAge(user_.getAge() + <span class="hljs-number">1</span>);
}
<span class="hljs-keyword">private</span>:
Subjector<User> user_;
};
DECL_ADAPTER(UserManager, increaseAge)
BIND_SUBJECTOR(UserManager, CoAlone)
Subjector<UserManager> manager;
manager.increaseAge();
</div></code></pre>
<p data-line="965" class="code-line">We use <code>CoAlone</code> for synchronization in both cases. The question immediately arises: will there be a race condition in this case?</p>
<p data-line="967" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqZUi7p9plPVd7hAAtmp3xcRzEt5nPV_7MjwAlwXQvy-6sCEv2LBM6jXP6xzxY3S-HikO9vYYJtFejY_ugDFDgpAxjRDfxfnPjymB7rfFt7gvUjcyv0beYUz9jinnEyeZgeEVvSCQvDoA/s600/fire_eng.jpg" alt="Fire"></p>
<p data-line="969" class="code-line">There will! Despite the additional synchronization, this example is also subject to a race condition of the 2nd kind. Indeed, when <code>UserManager</code> is synchronized, the current coroutine runs on its <code>Alone</code> scheduler. Then the call <code>user_.getAge()</code> teleports to another <code>Alone</code> scheduler belonging to <code>User</code>. So another running coroutine is now able to enter the method <code>increaseAge()</code> in parallel with the current one, which at this point is inside <code>user_.getAge()</code>. This is possible because <code>Alone</code> guarantees only the absence of parallel execution within its own scheduler, and in this scenario we have parallel execution on two different schedulers: <code>CoAlone<User></code> and <code>CoAlone<UserManager></code>.</p>
<p data-line="971" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdJcyGyodgXM9vA3ADX9x6FUcrNGe3LhXyAaaSI5lbgR1Crt9KpAbGtQCp26krUSQPMQZGEsIkjKy3LExYAwBMAusoP0abRD6ctOrc-CovAQj5Xm5o0gaG7vorekmgZg9BRMGPept-VL0/s1600/age_race.png" alt="Age with alone"></p>
<p data-line="973" class="code-line">Thus, atomic execution breaks in the case of scheduler-based synchronization: <code>CoAlone</code> and <code>CoPortal</code>.</p>
<p data-line="975" class="code-line">To fix this situation, it is sufficient to replace the binding for <code>UserManager</code> with:</p>
<pre class="hljs"><code><div>BIND_SUBJECTOR(UserManager, CoMutex)
</div></code></pre>
<p data-line="981" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhebov3Yg2jH6UpucXOGjwI2AT7d6S7j2qV3Bgzvxdsagrv3LmSeVRH39t3LbJrgKqV4wGNS4Mt3uy32zZ_YQP7_X52rDUFZzibLSWE5D7uU0D0ZcnvxYPPNdcheibMZxpajvlFJqG_HFk/s1600/age.png" alt="Age with mutex"></p>
<p data-line="983" class="code-line">This will prevent the race condition of the 2nd kind.</p>
<h3 data-line="985" class="code-line" id="execution-continuity">Execution Continuity</h3>
<p data-line="987" class="code-line">In some cases, the break of atomic execution is extremely useful. To see this, consider the following example:</p>
<pre class="hljs"><code><div><span class="hljs-keyword">struct</span> UI
{
<span class="hljs-comment">// triggered when user requests the info</span>
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">onRequestUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)</span></span>;
<span class="hljs-comment">// update the user info in UI</span>
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">updateUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> User& user)</span></span>;
};
DECL_ADAPTER(UI, onRequestUser, updateUser)
<span class="hljs-comment">// UI attaches to the main UI thread</span>
BIND_SUBJECTOR(UI, CoPortal)
<span class="hljs-keyword">struct</span> UserManager
{
<span class="hljs-comment">// request the user info</span>
<span class="hljs-function">User <span class="hljs-title">getUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)</span></span>;
<span class="hljs-keyword">private</span>:
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">addUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> User& user)</span></span>;
<span class="hljs-function">User <span class="hljs-title">findUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)</span></span>;
};
DECL_ADAPTER(UserManager, getUser)
BIND_SUBJECTOR(UserManager, CoAlone)
<span class="hljs-keyword">struct</span> NetworkManager
{
<span class="hljs-comment">// request the info from remote server</span>
<span class="hljs-function">User <span class="hljs-title">getUser</span><span class="hljs-params">(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)</span></span>;
};
DECL_ADAPTER(NetworkManager, getUser)
<span class="hljs-comment">// network actions are executed outside the other threads isolated</span>
BIND_SUBJECTOR(NetworkManager, CoSerializedPortal)
<span class="hljs-comment">// functions returning singletons of objects</span>
Subjector<UserManager>& getUserManager();
Subjector<NetworkManager>& getNetworkManager();
Subjector<UI>& getUI();
<span class="hljs-keyword">void</span> UI::onRequestUser(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)
{
updateUser(getUserManager().getUser(userName));
}
User UserManager::getUser(<span class="hljs-keyword">const</span> <span class="hljs-built_in">std</span>::<span class="hljs-built_in">string</span>& userName)
{
<span class="hljs-keyword">auto</span> user = findUser(userName);
<span class="hljs-keyword">if</span> (user) {
<span class="hljs-comment">// user has been found => return it immediately</span>
<span class="hljs-keyword">return</span> user;
}
<span class="hljs-comment">// user has not been found => request it remotely</span>
user = getNetworkManager().getUser(userName);
<span class="hljs-comment">// add user to avoid remote call again</span>
addUser(user);
<span class="hljs-keyword">return</span> user;
}
</div></code></pre>
<p data-line="1048" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtI-9Ap9ZwLfBdBZRmaWcnHsjay1P3_uVlSEyYkVtRjUMqD9AJHY2nsDjHZkQ-dCw1hvDAMr23yaAjEVOwtCBQ2iwFbtR9u-_ZeX5MWLYRQlZCltwvcHbABfHUljwf17qHskrPxs3YLT4/s1600/ui.png" alt="UI Interaction"></p>
<p data-line="1050" class="code-line">All the actions begin with the call <code>UI::onRequestUser</code>. At this point, <code>UserManager::getUser</code> is called in the UI thread through the corresponding subjector. This subjector first switches execution from the UI thread to the <code>Alone</code> scheduler, and then calls the corresponding method. The UI thread is thereby unlocked and can perform other actions. Thus, the call is asynchronous and does not block the UI or force it to slow down.</p>
<p data-line="1052" class="code-line">If <code>UserManager</code> already contains the information related to the requested user, the problem is solved, and we return it immediately. Otherwise, we ask for the necessary information over the network through <code>NetworkManager</code>. Again, we do not block <code>UserManager</code> for the duration of a long query over the network. If at this point the user requests some other information from <code>UserManager</code> in parallel, we will be able to provide it without waiting for the completion of the initial operation! So, in this case, the break of atomic execution only improves responsiveness and provides parallel query execution. At the same time, we made no effort to implement this strategy because the subjector automagically synchronizes the access by rescheduling our coroutines appropriately. Isn't that a miracle?</p>
<h3 data-line="1054" class="code-line" id="deadlocks">Deadlocks</h3>
<p data-line="1056" class="code-line">Another unique characteristic of scheduler-based subjectors is the absence of deadlocks. Take a look:</p>
<pre class="hljs"><code><div><span class="hljs-comment">// forward declaration</span>
<span class="hljs-keyword">struct</span> User;
DECL_ADAPTER(User, addFriend, getId, addFriendId)
<span class="hljs-keyword">struct</span> User
{
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">addFriend</span><span class="hljs-params">(Subjector<User>& myFriend)</span>
</span>{
<span class="hljs-keyword">auto</span> friendId = myFriend.getId();
<span class="hljs-keyword">if</span> (hasFriend(friendId)) {
<span class="hljs-comment">// do nothing in case of friend presence</span>
<span class="hljs-keyword">return</span>;
}
addFriendId(friendId);
<span class="hljs-keyword">auto</span> myId = getId();
myFriend.addFriendId(myId);
}
<span class="hljs-function">Id <span class="hljs-title">getId</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">addFriendId</span><span class="hljs-params">(Id id)</span></span>;
<span class="hljs-keyword">private</span>:
<span class="hljs-function"><span class="hljs-keyword">bool</span> <span class="hljs-title">hasFriend</span><span class="hljs-params">(Id id)</span></span>;
};
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">makeFriends</span><span class="hljs-params">(Subjector<User>& u1, Subjector<User>& u2)</span>
</span>{
u1.addFriend(u2);
}
</div></code></pre>
<p data-line="1090" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrRsoLX1XMiu2EsJvpsUbkzYV66b1mbVaeEr1QmwBPCCgr0oF23mbdiiKjLOZVwY3JEkVMR2VViB25FQ_zXdA5GPuy15Yl8fuirbcxq32MGaveYxqUA1-6Km1_bnSHtz_8oAcZGfzQhGM/s1600/deadlock.png" alt="Deadlock"></p>
<p data-line="1092" class="code-line">Because the default subjector behavior is <code>CoMutex</code>, parallel execution of <code>makeFriends</code> for the same pair of users in opposite order will sometimes hang: each coroutine holds one user's lock while waiting for the other's. How can this situation be avoided without rewriting all the code from scratch? You just need to add a single line:</p>
<pre class="hljs"><code><div>BIND_SUBJECTOR(User, CoAlone)
</div></code></pre>
<p data-line="1098" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglOdYXXbWzyYpVbRYLlGTAHzrCMvXgvQQm2cpuTupeuURqA2mjCaCVpvhMYUX7Xztgukmvz_j1XQeKMy0qvnK0yUSbwCoWEpub51WdtagcF14DimT4SVykSeHLMo1KZuvjyUl-S3ezRCY/s1600/deadlock_free.png" alt="Deadlock-Free"></p>
<p data-line="1100" class="code-line">Now, no matter how you execute it and how many threads you use, you will never obtain a deadlock. Could you ever have imagined that this is possible? Obviously not.</p>
<h3 data-line="1102" class="code-line" id="relationship">Relationship</h3>
<p data-line="1104" class="code-line">If you look closely at the table of comparative characteristics of the various ways of interaction, a relationship appears: where there is <em>continuity</em> there is no <em>deadlock freedom</em>, and vice versa. In general, one can show that with the described methods of synchronization we cannot achieve both properties simultaneously. With special techniques of transactional execution with dependency tracking such behavior is possible, but that requires a separate article, so it will not be considered here.</p>
<h2 data-line="1106" class="code-line" id="discussion">Discussion</h2>
<p data-line="1108" class="code-line">We have obtained 5 different ways to synchronize concurrent execution. Adding 3 hybrid approaches and doubling the total to take asynchronous calls into account, we obtain 16 different variants of nonblocking asynchronous synchronization in user space, without switching to kernel space. As long as there is any work, it will be executed, maximizing the load across all processor cores. The user may choose the way of synchronization based on the application logic and data interaction.</p>
<p data-line="1110" class="code-line">Among all these 16 ways there is a special case that is quite popular nowadays: <code>CoChannel<T>.async().someMethod(...)</code>. Such interaction, applied to all participants in the process, is called the <a href="https://en.wikipedia.org/wiki/Actor_model">actor model</a>. Indeed, the channel is the mailbox that automatically dispatches incoming messages matching the methods of a class, using the power of C++ including the static type system, OOP, and macro-based template metaprogramming. Although the actor model does not require additional synchronization, it is subject to race conditions of the second kind; in return, it eliminates deadlocks.</p>
<p data-line="1112" class="code-line">The introduced subjector model has much greater variability, allowing one to later change the particular mode of concurrent interaction without breaking existing interfaces and implementations. The specific choice depends on the circumstances, and the arguments presented in this article can be considered a starting point for the final decision. Each of the methods has its own characteristics; once again, no silver bullet has been delivered.</p>
<h2 data-line="1114" class="code-line" id="conclusion">Conclusion</h2>
<p data-line="1116" class="code-line">Here is a list of major achievements:</p>
<ol>
<li data-line="1118" class="code-line">
<p data-line="1118" class="code-line"><em>Synchronization integration</em>. All data required for synchronization is placed inside the entity itself, converting the passive object into an active participant - a <strong>subjector</strong>.</p>
</li>
<li data-line="1120" class="code-line">
<p data-line="1120" class="code-line"><em>Deep abstraction of execution</em>. The proposed model introduces a single abstraction through a generalized understanding of the principles of object-oriented programming. It joins together a variety of primitives, creating a universal model.</p>
</li>
<li data-line="1122" class="code-line">
<p data-line="1122" class="code-line"><em>Non-invasiveness</em>. Subjector does not change the original interfaces, and thus allows you to transparently add a synchronization without code refactoring.</p>
</li>
<li data-line="1124" class="code-line">
<p data-line="1124" class="code-line"><em>Protection from multi-threaded design mistakes</em>. The universal adapter guarantees and automatically performs all the steps required to synchronize the execution. This eliminates the need to annotate or comment fields and methods of the class to describe the required usage context. The subjector isolates all the available fields and methods of the class, preventing a huge class of the most insidious bugs associated with concurrent programming by giving the task of synchronizing object access to the compiler.</p>
</li>
<li data-line="1126" class="code-line">
<p data-line="1126" class="code-line"><em>Late optimization</em>. Based on data locality and the production usage pattern, the developer may switch between different types of synchronization for the best results at the final stage of application development.</p>
</li>
<li data-line="1128" class="code-line">
<p data-line="1128" class="code-line"><em>Efficiency</em>. The approach under consideration implements context switching exclusively in user space through the use of coroutines, which yields minimal overhead and maximum utilization of hardware resources.</p>
</li>
<li data-line="1130" class="code-line">
<p data-line="1130" class="code-line"><em>Clarity, purity, and simplicity of the code</em>. There is no need to explicitly track the synchronization context or to think about how, when, and which variables should be used with particular guards, mutexes, schedulers, threads, etc. This gives simpler composability of the different parts, allowing the developer to focus on solving specific tasks rather than fixing tricky issues.</p>
</li>
</ol>
<p data-line="1132" class="code-line">As is well known, a model becomes simpler through generalization. The philosophy of object-oriented programming offers a different way to look at traditional concepts and to see magnificent abstractions. The synergy of such seemingly incompatible concepts as OOP, coroutines, channels, mutexes, spinlocks, threads, and schedulers creates a new model of concurrent interactions, generalizing the well-known actor model, the traditional mutex-based model, and the model of <a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">communicating sequential processes</a>, blurring the line between two different ways of interaction: through <a href="https://en.wikipedia.org/wiki/Shared_memory">shared memory</a> and through the <a href="https://en.wikipedia.org/wiki/Message_passing">exchange of messages</a>.</p>
<p data-line="1134" class="code-line">The name of this generalized model of concurrent interaction is <strong>subjector model</strong>.</p>
<p data-line="1136" class="code-line"><a href="https://github.com/gridem/Subjector">https://github.com/gridem/Subjector</a></p>
<p data-line="1138" class="code-line"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhieXGp8g0knl6tm3m7Dxpss8PHllqFSzH7B64wRenD7Lc8upVoDSQAS__aSloiqXaZkflPQi_OgwZq2LzeBcYAivGAu97D3i3EE42dZwnug3rPGzhAnsX9FG4BY0z6iTO7O96Mcfd6h04/s600/molodec_eng.jpg" alt="New washtub is ready!"></p>
<h2 data-line="1140" class="code-line" id="references">References</h2>
<p data-line="1142" class="code-line">[1] <a href="https://kukuruku.co/post/asynchronous-programming-back-to-the-future/">Asynchronous Programming: Back to the Future.</a><br/>
[2] <a href="https://kukuruku.co/post/asynchronous-programming-part-2-teleportation-through-portals/">Asynchronous Programming Part 2: Teleportation through Portals.</a><br/>
[3] <a href="http://gridem.blogspot.com/2015/11/replicated-object-part-2-god-adapter.html">Replicated Object. Part 2: God Adapter.</a><br/>
[4] <a href="https://www.researchgate.net/profile/Dragan_Nikolic2/publication/252317673_Experimental_and_Theoretical_Analysis_of_Central_Hb_Asymmetry/links/5658034308ae1ef9297bf662/Experimental-and-Theoretical-Analysis-of-Central-Hb-Asymmetry.pdf">S. Djurović, M. Ćirišan, A.V Demura, G.V Demchenko, D. Nikolić, M.A. Gigosos, et al., Measurements of Hβ Stark central asymmetry and its analysis through standard theory and computer simulations, Phys. Rev. E 79 (2009) 46402.</a><br/>
[5] <a href="https://en.wikipedia.org/wiki/Quantum_chemistry">Quantum chemistry.</a><br/>
[6] <a href="https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)">Object-oriented design: SOLID.</a><br/>
[7] <a href="https://en.wikipedia.org/wiki/Simula">Simula-67 language.</a><br/>
[8] <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk language.</a><br/>
[9] <a href="https://en.wikipedia.org/wiki/Objective-C">Objective-C language.</a><br/>
[10] <a href="https://gobyexample.com/channels">Go by Example: Channels.</a><br/>
[11] <a href="https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern">Curiously recurring template pattern.</a><br/>
[12] <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf">C++17 Standard N4659.</a><br/>
[13] <a href="https://clang.llvm.org/docs/ThreadSanitizer.html">ThreadSanitizer.</a><br/>
[14] <a href="http://valgrind.org/docs/manual/hg-manual.html">Helgrind: a thread error detector.</a><br/>
[15] <a href="https://github.com/dvyukov/relacy">Relacy race detector.</a><br/>
[16] <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithm.</a><br/>
[17] <a href="https://people.cs.vt.edu/~dongyoon/papers/EUROSYS-17-NodeFz.pdf">J.Davis, A.Thekumparampil, D.Lee, Node.fz: fuzzing the server-side event-driven architecture. EuroSys '17 Proceedings of the Twelfth European Conference on Computer Systems, pp 145-160.</a><br/>
[18] <a href="https://homes.cs.washington.edu/~pfonseca/papers/eurosys2017-dsbugs.pdf">P.Fonseca, K.Zhang, X.Wang, A.Krishnamurthy, An empirical study on the correctness of formally verified distributed systems. EuroSys '17 Proceedings of the Twelfth European Conference on Computer Systems, pp 328-343.</a><br/>
[19] <a href="https://en.wikipedia.org/wiki/Actor_model">Actor model.</a><br/>
[20] <a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">Communicating sequential processes.</a><br/>
[21] <a href="https://en.wikipedia.org/wiki/Shared_memory">Shared memory.</a><br/>
[22] <a href="https://en.wikipedia.org/wiki/Message_passing">Message passing.</a><br/></p>
</body></html>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com2tag:blogger.com,1999:blog-7694239937514449322.post-47977012394581873442017-08-19T14:20:00.001-07:002024-01-24T08:56:04.016-08:00Kinetics of Large Distributed Clusters<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.css">
<h2>Summary</h2>
<ol>
<li>Martin Kleppmann's fatal mistake.</li>
<li>Physicochemical kinetics does the math.</li>
<li>The half-life of the cluster.</li>
<li>We solve nonlinear differential equations without solving them.</li>
<li>Nodes as a catalyst.</li>
<li>The predictive power of graphs.</li>
<li>100 million years.</li>
<li>Synergy.</li>
</ol>
<p>In <a href="http://gridem.blogspot.com/2017/03/cap-theorem-myths.html">the previous article</a>, we discussed in detail <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">Brewer's article and Brewer's theorem</a>. This time we will analyze the post of <a href="https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html">Martin Kleppmann "The probability of data loss in large clusters"</a>.</p>
<p>In that post, the author attempts to model the following problem. To preserve data, replication is usually used; for the purposes of the model it does not actually matter whether plain replication or erasure coding is applied. The author fixes the probability of failure of a single node and then asks: what is the probability of data loss as the number of nodes grows?</p>
<p>The answer is shown in this picture:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfhk3lyq7Zuy3HDbWxA2X17Be1WDZHIWZX8ytVFE1NcnInw7ib771NX_tGT3xKebvPAMEvAJj1W_kzBdfoBrayGrcrcEG8ISqyN3sA6ynbxRr1GmhyEYdPP4iosNGk6kRZbT16PHC-WBY/s640/dataloss.png" alt="Data loss"></p>
<a name='more'></a>
<p>That is, the probability of data loss grows in proportion to the number of nodes.</p>
<p>Why does this matter? If we look at current clusters, we see that their size has grown steadily over time. So a reasonable question arises: is it worth worrying about protecting your data and raising the replication factor? This directly affects the business, the cost of ownership, and so on. Besides, this example is a great demonstration of how to produce a mathematically correct but practically wrong result.</p>
<h2>Cluster Modeling</h2>
<p>To expose mistakes in calculations, it is useful to understand models and simulation. If a model does not properly describe the actual behavior of a system, then no matter how correct the formulas we apply are, we can easily get a wrong result, simply because the model fails to take into account some important parameters of the system that cannot be ignored. The science is in understanding what is important and what is not.</p>
<p>To describe the life of a cluster, it is important to consider the dynamics of change and the relationships between different processes. This is the weak point of the original article: it presents a static picture, without any of the specific features associated with replication.</p>
<p>To describe the dynamics, I will use the methods of <a href="https://en.wikipedia.org/wiki/Chemical_kinetics">chemical kinetics</a>, replacing the ensemble of particles with an ensemble of nodes. As far as I know, no one has used this formalism to describe cluster behavior before, so I am going to improvise.</p>
<p>I introduce the following notation:</p>
<ol>
<li><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>N</mi></mrow><annotation encoding="application/x-tex">N</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">N</span></span></span></span></eq> is the total number of cluster nodes.</li>
<li><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> is the number of operable nodes.</li>
<li><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span></eq> is the number of failed nodes.</li>
</ol>
<p>Then it is obvious that:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>N</mi><mo>=</mo><mi>A</mi><mo>+</mo><mi>F</mi></mrow><annotation encoding="application/x-tex">N = A + F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="mrel">=</span><span class="mord mathit">A</span><span class="mbin">+</span><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span></span></eqn></section>
<p>Failed nodes include nodes with any kind of problem: a stuck disk, a broken processor or network, etc. The reason does not matter; what is important is the very fact of failure and the inaccessibility of the data. In the future, of course, a more subtle dynamics can be taken into account.</p>
<p>Now let's write the kinetic equations of the processes of breaking and restoring cluster nodes:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>1</mn><mi mathvariant="normal">.</mi><mn>1</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>a</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>1</mn><mi mathvariant="normal">.</mi><mn>2</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
A & \\rightarrow F,~k\_f &(1.1) \\\\
F & \\rightarrow A,~k\_a &(1.2) \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.05em;"></span><span class="strut bottom" style="height:3.6000000000000005em;vertical-align:-1.5500000000000007em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-1.2099999999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span></span></span><span style="top:-0.00999999999999951em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:1.1900000000000006em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-1.2099999999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped 
mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.00999999999999951em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="arraycolsep" style="width:2em;"></span><span class="col-align-r"><span class="vlist"><span style="top:-1.2099999999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">1</span><span class="mord mathrm">.</span><span class="mord mathrm">1</span><span class="mclose">)</span></span></span><span style="top:-0.00999999999999951em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span 
class="mord mathrm">1</span><span class="mord mathrm">.</span><span class="mord mathrm">2</span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>These simplest equations mean the following. The first equation describes the process of node failure. It does not depend on any parameters and describes the isolated output of the node failure. Other nodes are not involved in this process. On the left, the original "composition" of the process participants is used, and the process products are on the right. Rate constants <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">k_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.980548em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow><annotation encoding="application/x-tex">k_a</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> specify the rate characteristics of processes for the failure and recovery of nodes, respectively.</p>
<p>Let us clarify the physical meaning of the rate constants. To do this, we write the kinetic equations:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>A</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mo>−</mo><msub><mi>k</mi><mi>f</mi></msub><mi>A</mi><mo>+</mo><msub><mi>k</mi><mi>a</mi></msub><mi>F</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>F</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><msub><mi>k</mi><mi>f</mi></msub><mi>A</mi><mo>−</mo><msub><mi>k</mi><mi>a</mi></msub><mi>F</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\frac{dA}{dt} &= -k\_f A + k\_a F \\\\
\\\\
\\frac{dF}{dt} &= k\_f A - k\_a F \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:3.50744em;"></span><span class="strut bottom" style="height:6.51488em;vertical-align:-3.0074400000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-2.136em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit">A</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:-0.6099999999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle 
uncramped"></span></span><span style="top:1.1214400000000007em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:2.6474400000000005em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-2.136em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:1.1214400000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span><span class="mbin">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>From these equations we understand the meaning of constants <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">k_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.980548em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow><annotation encoding="application/x-tex">k_a</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. Assuming that there are no SREs and the cluster does not heal itself (i.e. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>a</mi></msub><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">k_a = 0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mord mathrm">0</span></span></span></span></eq>), we immediately obtain the equation:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>A</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mo>−</mo><msub><mi>k</mi><mi>f</mi></msub><mi>A</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\frac{dA}{dt} = -k\_f A \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8787200000000002em;"></span><span class="strut bottom" style="height:3.257440000000001em;vertical-align:-1.3787200000000004em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5072800000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit">A</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span 
class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span></span></span><span style="top:1.0187200000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Or</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi><mo>=</mo><mi>N</mi><msup><mi>e</mi><mrow><mo>−</mo><msub><mi>k</mi><mi>f</mi></msub><mi>t</mi></mrow></msup></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
A = N e^{-k_f t} \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.479554em;"></span><span class="strut bottom" style="height:2.4591080000000005em;vertical-align:-0.9795540000000005em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5804460000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="mord"><span class="mord mathit">e</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord scriptstyle uncramped mtight"><span class="mord mtight">−</span><span class="mord mtight"><span class="mord mathit mtight" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15122857142857138em;margin-right:0.07142857142857144em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-scriptstyle scriptscriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit mtight">t</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:0.6195540000000004em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>That is, the quantity <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mn>1</mn><mi mathvariant="normal">/</mi><msub><mi>k</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">1 / k_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">1</span><span class="mord mathrm">/</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is, to within a factor of roughly <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>e</mi><mi mathvariant="normal">/</mi><mn>2</mn></mrow><annotation encoding="application/x-tex">e / 2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">e</span><span class="mord mathrm">/</span><span class="mord mathrm">2</span></span></span></span></eq>, the half-life of the cluster, which is the number you need when planning for spare parts. 
Let <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> be the typical time of the transition of a single node from the state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> into the state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" 
style="margin-right:0.13889em;">F</span></span></span></span></eq>, and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow><annotation encoding="application/x-tex">\tau_a</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the typical time of the transition of a single node from the state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span></eq> into the state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base 
textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq>. Then</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>∼</mo><mfrac><mrow><mn>1</mn></mrow><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>∼</mo><mfrac><mrow><mn>1</mn></mrow><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mfrac><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
\tau_f &\sim \frac{1}{k_f} \\
\\
\tau_a &\sim \frac{1}{k_a} \\
\\
\frac{\tau_a}{\tau_f} &= \frac{k_f}{k_a} \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:5.447268em;"></span><span class="strut bottom" style="height:10.394536000000002em;vertical-align:-4.947268000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-4.125828em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-2.3137199999999996em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:-0.6322800000000004em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.0437199999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span style="top:2.775160000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" 
style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:4.587268000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-4.125828em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">∼</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:-0.6322800000000004em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">∼</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span 
class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:2.775160000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span 
class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Let's solve our kinetic equations. To be clear up front: I will cut corners wherever possible in order to obtain the simplest analytical dependencies, which I will then use for predictions and tuning.</p>
<p>Because this article has already reached its limit on the number of differential equations solved exactly, I will solve these equations using the <a href="https://en.wikipedia.org/wiki/Steady_state_(chemistry)#Steady_state_approximation_in_chemical_kinetics">steady state approximation</a>:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>A</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mn>0</mn><mo>⇒</mo><mi>F</mi><mo>=</mo><mi>A</mi><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
\frac{dA}{dt} = 0 \Rightarrow F = A \frac{k_f}{k_a} \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.95372em;"></span><span class="strut bottom" style="height:3.4074400000000002em;vertical-align:-1.4537200000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5822799999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit">A</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord mathrm">0</span><span class="mrel">⇒</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span 
class="mrel">=</span><span class="mord mathit">A</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:1.0937200000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Given that <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi><mo>≪</mo><mi>N</mi></mrow><annotation encoding="application/x-tex">F \ll N</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.72243em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mrel">≪</span><span class="mord mathit" style="margin-right:0.10903em;">N</span></span></span></span></eq> (this is a reasonable assumption; otherwise you need to buy better hardware or hire more capable SREs), we get:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>≃</mo><mi>N</mi></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>N</mi><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>a</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
A &\simeq N \\
F &= N \frac{k_f}{k_a} \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.55372em;"></span><span class="strut bottom" style="height:4.60744em;vertical-align:-2.05372em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-1.7137200000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span></span></span><span style="top:0.017719999999999958em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:1.6937200000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-1.7137200000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">≃</span><span class="mord mathit" style="margin-right:0.10903em;">N</span></span></span><span style="top:0.017719999999999958em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="mord reset-textstyle displaystyle textstyle 
uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>If we assume that the recovery time <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow><annotation encoding="application/x-tex">\tau_a</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is approximately 1 week, and the time of death <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is approximately 1 year, we get that the ratio of broken nodes <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">p_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msub><mi>p</mi><mi>f</mi></msub><mo>=</mo><mfrac><mrow><mi>F</mi></mrow><mrow><mi>N</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac><mo>≈</mo><mn>2</mn><mi mathvariant="normal">%</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
p\_f = \\frac{F}{N} = \\frac{\\tau\_a}{\\tau\_f} \\approx 2\\% \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.0162189999999995em;"></span><span class="strut bottom" style="height:3.5324379999999995em;vertical-align:-1.516219em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.6558889999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.10903em;">N</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle 
textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">≈</span><span class="mord mathrm">2</span><span class="mord mathrm">%</span></span></span><span style="top:1.1562189999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<h2>Chunks</h2>
<p>Let <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>U</mi></mrow><annotation encoding="application/x-tex">U</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span></span></eq> be the number of under-replicated chunks that must be re-replicated after a node fails, i.e., goes into state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span></eq>. Then, to take chunks into account, we extend our equations:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><mi>U</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>2</mn><mi mathvariant="normal">.</mi><mn>1</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>a</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>2</mn><mi mathvariant="normal">.</mi><mn>2</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>U</mi><mo>+</mo><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>H</mi><mo>+</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>r</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>2</mn><mi mathvariant="normal">.</mi><mn>3</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
A & \\rightarrow F + U,~k\_f &(2.1) \\\\
F & \\rightarrow A,~k\_a &(2.2) \\\\
U + A & \\rightarrow H + A,~k\_r &(2.3) \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.6500000000000004em;"></span><span class="strut bottom" style="height:4.800000000000001em;vertical-align:-2.1500000000000004em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-1.8100000000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span></span></span><span style="top:-0.6100000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:0.5900000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mbin">+</span><span class="mord mathit">A</span></span></span><span style="top:1.7900000000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-1.8100000000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord mathit" 
style="margin-right:0.10903em;">U</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.6100000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:0.5900000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mbin">+</span><span class="mord 
mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="arraycolsep" style="width:2em;"></span><span class="col-align-r"><span class="vlist"><span style="top:-1.8100000000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">2</span><span class="mord mathrm">.</span><span class="mord mathrm">1</span><span class="mclose">)</span></span></span><span style="top:-0.6100000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">2</span><span class="mord mathrm">.</span><span class="mord mathrm">2</span><span class="mclose">)</span></span></span><span style="top:0.5900000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">2</span><span class="mord mathrm">.</span><span class="mord mathrm">3</span><span class="mclose">)</span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>k</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">k_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the rate constant of the second-order replication process, and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi></mrow><annotation encoding="application/x-tex">H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span></span></span></span></eq> is a healthy chunk that dissolves into the overall pool of chunks.</p>
<p>The third equation deserves a clarification: it describes a second-order process, not a first-order one like this:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>H</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>r</mi></msub></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
U & \\rightarrow H,~k\_r \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.45em;"></span><span class="strut bottom" style="height:2.4000000000000004em;vertical-align:-0.9500000000000004em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.6099999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:0.5900000000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-0.6099999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>If we did that, we'd get Kleppmann's curve, which is not part of my plan. In fact, all nodes participate in the recovery process, and the more nodes we have, the faster replication goes. This is because the chunks from the failed nodes are distributed approximately evenly across the cluster, so each node spends <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> times less time replicating the under-replicated chunks. This means that the resulting chunk recovery rate after a node failure is proportional to the number of available nodes <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq>.</p>
<p>It is also worth noting that in equation (3) the same "substance" <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> appears on both sides; it is neither consumed nor generated. Chemists would immediately say that <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> acts as a catalyst here. And if you think about it carefully, it really does.</p>
<p>The steady-state approach immediately yields the result:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>U</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac><mo>=</mo><mn>0</mn><mo>=</mo><msub><mi>k</mi><mi>f</mi></msub><mi>A</mi><mo>−</mo><msub><mi>k</mi><mi>r</mi></msub><mi>U</mi><mi>A</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\frac{dU}{dt} = 0 = k\_f A - k\_r U A \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8787200000000002em;"></span><span class="strut bottom" style="height:3.257440000000001em;vertical-align:-1.3787200000000004em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5072800000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord mathrm">0</span><span class="mrel">=</span><span class="mord"><span class="mord 
mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span><span class="mbin">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mord mathit">A</span></span></span><span style="top:1.0187200000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>or</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi><mo>=</mo><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>r</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
U = \\frac{k\_f}{k\_r} \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.95372em;"></span><span class="strut bottom" style="height:3.4074400000000002em;vertical-align:-1.4537200000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5822799999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:1.0937200000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Amazing result! That is, the number of chunks awaiting replication is independent of the number of nodes! This is because increasing the number of nodes also increases the resulting replication rate (3), compensating for the increased number of failed <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span></span></eq> nodes. Catalysis!</p>
<p>Let's estimate this value. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">\tau_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the chunk recovery time as if we had only a single node. The node needs to replicate 5 TB of data, and the replication stream is 50 MB/s, so:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi><mo>=</mo><mfrac><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac><mo>≈</mo><mfrac><mrow><mn>1</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>5</mn></msup></mrow><mrow><mn>3</mn><mi mathvariant="normal">.</mi><mn>2</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>7</mn></msup></mrow></mfrac><mo>≈</mo><mn>3</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mrow><mo>−</mo><mn>3</mn></mrow></msup></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
U = \\frac{\\tau\_r}{\\tau\_f} \\approx \\frac{1 \\times 10^5}{3.2 \\times 10^7} \\approx 3 \\times 10^{-3} \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.081608em;"></span><span class="strut bottom" style="height:3.6632160000000002em;vertical-align:-1.5816080000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5905em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">≈</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm">3</span><span class="mord mathrm">.</span><span class="mord mathrm">2</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord"><span class="mord mathrm">0</span><span class="msupsub"><span class="vlist"><span style="top:-0.289em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">7</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord"><span class="mord mathrm">0</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord mathrm mtight">5</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">≈</span><span class="mord mathrm">3</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord"><span class="mord mathrm">0</span><span class="msupsub"><span class="vlist"><span style="top:-0.41300000000000003em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord scriptstyle uncramped mtight"><span class="mord mtight">−</span><span class="mord mathrm mtight">3</span></span></span></span><span class="baseline-fix"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.221608em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>That is, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>U</mi><mo>≪</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">U \ll 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.72243em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mrel">≪</span><span class="mord mathrm">1</span></span></span></span></eq>, so you don't have to be afraid of data loss. It is also worth noting that losing one of the three replicas of a chunk does not by itself result in data loss.</p>
<h2>Replication Planning</h2>
<p>In the previous calculation, we made an implicit assumption that nodes instantly know which chunks need to be replicated and immediately begin the replication. In reality, this is completely wrong: metadata servers first need to detect that a node has gone away, then determine which specific chunks must be replicated, and only then start the replication process on the nodes. This is not instantaneous and takes a while: the scheduling time <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><annotation encoding="application/x-tex">\tau_s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>.</p>
<p>To account for this lag, I will use the <a href="https://en.wikipedia.org/wiki/Activated_complex">theory of the <em>transition state</em>, or <em>activated complex</em></a>, which describes the passage through a saddle point on a multidimensional potential energy surface. In our model, we introduce an additional intermediate state <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>U</mi><mo>∗</mo></msup></mrow><annotation encoding="application/x-tex">U^*</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.688696em;"></span><span class="strut bottom" style="height:0.688696em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, meaning that a chunk has been scheduled for replication but the replication process has not started yet. That is, replication will begin the very next nanosecond, but not a picosecond earlier. Then our process system takes its final form:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><mi>U</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>3</mn><mi mathvariant="normal">.</mi><mn>1</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>a</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>3</mn><mi mathvariant="normal">.</mi><mn>2</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>U</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><msup><mi>U</mi><mo>∗</mo></msup><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>s</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>3</mn><mi mathvariant="normal">.</mi><mn>3</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msup><mi>U</mi><mo>∗</mo></msup><mo>+</mo><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>H</mi><mo>+</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>r</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>3</mn><mi mathvariant="normal">.</mi><mn>4</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
A & \\rightarrow F + U,~k\_f &(3.1) \\\\
F & \\rightarrow A,~k\_a &(3.2) \\\\
U & \\rightarrow U^\*,~k\_s &(3.3) \\\\
U^\* + A & \\rightarrow H + A,~k\_r &(3.4) \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:3.25em;"></span><span class="strut bottom" style="height:6em;vertical-align:-2.7500000000000004em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-2.41em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span></span></span><span style="top:-1.2100000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:-0.009999999999999953em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:1.1900000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord mathit">A</span></span></span><span style="top:2.3900000000000006em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-2.41em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-1.2100000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.009999999999999953em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.1900000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span 
class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mbin">+</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="arraycolsep" style="width:2em;"></span><span class="col-align-r"><span class="vlist"><span style="top:-2.41em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">3</span><span class="mord mathrm">.</span><span class="mord mathrm">1</span><span class="mclose">)</span></span></span><span style="top:-1.2100000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">3</span><span class="mord mathrm">.</span><span class="mord mathrm">2</span><span class="mclose">)</span></span></span><span style="top:-0.009999999999999953em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">3</span><span class="mord 
mathrm">.</span><span class="mord mathrm">3</span><span class="mclose">)</span></span></span><span style="top:1.1900000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">3</span><span class="mord mathrm">.</span><span class="mord mathrm">4</span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Writing down the rate equations for this system, we find that:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>U</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><msub><mi>k</mi><mi>f</mi></msub><mi>A</mi><mo>−</mo><msub><mi>k</mi><mi>s</mi></msub><mi>U</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><msup><mi>U</mi><mo>∗</mo></msup></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><msub><mi>k</mi><mi>s</mi></msub><mi>U</mi><mo>−</mo><msub><mi>k</mi><mi>r</mi></msub><msup><mi>U</mi><mo>∗</mo></msup><mi>A</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\frac{dU}{dt} &= k\_f A - k\_s U \\\\
\\\\
\\frac{dU^\*}{dt} &= k\_s U - k\_r U^\* A \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:3.50744em;"></span><span class="strut bottom" style="height:6.51488em;vertical-align:-3.0074400000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-2.136em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:-0.6099999999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord 
displaystyle textstyle uncramped"></span></span><span style="top:1.1214400000000007em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span 
style="top:2.6474400000000005em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-2.136em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span><span class="mbin">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:1.1214400000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mbin">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathit">A</span></span></span><span class="baseline-fix"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Using the steady-state approach, we find:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>A</mi><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>s</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow><msup><mi>U</mi><mo>∗</mo></msup></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><msub><mi>k</mi><mi>f</mi></msub></mrow><mrow><msub><mi>k</mi><mi>r</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>U</mi><mrow><mi>s</mi><mi>u</mi><mi>m</mi></mrow></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mi>U</mi><mo>+</mo><msup><mi>U</mi><mo>∗</mo></msup><mo>=</mo><mfrac><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac><mo fence="false">(</mo><mn>1</mn><mo>+</mo><mfrac><mrow><mi>A</mi></mrow><mrow><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow></mfrac><mo fence="false">)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
U &= A \frac{k_f}{k_s} \\
U^* &= \frac{k_f}{k_r} \\
U_{sum} &= U + U^* = \frac{\tau_r}{\tau_f} \bigg( 1 + \frac{A}{\tilde A} \bigg) \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:4.268494em;"></span><span class="strut bottom" style="height:8.036988em;vertical-align:-3.7684939999999996em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-2.897054em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:-0.6896139999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.5963860000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">s</span><span class="mord mathit 
mtight">u</span><span class="mord mathit mtight">m</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:3.4084939999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-2.897054em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord mathit">A</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:-0.6896139999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span 
class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:1.5963860000000003em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle 
uncramped"></span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size3">(</span></span><span class="mord mathrm">1</span><span class="mbin">+</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.8101900000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">A</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size3">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>where:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mover accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover><mo>=</mo><mfrac><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
\tilde{A} = \frac{\tau_r}{\tau_s} \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8217800000000002em;"></span><span class="strut bottom" style="height:3.143560000000001em;vertical-align:-1.3217800000000006em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.7142200000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord mathit">A</span></span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit 
mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:0.9617800000000005em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>As you can see, the result is the same as the previous one, except for the multiplier <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo>(</mo><mn>1</mn><mo>+</mo><mi>A</mi><mi mathvariant="normal">/</mi><mover accent="true"><mi>A</mi><mo>~</mo></mover><mo>)</mo></mrow><annotation encoding="application/x-tex">(1 + A/\tilde A)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:1.17019em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">1</span><span class="mbin">+</span><span class="mord mathit">A</span><span class="mord mathrm">/</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span></eq>. Let's consider 2 limiting cases:</p>
<ol>
<li><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>≪</mo><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">A \ll \tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9592900000000001em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">≪</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq>. In this case all the previous conclusions hold: the number of under-replicated chunks does not depend on the number of nodes, so it does not grow as the cluster grows.</li>
<li><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>≫</mo><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">A \gg \tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9592900000000001em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">≫</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq>. 
In this case, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>U</mi><mrow><mi>s</mi><mi>u</mi><mi>m</mi></mrow></msub><mo>≃</mo><mi>A</mi><msub><mi>τ</mi><mi>s</mi></msub><mi mathvariant="normal">/</mi><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">U_{sum} \simeq A \tau_s / \tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1.036108em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">s</span><span class="mord mathit mtight">u</span><span class="mord mathit mtight">m</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">≃</span><span class="mord mathit">A</span><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathrm">/</span><span class="mord"><span class="mord mathit" 
style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and grows linearly with the number of nodes.</li>
</ol>
<p>To determine which case applies, let's estimate <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">\tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9201900000000001em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq>. 
<eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><annotation encoding="application/x-tex">\tau_s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the typical cumulative time to detect an under-replicated chunk and schedule its replication. A crude, back-of-the-envelope evaluation gives a value of about 100 s. Thus:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mover accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover><mo>=</mo><mfrac><mrow><mn>1</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>5</mn></msup></mrow><mrow><mn>1</mn><mn>0</mn><mn>0</mn></mrow></mfrac><mo>=</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
\tilde{A} = \frac{1 \times 10^5}{100} = 1000 \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.938554em;"></span><span class="strut bottom" style="height:3.3771080000000007em;vertical-align:-1.4385540000000003em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.447446em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle cramped"><span class="mord mathit">A</span></span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span><span class="mbin">×</span><span class="mord mathrm">1</span><span class="mord"><span class="mord mathrm">0</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord mathrm mtight">5</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mrel">=</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span><span style="top:1.0785540000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Thus, expanding the cluster beyond this number of nodes increases the likelihood of chunk loss under these circumstances.</p>
<p>What can be done to improve the situation? It would seem possible to improve the asymptotic behavior by shifting the boundary <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">\tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9201900000000001em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq> by increasing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">\tau_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" 
style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, but this will only increase the value of <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>U</mi><mrow><mi>s</mi><mi>u</mi><mi>m</mi></mrow></msub></mrow><annotation encoding="application/x-tex">U_{sum}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">s</span><span class="mord mathit mtight">u</span><span class="mord mathit mtight">m</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> without any real improvement. 
The most appropriate way to do this is to decrease <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><annotation encoding="application/x-tex">\tau_s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, which is the time to make a decision to replicate a chunk because <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> depends on the characteristics of the hardware, which software tools cannot influence.</p>
<h2>Discussion of the Limit Cases</h2>
<p>The proposed model actually splits clusters into two camps.</p>
<p>The first camp consists of relatively small clusters with fewer than 1000 nodes. In this case, the probability of an under-replicated chunk is described by a simple formula:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi><mo>=</mo><mfrac><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
U = \\frac{\\tau\_r}{\\tau\_f} \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8898339999999998em;"></span><span class="strut bottom" style="height:3.279668em;vertical-align:-1.389834em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.7822739999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:1.029834em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>To improve the situation, two approaches can be applied:</p>
<ol>
<li>Improve the hardware, thereby increasing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>.</li>
<li>Speedup replication by reducing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">\tau_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>.</li>
</ol>
<p>These methods are generally clear enough.</p>
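<p>As a sanity check of the small-cluster regime, the formula above is trivial to evaluate. Below is a minimal C++ sketch; the function name and the numeric values of τ<sub>r</sub> (chunk replication time) and τ<sub>f</sub> (mean time between node failures) are purely illustrative assumptions, not measurements from a real cluster:</p>

```cpp
#include <cassert>

// Small-cluster regime (fewer than ~1000 nodes): the probability of
// an under-replicated chunk is simply the ratio of the chunk
// replication time tau_r to the mean time between node failures tau_f.
constexpr double under_replicated_small(double tau_r, double tau_f) {
    return tau_r / tau_f;
}
```

<p>With, say, τ<sub>r</sub> = 100 s and τ<sub>f</sub> = 10<sup>5</sup> s we get U = 10<sup>-3</sup>, and halving τ<sub>r</sub> halves U, which is exactly why both listed approaches work in this regime.</p>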
<p>The second camp holds large and extra-large clusters with more than 1000 nodes. Here, the dependency is defined as follows:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>U</mi><mo>=</mo><mi>A</mi><mfrac><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
U = A \\frac{\\tau\_s}{\\tau\_f} \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.8898339999999998em;"></span><span class="strut bottom" style="height:3.279668em;vertical-align:-1.389834em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.7822739999999997em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mrel">=</span><span class="mord mathit">A</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:1.029834em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>That is, the probability is proportional to the number of nodes, which means that further growth of the cluster increases the likelihood of under-replicated chunks appearing. However, the negative effects can be significantly reduced by the following approaches:</p>
<ol>
<li>Continue to increase <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>.</li>
<li>Improve the detection of under-replicated chunks and subsequent replication scheduling, thereby reducing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><annotation encoding="application/x-tex">\tau_s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>.</li>
</ol>
<p>The second approach is less obvious. It seems that there is no significant difference between 20 seconds and 100 seconds for the value <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>s</mi></msub></mrow><annotation encoding="application/x-tex">\tau_s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. However, this value significantly influences the probability of under-replicated chunks. 
It is also not obvious why the dependency on <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">\tau_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is missing, i.e. why the speed of replication itself does not play a role. Within this model it is understandable: as the number of nodes grows, that speed only increases, so chunk replication becomes dominated by the constant overhead of detecting an under-replicated chunk and scheduling its replication.</p>
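<p>The large-cluster regime is just as easy to play with numerically. Here is a C++ sketch in the same spirit; again, the function name and all constants (A, τ<sub>s</sub>, τ<sub>f</sub>) are illustrative assumptions:</p>

```cpp
#include <cassert>

// Large-cluster regime (more than ~1000 nodes): the probability of an
// under-replicated chunk grows linearly with the number of available
// nodes A. tau_s is the constant time to detect an under-replicated
// chunk and schedule its replication; tau_f is the mean time between
// node failures. Note that tau_r does not appear at all.
constexpr double under_replicated_large(double available_nodes,
                                        double tau_s, double tau_f) {
    return available_nodes * tau_s / tau_f;
}
```

<p>For A = 2000 nodes and τ<sub>f</sub> = 10<sup>5</sup> s, cutting τ<sub>s</sub> from 100 s to 20 s reduces U fivefold, which is why the seemingly minor difference between those two values matters so much.</p>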
<p>It is worth considering <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> in detail. 
In addition to its direct contribution to the chunk lifecycle, an increase of <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> has a positive effect on the number of available nodes, because:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi><mo>=</mo><mi>N</mi><mo>−</mo><mi>F</mi><mo>≃</mo><mi>N</mi><mo fence="false">(</mo><mn>1</mn><mo>−</mo><mfrac><mrow><msub><mi>τ</mi><mi>a</mi></msub></mrow><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow></mfrac><mo fence="false">)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
A = N - F \\simeq N \\bigg( 1 - \\frac{\\tau\_a}{\\tau\_f} \\bigg) \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.0610539999999995em;"></span><span class="strut bottom" style="height:3.6221079999999994em;vertical-align:-1.561054em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.6110539999999995em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="mbin">−</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mrel">≃</span><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size3">(</span></span><span class="mord mathrm">1</span><span class="mbin">−</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size3">)</span></span></span></span><span style="top:1.2010539999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>That is, it increases the number of available nodes. So, an improvement of <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>f</mi></msub></mrow><annotation encoding="application/x-tex">\tau_f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.716668em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> directly affects the availability of the cluster resources, speeding up the computation, while increasing the reliability of the data storage. On the other hand, the improvement in hardware quality directly affects the cost of ownership of the cluster. The model provides a quantitative measure of the economic feasibility of this type of solution.</p>
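<p>The availability estimate can be sketched the same way; every number below is an assumption chosen only for illustration:</p>

```cpp
#include <cassert>

// Approximate count of available nodes, following A = N (1 - tau_a / tau_f):
// N is the total cluster size, tau_a the mean time a failed node stays
// unavailable, and tau_f the mean time between node failures.
constexpr double available_nodes(double n, double tau_a, double tau_f) {
    return n * (1.0 - tau_a / tau_f);
}
```

<p>For a 1000-node cluster with τ<sub>a</sub> = 10<sup>3</sup> s of downtime per failure and τ<sub>f</sub> = 10<sup>5</sup> s between failures, about 990 nodes are available on average, and the same formula quantifies how much capacity better hardware buys back.</p>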
<h2>Comparison of Approaches</h2>
<p>I would like to compare the two approaches. The following graphs illustrate the difference eloquently.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfhk3lyq7Zuy3HDbWxA2X17Be1WDZHIWZX8ytVFE1NcnInw7ib771NX_tGT3xKebvPAMEvAJj1W_kzBdfoBrayGrcrcEG8ISqyN3sA6ynbxRr1GmhyEYdPP4iosNGk6kRZbT16PHC-WBY/s640/dataloss.png" alt="Data loss"></p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9PLFO9rO1RHmAjPqR3T_M4mkHnDmLURHK88mWxRqcTh3iZZc23COhLrLetD51zbiK6xVaYatkOz_eDJZvFYmuHF0cSFMGoDDCo3NcK0GD3fQgDrIuhwg6Nx2a48nFYDTBQA3TSOg7yXQ/" alt="Kinetics"></p>
<p>The first graph shows only a linear dependency and cannot answer the question: "What do you need to do to improve the situation?" The second picture describes a richer model that immediately suggests what to do and how to improve the behavior of the replication process. Moreover, it provides a recipe for quickly estimating, literally in one's head, the effects of certain architectural decisions. In other words, the predictive power of the developed model is at a qualitatively different level.</p>
<h2>Chunk Loss</h2>
<p>Now let's obtain the typical time to chunk loss. To do this, we write out the kinetics of the processes that produce such chunks, assuming a replication factor of 3:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><mi>U</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>1</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>F</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>a</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>2</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>U</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><msup><mi>U</mi><mo>∗</mo></msup><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>s</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>3</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msup><mi>U</mi><mo>∗</mo></msup><mo>+</mo><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>H</mi><mo>+</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>r</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>4</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>U</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><msub><mi>U</mi><mn>2</mn></msub><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>5</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msup><mi>U</mi><mo>∗</mo></msup></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><msub><mi>U</mi><mn>2</mn></msub><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi 
mathvariant="normal">.</mi><mn>6</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>U</mi><mn>2</mn></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>s</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>7</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup><mo>+</mo><mi>A</mi></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>U</mi><mo>+</mo><mi>A</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>r</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>8</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>U</mi><mn>2</mn></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><mi>L</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>9</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup></mrow></mtd><mtd><mrow><mrow></mrow><mo>→</mo><mi>F</mi><mo>+</mo><mi>L</mi><mo separator="true">,</mo><mtext> </mtext><msub><mi>k</mi><mi>f</mi></msub></mrow></mtd><mtd><mrow><mo>(</mo><mn>4</mn><mi mathvariant="normal">.</mi><mn>1</mn><mn>0</mn><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
A & \rightarrow F + U,~k_f &(4.1) \\
F & \rightarrow A,~k_a &(4.2) \\
U & \rightarrow U^*,~k_s &(4.3) \\
U^* + A & \rightarrow H + A,~k_r &(4.4) \\
U & \rightarrow F + U_2,~k_f &(4.5) \\
U^* & \rightarrow F + U_2,~k_f &(4.6) \\
U_2 & \rightarrow U_2^*,~k_s &(4.7) \\
U_2^* + A & \rightarrow U + A,~k_r &(4.8) \\
U_2 & \rightarrow F + L,~k_f &(4.9) \\
U_2^* & \rightarrow F + L,~k_f &(4.10) \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:6.849999999999999em;"></span><span class="strut bottom" style="height:13.2em;vertical-align:-6.35em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-6.009999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit">A</span></span></span><span style="top:-4.809999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">F</span></span></span><span style="top:-3.609999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:-2.409999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord mathit">A</span></span></span><span style="top:-1.209999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span><span style="top:-0.009999999999997733em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.1900000000000024em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:2.390000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span 
style="top:0.247em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord mathit">A</span></span></span><span style="top:3.590000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:4.79em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle 
cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:5.989999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-6.009999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span 
style="top:-4.809999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">a</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-3.609999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-2.409999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mbin">+</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-1.209999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.009999999999997733em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" 
style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:1.1900000000000024em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">s</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:2.390000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="mbin">+</span><span class="mord mathit">A</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:3.590000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord mathit">L</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:4.79em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">→</span><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="mbin">+</span><span class="mord mathit">L</span><span class="mpunct">,</span><span class="mord"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="arraycolsep" style="width:2em;"></span><span class="col-align-r"><span class="vlist"><span style="top:-6.009999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">1</span><span 
class="mclose">)</span></span></span><span style="top:-4.809999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">2</span><span class="mclose">)</span></span></span><span style="top:-3.609999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">3</span><span class="mclose">)</span></span></span><span style="top:-2.409999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">4</span><span class="mclose">)</span></span></span><span style="top:-1.209999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">5</span><span class="mclose">)</span></span></span><span style="top:-0.009999999999997733em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">6</span><span class="mclose">)</span></span></span><span style="top:1.1900000000000024em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle 
uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">7</span><span class="mclose">)</span></span></span><span style="top:2.390000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">8</span><span class="mclose">)</span></span></span><span style="top:3.590000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">9</span><span class="mclose">)</span></span></span><span style="top:4.79em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">4</span><span class="mord mathrm">.</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Here <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>U</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">U_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> indicates the number of under-replicated chunks that lost two copies, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup></mrow><annotation encoding="application/x-tex">U_2^*</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.688696em;"></span><span class="strut bottom" style="height:0.935696em;vertical-align:-0.247em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is an intermediate state, similar to <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>U</mi><mo>∗</mo></msup></mrow><annotation encoding="application/x-tex">U^*</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.688696em;"></span><span class="strut bottom" style="height:0.688696em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> corresponding to substance <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>U</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">U_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>L</mi></mrow><annotation encoding="application/x-tex">L</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">L</span></span></span></span></eq> is the lost chunk. Then:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><mfrac><mrow><mi>d</mi><mi>L</mi></mrow><mrow><mi>d</mi><mi>t</mi></mrow></mfrac></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><msub><mi>k</mi><mi>f</mi></msub><mo fence="false">(</mo><msub><mi>U</mi><mn>2</mn></msub><mo>+</mo><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup><mo fence="false">)</mo></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>τ</mi><mi>l</mi></msub></mrow></mtd><mtd><mrow><mrow></mrow><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><msub><mi>k</mi><mi>f</mi></msub><mo fence="false">(</mo><msub><mi>U</mi><mn>2</mn></msub><mo>+</mo><msubsup><mi>U</mi><mn>2</mn><mo>∗</mo></msubsup><mo fence="false">)</mo></mrow></mfrac></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\begin{aligned}
\frac{dL}{dt} &= k_f \big( U_2 + U_2^* \big) \\
\tau_l &= \frac{1}{k_f \big( U_2 + U_2^* \big) } \\
\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:3.0844449999999997em;"></span><span class="strut bottom" style="height:5.668889999999999em;vertical-align:-2.5844449999999997em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-1.7130049999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">d</span><span class="mord mathit">t</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathit">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span style="top:0.294435em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord 
displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:2.224445em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="col-align-l"><span class="vlist"><span style="top:-1.7130049999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size1">(</span></span><span class="mord"><span class="mord 
mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.4129999999999999em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size1">)</span></span></span></span><span style="top:0.294435em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord displaystyle textstyle uncramped"></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.74em;"><span class="fontsize-ensurer 
reset-size5 size5"><span style="font-size:1em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03148em;">k</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03148em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size1">(</span></span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">U</span><span class="msupsub"><span class="vlist"><span style="top:0.26630799999999993em;margin-left:-0.10903em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span style="top:-0.32049599999999995em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-textstyle scriptstyle cramped mtight"><span class="mbin mtight">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord style-wrap reset-textstyle textstyle uncramped"><span class="delimsizing size1">)</span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:1em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>Where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>l</mi></msub></mrow><annotation encoding="application/x-tex">\tau_l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the typical time for a lost chunk to form. We'll solve our system in the two limiting cases, taking <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>=</mo><mn>1</mn><mn>0</mn><mn>0</mn><mn>0</mn></mrow><annotation encoding="application/x-tex">A = 1000</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">=</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span></span></span></span></eq>.</p>
<p><eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>≪</mo><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">A \ll \tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9592900000000001em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">≪</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq>, then</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msub><mi>τ</mi><mi>l</mi></msub><mo>=</mo><mi>A</mi><mfrac><mrow><msubsup><mi>τ</mi><mi>f</mi><mn>3</mn></msubsup></mrow><mrow><msubsup><mi>τ</mi><mi>r</mi><mn>2</mn></msubsup></mrow></mfrac><mo>≈</mo><mn>1</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mn>0</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mn>0</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mi>y</mi><mi>e</mi><mi>a</mi><mi>r</mi><mi>s</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\tau\_l = A \\frac{\\tau\_f^3}{\\tau\_r^2} \\approx 100\\ 000\\ 000\\ years \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.1281619999999997em;"></span><span class="strut bottom" style="height:3.756324em;vertical-align:-1.6281620000000006em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5048379999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mord mathit">A</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.1132em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" 
style="margin-right:0.02778em;">r</span></span></span><span style="top:-0.28900000000000003em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.8092159999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.2831079999999999em;margin-left:-0.1132em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord mathrm mtight">3</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle 
uncramped nulldelimiter"></span></span><span class="mrel">≈</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm"><span class="mspace"> </span><span class="mord mathrm">0</span></span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm"><span class="mspace"> </span><span class="mord mathrm">0</span></span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathit"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03588em;">y</span></span><span class="mord mathit">e</span><span class="mord mathit">a</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit">s</span></span></span><span style="top:1.2681620000000005em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>For the case <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>≫</mo><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">A \gg \tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9592900000000001em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">≫</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq> we obtain:</p>
<section><eqn><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mtable><mtr><mtd><mrow><msub><mi>τ</mi><mi>l</mi></msub><mo>=</mo><mfrac><mrow><msubsup><mi>τ</mi><mi>f</mi><mn>3</mn></msubsup></mrow><mrow><mi>A</mi><msubsup><mi>τ</mi><mi>s</mi><mn>2</mn></msubsup></mrow></mfrac><mo>≈</mo><mn>1</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mn>0</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mn>0</mn><mn>0</mn><mn>0</mn><mtext> </mtext><mi>y</mi><mi>e</mi><mi>a</mi><mi>r</mi><mi>s</mi></mrow></mtd></mtr><mtr><mtd><mrow></mrow></mtd></mtr></mtable></mrow><annotation encoding="application/x-tex">
\\begin{aligned}
\\tau\_l = \\frac{\\tau\_f^3}{A \\tau\_s^2} \\approx 100\\ 000\\ 000\\ years \\\\
\\end{aligned}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:2.1281619999999997em;"></span><span class="strut bottom" style="height:3.756324em;vertical-align:-1.6281620000000006em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist"><span style="top:-0.5048379999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="mopen sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">A</span><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.247em;margin-left:-0.1132em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit 
mtight">s</span></span></span><span style="top:-0.28900000000000003em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.8092159999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.2831079999999999em;margin-left:-0.1132em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.10764em;">f</span></span></span><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord mathrm mtight">3</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose sizing reset-size5 size5 reset-textstyle textstyle uncramped 
nulldelimiter"></span></span><span class="mrel">≈</span><span class="mord mathrm">1</span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm"><span class="mspace"> </span><span class="mord mathrm">0</span></span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathrm"><span class="mspace"> </span><span class="mord mathrm">0</span></span><span class="mord mathrm">0</span><span class="mord mathrm">0</span><span class="mord mathit"><span class="mspace"> </span><span class="mord mathit" style="margin-right:0.03588em;">y</span></span><span class="mord mathit">e</span><span class="mord mathit">a</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit">s</span></span></span><span style="top:1.2681620000000005em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord displaystyle textstyle uncramped"></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></span></span></eqn></section>
<p>That is, the typical time for a lost chunk to form is 100 million years! The two cases give roughly similar values because we are in the transition zone. The typical time value <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>l</mi></msub></mrow><annotation encoding="application/x-tex">\tau_l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> speaks for itself, and everyone can draw their own conclusions.</p>
<p>It is worth mentioning one thing, however. In the case <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi><mo>≪</mo><mover accent="true"><mi>A</mi><mo>~</mo></mover></mrow><annotation encoding="application/x-tex">A \ll \tilde A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.9201900000000001em;"></span><span class="strut bottom" style="height:0.9592900000000001em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span><span class="mrel">≪</span><span class="mord accent"><span class="vlist"><span style="top:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="mord mathit">A</span></span><span style="top:-0.60233em;margin-left:0.27778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="accent-body"><span>~</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></eq> the value <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>U</mi></mrow><annotation encoding="application/x-tex">U</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10903em;">U</span></span></span></span></eq> is a constant and does not depend on <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" 
style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq>. But in the expression for <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>τ</mi><mi>l</mi></msub></mrow><annotation encoding="application/x-tex">\tau_l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.1132em;">τ</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.1132em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> we obtain a counterintuitive dependence: as the cluster grows, triple replication at first even improves data safety! Then, as the cluster grows further, the situation reverses completely.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcOSHTw3Pu5uCH4oPxAXhMYl4Co9XFgTHFFUdCGreSd6NXHfFmY3OyppoUZqdtDGN863ePGoLE8yalozx9kFcFiQImpoR3ARzalXl0Iev2tjhZGbgZm-Q7HKl3yE5lbXFrmQFfcASCKFo/" alt="Kinetics"></p>
<h2>Conclusion</h2>
<p>This article has introduced a novel, kinetics-based way of simulating large-cluster processes. The approximate model of cluster dynamics yields probabilistic characteristics that describe data loss.</p>
<p>Of course, this model is only a first approximation of what actually happens on the cluster. We have taken into account only the most important processes in order to produce a qualitative result. But even such a model lets you judge what is happening inside the cluster and suggests ways to improve the situation.</p>
<p>At the same time, the suggested approach allows for more accurate and reliable results through a finer consideration of various factors and an analysis of the cluster's actual performance. Below is a far-from-exhaustive list of ways to improve the model:</p>
<ol>
<li>Cluster nodes can fail due to various hardware faults, and each kind of failure has its own probability. Moreover, some failures, for example a processor failure, do not lose data but only make the node temporarily unavailable. This is easy to take into account in the model by introducing different states <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>F</mi><mrow><mi>p</mi><mi>r</mi><mi>o</mi><mi>c</mi></mrow></msub></mrow><annotation encoding="application/x-tex">F_{proc}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.969438em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.13889em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">p</span><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span><span class="mord mathit mtight">o</span><span class="mord mathit mtight">c</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>F</mi><mrow><mi>d</mi><mi>i</mi><mi>s</mi><mi>k</mi></mrow></msub></mrow><annotation encoding="application/x-tex">F_{disk}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" 
style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.13889em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">d</span><span class="mord mathit mtight">i</span><span class="mord mathit mtight">s</span><span class="mord mathit mtight" style="margin-right:0.03148em;">k</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>F</mi><mrow><mi>m</mi><mi>e</mi><mi>m</mi></mrow></msub></mrow><annotation encoding="application/x-tex">F_{mem}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.13889em;">F</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.13889em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord scriptstyle cramped mtight"><span class="mord mathit mtight">m</span><span class="mord mathit mtight">e</span><span class="mord mathit mtight">m</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> etc, with different process rates and different consequences.</li>
<li>Not all nodes are equally useful. Different batches may have different natures and frequency of failures. This can be taken into account in the model by introducing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>A</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">A_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">A</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">1</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>A</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">A_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">A</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathrm mtight">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>, and so on, at different rates of the corresponding processes.</li>
<li>Different node types can be added to the model: partially damaged disks, banned nodes, etc. For example, you can analyze the impact of shutting down a rack and determine the typical speed of the cluster's transition to a steady mode. The dynamics of chunks and nodes can then be visualized by numerically solving the differential equations.</li>
<li>Each storage disk has slightly different read/write characteristics, including latency and bandwidth. The rate constants of the processes can therefore be estimated more accurately by integrating over the corresponding disk-specific distribution functions, by analogy with rate constants in gases, which are obtained by integrating over the <a href="https://en.wikipedia.org/wiki/Maxwell%E2%80%93Boltzmann_distribution">Maxwell–Boltzmann distribution</a>.</li>
</ol>
<p>Thus, on the one hand, the kinetic approach simplifies the description and analysis; on the other hand, it has serious potential for introducing additional subtle factors and processes based on the analysis of cluster operating data, adding specific details on demand. You can evaluate the contribution of each factor to the resulting equations, which lets you simulate improvements and judge whether they are useful. In the simplest case, the model quickly yields analytical dependencies and thereby recipes for improving the situation. The simulation can be bidirectional: you can iteratively improve the model by adding processes to the system of kinetic equations, and you can analyze potential system improvements by introducing new processes into the model. In other words, you can simulate improvements before implementing expensive changes in your code and hardware.</p>
<p>In addition, it is always possible to move to the numerical integration of the stiff nonlinear differential equations, obtaining the dynamics of the system and its response to specific effects or small perturbations.</p>
<p>Thus, a synergy of seemingly unrelated fields of knowledge can produce astonishing results with indisputable predictive power.</p>
<h2>References</h2>
<p>[1] <a href="http://gridem.blogspot.com/2017/03/cap-theorem-myths.html">Cap Theorem Myths</a>.<br>
[2] <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">Spanner, TrueTime and the CAP Theorem</a>.<br>
[3] <a href="https://martin.kleppmann.com/2017/01/26/data-loss-in-large-clusters.html">The probability of data loss in large clusters</a>.<br>
[4] <a href="https://en.wikipedia.org/wiki/Chemical_kinetics">Chemical kinetics</a>.<br>
[5] <a href="https://en.wikipedia.org/wiki/Activated_complex">Activated complex</a>.<br>
[6] <a href="https://en.wikipedia.org/wiki/Steady_state_(chemistry)#Steady_state_approximation_in_chemical_kinetics">Steady state approximation</a>.<br>
[7] <a href="https://en.wikipedia.org/wiki/Maxwell%E2%80%93Boltzmann_distribution">Maxwell–Boltzmann distribution</a>.<br></p>
<h1>Latency of Geo-Distributed Databases</h1>
<p>Grigory Demchenko, 2017-08-13</p>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.css">
<p><strong>Theorem 0</strong>. The minimum guaranteed latency for a globally highly available, strongly consistent database is 133 ms.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaW4jspsXT3FnQxD-1a1412LZMA2liWqbP8Zabc7V5aE2aAKttwhffXetQF0-9yb2xQYYskPeT6D2HRSSLZTQPr1Z99WVVd6Y5hgUHCSuGILHp8S5bQ0z_JeKXeftKGSLtJ-2-keF2Qls/s480/earth.png" alt="Earth"></p>
<h2>1 Abstract</h2>
<p>The article introduces, step by step, the formal vocabulary and the auxiliary lemmas and theorems needed to prove the main Theorem 0. Finally, as a consequence, the CAL theorem is formulated.</p>
<h2>2 Introduction</h2>
<p>Modern applications require intensive work with huge amounts of data, including both massive transactional processing and analytical research. In answer to this demand, a new generation of databases has appeared: NewSQL databases. These databases provide the following important characteristics: horizontal scalability, geo-availability, and strong consistency.</p>
<p>The NewSQL era opens new possibilities for storing and processing so-called Big Data. At the same time, an important question arises: how fast can such databases be? Improving performance and latency is a very challenging task because it involves almost every layer of database construction: from hardware questions about data center connectivity and availability to sophisticated software algorithms and architectural design.</p>
<p>Thus, we need to understand how far latency can be optimized and which limitations we have to deal with. This article tries to answer that question.</p>
<a name='more'></a>
<h2>3 Model</h2>
<p>Globally distributed databases have different characteristics and assumptions depending on the features and scalability properties they provide. I make the following assumptions to specify the model to be used.</p>
<p>First of all, I consider that all participants lie on a sphere of some radius <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>R</mi></mrow><annotation encoding="application/x-tex">R</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.00773em;">R</span></span></span></span></eq>. Secondly, I do not need to restrict the failure mode because the final result does not depend on it. Thus, without loss of generality, I consider the non-failure mode and defer some remarks about the behavior of real systems under different failure conditions until later.</p>
<p>The participants involved in the process are cooperated through the message exchanging. Each participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> may send the information <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>I</mi></mrow><annotation encoding="application/x-tex">I</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.07847em;">I</span></span></span></span></eq> to another participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq> by sending the message with needed information <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>I</mi></mrow><annotation encoding="application/x-tex">I</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle 
uncramped"><span class="mord mathit" style="margin-right:0.07847em;">I</span></span></span></span></eq>. Any participant is represented as the point on the sphere and connected to another participant by some curve on the sphere.</p>
<h3>3.1 Speed</h3>
<p><strong>Definition 3.1.1</strong>. The <em>distance between two participants</em> is the length of the curve between two points on the sphere.</p>
<p><strong>Definition 3.1.2</strong>. The <em>minimum distance between two participants</em> is the minimum distance across all possible curves on the sphere that may connect two participants.</p>
<p>Message exchange does not take place instantaneously; it has time characteristics.</p>
<p><strong>Definition 3.1.3</strong>. The <em>information propagation speed</em> between two participants <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq> is <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>l</mi><mi mathvariant="normal">/</mi><mi>t</mi></mrow><annotation encoding="application/x-tex">l/t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathrm">/</span><span class="mord mathit">t</span></span></span></span></eq> where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>l</mi></mrow><annotation encoding="application/x-tex">l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base 
textstyle uncramped"><span class="mord mathit" style="margin-right:0.01968em;">l</span></span></span></span></eq> is the distance between participants and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>t</mi></mrow><annotation encoding="application/x-tex">t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.61508em;"></span><span class="strut bottom" style="height:0.61508em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">t</span></span></span></span></eq> is the time required to send the information by <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> and receive the information by <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq>.</p>
<p>Information has a limited propagation speed.</p>
<p><strong>Definition 3.1.4</strong>. The <em>speed of information propagation</em> has the limit <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq>.</p>
<p>Thus the minimum time required to propagate the information from participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> to participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq> is <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>l</mi><mi>m</mi></msub><mi mathvariant="normal">/</mi><mi>c</mi></mrow><annotation encoding="application/x-tex">l_m/c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.01968em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">m</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathrm">/</span><span class="mord mathit">c</span></span></span></span></eq> where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>l</mi><mi>m</mi></msub></mrow><annotation encoding="application/x-tex">l_m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.01968em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">m</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the minimum distance between <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span 
class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq>.</p>
<p><strong>Definition 3.1.5</strong>. The <em>wave propagation</em> between participants <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">A</span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span></eq> is the information propagation with minimum distance and maximum speed <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq>.</p>
<p>Thus a wave propagates over the sphere with speed <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq>.</p>
<p><strong>Lemma 3.1.1</strong>. Information cannot propagate faster than wave propagation.</p>
<p><em>Proof</em>. By definition of the wave propagation: it uses the largest speed <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq> and smallest distance <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span></span></span></span></eq>; thus the ratio <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi><mi mathvariant="normal">/</mi><mi>c</mi></mrow><annotation encoding="application/x-tex">d/c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span><span class="mord mathrm">/</span><span class="mord mathit">c</span></span></span></span></eq> is minimum across all possibilities. 
<eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<h3>3.2 Consistency</h3>
<p>Strong consistency means different things in different contexts. Here I take the linearizability property as the definition of strong consistency.</p>
<p><strong>Definition 3.2.1</strong>. An <em>operation</em> is a message issued by a client to be committed.</p>
<p>The proof is based on the following linearizability characteristics:</p>
<ol>
<li>There is a global sequence of operations. Each operation is committed using some incremental index <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.65952em;"></span><span class="strut bottom" style="height:0.65952em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">i</span></span></span></span></eq>.</li>
<li>To commit the operation on the <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.65952em;"></span><span class="strut bottom" style="height:0.65952em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">i</span></span></span></span></eq> index the system must have the information about the whole previous history, e.g. must know the information about all previous commits with indexes <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>j</mi><mo><</mo><mi>i</mi></mrow><annotation encoding="application/x-tex">j < i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.65952em;"></span><span class="strut bottom" style="height:0.85396em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span><span class="mrel"><</span><span class="mord mathit">i</span></span></span></span></eq>.</li>
<li>The system may respond to the client with a successful commit only after it has committed the operation into the global sequence of operations.</li>
<li>If the system commits the operation using index <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.65952em;"></span><span class="strut bottom" style="height:0.65952em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">i</span></span></span></span></eq> thus any participant that knows about committed operation for index <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.65952em;"></span><span class="strut bottom" style="height:0.65952em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">i</span></span></span></span></eq> has the same operation for that index. Thus each participant has the same information about prefix of the global sequence of operations.</li>
</ol>
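<p>The four characteristics above can be sketched as a toy data structure (purely illustrative; no real database implements commits this literally): an operation commits at index <em>i</em> only when the whole prefix <em>j &lt; i</em> is already known, and any two participants agree on their common prefix.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the linearizability characteristics listed above: a global
// sequence of operations where committing index i requires knowledge of
// the entire prefix j < i.
class GlobalSequence {
public:
    // Commit succeeds only at the next index, i.e. when the whole
    // previous history is already known (characteristic 2).
    bool Commit(std::size_t index, const std::string& op) {
        if (index != log_.size()) return false;  // missing prefix
        log_.push_back(op);
        return true;  // only now may we acknowledge the client (char. 3)
    }

    // Characteristic 4: any two participants agree on the common prefix.
    bool PrefixMatches(const GlobalSequence& other) const {
        std::size_t n = std::min(log_.size(), other.log_.size());
        for (std::size_t i = 0; i < n; ++i)
            if (log_[i] != other.log_[i]) return false;
        return true;
    }

private:
    std::vector<std::string> log_;
};
```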
<p><strong>Theorem 3.2.1</strong>. If two committed linearizable operations are issued by clients, then there is a participant that knew the information related to both operations.</p>
<p><em>Proof</em>. To commit a linearizable operation, a participant must know the history of all previous operations. Hence the participant that commits the later operation knows both the current operation and the previous one, i.e. it knows both. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p>The theorem can be easily generalized for any number of committed operations.</p>
<p><strong>Theorem 3.2.2</strong>. If several committed linearizable operations are issued by clients, then there is a participant that knew the information related to all of the operations.</p>
<p><em>Proof</em>. The same as theorem 3.2.1. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p><strong>Theorem 3.2.3</strong>. If several committed linearizable operations are issued by clients, then there is a point on the sphere reached by the waves propagating from all of the clients.</p>
<p><em>Proof</em> by contradiction. If there were no such point, then by lemma 3.1.1 no participant could have all the information related to the operations. This contradicts theorem 3.2.2. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p><strong>Definition 3.2.2</strong>. The <em>client commits</em> the operation when it receives the message from a participant that has committed the operation.</p>
<p><strong>Definition 3.2.3</strong>. The <em>client commit latency</em> is the time between issuing the operation and committing it by the client.</p>
<p><strong>Definition 3.2.4</strong>. The <em>guaranteed commit latency</em> is <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>t</mi><mi>g</mi></msub></mrow><annotation encoding="application/x-tex">t_g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.61508em;"></span><span class="strut bottom" style="height:0.9011879999999999em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.03588em;">g</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> if <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">∀</mi><mi>c</mi><mi>l</mi><mi>i</mi><mi>e</mi><mi>n</mi><mi>t</mi><mo>⇒</mo><msub><mi>t</mi><mi>c</mi></msub><mo>≤</mo><msub><mi>t</mi><mi>g</mi></msub></mrow><annotation encoding="application/x-tex">\forall client \Rightarrow t_c \leq t_g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.980548em;vertical-align:-0.286108em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">∀</span><span class="mord mathit">c</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit">i</span><span class="mord mathit">e</span><span class="mord mathit">n</span><span class="mord mathit">t</span><span class="mrel">⇒</span><span 
class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">≤</span><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.03588em;">g</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>t</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">t_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.61508em;"></span><span class="strut bottom" style="height:0.76508em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the client commit latency.</p>
<p><strong>Definition 3.2.5</strong>. The <em>wave half-time</em> is the time needed for a wave to propagate from a participant to the diametrically opposite participant.</p>
<p><strong>Definition 3.2.6</strong>. The <em>wave full-time</em> is twice the wave half-time.</p>
<p><strong>Theorem 3.2.4</strong>. The minimum guaranteed commit latency is the wave half-time.</p>
<p><em>Proof</em>. Let all clients be spread across the sphere, and let all of them issue the operation to be committed at the same time, i.e. concurrently. To commit the sequence of operations, by theorem 3.2.3 there must be at least one point <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq> on the sphere to which the waves from all clients propagate. Let the opposite point be <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>o</mi></msub></mrow><annotation encoding="application/x-tex">p_o</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">o</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. At that point, by our assumptions, there is a client that issues an operation. 
By lemma 3.1.1 the commit latency is at least the wave half-time, because that is the time needed to propagate from <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>o</mi></msub></mrow><annotation encoding="application/x-tex">p_o</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">o</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> to <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq>. 
<eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
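For intuition, here is a back-of-the-envelope sketch of this bound. It is my own illustration, not part of the proof: I assume an Earth-sized sphere and signals propagating at the vacuum speed of light, which puts the minimum guaranteed commit latency of theorem 3.2.4 at roughly 67 ms.

```python
import math

# All numbers here are my own illustration: an Earth-sized sphere and
# signals that propagate at the vacuum speed of light.
RADIUS_KM = 6371.0         # sphere radius (Earth)
SPEED_KM_S = 299_792.458   # propagation speed (speed of light)

def wave_half_time(radius_km: float, speed_km_s: float) -> float:
    """Time to propagate to the diametrically opposite point:
    half of the great circle divided by the propagation speed."""
    return math.pi * radius_km / speed_km_s

def wave_full_time(radius_km: float, speed_km_s: float) -> float:
    """Doubled wave half-time (definition 3.2.6)."""
    return 2.0 * wave_half_time(radius_km, speed_km_s)

print(f"wave half-time: {wave_half_time(RADIUS_KM, SPEED_KM_S) * 1000:.1f} ms")  # 66.8 ms
print(f"wave full-time: {wave_full_time(RADIUS_KM, SPEED_KM_S) * 1000:.1f} ms")  # 133.5 ms
```

Real links are slower than light in vacuum and rarely follow great circles, so practical geo-distributed commit latencies only sit above these figures.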
<p><strong>Theorem 3.2.5</strong>. A guaranteed commit latency equal to the wave half-time is theoretically achievable.</p>
<p><em>Proof</em> by example. Let all participants be spread across the sphere. In that case, any client issues the operation to the closest participant. Each participant broadcasts the operation to the others and waits until it has received the operations from all participants, then commits the sequence of received operations. The commit latency is the largest wave propagation time between participants, which equals the wave half-time. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
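The scheme in this proof is easy to check numerically. The sketch below is my own illustration (unit sphere, unit propagation speed, uniformly random participant positions): the commit latency of "broadcast and wait to hear from everyone" is the largest pairwise great-circle propagation time, which approaches the wave half-time π as the participants cover the sphere densely.

```python
import math
import random

def great_circle_time(a, b, radius=1.0, speed=1.0):
    """Propagation time between two points given as (lat, lon) in radians,
    via the central-angle (great-circle) formula."""
    (lat_a, lon_a), (lat_b, lon_b) = a, b
    cos_angle = (math.sin(lat_a) * math.sin(lat_b)
                 + math.cos(lat_a) * math.cos(lat_b) * math.cos(lon_a - lon_b))
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp for fp safety
    return radius * angle / speed

def commit_latency(participants):
    """Latency of the scheme from the proof: every participant broadcasts its
    operations and commits after hearing from all the others, so the latency
    is the largest propagation time between any two participants."""
    return max(great_circle_time(p, q) for p in participants for q in participants)

# Sample participants uniformly on the unit sphere (asin of a uniform value
# gives the correct latitude distribution for uniform sphere coverage).
random.seed(0)
participants = [(math.asin(random.uniform(-1.0, 1.0)),
                 random.uniform(-math.pi, math.pi))
                for _ in range(500)]

# Dense coverage pushes the latency toward the wave half-time pi.
print(commit_latency(participants))
```

With 500 random participants the result already lands close to π; the gap shrinks as more participants are added.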
<h3>3.3 Geo Split</h3>
<p>Links are unreliable. Thus at any time a part of the system may become unavailable to the rest of the system. This strongly affects the algorithms, which must handle such complex scenarios.</p>
<p><strong>Definition 3.3.1</strong>. A group of participants is <em>unavailable</em> when the rest of the participants cannot propagate information to that group.</p>
<p><strong>Definition 3.3.2</strong>. A <em>split</em> is a situation in which there is an unavailable group.</p>
<p><strong>Definition 3.3.3</strong>. The system is <em>highly available</em> when it can handle a split.</p>
<p><strong>Theorem 3.3.1</strong>. A guaranteed commit in a highly available system requires at least one round trip between the available participants.</p>
<p><em>Proof</em> by contradiction. Assume that a half round trip is enough to commit. Consider a participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq> that receives an operation <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>O</mi></mrow><annotation encoding="application/x-tex">O</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.02778em;">O</span></span></span></span></eq> from a client to be committed. The half round trip allows <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq> to receive the operations from the other participants, after which it must decide to commit the operations. 
Now assume that <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq> belongs to an unavailable group; then its operation may not be propagated to the other participants. We obtain a situation in which <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq> commits the operation <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>O</mi></mrow><annotation encoding="application/x-tex">O</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.02778em;">O</span></span></span></span></eq> while the others cannot commit it due to the unavailability of <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">p</span></span></span></span></eq>. Thus that commit is inconsistent. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p>The theorem provides an important clue about latency: a totally available system requires only a half round trip to commit operations, while a highly available system requires at least one full round trip.</p>
<p><strong>Definition 3.3.4</strong>. The <em>commit group</em> is the set of participants that makes the decision to commit the operations.</p>
<p><strong>Theorem 3.3.2</strong>. The minimum guaranteed commit latency for the highly available system is the wave full-time.</p>
<p><em>Proof</em>. Let all clients be spread across the sphere. Let all clients issue the operation to be committed concurrently. Let <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">G</span></span></span></span></eq> be the commit group for the client operations. Let <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub><mo>∈</mo><mi>G</mi></mrow><annotation encoding="application/x-tex">p_c \in G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">∈</span><span class="mord mathit">G</span></span></span></span></eq> be the participant from the group <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span 
class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">G</span></span></span></span></eq>. Choose the opposite point <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>o</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">o_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">o</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> on the sphere for the <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. According to our assumption there is a client <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> in point <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>o</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">o_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">o</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 
size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. Let <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>r</mi></msub><mo>∈</mo><mi>G</mi></mrow><annotation encoding="application/x-tex">p_r \in G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">∈</span><span class="mord mathit">G</span></span></span></span></eq> be the participant that sends the commit to the client <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit 
mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and from the client point of view, it receives the first commit response from the participant <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">p_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. 
There are two possibilities: either <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">p_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> are the same or not. 
If <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">p_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> are the same the proof is trivial: the latency is the 
doubled wave half-time, i.e. the wave full-time, because <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> are opposite points on the sphere. 
If <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">p_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> are different participants, then according to theorem 
3.3.1 the information about client <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> operation is propagated in the following way <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub><mo>→</mo><msub><mi>p</mi><mi>c</mi></msub><mo>→</mo><msub><mi>p</mi><mi>r</mi></msub><mo>→</mo><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c \rightarrow p_c \rightarrow p_r \rightarrow c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">→</span><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">→</span><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">→</span><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq>. 
Because <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>p</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">p_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">p</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>c</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">c_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">c</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> are opposite points on the sphere, the latency is at least the wave full-time. 
<eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p>For leader-based consensus, the theorem becomes trivial: we can place the client on the opposite side of the sphere from the leader, and all operations must be propagated through the leader.</p>
<p><strong>Theorem 3.3.3</strong>. For a highly available system, a guaranteed commit latency equal to the wave full-time is theoretically achievable.</p>
<p><em>Proof</em> by example. Place three participants at equidistant positions around any point on the sphere, with pairwise distance <em>ε</em>. As <em>ε</em> → 0, the guaranteed latency tends to the wave full-time, because from any point the latency is twice the maximum latency between two points on the sphere. □</p>
<p>Now we can prove the main theorem 0.</p>
<p><em>Proof</em> of theorem 0. The wave full-time of the Earth, according to definitions 3.2.5 and 3.2.6, is equal to 2π<em>R<sub>E</sub></em>/<em>c</em>, where <em>R<sub>E</sub></em> is the Earth radius and <em>c</em> is the speed of light. The result is 133 ms. □</p>
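<p>The 133 ms figure is easy to check numerically. A minimal sketch (the radius is the commonly quoted mean Earth radius, an assumed value rather than one taken from the definitions above):</p>
<pre><code>import math

SPEED_OF_LIGHT = 299_792_458  # m/s, exact by definition
EARTH_RADIUS = 6_371_000      # m, mean radius (assumed value)

def wave_full_time(radius_m: float) -> float:
    """Round-trip time of a wave traveling the full circumference 2*pi*R."""
    return 2 * math.pi * radius_m / SPEED_OF_LIGHT

print(f"{wave_full_time(EARTH_RADIUS) * 1000:.1f} ms")  # → 133.5 ms
</code></pre>
<p>The author's 133 ms is this value rounded down.</p>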
<h2>4 Medium and Efficiency</h2>
<p>I have introduced the term <em>theoretically achievable</em>, which means the following:</p>
<ol>
<li>Latency is determined solely by wave propagation; there is no packet loss and no delay from transmission, handling, routing, etc.</li>
<li>Links have unlimited bandwidth.</li>
<li>Each client is connected to a participant.</li>
</ol>
<p>Reality departs from each of these assumptions. In the common case, information propagates slower than the wave itself. If we define the ratio <em>n</em> = <em>c</em>/<em>v</em>, where <em>c</em> is the speed of the wave and <em>v</em> is the speed of information propagation, then <em>n</em> is the <a href="https://en.wikipedia.org/wiki/Refractive_index"><em>refractive index</em> [1]</a> of the medium (BTW, did you know that a particle's speed can exceed the speed of light in a medium? See <a href="https://en.wikipedia.org/wiki/Cherenkov_radiation">Cherenkov radiation [2]</a> for details). The reciprocal of the refractive index is the efficiency <em>η</em> of the infrastructure. It is therefore a useful measure for comparing infrastructures and a common criterion for overall efficiency.</p>
<p>The idea and intuition are very simple: the more transparent the medium, the lower the refractive index <em>n</em>, and thus the higher the efficiency <em>η</em> = 1/<em>n</em>.</p>
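<p>As a concrete illustration: the refractive index used below is a typical value for silica optical fiber (an assumed figure, not one from this post), giving an infrastructure efficiency of roughly two thirds.</p>
<pre><code>def efficiency(refractive_index: float) -> float:
    """Efficiency eta = 1/n: the fraction of the vacuum wave speed
    at which information actually propagates in the medium."""
    return 1.0 / refractive_index

# Typical silica fiber has n ~ 1.47 (assumed value), so information
# travels at roughly 68% of c.
print(f"{efficiency(1.47):.2f}")  # → 0.68
</code></pre>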
<h2>5 Optimization</h2>
<p><strong>Definition 5.1</strong>. <em>Operation commit period</em> is the time interval between the moment an operation is issued and the client commit point.</p>
<p><strong>Definition 5.2</strong>. Two or more operations are <em>not concurrent</em> if their operation commit periods do not intersect.</p>
<p><strong>Definition 5.3</strong>. <em>Nonconcurrent period</em> is the runtime situation when there are no concurrent operations.</p>
<p>It is important to emphasize that a nonconcurrent period is a runtime situation based on the clients' behavior, not a priori knowledge.</p>
<p><strong>Definition 5.4</strong>. <em>Nonconcurrent commit latency</em> is the commit latency during the nonconcurrent period.</p>
<p><strong>Theorem 5.1</strong>. The minimum guaranteed nonconcurrent commit latency is the wave half-time.</p>
<p><em>Proof</em> by contradiction. Let two diametrically opposite clients issue operations simultaneously. To detect concurrent execution, there must be a participant <em>p</em> that has information about both operations. Thus the minimum time <em>τ</em> required to detect the concurrency is the wave propagation time from a client to the participant <em>p</em>, which is at least a quarter of the wave full-time. The total commit time is twice <em>τ</em>, because we must add the response time from the participant <em>p</em>; thus it is at least the wave half-time. □</p>
<p><strong>Theorem 5.2</strong>. A guaranteed nonconcurrent commit latency arbitrarily close to (but greater than) the wave half-time is theoretically achievable.</p>
<p><em>Proof</em> by example. Let all participants be placed equidistantly on the equator. If the number of participants is <em>N</em> = 2<em>K</em> − 1, where <em>K</em> is a natural number, then detecting concurrent execution requires <em>K</em> participants to have the information about the operation. Assuming <em>n</em> = <em>η</em> = 1 as the ideal (theoretically achievable) situation, we conclude that the commit latency is equal to <em>τ</em><sub>2</sub>/(1 − 1/2<em>K</em>), where <em>τ</em><sub>2</sub> is the wave half-time. Increasing the number of participants <em>N</em> allows achieving a latency arbitrarily close to the wave half-time. □</p>
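<p>The convergence claimed in this proof can be checked numerically. A small sketch using the formula above, with the Earth radius as an assumed mean value:</p>
<pre><code>import math

SPEED_OF_LIGHT = 299_792_458  # m/s
EARTH_RADIUS = 6_371_000      # m (assumed mean radius)

# Wave half-time: round trip over half the circumference, i.e. pi*R/c.
TAU_2 = math.pi * EARTH_RADIUS / SPEED_OF_LIGHT

def nonconcurrent_latency(k: int) -> float:
    """Guaranteed nonconcurrent commit latency for N = 2K - 1
    equatorial participants, per the formula in the proof."""
    return TAU_2 / (1 - 1 / (2 * k))

# The latency approaches TAU_2 (~66.8 ms) from above as K grows.
for k in (1, 2, 10, 100):
    print(f"K={k}: {nonconcurrent_latency(k) * 1000:.1f} ms")
</code></pre>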
<h2>6 Solar System</h2>
<p>Here is a table of the minimum guaranteed latencies for bodies in the Solar System:</p>
<table>
<thead>
<tr>
<th style="text-align:right"><strong>Planet System</strong></th>
<th style="text-align:left"><strong>Latency</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right">Earth</td>
<td style="text-align:left">133 ms</td>
</tr>
<tr>
<td style="text-align:right">Moon</td>
<td style="text-align:left">36 ms</td>
</tr>
<tr>
<td style="text-align:right">Mercury</td>
<td style="text-align:left">51 ms</td>
</tr>
<tr>
<td style="text-align:right">Mars</td>
<td style="text-align:left">71 ms</td>
</tr>
<tr>
<td style="text-align:right">Venus</td>
<td style="text-align:left">126 ms</td>
</tr>
<tr>
<td style="text-align:right">Jupiter</td>
<td style="text-align:left">1.4 s</td>
</tr>
<tr>
<td style="text-align:right">Sun</td>
<td style="text-align:left">14 s</td>
</tr>
<tr>
<td style="text-align:right">Earth-Moon</td>
<td style="text-align:left">2.6 s</td>
</tr>
<tr>
<td style="text-align:right">Earth-Mars</td>
<td style="text-align:left">6-44 min*</td>
</tr>
</tbody>
</table>
<p>*<em>Depending on the position of the planets.</em></p>
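<p>The single-body rows of the table can be reproduced from 2π<em>R</em>/<em>c</em>. A sketch (the radii below are commonly quoted mean values and are my assumption, not numbers from the table):</p>
<pre><code>import math

C = 299_792_458  # m/s

# Mean radii in meters (assumed, commonly quoted values).
RADII = {
    "Earth": 6_371e3,
    "Moon": 1_737e3,
    "Mercury": 2_440e3,
    "Mars": 3_390e3,
    "Venus": 6_052e3,
    "Jupiter": 69_911e3,
    "Sun": 695_700e3,
}

def full_time_latency(radius_m: float) -> float:
    """Minimum guaranteed latency: wave full-time 2*pi*R/c."""
    return 2 * math.pi * radius_m / C

for body, radius in RADII.items():
    print(f"{body}: {full_time_latency(radius) * 1000:.1f} ms")
</code></pre>
<p>The results agree with the table up to the author's rounding (e.g. Jupiter ≈ 1.47 s, Sun ≈ 14.6 s).</p>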
<h2>7 Connection Relaxing</h2>
<p>Theorem 3.3.2 assumes that the participants are connected with each other using surface links, so the wave propagates along the sphere's surface. Let us relax this requirement and allow the wave to propagate directly through the sphere's interior. One possibility to achieve this is to use <a href="https://en.wikipedia.org/wiki/Neutrino">neutrino particles [3]</a> (BTW, did you know that neutrinos are detected using <a href="https://en.wikipedia.org/wiki/Cherenkov_radiation">Cherenkov radiation [2]</a>, due to their extremely weak interaction with matter?).</p>
<p><strong>Definition 7.1</strong>. <em>Straight connection</em> is the shortest geometrical connection in space.</p>
<p>Using this shortcut, we obtain the following theorem.</p>
<p><strong>Theorem 7.1</strong>. The minimum guaranteed latency for the globally highly available strong consistency database using straight connections is 85 ms.</p>
<p><em>Proof</em>. The wave full-time of the Earth for straight connections is 4<em>R<sub>E</sub></em>/<em>c</em>, which is equal to 85 ms. □</p>
<h2>8 Place Relaxing</h2>
<p>Theorem 7.1 assumes that all participants are placed on the sphere. In addition to connection relaxing let us assume that the database participants may be placed at any point within the sphere.</p>
<p><strong>Definition 8.1</strong>. An <em>arbitrarily placed</em> database is a database whose participants can be placed at any point within the sphere, using straight connections.</p>
<p><strong>Theorem 8.1</strong>. The minimum guaranteed latency for the globally available strong consistency arbitrarily placed database is 42 ms.</p>
<p><em>Proof</em>. Let all clients be spread across the sphere and issue operations to be committed at the same time. To commit the sequence of operations, according to theorem 3.2.3, there must be at least one point that the waves from all clients reach. Choose the earliest moment of time at which such a point <em>p</em> appears. The longest client-to-<em>p</em> distance is minimized when <em>p</em> is the center of the sphere (otherwise we could find a client at a larger distance). Thus the minimum guaranteed latency is 2<em>R<sub>E</sub></em>/<em>c</em>, or 42 ms. □</p>
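<p>Both straight-connection bounds, the 85 ms of theorem 7.1 and the 42 ms of theorem 8.1, follow from the same constants. A quick check (Earth radius assumed as before):</p>
<pre><code>SPEED_OF_LIGHT = 299_792_458  # m/s
EARTH_RADIUS = 6_371_000      # m (assumed mean radius)

# Theorem 7.1: participants on the surface, waves travel through the
# interior; worst case is a round trip between antipodal points, 2 * 2R.
surface_bound = 4 * EARTH_RADIUS / SPEED_OF_LIGHT  # ~85 ms

# Theorem 8.1: a participant may sit at the center; worst case is a
# round trip along one radius, 2R.
center_bound = 2 * EARTH_RADIUS / SPEED_OF_LIGHT   # ~42 ms

print(f"{surface_bound * 1000:.1f} ms, {center_bound * 1000:.1f} ms")
# → 85.0 ms, 42.5 ms
</code></pre>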
<p><strong>Theorem 8.2</strong>. The minimum guaranteed latency 42 ms for the globally available strong consistency arbitrarily placed database is theoretically achievable.</p>
<p><em>Proof</em> by example. Let <em>p<sub>c</sub></em> be the center of the sphere and the only participant of the consistent database. It is evident that the time required to propagate a wave from any point on the surface to <em>p<sub>c</sub></em> and back is equal to 42 ms. □</p>
<h2>9 Reliability and Fault-Tolerance</h2>
<p>Theorem 3.3.2 is proved under the assumptions that every participant lives forever and handles incoming messages correctly. If we introduce additional failure modes (fail-stop, fail-silent, etc.), it is evident that latencies cannot drop below the bounds of theorems 3.3.2, 7.1 and 8.1, because the system must resend information to redundant participants to increase availability, which increases the timings as well.</p>
<p>Nevertheless, it can easily be shown that the same latency values are achievable even in the case of arbitrary node failures.</p>
<p><strong>Definition 9.1</strong>. A <em>Byzantine participant group</em> is a group of nodes that uses Byzantine consensus to handle received and sent information under arbitrary failures.</p>
<p><strong>Definition 9.2</strong>. A <em>live Byzantine participant group</em> is a Byzantine participant group that achieves consensus within a bounded, fixed number of round trips between the participants of the group.</p>
<p><strong>Theorem 9.1</strong>. A live Byzantine participant group behaves as a single failure-free participant.</p>
<p><em>Proof</em>. If the Byzantine consensus makes progress thanks to the liveness condition, then it operates normally even under arbitrary failures, because the consensus algorithm preserves safety. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p><strong>Definition 9.3</strong>. The link has <em>efficiency</em> <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>η</mi><mo>=</mo><mi>v</mi><mi mathvariant="normal">/</mi><mi>c</mi></mrow><annotation encoding="application/x-tex">\eta = v / c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.03588em;">v</span><span class="mord mathrm">/</span><span class="mord mathit">c</span></span></span></span></eq> where <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq> is the wave propagation speed and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>v</mi></mrow><annotation encoding="application/x-tex">v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">v</span></span></span></span></eq> is the information propagation speed.</p>
<p>Note that <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>η</mi><mo>≤</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\eta \leq 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.64444em;"></span><span class="strut bottom" style="height:0.8388800000000001em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="mrel">≤</span><span class="mord mathrm">1</span></span></span></span></eq> in accordance with lemma 3.1.1.</p>
<p><strong>Definition 9.4</strong>. <em>Group efficiency</em> is <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>η</mi><mi>G</mi></msub><mo>=</mo><mi>min</mi><mo>(</mo><msub><mi>η</mi><mi>l</mi></msub><mo>)</mo></mrow><annotation encoding="application/x-tex">\eta_G = \min(\eta_l)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">G</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">=</span><span class="mop">min</span><span class="mopen">(</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></eq> across all links <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>η</mi><mi>l</mi></msub></mrow><annotation 
encoding="application/x-tex">\eta_l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.03588em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.01968em;">l</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> within the group.</p>
<p><strong>Definition 9.5</strong>. A group is <em>flexible</em> when its group efficiency is preserved under proportional changes of the distances between nodes.</p>
<p><strong>Theorem 9.2</strong>. The commit latency of a flexible live Byzantine participant group can be made arbitrarily small.</p>
<p><em>Proof</em>. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>t</mi><mi>c</mi></msub><mo>≤</mo><mi>d</mi><msub><mi>N</mi><mi>r</mi></msub><mi mathvariant="normal">/</mi><mrow><mi>η</mi><mi>c</mi></mrow></mrow><annotation encoding="application/x-tex">t_c \leq d N_r / {\eta c}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mrel">≤</span><span class="mord mathit">d</span><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mord mathrm">/</span><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">η</span><span class="mord mathit">c</span></span></span></span></span></eq> where <eq><span class="katex"><span 
class="katex-mathml"><math><semantics><mrow><msub><mi>t</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">t_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.61508em;"></span><span class="strut bottom" style="height:0.76508em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the commit latency, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span></span></span></span></eq> is the maximal distance between participants, <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>N</mi><mi>r</mi></msub></mrow><annotation encoding="application/x-tex">N_r</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;margin-left:-0.10903em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight" style="margin-right:0.02778em;">r</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> is the limited number of round trips and <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">c</span></span></span></span></eq> is the wave propagation speed. Decreasing <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span></span></span></span></eq> causes <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>t</mi><mi>c</mi></msub></mrow><annotation encoding="application/x-tex">t_c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.61508em;"></span><span class="strut bottom" style="height:0.76508em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">t</span><span class="msupsub"><span class="vlist"><span 
style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped mtight"><span class="mord mathit mtight">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></eq> decreasing as well. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<p>Based on theorem 9.2, we conclude that the stated latencies are achievable: we can use flexible live groups to handle arbitrary failures while introducing only negligible overhead on the overall latency.</p>
<p>Thus the theorem allows us to build a fault-tolerant solution on top of the simple failure-free model without sacrificing the timing characteristics of the database system.</p>
<h2>10 Geo-Locality</h2>
<p>Flexible groups and their commit latency allow us to introduce the notion of geo-locality. The idea is the following: sometimes we know that at any moment the client can be in only one place, or near some fixed place, geographically. Thus we can choose the closest data center to handle the client's requests. This locality allows us to reduce the commit latency.</p>
<p>The only question is what to do if the client moves to another, distant region. To preserve consistency we must apply a group membership change protocol and move the corresponding commit group to the new closest data center, preserving the minimal response timings. This can work if we implement a rather complex algorithm to change the group and place data centers around the world as close as possible to the clients. But it does not work if there are concurrent clients spread across the planet at any time.</p>
<h2>11 Consistency Relaxing</h2>
<p>Another way to decrease the latency of operations is to relax the consistency requirements. E.g., one may consider applying sequential consistency while still preserving geo-availability of the data.</p>
<p>The idea of sequential consistency can be represented by the following model. The client may issue any operation at any time. But instead of executing it, the participant puts the operation into a queue to execute later. The queue is global across all participants. Another participant may dequeue the elements one by one and execute them serially. Thus, we serialize the execution of all operations.</p>
<p>To reduce the write latency we can use the following approach. Instead of putting the operation into the global queue directly, the client chooses the closest consensus group inside some data center and puts the operation into that group's queue. Periodically, operations are dequeued from that group and enqueued into the global queue transactionally, using exactly-once semantics.</p>
<p>The same approach can be used to reduce the read latency by keeping the closest replicated data set. The replica is obtained by applying the operations read from the global queue.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhX5AobKrnm9aunIdGbgsdtSoNC9p7DqdQBHcAzH57Z-VSm6Nbv40GAFe5gr-dKMXJa8EfHOmXfn-XFE3_udju9gUut9IHEwATeHyIojxAFVtwBhxEPPZda7_kIkKiBCpoUuX2QGA5lv3I/s640/seq-consistency.png" alt="Sequential consistency"></p>
<p>The approach allows us to significantly reduce the latency of write operations by lowering the consistency level. At the same time, the actual latency to commit or execute an operation still cannot be less than the bounds stated by the theorems above.</p>
<p>This scheme illustrates the idea of how to reduce latency without sacrificing geo-availability. The cost is the reduction of consistency guarantees.</p>
<h2>12 CAL Theorem</h2>
<p><strong>CAL definition</strong>. The <em>CAL</em> abbreviation stands for:</p>
<ol>
<li>"C" is consistency that implies linearizability, e.g. strict consistency (linearizability + serializability) or linearizability alone.</li>
<li>"A" is geo-availability, meaning that the client can be placed at any point on the Earth's surface.</li>
<li>"L" is guaranteed low latency, lower than the related theorems state, e.g. submillisecond latency.</li>
</ol>
<p><strong>CAL theorem</strong>. It is impossible for a distributed database to simultaneously provide more than two of the three CAL guarantees.</p>
<p><em>Proof</em>. See theorem 0 and considerations related to consistency relaxing and geo-locality. <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi mathvariant="normal">□</mi></mrow><annotation encoding="application/x-tex">\Box</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0em;"></span><span class="strut bottom" style="height:0em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathrm">□</span></span></span></span></eq></p>
<h2>Conclusion</h2>
<p>I am rather sad to present this theorem because it rules out building the best database with the smartest geo-replicated consistent algorithms. It happened unintentionally, sorry for that.</p>
<p>On the other hand, it allows me to save my energy by not inventing unworkable algorithms and to relax by accepting the inevitable.</p>
<p>Nevertheless, the developed approach turned out to be powerful and productive. The wave propagation model makes it possible to understand the minimum achievable latency in ideal conditions even without knowing the connection topology or the algorithms used to achieve consistency. Thus it is always applicable, regardless of the system.</p>
<p>As a consequence, it provides the boundary within which to search for and investigate possible optimizations. One important application is improving consensus algorithms by reducing latency based on wave and information propagation considerations. In that perspective, the nonconcurrent optimization gives the key understanding of how to further improve the characteristics of a consensus algorithm based on the runtime situation.</p>
<p>At the same time, the article introduces the locality approach to significantly improve the latency. Limiting the distance between all participants, including clients, makes it possible to further improve the system characteristics without sacrificing other parameters such as consistency. The only requirement is that a client cannot be at every point of the geosphere simultaneously.</p>
<p>The theorems state the timings for an ideal system, meaning there are no information (network packet) drops, node failures or other mechanisms that may potentially increase the overall latency. To deal with that complexity the article introduces the notion of efficiency and defines a way to obtain its direct value. The total latency can easily be recalculated from the ideal latencies and the efficiency. The medium model is helpful when considering the nonideality of the environment.</p>
<p>Moreover, the article shows how to shift the consideration from totally available nodes to partially available nodes, including nodes with arbitrary failures. The further application of Byzantine consensus allows creating a robust infrastructure and treating each participant group as totally available. Of course, the model can easily be reduced to ordinary consensus if arbitrary failures are excluded. The locality considerations for reducing latency apply to the consensus groups as well.</p>
<p>The newly introduced CAL theorem relates the important properties, describing the tradeoffs and the connections between them. A possible way to preserve geo-availability and low latency is to use more relaxed consistency models, e.g. the briefly described application of sequential consistency.</p>
<p>Now it is time to turn from the wave propagation model to concrete consensus and transaction algorithms, to build the closest possible bridge between them in the context of achieving the described guaranteed minimum latency. Those algorithms are coming.</p>
<blockquote>
<p><em>Self-Examination Question</em>. What is the minimum possible number of round trips required to commit a client operation, excluding the round trip from the client to the participants?</p>
<p>The answer is: <eq><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">e^{i \pi} + 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.824664em;"></span><span class="strut bottom" style="height:0.907994em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">e</span><span class="msupsub"><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped mtight"><span class="mord scriptstyle uncramped mtight"><span class="mord mathit mtight">i</span><span class="mord mathit mtight" style="margin-right:0.03588em;">π</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span><span class="mbin">+</span><span class="mord mathrm">1</span></span></span></span></eq>.</p>
</blockquote>
<h2>CAP Theorem Myths</h2>
<h2>Introduction</h2>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1H8hw0H_-7b63SoSLZYwJLuVLOYtbV6cUm7LOpq1CQpMQK9BSriJpP-yKoJbRlxjCwheYbe0dG0fq-fiOo54YWdCguMhrf3GHm6a6zk8TzzBx9niqL7T_bZFHdvh7KpP4XP3lKhufkqM/s1600/cap-fun.png" alt="cap"></p>
<p>The article explains the most widespread myths about the CAP theorem. One of the reasons is to analyze the recent <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">Spanner, TrueTime & The CAP Theorem</a> article and to establish a clear understanding of the terms involved in the theorem, which are discussed a lot in different contexts.</p>
<p>We will return to that article closer to the end, armed with the concepts and knowledge. Before that, we analyze the most common myths associated with the CAP theorem.</p>
<h2>Myth 1: A Means Availability</h2>
<p><strong>A</strong> is, of course, derived from the word "Availability". However, what does it mean specifically? What kind of availability?</p>
<p>It is not a simple question: availability means different things in different contexts. Moreover, here we must distinguish at least two different contexts in which it can be used:</p>
<ol>
<li>The availability of a real service. It is expressed as a percentage: the total downtime per year is measured and converted into a ratio meaning the probability that the service is available over a relatively long period.</li>
<li>Availability within the model of the CAP theorem.</li>
</ol>
<p><a href="http://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf">The CAP theorem</a> uses the concept closest in meaning, <em>total availability</em>:</p>
<blockquote>
<p>For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.</p>
</blockquote>
<p>In this definition, there are a few points I would like to emphasize:</p>
<ol>
<li><em>Non-failing node</em>. It is clear that a failing node cannot respond. However, if all nodes are failing nodes, then from the point of view of this definition such a service is still available. In principle, you can fix the definition by adding that at least one node must be a non-failing node.</li>
<li><em>Must result</em>. The theorem does not say exactly when the response should happen; it does not know about timings at all. It is completely obvious that a node cannot respond instantaneously, and it is sufficient to reply at some point in time.</li>
</ol>
<p>From a user perspective, if we had 100 nodes and 99 have failed while the remaining one continues to respond at the rate of one request per hour, such a service is hardly available (context 1). However, from the perspective of the CAP theorem, everything is fine and the system is available (context 2).</p>
<p>Therefore, <strong>A</strong> is not availability in the conventional sense, but the so-called <em>total availability</em>, and that kind of availability can be insufficient and unsatisfactory for the user.</p>
<h2>Myth 2: P Means the Partition Tolerance</h2>
<p>The above definition can be found in almost all articles. To understand what is wrong with it, we have to look at the problem from a different angle.</p>
<p>Let us take any system that exchanges messages. Consider how messages are transmitted between the actors, the system's objects. Each message can either be delivered to another actor or be dropped. There are two cases:</p>
<ol>
<li>There is no possibility of losing the messages in the system.</li>
<li>There is a possibility of losing the messages in the system.</li>
</ol>
<p>It is easy to see that the list above is exhaustive. Note that each item describes a property of the system, while we have not even started describing the algorithm. This fact has far-reaching consequences.</p>
<p>If we consider the first case, when messages are never lost, a <em>network split</em> is simply impossible. Indeed, if every message from every actor is guaranteed to be delivered, there is no sense in talking about a network split.</p>
<p>In the second case, the opposite is true: because of losses, it is possible that one segment of actors becomes separated from another segment, i.e. there is a loss of connectivity between groups of actors. In this case, we say that a <em>network split</em> has happened.</p>
<p>Note that the possibility of isolating groups of actors from each other is a direct consequence of the second case.</p>
<p>If we consider a real network, it is not difficult to conclude that it falls under the second case. And note that we have not yet started to think about the algorithm, while we already have the possibility of losing connectivity between groups of actors. <strong>P</strong> expresses the simple fact that a <em>network split may happen</em>. It is not a property of the algorithm; it is a property of the physical layer of our system on which the algorithm operates.</p>
<p>Why is the network split so important? The reason is that no other issue causes as much trouble as a network split, which significantly increases the complexity of distributed algorithms.</p>
<p>To conclude the discussion of this myth, consider the quote from <a href="https://aphyr.com/posts/328-jepsen-percona-xtradb-cluster">Aphyr: Percona XtraDB Cluster</a>:</p>
<blockquote>
<p>Partition tolerance does not require every node still be available to handle requests. It just means that partitions may occur. If you deploy on a typical IP network, partitions will occur; partition tolerance in these environments is not optional.</p>
</blockquote>
<p>Thus, if we consider a system that works over an unreliable network, a violation of network connectivity is not an exceptional situation. <strong>P</strong> in this context means that a <strong>network split may happen</strong>.</p>
<h2>Myth 3: AC Systems Do Not Exist</h2>
<p>According to the previous considerations, it should be obvious that you cannot build an AC system, because there are no completely reliable networks capable of transmitting data without any loss. One could immediately propose a scheme with redundant links. However, if the probability of packet loss on a line is greater than zero, additional lines cannot reduce the combined probability to zero, by a simple mathematical argument. And if loss is possible, then, as described above, a network split may occur.</p>
<p>However, who said that the CAP theorem describes only distributed systems? CAP is a theoretical model that can be applied to a wide class of problems. For example, you can take a multi-core processor:</p>
<ol>
<li>Each core behaves as an actor.</li>
<li>Actors (cores) exchange messages (information).</li>
</ol>
<p>This is enough to apply the CAP theorem.</p>
<p>Consider <strong>A</strong>. Are cores available? Of course, yes: at any time, you can go to any core and obtain any information from memory you want.</p>
<p>What about <strong>P</strong>? The processor ensures that data will be transferred to the other core without any issues. If for some reason this does not happen, the processor is considered defective. Thus, the letter <strong>P</strong> is absent.</p>
<p>The consistency question is resolved in the following way. The memory model specifies <em>sequential consistency</em>, which is the highest level of consistency in such a system. At the same time, the processor usually implements cache coherence protocols such as MESI or MOESI, thereby providing a predetermined level of consistency.</p>
<p>Thus, the modern processor is an <strong>AC</strong> system with guaranteed message delivery between the cores.</p>
<h2>Myth 4: C is consistency</h2>
<p><strong>C</strong> without a doubt means consistency. However, what kind of consistency should we consider? E.g., eventual consistency is one form of consistency. So which one should we have in mind?</p>
<p>There are a lot of consistency models; you can look at the picture taken from <a href="https://arxiv.org/pdf/1512.00168.pdf">Consistency in Non-Transactional Distributed Storage Systems</a>:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcfl_4mNuy246lBaQn9QtSZaoVf69h7maaze3xZnSqs6n9bymf5x4sLRysEQ6QaOAnLyiMvcK61QJgQDb5zWUdAVhVjal5l0FaP3e_34MhXBrriDal6Scx-JbIYKnNsER1g27m_uvkuFk/s800/distributed-consistencies.png" alt="Distributed Consistencies"></p>
<p>And those consistency models apply to non-transactional distributed systems only! If you add the <a href="https://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf">consistency models of distributed transactional systems</a>, you can just bury the idea of surveying them all.</p>
<p>The <a href="http://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf">original article about the CAP theorem</a> uses the consistency model known as linearizability. Briefly speaking, linearizability means the following: whatever the action (read, write, or a mix of actions), its result is visible to everyone immediately after the reply is received.</p>
<p>The question arises instantly: do other forms of consistency fall under the CAP theorem?</p>
<p>To answer this question, let's consider the picture taken from the article <a href="https://arxiv.org/pdf/1302.0309.pdf">Highly Available Transactions</a>:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI2MPPvbpOlZciQN_uraaG9eeEMkECVArycEziYpz2yqio68yNMQWschQBi8nSLLn21VSyxUjhwziqvkninunBxb9DnCdzlXReVQJr4vrcsIAOYl3yn5K61ZYoSJO_TtQYq-j9puHeQGw/s800/CAP-consistency.png" alt="Consistencies"></p>
<p>Red circles denote the <em>unavailable models</em>: the models that fall under the CAP theorem, meaning that it is impossible to simultaneously achieve both <strong>A</strong> and <strong>P</strong> with those forms of consistency. However, there are other models whose consistency level is sufficient for a wide range of tasks and which can nevertheless be combined with <strong>AP</strong>, yielding a CAP system without any obstacles. A typical example: <em>Read Committed</em> (RC) and <em>Monotonic Atomic View</em> (MAV) allow achieving all three letters of CAP, and no one can say that those are weak consistency models. Consistency models that escape the CAP theorem in this way are called <em>highly available models</em>.</p>
<p>Thus, speaking of consistency in the CAP context, we mean the broad group of consistency models called <em>unavailable models</em>.</p>
<h2>Myth 5: CP Systems are not Highly Available</h2>
<p>After the preceding paragraph this seems quite logical, but it is fundamentally wrong. Recall that <strong>A</strong> stands for <em>total availability</em> rather than availability measured in nines. So, is it possible to make a <strong>CP</strong> system highly available?</p>
<p>Here it is necessary to separate the model from the hardware, that is, theory from practice.</p>
<p>First, let us think in terms of the model. Availability under the CAP theorem means <em>total availability</em>, i.e., any alive node must respond. Nevertheless, why do you need exactly that? After all, we could rewrite the client logic completely. Instead of operating with a single node, the client could connect to a consensus group of nodes and choose the most recent value based on the responses from several nodes. Thus, from the client's perspective, the system is highly available for both reads and writes, and it remains consistent thanks to the consensus algorithm.</p>
<p>In reality, there is always a non-zero probability that only a minority of nodes is available. This is easy to see: if there is a non-zero probability of one node failing, then there is a non-zero probability of other nodes failing too. Moreover, that is not the worst that may happen: in addition to hardware failures, various pieces of network equipment may fail, and all those failures, too, have non-zero probability. All these probabilities accumulate, yielding a certain, sometimes very small, number of nines of availability. Clearly, the more redundant hardware we have, the greater the number of nines we obtain. And I still have not taken into account the application software itself, which has a non-zero probability of bugs...</p>
<p>Thus, in practice we always have availability of less than 100%. The whole science is in achieving the greatest possible number of nines, and in this respect the CAP theorem is useless, because it is about completely different notions and models.</p>
<p>So the idea of a highly available system does not contradict the fact that it is not <strong>A</strong>, and therefore a <strong>CP</strong> system can be highly available.</p>
<h2>Myth 6: CP Systems Have Low Performance, High Latency and Are not Scalable</h2>
<p>Obviously, the higher the level of consistency, the less performant the system. Nevertheless, it turns out that even <em>strict consistency</em>, or <em>Strong-1SR</em> (the highest level of consistency), with exactly-once semantics can be used in real-time systems. I have an experimental proof of this fact, but here I would like to give some practical considerations in its favor.</p>
<p>The idea is to use a set of independent fault-tolerant entities. You can run them anywhere, they work in parallel, and their number is limited only by the size of the cluster. On top of those entities we can create a transactional layer that connects the different parts together, allowing us to operate on them transparently. This is how <a href="https://static.googleusercontent.com/media/research.google.com/ru//archive/spanner-osdi2012.pdf">Spanner</a> and other distributed scalable systems work.</p>
<p>Thus, we might achieve scalability and performance for <strong>CP</strong> systems.</p>
<h2>Myth 7: AP Systems are Easy to Use due to Scalability</h2>
<p><strong>AP</strong> systems allow simple scaling schemes, but only in theory. In practice, you have to solve the issues caused by weak consistency.</p>
<p>Real systems show that correctly implementing a client on top of such a system is a nontrivial task, and sometimes it is even impossible. The reason is that if the system does not provide some basic guarantees to preserve data consistency, the subsequent processing turns into a very fascinating charade: has the operation been applied? Do you know what others may see? Is it possible to obtain a consistent data snapshot? Do different clients see the same data set? And so on.</p>
<p>Despite the relative simplicity of such systems themselves, the complexity of using them from the client's perspective increases dramatically.</p>
<h2>Article Analysis</h2>
<p>And now let's proceed with <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">Spanner, TrueTime & The CAP Theorem</a>. Let's start from the beginning:</p>
<blockquote>
<p>The CAP theorem [Bre12] says that you can only have two of the three desirable properties of:</p>
<ul>
<li>C: Consistency, which we can think of as serializability for this discussion;</li>
<li>A: 100% availability, for both reads and updates;</li>
<li>P: tolerance to network partitions.</li>
</ul>
</blockquote>
<p>The first thing to pay attention to is the link [Bre12], <a href="https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed">CAP Twelve Years Later: How the "Rules" Have Changed</a>, dated May 2012. It contains some thoughts related to the theorem, but not the <a href="http://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf">CAP theorem</a> itself.</p>
<p>In addition, we have already discussed all three letters, and the quote above exhibits at least myth #2.</p>
<blockquote>
<p>Once you believe that partitions are inevitable, any distributed system must be prepared to forfeit either consistency (AP) or availability (CP), which is not a choice anyone wants to make.</p>
</blockquote>
<p>The first part sounds quite reasonable according to our discussion, but the last part looks strange and exhibits myths #5, #6 and #7.</p>
<p>Then some reasonable words are written:</p>
<blockquote>
<p>The actual theorem is about 100% availability, while the interesting discussion here is about the tradeoffs involved for realistic high availability</p>
</blockquote>
<p>It seems the author wants to say that Spanner is a highly available <strong>CP</strong> system, avoiding myth #6. Unfortunately, that is not the case, and the author continues with this paragraph:</p>
<blockquote>
<h4>Spanner claims to be consistent and available.</h4>
<p>Spanner claims to be consistent and highly available, which implies there are no partitions and thus many are skeptical.</p>
</blockquote>
<p>Of course we are skeptical, because it does not imply that. In accordance with the considerations in myth #5, a so-called <em>highly available</em> system does not mean <strong>A</strong>, and thus does not imply the absence of <strong>P</strong>.</p>
<blockquote>
<p>Based on a large number of internal users of Spanner, we know that they assume Spanner is highly available.</p>
</blockquote>
<p>The phrase itself is remarkable. It turns out that if internal users say "we assume that it is highly available", it immediately follows that this is actually the case in practice, without any assumptions.</p>
<p>To be more precise, the author adds:</p>
<blockquote>
<p>If the primary causes of Spanner outages are not partitions, then CA is in some sense more accurate.</p>
</blockquote>
<p>As I understand it, the article's logic is the following: if we have failures that are not associated with a <em>network split</em>, then we may treat the system as <strong>CA</strong> in some sense (!). In other words, if the probability of other failures is greater than the probability of a network failure, then we may drop <strong>P</strong>.</p>
<p>In that sense, the statements of the myths above look more reasonable.</p>
<p>Later on, the author provides a definition of the notion "effectively CA" he uses:</p>
<blockquote>
<p>... to claim <em>effectively CA</em> a system must be in this state of relative probabilities:
1. At a minimum it must have very high availability in practice (so that users can ignore exceptions), and
2. as this is about partitions it should also have a low fraction of those outages due to partitions.</p>
<p>Spanner meets both.</p>
</blockquote>
<p>The questions immediately appear:</p>
<ol>
<li>What level of high availability is sufficient "in practice"? 5 nines? 6 nines? Maybe 9 nines? There is a certain arbitrariness here that does not allow one to correctly decide whether a system fits the definition; "so that users can ignore exceptions" completes the ambiguity.</li>
<li>Where is <strong>P</strong>? <strong>P</strong> means that <em>network splits</em> may happen, regardless of their probability (see myth #2). Should we redefine <strong>P</strong> as well?</li>
</ol>
<p>Finally:</p>
<blockquote>
<p>Spanner reasonably claims to be an “effectively CA” system despite operating over a wide area, as it is always consistent and achieves greater than 5 9s availability... Even then outages will occur, in which case Spanner chooses consistency over availability.</p>
</blockquote>
<p>This obviously contradicts a <strong>CA</strong> system and common sense: in such a system there is no choice to make, because both properties are already guaranteed, as we have seen in the example described in myth #3. The presence of such a statement just says that the system is not entirely <strong>CA</strong>.</p>
<p>I did not expect to see conflicting paragraphs in this article.</p>
<h2>The Last Myth: CAP Theorem is Outdated</h2>
<p>The popularity of this topic has led to many people no longer understanding the meaning of the terms; the terms have become blurred, reduced to a rather crude understanding. Speculation on the terms, redefinition and misunderstanding: this is an incomplete list of the afflictions of this long-suffering theorem.</p>
<p>At some point the pendulum swung the other way and people started to forget the theorem. Articles tried to argue that the CAP theorem is outdated and <a href="https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html">asked to stop using it</a>. Even <a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">the author of the theorem</a> began to substitute the concepts and distort the original intent.</p>
<p>Those attacks on the theorem only underscore its relevance, exposing new facets unknown so far.</p>
<h2>Conclusion</h2>
<p>In its time, the CAP theorem introduced interesting concepts and new understanding. The theoretical impossibility of building a certain kind of system allows one to concentrate on tasks of a solvable class, avoiding attempts to implement unsolvable systems. In the context of distributed systems, it makes sense to consider either <strong>AP</strong> or <strong>CP</strong> systems.</p>
<p>Such theorems do not become obsolete. The CAP theorem can no more become outdated than classical mechanics can, despite the existence of relativistic effects and quantum mechanics. It just needs to find its rightful place. We must remember it and move on.</p>
<p>And the thing is that this theorem is a special case of a more general <a href="https://dspace.mit.edu/openaccess-disseminate/1721.1/79112">fundamental property</a>:</p>
<blockquote>
<p>The CAP Theorem, in this light, is simply one example of the fundamental fact that you cannot achieve both safety and liveness in an unreliable distributed system.</p>
</blockquote>
<ul>
<li>C: <strong>Safety</strong></li>
<li>A: <strong>Liveness</strong></li>
<li>P: <strong>Unreliable distributed system</strong></li>
</ul>
<p><em>Grigory Demchenko, <a href="https://habrahabr.ru/company/yandex/blog/311104/">YT</a> Software Engineer</em></p>
<h2>References</h2>
<p><a href="https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf">Spanner, TrueTime & The CAP Theorem</a></p>
<p><a href="http://www.glassbeam.com/sites/all/themes/glassbeam/images/blog/10.1.1.67.6951.pdf">Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web</a></p>
<p><a href="https://aphyr.com/posts/328-jepsen-percona-xtradb-cluster">Jepsen: Percona XtraDB Cluster</a></p>
<p><a href="https://arxiv.org/pdf/1512.00168.pdf">Consistency in Non-Transactional Distributed Storage Systems</a></p>
<p><a href="https://www.ics.forth.gr/tech-reports/2013/2013.TR439_Survey_on_Consistency_Conditions.pdf">Survey on consistency conditions</a></p>
<p><a href="https://arxiv.org/pdf/1302.0309.pdf">Highly Available Transactions: Virtues and Limitations (Extended Version)</a></p>
<p><a href="https://static.googleusercontent.com/media/research.google.com/ru//archive/spanner-osdi2012.pdf">Spanner: Google’s Globally-Distributed Database</a></p>
<p><a href="https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed">CAP Twelve Years Later: How the "Rules" Have Changed</a></p>
<p><a href="https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html">Please stop calling databases CP or AP</a></p>
<p><a href="https://dspace.mit.edu/openaccess-disseminate/1721.1/79112">Perspectives on the CAP Theorem</a></p>
<p><a href="https://habrahabr.ru/company/yandex/blog/311104/">YT: Why Yandex Has Its Own MapReduce System and How It Works (in Russian)</a></p>
<h1>Replicated Object. Part 7: Masterless Consensus Algorithm</h1>
<h2>1 Abstract</h2>
<p>The article introduces a new generation of consensus algorithms: the masterless consensus algorithm. The core part consists of fewer than 30 lines of C++ code, making it the simplest consensus algorithm. At the same time, it has several outstanding features that allow easily developing complex fault-tolerant distributed services.</p>
<h2>2 Introduction</h2>
<blockquote>
<p>There are only two hard problems in distributed systems:<br>
2. Exactly-once delivery.<br>
1. Guaranteed order of messages.<br>
2. Exactly-once delivery.<br></p>
<p><em>Mathias Verraes</em>.</p>
</blockquote>
<p>Distributed programming is hard. The main reason is that you cannot rely on the usual assumptions about timings, possible failures, device reliability and operation ordering.</p>
<a name='more'></a>
<p>Distributed systems are truly asynchronous, meaning that you do not have explicit time bounds for your operations. You cannot rely on timers due to time drift, time synchronization, garbage collector pauses, unexpected operating system scheduler delays and so on. You cannot rely on the network due to packet loss, network delays and network splits. You cannot rely on hardware due to hard disk errors and CPU, memory and network device failures.</p>
<p>Redundancy and replication allow tolerating numerous failures while keeping the service available. The only question is how to correctly replicate the data without losing consistency in the presence of the asynchronous nature of the system and of hardware or software failures. That task can be solved by using a <em>consensus algorithm</em>.</p>
<h2>3 Consensus Problem Statement</h2>
<p><a href="http://zoo.cs.yale.edu/classes/cs426/2012/bib/fischer83consensus.pdf">Consensus</a> is a fundamental problem in distributed systems. It allows providing safety guarantees in distributed services. The most common usage is to obtain an agreement on a sequence of operations, providing a <a href="https://cs.brown.edu/%7Emph/HerlihyW90/p463-herlihy.pdf">linearizable form of consistency</a>. As a consequence, the agreement on the sequence of operations provides exactly-once semantics and preserves the operation ordering. Combining the agreement on a distributed operation sequence with the <a href="https://en.wikipedia.org/wiki/State_machine_replication">state machine replication</a> approach, we can easily implement fault-tolerant services with consistent shared state.</p>
<h2>4 Overview</h2>
<p>Let's briefly discuss existent approaches that solve distributed consensus problem.</p>
<h3>4.1 Master-based Consensus</h3>
<p>The first generation of algorithms is the <em>master-based</em> consensus. Currently those algorithms are widespread among different services. The most popular and well-known implementations are <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf">Paxos</a>, <a href="http://diyhpl.us/%7Ebryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf">Zab</a> and <a href="https://ramcloud.stanford.edu/raft.pdf">Raft</a>. They have the following features in common:</p>
<ol>
<li>Master makes the progress. At any time, to make progress, there must be a stable master (leader) responsible for handling all write and/or read requests.</li>
<li>Leader election. If there is no leader (e.g. the leader has crashed), the first unavoidable step is to elect a new leader using a so-called "leader election algorithm". The system cannot make progress until the leader is elected.</li>
</ol>
<p>The second point is the reason for write unavailability during relatively long time periods (from seconds up to minutes) under any kind of leader failure.</p>
<p>Although <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf">Basic Paxos</a> doesn't contain an explicit leader, the algorithm is still affected by the issues mentioned above. The reason is that a crash of a proposer that has received the majority of accepted votes requires appropriate handling using a specific timeout mechanism. In effect, the Basic Paxos algorithm performs leader election on each step. Multi-Paxos merely optimizes that behavior by using a stable leader, thus skipping the <em>prepare</em> and <em>promise</em> phases.</p>
<h3>4.2 Multimaster Consensus</h3>
<p>The second generation of consensus algorithms uses the <em>multimaster</em> approach. It includes quite recent implementations like <a href="https://www.cs.cmu.edu/%7Edga/papers/epaxos-sosp2013.pdf">EPaxos</a> and <a href="http://www.hyflow.org/pubs/opodis14-alvin.pdf">Alvin</a>. The common idea here is to avoid a stable, dedicated master for all commands and to use a temporary master per command instead. The approach still uses the idea of a master, but on a per-command basis: on each incoming client request, the receiving replica becomes the leader for that command and performs the steps necessary to commit it.</p>
<p>The algorithms have the following features in common:</p>
<ol>
<li>Master on a per-command basis only. There is no stable master.</li>
<li>Utilizing the <em>fast-path quorum</em> approach. It decreases the number of replicas to be contacted, thus reducing the overall latency of committing a command.</li>
<li>Dependency resolver. It allows processing independent requests in parallel without any interference.</li>
<li>Leader election on failure. If a command is not committed, a newly elected leader for that specific command is responsible for committing it. This obviously includes a leader election step.</li>
</ol>
<p>The second generation algorithms have significant advantages over the first generation because they reduce the latency needed to agree on and commit commands, at the price of overall code complexity. I would like to emphasize that the currently available publications on multimaster consensus algorithms do not include a detailed description and safety analysis of cluster membership changes, unlike e.g. the <a href="https://raft.github.io">Raft consensus algorithm publications</a>.</p>
<h3>4.3 Masterless Consensus</h3>
<p>The final generation is masterless consensus algorithms, based on the following ideas:</p>
<ol>
<li>No master. At any time there is no master at all; every replica is indistinguishable from the others.</li>
<li>No failover. The algorithm does not have any special failover logic; it tolerates failures automatically, by design.</li>
</ol>
<h2>5 Masterless Algorithm</h2>
<p>This section introduces the masterless consensus algorithm.</p>
<h3>5.1 Problem Statement</h3>
<p>The classical consensus problem statement requires agreeing on some single value from a proposed set of values. The masterless consensus algorithm slightly modifies that requirement, which makes it more practical and effective. Because the final goal is to have the same sequence of operations on each replica, the problem statement is reformulated in the following way: <em>obtain agreement on the same sequence</em> of values or messages, where the sequence may contain only the proposed values. The sequence must contain at least a single element, i.e., it must not be empty.</p>
<p>Thus, the requirement of having the same single value is transformed into the requirement of having the same sequence of operations. Effectively, the sequence can be treated as the "classical" value to be agreed on.</p>
<h3>5.2 Brief Description</h3>
<p><center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFHC9v4cYs7XT101qgBxdV2JNQMUQLthYhNSfk7MW4iGZtAfts6hONppihwCZgAUzZgj1vR0sAypwEC1vXOfxabsrQ87m0GlHJt_zpkhIf-HL2VlM1QBCitIF0CuM_4eKH9Dkw7iOyjE8/s1600/broadcasts_fun.png" alt="alt text" title="Fun broadcasts"></center></p>
<p>The base idea is to:</p>
<ol>
<li>The client sends messages within steps. On a single step, each replica may propose only one message. To propose the next message, a replica must increase the current step by 1.</li>
<li>Broadcast the state to spread the knowledge about messages in progress.</li>
<li>Sort the messages deterministically.</li>
<li>Commit the sequence of messages, ensuring that the other replicas have the same messages and that there are no additional messages for the current step.</li>
</ol>
<h2>6 Naive Approach</h2>
<p>The naive approach demonstrates the base idea mentioned above. Let me briefly describe the model and data structures that I am going to use.</p>
<p>Each replica cooperates with the group of replicas by broadcasting messages. This is very similar to the actor model, where each actor exchanges information and handles external events using messages.</p>
<p>The consensus task is to agree on the sequence of client messages. <code>Carry</code> represents a message sent by the client; it initiates the agreement processing.</p>
<h3>6.1 Data Structures</h3>
<p>Each client message <code>Carry</code> contains a payload with serialized actions to be executed in the commit phase. For each <code>Carry</code> message we should define comparison operators:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Carry
{
<span class="keyword">bool</span> <span class="keyword">operator</span><(<span class="keyword">const</span> Carry&) <span class="keyword">const</span>;
<span class="keyword">bool</span> <span class="keyword">operator</span>==(<span class="keyword">const</span> Carry&) <span class="keyword">const</span>;
<span class="comment">// executes the client operation based on payload</span>
<span class="keyword">void</span> execute();
<span class="comment">// payload</span>
<span class="comment">// ...</span>
};
</code></pre>
<p>The simplest way to implement the comparison operations is to add a <a href="https://en.wikipedia.org/wiki/Globally_unique_identifier">GUID</a> field and generate it for each newly created client message. The <code>Carry</code> declaration allows us to sort client messages deterministically and independently on any replica. In addition, a client message <code>Carry</code> can be executed to process the client request:</p>
<pre><code class="cpp">Carry msg;
<span class="comment">// to commit the client message `msg` replica executes it by invoking:</span>
msg.execute();
</code></pre>
<p><code>CarrySet</code> contains a sorted set of messages, ordered by the overloaded comparison operators:</p>
<pre><code class="cpp"><span class="keyword">using</span> CarrySet = <span class="built_in">std</span>::<span class="stl_container"><span class="built_in">set</span><Carry></span>;
</code></pre>
<p><code>NodesSet</code> is a sorted set of replicas (nodes). Each node has a unique identifier <code>NodeId</code>:</p>
<pre><code class="cpp"><span class="keyword">using</span> NodesSet = <span class="built_in">std</span>::<span class="stl_container"><span class="built_in">set</span><NodeId></span>;
</code></pre>
<p><code>Vote</code> describes the broadcast message used to share the current replica's knowledge:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Vote
{
CarrySet carrySet;
NodesSet nodesSet;
};
</code></pre>
<p>The <code>Commit</code> message is used to commit client messages. It contains the ordered set of client messages to be executed sequentially, one by one:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Commit
{
CarrySet commitSet;
};
</code></pre>
<p><code>Context</code> structure represents execution context where <code>sourceNode</code> is the <code>NodeId</code> of the sender and <code>currentNode</code> is the <code>NodeId</code> of the current replica:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Context
{
NodeId sourceNode; <span class="comment">// sender</span>
NodeId currentNode; <span class="comment">// current</span>
};
<span class="comment">// at any time the context can be extracted</span>
<span class="comment">// by using the following function:</span>
Context& context();
</code></pre>
<p>A <code>Context</code> instance is initialized for each incoming message. Thus the invocation <code>context().currentNode</code> obtains the <code>NodeId</code> of the current replica, while <code>context().sourceNode</code> contains the sender's <code>NodeId</code>.</p>
<p>Any data structure can be broadcast to the other replicas by using the <code>broadcast</code> function:</p>
<pre><code class="cpp"><span class="comment">// broadcast commit</span>
broadcast(Commit{setToBeCommitted});
<span class="comment">// broadcast vote</span>
broadcast(Vote{carries_, nodes_});
</code></pre>
<p>Messages are handled by the appropriate services. Each service may send any number of messages and receive a specific set of messages. The following example demonstrates how to catch and handle incoming messages:</p>
<pre><code class="cpp"><span class="comment">// declare service to handle incoming messages</span>
<span class="keyword">struct</span> MyService
{
<span class="comment">// handles `Carry` incoming message sent by the client</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Carry& msg);
<span class="comment">// handles `Vote` incoming message</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Vote& vote);
<span class="comment">// handles `Commit` incoming message</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Commit& commit);
<span class="comment">// handles `Disconnect` incoming message</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Disconnect&);
<span class="comment">// etc</span>
};
</code></pre>
<h3>6.2 Messages</h3>
<p>The algorithm consists of 4 different types of incoming messages:</p>
<ol>
<li>Client message: <code>void on(const Carry& msg)</code></li>
<li>Voting: <code>void on(const Vote& vote)</code></li>
<li>Committing: <code>void on(const Commit& commit)</code></li>
<li>Disconnection: <code>void on(const Disconnect&)</code></li>
</ol>
<p>Let's consider each message handler in detail.</p>
<h4>6.2.1 Carry</h4>
<p>The client sends the message <code>Carry</code> to be committed. Any replica accepts the client message <code>Carry</code> and generates the corresponding <code>Vote</code> message:</p>
<pre><code class="cpp"> <span class="comment">// accepts new incoming message from the client</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Carry& msg)
{
<span class="comment">// generates initial vote message</span>
on(Vote{CarrySet{msg}, nodes_});
}
</code></pre>
<p>Initially, the replica adds the received carry to <code>Vote::carrySet</code> and uses the currently known set of nodes stored in the field <code>nodes_</code>.</p>
<h4>6.2.2 Vote</h4>
<p>The main logic is placed inside the vote handler; it is the heart of the consensus algorithm. On each incoming <code>Vote</code> message, the following sequence of operations takes place:</p>
<ol>
<li>Combining replica carries and incoming vote carries: <code>carries_ |= vote.carrySet</code>.</li>
<li>Checking group membership changes: <code>if (nodes_ != vote.nodesSet)</code>. The following sequence is applied when the group changes:
<ul>
<li>reset state to initial: <code>state_ = State::Initial</code>,</li>
<li>update nodes group: <code>nodes_ &= vote.nodesSet</code>,</li>
<li>clear votes: <code>voted_.clear()</code>,</li>
</ul></li>
<li>Update current votes: add current and incoming vote.</li>
<li>If all nodes have voted:
<ul>
<li>commit combined carries: <code>on(Commit{carries_})</code>.</li>
</ul></li>
<li>Otherwise, if the state is initial:
<ul>
<li>change state to <code>State::Voted</code>,</li>
<li>broadcast updated <code>Vote</code> message to other replicas: <code>broadcast(Vote{carries_, nodes_})</code>.</li>
</ul></li>
</ol>
<h4>6.2.3 Commit</h4>
<p>Commit step is straightforward:</p>
<ol>
<li>Change state to <code>State::Completed</code>.</li>
<li>Broadcast commit message to others: <code>broadcast(commit)</code>.</li>
<li>Execute committed carries: <code>execute(commit.commitSet)</code>.</li>
</ol>
<p>The committed set represents the ordered sequence of client commands. The algorithm must ensure that each replica executes the same sequence of client messages.</p>
<h4>6.2.4 Disconnect</h4>
<p>When a node is disconnected, the object receives a <code>Disconnect</code> message. The sequence of actions to be performed is the following:</p>
<ol>
<li>If disconnection takes place before any client message:
<ul>
<li>remove the disconnected node from the group: <code>nodes_.erase(context().sourceNode)</code>.</li>
</ul></li>
<li>Otherwise the replica:
<ul>
<li>generates a new <code>Vote</code> message with an updated replica group that excludes the disconnected node:
<code>Vote{carries_, nodes_ - context().sourceNode}</code>, where <code>context().sourceNode</code> contains the disconnected <code>NodeId</code>,</li>
<li>sends it to itself: <code>on(Vote{...})</code>.</li>
</ul></li>
</ol>
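<p>The steps above manipulate <code>NodesSet</code> and <code>CarrySet</code> through set-algebra operators such as <code>|=</code>, <code>&=</code> and <code>-</code>. The article does not show these types; the sketch below illustrates one possible way to implement a <code>NodesSet</code> as a bitmask over node identifiers (the name, representation and operator set are assumptions, not the actual Replob code):</p>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a set of node ids (0..63) stored as a bitmask.
// The real NodesSet/CarrySet types are not shown in the article.
struct NodesSet
{
    std::uint64_t bits = 0;

    // voted_ |= context().sourceNode
    NodesSet& operator|=(int node) { bits |= 1ull << node; return *this; }
    // carries_ |= vote.carrySet
    NodesSet& operator|=(const NodesSet& s) { bits |= s.bits; return *this; }
    // nodes_ &= vote.nodesSet
    NodesSet& operator&=(const NodesSet& s) { bits &= s.bits; return *this; }
    // nodes_ - context().sourceNode
    NodesSet operator-(int node) const
    {
        NodesSet r;
        r.bits = bits & ~(1ull << node);
        return r;
    }

    bool operator==(const NodesSet& s) const { return bits == s.bits; }
    bool operator!=(const NodesSet& s) const { return bits != s.bits; }

    std::size_t count(int node) const { return (bits >> node) & 1; }
    void erase(int node) { bits &= ~(1ull << node); }
    bool empty() const { return bits == 0; }
};
```

<p>With such a type, expressions like <code>voted_ == nodes_</code> or <code>nodes_ - context().sourceNode</code> read exactly as in the handler description above.</p>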
<h3>6.3 State Diagram</h3>
<p>The following diagram unites states and messages:</p>
<p><center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglCRYCzrSWw5aEq3Lc4XQdcruPuoSAhAYe-UHVYeACIg5HbTtS0JAtFbRT6ofTCE24qbc9Rw7ngc5vXUexJhEkJwEIVMys2tZhiNo7ZTU1jDiYU8LtD_Wosk-3RQe38KXGqm5l3tayDJA/s1600/state_sore.png" alt="alt text"></center></p>
<h3>6.4 Code</h3>
<p>The code below implements the naive approach:</p>
<pre><code class="cpp"><span class="keyword">struct</span> ReplobSore
{
<span class="keyword">enum</span> <span class="keyword">struct</span> State
{
Initial,
Voted,
Completed,
};
<span class="comment">// accepts new incoming message from the client</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Carry& msg)
{
<span class="comment">// generates initial vote message</span>
on(Vote{CarrySet{msg}, nodes_});
}
<span class="comment">// main handler: accepts vote from any replica</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Vote& vote)
{
<span class="comment">// committed? => skip</span>
<span class="keyword">if</span> (state_ == State::Completed)
<span class="keyword">return</span>;
<span class="comment">// the vote does not belong to the group? => skip it</span>
<span class="keyword">if</span> (nodes_.count(context().sourceNode) == <span class="number">0</span>)
<span class="keyword">return</span>;
<span class="comment">// combine messages from other replicas</span>
carries_ |= vote.carrySet;
<span class="comment">// check group changing</span>
<span class="keyword">if</span> (nodes_ != vote.nodesSet)
{
<span class="comment">// group has been changed =></span>
<span class="comment">// - remove node from the group</span>
<span class="comment">// - cleanup all votes</span>
<span class="comment">// - restart voting</span>
state_ = State::Initial;
nodes_ &= vote.nodesSet;
voted_.clear();
}
<span class="comment">// combine votes from source and destination</span>
voted_ |= context().sourceNode;
voted_ |= context().currentNode;
voted_ &= nodes_;
<span class="keyword">if</span> (voted_ == nodes_)
{
<span class="comment">// all replicas have voted => commit</span>
on(Commit{carries_});
}
<span class="keyword">else</span> <span class="keyword">if</span> (state_ == State::Initial)
{
<span class="comment">// otherwise switch to voted state</span>
<span class="comment">// and broadcast current votes</span>
state_ = State::Voted;
broadcast(Vote{carries_, nodes_});
}
}
<span class="keyword">void</span> on(<span class="keyword">const</span> Commit& commit)
{
<span class="comment">// committed? => skip</span>
<span class="keyword">if</span> (state_ == State::Completed)
<span class="keyword">return</span>;
state_ = State::Completed;
<span class="comment">// broadcast received commit</span>
broadcast(commit);
<span class="comment">// execute client messages combined from all replicas</span>
execute(commit.commitSet);
}
<span class="keyword">void</span> on(<span class="keyword">const</span> Disconnect&)
{
<span class="comment">// disconnect handler</span>
<span class="comment">// disconnected node is placed into Context::sourceNode</span>
<span class="keyword">if</span> (carries_.empty())
{
<span class="comment">// on initial stage just remove from the group</span>
nodes_.erase(context().sourceNode);
}
<span class="keyword">else</span>
{
<span class="comment">// otherwise send vote with reduced set of nodes</span>
on(Vote{carries_, nodes_ - context().sourceNode});
}
}
<span class="keyword">private</span>:
State state_ = State::Initial;
NodesSet nodes_;
NodesSet voted_;
CarrySet carries_;
};
</code></pre>
<h3>6.5 Examples</h3>
<p>Let's consider different scenarios of message handling for different replicas.</p>
<h4>6.5.1 1 Concurrent Client</h4>
<p>The first example demonstrates the sequence of operations when a single client tries to commit a message using a group of 3 replicas:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZmF8Fc9i-a4wgIxQ8CWS3AvU6nY7obl8i0tDJmleGCV_h0yx9-GD7D_v0YyuUn7jgiXcvTcsmAq5yFLa2ccWz095_v4rpzqS7PgEjVoEn38S_UILxsRixP95n2mvXwKw9WrDBLk4m6kk/s1600/diagram_sore1.png" alt="alt text" title="1 client"></p>
<p>The following designations are used:</p>
<ol>
<li>There are 3 replicas: #1, #2 and #3.</li>
<li>Initial client message and corresponding state are marked using yellow color.</li>
<li>Normal voting process is marked by gray color.</li>
<li>Commit state and messages are marked using green color.</li>
<li><code>C:1</code> means carry message from the first client.</li>
<li><code>V:13</code> means that state contains votes from the 1st and 3rd replicas.</li>
<li>A round corresponds to one half of a round trip.</li>
</ol>
<p>It takes 2 rounds to commit the initial client message on the whole set of replicas. Thus the number of round trips is equal to 1.</p>
<h4>6.5.2 2 Concurrent Clients</h4>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiRqrAMLcE7JRt9yjTHEKnqg7YyUuIoW0kJRS_UIEPGsZ6_NpNV743EqpcEAk4_CFDJzI_SsThlNJNnwJDRuLj5NhrCHTfrjXrg75YBqQMOuMr1BImLLypRrRV5RhASWYqEYZ5uVFXI24/s1600/diagram_sore2.png" alt="alt text" title="2 clients"></p>
<p>The result looks pretty similar: it takes 2 rounds to commit all client messages and the number of round trips is equal to 1 in this scenario.</p>
<h4>6.5.3 3 Concurrent Clients</h4>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZDOlIVSPg7eWqwTdh1rWaonPIIiCcU-mAYjcTADP9e6nSK1w_DrXnscDFsKJk4gnIpMmmMLhy0-uSjYIiFVZkavfCT_dPPu1LeQaRUVDydDU0MlPolGulOo7SnCEkWG3rP9LU42L_ds0/s1600/diagram_sore3.png" alt="alt text" title="3 clients"></p>
<p>The number of round trips in that case: 0.5.</p>
<h4>6.5.4 2 Concurrent Clients and Disconnection</h4>
<p>The following example demonstrates the sequence of operations when the first replica crashes:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_tG5ETn5sy1Q-zdVXC6SuCCRokwBW42mqAJ7Eq9dM9mvYIgZLkD7lXHQQtOb7GbGm4deEA5FncYzKiA8p0q_-O_UbNMALVz_l-18QKgrx3cck8ATTWnmJl4aIL2sFAlWeOj3Zph2IwqM/s1600/diagram_sore2_disconnect.png" alt="alt text" title="2 clients and replica failure"></p>
<p>Red color identifies the replica failure.</p>
<h2>7 DAVE</h2>
<p><center>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC0-JoDeGX2VTpPbgWHR9sjMco8PhBsatVyYPifZix0Zcjnjl-KQjqbts-YUIlFKkptSnTbFSC-S121CJfYVF5wWXgvJ8Tmu_-EG2mI4d7Fk6TS0XO6VtwyxtbqYj8clKdBs4tV2DgnAk/s1600/where_proofs.jpg" alt="alt text" title="What are your evidences?"><br>
<small>"What are your evidences?"</small>
</center></p>
<p>Consensus algorithms usually must be verified. Thus I have created a special application for verifying distributed algorithms: Distributed Asynchronous Verification Emulator, or simply <a href="https://github.com/gridem/DAVE">DAVE</a>.</p>
<p>Verification is based on the following model:</p>
<ol>
<li>The distributed system contains a specific number of nodes.</li>
<li>Each node may host a specific number of services.</li>
<li>Each service accepts a specific set of messages.</li>
<li>Each service may send any number of messages to other services.</li>
<li>All messages are delivered asynchronously.</li>
<li>At any time a node may crash. Every service on the crashed node dies, and the rest of the nodes receive a <code>Disconnect</code> message, delivered asynchronously.</li>
<li>Each service has a mailbox for incoming messages. The mailbox is a FIFO queue, so the service processes the message extracted from the head of the queue.</li>
<li>Each service processes messages one by one.</li>
<li>While processing a message, a service may change its internal state and/or send any number and type of messages to any service on any node, including itself.</li>
</ol>
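<p>The model above can be illustrated by a tiny single-threaded emulator: each service owns a FIFO mailbox, and a scheduler repeatedly picks some service with a nonempty mailbox and processes its head message. The sketch below is only an illustration of the model, not DAVE's actual implementation (messages are modeled as closures):</p>

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Illustrative sketch of the emulation model, not DAVE's real code.
struct Service
{
    std::deque<std::function<void()>> mailbox;  // FIFO queue of pending messages
};

struct Emulator
{
    std::vector<Service> services;

    explicit Emulator(std::size_t n) : services(n) {}

    // Asynchronous send: just enqueue into the destination mailbox.
    void send(std::size_t dst, std::function<void()> msg)
    {
        services[dst].mailbox.push_back(std::move(msg));
    }

    // One scheduler step: pick a service with a nonempty mailbox,
    // starting from `preferred`, and process its head message.
    bool step(std::size_t preferred)
    {
        for (std::size_t i = 0; i < services.size(); ++i)
        {
            std::size_t s = (preferred + i) % services.size();
            if (!services[s].mailbox.empty())
            {
                auto msg = std::move(services[s].mailbox.front());
                services[s].mailbox.pop_front();
                msg();  // the handler may send further messages
                return true;
            }
        }
        return false;  // all mailboxes are empty
    }
};
```

<p>A verifier like DAVE differs in that it systematically explores every possible choice of the next service at every step instead of following a single schedule.</p>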
<p>The service lifecycle is the following:</p>
<ol>
<li>Initially, the <code>Init</code> message arrives synchronously at each service to create and set the initial state. The service may set its internal state and/or send any message to any local and/or remote service.</li>
<li>The <code>Disconnect</code> message notifies about a remote node failure. <code>context().sourceNode</code> contains the node identifier of the failed node. That message is delivered asynchronously.</li>
</ol>
<p>To simplify the verification procedure and algorithm development, the following approach is used: a service cannot send a message to a destination service if the destination service has no method accepting that message. In that case the compiler generates a compilation error.</p>
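<p>In C++ this guarantee can fall out naturally from overload resolution: if sending a message simply calls the destination's <code>on</code> overload through a template, a missing handler becomes a compile error. The following is a hypothetical illustration of the idea, not DAVE's API:</p>

```cpp
// Hypothetical illustration: sending resolves to an `on` overload of the
// destination service, so a missing handler fails to compile.
struct Vote {};
struct Commit {};

struct Replica
{
    int votes = 0;
    int commits = 0;
    void on(const Vote&)   { ++votes; }
    void on(const Commit&) { ++commits; }
};

template<typename Service, typename Message>
void send(Service& dst, const Message& msg)
{
    dst.on(msg);  // compile error if Service has no matching `on` overload
}

// send(replica, 42);  // would not compile: Replica has no on(int)
```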
<p>The consensus verification procedure involves the following steps:</p>
<ol>
<li>DAVE initializes 3 nodes and their services.</li>
<li><code>Client</code> service issues the initial request, which contains the message to be committed. The number of client instances varies from 1 to 3, meaning that the number of simultaneous client messages to be agreed on varies from 1 to 3.</li>
<li>The client message <code>Carry</code> initiates the <code>Replob</code> service, starting the consensus algorithm.</li>
<li>Finally DAVE verifies the committed entries collected from all nodes:
<ol>
<li>The consensus algorithm must commit all client messages in the same order on every node.</li>
<li>A non-failed node must contain at least 1 committed message, because the first node cannot fail and always sends a message to be committed.</li>
<li>A failed node may contain any committed prefix of a non-failed node's sequence, including the empty prefix.</li>
</ol></li>
</ol>
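<p>The checks in step 4 can be expressed as a small pure function over the committed sequences collected from the nodes. The sketch below is a simplified illustration of those checks (integers stand in for committed messages):</p>

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch of the verification checks, assuming each node's
// committed entries are collected into a vector of message ids.
using Committed = std::vector<int>;

// A failed node may hold any prefix of the reference sequence,
// including the empty one.
bool isPrefix(const Committed& prefix, const Committed& full)
{
    if (prefix.size() > full.size())
        return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (prefix[i] != full[i])
            return false;
    return true;
}

// Non-failed nodes must agree on the same nonempty sequence;
// the failed node (if any) must hold a prefix of that sequence.
bool verify(const std::vector<Committed>& alive, const Committed* failed)
{
    if (alive.empty() || alive.front().empty())
        return false;  // at least one committed message is required
    for (const Committed& c : alive)
        if (c != alive.front())
            return false;  // agreement violation
    return failed == nullptr || isPrefix(*failed, alive.front());
}
```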
<p>The DAVE scheduler delivers messages asynchronously and tries to cover all asynchronous interleavings using a brute-force approach. The scheduler may choose any service with a nonempty mailbox and process its head message. In addition, any node except the first one may be disconnected, but only one of them. Thus at any time the number of available nodes is either 2 or 3.</p>
<h3>7.1 Verification Results</h3>
<p>The following cases should be considered and verified:</p>
<ol>
<li>1 concurrent client.</li>
<li>2 concurrent clients.</li>
<li>3 concurrent clients.</li>
</ol>
<p>Those cases cover all concurrent variants for 3 replicas.</p>
<h4>7.1.1 1 Concurrent Client</h4>
<p>DAVE provides the following statistics for that execution:</p>
<pre><code>global stats: iterations: 61036, disconnects: 54340
</code></pre>
<p>All verification checks passed.</p>
<h4>7.1.2 2 Concurrent Clients</h4>
<p>DAVE provides the following outputs:</p>
<pre><code>Verification failed: firstCommitted == nodeCommitted, must be agreement among the same
sequence of messages, node #0: {:112565}, node #2: {:112565, :112566}
Failed sequence: {0, 0, 2, 2, 2, 0, 0, 0, 0, 0}
invoking: Apply 0=>0 [trigger]
invoking: Apply 1=>1 [trigger]
invoking: Vote 0=>2 [trigger]
invoking: Vote 1=>2 [trigger]
invoking: node disconnection: 1 [disconnect]
invoking: Vote 1=>0 [trigger]
invoking: Vote 2=>0 [trigger]
invoking: Commit 2=>0 [trigger]
invoking: Vote 0=>2 [trigger]
invoking: Commit 0=>2 [trigger]
Max fails reached
global stats: iterations: 56283, disconnects: 50002
</code></pre>
<p><center>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEic_k8p9tGbY2JuPzI4zKlVQXejPzRlqtnuazA8DTG19AsaB2xeJvqFegPU9NLK0ENEEHC5rgFQBX5Mss6r9jK1nk32lxyn9JOL3NrPv05EVbHN267lbBQMG0KcQ6W6G2pNpe9GGEclbO8/s1600/cocainum.jpg" alt="alt text" title="Cocainum!"><br>
<small>"Cocainum"</small>
</center></p>
<p>Non-failed nodes #0 and #2 have different commit sequences because node #2 has an additional committed message with <code>id=112566</code>. Let's consider the output message sequence in detail:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjagSSEIaDSUeuVPkbfB_FDUNj0A6QJadDE1kQFfBShq5VuGBRzrLUkD3ypMHxojEbV5W1lsLcRnQs-ZLm4J1ImVQoYUeS7Ok_wv-LfDRXwcwRNS63uvBxmyor_L8YotNdioBAIV4uVeXk/s1600/diagram_sore2_dave.png" alt="alt text" title="Verification failure: 2 clients"></p>
<p><code>N:13</code> means that the <code>nodes_</code> variable contains replicas #1 and #3, while <code>N:123</code> contains the whole set of nodes. Red designates the replica failure event and the corresponding messages. <code>D:2</code> identifies the propagation of the failure of #2 to the corresponding node.</p>
<p>The diagram shows that nodes #1 and #3 have different committed sequences: <code>C:1</code> and <code>C:12</code> respectively. The reason is that the first replica changed its <code>nodes_</code> set and committed the sequence without waiting for the message from the failed node. Because the valid sets of replicas differ (<code>N:13</code> for #1 and <code>N:123</code> for #3), the committed sequences <code>C:1</code> and <code>C:12</code> differ accordingly.</p>
<h4>7.1.3 3 Concurrent Clients</h4>
<p>In this case DAVE found the following sequence:</p>
<pre><code>Verification failed: firstCommitted == nodeCommitted, must be agreement among the same
sequence of messages, node #0: {:304, :305, :306}, node #2: {:304, :306}
Failed sequence: {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}
invoking: Apply 0=>0 [trigger]
invoking: Apply 1=>1 [trigger]
invoking: Vote 1=>0 [trigger]
invoking: Vote 0=>1 [trigger]
invoking: Apply 2=>2 [trigger]
invoking: Vote 2=>0 [trigger]
invoking: Vote 2=>1 [trigger]
invoking: Commit 1=>0 [trigger]
invoking: Commit 0=>1 [trigger]
invoking: node disconnection: 1 [disconnect]
invoking: Vote 2=>0 [trigger]
invoking: Vote 0=>2 [trigger]
invoking: Commit 2=>0 [trigger]
invoking: Vote 1=>2 [trigger]
invoking: Commit 0=>2 [trigger]
invoking: Commit 1=>2 [trigger]
Max fails reached
global stats: iterations: 102, disconnects: 91
</code></pre>
<p>Again, non-failed nodes #0 and #2 have unequal committed sequences. The following picture represents the sequence of operations:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoOslaz7_w3MuLLMJK4yZ0VlWyaPGN8f_-Dskzel4-ipea1HFW-nu9A9G6HjgSFICXwmIxx-y3qDQzcRqVjgbQAaVf85zio3EKg8bMvFzUYXuPI1HuaMbtX1HAErbewtC42FJ9gW8_zyU/s1600/diagram_sore3_dave.png" alt="alt text" title="Verification failure: 3 clients"></p>
<p>The same cause leads to the same consequences: differences in the node groups produce inconsistent committed sequences.</p>
<h2>8 Calm Masterless Consensus Algorithm</h2>
<p>The calm algorithm resolves the issues found above. The main differences are:</p>
<ol>
<li>The logic splits <code>Voted</code> state into <code>MayCommit</code> and <code>CannotCommit</code> states.</li>
<li><code>MayCommit</code> allows the replica to commit the set of client messages <code>CarrySet</code> once it receives the same <code>Vote</code> messages from all alive replicas.</li>
<li>Otherwise the replica switches the state to <code>CannotCommit</code> and retries the voting process.</li>
</ol>
<h3>8.1 State Diagram</h3>
<p>The following diagram represents the logic described above:</p>
<p><center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsmY3JRh697rtruRs1cd2IFRvYqY-UPf3K-J2re5Qrle-YkRJESqqXMmYSAWNpPwuWq04_fzWXN6g69ncHYOfrkSDca1GF3eQShl1yYHXPSW0RJq_u1V3DHMcEOE0mDsw6ENfPOnFHmnk/s1600/state_calm.png" alt="alt text"></center></p>
<h3>8.2 Code</h3>
<p>The code below represents the final verified version of the "calm" algorithm:</p>
<pre><code class="cpp"><span class="keyword">struct</span> ReplobCalm
{
<span class="keyword">enum</span> <span class="keyword">struct</span> State
{
ToVote,
MayCommit,
CannotCommit,
Completed,
};
<span class="comment">// the following methods utilize the same logic as before</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Carry& msg);
<span class="keyword">void</span> on(<span class="keyword">const</span> Commit& commit);
<span class="keyword">void</span> on(<span class="keyword">const</span> Disconnect&);
<span class="comment">// the heart of the verified "calm" algorithm</span>
<span class="keyword">void</span> on(<span class="keyword">const</span> Vote& vote)
{
<span class="comment">// committed? => skip</span>
<span class="keyword">if</span> (state_ == State::Completed)
<span class="keyword">return</span>;
<span class="comment">// the vote does not belong to the group? => skip it</span>
<span class="keyword">if</span> (nodes_.count(context().sourceNode) == <span class="number">0</span>)
<span class="keyword">return</span>;
<span class="keyword">if</span> (state_ == State::MayCommit && carries_ != vote.carrySet)
{
<span class="comment">// if there are changes in client messages =></span>
<span class="comment">// we cannot commit and need to restart the procedure</span>
state_ = State::CannotCommit;
}
<span class="comment">// combine messages from other replicas</span>
carries_ |= vote.carrySet;
<span class="comment">// combine votes from source and destination</span>
voted_ |= context().sourceNode;
voted_ |= context().currentNode;
<span class="keyword">if</span> (nodes_ != vote.nodesSet)
{
<span class="comment">// group has been changed =></span>
<span class="comment">// remove node from the group</span>
<span class="keyword">if</span> (state_ == State::MayCommit)
{
<span class="comment">// we cannot commit at this step if the group has been changed</span>
state_ = State::CannotCommit;
}
nodes_ &= vote.nodesSet;
voted_ &= vote.nodesSet;
}
<span class="keyword">if</span> (voted_ == nodes_)
{
<span class="comment">// received replies from all available replicas</span>
<span class="keyword">if</span> (state_ == State::MayCommit)
{
<span class="comment">// may commit? => commit!</span>
on(Commit{carries_});
<span class="keyword">return</span>;
}
<span class="keyword">else</span>
{
<span class="comment">// otherwise restart the logic</span>
state_ = State::ToVote;
}
}
<span class="keyword">if</span> (state_ == State::ToVote)
{
<span class="comment">// initially we broadcast our internal state</span>
state_ = State::MayCommit;
broadcast(Vote{carries_, nodes_});
}
}
<span class="keyword">private</span>:
State state_ = State::ToVote;
NodesSet nodes_;
NodesSet voted_;
CarrySet carries_;
};
</code></pre>
<p>The main difference is the following: on any change of <code>nodes_</code> or <code>carries_</code>, the algorithm forbids entering the commit stage and restarts the voting process from the beginning.</p>
<h3>8.3 Examples</h3>
<h4>8.3.1 1 Concurrent Client</h4>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtfDcKRGCPDi7_ENa-BfGqjZzMRwqb7dkxUY6h3CSVfDddqYTlnrfl6zf9tRVmgfmk6eHrAiry3nkKP9o_0AmDCu8krv-MAPStPZH1-07aqulLqimlr04D4mmhoXE5SkK7V3b-Jupoaqc/s1600/diagram_calm1.png" alt="alt text" title="1 client"></p>
<p>This diagram is similar to the previous algorithm diagram.</p>
<p>Characteristics:</p>
<ul>
<li>Round trips: 1</li>
<li>Messages:
<ul>
<li>Client: 1</li>
<li>Vote: 6 (6 per single client message)</li>
<li>Commit: 6 (6 per single client message)</li>
</ul></li>
</ul>
<h4>8.3.2 2 Concurrent Clients</h4>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiI56OLEWuWG6jsRtDGRPM-hj8ChXDDx5AqZYEYhgnsm2VWW6kCXJ7FrQq1qK54QGS080baZ7ssQ-HR102dODIHx8Y9BugzuMsMtqjvl_miSz4tHHIUvJW1Qfhd3oBzUa3n730A8szl3o4/s1600/diagram_calm2.png" alt="alt text" title="2 clients"></p>
<p>The main difference is the increased number of round trips, from 1 to 1.5, required to commit the sequence. The reason is that each replica has to go through the state <code>V:123 C:12</code> with <code>State::CannotCommit</code> before committing, to ensure consistency and correct propagation of the state across all replicas.</p>
<p>Characteristics:</p>
<ul>
<li>Round trips: 1.5</li>
<li>Messages:
<ul>
<li>Client: 2</li>
<li>Vote: 12 (6 per single client message)</li>
<li>Commit: 6 (3 per single client message)</li>
</ul></li>
</ul>
<h4>8.3.3 3 Concurrent Clients</h4>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTHi_KicCkNQAeIad0avpi5ZJNg7ZNC9l_ow31vzpWVXD1DX9d2RVtrmHopFMcKFEttBm5d3g6Ht0iCa0soEY0irdl-taWr6V49sNLxeEgqaI02zSwNcZk7IX-6fvBub9nJXZWbIrnyI4/s1600/diagram_calm3.png" alt="alt text" title="3 clients"></p>
<p>The case is similar: the number of round trips increases by 0.5, from 0.5 to 1, for the same reason: each replica has to go through the state <code>V:123 C:123</code> with <code>State::CannotCommit</code>.</p>
<p>Characteristics:</p>
<ul>
<li>Round trips: 1</li>
<li>Messages:
<ul>
<li>Client: 3</li>
<li>Vote: 12 (4 per single client message)</li>
<li>Commit: 6 (2 per single client message)</li>
</ul></li>
</ul>
<h3>8.4 Overall Characteristics</h3>
<p>The table below presents the characteristics and the number of different messages under typical conditions:</p>
<table><thead>
<tr>
<th align="center">Concurrent Client Messages</th>
<th align="right">Round Trips</th>
<th align="right">Total Votes</th>
<th align="right">Total Commits</th>
<th align="right">Votes per Client</th>
<th align="right">Commits per Client</th>
<th align="right">Total Messages per Client</th>
</tr>
</thead><tbody>
<tr>
<td align="center">1</td>
<td align="right">1</td>
<td align="right">6</td>
<td align="right">6</td>
<td align="right">6</td>
<td align="right">6</td>
<td align="right"><em>12</em></td>
</tr>
<tr>
<td align="center">2</td>
<td align="right">1.5</td>
<td align="right">12</td>
<td align="right">6</td>
<td align="right">6</td>
<td align="right">3</td>
<td align="right"><em>9</em></td>
</tr>
<tr>
<td align="center">3</td>
<td align="right">1</td>
<td align="right">12</td>
<td align="right">6</td>
<td align="right">4</td>
<td align="right">2</td>
<td align="right"><em>6</em></td>
</tr>
</tbody></table>
<p>The algorithm demonstrates a nonobvious feature: increasing concurrency decreases the number of messages per client required to commit the client messages.</p>
<h3>8.5 Calm Verification Results</h3>
<p><a href="https://github.com/gridem/DAVE">DAVE</a> executions showed that all tests passed successfully. The table below presents statistics collected from different DAVE runs:</p>
<table><thead>
<tr>
<th align="center">Concurrent Messages</th>
<th align="left">Set of Nodes Accepted Client Messages</th>
<th align="right">Total Verified Variants</th>
</tr>
</thead><tbody>
<tr>
<td align="center">1</td>
<td align="left">#0</td>
<td align="right">59 986</td>
</tr>
<tr>
<td align="center">2</td>
<td align="left">#0, #1</td>
<td align="right">148 995 211</td>
</tr>
<tr>
<td align="center">3</td>
<td align="left">#0, #1, #2</td>
<td align="right">734 368 600</td>
</tr>
</tbody></table>
<h2>9 Discussion</h2>
<p>The masterless approach, by definition, means that at any time there is no master. Thus any node is responsible for propagating the client request to the replica group, which is achieved by broadcasting the state to the destination nodes. As a result, the algorithm eliminates the failover logic entirely, which is crucial because failover is one of the most complex parts of other generations of consensus algorithms.</p>
<p>As a consequence, a client may send a request to any node. Thus the algorithm is ready for geo-distributed datacenter replication, because the client may choose the closest node that belongs to the same datacenter. The number of round trips required to synchronize and agree on the same sequence is approximately 1 in the most common cases. This makes the algorithm suitable for a wide range of applications, including distributed storage, persistent queues, real-time data analysis, and online data processing.</p>
<p>Additionally, the broadcast design tolerates a variety of network failures, providing a robust data-exchange model. Intermittent connectivity issues do not affect the replicas, and a commit can easily be propagated and accepted by each node.</p>
<p>Finally, the algorithm supports smooth replica degradation, meaning that the group may dynamically shrink due to node failures or various network issues. For example, the algorithm is capable of preserving safety under a network split. The whole spectrum of failures and the corresponding algorithm changes will be considered in a subsequent article.</p>
<h2>10 Conclusion</h2>
<p>The masterless approach is a completely new generation of consensus algorithms, and at the same time the simplest known consensus algorithm. It reduces the number of node roles and makes replicas indistinguishable, allowing the logic to be unified. That unification significantly reduces the algorithm's complexity, providing a straightforward symmetric implementation based on broadcasting messages to each replica within the group. Moreover, the masterless algorithm partially incorporates group membership changes, providing smooth degradation of the replica set.</p>
<p>The described "calm" masterless algorithm is not the only approach. There is a variety of algorithms with different and distinct features. Stay tuned!</p>
<p><center><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFqk96OGrvpRSqs18kC54yY6Qq60ln_YP1NzxscEjbZFryutLffcuthZSJEenWQDoj6kfNlB0PT2PwWJpeoDmvAE4aeA0P3sz7xtumMXvYuq4bQEVZYcC681piLLOsSQS8cKO7dR2QqXw/s1600/red_cool.jpg" alt="alt text" title="Verified!"></center></p>
<h2>References</h2>
<p>[1] Article: <a href="http://zoo.cs.yale.edu/classes/cs426/2012/bib/fischer83consensus.pdf">The Consensus Problem in Unreliable Distributed Systems (A Brief Survey), Michael J. Fischer</a>.</p>
<p>[2] Article: <a href="https://cs.brown.edu/%7Emph/HerlihyW90/p463-herlihy.pdf">Linearizability: A Correctness Condition for Concurrent Objects, Maurice P. Herlihy etc</a>.</p>
<p>[3] Wikipedia: <a href="https://en.wikipedia.org/wiki/State_machine_replication">State machine replication</a>.</p>
<p>[4] Article: <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf">Paxos Made Simple, Leslie Lamport</a>.</p>
<p>[5] Article: <a href="http://diyhpl.us/%7Ebryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf">A simple totally ordered broadcast protocol, Benjamin Reed etc</a>.</p>
<p>[6] Article: <a href="https://ramcloud.stanford.edu/raft.pdf">In Search of an Understandable Consensus Algorithm (Extended Version), Diego Ongaro</a>.</p>
<p>[7] Article: <a href="https://www.cs.cmu.edu/%7Edga/papers/epaxos-sosp2013.pdf">There Is More Consensus in Egalitarian Parliaments, Iulian Moraru etc</a>.</p>
<p>[8] Article: <a href="http://www.hyflow.org/pubs/opodis14-alvin.pdf">Be General and Don’t Give Up Consistency in Geo-Replicated Transactional Systems, Alexandru Turcu etc</a>.</p>
<p>[9] Documentation: <a href="https://raft.github.io">The Raft Consensus Algorithm</a>.</p>
<p>[10] Github: <a href="https://github.com/gridem/DAVE">Distributed Asynchronous Verification Emulator aka DAVE</a>.</p>
<p><small>Grigory Demchenko</small></p>
<h2>Replicated Object. Part 2: God Adapter</h2>
<h2>1 Annotation</h2>
<p>The article introduces a special adapter that allows a developer to wrap any object into another one with any additional features you want. Adapted objects share the same interface, thus they are completely transparent from the usage point of view. The generic concept will be introduced step-by-step using simple but powerful examples.</p>
<h2>2 Introduction</h2>
<p><strong>Disclaimer</strong>. If you cannot tolerate C++ perversions, please stop reading this article.</p>
<p>The term <em>god adapter</em> originates from <em><a href="https://en.wikipedia.org/wiki/God_object">god object</a></em>, an object that implements many features. The same idea applies to the <em>god adapter</em>: such an adapter has outstanding responsibility and includes features that you can, or even cannot, imagine.</p>
<a name='more'></a>
<h2>3 Problem Statement</h2>
<p>Recently I presented the <a href="http://gridem.blogspot.com/2014/06/c-user-group-in-st-petersburg-21-june.html"><em>smart mutex</em></a> concept to simplify shared data access. The idea was simple: associate a mutex with the data and automatically invoke <code>lock</code> and <code>unlock</code> on any data access. The code looks like the following:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Data
{
<span class="keyword">int</span> get() <span class="keyword">const</span>
{
<span class="keyword">return</span> val_;
}
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">int</span> v)
{
val_ = v;
}
<span class="keyword">private</span>:
<span class="keyword">int</span> val_ = <span class="number">0</span>;
};
<span class="comment">// declare smart mutex</span>
SmartMutex<Data> d;
<span class="comment">// set value, lock and unlock will be taken automatically</span>
d-><span class="built_in">set</span>(<span class="number">4</span>);
<span class="comment">// get value</span>
<span class="built_in">std</span>::<span class="built_in">cout</span> << d->get() << <span class="built_in">std</span>::endl;
</code></pre>
<p>There are several problems.</p>
<h3>3.1 Locking Time</h3>
<p>The lock is held for the duration of the current full expression. Let's consider the following line:</p>
<pre><code class="cpp"><span class="built_in">std</span>::<span class="built_in">cout</span> << d->get() << <span class="built_in">std</span>::endl;
</code></pre>
<p>Unlock is called only after the whole expression has executed, including the output to <code>std::cout</code>. This wastes time under the lock and significantly increases the probability of lock contention.</p>
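<p>The reason is how <code>operator-></code> is typically implemented in such wrappers: it returns a temporary proxy that locks in its constructor and unlocks in its destructor, and C++ destroys that temporary only at the end of the full expression. The sketch below shows the assumed mechanism (it is not the actual SmartMutex source):</p>

```cpp
#include <mutex>

// Minimal sketch of the assumed SmartMutex mechanism: operator->
// returns a temporary proxy whose lifetime spans the full expression.
template<typename T>
struct SmartMutex
{
    struct Proxy
    {
        T* obj;
        std::mutex* m;
        Proxy(T* o, std::mutex* mx) : obj(o), m(mx) { m->lock(); }
        ~Proxy() { m->unlock(); }  // runs at the end of the full expression
        T* operator->() { return obj; }
    };

    // Relies on copy elision of the returned prvalue
    // (guaranteed since C++17).
    Proxy operator->() { return Proxy{&data_, &mutex_}; }

private:
    T data_{};
    std::mutex mutex_;
};
```

<p>Because the proxy is destroyed only when the full expression ends, <code>std::cout << d->get() << std::endl;</code> keeps the mutex locked through the whole output statement.</p>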
<h3>3.2 Deadlock possibility</h3>
<p>As a consequence of the first problem, there is a possibility of deadlock due to the implicit locking mechanism and expression-scoped locking. Let's consider the following code snippet:</p>
<pre><code class="cpp"><span class="keyword">int</span> sum(<span class="keyword">const</span> SmartMutex<Data>& x, <span class="keyword">const</span> SmartMutex<Data>& y)
{
<span class="keyword">return</span> x->get() + y->get();
}
</code></pre>
<p>It is not evident that the function potentially contains a deadlock: the <code>->get</code> methods may be evaluated in any order, so two concurrent calls with swapped <code>x</code> and <code>y</code> instances can acquire the locks in opposite orders.</p>
<p>Thus it would be better to avoid both the increased locking time and the mentioned deadlocks as much as possible.</p>
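<p>This is the classic lock-ordering hazard: one thread evaluates <code>x->get()</code> then <code>y->get()</code> while another evaluates <code>y->get()</code> then <code>x->get()</code>. When both locks really must be held together, the usual remedy is to acquire them atomically, for example with <code>std::lock</code>. The sketch below is generic and independent of SmartMutex:</p>

```cpp
#include <mutex>

// Generic sketch of the lock-ordering remedy, not part of SmartMutex.
struct Guarded
{
    int value = 0;
    std::mutex m;
};

// Locks both mutexes atomically, so two concurrent calls with swapped
// arguments cannot deadlock on opposite acquisition order.
int safeSum(Guarded& x, Guarded& y)
{
    std::unique_lock<std::mutex> lx{x.m, std::defer_lock};
    std::unique_lock<std::mutex> ly{y.m, std::defer_lock};
    std::lock(lx, ly);  // deadlock-avoidance algorithm
    return x.value + y.value;
}
```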
<h2>4 Solution</h2>
<p>The idea is quite simple: we need to incorporate the proxy functionality inside the call invocation. To further improve the user experience, we replace <code>-></code> with <code>.</code>.</p>
<p>Basically, we need to transform our <code>Data</code> into another object:</p>
<pre><code class="cpp"><span class="keyword">using</span> Lock = <span class="built_in">std</span>::unique_lock<<span class="built_in">std</span>::mutex>;
<span class="keyword">struct</span> DataLocked
{
<span class="keyword">int</span> get() <span class="keyword">const</span>
{
Lock _{mutex_};
<span class="keyword">return</span> data_.get();
}
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">int</span> v)
{
Lock _{mutex_};
data_.<span class="built_in">set</span>(v);
}
<span class="keyword">private</span>:
<span class="keyword">mutable</span> <span class="built_in">std</span>::mutex mutex_;
Data data_;
};
</code></pre>
<p>In that case the mutex acquire/release operations are controlled within method scope, which prevents the problems mentioned before.</p>
<p>But implementing it this way is inconvenient, because the basic idea of the smart mutex is to avoid additional boilerplate code. The desired way keeps the benefits of both approaches: less code and fewer problems. Thus I have to generalize the solution and extend it to wider usage scenarios.</p>
<h3>4.1 Generalized Adapter</h3>
<p>We need to somehow adapt our old mutex-free <code>Data</code> implementation into a mutex-based implementation that behaves like the <code>DataLocked</code> class. For that purpose, let's wrap the method call so it can be invoked later in another context:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T_base>
<span class="keyword">struct</span> DataAdapter : T_base
{
<span class="comment">// let's consider just set method</span>
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">int</span> v)
{
<span class="keyword">this</span>->call([v](Data& data) {
data.<span class="built_in">set</span>(v);
});
}
};
</code></pre>
<p>Here we postpone the call <code>data.set(v)</code> and transfer it to <code>T_base::call(lambda)</code> method. The possible implementation of <code>T_base</code> could be:</p>
<pre><code class="cpp"><span class="keyword">struct</span> MutexBase
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F>
<span class="keyword">void</span> call(F f)
{
Lock _{mutex_};
f(data_);
}
<span class="keyword">private</span>:
Data data_;
<span class="built_in">std</span>::mutex mutex_;
};
</code></pre>
<p>As you can see we split the monolith implementation of <code>DataLocked</code> class into two classes: <code>DataAdapter<T_base></code> and <code>MutexBase</code> as one of the possible base class for created adapter. But the actual implementation is very close: we hold the mutex during <code>Data.set(v)</code> call.</p>
<h3>4.2 More Generalization</h3>
<p>Let's further generalize our implementation. We have <code>MutexBase</code> implementation but it works only for <code>Data</code>. Let's solve this:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T_base, <span class="keyword">typename</span> T_locker>
<span class="keyword">struct</span> BaseLocker : T_base
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F>
<span class="keyword">auto</span> call(F f)
{
Lock _{lock_};
<span class="keyword">return</span> f(<span class="keyword">static_cast</span><T_base&>(*<span class="keyword">this</span>));
}
<span class="keyword">private</span>:
T_locker lock_;
};
</code></pre>
<p>Here are several generalizations:</p>
<ol>
<li>I don't use specific mutex implementation. You can use either <code>std::mutex</code> or any kind of <a href="http://en.cppreference.com/w/cpp/concept/BasicLockable"><code>BasicLockable</code> concept</a>.</li>
<li><code>T_base</code> represents the instance of the object with the same interface. It could be <code>Data</code> or event adapted <code>Data</code> object like <code>DataLocked</code>.</li>
</ol>
<p>Thus we can define:</p>
<pre><code class="cpp"><span class="keyword">using</span> DataLocked = DataAdapter<BaseLocker<Data, <span class="built_in">std</span>::mutex>>;
</code></pre>
<h3>4.3 I Need More Generalization</h3>
<p>I cannot stop myself. Sometimes I would like to transform the input parameters. For that purpose I modify the adapter:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T_base>
<span class="keyword">struct</span> DataAdapter : T_base
{
<span class="comment">// let's consider just set method</span>
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">int</span> v)
{
<span class="keyword">this</span>->call([](Data& data, <span class="keyword">int</span> v) {
data.<span class="built_in">set</span>(v);
}, v);
}
};
</code></pre>
<p>And <code>BaseLocker</code> implementation is transformed to:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T_base, <span class="keyword">typename</span> T_locker>
<span class="keyword">struct</span> BaseLocker : T_base
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F, <span class="keyword">typename</span>... V>
<span class="keyword">auto</span> call(F f, V&&... v)
{
Lock _{lock_};
<span class="keyword">return</span> f(<span class="keyword">static_cast</span><T_base&>(*<span class="keyword">this</span>), <span class="built_in">std</span>::forward<V>(v)...);
}
<span class="keyword">private</span>:
T_locker lock_;
};
</code></pre>
<h3>4.4 God Adapter</h3>
<p>Finally let's reduce the boilerplate code related to adapter. For that purpose I will use macro:</p>
<pre><code class="cpp"><span class="preprocessor">#define DECL_FN_ADAPTER(D_name) \</span>
<span class="keyword">template</span><<span class="keyword">typename</span>... V> \
<span class="keyword">auto</span> D_name(V&&... v) \
{ \
<span class="keyword">return</span> <span class="keyword">this</span>->call([](<span class="keyword">auto</span>& t, <span class="keyword">auto</span>&&... x) { \
<span class="keyword">return</span> t.D_name(<span class="built_in">std</span>::forward<<span class="keyword">decltype</span>(x)>(x)...); \
}, <span class="built_in">std</span>::forward<V>(v)...); \
}
</code></pre>
<p>It wraps any method with name <code>D_name</code>. The only needed action is to iterate through the object methods and wrap them individually:</p>
<pre><code class="cpp"><span class="preprocessor">#define DECL_FN_ADAPTER_ITERATION(D_r, D_data, D_elem) DECL_FN_ADAPTER(D_elem)</span>
<span class="preprocessor">#define DECL_ADAPTER(D_type, ...) \</span>
<span class="keyword">template</span><<span class="keyword">typename</span> T_base> \
<span class="keyword">struct</span> Adapter<BOOST_PP_REMOVE_PARENS(D_type), T_base> : T_base \
{ \
BOOST_PP_LIST_FOR_EACH(DECL_FN_ADAPTER_ITERATION, , \
BOOST_PP_TUPLE_TO_LIST((__VA_ARGS__))) \
};
</code></pre>
<p>Now we can adapt our <code>Data</code> by using just a single line:</p>
<pre><code class="cpp">DECL_ADAPTER(Data, get, <span class="built_in">set</span>)
<span class="comment">// syntactic sugar for mutex-based adapter</span>
<span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_locker = <span class="built_in">std</span>::mutex, <span class="keyword">typename</span> T_base = T>
<span class="keyword">using</span> AdaptedLocked = Adapter<T, BaseLocker<T_base, T_locker>>;
<span class="keyword">using</span> DataLocked = AdaptedLocked<Data>;
</code></pre>
<p>That's it!</p>
<h2>5 Examples</h2>
<p>We considered mutex-based adapter. Let's consider other interesting adapters.</p>
<h3>5.1 Reference Counting Adapter</h3>
<p>Sometimes we need to use <code>shared_ptr</code> for our objects. And it would be better to hide this behavior from user: instead of using <code>operator-></code> you would like to use just <code>.</code>. The implementation is very simple:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T>
<span class="keyword">struct</span> BaseShared
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F, <span class="keyword">typename</span>... V>
<span class="keyword">auto</span> call(F f, V&&... v)
{
<span class="keyword">return</span> f(*shared_, <span class="built_in">std</span>::forward<V>(v)...);
}
<span class="keyword">private</span>:
<span class="built_in">std</span>::<span class="built_in">shared_ptr</span><T> shared_;
};
<span class="comment">// helper class to create BaseShared object</span>
<span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_base = T>
<span class="keyword">using</span> AdaptedShared = Adapter<T, BaseShared<T_base>>;
</code></pre>
<p>Usage:</p>
<pre><code class="cpp"><span class="keyword">using</span> DataRefCounted = AdaptedShared<Data>;
DataRefCounted data;
data.<span class="built_in">set</span>(<span class="number">2</span>);
</code></pre>
<h3>5.2 Adapters Combining</h3>
<p>Sometimes it's a good idea to share the data between threads. The common pattern is to combine <code>shared_ptr</code> with <code>mutex</code>. <code>shared_ptr</code> resolves the issues with object lifetime while <code>mutex</code> is used to avoid race conditions.</p>
<p>Because every adapted object has the same interface as original one we can simply combine several adapters together:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_locker = <span class="built_in">std</span>::mutex, <span class="keyword">typename</span> T_base = T>
<span class="keyword">using</span> AdaptedSharedLocked = AdaptedShared<T, AdaptedLocked<T, T_locker, T_base>>;
</code></pre>
<p>With usage:</p>
<pre><code class="cpp"><span class="keyword">using</span> DataRefCountedWithMutex = AdaptedSharedLocked<Data>;
DataRefCountedWithMutex data;
<span class="comment">// data instance can be copied, shared and used across threads safely</span>
<span class="comment">// interface remains the same</span>
<span class="keyword">int</span> v = data.get();
</code></pre>
<h3>5.3 Asynchronous Example: From Callback to Future</h3>
<p>Let's go to future. E.g. we have the following interface:</p>
<pre><code class="cpp"><span class="keyword">struct</span> AsyncCb
{
<span class="keyword">void</span> async(<span class="built_in">std</span>::function<<span class="keyword">void</span>(<span class="keyword">int</span>)> cb);
};
</code></pre>
<p>But we would like to use:</p>
<pre><code class="cpp"><span class="keyword">struct</span> AsyncFuture
{
Future<<span class="keyword">int</span>> async();
};
</code></pre>
<p>Where <code>Future</code> has the following interface:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T>
<span class="keyword">struct</span> Future
{
<span class="keyword">struct</span> Promise
{
Future future();
<span class="keyword">void</span> put(<span class="keyword">const</span> T& v);
};
<span class="keyword">void</span> then(<span class="built_in">std</span>::function<<span class="keyword">void</span>(<span class="keyword">const</span> T&)>);
};
</code></pre>
<p>Corresponding adapter is:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T_base, <span class="keyword">typename</span> T_future>
<span class="keyword">struct</span> BaseCallback2Future : T_base
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F, <span class="keyword">typename</span>... V>
<span class="keyword">auto</span> call(F f, V&&... v)
{
<span class="keyword">typename</span> T_future::Promise promise;
f(<span class="keyword">static_cast</span><T_base&>(*<span class="keyword">this</span>), <span class="built_in">std</span>::forward<V>(v)..., [promise](<span class="keyword">auto</span>&& val) <span class="keyword">mutable</span> {
promise.put(<span class="built_in">std</span>::move(val));
});
<span class="keyword">return</span> promise.future();
}
};
<span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_future, <span class="keyword">typename</span> T_base = T>
<span class="keyword">using</span> AdaptedCallback = Adapter<T, BaseCallback2Future<T_base, T_future>>;
</code></pre>
<p>Usage:</p>
<pre><code class="cpp">DECL_ADAPTER(AsyncCb, async)
<span class="keyword">using</span> AsyncFuture = AdaptedCallback<AsyncCb, Future<<span class="keyword">int</span>>>;
AsyncFuture af;
af.async().then([](<span class="keyword">int</span> v) {
<span class="comment">// obtained value</span>
});
</code></pre>
<h3>5.4 Asynchronous Example: From Future to Callback</h3>
<p>Because it directs us to the past let it be the home task.</p>
<h3>5.5 Lazy</h3>
<p>Developers are lazy. Let's adapt any object to be consistent with developers.</p>
<p>In that context laziness means on-demand object creation. Let's consider the following example:</p>
<pre><code class="cpp"><span class="keyword">struct</span> Obj
{
Obj();
<span class="keyword">void</span> action();
};
Obj obj; <span class="comment">// Obj::Obj ctor is invoked</span>
obj.action(); <span class="comment">// Obj::action is invoked</span>
AdaptedLazy<Obj> obj; <span class="comment">// ctor is not called!</span>
obj.action(); <span class="comment">// Obj::Obj and Obj::action are invoked</span>
</code></pre>
<p>Therefore the idea is to avoid creation as later as possible. If the user decided to use the object we have to create it and invoke appropriate method. The base class implementation could be:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T>
<span class="keyword">struct</span> BaseLazy
{
<span class="keyword">template</span><<span class="keyword">typename</span>... V>
BaseLazy(V&&... v)
{
<span class="comment">// lambda to lazily create the object</span>
state_ = [v...] {
<span class="keyword">return</span> T{<span class="built_in">std</span>::move(v)...};
};
}
<span class="keyword">protected</span>:
<span class="keyword">using</span> Creator = <span class="built_in">std</span>::function<T()>;
<span class="keyword">template</span><<span class="keyword">typename</span> F, <span class="keyword">typename</span>... V>
<span class="keyword">auto</span> call(F f, V&&... v)
{
<span class="keyword">auto</span>* t = boost::get<T>(&state_);
<span class="keyword">if</span> (t == <span class="keyword">nullptr</span>)
{
<span class="comment">// if we don't have instantiated object</span>
<span class="comment">// => create it</span>
state_ = boost::get<Creator>(state_)();
t = boost::get<T>(&state_);
}
<span class="keyword">return</span> f(*t, <span class="built_in">std</span>::forward<V>(v)...);
}
<span class="keyword">private</span>:
<span class="comment">// variant reuses memory to store either object state</span>
<span class="comment">// or lambda to create the object</span>
boost::variant<Creator, T> state_;
};
<span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_base = T>
<span class="keyword">using</span> AdaptedLazy = Adapter<T, BaseLazy<T_base>>;
</code></pre>
<p>And now we can create heavy-weight lazy object and create it only if it's necessary. It's completely transparent to the user.</p>
<h2>6 Performance Overhead</h2>
<p>Let's consider the performance penalty from using the adapter. The thing is that we use lambdas and transfer them to other objects. Thus we would like to know the overhead of such adapters.</p>
<p>For that purpose let's consider simple example: wrap object call by using object itself meaning that we create "nullable" adapter and try to measure overhead. And instead of doing direct measurements let's see just assembler output from different compilers.</p>
<p>First, let's create simple version of our adapter to deal with <code>on</code> methods only:</p>
<pre><code class="cpp"><span class="preprocessor">#include <utility></span>
<span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_base>
<span class="keyword">struct</span> Adapter : T_base
{
<span class="keyword">template</span><<span class="keyword">typename</span>... V>
<span class="keyword">auto</span> on(V&&... v)
{
<span class="keyword">return</span> <span class="keyword">this</span>->call([](<span class="keyword">auto</span>& t, <span class="keyword">auto</span>&&... x) {
<span class="keyword">return</span> t.on(<span class="built_in">std</span>::forward<<span class="keyword">decltype</span>(x)>(x)...);
}, <span class="built_in">std</span>::forward<V>(v)...);
}
};
</code></pre>
<p><code>BaseValue</code> is our nullable base class to invoke methods directly from the same type <code>T</code>:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T>
<span class="keyword">struct</span> BaseValue
{
<span class="keyword">protected</span>:
<span class="keyword">template</span><<span class="keyword">typename</span> F, <span class="keyword">typename</span>... V>
<span class="keyword">auto</span> call(F f, V&&... v)
{
<span class="keyword">return</span> f(t, <span class="built_in">std</span>::forward<V>(v)...);
}
<span class="keyword">private</span>:
T t;
};
</code></pre>
<p>And here is our test class:</p>
<pre><code class="cpp"><span class="keyword">struct</span> X
{
<span class="keyword">int</span> on(<span class="keyword">int</span> v)
{
<span class="keyword">return</span> v + <span class="number">1</span>;
}
};
<span class="comment">// reference function without overhead</span>
<span class="keyword">int</span> f1(<span class="keyword">int</span> v)
{
X x;
<span class="keyword">return</span> x.on(v);
}
<span class="comment">// adapted function to be compared to the reference function</span>
<span class="keyword">int</span> f2(<span class="keyword">int</span> v)
{
Adapter<X, BaseValue<X>> x;
<span class="keyword">return</span> x.on(v);
}
</code></pre>
<p>Below you can find results obtained from <a href="https://gcc.godbolt.org/">online compiler</a>:</p>
<p><strong>GCC 4.9.2</strong></p>
<pre><code class="x86asm">f1(int):
leal 1(%rdi), %eax
ret
f2(int):
leal 1(%rdi), %eax
ret
</code></pre>
<p><strong>Clang 3.5.1</strong></p>
<pre><code class="x86asm">f1(int): # @f1(int)
leal 1(%rdi), %eax
retq
f2(int): # @f2(int)
leal 1(%rdi), %eax
retq
</code></pre>
<p>As you can see there is no difference between <code>f1</code> and <code>f2</code> meaning that compilers are able to optimize and completely eliminate overhead related to lambda object creation.</p>
<h2>7 Conclusion</h2>
<p>I introduced the adapter that allows you to transform object into another object with additional features that provides the same interface with minimal overhead. Base adapter classes are universal transformers that could be applied to any object. They are used to enhance and further extend adapter functionality. Different combination of base classes allows easily creating very complex objects without additional efforts.</p>
<p>This powerful technique will be used and extended in subsequent articles.</p>
<h2>References</h2>
<p><a href="https://github.com/gridem/GodAdapter">github.com/gridem/GodAdapter</a></p>
<p><a href="https://bitbucket.org/gridem/godadapter">bitbucket.org/gridem/godadapter</a></p>
Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-81198959026805618422015-09-20T06:38:00.002-07:002017-11-16T10:06:52.020-08:00Replicated Object. Part 1: Introduction<h2>1 Abstract</h2>
<p>The present article explains an early prototype that introduces the concept of <em>replicated object</em> or <em>replob</em>. Such object is a further rethinking how to deal with complexity related to distributed systems development. Replob eliminates the dependency on the external reliable service and incorporates the consistent data manipulation into the user-defined objects representing data and related functionality. The idea is based on using the power of C++ language and object-oriented programming that allows complex logic utilization within distributed transactions and significantly simplifies development of the reliable applications and services. Subsequent articles will explain presented approach in detail step-by-step.</p>
<h2>2 Introduction</h2>
<p><strong>Disclaimer</strong>. Almost all methods specified in the article contain dirty memory hacks and abnormal usage of C++ language. So if you are not tolerant to system and C++ perversions please stop reading this article.</p>
<p>Today, topics related to distributed systems are one of the most interesting and attract many people including developers and computer scientists. The popularity can be explained in a simple manner: we need to create robust fault-tolerant systems that provide safe environment to perform execution of operations and data storing.</p>
<p>Along with that, the consistency of distributed system plays important role. It comes with a price if you want to have stronger notion of consistency level. There are a set of systems provides a weakest form of consistency: so called eventual consistency. While those systems have relatively good performance they cannot be used in many areas where you need to have transactional semantics for your operations. The thing is that it is much simpler to meditate and reason about a system under consideration using one of the strong forms of consistency like <em>strict consistency</em> or <em>linearizability</em>. Due to those consistency levels, it is much easier to develop reliable application with safe semantics of operations.</p>
<a name='more'></a>
<h2>3 Overview</h2>
<p>The most common way developing a distributed system is using special building blocks. Those building blocks should provide the convenient way to deal with a complexity related to asynchronous nature of distributed services and a various types of failures including networking issues, process crashes and hardware malfunction. In distributed environments, those failures should not be treated as exceptional and must be handled as a normal code execution. Thus the task of having reliable and consistent building block to deal with a distributed issues is appeared on the scene.</p>
<p>Today's systems use fault-tolerant centralized coordination services like <a href="http://zookeeper.apache.org/">Zookeeper</a> (mostly) or <a href="https://coreos.com/etcd/">etcd</a> (still under active development). They use consensus-based algorithms like <a href="https://web.stanford.edu/class/cs347/reading/zab.pdf">Zab</a> (Zookeeper) or <a href="https://ramcloud.stanford.edu/raft.pdf">Raft</a> (etcd) to provide <em>linearizability</em>. The idea here is the following. At the first stage the leader is elected and at the second stage the designated leader (master) commits the messages in a sequential order providing necessary consistency level. Whereas <a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos">Zookeeper documentation</a> states that Zookeeper uses primary-backup instead of <a href="https://en.wikipedia.org/wiki/State_machine_replication">state machine replication</a>, it is evident that the only difference between those notions that primary-backup based on replica sequence of requests while state machine replication based on client sequence. I think that the only matters the fact that they agree upon the <em>sequence of deterministic operations</em> using the developed <em>master-based consensus</em> algorithms.</p>
<h2>4 Discussion of Existent Approaches</h2>
<p>The drawback of <em>master-based consensus</em> algorithm is obvious: when the master fails it requires a time period to handle committing messages. The master timeout cannot be very small because it can have negative impact on performance due to high probability of new master election. It cannot be very large either due to significantly increasing latency during master failure. Thus the actual timeout is a tradeoff between latency and reelection probability depending on network conditions and replicas performance. The performance of consensus algorithm strictly depends on the master liveness and it takes significant time to restore the operability and efficiency due to timeout period and consistency preserving logic. Such logic requires at least several round trips, agreement on uncommitted entries and it does not guarantee the convergence with limited amount of round trips because almost every participant may become a new master (leader). So the system may become unavailable for a relatively long period:</p>
<ol>
<li><a href="http://research.google.com/archive/chubby-osdi06.pdf">Chubby</a>: most outages were 15s or less, and 52 were under 30s.</li>
<li><a href="http://docs.mongodb.org/manual/faq/replica-sets/#how-long-does-replica-set-failover-take">MongoDB</a>: it varies, but a replica set will select a new primary within a minute... During the election, the cluster is unavailable for writes.</li>
<li><a href="https://aphyr.com/posts/291-call-me-maybe-zookeeper">Zookeeper</a>: After 15 seconds or so, a new leader is elected in the majority component, and writes may proceed again. However, only the clients which can see one of [n3 n4 n5] can write: clients connected to [n1 n2] time out while waiting to make contact with the leader.</li>
</ol>
<h3>4.1 Transactional Semantics and Complex Scenarios</h3>
<p>One of the most difficult challenges is to apply transactional semantics for the complex logic. Let's assume that we have reliable storage like Zookeeper and we would like to perform the following sequence of operations:</p>
<ol>
<li>Load some portion of data from the storage into memory to deal with.</li>
<li>Apply complex logic to process the data and obtain the result.</li>
<li>Save the result to the storage.</li>
</ol>
<p>This scenario could be solved by applying several approaches.</p>
<h4>4.1.1 Pessimistic Locking Scheme</h4>
<p>Pessimistic locking or concurrency control scheme based on explicit locking mechanism like using mutex for multithreaded applications. Task mentioned above could be solved by applying the following sequence of operations:</p>
<ol>
<li>Obtain exclusive lock to perform the operations.</li>
<li>Perform operations mentioned above (load, apply and save).</li>
<li>Release the lock.</li>
</ol>
<p>The disadvantage of that scheme is derived from the exclusive locking mode:</p>
<ol>
<li>Mutual exclusion increases waiting times needed to propagate lock/unlock actions. Therefore, it increases overall operations latency.</li>
<li>In case of process failure we potentially can have inconsistent data (fortunately, Zookeeper has multi update functionality to apply all results atomically on the final stage). It requires considerable amount of time to spread the process failure knowledge to the system to be able to release the obtained lock.</li>
</ol>
<p>I would like to emphasize that the systems like Zookeeper do not have explicit lock/unlock functionality. One has to use special <a href="http://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_recipes_Locks">lock recipe</a> to be able to utilize pessimistic locking scheme. It introduces additional penalty on the overall transaction latency (see also: <a href="http://infoscience.epfl.ch/record/181690/files/OpenAPI.pdf">Addressing the ZooKeeper Synchronization Inefficiency</a>).</p>
<p>Due to mentioned issues the second approach appears on the scene.</p>
<h4>4.1.2 Optimistic Locking Scheme</h4>
<p>Optimistic scheme tries to get around the performance issues from the previous approach. The idea is to verify the actual state of data before committing:</p>
<ol>
<li>Load the state of data under consideration from the storage.</li>
<li>Apply complex logic locally and create batch of writes.</li>
<li>Atomically verify that no other transaction has changed the data and apply batch of writes.</li>
<li>If verification fails => repeat from the 1st step.</li>
</ol>
<p>All action on the 3rd step must be executed atomically including verification and applying. This scheme can be implemented by using the incremental version counter: on any successful update operation we increase the counter by one. The idea is to apply <a href="https://en.wikipedia.org/wiki/Compare-and-swap">compare-and-swap operation</a> that atomically checks the version counter to verify that the data has not been changed and sets the new value.</p>
<p>This scheme still has the following drawbacks:</p>
<ol>
<li><strong>Implementation complexity</strong>: service must implement CAS and batch writes operations as a single atomic operation.</li>
<li><strong>High cost on contention</strong>: when there are many concurrent updates the algorithm requires repeating steps from the beginning wasting the process resources due to version conflicts.</li>
</ol>
<p>Additionally for both pessimistic and optimistic schemes we need to serialize our internal data into hierarchical key space of the corresponding system (e.g. Zookeeper <a href="http://zookeeper.apache.org/doc/r3.1.2/zookeeperOver.html#sc_designGoals">"znodes"</a> or etcd <a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md#key-space-operations">"nodes"</a>). All mentioned facts lead the application to become more complex and error prone. Thus I would like to go to completely another direction.</p>
<h2>5 Replicated Object Concept</h2>
<p>Let's step back and remember <em>object oriented programming</em> (OOP). We have a notion of <em>objects</em>. Each object has the underlying data representing an object <em>state</em>. An object contains a set of <em>methods</em> that transforms the object from one state to another state.</p>
<p>The idea is to replicate actions (<em>object methods</em>) across the nodes instead of data (<em>object state</em>) replication. Those actions change the object state deterministically and create the illusion that the object itself is replicated. Linearizability guarantees that all replicas are agreed on the same sequence of operations thus the distributed object state will remain consistent. It is very similar to <a href="https://en.wikipedia.org/wiki/State_machine_replication">state machine replication</a>. The only difference is that I use ordinary <em>object</em> to represent the <em>state</em> and <em>methods</em> to represent the <em>events</em> transforming the object. This mapping significantly reduces the complexity and allows using the C++ language power because it supports OOP natively without code bloating.</p>
<h2>6 Replicated Object Proposal</h2>
<p>My replicated object (aka <em>replob</em>) proposal has the following features:</p>
<ol>
<li>Embedded.</li>
<li>Masterless.</li>
<li>In-memory.</li>
<li>Linearizability.</li>
<li>FIFO process guarantee.</li>
<li>Fast local readings.</li>
<li>Concurrent flexible distributed transactions.</li>
<li>Parallel independent transactions option.</li>
<li>Supports any native data structures.</li>
<li>CAP tunable.</li>
<li>Smooth set of replicas degradation.</li>
<li>Safety and liveness under network issues:
<ol>
<li>Partitioning.</li>
<li>Partial partitioning like "bridging".</li>
<li>Temporary network instability.</li>
<li>Partial network packets direction.</li>
</ol></li>
</ol>
<p>Below I briefly explain each item.</p>
<p><strong>Embedded</strong>. It is not a standalone service. The functionality operates within the user process, reducing latency by decreasing the number of round trips and the corresponding overhead. The approach completely eliminates the dependency on external services like Zookeeper or etcd, and its native interfaces dramatically simplify interoperation with the replication logic, making it completely transparent from the developer's perspective.</p>
<p><strong>Masterless</strong>. The algorithm has no designated master (leader), so every node is indistinguishable from the others. A masterless algorithm significantly reduces failure recovery time and provides predictable behavior under most conditions.</p>
<p><strong>In-Memory</strong>. The current implementation has no persistence layer: every item is distributed across the replica nodes in process memory. The algorithm still allows a persistence property to be added.</p>
<p><strong>Linearizability</strong>. The replicated object algorithm provides linearizable consistency.</p>
<p><strong>First In, First Out Process Guarantee</strong>. For a given process, all operations are completed in the order they are scheduled (FIFO order).</p>
<p><strong>Fast Local Readings</strong>. A special mode allows reading data locally by relaxing consistency to the sequential level. It significantly decreases latency and overall system overhead.</p>
<p><strong>Concurrent Flexible Distributed Transactions</strong>. Deterministic user-specified functionality of any complexity can be placed inside distributed transactions. Those transactions are handled concurrently.</p>
<p><strong>Parallel Independent Transactions Option</strong>. The user may run several consensus instances to parallelize agreement on sequences of independent transactions.</p>
<p><strong>Supports Any Native Data Structures</strong>. Developers can use standard containers like <code>std::vector</code>, <code>std::map</code>, etc., as well as <code>boost::optional</code>, <code>boost::variant</code>, or other data structures that provide copy semantics.</p>
<p><strong>CAP Tunable</strong>. The user may choose between linearizable consistency and availability under network partitioning.</p>
<p><strong>Smooth Replica Set Degradation</strong>. The system preserves consistency even if the number of nodes drops dramatically, e.g. from five replicas to two, or even to one replica under appropriate conditions.</p>
<p><strong>Safety and Liveness under Network Issues</strong>. There are plenty of different kinds of network issues (see <a href="https://aphyr.com/posts/288-the-network-is-reliable">Aphyr: The network is reliable</a>). The algorithm retains consistency and operability under such issues.</p>
<p>All those items will be discussed in detail in subsequent articles.</p>
<h2>7 Example: Key-Value Storage</h2>
<p>To demonstrate the flexibility and power of the approach, consider the following example. The task is to implement a replicated key-value storage with the following interface (I omit the <code>std::</code> and <code>boost::</code> namespaces):</p>
<pre><code class="cpp"><span class="keyword">struct</span> KV
{
optional<<span class="built_in">string</span>> get(<span class="keyword">const</span> <span class="built_in">string</span>& key) <span class="keyword">const</span>;
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">const</span> <span class="built_in">string</span>& key, <span class="keyword">const</span> optional<<span class="built_in">string</span>>& value);
<span class="keyword">private</span>:
<span class="stl_container"><span class="built_in">unordered_map</span><<span class="built_in">string</span>, <span class="built_in">string</span>></span> kv_;
};
</code></pre>
<p>I chose a symmetric interface for simplicity: the <code>set</code> method deletes the corresponding key if the value is empty. For an ordinary (non-replicated) object, the implementation is the following:</p>
<pre><code class="cpp">optional<<span class="built_in">string</span>> KV::get(<span class="keyword">const</span> <span class="built_in">string</span>& key) <span class="keyword">const</span>
{
<span class="keyword">if</span> (kv_.count(key) == <span class="number">0</span>)
<span class="keyword">return</span> {};
<span class="keyword">return</span> kv_.at(key);
}
<span class="keyword">void</span> KV::<span class="built_in">set</span>(<span class="keyword">const</span> <span class="built_in">string</span>& key, <span class="keyword">const</span> optional<<span class="built_in">string</span>>& value)
{
<span class="keyword">if</span> (value)
kv_[key] = *value;
<span class="keyword">else</span>
kv_.erase(key);
}
</code></pre>
<p>Now I would like to turn the ordinary object into a <em>replicated object</em>. To do that I just add the following:</p>
<pre><code class="cpp">DECL_REPLOB(KV, get, <span class="built_in">set</span>)
</code></pre>
<p>Hint: <code>DECL_REPLOB</code> is implemented as follows:</p>
<pre><code class="cpp"><span class="preprocessor">#define DECL_REPLOB DECL_ADAPTER</span>
</code></pre>
<p>Then I can use the following code snippet to replicate my data across the replicas:</p>
<pre><code class="cpp">replob<KV>().<span class="built_in">set</span>(<span class="built_in">string</span>{<span class="string">"hello"</span>}, <span class="built_in">string</span>{<span class="string">"world!"</span>});
</code></pre>
<p>All <code>KV</code> instances from the replica set contain the specified key-value pair once the invocation of <code>KV::set</code> completes. Please note that the object is referenced by the type <code>KV</code>, meaning that each replica contains its own single object instance.</p>
<p>To read the data in a linearizable manner I write:</p>
<pre><code class="cpp"><span class="keyword">auto</span> world = replob<KV>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>});
</code></pre>
<p>To improve performance, I just write:</p>
<pre><code class="cpp"><span class="keyword">auto</span> localWorld = replobLocal<KV>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>});
</code></pre>
<p>That's it!</p>
<h3>7.1 Transactions</h3>
<p>Let's suppose I want to update the item. The naive approach is to use the following code:</p>
<pre><code class="cpp"><span class="keyword">auto</span> world = replobLocal<KV>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>}).value_or(<span class="string">"world!"</span>);
replob<KV>().<span class="built_in">set</span>(<span class="built_in">string</span>{<span class="string">"hello"</span>}, <span class="string">"hello "</span> + world);
</code></pre>
<p>The only problem is that two atomic operations together are not atomic (a race condition of the second kind). Thus we need to put those actions inside a transaction:</p>
<pre><code class="cpp">MReplobTransactInstance(KV) {
<span class="keyword">auto</span> world = $.get(<span class="built_in">string</span>{<span class="string">"hello"</span>}).value_or(<span class="string">"world!"</span>);
$.<span class="built_in">set</span>(<span class="built_in">string</span>{<span class="string">"hello"</span>}, <span class="string">"hello "</span> + world);
};
</code></pre>
<p>Now those actions are applied atomically across all replicas.</p>
<h3>7.2 Transactions with Results</h3>
<p>Let's consider the following task: calculate the length of the value for a specified key. Nothing's easier:</p>
<pre><code class="cpp"><span class="comment">// use local instance because we do not need to update the object</span>
<span class="keyword">auto</span> valueLength = MReplobTransactLocalInstance(KV) {
<span class="keyword">return</span> $.get(<span class="built_in">string</span>{<span class="string">"hello"</span>}).value_or(<span class="string">""</span>).size();
};
</code></pre>
<p>The same approach can be applied to an update operation:</p>
<pre><code class="cpp"><span class="keyword">auto</span> valueLength = MReplobTransactInstance(KV) {
<span class="keyword">auto</span> world = $.get(<span class="built_in">string</span>{<span class="string">"hello"</span>});
$.<span class="built_in">set</span>(<span class="built_in">string</span>{<span class="string">"another"</span>}, world);
<span class="keyword">return</span> world.value_or(<span class="string">""</span>).size();
};
</code></pre>
<p>All the mentioned operations are applied on the replicas atomically.</p>
<h3>7.3 Multiple Replob Transactions</h3>
<p>Let's assume that we have two independent instances of key-value storages: <code>KV1</code> and <code>KV2</code>. We can combine operations for corresponding instances by using the modifier <code>MReplobTransact</code>:</p>
<pre><code class="cpp"><span class="comment">// the first transaction is distributed</span>
<span class="comment">// performs value copying from KV2 to KV1 for the same key</span>
MReplobTransact {
$.instance<KV1>().<span class="built_in">set</span>(
<span class="built_in">string</span>{<span class="string">"hello"</span>},
$.instance<KV2>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>}));
};
<span class="comment">// the second transaction is applied locally</span>
<span class="comment">// returns total value size calculation for the same key</span>
<span class="keyword">auto</span> totalSize = MReplobTransactLocal {
<span class="keyword">auto</span> valueSize = [](<span class="keyword">auto</span>&& val) {
<span class="keyword">return</span> val.value_or(<span class="string">""</span>).size();
};
<span class="keyword">return</span> valueSize($.instance<KV1>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>}))
+ valueSize($.instance<KV2>().get(<span class="built_in">string</span>{<span class="string">"hello"</span>}));
};
</code></pre>
<p>Should I mention that all those actions are performed atomically and that the first transaction is spread across all replicas?</p>
<h3>7.4 Advanced Example</h3>
<p>Let's consider iteration through the collection with user-defined function:</p>
<pre><code class="cpp"><span class="keyword">struct</span> KV
{
optional<<span class="built_in">string</span>> get(<span class="keyword">const</span> <span class="built_in">string</span>& key) <span class="keyword">const</span>;
<span class="keyword">void</span> <span class="built_in">set</span>(<span class="keyword">const</span> <span class="built_in">string</span>& key, <span class="keyword">const</span> optional<<span class="built_in">string</span>>& value);
<span class="comment">// generic method to iterate through the collection</span>
<span class="keyword">template</span><<span class="keyword">typename</span> F>
<span class="keyword">void</span> forEach(F f) <span class="keyword">const</span>
{
<span class="keyword">for</span> (<span class="keyword">auto</span>&& v: kv_)
f(v);
}
<span class="keyword">private</span>:
<span class="stl_container"><span class="built_in">unordered_map</span><<span class="built_in">string</span>, <span class="built_in">string</span>></span> kv_;
};
</code></pre>
<p>Now the task is to calculate the total size of all values:</p>
<pre><code class="cpp"><span class="keyword">auto</span> valuesSize = MReplobTransactLocalInstance(KV) {
size_t sz = <span class="number">0</span>;
$.forEach([&sz](<span class="keyword">auto</span>&& v) {
sz += v.second.size();
});
<span class="keyword">return</span> sz;
};
</code></pre>
<p>As you can see, the approach is completely straightforward.</p>
<h2>8 Further Directions</h2>
<p>Above, I considered several simple but powerful examples of how to use the <em>replicated object</em> approach. Further articles will introduce the underlying ideas and concepts step by step:</p>
<ol>
<li>God adapter.</li>
<li>Nonblocking deadlock-free synchronization or <em>subjector model</em>.</li>
<li>Uniform actor model or <em>funactor model</em>.</li>
<li>Overgeneralized serialization.</li>
<li>Behavior modifiers.</li>
<li>IO and coroutines.</li>
<li>Consistency and CAP theorem applicability.</li>
<li>Phantom, replob and <em>masterless consensus algorithm</em>.</li>
<li>Implementation examples:
<ol>
<li>Atomic failure detector.</li>
<li>Distributed scheduler.</li>
</ol></li>
</ol>
<h2>9 Conclusion</h2>
<p>We have introduced the fault-tolerant distributed <em>replicated object</em> with its set of outstanding features. It significantly reduces the complexity of creating reliable distributed applications and opens the door to its use in a wide range of areas.</p>
<p>The masterless consensus algorithm handles failures in a predictable way without wasting time. The embedded approach eliminates the network delays required to cooperate with external services, while the strong consistency model provides a convenient way to interact with the <em>replob</em> in a transactional and flexible manner.</p>
<p><em>Special thanks to <a class="g-profile" href="https://plus.google.com/113021723671522138605" target="_blank">Sergey Polovko</a>, <a class="g-profile" href="https://plus.google.com/117023091752061088817" target="_blank">Yauheni Akhotnikau</a> and Petr Prokhorenkov for useful advice and comments.</em></p>
<h2>10 Test Questions</h2>
<ol>
<li>How is <code>DECL_REPLOB</code> implemented?</li>
<li>What is the difference between local and nonlocal operations?</li>
<li>Is it possible to implement masterless consensus algorithm?</li>
<li>Specify all behavior modifiers mentioned in the article.</li>
</ol>
<h2>References</h2>
<p>[1] Documentation: <a href="http://zookeeper.apache.org/">Zookeeper</a>.</p>
<p>[2] Documentation: <a href="https://coreos.com/etcd/">etcd</a>.</p>
<p>[3] Article: <a href="https://web.stanford.edu/class/cs347/reading/zab.pdf">Zab: High-performance Broadcast For Primary-Backup Systems</a>.</p>
<p>[4] Article: <a href="https://ramcloud.stanford.edu/raft.pdf">In Search of an Understandable Consensus Algorithm (Extended Version)</a>.</p>
<p>[5] Zookeeper documentation: <a href="https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos">Zab vs. Paxos</a>.</p>
<p>[6] Wikipedia: <a href="https://en.wikipedia.org/wiki/State_machine_replication">State Machine Replication</a>.</p>
<p>[7] Article: <a href="http://research.google.com/archive/chubby-osdi06.pdf">The Chubby Lock Service For Loosely-Coupled Distributed Systems</a>.</p>
<p>[8] MongoDB documentation: <a href="http://docs.mongodb.org/manual/faq/replica-sets/#how-long-does-replica-set-failover-take">How long does replica set failover take?</a></p>
<p>[9] Aphyr blog: <a href="https://aphyr.com/posts/291-call-me-maybe-zookeeper">Zookeeper</a>.</p>
<p>[10] Documentation: <a href="http://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_recipes_Locks">ZooKeeper Recipes and Solutions: Locks</a>.</p>
<p>[11] Article: <a href="http://infoscience.epfl.ch/record/181690/files/OpenAPI.pdf">Addressing the ZooKeeper Synchronization Inefficiency</a>.</p>
<p>[12] Wikipedia: <a href="https://en.wikipedia.org/wiki/Compare-and-swap">Compare-and-swap</a></p>
<p>[13] Documentation: <a href="http://zookeeper.apache.org/doc/r3.1.2/zookeeperOver.html#sc_designGoals">Zookeeper znodes</a>.</p>
<p>[14] Documentation: <a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md#key-space-operations">etcd nodes</a>.</p>
<p>[15] Aphyr blog: <a href="https://aphyr.com/posts/288-the-network-is-reliable">The Network Is Reliable</a>.</p>
<p>[16] Article: <a href="https://www.usenix.org/legacy/events/usenix10/tech/full_papers/Hunt.pdf">ZooKeeper: Wait-Free Coordination For Internet-Scale Systems</a>.</p>
New Language Paradigm Philosophy (2015-08-12)
<h2 id="rules"><a name="rules" href="#rules"></a>Rules</h2>
<ol>
<li>Simple things must be simple.</li><li>Complex things must be as simple as possible.</li></ol>
<p>Meaning that the implementation should be the simplest.</p>
<h2 id="four-noble-truths"><a name="four-noble-truths" href="#four-noble-truths"></a>Four Noble Truths</h2>
<p>The approach follows the structure of <a href="https://en.wikipedia.org/wiki/Four_Noble_Truths">Buddhism's Four Noble Truths</a>:</p>
<ol>
<li>There is a complexity.</li><li>There is a root cause of the complexity.</li><li>There is an absence of complexity.</li><li>There is a way to avoid complexity.</li></ol>
<h3 id="complexity"><a name="complexity" href="#complexity"></a>Complexity</h3>
<p>The API depends on the model you choose: actor, callback-style, subscription, future/promise, RPC-style, etc. But the model should be an implementation detail: you should be able to change it if you wish. Currently, that's not the case.</p>
<p>The idea is to transform the code in such a way as to have flexibility in the model and approaches. One should consider the model as low-level (implementation-detail) architecture.</p>
<h3 id="building-blocks"><a name="building-blocks" href="#building-blocks"></a>Building Blocks</h3>
<p>You should build the application from top to bottom (from architecture to implementation), not from bottom to top (from classes and libraries to satisfy the requirements/architecture).</p>
<a name='more'></a><h2 id="invariants"><a name="invariants" href="#invariants"></a>Invariants</h2>
<p>The main idea is to use invariants during program development. Invariants are the entities that are stable across requirements changes.</p>
<p>Usually invariants are declarations of stable entities in some form. For that purpose developers create DSLs to describe the corresponding invariants.</p>
<h3 id="development-costs"><a name="development-costs" href="#development-costs"></a>Development Costs</h3>
<p>Development consists not only of coding but also of:</p>
<ol>
<li>Architecture.</li><li>Coding.</li><li>Stabilizing.</li><li>Deploying.</li><li>Supporting.</li></ol>
<p>Thus the idea is not to create a language that makes it easy to write code (item 2), but one that makes it easy to rewrite code (items 3 and 5) as requirements change. Invariants help keep the most significant part of your code intact.</p>
<h3 id="leaked-abstractions"><a name="leaked-abstractions" href="#leaked-abstractions"></a>Leaked Abstractions</h3>
<p>Due to requirement changes (or the addition of alternatives to the main call sequence), abstractions can be extended, improved or redesigned. Sometimes refactoring is needed.</p>
<p>The reason is that systems are usually created from bottom to top. It's a language feature: you have to create low-level abstractions, then use them to build higher levels, and only then what you actually needed.</p>
<p>But the real solution is to build your application from top to bottom, based on the invariants you have. Those invariants can be extracted from the domain area, the requirements or other sources. Such invariants cannot change in the near future; otherwise they are not invariants.</p>
<p>The idea is to build a tree from high-level blocks down to low-level blocks. The lowest building blocks are classes, methods, data, etc.</p>
<p>The aim is to preserve the necessary invariants while going from high abstractions to low ones. Transformations allow you to do this in a more convenient way.</p>
<p>That's the reason why existing or newly created languages cannot satisfy your needs: your domain-specific tasks are not covered. You have to capture them yourself using UML diagrams or other artifacts that you later forget because the information becomes outdated. That's why it's important to express them in the language itself. One widely used approach is to develop an IDL, which lets you write down your specific needs. But it takes a lot of time to create the language, and then you need to generate source code, handle errors, work around limitations, etc., which gives you an awful experience.</p>
<p><strong>It’s better to have a language that provides a convenient and systematic way to go from higher to lower levels in a controllable manner to generate the code that satisfies your own needs.</strong></p>
<h3 id="invariant-basis"><a name="invariant-basis" href="#invariant-basis"></a>Invariant Basis</h3>
<p>Sometimes you don't know the invariants, but you know the invariant basis: the language in which to represent your invariants. For an OOP language the basis is the class definition (private/public methods and data); for networking, the basis is the connections and messages between nodes, etc.</p>
<p>So just define the basis and put some data inside it. After that, changing the data doesn't affect the overall picture.</p>
<h2 id="transformations"><a name="transformations" href="#transformations"></a>Transformations</h2>
<p>Another item is transformation. The idea is to transform data from high-level to lower-level invariants.</p>
<p>A declaration is represented as a tree of definitions, so a transformation turns that tree into another tree built from other (lower-level) entries. Transformations are like hooks that can use the current subtree and additional context to transform the data.</p>
<p>The final destination can be either LLVM or another language like C.</p>
<h2 id="monads"><a name="monads" href="#monads"></a>Monads</h2>
<p>A monad is just a special transformation of the actions inside a particular block of commands within that monad.</p>
<p>Examples:</p>
<ol>
<li>Asynchronous pipelining.</li><li>Dealing with optional or nullable objects, e.g. <code>obj.getA().getB().getC()</code> without crashing.</li></ol>
<h2 id="verifications-and-optimizations"><a name="verifications-and-optimizations" href="#verifications-and-optimizations"></a>Verifications and Optimizations</h2>
<p>Because we have all the information, from the highest level down to the lowest, we can use it for <em>verification</em> and <em>optimization</em> purposes. E.g.:</p>
<ol>
<li>Deadlock checking.</li><li>Checking for race conditions of the first kind.</li><li>Lock optimizations: if we find that a function is invoked only under the same mutex, we can remove that mutex altogether.</li></ol>
<h3 id="example"><a name="example" href="#example"></a>Example</h3>
<p>Let’s consider the following example: file opening. Working with files depends on the concrete usage:</p>
<ol>
<li>Just read the whole content.</li><li>Read the file chunk by chunk.</li><li>Streaming mode.</li><li>Streaming zero-copy mode.</li></ol>
<p>In those different cases you have to use different APIs to achieve the best performance. But why should I use them differently? I would like to use one API for my purposes, and it should be up to the implementation to use the most performant version. Because the logic is static information, we could generate appropriate code by analyzing how the file is used and choosing the appropriate API.</p>
<h2 id="testability"><a name="testability" href="#testability"></a>Testability</h2>
<p>To make an application testable, developers often use dependency injection for all classes. It improves both testability and flexibility. The cost is complexity.</p>
<p>The proposed approach allows you to change the implementation of any class, because it retains the knowledge about the class and can transform the source code accordingly.</p>
<h2 id="dependencies"><a name="dependencies" href="#dependencies"></a>Dependencies</h2>
<p>It can automatically calculate dependencies to avoid unnecessary steps.</p>
<h2 id="reverse-transformation"><a name="reverse-transformation" href="#reverse-transformation"></a>Reverse Transformation</h2>
<p>Sometimes you need to refactor code. E.g., you use map-reduce technology and would like to move to a Spark-like stack. For that purpose you can apply a <em>reverse transformation</em> from map-reduce to change your code. It can be done transparently, without any issues, if there is a <em>direct transformation</em> from the new form to the old one. This allows creating higher-level abstractions on top of a low-level implementation. The translator verifies that applying the reverse transformation and then the direct transformation (as a normal transformation from higher abstractions to lower ones) yields the original code.</p>
<p>Thus it allows refactoring complex logic using high-level refactoring primitives.</p>
<h2 id="definitions"><a name="definitions" href="#definitions"></a>Definitions</h2>
<p>Introducing a new term should work like defining a word in a natural language: define the unknown word using known words.</p>
<p>Another interesting aspect is the axiomatic approach. Axioms are a set of implicit definitions, like a system of equations: you cannot define the words separately, only through the system. Example: geometry (point, line, etc.). Such a system is known as a set of axioms, Prolog style. Such implicit systems of rules significantly increase the complexity of the whole system, so at the first stage it's better to have simple explicit definitions: a single unknown term defined through a set of known terms.</p>
<h2 id="api-dependencies"><a name="api-dependencies" href="#api-dependencies"></a>API Dependencies</h2>
<p>Usually an API shouldn't depend on internals and a particular implementation. But the reality is that it's hard to design and implement an API independently of the implementation. Part of the "implementation details" should be the model you choose: actor-based, asynchronous, message-passing, continuation, future-style, etc. Your API strictly depends on the model. But your original idea does not change; the model is just a low-level layer, and you would like the possibility to choose the model you want. You cannot, because the API depends on the model, preventing you from changing that layer and forcing you to choose and think about it in advance.</p>
<p>Even more, it's better not to rely on a particular language: just use abstractions and later map them onto different languages to try them out. An interesting idea is to use reverse transformations to convert a particular language into the abstract one.</p>
<h2 id="destination-language-conversion"><a name="destination-language-conversion" href="#destination-language-conversion"></a>Destination Language Conversion</h2>
<p>We could use special transformers to generate language-specific files. This transformation can also be reversed. In that case there are 3 possibilities:</p>
<ol>
<li>There is exactly one possibility. We just use it without any issues.</li><li>There is no possibility. Generate an error.</li><li>There are several possibilities. Either choose the most probable one with a warning, or use a special annotated configuration to specify how to treat this piece of code (the configuration may contain the default behavior for all cases and specific transformations for specific parts).</li></ol>
<h2 id="problems-with-existent-approaches"><a name="problems-with-existent-approaches" href="#problems-with-existent-approaches"></a>Problems with Existent Approaches</h2>
<p>The implementation usually depends on the particular usage. There are the following flexibility levels:</p>
<ol>
<li>Hardcoded constants.</li><li>Configuration on start.</li><li>Dynamic configuration per application.</li><li>Dynamic configuration per thread.</li><li>Configuration per function/state etc.</li></ol>
<p>Each level corresponds to a particular implementation and complexity, and there is no uniform, effective implementation covering every level of flexibility.</p>
<h3 id="ui-interaction"><a name="ui-interaction" href="#ui-interaction"></a>UI Interaction</h3>
<p>Usually UI frameworks provide more functionality than the native controls do, so they use emulation, because it's a more portable way to represent the user's intentions. Thus you pay for flexibility by sacrificing application performance.</p>
<p>It would be better to turn on UI emulation only when the user wants such flexibility and the native methods cannot provide it. Thus the code depends on the functionality actually used.</p>
<h3 id="logging,-statistics-and-high-level-operations"><a name="logging,-statistics-and-high-level-operations" href="#logging,-statistics-and-high-level-operations"></a>Logging, Statistics and High-level Operations</h3>
<p>Usually, to log some actions, a logger is used. There are a lot of options that can be applied:</p>
<ol>
<li>Destination: syslog, file, console, network etc.</li><li>Asynchronous and synchronous logging.</li><li>Special formatting.</li><li>Dynamic destination changes.</li><li>Several destinations.</li></ol>
<p>Each item adds complexity, and thus a performance penalty, by introducing the corresponding abstractions. But if you need just a simple log, you would like to avoid such complexity and the performance penalty. You get them anyway, because you cannot avoid the code based on those abstractions.</p>
<p>Another item is statistics. You would like to automate branch statistics, data statistics or other kinds of statistics.</p>
<p>You would also like automatic logging of some important high-level operations. Together with exception throwing, those operations can serve as a high-level call stack. In C++ this could work by checking in a destructor whether an exception is being thrown and writing to the log if it is.</p>
<p>Ideally, it would be better to mark some variables as important and have them shown automatically in any log statement in the function, using the marked values in the log. It could work per function, per class, per file or per module.</p>
<h2 id="simplifications"><a name="simplifications" href="#simplifications"></a>Simplifications</h2>
<p>Basically, the transformations are simplifiers into simpler terms, which are actually the low-level terms. The transformations form the path from the high-level complex description to the most simplified low-level parts of a particular language.</p>
<h2 id="refactoring-and-requirements-changing"><a name="refactoring-and-requirements-changing" href="#refactoring-and-requirements-changing"></a>Refactoring and Requirements Changing</h2>
<p>During software evolution the requirements are subject to change. Usually requirements affect the high and medium levels of abstraction, while development starts from the lowest level of abstraction, based on existing libraries and language syntax.</p>
<p>By carrying invariants from the highest level of abstraction down to the lower levels, it's possible to change the higher levels without significant refactoring. The ideas here are that:</p>
<ol>
<li>We use invariants that tend to be stable.</li><li>We building an application from highest level to lowest.</li></ol>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-72101723176868246932014-12-08T14:53:00.000-08:002015-09-11T16:50:46.011-07:00C++ User Group in Saratov, October 2014 <p>On 25th, October 2014 the regular C++ meeting conference was held. It was planned to be 5 speakers but one of them could not attend due to family circumstances. Thus the total amount of presentations were equal to 4:</p>
<ol>
<li>Vasiliy Sorokin: <strong>Google C++ Mocking and Test Frameworks</strong></li>
<li>Grigory Demchenko: <strong>Asynchronicity And Coroutines: Data Processing</strong></li>
<li>Rainer Grimm: <strong>Functional Programming in C++11</strong></li>
<li>Mikhail Matrosov: <strong>C++ Without <code>new</code> and <code>delete</code></strong></li>
</ol>
<p>Below you may find small description related to each presentation.</p>
<a name='more'></a>
<p>The first report was about the use of the <a href="https://code.google.com/p/googletest/">Google unit-test framework</a>. First, the basic concepts and the testing macros were covered. The second part described the <a href="https://code.google.com/p/googlemock/">mock-based approach</a> using interfaces and a set of macros. It was noted that the application architecture must be designed appropriately, with mandatory use of abstract interfaces, to apply this methodology correctly. In general, it was a useful presentation on software testing. For example, I have used the <code>boost.test</code> library, but it has no mock functionality. In this sense, Google's framework looks more promising. However, all of this comes at the price of an abundance of macros and, as a consequence, poor compilation performance. On the other hand, the <code>boost.test</code> library is not very swift either.</p>
<p>The second presentation was mine. It described an approach to processing data by fully leveraging network bandwidth and CPU. I'm planning to post a detailed description later. During the presentation there were several unrelated questions like "but there is Hadoop, why do something else?".</p>
<p>The coffee break was followed by the third presentation, by a guest speaker from Germany, on functional aspects of C++. It was noted that modern C++ has, during its evolution, acquired more and more functional capabilities, but it is still far from true functional languages. The basic primitives and approaches used in the functional world were introduced, and the correspondence between Haskell, Python and C++ was shown. The speaker also talked about metaprogramming and the ugly, annoying syntax of C++. Overall, it was quite an interesting presentation, primarily for beginners. It is worth noting that the presentation was in English, adding zest to the Russian C++ User Group.</p>
<p>The final report was about the C++11/C++14 standards and the usage of <code>make_shared<T>()</code>/<code>make_unique<T>()</code>. For me it was rather stale news ("Queen Anne is dead"), because I had known this technique for about 5 years. At that time (5 years ago) almost everyone opened their eyes wide and twisted a finger at the temple, and now you may use it as part of the language, part of the standard. Oh well. For those who do not know it, I recommend reading my article <a href="http://habrahabr.ru/post/118550/">Dependency Inversion And Creational Patterns (in Russian)</a>. There, in addition to avoiding explicit usage of <code>new</code>/<code>delete</code>, I use a more advanced and consistent method of creating objects using <a href="http://en.wikipedia.org/wiki/Dependency_injection">Dependency Injection</a>.</p>
<p>Between the second and third presentations there was a coffee break, where you could drink tea or coffee with tasty snacks. At the end there were even more tasty snacks, pancakes and salmon.</p>
<p>The event itself took place at the club Leningrad Hall. There was live music in the evening. I am not a fan of the Electronic Rock style, but I suddenly found I liked it. The dark beer was excellent too, and together with the music it became even more excellent. I also managed to talk to Rainer Grimm. What an idea, to travel from Germany to Russia, and even to Saratov! My respect.</p>
<p>Overall, the event was pleasant. I hope my presentation was too. That is the way it is!</p>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-6017219460307628362014-08-04T03:15:00.002-07:002015-09-19T14:23:25.112-07:00Memory Model: Brief Description<h2>Introduction</h2>
<p>My previous post was related to the discussion about the memory model in <em>C++11</em>. Now let's briefly talk about the memory model itself.</p>
<p>So, the memory model covers 3 different use cases. They can be classified in different ways; let's use the following criterion: how many threads are affected by an atomic operation:</p>
<ol>
<li>One thread, known as <em>relaxed</em> atomic operations.</li>
<li>Two threads, known as <em>acquire-release</em> operations.</li>
<li>More than two threads, the strongest guarantee, known as <em>sequentially consistent</em> operations.</li>
</ol>
<h2>One Thread: Relaxed Atomics</h2>
<p>It’s quite simple: if you only need <em>eventual atomic consistency</em> you may use <em>relaxed</em> atomics. They have the best performance, but they guarantee only the atomicity of operations on that value. Other threads may read old values, but the updated value will eventually become visible. It’s useful, e.g., for statistics, debugging flags etc., or for intermediate atomic operations in some complex scenarios.</p>
<a name='more'></a>
<h2>Two Threads: Acquire-Release Atomics</h2>
<p>It provides a stronger guarantee than <em>relaxed</em> atomic operations and thus carries an additional performance penalty (at the very least, compiler barriers). The use case mirrors the <code>lock()</code>/<code>unlock()</code> operations of a mutex-based approach:</p>
<ol>
<li>Locking => resource acquiring => <em>acquire</em> atomic operation</li>
<li>Unlocking => resource releasing => <em>release</em> atomic operation</li>
</ol>
<p>It’s the most common use case: writer <em>releases</em> resource in one thread while reader <em>acquires</em> the same resource in another thread. That’s why it affects two threads. </p>
<h3>Consume-Release Atomics</h3>
<p><em>Acquire-release</em> semantics has an additional sub-case: <em>consume-release</em> semantics. The main and only difference is that it applies to pointer manipulations:</p>
<pre><code class="cpp"><span class="built_in">std</span>::atomic<Object*> atomicObject;
<span class="comment">// load pointer</span>
Object* object = atomicObject.load(<span class="built_in">std</span>::memory_order_consume);
<span class="comment">// dependent read uses the same loaded pointer</span>
object->value1 = <span class="number">1</span>;
<span class="comment">// another dependent read uses the same loaded pointer</span>
object->value2 = …;
</code></pre>
<p>The idea is that subsequent operations use the loaded pointer. Almost all processor architectures understand that the subsequent accesses depend on that pointer and don’t require additional memory barriers (hello, <em>DEC Alpha</em>!). Thus the performance of <em>consume-release</em> is at least as good as <em>acquire-release</em> and sometimes even better.</p>
<h2>Three or More Threads: Sequentially Consistent Atomics</h2>
<p>This is the strongest guarantee and the slowest as well, no surprises. All the examples have something in common: at least three threads are needed to demonstrate the usefulness of <em>sequentially consistent atomics</em>. They are very complicated, and you are very unlikely to see such examples in real applications. But it’s the most intuitive behavior among all the guarantees. That’s why it is used as the default for all atomic operations in <em>C++11</em>. </p>
<h2>Examples</h2>
<p>Here are a couple of interesting examples to demonstrate usage in different scenarios.</p>
<h3>Lock Contention Statistics</h3>
<p>It’s quite natural to use <em>mutexes</em> for complex data structures and logic. But if you are developing a high performance application, you have to know about lock contention to prevent long waits on <code>lock()</code> operations (see my previous post <a href="http://gridem.blogspot.ru/2014/06/c-user-group-in-st-petersburg-21-june.html">C++ User Group in St. Petersburg</a>). The simple snippet below provides the needed information:</p>
<pre><code class="cpp"><span class="built_in">std</span>::mutex mutex;
<span class="built_in">std</span>::atomic<<span class="keyword">int</span>> lockCounter; <span class="comment">// no contention</span>
<span class="built_in">std</span>::atomic<<span class="keyword">int</span>> waitCounter; <span class="comment">// wait due to contention</span>
<span class="comment">// … locki</span>ng
<span class="keyword">if</span> (mutex.try_lock())
{
lockCounter.fetch_add(1, <span class="built_in">std</span>::memory_order_relaxed);
}
<span class="keyword">else</span>
{
waitCounter.fetch_add(1, <span class="built_in">std</span>::memory_order_relaxed);
mutex.lock();
}
<span class="comment">// mutex acquired, continue</span>
</code></pre>
<p>As you can see, there is no magic in the atomic manipulations. If it turns out that <code>waitCounter</code> is not as small as expected, you should reconsider your locking usage by applying a fine-grained approach.</p>
<h3>Double-Checked Locking</h3>
<p>Not so long ago, <a href="http://www.crashedtestdummy.com/2012/01/the-double-checked-locking-anti-pattern/">double-checked locking was considered an anti-pattern</a>. Now it can be implemented correctly and safely using the <em>C++11 memory model</em>:</p>
<pre><code class="cpp"><span class="built_in">std</span>::mutex mutex;
<span class="comment">// returns singleton object of the type T</span>
<span class="keyword">template</span><<span class="keyword">typename</span> T>
T& single()
{
<span class="keyword">static</span> <span class="built_in">std</span>::atomic<T*> ptr;
<span class="comment">// try to load pointer => consume</span>
T* t = ptr.load(<span class="built_in">std</span>::memory_order_consume);
<span class="keyword">if</span> (t != <span class="keyword">nullptr</span>)
<span class="keyword">return</span> *t;
<span class="built_in">std</span>::lock_guard<<span class="built_in">std</span>::mutex> lock(mutex);
<span class="comment">// try again under mutex => relaxed</span>
t = ptr.load(<span class="built_in">std</span>::memory_order_relaxed);
<span class="keyword">if</span> (t != <span class="keyword">nullptr</span>)
<span class="keyword">return</span> *t;
t = <span class="keyword">new</span> T();
ptr.store(t, <span class="built_in">std</span>::memory_order_release);
<span class="keyword">return</span> *t;
}
</code></pre>
<p>Done!</p>
<h3>Atomic Counter for Shared Pointers</h3>
<p>As you (maybe) know, <code>std::shared_ptr<T></code> contains an atomic counter to track the number of owners. The increment operation holds no surprises:</p>
<pre><code class="cpp"><span class="built_in">std</span>::atomic<<span class="keyword">int</span>> counter;
<span class="comment">// ...</span>
counter.fetch_add(<span class="number">1</span>, <span class="built_in">std</span>::memory_order_relaxed);
</code></pre>
<p>Decrementing is a little more complicated:</p>
<pre><code class="cpp"><span class="keyword">int</span> c = counter.fetch_sub(<span class="number">1</span>, <span class="built_in">std</span>::memory_order_acq_rel);
<span class="keyword">if</span> (c == <span class="number">1</span>)
<span class="keyword">delete</span> ptr;
</code></pre>
<p>The reason for using acquire and release semantics simultaneously is the following. Suppose two threads own a <code>shared_ptr</code> and the associated <code>counter == 2</code>. Thread #1 decrements the counter from 2 to 1, and thread #2 decrements it from 1 to 0 and invokes the <code>delete</code> operation. To correctly observe the whole object state on <code>delete</code>, thread #2 has to apply <em>acquire</em> semantics, while thread #1 has to apply <em>release</em> semantics to <em>"commit"</em> its changes. Because you don’t know the <code>counter</code> value in advance, you have to apply both semantics in a single atomic operation: the read part uses <em>acquire</em> semantics while the write that decrements the value uses <em>release</em> semantics. </p>
<h2>Conclusion</h2>
<p>I would recommend the following recipes:</p>
<ol>
<li>If you just need an atomic operation with <em>eventual consistency</em>, you may use <em>relaxed atomics</em>.</li>
<li>If you have <em>mutex-like</em> interaction between threads you should use <em>acquire-release atomics</em>. For pointers you should consider <em>consume-release atomics</em>.</li>
<li>If you are not sure about correctness, or you have some complex scenario involving different threads and operations, you may fall back to the default memory order: <em>sequentially consistent ordering</em>.</li>
</ol>
Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-74704275499630690312014-07-18T06:35:00.000-07:002014-12-03T08:22:45.766-08:00C++11 Memory Model <p>What is the most interesting and promising feature in <em>C++11</em>? Many of you might say it's the <em>rvalue semantics</em> including <em>move and perfect forwarding concepts</em>. But I don't think so.</p>
<p>Let me clarify what I mean. Definitely, <em>rvalue semantics</em> is a step forward. But it's not the most advanced and interesting part of the standard. The reason is that it doesn't contain anything really new. Forwarding is like syntactic sugar: it avoids unnecessary use of overloaded functions. <em>Move semantics</em> had been implemented before <em>C++11</em>, for example, as the <a href="http://www.ultimatepp.org/srcdoc$Core$pick_$en-us.html">transfer semantics in the Ultimate++ framework</a>. So it's more like a syntax evolution to further improve performance when <em>return value optimization</em> cannot be applied.</p>
<p>Meanwhile, the memory model is another story. It resolves one of the most sophisticated problems: how to obtain a common denominator for very different hardware processor models with different assembler instructions. It's a very complicated problem: you have to know about every platform to create such a denominator. And I would say that it has been solved perfectly.</p>
<p>Wait, wait, wait. Most readers might raise an objection against "perfect solution": it's the most complex part of the standard. That's true. But the problem itself is much more complicated, so relative to the problem statement the solution is rather simple.</p>
<p>The solution is great indeed. It provides an abstraction that allows creating highly portable and still very efficient applications using the appropriate atomic instructions on any platform: from <em>x86</em> to modern <em>ARM</em> and <em>DEC Alpha</em> (with the special <em>consume/release</em> semantics; it's as if the <em>DEC</em> and <em>ARM</em> platforms like relaxing). Finally, the memory model of the next generation of <em>ARM</em> processors is going to have special sequentially consistent atomic instructions providing better performance in most cases. It's a case where hardware pays attention to software needs. It's a fundamental model that has far-reaching consequences. And it's amazing!</p>
<p>I very much expect that in the near future any <em>lock-free</em> or <em>wait-free</em> article will contain a detailed description of the memory ordering for all involved atomic operations.</p>
Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-79486121828599848622014-06-27T01:49:00.002-07:002015-09-11T16:48:27.090-07:00Object-Oriented Programming: the Good, the Bad and the Ugly<p>Let me clarify. I love <em>OOP</em>. I developed a lot of functionality using <em>OOP</em> and found it to be a very productive and highly extensible approach. So what's the point?</p>
<h2>The Good</h2>
<p>What is the primary goal of <em>OOP</em>? What are the benefits? Why is it so popular? Where is the magic?</p>
<p>These are simple questions. And I would like to have simple answers. I guess you would too. Here is mine: it allows you to significantly improve <em>code reuse</em>. How? Because you may use abstractions instead of concrete classes, and this allows reusing the functionality built on those abstractions. Why does it matter? Because code reuse is the most effective way to speed up your development.</p>
<h2>The Bad</h2>
<p>So what's wrong with <em>OOP</em>? Is it the holy grail and the silver bullet at once? Unfortunately, the answer is: no. On the one hand, we get a development speed-up. On the other hand, we pay a price for using abstractions. Let's discuss it in detail.</p>
<a name='more'></a>
<p>Usually the invocation of a virtual function is an indirect call. Indirection means that the branch prediction mechanism may fail when performing the operation. That causes performance degradation on each virtual call.</p>
<p>Another aspect is data locality. Modern processor logic is so fast that the actual bottleneck is the physical memory. That's the reason for having several levels of caches on modern architectures. Caches really like sequential access and hate jumping across the memory back and forth. But any abstraction is reached through a pointer to the associated data, and the data itself contains a pointer to the virtual table. So we have several indirections to perform our actions. It would be better to have a plain structure in a contiguous region of memory, but abstractions break up the memory into separate objects.</p>
<p>Is it important? In approximately 99.99% of cases it's not. But when it is important, you should take care of it. Fortunately, profile-guided optimizations allow you to slightly improve the situation, at least by replacing virtual calls with ordinary function calls in some cases.</p>
<h2>The Ugly</h2>
<p>I would like to emphasize one last aspect of <em>OOP</em> development. There are developers who try to apply their knowledge right after reading the book by <a href="http://c2.com/cgi/wiki?GangOfFour">GoF, the Gang of Four</a>. Don't allow them to commit the code! It's probably the most horrible and ugly thing that I've ever seen.</p>
<h2>Conclusion</h2>
<p>My conclusion is simple: apply abstractions only when necessary. When performance is important, you should try to build your software with minimal usage of abstractions, because <em>OOP</em> is a performance trade-off: development speed vs. runtime speed.</p>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-61140346804479567402014-06-26T01:58:00.002-07:002015-09-11T16:49:39.423-07:00Future/Promise Discussion<h2>Introduction</h2>
<p><strong>C++11</strong> has introduced <code>async</code> and the <code>future</code>/<code>promise</code> pattern. It allows performing concurrent operations and waiting for the result. It looks very promising (in the future). Yep.</p>
<p>I don't want to discuss the current standard implementation. It looks like an initial step and contains several flaws (destructor behavior, no thread pools, only blocking semantics for value retrieval, etc.). I would like to discuss particular usage and overhead questions.</p>
<p>There are several typical usages:</p>
<ol>
<li>Start the task asynchronously and wait for the result.</li>
<li>Start the task asynchronously and don't wait for the result.</li>
<li>Start several (or a lot) tasks asynchronously and wait for the results.</li>
</ol>
<p>Let's discuss them in detail.</p>
<h2>Start the task asynchronously and wait for the result</h2>
<p>The idea is simple: sometimes I need to start a task asynchronously while continuing other processing at the same time. When the result of the task is needed, I invoke the <code>get()</code> method on the future to obtain it. It looks pretty simple. The only thing is that this is the only case where the <code>future</code>/<code>promise</code> technique is very well suited.</p>
<p>Let's consider another typical usage.</p>
<a name='more'></a>
<h2>Start the task asynchronously and don't wait for the result</h2>
<p>The idea is even simpler: I don't care about the result. I just want to start some action that performs an operation. Nothing more. Is it possible to implement this case using the standard <code>future</code>? No!</p>
<p>But I would like to concentrate on another aspect. Let's suppose that we had a <code>detach()</code> method. OK. What do we have? We have a <em>mutex</em>, a <em>critical section</em> and other interesting stuff to correctly operate on the shared state. But we don't want to have the shared state! Why should I pay for it? Okay.</p>
<p>Let's continue.</p>
<h2>Start several (or a lot) tasks asynchronously and wait for the results</h2>
<p>So what should I do? I need to create a <em>vector</em> of <em>futures</em>, put all the <em>futures</em> inside the <em>vector</em>, iterate through them and invoke the <code>get()</code> or <code>wait()</code> methods. If I'm lucky, no context switches take place. Really? No!</p>
<p>At each <code>get()</code> invocation we either already have the result (so only <code>mutex.lock()</code>/<code>unlock()</code> is called, or an <em>atomic flag</em> is checked, depending on the implementation) or we must wait on a <em>condition variable</em>. So in the worst case each iteration requires a context switch. And this worst case is the common case! Because usually the amount of task work is much greater than the work needed to iterate through the <em>futures</em> (obviously).</p>
<h2>And what should I do?</h2>
<p>There is a solution: just use an appropriate implementation for each case. See my implementation as an example:</p>
<ul>
<li><a href="https://bitbucket.org/gridem/synca">Synca Library on bitbucket</a></li>
<li><a href="https://github.com/gridem/Synca">Synca Library on github</a></li>
</ul>
<p>Here is a brief explanation of how to use it:</p>
<ol>
<li>If you want to invoke the task asynchronously and wait for result, use <code>goWait</code> or <code>Waiter</code>.</li>
<li>If you want to invoke the task asynchronously and don't wait for result, use <code>go</code>.</li>
<li>If you want to invoke several tasks asynchronously and wait for the results, use <code>goWait</code> or <code>Waiter</code>; the library handles several tasks in the same way as described in item #1.</li>
<li>If you want to invoke several tasks asynchronously and wait for the first actual result, use <code>goAnyWait</code> or <code>goAnyResult</code>.</li>
</ol>
<p>Try it!</p>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0tag:blogger.com,1999:blog-7694239937514449322.post-4754652574325540622014-06-21T14:40:00.001-07:002014-12-03T10:00:28.477-08:00C++ User Group in St. Petersburg, 21 June 2014<p>Today I've given a presentation on a multithreading topic: <em>"Fine-grained locking"</em>. The considered approach uses a fairly simple idea: incorporate the <code>lock</code>/<code>unlock</code> mutex operations into object access using an overloaded <code>operator-></code>:</p>
<pre><code class="cpp"><span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> T_mutex>
<span class="keyword">struct</span> Access : <span class="built_in">std</span>::unique_lock<T_mutex>
{
Access(T* t_, T_mutex& m)
: <span class="built_in">std</span>::unique_lock<T_mutex>(m), t(t_) {}
<span class="keyword">template</span><<span class="keyword">typename</span> T_lockType>
Access(T* t_, T_mutex& m, T_lockType type)
: <span class="built_in">std</span>::unique_lock<T_mutex>(m, type), t(t_) {}
T* <span class="keyword">operator</span>->() { init(); <span class="keyword">return</span> t; }
<span class="keyword">private</span>:
<span class="keyword">void</span> init() { <span class="keyword">if</span> (!<span class="keyword">this</span>->owns_lock()) <span class="keyword">this</span>->lock(); }
T* t;
};
<span class="keyword">template</span><<span class="keyword">typename</span> T>
<span class="keyword">struct</span> SmartMutex
{
<span class="keyword">typedef</span> Access<T, Mutex> WAccess;
<span class="keyword">typedef</span> Access<<span class="keyword">const</span> T, Mutex> RAccess;
RAccess <span class="keyword">operator</span>->() <span class="keyword">const</span> { <span class="keyword">return</span> read(); }
WAccess <span class="keyword">operator</span>->() { <span class="keyword">return</span> write(); }
RAccess read() <span class="keyword">const</span> { <span class="keyword">return</span> {get(), mutex()}; }
RAccess readLazy() <span class="keyword">const</span> { <span class="keyword">return</span> {get(), mutex(), <span class="built_in">std</span>::defer_lock}; }
WAccess write() { <span class="keyword">return</span> {get(), mutex()}; }
WAccess writeLazy() { <span class="keyword">return</span> {get(), mutex(), <span class="built_in">std</span>::defer_lock}; }
<span class="keyword">private</span>:
T* get() <span class="keyword">const</span> { <span class="keyword">return</span> data.get(); }
Mutex& mutex() <span class="keyword">const</span> { <span class="keyword">return</span> *mutexData.get(); }
<span class="built_in">std</span>::<span class="built_in">shared_ptr</span><T> data = <span class="built_in">std</span>::make_shared<T>();
<span class="built_in">std</span>::<span class="built_in">shared_ptr</span><Mutex> mutexData = <span class="built_in">std</span>::make_shared<Mutex>();
};
</code></pre>
<p>This allows avoiding race conditions and provides atomicity at the level of the object's data.</p>
<p>Lazy methods are suitable for avoiding another issue of multithreaded applications: <strong>deadlock</strong>. Here is an example of how to use them correctly with <code>std::lock</code>:</p>
<pre><code class="cpp">SmartMutex<X> x, y;
<span class="keyword">auto</span> rx = x.readLazy();
<span class="keyword">auto</span> ry = y.readLazy();
<span class="built_in">std</span>::lock(rx, ry);
<span class="comment">// now rx and ry can be used</span>
</code></pre>
<p>The same approach was used to implement <code>SmartSharedMutex</code>, which allows sharing read access. But instead of the usual overloaded <code>-></code>, a new operator was introduced: <code>---></code> (the long arrow). How was it implemented? See the related article: <a href="http://habrahabr.ru/post/184436/">Useful Multithreaded Idioms of C++ (in Russian)</a></p>Grigory Demchenkohttp://www.blogger.com/profile/00767146690798788624noreply@blogger.com0