Distributed Systems and Low-Level Programming: New Language Paradigm Philosophy

Rules

Simple things must be simple.
Complex things must be as simple as possible.

In other words, the implementation should be as simple as the task allows.

Four Noble Truths

The approach resembles Buddhism:

There is complexity.
There is a root cause of the complexity.
There is an absence of complexity.
There is a way to avoid complexity.

Complexity

An API depends on the model you choose: actor, callback-style, subscription, future/promise, RPC-style, etc. But the model should be an implementation detail: you should be able to change it whenever you wish. Currently, that is not the case.

The idea is to transform the code in such a way that the model and the approach stay flexible. The model should be treated as low-level architecture, that is, as an implementation detail.
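As a minimal sketch of treating the model as an implementation detail, assume the logic is a plain function and the model is chosen by wrapping it. The names `word_count`, `as_callback`, and `as_future` are hypothetical and exist only for this illustration; the post envisions such wrappers being generated by transformations rather than written by hand.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


def word_count(text: str) -> int:
    """Model-agnostic logic: knows nothing about callbacks or futures."""
    return len(text.split())


def as_callback(fn: Callable[[str], int]) -> Callable[[str, Callable[[int], None]], None]:
    """Expose fn in callback style: the result is passed to on_done."""
    def wrapper(arg: str, on_done: Callable[[int], None]) -> None:
        on_done(fn(arg))
    return wrapper


def as_future(fn: Callable[[str], int], executor: ThreadPoolExecutor) -> Callable[[str], Future]:
    """Expose fn in future/promise style: the call returns a Future."""
    def wrapper(arg: str) -> Future:
        return executor.submit(fn, arg)
    return wrapper
```

The logic in `word_count` never changes; only the wrapper does, which is the kind of freedom the text asks for.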

Building Blocks

You should build the application from top to bottom (from architecture to implementation), not from bottom to top (starting with classes and libraries and working up to satisfy the requirements and architecture).

Invariants

The main idea is to use invariants during program development. Invariants are the entities that are stable across requirements changes.

Usually, invariants take the form of declarations of the stable entities. For that purpose, developers create a DSL to describe the corresponding invariants.

Development Costs

Development consists not only of coding but of several stages:

1. Architecture.
2. Coding.
3. Stabilizing.
4. Deploying.
5. Supporting.

Thus the idea is not to create a language in which code is easy to write (item 2, coding), but one in which code is easy to rewrite (items 3 and 5, stabilizing and supporting) when requirements change. Invariants help to leave the most significant part of your code intact.

Leaky Abstractions

When requirements change (or alternatives are added to the main call sequence), abstractions have to be extended, improved, or redesigned. Sometimes refactoring is needed.

The reason is that systems are usually built from bottom to top. This is a consequence of the language: you must first create low-level abstractions, then use them to build higher-level ones, and only then build what you actually need.

But the real solution is to build your application from top to bottom, based on the invariants you have. Those invariants can be extracted from the domain, the requirements, or other sources. Such invariants must not change in the near future; otherwise they are not invariants.

The idea is to build a tree: start from high-level blocks and refine them by creating low-level blocks. The lowest building blocks are classes, methods, data, etc.

The goal is to preserve the necessary invariants while going from high-level abstractions to low-level ones. Transformations let you do this in a more convenient way.

That is why existing and newly created languages cannot satisfy your needs: they cannot cover your domain-specific tasks. You have to describe those tasks yourself, using UML diagrams or similar artifacts that are later forgotten because they go out of date. That is why it is important to express them in the language itself. One widely used approach is to develop an IDL, which lets you write down your specific needs. But creating the language takes a lot of time, and then you still need to produce source code from it, and so on. Error handling, limitations, and the like make for an awful experience.

It’s better to have a language that provides a convenient and systematic way to go from higher to lower levels in a controllable manner to generate the code that satisfies your own needs.

Invariant Basis

Sometimes you don’t know the invariants. But you know the invariant basis: the language used to represent your invariants. For an OOP language, the basis is the class definition (private/public methods and data). For networking, the basis is the connections and messages between nodes, and so on.

So just define the basis and put some data inside it. After that, changing the data doesn’t affect the overall picture.

Transformations

Another element is transformation. The idea is to transform data from high-level invariants to lower-level ones.

The declaration is represented as a tree of definitions. A transformation turns that tree into another tree built from other, lower-level entries. Transformations work like hooks that use the current subtree and additional context to transform the data.

The final destination can be either LLVM or another language like C.
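The tree-to-tree idea can be sketched as follows. This is only an illustration under assumptions: the `Node` shape, the hook signature, and the node kinds (`service`, `rpc`, `class`, `method`) are hypothetical and not taken from any existing translator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


# A hook receives the current subtree plus shared context
# and returns a lower-level subtree.
Hook = Callable[[Node, dict], Node]


def transform(node: Node, hooks: Dict[str, Hook], context: dict) -> Node:
    """Rewrite the tree bottom-up, applying the hook registered for each node kind."""
    lowered = Node(node.kind, node.name,
                   [transform(child, hooks, context) for child in node.children])
    hook = hooks.get(node.kind)
    return hook(lowered, context) if hook else lowered


def service_to_class(node: Node, context: dict) -> Node:
    return Node("class", context["prefix"] + node.name, node.children)


def rpc_to_method(node: Node, context: dict) -> Node:
    return Node("method", "handle_" + node.name, node.children)


HOOKS = {"service": service_to_class, "rpc": rpc_to_method}

high_level = Node("service", "Storage", [Node("rpc", "get"), Node("rpc", "put")])
low_level = transform(high_level, HOOKS, {"prefix": "Rpc"})
```

Here `low_level` is a `class` node named `RpcStorage` with two `method` children. A further set of hooks could lower that tree again, toward C or LLVM.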

Monads

A monad is just a special transformation of the actions inside a particular block of commands, namely the block that sits inside the monad.

Examples:

Asynchronous pipelining.
Handling optional or nullable objects, as in obj.getA().getB().getC(), without crashing.
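The nullable-chain example can be sketched with a small Maybe-like wrapper. This is a simplification, not a textbook monad: `bind` here accepts a step that returns a plain value or `None`. The classes `A`, `B`, and `C` are hypothetical stand-ins for the objects in the chain, with attributes in place of getters.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Maybe:
    """Wraps a value that may be None; bind skips the step when it is."""
    value: Optional[Any]

    def bind(self, step: Callable[[Any], Any]) -> "Maybe":
        if self.value is None:
            return self
        return Maybe(step(self.value))


@dataclass
class C:
    label: str


@dataclass
class B:
    c: Optional[C]


@dataclass
class A:
    b: Optional[B]


def label_of(a: Optional[A]) -> Optional[str]:
    # Equivalent to a.b.c.label, but any None on the way short-circuits.
    return (Maybe(a)
            .bind(lambda x: x.b)
            .bind(lambda x: x.c)
            .bind(lambda x: x.label)
            .value)
```

Calling `label_of(A(B(None)))` returns `None` instead of raising `AttributeError`. The block of actions is unchanged; the wrapper transforms how each one is applied.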

Verifications and Optimizations

Because we have all the information from every level, from the highest to the lowest, we can use it for verification and optimization. For example:

Deadlock checking.
Checking for race conditions of the first kind.
Lock optimizations: if we find that a function is invoked only while the same mutex is already held, we can remove that mutex from the function entirely.
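One common way to do the deadlock check is a lock-order graph, sketched below. It assumes the translator can list, per function, the locks acquired in nested order; that input format and the function names are assumptions made for this example, and the post does not prescribe a specific algorithm.

```python
from typing import Dict, List, Set


def lock_order_graph(acquisitions: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build edges held -> acquired from each function's nested lock order."""
    graph: Dict[str, Set[str]] = {}
    for locks in acquisitions.values():
        for i, held in enumerate(locks):
            graph.setdefault(held, set()).update(locks[i + 1:])
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search; a back edge means two lock orders conflict."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))
```

If one function takes `accounts` then `journal` and another takes `journal` then `accounts`, the graph has a cycle and the translator can report a potential deadlock before the program ever runs.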

Example

Let’s consider the following example: file opening. Working with files depends on the concrete usage:

Just read the whole content.
Read the file chunk by chunk.
Streaming mode.
Streaming zero-copy mode.

In these different cases you have to use different APIs to achieve the best performance. But why should I use the file differently? I would like to use one API for my purposes and leave it to the implementation to pick the most performant version. Because the program logic is static information, we could generate the appropriate code by analyzing how the file is used and selecting the appropriate API.
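A rough sketch of that selection step follows. It assumes the usage analysis has already produced a set of facts about how the file is used; the fact names (`forwarded_unmodified`, `unbounded_size`, `sequential_records`) are hypothetical labels invented for this example.

```python
from enum import Enum
from typing import FrozenSet


class Strategy(Enum):
    READ_ALL = "read whole content"
    CHUNKED = "read chunk by chunk"
    STREAMING = "streaming"
    ZERO_COPY = "streaming zero-copy"


def choose_strategy(usage: FrozenSet[str]) -> Strategy:
    """Pick an access strategy from facts a translator could derive statically."""
    if "forwarded_unmodified" in usage:
        return Strategy.ZERO_COPY      # bytes go straight to another sink
    if "unbounded_size" in usage:
        return Strategy.STREAMING      # the data cannot be held in memory
    if "sequential_records" in usage:
        return Strategy.CHUNKED        # process piece by piece
    return Strategy.READ_ALL           # the simple case stays simple
```

The calling code never names a strategy. It only uses the file, and the generator picks the API that matches the observed usage.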

Testability

To make an application testable, developers often use dependency injection for all classes. It improves both testability and flexibility. The cost is complexity.

The proposed approach lets you change the implementation of any class whenever you want, because the translator has the knowledge about that class and can transform the source code accordingly.

Dependencies

The translator can calculate dependencies automatically to avoid performing unnecessary steps.
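As a minimal sketch of skipping unnecessary steps, assume each step declares the steps it depends on. The step names in the usage note are hypothetical. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def steps_to_rerun(deps: Dict[str, Set[str]], changed: Set[str]) -> List[str]:
    """Return, in a valid order, only the steps affected by the changed ones.

    deps maps each step to the set of steps it depends on.
    """
    dirty = set(changed)
    order = list(TopologicalSorter(deps).static_order())
    for step in order:                      # dependencies come before dependents
        if deps.get(step, set()) & dirty:
            dirty.add(step)
    return [step for step in order if step in dirty]
```

With `deps = {"parse": set(), "typecheck": {"parse"}, "codegen": {"typecheck"}, "docs": {"parse"}}`, a change to `typecheck` reruns only `typecheck` and `codegen`; `parse` and `docs` are left alone.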

Reverse Transformation

Sometimes you need to refactor code. For example, you use map-reduce and would like to move to a Spark-like stack. For that purpose you can apply the reverse of the map-reduce transformation to lift your code to the higher level. This can be done transparently, provided there is a direct transformation from the new representation to the old one. It thus allows creating higher-level abstractions from a low-level implementation. The translator verifies the result: applying the direct transformation (the normal one, from higher abstractions to lower ones) to the output of the reverse transformation must reproduce the original code.

Thus it allows refactoring complex logic based on high-level refactoring primitives.
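The round-trip check described above can be sketched as follows. The pipeline representation (a list of step names) and the step names themselves are hypothetical; they only illustrate the shape of a direct transformation, its reverse, and the verification.

```python
from typing import List

# Direct transformation: each high-level step expands to low-level steps.
EXPANSION = {
    "count_by_key": ["map:emit_key_one", "shuffle:by_key", "reduce:sum"],
}


def direct(high: List[str]) -> List[str]:
    """Lower a high-level pipeline; unknown steps pass through unchanged."""
    low: List[str] = []
    for step in high:
        low.extend(EXPANSION.get(step, [step]))
    return low


def reverse(low: List[str]) -> List[str]:
    """Lift a low-level pipeline by recognising expanded patterns."""
    high: List[str] = []
    i = 0
    while i < len(low):
        for name, pattern in EXPANSION.items():
            if low[i:i + len(pattern)] == pattern:
                high.append(name)
                i += len(pattern)
                break
        else:
            high.append(low[i])
            i += 1
    return high


def verified_reverse(low: List[str]) -> List[str]:
    """Lift the code and check that lowering it again reproduces the original."""
    high = reverse(low)
    if direct(high) != low:
        raise ValueError("reverse transformation does not round-trip")
    return high
```

Once the code is lifted and verified, a different direct transformation (for instance, one targeting a Spark-like stack) can be applied to the same high-level pipeline.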

Definitions

A new term should be introduced like a word definition in a natural language: the unknown word is defined using known words.

Another interesting aspect is the axiomatic approach. A set of axioms is a set of implicit definitions, like a system of equations: you cannot define the words separately, only together as a system. Geometry is an example: point, line, etc. Such a system is known as a set of axioms, which leads to a Prolog-like style. Such definitions (an implicit system of rules) significantly increase the complexity of the system. At the first stage, it’s better to have simple explicit definitions: a single unknown term defined through a set of known terms.

API Dependencies

An API shouldn’t depend on internals or on a particular implementation. In reality, it’s hard to design and implement an API independently of the implementation. The model you choose (actor-based, asynchronous, message-passing, continuation, future-style, etc.) should be part of the “implementation details”, yet your API strictly depends on it. Your original idea doesn’t change with the model: the model is just a low-level layer, and you would like to be free to choose it. You cannot, because the API depends on the model, which prevents you from changing that layer and forces you to choose it in advance.

Moreover, it’s better not to rely on a particular language at all: use abstractions and map them onto different languages later, even just to experiment. An interesting idea is to use reverse transformations to convert code written in a particular language into the abstract one.

Destination Language Conversion

We could use special transformers to generate language-specific files. This transformation can also be reversed. In that case there are three possibilities:

There is exactly one possibility. We simply use it.
There is no possibility. Generate an error.
There are several possibilities. Either choose the most probable one with a warning, or use a special annotated configuration to specify how to treat this piece of code (the configuration may contain the default behavior for all cases and specific transformations for specific parts).
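The three cases above can be sketched as a small resolver. Everything here is an assumption made for illustration: candidates are modelled as (construct, estimated probability) pairs, and the configuration is reduced to per-fragment overrides, without the default-for-all-cases part.

```python
import warnings
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, float]  # (high-level construct, estimated probability)


def resolve(fragment: str,
            candidates: List[Candidate],
            overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the high-level construct that a low-level fragment maps back to."""
    if not candidates:
        # No possibility: generate an error.
        raise LookupError(f"no reverse transformation for {fragment!r}")
    if len(candidates) == 1:
        # Exactly one possibility: use it.
        return candidates[0][0]
    # Several possibilities: prefer the configured choice, else the most probable.
    names = [name for name, _ in candidates]
    chosen = (overrides or {}).get(fragment)
    if chosen is not None:
        if chosen not in names:
            raise LookupError(f"override {chosen!r} is not a candidate for {fragment!r}")
        return chosen
    best = max(candidates, key=lambda c: c[1])[0]
    warnings.warn(f"{fragment!r} is ambiguous; choosing {best!r}")
    return best
```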

Problems with Existent Approaches

The implementation usually depends on the particular usage. There are the following flexibility levels:

Hardcoded constants.
Configuration on start.
Dynamic configuration per application.
Dynamic configuration per thread.
Configuration per function/state etc.

Each level corresponds to a particular implementation with its own complexity, and there is no uniform, efficient implementation that covers every level of flexibility.

UI Interaction

Frameworks usually provide more functionality than the native platform does. They therefore use emulation, because it is a more portable way to represent the user’s intentions. You pay for that flexibility by sacrificing application performance.

It’s better to turn on UI emulation only when the user wants such flexibility and native methods cannot provide it. The generated code thus depends on the functionality actually used.

Logging, Statistics and High-level Operations

Usually, a logger is used to record actions. Many options can apply:

Destination: syslog, file, console, network etc.
Asynchronous and synchronous logging.
Special formatting.
Dynamic destination changes.
Several destinations.

Each of these items adds complexity, and the corresponding abstractions add a performance penalty. If you need just a simple log, you would like to avoid that complexity and that penalty. But you get them anyway, because you cannot avoid code that is built on those abstractions.

Another item is statistics. You would like to collect branch statistics, data statistics, or other kinds of statistics automatically.

You would also like important high-level operations to be logged automatically. Combined with exception throwing, they can form a high-level call stack. In C++, this could be done by checking in a destructor whether an exception is being thrown and, if so, writing the operation to the log.
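A Python analogue of the destructor check is a context manager, whose `__exit__` method is told whether an exception is passing through. The sketch below is an illustration under assumptions: the `Operation` class and the operation names are hypothetical, and the "log" is a plain list to keep the example self-contained where real code would call a logger.

```python
from typing import List


class Operation:
    """Names a high-level operation; records it only if an exception passes through."""

    failed: List[str] = []   # innermost operation first, like a call stack

    def __init__(self, name: str) -> None:
        self.name = name

    def __enter__(self) -> "Operation":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            Operation.failed.append(self.name)
        return False  # never swallow the exception


def sync_replica() -> None:
    with Operation("sync replica"):
        with Operation("fetch snapshot"):
            raise ConnectionError("peer unreachable")
```

After `sync_replica()` fails, `Operation.failed` holds `["fetch snapshot", "sync replica"]`: a stack of high-level operations rather than of function frames.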

Ideally, you could mark some variables as important and have their values shown automatically in every log message within the function. The marking could apply per function, per class, per file, or per module.

Simplifications

Basically, transformations are simplifiers: they rewrite a description in simpler terms, which are in fact lower-level terms. Together they form the path from the complex high-level description to the simplest low-level constructs of a particular language.

Refactoring and Requirements Changing

During software evolution, requirements are subject to change. Requirements usually affect the high and medium levels of abstraction, while development starts from the lowest level of abstraction, based on existing libraries and language syntax.

When invariants are carried from the highest level of abstraction down to the lower levels, the higher levels can be changed without significant refactoring. The idea here is that:

We use invariants that tend to be stable.
We build the application from the highest level to the lowest.

Wednesday, August 12, 2015
