Tuesday, July 1, 2014

Separation of Concerns and Components

Do you use any modern technologies such as Java or do you just use older ones like C++?

-- Interviewer

Stating the obvious - or writing out of frustration

I'm writing this because two things are routinely confused: separating unrelated concerns when designing the features of an IT system, and separating unrelated concerns when packaging those features into executable components.

There is, especially among web developers, a tendency to 'over-componentize' architectures. Many relatively small systems (which some years ago would have been called C hacks) contain HTTP servers, database servers, service brokers and application servers, just to mention a few, when in fact the very same IT systems could be written as small, compact, executable UNIX programs.

Developing systems with more components than needed not only adds to the cost (maintenance, operability, development etc.); it is also ugly and an abomination to the very core of elegant IT design. This is especially true when small internal IT systems contain multiple application servers, several relational databases and other heavyweight components which, twenty years ago, would have been written within a few days as small C programs.

Why has development of relatively small IT systems taken this path? I can only guess. I believe the fundamental problem is that web technologies have become the industry standard regardless of whether the problem is web related and regardless of whether industry standards matter in the context at hand.

Another reason could be that calling something an IT System carries far more weight when persuading upper management to cut you a slice of the budget than calling it what it maybe should be called: a small standalone application or simply a C hack.

What's a component?

What I mean by a Component is something that can be deployed on a computer and made to execute and perform some (hopefully) useful activity.

For example, an HTTP server such as Apache httpd is a component. My program printing 'hello world' is also a component. A more useful component is my Queue Manager routing messages from one queue to another. Yet another component is an Oracle server.

Grey areas

In other words, for something to qualify as a component it must be executable. It cannot be a snippet of C++ code, a piece of Java code or anything else that cannot be executed by an execution environment. Of course, there are grey areas. Should a (compiled) EJB be seen as a component? Here I consider an EJB to be a component since it can be packaged in a jar file, the jar file deployed in an application server, and the EJB executed by a JEE app server which runs on top of the underlying operating system.

As a side note, surely something is wrong here - why build such a large stack of execution layers (UNIX, JEE container, jar/war files) just to execute a single component? I love UNIX linkers since I do not have to write or generate XML files to tell - in an old fashioned IBM JCL way (did we just step back 30 years with the introduction of modern application servers?) - exactly how my software should run. The linker is my friend, the jar/war configuration files are not.

Is separation of concerns always applicable?

Separation of concerns usually means separation of unrelated concerns. What unrelated means here is of course highly subjective and belongs in the realm of IT system design. I say subjective since it depends on what the focus of an abstraction is. However, I won't make much fuss over whether concerns, in general, are unrelated or not, since when it comes to components it doesn't really matter.

It would appear that separation of concerns is always applicable in IT development. After all, we have all been brainwashed to believe so: by schools, by dogmatic colleagues, by managers who have picked up a buzzword here or there, and by IT developers with decades of experience in IT architecture, even if only on paper ('bah ... why are requests tied to processing through a queue ... don't you know about separation of concerns?'). Of course separation of concerns is important in certain contexts, but it is less important, and often harmful, in many others.

When is it useful to separate unrelated concerns?

Clearly, when I design a piece of software, drawing boxes and mapping boxes to classes and functions, I'll separate an audit log from the machinery translating text from one language to another.

So far so good. But here is the crux. Many developers extend this separation to the component level. That is, writing to the audit log is done by making a call to a separate program. The translation of text from one language to another is done in yet another program. Every single concern that is unrelated is separated at the component level. One of the most blatantly wrong separations of concerns at the component level is the separation of the user interface - possibly a WSDL based web service interface - into a separate JEE application server or an HTTP server.

Now, before calling me names ... after all, not using an off the shelf standalone HTTP server or a JEE server for web based stuff is blasphemy ... I am not saying that embedding web based interfaces in an application is a general solution that solves all problems. But I am saying that it solves many more problems than most developers and architects are aware of.

So, what's the problem I'm trying to get at here? The problem is that it is really difficult to deal with systems that have quite a few parts and where the parts are mostly different from each other. The normal way of solving problems with such systems is to use separation of concerns. Software systems typically fall into this category.

Medium number systems

From a system theoretical viewpoint, when designing software systems, we enter the realm of medium number systems - systems consisting of a medium number of parts where the parts are different from each other. Such systems are notoriously difficult to design, analyse and manage. We are dealing with a medium number system when designing the software (that's why it is important to separate unrelated concerns when designing), but there is no need to extend the problematic issues to the component level.

Since the parts are all different and we don't have lots and lots of them - here I mean many millions or billions - we can't use statistical methods to solve our problems. Even worse, since the parts are different and we do have quite a few of them, we can't set up equations to arrive at a solution analytically (assuming that we could formulate the problem as a set of equations in the first place).

One solution to the problem is to package functionalities into components so we get fewer parts to manage - even though the parts will still be different from each other.

What do we solve by packaging stuff into components?

First, we end up with fewer parts. We can focus on getting the components right and stop worrying what's inside them.

The second problem we solve is that we don't expose connections between parts that don't need to be managed by operators. For example, internal queues which really are internal implementation details are not exposed and do not need to be managed through configuration files and complex queuing servers. The same is often true for databases which can be embedded inside a component. In other words, the 'dirty' environment outside a component will not have an impact on the internal work being done inside the component.
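As a rough illustration of an internal queue that never leaves the component, here is a minimal sketch of an in-process queue built only on the C++ standard library. The class name `InternalQueue` and its methods are hypothetical, chosen for this example; nothing here needs a configuration file or an external queuing server.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical in-process queue: an implementation detail of the component,
// invisible to operators and not backed by any external queuing server.
template <typename T>
class InternalQueue {
public:
    // Add an item and wake one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Block until an item is available or the queue has been closed.
    // Returns false only when the queue is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    // Signal that no more items will arrive; wakes all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};
```

A producer thread would call `push` for each sentence to be translated and `close` when done, while worker threads loop on `pop` until it returns false. The queue exists only inside the process, so there is nothing for an operator to deploy or monitor.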

It's worthwhile to make a mental analogy with electronic integrated circuits. Connections between capacitors, transistors and other gadgets are not exposed to the polluted and dangerous environment outside the integrated circuit.

This is not only true for electronic components. Just imagine that a car engine did not come as a single component - a block containing valves, cylinders, fuel injection electronics etc., - but was distributed across a system communicating through some sort of electronic mechanical bus. Most likely a 20K car would cost at least 200K if that was the case.

A small text processing system

I want to design a system processing text. Specifically I want to translate files containing text from one language to another language.

The design concerns I have are a user interface, requests from users, auditing and billing, internal representation of jobs to be executed (a job represents a file to be translated), segmentation of text into sentences, translation of sentences, and distribution of sentences to be translated across a limited set of resources that can translate sentences.

At a high level it is pretty straightforward. I can separate the various concerns and the design becomes simple.

Now, the typical web based system (I assume I'll expose the service as a web service) consists of a web server and/or JEE server, a number of asynchronous communication mechanisms, a relational database, a backend system and some logging system. The point I will make here is that there are far too many components for such a simple application. Instead of separating the various concerns at the component level, I propose to simply build a single application which internally encapsulates the various concerns.

An example of the component explosion I want to avoid is what I saw a few years ago in a relatively simple system (again, in the old days it would have been called a C hack). Most communication between components in the system was done via one asynchronous communication mechanism or another or, where the mechanism was not asynchronous, it was used in an asynchronous way. I stopped counting at 8; the ways of communicating asynchronously included ftp, web services (with call-backs), http, file system queues (using both polling and inotify), JMS, database tables and email.

The design

Now, how would I go ahead and design my text processing system? Of course it depends on the scale of the processing. If the system is not too large, say I process ten or twenty thousand requests per day, I would design my classes as usual by separating unrelated concerns. I would implement them and test them. Once I'm happy that my various 'concerns' do their job right I would simply package them all in a single application.
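To make the idea concrete, here is a minimal sketch of concerns kept apart as classes but packaged in one executable. All class names (`Segmenter`, `Translator`, `AuditLog`, `TextProcessor`) are hypothetical, the segmentation is deliberately naive, and the "translation" is only an upper-casing stand-in, not a real translator.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Concern: segmentation of text into sentences (naive: splits on '.').
class Segmenter {
public:
    std::vector<std::string> split(const std::string& text) const {
        std::vector<std::string> sentences;
        std::string current;
        for (char c : text) {
            if (c == '.') {
                if (!current.empty()) sentences.push_back(current);
                current.clear();
            } else if (!(current.empty() && c == ' ')) {
                current += c;  // skip spaces at the start of a sentence
            }
        }
        if (!current.empty()) sentences.push_back(current);
        return sentences;
    }
};

// Concern: translation of one sentence. Upper-casing is a placeholder
// standing in for real translation machinery.
class Translator {
public:
    std::string translate(const std::string& sentence) const {
        std::string out;
        for (unsigned char c : sentence) out += static_cast<char>(std::toupper(c));
        return out;
    }
};

// Concern: auditing. Kept in memory here; an assumption for the sketch.
class AuditLog {
public:
    void record(const std::string& event) { events_.push_back(event); }
    const std::vector<std::string>& events() const { return events_; }
private:
    std::vector<std::string> events_;
};

// Packaging: the concerns stay separate classes but live in one process
// and talk to each other through plain function calls.
class TextProcessor {
public:
    std::string process(const std::string& user, const std::string& text) {
        audit_.record("request from " + user);
        std::string result;
        for (const std::string& sentence : segmenter_.split(text)) {
            if (!result.empty()) result += ' ';
            result += translator_.translate(sentence) + '.';
        }
        audit_.record("completed for " + user);
        return result;
    }
    const AuditLog& audit() const { return audit_; }
private:
    Segmenter segmenter_;
    Translator translator_;
    AuditLog audit_;
};
```

Each class can be implemented and tested on its own, which is where the separation of concerns pays off. The packaging decision is made only in `TextProcessor`, which links everything into one component.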

First and foremost I would avoid, as far as possible, separating the web service interface from the rest of the system. If I can build a single application containing an embedded HTTP server handling the HTTP service requests, I've come a long way towards making the system simple and operable.
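The following is only a sketch of the dispatch step of such an embedded interface, not a working HTTP server: socket handling and full header parsing are deliberately omitted. The functions `handle_request`, `make_response` and `translate_text`, and the `/translate` path, are hypothetical names invented for the example. The point it tries to show is that the request goes straight into in-process code with a plain function call.

```cpp
#include <sstream>
#include <string>

// Hypothetical in-process handler standing in for the real translation code.
std::string translate_text(const std::string& text) {
    return "[translated] " + text;
}

// Build a minimal HTTP/1.1 response with a plain-text body.
std::string make_response(int code, const std::string& reason,
                          const std::string& body) {
    std::ostringstream out;
    out << "HTTP/1.1 " << code << ' ' << reason << "\r\n"
        << "Content-Type: text/plain\r\n"
        << "Content-Length: " << body.size() << "\r\n"
        << "\r\n"
        << body;
    return out.str();
}

// Parse the request line ("METHOD PATH VERSION") and dispatch directly to
// in-process code. Reading the request from a socket is left out.
std::string handle_request(const std::string& request_line,
                           const std::string& body) {
    std::istringstream in(request_line);
    std::string method, path, version;
    in >> method >> path >> version;
    if (method == "POST" && path == "/translate") {
        return make_response(200, "OK", translate_text(body));
    }
    return make_response(404, "Not Found", "no such resource\n");
}
```

In a real application the accept loop, or a toolkit such as gSOAP, would sit in front of `handle_request`. Either way there is no separate server process between the interface and the logic.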

The idea of packaging multiple concerns inside a single executable (component) may of course not scale if the system grows over some limit. However, the almost idiotic notion that everything has to be scalable to any limit without first analysing if the system will ever reach a limit where scalability really matters has killed more than one system.

If we know that we'll never have more than a few thousand requests per day, why bother to build a full blown industrial system which would scale to 100s of millions of requests per day? If the day ever comes when that happens, we keep our core design, which is based on separation of concerns, and repackage the system using separate HTTP servers and external databases.
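One way to keep that repackaging option open, sketched under the assumption that job state sits behind a small interface, is shown here. The names `JobStore`, `EmbeddedJobStore` and `complete_job` are hypothetical. The core logic depends only on the interface, so an implementation talking to an external database could later be swapped in without touching it.

```cpp
#include <map>
#include <string>

// The concern as the core design sees it: somewhere to keep job status.
class JobStore {
public:
    virtual ~JobStore() = default;
    virtual void save(int job_id, const std::string& status) = 0;
    virtual std::string status(int job_id) const = 0;
};

// Today's packaging: state embedded in the application itself.
class EmbeddedJobStore : public JobStore {
public:
    void save(int job_id, const std::string& status) override {
        jobs_[job_id] = status;
    }
    std::string status(int job_id) const override {
        auto it = jobs_.find(job_id);
        return it == jobs_.end() ? "unknown" : it->second;
    }
private:
    std::map<int, std::string> jobs_;
};

// Core logic: knows nothing about where the state actually lives.
void complete_job(JobStore& store, int job_id) {
    store.save(job_id, "done");
}
```

If the system ever outgrows a single process, a second `JobStore` implementation backed by an external database would be the only new code this concern needs.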

Conclusions

Despite being heavily criticized by JEE indoctrinated colleagues and friends, I have yet to hear a valid logical argument why web based systems - as if web based is something special as far as IT design is concerned - should not be written as single monolithic applications where it makes sense. The answer I usually get is: everyone knows Apache httpd/JEE. However, it doesn't take much thought to see that the fact that everyone knows Apache httpd/JEE is simply not a good enough reason to use them where they don't fit in.

It appears that the 'industry standard' (which by the way benefits the Industry, not the Customer) is hell bent on ensuring that web based applications are broken up into as many separate components as possible. Typically, a simple web service based application built on JEE has an application server, a relational database, a queuing system, a backend system plus a few more auxiliary services for generation of identifiers and other goodies.

I am an architect/developer/programmer working mostly in C++, trying to maintain the idea of writing compact, fast IT systems with a small footprint which are easy to maintain. Using C++ together with tools such as gSOAP and embedded databases I've built simple web based services written as single applications, which have a small footprint, are many times faster than JEE based ones, are operable and are scalable up to a certain point.

In the old days such applications would have been called C hacks not IT Systems. For in house use and often for applications exposed on the WEB this is often more than enough.
