Thursday, October 29, 2015

Modelling I - Building Blocks

To perform an analysis means decomposing a situation or problem into its constituents. However, there are not any “primitives” that naturally exist. The components defined are arbitrary and depend on individual goals and methods of calculation. The basic elements of a fire are different to a fire fighter, an insurance claims adjuster, and an arson investigator.

-- Gary Klein, Sources of Power

This is the first post in a planned series of blog posts about modelling in IT projects. Each post will target a specific topic within the area of modelling. My goal in each entry is to get to the most basic and fundamental ideas within the topic of the post.

There are a few reasons why I decided to write the posts. The main reason is simple: it just really bothers me that many people involved in IT development do not view modelling as an activity in and of itself. It also really bothers me that many IT professionals I've worked with over time do not have a clear view of basic ideas and concepts related to modelling. Finally, this is after all a blog and a good reason for writing about modelling is to take stock of a few ideas I've picked up over the years.

I won't start this series with why it's a good idea to treat modelling as a special activity in IT projects – I'll come to that part in a later blog entry. Instead I'll start from the bottom and in a rather dry way describe a few of the basic building blocks used in modelling.

Idioms, patterns, principles and other useful stuff

Before addressing the topic of modelling I want to point out that there are many other tools, not directly related to modelling, that are often used when building IT systems. Some are idioms, patterns and principles. Of course the list can be made longer. Other tools often useful for thinking include the use of analogies, generalization, abstraction and general systems laws among others.

Since most IT people know quite a lot about idioms, patterns and principles I won't discuss them here. Neither will I discuss general systems laws since I can get by without them when describing the basics of modelling. Analogies, generalization and abstraction are useful when building models and will pop up in some of the posts in this series.

Entities and value objects

An entity is something that has a state that can change over time. An entity also has an identity that is independent of its state and can be used to distinguish the entity from any other entity. Finally, an entity persists over time. Here I’m not talking about persist in the sense of database persistence, but rather in the sense of continues to exist over time.

Are there things that do not continue to exist over time? In the modelling system I describe here there are. An example of something that does not persist over time is an event. Events don't even persist over infinitesimally short time intervals - but more on that later on.

A value object on the other hand has an immutable state and cannot be distinguished from other value objects having the same state. Therefore, you can transparently interchange two value objects having the same state. I italicized the word state since it is questionable if a value object really can be said to have a state since it cannot be changed. Since its state cannot change it is also possible to say that a value object is defined by its value which then is equal to its immutable state. Value objects are also simply called values. Conceptually a value object has always existed and will always exist. In that sense a value object is timeless.

What I said about entities having an identity was a small lie – not a blatant big lie but nevertheless a lie. The truth is that an entity does not have to have a unique identifier. This is usually not so important but I'll explain it anyway. Take the example of a six pack of beer. All the beer cans are full and can be considered to have the same state. Now I grab one can, open it, take a sip and then put it back in the pack. The can I just grabbed has changed its state. Next, a neighbour comes along who wants a fresh beer. He does not really care which can he grabs as long as its state is non-opened. He'll grab any non-opened can since non-opened cans are interchangeable - well, at least from the perspective of having a cold beer. So even though the cans are entities, each having a state that can change, there is no need to assign a unique identifier to each can. Now, in an IT system we would most likely assign some sort of technical identifier that uniquely identifies a can – a surrogate key or some address would do. In fact, that's what we almost always would do in an IT system because that is usually what we must do. In a modelling system it's mostly done for convenience – not out of necessity.
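The difference between a value object and an entity can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the names Money, Account and account_id are invented here, and the counter-based surrogate key simply stands in for the kind of technical identifier mentioned above.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical surrogate key generator; any unique technical id would do.
_next_account_id = count(1)


@dataclass(frozen=True)
class Money:
    """Value object: immutable and defined entirely by its value."""
    amount: int
    currency: str


@dataclass(eq=False)
class Account:
    """Entity: its state may change while its identity stays the same."""
    balance: Money
    account_id: int = field(default_factory=lambda: next(_next_account_id))

    def __eq__(self, other):
        # Entities are compared by identity, never by state.
        return isinstance(other, Account) and self.account_id == other.account_id

    def __hash__(self):
        return hash(self.account_id)


first = Account(Money(100, "EUR"))
second = Account(Money(100, "EUR"))

same_value = Money(100, "EUR") == Money(100, "EUR")  # values are interchangeable
same_entity = first == second                        # same state, different entities
first.balance = Money(50, "EUR")                     # state changes, identity does not
```

Two Money objects with the same content compare equal and can be swapped freely, whereas two Account objects with the same balance remain distinct.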

Events and messages are not the same thing

An event is an instantaneous happening. That's it – no more, no less. Since an event is an instantaneous happening there is an after the event and a before the event but not a during the event. Also, events in contrast to entities do not persist over time. An event is instantaneous and happens at some specific point in time. Once it has happened the event no longer exists.

Since events are happenings you can't receive an event and you can't send an event. What you send and receive might be messages if you have the concept of message in your modelling system. You can however be notified that an event occurred. For example, you can be notified that a message arrived event happened so that you can pick up the message. The act of sending a message is also an event – there is a before the message was sent and an after the message was sent but no during the message being sent. In other words, it is not possible to send an event – a message might be sent but not an event.
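One way to picture the distinction in code, as a rough sketch under invented names (Message, Mailbox, on_message_arrived): the message is a piece of data that is delivered and picked up, while the "message arrived" event shows up only as a notification to listeners and is never passed around as an object.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    """A message is data: it can be sent, stored and received."""
    sender: str
    body: str


class Mailbox:
    """Hypothetical mailbox that notifies listeners when a message arrives."""

    def __init__(self) -> None:
        self._messages: List[Message] = []
        self._listeners: List[Callable[[], None]] = []

    def on_message_arrived(self, listener: Callable[[], None]) -> None:
        # Register interest in the "message arrived" event.
        self._listeners.append(listener)

    def deliver(self, message: Message) -> None:
        self._messages.append(message)
        for listener in self._listeners:
            # The listener is told that something happened;
            # no event object is handed over.
            listener()

    def pick_up(self) -> Message:
        # Messages are picked up in arrival order.
        return self._messages.pop(0)
```

A listener that reacts to the notification would then call pick_up() to fetch the message itself.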

Saying that something happened instantaneously is obviously relative to the granularity of your world clock. Something may or may not be seen as occurring instantaneously depending on how you measure time. If your world ticks through time one year at a time, then anything that happens between two clock-ticks is happening instantaneously and can therefore be seen as an event. Consider the example: the Visigoths sacked Rome in 410 AD. A historian measuring time in years or decades may view the fall of Rome as an event in history as opposed to something that was ongoing over a period of time.
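The dependence on clock granularity can be made concrete with a small sketch. The function name is_event and the numbers in the usage lines are made up for illustration; time is counted in whole days, and a happening is treated as an event when it starts and ends within the same clock tick.

```python
def is_event(start: int, end: int, tick: int) -> bool:
    """Return True when a happening starts and ends within one clock tick.

    start, end and tick are measured in the same unit (here: days).
    """
    if tick <= 0:
        raise ValueError("tick must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    # Integer division maps each point in time to the tick it falls into.
    return start // tick == end // tick


# Invented example: something that lasts from day 236 to day 238.
seen_by_year_clock = is_event(236, 238, tick=365)  # both days fall in tick 0
seen_by_day_clock = is_event(236, 238, tick=1)     # days fall in different ticks
```

With a yearly clock the happening collapses into a single tick and can be modelled as an event; with a daily clock it spans several ticks and is better described as something ongoing.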

The obvious question now is: how should I choose the length of my clock ticks? As always in modelling it depends on what is important to put the focus on so that the problem at hand can be solved or understood.

If I describe in broad terms the major wars that have occurred during the history of mankind I may choose a century as my timescale. For example, in the fifth century AD the Visigoths sacked Rome, in the 16th century Pizarro conquered the Incas and in the 20th century two world wars ravaged Europe and other parts of the world. The people that were around when these events took place would most likely have used faster clock ticks and would not have viewed them as events. The instigators of the wars, and maybe also the defenders, would have described the wars in terms of processes – not events.

It is clear that the context, the timescale and the purpose will voice their opinions when you decide if something should be classified as an event or not. That leads me to the concept of processes which I describe in the next section.

Processes

The simplest view of what a process is goes as follows: a process is a sequence of events that are ordered in time. A process is simply a series of happenings.

If a process is a sequence of events and events are instantaneous happenings, how is it then possible to have a process do things in parallel? Well, tasks can still be done in parallel. It is just that two events cannot happen simultaneously. Even though task T1 and task T2 are executing simultaneously, the start of T1 and the start of T2 cannot happen at the same time. This is the case within the modelling system I'm describing but it is not necessarily true in real life. An important takeaway here is that a model is never the same thing as the stuff being modelled.

When talking about processes it's usually a good idea to separate process definitions from process instances. A process definition is the underlying mechanism that controls the sequence of events whereas a process instance is the actual sequence of events. What I described as a process in the previous paragraph is clearly a process instance.
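The split between definition and instance might be sketched as follows. The class names, the withdrawal scenario and the rule inside start() are assumptions chosen for illustration only. The definition is the mechanism that decides which happenings occur; each instance is the concrete, ordered sequence of events produced by one run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    tick: int   # position in time; no two events of an instance share a tick
    name: str


@dataclass
class ProcessInstance:
    """The actual sequence of events of one run."""
    events: List[Event] = field(default_factory=list)

    def record(self, name: str) -> None:
        # Each new event gets the next tick, so events are strictly ordered.
        self.events.append(Event(tick=len(self.events), name=name))


class WithdrawalDefinition:
    """Hypothetical process definition controlling the sequence of events."""

    def start(self, amount: int, balance: int) -> ProcessInstance:
        instance = ProcessInstance()
        instance.record("withdrawal requested")
        if amount <= balance:
            instance.record("cash dispensed")
        else:
            instance.record("withdrawal rejected")
        return instance


definition = WithdrawalDefinition()
accepted = definition.start(amount=50, balance=100)
rejected = definition.start(amount=500, balance=100)
```

One definition yields many instances, and two instances of the same definition need not contain the same events.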

A more business-like view of process definitions and process instances goes like this: in a business, processes exist for the purpose of achieving some goals that have been set up by the business. Running processes (process instances) manipulate things that are valuable to the business. The processes (instances) themselves don’t have any direct value to the business. After all, the sequence of actions that are performed when trying to achieve a goal is not important in itself. What is important is that the result is close to the goal. It should come as no surprise that the goal of most businesses has something to do with profit, cost savings, quality and time-to-market.

Let's stop and take a short thinking pause here. In the previous paragraph I pointed out something fundamental even though the idea of business processes seems rather trivial. Whenever a team is in the middle of designing and implementing a business process it is often worth taking a deep breath once in a while and thinking about how the business benefits from the process. If there is no benefit there is most probably no point in implementing the process.

So what are those things that are manipulated by running processes? The things that processes manipulate are usually entities that are of value to a business. The goal of the business describes the desirable state of the entities. The goal is then clearly also the driver for defining how running processes should manipulate the entities – that is, the driver for creating process definitions.

The picture below illustrates the parts that are relevant to processes:

In the diagram the Process only manages objects indirectly through Actors. An Actor can of course be realized as a piece of software, as a person, an organization or some other agent. An Actor does not only modify entities, it can also tweak the Goals when needed. For instance, when a business strategy fails an Actor can take a decision to change the Goals.

The extra indirection introduced by the Actor allows us to separate what has to be done from how it is done. The Process manages what must be done whereas the Actor manages how it is done. When the Actor is implemented as a piece of software the Actor is often called a Service. The idea of Actors implemented as Services and the idea of separating the Process from the Actor are at the core of the rather nebulous idea called Service Oriented Architecture where processes can quickly be defined with the help of existing services.
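The what/how separation could look roughly like this in code. Everything here is an illustrative assumption: the Actor protocol, the two actor classes and the task names are invented, and a real process engine would do far more than loop over a list.

```python
from typing import List, Protocol


class Actor(Protocol):
    """Anything that knows HOW to carry out a task."""

    def perform(self, task: str) -> str:
        ...


class SoftwareService:
    """An Actor realized as a piece of software, i.e. a Service."""

    def perform(self, task: str) -> str:
        return f"service executed '{task}'"


class Clerk:
    """An Actor realized as a person."""

    def perform(self, task: str) -> str:
        return f"clerk handled '{task}' by hand"


class Process:
    """Knows WHAT must be done and in which order, not how."""

    def __init__(self, tasks: List[str]) -> None:
        self.tasks = tasks

    def run(self, actor: Actor) -> List[str]:
        # The process delegates every task to whichever actor it is given.
        return [actor.perform(task) for task in self.tasks]


withdrawal = Process(["verify identity", "dispense cash"])
automated = withdrawal.run(SoftwareService())
manual = withdrawal.run(Clerk())
```

The same Process runs unchanged against either actor, which is the point of the indirection.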

Business objects and processes

Entities that are manipulated by a business process are often called business objects. Business objects come in many different forms. A few examples of entities and processes from the financial industry are:

  • An Account is often modelled as an entity and is manipulated by many different processes. For example, a process managing the withdrawal from a cash account ensures among other things that it is not a fraudulent withdrawal, the right amount is withdrawn and that the cash is sent to the correct destination. Another process may bill the customer for the withdrawal service.
  • A Customer is also often modelled as an entity. There may be many processes that manage Customer entities. For instance, one process might create new Customers while another might handle the billing of Customers.

In short, you define processes so that you can control things that you believe are important for achieving some goals. Essentially, a process definition describes how you want the world to behave.

Life cycles and business processes

In this section I present possibly the most important topic in this blog entry. It is not important because lifecycles are important. It is in fact the exact opposite: lifecycles are almost always useless in modelling. Unfortunately lifecycles are prevalent in models almost everywhere and that is a big mistake since they almost always do more harm than good.

The lifecycle of an entity is a description of the state changes in an entity over time. It turns out that the lifecycle of a business entity is an illusion and can cause lots of grief. The reason is simple: a lifecycle is a consequence of one or more processes managing the business entity.

Business processes will twist and turn a business entity in such a way that the state of the entity benefits the business. A lifecycle of an entity represents the sequence of state changes in the entity that occurs as a consequence of one or more processes manipulating the entity. A lifecycle should therefore not be viewed as something intrinsic to the entity. In fact, it is often very difficult to deduce the lifecycle of an entity that is managed by multiple processes.

Maybe you object and say that it is the process, not the lifecycle that is the illusion, or that it makes no difference which view you choose. Not so. The relationship between processes and lifecycles is not symmetric. The reason for this goes back to the purpose of a business process. The purpose of a business process is to manage entities in such a way that certain goals of the business are met. It is not the other way around since the goals of the business are not managed by lifecycles of entities.

There is also a practical reason for not treating lifecycles as building blocks in models. When you fuse a lifecycle together with an entity you effectively grab the entity and eliminate the chance that it can easily be used in some other processing. After all, you’ll have to store a state attribute in the entity keeping track of where in the lifecycle the entity is. If by any chance you one day decide that the entity fits with some other processing, you have already incorporated a state attribute into the entity. Now, you can either add another state attribute to your entity in order to support the new processing or you can create a very complicated – here I really mean very complicated - lifecycle incorporating the new processing with the existing one.
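One possible reading of this argument in code, offered as a sketch with invented names (Customer, OnboardingProcess, BillingProcess and their step labels): the entity carries no lifecycle or status attribute at all, and each process keeps track of its own progress for the entities it manages.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    """Entity without any lifecycle or status attribute."""
    customer_id: int
    name: str


class OnboardingProcess:
    """Hypothetical process that tracks its own progress per customer."""

    def __init__(self) -> None:
        self._step: Dict[int, str] = {}

    def start(self, customer: Customer) -> None:
        self._step[customer.customer_id] = "documents requested"

    def documents_received(self, customer: Customer) -> None:
        self._step[customer.customer_id] = "onboarded"

    def state_of(self, customer: Customer) -> Optional[str]:
        return self._step.get(customer.customer_id)


class BillingProcess:
    """A second process added later; Customer did not have to change."""

    def __init__(self) -> None:
        self._step: Dict[int, str] = {}

    def invoice(self, customer: Customer) -> None:
        self._step[customer.customer_id] = "invoiced"

    def payment_received(self, customer: Customer) -> None:
        self._step[customer.customer_id] = "paid"

    def state_of(self, customer: Customer) -> Optional[str]:
        return self._step.get(customer.customer_id)
```

Because the progress lives in the processes, adding BillingProcess required neither a second state attribute on Customer nor a merged lifecycle.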

From the discussion it is clear that the processes within an organisation will be subject to scrutiny when the organisation does not meet its objectives. It is usually not the lifecycles of the entities or the entities themselves that are the primary target of investigation. Instead, it is the processes that are the primary target for changes and optimization. A consequence of changing a process is that state-changes in business entities will follow a different pattern. This pattern or the so-called lifecycle of business entities may be altered, but again, only as a consequence of changing the business process.

Here is a concrete example from daily life that shows the fault of using lifecycles of entities as the primary focus when modelling. When you want to achieve a goal you probably do as I do - you make a plan. If I want to fly to London tomorrow I make a plan: I will set the alarm clock before going to sleep, I will turn off the alarm clock after waking up, I will drive to the airport in the morning, I will take the plane to London and so on. I plan my day based on the activities I have to execute so that I will achieve my goal of reaching London. I don’t plan my day based on the lifecycle of my alarm clock, the life cycle of my car and the life cycle of the airplane. State changes in my alarm clock, my car and the airplane are simply consequences of executing my process that will bring me to London.

Why did I put so much focus on not modelling life cycles? The reason is that in the past I've encountered lifecycles which were modelled as intrinsic properties of entities in multiple projects. In all cases – with no exceptions - the cost of making even minor changes to processes within the business was catastrophically high. In some cases entire systems were rewritten simply because of a modelling error which could easily have been avoided had the designers known not to use life cycles as intrinsic behaviour of entities.

The fine line between events, entities and processes

Many phenomena can be viewed as entities in one context and as processes in a different context. Consider a telephone call. When a telephone call is in progress the participants in the phone call can see it as a running process. From their viewpoint the call is a sequence of events. Now imagine that you work in the billing department for a phone company. Here you view the phone call as an entity since it contains a state representing information from the past when the call was made together with other possible business information which may be altered by one or more business processes.

The telephone call could also be described as an event in some other context. A few years after the call was made a lawyer might use the call in a court case where he presents the call as a happening that occurred on June 26 2015 at 4:02pm. If the duration of the phone call is of no significance the lawyer will most likely describe the call as an event.

The takeaway from the example is this. There is no intrinsic property of a phone call that makes it a process, an entity or an event. Depending on the context and the purpose a phone call will be described (modelled) in fundamentally different ways.

A general observation which includes classification of something as an entity, process or event is: the identity, meaning or purpose of something is a relation, as opposed to an intrinsic property, between an observer and the something.

Implementing process execution – an observation

I often hear the comment that the make utility should be used to implement business processes. In general that is not a good idea for the simple reason that make derives the state of a process from the artefacts that are generated during the process.

In general the machinery executing a process should manage the state independently of any artefacts that were created. Many ad-hoc implementations however do not represent the state explicitly. Instead they derive the state from artefacts that have been created during execution.

A problem with deriving the process state from artefacts is that it is not in general possible to derive the process state solely from artefacts. For example, when a process execution crashes there may be artefacts left over which are not complete or that are corrupted. If the artefacts are used to derive the process state when restarting process execution there are no guarantees that the derived state is correct.
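The crash scenario can be simulated in a few lines. This is a toy sketch under stated assumptions: the step names, the in-memory artefacts dictionary and the crash_during parameter are all invented, and the artefact-based function is a deliberately naive stand-in for ad-hoc implementations in general, not a description of how any particular tool works. In a real system the explicit state would also have to be stored durably.

```python
from typing import Dict, List, Optional


class CrashDuringStep(RuntimeError):
    """Simulated crash used only for this illustration."""


class ProcessRunner:
    """Executes steps and keeps the process state explicitly."""

    def __init__(self, steps: List[str]) -> None:
        self.steps = steps
        self.completed: List[str] = []  # explicit process state

    def run(self, artefacts: Dict[str, str],
            crash_during: Optional[str] = None) -> None:
        for step in self.steps:
            if step in self.completed:
                continue  # already done according to the explicit state
            artefacts[step] = "partial"      # artefact exists but is incomplete
            if step == crash_during:
                raise CrashDuringStep(step)
            artefacts[step] = "complete"
            self.completed.append(step)


def derive_completed_from_artefacts(steps: List[str],
                                    artefacts: Dict[str, str]) -> List[str]:
    """Naive artefact-based guess: a step counts as done if its artefact exists."""
    return [step for step in steps if step in artefacts]


steps = ["extract", "transform", "load"]
artefacts: Dict[str, str] = {}
runner = ProcessRunner(steps)
try:
    runner.run(artefacts, crash_during="transform")
except CrashDuringStep:
    pass

guessed = derive_completed_from_artefacts(steps, artefacts)  # includes the half-written step
actual = list(runner.completed)                              # only the finished step
```

After the simulated crash the half-written artefact makes the derived state claim more progress than was actually made, while the explicit state still reflects what was really completed.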

Roles

A role can be seen as a placeholder that an actor steps into at some appropriate point in time. For example, if you borrow 50 cents from me to buy a Coke you play the role of a borrower. When I lend you the money I play the role of a lender. The borrower role is a relation between the loan creation event and you. The lender role on the other hand is a relation between the loan creation event and me. In general a role is a relation between an event and one or more participants in the event.
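Read this way, a role is a named slot on an event rather than a property of a person. The sketch below uses invented names (Person, LoanCreated, roles_played) and made-up amounts to show the same person stepping into different roles in different events.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class LoanCreated:
    """Event whose named slots are the roles taken by its participants."""
    lender: Person
    borrower: Person
    amount_cents: int


def roles_played(person: Person, events: List[LoanCreated]) -> List[str]:
    """List the roles a person takes, one entry per matching event slot."""
    roles = []
    for event in events:
        if event.lender == person:
            roles.append("lender")
        if event.borrower == person:
            roles.append("borrower")
    return roles


me = Person("me")
you = Person("you")
history = [
    LoanCreated(lender=me, borrower=you, amount_cents=50),
    LoanCreated(lender=you, borrower=me, amount_cents=200),
]
my_roles = roles_played(me, history)
```

Neither Person needs a "lender" or "borrower" subclass; the role exists only in relation to a particular loan creation event.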

It is clear that a role is one tool that allows you to add an extra level of indirection at the modelling level. As you already know, indirections lie at the heart of developing maintainable software. Not surprisingly they often also lie at the heart of developing maintainable models.

A last point I want to make about roles is that they are useful both at the problem level and at the technical solution level. When we model at the level of the problem, roles are useful for understanding the relationships between concepts. When modelling with classes at the level of the technical design (solution level), roles decouple abstractions from a host of possible specializations. This often allows us to control dependencies between classes in a better way than had we designed using base classes for all potential participants in some event.

Associations and attributes

It’s a common belief that associations are always binary, tying two things together. This is not the case. Here are a few examples of non-binary associations. A loan can be seen as an association between a lender, a borrower and an amount. A cash account in a bank can be modelled as an association between an owner, an amount and a currency. And finally an example that stretches my view of associations: a sailing boat can be viewed as an association between a hull, a mast and a sail.
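A non-binary association is easy to write down as a tuple with more than two named positions. The sketch below is only one way to express it; the field names and types are assumptions made for illustration.

```python
from typing import NamedTuple


class CashAccount(NamedTuple):
    """Ternary association between an owner, an amount and a currency."""
    owner: str
    amount: int
    currency: str


class SailingBoat(NamedTuple):
    """Ternary association between a hull, a mast and a sail."""
    hull: str
    mast: str
    sail: str


account = CashAccount(owner="me", amount=100, currency="EUR")
boat = SailingBoat(hull="wooden hull", mast="aluminium mast", sail="main sail")
```

Each instance ties three things together at once, which a pair of binary links could only approximate.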

Consider the following. Is my height an attribute or an association between myself and 192cm? Maybe the 192cm is an attribute and my height is the association. Or, could it be that my height is the attribute and 192cm is the value of the attribute? Or again, is height the attribute as opposed to my height? Maybe it is not an attribute at all. After all, I can state that I belong to a category of people having a height of 192cm. Does the height attribute refer to the symbol 192cm or does it refer to the distance of 192cm? Say I compare the height attribute, assuming one exists, of me with someone else that has the height of 1.92m. Are the attributes the same? Should I compare the distance that 192cm represents with the distance that 1.92m represents? Or should I compare the symbol 192cm with 1.92m?

Are attributes associations or something else? In the height example the question translates into the following: is the attribute a relationship between myself and a height (distance or symbol)? Or is it the height that is the attribute? The fact that there is no clear-cut answer is not so important. What is important is to realize that it is not clear cut when to model things as attributes or associations. It is important as well to realize that how to model something must be based on whether it helps in understanding or solving a problem.
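The symbol-versus-distance question from the height example shows up directly once it is written as code. In this sketch the Length class, its field names and the conversion table are assumptions made for illustration; exact fractions are used so that the comparison does not depend on floating point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed conversion factors to metres for the two units in the example.
_TO_METRES = {"cm": Fraction(1, 100), "m": Fraction(1)}


@dataclass(frozen=True)
class Length:
    magnitude: str  # kept as written, e.g. "192" or "1.92"
    unit: str

    def symbol(self) -> str:
        """The written form, e.g. '192cm'."""
        return f"{self.magnitude}{self.unit}"

    def distance_in_metres(self) -> Fraction:
        """The distance the symbol stands for."""
        return Fraction(self.magnitude) * _TO_METRES[self.unit]


mine = Length("192", "cm")
other = Length("1.92", "m")

same_symbol = mine.symbol() == other.symbol()
same_distance = mine.distance_in_metres() == other.distance_in_metres()
```

Comparing symbols says the two heights differ, while comparing distances says they are the same. Which comparison is right depends on what the model is for.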

Summary

Most models created in IT projects contain a handful of commonly used constructs. The ones I've discussed in this post are:

  • Entities
  • Value objects
  • Processes
  • Roles
  • Associations
  • Attributes

I also discussed why lifecycles should not be used in models.

When classifying something as one of the named constructs it is important to take the context and the purpose into account. For example, when classifying a telephone call as a process it is done for some specific purpose in some specific context.

The take home lesson is that the thing being classified does not have intrinsic properties specifying how it should be classified.

As I mentioned at the beginning of this post, I aimed at describing some basic ideas and concepts in modelling. I hope you agree with me that I stuck to the basics without much elaboration. In the next blog post I'll discuss the basics of using concepts in IT models.