jeudi, janvier 15, 2009

Project NACA: migration from an IBM mainframe to Intel/Linux servers - rationale and strategy.

UPDATE: Project NACA has given birth to Eranea, a company dedicated to 100% automatic migration of mission-critical apps to Java & Linux. Check out or email to for more information

[pour les lecteurs français: traduction - demandée par des lecteurs anglophones de ce blog - de la version originale en français qui reste à consulter ici pour les intéressés]
[for english readers: don't hesitate to get in touch to obtain additional detailed infos about the project]

This article will describe the NACA project run by Publicitas, advertising sales company based in Switzerland and executed by Consultas, IT subsidiary of the group. It is my employer.

the NACA project was about replacing an IBM mainframe under MVS/OS390 (zOS) with Intel servers on Linux. The project started in January 2003 and successfully ended on june 30, 2007, with the last 2 years much more active than the first2. It was on purpose implemented in a 100% iso-functional way, i.e. without any functional / applicational improvement brought during the process of trans-coding itself and by the transcoding engine. 4 millions lines of COBOL were 100% automatically trans-coded toward their Java equivalent.

The recurrent annual savings in cash-outs amount to a total of 3 millions euros, 85% of the initial yearly level. 
First of all, meaning of the project name:
  • in French, NACA = “Nouvelle Architecture Centrale d’Applications”
  • in English, NACA = “New Architecture for Core Applications”
As the title of the article and the project acronym imply, NACA was launched within Publicitas (Lausanne, Switzerland) to initially convert an IBM mainframe model G5 operated through the standard palette of “big iron” software (MVS/OS390/zOS, CICS, COBOL, DB2) toward its canonical equivalent in the Open Source world: Linux, Java, Tomcat, UDB. 
The aim was to transfer the homegrown administrative application (called PUB 2000 - developed at the end of the eighties) to state-of-the-art technologies, i.e modern and (cost) efficient. 
The first thoughts around the project emerged in July 2002 when we realized that Open Source software was gaining ground and was already heavily utilized for applications much more critical than ours. Multiple case studies were published about core applications in key industries: energy, aeronautics, space, finance, with famous names like Boeing, Sony, Nasa

Most of those industrial applications were run at much higher load with more severe performance / availability constraints than ours. Our own activity is around 750′000 transactions done by approx. 1′500 internal users.
From another perspective, the highly successful startups of that time like Google were demonstrating that Linux can be the cornerstone of a very large scale infrastructure delivering a complex service over the Internet, By that time, it wasn’t yet through 1 million servers delivering 120 billions pages per month but the load level of Google was already far higher than ours. About reliability, the proven resilience / reliability of Open Source is best described by Linus’ law (Linus = Linus Torwald, father of Linux) made famous in the essay of Eric Raymond named “The Cathedral and the Bazaar“: “given enough eyeballs, all bugs are shallow”.
All lights were green. So, we decided to launch the project knowing that many hurdles and unknowns were still in front of us: such a full migration from a mainframe under zOS was (minimally) to be considered as pioneer work. Some even said by that time that it was crazy or heretic…
But, we started to explore the technological possibilities for NACA because the first motivation was massive and critically important: the mainframe had a yearly cost of 3 millions euros paid to IBM and third parties and 2.5 millions euros, i.e. 80%+, were used operating the hardware.
The initial business case was simple (some even said simplistic…) : Open Source software is more or less free. IBM mainframe hardware can run Open Source. So replacing each component of the mainframe palette by its OSS counterpart would save us millions of euros. The only point being that you have to accept this brutal computation…
[Note: IBM was by then publishing figures saying that less than 1% of linux kernel had been modified to run on the mainframe hardware of series G]

So, mainframe Linux is clearly the same as the one running of Intel servers]
Those massive savings were attractive enough for the management and financial staff of Publicitas to authorize the project and support it further during the various troubled periods that always happen during the course of such a risky project. Those troubled times will be the core topic of a subsequent article.
We started NACA with following objectives and guidelines:
  • progressive migration: the one-shot “big bang” of a global migration from the old system toward the new one, implemented over a single week-end was banned from day 1 and for ever. All team members knew internal or external “big bang” projects that failed after a couple of unsuccessful “big step” trials. So, we decided to build our project path rather than as a very long stairway with as steps as possible, those steps being as small as possible. The idea was to achieve many incremental and irrevocable (but always reversible) successes.
  • 100% automatic and strictly iso-functional trans-coding: we did not want to mix genders between technology update and application improvement. This confusion inevitably leads to additional constraints becoming lethal most of time. So, we decided to migrate each COBOL function to its strictest equivalent in Java. So, at the end of the project, just the same application but delivering the same value for a drastically reduced cost. In addition, the automatic (i.e. costless) trans-coding means that the Cobol maintenance doesn’t have to be interrupted: the next run of the transcoder transfers the newly implemented Cobol code to its Java equivalent. So, the end users remain happy while the system engineers achieve their technological marathon over the many years of such a project.
  • continuation with teams in place: the employees loyal to the company and its IT system over 20 years are the best experts to execute a smooth migration. Knowing that system engineer spend their career learning new technologies, shifting to Linux is just another milestone on this road. We just added a couple of linux- and OSS-oriented “rookies” to accelerate the technology adoption.
The tactics of this progressive migration is clearly NOT to build the new system in parallel to (i.e separated from) the old one. It is rather about replacing one by one components of the legacy system with OSS technological equivalent in the live production environment. The target is to preserve the quality of service of the old world in the new system under permanent mutation. Thus, the end-users never perceive the launch of a new system but rather sense the permanent evolution of functions that have always been there. This evolution also brings improvements at no big cost: for us, THE example is the generation as pdf files of our administrative documents (orders, vouchers, etc.): in the mainframe world, it was always too expensive to be done but under Linux it came for free in course of migrating the print management system.
The consequent logical order is to start from the lower system layers of the system stack and then go up toward the application layer: the transcoded Pub2000 could not be launched before the ground layers were there!
The advantage of this bottom-up progression is clear: the existing systems engineers are the first employees to get involved. As such, they become clearly the owners of the new system and master all the Linux and other OSS technologies before anybody else. They feel then highly comfortable and have no fear of being displaced when the developers enter the game a few months later.
From another perspective, the iso-functional transcoding to Java is essential: by using such a cross-compiling tool (that we ended up developing by ourselves in order to achieve the 100% automation that we targeted initially - I’ll come back on that in a subsequent article), the functional maintenance of application is never interrupted. All changes made to Cobol are smoothly, instantly and at no cost transferred to the Java version in the next run of the transcoder (almost done on an hourly basis).

This relieves a very important constraint on the new Java / OSS version of the application: no firm release date is really needed from a functional standpoint. If, let say, a new legal function was implemented only in the new Java version after hand-tweaking of the half-baked java code, a delay on the productive release of the Java version could be dramatic for the entire project.
On the contrary, with the strategy that we chose for NACA, our developers did their functional maintenance on the cobol version (as they always did) until the new java version was satisfactorily used by 100% of our end users (progressively migrated groups by groups) for several weeks in the production environment. At this time only, the transcoded Java version of the application became the official new source code of PUB 2000. If any problem had occurred in this process and in the worst case, we could have brought 100% of the users back on the mainframe in matter of minutes.
It never occurred, by the way. 
But this careful and parallel progressive allowed us to make quietly some (unexpected) stops in user migration at points were some performance thresholds were reached in our Java architecture: our java / database experts would then adapt the system architecture or transcoding + runtime strategies to overcome the hurdle and we could proceed further with more migrated users. [All details in a subsequent article].
Last but not least, we wanted keep the very loyal teams in place even though the entire system was eventually changed. We offered all of them a very intensive OSS training package. The motivations were those:
  • Such a migration can’t succeed without full support of the team mastering the legacy system. Ten of thousands of tiny details (Remember: the devil is in them!) have to be handled for the project to succeed. So, hoping to generate competition by trying to oppose the young wolves of OSS to the old crocodiles of legacy mainframe would be a clear fatal mistake.
  • Most system engineers are used to change the horse that they ride (i.e the operating system that they master) many times over their career. At the end of the day, this is just another occurrence of such a switch! When they see that the world around them is deeply changing around them through a massive migration toward OSS, they welcome (with minimal resistance to change) any opportunity that allow them and their competences to also switch through a good training. Fundamentally, they are more experts in system architecture than in any specific operating system. It means that they just have to learn a new set of “bits and bytes” habits to transfer their expertise to the world of Linux and OSS. The core fundamentals of most operating systems remains essentially the same whatever the name of the OS is! It is just another command syntax and parameter definition format to learn…
To close this first episode, I would like to add that such a migration from proprietary world also brings intangibles advantages that are eventually almost as important (more important than ?) as the very sexy financial fact-based ROI that you can demonstrate for such a project. 
Those intangibles advantages can be presented in one sentence: “we are back in the market”. And even if you can’t equate this to euros, it has a clear value:
  • you can hire new engineers in a much simpler way: mainframe system engineers tend to become rare and then expensive!
  • you can interconnect the new version of the application in a so much easier way with other internal systems (CRM, SAP, etc..) as well as with external sites over the Internet (E-commerce, E-business). Those interconnections suddenly become 10 / 100 / (1000 ?) times more easier when you can rely on the full J2EE environment as well as on tons of open source java library.
You can then integration your home grown application in a much more flexible and efficient way to newer components of the IT systems: the old file exchanges, data conversion and import / export processes can disappear to be replaced by synchronous real-time data exchanges over any kind of remote procedure call. This clearly brings value to the business through processes that are accelerated.
As a conclusion, the only valid initial catalyst of such project are savings. They are so massive that they get the attention of any CIO. But, when those savings are achieved, another benefit probably even more important emerges: the company can now significantly improve its business through IT in ways that were only unachievable dreams before the migration. Some were not even anticipated!

And to close the loop, those improvements can even be financed through the savings just achieved via the migration…

Aucun commentaire: