Modern technologies let us develop complex, intelligent software solutions ever faster. New programming languages appear every year, and frameworks for the most diverse application areas are developed and refined. But the reality in the world’s largest companies is different. The gap between the modern software world and the software technology these companies actually run is widening steadily. We are talking about the large legacy software applications that large companies and corporations still rely on today. They carry a large part of the western value chain. They enable the exchange of means of payment, the purchase of goods, the payment of wages and the efficient operation of our infrastructure. These systems are like large tankers: once set on course, they do their duty unstoppably and carry companies to their long-term destination. The building materials of these large legacy applications were developed half a century ago. They came two to three evolutionary stages after the invention of punched cards for automated looms, and they go by catchy names such as Programming Language One (PL/I) or Common Business-Oriented Language (COBOL). These programming languages were introduced more than 50 years ago. That should not obscure the fact that they are still very widespread and that our socially most important software applications are written in them.
Companies are feeling tremendous pressure to modernize. This pressure comes from customers, the market, employees, partners and regulatory requirements. There is an urgent need for digitalization, for automation and for greater use of artificial intelligence to make intelligent business decisions. To achieve this, these legacy software applications must be embedded in a dynamic modernization environment. Over the coming decades, they will gradually hand over their functionality to more modern software systems.
How do you modernize legacy applications?
In recent decades, attempts have been made to replace these large, monolithic software applications with leaner, more service-oriented software systems. Such projects are hard to implement in practice because the legacy applications are functionally and technically complex. They contain a large part of the core value creation and business logic. They are also so tightly networked with the other IT systems that detaching even one or two nodes can bring the whole value chain to a standstill. Many decision-makers shy away from modernization projects precisely because of this risk: the value-adding applications might no longer work with the required quality and speed. In addition, these projects cost sums ranging from the high single-digit millions to the hundreds of millions, and they take several years to decades to pay for themselves. In most companies, the status quo is that the large monolithic systems are gradually relieved of their functionality, and newly developed surrounding systems take it over. A concerted orchestra of individual specialist software services is meant to replace the legacy application over the decades and ultimately make it obsolete. In reality, this means that these host systems will remain in use for decades, and software development departments will have to adjust to this fact.
What are the challenges for the further development and maintenance of complex legacy applications?
The biggest obstacle to the further development and digitization of our western companies is a pronounced shortage of good software developers and software engineers. The shortage is even more acute where special knowledge of the historical programming languages mentioned above is required. Most of the software developers trained in this technology decades ago are about to retire or have already gone into well-deserved retirement. What remains is a select group of software developers who picked up these programming languages in the course of their careers. However, market demand far exceeds the expertise actually available. The big IT companies have already started initiatives to train young software developers in these important programming languages. Another approach is to use the available resources (software developers) more efficiently. For modern object-oriented programming languages and frameworks, state-of-the-art development environments with intelligent assistance for software developers are standard. Historical programming languages lack such modern, AI-supported programming aids, and many companies have built their programming environments themselves. In addition, adequate documentation of the software applications is missing in every case. These applications have grown over decades, and not only the teams but also the tools and methods of documentation have changed along the way. Comprehensive documentation is urgently needed, but the information available does not deserve the title “software developer documentation”. It is not enough just to show the structure of the software or to generate large diagrams and graphs like wallpaper. The complexity of these mammoth software structures must be made manageable.
The knowledgeable reader of this article will immediately nod in agreement that each of these software behemoths is unique. It is precisely for this reason that no standardized methodology for managing their complexity has become established on the market. The main driver of complexity is the individuality of each legacy application, and only individual documentation can make this complexity manageable for software developers.
How can the complexity of a legacy application be made manageable?
Step one to mastering complexity is finding commonalities. Every software application is characterized by four specific dimensions:
• User interaction with the application,
• Storage and processing of data,
• Exchange of data and functionality with surrounding software applications, and
• Scheduled processing of data and functionality.
Based on these dimensions it is possible to create a kind of map for these legacy applications.
Every software-based business application reacts to user input and commands. These commands process data, exchange data, calculate values and return the results to the users. We humans are strongly visual. We can get to grips with complex subject matter much better through images and graphical representations than by reading pages of text. For this reason, showing the graphical interfaces is an essential part of individual software documentation, and ideally this is generated automatically. It also makes the actual digitization potential of the legacy application visible. The graphical user interface conveys a good overall impression of the application within seconds and makes familiarization with the complex subject matter significantly easier. However, the graphical representation is not the only thing of interest; so is the course of the user interaction. The user control flow, from calling up one screen (mask) to navigating to the next, can be represented as a graphical flowchart. Drill-down options let the reader move between levels of abstraction. For modernization projects, it is essential to trace program variables and function calls from the graphical user interface through the program logic to the data persistence layer. Presenting this information graphically and clearly is key to reducing complexity.
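The navigation flow and the traceability described above can be sketched as two small graph operations. This Python sketch assumes the screens, fields, variables and columns have already been extracted into plain dictionaries. All names (`NAVIGATION`, `TRACE_LINKS`, `ORDER_ENTRY.QTY` and so on) are hypothetical. A breadth-first search returns the shortest click path between two screens. A simple link-following loop then traces a GUI field down to its database column.

```python
from collections import deque

# Hypothetical screen-navigation model: each screen (mask) maps to the
# screens a user can reach from it. All names are invented for illustration.
NAVIGATION = {
    "LOGIN": ["MAIN_MENU"],
    "MAIN_MENU": ["CUSTOMER_SEARCH", "ORDER_ENTRY"],
    "CUSTOMER_SEARCH": ["CUSTOMER_DETAIL"],
    "CUSTOMER_DETAIL": ["MAIN_MENU"],
    "ORDER_ENTRY": ["ORDER_CONFIRM"],
    "ORDER_CONFIRM": ["MAIN_MENU"],
}

# Hypothetical trace links: GUI field -> program variable -> database column.
TRACE_LINKS = {
    "ORDER_ENTRY.QTY": "WS-ORDER-QTY",
    "WS-ORDER-QTY": "ORDERS.QUANTITY",
}


def navigation_path(graph, start, target):
    """Shortest screen-to-screen path via breadth-first search, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        if screen == target:
            path = []
            while screen is not None:
                path.append(screen)
                screen = previous[screen]
            return path[::-1]
        for next_screen in graph.get(screen, []):
            if next_screen not in previous:
                previous[next_screen] = screen
                queue.append(next_screen)
    return None


def trace(element, links):
    """Follow trace links from a GUI field towards the persistence layer (stops on cycles)."""
    chain = [element]
    while chain[-1] in links and links[chain[-1]] not in chain:
        chain.append(links[chain[-1]])
    return chain


print(" -> ".join(navigation_path(NAVIGATION, "LOGIN", "ORDER_CONFIRM")))
print(" -> ".join(trace("ORDER_ENTRY.QTY", TRACE_LINKS)))
```

In a real application, these graphs would be generated from the screen definitions and the program code rather than written by hand. The point is only that such questions become cheap to answer once the structure has been extracted.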
Data and data access
Data is undisputedly the hardest currency of our age. A key reason why legacy applications live so long is that these software systems house the largest and most valuable data assets of the modern enterprise. The key to successful modernization is therefore to decentralize these data pools and provide distributed access to them.
Accessing and managing these records can be boiled down to the five CRUDS access patterns:
• C: CREATE – Create new records
• R: READ – Read records
• U: UPDATE – Manipulate and edit records
• D: DELETE – Delete records
• S: SEARCH – Search and list records
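The following minimal sketch illustrates how the CRUDS patterns can be derived from code. It classifies embedded SQL statements, which it assumes have already been extracted from `EXEC SQL ... END-EXEC` blocks, and builds a matrix of operations per table. It is regex-based rather than a real SQL parser. Mapping cursor declarations to SEARCH and singleton SELECTs to READ is an assumption made for this sketch, not a standard. The statements and table names are hypothetical.

```python
import re
from collections import defaultdict

# Table name following INTO, UPDATE or FROM; host variables (":WS-...") are
# skipped because they start with a colon, not a letter.
TABLE_PATTERN = re.compile(r"\b(?:INTO|UPDATE|FROM)\s+([A-Z][A-Z0-9_]*)")
CRUDS_ORDER = "CRUDS?"


def normalize(statement):
    """Upper-case the statement and collapse whitespace."""
    return " ".join(statement.upper().split())


def cruds_category(statement):
    """Classify one embedded SQL statement by its leading keyword."""
    s = normalize(statement)
    if s.startswith("INSERT"):
        return "C"
    if s.startswith("UPDATE"):
        return "U"
    if s.startswith("DELETE"):
        return "D"
    if s.startswith("DECLARE") and " CURSOR " in s:
        return "S"  # assumption: cursors are used to search and list records
    if s.startswith("SELECT"):
        return "R"  # assumption: a singleton SELECT ... INTO reads one record
    return "?"


def cruds_matrix(statements):
    """Map each table to the CRUDS operations the program performs on it."""
    matrix = defaultdict(set)
    for statement in statements:
        category = cruds_category(statement)
        for table in TABLE_PATTERN.findall(normalize(statement)):
            matrix[table].add(category)
    return {table: "".join(sorted(ops, key=CRUDS_ORDER.index)) for table, ops in matrix.items()}


SAMPLE = [
    "SELECT NAME INTO :WS-NAME FROM CUSTOMER WHERE ID = :WS-ID",
    "DECLARE C1 CURSOR FOR SELECT ID, NAME FROM CUSTOMER",
    "INSERT INTO ORDERS (ID, QTY) VALUES (:WS-ID, :WS-QTY)",
    "UPDATE ORDERS SET QTY = :WS-QTY WHERE ID = :WS-ID",
]
print(cruds_matrix(SAMPLE))  # {'CUSTOMER': 'RS', 'ORDERS': 'CU'}
```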
The data structures implemented by a software application provide deep insights into its business logic and technical structure. The data persistence layer is as individual as the business use cases it implements. Detailed analyses and insights are therefore required to make evidence-based transformation decisions and to assess complexity and effort correctly. These data-centric views of the program content are of great importance, especially when getting to know a new legacy project or application. Errors can be found and fixed faster, and inefficiencies in the program structure can be eliminated. However, the data manipulations are not the only thing of interest; so is the flow of data through the application. Following data and data variables from the graphical user interface through the interfaces to the database is especially revealing. Another way to reduce the complexity of these applications is to reverse-engineer the actual database schema. This means the actual data usage, which can be derived directly from the program code. Classic technical data diagrams (DDL or entity-relationship models) list every declared attribute. An actual data model, by contrast, lists a data attribute only if the program actually manipulates it. The difference between the technical data model and the actual data model reveals inefficiencies and potential problems in data management. This view of data usage and manipulation is indispensable, especially for large replacement projects. It is equally important for projects that replace a legacy application module by module with service-oriented modules.
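The comparison between the technical and the actual data model can be sketched as two set differences. In this hypothetical example, `DECLARED` stands in for the DDL and `ACCESSES` for the column accesses an analysis tool might derive from the code. The table and column names are invented. Columns that are declared but never touched hint at dead data. Columns that are used but not declared hint at an outdated schema description.

```python
# Hypothetical technical data model, e.g. taken from the DDL.
DECLARED = {
    "CUSTOMER": {"ID", "NAME", "FAX", "TELEX"},
    "ORDERS": {"ID", "CUSTOMER_ID", "QTY", "LEGACY_FLAG"},
}

# Hypothetical column accesses derived from the program code:
# (table, column, CRUDS operation).
ACCESSES = [
    ("CUSTOMER", "ID", "R"),
    ("CUSTOMER", "NAME", "U"),
    ("CUSTOMER", "EMAIL", "R"),
    ("ORDERS", "ID", "C"),
    ("ORDERS", "CUSTOMER_ID", "C"),
    ("ORDERS", "QTY", "U"),
]


def actual_data_model(accesses, operations="CRUDS"):
    """Keep only the columns the code actually accesses with one of the given operations."""
    model = {}
    for table, column, operation in accesses:
        if operation in operations:
            model.setdefault(table, set()).add(column)
    return model


def difference(left, right):
    """Columns per table that appear in `left` but not in `right`."""
    result = {}
    for table, columns in left.items():
        missing = columns - right.get(table, set())
        if missing:
            result[table] = sorted(missing)
    return result


actual = actual_data_model(ACCESSES)
print("declared but never used:", difference(DECLARED, actual))
print("used but not declared:", difference(actual, DECLARED))
```

Passing `operations="CUD"` restricts the actual model to columns that are really manipulated (created, updated or deleted) rather than merely read.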
In today’s world, no software system stands alone on a green field. Quite the contrary: applications are more interwoven than a spider’s web. Data is constantly exchanged between systems and ends up in various data stores. A wide variety of services and software programs manipulate the data and calculate new results from it. Analyzing a single monolithic system without considering the surrounding systems would lead to fatal technical misjudgments and a massive underestimation of the effort. In addition, a great deal of specialist and business logic only arises through interaction with other systems. Examples include the exchange of user data with the HR system, the connection of ERP systems for financial postings and the exchange of customer data with CRM systems. For experienced legacy system programmers, the biggest challenge remains the large, monolithic, linearly programmed code. In practice, it is virtually impossible to extract the exchanged data and the existing interfaces manually in an acceptable time and with acceptable effort, let alone to derive technical processes from them. What is needed is an automated analysis of the entire program code of a legacy application that lists every interface to other systems actually implemented. Technology has evolved and changed massively over the past few decades. In their early days, these software systems exchanged data via files. With the advent of database and web technologies, interface behavior was adapted accordingly. Modern host systems use up-to-date interface architectures such as REST or web services. Even a complete list of all interaction points with peripheral systems offers considerable advantages and significantly reduces the effort in modernization and software development projects.
Troubleshooting and maintenance also become easier because the relevant interface can be found quickly, with drill-down options right down to the actual program code.
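A heavily simplified sketch of such an interface inventory follows. It scans COBOL source line by line for three example interface types:

• File assignments (`SELECT ... ASSIGN TO`)
• IBM MQ calls (`CALL 'MQPUT'` or `'MQGET'`)
• CICS web service requests (`EXEC CICS INVOKE SERVICE`)

The patterns, program names and sources are illustrative assumptions. A real analysis would parse the code, resolve copybooks and handle continuation lines. Each hit records the program and line number, which is what makes drill-down to the code possible.

```python
import re
from dataclasses import dataclass


@dataclass
class Interface:
    program: str
    line: int     # source line number, used for drill-down to the code
    kind: str
    target: str


# Simplified, illustrative patterns; they only match statements on a single line.
PATTERNS = {
    "file": re.compile(r"\bSELECT\s+[\w-]+\s+ASSIGN\s+TO\s+([\w-]+)", re.IGNORECASE),
    "message queue": re.compile(r"\bCALL\s+'(MQPUT|MQGET)'", re.IGNORECASE),
    "web service": re.compile(r"\bEXEC\s+CICS\s+INVOKE\s+SERVICE\s*\(\s*'([\w-]+)'", re.IGNORECASE),
}


def find_interfaces(program, source):
    """Return every pattern hit in the source, tagged with program and line number."""
    found = []
    for number, text in enumerate(source.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                found.append(Interface(program, number, kind, match.group(1)))
    return found


# Hypothetical batch and online programs.
SOURCES = {
    "PAYBATCH": """\
       FILE-CONTROL.
           SELECT PAYMENTS-IN ASSIGN TO PAYIN.
       PROCEDURE DIVISION.
           CALL 'MQPUT' USING HCONN HOBJ MSGDESC PMO BUFLEN BUFFER CC RC.
""",
    "CUSTONL": """\
       PROCEDURE DIVISION.
           EXEC CICS INVOKE SERVICE('CUSTOMER-LOOKUP')
                CHANNEL('CUST-CH') OPERATION('getCustomer')
           END-EXEC.
""",
}

inventory = [i for name, src in SOURCES.items() for i in find_interfaces(name, src)]
for item in inventory:
    print(f"{item.program}:{item.line}  {item.kind:<13} {item.target}")
```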
Whatever one reads, hears or writes about historical legacy applications, scheduled batch processing is at their heart. Online processing only emerged in the last few decades, and the core of these applications remained batch processing. Programmers trained in modern programming languages very often lack an understanding of batch processing, and often the will to grasp it fully. In addition, experience from countless projects shows that batch processing in particular is one of the best-kept secrets of a company’s senior software developers, who guard this knowledge like a treasure. Experience also shows that business analysts find it ten times harder to obtain scheduler information or the scheduler processes than the data diagram of a legacy application. As a result, it is very time-consuming and expensive to find out within an organization which time-controlled job runs in an application, when it runs and with which parameters. One reason may be that this knowledge is gradually disappearing from companies. Another is that getting to grips with time-controlled processing is far more complex than with classic user-controlled online processing. The separation into software development teams and software operations teams amplifies this information hiding. Besides the scheduler information declaring “when, what and how” a job is executed, the actual flow of the batch processing also matters, because that flow carries the business logic executed by these timed batch jobs. These systems often simulate online capability by running batch jobs that are triggered at short notice. A successful and complete modernization of these applications is only possible with a thorough understanding of the entire batch processing.
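The “when, what and how” of the scheduler information can be captured in a small data structure and ordered by dependency. This Python sketch assumes the job definitions have already been extracted from the scheduler. The jobs, programs and parameters are hypothetical. The sketch uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to group jobs into stages whose members could run in parallel.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class BatchJob:
    name: str
    when: str                                       # "when": schedule, kept as free text here
    program: str                                    # "what": program executed by the job
    parameters: dict = field(default_factory=dict)  # "how": run-time parameters
    after: tuple = ()                               # jobs that must finish first


# Hypothetical nightly payment run.
JOBS = [
    BatchJob("EXTRACT", "daily 22:00", "PAYEXTR", {"RUNDATE": "CURRENT"}),
    BatchJob("VALIDATE", "after EXTRACT", "PAYVAL", after=("EXTRACT",)),
    BatchJob("POST", "after VALIDATE", "PAYPOST", {"MODE": "FINAL"}, after=("VALIDATE",)),
    BatchJob("REPORT", "after VALIDATE", "PAYRPT", after=("VALIDATE",)),
]


def execution_stages(jobs):
    """Group jobs into stages; jobs within one stage do not depend on each other."""
    sorter = TopologicalSorter({job.name: set(job.after) for job in jobs})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


by_name = {job.name: job for job in JOBS}
for number, stage in enumerate(execution_stages(JOBS), start=1):
    for name in stage:
        job = by_name[name]
        print(f"stage {number}: {name} runs {job.program} ({job.when}) with {job.parameters}")
```

Real schedulers also handle calendars, conditions and restart rules, which this sketch deliberately leaves out.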
Code Analytics & Software Insights
Software makes decisions based on programmed logic and the underlying data. Digitization projects and large software replacement projects should proceed in a similar way and base their decisions on a validated modernization methodology and on analyzed data. The methodology plays an important role here. A new phase in modernization projects is the discovery phase, in which the legacy system is analyzed and documented completely and automatically. Key figures are also extracted for all important areas of the monolithic software, and these indicate the effort and complexity of a new development. Management decisions are thus based on data and reflect reality. Errors are avoided, and the business-critical areas of the application are prioritized correctly. Transparency is achieved when complete software documentation is extended with tangible numbers, data and facts.
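As an example of such key figures, the sketch below computes two values per COBOL program: lines of code and a rough McCabe-style complexity (the number of decision points plus one). It then ranks the programs by complexity. Counting `IF`, `WHEN` (without `WHEN OTHER`) and `UNTIL` keywords is a simplifying assumption, not an exact control-flow analysis. Keywords inside string literals, for example, would also be counted. Program names and sources are hypothetical.

```python
import re

# Decision keywords counted as branch points: IF, WHEN (except WHEN OTHER) and UNTIL.
# The look-arounds stop END-IF or names such as BOOK-SWIFT from being counted.
DECISION = re.compile(r"(?<![\w-])(?:IF|UNTIL|WHEN(?!\s+OTHER\b))(?![\w-])", re.IGNORECASE)


def program_metrics(source):
    """Rough key figures for one fixed-format COBOL program."""
    code_lines = [
        line for line in source.splitlines()
        if line.strip() and not (len(line) > 6 and line[6] in "*/")  # skip comment lines
    ]
    decisions = sum(len(DECISION.findall(line)) for line in code_lines)
    return {"loc": len(code_lines), "decisions": decisions, "complexity": decisions + 1}


def rank_by_complexity(programs):
    """Sort programs so that the most complex ones come first."""
    metrics = {name: program_metrics(source) for name, source in programs.items()}
    return sorted(metrics.items(), key=lambda item: item[1]["complexity"], reverse=True)


PROGRAMS = {
    "PAYCHECK": """\
      * Hypothetical payment check
       PROCEDURE DIVISION.
           PERFORM UNTIL WS-EOF = 'Y'
               READ PAYMENTS-IN AT END MOVE 'Y' TO WS-EOF END-READ
               IF PAY-AMOUNT > 10000
                   PERFORM MANUAL-CHECK
               END-IF
               EVALUATE PAY-TYPE
                   WHEN 'SEPA'  PERFORM BOOK-SEPA
                   WHEN 'SWIFT' PERFORM BOOK-SWIFT
                   WHEN OTHER   PERFORM REJECT-PAYMENT
               END-EVALUATE
           END-PERFORM.
""",
    "TINY": "       PROCEDURE DIVISION.\n           DISPLAY 'HELLO'.\n",
}

for name, figures in rank_by_complexity(PROGRAMS):
    print(name, figures)
```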
How can you make software decisions transparent and comprehensible for everyone?
Business and social decisions in our digitized world are increasingly made by software algorithms. Managers, employees and everyone else in our society must be able to understand these decisions. Only fully automated documentation and processing of software algorithms makes technical decisions and programmed logic transparent in this way. An adaptive presentation lets the different target groups understand the decisions and draw their own conclusions. Decisions can be represented by mathematical formulas, decision tables or graphical process flows. Which methods and representations are relevant depends on the application.
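Once a rule has been extracted from the code, a decision table can be generated mechanically by evaluating the rule for every combination of its conditions. The rule below (`approve_payment`) is a hypothetical example of logic recovered from nested `IF` statements. This enumeration approach only scales to a handful of Boolean conditions, and real tools would merge rows with identical outcomes.

```python
from itertools import product

CONDITIONS = ["over_limit", "known_customer", "blocked"]


def approve_payment(over_limit, known_customer, blocked):
    """Hypothetical rule, as it might be recovered from nested IF statements."""
    if blocked:
        return "REJECT"
    if over_limit and not known_customer:
        return "MANUAL CHECK"
    return "APPROVE"


def decision_table(rule, conditions):
    """Evaluate the rule for every combination of condition values."""
    return [(values, rule(*values)) for values in product([False, True], repeat=len(conditions))]


def render(rows, conditions):
    """Plain-text table: Y/N per condition plus the resulting action."""
    header = " | ".join(conditions + ["action"])
    lines = [header, "-" * len(header)]
    for values, action in rows:
        cells = [("Y" if value else "N").ljust(len(name)) for value, name in zip(values, conditions)]
        lines.append(" | ".join(cells + [action]))
    return "\n".join(lines)


print(render(decision_table(approve_payment, CONDITIONS), CONDITIONS))
```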
Every legacy system is individual. The structure of a software system is comparable in complexity to a physical building complex, which can range from a small family house to a skyscraper in one of the world’s big cities. Depending on their purpose, the architecture and structure of these buildings differ significantly, and the same holds for software systems. This is why large software systems need automated documentation that is adapted to the individual system. In addition, the current challenges faced by software development teams vary from project to project. A one hundred percent standardized software documentation meets the individual requirements of these teams only to a small degree. Experience shows that a suitable documentation framework, such as Sysparency, can be adapted with little effort to answer the important and individual questions of the software development teams to their satisfaction. This individuality also depends on the life cycle of the legacy application. A replacement project needs a different type of documentation than compliance requires for a running core banking application. Consider an application that is very buggy and heavily integrated into a software network. It needs more powerful methods and analysis procedures to find the faulty program points and track their dependencies. The requirements for presenting the documentation also differ. Two things must be configured individually for each large legacy application: how the complexity is represented graphically, and which semantic information is extracted from the program code.
Over the last ten years, the Software Competence Center Hagenberg and Sysparency have jointly researched the optimal framework for analyzing historically grown legacy applications and automatically generating their documentation. The result is the Sysparency Transparency product. Its algorithms process a wide variety of programming languages and use them to build abstract digital models of the software application. This abstract model is analyzed from different perspectives, and the respective information is generated as documentation in the desired form. This lets software developers quickly get to know a new legacy application. It lets managers make the right modernization decisions quickly, and it lets specialist departments understand the business logic that has already been implemented. All of this contributes to successful digitization and modernization projects and saves time and resources.
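The idea of one abstract model viewed from several perspectives can be illustrated with a minimal sketch. This is not how Sysparency Transparency works internally, since the text does not describe that. The sketch simply assumes that the model is a list of typed relations, all invented here, and that each audience’s view is a filter over relation kinds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: str  # "navigates", "calls", "reads" or "writes"


# Hypothetical abstract model of one application, as a parser might produce it.
MODEL = [
    Relation("MAIN_MENU", "ORDER_ENTRY", "navigates"),
    Relation("ORDER_ENTRY", "ORDPROC", "calls"),
    Relation("ORDPROC", "ORDERS", "writes"),
    Relation("ORDPROC", "CUSTOMER", "reads"),
]

# Each audience sees the same model through a different filter.
VIEWS = {
    "developer": {"calls", "reads", "writes"},
    "data": {"reads", "writes"},
    "business": {"navigates"},
}


def render_view(model, audience):
    """List the relations relevant to one audience as readable arrows."""
    kinds = VIEWS[audience]
    return [f"{r.source} --{r.kind}--> {r.target}" for r in model if r.kind in kinds]


for audience in VIEWS:
    print(audience, render_view(MODEL, audience))
```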