Agile DORA compliance?
The Digital Operational Resilience Act (DORA) forms the legal framework for the digital resilience of financial institutions. De Nederlandsche Bank (DNB) supervises compliance with it in the Netherlands. One of the focus areas of that supervision will be identifying and managing geopolitical risks. DNB also clearly indicates that it wants to look more at how resilience functions in practice than at how it looks on paper. So how can we achieve that and set up real compliance?
The Toyota management principle of genchi genbutsu (go and see for yourself to thoroughly understand the situation) creates clarity. It makes transparent to management where the bottlenecks are, and where processes are harmful instead of helpful. I've seen that work very well in different areas a number of times, and we want to apply it to DORA too.
Kent Beck has a nice story about how he helps companies really understand fast feedback cycles and agility. He gets all the development teams and their management together and asks them to make a simple one-line change. Then they go through the whole process of getting that change to production, and everyone is asked to work only on things that help make that better and faster. That creates a situation where everyone can see the bottleneck. Once it is fixed, the next one becomes visible when going through the process again. Repeating this creates the needed results.
I saw a similar story play out well a long time ago when working at energy company Essent. It was having huge problems handling customers who moved house. Lots of mistakes and incorrect bills were reported, and these even made it onto the national television consumer programs. Only when a member of the board finally came and looked at how a simple move required an employee to make changes on over 40 different pages in the then-new SAP system did the problem get the attention and focus it needed. Budget and time were made available to create the customization that made moves easy to handle. That resolved the problem.
So how can we apply this to DORA? DNB explicitly identifies a number of outsourcing risks, among others:
- a limited number of third-party ICT service providers
- market domination by three major cloud providers
- concentration in just a few AI application providers
A geopolitical risk we have seen play out in the past few years is the US government deciding not to allow US companies to do business with an organization it does not like. The International Criminal Court is the clearest example of that: Microsoft was not allowed to provide email and other services. Given the demonstrated instability in international trade policies and the threats against allied nations, financial institutions governed by DORA should be considered to be at comparable risk.
DORA requires financial institutions to perform digital operational resilience testing. The large cloud companies have done substantial work on creating operational resilience, and the most famous tool for that is Chaos Monkey. It tests how well a system recovers from the removal of a (server) instance. The major financial institutions have implemented this for their cloud-based solutions, along with some variants at a larger scale.
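To make the idea of instance-removal testing concrete, here is a minimal sketch in Python. It is a toy simulation, not the actual Chaos Monkey tool or its API: the `ServicePool` class, the `chaos_round` function, and the instance names are all hypothetical and exist only for illustration.

```python
import random


class ServicePool:
    """Hypothetical pool of identical instances behind a load balancer."""

    def __init__(self, instance_ids, min_healthy):
        self.instances = set(instance_ids)
        self.min_healthy = min_healthy  # instances needed to keep serving

    def terminate(self, instance_id):
        # Simulate the abrupt loss of one instance.
        self.instances.discard(instance_id)

    def is_serving(self):
        return len(self.instances) >= self.min_healthy

    def heal(self, target_size):
        # Simulate an autoscaler that replaces lost instances.
        counter = 0
        while len(self.instances) < target_size:
            self.instances.add(f"replacement-{counter}")
            counter += 1


def chaos_round(pool, target_size, rng):
    """Remove one random instance and report whether the service survived."""
    victim = rng.choice(sorted(pool.instances))
    pool.terminate(victim)
    survived = pool.is_serving()
    pool.heal(target_size)
    return victim, survived


rng = random.Random(42)

# A pool with spare capacity keeps serving through every round.
pool = ServicePool(["a", "b", "c"], min_healthy=2)
results = [chaos_round(pool, target_size=3, rng=rng) for _ in range(5)]
all_survived = all(survived for _, survived in results)

# A pool without spare capacity fails as soon as one instance is removed.
fragile = ServicePool(["x"], min_healthy=1)
_, fragile_survived = chaos_round(fragile, target_size=1, rng=rng)
```

The point of the sketch is the loop itself: remove something, observe whether the service still works, restore, repeat. A real test does this against running infrastructure rather than an in-memory set.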
Testing resilience against the scenario in which US companies are no longer allowed to do business with a financial institution, however, requires much more, among others:
- cloud computing moved fully away from Microsoft Azure, Google Cloud, Amazon Web Services and Oracle Cloud Infrastructure;
- AI running locally or using open models;
- only very few of the third-party ICT service providers can be expected to remain available (most are US-owned);
- source availability so necessary changes can be made.
This probably does not leave many systems running and fully operational, which makes it a good starting point for an agile approach to building up resilience:
- what can we still deliver when we apply these limitations?
- what is the smallest improvement we can make that helps us deliver more of what we critically need to deliver?
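These two questions can be asked of a simple dependency inventory. The Python sketch below is only an illustration under stated assumptions: the service and dependency names are invented, the `usable_in_scenario` flag is assumed to come from your own vendor review (ownership, hosting, source availability), and "smallest improvement" is simplified to the single blocked dependency whose replacement alone restores the most critical services. Real effort and cost are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    usable_in_scenario: bool  # assumption: outcome of your own vendor review


@dataclass(frozen=True)
class Service:
    name: str
    critical: bool
    depends_on: tuple


def deliverable(services, deps):
    """Services whose every dependency is still usable in the scenario."""
    usable = {d.name for d in deps if d.usable_in_scenario}
    return [s.name for s in services if set(s.depends_on) <= usable]


def smallest_improvement(services, deps):
    """Blocked dependency whose replacement alone restores most critical services."""
    usable = {d.name for d in deps if d.usable_in_scenario}
    best, best_gain = None, 0
    for dep in deps:
        if dep.usable_in_scenario:
            continue
        trial = usable | {dep.name}
        gain = sum(
            1
            for s in services
            if s.critical
            and not set(s.depends_on) <= usable  # blocked today
            and set(s.depends_on) <= trial  # unblocked by this one fix
        )
        if gain > best_gain:
            best, best_gain = dep.name, gain
    return best, best_gain


# Hypothetical inventory, for illustration only.
deps = [
    Dependency("us-cloud", usable_in_scenario=False),
    Dependency("us-ai-api", usable_in_scenario=False),
    Dependency("eu-datacenter", usable_in_scenario=True),
    Dependency("core-ledger", usable_in_scenario=True),
]
services = [
    Service("payments", True, ("eu-datacenter", "core-ledger")),
    Service("online-banking", True, ("us-cloud", "core-ledger")),
    Service("fraud-detection", True, ("us-cloud", "us-ai-api")),
    Service("chatbot", False, ("us-ai-api",)),
]

still_deliverable = deliverable(services, deps)
next_step = smallest_improvement(services, deps)
```

With this invented inventory, only `payments` remains deliverable, and finding an alternative for `us-cloud` is the single step that restores a critical service. The value lies less in the code than in filling the inventory with what is really running, which is where going to see for yourself comes in.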
Applying genchi genbutsu in this way allows us to build a cycle of incremental and iterative improvements that can result in real resilience and real compliance.
The resistance to testing this on the real production systems will tell most organizations a lot about how ready they are for real compliance.