When I began my software career back in the 90s, one of the software-adjacent skills I discovered I needed was risk management. When you’re building anything non-trivial, it’s likely that something will go wrong. How do you find the right balance between analysis paralysis and blindly charging ahead? How do you know what deserves your attention today, and what can be safely put on the back burner and monitored as needed?
This is where risk management comes in.
In this context, a risk is something that could go wrong and that would impact your work if it did. Risk management is the process of identifying risks and deciding what to do about them.
One simple and lightweight approach for risk management involves looking at two factors: risk likelihood, and risk impact.
Risk likelihood is just what it sounds like: how likely is the risk to occur. Once you’re aware that a risk exists, you can measure or estimate how likely that risk is to be realized. In many situations an educated guess is good enough. You don’t need to have a perfectly accurate number – you just need a number that no key stakeholders disagree with too much. Rather than assigning a percentage value I prefer to use a simple 1-10 scale. This helps make it clear that it’s just an approximation, and can help prevent unproductive discussions about whether a given risk is 25% likely or 26% likely.
Risk impact is also what it sounds like: how bad would it be if the risk did occur? I also like to use a simple 1-10 scale for measuring risk impact, which is more obviously subjective than the risk likelihood. So long as everyone who needs to agree agrees that the impact of a given risk is 3 or 4 or whatever, that’s what matters.
Once you have identified risks and assigned impact and likelihood values to each one, multiply them together to get a risk score from 1 to 100. Sort your list by this score and you have a prioritized starting point for risk mitigation.
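As a minimal sketch of that scoring step, the snippet below multiplies likelihood by impact and sorts the results. The `Risk` class, the `prioritize` function, the risk names, and every number in it are hypothetical placeholders made up for illustration, not values from a real project.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One identified risk, rated on the 1-10 scales described above."""
    name: str
    likelihood: int  # 1 (very unlikely) to 10 (almost certain)
    impact: int      # 1 (barely noticeable) to 10 (catastrophic)

    def __post_init__(self):
        # Reject values outside the agreed 1-10 scale.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def score(self) -> int:
        # Risk score = likelihood x impact, so it ranges from 1 to 100.
        return self.likelihood * self.impact


def prioritize(risks):
    """Return the risks sorted from highest score to lowest."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Placeholder risks and ratings, invented purely for illustration.
risks = [
    Risk("Example risk A", likelihood=3, impact=8),
    Risk("Example risk B", likelihood=7, impact=5),
    Risk("Example risk C", likelihood=4, impact=6),
]

for risk in prioritize(risks):
    print(f"{risk.score:>3}  {risk.name}")
```

Risks with equal scores keep their original order, so a tie is a prompt for discussion rather than a ranking.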
Risk mitigation generally falls into one or more of these buckets:
- Risk prevention – you take proactive steps to reduce the likelihood of the risk occurring.
- Risk preparation – you take proactive steps to plan for how you’ll respond to reduce the impact if the risk does occur.
For risks with high risk scores, you’ll probably want to do both: you’ll take steps to make the risk less likely to occur, and you’ll take steps to be ready in case it still does.
Here are a few examples of risks that might be identified when performing risk management for a BI project, along with examples of how each might be mitigated:
- Risk: A database server might be unavailable due to hardware failure, thus interrupting business operations
  - Possible prevention: Purchase and configure server hardware with redundant storage and other subsystems
  - Possible preparation: Define and test a business continuity and disaster recovery plan for recovering the database server
- Risk: You might not get permissions to access a data source in a timely manner
  - Possible prevention: Prioritize getting access to all required data sources before committing to the project
  - Possible preparation: Identify an executive sponsor and escalation path for situations where additional data source access is required
- Risk: A key team member might leave the team or the company
  - Possible prevention: Work to keep the team member happy and engaged
  - Possible preparation: Cross-train other members of your team to minimize the impact if that key member moves on
- Risk: Your data center might lose power for a week
  - Possible prevention: Locate the data center in a facility with redundant power and a reliable power grid
  - Possible preparation: Purchase and install generators and fuel reserves
- Risk: Your data center location might be destroyed by a giant meteor
  - Possible prevention: Um… nothing leaps to mind for this one
  - Possible preparation: Again um, but maybe using a geo-distributed database like Azure Cosmos DB to ensure that the destruction of one data center doesn’t result in downtime or data loss?
You get the idea. I’m not going to assign likelihood or impact values to these hypothetical risks, but you can see how some are more likely than others, and some have a higher potential impact.
Now let’s get back to a question posed at the top of the post: how do you find the right balance between analysis paralysis and blindly charging ahead?
Even in simple contexts, it’s not possible to eliminate risk. Insisting that a mitigation strategy must eliminate a risk, rather than merely reduce it, is ineffective and counterproductive. It’s not useful or rational to refuse to get in a car because of the statistical risk of getting injured or killed in a collision. Instead, we wear seat belts and drive safely to find a balance.
And this is kind of what inspired this post:
The “perfect or nothing” mindset isn’t effective or appropriate for the real world. Choosing to do nothing because there isn’t an available perfect solution that eliminates a risk is simply willful ignorance.
Most real-world problems don’t have perfect solutions because the real world is complex. Instead of looking for perfect solutions we look for solutions that provide the right tradeoff between cost and benefit. We implement those pragmatic solutions and we keep our eyes open, both for changes to the risks we face and to the possibility of new mitigations we might consider.
Whether or not risk management is a formal part of your development processes, thinking about risks and how you will mitigate them will help you ensure you’re not taken by surprise as often when things go wrong… as they inevitably do…
 Yes, I’m linking to a Wikipedia article for a technical topic. It’s surprisingly useful for an introduction, and any search engine you choose can help you find lots of articles that are likely to be more targeted and useful if you have a specific scenario in mind.
 This is the only approach to risk management that will be shared in this article. If you want something more involved or specialized, you’ll need to look elsewhere… perhaps starting with the Wikipedia article shared earlier, and following the links that sound interesting.
 If you are in a situation where “good enough” isn’t good enough, you’ll probably want to read more than just this introductory blog post. Are you starting to see a trend in these footnotes?
 That Wikipedia article takes a slightly different approach (direct link to section) but there’s a lot of overlap as well. What I describe above as “risk prevention” aligns most with their “risk reduction” and my “risk preparation” aligns most with their “risk retention” even though they’re not exact matches.
 The other BCDR.
 I had originally included the “giant meteor strike” risk as an example of things you couldn’t effectively mitigate, but then I remembered how easy Cosmos DB makes it to implement global data distribution. This made me realize how the other technical risks are also largely mitigated by using a managed cloud service… and this in turn made me realize how long ago I learned about mitigating risks for data projects. Anyway, at that point I wasn’t going back to pick different examples…
 However you want to measure that cost – money, time, effort, or some other metric.