The focus of every manager / executive's concern has to be risk. Every technology project has risks, and some have more than others. Here is a sampler of risk types...
- New technology
- New platform
- New application area
- New architecture
- Unexpected bugs in the substrate
- Requirements changes
- Scope creep
- Budget shrinkage
- Budget expansion (yes, this is a risk)
- Management loses confidence
- Operational personnel resistance to changes
- Inflexible design cannot meet minor scope / requirement changes
- Design has a strong and comprehensive critical path
- Thrashing the design during development
- Neglect of key ethnographic factors in the user population
- Unexpected implementation fan-out
- Unexpected component interdependencies
- Wrong decision on buy vs. build
- Wrong priorities
- Bad morale
- Unconsolidated team
- Loss of a key performer
- Poor development or debugging styles
- Deathmarch "get it done, no matter how" mentality
- Premature optimization mindset
Project management risks
- Poor reporting leads to surprise delays
- Detailed task oriented scheduling leads to massive PM overhead
- Excessive coupling between independent development teams
- Too many meetings
- Meetings too large, poorly focused
- Too little communication
- Too much communication (mythical man-month)
- Not enough knowledge management
- Failing to identify the actual customers of the system (who they are may surprise you)
Almost every one of these risks can be mitigated by the right management style and the proper due diligence in investigation and design. Others can be mitigated by pilot projects and proof-of-concept implementations. And a few are uncontrollable, but can be dealt with using a "SWAT team" strategy. Let's have a look...
Dealing with Risk
Risk is something you need to deal with right from the beginning. When you have a concern about a component or a platform, you need to do testing of those concerns before attempting to schedule a project. When you have concerns about personnel or project management, you need to select a style and a methodology that will suit the project and stick with it.
Above all, you need to be consistent. The appearance of a risk factor as a real problem can be intimidating and disruptive, but it will be even more damaging if you allow it to throw chaos into your project. In almost every case, your design, your methodology and your management style, if properly selected, will be resilient enough to stand the strain - but if you throw them away or radically change them to deal with a problem, you will introduce a whole new set of risks - including personnel risks - that may be impossible to correct.
Mitigation in General
There are a few general strategies that can mitigate risk...
Know your customers well.
Use an incremental and evolutionary methodology to keep the rhythm of delivery at the right tempo.
Build a proof of concept for anything worrisome, and build it right away. To mitigate concerns by management that you are wasting time, try to design the proof of concept so it can be evolved into the actual system, or so that its components can be reused.
Keep your team on the "rested edge" not the "ragged edge".
Never let the politics filter down into the team.
Let the customers steer, but provide the road, and make sure you are driving.
These are the kinds of risks you have when something in your project is new. In general, at least one of these factors is in play. While these can be the easiest factors to mitigate, the obstacle can lie in whether your management will allow you to do so. And mitigation will take time - in many ways, these are the most costly risks to mitigate, because they almost always require some level of up front development and testing.
Take the time. It will be worth it. If these risks suddenly manifest themselves as real problems once your project has gained momentum, you run the worse risk of destroying momentum, morale, and confidence.
And, you need to always be asking - do we really need this? Is this new "whatever" worth the risk right now?
For instance, do you really need to migrate to a new IDE in the middle of your project? Do you really need the latest and greatest version of the operating system? Do you have to have a new messaging middleware product as a critical success factor?
Let's look at specific mitigations...
New technology - If this is a purchased technology, then get an on-site test with your teams involved written into the RFP. Don't buy a new technology without having used it. Make sure the contract has a clause that allows return of the product and refund of the money in the event the product cannot be used for the intended purpose, but don't necessarily expect to recover installation expenses. If this is a purchased or existing technology that is new to your team, do a pilot project and develop written procedures to document how to use the technology.
New platform - If you are going to be working on a new hardware or operating system platform, as with a new technology, do some testing first. Here, your concerns will probably be compatibility with the way you develop systems, performance issues, and potentially disabling bugs. A search of trade press, newsgroups, and other resources can often alert you to the potential for problems and suggest what you need to test.
New application area - This is common. It requires that the team as a whole spend a fair amount of time not only learning in a "book sense" about the new area, but that they spend time talking to and hearing from the subject matter experts who represent the customers. They need to internalize the concerns and critical issues that are part of the application area so that when they make detailed design and implementation decisions they won't violate well-founded expectations.
New architecture - This is perhaps the most common emergent risk. Unless you are fortunate enough to have a well-developed set of layered components to build on, you will be simultaneously developing a new application and an architecture to underlie it. Getting the architecture wrong is a great danger, and the temptation can be to put all resources onto developing the new architecture to see if any problems will emerge. The error in this, however, is assuming that the problems will surface in the architecture taken in isolation. They won't. The problems will only surface when the architecture is used to develop the application. This requires a parallel and iterative approach in analysis, design, and implementation, and is one of the primary reasons for an intimate relationship between the architect and the developers.
Unexpected bugs in the substrate - There are few, if any, up-front mitigations for this risk. Instead, make sure that project management has left time in the scheduling of your key performers who can best handle unexpected bugs as they emerge. Their role in that event will be to identify if a bug is in fact a bug and to develop a workaround for the bug. Keep in mind that the performers who can do this have special qualities and need to be carefully selected. You won't be able to keep them floating around waiting for this to happen, so make sure they have the personality that allows them to easily multitask complex issues, and to not resent interruption.
These are the risks that every manager fears the most, though, in many ways, they are the easiest to address.
However, mitigation requires a lot of knowledge management and communication between you and the customers. And one of those customers will be your management. Keep in mind that if these risks get out of control they will feed on each other and throw your project into a death spiral. (Most aircraft manuals talk about the risk of a spin - "if you cannot recover within n rotations, structural failure will occur" - that is even more true for software projects).
The primary mitigation for all of these is a regular tempo of evolutionary releases. The challenge is to have the releases contain enough useful capabilities to demonstrate success and curb the appetite of the customers, to have them be independent enough that you can reschedule them as priorities require, and have them be small enough that the team can easily complete them within the time frame.
On a particularly aggressive project, you can use staging to accomplish a neat tempo while leveraging parallel teams to a maximum extent, as shown in this diagram...
Here, the development team combines responding to test issues for the components in the first feature set with preparing to develop the next feature. Once the components have passed unit and integration testing, the developers move on to the next component set, while the integration and release teams go on to the first release of the first feature. Naturally, there may be issues which arise to demand the attention of the development team for Feature 1 while working on Feature 2, but if the testing process has been handled properly, these should be few and far between.
Using a small release / evolutionary process generally means that scope creep can be reduced. The issues of scope creep can be confined to "what will go in the next release" or the following releases, and can be made an issue for prioritization meetings. But they will not disrupt team progress and will not have the cascading effects of damaging morale, lowering performance, delaying releases and thus damaging management confidence.
In addition, this reduces the problems of budget shrinkage. Reprioritization of follow-on releases can allow flexibility in dealing with budget variations while avoiding disruption of the in-progress release.
However, budget enlargement is a different issue. This usually happens when management wants the project to go faster and asks - "couldn't you use more people" or "how about cool tool 'x'?". The temptation is to take these additional resources and use them, but be cautious. The first additional resource (people) can generate the "unconsolidated team" risk (discussed below) and the "too much communication" risk (also discussed below). The second additional resource (cool tools) can introduce the emergent risks of new technologies or new platforms, or could even impact the architecture. Your first inclination should be to turn down unplanned resources, but keep in mind that this introduces a risk of the "loss of management confidence", so you must be prepared to explain your rationale in detail. And you need to make sure that you can succeed, because if you fail when you have turned down additional resources, management will blame your project management style for the failure, even if a million other factors were in play.
A final area of external risk lies in the potential for resistance from some of your customers. Yes, some customers do not want you to successfully complete the project, or at least, they don't want the pain of dealing with its introduction. There can be many causes...
Operational personnel fear that the introduction of the new system will disrupt their productivity, and that they will be held responsible for the loss. Mitigation: Involve them early, have a minimally disruptive training plan during the pre-release period, make sure their management understands that the transition period will typically cause a productivity downturn, and that their management expresses this understanding to them. A transition committee, composed of management, operational management, operational personnel, and technical team members can help to make this risk much smaller.
Management is imposing this system on operational personnel in an effort to gain control over some aspect of operations they consider an issue. You are now between the proverbial rock and hard place. Even when you take the mitigation steps outlined above, you can expect there to be resentment over the system. The best mitigation: Do the above, and, additionally, make sure that the operational personnel are aware that you are sympathetic to their concerns. If your analysis reveals issues in the operational processes, work with operational management to help develop processes, procedures, and system features to mitigate those issues, and involve the operational personnel deeply in that process.
You have missed some important tacit knowledge in the operational area or you have failed to address it with the personnel who know you have missed it. Mitigation: study any affected operational personnel (that is, direct end users) in their workplace during the early phases of analysis, and make sure to review the analysis with the end users. This technique, called ethnographic study, is an important human factors tool and needs to be part of every analysis and design process.
You or your area are hated. This can be due to past history, a personality conflict, a power struggle between your management and the user area management or any number of other reasons. The only mitigation is to be a nice guy, be sympathetic to all parties, and take all of the other mitigation steps listed above to demonstrate your intent. And don't forget to have a thick skin.
These risks are fairly easily controlled, though successfully controlling them requires a deep design maturity. The designer must design the system from the outset for excellent maintainability. In addition, the architect must identify and isolate the "hinges" and "hot spots" that can be expected to show the highest "mutation rate". The following diagram, while showing the mutation rate for genes in a virus, demonstrates a pattern that is also true for every software system:
From "Biology for the Modern Mind", H. Bogen, 1968
You can see how most of the system has a low mutation rate, but one spot has a massively rapid mutation rate, and four or five have a moderate rate. Identification and isolation of these hot spots means that the system is not only maintainable post-release, but it is also maintainable pre-release. Thus, the impact of requirements changes can be mitigated by considering them to be maintenance issues. Several architectural strategies are relevant...
Externalize - get potential issues out of the code and into configuration files and databases that can be quickly modified, perhaps even by end users.
Enginize - think of the code as an engine with plug-ins to control its behavior. As long as the control structure into which the plug-ins fit is sufficiently universal, all of the hot spots can be inside plug-ins that have a very controllable complexity.
Use factories - In object-oriented systems, you can isolate hotspots that might require the use of a variety of subclasses by making sure factories are used to produce the critical instances. Thus, subclass selection is isolated to the factory and can be easily inspected and altered without damage to the rest of the system. Note also that there is nothing wrong with using a factory factory (no, that's not a typo) to produce a factory of the appropriate subclass, though this is comparatively rare.
Use facades or layers to isolate third party components.
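As a rough illustration of how the first three strategies can combine, here is a minimal Python sketch. Everything in it is hypothetical and invented for the example (the discount-rule domain, the names DiscountRule, make_rule, price, and the JSON layout); it is one possible arrangement, not a prescribed design. The hot spot (which rule applies, and with what parameters) lives in externalized configuration, a factory is the only place that knows the subclasses, and the "engine" keeps a stable control flow.

```python
import json
from abc import ABC, abstractmethod


class DiscountRule(ABC):
    """Plug-in contract: the engine knows only this interface (hypothetical)."""

    @abstractmethod
    def apply(self, amount: float) -> float:
        ...


class NoDiscount(DiscountRule):
    def apply(self, amount: float) -> float:
        return amount


class PercentDiscount(DiscountRule):
    def __init__(self, percent: float) -> None:
        self.percent = percent

    def apply(self, amount: float) -> float:
        return amount * (1 - self.percent / 100)


# Factory: subclass selection is isolated here and nowhere else.
_RULES = {"none": NoDiscount, "percent": PercentDiscount}


def make_rule(config: dict) -> DiscountRule:
    """Build the plug-in named by the configuration."""
    return _RULES[config["kind"]](**config.get("params", {}))


# Externalized hot spot. In practice this text would come from a
# configuration file or database; it is inlined here only so the
# sketch is self-contained.
CONFIG_TEXT = '{"kind": "percent", "params": {"percent": 10}}'


def price(amount: float, config_text: str = CONFIG_TEXT) -> float:
    """The 'engine': stable control flow, behavior supplied by the plug-in."""
    rule = make_rule(json.loads(config_text))
    return round(rule.apply(amount), 2)
```

With this arrangement, a requirement change such as "make it 15 percent" or "drop the discount" becomes an edit to the configuration text, and a new kind of rule means one new subclass plus one new entry in the factory table.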
Other, more traditional practices, such as proper encapsulation, design of interfaces, etc. must also be a key part of change insulation. Again, keep in mind that this maintainability is key to the success of the evolutionary process, and does not just affect the post-release maintenance of the system, but also how quickly and safely you can react to requirement and scope changes in the pre-release period as well.
You also need to design so that there can be as much parallelism in development as possible. The ability to define interfaces early and a management requirement to construct test harnesses and simulators is key to accomplishing this. Done properly, this almost trivializes the integration effort and allows for easy regression and performance testing of all levels of component.
Finally, do not thrash the design during development. Once the coding starts, the team needs to retain confidence in the design, and this cannot happen if the architecture is in flux during the development process. This means that the architects must do whatever is necessary to ensure that the architecture can handle the demands of implementation and that management must resist the temptation to "do anything, but get it done faster" - a pressure that can seem unendurable, and which, in particularly bad environments where expectations have not been set to realistic levels, can lead to termination of employment for people at a variety of levels. In other words - managing expectations is critical to prevent pressure to change the design after implementation has started.
These can be controlled but not eliminated...
Unexpected implementation fan-out - This is the discovery that an implementation is much, much more complex than expected. The risk can be mitigated by pilot projects, proof-of-concept, and by early warning. SWAT team reserves need to be available in the event this gets to the early warning stage, and you may need to take away a component that is fanning out and give it to a stronger team member or to a subteam. This can be a delicate personnel problem, so be careful in doing so. Having the original developer as an advisor to the team can help, but that can also be worse, if the SWAT team discovers major developer-induced defects. A careful management attention to morale all the way around in this situation is critical to maintaining the energy and performance of the affected team member and colleagues.
Unexpected component interdependencies - This should never happen, but if it does, the design has not been carried down to a sufficiently deep level. The best thing to do in this situation is to create a subteam of those with the dependent components, require them to create test harnesses and simulators, and then let them go their separate ways until integration testing.
Wrong decision on buy vs. build - Buying when you should build and building when you should buy are common mistakes. Generally, the guideline is to make sure you have as much control as possible over the strategically differentiating components of your applications. You also need stable infrastructure and frameworks. If you can create a strategic differentiator from a unique combination of purchased components, then so be it, but be sure it will last long enough (the companies will not go out of business or sell off the product, or go off on a tangent). If you cannot get a stable architecture from purchased components or frameworks, or if the pain of adapting to the model they represent is too high, do the work inside - assuming you have the appropriate skills and resources available. But keep in mind the long-term commitment of personnel and resources, and the potential opportunity cost of any expenditure.
Wrong priorities - Management must be constantly ahead of the development / acquisition teams, assessing the priorities of next steps in the light of rapidly changing business conditions. However, priority thrashing, often disparaged by software professionals as "marketing-run development", can be highly dangerous to project continuity. If nothing else, it convinces development / integration staff that management has no idea what they are doing - and this is as dangerous to success as management losing confidence in the technical staff. The danger implies several management mitigations: a tempo to strategic reevaluation that is synchronized, but ahead of, the release tempo; a careful mapping of strategy to technological projects; a consistent and complete communication of context to the technical teams; and a willingness to accept upstream feedback from technical teams about the dangers of a particular strategy change for things like architectural evolution and project completion rates.
Nothing is more important than having and inspiring the best people. Studies indicate that the difference in productivity and error production between the worst and the best performing professional software developers is between one and two orders of magnitude (10-100 times). One suspects that similar ratios obtain for every area of technical staff. In addition, the critical benefits to an organization from personnel who are integrated into the organizational and technical context include lower operating costs, higher productivity and software that works better with the organization. However, there are also downsides - including a rigid mentality ("we've always done it this way"), cynicism about mission or management, inability to look outside the context of a particular tool or process, and constrained industry and organizational experience.
In the course of a development or integration project, personnel management is key. It is critical to ensure alignment with mission and strategy, excitement about the prospects for an organizational win, willingness to deliver bad news to project management, inspiration based on the quality of the leadership, and belief in the ability to succeed and to receive credit for success.
There are major risks if these factors are not attained...
Bad morale - Management has a critical role to play in the development and maintenance of morale, and they have the most to lose if morale goes downhill. Morale is largely based on the potential for success, and it is the obligation of management to provide every opportunity for teams to succeed and for individuals to receive the credit for their contribution to that success. Thus, mitigation includes: careful selection of the mission, making the mission relevant to the team, connecting the mission to specific project elements, identifying key success factors and making sure they are in place, demonstrating early wins, controlling the distribution of tough problems in time and across staff, demonstrating that credit will be given, rebuking in private, and celebrating milestones in word and deed. In addition, confidence in the leaders and in the overall management of the project and the mission must be maintained through honesty, integrity, consistency, and content-rich communication.
Unconsolidated team - Often referred to as a team that hasn't jelled, this is what you have when a team has too many new members, or is largely composed of consultants. People don't know each other, they don't know others' strengths and interests, and they don't have the cohesive personal loyalty that leads to mutual support in tough times or on tough problems. They are also prone to competition rather than cooperation, in-fighting, personality conflicts, and easily take offense at offhand remarks or critiques. The only mitigation that can be undertaken is to build the team explicitly. Just as a manager needs to interview a new team individually to understand strengths, weaknesses, desires, styles, and priorities, the team must also interview each other. This can be arranged in a meeting or series of meetings, through construction of subteams, and, most critically, by ensuring that credit is equally and equitably dispensed and that success in working together is rewarded. Social affairs like parties can have their place, but are prone to the reticence that people who don't know each other will feel. Professional engagement is, by far, the best way to build a cohesive team. The social interactions outside the professional arena will then take care of themselves.
Loss of a key performer - Key performers don't leave when they are happy, though they can be taken out by accident, illness, or family problems. Good morale on the team and among the key performers is critical, but mitigation steps must be taken, including: knowledge management through documentation, and using key people as mentors, coaches, and subteam leads so that their knowledge is spread around. At the same time, management must be careful in dealing with the presence of key performers, especially when those performers are an order of magnitude or more stronger than their peers. Risks here include: overloading the key performer, making the key performers so special that less key performers resent them or are dispirited, not helping peers see a path to becoming a key performer, and worrying the key performer that they will be displaced.
Poor development or debugging styles - On realistically scoped projects, it is important to manage the development of lower-ranked performers whose development or debugging styles are less productive or are actively causing problems. Mitigations include: apprenticeship with mentoring-oriented key performers, gradually increasing focused development opportunities that lead the poor performer to develop better skills and habits, careful attention to reducing cynicism and poor morale in poor performers (because the two conditions often go hand in hand) while allowing them to save face as they lose their cynicism and curmudgeonly ways. Poor performers on high pressure projects, however, are much more difficult to handle. They may be causing problems for downstream component consumers or be damaging the productivity and morale of the team in various ways. Or they may simply be slowing the project as a whole. Mitigations include: keeping them off the critical path, pairing them with a high performer in a mentoring relationship, and keeping them on tasks where the area they lack skills will have less effect on their ability to succeed.
Deathmarch "get it done, no matter how" mentality - Don't fool yourself. Every software professional knows that a deathmarch project is a failure of management. They also know that they will pay, and they smell the panic emanating from the upper levels of the organization. The loss of confidence in management, the hit to morale from the decreased likelihood of success, and the overload of consistent overtime and their own panic will lead to drastically higher error rates and lower productivity. The most important mitigations are avoiding the situation entirely through evolutionary development, reasonable release scope and tempo, and consistent lookahead on the part of management. Still, every project, no matter how well managed, can experience a deathmarch phase, but if morale is good, the other side of the tunnel is visible and stays visible, and carefully selected milestones are celebrated during the deathmarch phase, then a brief deathmarch is survivable and will not result in major and permanent damage.
Premature optimization mindset - Many developers are performance-oriented rather than maintenance-oriented. They fear their code will be too slow and create all kinds of maintenance (and performance) problems by their guesses as to performance issues and solutions. For this reason, the following mitigations are needed to disarm the performance mindset: architectural provision for performance hotspots, pilot and proof of concept subprojects to assess performance issues, an explicit performance optimization phase in the methodology, and management emphasis on good structure and maintainability as enablers of performance optimization.
Project Management Risks
Perhaps it is not surprising that many of the risks to successful projects come directly from project management. After all, project management sets schedules and timelines and looks to manage risk. But it should not be surprising that a project management bureaucracy can also do the same kind of damage to projects as any other bureaucracy - decoupling a project from its stakeholders, changing direction without understanding the effects on the team and their confidence, and failing to understand technical issues. But there are numerous specific risks and mitigations which work for project management, whether the PM is done by a PMO of project managers or is inherent in the role of architect / development manager.
Poor reporting leads to surprise delays - Reporting project status can be an onerous duty. Requiring status reports on component completion levels by email every two or three days can ease the burden, but also requires that technical people be willing to report when a component is not moving forward. Mitigation: Make sure every team member knows that the messenger who brings bad news will be welcomed, and that delivering bad news too late is what will cause problems for the individual.
Detailed task oriented scheduling leads to massive PM overhead - People don't have a lot of time to tell others what they are doing, and they also don't want to be constantly interrupted to report status to project management. Using component complete percentage and delivery milestones rather than task percent complete can help. In short - don't drive schedules by detailed tasks, drive them by component delivery. Have an object-oriented schedule, not a procedural schedule.
Excessive coupling between independent development teams - If teams are developing components for later integration, they need early and firm agreement on interfaces, and early availability of software interfaces such as header files. In addition, every team needs to build and provide test harnesses and simulators (which pretend to be a component being developed by the other team). These products need to be built into the time schedule and their presence and functionality need to be enforced. Not only do they help decouple teams, but they also represent valuable debugging, testing, and QA resources.
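To make the idea of an agreed interface plus a simulator concrete, here is a small Python sketch. The inventory example and all of its names (InventoryService, InventorySimulator, can_fulfil, run_harness) are assumptions made up for illustration. One team codes against the interface while a simulator pretends to be the component the other team has not yet delivered.

```python
from typing import Protocol


class InventoryService(Protocol):
    """Interface agreed early between the two teams (hypothetical)."""

    def stock_level(self, sku: str) -> int:
        ...


class InventorySimulator:
    """Pretends to be the other team's component, using canned answers."""

    def __init__(self, levels: dict[str, int]) -> None:
        self._levels = dict(levels)

    def stock_level(self, sku: str) -> int:
        # Unknown items are reported as out of stock.
        return self._levels.get(sku, 0)


def can_fulfil(order: dict[str, int], inventory: InventoryService) -> bool:
    """Consumer-side code, written against the interface only."""
    return all(inventory.stock_level(sku) >= qty for sku, qty in order.items())


def run_harness() -> list[bool]:
    """A tiny test harness that exercises the consumer against the simulator."""
    sim = InventorySimulator({"widget": 5, "gadget": 0})
    return [
        can_fulfil({"widget": 3}, sim),
        can_fulfil({"widget": 3, "gadget": 1}, sim),
        can_fulfil({"unknown": 1}, sim),
    ]
```

When the real component arrives, it replaces the simulator behind the same interface, and the harness stays on as a regression test.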
Too many meetings and interruptions - Any activity has a setup and teardown time that is required to prepare to do the activity and to put the activity aside. There is a point where one is doing almost nothing but setting up and tearing down, and meetings and interruptions can be the worst contributors to this productivity sink. Management and project management need to commit to and enforce the use of asynchronous communications methods like email, allowing people to choose their interruption times and frequencies. They also need to require and enforce periodic status report and issue sharing by email so that the team can maintain cohesive action with fewer meetings.
Meetings too large, poorly focused - Too many people in a meeting with too broad a topic ("so, how are we all doing on the project") are a disaster for productivity. They also destroy the opportunity for management to engage in the critical one-on-one discussions that make for a strong bond and high confidence, and, in the context of frequent progress reporting, convince team members that management is not paying attention. Never have a meeting where a one-on-one or an email will do. Meetings should be focused on decision making or the sharing of critical issues to a carefully selected "communicate to the communicators" population. The items that everyone will need to know to participate in the meeting need to be shared beforehand with all of the participants, along with the purpose and agenda for the meeting.
Too little communication - Developers like to work and not talk. They don't like to document. But without good knowledge management, the project can founder. In addition, the shared experience generated by good communication is critical to the success of the team. Mitigation: Well focused weekly meetings to give credit for successes to individual members in front of the team (this shows that management knows what is going on, and that they will give credit where credit is due), along with an outlook on upcoming issues (this should stimulate freewheeling discussion on how to solve the problems and can lead the management to determining the appropriate composition of obstacle-breaking subteams). It is also wise to find developers who like to document, and to develop the documentation skills of those who don't like to. Also, make sure to create a minimalist documentation. Don't make people document what can be found in the code - make them document what will help orient and focus people who need to understand the work product.
Too much communication (mythical man-month) - Yes, people can communicate too much. If the teams are large, the number of communication paths grows roughly with the square of the team size: n people have n(n-1)/2 possible pairings. The trap is to try to mitigate this by introducing people to collect and distribute the knowledge. That can work well, but will be a risk itself if the collector / distributor is actually decoupling people who need to talk to each other (often needing to bring out tacit information that is hidden from the collector). Again, the use of email and websites as knowledge management tools can help reduce the overhead while maintaining a high level of interchange, as can focused credit giving / obstacle raising meetings. And make sure people know each other's strengths so they can seek out appropriate resources when they need help.
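For a sense of scale, the pairwise count usually associated with The Mythical Man-Month is n(n-1)/2 paths for a team of n people. The short Python sketch below just evaluates that formula. The function name is made up for the example, and the model assumes every pair of people may need to talk, which is a simplification.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct person-to-person paths in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


def growth_table(sizes: list[int]) -> list[tuple[int, int]]:
    """Pair each team size with its path count, for a quick comparison."""
    return [(n, communication_paths(n)) for n in sizes]
```

For example, growth_table([5, 10, 20]) gives 10, 45, and 190 paths respectively. Doubling the team roughly quadruples the number of paths, which is why adding people late can slow a project down.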
Not enough knowledge management - Documentation, especially design rationale documentation, often sits on its own little island and eventually is lost (assuming it is ever created). Publication of documentation should have its own track in the project, just like performance optimization. Leveraging print, web, email and presentations not only increases the dissemination of knowledge, but it also offers opportunities to grow team members and develop credit. The risk is that attention paid to the ability to write and present will discomfort those who are more shy or less articulate. Teaming such individuals with more articulate presenters, and ensuring that the presenters credit them in the presentation can help to reduce this risk. Making sure those individuals share the stage with the presenters can reduce their performance anxiety and inspire them to improve that area of their skill set.
Failing to identify the actual customers of the system (who they are may surprise you) - Stakeholder and customer identification is apparently difficult. If you've seen a web project that focused on internal business representatives and never spoke to a site user or potential site user from the outside, you have seen a failure to identify the actual customer of a system. Product-oriented projects also often focus on the buyer rather than the user or vice versa - and those populations can be drastically different. Mitigation: Create lists of stakeholders and roles. Do diagrams to ensure that every important strategic focus of the project has a representative. Make lists of potential resisting populations and solicit representation. Do presentations outside the known stakeholder group to raise awareness. Use any internal or external publications you can access to spread the word about the project and attract stakeholders.
So what is the "biggest risk"? The biggest risk is the one you don't anticipate. The second biggest risk is the one you can't anticipate, can prepare for, but don't. And perhaps the most frightening risk is how easy it is to focus in just one area and not realize the holistic picture of the interaction between strategy, mission, team, customers, technology, timeline, planning, deliverables and success.
Thanks to Mike Schreck for all of our give and take on these issues over the years, and to Erik Scheirer for stimulating me to finally put this to text.