“Running the gauntlet” refers to a ceremony of passage, trial, or even execution wherein a person is forced to walk or run between two lines of people who are usually armed with clubs, whips, or other weapons. The people in the lines are then free to strike at the person running up the middle. Except in those cases where the intent is execution, if the person running the gauntlet makes it to the other end, he is deemed to have expiated his crimes (in case of a trial) or worthy to join the appropriate group (in case of passage). If the person collapses without making it all the way through, then he might be sent back to the start to run the gauntlet all over again.
This is a useful metaphor for testing software. The idea behind software testing is to determine that the software has sufficient functionality, reliability, and performance to warrant putting it into production (actual day-to-day use). If changes are made to the software, then typically it goes through regression testing, that is, it is made to run through the testing gauntlet (or an acceptable subset thereof) all over again.
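To make the metaphor concrete, here is a minimal sketch in Python of a testing gauntlet. Everything in it is hypothetical and chosen only for illustration: the `monthly_premium` function stands in for the software under test, and the three checks stand in for a real test suite. None of it is drawn from the Healthcare.gov systems.

```python
from typing import Callable, Dict, List


# Hypothetical system under test (an assumption for illustration only).
def monthly_premium(age: int, smoker: bool) -> float:
    if age < 0:
        raise ValueError("age must be non-negative")
    base = 200.0 + 5.0 * age
    return base * 1.5 if smoker else base


# Each test is one person in the line, free to strike at the software.
def test_nonsmoker() -> None:
    assert monthly_premium(30, False) == 350.0


def test_smoker() -> None:
    assert monthly_premium(30, True) == 525.0


def test_negative_age_rejected() -> None:
    # Negative path: bad input must be refused, not silently priced.
    try:
        monthly_premium(-1, False)
    except ValueError:
        return
    raise AssertionError("negative age was accepted")


GAUNTLET: List[Callable[[], None]] = [
    test_nonsmoker,
    test_smoker,
    test_negative_age_rejected,
]


def run_gauntlet(tests: List[Callable[[], None]]) -> Dict[str, bool]:
    """Run every test and record pass (True) or fail (False) by name."""
    results: Dict[str, bool] = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = True
        except Exception:
            # Any failed assertion or unexpected error counts as a failure.
            results[test.__name__] = False
    return results


def ready_for_production(results: Dict[str, bool]) -> bool:
    """The software passes only if it survives every test in the line."""
    return all(results.values())
```

Regression testing, in this sketch, is simply calling `run_gauntlet(GAUNTLET)` again after every change to `monthly_premium` and refusing to ship unless `ready_for_production` returns `True`.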
Many of the reports about the Healthcare.gov website and the associated back-end systems that have come out in the past several weeks indicate that testing was late, minimal, and unsuccessful. Sadly, that just won’t work. As I’ve written here before, software is unforgiving, and no amount of good intentions can make up for that.
To better assess what testing should have been done on the Healthcare.gov software, here is a slightly edited version of a table I created for a review of an IT project of comparable size (four years in, $500M spent when I led a team to review it). The first column identifies the test type (including subcategories within that type); the second column explains why those tests are to be done; and the third column explains the possible consequences of not doing those tests, or not doing them sufficiently.
The question to ask yourself as you look through this list is: given that we know that the groups building the Healthcare.gov systems did inadequate performance and security testing — both areas that we can agree are rather important — what other tests did they skimp on?
| Test Type | Benefits of doing | Risks of not doing |
| --- | --- | --- |
| Unit | Earliest means to flush out defects in actual written source code; can reduce time and costs of all subsequent tests. | Instability and lack of progress in subsequent testing efforts. |
| String | Ensures that low-level components (classes, frameworks, packages) work together appropriately; again, early means of flushing out defects; can reduce time and costs of all subsequent tests. | Instability and lack of progress in subsequent testing efforts; highly inefficient ‘thrashing’ cycle between integration test and development (component or system test candidate shuttles back and forth between development and test). |
| Smoke | Saves time for higher-level tests by ensuring that system under test meets at least a minimum level of quality. | Repeated false starts for complex testing efforts due to missing or broken functionality. |
| Conversion | Not optional (for systems with legacy data). Verifies that legacy data has been correctly transformed, and that system in development can read and work with that data. | Production failure. High risk of serious data errors, lost partners and customers, financial loss, civil, regulatory, and criminal liability. |
| Performance | Not optional. Ensures that system in development can operate under real-world load conditions (volume, stress, time limits). | Production failure. Systems that work fine in development may fail to scale up to production demands; may be unable to complete tasks within time window; user response time may be too slow; system may freeze up. |
| System Integration (Error & exception handling; GUI; link navigation; reports; logging; scheduling; end of day; certification; data integrity; reliability) | Not optional. Ensures that system in development can interact appropriately with its environment and with other systems and components. | Delivery failure. Inability to set up a configuration of developed and existing/custom components that can do real-world work. |
| Functional (computational, algorithmic, and workflow accuracy; functional / business rules; positive & negative path; legal & regulatory compliance) | Not optional. Ensures that system in development matches requirements and other constraints. | Delivery or production failure. Attempting to put a malfunctioning and/or insufficient system into production. |
| Interface (positive & negative path) | Not optional. Ensures that system components can interact with one another, and that they can interact with external systems and data files. | Delivery or production failure. Test thrashing due to interface mismatches; inability to assemble a working production candidate; data and/or processing errors and failures during production. |
| Security | Assurance that desired security model and levels are implemented and working. | Unauthorized access to system in production. Inability to guarantee confidentiality, integrity and/or availability (CIA) of system data and functions. |
| User Acceptance | Pre-production verification of correct implementation; advance buy-in by end users of new system; advance training of new users; additional testing. | End-user resistance, hindering or blocking adoption of new system. |
| Failure and Recovery (handle unavailable systems, full failure, partial failure) | Greater reliability and robustness in production; greater ability to recover from business interruptions. | Fragility of system to internal and external factors; loss of critical processing and/or data in event of failure or crash; inability to do business recovery in event of disaster. |
| Production Readiness | Not optional. Assurance that system under test will likely function acceptably if put into production. | Delivery or production failure. Inability to successfully put the system into production; having to pull the system out of production (and switch back to legacy systems). |
| Regression | Increases forward progress in development and testing; fewer iatrogenic defects; fewer delays and reworks. | Development/test thrashing; re-emergence of previously closed defects; slower decline of find/fix ratio. |
The right-most column describes many of the symptoms observed (directly or indirectly) not just for the Healthcare.gov website but for the back-end systems as well. You can also use the left-hand column to sound knowledgeable about IT testing and, should you have the chance, to ask uncomfortable questions of those working on Healthcare.gov.
As I’ve said before, good intentions do not matter in building software any more than they matter in building a jetliner or a suspension bridge. If you fail to do the proper engineering and requisite testing, all three will fail. ..bruce w..
About the Author: Webster is Principal and Founder at Bruce F. Webster & Associates, as well as an Adjunct Professor of Computer Science at Brigham Young University. He works with organizations to help them with troubled or failed information technology (IT) projects. He has also worked in several dozen legal cases as a consultant and as a testifying expert, both in the United States and Japan. He can be reached at firstname.lastname@example.org, or you can follow him on Twitter as @bfwebster.