The biggest bugs in software history

We all know that software bugs are bad. But how bad can they be? Here are three of the bigger bugs from software history.

The most expensive

Ariane 5 Flight 501

On June 4, 1996, the very first Ariane 5 rocket launched. It veered off course and began to disintegrate about 40 seconds after launch - slowly at first and then with a final explosion. The cause was a bug in the guidance software, which had been reused from Ariane 4. The inquiry found that a 64-bit variable with decimals (floating point) was transformed (cast, in tech speak) into a 16-bit variable without decimals (a signed integer). A 16-bit signed integer can only hold a number between −32,768 and 32,767, while a 64-bit floating-point variable can hold values that are astronomically larger. Ariane 5 flew a faster trajectory than Ariane 4, so the value being converted grew past what 16 bits can hold. The failed conversion shut down the inertial reference systems, the flight computer acted on the garbage data it received, and the rocket swerved so hard that it began to break up, triggering its self-destruct sequence.
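To make the cast problem concrete, here is a minimal Python sketch. It is not the flight code (that was written in Ada, where the conversion raised an error that nobody handled); the function names and the sample readings are made up for illustration. It shows what happens when a value that is too big is squeezed into 16 bits without a check, and what a guarded conversion could look like.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_cast_to_int16(value: float) -> int:
    """Drop the decimals and keep only the low 16 bits (wraps silently)."""
    low_bits = int(value) & 0xFFFF
    # Reinterpret the 16 bits as a signed integer.
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]


def checked_cast_to_int16(value: float) -> int:
    """Refuse to convert values that a 16-bit signed integer cannot hold."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


small_reading = 12345.67  # hypothetical value that fits in 16 bits
large_reading = 40000.0   # hypothetical value that does not

print(unchecked_cast_to_int16(small_reading))  # 12345
print(unchecked_cast_to_int16(large_reading))  # -25536: silently wrong

try:
    checked_cast_to_int16(large_reading)
except OverflowError as exc:
    print(f"caught: {exc}")
```

Either way the number is lost. The only real choice is whether the software notices and has a sensible plan for what to do next.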

A very expensive bug, with a $370 million price tag. You can imagine the stress the software and QA teams must have gone through after these super expensive fireworks. We do have the video that shows the effect of the bug, though…


The deadliest

The Patriot missile failure

On February 25, 1991, a modified Iraqi Scud missile hit the US base at Dhahran in Saudi Arabia, killing 28 American soldiers. This was not supposed to happen, as the base was protected by a super sophisticated anti-missile system called the Patriot. A software bug let it through: the system's internal clock drifted, and after 100 hours of continuous running - about how long the battery had been up before the disaster - it was off by roughly ⅓ of a second. 0.33 seconds doesn't sound like much, but for a radar tracking a missile that moves at around 1.5 km per second (0.93 miles per second), it translates to an error of roughly half a kilometer. Enough for the anti-missile missile to miss its target and let the Scud do its damage.
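The arithmetic behind that drift is easy to replay. The sketch below assumes what the widely cited analyses of the incident describe: the clock counted tenths of a second, and the value 0.1 was chopped to a fixed number of binary digits (23 fractional bits is the assumption here, it is not a figure from this post). The speed is the 1.5 km per second used above, and the names are made up for illustration.

```python
from fractions import Fraction

FRACTION_BITS = 23            # assumption: binary digits kept for the value 0.1
TICKS_PER_SECOND = 10         # the clock counts tenths of a second
HOURS_RUNNING = 100
MISSILE_SPEED_M_PER_S = 1500  # the 1.5 km per second figure from the text


def truncate(value: Fraction, bits: int) -> Fraction:
    """Chop a value to the given number of binary fractional digits."""
    scale = 2 ** bits
    return Fraction(int(value * scale), scale)


tenth = Fraction(1, 10)
# 0.1 has no exact binary representation, so every tick is slightly short.
error_per_tick = tenth - truncate(tenth, FRACTION_BITS)

ticks = HOURS_RUNNING * 3600 * TICKS_PER_SECOND
clock_drift_s = float(error_per_tick * ticks)
tracking_error_m = clock_drift_s * MISSILE_SPEED_M_PER_S

print(f"error per tick: {float(error_per_tick):.3e} s")
print(f"clock drift after {HOURS_RUNNING} hours: {clock_drift_s:.3f} s")
print(f"tracking error: {tracking_error_m:.0f} m")
```

A per-tick error of less than a ten-millionth of a second adds up to about a third of a second over 3.6 million ticks, which is where the half-kilometer miss comes from.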


The most fun

Windows 98 presentation BSOD with Bill Gates

This actually happened! And the best thing is that we have a video of this happening live. You can’t have it better than this.

A nervous-looking Chris Capossela, who later became chief marketing officer at Microsoft, plugged a scanner into a Windows 98 machine while Mr. Gates stood beside him, smiling. Their plan was to show how easy it is to add hardware to the new Windows - the famed plug-and-play abilities of Windows. And boom, we had a gem of a Blue Screen of Death (BSoD), and a priceless moment in bug history. Mr. Gates topped it off with a line that went: "That must be why we're not shipping Windows 98 yet!". Here’s the video for your viewing pleasure.

Living with bugs in released software products

Bugs are the reality of software. We hate them and we do everything we can to get rid of them, but they are always there and we have to learn to live with them - that is, we have to manage the bugs and fix them in a sane way. This is important. Actually, trillion dollar important. According to a study done by the Austrian software testing firm Tricentis, software failures cost the worldwide economy over $1 trillion annually!

The same report also found that software failures have caused more than 315 years of lost time and have affected approximately 4.4 billion customers. Software failures also have a massive negative impact on the reputation of companies. The companies surveyed by Tricentis lost an average of $2.3 billion of shareholder value on just the first day after announcing a software failure. No wonder so many companies keep quiet about bugs. Yet that is probably the biggest mistake. The trick is to learn to live with bugs as a reality and have a practical and realistic process for managing them in live products. Today’s post is an attempt to summarize some of the strategies we have found to work best for managing bugs in live products.

Accept bugs


This may sound obvious, but you’d be surprised how many companies treat bugs as something totally unacceptable and create all kinds of penalties and punishments when bugs start surfacing in a product. This behavior comes from not understanding the nature of software products. Yes, the goal is always to have a bug-free software release, but the reality is that there will always be some bugs. So the goal should be:

“Release with as few deadly bugs as possible”

Accepting bugs opens up the possibility of a realistic software development process with a healthy bug identification and resolution strategy. So this is the first step - for the management and for the technical team leadership.

Make finding and reporting bugs easy

Real life story:

A group of security researchers were prepping for a major reveal in 2013: They planned to disclose at a D.C. cybersecurity conference how a security flaw in luxury vehicles could let bad guys break in without keys and start the cars.

But Volkswagen stopped them, winning an injunction in a British court after arguing that publishing a paper detailing the problem would "allow someone, especially a sophisticated criminal gang with the right tools, to break the security and steal a car."

https://www.theguardian.com/technology/2013/jul/26/scientist-banned-revealing-codes-cars

After you’ve accepted that there will always be bugs in released products, the next sane thing to do is to set up a process and workflow for identifying and resolving them. It’s just crazy how many software companies out there try to hide their bugs or try their best to make it difficult for people to report them. There are many known cases where large companies tried to stop people from reporting bugs in their software - just making their users suffer even more.

This line of thinking is stupidity defined. You just cannot stop people from finding bugs. What you should do is make it easy for people to find them for you. Have easy ways of reporting such bugs - such as customer support emails that can easily turn into bug reports, or a bug reporting page like Facebook’s Report a Bug feature.

On your development team’s side, also have the tools for keeping track of all the bugs that are being reported - both by customers and by your QA teams. Tools such as Jira are absolutely essential here, just as important as the development tools and issue trackers you relied on while the software was being built.

Involve customers in the bug finding journey

This needs to be a management and marketing level decision - involve your customers in the bug finding process. Ask them for feedback, have formal ways of checking back for issues, reward them with freebies when they find bugs. Customers are the ideal QA for a piece of software: they use it every day, and they use it in ways that your dev team would not even think of (e.g. using MS Word for image editing or PowerPoint to make web pages!).

Microsoft has a bug bounty program which literally inspires users to spend time finding bugs and rewards them for their efforts. Google has its own reward programs for finding defects. These are all examples of far-sighted companies who have not only accepted bugs in live products and made them easy to report - they are actively involving their customers in the journey to find and ultimately fix those bugs.

Have a workflow for updating users about bugs

As you accept bugs as reality, your wording and communication will let your users accept them too, and will make them expect you to come up with fixes as soon as bugs are found. They also expect you to be mature about the bugs found in the current system and to give them a place where they can learn about those bugs and the status of the fixes. User forums are great for this, and modern tools like Zendesk, UserVoice and many others do a great job of updating users about bugs, getting their feedback and letting them know when things are fixed. They also have the feel of user communities and can lead to bigger and better things, such as users helping each other out and forming their own groups to teach each other.

Create a culture for celebrating finding and fixing bugs

Finally, the most important thing of all - within your company and your development team, create a culture of celebrating each bug that is found and fixed. Some companies make the fatal mistake of making the dev team feel bad each time a bug is found - this creates negative emotions that take away the enthusiasm and spirit the team needs to polish a released product over time. To a dev team the released product is their own creation - any bug already sounds bad to them, and it defeats the purpose if the company culture makes them feel worse. The culture should be that bugs are unfortunate yet inevitable - and that how fast they are resolved and an updated version is released, once they are found, is the true sign of a great software company.

Trust me on this: after more than two decades in this bug creation, I mean software development, business, this culture fix is the biggest thing that differentiates a great software company from the others.

Usability testing: How not to strangle your customers


Picture this: your website has a section for the pricing plans for car rentals. The page looks awesome, the letters are bold and colorful, and the buttons work as they should. Yet customer retention is falling like Corona beer's share value. And poof! Your competitor has overtaken you. And your investors are shaky all of a sudden.

Scary, huh? What if I told you there is a simple test that could have saved you? A test which could have warned you about your visitors’ discomfort with the entire process. Perhaps they are getting lost halfway through their buying journey. Perhaps they are seeing varying rates when choosing the rentals and feeling suspicious. Maybe the wrong words are bold and the colors are causing them to shift to night mode. These and other such concerns are the focus of a class of testing called usability testing.

The importance of usability is well known to those competing in highly contested markets. Usability testing is expensive and time consuming, even compared to other QA testing processes, and people often realize too late that usability plays a crucial role in attracting and retaining customers. It’s surprising how much of the software world is unaware of this and does not make its clients aware of this important aspect.

This is a quick guide to approaching the world of usability testing.

1. Consider the business of the project:

Usability is essential to every product. That being said, in the outsourcing market usability testing makes more sense for some projects than for others. For example, usability testing is crucial for customer-facing projects, while business-to-business products might want to explore it at a later stage. The right projects will benefit from integrating this test even in the pilot, as first impressions last longest.

2. Adapt the process to Sprints:

The agile development process is fast and does not take any prisoners. Therefore the tests should be tied to specific Sprints. This also helps to minimize and control the number of tests that need to be conducted. The project manager can assign the tests as per the sprint goal. This ensures that the development process is not hampered.

3. Bake it into QA and UX:

Usability testing is a UX research (UXR) tool that is used to assess the user experience. It can be purely a UX task that helps improve, change and adapt the system. The task can also be given to the QA team, with the UX team involved as the audience. Or it can be the other way around, depending on which team owns the task.

4. Clear deliverables:

There should be a clear deliverable for the test, and it should be made clear to the client. The test should result in a qualitative assessment of the application, expressed through a metric. The metric should contain clear scores that are easy to understand. The tests should also produce a list of usability issues that are logged in the issue tracker. Those issues should then be streamlined into the sprints.

5. Scale according to budget:

Usability testing can be conducted remotely or on site. It could include 20 people in groups of five, or surveys in the field. Depending on the needs of the project, the company should offer a plan that identifies what is essential and how to get it. The test is only possible if the budget makes sense.

Usability testing done well can give you the edge just like performance testing. Be sure to consider it before plunging head first into the ocean of startups; make sure your life jackets have no holes in them! Here’s a stolen Dilbert to make you smile :)

Dilbert-usability.jpg

5 tips to stay sane with automation testing

Automation is great in the SQA world, but if not managed well it can get really messy. So how do you stay sane?

If you manage to land a sensible product manager who understands the many benefits of developing a business-critical UI automation system, it is incumbent upon you, as the “SQA guy”, to do it the best way possible. So here are some road signs I picked up while navigating this cool neighborhood:


1.  Use the best Unit Test module


I started off on the well-trodden and well-documented path of Python’s built-in unittest module. The module is well suited to the object-oriented style of UI automation. But after looking into the wonderful pytest and its many features (fixtures, smart test grouping, excellent hooks and plugins), I had to go with it, small learning curve and all. It was worth the time, and trying out these modules before starting the project saved a lot of time later.

2.  Break down the page object methods


The solution should follow a simple principle: leave the testing to the test methods. The dilemma is what the functions in the Page objects should mean. Ideally, a page object should not think in web elements but in what the page actually does. For example, a function should log in, rather than just send the password to a web element. Sometimes a single function ends up doing a lot of complex things, which makes sense until you are left with huge, heavy functions with many dependencies. So you have to find a balance and make the functions as granular as is sensible. A thorough understanding of the business is essential to break this into sensible chunks.
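Here is a minimal sketch of that split, under stated assumptions: FakeDriver is a made-up stand-in that only records actions (in a real project it would be your browser driver), and the locators and the LoginPage class are hypothetical. The page object exposes small steps plus one business-level action, and the assertion lives in the test, not in the page object.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser driver; it only records actions."""

    def __init__(self):
        self.actions = []

    def type_into(self, locator, text):
        self.actions.append(("type", locator, text))

    def click(self, locator):
        self.actions.append(("click", locator))


class LoginPage:
    """Page object: knows where things are on the page, asserts nothing."""

    USERNAME = "#username"          # made-up locators
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, driver):
        self.driver = driver

    # Small, granular steps...
    def enter_username(self, username):
        self.driver.type_into(self.USERNAME, username)

    def enter_password(self, password):
        self.driver.type_into(self.PASSWORD, password)

    def submit(self):
        self.driver.click(self.SUBMIT)

    # ...combined into one action named after what the user does.
    def log_in(self, username, password):
        self.enter_username(username)
        self.enter_password(password)
        self.submit()


def test_log_in_submits_credentials():
    driver = FakeDriver()
    LoginPage(driver).log_in("alice", "s3cret")
    # The checking is left to the test method.
    assert driver.actions == [
        ("type", "#username", "alice"),
        ("type", "#password", "s3cret"),
        ("click", "button[type=submit]"),
    ]


test_log_in_submits_credentials()
```

The small steps stay available for the odd test that needs them (say, submitting with an empty password), while most tests only ever call log_in.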

3.  XPaths, but smart


As the saying goes, XPaths are prone to breaking. Which they are. Copy-pasting the full XPath will get you fast results but will leave you with flaky code. So if time permits, spend some time finding unique CSS attributes, names and IDs to construct good XPaths (or use those attributes by themselves). Recorders may be helpful in granular cases. Relative XPaths are a quick and better solution when time is constrained. You have to find the balance in using these and not be shy to provide the full XPath when it’s time. But you should definitely know when that time is.
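To see why the full path is the flaky one, here is a small sketch using only Python's standard library. Keep in mind that xml.etree.ElementTree understands just a subset of XPath and that the two page snippets are made up; in a real suite you would pass the same kind of expressions to your browser driver.

```python
import xml.etree.ElementTree as ET

# Made-up markup for a login page.
PAGE_V1 = """
<html><body>
  <div><form>
    <input id="username"/>
    <input id="password"/>
  </form></div>
</body></html>
"""

# The same page after a redesign added a banner and wrapped the form.
PAGE_V2 = """
<html><body>
  <div>Banner</div>
  <div><section><form>
    <input id="username"/>
    <input id="password"/>
  </form></section></div>
</body></html>
"""

FULL_PATH = "./body/div[1]/form/input[2]"    # copy-pasted, position based
RELATIVE_PATH = ".//input[@id='password']"   # anchored on a unique id


def locate(page, path):
    """Return the first element matching the path, or None."""
    return ET.fromstring(page).find(path)


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    full = locate(page, FULL_PATH)
    relative = locate(page, RELATIVE_PATH)
    print(name, "full path found:", full is not None,
          "| relative path found:", relative is not None)
```

The full path stops matching as soon as the layout changes, while the id-based one keeps working on both versions of the page.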

4.  The case for using a lightweight IDE


As the code is repetitive and the coding pattern is relatively simple, one might argue that lighter and more modular editors like VS Code and Sublime Text deserve a go. The modular nature and plugin strength of the crowd favorite VS Code make it an obvious choice, but the lightness of Sublime Text might just suit your style.

5.  Write clean code


Just like any other kind of project, an automation project should follow the basic clean coding tenets. This is especially true for the aforementioned single responsibility principle. Avoiding code duplication is also pretty important. And what I found particularly important was the proper use of comments. You can literally follow the business of the software to create a cogent story that makes sense of the automation. Done well, even the untrained eye can make sense of the code.


The Quality of Software


When OnePlus co-founder Carl Pei appeared in a video with American tech guru Marques Brownlee, he showed off the prototype of the OnePlus Nord design, spoke honestly about the choices they made and broke a kind of fourth wall between the makers and the consumers of the brand. Among the many things they discussed, one hit home for me: they were cutting about forty dollars’ worth of testing needed to get the IP68 rating.

This seems to have become the norm in the outsourcing industry. The customers want an Airbnb-looking site and they can get it affordably. It is a good deal. But just as the logo, homepage or product catalog is important to the business, there are equally important quality aspects which may make or break the business, and these remain poorly understood and poorly communicated by the provider. That is because the frugal entrepreneurial handbook guides the business to be economically lean from the beginning, and the provider is in a competitive market that often prioritizes agreement over proper counseling. There is a case to be made for quick business solutions, but as an engineer who has been concerned with quality for nearly half a decade, it is incumbent upon me to try to inform both the solution provider and the valued customer about the pitfalls of the well-trodden path.


Sacrificial Testability

The first victim of removing the quality concern is the architecture, which is then designed with testability as a lower priority. Even to the moderately tech-savvy customer it will seem like a matter of no concern, and that will be reflected in the quotation. But what they should know is that it is the first step the application takes towards failure in case of success. It means that the application, while looking as pretty as any of the best-looking sites, will not be prepared for failure if the business turns out to be as successful as it hopes to be. Succinctly put, software written as non-testable code means that failures and bugs will be an ever-increasing problem, and you will only know enough to rue it once it has crossed the threshold.

The Performance Quicksand

A successful delivery of a live share market information site, a betting app or a simple e-commerce site is something the valued customer would accept, applaud and give a glowing recommendation for. It certainly checks all the boxes of the SRS. But then it might be outdone by the competition, fail to retain users or get user reviews complaining about a slowness that they can’t quite put a finger on. A performance engineer would point out that the average response time of the share market info is way slower than the competition's, or that the betting app becomes unresponsive at a certain peak time, or that the e-commerce site's various modules perform inconsistently. They might plan out a way to identify these issues before the development phase and help the dev team solve them when they arise. It might not seem like a good investment to the entrepreneur looking to cut development costs, but it is incumbent upon the solution provider to inform them of the hidden costs of performance tuning after launch. Which, I have to add, are a lot higher than the cost of addressing the issues in the first phase.


The Tested Software Fallacy

Again, with the idea that software development should be in tune with the business, I will propose that not all features are business critical. For example, an e-commerce site can live without the wish list feature but will surely face huge losses if the purchase feature is broken or the search is not returning the right results. Yes, the manual tester might be expected to catch these errors, but by the time the product is in production they might be too costly to handle. This is particularly important when a live product is under maintenance or is getting enhancements. So an experienced QA would suggest integrating a business-critical (risk-driven) UI automation process into the software development cycle, which basically ensures that the new toy doesn’t choke the baby.
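One simple way to put "risk-driven" into practice is to score each feature and automate from the top of the list. The sketch below is only an illustration: the scoring scale, the risk formula (impact times likelihood) and every number in it are assumptions, not measurements from a real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    business_impact: int     # 1 = nice to have ... 5 = revenue stops if broken
    failure_likelihood: int  # 1 = rarely touched ... 5 = changes every sprint

    @property
    def risk(self) -> int:
        # Assumed formula: a plain impact x likelihood score.
        return self.business_impact * self.failure_likelihood


def pick_for_automation(features, budget):
    """Return the names of the riskiest features that fit the budget."""
    ranked = sorted(features, key=lambda f: f.risk, reverse=True)
    return [f.name for f in ranked[:budget]]


# Made-up scores for the e-commerce example.
FEATURES = [
    Feature("wish list", business_impact=1, failure_likelihood=2),
    Feature("purchase", business_impact=5, failure_likelihood=3),
    Feature("search", business_impact=4, failure_likelihood=4),
    Feature("product reviews", business_impact=2, failure_likelihood=2),
]

print(pick_for_automation(FEATURES, budget=2))  # ['search', 'purchase']
```

The value is less in the formula than in the conversation it forces: the customer and the provider have to agree on which features the business cannot afford to break.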

Hypothetical usability

Those with better accumulated focus group data can easily identify the core desire of the customer and use it against the next competing product. User Acceptance Testing does something similar: it captures the reaction of the end users to the site. In the book “Hooked”, Nir Eyal points out the importance of returning customers. Plenty of software with technological advantages is failing because of the lack of an emotional response from the user. Even with blue ocean market ideas, every business that does not get the edge by collecting real user data will be outdone by someone who does. But the customer often overlooks the fact that this kind of testing might be way cheaper than hiring a focus group while the business is running. What they don’t know might just hurt them in this case.

In the end OnePlus did what they do best to break into a saturated market: they did it not by doling out fancy new technology but by assuring quality where it mattered. The part Pei was mentioning was actually a smart move, considering they prioritized the software experience, which is what makes the device popular, over durability. It took some time, but they carved out their place in the pantheon of high-end smartphones. By making informed decisions about assuring the quality of the business-critical aspects of any solution, both the software solution providers and the potential solution seekers will benefit from better business outcomes. That, I think, is the writing on the wall we finally need to see.