To DOE or not to DOE: The Do’s and Don’ts of troubleshooting manufacturing problems

A tech company was struggling to release its latest product. It had a big problem: the glass on its handheld device had an astronomically high failure rate in drop tests. Stumped, engineers initiated a DOE*, a series of experiments with different variables designed to isolate the cause of the problem. They mounted the glass with thicker adhesive, tested double-stick foam, and even tried a rigid glue. They ran several controlled experiments to make the product either more rigid or more flexible. Nothing worked. With many thousands of units ready to be manufactured and costs mounting by the day, panic began to set in.

DOEs sound simple enough: run tests until you figure out the problem. But they can go horribly wrong, resulting in poor-quality products, cost overruns, and shipment delays. Here’s Instrumental’s five-step DOE process, with our recommended do’s and don’ts for each step to help you avert disaster.

1. Identify the variable to test.

DO: Test enough units.

Engineers are sometimes pushed to run their DOEs with as few units as possible, usually because of cost or schedule pressure. But you need enough units to produce a reasonable, statistically significant result. Scale the number of test units to the failure rate of the problem area: the rarer the failure, the more units you need.

As a rule of thumb: if you are validating a fix for an issue with a failure rate of p, you should test at least n = 3/p units with zero observed failures to be 95% confident (α = 0.05) that you have genuinely improved your parts or process. For example, to statistically validate an improvement on an issue with a 10% failure rate, you should expect to test 30 units with zero failures!
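The 3/p figure is the statistician’s “rule of three.” If a fix did nothing and units still failed at rate p, the chance of seeing zero failures in n units would be (1 − p)^n, so the exact requirement is the smallest n with (1 − p)^n ≤ α. A minimal Python sketch comparing the two, assuming units fail independently at a common rate (the function names are our own, for illustration):

```python
import math

def rule_of_three_n(p: float) -> int:
    """Rule-of-thumb zero-failure sample size, n = 3/p (alpha of about 0.05)."""
    return math.ceil(3 / p)

def exact_zero_failure_n(p: float, alpha: float = 0.05) -> int:
    """Smallest n such that (1 - p)**n <= alpha.

    If the fix did nothing and units still failed at rate p, the chance of
    seeing zero failures in n units would be (1 - p)**n. Requiring that to be
    at most alpha lets you reject "still failing at rate p" with confidence
    1 - alpha.
    """
    return math.ceil(math.log(alpha) / math.log(1 - p))

for p in (0.10, 0.05, 0.01):
    print(f"p = {p:.0%}: rule of three -> {rule_of_three_n(p)}, exact -> {exact_zero_failure_n(p)}")
```

For p = 10%, the exact calculation gives 29 units, so the 30-unit rule of thumb is a slightly conservative approximation that is easy to do in your head.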

DO: Keep extremely accurate records.

You might have 20 different configurations and multiple vendors for parts. Record everything that changes from one configuration to another — this will help to isolate variables once testing begins. Even if you have excellent records, component kitting errors are more common than any engineer would like to imagine. Instrumental’s system can provide an extra layer of validation that the right parts were assembled correctly.
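Keeping each configuration as a structured record makes it mechanical to see exactly what changed between two builds. The sketch below assumes a hypothetical record format (a flat dictionary of build attributes); the attribute names and values are illustrative, not taken from any real build.

```python
from typing import Any, Dict, Tuple

# Hypothetical configuration record: build attribute -> value.
Config = Dict[str, Any]

def diff_configs(a: Config, b: Config) -> Dict[str, Tuple[Any, Any]]:
    """Return every attribute whose value differs between two configurations.

    Attributes present in only one record are reported with None on the
    missing side, so an omitted field shows up as a change instead of being
    silently ignored.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

config_a = {"adhesive": "PSA 0.10 mm", "glass_vendor": "Vendor 1", "corner_bracket": "none"}
config_b = {"adhesive": "PSA 0.15 mm", "glass_vendor": "Vendor 1", "corner_bracket": "none"}

print(diff_configs(config_a, config_b))
# {'adhesive': ('PSA 0.10 mm', 'PSA 0.15 mm')}
```

If a diff between two configurations that are supposed to differ in one variable comes back with several entries, the records have already told you the experiment is confounded.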

DON’T: Test everything at once.

When you’re in a time crunch, the common urge is to throw everything but the kitchen sink at the problem in a single configuration. Instead, test one variable or solution at a time. This minimizes complications and makes it easier to pinpoint the most effective solution. And when testing different configurations, don’t make ten modifications to the design if only one of them matters; the other nine will raise costs and risk creating new issues.
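One way to enforce “one variable at a time” is to generate the test matrix from a baseline, so each configuration differs from the control in exactly one attribute. This is a one-factor-at-a-time plan, not a full factorial design: it is simple to run and read, but it cannot detect interactions between variables. The variable names and options below are hypothetical, loosely based on the adhesive experiments described in the introduction.

```python
from typing import Dict, List, Tuple

def one_factor_at_a_time(baseline: Dict[str, str],
                         candidates: Dict[str, List[str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Build test configurations that each change exactly one variable from baseline.

    Returns (label, config) pairs, with the unmodified baseline first as a control.
    """
    configs = [("baseline", dict(baseline))]
    for variable, options in candidates.items():
        for option in options:
            if option == baseline.get(variable):
                continue  # identical to the control, so it adds no information
            config = dict(baseline)  # copy so the baseline is never mutated
            config[variable] = option
            configs.append((f"{variable}={option}", config))
    return configs

baseline = {"adhesive": "PSA 0.10 mm", "corner": "standard"}
candidates = {
    "adhesive": ["PSA 0.15 mm", "foam tape", "rigid glue"],
    "corner": ["reinforced"],
}

for label, cfg in one_factor_at_a_time(baseline, candidates):
    print(label, cfg)
```

Because every generated build differs from the control in a single attribute, any change in failure rate can be attributed to that attribute alone.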

2. Run the right test.

DO: Pinpoint potential problems beforehand.

Think ahead to when you’ll have the data from the test: what could the results tell you? Try to identify weaknesses in your test setup before you test. For example, a product can fail a condensation test if it was unintentionally sealed in high heat and humidity, even if it never sprang a leak. Anticipate what errors may be built into a test and adjust accordingly.

DON’T: Game your test results.

At one tech company, an engineer considered throwing out a failed drop-test result because the product had impacted at a 15-degree angle instead of head-on. A failure dismissed on an arbitrary technicality is still a failure. Similarly, when running the IPX7 water-resistance test on a phone, an engineering team could design the speaker mesh so that it passes IPX7 but fails IPX5. Customers might assume that because it passed one, it passed the other: a boon to sales in the short term, but a bust for return rates in the long term. If a problem will hurt your brand, eat the cost and fix it.

Remember, it’s about shipping a good product. Passing a deliberately gamed test doesn’t make a product good.

DO: Validate your test with the real world.

Tests are simulations, but it’s real life that matters. One tech company’s product test used a robot that repeatedly pushed a button on the unit. Units passed the robot tests, but test units in the field were failing. The failures, fatigue cracks and peeling, appeared only when the buttons were pushed at a slower rate. The problem took a smart team weeks to unravel. Test your products in real-world environments, not just in the factory.

3. Build and execute the test.

DO: Err on the side of hypervigilance during assembly.

Don’t underestimate the potential for massive error during assembly. Engineers will at times take apart a failed unit and find that it was built as configuration B when it should have been configuration A. Be on hand to make sure that doesn’t happen; don’t let the possibility of an assembly error hang over your product or turn a great design into a terrible one. Can’t be on the line yourself to verify that each unit was assembled as intended? Instrumental’s system creates an image record of every unit so you can confirm each one was built correctly.
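If you track which configuration each serial number was supposed to receive, a simple cross-check against what was actually recorded at the line catches kitting errors before they contaminate your results. The sketch below assumes you already have both records as dictionaries keyed by serial number; how the as-built record is captured is outside its scope, and it does not use any Instrumental API. The serial numbers and configuration labels are hypothetical.

```python
from typing import Dict, List

def find_kitting_errors(planned: Dict[str, str], as_built: Dict[str, str]) -> List[str]:
    """Return a message for every unit whose recorded build differs from the plan.

    planned:  serial number -> configuration the unit was supposed to receive
    as_built: serial number -> configuration actually recorded for the unit
    Units missing from the as-built record are flagged too (built None).
    """
    errors = []
    for serial, expected in sorted(planned.items()):
        actual = as_built.get(serial)
        if actual != expected:
            errors.append(f"{serial}: planned {expected}, built {actual}")
    return errors

planned = {"SN001": "A", "SN002": "A", "SN003": "B"}
as_built = {"SN001": "A", "SN002": "B", "SN003": "B"}

for problem in find_kitting_errors(planned, as_built):
    print(problem)
# SN002: planned A, built B
```

Running a check like this before any failure analysis means a mis-kitted unit is excluded or re-labeled up front, rather than discovered during a teardown.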

4. Review the data carefully and present the results.

DO: Make a slide to justify the decision you’re recommending.

Include some of the raw data. You may have backup slides, but your recommended action should be defensible on a single slide. If it isn’t, your results have too many holes and caveats, so go back to step 2.
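When the slide compares a baseline against a candidate fix, the raw pass/fail counts can carry a simple significance test. Below is a minimal sketch of a one-sided Fisher’s exact test using only the standard library; the counts in the example are hypothetical, and SciPy’s `scipy.stats.fisher_exact` offers an equivalent, better-tested implementation.

```python
import math

def fisher_one_sided_p(base_fail: int, base_n: int, cand_fail: int, cand_n: int) -> float:
    """One-sided Fisher's exact test: does the candidate fail less often than baseline?

    Conditions on the total number of failures across both groups. Under the
    null hypothesis (same failure rate), the candidate's share of those failures
    is hypergeometric; the p-value is the probability of the candidate getting
    cand_fail or fewer of them.
    """
    total_fail = base_fail + cand_fail
    total_n = base_n + cand_n
    lowest = max(0, total_fail - base_n)  # candidate must absorb any overflow
    numerator = sum(
        math.comb(cand_n, x) * math.comb(base_n, total_fail - x)
        for x in range(lowest, cand_fail + 1)
    )
    return numerator / math.comb(total_n, total_fail)

# Hypothetical counts: baseline 6 of 30 failed, candidate fix 0 of 30 failed.
p_value = fisher_one_sided_p(base_fail=6, base_n=30, cand_fail=0, cand_n=30)
print(f"one-sided p = {p_value:.3f}")
```

Here the p-value is about 0.012: if both configurations truly failed at the same rate, all six failures landing in the baseline group would happen only about 1% of the time. A single number like that, next to the raw counts, is the kind of evidence that fits on one slide.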

5. Deploy the solution and validate the change at scale.

DON’T: Test changes on a small sample size and call it a day.

Make sure your change didn’t unintentionally break another part of your product by running performance tests on the production line, reliability tests, and any other regulatory tests or forms of validation required. Once your product clears those, you can consider the change “made”. Don’t forget to tell the rest of your team about it so they can update any SOPs or fixtures accordingly!
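Validation at scale can use the same statistics as step 1, now with production volumes and possibly a few failures. A one-sided exact (Clopper-Pearson) upper bound turns “k failures in n units” into a failure rate you can be 95% confident you are under. This sketch assumes independent units with a common failure rate, and the function names are illustrative. It sums the binomial terms directly, which works for small failure counts; for large counts, `scipy.stats.beta.ppf(1 - alpha, failures + 1, n - failures)` computes the same bound.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def upper_bound_failure_rate(failures: int, n: int, alpha: float = 0.05) -> float:
    """One-sided exact (Clopper-Pearson) upper confidence bound on the failure rate.

    Finds the p at which observing `failures` or fewer has probability exactly
    alpha. binom_cdf is decreasing in p, so bisection converges on that p.
    """
    if failures >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings shrink the bracket to under 1e-18
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > alpha:
            lo = mid  # observed result still plausible here: bound is higher
        else:
            hi = mid
    return hi

print(f"0 failures / 30 units:    upper bound {upper_bound_failure_rate(0, 30):.1%}")
print(f"2 failures / 5,000 units: upper bound {upper_bound_failure_rate(2, 5000):.2%}")
```

Zero failures in 30 units bounds the failure rate at about 9.5%, consistent with the 3/p rule of thumb from step 1, while two failures in a hypothetical run of 5,000 units bounds it at roughly 0.13%. A small sample simply cannot support a claim that tight.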

Conclusion

So what happened to the product with the high drop-test failure rate? At the end of the DOE process, engineers determined that the best way to prevent the issue wasn’t changing the adhesive at all, but stiffening the product’s corners. That insight, and the fix of reinforcing the corners, came from executing the right DOE process. First, the team built enough units for testing; they didn’t skimp to save cost. The engineers were then hypervigilant about how the tests were run: no gaming the results and no testing too many configurations at once. Finding the right solution was a challenge, but the well-executed DOE was worth its weight in gold.

* While DOE (design of experiments) has a precise definition as a specific systematic method for determining cause-and-effect relationships, we’re using it here as shorthand for a series of experiments testing different variables, designed to isolate the cause of a problem. We picked up this shorthand as product design engineers at Apple, where almost everything has a three-letter acronym, but we have heard it used similarly by a wide variety of our customers, so we believe it has become part of the engineering lexicon.
