Whose fault is it anyway: Don’t let fear motivate your engineering process
For me, being a product design engineer at Apple was about managing fear. The wrong design decision could easily cost millions of dollars in units that had to be reworked or scrapped; an unexpected issue not discovered or resolved early enough could delay a product launch enough to move global markets. In a small company, the ramifications are still severe: missing Christmas can kill a consumer electronics company, and a bad design decision can force a six-month hold while more funding is raised. Regardless of company size, hardware engineers carry a heavy load of responsibility for both the product and its outcomes.
How fear motivates your engineers
The Apple I experienced did not have a blame culture, but with its famous DRI (Directly Responsible Individual) system, it was always clear who was “on the hook” for a given part, feature, or solution. Within my first couple of months at Apple, before I was given any real responsibility, I saw my peers go through a grueling late-development issue. It made an impression. From then on, I wanted to be sure that every decision I made was defensible, so much so that for years I kept a folder on my desktop filled with “just in case” one-page Keynote slides summarizing the data behind the decisions I had made. That folder of Keynotes was all about managing my fear of blame.
Fear is a very strong motivating factor for many engineering teams. Fearful engineers won’t take worthwhile risks, or they under-promise what’s possible, killing the potential to innovate. In most organizations, it is not okay to be the one responsible for “gating the build” or “holding the product back,” so teams optimize around those parameters instead of other important ones, like user experience or building a best-in-class version of a feature. For mechanical engineers specifically, fear can lead to silly design or drawing choices. “Fear tolerancing” is the practice of publishing part drawings with ridiculously tight tolerances (e.g., ±0.03 mm on every feature). In most organizations these tolerances are eventually challenged and loosened to reasonable levels, but sometimes they never are, causing higher fallout, increased part cost, and reduced margins (or, at the very least, expensive yield studies to justify opening the tolerances back up).
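To see why tolerances that are never reopened get expensive, here is a minimal Python sketch. It assumes each dimension is normally distributed and centered on nominal, that features vary independently, and it uses a made-up process standard deviation of 0.02 mm; none of these numbers come from a real part. Under those assumptions, the fraction of parts outside a symmetric ±t tolerance is erfc(t / (σ√2)), and the chance that a part passes every feature is the per-feature yield raised to the number of features.

```python
import math

def feature_fallout(tolerance_mm: float, sigma_mm: float) -> float:
    """Fraction of parts outside a symmetric +/- tolerance, assuming the
    dimension is normally distributed and centered on nominal."""
    # For a centered normal: P(|X - nominal| > t) = erfc(t / (sigma * sqrt(2)))
    return math.erfc(tolerance_mm / (sigma_mm * math.sqrt(2)))

def part_yield(tolerance_mm: float, sigma_mm: float, n_features: int) -> float:
    """Fraction of parts with every feature in spec, assuming the features
    vary independently and share the same tolerance and process sigma."""
    return (1.0 - feature_fallout(tolerance_mm, sigma_mm)) ** n_features

if __name__ == "__main__":
    sigma = 0.02  # hypothetical process spread in mm, not a measured value
    for tol in (0.03, 0.05, 0.10):
        print(
            f"+/-{tol:.2f} mm: fallout per feature {feature_fallout(tol, sigma):.4%}, "
            f"yield over 20 features {part_yield(tol, sigma, 20):.2%}"
        )
```

With these hypothetical numbers, ±0.03 mm rejects roughly 13% of parts on each feature, and compounding that across 20 features leaves only about 6% of parts fully in spec, while ±0.10 mm makes fallout negligible. Real processes are rarely perfectly centered or independent, which is exactly what a yield study measures.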
Fear of blame can drive design choices, but it is most dangerous when it clouds the truth during failure analysis. Something going wrong, and the changes needed to fix it, often carry real costs. Those costs, combined with simple engineering pride, create conflicting motivations across the multi-party system (product company, manufacturer, parts suppliers), which can significantly delay failure analysis and corrective action.
Three common problems and one enigma
Development and mass production issues come in several flavors. Vanilla problems are straightforward and often have straightforward solutions: fingers bend antenna elements, so a fixture that protects them eliminates the bending. Slightly more difficult are intermittent issues, where the same unit can pass and fail when it’s tested multiple times. The canonical example is a cracked solder joint, where the broken lead sometimes makes contact and sometimes doesn’t, especially during and after thermal testing. Layered “Onion” problems are some of the worst because their true form usually isn’t discovered until late in development. An Onion problem presents one way, but as you try to fix it, you realize there’s another underlying problem. Sometimes there are four or five layers, and as a result these can take weeks or even months to fully resolve.
The last type is the “Enigma” problem: one that defies understanding and therefore cannot truly be fixed, only mitigated to reduce its impact. As an engineer, it pains me to admit that such a problem is even possible: everything has a cause and effect, right? But every process has limits on what you can control and what you can measure, and those limits leave room for an enigma. In my seven years in this space, I’ve experienced only one enigma problem: a reliability issue that occurred on assemblies built with parts from certain cavities of certain injection molding tools. Try as I might, I could not correlate the failures to any of the hundreds of dimensions on the well-controlled part drawing. In the end, there was no root cause, only a mitigation: each cavity had to be qualified separately.
Getting lost in the root cause spiral
All of these issues stem from several standard root causes:
- Workmanship: human error, sloppiness, or poor training
- Part Quality: incoming cosmetic, dimensional, or functional defects
- Process: incorrect parameters, settings, or order of operations
- Design: even when all of the above are done well, the product doesn’t work; this often results from not taking process, quality, and/or workmanship variation into account
Once a root cause is identified, its type determines who the DRI is to fix it (and often who pays for the remedy). Part quality issues are the responsibility of the upstream supplier; workmanship issues are (usually) the responsibility of the contract manufacturer; process issues can fall anywhere between the product company and the contract manufacturer; and design issues are the product company’s to resolve.
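The ownership rules above amount to a small lookup table. The Python sketch below encodes them; the enum, dictionary, and function names are hypothetical, chosen only for illustration. Process issues map to two possible owners because they can land anywhere between the product company and the contract manufacturer, and the workmanship entry carries the “usually” from the text as a comment.

```python
from enum import Enum

class RootCause(Enum):
    WORKMANSHIP = "workmanship"
    PART_QUALITY = "part quality"
    PROCESS = "process"
    DESIGN = "design"

# Default owner(s) of the fix for each root-cause category.
# More than one entry means ownership has to be negotiated case by case.
DEFAULT_DRI: dict[RootCause, tuple[str, ...]] = {
    RootCause.PART_QUALITY: ("upstream supplier",),
    RootCause.WORKMANSHIP: ("contract manufacturer",),  # usually, not always
    RootCause.PROCESS: ("product company", "contract manufacturer"),
    RootCause.DESIGN: ("product company",),
}

def candidate_owners(cause: RootCause) -> tuple[str, ...]:
    """Return the party (or parties) expected to own the fix."""
    return DEFAULT_DRI[cause]

def needs_negotiation(cause: RootCause) -> bool:
    """True when the category alone does not settle who owns the fix."""
    return len(DEFAULT_DRI[cause]) > 1
```

Because the owner (and often the payer) falls straight out of the category, every party has a financial stake in which category a failure ends up in.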
At this point, it’s not just about simple blame anymore: money is involved. The politics can get fierce, biasing the failure analysis process. If the engineer doing the analysis is from the product company, she will most likely want to eliminate every other potential root cause before admitting a design issue (I think this is more human nature than politics). If a factory engineer does the failure analysis, the conclusion may be that the design makes a reliable mass production process infeasible. In this blame game, both sides lose: precious time is wasted defending or pushing a politically motivated position instead of getting to the bottom of the issue.
Let data be your guide
The best way out of the accountability spiral is to put all focus on the data. As engineers, we know that data-driven decisions are best, and yet, when we’re worn down after days of fighting fires in the factory, we sometimes need a little help.
Instrumental customers use our system to collect critical data throughout the development process. When an issue occurs, they often already have what they need on hand to eliminate one or more potential root causes right away. For example, one customer noticed what appeared to be poor-quality solder on a unit that failed a reliability test. Instead of pointing fingers at the contract manufacturer (which had performed the surface mount assembly), they used Instrumental data to get a more complete picture of what all passing and failing units looked like, which made it clear that soldering was not the root cause.
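The following is not Instrumental’s method but a generic statistical sketch of the same reasoning. It uses Fisher’s exact test, a standard test for 2×2 tables, to ask whether a suspect feature (here, solder that looks poor) is more common on failing units than on passing ones. The function names and counts are hypothetical. If the suspect feature shows up at about the same rate on passing units, it is a weak root-cause candidate; with only a handful of failures, though, the test has little power, so a large p-value is not proof of innocence on its own.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table
        [[a, b],   failing units: suspect present, suspect absent
         [c, d]]   passing units: suspect present, suspect absent
    """
    n = a + b + c + d
    row1 = a + b  # number of failing units
    col1 = a + c  # number of units showing the suspect feature

    def table_prob(x: int) -> float:
        # Hypergeometric probability that x failing units show the suspect
        # feature, holding all row and column totals fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )

def suspect_rates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Share of failing units and of passing units showing the suspect feature."""
    return a / (a + b), c / (c + d)

if __name__ == "__main__":
    # Hypothetical counts: 2 of 4 failing units and 95 of 200 passing units
    # show solder that looks poor.
    fail_rate, pass_rate = suspect_rates(2, 2, 95, 105)
    p = fisher_exact_two_sided(2, 2, 95, 105)
    print(f"suspect on {fail_rate:.0%} of failing vs {pass_rate:.1%} of passing units, p = {p:.2f}")
```

In this made-up case the suspect solder appears on roughly half of the units whether they pass or fail, so it does not separate the two groups: a feature that is just as common on passing units cannot by itself explain the failures.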
If you don’t have the data you need on hand, our advice is to prioritize getting whatever you need to verify that you are working on the right root cause. That may mean doing more teardowns, building additional units with controlled variables, or taking new measurements. This can be costly in time and money, but taking a shortcut can make the situation worse.