Causes of build delays and how to prevent them
Since we started Instrumental, I’ve had nearly 100 conversations with product design engineers and hardware company executives about what they find painful about development and early production. Development schedule delays are a recurring theme for good reason: they cause lost revenue, higher costs, and intense stress for everyone from junior Product Design Engineers to the executive team.
Causes of delays
Delays have many causes, but four themes recur: design changes, supplier quality and delivery problems, overly optimistic schedules, and slow failure analysis.
No matter how good your team is, design changes are inevitable. Hardware development is messy and there are always unanticipated issues. At Apple, we used to joke that a Proto build could be considered a success if all of the parts fit together and we got the Pin 1s on our connectors lined up. Even if we didn’t make those mistakes, there were always many things that needed to be fixed, like a performance variation we didn’t expect, a mushy button, or a devastating reliability test result. The fixes sometimes involved a simple process or SOP change at the factory, but often required a painful PCB spin, tool modification, or new parts.
Finding the issues that require design changes is a big part of the schedule battle. The time between ignorance and issue discovery needs to be as short as possible to maintain flexibility. For example, discovering related issues early could allow multiple changes to be rolled into a single tool modification. Today, product companies send engineers to the factories during development builds to improve their chances of discovering issues early, relying on being in the right place at the right time to find them. This strategy is an inefficient use of engineering time, and it is one of the first things Instrumental set out to change.
Parts suppliers and manufacturers sometimes have trouble producing at the expected quality or rate, which puts build schedules at risk. Suppliers want to do a good job and deliver quality work, so it is common for them to try to fix the issue on their own without clearly communicating the schedule risk to the broader team. Cherry-picking is a good example of how an engineering team can be blindsided: the supplier may make 5000 parts to deliver 50 and not communicate that the yield is 1%. This vendor will not be able to deliver 500 to the next build, but sometimes the team does not find out until the parts fail to show up. This lack of transparency prevents the team from seeking alternatives to keep the schedule on track.
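The cherry-picking arithmetic is worth making explicit, because it shows why the next build is at risk. The sketch below is only an illustration: the function names are hypothetical, and it assumes the supplier's yield stays at the level implied by the first delivery.

```python
import math


def observed_yield(parts_made: int, parts_delivered: int) -> float:
    """Fraction of manufactured parts that were good enough to ship."""
    if parts_made <= 0:
        raise ValueError("parts_made must be positive")
    return parts_delivered / parts_made


def starts_required(parts_needed: int, parts_made: int, parts_delivered: int) -> int:
    """Parts the supplier must make to deliver parts_needed,
    assuming the yield seen on the earlier delivery does not change."""
    if parts_delivered <= 0:
        raise ValueError("parts_delivered must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return math.ceil(parts_needed * parts_made / parts_delivered)


# Numbers from the example above: 5000 parts made, 50 delivered.
yield_fraction = observed_yield(5000, 50)
next_build_starts = starts_required(500, 5000, 50)

print(f"yield: {yield_fraction:.0%}")
print(f"parts to make for a 500-part delivery: {next_build_starts}")
```

At a 1% yield, a 500-part delivery means making 50,000 parts, which is why the team needs to hear about the yield well before the parts are due.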
When issues are highlighted early enough, engineers can be dispatched to the supplier sites to work on the problems together. If design changes are needed, they can often be kicked off then and potentially still make the build. Plans can be put in place to bin or sort parts in order to get through the build and buy more time to fix the issue. When the issue is not highlighted, engineering is forced to take the bad parts and limp through the build, which can put the engineering validation of the product into question since the system was never actually tested with the “real parts.”
While it varies from industry to industry, consumer electronic products are plagued by highly optimistic or “line-to-line” schedules. Regardless of what you are building, there is a market window in which releasing the product maximizes the return for the company: Dads and Grads, Back to School, Christmas, etc. By design, these schedules generally assume that everything will go well and do not account for iteration. When the inevitable hiccup arises, the team has to adjust by de-featuring the product, spending more money to expedite, or delaying. A team under too much pressure to hit a schedule may start cutting away parts of its standard process, which can hurt the project and create more problems that will need to be solved later on.
Failure analysis and triage
The failure analysis process itself can be a place where significant time is lost. For any issue, there are often several potential root causes: part quality, workmanship, process, design, etc. Each of these has to be systematically proven or disproven in order to make progress on the issue.
In development builds, failures are expected, so product engineers are often on-site at the factory to look at failures and assess root cause. Others rely on their contract manufacturers or joint development partners to do the failure analysis for them. Really, it comes down to numbers: in an EVT build, it would not be strange to build 300 units and have over 100 failures for one item or another. In a standard process, each of these will be retested to confirm the failure, disassembled, and photographed. Sometimes, even with a pile of units to dissect, the engineer does not have all of the data needed to fully prove or disprove a root cause, and additional experiments are proposed as a next step to collect it. Engineers are notorious for requiring evidence that disproves all other possibilities before they will admit that the design may be at fault and needs to be changed. As a corollary, the factory is often unfairly pegged as the first culprit, no matter the issue, and has to spend resources to prove that its process is not the root cause. Even without politics, this process is not fast, so the heaviest yield hitters are addressed first, and in development the onesie-twosie failures are sometimes ignored altogether. In mass production, speed is even more important because in the blink of an eye you can have thousands of units in a bonepile. Fast triage is necessary to keep the line running, and in the worst case, an engineer may have to travel to the factory to inspect, evaluate, and disposition units in person.
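Addressing the heaviest yield hitters first amounts to a simple Pareto ranking of failure modes. The sketch below shows one way to do that; the function name, the failure labels, and the counts are all made up for illustration and do not come from a real build.

```python
from collections import Counter


def rank_failure_modes(failure_log, units_built):
    """Rank failure modes from most to least frequent.

    failure_log: one label per failed unit (hypothetical labels).
    units_built: total units in the build, used for the yield hit.
    """
    if units_built <= 0:
        raise ValueError("units_built must be positive")
    counts = Counter(failure_log)
    total_failures = sum(counts.values())
    ranked = []
    cumulative = 0
    for mode, count in counts.most_common():
        cumulative += count
        ranked.append({
            "mode": mode,
            "count": count,
            "yield_hit": count / units_built,
            "cumulative_share": cumulative / total_failures,
        })
    return ranked


# Illustrative EVT build: 300 units, just over 100 failures.
log = (["mushy button"] * 40 + ["connector misaligned"] * 35
       + ["cosmetic scratch"] * 25 + ["audio dropout"] * 2)

for row in rank_failure_modes(log, units_built=300):
    print(f"{row['mode']:<22} {row['count']:>3}  "
          f"yield hit {row['yield_hit']:.1%}  "
          f"cumulative {row['cumulative_share']:.0%}")
```

The cumulative share column makes the trade-off visible: the last row is the kind of onesie-twosie failure that tends to be deferred when time is short.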
Preventing build delays
Since engineers need data to make decisions about the changes needed, the key to preventing build delays is to give them immediate access to the data needed to both identify an issue and to find its root cause. This means more measurements. But what to measure?
The Instrumental system is built around photographic data because photos have a key advantage: you do not need to know what you will want to go back and look at when you take the photo. We created a system that captures high-resolution pictures of each unit at key stages of assembly, and makes them immediately accessible to your team, wherever they may be. Our software makes it easy to sort, review, and compare images and numerical data.
Instrumental provides a full virtual history of every unit built so that engineers can review each unit in its full context, speeding up or even preventing design changes. For example, an Instrumental customer recently realized that a radio chipset needed an EMI shield can where there wasn’t one before. Unfortunately, the surrounding parts were not necessarily in spec, so the space available in the CAD wasn’t necessarily available in real life. With retroactive measurements enabled by Instrumental, and no disassembly required, the customer was able to confirm in an hour that there was just enough space to squeeze in the can. Without Instrumental, they would have faced a gut-wrenching radio chipset rip-up and PCB re-route: an eight-day delay at least.
Having a data record can also make supplier and part quality issues easier to find and diagnose. For example, an Instrumental customer noticed that the wires in some units had been improperly routed near a camera module. As it turns out, their factory partner had received wires that were longer than spec, and so routed them differently to keep the line running. The factory didn’t highlight the issue because these extra-long wires hadn’t caused any functional failures, but the engineer who discovered them was concerned about a risk to long-term camera reliability. The customer was able to identify all of the affected units and rework them before they were shipped out, potentially preventing field failures and returns that could have brought down the line.
While Instrumental cannot change an aggressive schedule, we have supported customers in their efforts to stick to aggressive shipment dates. In short, actionable data enabled faster learning and iteration cycles for those customers, and every saved hour adds up. In addition, the ability to find and root-cause issues remotely puts more eyes on the units, increasing the chance that issues will be found earlier.
Finally, Instrumental can either prove or eliminate potential root causes right away. For example, a customer found a finished unit with a cosmetic issue that could have come from molding, incorrect resin selection, chemical use on the line, or improper testing. The customer had used Instrumental to collect images of those cosmetic parts entering the line, and the defects were clearly visible. In minutes, the customer engineer had a failure rate and a correlation to a specific cavity and lot code. He followed up with the part supplier and moved on to other work. It wasn’t a line issue, a design issue, or a testing issue — and no one on the team had to spend time collecting the data to prove it, because Instrumental already had.
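To make the cavity and lot code correlation concrete, here is a minimal sketch of the underlying calculation: group incoming-part inspection records and compare defect rates across groups. This is not Instrumental's implementation. The record fields, the cavity and lot identifiers, and the counts are all hypothetical.

```python
from collections import defaultdict


def defect_rate_by(records, keys):
    """Defect rate for each combination of the given record fields.

    records: dicts with the fields named in keys plus a boolean
             "defective" flag (hypothetical schema).
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [defective, inspected]
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group][0] += 1 if record["defective"] else 0
        totals[group][1] += 1
    return {group: bad / seen for group, (bad, seen) in totals.items()}


# Made-up inspection results for parts entering the line.
records = (
    [{"cavity": "C1", "lot": "LOT-A", "defective": False}] * 10
    + [{"cavity": "C2", "lot": "LOT-A", "defective": False}] * 10
    + [{"cavity": "C1", "lot": "LOT-B", "defective": False}] * 10
    + [{"cavity": "C2", "lot": "LOT-B", "defective": True}] * 8
    + [{"cavity": "C2", "lot": "LOT-B", "defective": False}] * 2
)

overall_rate = sum(r["defective"] for r in records) / len(records)
rates = defect_rate_by(records, ("cavity", "lot"))
worst_group = max(rates, key=rates.get)

print(f"overall failure rate: {overall_rate:.0%}")
print(f"worst group: {worst_group} at {rates[worst_group]:.0%}")
```

When one cavity and lot combination carries nearly all of the defects while the others are clean, the evidence points at the part supplier and not at the line, the design, or the test station.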