TL;DR: How to Avoid the Dangers of Over the Air Updates
- Some Fitbit users have reported issues after a recent over-the-air (OTA) update.
- However, Fitbit has denied the allegations and stated that the matter is still under investigation.
- This situation highlights the potential risks for any company involved with wireless firmware updates if sufficient safeguards aren’t in place.
- We discuss best practices for rolling out remote updates safely, including:
  - Rigorous in-house testing on diverse hardware
  - Closed beta trials to validate real-world functionality
  - Gradual staged rollout to minimise risk
  - Robust analytics to identify issues needing rollback
  - Built-in rollback capabilities in case issues emerge after deployment
- Following industry best practices for OTA updates can help any organisation maintain customer trust.
- Work with experienced specialists to build in protections proactively.
Over The Air Updates in the News
Last month, a number of Fitbit owners claimed they woke up to find their fitness trackers and smartwatches suddenly unusable. A software update released in December 2023 appeared to have gone awry, amid reports of bricked devices and fuming users.
Complaints flooded Fitbit’s online forums, with the update apparently being blamed for issues from rapidly draining batteries to complete failure to power on.
Fitbit, now owned by Google, has denied accusations that its software update rendered devices unusable. At the time of writing, a spokesperson said the issue is still under investigation and is not related to the recent firmware update.
This case underscores the risks involved with over the air (OTA) software updates and demonstrates how vital meticulous planning is to avoid potentially catastrophic outcomes.
Wired vs Wireless Over The Air Updates
Unlike traditional “wired” software updates that require a physical connection, OTA updates allow for wireless delivery of new code directly to devices.
Comparison of Wired vs Wireless Over the Air Updates
Wired Updates | Wireless OTA Updates |
---|---|
Physical connection required | Automated wireless delivery |
Manual user-initiated process | Background updates without user input |
Limited scalability | Scales to thousands of devices |
Testing done by user on single device | Testing needs to cover every scenario |
This automated approach enables the seamless delivery of features and fixes to hardware already in customers’ hands.
However, the convenience comes at a price.
Without rigorous safeguards in place, buggy over the air updates can spell disaster by collectively bricking vast numbers of units.
So, how can product designers and manufacturers avoid a scenario such as the Fitbit fiasco?
OTA updates require:
- A finely-tuned process combining exhaustive in-house testing, closed beta trials, staged rollout, and robust monitoring
- Thorough testing on a variety of hardware configs, which is critical to surface issues before reaching users' devices
- Extensive beta trials, then validation of updates in real-world conditions
A phased rollout further minimises risk, while granular analytics identify problems needing immediate rollback.
Now, let’s take a closer look at best practices for mitigating the hidden dangers of OTA updates.
Planning ahead is key: once an OTA update bricks devices en masse, it's too late.
With meticulous oversight and the right protections built in, over-the-air updates can safely provide customers frictionless access to the latest features.
The Potential Risks of OTA Updates
What exactly makes OTAs more precarious than traditional wired firmware updates?
For one, the lack of physical access to devices means updates are deployed automatically in the background without user oversight. This introduces significant risks that simply aren’t present with manual updates via USB or other direct connections.
Remember, without rigorous compatibility testing, over the air updates run a high risk of bricking devices – rendering them permanently inoperable. This can occur when an update ends up being incompatible with certain hardware variants or configurations in the field.
Wireless delivery means such an update gets instantly pushed to every device without discrimination. The result? Potentially thousands or millions of bricked units if flaws slip through testing.
Faulty OTA updates can also inadvertently drain batteries by triggering runaway processes. For example, if an update bug causes GPS polling to go into an endless loop, batteries can rapidly deplete to zero. Bugs have also been known to prevent devices from properly entering sleep states, leading to premature battery exhaustion.
Bear in mind that OTAs can also break key functionality by introducing new bugs unrelated to hardware compatibility. Sensor malfunctions, loss of data connectivity, repeated crashes: the possibilities are endless when deploying untested code. Sufficiently severe malfunctions can effectively brick a device, even if it still powers on.
Compounding these risks is the lack of user visibility or control over OTAs. Updates initiate and install automatically, with no option for users to defer updates or selectively roll back in case of issues.
This takes users completely out of the equation in terms of managing the risks involved.
Best Practices for Safe OTA Rollouts
The foundation for delivering safe over the air updates is exhaustive testing on a diverse range of hardware configs and OS versions. QA needs to replicate the full spectrum of devices an update will reach.
For each variant, test cases should validate:
- Hardware compatibility - no bricking
- Battery life impact - no abnormal drainage
- Feature functionality - no regressions
- Performance - no slowdowns or crashes
- Data integrity - no corruption issues
Unit, integration, system, and regression testing across the board are critical for any organisation.
Tests also need to cover edge cases by simulating flaky network connections, low memory and storage, etc.
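One way to make sure every variant receives every check is to generate the test matrix rather than maintain it by hand. The sketch below is illustrative only: the hardware revisions, OS versions, condition names and check names are made-up placeholders, not a real product inventory.

```python
from itertools import product

# Hypothetical fleet inventory and fault conditions; replace with real values.
HARDWARE_REVISIONS = ["rev_a", "rev_b", "rev_c"]
OS_VERSIONS = ["1.8", "1.9", "2.0"]
CONDITIONS = ["nominal", "flaky_network", "low_memory", "low_storage"]
CHECKS = [
    "hardware_compatibility",
    "battery_life",
    "feature_functionality",
    "performance",
    "data_integrity",
]


def build_test_matrix(hardware, os_versions, conditions, checks):
    """Return one test case per combination of variant, condition and check."""
    return [
        {"hardware": hw, "os": os_ver, "condition": cond, "check": check}
        for hw, os_ver, cond, check in product(hardware, os_versions, conditions, checks)
    ]


matrix = build_test_matrix(HARDWARE_REVISIONS, OS_VERSIONS, CONDITIONS, CHECKS)
print(f"{len(matrix)} test cases to schedule")
```

The matrix grows multiplicatively with each new variant, which is a useful reminder of how quickly coverage requirements expand as a product line ages.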
This level of exhaustive testing is considered industry best practice.
Image Security and Validation
To prevent corrupted or malicious updates, the OTA system should consider techniques such as checksums or digital signing.
Checksums sent at the start of the update are calculated from the update contents and can be used to ensure that the downloaded image is correct before attempting to apply it.
For additional security, updates can also be signed with the manufacturer’s private key and verified on the device against the corresponding public key, ensuring that only official updates received by the OTA system are ever executed.
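As a minimal sketch of the checksum step, the following uses SHA-256 from Python's standard library. The function names and the sample payload are assumptions for illustration, and a real device would typically do this in its bootloader or update agent rather than in Python.

```python
import hashlib
import hmac


def sha256_of(image: bytes) -> str:
    """Return the hex SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def image_matches_checksum(image: bytes, expected_hex: str) -> bool:
    """Compare the downloaded image's digest with the one announced by the server."""
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(sha256_of(image), expected_hex.lower())


# Stand-in bytes for a firmware image (illustrative only).
firmware = b"example firmware payload"
announced = sha256_of(firmware)

# An intact download passes; a single altered byte must be rejected
# before any attempt is made to apply the image.
corrupted = b"Example firmware payload"
print(image_matches_checksum(firmware, announced))
print(image_matches_checksum(corrupted, announced))
```

A checksum only detects corruption. It does not prove where the image came from, which is why signature verification with an asymmetric algorithm (provided by a cryptography library, not shown here) is the complementary step.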
Closed Beta Testing
After in-house testing, closed beta trials with a subset of users can provide an extra layer of protection. Betas enable real-world validation of updates in uncontrolled environments.
Testers should represent the full user demographic, spanning geographic regions, tech savviness, usage patterns, device ages and more.
Analytics telemetry during beta provides vital data on issues experienced by testers. Bug reports can help surface failures missed during QA testing. Limited, closed beta groups minimise risk, compared to deploying untested updates en masse.
Staged Rollout
A phased rollout is another practice used to de-risk OTA updates.
Here, the update first reaches a small percentage of users – 1% or less – and then increases over time.
Analytics monitor for anomalies or spikes that warrant pausing the rollout. If issues emerge, the deployment stops before impacting more users.
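One common way to implement a percentage-based rollout is to hash each device into a stable bucket and compare it with the current rollout percentage. The sketch below assumes string device IDs and an update identifier; both names are hypothetical.

```python
import hashlib


def rollout_bucket(device_id: str, update_id: str) -> int:
    """Map a device to a stable bucket in the range 0..9999 for a given update."""
    digest = hashlib.sha256(f"{update_id}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_eligible(device_id: str, update_id: str, rollout_percent: float) -> bool:
    """True if the device falls inside the current rollout percentage."""
    # 10,000 buckets give a resolution of 0.01%.
    return rollout_bucket(device_id, update_id) < rollout_percent * 100


devices = [f"device-{n}" for n in range(10_000)]
for percent in (1, 10, 50, 100):
    count = sum(is_eligible(d, "fw-2.0.1", percent) for d in devices)
    print(f"{percent}% stage: {count} of {len(devices)} devices eligible")
```

Because the bucket depends only on the device and update identifiers, a device that was eligible at 1% stays eligible as the percentage grows, and mixing the update ID into the hash means the same devices are not always the first to receive every release.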
Ability to Track and Analyse Issues
Throughout testing and staged rollout, granular analytics provide real-time insights into update issues.
Error reporting and monitoring help detect flaws and prevent problematic updates from reaching users.
Analytics guide the decision of whether to continue the rollout or roll back.
With the advent of cloud technologies and connected IoT, it’s becoming easier for designers to collect detailed information from devices in the field in real time.
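The continue-or-stop decision can be reduced to a simple gate on telemetry from the current stage. This is a sketch under stated assumptions: the `StageStats` fields, the baseline failure rate, and the threshold values are all hypothetical and would need tuning for a real fleet.

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    """Telemetry aggregated for one rollout stage (hypothetical fields)."""
    devices_updated: int
    failures: int  # e.g. failed boots, crash loops, abnormal battery drain


def rollout_decision(stats: StageStats, baseline_failure_rate: float,
                     max_ratio: float = 2.0, min_sample: int = 100) -> str:
    """Return 'wait', 'continue' or 'pause' for the current stage."""
    if stats.devices_updated < min_sample:
        return "wait"  # too few devices reporting to judge either way
    failure_rate = stats.failures / stats.devices_updated
    if failure_rate > baseline_failure_rate * max_ratio:
        return "pause"  # failures well above the pre-update baseline
    return "continue"


print(rollout_decision(StageStats(devices_updated=1000, failures=5), 0.01))
print(rollout_decision(StageStats(devices_updated=1000, failures=40), 0.01))
```

Comparing against the pre-update baseline, rather than an absolute number, keeps the gate from tripping on the background failure rate that every fleet already has.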
This information can also be linked into AI systems to detect and predict potential errors introduced from updates early before they are rolled out to a wider audience.
Built-in Rollback Capability
Despite extensive testing, unforeseen issues can still occur post-deployment.
Having configurable rollback capabilities built-in provides a critical safeguard to revert an update if severe issues appear.
Analytics can inform the decision to trigger rollback across all devices or specific problem-causing configs only.
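A rough sketch of how telemetry might drive the rollback scope is shown below. The per-config counters, the 5% failure threshold, and the rule for escalating to a fleet-wide rollback are assumptions chosen for illustration.

```python
def configs_to_roll_back(failures_by_config, installs_by_config,
                         threshold=0.05, fleet_wide_fraction=0.5):
    """Decide the rollback scope from per-config failure rates.

    Returns ("none", []) when no config exceeds the threshold,
    ("selected", [...]) when only some configs are affected, and
    ("all", [...]) when more than fleet_wide_fraction of configs are affected.
    """
    affected = sorted(
        config
        for config, installs in installs_by_config.items()
        if installs > 0 and failures_by_config.get(config, 0) / installs > threshold
    )
    if not affected:
        return ("none", [])
    if len(affected) / len(installs_by_config) > fleet_wide_fraction:
        return ("all", sorted(installs_by_config))
    return ("selected", affected)


# Hypothetical telemetry: only rev_b shows an abnormal failure rate.
installs = {"rev_a": 1000, "rev_b": 800, "rev_c": 1200}
failures = {"rev_a": 3, "rev_b": 120}
print(configs_to_roll_back(failures, installs))
```

Targeting only the affected configurations avoids reverting a working update for the rest of the fleet.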
A/B Deployment
ByteSnap has achieved this in previous work through techniques such as A/B deployment, where the software running from slot A downloads the update into slot B.
If something goes wrong when the device runs the new software version in slot B, it can automatically fall back to slot A to resume normal functionality.
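The slot-switching logic can be modelled as a small state machine. The sketch below is a generic illustration, not ByteSnap's implementation: the `BootState` layout, the function names, and the limit of three trial boots are all assumptions. On real hardware this state would live in persistent storage read by the bootloader.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_TRIES = 3  # assumed number of trial boots before giving up


@dataclass
class BootState:
    """Persistent boot metadata (hypothetical layout)."""
    active_slot: str = "A"               # last slot known to work
    pending_slot: Optional[str] = None   # slot being trialled after an update
    tries_left: int = 0


def other_slot(slot: str) -> str:
    return "B" if slot == "A" else "A"


def stage_update(state: BootState) -> str:
    """Mark the inactive slot as pending once the new image is written to it."""
    state.pending_slot = other_slot(state.active_slot)
    state.tries_left = MAX_BOOT_TRIES
    return state.pending_slot


def select_boot_slot(state: BootState) -> str:
    """Run on every boot: trial the pending slot, or fall back when tries run out."""
    if state.pending_slot is not None:
        if state.tries_left > 0:
            state.tries_left -= 1
            return state.pending_slot
        state.pending_slot = None  # trials exhausted: abandon the new image
    return state.active_slot


def confirm_boot(state: BootState) -> None:
    """Called by the new software once it has passed its health checks."""
    if state.pending_slot is not None:
        state.active_slot = state.pending_slot
        state.pending_slot = None
        state.tries_left = 0


# A new image that never confirms itself is abandoned after the trial boots.
state = BootState()
stage_update(state)
boots = [select_boot_slot(state) for _ in range(4)]
print(boots)
```

The key design choice is that the new slot only becomes permanent after the new software explicitly confirms a healthy boot, so a crash or hang before that point leads back to the known-good slot.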
Phased Rollout Process for Over the Air Updates - Flow Chart
Conclusion
There are serious potential risks involved with over-the-air updates, as highlighted by the recent Fitbit situation which remains under investigation. However, with meticulous planning and rigorous execution, companies can deploy OTA updates safely and seamlessly.
Key takeaways
- Do not underestimate the need for exhaustive and diverse testing of updates
- Use closed beta testing to validate real-world functionality
- Gradually roll out updates in phases to minimise risk
- Implement robust telemetry to identify issues in real time
- Build in rollback capabilities to revert faulty updates
Anthony is a Birmingham-based electronics and software engineer who has been creating bespoke embedded products since 2019.
While studying for his Masters in Electronics Engineering in 2018, he worked at the Aston Institute of Photonic Technologies, developing advanced laser control solutions for the in-house research teams. Since moving to ByteSnap, he has developed a wide range of products, from smartwatches and AI cameras to fluid monitoring and hydraulic control systems.
Outside of the office, Anthony enjoys creating digital art and snowboarding.