The last decade has seen a surge in commercial applications using machine learning (ML). Similarly, marked improvements in the latency and bandwidth of wireless communication have led to the rapid adoption of cloud-connected devices, which have gained the moniker Internet of Things (IoT). With such technology, it became possible to add intelligence to sensor systems and devices, enabling new products such as Amazon Echo, Google Nest, and other so-called "smart devices." However, these devices offer only the illusion of intelligence: they are merely vessels that submit queries to, and receive responses from, a centralized cloud infrastructure. This cloud processing leads to concerns about where user data is being stored, what other services it might be used for, and who has access to it.7
More recently, efforts have progressed in dovetailing the domains of IoT and machine learning to embed intelligence directly on the device, an approach known as tiny machine learning (TinyML).10 TinyML has several benefits over traditional cloud-based IoT architectures, whose performance is both latency- and bandwidth-dependent. For example, wireless communication incurs high power consumption due to the electric current required to amplify an antenna's signal. Furthermore, potentially sensitive data is broadcast over large distances, opening up the opportunity for interception by malicious actors. In contrast, TinyML can process data on-device, making wireless communication unnecessary. Such offline devices can improve security, reduce power consumption, and reserve communication solely for firmware updates or reporting anomalies.1,6 However, this new ML paradigm is met with challenges similar to those of the IoT workflow, most notably data privacy and a lack of transparency. For more on TinyML, see Prakash et al. on p. 68.
These challenges are best illustrated through example; consider an everyday use case of TinyML known as person detection—determining whether a person is present within an image. One could imagine having "person detectors" spread throughout a home to control lighting or other domestic systems. The key data-generating device for a person detector is a camera; its data is submitted to the application processor and run through a neural network stored in flash memory. The output from the network is a simple binary label denoting whether a person is present.
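As a rough illustration of this data flow, consider the C++ sketch below. The camera capture and inference routines are stubbed placeholders rather than a real driver or runtime API; on an actual device, the frame would arrive from the image sensor and inference would run through an embedded runtime such as TensorFlow Lite Micro.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Hypothetical QVGA grayscale frame; a real camera driver would fill
// this buffer directly from the image sensor.
using Frame = std::array<std::uint8_t, 320 * 240>;

// Stubbed camera capture (returns an empty frame here).
Frame capture_frame() { return Frame{}; }

// Stubbed inference call. A real implementation would run a neural
// network stored in flash memory on the captured frame.
bool run_person_detector(const Frame& frame) {
    (void)frame;
    return false;
}

int main() {
    for (int i = 0; i < 3; ++i) {          // bounded loop for the sketch
        Frame frame = capture_frame();     // raw pixels stay on-device
        bool person = run_person_detector(frame);
        std::printf("person present: %d\n", person ? 1 : 0);
        // Only the binary label survives; the raw frame is discarded.
    }
}
```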
With current cloud-based IoT deployments, while the binary output of such a device likely does not contain large amounts of sensitive information, there is no way to guarantee that the raw images collected by these devices are not being harvested and used for other purposes, perhaps unsavory. One could imagine that such images might be used to determine how many people live in a house, their ethnicities, genders, habits, and so forth, all based on data obtained under the guise of person detection. Such factors have evoked concern from privacy advocates, end users, and workers' unions. Unfortunately, this is not an isolated issue, with many newer devices containing microphones or cameras without the user's knowledge.2 This trend underscores the need for improved transparency and the introduction of regulations or mechanisms to protect user privacy and inform users of what data their devices are collecting.
To this end, we propose the ML sensor, a logical framework for developing ML-enabled embedded systems that empowers end users through its privacy-by-design approach. By limiting the data interface, the ML sensor paradigm helps ensure no user information can be extracted beyond the scope of the sensor's functionality. Our proposed definition is: "An ML sensor is a self-contained, embedded system that utilizes machine learning to process sensor data on-device—logically decoupling data computation from the main application processor and limiting the data access of the wider system to high-level ML model outputs."
The underlying idea behind an ML sensor is that an ML processor is coupled to a sensor as part of a single physical entity, separate from the central processor (see Figure 1). Similar to the distinction between user and kernel space in computer science, these two processors would live in fundamentally different worlds. Instead of having access to all the data, the central processor would only receive the output from the ML sensor. The ML sensor, in turn, has access to the data but contains only the functionality and peripherals needed to perform its essential operation, with no auxiliary computation. This way, the raw sensor data never leaves the ML sensor, promoting privacy while enabling intelligence.
Figure 1. The ML sensor paradigm. The ML model is tightly coupled with the physical sensor, separate from the application processor in the ML sensor. This paradigm provides isolation of data-level computation from the wider system, precluding system-level accessibility to sensitive user data.
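To make this separation concrete, consider the following C++ sketch. All names here (PersonSensor, read_output, and so on) are illustrative placeholders, not a real device API: the raw frame buffer and the model live privately inside the sensor, and the only member visible to the wider system is the high-level label.

```cpp
#include <cstdint>

// Illustrative sketch of the ML sensor boundary. The application
// processor can only call read_output(); the raw data and the model
// stay entirely inside the sensor package.
class PersonSensor {
public:
    // The sole interface exposed to the wider system: a high-level
    // ML output, analogous to a single output pin in hardware.
    bool read_output() {
        capture();          // fill the private frame buffer
        return infer();     // run the on-device model
    }

private:
    std::uint8_t frame_[320 * 240] = {};   // raw pixels: never exposed

    void capture() { frame_[0] = 0; /* camera DMA would fill frame_ */ }
    bool infer()   { return false; /* stub for the flash-resident model */ }
};

int main() {
    PersonSensor sensor;
    bool person_present = sensor.read_output();  // all the system sees
    (void)person_present;
}
```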
For our motivating person-detection example, this could be implemented by passing a binary label denoting whether a person is present in an image. This approach would enable a simple hardware design containing only three communication pins: one for ground, one for power, and one for the binary output of the neural network. Sensors that require more than a single bit of output, such as for multiclass classification, would require more channels, but only enough to cover the co-domain of the machine learning output. Thus, while the ML sensor is best suited for applications with a finite set of model outputs, this still admits a wide variety of possible commercial applications, from voice command interfaces to analog display monitoring.9 Useful Sensors recently developed a person detection ML sensor module8 that operates similarly to our motivating example.
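From the application processor's perspective, consuming such a sensor could then be as simple as polling a single GPIO line. The Arduino-style sketch below is a minimal illustration under that assumption; the pin number is an arbitrary choice standing in for the sensor's one output pin.

```cpp
// Arduino-style sketch for the host microcontroller. Pin 2 is an
// arbitrary placeholder for the sensor's single output line.
const int PERSON_PIN = 2;

void setup() {
    pinMode(PERSON_PIN, INPUT);     // the ML sensor drives this line
    pinMode(LED_BUILTIN, OUTPUT);
}

void loop() {
    // The host never sees pixels, only the one-bit classification.
    bool person_present = (digitalRead(PERSON_PIN) == HIGH);
    digitalWrite(LED_BUILTIN, person_present ? HIGH : LOW);
    delay(100);                     // poll at roughly 10 Hz
}
```

A multiclass sensor would expose additional output lines in the same spirit, but the principle is unchanged: only the model's co-domain ever crosses the boundary.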
To further promote privacy, the ML model could be loaded onto the ML sensor during production, with no hardware capability for model updates. Alternatively, secure networking capabilities could be included so the device's firmware and model remain updateable, with updates performed over the air and only under rare circumstances. Furthermore, the ML sensor could incorporate existing privacy-preserving technologies (for example, SGX and TrustZone5). However, most current devices within the scope of TinyML are too resource-constrained to implement such technologies.
The ML sensor also has additional desirable features. The device would be self-contained and modular, allowing it to interface easily with embedded systems. This approach makes the technology more accessible to non-experts by removing the need for hardware, software, and ML expertise to develop ML-enabled embedded devices. These devices could also be integrated into larger, more complex systems to enable advanced functionality while providing users assurance that their sensitive data is protected.
An ML sensor would need to be provided alongside a datasheet that communicates its core design to ensure ease of use. This datasheet should include the sections seen in traditional sensor datasheets (for example, electrical characteristics), as well as additional sections outlining ML model performance and end-to-end performance analysis, and articulating privacy, ethical, and environmental considerations, including the dataset used for training, inspired by Gebru et al.3 Such a datasheet could be audited to ensure it meets specific requirements regarding algorithmic bias, device reliability, and performance verification. It would not only provide relevant information to device designers but also help to promote trust between corporations and end users through added transparency and validation by a third-party entity.
Figure 2 is an example of what such an ML sensor datasheet might look like for our example person detector module. Designing such datasheets requires community involvement and engagement, since they affect stakeholders at all levels, including end users, manufacturers, technical experts, privacy advocates, auditing bodies, and policymakers. Together, this ecosystem will need to decide what information the datasheets must contain, how it should be validated, and who is responsible for auditing it.
Figure 2. An illustrative example of an ML sensor datasheet.
We believe the net impact of making ML more usable in hardware applications, together with the downstream benefits for privacy, security, and transparency, will be positive. However, as with any approach, this paradigm has challenges, and its net benefit relies on developers applying appropriate ethical considerations when designing and developing ML sensors. For example, traditional ML concerns remain, such as model bias, the potential for adversarial attacks, and reduced explainability of the device's functionality. There is also the potential for ML sensors to be exploited for malicious purposes, such as in weaponry or the suppression of free speech. Given these shortcomings, we envisage these devices augmenting existing systems rather than replacing them, especially in safety-critical applications. And perhaps someday it might be possible to walk into a hardware store and purchase a person-detection sensor, much as one might purchase a temperature sensor today.
The rapid development of intelligent sensors means the challenges described here will arrive faster than most anticipate. A privacy-by-design approach is necessary to tackle these issues proactively and promptly. Simultaneously, a dialogue is needed among the general public, corporations, manufacturers, and policymakers to discern the appropriate level of privacy, security, and transparency to meet the needs of all parties, and how those needs might be achieved. The solution may involve an auditing system, a certification process, a regulatory body, or a mixture of these alongside other mechanisms. Care must be taken to protect user privacy and security without stifling innovation. Lessons may be drawn from similar ethical issues within cloud-based ML, and, conversely, solutions developed for embedded ML may transfer back to the cloud-based domain.
1. Banbury, C.R. et al. Benchmarking TinyML systems: Challenges and direction. arXiv preprint arXiv:2003.04821 (2021); https://arxiv.org/abs/2003.04821
2. Fussell, S. The microphones that may be hidden in your home. The Atlantic (Feb. 23, 2019); https://bit.ly/3rlnOoR
3. Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 12 (Dec. 2021); https://doi.org/10.1145/3458723
4. Holland, S. et al. The dataset nutrition label. Data Protection and Privacy (2020); https://doi.org/10.5040/9781509932771.ch-001
5. Kuznetsov, E. et al. SecureFL: Privacy-preserving federated learning with SGX and TrustZone. In Proceedings of the 2021 IEEE/ACM Symposium on Edge Computing (SEC) (Dec. 2021), 55–67.
6. Reddi, V.J. et al. Widening access to applied machine learning with TinyML. Harvard Data Science Review (2020); https://doi.org/10.1162/99608f92.762d171a
7. Tawalbeh, L. et al. IoT privacy and security: Challenges and solutions. Applied Sciences 10, 12 (2020); https://doi.org/10.3390/app10124102
8. Useful Sensors Inc. Person Sensor (Aug. 1, 2023); https://bit.ly/3t7fBoL
9. Warden, P. et al. Machine learning sensors. arXiv preprint arXiv:2206.03266 (2022).
10. Warden, P. and Situnayake, D. TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers. O'Reilly (2020).