edge_ml
Putting intelligence on chips small enough to be in disconcerting places
October 14, 2016 — August 14, 2023
The art of doing ML on small hardware: single-board computers, individual microcontrollers, or some weird chip embedded in a medical device. A.k.a. edge ML.1
I do not have much to say here RN.
1 Making neural models small
Obviously, if your target model is a neural net, one important step is making it as small as possible, in the sense of having as few parameters as possible.
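For instance, magnitude pruning zeroes out the weights with the smallest absolute values. A minimal sketch using PyTorch’s built-in pruning utilities; the model here is a made-up placeholder:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder model standing in for whatever we want to shrink.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 80% of weights with smallest L1 magnitude in each linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")  # bake the mask into the weights

sparsity = (model[0].weight == 0).float().mean().item()
print(f"layer 0 sparsity: {sparsity:.0%}")
```

Note that zeroed weights only save memory if stored in a sparse format; structured pruning, which drops whole channels or units, shrinks the dense tensors themselves.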
2 Low precision nets
There is another sense of small: using 16-bit float, fixed-point, or even single-bit arithmetic, so that the numbers involved are compact. TBD.
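A sketch of the cheapest version of this, PyTorch’s post-training dynamic quantization (the model is a placeholder; which layer types and dtypes are supported depends on the backend):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic quantization: weights are stored as int8 and activations are
# quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Or simply cast the whole model to 16-bit floats.
half = model.half()
```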
3 Multi-task learning
Training one learning algorithm to solve several problems simultaneously. Probably needs its own page.
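The standard architecture is a shared trunk with one output head per task; a sketch with made-up shapes and tasks:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """One shared trunk, one output head per task."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.head_cls = nn.Linear(64, 10)  # e.g. a classification task
        self.head_reg = nn.Linear(64, 1)   # e.g. a regression task

    def forward(self, x):
        h = self.trunk(x)
        return self.head_cls(h), self.head_reg(h)

net = MultiTaskNet()
logits, value = net(torch.randn(8, 32))
# Training typically minimises a (weighted) sum of the per-task losses.
```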
4 Tooling
TensorFlow appears to have intimate microcontroller integration via TensorFlow Lite for Microcontrollers.
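Getting a model onto such a chip starts from a .tflite flatbuffer. A minimal conversion sketch; the toy model and file name are placeholders:

```python
import tensorflow as tf

# Placeholder: a trained Keras model small enough for the target chip.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(8)]
)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
# For TFLM the flatbuffer is then embedded as a C array
# (e.g. via `xxd -i model.tflite`) and run by the on-device interpreter.
```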
Browser ML in particular has some quirks.
Other frameworks can convert models to the intermediate format ONNX, which can be run on microcontrollers, although I suspect with higher overhead.
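For example, a PyTorch model can be exported to ONNX and then executed with ONNX Runtime; a sketch with placeholder file and input names:

```python
import torch
import torchvision

# Random-weight MobileNetV2, just to have a concrete graph to export.
model = torchvision.models.mobilenet_v2().eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "mobilenet_v2.onnx",
                  input_names=["input"], opset_version=13)

import onnxruntime as ort
session = ort.InferenceSession("mobilenet_v2.onnx")
(output,) = session.run(None, {"input": dummy.numpy()})
```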
Introducing ONNX Runtime mobile: a reduced size, high performance package for edge devices
Deploying a PyTorch MobileNetV2 Classifier on the Intel Neural Compute Stick 2
jomjol/AI-on-the-edge-device implements an image AI network on an ESP32 device
Apache TVM is a compiler stack for deep learning systems. It is designed to close the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation to different backends.
Apache TVM Unity: a vision for the ML software & hardware ecosystem in 2022
Introducing ONNX Script: Authoring ONNX with the ease of Python - Microsoft Open Source Blog
5 Minifying neural nets
6 Compiling neural nets
- openvinotoolkit/openvino: OpenVINO™ is an open-source toolkit for optimising and deploying AI inference
- apache/tvm: Open deep learning compiler stack for cpu, gpu and specialized accelerators
- tiny-dnn/tiny-dnn: header-only, dependency-free deep learning framework in C++14, targeting deep learning on limited-compute embedded systems and IoT devices (defunct)
- pytorch/glow: Compiler for Neural Network hardware accelerators in AOT mode
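As a sketch of what such a compiler stack looks like in use, here is ahead-of-time compilation of an ONNX model with TVM’s Relay frontend; the file names, input name/shape, and target triple are placeholders:

```python
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"input": (1, 3, 224, 224)}
)

target = "llvm -mtriple=armv7l-linux-gnueabihf"  # e.g. a 32-bit Raspberry Pi
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

lib.export_library("model.so")  # load on-device via tvm.runtime.load_module
```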
7 Link slurry
Huawei’s full-stack AI toolchain looks interesting and efficient
8 References
Footnotes
1. I do not like this term, because it tends to imply that we care especially about some kind of centre-edge distinction, which we only do sometimes. It also tends to imply that large NN models in data centres are the default type of ML. Chris Mountford’s Hasn’t AI Been the Wrong Edgy for Too Long?, mentioned in the comments, riffs on this harder than I imagined, though.↩︎