Over 8 years ago I completed a Bachelor in Electronics Engineering, with a focus on embedded systems. Since then I have done primarily software engineering in embedded and web projects, sometimes combined in so-called “Internet of Things” (IoT) projects. Often there was a strong data- and signal-processing focus in these systems; from audio processing in microphone-arrays, to image processing for smart website builders. Recognizing the importance of data, I realized around 2 years ago I wanted to add a new skill-set to my engineering capabilities: Data Analysis and Machine Learning (ML).
And today I’m proud to say that I have successfully completed the Master in Data Science program at the Norwegian University of Life Sciences, as part of the first batch to earn this degree in Norway.
Master of Data Science thesis successfully defended. Left: me, Right: external examiner Lars Erik Solheim
Research
Throughout my degree, I’ve kept the vast majority of my notes in the open-source way – public on Github. Over time I have distilled these into two resources covering the main topics of my work.
Embedded Machine Learning: Machine Learning applied to Embedded Systems, with a focus on on-edge ML in low-cost, low-power sensors.
Machine Hearing: Using Machine Learning on audio, with a focus on general sound (less music and speech).
Thesis
My master's thesis combined these two topics, and applied them to classification of everyday urban sounds for noise monitoring in smart cities. The report and all the code can be found on Github:
Since Embedded Machine Learning is an emerging niche, the availability of software tools is not as good as for machine learning in general. To help with that I developed emlearn, an open-source ML inference engine tailored for micro-controllers and very small embedded systems. emlearn lets you convert models built with existing Python machine learning frameworks such as scikit-learn and Keras, and execute them on device using portable C code. The focus is on simple and efficient models such as Random Forests, Decision Trees, Naive Bayes and linear models. In this way, emlearn is a complement to deep learning inference libraries for embedded devices, such as TFLite and X-CUBE-AI.
Consulting
While the master's degree was nominally a full-time program, I kept doing engineering work for customers during the period. Projects have included:
dlock. An IoT doorlock system for retrofitting existing public infrastructure doors. Developed for the municipality of Oslo as part of the Oslonøkkelen project, an app that allows inhabitants to access municipality services such as libraries and recycling stations outside of manned working-hours. Made in collaboration with IoT solutions provider Trygvis IO.
Since the start of this year, I have started to focus on machine learning projects. Especially things that incorporate my particular expertise: Embedded/Edge Machine Learning, Machine Learning for Audio, and Machine Learning on IoT sensor data. The first ML consulting project for Roest coffee is well underway (details to be announced). Going forward, most of my time is dedicated to products at my new startup, Soundsensing. However, there should also be some capacity for new consulting work.
Embedded systems and microcontroller programs can be really hard to understand. Here are some techniques that can help.
An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system …
– Wikipedia
Examples of embedded systems include everything from a fridge thermostat, to the airbag sensors in your car, to consumer electronics like an electronic keyboard for music. Programming such a system can be extra challenging compared to writing software for a general-purpose computer. The complicating factors may include:
Limited computing power, memory, storage or bandwidth
Need for low and deterministic response times (soft or hard real-time)
Running on battery, within a constrained power budget
Need for high reliability without user intervention over long periods of time (months, years)
A malfunctioning system may cause material damage or harm people
Problem may require considerable domain-specific knowledge
But before all that, we typically need to come to terms with a more mundane and practical problem:
that it is usually very hard to understand what is going on with our program!
Why embedded systems are harder to debug
This problem of hard-to-understand software systems is not at all unique to embedded; it also happens frequently when developing for a PC or mobile device. But several aspects typically make the problem more severe.
It is often hard to stimulate the system with inputs automatically, as they are typically physical/external in nature
Inputs, outputs and internal state of the system may change faster than humans can observe in real-time
The environment of the system typically influences results, sometimes in unforeseen ways
Low-level programming languages and techniques are still the norm
The target device does not have a UI or development tools
The connection to a system is often (or at best) a slow serial connection
To make working with the system more pleasant and productive, I have found two techniques very useful:
Recording sensor data from device, and then analyzing it ‘offline’ on the PC
Running the firmware logic on the PC, by abstracting away hardware dependencies
Case: Triggering MIDI with capacitive sensors
A friend of mine is building an electronic music instrument, a Hang-like thing with 9 pads. It uses capacitive touch sensors and sends MIDI notes over USB for triggering a sampler or synthesizer to make sound.
Core components of system. Capacitive sensor(s) connected to Arduino, sending MIDI messages to computer software over USB.
The firmware on the device was an Arduino sketch, using the CapacitiveSensor library to read the capacitance of the pads.
Summing up N consecutive samples gives basic filtering, and a note is triggered if the value exceeds a specified threshold TH.
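To make the roles of N and TH concrete, here is a minimal host-side C++ sketch of sum-and-threshold detection. It is not the code from the actual Arduino sketch: the function name `thresholdTriggers` and the grouping into non-overlapping windows of N samples are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original firmware):
// sums each group of n consecutive raw readings and compares the sum to a threshold.
// Returns one decision per complete group; a trailing incomplete group is ignored.
std::vector<bool> thresholdTriggers(const std::vector<long> &samples,
                                    std::size_t n, long threshold) {
    assert(n > 0);
    std::vector<bool> triggers;
    for (std::size_t start = 0; start + n <= samples.size(); start += n) {
        long sum = 0;
        for (std::size_t i = 0; i < n; i++) {
            sum += samples[start + i];
        }
        // A note would be triggered when the summed value exceeds the threshold
        triggers.push_back(sum > threshold);
    }
    return triggers;
}
```

Each decision covers N readings, so a larger N means each decision spans more time, while the threshold has to be chosen relative to the sum of N readings rather than a single one.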
The setup worked in principle, and in practice for some ways of hitting the drumpad. But it seemed impossible to find a combination of N and TH where the device would trigger correctly in all cases. If N was too high, then fast taps would be dropped/ignored. If N was too low, it caused occasional double triggering. If TH was too high, it would not trigger on single finger taps. If TH was too low, it would trigger when a palm was just hovering over the pad.
How to make it work reliably?
To solve a problem, you first have to understand it
Several hours had been spent tweaking the values, recompiling the sketch, uploading and hitting the pads a bit, listening if it did the right thing. This gave us some rough intuitions about things that worked and not, but generally our understanding of the situation was quite sparse. Which is understandable; there is only so much one can learn from observing a system in real-time from the outside with the naked eye/ear.
It seemed that how the pad was hit had an influence. Which is plausible: the surface area might influence how effective the capacitive coupling is.
Example poses that should trigger. Hitting with fingertip, finger flat, 3-fingers and side of thumb.
Some cases which should *not* trigger. Hand hovering over pad, touching casing with thumb but not the pad.
What we needed to understand was: What is the input data in different scenarios?
First we added logging to the Arduino sketch, sending the time spent on each reading and the current sensor value (for one sensor).
const long beforeRead = millis();
const Input input = readInputs();
const long afterRead = millis();
Serial.print("(");
Serial.print(afterRead-beforeRead);
Serial.print(",");
Serial.print(input.values[0].capacitance);
Serial.println(")");
The stream showed that with N=70, reading the sensors took a long time, over 40ms. This is unacceptable for a musical instrument, so it was clear that this value *had* to go down. Any issues caused would have to be fixed in some other way.
Then we wrote a small Python script to read the serial port and write the raw data to a file.
This allowed us to record the data from a whole scenario. For instance we recorded things like ‘tapping-3-times-quickly’ or ‘hovering-then-touching’, using the filename as a description of what the scenario was, and directories to group sets of recordings.
Another Python script was then used for analysing the data, parsing the raw values and using matplotlib to plot it out.
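The analysis scripts were written in Python. Purely to illustrate the record format, here is a C++ sketch that parses one logged line of the form `(readTime,capacitance)`, as printed by the logging code above. The `LogRecord` struct and the `parseLogLine` name are hypothetical and not part of the project.

```cpp
#include <cstdio>
#include <string>

// Hypothetical record type for one logged line
struct LogRecord {
    long readTimeMs;   // time spent reading the sensors, in milliseconds
    long capacitance;  // raw value of the first sensor
};

// Parses a line like "(41,1234)" into a LogRecord.
// Returns false (leaving `out` untouched) when the line does not match the format.
bool parseLogLine(const std::string &line, LogRecord &out) {
    long ms = 0;
    long value = 0;
    char closing = 0;
    // Expect: optional whitespace, '(', number, ',', number, then one more character
    const int fields = std::sscanf(line.c_str(), " (%ld,%ld%c", &ms, &value, &closing);
    if (fields != 3 || closing != ')') {
        return false;
    }
    out.readTimeMs = ms;
    out.capacitance = value;
    return true;
}
```

Rejecting malformed lines matters in practice, since a serial capture can start or stop in the middle of a line.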
Now we could finally *see* what was going on over longer periods of time, and compare different scenarios against each other.
This case illustrated the crux of our problem. The red areas indicate where we are hovering *over* the pad with palm, but the sensed capacitance values are higher than when touching with a fingertip.
If the threshold is set too high (orange line) we miss the finger tap, and if it is too low (yellow) we will false trigger on a hovering palm.
Can we do better with an alternative detection algorithm? Maybe with a high-pass filter to detect the changes at the edges, it would be possible to identify both cases. Plugging a high-pass filter into the Python analysis script and playing with the values seemed to support this.
But we cannot run Python for real-time processing. We need to be able to implement the filter in the Arduino firmware.
Exponential Moving Average filters
An Exponential Moving Average (EMA, also known as EWMA) was selected as the basis of the filter. It has many desirable properties for use in a latency-sensitive application on a microcontroller: it only requires storing one number, is computationally simple, and is robust against variation in sampling time (jitter). And unlike a FIR filter, it does not introduce latency (apart from the time-constant of the filter itself). Here is a nice introduction for Arduino usage.
static int
exponentialMovingAverage(const int value, const int previous, const float alpha) {
return (alpha*value) + ((1-alpha)*previous);
}
...
next.highfilter = exponentialMovingAverage(input.capacitance, previous.highfilter, appConfig.highpass);
next.highpassed = input.capacitance - next.highfilter;
...
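To see why subtracting the moving average acts as a high-pass filter, here is a small host-side sketch under simplifying assumptions. It uses the same update rule as `exponentialMovingAverage`, but keeps the state in `float` for clarity; the helper names `emaUpdate` and `peakHighpassed` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Same update rule as the firmware filter, but with float state (an assumption
// made here to avoid integer truncation in the illustration)
float emaUpdate(float value, float previous, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    return alpha * value + (1.0f - alpha) * previous;
}

// Returns the largest absolute high-passed value (input minus its moving average)
// seen while running the filter over the whole signal
float peakHighpassed(const std::vector<float> &signal, float alpha) {
    if (signal.empty()) {
        return 0.0f;
    }
    float average = signal.front();  // start with the filter settled on the first sample
    float peak = 0.0f;
    for (float value : signal) {
        average = emaUpdate(value, average, alpha);
        peak = std::max(peak, std::fabs(value - average));
    }
    return peak;
}
```

A sudden step (like a tap) leaves the average behind and gives a large high-passed value, while a slow ramp (like a hand approaching) is mostly tracked by the average and gives a small one, even if the ramp ends at a higher absolute level.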
What value should highpass have, and how do we know if it works correctly?
Host-based simulation
A regular Arduino sketch can generally only run on the target microcontroller. This is because the application logic is mixed with the hardware-dependent I/O libraries, in this case CapacitiveSensor and MidiUSB.
But Arduino is just C++. Nothing prevents us from separating out the application logic and making it hardware-independent so it can also execute on our host. The easiest method is to put the code into a .hpp, and then include that in our sketch and any host-only tools we have.
#include "./hangdrum.hpp"
This lets us use all the regular C++ tools and practices for testing and validating code, without needing access to the hardware: automated unit- and integration-testing, fuzz-testing, mutation testing, dynamic analysis like Valgrind, and continuous integration services like Travis CI. In a project with custom hardware, it lets you develop most parts of the software before the hardware is finalized, potentially saving a lot of time.
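As a rough sketch of what such a hardware-independent header can look like, consider the following. It is a hypothetical stand-in, not the contents of the real hangdrum.hpp: the `padlogic` namespace, the types and `updatePad` are all invented for illustration. The key property is that nothing in it refers to Arduino libraries, so it compiles with any C++ compiler.

```cpp
// Hypothetical stand-in for a hardware-independent logic header
namespace padlogic {

struct PadConfig {
    int threshold;  // minimum sensor value that counts as a touch
};

struct PadState {
    bool touched;   // is the pad currently considered touched
    bool noteOn;    // true only on the update where the touch started
};

// Pure function: no Arduino calls, no globals, no I/O
inline PadState updatePad(const PadState &previous, int sensorValue,
                          const PadConfig &config) {
    PadState next = previous;
    next.touched = sensorValue > config.threshold;
    next.noteOn = next.touched && !previous.touched;
    return next;
}

} // namespace padlogic
```

On the device, the sketch would feed `updatePad` with values read from the sensor library; on the host, a unit test can feed it plain integers and check the resulting state.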
I like to express the entire application logic of the firmware as a pure function which takes Input and current State, and returns the new State. This formulation lets us know exactly what may affect the system – no hidden dependencies or state.
State
calculateState(const State &previous, const Input &input, const Config &config) {
State next = previous;
next.time = input.time;
for (unsigned int i=0; i<N_PADS; i++) {
next.pads[i] = calculateStatePad(previous.pads[i], input.values[i], config.pads[i], config);
}
calculateMidiMessages(next, config, next.messages);
return next;
}
Because all the inputs and outputs of the functions are plain-old-data, we can safely and meaningfully serialize and deserialize them.
To get better visibility into the internals of the system and help our understanding, we also store intermediate values:
To store the execution trace I used Flowtrace, a JSON-based format for tracing Flow-based-programming/dataflow systems.
Because time is just data in our programming model (part of Input or State), we can run through hours of input scenarios in seconds.
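A minimal sketch of this idea, with simplified stand-in types rather than the real `State`, `Input` and `Config`: the timestamp travels inside the input, so a time-based rule (here a hypothetical hold-off period between notes) can be simulated without ever waiting for real time to pass. All names below are assumptions for illustration.

```cpp
#include <vector>

// Simplified, hypothetical stand-ins for the firmware types
struct SimInput {
    long timeMs;      // timestamp recorded together with the sample
    int highpassed;   // high-passed sensor value
};

struct SimState {
    long lastNoteTimeMs;
    int notesTriggered;
};

struct SimConfig {
    int threshold;    // high-passed value needed to trigger
    long holdOffMs;   // minimum time between two notes
};

// Pure step function: the only notion of "now" is input.timeMs
SimState step(const SimState &previous, const SimInput &input, const SimConfig &config) {
    SimState next = previous;
    const bool above = input.highpassed > config.threshold;
    const bool rested = (input.timeMs - previous.lastNoteTimeMs) >= config.holdOffMs;
    if (above && rested) {
        next.lastNoteTimeMs = input.timeMs;
        next.notesTriggered += 1;
    }
    return next;
}

// Replays a whole recording through the step function, as fast as the host allows
SimState replay(const std::vector<SimInput> &recording, const SimConfig &config) {
    SimState state{-config.holdOffMs, 0};  // allows a trigger on the very first sample
    for (const SimInput &input : recording) {
        state = step(state, input, config);
    }
    return state;
}
```

Replaying an hour of samples recorded every 10 ms means 360,000 calls to `step`, which a PC gets through far faster than real time.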
I made another plotting tool, this time reading the flowtrace, visualizing all the steps in our signal processing pipeline, and the detected notes.
Avoiding false triggering for hovering hand, by looking at changes instead of absolute values.
By going over a range of different input scenarios and seeing how different values perform, we get a decent confidence that the algorithm works. But does it actually run fast enough on the Arduino?
Profiling on device
The Atmel AVR chip on the Arduino Leonardo is an 8-bit processor without a floating point unit. So I was a bit worried about the exponential averaging filter using several expensive features: 16-bit `int`, divisions and a multiplication with a float. Using an Arduino sketch to do some simple profiling showed that my worries were unfounded.
const long beforeCalculation = millis();
State next = state;
for (int i=0; i<100; i++) {
next = hangdrum::calculateState(state, input, config);
}
state = next;
const long afterCalculation = millis();
Serial.print("calculating: ");
Serial.println(afterCalculation-beforeCalculation);
The 100 iterations of the application logic took 80 ms with both a high-pass and a low-pass filter, or less than 1 ms per execution. Since sensor readout takes up to 10 ms, it dominates the time spent. So if we want lower latency, optimization efforts should be focused on sensor readout first. Only when sensor readout is down to around 1 ms does it make sense to optimize the filtering.
Don’t forget the hardware
Testing the highpass-based code in practice showed that yes, it did correctly detect tapping while suppressing false triggers from a palm hovering over the sensor. Another benefit of change detection is that a note will trigger even if one finger is already touching the pad when another finger hits it. With absolute value thresholding, the second finger tap is not detected.
However, we also found that by moving the sensor to the outside, the data quality increases a lot. In this case, even the simple absolute threshold code was able to correctly discriminate a hovering palm. The higher data quality may also enable other features like velocity or aftertouch.
Putting sensor on outside of body gives better readings, but requires additional manufacturing steps.
In conclusion
Sensor recording, separating hardware from application logic, and host-based simulation together form powerful tools that help you better understand an embedded system. By visualizing the input data, the internal state and the outputs of the firmware, you can more easily see and understand the conditions which cause problems. The effects of changing the firmware can be quickly understood, as re-running the simulation suite on a wide range of inputs can be done in seconds. These techniques can be implemented easily in C++ firmware, and any scripting language can be used for the data analysis.
The techniques here form a baseline level of tooling which can be extended in many ways.
If we had recorded a high-framerate video stream together with the input data, it would be easier to see how the input data corresponds to physical actions.
We could annotate the input data to indicate the correct locations of notes (expected output of system), and write an automated test against the output trace to check how well the firmware detects them. By using data-driven testing, we could generate test variations over different inputs and filter configuration. And then use machine learning techniques to help find the best values for the filters.
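One possible shape for such an automated check, sketched under assumptions: annotated and detected note onsets are given as sorted timestamps in milliseconds, and a detection counts as correct if it falls within a tolerance of an annotation. The `DetectionScore` type and `scoreDetections` function are hypothetical, and other matching rules are possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical summary of how well detected notes match the annotations
struct DetectionScore {
    int matched;   // detections within tolerance of an expected note
    int missed;    // expected notes with no matching detection
    int spurious;  // detections with no matching expected note
};

// Greedy one-to-one matching of two lists of onset times (milliseconds).
// Both lists are assumed to be sorted in ascending order.
DetectionScore scoreDetections(const std::vector<long> &expected,
                               const std::vector<long> &detected,
                               long toleranceMs) {
    assert(toleranceMs >= 0);
    DetectionScore score{0, 0, 0};
    std::size_t e = 0;
    std::size_t d = 0;
    while (e < expected.size() && d < detected.size()) {
        const long delta = detected[d] - expected[e];
        if (std::labs(delta) <= toleranceMs) {
            score.matched++;   // close enough: pair them up
            e++;
            d++;
        } else if (delta < 0) {
            score.spurious++;  // detection comes too early to match this note
            d++;
        } else {
            score.missed++;    // the expected note passed without a detection
            e++;
        }
    }
    // Whatever is left over on either side could not be paired
    score.missed += static_cast<int>(expected.size() - e);
    score.spurious += static_cast<int>(detected.size() - d);
    return score;
}
```

A test suite could then run each recorded scenario through the simulation with a given filter configuration and require zero missed and zero spurious notes, or compare the counts across configurations.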
We could also create an end-to-end test covering the vast majority of the code by injecting input sensor data into the on-device firmware over serial, and then verifying that it produces the expected MIDI messages.