SWHarden.com

The personal website of Scott W Harden

Speaking Numbers with a Microcontroller

How to encode WAV files into C code that can be replayed from memory

This page describes the technique I used to get a microcontroller to speak numbers out loud. Reading numbers from a speaker is an interesting and simple alternative to displaying numbers on a display, which often requires complex multiplexing circuitry and/or complex software to drive the display. This page describes the techniques I used to extract audio waveforms from MP3 files and encode them into data that can be stored in the microcontroller’s flash memory.

Because microcontrollers have a limited amount of flash memory this method is not suitable for long recordings, but it is fine for storing a few seconds of audio at a limited sample rate. Unlike more common methods for playing audio with a microcontroller, playing audio from program memory does not require a SD card, special hardware, or complex audio decoding software. Although this technique works best when a speaker is driven with a amplifier circuit, I found acceptable audio can be produced by driving a speaker directly from a microcontroller pin. This technique makes it possible to play surprisingly good audio without requiring any components other than a speaker.

NumberSpeaker Arduino Library

The code described on this page has been packaged into the NumberSpeaker library that can be installed from the Arduino library manager. Users who just wish to have some numbers read out loud can use this library and not hassle with the complex techniques described lower on this page. Source code is available on GitHub, but to get started using it you can perform the following steps:

#include "NumberSpeaker.h"

NumberSpeaker numberSpeaker = NumberSpeaker();

void setup() {
  numberSpeaker.begin();  // speaker on pin 11
}

void loop() {
  unsigned int count = 0;
  for (;;) {
    numberSpeaker.speak_int(count++);
    delay(500);
  }
}

Theory of Operation

Encoding the Audio

After some trial and error I found that 5 kHz 8-bit data is a good balance of quality vs. size for storing a human voice in program memory. The 5kHz sample rate has a 2.5 kHz Nyquist frequency, meaning it can reproduce audio from 0-2.5 kHz which I found acceptable for voice. For music I found it helps to increase sample rate to 8 kHz. To reduce encoding artifacts that may result from such an aggressive reduction in sample rate, a low-pass filter is useful to apply to the audio before resampling it. Resampling should also be performed using interpolation, allowing the smooth reduction of sample rate by an arbitrary scale factor.

Full source code is available on the NumberSpeaker GitHub project.

Here’s a slimmed-down version of the Python code I used to perform these operations:

# read audio values from file
ys, sample_rate = librosa.load("zero.mp3")

# lowpass filter
new_sample_rate = 5_000
freq_low = 150
freq_high = new_sample_rate/2
sos = scipy.signal.butter(6, [freq_low, freq_high], 'bandpass', fs=sample_rate, output='sos')
ys = scipy.signal.sosfilt(sos, ys)

# resample with interpolation
xs_new = numpy.arange(len(ys)) / new_sample_rate
xs = numpy.arange(len(ys)) / sample_rate
ys = numpy.interp(xs_new, xs, ys)

# scale to [0, 255] and quantize
ys = ys / max(abs(min(ys)), abs(max(ys)))
ys = ys / 2 + .5
ys = ys * 255
ys = ys.astype(numpy.int16)

I then used python to loop across a folder of MP3 files and generate a C header file containing the waveforms for each recording stored as a byte array:

const uint8_t AUDIO_1[] PROGMEM = { 129, 125, ..., 130, 132 };
const uint8_t AUDIO_2[] PROGMEM = { 127, 123, ..., 134, 130 };
const uint8_t AUDIO_3[] PROGMEM = { 122, 124, ..., 137, 135 };

Waveforms are stored in program memory using the PROGMEM keyword and later retrieved using pgm_read_byte() function. See AVR-GCC’s Program Space Utilities documentation for additional information.

Playing the Audio

An 8-bit timer is used to generate the PWM signal that creates the audio waveform. I used Timer2 to generate this signal because Arduino uses Timer0 for its own tasks. I also setup the Timer2 settings myself to ensure it would run at maximum speed (ideal for generating waveforms using a simple R/C lowpass filter). Running with the 16 MHz system clock with no prescaler, the 8-bit timer overflows at a rate of 62.5 kHz.

// Use Timer2 to generate an analog voltage using PWM on pin 11
pinMode(11, OUTPUT); 

// Set OC2A on BOTTOM, clear OC2A on compare match
TCCR2A |= bit(COM2A1);

// Clock at CPU rate without prescaling.
TCCR2B = bit(CS20);

// Set the initial level to half the positive rail voltage
OCR2A = 127;

Playback is achieved by setting PWM duty from the stored audio data. Playback speed can be customized by adjusting the delay between sample advancements.

for (int i = 0; i < sizeof(AUDIO_1); i++) {
    uint8_t value = pgm_read_byte(&AUDIO_1[i]);
    OCR2A = value; // 8-bit timer generating the PWM signal
    delayMicroseconds(200); // approximate delay for 5 kHz sample rate
}

Memory Management

Using the strategies above I was able to encode numbers 0-9 and the word “point” using a total of 16,720 bytes (about half of the 32k program memory). I could save space by speeding-up the recordings (reading the numbers faster), but I found the present settings to be a good balance between comfort and memory consumption.

I experimented with strategies that used 4 bytes instead of 8 to store the waveform, but I found them unacceptably noisy. It may be possible to use 4-byte frames to store the difference between each point and the next, effectively halving memory required to store these waveforms. There are many signal compression algorithms we could employ to reduce the memory footprint, but for now the present strategy is working satisfactorily and has the benefit of minimal code complexity.

External memory could be used to store more audio. Arduino (ATMega328P) has 32 KB program memory and it must be shared with the main program code, so this places a restrictive upper limit on the total amount of audio data that may be stored on the chip itself. Users demanding more storage may benefit from interfacing a SPI flash memory IC. For example, W25Q32JV has 4 MB of flash memory and is available on Mouser for $0.76 each. At 5 kHz, a 4 MB flash memory chip could store over thirteen minutes of 8-bit audio. There’s even a SPIMemory library for Arduino. However, some thought must be placed into how to program the flash memory with the audio waveform on your computer.

Audio Amplification

Driving headphones or a speaker directly with a PWM output pin works, but amplifying the signal substantially improves the quality of the sound. The LM386 single chip audio amplifier is technically obsolete (discontinued in 2016), but it’s commonly used in hobby circles because clones are ubiquitously available in DIP packages which make them convenient for breadboarding. The LM4862 is a better choice for serious designs as it is currently in production and does not require large electrolytic capacitors. I’m using the old LM386 here with the minimum of components and the default 20x gain experienced when pins 1 and 8 are left floating.

This is audio amplifier circuit I ended-up using for this project. The combination of a 10 kΩ resistor and 0.1 µF capacitor formed a low-pass filter with -3 dB cutoff of about 160 Hz. This seems extremely aggressive since the 8-bit PWM frequency is 62.5 kHz, but I found this setup worked pretty well on my bench.

Arduino Demo

Here you can observe the digital PWM waveform (yellow) next to the low-pass filtered analog signal (blue). I acknowledge that the oscilloscope demonstrates high frequency noise all over the place, but for the present application it doesn’t really matter.

Playing Audio from Newer AVR Chips

Let’s leave Arduino behind and switch to one of the more modern AVR microcontroller chips. These newer 8-bit AVR microcontrollers offer a superior set of peripherals at lower cost and have greater availability. Unlike older AVR chips, these new AVR microcontrollers cannot be flashed using ICSP programmers. See my Programming Modern AVR Microcontrollers page to learn how to use inexpensive gear to program the latest family of AVR chips with a UDPI programmer.

The AVR64DD32 microcontroller is one of the highest-end 8-bit AVRs currently on the market. It has 64 KB flash memory, and this increased size allowed me to experiment with storing longer recordings and at a higher sample rate. It’s worth noting that the DA and DB family of chips has products with 128 KB of flash memory. Array lengths are limited to 32 KB, but I was able to circumvent this restriction by storing audio across multiple arrays and adding extra logic to ensure the correct source array is sampled.

These chips do not come in DIP packages, but QFP/DIP breakout boards make them easy to experiment with in a breadboard.

AVR64 DD Code

Two timers can be configured to achieve audio playback that does not block the main program. This strategy uses two clocks to achieve asynchronous audio playback using interrupts to advance the waveform so it does not block main program execution.

Use the 24 MHz internal clock for the fastest PWM possible:

#define F_CPU 24000000UL
CCP = CCP_IOREG_gc; // Protected write
CLKCTRL.OSCHFCTRLA = CLKCTRL_FRQSEL_24M_gc; // Set clock to 24MHz

Setup the 8-bit TimerB to produce the PWM signal:

// Enable PWM output on pin 32
// (don't forget to set the PORT direction)
TCB0.CTRLA |= TCB_ENABLE_bm;

// Make waveform output available on the pin
TCB0.CTRLB |= TCB_CCMPEN_bm;

// Enable 8-bit PWM mode
TCB0.CTRLB |= TCB_CNTMODE_PWM8_gc;

// Set period and duty
TCB0.CCMPL = 255; // top value
TCB0.CCMPH = 50; // flip value

Setup the 16-bit TimerA to advance the waveform at 5 kHz:

// enable Timer A
TCA0.SINGLE.CTRLA |= TCA_SINGLE_ENABLE_bm;

// Overflow triggers interrupt
TCA0.SINGLE.INTCTRL |= TCA_SINGLE_OVF_bm;

// Set period and duty
TCA0.SINGLE.PER = F_CPU/5000; // 5 kHz audio

Populate the overflow event code:

ISR(TCA0_OVF_vect)
{
    // read next level from program memory
    uint8_t level = pgm_read_byte(&AUDIO_SAMPLES[AUDIO_INDEX++]);
    
    // wait until next rollover to reduce static
    while(TCB0.CNT > 0){}

    // update the PWM level
	TCB0.CCMPH = level;

    // rollover the index to loop the audio
	if (AUDIO_INDEX >= sizeof(AUDIO_SAMPLES))
        AUDIO_INDEX = 0;

    // indicate the interrupt was handled
	TCA0.SINGLE.INTFLAGS = TCA_SINGLE_OVF_bm;
}

Enable global interrupts:

sei();

Full source code is on GitHub:

Use the AVR’s DAC for Audio Playback

Modern 8-bit AVRs have a 10-bit digital-to-analog converter (DAC) built in. It’s simpler to setup and use than a discrete timer/counter in PWM mode. Although the code above uses the AVR’s timer/counter B (TCB) to generate the analog waveform, this method is recommended when a DAC is available:

// Enable the DAC and output on pin 16
DAC0.CTRLA = DAC_OUTEN_bm | DAC_ENABLE_bm;

// Set the DAC level
uint8_t level = 123; // Retrieved from memory
DAC0.DATA = level << 8; // Shift to use the highest bits

Playing Music from an AVR DD Series Microcontroller

YouTube has a surprisingly large number of videos of people beeping the Coffin Dance song from an Arduino, so here’s my video response playing the actual song audio encoded as an 8 kHz 8-bit waveform on an AVR64DD32 microcontroller. The original song is Astronomia by Vicetone and Tony Igy. Refer to Know Your Meme: Coffin Dance for more information about internet culture.

Additional Resources