The personal website of Scott W Harden

Local AI Chat with C#

How to run an LLM locally to power AI chat and answer questions about documents

This page describes how I use C# to run the LLaMA 2 large language model (LLM) locally to achieve AI chat, including the ability to answer questions about local documents. I previously described how I run LLaMA 2 locally using Python (and how I use it to answer questions about documents). Although the Python ecosystem is fantastic for end-users who are strong programmers, setting up the development environment required to run Python scripts can be a challenge. Maintaining Python projects can be cumbersome too, and I received numerous emails indicating that the code examples I posted just a few months ago no longer run as expected. Although I intend to go back and update those old Python tutorials to try to keep them current, I was very happy to learn I can achieve similar functionality using the .NET ecosystem. This article shows how I use free and open-source tools to create C# applications that leverage locally-hosted LLMs to provide interactive chat, including searching, summarizing, and answering questions about information in local documents.

Key Resources

Summary (TLDR)

AI Chat

To create an interactive AI chat bot that answers user questions:

  1. Download a GGUF file from HuggingFace (I’m using llama-2-7b-chat.Q5_K_M.gguf)

  2. Create a new .NET console application and add the LLamaSharp and LLamaSharp.Backend.Cpu NuGet packages

  3. Add the following code to your main program:

using LLama.Common;
using LLama;

// Indicate where the GGUF model file is
string modelPath = @"C:\path\to\llama-2-7b-chat.Q5_K_M.gguf";

// Load the model into memory
Console.ForegroundColor = ConsoleColor.DarkGray;
ModelParams modelParams = new(modelPath);
using LLamaWeights weights = LLamaWeights.LoadFromFile(modelParams);

// Setup a chat session
using LLamaContext context = weights.CreateContext(modelParams);
InteractiveExecutor ex = new(context);
ChatSession session = new(ex);
var hideWords = new LLamaTransforms.KeywordTextOutputStreamTransform(["User:", "Bot: "]);
session.WithOutputTransform(hideWords); // strip role labels from the streamed output
InferenceParams infParams = new()
{
    Temperature = 0.6f, // higher values give more "creative" answers
    AntiPrompts = ["User:"]
};

while (true)
{
    // Get a question from the user
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nQuestion: ");
    string userInput = Console.ReadLine() ?? string.Empty;
    ChatHistory.Message msg = new(AuthorRole.User, "Question: " + userInput);

    // Display answer text as it is being generated
    Console.ForegroundColor = ConsoleColor.Yellow;
    await foreach (string text in session.ChatAsync(msg, infParams))
    {
        Console.Write(text);
    }
}

Note: some lines of code related to styling have been omitted. See the GitHub repository for this blog post for full source code.

Example Output

A basic question about planets yields a concise response:

Chat sessions preserve history, enabling “follow-up” questions where the model uses context from previous discussion:

Chat about Documents

To create an AI chat bot that answers user questions about documents:

  1. Download a GGUF file from HuggingFace (I’m using llama-2-7b-chat.Q5_K_M.gguf)

  2. Create a new .NET console application and add the LLamaSharp and LLamaSharp.Backend.Cpu NuGet packages

  3. Add the following code to your main program:

using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory.Configuration;
using Microsoft.KernelMemory;
using System.Diagnostics;

// Setup the kernel memory with the LLM model
string modelPath = @"C:\path\to\llama-2-7b-chat.Q5_K_M.gguf";
LLama.Common.InferenceParams infParams = new() { AntiPrompts = ["\n\n"] };
LLamaSharpConfig lsConfig = new(modelPath) { DefaultInferenceParams = infParams };
SearchClientConfig searchClientConfig = new() { MaxMatchesCount = 1, AnswerTokens = 100 };
TextPartitioningOptions parseOptions = new() { MaxTokensPerParagraph = 300, MaxTokensPerLine = 100, OverlappingTokens = 30 };
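IKernelMemory memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(lsConfig)
    .WithSearchClientConfig(searchClientConfig)
    .With(parseOptions)
    .Build();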

// Ingest documents (format is automatically detected from the filename)
string documentFolder = @"C:\path\to\documents";
string[] documentPaths = Directory.GetFiles(documentFolder, "*.txt");
for (int i = 0; i < documentPaths.Length; i++)
    await memory.ImportDocumentAsync(documentPaths[i], steps: Constants.PipelineWithoutSummary);

// Allow the user to ask questions forever
while (true)
{
    Console.Write("\nQuestion: ");
    string question = Console.ReadLine() ?? string.Empty;
    MemoryAnswer answer = await memory.AskAsync(question);
    Console.WriteLine($"Answer: {answer.Result}");
}

Note: some lines of code related to styling have been omitted. See the GitHub repository for this blog post for full source code.

Example Output

I gave this program a PDF copy of the About Scott page from my website, then asked who Scott is. I think the phrase “skilled computer programmer and electrical engineer” is a bit dramatic, but overall the information returned lines up pretty well!

I then provided information about a fictitious Python package as a text file and asked about it. The information I provided is quoted below, and the response to my question about it is pretty good!

“JupyterGoBoom” is the name of a Python package for creating unmaintainable Jupyter notebooks. It is no longer actively developed and is now considered obsolete because modern software developers have come to realize that Jupyter notebooks grow to become unmaintainable all by themselves.

Document Ingestion with Local Storage

Users who run the code above to perform document ingestion will find that it takes a long time to ingest large documents (on the order of minutes), and restarting the program requires reanalyzing those files all over again.

The Kernel Memory package has functionality that allows the information gathered from documents to be stored, then re-loaded into memory almost instantly the next time the program is started. There are extensions to allow memory to be stored in various cloud engines and databases (Azure AI Search, Elasticsearch, Postgres, SQL Server, etc.), but in this example we will store and retrieve this information using the local filesystem.

To enable local storage of ingested document information for quick retrieval, modify the code example above to build the kernel memory as shown here:

SimpleFileStorageConfig storageConfig = new()
{
    Directory = "./storage/",
    StorageType = FileSystemTypes.Disk,
};

SimpleVectorDbConfig vectorDbConfig = new()
{
    Directory = "./storage/",
    StorageType = FileSystemTypes.Disk,
};

IKernelMemory memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage(storageConfig) // store information locally
    .WithSimpleVectorDb(vectorDbConfig)   // retrieve information locally
    .WithLLamaSharpDefaults(lsConfig)
    .WithSearchClientConfig(searchClientConfig)
    .With(parseOptions)
    .Build();

When the code is run it will be slow to start the first time as it ingests the documents, but once ingested the information will be saved to disk (in the ./storage/ folder) and rapidly loaded into memory the next time the application starts.


The LLamaSharp and Kernel Memory packages can be easily combined to create small C# projects which are able to run LLMs locally to enable AI chat functionality, including summarizing and answering questions about documents. Unlike Python-centric strategies that require end users to have development tools installed and maintain system-wide or virtual environments for package management just to run basic scripts, the .NET-centric strategy described here makes it possible to create compiled apps that are simple to distribute and easy to run by non-technical users. Much of the AI/ML documentation and discussion in recent years has been dominated by Python, but I’m thrilled to see C#/.NET tools like these growing in the AI/ML landscape! I’m excited to watch how these projects will continue to evolve in the years to come.


Frequency Measurement with Modern AVR Microcontrollers

How to use the AVR64DD32's asynchronous counter to measure frequencies beyond 100 MHz

Modern AVR microcontrollers have asynchronous counters that can be externally driven to count pulses from 1 Hz to beyond 100 MHz. Over the years I’ve explored various methods for building frequency counters typically using the SN74LV8154 32-bit counter, but my new favorite method uses the AVR64DD32 microcontroller ($1.52 on Mouser) to directly measure a signal and report its frequency to a PC using a USB serial adapter. I’m working on a special frequency counter project which builds upon this strategy, but I found the core concept to be interesting enough that I decided to write about it in its own article. The following information is a summary of how the strategy can be achieved, but additional information and source code is available on GitHub.

Theory of Operation

1 The AVR64DD32 datasheet suggests EXTCLK can be driven via XTALHF1 pin to a maximum frequency of 32 MHz (Section, page 93), but this article by sm6vfz demonstrates this strategy produces results accurate to the single Hz up to 150 MHz.

2 The AVR64DD32 datasheet says “an external digital clock can be connected to the XTAL32K1 pin” (section 26.3, page 344) but my read doesn’t clearly indicate what the upper limit of the frequency is that may be clocked in. Although the XTAL32K1 pin in combination with XTAL32K2 are designed for a 32 kHz crystal oscillator, my read does not indicate that 32 kHz is intended to be an upper limit of what may be clocked in externally.

Basic Setup

Microcontroller: The AVR64DD32 8-bit AVR does not come in a DIP package, but the VQFN32 package is easy to hand solder to a QFN32/DIP breakout board. It also cannot be programmed with an ICSP programmer, but instead requires a UPDI programmer. See my Programming Modern AVR Microcontrollers article for more information about programming these chips.

Code: Full source code for this project is on GitHub, and the code highlights are shown at the bottom of this article.

PC Connection: I’m using an RS232 breakout board as a USB/serial adapter. Its Rx pin is connected to the microcontroller’s Tx pin (pin 2).

Test Signal: I’m using a 50 MHz can oscillator as a test signal. It’s been in my junk box for years and it wouldn’t surprise me if it has drifted a few kHz from 50 MHz. Note too that there may be some inaccuracy in the gating time base due to the imprecise nature of the AVR’s 24 MHz internal oscillator.

Serial Monitor: I’m using RealTerm to monitor the output of the microcontroller. The code below gates the counter once per second (1 PPS) then displays the count, so the number displayed is the frequency in Hz. This value would be easy to read in a language like Python for applications requiring frequency measurement over time.

Code: Counting EXTCLK pulses with Timer/Counter D

void setup_extclk_counter()
{
	// Enable the highest frequency external clock on pin 30
	// (register write reconstructed from the comment; check it against the source on GitHub)
	CCP = CCP_IOREG_gc; // protected write
	CLKCTRL.XOSCHFCTRLA = CLKCTRL_FRQRANGE_32M_gc | CLKCTRL_SELHF_bm | CLKCTRL_ENABLE_bm;

	// Setup TCD to count the external clock
	TCD0.CMPBCLR = 0x0FFF; // count to max (12-bit)
	TCD0.CTRLA = TCD_CLKSEL_EXTCLK_gc; // count external clock input
	TCD0.INTCTRL = TCD_OVF_bm; // Enable overflow interrupt
	while (!(TCD0.STATUS & 0x01)); // Wait for ENRDY before enabling
	TCD0.CTRLA |= TCD_ENABLE_bm; // Enable the counter
}

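// Increments the counter every time TCD0 overflows
volatile uint32_t COUNTER;
ISR(TCD0_OVF_vect){
	COUNTER++;
	TCD0.INTFLAGS = TCD_OVF_bm; // clear the interrupt flag
}

volatile uint32_t COUNT_DISPLAY = 0;
volatile uint32_t COUNT_NOW = 0;
volatile uint32_t COUNT_PREVIOUS = 0;

// Call this method once per second to update the display frequency
// (capture and arithmetic lines are reconstructed; check them against the source on GitHub)
void update_display_count(){
	TCD0.CTRLE = TCD_SCAPTUREA_bm; // request a software capture
	while ((TCD0.STATUS & TCD_CMDRDY_bm) == 0); // synchronized read
	COUNT_NOW = COUNTER * 4096 + TCD0.CAPTUREA; // 4096 counts per overflow
	COUNT_DISPLAY = COUNT_NOW - COUNT_PREVIOUS;
	COUNT_PREVIOUS = COUNT_NOW;
}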
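The counting arithmetic can be checked on a desktop compiler before it goes anywhere near the chip. This is a minimal sketch in plain C, not AVR code, and the function names `total_count` and `gated_frequency` are hypothetical. It assumes the 12-bit counter overflows every 4096 pulses (CMPBCLR = 0x0FFF) and that the gate fires exactly once per second.

```c
#include <stdint.h>

// Combine the software overflow tally with the 12-bit hardware count.
// Each overflow represents 4096 pulses because the counter runs 0..0x0FFF.
uint32_t total_count(uint32_t overflows, uint16_t capture)
{
    return overflows * 4096u + (capture & 0x0FFFu);
}

// Pulses seen between two gate events. Unsigned subtraction stays correct
// even when the 32-bit running total wraps around between readings.
uint32_t gated_frequency(uint32_t count_now, uint32_t count_previous)
{
    return count_now - count_previous;
}
```

With a 50 MHz input a 32-bit running total wraps roughly every 86 seconds, so relying on unsigned subtraction rather than resetting the counter keeps each one-second reading valid across the wrap.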

Code: Gating at 1 Hz using the system clock as a time base

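void setup_gate_sysclk(){
	// (register writes are reconstructed from the comments; check them against the source on GitHub)
	// 24 MHz clock div 256 is 93,750 ticks/second
	TCA0.SINGLE.CTRLA = TCA_SINGLE_CLKSEL_DIV256_gc | TCA_SINGLE_ENABLE_bm;
	// enable overflow interrupt
	TCA0.SINGLE.INTCTRL = TCA_SINGLE_OVF_bm;
	// overflow 5 times per second
	TCA0.SINGLE.PER = 18750-1;
}

// this interrupt is called 5 times per second
uint8_t GATE_TICKS = 0;
ISR(TCA0_OVF_vect){
	GATE_TICKS++;
	if (GATE_TICKS == 5){
		GATE_TICKS = 0;
		update_display_count();
	}
	TCA0.SINGLE.INTFLAGS = TCA_SINGLE_OVF_bm; // clear the interrupt flag
}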
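The period value in the gating code follows from simple arithmetic, sketched here in desktop C so the numbers can be verified. The helper names `ticks_per_second` and `period_register` are hypothetical, and the sketch assumes the timer counts from 0 up to and including the period value before overflowing.

```c
#include <stdint.h>

// Timer ticks per second after the prescaler.
uint32_t ticks_per_second(uint32_t clock_hz, uint32_t prescaler)
{
    return clock_hz / prescaler;
}

// Value for a period register so the timer overflows rate_hz times per second.
// The counter counts 0..PER inclusive, hence the minus one.
uint32_t period_register(uint32_t clock_hz, uint32_t prescaler, uint32_t rate_hz)
{
    return ticks_per_second(clock_hz, prescaler) / rate_hz - 1u;
}
```

A 16-bit period register tops out at 65,535, which is below the 93,749 that a single overflow per second would need. That is presumably why the code overflows 5 times per second and counts the overflows in software.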

Code: The main block runs an infinite loop and displays the frequency if an updated number is detected. How to send text to the serial port is outside the scope of this article, but see this project’s code on GitHub for more information about how I did it. I did find this function helpful:

void print_with_commas(unsigned long freq){
	int millions = freq / 1000000;
	freq -= millions * 1000000UL;
	int thousands = freq / 1000;
	freq -= thousands * 1000UL; // UL avoids overflowing the AVR's 16-bit int
	int ones = freq;
	printf("%d,%03d,%03d\r\n", millions, thousands, ones);
}

Amplify Small Signals

Using an RF amplifier module, I was able to measure the frequency of radio signals using an antenna. I found a convenient RF buffer amplifier board on Amazon based on a TLV3501 comparator. It is powered with 5V and has SMA connectors for RF input and TTL output, and I was able to use this device to measure frequency of various transmitters including my 144 MHz handheld VHF radio.

Use a Prescaler to Measure Higher Frequencies

There are many inexpensive single chip prescalers which can divide-down high frequency input to produce a waveform that slower counters can measure. It appears there are several RF prescaler modules on Amazon with SMA connectors, making them easy to pair with the preamplifier module above. Most of them seem to use an MB506 2.4 GHz prescaler which is not currently available on Mouser.

I’m also noticing a lot of people using the MC12080 1.1 GHz Prescaler for custom frequency counter designs. It’s a little over $4 on Mouser and doesn’t require much supporting circuitry, although I haven’t personally used this chip yet. I also found recommendations for the MC12093 prescaler. If you have experience creating a frequency counter using a prescaler, send me an email and let me know which chip you recommend and why!
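The bookkeeping for a prescaler is small: multiply the counter reading by the divide ratio, and accept that resolution gets coarser by the same factor. The sketch below is desktop C with hypothetical function names, and the divide ratio is whatever the chosen prescaler is configured for (10 is used in the example only for illustration).

```c
#include <stdint.h>

// Input frequency implied by a counter reading taken after a divide-by-N prescaler.
// A 64-bit result leaves room for GHz-range inputs.
uint64_t input_frequency_hz(uint32_t measured_hz, uint32_t divide_ratio)
{
    return (uint64_t)measured_hz * divide_ratio;
}

// Smallest input frequency step the counter can resolve: one count at the
// counter equals divide_ratio input cycles spread over the gate time.
uint32_t resolution_hz(uint32_t divide_ratio, uint32_t gate_seconds)
{
    return divide_ratio / gate_seconds;
}
```

For example, a reading of 14,400,000 behind a divide-by-10 prescaler implies a 144 MHz input, resolved in 10 Hz steps with a one-second gate.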

Gate with an External 10 MHz Reference

The examples above use the AVR’s system clock to generate the 1 Hz gate, but accuracy can be improved by gating based upon a 10 MHz frequency reference. This strategy passes the 10 MHz into the XTAL32K1 pin and counts it with the RTC counter, generating 5 Hz interrupts that can trigger the gating logic.

In this example I’m measuring the 10 MHz signal which is also responsible for the gating, so because of the chicken-and-egg problem the measured frequency will always appear to be exactly 10 MHz even if the oscillator drifts. However, this strategy is useful for ensuring the software is written correctly. If the software is incorrect (e.g., the overflow period is off by one) this number will not read exactly 10 MHz. Note also that the displayed frequency is ±1 which I presume can be attributed to variations in synchronization alignment while reading the asynchronous counter. No counts are “missed”, so a deficit by 1 in one reading will self-correct by rolling over and appearing as a surplus by 1 in a future reading.
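The self-correcting behavior follows from displaying differences of a free-running total: the displayed values always sum to the last reading minus the first, so a count that lands late in one reading shows up in the next. Here is a small desktop C sketch of that idea with hypothetical function names and made-up readings.

```c
#include <stdint.h>

// Fill displayed[0..n-2] with per-gate frequencies computed from n successive
// readings of a free-running total, as the gating interrupt would see them.
void gate_differences(const uint32_t *readings, int n, uint32_t *displayed)
{
    for (int i = 1; i < n; i++)
        displayed[i - 1] = readings[i] - readings[i - 1];
}

// Sum of the displayed values, widened so it cannot overflow.
uint64_t sum_u32(const uint32_t *values, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

If the readings are 0, 9,999,999, 20,000,000, and 30,000,000 (the second one caught a single count early), the displayed values are 9,999,999, then 10,000,001, then 10,000,000, and they still sum to 30,000,000.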

Code: Gate by dividing-down an external 10 MHz reference to 5 Hz

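void setup_gate_rtc(){
	// Enable the RTC
	// External clock on the XTAL32K1 pin, enable
	// (register writes are reconstructed from the comments; check them against the source on GitHub)
	CCP = CCP_IOREG_gc; // protected write
	CLKCTRL.XOSC32KCTRLA = CLKCTRL_SEL_bm | CLKCTRL_ENABLE_bm;

	// Setup the RTC at 10 MHz to interrupt periodically
	// 10 MHz with 128 prescaler is 78,125 ticks/sec
	RTC.CTRLA = RTC_PRESCALER_DIV128_gc | RTC_RTCEN_bm;
	RTC.PER = 15624; // 5 overflows per second (78125/5-1)
	RTC.INTCTRL = RTC_OVF_bm; // enable overflow interrupt
	RTC.CLKSEL = RTC_CLKSEL_XTAL32K_gc; // clock in XOSC32K pin
}

// this interrupt is called 5 times per second
ISR(RTC_CNT_vect){
	/* same logic as above */
	RTC.INTFLAGS = RTC_OVF_bm; // clear the interrupt flag
}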


The AVR64DD32 is a versatile chip with an impressive set of peripherals that is currently offered at low cost with high availability. The asynchronous peripherals make it easy to measure frequency independent of the system clock, and in practice frequencies well into the VHF band can be directly measured with this chip. Although it isn’t available in a DIP package, it’s easy to experiment with on a breadboard using a QFN/DIP breakout board, and I hope more people get the opportunity to experiment with this interesting line of modern AVR microcontrollers.


Play Audio from SPI Flash with a Microcontroller

How to use a microcontroller to drive a speaker using PWM from audio levels stored in a SPI flash chip

This project uses a microcontroller’s PWM output to drive a speaker and play audio stored in a SPI flash chip. This article combines what was learned in my two previous articles: play audio with a microcontroller and use a FT232H to program a SPI flash chip which go into more detail about the circuitry and code behind each of these major steps. By encoding audio at 8-bit resolution with an 8 kHz sample rate, 32 Mb (4 MB) of memory is sufficient to store approximately 8 minutes of raw audio. In this project I’m using a W25Q32 breakout board available on Amazon for about $2 each. Although many similar projects online demonstrate audio playback using SD cards, I find the strategies demonstrated here favorable for simple projects because it can be achieved with the addition of only a single inexpensive component.
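The storage estimate is easy to reproduce. This is a minimal desktop C sketch with a hypothetical function name, assuming raw, uncompressed samples and a 4 MB chip of 4 × 1024 × 1024 bytes.

```c
#include <stdint.h>

// Whole seconds of raw audio that fit in a flash chip.
uint32_t playback_seconds(uint32_t capacity_bytes, uint32_t sample_rate_hz,
                          uint32_t bytes_per_sample)
{
    return capacity_bytes / (sample_rate_hz * bytes_per_sample);
}
```

At 8 kHz and one byte per sample, 4,194,304 bytes gives 524 seconds, or a little under 9 minutes.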

Play Audio from SPI Flash with Arduino

Audio levels are stored in the SPI flash memory, so by reading each address and setting the PWM level to that value at a rate of 8 kHz, the sounds stored in flash memory can be played back in real time. Here are the important parts of the Arduino code I used to achieve continuous audio playback, and the full source code can be reviewed in audio.ino on GitHub.

char spi_transfer(char data) {
  SPDR = data;
  while (!(SPSR & (1 << SPIF))) {};
  return SPDR;
}

volatile long SOURCE_ADDRESS;

void loop() {
  digitalWrite(CS, LOW);
  spi_transfer(0x03); // W25Q32 "Read Data" command (assumed; it precedes the 24-bit address)
  spi_transfer(SOURCE_ADDRESS >> 16);
  spi_transfer(SOURCE_ADDRESS >> 8);
  spi_transfer(SOURCE_ADDRESS >> 0);
  OCR2B = spi_transfer(255);
  digitalWrite(CS, HIGH);

  SOURCE_ADDRESS++; // advance to the next sample (reconstructed line)

  delayMicroseconds(88); // determined experimentally
}

The additional circuitry on the breadboard is for power supply filtering and audio amplification using a LM386 as described in my previous article.

The delay between each cycle of the main loop (88 µs) was determined experimentally to achieve approximately 8 kHz playback. Ideally another timer’s interrupt could manage playback, but the Arduino’s primary timer is occupied with systems tasks (like timing) and the secondary timer is used for PWM (to generate the analog audio output waveform), so this was the simplest option. An alternative approach could probably be to slow down the PWM timer’s period and use its overflow interrupt and a counter to manage frame advancement and flash memory reads outside the main program loop, but this code works well for demonstration purposes.

It’s worth noting that accessing the flash memory at 8 kHz is also excessive. A more sophisticated approach is to use a buffer in memory to store chunks of audio data which can be tactically loaded from the SPI chip without requiring a full transaction on every PWM update. Building large buffers can be slow though, so managing the buffer should be performed carefully so as not to require more time than the 8 kHz interrupt needs to complete its cycle.
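One way the buffered approach could look is a small ring buffer that the main loop fills and the playback interrupt drains. This is a sketch rather than code from this project: every name in it is hypothetical, it assumes a single producer and a single consumer, and it assumes unsigned 8-bit samples where 128 is the resting level.

```c
#include <stdint.h>
#include <stdbool.h>

#define AUDIO_BUFFER_SIZE 64 // must be a power of two for the index mask below

typedef struct {
    volatile uint8_t data[AUDIO_BUFFER_SIZE];
    volatile uint8_t head; // free-running index of the next slot to write
    volatile uint8_t tail; // free-running index of the next slot to read
} audio_buffer_t;

// Number of samples waiting to be played.
uint8_t audio_buffer_count(const audio_buffer_t *b)
{
    return (uint8_t)(b->head - b->tail);
}

// Called from the main loop with a sample fetched from flash.
// Returns false (and stores nothing) when the buffer is full.
bool audio_buffer_push(audio_buffer_t *b, uint8_t sample)
{
    if (audio_buffer_count(b) == AUDIO_BUFFER_SIZE)
        return false;
    b->data[b->head & (AUDIO_BUFFER_SIZE - 1)] = sample;
    b->head++;
    return true;
}

// Called from the playback interrupt. On underrun it returns the midpoint
// level (128) so the output rests instead of replaying stale data.
uint8_t audio_buffer_pop(audio_buffer_t *b)
{
    if (audio_buffer_count(b) == 0)
        return 128;
    uint8_t sample = b->data[b->tail & (AUDIO_BUFFER_SIZE - 1)];
    b->tail++;
    return sample;
}
```

The main loop would top the buffer up whenever `audio_buffer_count` drops below some threshold, ideally reading several sequential bytes in one SPI transaction. Keeping the indices to a single byte means each read or write of them is one instruction on an 8-bit AVR, which is why this sketch gets away without disabling interrupts.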

Arduino Audio Playback Demo

This video clip shows an Arduino using the strategy described above to play 8-bit audio stored in the SPI flash chip at 8 kHz. The song is NIVIRO - The Guardian Of Angels (NCS Release) provided by NoCopyrightSounds.

Play Audio from SPI Flash with AVR

Let’s leave Arduino behind and use a more sophisticated 8-bit AVR microcontroller. The AVR64DD32 is one of the most advanced 8-bit AVR microcontrollers currently on the market. Modern AVR microcontrollers cannot be programmed with a traditional ICSP programmer but instead require a UPDI programmer. However, these newer microcontrollers sport three timers (two 16-bit and one 12-bit) and even the ability to clock them asynchronously from the main clock. We won’t need all these advanced features, but we will do a better job than the Arduino can simultaneously managing the PWM level with one timer and managing interrupts at 8 kHz with another timer, keeping the main loop unblocked. Here’s the gist of how I achieved this, and the full source code can be reviewed in main.c on GitHub.

The additional circuitry on the breadboard is for power supply filtering and audio amplification using a LM386 as described in my previous article.

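volatile long AUDIO_ADDRESS;

uint8_t SPI_SEND(uint8_t data){
	SPI0.DATA = data;
	while (!(SPI0.INTFLAGS & SPI_IF_bm));
	return SPI0.DATA;
}

// Runs from a timer overflow interrupt at 8 kHz (TCA0 is assumed here; chip-select
// handling and the read command that sends AUDIO_ADDRESS are in main.c on GitHub)
ISR(TCA0_OVF_vect){
	// read the level from an address in flash memory
	uint8_t level = SPI_SEND(0xFF);

	// set PWM duty update after the next rollover
	while(TCB0.CNT > 0){}
	TCB0.CCMPH = level;

	TCA0.SINGLE.INTFLAGS = TCA_SINGLE_OVF_bm; // clear the interrupt flag
}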

AVR Playback Demo

This video clip shows an AVR64DD32 using the strategy described above to play 8-bit audio stored in the SPI flash chip at 8 kHz. The song is NIVIRO - The Guardian Of Angels (NCS Release) provided by NoCopyrightSounds. The LED blinking is the result of an infinite loop running inside main() demonstrating that the main program is not blocked during playback.

Use the AVR’s DAC for Audio Playback

Modern 8-bit AVRs have a 10-bit digital-to-analog converter (DAC) built in. It’s simpler to set up and use than a discrete timer/counter in PWM mode.

// Enable the DAC and output on pin 16
DAC0.CTRLA = DAC_OUTEN_bm | DAC_ENABLE_bm; // (reconstructed from the comment)

// Set the DAC level
uint8_t level = 123; // Retrieved from memory
DAC0.DATA = (uint16_t)level << 8; // Shift to use the highest bits


For short audio clips a microcontroller’s program memory can be used to store audio, but for minutes of audio SPI flash memory can be used to source the audio waveform. On the upper extreme SD cards can be used to store audio, but there are plenty of online resources describing how to achieve this. My last few days exploring using in-chip program memory and SPI-accessible flash memory for audio playback in 8-bit microcontrollers with minimal external circuitry has been an interesting journey, and I look forward to using these techniques in upcoming embedded projects that require playback of stored audio.


Program SPI Flash with a FT232H

How to use a FT232H breakout board to read/write flash memory

FTFlash is a Windows application for reading and writing SPI flash memory with a FT232H breakout board. I created FTFlash to be an easy to use click-to-run alternative to existing strategies that use console applications, complex python distributions, or custom USB drivers. FTFlash source code is on GitHub and a zip file containing the EXE can be downloaded from the FTFlash releases page. This page demonstrates interfacing a W25Q32, but the strategies described here should work for any SPI flash chip.


FT232H | Flash Module | Description
D0 | CLK | Clock - Idles low, levels are sampled on the rising edge
D1 | MOSI | Master Out Serial In - FT232H shifts data to the module
D2 | MISO | Master In Serial Out - FT232H reads data from the module
D3 | CS | Chip Select - Idles high, FT232H pulls low to initiate commands
5V | 3.3V | Use a regulator (like L78L33) to convert 5V to 3.3V


The test window is used for learning about the connected chip. It can read device IDs, read/write specific memory addresses, and erase the full chip. Use this window to confirm that your device is connected and can be communicated with.

💡 You cannot write to an address multiple times without erasing it first! Programming bytes in flash memory can only flip bits from 1 to 0, and erasing flash memory resets all bytes to 0xFF.
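This rule can be modeled in a few lines. The sketch below is desktop C with hypothetical function names that imitates how stored bytes behave, not code that talks to a real chip: programming is a bitwise AND with what is already stored, and erasing restores 0xFF.

```c
#include <stdint.h>
#include <stddef.h>

// Programming can only clear bits, so the stored result is the AND of the
// old contents and the new data.
uint8_t flash_program_byte(uint8_t stored, uint8_t data)
{
    return stored & data;
}

// Erasing sets every byte in the region back to 0xFF (all bits 1).
void flash_erase(uint8_t *region, size_t length)
{
    for (size_t i = 0; i < length; i++)
        region[i] = 0xFF;
}
```

Writing 0x5A to an erased byte stores 0x5A as expected, but writing 0xA5 over it afterwards leaves 0x00 because the two values share no set bits. Real chips erase whole sectors or blocks at a time rather than single bytes.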

The programming window is for reading and writing large amounts of data to and from .bin files on the local disk. Binary files can be viewed and edited with hex editors such as HxD.

Download FTFlash

Read/Write SPI Flash with a Bus Pirate

I use my old school Bus Pirate (v3) any time I start interfacing a chip I haven’t worked with before. The Bus Pirate appears as a USB serial port you can communicate with to send arbitrary SPI or I2C commands. It has a built-in power supply that can deliver 5V and 3.3V too. It’s a great way to practice interfacing with unfamiliar chips without having to use a breadboard or write any software.

Bus Pirate Setup

Bus Pirate Commands

💡 You cannot write to an address multiple times without erasing it first! Programming bytes in flash memory can only flip bits from 1 to 0, and erasing flash memory resets all bytes to 0xFF.

Additional Resources

Hack an Atmel ICE to Deliver Power

How I broke out VCC and programming lines so my Atmel ICE can power devices and program them without requiring the programming cable

The Atmel ICE is a development tool for programming and debugging Atmel microcontrollers, but it does not have the ability to power devices under test. Older AVR series microcontrollers could be programmed with inexpensive ICSP programmers that carried a VCC line, but the newest series of AVR microcontrollers cannot be programmed with ICSP. See my Programming Modern AVR Microcontrollers article for information about options for programming these chips using inexpensive gear. Although the Atmel ICE has a VCC sense line, it does not come with the ability to deliver power. This page describes how I modified my Atmel ICE to break-out 5V and 3.3V lines, and also VSENSE and UPDI pins to make it easier to program microcontrollers without needing the ribbon cable.

Locating the Power Rails

I found that the Atmel ICE was very easy to open by inserting a large flat-head screwdriver into the grooves on the side and twisting (without applying any inward force).

After probing around I found convenient locations for soldering wires to break-out key lines. Ground can be difficult to solder because of how thermally connected it is to the ground planes in the multi-layer board, but soldering to the through-hole thermal vias made this easier. A 3.3V line was easy to locate, but I would hesitate to use this for significant power draw. I’m not sure how this board is regulated or how close it runs to its current limit when it’s performing power-intensive operations. Also I’m not sure how easy it is to damage the programmer if the 3.3V line is exposed to higher voltage or shorted directly to ground. On the other hand, the 5V USB power rail was easy to locate and I’m much less concerned about loading that down.

Follow-up: I ended up removing the 3.3V wire because it doesn’t offer that much benefit, and the risk of accidentally touching a 5V rail or ground and potentially damaging the programmer’s internal voltage regulator or power protection circuitry was higher than I was comfortable with.

Avoiding the Stupid Ribbon Cable

Did I mention how frustrating the Atmel ICE’s ribbon cable is? The device itself has a reversed connector, so it can only work with a reversing cable! The headers on the Atmel ICE use teeny 1.27 mm pin spacing which prevents manually inserting wires with 2.54 mm female headers. The pins are unnecessarily small considering the other end of the reversing cable has standard 2.54 mm pin spacing! I’m not the only person who noticed how frustrating this cable is. If you lose or break your reversing cable, new ones are available from the major electronics distributors but they seem exorbitantly expensive for what they are.

After breaking out VSENSE and UPDI lines I tossed the ribbon cable into my junk box where it belongs.

Reassembling the Programmer

I used a nibbler to cut rectangular notches in the white plastic beneath the blue plastic ring and ran the breakout wires through the hole. When reassembled, these gaps left just enough space for the wires to easily pass through. I didn’t attempt to secure the wires to the case, but users concerned about damage from pulling may achieve enhanced protection by using a small zip tie around the wires inside the case next to the hole.

I added zip ties to secure the wires from accidental tugs. I also removed the 3.3 V line to reduce risk for the reasons described a few paragraphs back.


My Atmel ICE can now power a device and program it without requiring the ribbon cable or an external power supply. After using this for a few days, I’m very satisfied with the result! This modification doesn’t disable any of the original functionality, so users always have the option to plug in the ribbon cable if they want to program a device that way.

Additional Resources