
Local AI Chat with C#

How to run an LLM locally to power AI chat and answer questions about documents

This page describes how I use C# to run the LLaMA 2 large language model (LLM) locally to achieve AI chat, including the ability to answer questions about local documents. I previously described how I run LLaMA 2 locally using Python (and how I use it to answer questions about documents). Although the Python ecosystem is fantastic for end users who are strong programmers, setting up the development environment required to run Python scripts can be a challenge. Maintaining Python projects can be cumbersome too, and I have received numerous emails indicating that the code examples I posted just a few months ago no longer run as expected. Although I intend to go back and update those old Python tutorials to keep them current, I was very happy to learn that I can achieve similar functionality using the .NET ecosystem. This article shows how I use free and open-source tools to create C# applications that leverage locally hosted LLMs to provide interactive chat, including searching, summarizing, and answering questions about information in local documents.

Key Resources

LLamaSharp on GitHub: https://github.com/SciSharp/LLamaSharp
Kernel Memory on GitHub: https://github.com/microsoft/kernel-memory
Llama 2 GGUF model files on HuggingFace: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF

Summary (TLDR)

AI Chat

To create an interactive AI chat bot that answers user questions:

  1. Download a GGUF file from HuggingFace (I’m using llama-2-7b-chat.Q5_K_M.gguf)

  2. Create a new .NET console application and add the LLamaSharp and LLamaSharp.Backend.Cpu NuGet packages

  3. Add the following code to your main program:

using LLama.Common;
using LLama;

// Indicate where the GGUF model file is
string modelPath = @"C:\path\to\llama-2-7b-chat.Q5_K_M.gguf";

// Load the model into memory
Console.ForegroundColor = ConsoleColor.DarkGray;
ModelParams modelParams = new(modelPath);
using LLamaWeights weights = LLamaWeights.LoadFromFile(modelParams);

// Setup a chat session
using LLamaContext context = weights.CreateContext(modelParams);
InteractiveExecutor ex = new(context);
ChatSession session = new(ex);
// Hide the anti-prompt keywords from the displayed output
var hideWords = new LLamaTransforms.KeywordTextOutputStreamTransform(["User:", "Bot: "]);
session.WithOutputTransform(hideWords);
InferenceParams infParams = new()
{
    Temperature = 0.6f, // higher values give more "creative" answers
    AntiPrompts = ["User:"] // stop generating once the model begins the user's next turn
};

while (true)
{
    // Get a question from the user
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("\nQuestion: ");
    string userInput = Console.ReadLine() ?? string.Empty;
    ChatHistory.Message msg = new(AuthorRole.User, "Question: " + userInput);

    // Display answer text as it is being generated
    Console.ForegroundColor = ConsoleColor.Yellow;
    await foreach (string text in session.ChatAsync(msg, infParams))
    {
        Console.Write(text);
    }
}

Note: some lines of code related to styling have been omitted. See the GitHub repository for this blog post for full source code.
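Model loading can also be tuned through additional ModelParams properties. Here is a minimal sketch assuming LLamaSharp's ContextSize and GpuLayerCount properties; the values shown are illustrative assumptions, not recommendations:

// Optional tuning when loading the model (illustrative values)
ModelParams modelParams = new(modelPath)
{
    ContextSize = 4096,  // maximum number of tokens the model attends to
    GpuLayerCount = 0,   // raise above zero to offload layers if a GPU backend package is installed
};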

Example Output

A basic question about planets yields a concise response.

Chat sessions preserve history, enabling “follow-up” questions where the model uses context from previous discussion.
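Because the session's history is an ordinary ChatHistory object, it can also be pre-seeded before the question loop begins. Here is a minimal sketch assuming the session exposes its history through a History property, as recent versions of LLamaSharp's ChatSession do:

// Hypothetical seeding step: give the bot a persona before the first question
session.History.AddMessage(AuthorRole.System,
    "You are a concise assistant that answers in one or two sentences.");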

Chat about Documents

To create an AI chat bot that answers user questions about documents:

  1. Download a GGUF file from HuggingFace (I’m using llama-2-7b-chat.Q5_K_M.gguf)

  2. Create a new .NET console application and add the LLamaSharp and LLamaSharp.Backend.Cpu NuGet packages

  3. Add the following code to your main program:

using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory.Configuration;
using Microsoft.KernelMemory;
using System.Diagnostics;

// Setup the kernel memory with the LLM model
string modelPath = @"C:\path\to\llama-2-7b-chat.Q5_K_M.gguf";
LLama.Common.InferenceParams infParams = new() { AntiPrompts = ["\n\n"] }; // stop at the first blank line
LLamaSharpConfig lsConfig = new(modelPath) { DefaultInferenceParams = infParams };
SearchClientConfig searchClientConfig = new() { MaxMatchesCount = 1, AnswerTokens = 100 }; // answer from the single best match, capped at 100 tokens
TextPartitioningOptions parseOptions = new() { MaxTokensPerParagraph = 300, MaxTokensPerLine = 100, OverlappingTokens = 30 }; // how documents are chunked for embedding
IKernelMemory memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(lsConfig)
    .WithSearchClientConfig(searchClientConfig)
    .With(parseOptions)
    .Build();

// Ingest documents (format is automatically detected from the filename)
string documentFolder = @"C:\path\to\documents";
string[] documentPaths = Directory.GetFiles(documentFolder, "*.txt");
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < documentPaths.Length; i++)
{
    Console.WriteLine($"Importing {i + 1} of {documentPaths.Length}: {Path.GetFileName(documentPaths[i])}");
    await memory.ImportDocumentAsync(documentPaths[i], steps: Constants.PipelineWithoutSummary);
}
Console.WriteLine($"Ingestion took {sw.Elapsed.TotalSeconds:N2} seconds");

// Allow the user to ask questions forever
while (true)
{
    Console.Write("\nQuestion: ");
    string question = Console.ReadLine() ?? string.Empty;
    MemoryAnswer answer = await memory.AskAsync(question);
    Console.WriteLine($"Answer: {answer.Result}");
}

Note: some lines of code related to styling have been omitted. See the GitHub repository for this blog post for full source code.
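The MemoryAnswer returned by AskAsync carries more than the answer text. As a hedged sketch (property names from the Microsoft.KernelMemory package; verify them against your installed version), the citations that informed an answer can be printed alongside it:

// Sketch: list the source documents the answer was drawn from
MemoryAnswer answer = await memory.AskAsync(question);
Console.WriteLine($"Answer: {answer.Result}");
foreach (var citation in answer.RelevantSources)
{
    Console.WriteLine($"  Source: {citation.SourceName}");
}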

Example Output

I gave this program a PDF copy of the About Scott page from my website, then asked who Scott is. I think the phrase “skilled computer programmer and electrical engineer” is a bit dramatic, but overall the information returned lines up pretty well!

I then provided information about a fictitious Python package as a text file and asked about it. The information I provided is quoted below, and the response to my question about it is pretty good!

“JupyterGoBoom” is the name of a Python package for creating unmaintainable Jupyter notebooks. It is no longer actively developed and is now considered obsolete because modern software developers have come to realize that Jupyter notebooks grow to become unmaintainable all by themselves.

Document Ingestion with Local Storage

Users who run the code above will find that ingesting large documents takes a long time (on the order of minutes) and that restarting the program requires re-analyzing those files all over again.

The Kernel Memory package has functionality that allows the information gathered from documents to be stored, then re-loaded into memory almost instantly the next time the program is started. There are extensions to allow memory to be stored in various cloud engines and databases (Azure AI Search, Elasticsearch, Postgres, SQL Server, etc.), but in this example we will store and retrieve this information using the local filesystem.

To enable local storage of ingested document information for quick retrieval, modify the code example above to build the kernel memory as shown here:

SimpleFileStorageConfig storageConfig = new()
{
    Directory = "./storage/",
    StorageType = FileSystemTypes.Disk,
};

SimpleVectorDbConfig vectorDbConfig = new()
{
    Directory = "./storage/",
    StorageType = FileSystemTypes.Disk,
};

IKernelMemory memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage(storageConfig) // store information locally
    .WithSimpleVectorDb(vectorDbConfig)   // retrieve information locally
    .WithLLamaSharpDefaults(lsConfig)
    .WithSearchClientConfig(searchClientConfig)
    .With(parseOptions)
    .Build();

When the code is run it will be slow to start the first time as it ingests the documents, but once ingested the information will be saved to disk (in the ./storage/ folder) and rapidly loaded into memory the next time the application starts.
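With local storage enabled, re-importing can also be skipped explicitly. Here is a minimal sketch assuming documents are given stable IDs at import time and that IsDocumentReadyAsync is available on IKernelMemory in your package version:

// Sketch: skip files whose ingested data is already in ./storage/
foreach (string path in documentPaths)
{
    string docId = Path.GetFileName(path); // hypothetical ID scheme: the file name
    if (!await memory.IsDocumentReadyAsync(docId))
    {
        await memory.ImportDocumentAsync(path, documentId: docId, steps: Constants.PipelineWithoutSummary);
    }
}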

Conclusions

The LLamaSharp and Kernel Memory packages can be easily combined to create small C# projects that run LLMs locally to enable AI chat functionality, including summarizing and answering questions about documents. Unlike Python-centric strategies that require end users to install development tools and maintain system-wide or virtual environments just to run basic scripts, the .NET-centric strategy described here makes it possible to create compiled apps that are simple to distribute and easy for non-technical users to run. Much of the AI/ML documentation and discussion in recent years has been dominated by Python, but I’m thrilled to see C#/.NET tools like these growing in the AI/ML landscape! I’m excited to watch how these projects continue to evolve in the years to come.

Resources