The personal website of Scott W Harden
May 1st, 2022

Using DataFrames in C#

The DataFrame is a data structure designed for manipulation, analysis, and visualization of tabular data, and it is the cornerstone of many data science applications. One of the most famous implementations of the data frame is provided by the Pandas package for Python. An equivalent data structure is available for C# using Microsoft's data analysis package. Although data frames are commonly used in Jupyter notebooks, they can be used in standard .NET applications as well. This article surveys Microsoft's Data Analysis package and introduces how to interact with data frames using C# and the .NET platform.

DataFrame Quickstart

  • A DataFrame is a 2D matrix that stores data values in named columns.
  • Each column has a distinct data type.
  • Rows represent observations.

Add the Microsoft.Data.Analysis package to your project, then you can create a DataFrame like this:

using Microsoft.Data.Analysis;

string[] names = { "Oliver", "Charlotte", "Henry", "Amelia", "Owen" };
int[] ages = { 23, 19, 42, 64, 35 };
double[] heights = { 1.91, 1.62, 1.72, 1.57, 1.85 };

DataFrameColumn[] columns = {
    new StringDataFrameColumn("Name", names),
    new PrimitiveDataFrameColumn<int>("Age", ages),
    new PrimitiveDataFrameColumn<double>("Height", heights),
};

DataFrame df = new(columns);

Contents of a DataFrame can be previewed using Console.WriteLine(df) but the formatting isn't pretty.

Name  Age   Height
Oliver23    1.91
Charlotte19    1.62
Henry 42    1.72
Amelia64    1.57
Owen  35    1.85

Pretty DataFrame Formatting

A custom PrettyPrint() extension method can improve DataFrame readability. Implementing this as an extension method allows me to call df.PrettyPrint() anywhere in my code.

💡 View the full PrettyPrinters.cs source code
using Microsoft.Data.Analysis;
using System.Text;

internal static class PrettyPrinters
{
    public static void PrettyPrint(this DataFrame df) => Console.WriteLine(PrettyText(df));
    public static string PrettyText(this DataFrame df) => ToStringArray2D(df).ToFormattedText();

    public static string ToMarkdown(this DataFrame df) => ToStringArray2D(df).ToMarkdown();

    public static void PrettyPrint(this DataFrameRow row) => Console.WriteLine(Pretty(row));
    public static string Pretty(this DataFrameRow row) => row.Select(x => x?.ToString() ?? string.Empty).StringJoin();
    private static string StringJoin(this IEnumerable<string> strings) => string.Join(" ", strings.Select(x => x.ToString()));

    private static string[,] ToStringArray2D(DataFrame df)
    {
        string[,] strings = new string[df.Rows.Count + 1, df.Columns.Count];

        for (int i = 0; i < df.Columns.Count; i++)
            strings[0, i] = df.Columns[i].Name;

        for (int i = 0; i < df.Rows.Count; i++)
            for (int j = 0; j < df.Columns.Count; j++)
                strings[i + 1, j] = df[i, j]?.ToString() ?? string.Empty;

        return strings;
    }

    private static int[] GetMaxLengthsByColumn(this string[,] strings)
    {
        int[] maxLengthsByColumn = new int[strings.GetLength(1)];

        for (int y = 0; y < strings.GetLength(0); y++)
            for (int x = 0; x < strings.GetLength(1); x++)
                maxLengthsByColumn[x] = Math.Max(maxLengthsByColumn[x], strings[y, x].Length);

        return maxLengthsByColumn;
    }

    private static string ToFormattedText(this string[,] strings)
    {
        StringBuilder sb = new();
        int[] maxLengthsByColumn = GetMaxLengthsByColumn(strings);

        for (int y = 0; y < strings.GetLength(0); y++)
        {
            for (int x = 0; x < strings.GetLength(1); x++)
            {
                sb.Append(strings[y, x].PadRight(maxLengthsByColumn[x] + 2));
            }
            sb.AppendLine();
        }

        return sb.ToString();
    }


    private static string ToMarkdown(this string[,] strings)
    {
        StringBuilder sb = new();
        int[] maxLengthsByColumn = GetMaxLengthsByColumn(strings);

        for (int y = 0; y < strings.GetLength(0); y++)
        {
            for (int x = 0; x < strings.GetLength(1); x++)
            {
                sb.Append(strings[y, x].PadRight(maxLengthsByColumn[x]));
                if (x < strings.GetLength(1) - 1)
                    sb.Append(" | ");
            }
            sb.AppendLine();

            if (y == 0)
            {
                for (int i = 0; i < strings.GetLength(1); i++)
                {
                    int bars = maxLengthsByColumn[i] + 2;
                    if (i == 0)
                        bars -= 1;
                    sb.Append(new String('-', bars));

                    if (i < strings.GetLength(1) - 1)
                        sb.Append("|");
                }
                sb.AppendLine();
            }
        }

        return sb.ToString();
    }
}
Name       Age  Height
Oliver     23   1.91
Charlotte  19   1.62
Henry      42   1.72
Amelia     64   1.57
Owen       35   1.85

I can create similar methods to format a DataFrame as Markdown or HTML.

Name      | Age | Height
----------|-----|--------
Oliver    | 23  | 1.91
Charlotte | 19  | 1.62
Henry     | 42  | 1.72
Amelia    | 64  | 1.57
Owen      | 35  | 1.85

Using DataFrames in Interactive Notebooks

To get started with .NET Interactive Notebooks, install the .NET Interactive Notebooks extension for VS Code, create a new demo.ipynb file, then add your code.

Previously users had to create custom HTML formatters to properly display DataFrames in .NET Interactive Notebooks, but these days it works right out of the box.

💡 See demo.html for a full length demonstration notebook

// visualize the DataFrame
df

Append a Row

Build a new row from key/value pairs, then append it to the DataFrame.

List<KeyValuePair<string, object>> newRowData = new()
{
    new KeyValuePair<string, object>("Name", "Scott"),
    new KeyValuePair<string, object>("Age", 36),
    new KeyValuePair<string, object>("Height", 1.65),
};

df.Append(newRowData, inPlace: true);

Add a Column

Build a new column, populate it with data, and add it to the DataFrame. The column needs one value per existing row, so its length must match the DataFrame's row count.

int[] weights = { 123, 321, 111, 121, 131 };
PrimitiveDataFrameColumn<int> weightCol = new("Weight", weights);
df.Columns.Add(weightCol);

Sort and Filter

The DataFrame class has numerous operations available to sort, filter, and analyze data in many different ways. A popular pattern when working with DataFrames is to use method chaining to combine numerous operations together into a single statement. See the DataFrame Class API for a full list of available operations.

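// Filter before sorting: the mask built from df["Age"] follows df's original
// row order, so apply it before the rows are reordered.
df.Filter(df["Age"].ElementwiseGreaterThan(30))
    .OrderBy("Name")
    .PrettyPrint();
Name    Age  Height
Amelia  64   1.57
Henry   42   1.72
Owen    35   1.85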

Mathematical Operations

It's easy to perform math on columns or across multiple DataFrames. In this example we will perform math using two columns and create a new column to hold the output.

DataFrameColumn iqCol = df["Age"] * df["Height"] * 1.5;

double[] iqs = Enumerable.Range(0, (int)iqCol.Length)
    .Select(x => (double)iqCol[x])
    .ToArray();

df.Columns.Add(new PrimitiveDataFrameColumn<double>("IQ", iqs));
df.PrettyPrint();
Name       Age  Height  IQ
Oliver     23   1.91    65.9
Charlotte  19   1.62    46.17
Henry      42   1.72    108.36
Amelia     64   1.57    150.72
Owen       35   1.85    97.12

Statistical Operations

You can iterate across every row of a column to calculate population statistics.

foreach (DataFrameColumn col in df.Columns.Skip(1))
{
    // warning: additional care must be taken for datasets which contain null
    double[] values = Enumerable.Range(0, (int)col.Length).Select(x => Convert.ToDouble(col[x])).ToArray();
    (double mean, double std) = MeanAndStd(values);
    Console.WriteLine($"{col.Name} = {mean} +/- {std:N3} (n={values.Length})");
}
Age = 36.6 +/- 15.982 (n=5)
Height = 1.734 +/- 0.130 (n=5)
💡 View the full MeanAndStd() source code
private static (double mean, double std) MeanAndStd(double[] values)
{
    if (values is null)
        throw new ArgumentNullException(nameof(values));

    if (values.Length == 0)
        throw new ArgumentException($"{nameof(values)} must not be empty");

    double sum = 0;
    for (int i = 0; i < values.Length; i++)
        sum += values[i];

    double mean = sum / values.Length;

    double sumVariancesSquared = 0;
    for (int i = 0; i < values.Length; i++)
    {
        double pointVariance = Math.Abs(mean - values[i]);
        double pointVarianceSquared = Math.Pow(pointVariance, 2);
        sumVariancesSquared += pointVarianceSquared;
    }

    double meanVarianceSquared = sumVariancesSquared / values.Length;
    double std = Math.Sqrt(meanVarianceSquared);

    return (mean, std);
}

Plot Values from a DataFrame

I use ScottPlot.NET to visualize data from DataFrames in .NET applications and .NET Interactive Notebooks. ScottPlot can generate a variety of plot types and has many options for customization. See the ScottPlot Cookbook for examples and API documentation.

// Register a custom formatter to display ScottPlot plots as images
using Microsoft.DotNet.Interactive.Formatting;
Formatter.Register(typeof(ScottPlot.Plot), (plt, writer) => 
    writer.Write(((ScottPlot.Plot)plt).GetImageHTML()), HtmlFormatter.MimeType);
// Get the data you wish to display in double arrays
double[] ages = Enumerable.Range(0, (int)df.Rows.Count).Select(x => Convert.ToDouble(df["Age"][x])).ToArray();
double[] heights = Enumerable.Range(0, (int)df.Rows.Count).Select(x => Convert.ToDouble(df["Height"][x])).ToArray();
// Create and display a plot
var plt = new ScottPlot.Plot(400, 300);
plt.AddScatter(ages, heights);
plt.XLabel("Age");
plt.YLabel("Height");
plt

💡 See demo.html for a full length demonstration notebook

If you are only working inside a Notebook and you want all your plots to be HTML and JavaScript, XPlot.Plotly is a good tool to use.

Data may contain null

I didn't demonstrate it in the code examples above, but note that all column data types are nullable. While null-containing data requires extra considerations when writing mathematical routines, it's a convenient way to model missing data, which is a common occurrence in the real world.

Why not just use LINQ?

I see this question asked frequently, often with an aggressive and condescending tone. LINQ (Language-Integrated Query) is fantastic for performing logical operations on simple collections of data. When you have large 2D datasets of labeled data, the advantages of data frames over flat LINQ statements start to become apparent. It is also easy to perform logical operations across multiple data frames, allowing users to write simpler and more readable code than could be achieved with LINQ statements. Data frames also make it much easier to visualize complex data. In the data science world where complex labeled datasets are routinely compared, manipulated, merged, and visualized, often in an interactive context, data frames are much easier to work with than raw LINQ statements.

Conclusions

Although I typically reach for Python to perform exploratory data science, it's good to know that C# has a DataFrame available and that it can be used to inspect and manipulate tabular data. DataFrames pair well with ScottPlot figures in interactive notebooks and are a great way to inspect and communicate complex data. I look forward to watching Microsoft's Data Analysis namespace continue to evolve as part of their machine learning / ML.NET platform.

Resources

Markdown source code last modified on May 2nd, 2022
---
title: Using DataFrames in C#
description: How to use the DataFrame class from the Microsoft.Data.Analysis package to interact with tabular data
date: 2022-05-01 23:00:00
tags: csharp
---

# Using DataFrames in C# 

**The DataFrame is a data structure designed for manipulation, analysis, and visualization of tabular data, and it is the cornerstone of many data science applications.** One of the most famous implementations of the data frame is provided by the Pandas package for Python. An equivalent data structure is available for C# using Microsoft's data analysis package. Although data frames are commonly used in Jupyter notebooks, they can be used in standard .NET applications as well. This article surveys Microsoft's Data Analysis package and introduces how to interact with data frames using C# and the .NET platform.

## DataFrame Quickstart

* A DataFrame is a 2D matrix that stores data values in named columns.
* Each column has a distinct data type.
* Rows represent observations.

Add the [Microsoft.Data.Analysis package](https://www.nuget.org/packages/Microsoft.Data.Analysis/) to your project, then you can create a DataFrame like this:

```cs
using Microsoft.Data.Analysis;

string[] names = { "Oliver", "Charlotte", "Henry", "Amelia", "Owen" };
int[] ages = { 23, 19, 42, 64, 35 };
double[] heights = { 1.91, 1.62, 1.72, 1.57, 1.85 };

DataFrameColumn[] columns = {
    new StringDataFrameColumn("Name", names),
    new PrimitiveDataFrameColumn<int>("Age", ages),
    new PrimitiveDataFrameColumn<double>("Height", heights),
};

DataFrame df = new(columns);
```

Contents of a DataFrame can be previewed using `Console.WriteLine(df)` but the formatting isn't pretty.

```text
Name  Age   Height
Oliver23    1.91
Charlotte19    1.62
Henry 42    1.72
Amelia64    1.57
Owen  35    1.85
```

## Pretty DataFrame Formatting

**A custom `PrettyPrint()` extension method can improve DataFrame readability.** Implementing this as an extension method allows me to call `df.PrettyPrint()` anywhere in my code.

<details>
<summary>💡 View the full <code>PrettyPrinters.cs</code> source code</summary>

```cs
using Microsoft.Data.Analysis;
using System.Text;

internal static class PrettyPrinters
{
    public static void PrettyPrint(this DataFrame df) => Console.WriteLine(PrettyText(df));
    public static string PrettyText(this DataFrame df) => ToStringArray2D(df).ToFormattedText();

    public static string ToMarkdown(this DataFrame df) => ToStringArray2D(df).ToMarkdown();

    public static void PrettyPrint(this DataFrameRow row) => Console.WriteLine(Pretty(row));
    public static string Pretty(this DataFrameRow row) => row.Select(x => x?.ToString() ?? string.Empty).StringJoin();
    private static string StringJoin(this IEnumerable<string> strings) => string.Join(" ", strings.Select(x => x.ToString()));

    private static string[,] ToStringArray2D(DataFrame df)
    {
        string[,] strings = new string[df.Rows.Count + 1, df.Columns.Count];

        for (int i = 0; i < df.Columns.Count; i++)
            strings[0, i] = df.Columns[i].Name;

        for (int i = 0; i < df.Rows.Count; i++)
            for (int j = 0; j < df.Columns.Count; j++)
                strings[i + 1, j] = df[i, j]?.ToString() ?? string.Empty;

        return strings;
    }

    private static int[] GetMaxLengthsByColumn(this string[,] strings)
    {
        int[] maxLengthsByColumn = new int[strings.GetLength(1)];

        for (int y = 0; y < strings.GetLength(0); y++)
            for (int x = 0; x < strings.GetLength(1); x++)
                maxLengthsByColumn[x] = Math.Max(maxLengthsByColumn[x], strings[y, x].Length);

        return maxLengthsByColumn;
    }

    private static string ToFormattedText(this string[,] strings)
    {
        StringBuilder sb = new();
        int[] maxLengthsByColumn = GetMaxLengthsByColumn(strings);

        for (int y = 0; y < strings.GetLength(0); y++)
        {
            for (int x = 0; x < strings.GetLength(1); x++)
            {
                sb.Append(strings[y, x].PadRight(maxLengthsByColumn[x] + 2));
            }
            sb.AppendLine();
        }

        return sb.ToString();
    }


    private static string ToMarkdown(this string[,] strings)
    {
        StringBuilder sb = new();
        int[] maxLengthsByColumn = GetMaxLengthsByColumn(strings);

        for (int y = 0; y < strings.GetLength(0); y++)
        {
            for (int x = 0; x < strings.GetLength(1); x++)
            {
                sb.Append(strings[y, x].PadRight(maxLengthsByColumn[x]));
                if (x < strings.GetLength(1) - 1)
                    sb.Append(" | ");
            }
            sb.AppendLine();

            if (y == 0)
            {
                for (int i = 0; i < strings.GetLength(1); i++)
                {
                    int bars = maxLengthsByColumn[i] + 2;
                    if (i == 0)
                        bars -= 1;
                    sb.Append(new String('-', bars));

                    if (i < strings.GetLength(1) - 1)
                        sb.Append("|");
                }
                sb.AppendLine();
            }
        }

        return sb.ToString();
    }
}
```

</details>

```text
Name       Age  Height
Oliver     23   1.91
Charlotte  19   1.62
Henry      42   1.72
Amelia     64   1.57
Owen       35   1.85
```

I can create similar methods to format a DataFrame as Markdown or HTML.

```text
Name      | Age | Height
----------|-----|--------
Oliver    | 23  | 1.91
Charlotte | 19  | 1.62
Henry     | 42  | 1.72
Amelia    | 64  | 1.57
Owen      | 35  | 1.85
```


Name      | Age | Height
----------|-----|--------
Oliver    | 23  | 1.91
Charlotte | 19  | 1.62
Henry     | 42  | 1.72
Amelia    | 64  | 1.57
Owen      | 35  | 1.85
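
The HTML variant is not shown in `PrettyPrinters.cs`. As a rough sketch, assuming the same header-first 2D string array that `ToStringArray2D()` produces, a hypothetical `ToHtml()` helper (not part of the project source) might look like the following. It is written as a standalone function over `string[,]` so it can be tried without the DataFrame package.

```cs
using System;
using System.Net;
using System.Text;

// Hypothetical helper: render a 2D string array (row 0 = column names) as an HTML table
static string ToHtml(string[,] strings)
{
    StringBuilder sb = new();
    sb.AppendLine("<table>");

    for (int y = 0; y < strings.GetLength(0); y++)
    {
        // the first row holds column names, so it uses header cells
        string cellTag = y == 0 ? "th" : "td";

        sb.Append("  <tr>");
        for (int x = 0; x < strings.GetLength(1); x++)
        {
            // escape characters such as < and & so cell text cannot break the markup
            string text = WebUtility.HtmlEncode(strings[y, x]);
            sb.Append($"<{cellTag}>{text}</{cellTag}>");
        }
        sb.AppendLine("</tr>");
    }

    sb.AppendLine("</table>");
    return sb.ToString();
}

// Small stand-in for the array built from a DataFrame
string[,] demo =
{
    { "Name", "Age" },
    { "Oliver", "23" },
    { "A&B", "19" },
};

Console.WriteLine(ToHtml(demo));
```

Inside `PrettyPrinters` the same body could be exposed as an extension method on `string[,]`, mirroring how `ToMarkdown()` is structured.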

## Using DataFrames in Interactive Notebooks

To get started with .NET Interactive Notebooks, install the [.NET Interactive Notebooks extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode), create a new `demo.ipynb` file, then add your code.

Previously users had to create custom HTML formatters to properly display DataFrames in .NET Interactive Notebooks, but these days it works right out of the box.

> 💡 See [demo.html](demo.html) for a full length demonstration notebook

```cs
// visualize the DataFrame
df
```

![](dataframe-notebook.jpg)

## Append a Row

Build a new row from key/value pairs, then append it to the DataFrame.

```cs
List<KeyValuePair<string, object>> newRowData = new()
{
    new KeyValuePair<string, object>("Name", "Scott"),
    new KeyValuePair<string, object>("Age", 36),
    new KeyValuePair<string, object>("Height", 1.65),
};

df.Append(newRowData, inPlace: true);
```

## Add a Column

Build a new column, populate it with data, and add it to the DataFrame. The column needs one value per existing row, so its length must match the DataFrame's row count.

```cs
int[] weights = { 123, 321, 111, 121, 131 };
PrimitiveDataFrameColumn<int> weightCol = new("Weight", weights);
df.Columns.Add(weightCol);
```

## Sort and Filter

**The DataFrame class has numerous operations available** to sort, filter, and analyze data in many different ways. A popular pattern when working with DataFrames is to use _method chaining_ to combine numerous operations together into a single statement. See the [DataFrame Class API](https://docs.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe) for a full list of available operations.

```cs
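// Filter before sorting: the mask built from df["Age"] follows df's original
// row order, so apply it before the rows are reordered.
df.Filter(df["Age"].ElementwiseGreaterThan(30))
    .OrderBy("Name")
    .PrettyPrint();
```

```text
Name    Age  Height
Amelia  64   1.57
Henry   42   1.72
Owen    35   1.85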
```
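
One detail worth keeping in mind with chained calls: `Filter()` applies its boolean column by row position, so a mask built from `df` only lines up with a frame that is still in `df`'s row order. The sketch below is package-free and uses plain arrays as hypothetical stand-ins for the `Name` and `Age` columns. It shows how applying such a mask after sorting selects the wrong rows, and how filtering first avoids the problem.

```cs
using System;
using System.Linq;

string[] names = { "Oliver", "Charlotte", "Henry", "Amelia", "Owen" };
int[] ages = { 23, 19, 42, 64, 35 };

// Mask computed against the original row order (true where age > 30)
bool[] mask = ages.Select(age => age > 30).ToArray();

// Sort first, then apply the stale mask by position: rows no longer line up
string[] sortedNames = names.OrderBy(n => n, StringComparer.Ordinal).ToArray();
string[] misaligned = sortedNames.Where((_, i) => mask[i]).ToArray();

// Apply the mask first, then sort the surviving rows
string[] filteredThenSorted = names
    .Where((_, i) => mask[i])
    .OrderBy(n => n, StringComparer.Ordinal)
    .ToArray();

Console.WriteLine(string.Join(", ", misaligned));         // Henry, Oliver, Owen
Console.WriteLine(string.Join(", ", filteredThenSorted)); // Amelia, Henry, Owen
```

The first result includes Oliver (age 23) because the mask's third, fourth, and fifth positions no longer refer to the people it was computed for.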

## Mathematical Operations

It's easy to perform math on columns or across multiple DataFrames. In this example we will perform math using two columns and create a new column to hold the output.

```cs
DataFrameColumn iqCol = df["Age"] * df["Height"] * 1.5;

double[] iqs = Enumerable.Range(0, (int)iqCol.Length)
    .Select(x => (double)iqCol[x])
    .ToArray();

df.Columns.Add(new PrimitiveDataFrameColumn<double>("IQ", iqs));
df.PrettyPrint();
```

```text
Name       Age  Height  IQ
Oliver     23   1.91    65.9
Charlotte  19   1.62    46.17
Henry      42   1.72    108.36
Amelia     64   1.57    150.72
Owen       35   1.85    97.12
```

## Statistical Operations

You can iterate across every row of a column to calculate population statistics.

```cs
foreach (DataFrameColumn col in df.Columns.Skip(1))
{
    // warning: additional care must be taken for datasets which contain null
    double[] values = Enumerable.Range(0, (int)col.Length).Select(x => Convert.ToDouble(col[x])).ToArray();
    (double mean, double std) = MeanAndStd(values);
    Console.WriteLine($"{col.Name} = {mean} +/- {std:N3} (n={values.Length})");
}
```

```text
Age = 36.6 +/- 15.982 (n=5)
Height = 1.734 +/- 0.130 (n=5)
```


<details>
<summary>💡 View the full <code>MeanAndStd()</code> source code</summary>

```cs
private static (double mean, double std) MeanAndStd(double[] values)
{
	if (values is null)
		throw new ArgumentNullException(nameof(values));

	if (values.Length == 0)
		throw new ArgumentException($"{nameof(values)} must not be empty");

	double sum = 0;
	for (int i = 0; i < values.Length; i++)
		sum += values[i];

	double mean = sum / values.Length;

	double sumVariancesSquared = 0;
	for (int i = 0; i < values.Length; i++)
	{
		double pointVariance = Math.Abs(mean - values[i]);
		double pointVarianceSquared = Math.Pow(pointVariance, 2);
		sumVariancesSquared += pointVarianceSquared;
	}

	double meanVarianceSquared = sumVariancesSquared / values.Length;
	double std = Math.Sqrt(meanVarianceSquared);

	return (mean, std);
}
```

</details>
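
For comparison, the same population statistics can be written more compactly with LINQ. This is only a sketch: `MeanAndStdLinq()` is a hypothetical name, and like `MeanAndStd()` it divides by n (population standard deviation) rather than n - 1 (sample standard deviation).

```cs
using System;
using System.Linq;

// Hypothetical LINQ-based equivalent of MeanAndStd() (population standard deviation)
static (double mean, double std) MeanAndStdLinq(double[] values)
{
    if (values is null)
        throw new ArgumentNullException(nameof(values));
    if (values.Length == 0)
        throw new ArgumentException($"{nameof(values)} must not be empty");

    double mean = values.Average();

    // population variance: mean of squared deviations (divide by n, not n - 1)
    double variance = values.Select(v => (v - mean) * (v - mean)).Average();

    return (mean, Math.Sqrt(variance));
}

// The Age values from the example DataFrame
double[] ageValues = { 23, 19, 42, 64, 35 };
(double ageMean, double ageStd) = MeanAndStdLinq(ageValues);
Console.WriteLine($"Age = {ageMean} +/- {ageStd:N3} (n={ageValues.Length})");
```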

## Plot Values from a DataFrame

**I use [ScottPlot.NET](https://scottplot.net) to visualize data from DataFrames in .NET applications and .NET Interactive Notebooks.** ScottPlot can generate a variety of plot types and has many options for customization. See [the ScottPlot Cookbook](https://scottplot.net/cookbook/4.1/) for examples and API documentation.

```cs
// Register a custom formatter to display ScottPlot plots as images
using Microsoft.DotNet.Interactive.Formatting;
Formatter.Register(typeof(ScottPlot.Plot), (plt, writer) => 
    writer.Write(((ScottPlot.Plot)plt).GetImageHTML()), HtmlFormatter.MimeType);
```

```cs
// Get the data you wish to display in double arrays
double[] ages = Enumerable.Range(0, (int)df.Rows.Count).Select(x => Convert.ToDouble(df["Age"][x])).ToArray();
double[] heights = Enumerable.Range(0, (int)df.Rows.Count).Select(x => Convert.ToDouble(df["Height"][x])).ToArray();
```

```cs
// Create and display a plot
var plt = new ScottPlot.Plot(400, 300);
plt.AddScatter(ages, heights);
plt.XLabel("Age");
plt.YLabel("Height");
plt
```

![](scottplot-notebook.png)

> 💡 See [demo.html](demo.html) for a full length demonstration notebook

If you are only working inside a Notebook and you want all your plots to be HTML and JavaScript, [XPlot.Plotly](https://towardsdatascience.com/getting-started-with-c-dataframe-and-xplot-ploty-6ea6ce0ce8e3) is a good tool to use.

## Data may contain null

I didn't demonstrate it in the code examples above, but note that all column data types are nullable. While null-containing data requires extra considerations when writing mathematical routines, it's a convenient way to model missing data, which is a common occurrence in the real world.
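
As a minimal sketch of one such consideration, the hypothetical helper below averages only the values that are present. It takes an `IEnumerable<double?>`, and a plain nullable array stands in for a column here so the example runs without the DataFrame package. A `PrimitiveDataFrameColumn<double>` enumerates as `double?` values, so it should be usable as the argument too, though that pairing is an assumption rather than something tested in this article.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: mean of the non-null values, or null if every value is missing
static double? MeanIgnoringNulls(IEnumerable<double?> values)
{
    double sum = 0;
    int count = 0;

    foreach (double? value in values)
    {
        // skip missing observations instead of treating them as zero
        if (value is double present)
        {
            sum += present;
            count++;
        }
    }

    return count > 0 ? sum / count : (double?)null;
}

// A plain nullable array stands in for a column with one missing observation
double?[] heightsWithGap = { 1.91, null, 1.72, 1.57, 1.85 };

double? meanHeight = MeanIgnoringNulls(heightsWithGap);
int missing = heightsWithGap.Count(v => v is null);

Console.WriteLine($"mean={meanHeight} missing={missing}");
```

Skipping nulls is only one policy. Depending on the analysis it may be more appropriate to fill missing values or to drop incomplete rows before computing anything.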

## Why not just use LINQ?

I see this question asked frequently, often with an aggressive and condescending tone. LINQ (Language-Integrated Query) is fantastic for performing logical operations on simple collections of data. When you have large 2D datasets of _labeled_ data, the advantages of data frames over flat LINQ statements start to become apparent. It is also easy to perform logical operations across multiple data frames, allowing users to write simpler and more readable code than could be achieved with LINQ statements. Data frames also make it much easier to visualize complex data. In the data science world where complex labeled datasets are routinely compared, manipulated, merged, and visualized, often in an interactive context, data frames are much easier to work with than raw LINQ statements.

## Conclusions

Although I typically reach for Python to perform exploratory data science, it's good to know that C# has a DataFrame available and that it can be used to inspect and manipulate tabular data. DataFrames pair well with [ScottPlot](https://scottplot.net) figures in interactive notebooks and are a great way to inspect and communicate complex data. I look forward to watching Microsoft's Data Analysis namespace continue to evolve as part of their machine learning / ML.NET platform.

## Resources

* [Example notebook for this project](demo.html)

* [Source code for this project](https://github.com/swharden/Csharp-Data-Visualization/tree/main/projects/dataframe)

* [Official `Microsoft.Data.Analysis.DataFrame` Class Documentation](https://docs.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe)

* [Microsoft.Data.Analysis source code](https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.Data.Analysis) 

* [An Introduction to DataFrame](https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/) (.NET Blog)

* [ExtremeOptimization DataFrame Quickstart](https://www.extremeoptimization.com/QuickStart/CSharp/DataFrames.aspx)

* [`Microsoft.Data.Analysis` on NuGet](https://www.nuget.org/packages/Microsoft.Data.Analysis/)

* [Getting Started With C# DataFrame and XPlot.Plotly](https://towardsdatascience.com/getting-started-with-c-dataframe-and-xplot-ploty-6ea6ce0ce8e3)

* [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html)
April 4th, 2022

Mystify your Mind with SkiaSharp

This article explores my recreation of the classic screensaver Mystify your Mind implemented using C#. I used SkiaSharp to draw graphics and FFMpegCore to encode frames into high definition video files suitable for YouTube.

The Mystify Sandbox application has advanced options allowing exploration of various configurations outside the capabilities of the original screensaver. Interesting configurations can be exported as video (x264-encoded MP4 or WebM format) or viewed in full-screen mode resembling an actual screensaver.

Download

Programming Strategy

  • Corner - tracks a point that bounces around the edges of the screen
    • Has Position and Velocity fields
    • Has Advance() to move the point and collide with edges
  • Wire - represents a single polygon that moves around the screen
    • Contains List<Corner> and a Color which all change over time
    • Has Advance() which advances all corners and cycles Color
    • Contains List<WireSnapshot> to record history
  • WireSnapshot - represents properties of a Wire at an instant in time
    • Contains Point[] and Color and is intended to be immutable
    • Can draw itself using a Draw() method that accepts a SKCanvas
  • Field - represents the whole animation
    • Contains List<Wire> and has Width and Height
    • Has Advance() which advances all wires
    • Can draw itself using a Draw() method that accepts a SKCanvas

Original Behavior

Close inspection of video from the original Mystify screensaver revealed notable behaviors.

Broken Lines

The original Mystify implementation did not clear the screen between every frame. With GDI, large fills (clearing the background) are expensive, and drawing many polygons probably challenged performance in the 90s. Instead, only the leading wire was drawn, and the trailing wire was drawn over using black. This strategy results in lines which appear to have single pixel breaks on a black background (magenta arrow). It may not have been particularly visible on CRT monitors available in the 90s, but it is quite noticeable on LCD screens today.

Bouncing Changes Speed

Observing videos of the classic screensaver I noticed that corners don't bounce symmetrically off edges. After every bounce they change their speed slightly. This can be seen by observing the history of corners which reflect off edges of the screen demonstrating their change in speed (green arrow). I recreated this behavior using a weighted random number generator.

Programming Notes

Color Cycling

I used an HSL-to-RGB method to generate colors from hue (variable), saturation (always 100%), and luminosity (always 50%). By slowly and repeatedly ramping hue from 0% to 100% I achieved a rainbow gradient effect. Increasing the color change speed (% change for every new wire) cycles the colors faster, and very high values produce polygons whose visible history spans a gradient of colors. The fade effect is achieved by increasing the alpha of wire snapshots as they are drawn from old to new.

Encoding video with C#

The FFMpegCore package is a C# wrapper for FFMpeg that can encode video from frames piped into it. Using this strategy required creation of a SkiaSharp.SKBitmap wrapper that implements FFMpegCore.Pipes.IVideoFrame. For a full explanation and example code see C# Data Visualization: Render Video with SkiaSharp.

Performance

It's amusing to see retro screensavers running on modern gear! I can run this graphics model simulation at full-screen resolutions using thousands of wires at real-time frame rates. The most natural density of shapes for my 3440x1440 display was 20 wires with a history of 5.

Rendering the 2D image and encoding HD video using the x264 codec occupies all my CPU cores and runs a little above 500 frames per second. Encoding 24 hours of video (over 2 million frames) took this system 1 hour and 12 minutes and produced a 15.3 GB MP4 file. Encoding WebM format is considerably slower, with the same system only achieving an encoding rate of 12 frames per second.

Simulations

Traditional Behavior

The classic screensaver is typically run with two 4-cornered polygons that slowly change color.

Rainbow

Increasing the rate of color transition produces a rainbow effect within the visible history of polygons. The effect is made more striking by increasing the history length and decreasing the speed so the historical lines are closer together.

Solid

If the speed is greatly decreased and the number of historical records is greatly increased the resulting shape has little or no gap between historical traces and appears like a solid object. If fading is enabled (where opacity of older traces fades to transparent) the resulting effect is very interesting.

Chaos

Adding 100 shapes produces a chaotic but interesting effect. This may be the first time the world has seen Mystify like this!

EDIT: All these lines are very stressful on the video encoder and produce large file sizes to achieve high quality (25 MB for 10 seconds). I'm showing this one as a JPEG but click here to view mystify-100.webm if you're on a good internet connection.

YouTube

Resources

Markdown source code last modified on April 9th, 2022
---
title: Mystify your Mind with SkiaSharp
description: My implementation of the classic screensaver using SkiaSharp, OpenGL, and FFMpeg
date: 2022-04-04 18:34:00
tags: csharp, graphics
---

# Mystify your Mind with SkiaSharp

**This article explores my recreation of the classic screensaver _Mystify your Mind_ implemented using C#.** I used [SkiaSharp](https://github.com/mono/SkiaSharp) to draw graphics and [FFMpegCore](https://github.com/rosenbjerg/FFMpegCore) to encode frames into high definition video files suitable for YouTube.

<div class="text-center">

![](mystify.gif)

</div>

**The Mystify Sandbox application has advanced options** allowing exploration of various configurations outside the capabilities of the original screensaver. Interesting configurations can be exported as video (x264-encoded MP4 or WebM format) or viewed in full-screen mode resembling an actual screensaver. 

![](mystify-advanced.jpg)

## Download
* The [Releases page](https://github.com/swharden/Mystify/releases) has a click-to-run EXE for Windows
* [GitHub.com/swharden/Mystify](https://github.com/swharden/Mystify/) contains project source code (C#/.NET6)

## Programming Strategy

* `Corner` - tracks a point that bounces around the edges of the screen
  * Has `Position` and `Velocity` fields
  * Has `Advance()` to move the point and collide with edges
* `Wire` - represents a single polygon that moves around the screen
  * Contains `List<Corner>` and a `Color` which all change over time
  * Has `Advance()` which advances all corners and cycles `Color`
  * Contains `List<WireSnapshot>` to record history
* `WireSnapshot` - represents properties of a `Wire` at an instant in time
  * Contains `Point[]` and `Color` and is intended to be immutable
  * Can draw itself using a `Draw()` method that accepts a `SKCanvas`
* `Field` - represents the whole animation
  * Contains `List<Wire>` and has `Width` and `Height`
  * Has `Advance()` which advances all wires
  * Can draw itself using a `Draw()` method that accepts a `SKCanvas`
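
The `Corner` idea in the list above can be sketched in a few lines. This is only an illustration and not the code from the Mystify repository: the name `CornerSketch`, the use of `System.Numerics.Vector2`, and the plain mirror reflection are all assumptions made to keep the example self-contained.

```cs
using System.Numerics;

// Hypothetical sketch of a corner that bounces around inside a rectangle.
public class CornerSketch
{
    public Vector2 Position;
    public Vector2 Velocity;

    public CornerSketch(Vector2 position, Vector2 velocity)
    {
        Position = position;
        Velocity = velocity;
    }

    // Move one step, then mirror the position and velocity across any edge that was crossed.
    public void Advance(float width, float height)
    {
        Position += Velocity;

        if (Position.X < 0) { Position.X = -Position.X; Velocity.X = -Velocity.X; }
        else if (Position.X > width) { Position.X = 2 * width - Position.X; Velocity.X = -Velocity.X; }

        if (Position.Y < 0) { Position.Y = -Position.Y; Velocity.Y = -Velocity.Y; }
        else if (Position.Y > height) { Position.Y = 2 * height - Position.Y; Velocity.Y = -Velocity.Y; }
    }
}
```

A `Wire` would then hold a list of these and call `Advance()` on each one before recording a snapshot.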

## Original Behavior

Close inspection of [video from the original](https://youtu.be/SaBvcHHdlGE) Mystify screensaver revealed notable behaviors.

<img src="mystify-inspection.jpg" class="d-block shadow mx-auto my-5">

### Broken Lines
The original Mystify implementation did not clear the screen between frames. With GDI, large fills (such as clearing the background) are expensive, and drawing many polygons probably challenged performance in the 90s. Instead only the leading wire was drawn, and the trailing wire was drawn over using black. This strategy results in lines which appear to have single pixel breaks on a black background (magenta arrow). It may not have been particularly visible on CRT monitors available in the 90s, but it is quite noticeable on LCD screens today.

### Bouncing Changes Speed
Observing videos of the classic screensaver I noticed that corners don't bounce symmetrically off edges. After every bounce they change their speed slightly. This can be seen by observing the history of corners which reflect off edges of the screen demonstrating their change in speed (green arrow). I recreated this behavior using a weighted random number generator.
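
The exact weighting is not described here, so the following is only a sketch under assumed numbers. It averages two uniform samples so the speed factor is weighted toward 1.0, then clamps the result to a speed range. The names `BounceSketch` and `Reflect` and the default limits are hypothetical.

```cs
using System;

// Hypothetical sketch: choose a slightly different speed each time a corner bounces.
public static class BounceSketch
{
    // Returns the reversed velocity component with its magnitude changed by up to maxChange.
    // Averaging two uniform samples weights the factor toward 1.0 (a triangular distribution).
    public static double Reflect(double velocity, Random rand,
        double maxChange = 0.2, double minSpeed = 1, double maxSpeed = 10)
    {
        double centered = (rand.NextDouble() + rand.NextDouble()) / 2 - 0.5; // -0.5 to +0.5
        double factor = 1 + centered * 2 * maxChange;                        // 1-maxChange to 1+maxChange
        double speed = Math.Clamp(Math.Abs(velocity) * factor, minSpeed, maxSpeed);
        return velocity > 0 ? -speed : speed; // reverse direction
    }
}
```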

## Programming Notes

### Color Cycling
I used an HSL-to-RGB method to generate colors from hue (variable), saturation (always 100%), and luminosity (always 50%). By repeatedly ramping hue from 0% to 100% slowly I achieved a rainbow gradient effect. Increasing the color change speed (% change for every new wire) cycles the colors faster, and very high values produce polygons whose visible history spans a gradient of colors. The fade effect is achieved by increasing alpha of wire snapshots as they are drawn from old to new.
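
With saturation fixed at 100% and luminosity at 50%, the HSL conversion reduces to choosing one of six hue sectors. The sketch below shows that special case along with a linear alpha ramp for the fade. `ColorSketch`, `FromHue`, and `FadeAlpha` are illustrative names and not the methods used in the Mystify source.

```cs
using System;

// Hypothetical helpers, not the methods used in the Mystify source.
public static class ColorSketch
{
    // Convert hue (0-1, wraps around) to RGB bytes at 100% saturation and 50% luminosity.
    public static (byte r, byte g, byte b) FromHue(double hue)
    {
        double h = (hue % 1 + 1) % 1 * 6;   // wrap into 0-6 (six 60-degree sectors)
        double x = 1 - Math.Abs(h % 2 - 1); // ramps 0 to 1 to 0 across each pair of sectors
        (double r, double g, double b) = (int)h switch
        {
            0 => (1.0, x, 0.0),
            1 => (x, 1.0, 0.0),
            2 => (0.0, 1.0, x),
            3 => (0.0, x, 1.0),
            4 => (x, 0.0, 1.0),
            _ => (1.0, 0.0, x),
        };
        return ((byte)Math.Round(r * 255), (byte)Math.Round(g * 255), (byte)Math.Round(b * 255));
    }

    // Alpha for snapshot i (0 = oldest) out of count, rising linearly to fully opaque.
    public static byte FadeAlpha(int i, int count) => (byte)(255 * (i + 1) / count);
}
```

Stepping `hue` by a small amount for every new wire and passing it to `FromHue()` gives the slow rainbow cycle. A larger step makes the colors differ visibly between adjacent snapshots.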

### Encoding video with C#
The FFMpegCore package is a C# wrapper for FFMpeg that can encode video from frames piped into it. Using this strategy required creation of a `SkiaSharp.SKBitmap` wrapper that implements `FFMpegCore.Pipes.IVideoFrame`. For a full explanation and example code see [C# Data Visualization: Render Video with SkiaSharp](https://swharden.com/csdv/skiasharp/video/).

### Performance

**It's amusing to see retro screensavers running on modern gear!** I can run this graphics model simulation at full-screen resolutions using thousands of wires at real-time frame rates. The most natural density of shapes for my 3440x1440 display was 20 wires with a history of 5.

<img src="desk.jpg" class="d-block shadow mx-auto my-5">

Rendering the 2D image and encoding HD video using the x264 codec occupies all my CPU cores and runs a little above 500 frames per second. Encoding 24 hours of video (over 2 million frames) took this system 1 hour and 12 minutes and produced a 15.3 GB MP4 file. Encoding WebM format is considerably slower, with the same system only achieving an encoding rate of 12 frames per second.

<img src="cpu.png" class="d-block mx-auto my-5">


## Simulations

### Traditional Behavior

The classic screensaver is typically run with two 4-cornered polygons that slowly change color.

<video width="759" height="470" controls class="d-block mx-auto my-5 shadow" style="max-width: 100%; height: 100%;">
  <source src="mystify-01-standard.webm" type="video/webm">
</video>

### Rainbow

Increasing the rate of color transition produces a rainbow effect within the visible history of polygons. The effect is made more striking by increasing the history length and decreasing the speed so the historical lines are closer together.

<video width="759" height="470" controls class="d-block mx-auto my-5 shadow" style="max-width: 100%; height: 100%;">
  <source src="mystify-02-rainbow.webm" type="video/webm">
</video>

### Solid

If the speed is greatly decreased and the number of historical records is greatly increased the resulting shape has little or no gap between historical traces and appears like a solid object. If fading is enabled (where opacity of older traces fades to transparent) the resulting effect is very interesting.

<video width="759" height="470" controls class="d-block mx-auto my-5 shadow" style="max-width: 100%; height: 100%;">
  <source src="mystify-03-solid.webm" type="video/webm">
</video>

### Chaos

Adding 100 shapes produces a chaotic but interesting effect. This may be the first time the world has seen Mystify like this!

_EDIT: All these lines are very stressful on the video encoder and produce large file sizes to achieve high quality (25 MB for 10 seconds). I'm showing this one as a JPEG but [click here to view mystify-100.webm](mystify-04-100.webm) if you're on a good internet connection._

<a href='mystify-04-100.webm'><img src="mystify-04-100.jpg" class="d-block mx-auto my-5 shadow"></a>

## YouTube

<div class="text-center">

![](https://youtu.be/queN9r3Leis)

</div>

## Resources
* A click-to-run EXE can be downloaded from the [Releases Page](https://github.com/swharden/Mystify/releases)
* Source Code is available on https://github.com/swharden/Mystify
* Implementation Details: [C# Data Visualization: Mystify](https://swharden.com/csdv/simulations/mystify/)
* [C# Data Visualization: Render Video with SkiaSharp](https://swharden.com/csdv/skiasharp/video/)
* GitHub: [SkiaSharp](https://github.com/mono/SkiaSharp)
* GitHub: [FFMpegCore](https://github.com/rosenbjerg/FFMpegCore) 
* Windows 3.1 Mystify (video): https://youtu.be/osCZyfoScFg?t=370
* Windows 95 Mystify (video): https://youtu.be/SaBvcHHdlGE
February 3rd, 2022

Generic Math in C# with .NET 6

Generic types are great, but it has traditionally been difficult to do math with them. Consider the simple task where you want to accept a generic array and return its sum. With .NET 6 (and features currently still in preview), this got much easier!

public static T Sum<T>(T[] values) where T : INumber<T>
{
    T sum = T.Zero;
    for (int i = 0; i < values.Length; i++)
        sum += values[i];
    return sum;
}

To use this feature today you must:

  1. Install the System.Runtime.Experimental NuGet package
  2. Add these lines to the PropertyGroup in your csproj file:
<LangVersion>preview</LangVersion>
<EnablePreviewFeatures>true</EnablePreviewFeatures>

Note that the generic math function above is equivalent in speed to one that accepts and returns double[], while a method which accepts a generic but calls Convert.ToDouble() every time is about 3x slower than both options:

// this code works on older versions of .NET but is about 3x slower
public static double SumGenericToDouble<T>(T[] values)
{
    double sum = 0;
    for (int i = 0; i < values.Length; i++)
        sum += Convert.ToDouble(values[i]);
    return sum;
}

Resources

Markdown source code last modified on February 4th, 2022
---
Title: Generic Math in C# with .NET 6
Description: How to perform math on generic types in C# with .NET 6
Date: 2022-02-03 11:55PM EST
Tags: csharp
---

# Generic Math in C# with .NET 6

**Generic types are great, but it has traditionally been difficult to do math with them.** Consider the simple task where you want to accept a generic array and return its sum. With .NET 6 (and features currently still in preview), this got much easier!

```cs
public static T Sum<T>(T[] values) where T : INumber<T>
{
    T sum = T.Zero;
    for (int i = 0; i < values.Length; i++)
        sum += values[i];
    return sum;
}
```

To use this feature today you must:
1. Install the [System.Runtime.Experimental](https://www.nuget.org/packages/System.Runtime.Experimental/6.0.2-mauipre.1.22054.8) NuGet package
2. Add these lines to the `PropertyGroup` in your csproj file:

```xml
<LangVersion>preview</LangVersion>
<EnablePreviewFeatures>true</EnablePreviewFeatures>
```

Note that the generic math function above is equivalent in speed to one that accepts and returns `double[]`, while a method which accepts a generic but calls `Convert.ToDouble()` every time is about 3x slower than both options:

```cs
// this code works on older versions of .NET but is about 3x slower
public static double SumGenericToDouble<T>(T[] values)
{
    double sum = 0;
    for (int i = 0; i < values.Length; i++)
        sum += Convert.ToDouble(values[i]);
    return sum;
}
```
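
In .NET 7 and later, generic math is no longer a preview feature. `INumber<T>` lives in the `System.Numerics` namespace, so neither the experimental package nor the preview flags are needed. Here is a sketch of the same idea on those versions, with `GenericMathSketch` and `Mean` as illustrative names:

```cs
using System.Numerics;

public static class GenericMathSketch
{
    public static T Sum<T>(T[] values) where T : INumber<T>
    {
        T sum = T.Zero;
        foreach (T value in values)
            sum += value;
        return sum;
    }

    // Convert the count into T so the division stays in the generic type.
    public static T Mean<T>(T[] values) where T : INumber<T>
        => Sum(values) / T.CreateChecked(values.Length);
}
```

Because the division happens in `T`, calling `Mean()` on an `int[]` performs integer division. Pass a `double[]` when a fractional mean is wanted.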

## Resources
* [Preview Features in .NET 6 – Generic Math](https://devblogs.microsoft.com/dotnet/preview-features-in-net-6-generic-math/)
* [Generic Math in .NET 6](https://dunnhq.com/posts/2021/generic-math/)
* [Code Notes: Benchmark Generic Math](https://github.com/swharden/code-notes/tree/master/Csharp/misc/projects/BenchmarkGenericMath)
February 1st, 2022

Rotated Rectangle Hit Detection with C#

I recently had the need to determine if a point is inside a rotated rectangle. This need arose when I wanted to make a rotated rectangular textbox draggable, which required determining whether the mouse was over the rectangle. I know the rectangle's location, size, and rotation, and the position of the mouse cursor, and my goal is to tell if the mouse is inside the rotated rectangle. In this example I'll use Maui.Graphics to render a test image in a Windows Forms application (with SkiaSharp and OpenGL), but the same could be achieved with System.Drawing or other similar 2D graphics libraries.

I started just knowing the width and height of my rectangle. I created an array of points representing its corners.

float rectWidth = 300;
float rectHeight = 150;

PointF[] rectCorners =
{
    new(0, 0),
    new(rectWidth, 0),
    new(rectWidth, rectHeight),
    new(0, rectHeight),
};

I then rotated the rectangle around an origin point by applying a rotation transformation to each corner.

PointF origin = new(200, 300); // center of rotation
double angleRadians = 1.234;
PointF[] rotatedCorners = rectCorners.Select(x => Rotate(origin, x, angleRadians)).ToArray();
private PointF Rotate(PointF origin, PointF point, double radians)
{
    double dx = point.X * Math.Cos(radians) - point.Y * Math.Sin(radians);
    double dy = point.X * Math.Sin(radians) + point.Y * Math.Cos(radians);
    return new PointF(origin.X + (float)dx, origin.Y + (float)dy);
}

To determine if a given point is inside the rotated rectangle I called this method which accepts the point of interest and an array containing the four corners of the rotated rectangle.

public bool IsPointInsideRectangle(PointF pt, PointF[] rectCorners)
{
    double x1 = rectCorners[0].X;
    double x2 = rectCorners[1].X;
    double x3 = rectCorners[2].X;
    double x4 = rectCorners[3].X;

    double y1 = rectCorners[0].Y;
    double y2 = rectCorners[1].Y;
    double y3 = rectCorners[2].Y;
    double y4 = rectCorners[3].Y;

    double a1 = Math.Sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    double a2 = Math.Sqrt((x2 - x3) * (x2 - x3) + (y2 - y3) * (y2 - y3));
    double a3 = Math.Sqrt((x3 - x4) * (x3 - x4) + (y3 - y4) * (y3 - y4));
    double a4 = Math.Sqrt((x4 - x1) * (x4 - x1) + (y4 - y1) * (y4 - y1));

    double b1 = Math.Sqrt((x1 - pt.X) * (x1 - pt.X) + (y1 - pt.Y) * (y1 - pt.Y));
    double b2 = Math.Sqrt((x2 - pt.X) * (x2 - pt.X) + (y2 - pt.Y) * (y2 - pt.Y));
    double b3 = Math.Sqrt((x3 - pt.X) * (x3 - pt.X) + (y3 - pt.Y) * (y3 - pt.Y));
    double b4 = Math.Sqrt((x4 - pt.X) * (x4 - pt.X) + (y4 - pt.Y) * (y4 - pt.Y));

    double u1 = (a1 + b1 + b2) / 2;
    double u2 = (a2 + b2 + b3) / 2;
    double u3 = (a3 + b3 + b4) / 2;
    double u4 = (a4 + b4 + b1) / 2;

    double A1 = Math.Sqrt(u1 * (u1 - a1) * (u1 - b1) * (u1 - b2));
    double A2 = Math.Sqrt(u2 * (u2 - a2) * (u2 - b2) * (u2 - b3));
    double A3 = Math.Sqrt(u3 * (u3 - a3) * (u3 - b3) * (u3 - b4));
    double A4 = Math.Sqrt(u4 * (u4 - a4) * (u4 - b4) * (u4 - b1));

    double difference = A1 + A2 + A3 + A4 - a1 * a2;
    return difference < 1;
}

How does it work?

Consider 4 triangles formed by lines between the point and the 4 corners...

If the point is inside the rectangle, the area of the four triangles will equal the area of the rectangle.

If the point is outside the rectangle, the area of the four triangles will be greater than the area of the rectangle.

The code above calculates the area of the 4 triangles and returns true if their sum is approximately equal to the area of the rectangle.

Notes

  • In practice you'll probably want to use a more intelligent data structure than a 4-element PointF[] when calling these functions.

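  • The points in the array are clockwise. The method should work equally well for counter-clockwise points, but consecutive elements must be adjacent corners because the code treats elements 0 to 1 and 1 to 2 as the rectangle's sides.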

  • At the very end of IsPointInsideRectangle() the final decision is made based on the difference between the two areas being less than a given value. It's true that the cursor will be inside the rectangle if the difference is exactly zero, but with the possible accumulation of floating-point math errors a small tolerance seemed like a safer option.

Resources

Markdown source code last modified on February 2nd, 2022
---
Title: Point Inside Rectangle
Description: How to determine if a point is inside a rotated rectangle with C#
Date: 2022-02-01 12:10AM EST
Tags: csharp, graphics
---

# Rotated Rectangle Hit Detection with C# 

**I recently had the need to determine if a point is inside a rotated rectangle.** This need arose when I wanted to make a rotated rectangular textbox draggable, which required determining whether the mouse was over the rectangle. I know the rectangle's location, size, and rotation, and the position of the mouse cursor, and my goal is to tell if the mouse is inside the rotated rectangle. In this example I'll use [`Maui.Graphics`](https://maui.graphics) to render a test image in a Windows Forms application (with SkiaSharp and OpenGL), but the same could be achieved with `System.Drawing` or other similar 2D graphics libraries.

<div class="text-center">

![](point-inside-rotated-rectangle.gif)

</div>

I started just knowing the width and height of my rectangle. I created an array of points representing its corners.

```cs
float rectWidth = 300;
float rectHeight = 150;

PointF[] rectCorners =
{
    new(0, 0),
    new(rectWidth, 0),
    new(rectWidth, rectHeight),
    new(0, rectHeight),
};
```

I then rotated the rectangle around an origin point by applying a rotation transformation to each corner.

```cs
PointF origin = new(200, 300); // center of rotation
double angleRadians = 1.234;
PointF[] rotatedCorners = rectCorners.Select(x => Rotate(origin, x, angleRadians)).ToArray();
```

```cs
private PointF Rotate(PointF origin, PointF point, double radians)
{
	double dx = point.X * Math.Cos(radians) - point.Y * Math.Sin(radians);
	double dy = point.X * Math.Sin(radians) + point.Y * Math.Cos(radians);
	return new PointF(origin.X + (float)dx, origin.Y + (float)dy);
}
```
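
Note that `Rotate()` treats each corner as an offset from the origin, so the corner at `(0, 0)` lands exactly on `origin` and the rectangle pivots around that corner. The sketch below repeats the same math with plain tuples (an assumption, to avoid a graphics dependency) so the behavior can be checked in isolation. `RotateSketch` is a hypothetical name.

```cs
using System;

public static class RotateSketch
{
    // Same math as Rotate() above, using tuples so the sketch has no graphics dependency.
    public static (double x, double y) Rotate((double x, double y) origin, (double x, double y) point, double radians)
    {
        double dx = point.x * Math.Cos(radians) - point.y * Math.Sin(radians);
        double dy = point.x * Math.Sin(radians) + point.y * Math.Cos(radians);
        return (origin.x + dx, origin.y + dy);
    }
}
```

For example, rotating the corner `(300, 0)` by a quarter turn (`Math.PI / 2`) about the origin `(200, 300)` places it at approximately `(200, 600)`.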

To determine if a given point is inside the rotated rectangle I called this method which accepts the point of interest and an array containing the four corners of the rotated rectangle.

```cs
public bool IsPointInsideRectangle(PointF pt, PointF[] rectCorners)
{
    double x1 = rectCorners[0].X;
    double x2 = rectCorners[1].X;
    double x3 = rectCorners[2].X;
    double x4 = rectCorners[3].X;

    double y1 = rectCorners[0].Y;
    double y2 = rectCorners[1].Y;
    double y3 = rectCorners[2].Y;
    double y4 = rectCorners[3].Y;

    double a1 = Math.Sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    double a2 = Math.Sqrt((x2 - x3) * (x2 - x3) + (y2 - y3) * (y2 - y3));
    double a3 = Math.Sqrt((x3 - x4) * (x3 - x4) + (y3 - y4) * (y3 - y4));
    double a4 = Math.Sqrt((x4 - x1) * (x4 - x1) + (y4 - y1) * (y4 - y1));

    double b1 = Math.Sqrt((x1 - pt.X) * (x1 - pt.X) + (y1 - pt.Y) * (y1 - pt.Y));
    double b2 = Math.Sqrt((x2 - pt.X) * (x2 - pt.X) + (y2 - pt.Y) * (y2 - pt.Y));
    double b3 = Math.Sqrt((x3 - pt.X) * (x3 - pt.X) + (y3 - pt.Y) * (y3 - pt.Y));
    double b4 = Math.Sqrt((x4 - pt.X) * (x4 - pt.X) + (y4 - pt.Y) * (y4 - pt.Y));

    double u1 = (a1 + b1 + b2) / 2;
    double u2 = (a2 + b2 + b3) / 2;
    double u3 = (a3 + b3 + b4) / 2;
    double u4 = (a4 + b4 + b1) / 2;

    double A1 = Math.Sqrt(u1 * (u1 - a1) * (u1 - b1) * (u1 - b2));
    double A2 = Math.Sqrt(u2 * (u2 - a2) * (u2 - b2) * (u2 - b3));
    double A3 = Math.Sqrt(u3 * (u3 - a3) * (u3 - b3) * (u3 - b4));
    double A4 = Math.Sqrt(u4 * (u4 - a4) * (u4 - b4) * (u4 - b1));

    double difference = A1 + A2 + A3 + A4 - a1 * a2;
    return difference < 1;
}
```

## How does it work?

Consider 4 triangles formed by lines between the point and the 4 corners...

**If the point is _inside_ the rectangle,** the area of the four triangles will _equal_ the area of the rectangle.

<div class="text-center">

![](rectangle-point-inside.png)

</div>

**If the point is _outside_ the rectangle,** the area of the four triangles will be _greater_ than the area of the rectangle.

<div class="text-center">

![](rectangle-point-outside.png)

</div>

**The code above calculates the area of the 4 triangles** and returns `true` if their sum is approximately equal to the area of the rectangle.
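
The same area comparison can be written more compactly by computing each triangle's area from a 2D cross product instead of Heron's formula. This is a different technique from the method above, shown only as a sketch. It uses plain tuples rather than `PointF`, and `HitTestSketch`, `TriangleArea`, and the `tolerance` parameter are hypothetical names.

```cs
using System;

public static class HitTestSketch
{
    // Area of the triangle (a, b, p) from the 2D cross product (no square roots needed).
    static double TriangleArea((double x, double y) a, (double x, double y) b, (double x, double y) p)
        => Math.Abs((a.x - p.x) * (b.y - p.y) - (b.x - p.x) * (a.y - p.y)) / 2;

    // corners must list the 4 corners consecutively (clockwise or counter-clockwise)
    public static bool IsInside((double x, double y) pt, (double x, double y)[] corners, double tolerance = 1)
    {
        double triangles = 0;
        for (int i = 0; i < 4; i++)
            triangles += TriangleArea(corners[i], corners[(i + 1) % 4], pt);

        // the triangle formed by three consecutive corners is half of the rectangle
        double rectangle = TriangleArea(corners[0], corners[1], corners[2]) * 2;
        return triangles - rectangle < tolerance;
    }
}
```

For a 300 by 150 rectangle the four triangles around its center point sum to 45,000 square units, which equals the rectangle's area. A point 100 units to the right of the rectangle gives 60,000 and is rejected.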

## Notes

* In practice you'll probably want to use a more intelligent data structure than a 4-element `PointF[]` when calling these functions.

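* The points in the array are clockwise. The method should work equally well for counter-clockwise points, but consecutive elements must be adjacent corners because the code treats elements 0 to 1 and 1 to 2 as the rectangle's sides.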

* At the very end of `IsPointInsideRectangle()` the final decision is made based on the difference between the two areas being less than a given value. It's true that the cursor will be inside the rectangle if the difference is exactly zero, but with the possible accumulation of floating-point math errors a small tolerance seemed like a safer option.

## Resources
* Source code for this application: [Form1.cs](https://github.com/swharden/Csharp-Data-Visualization/blob/203e024253a2545fc325d1f68d2861a1b9fac74d/projects/rotated-rectangle-intersection/Form1.cs)

* Thanks [@BambOoxX](https://github.com/BambOoxX) for suggesting this in [ScottPlot/PR#1616](https://github.com/ScottPlot/ScottPlot/pull/1616)

* [How to check if a point is inside a rectangle?](https://math.stackexchange.com/q/190403) (StackExchange)

* [swharden / C# Data Visualization](https://github.com/swharden/Csharp-Data-Visualization)
January 22nd, 2022

Spline Interpolation with C#

I recently had the need to create a smoothed curve from a series of X/Y data points in a C# application. I achieved this using cubic spline interpolation. I prefer this strategy because I can control the exact number of points in the output curve, and the generated curve (given sufficient points) will pass through the original data making it excellent for data smoothing applications.

The code below is an adaptation of original work by Ryan Seghers (links below) that I modified to narrow its scope, support double types, use modern language features, and operate statelessly in a functional style with all static methods.

  • It targets .NET Standard 2.0 so it can be used in .NET Framework and .NET Core applications.

  • Input Xs and Ys must be the same length but do not need to be ordered.

  • The interpolated curve may have any number of points (not just even multiples of the input length), and may even have fewer points than the original data.

  • Users cannot define start or end slopes so the curve generated is a natural spline.

using System;
using System.Linq;

public static class Cubic
{
    /// <summary>
    /// Generate a smooth (interpolated) curve that follows the path of the given X/Y points
    /// </summary>
    public static (double[] xs, double[] ys) InterpolateXY(double[] xs, double[] ys, int count)
    {
        if (xs is null || ys is null || xs.Length != ys.Length)
            throw new ArgumentException($"{nameof(xs)} and {nameof(ys)} must have same length");

        int inputPointCount = xs.Length;
        double[] inputDistances = new double[inputPointCount];
        for (int i = 1; i < inputPointCount; i++)
        {
            double dx = xs[i] - xs[i - 1];
            double dy = ys[i] - ys[i - 1];
            double distance = Math.Sqrt(dx * dx + dy * dy);
            inputDistances[i] = inputDistances[i - 1] + distance;
        }

        double meanDistance = inputDistances.Last() / (count - 1);
        double[] evenDistances = Enumerable.Range(0, count).Select(x => x * meanDistance).ToArray();
        double[] xsOut = Interpolate(inputDistances, xs, evenDistances);
        double[] ysOut = Interpolate(inputDistances, ys, evenDistances);
        return (xsOut, ysOut);
    }

    private static double[] Interpolate(double[] xOrig, double[] yOrig, double[] xInterp)
    {
        (double[] a, double[] b) = FitMatrix(xOrig, yOrig);

        double[] yInterp = new double[xInterp.Length];
        for (int i = 0; i < yInterp.Length; i++)
        {
            int j;
            for (j = 0; j < xOrig.Length - 2; j++)
                if (xInterp[i] <= xOrig[j + 1])
                    break;

            double dx = xOrig[j + 1] - xOrig[j];
            double t = (xInterp[i] - xOrig[j]) / dx;
            double y = (1 - t) * yOrig[j] + t * yOrig[j + 1] +
                t * (1 - t) * (a[j] * (1 - t) + b[j] * t);
            yInterp[i] = y;
        }

        return yInterp;
    }

    private static (double[] a, double[] b) FitMatrix(double[] x, double[] y)
    {
        int n = x.Length;
        double[] a = new double[n - 1];
        double[] b = new double[n - 1];
        double[] r = new double[n];
        double[] A = new double[n];
        double[] B = new double[n];
        double[] C = new double[n];

        double dx1, dx2, dy1, dy2;

        dx1 = x[1] - x[0];
        C[0] = 1.0f / dx1;
        B[0] = 2.0f * C[0];
        r[0] = 3 * (y[1] - y[0]) / (dx1 * dx1);

        for (int i = 1; i < n - 1; i++)
        {
            dx1 = x[i] - x[i - 1];
            dx2 = x[i + 1] - x[i];
            A[i] = 1.0f / dx1;
            C[i] = 1.0f / dx2;
            B[i] = 2.0f * (A[i] + C[i]);
            dy1 = y[i] - y[i - 1];
            dy2 = y[i + 1] - y[i];
            r[i] = 3 * (dy1 / (dx1 * dx1) + dy2 / (dx2 * dx2));
        }

        dx1 = x[n - 1] - x[n - 2];
        dy1 = y[n - 1] - y[n - 2];
        A[n - 1] = 1.0f / dx1;
        B[n - 1] = 2.0f * A[n - 1];
        r[n - 1] = 3 * (dy1 / (dx1 * dx1));

        double[] cPrime = new double[n];
        cPrime[0] = C[0] / B[0];
        for (int i = 1; i < n; i++)
            cPrime[i] = C[i] / (B[i] - cPrime[i - 1] * A[i]);

        double[] dPrime = new double[n];
        dPrime[0] = r[0] / B[0];
        for (int i = 1; i < n; i++)
            dPrime[i] = (r[i] - dPrime[i - 1] * A[i]) / (B[i] - cPrime[i - 1] * A[i]);

        double[] k = new double[n];
        k[n - 1] = dPrime[n - 1];
        for (int i = n - 2; i >= 0; i--)
            k[i] = dPrime[i] - cPrime[i] * k[i + 1];

        for (int i = 1; i < n; i++)
        {
            dx1 = x[i] - x[i - 1];
            dy1 = y[i] - y[i - 1];
            a[i - 1] = k[i - 1] * dx1 - dy1;
            b[i - 1] = -k[i] * dx1 + dy1;
        }

        return (a, b);
    }
}

Usage

This sample .NET 6 console application uses the class above to create a smoothed (interpolated) curve from a set of random X/Y points. It then plots the original data and the interpolated curve using ScottPlot.

// generate sample data using a random walk
Random rand = new(1268);
int pointCount = 20;
double[] xs1 = new double[pointCount];
double[] ys1 = new double[pointCount];
for (int i = 1; i < pointCount; i++)
{
    xs1[i] = xs1[i - 1] + rand.NextDouble() - .5;
    ys1[i] = ys1[i - 1] + rand.NextDouble() - .5;
}

// Use cubic interpolation to smooth the original data
(double[] xs2, double[] ys2) = Cubic.InterpolateXY(xs1, ys1, 200);

// Plot the original vs. interpolated data
var plt = new ScottPlot.Plot(600, 400);
plt.AddScatter(xs1, ys1, label: "original", markerSize: 7);
plt.AddScatter(xs2, ys2, label: "interpolated", markerSize: 3);
plt.Legend();
plt.SaveFig("interpolation.png");

Additional Interpolation Methods

There are many different methods that can smooth data. Common methods include Bézier splines, Catmull-Rom splines, corner-cutting Chaikin curves, and Cubic splines. I recently implemented these strategies to include with ScottPlot (an MIT-licensed 2D plotting library for .NET). Visit ScottPlot.net to find the source code for that project and search for the Interpolation namespace.

Resources

Markdown source code last modified on January 26th, 2022
---
Title: Spline Interpolation with C# 
Description: How to smooth X/Y data using spline interpolation in Csharp
Date: 2022-01-22 4:00PM EST
Tags: csharp
---

# Spline Interpolation with C# 

**I recently had the need to create a smoothed curve from a series of X/Y data points in a C# application.** I achieved this using cubic [spline interpolation](https://en.wikipedia.org/wiki/Spline_interpolation). I prefer this strategy because I can control the exact number of points in the output curve, and the generated curve (given sufficient points) will pass through the original data making it excellent for data smoothing applications.

<div class='text-center'>

![](screenshot.gif)

</div>

The code below is an adaptation of original work by Ryan Seghers (links below) that I modified to narrow its scope, support `double` types, use modern language features, and operate statelessly in a functional style with all `static` methods.

* It targets `.NET Standard 2.0` so it can be used in .NET Framework and .NET Core applications.

* Input `Xs` and `Ys` must be the same length but do not need to be ordered.

* The interpolated curve may have any number of points (not just even multiples of the input length), and may even have fewer points than the original data.

* Users cannot define start or end slopes so the curve generated is a _natural_ spline.

```cs
using System;
using System.Linq;

public static class Cubic
{
    /// <summary>
    /// Generate a smooth (interpolated) curve that follows the path of the given X/Y points
    /// </summary>
    public static (double[] xs, double[] ys) InterpolateXY(double[] xs, double[] ys, int count)
    {
        if (xs is null || ys is null || xs.Length != ys.Length)
            throw new ArgumentException($"{nameof(xs)} and {nameof(ys)} must have same length");

        int inputPointCount = xs.Length;
        double[] inputDistances = new double[inputPointCount];
        for (int i = 1; i < inputPointCount; i++)
        {
            double dx = xs[i] - xs[i - 1];
            double dy = ys[i] - ys[i - 1];
            double distance = Math.Sqrt(dx * dx + dy * dy);
            inputDistances[i] = inputDistances[i - 1] + distance;
        }

        double meanDistance = inputDistances.Last() / (count - 1);
        double[] evenDistances = Enumerable.Range(0, count).Select(x => x * meanDistance).ToArray();
        double[] xsOut = Interpolate(inputDistances, xs, evenDistances);
        double[] ysOut = Interpolate(inputDistances, ys, evenDistances);
        return (xsOut, ysOut);
    }

    private static double[] Interpolate(double[] xOrig, double[] yOrig, double[] xInterp)
    {
        (double[] a, double[] b) = FitMatrix(xOrig, yOrig);

        double[] yInterp = new double[xInterp.Length];
        for (int i = 0; i < yInterp.Length; i++)
        {
            int j;
            for (j = 0; j < xOrig.Length - 2; j++)
                if (xInterp[i] <= xOrig[j + 1])
                    break;

            double dx = xOrig[j + 1] - xOrig[j];
            double t = (xInterp[i] - xOrig[j]) / dx;
            double y = (1 - t) * yOrig[j] + t * yOrig[j + 1] +
                t * (1 - t) * (a[j] * (1 - t) + b[j] * t);
            yInterp[i] = y;
        }

        return yInterp;
    }

    private static (double[] a, double[] b) FitMatrix(double[] x, double[] y)
    {
        int n = x.Length;
        double[] a = new double[n - 1];
        double[] b = new double[n - 1];
        double[] r = new double[n];
        double[] A = new double[n];
        double[] B = new double[n];
        double[] C = new double[n];

        double dx1, dx2, dy1, dy2;

        dx1 = x[1] - x[0];
        C[0] = 1.0f / dx1;
        B[0] = 2.0f * C[0];
        r[0] = 3 * (y[1] - y[0]) / (dx1 * dx1);

        for (int i = 1; i < n - 1; i++)
        {
            dx1 = x[i] - x[i - 1];
            dx2 = x[i + 1] - x[i];
            A[i] = 1.0f / dx1;
            C[i] = 1.0f / dx2;
            B[i] = 2.0f * (A[i] + C[i]);
            dy1 = y[i] - y[i - 1];
            dy2 = y[i + 1] - y[i];
            r[i] = 3 * (dy1 / (dx1 * dx1) + dy2 / (dx2 * dx2));
        }

        dx1 = x[n - 1] - x[n - 2];
        dy1 = y[n - 1] - y[n - 2];
        A[n - 1] = 1.0f / dx1;
        B[n - 1] = 2.0f * A[n - 1];
        r[n - 1] = 3 * (dy1 / (dx1 * dx1));

        double[] cPrime = new double[n];
        cPrime[0] = C[0] / B[0];
        for (int i = 1; i < n; i++)
            cPrime[i] = C[i] / (B[i] - cPrime[i - 1] * A[i]);

        double[] dPrime = new double[n];
        dPrime[0] = r[0] / B[0];
        for (int i = 1; i < n; i++)
            dPrime[i] = (r[i] - dPrime[i - 1] * A[i]) / (B[i] - cPrime[i - 1] * A[i]);

        double[] k = new double[n];
        k[n - 1] = dPrime[n - 1];
        for (int i = n - 2; i >= 0; i--)
            k[i] = dPrime[i] - cPrime[i] * k[i + 1];

        for (int i = 1; i < n; i++)
        {
            dx1 = x[i] - x[i - 1];
            dy1 = y[i] - y[i - 1];
            a[i - 1] = k[i - 1] * dx1 - dy1;
            b[i - 1] = -k[i] * dx1 + dy1;
        }

        return (a, b);
    }
}
```
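
The reason the input does not need to be ordered is that `InterpolateXY()` interpolates X and Y separately, each as a function of the distance travelled along the path. That first step is isolated in the sketch below. `PathDistanceSketch` and its method names are illustrative and not part of the class above.

```cs
using System;
using System.Linq;

public static class PathDistanceSketch
{
    // Cumulative distance travelled along the path at each input point.
    public static double[] Cumulative(double[] xs, double[] ys)
    {
        double[] distances = new double[xs.Length];
        for (int i = 1; i < xs.Length; i++)
        {
            double dx = xs[i] - xs[i - 1];
            double dy = ys[i] - ys[i - 1];
            distances[i] = distances[i - 1] + Math.Sqrt(dx * dx + dy * dy);
        }
        return distances;
    }

    // count evenly spaced distances from 0 to the total path length (count must be at least 2).
    public static double[] EvenlySpaced(double totalLength, int count)
        => Enumerable.Range(0, count).Select(i => i * totalLength / (count - 1)).ToArray();
}
```

Two edge cases follow from this design. Two identical consecutive input points produce a zero-length step, which leads to a division by zero when the spline is fitted. A `count` below 2 also divides by zero when the spacing is calculated. Callers may want to guard against both.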

## Usage

This sample .NET 6 console application uses the class above to create a smoothed (interpolated) curve from a set of random X/Y points. It then plots the original data and the interpolated curve using [ScottPlot](https://scottplot.net).

```cs
// generate sample data using a random walk
Random rand = new(1268);
int pointCount = 20;
double[] xs1 = new double[pointCount];
double[] ys1 = new double[pointCount];
for (int i = 1; i < pointCount; i++)
{
    xs1[i] = xs1[i - 1] + rand.NextDouble() - .5;
    ys1[i] = ys1[i - 1] + rand.NextDouble() - .5;
}

// Use cubic interpolation to smooth the original data
(double[] xs2, double[] ys2) = Cubic.InterpolateXY(xs1, ys1, 200);

// Plot the original vs. interpolated data
var plt = new ScottPlot.Plot(600, 400);
plt.AddScatter(xs1, ys1, label: "original", markerSize: 7);
plt.AddScatter(xs2, ys2, label: "interpolated", markerSize: 3);
plt.Legend();
plt.SaveFig("interpolation.png");
```

<div class='text-center'>

![](interpolation.png)

</div>

## Additional Interpolation Methods

There are many different methods that can smooth data. Common methods include [Bézier splines](https://en.wikipedia.org/wiki/B%C3%A9zier_curve), [Catmull-Rom splines](https://www.cs.cmu.edu/~fp/courses/graphics/asst5/catmullRom.pdf), [corner-cutting Chaikin curves](https://www.cs.unc.edu/~dm/UNC/COMP258/LECTURES/Chaikins-Algorithm.pdf), and [Cubic splines](https://en.wikipedia.org/wiki/Spline_interpolation). I recently implemented these strategies to include with ScottPlot (an MIT-licensed 2D plotting library for .NET). Visit [ScottPlot.net](https://ScottPlot.NET) to find the source code for that project and search for the `Interpolation` namespace.

<div class='text-center'>

![](csharp-spline-interpolation.png)

</div>
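
Of the methods listed above, Chaikin's corner cutting is the simplest to sketch. The version below is a minimal illustration for an open path and is not the ScottPlot implementation. `ChaikinSketch` and `Smooth` are hypothetical names.

```cs
using System.Collections.Generic;

public static class ChaikinSketch
{
    // One pass of corner cutting on an open path: each segment contributes
    // two new points at 1/4 and 3/4 of its length, and the end points are kept.
    public static (double x, double y)[] Smooth((double x, double y)[] points)
    {
        if (points.Length < 3)
            return points;

        var result = new List<(double x, double y)> { points[0] };
        for (int i = 0; i < points.Length - 1; i++)
        {
            var (x1, y1) = points[i];
            var (x2, y2) = points[i + 1];
            result.Add((0.75 * x1 + 0.25 * x2, 0.75 * y1 + 0.25 * y2));
            result.Add((0.25 * x1 + 0.75 * x2, 0.25 * y1 + 0.75 * y2));
        }
        result.Add(points[^1]);
        return result.ToArray();
    }
}
```

Calling `Smooth()` repeatedly on its own output rounds the corners further. Unlike the cubic spline, the resulting curve does not pass through the interior input points.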

## Resources
* [Cubic Spline Interpolation source code](https://github.com/SCToolsfactory/SCJMapper-V2/blob/master/OGL/CubicSpline.cs) by Ryan Seghers (MIT license)
* [C# Cubic Spline Interpolation article](https://www.codeproject.com/Articles/560163/Csharp-Cubic-Spline-Interpolation) by Ryan Seghers (Code Project)
* [Numerical Recipes in C++: Cubic Spline Interpolation](http://www.foo.be/docs-free/Numerical_Recipe_In_C/c3-3.pdf)
* [Fast Cubic Spline Interpolation](https://arxiv.org/pdf/2001.09253.pdf) by Haysn Hornbeck
* Download this project from [C# Data Visualization](https://github.com/swharden/Csharp-Data-Visualization) on GitHub
* [Bézier Spline Interpolation](http://scaledinnovation.com/analytics/splines/aboutSplines.html)