SWHarden.com

The personal website of Scott W Harden

Representing Images in Memory

quick reference for programmers interested in working with image data in memory (byte arrays)

This page is a quick reference for programmers interested in working with image data in memory (byte arrays). This topic is straightforward overall, but there are a few traps that aren’t necessarily intuitive so I try my best to highlight those here.

💡 See my newer article, Creating Bitmaps from Scratch in C#

This article assumes you have some programming experience working with byte arrays in a C-type language and have an understanding of what is meant by 32-bit, 24-bit, 16-bit, and 8-bit integers.

Pixel Values

An image is composed of a 2D grid of square pixels, and the type of image greatly influences how much memory each pixel occupies and what format its data is in.

Bits per pixel (bpp) is the number of bits it takes to represent the value a single pixel. This is typically a multiple of 8 bits (1 byte).

Common Pixel Formats

There are others (e.g., 64-bit RGB images), but these are the most typically encountered pixel formats.

Endianness

Endianness describes the order of bytes in a multi-byte value uses to store its data:

Assuming array index values ascend from left to right, 32-bit (4-byte) pixel data can be represented using either of these two formats in memory:

Bitmap images use little-endian integer format! New programmers may expect the bytes that contain “RGB” values to be sequenced in the order “R, G, B”, but this is not the case.

Premultiplied Alpha

Premultiplication refers to the relationship between color (R, G, B) and transparency (alpha). In transparent images the alpha channel may be straight (unassociated) or premultiplied (associated).

With straight alpha, the RGB components represent the full-intensity color of the object or pixel, disregarding its opacity. Later R, G, and B will each be multiplied by the alpha to adjust intensity and transparency.

With premultiplied alpha, the RGB components represent the emission (color and intensity) of each pixel, and the alpha only represents transparency (occlusion of what is behind it). This reduces the computational performance for image processing if transparency isn’t actually used.

In C# using System.Drawing premultiplied alpha is not enabled by default. This must be defined when creating new Bitmap as seen here:

var bmp = new Bitmap(400, 300, PixelFormat.Format32bppPArgb);

Benchmarking reveals the performance enhancement of drawing on bitmaps in memory using premultiplied alpha pixel format. In this test I’m using .NET 5 with the System.Drawing.Common NuGet package. Anti-aliasing is off in this example, but similar results were obtained with it enabled.

Random rand = new(0);
int width = 600;
int height = 400;
var bmp = new Bitmap(600, 400, PixelFormat.Format32bppPArgb);
var gfx = Graphics.FromImage(bmp);
var pen = new Pen(Color.Black);
gfx.Clear(Color.Magenta);
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1e6; i++)
{
    pen.Color = Color.FromArgb(rand.Next());
    gfx.DrawLine(pen, rand.Next(width), rand.Next(height), rand.Next(width), rand.Next(height));
}
Console.WriteLine(sw.Elapsed);
bmp.Save("benchmark.png", ImageFormat.Png);

Time to render 1 million frames:

At the end you have a beautiful figure:

Pixel Locations in Space and Memory

A 2D image is composed of pixels, but addressing them in memory isn’t as trivial as it may seem. The dimensions of bitmaps are stored in their header, and the arrangement of pixels forms rows (left-to-right) then columns (top-to-bottom).

Width and height are the dimensions (in pixels) of the visible image, but…

⚠️ Image size in memory is not just width * height * bytesPerPixel

Because of old hardware limitations, bitmap widths in memory (also called the stride) must be multiplies of 4 bytes. This is effortless when using ARGB formats because each pixel is already 4 bytes, but when working with RGB images it’s possible to have images with an odd number of bytes in each row, requiring data to be padded such that the stride length is a multiple of 4.

// calculate stride length of a bitmap row in memory
int stride = 4 * ((imageWidth * bytesPerPixel + 3) / 4);

Working with Bitmap Bytes in C#

This example demonstrates how to convert a 3D array (X, Y, C) into a flat byte array ready for copying into a bitmap. Notice this code adds padding to the image width to ensure the stride is a multiple of 4 bytes. Notice also the integer encoding is little endian.

public static byte[] GetBitmapBytes(byte[,,] input)
{
    int height = input.GetLength(0);
    int width = input.GetLength(1);
    int bpp = input.GetLength(2);
    int stride = 4 * ((width * bpp + 3) / 4);

    byte[] pixelsOutput = new byte[height * stride];
    byte[] output = new byte[height * stride];

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            for (int z = 0; z < bpp; z++)
                output[y * stride + x * bpp + (bpp - z - 1)] = input[y, x, z];

    return output;
}

For completeness, here’s the complimentary code that converts a flat byte array from bitmap memory to a 3D array (assuming we know the image dimensions and bytes per pixel from reading the image header):

public static byte[,,] GetBitmapBytes3D(byte[] input, int width, int height, int bpp)
{
    int stride = 4 * ((width * bpp + 3) / 4);

    byte[,,] output = new byte[height, width, bpp];
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            for (int z = 0; z < bpp; z++)
                output[y, x, z] = input[stride * y + x * bpp + (bpp - z - 1)];

    return output;
}

Marshalling Bytes in and out of Bitmaps

The code examples above are intentionally simple to focus on the location of pixels in memory and the endianness of their values. To actually convert between byte[] and System.Drawing.Bitmap you must use Marshall.Copy as shown:

public static byte[] BitmapToBytes(Bitmap bmp)
{
    Rectangle rect = new(0, 0, bmp.Width, bmp.Height);
    BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadWrite, bmp.PixelFormat);
    int byteCount = Math.Abs(bmpData.Stride) * bmp.Height;
    byte[] bytes = new byte[byteCount];
    Marshal.Copy(bmpData.Scan0, bytes, 0, byteCount);
    bmp.UnlockBits(bmpData);
    return bytes;
}
public static Bitmap BitmapFromBytes(byte[] bytes, PixelFormat bmpFormat)
{
    Bitmap bmp = new(width, height, bmpFormat);
    var rect = new Rectangle(0, 0, width, height);
    BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, bmpFormat);
    Marshal.Copy(bytes, 0, bmpData.Scan0, bytes.Length);
    bmp.UnlockBits(bmpData);
    return bmp;
}

How to Create a Bitmap in Memory Without a Graphics Library

The code examples above use System.Drawing.Common to create graphics, but creating bitmaps in a byte[] array is not difficult and can be done in any language. See the Creating Bitmaps from Scratch article for more information.

Reference


Deploy a Website with Python and FTPS

How to use Python, keyring, and TLS to securely deploy website content using FTP (FTPS)

Python can be used to securely deploy website content using FTPS. Many people have used a FTP client like FileZilla to drag-and-drop content from their local computer to a web server, but this method requires manual clicking and is error-prone. If you write a script to accomplish this task it lowers the effort barrier for deployment (encouraging smaller iterations) and reduces the risk you’ll accidentally do something unintentional (like deleting an important folder by accident).

This post reviews how I use Python, keyring, and TLS to securely manage login credentials and deploy builds from my local computer to a remote server using FTP. The strategy discussed here will be most useful in servers that use the LAMP stack, and it’s worth noting that .NET and Node have their own deployment paradigms. I hope you find the code on this page useful, but you should carefully review your deployment script and create something specific to your needs. Just as you could accidentally delete an important folder using a graphical client, an incorrectly written deployment script could cause damage to your website or leak secrets.

Use Keyring to Manage Your Password

I recently wrote about several ways to manage credentials with Python.

In these examples I will use the keyring package to store and recall my FTP password securely.

pip install keyring

Storing Credentials

Store your password using an interactive interpreter to ensure you don’t accidentally save it in a plain text file somewhere. This only needs to be done once.

>>> import keyring
>>> keyring.set_password("system", "me@swharden.com", "P455W0RD")

Recalling Credentials

import keyring
hostname = "swharden.com"
username = "me@swharden.com"
password = keyring.get_password("system", username)

FTP vs. FTPS vs. SFTP

FTP was not designed to be a secure - it transfers login credentials in plain text! Traditionally FTP in Python was achieved using ftplib.FTP from the standard library, but logging-in using this protocol allows anyone sniffing traffic on your network to capture your password. In Python 2.7 ftplib.FTP_TLS was added which adds transport layer security to FTP (FTPS), improving protection of your login credentials.

# ⚠️ This is insecure (password transferred in plain text)
from ftplib import FTP
with FTP(hostname, username, password) as ftp:
    print(ftp.nlst())
# 👍 This is better (password is encrypted)
from ftplib import FTP_TLS
with FTP_TLS(hostname, username, password) as ftps:
    print(ftps.nlst())

By default ftplib.FTP_TLS only encrypts the username and password. You can call prot_p() to encrypt all transferred data, but in this post I’m only interested in encrypting my login credentials.

FTP over SSL (FTPS) is different than FTP over SSH (SFTP), but both use encryption to transfer usernames and passwords, making them superior to traditional FTP which transfers these in plain text.

Recursively Delete a Folder with FTP

This method deletes each of the contents of a folder, then deletes the folder itself. If one of the contents is a subfolder, it calls itself. This example uses modern Python practices, favoring pathlib over os.path.

Note that I define the remote path using pathlib.PurePosixPath() to ensure it’s formatted as a unix-type path since my remote server is a Linux machine.

import ftplib
import pathlib

def removeRecursively(ftp: ftplib.FTP, remotePath: pathlib.PurePath):
    """
    Remove a folder and all its contents from a FTP server
    """

    def removeFile(remoteFile):
        print(f"DELETING FILE {remoteFile}")
        ftp.delete(str(remoteFile))

    def removeFolder(remoteFolder):
        print(f"DELETING FOLDER {remoteFolder}/")
        ftp.rmd(str(remoteFolder))

    for (name, properties) in ftp.mlsd(remotePath):
        fullpath = remotePath.joinpath(name)
        if name == '.' or name == '..':
            continue
        elif properties['type'] == 'file':
            removeFile(fullpath)
        elif properties['type'] == 'dir':
            removeRecursively(ftp, fullpath)

    removeFolder(remotePath)

if __name__ == "__main__":
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        removeFolder(ftps, remotePath)

Recursively Upload a Folder with FTP

This method recursively uploads a local folder tree to the FTP server. It first creates the folder tree, then uploads all files individually. This example uses modern Python practices, favoring pathlib over os.walk() and os.path.

Like before I define the remote path using pathlib.PurePosixPath() since the server is running Linux, but I can use pathlib.Path() for the local path and it will auto-detect how to format it based on which system I’m currently running on.

import ftplib
import pathlib

def uploadRecursively(ftp: ftplib.FTP, remoteBase: pathlib.PurePath, localBase: pathlib.PurePath):
    """
    Upload a local folder to a remote path on a FTP server
    """

    def remoteFromLocal(localPath: pathlib.PurePath):
        pathParts = localPath.parts[len(localBase.parts):]
        return remoteBase.joinpath(*pathParts)

    def uploadFile(localFile: pathlib.PurePath):
        remoteFilePath = remoteFromLocal(localFile)
        print(f"UPLOADING FILE {remoteFilePath}")
        with open(localFile, 'rb') as localBinary:
            ftp.storbinary(f"STOR {remoteFilePath}", localBinary)

    def createFolder(localFolder: pathlib.PurePath):
        remoteFolderPath = remoteFromLocal(localFolder)
        print(f"CREATING FOLDER {remoteFolderPath}/")
        ftp.mkd(str(remoteFolderPath))

    createFolder(localBase)
    for localFolder in [x for x in localBase.glob("**/*") if x.is_dir()]:
        createFolder(localFolder)
    for localFile in [x for x in localBase.glob("**/*") if not x.is_dir()]:
        uploadFile(localFile)

if __name__ == "__main__":
    localPath = pathlib.Path(R'C:\my\project\folder')
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        uploadRecursively(ftps, remotePath, localPath)

Minimize Disruption by Renaming

Because walking remote folder trees deleting and upload files can be slow, this process may be disruptive to a website with live traffic. For low-traffic websites this isn’t an issue, but as traffic increases (or the size of the deployment increases) it may be worth considering how to achieve the swap faster.

An improved method of deployment could involve uploading the new website to a temporary folder, switching the names of the folders, then deleting the old folder. There is brief downtime between the two FTP rename calls.

remotePath = "/live"
remotePathNew = "/live-new"
remotePathOld = "/live-old"
localPath = R"C:\dev\site\live"

upload(localPath, remotePathNew)
ftpRename(remotePath, remotePathOld)
ftpRename(remotePathNew, remotePath)
delete(remotePathOld)

Speed could be improved by handling the renaming with a shell script that runs on the server. This would require some coordination to execute though, but is worth considering. It could be executed by a HTTP endpoint.

mv /live /live-old;
mv /live-new /live;
rm -rf /live-old;

Deploy a React App with FTP and Python

You can automate deployment of a React project using Python and FTPS. After creating a new React app add a deploy.py in the project folder that uses FTPS to upload the build folder to the server, then edit your project’s package.json to add predeploy and deploy commands.

  "scripts": {
    "start": "react-scripts start",
    "build": "react-scripts build",
    "test": "react-scripts test",
    "eject": "react-scripts eject",
    "predeploy": "npm run build",
    "deploy" : "python deploy.py"
  },

Then you can create a production build and deploy with one command:

npm run deploy

Consider Using Git to Deploy Content

This post focused on how to automate uploading local content to a remote server using FTP, but don’t overlook the possibility that this may not be the best method for deployment for your application.

You can maintain a website as a git repository and use git pull on the server to update it. GitHub Actions can be used to trigger the pull step automatically using an HTTP endpoint. If this method is available to you, it should be strongly considered.

This method is very popular, but it (1) requires git to be on the server and (2) requires all the build tools/languages to be available on the server if a build step is required. I’m reminded that only SiteGround’s most expensive shared hosting plan even has git available at all.

Resources


Managing Credentials with Python

How to safely work with secret passwords in python scripts that are committed to source control

I enjoy contributing to open-source, but I’d prefer to keep my passwords to myself! Python is a great glue language for automating tasks, and recently I’ve been using it to log in to my web server using SFTP and automate log analysis, file management, and software updates. The Python scripts I’m working on need to know my login information, but I want to commit them to source control and share them on GitHub so I have to be careful to use a strategy which minimizes risk of inadvertently leaking these secrets onto the internet.

This post explores various options for managing credentials in Python scripts in public repositories. There are many different ways to manage credentials with Python, and I was surprised to learn of some new ones as I was researching this topic. This post reviews the most common options, starting with the most insecure and working its way up to the most highly regarded methods for managing secrets.

Plain-Text Passwords in Code

⚠️☠️ DANGER: Never do this

You could put a password or API key directly in your python script, but even if you intend to remove it later there’s always a chance you’ll accidentally commit it to source control without realizing it, posing a security risk forever. This method is to be avoided at all costs!

username = "myUsername"
password = "S3CR3T_P455W0RD"
logIn(username, password)

Obfuscated Passwords in Code

⚠️☠️ DANGER: Never do this

A slightly less terrible idea is to obfuscate plain-text passwords by storing them as base 64 strings. You won’t know the password just by seeing it, but anyone who has the string can easily decode it. Websites like https://www.base64decode.org are useful for this.

"""Demonstrate conversion to/from base 64"""

import base64

def obfuscate(plainText):
    plainBytes = plainText.encode('ascii')
    encodedBytes = base64.b64encode(plainBytes)
    encodedText = encodedBytes.decode('ascii')
    return encodedText


def deobfuscate(obfuscatedText):
    obfuscatedBytes = obfuscatedText.encode('ascii')
    decodedBytes = base64.b64decode(obfuscatedBytes)
    decodedText = decodedBytes.decode('ascii')
    return decodedText
original = "S3CR3T_P455W0RD"
obfuscated = obfuscate(original)
deobfuscated = deobfuscate(obfuscated)

print("original: " + original)
print("obfuscated: " + obfuscated)
print("deobfuscated: " + deobfuscated)
original: S3CR3T_P455W0RD
obfuscated: UzNDUjNUX1A0NTVXMFJE
deobfuscated: S3CR3T_P455W0RD

Passwords in Plain Text Files

⚠️ WARNING: This method is prone to mistakes. Ensure the text file is never committed to source control.

You could store username/password on the first two lines of a plain text file, then use python to read it when you need it.

with open("secrets.txt") as f:
    lines = f.readlines()
    username = lines[0].strip()
    password = lines[1].strip()
    print(f"USERNAME={username}, PASSWORD={password}")

If the text file is in the repository directory you should modify .gitignore to ensure it’s not tracked by source source control. There is a risk that you may forget to do this, exposing your credentials online! A better idea may be to place the secrets file outside your repository folder entirely.

💡 There are libraries which make this easier. One example is Python Decouple which implements a lot of this logic gracefully and can even combine settings from multiple files (e.g., .ini vs .env files) for environments that can benefit from more advanced configuration options. See the notes below about helper libraries that environment variables and .env files

Passwords in Python Modules

⚠️ WARNING: This method is prone to mistakes. Ensure the secrets module is never committed to source control.

Similar to a plain text file not tracked by source control (ideally outside the repository folder entirely), you could store passwords as variables in a Python module then import it.

from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")

If your secrets file is in an obscure folder, you will have to add it to your path so the module can be found when importing.

import sys
sys.path.append("C:/path/to/secrets/folder")

from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")

Don’t name your module secrets because the secrets module is part of the standard library and that will likely be imported in stead.

Passwords as Program Arguments

⚠️ WARNING: This method may store plain text passwords in your command history.

This isn’t a great idea because passwords are seen in plain text in the console and also may be stored in the command history. However, you’re unlikely to accidentally commit passwords to source control.

import sys
username = sys.argv[1]
password = sys.argv[2]
print(f"USERNAME={username}, PASSWORD={password}")
python test.py myUsername S3CR3T_P455W0RD

Type Passwords in the Console

You could request the user to type their password in the console, but the characters would be visible as they’re typed.

# ⚠️ This code displays the typed password
password = input("Password: ")

Python has a getpass module in its standard library made for prompting the user for passwords as console input. Unlike input(), characters are not visible as the password is typed.

# 👍 This code hides the typed password
import getpass
password = getpass.getpass('Password: ')

Extract Passwords from the Clipboard

This is an interesting method. It’s fast and simple, but a bit quirky. Downsides are (1) it requires the password to be in the clipboard which may expose it to other programs, (2) it requires installation of a nonstandard library, and (3) it won’t work easily in server environments.

Note that I trust pyperclip more than clipboard (which is just another developer wrapping pyperclip)

pip install pyperclip

Run after copying a password to the clipboard:

import pyperclip
password = pyperclip.paste()

Request Credentials with Tk

The Tk graphics library is a cross-platform graphical widget toolkit that comes with Python. A login window that collects username and password can be created programmatically and wrapped in a function for easily inclusion in scripts that otherwise don’t have a GUI.

I find this technique particularly useful when the username and password are stored in a password manager.

def getCredentials(defaultUser):
    """Request login credentials using a GUI."""
    import tkinter
    root = tkinter.Tk()
    root.eval('tk::PlaceWindow . center')
    root.title('Login')
    uv = tkinter.StringVar(root, value=defaultUser)
    pv = tkinter.StringVar(root, value='')
    userEntry = tkinter.Entry(root, bd=3, width=35, textvariable=uv)
    passEntry = tkinter.Entry(root, bd=3, width=35, show="*", textvariable=pv)
    btnClose = tkinter.Button(root, text="OK", command=root.destroy)
    userEntry.pack(padx=10, pady=5)
    passEntry.pack(padx=10, pady=5)
    btnClose.pack(padx=10, pady=5, side=tkinter.TOP, anchor=tkinter.NE)
    root.mainloop()
    return [uv.get(), pv.get()]
username, password = getCredentials("user@site.com")

Manage Passwords with a Keyring

The keyring package provides an easy way to access the system’s keyring service from python. On MacOS it uses Keychain, on Windows it uses the Windows Credential Locker, and on Linux it can use KDE’s KWallet or GNOME’s Secret Service.

Downsides of keyrings are (1) it requires a nonstandard library, (2) implementation may be OS-specific, (3) it may not function easily in cloud environments.

pip install keyring
# store the password once
import keyring
keyring.set_password("system", "myUsername", "S3CR3T_P455W0RD")
# recall the password at any time
import keyring
password = keyring.get_password("system", "myUsername")

Passwords in Environment Variables

Environment variables are one of the better ways of managing credentials with Python. There are many articles on this topic, including Twilio’s How To Set Environment Variables and Working with Environment Variables in Python. Environment variables are one of the preferred methods of credential management when working with cloud providers.

Be sure to restart your console session after editing environment variables before attempting to read them from within python.

import os
password = os.getenv('demoPassword')

There are many helper libraries such as python-dotenv and Python Decouple which can use local .env files to dynamically set environment variables as your program runs. As noted in previous sections, when storing passwords in plain-text in the file structure of your repository be extremely careful not to commit these files to source control!

Example .env file:

demoPassword2=superSecret

The dotenv package can load .env variables as environment variables when a Python script runs:

import dotenv
dotenv.load_dotenv()
password2 = os.getenv('demoPassword2')
print(password2)

Additional Resources

How do you manage credentials in Python? If you wish to share feedback or a creative method you use that I haven’t discussed above, send me an email and I can include your suggestions in this document.


WSPR Code Generator

An online resource for generating encoded WSPR messages

WSPR (Weak Signal Propagation Reporter) is a protocol for weak-signal radio communication. Transmissions encode a station’s callsign, location (grid square), and transmitter power into a frequency-shift-keyed (FSK) signal that hops between 4 frequencies to send 162 tones in just under two minutes. Signals can be decoded with S/N as low as −34 dB! WSJT-X, the most commonly-used decoding software, reports decoded signals to wsprnet.org so propagation can be assessed all over the world.

WsprSharp is a WSPR encoding library implemented in C#. I created this library learn more about the WSPR protocol. I also made a web application (using Blazor WebAssembly) to make it easy to generate WSPR transmissions online: WSPR Code Generator

WSPR Transmission Information

Resources


Reflection and XML Documentation in C#

How to access XML documentation from within a C# application

In C#, you can document your code using XML directly before code blocks. This XML documentation is used by Visual Studio to display tooltips and provide autocomplete suggestions.

/// <summary>
///  This method performs an important function.
/// </summary>
public void MyMethod() {}

To enable automatic generation of XML documentation on every build, add the following to your csproj file:

<PropertyGroup>
  <DocumentationFile>MyProgram.xml</DocumentationFile>
</PropertyGroup>

However, XML documentation is not actually metadata, so it is not available in the compiled assembly. In this post I’ll show how you can use System.Reflection to gather information about methods and combine it with documentation from an XML file read with System.Xml.Linq.XDocument.

I find this useful for writing code which automatically generates documentation. Reflection alone cannot provide comments, and XML documentation alone cannot provide parameter names (just their types). By combining reflection with XML documentation, we can more completely describe methods in C# programs.

⚠️ WARNING: These code examples are intentionally simple. They only demonstrate how to read summary comments from XML documentation for methods found using reflection. Additional functionality can be added as needed, and my intent here is to provide a simple starting point rather than overwhelmingly complex examples that support all features and corner cases.

💡 Source code can be downloaded at the bottom of this article

✔️ The “hack” described on this page is aimed at overcoming limitations of partially-documented XML. A better solution is to fully document your code, in which case the XML document is self-sufficient. The primary goal of these efforts is to use XML documentation where it is present, but use Reflection to fill in the blanks about methods which are undocumented or only partially documented. Perhaps a better strategy would be to have a fully documented code base, with an XML file containing <summary>, <returns>, and <param> for every parameter. Tests can ensure this is and remains the case.

What does the XML Documentation look like?

There are some nuances here you might not expect, especially related to arrays, generics, and nullables. Let’s start with a demo class with documented summaries. Keep in mind that the goal of this project is to help use Reflection to fill in the blanks about undocumented or partially-documented code, so this example will only add a <summary> but no <param> descriptions.

DemoClass.cs

/// <summary>
/// Display a name
/// </summary>
public static void ShowName(string name)
{
    Console.WriteLine($"Hi {name}");
}

/// <summary>
/// Display a name a certain number of times
/// </summary>
public static void ShowName(string name, byte repeats)
{
    for (int i = 0; i < repeats; i++)
        Console.WriteLine($"Hi {name}");
}

/// <summary>
/// Display the type of the variable passed in
/// </summary>
public static void ShowGenericType<T>(T myVar)
{
    Console.WriteLine($"Generic type {myVar.GetType()}");
}

/// <summary>
/// Display the value of a nullable integer
/// </summary>
public static void ShowNullableInt(int? myInt)
{
    Console.WriteLine(myInt);
}

XML Documentation File

There are a few important points to notice here:

<?xml version="1.0"?>
<doc>
    <assembly>
        <name>XmlDocDemo</name>
    </assembly>
    <members>
        <member name="M:XmlDocDemo.DemoClass.ShowName(System.String)">
            <summary>
            Display a name
            </summary>
        </member>
        <member name="M:XmlDocDemo.DemoClass.ShowName(System.String,System.Byte)">
            <summary>
            Display a name a certain number of times
            </summary>
        </member>
        <member name="M:XmlDocDemo.DemoClass.ShowGenericType``1(``0)">
            <summary>
            Display the type of the variable passed in
            </summary>
        </member>
        <member name="M:XmlDocDemo.DemoClass.ShowNullableInt(System.Nullable{System.Int32})">
            <summary>
            Display the value of a nullable integer
            </summary>
        </member>
    </members>
</doc>

XML Name Details

Thanks Zachary Patten for sharing these details in an MSDN article and e-mail correspondence

Read XML Documentation File

This code reads the XML documentation file (using the modern XDocument) and stores method summaries in a Dictionary using the XML method name as a key. This dictionary will be accessed later to look-up documentation for methods found using Reflection.

private readonly Dictionary<string, string> MethodSummaries = new Dictionary<string, string>();

public XmlDoc(string xmlFile)
{
    XDocument doc = XDocument.Load(xmlFile);
    foreach (XElement element in doc.Element("doc").Element("members").Elements())
    {
        string xmlName = element.Attribute("name").Value;
        string xmlSummary = element.Element("summary").Value.Trim();
        MethodSummaries[xmlName] = xmlSummary;
    }
}

Determine XML Method Name for a Reflected Method

This example code returns the XML member name for a method found by reflection. This is the key step required to connect reflected methods with their descriptions in XML documentation files.

⚠️ Warning: This code sample may not support all corner-cases, but in practice I found it supports all of the ones I typically encounter in my code bases and it’s a pretty good balance between functionality and simplicity.

public static string GetXmlName(MethodInfo info)
{
	string declaringTypeName = info.DeclaringType.FullName;

	if (declaringTypeName is null)
		throw new NotImplementedException("inherited classes are not supported");

	string xmlName = "M:" + declaringTypeName + "." + info.Name;
	xmlName = string.Join("", xmlName.Split(']').Select(x => x.Split('[')[0]));
	xmlName = xmlName.Replace(",", "");

	if (info.IsGenericMethod)
		xmlName += "``#";

	int genericParameterCount = 0;
	List<string> paramNames = new List<string>();
	foreach (var parameter in info.GetParameters())
	{
		Type paramType = parameter.ParameterType;
		string paramName = GetXmlNameForMethodParameter(paramType);
		if (paramName.Contains("#"))
			paramName = paramName.Replace("#", (genericParameterCount++).ToString());
		paramNames.Add(paramName);
	}
	xmlName = xmlName.Replace("#", genericParameterCount.ToString());

	if (paramNames.Any())
		xmlName += "(" + string.Join(",", paramNames) + ")";

	return xmlName;
}

private static string GetXmlNameForMethodParameter(Type type)
{
	string xmlName = type.FullName ?? type.BaseType.FullName;
	bool isNullable = xmlName.StartsWith("System.Nullable");
	Type nullableType = isNullable ? type.GetGenericArguments()[0] : null;

	// special formatting for generics (also Func, Nullable, and ValueTulpe)
	if (type.IsGenericType)
	{
		var genericNames = type.GetGenericArguments().Select(x => GetXmlNameForMethodParameter(x));
		var typeName = type.FullName.Split('`')[0];
		xmlName = typeName + "{" + string.Join(",", genericNames) + "}";
	}

	// special case for generic nullables
	if (type.IsGenericType && isNullable && type.IsArray == false)
		xmlName = "System.Nullable{" + nullableType.FullName + "}";

	// special case for multidimensional arrays
	if (type.IsArray && (type.GetArrayRank() > 1))
	{
		string arrayName = type.FullName.Split('[')[0].Split('`')[0];
		if (isNullable)
			arrayName += "{" + nullableType.FullName + "}";
		string arrayContents = string.Join(",", Enumerable.Repeat("0:", type.GetArrayRank()));
		xmlName = arrayName + "[" + arrayContents + "]";
	}

	// special case for generic arrays
	if (type.IsArray && type.FullName is null)
		xmlName = "``#[]";

	// special case for value types
	if (xmlName.Contains("System.ValueType"))
		xmlName = "`#";

	return xmlName;
}

Get XML Documentation for a Reflected Method

Now that we have XmlName(), we can easily iterate through reflected methods and get their XML documentation.

// use Reflection to get info from custom methods
var infos = typeof(DemoClass).GetMethods()
                             .Where(x => x.DeclaringType.FullName != "System.Object")
                             .ToArray();

// display XML info about each reflected method
foreach (MethodInfo mi in infos)
{
    string xmlName = XmlName(mi);
    Console.WriteLine("");
    Console.WriteLine("Method: " + XmlDoc.MethodSignature(mi));
    Console.WriteLine("XML Name: " + xmlName);
    Console.WriteLine("XML Summary: " + MethodSummaries[xmlName]);
}

Output

Method: XmlDocDemo.DemoClass.ShowName(string name)
XML Name: M:XmlDocDemo.DemoClass.ShowName(System.String)
XML Summary: Display a name

Method: XmlDocDemo.DemoClass.ShowName(string name, byte repeats)
XML Name: M:XmlDocDemo.DemoClass.ShowName(System.String,System.Byte)
XML Summary: Display a name a certain number of times

Method: XmlDocDemo.DemoClass.ShowGenericType<T>(T myVar)
XML Name: M:XmlDocDemo.DemoClass.ShowGenericType``1(``0)
XML Summary: Display the type of the variable passed in

Method: XmlDocDemo.DemoClass.ShowNullableInt(int? myInt)
XML Name: M:XmlDocDemo.DemoClass.ShowNullableInt(System.Nullable{System.Int32})
XML Summary: Display the value of a nullable integer

Resources

Source Code

A simple-case working demo of these concepts can be downloaded here:

Documentation Generators

Zachary Patten’s Useful Article

There is an extensive article on this topic in the October 2019 issue of MSDN Magazine, Accessing XML Documentation via Reflection by Zachary Patten. The code examples there provide a lot of advanced features, but are technically incomplete and some critical components are only shown using pseudocode. The reader is told that full code is available as part of the author’s library Towel, but this library is extensive and provides many functions unrelated to reflection and XML documentation making it difficult to navigate. The method to convert a method to its XML documentation name is Towel/Meta.cs#L1026-L1092, but it’s coupled to other code which requires hash maps to be pre-formed in order to use it. My post here is intended to be self-contained simple reference for how to combine XML documentation with Reflection, but users interested in reading further on this topic are encouraged to read Zachary’s article.

Update: Potentially Useful Libraries

Update (Feb 21, 2021): I continued to do research on this topic. I thought I’d find a “golden bullet” library that could help me do this perfectly. The code above does a pretty good job, but I would feel more confident using something designed/tested specifically for this task. I looked and found some helpful libraries, but none of them met all me needs. For my projects, I decided just to use the code above.

DocXml

DocXml is a small .NET standard 2.0 library of helper classes and methods for compiler-generated XML documentation retrieval. Its API is very simple and easy to use with a predictive IDE. Out of the box though it was unable to properly identify the XML name of one of my functions. I think it got stuck on the generic method with a multi-dimensional generic array as an argument, but don’t recall for sure. For basic code bases, this looks like a fantastic library.

NuDoq

NuDoq (previously NuDoc?) is a standalone API to read and write .NET XML documentation files and optionally augment it with reflection information. According to the releases it was actively worked on around 2014, then rested quietly for a few years and new releases began in 2021. NuDoq looks quite extensive, but takes some studying before it can be used effectively. “Given the main API to traverse and act on the documentation elements is through the visitor pattern, the most important part of the API is knowing the types of nodes/elements in the visitable model.”

Towel

Towel is a .NET library intended to add core functionality and make advanced topics as clean and simple as possible. Towel has tools for working with data structures, algorithms, mathematics, metadata, extensions, console, and more. Towel favors newer coding practices over maintaining backwards compatibility, and at the time of writing Towel only supports .NET 5 and newer. One of Towel’s Meta module has methods to get XML names for reflected objects. It’s perfect, but requires .NET 5.0 or newer so I could not use it in my project.

Washcloth

I tried to create a version of Towel that isolated just the XML documentation reflection module so I could back-port it to .NET Standard. I created Washcloth which largely achieved this and wrapped Towel behind a simple API. This task proved extremely difficult to accomplish cleanly though, because most modules in the Towel code base are highly coupled to one another so it was very difficult to isolate the Meta module and I never achieved this goal to my satisfaction. I named the project Washcloth because a washcloth is really just a small towel with less functionality. Washcloth technically works (and can be used in projects back to .NET Core 2.0 and .NET Framework 4.6.1), but so much coupled code from the rest of Towel came along with it that I decided not to use this project for my application.

Final Thoughts

After a month poking around at this, here’s where I landed: