---
title: Exponential Fit with Python
date: 2020-09-24 17:45:00
tags: python
---

# Exponential Fit with Python

**Fitting an exponential curve to data is a common task** and in this example we'll use Python and SciPy to determine parameters for a curve fitted to arbitrary X/Y points. You can follow along using the [fit.ipynb](fit.ipynb) Jupyter notebook.

```python
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt

xs = np.arange(12) + 7
ys = np.array([304.08994, 229.13878, 173.71886, 135.75499,
               111.096794, 94.25109, 81.55578, 71.30187, 
               62.146603, 54.212032, 49.20715, 46.765743])

plt.plot(xs, ys, '.')
plt.title("Original Data")
```

<div class="text-center">

![](original.png)

</div>

**To fit an arbitrary curve** we must first define it as a function. We can then call `scipy.optimize.curve_fit`, which will tweak the parameters (starting from the initial values we provide) to best fit the data. In this example we will use a single [exponential decay](https://en.wikipedia.org/wiki/Exponential_decay) function.

```python
def monoExp(x, m, t, b):
    return m * np.exp(-t * x) + b
```

**In biology / electrophysiology, _biexponential_ functions are often used** to separate fast and slow components of exponential decay, which may be caused by different mechanisms and occur at different rates. In this example we will only fit the data to a function with a single exponential component (a _monoexponential_ function), but the idea is the same.
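For comparison, a biexponential version of this function (not used in this example, shown only as a sketch) would simply sum two exponential components:

```python
def biExp(x, m1, t1, m2, t2, b):
    # sum of a fast and a slow exponential decay component
    return m1 * np.exp(-t1 * x) + m2 * np.exp(-t2 * x) + b
```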

```python
# perform the fit
p0 = (2000, .1, 50) # start with values near those we expect
params, cv = scipy.optimize.curve_fit(monoExp, xs, ys, p0)
m, t, b = params
sampleRate = 20_000 # Hz
tauSec = (1 / t) / sampleRate  # 1/t is tau in samples; convert to seconds

# determine quality of the fit
squaredDiffs = np.square(ys - monoExp(xs, m, t, b))
squaredDiffsFromMean = np.square(ys - np.mean(ys))
rSquared = 1 - np.sum(squaredDiffs) / np.sum(squaredDiffsFromMean)
print(f"R² = {rSquared}")

# plot the results
plt.plot(xs, ys, '.', label="data")
plt.plot(xs, monoExp(xs, m, t, b), '--', label="fitted")
plt.title("Fitted Exponential Curve")

# inspect the parameters
print(f"Y = {m} * e^(-{t} * x) + {b}")
print(f"Tau = {tauSec * 1e6} µs")
```

<div class="text-center">

![](fitted.png)

</div>

```
Y = 2666.499 * e^(-0.332 * x) + 42.494
Tau = 150.422 µs
R² = 0.999107330342064
```
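As an aside, `curve_fit` also returns a covariance matrix (`cv` above, unused in the original notebook) which can estimate the uncertainty of each fitted parameter. A minimal sketch:

```python
# standard errors are the square roots of the covariance matrix diagonal
stdErrs = np.sqrt(np.diag(cv))
for name, value, stdErr in zip("mtb", params, stdErrs):
    print(f"{name} = {value:.3f} ± {stdErr:.3f}")
```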

## Extrapolating the Fitted Curve

**We can use the calculated parameters to extend this curve** to any position by passing X values of interest into the function we used during the fit. 

**The value at time 0** is simply `m + b` because the exponential component becomes e^(0) which is 1.
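As a quick sanity check using the parameters printed above:

```python
# at x=0 the exponential term is e^0 = 1, so the curve equals m + b
print(f"value at time 0: {m + b}")  # ≈ 2666.499 + 42.494 = 2708.993
```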

```python
xs2 = np.arange(25)
ys2 = monoExp(xs2, m, t, b)

plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.title("Extrapolated Exponential Curve")
```

<div class="text-center">

![](fitted2.png)

</div>

## Constraining the Infinite Decay Value

**What if we know our data decays to 0?** It's not ideal to fit an exponential decay function that lets the `b` term be whatever it wants. Indeed, our earlier fit calculated the ideal `b` to be `42.494`, but what if we know it should be `0`? The solution is to fit using an exponential function where `b` is constrained to 0 (or whatever value you know it to be).

```python
def monoExpZeroB(x, m, t):
    return m * np.exp(-t * x)

# perform the fit using the function where B is 0
p0 = (2000, .1) # start with values near those we expect
paramsB, cv = scipy.optimize.curve_fit(monoExpZeroB, xs, ys, p0)
mB, tB = paramsB
sampleRate = 20_000 # Hz
tauSec = (1 / tB) / sampleRate  # 1/tB is tau in samples; convert to seconds

# inspect the results
print(f"Y = {mB} * e^(-{tB} * x)")
print(f"Tau = {tauSec * 1e6} µs")

# compare this curve to the original
ys2B = monoExpZeroB(xs2, mB, tB)
plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.plot(xs2, ys2B, '--', label="zero B")
```

```
Y = 1245.580 * e^(-0.210 * x)
Tau = 237.711 µs
```

<div class="text-center">

![](fits.png)

</div>

**The curves produced are very different** at the extremes (especially when time is 0), even though they both appear to fit the data points nicely. Which curve is more accurate? That depends on your application. A hint can be gained by inspecting the time constants of these two curves.

<div class="text-center">

Parameter | Fitted B | Fixed B
---|---|---
m|2666.499|1245.580
t|0.332|0.210
Tau|150.422 µs|237.711 µs
b|42.494|0

</div>

**By inspecting Tau** I can gain insight into which method may be better to use in my application. I expect Tau to be near 250 µs, leading me to trust the fixed-B method over the fitted-B method. Choosing the correct method has great implications for the value of `m` (which is also the value of the curve at time 0).
---
title: Signal Filtering in Python
date: 2020-09-23 21:46:00
tags: python
---

# Signal Filtering in Python

**Over a decade ago I posted code demonstrating how to filter data in Python, but there have been many improvements since then.** My original posts ([1](https://swharden.com/blog/2008-11-17-linear-data-smoothing-in-python/), [2](https://swharden.com/blog/2009-01-21-signal-filtering-with-python/), [3](https://swharden.com/blog/2010-06-20-smoothing-window-data-averaging-in-python-moving-triangle-tecnique/), [4](https://swharden.com/blog/2010-06-24-detrending-data-in-python-with-numpy/)) required creating discrete filtering functions, but modern approaches can leverage Numpy and Scipy to do this more easily and efficiently. In this article we will use [`scipy.signal.filtfilt`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html) to apply low-pass, high-pass, and band-pass filters to reduce noise in an ECG signal stored in [ecg.wav](ecg.wav), created as part of my [Sound Card ECG](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/) project.

<div class="text-center">

![](signal-lowpass-filter.png)

</div>

Moving-window filtering methods often result in a filtered signal that lags behind the original data (a _phase shift_). By filtering the signal twice in opposite directions, `filtfilt` cancels out this phase shift to produce a filtered signal that is nicely aligned with the input data.

```python
import scipy.io.wavfile
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt

# read ECG data from the WAV file
sampleRate, data = scipy.io.wavfile.read('ecg.wav')
times = np.arange(len(data))/sampleRate

# apply a 3-pole lowpass filter at 0.1x Nyquist frequency
b, a = scipy.signal.butter(3, 0.1)
filtered = scipy.signal.filtfilt(b, a, data)
```

<div class="text-center">

![](signal-lowpass-ecg.png)

</div>

```python
# plot the original data next to the filtered data

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.plot(times, data)
plt.title("ECG Signal with Noise")
plt.margins(0, .05)

plt.subplot(122)
plt.plot(times, filtered)
plt.title("Filtered ECG Signal")
plt.margins(0, .05)

plt.tight_layout()
plt.show()
```

## Cutoff Frequency

The second argument passed into the `butter` method customizes the cutoff frequency of the Butterworth filter. This value (`Wn`) is a number between 0 and 1 representing the _fraction of the Nyquist frequency_ to use for the filter. Note that the [Nyquist frequency](https://en.wikipedia.org/wiki/Nyquist_frequency) is half of the sample rate, so as this fraction increases, the cutoff frequency increases. You can also express this value as `2 * cutoffHz / sampleRate`.
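For example, to design a low-pass filter with a 40 Hz cutoff for a signal sampled at 500 Hz (hypothetical numbers, just to illustrate the conversion):

```python
fs = 500       # Hz, a hypothetical sample rate
cutoffHz = 40  # Hz, a hypothetical desired cutoff frequency
wn = 2 * cutoffHz / fs  # 0.16, i.e. 16% of the Nyquist frequency (250 Hz)
b, a = scipy.signal.butter(3, wn)
```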

```python
plt.plot(data, '.-', alpha=.5, label="data")

for cutoff in [.03, .05, .1]:
    b, a = scipy.signal.butter(3, cutoff)
    filtered = scipy.signal.filtfilt(b, a, data)
    label = f"{int(cutoff*100):d}%"
    plt.plot(filtered, label=label)
    
plt.legend()
plt.axis([350, 500, None, None])
plt.title("Effect of Different Cutoff Values")
plt.show()
```

<div class="text-center">

![](signal-lowpass-cutoff.png)

</div>

## Improve Edges with Gustafsson’s Method

Something weird happens at the edges. There's not enough data "off the page" to know how to smooth those points, so what should be done? 

**Padding is the default behavior:** edges are padded with duplicates of the edge data points, and the trace is smoothed as if those points existed. The drawback is that one stray data point at the edge will greatly affect the shape of your smoothed data.

**Gustafsson’s Method may be superior to padding.** The advantage of this method is that stray points at the edges do not greatly influence the smoothed curve at the edges. This technique is described in [a 1994 paper by Fredrik Gustafsson](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=492552). "Initial conditions are chosen for the forward and backward passes so that the forward-backward filter gives the same result as the backward-forward filter." Interestingly this paper demonstrates the method by filtering noise out of an EKG recording.

```python
# A small portion of data will be inspected for demonstration
segment = data[350:400]

filtered = scipy.signal.filtfilt(b, a, segment)
filteredGust = scipy.signal.filtfilt(b, a, segment, method="gust")

plt.plot(segment, '.-', alpha=.5, label="data")
plt.plot(filtered, 'k--', label="padded")
plt.plot(filteredGust, 'k', label="Gustafsson")
plt.legend()
plt.title("Padded Data vs. Gustafsson’s Method")
plt.show()
```

<div class="text-center">

![](signal-method-gust.png)

</div>

## Band-Pass Filter

Low-pass and high-pass filters can be selected simply by customizing the third argument passed into the filter. The second argument indicates frequency (as fraction of Nyquist frequency, half the sample rate). Passing a list of two values in for the second argument allows for band-pass filtering of a signal.

```python
b, a = scipy.signal.butter(3, 0.05, 'lowpass')
filteredLowPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, 0.05, 'highpass')
filteredHighPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, [.01, .05], 'band')
filteredBandPass = scipy.signal.filtfilt(b, a, data)  # filtfilt (not lfilter) avoids phase shift
```

<div class="text-center">

![](signal-lowpass-highpass-bandpass.png)

</div>

## Filter using Convolution

**Another way to low-pass a signal is to use convolution.** In this method you create a window (typically a bell-shaped curve) and _convolve_ the window with the signal. The wider the window is the smoother the output signal will be. Also, the window must be normalized so its sum is 1 to preserve the amplitude of the input signal.

There are different ways to handle what happens to data points at the edges (see [`numpy.convolve`](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html) for details), but setting `mode` to `valid` deletes these points to produce an output signal slightly smaller than the input signal.

```python
# create a normalized Hanning window
windowSize = 40
window = np.hanning(windowSize)
window = window / window.sum()

# filter the data using convolution
filtered = np.convolve(window, data, mode='valid')
```

<div class="text-center">

![](signal-convolution-filter.png)

</div>

```python
plt.subplot(131)
plt.plot(window)  # the normalized Hanning window created above
plt.title("Window")

plt.subplot(132)
plt.plot(data)
plt.title("Data")

plt.subplot(133)
plt.plot(filtered)
plt.title("Filtered")
```

**Different window functions filter the signal in different ways.** Hanning windows are typically preferred because they have a mostly Gaussian shape but touch zero at the edges. For a discussion of the pros and cons of different window functions for spectral analysis using the FFT, see my notes on [FftSharp](https://github.com/swharden/FftSharp).
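To visualize how the choice of window shapes the smoothing kernel, here is a minimal sketch (not from the original notebook) comparing a few normalized windows:

```python
windowSize = 40
for name, window in [("rectangular", np.ones(windowSize)),
                     ("Hanning", np.hanning(windowSize)),
                     ("Blackman", np.blackman(windowSize))]:
    plt.plot(window / window.sum(), label=name)
plt.legend()
plt.title("Normalized Window Functions")
plt.show()
```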

## Resources

* Sample data: [ecg.wav](ecg.wav)

* [Sound Card ECG](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/)

* Jupyter notebook for this page: [signal-filtering.ipynb](signal-filtering.ipynb)

* SciPy Cookbook: [Filtfilt](https://scipy-cookbook.readthedocs.io/items/FiltFilt.html), [Butterworth Bandpass Filter](https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html)

* SciPy Documentation: [scipy.signal.filtfilt](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html), [scipy.signal.butter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html)

* Numpy Documentation: [numpy.convolve](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html)

* [Savitzky Golay Filtering](https://scipy-cookbook.readthedocs.io/items/SavitzkyGolay.html) - The Savitzky Golay filter is a particular type of low-pass filter, well adapted for data smoothing.
---
title: Test React Apps in Azure Pipelines
date: 2020-09-22 13:15:00
---

# Test React Apps in Azure Pipelines

Azure Pipelines makes it easy to run tests in the cloud, but I found that new React projects made with [`create-react-app`](https://reactjs.org/docs/create-a-new-react-app.html) fail to test properly in the cloud using the simple `npm test` command. Attempting this would display `No tests found related to files changed since last commit` and then hang forever.

<div class="text-center img-border">

[![](npm-test-azure-pipelines_thumb.jpg)](npm-test-azure-pipelines.jpg)

</div>

I solved this problem and got my React app to test properly in the cloud by adding `-- --watchAll=false` after `npm test` (the first `--` tells npm to pass the remaining flags through to the test script). This is my final `azure-pipelines.yml` file:

```yaml
trigger:
  - master

pool:
  vmImage: "ubuntu-latest"

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: "10.x"
    displayName: "Install Node.js"

  - script: npm install
    displayName: "Install NPM"

  - script: npm run build
    displayName: "Build"

  - script: npm test -- --watchAll=false
    displayName: "Test"
```
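An alternative I did not use here (a sketch based on create-react-app's documented behavior) is to set the `CI` environment variable, which also makes the test runner execute once instead of watching for changes:

```yaml
  - script: npm test
    displayName: "Test"
    env:
      CI: "true"  # create-react-app runs tests once (non-watch) when CI is set
```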

A working React app that tests properly with Azure Pipelines is available at [GitHub.com/swharden/AliCalc](https://github.com/swharden/AliCalc)
---
title: How I Deleted my Site from the Wayback Machine
date: 2020-09-19
---

# Deleted from The Wayback Machine

**This week my website was removed from the Wayback Machine.** The [Wayback Machine](https://archive.org/web/) lets you view what a website looked like years ago. Part of [Archive.org](https://archive.org/)'s Internet Archive, it is truly impressive and holds entertainingly-old versions of most webpages. Just look at [Amazon.com in the year 2000](https://web.archive.org/web/20000601000000*/amazon.com) for a good laugh.

<div class="text-center img-border">

[![](delete-waybackmachine_thumb.jpg)](delete-waybackmachine.png)

</div>

**I started this blog as a child twenty years ago,** and after seeing what the Wayback Machine pulled up I realized it may be best that the thoughts I had as a child stay in the past. I have personal copies of all my old blog posts, but with the wisdom of age and hindsight I'd much prefer that material stay off the internet. Luckily I was able to get my website removed from the Wayback Machine, and this post documents how I did it.

**For those of you wanting to do the same, this is how I did it:** I sent an email to `info@archive.org` stating the following:

> Please remove my website [MY URL] from the Wayback Machine. 
[MY URL]/robots.txt has been updated to indicate I do not wish 
this website to be archived.
<br><br>
https://lookup.icann.org/ shows that [MY URL] points to 
[HOSTING COMPANY] nameservers, and I have attached a recent 
invoice from [HOSTING COMPANY] as evidence that I own this domain.
<br><br>
If additional evidence or action is required (e.g., DMCA takedown 
notice) please let me know.
<br><br>
Thank you!
<br>
Scott


**I'm not sure if editing `robots.txt` was necessary, but I felt it gave credence to the fact that I had control over the content of this domain.** That file contains the following text. In the past I read this was all it took to get a website de-listed from the Wayback Machine, but I added this same file to another domain of mine and it has not been de-listed.

```
User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /
```

**I attached an invoice** (as a PDF) from the present year showing a credit card payment to my hosting company for the domain. Interestingly, I did not have to show a history of domain ownership. I downloaded the invoice from my hosting company's billing page that day; it displays my home address but not my email address.

**Six days later, my site was removed.** This is the email I received:

> FROM: Office Manager (Internet Archive)
<br><br>
Hello,
<br><br>
The following has now been submitted for exclusion from the 
Wayback Machine at web.archive.org: [MY SITE]
<br><br>
Please allow up to a day for the automated portions of the process
to run their course and for the changes to take effect.
<br><br>
&#8211; The Internet Archive Team

**I reviewed a lot of websites** before settling on my strategy. I was surprised to see some people issuing [DMCA takedown notices](https://www.dmca.com/faq/What-is-a-DMCA-Takedown) to Archive.org, and was happy to find this was not required in my case. Here are some of the resources I found helpful:

* [Archive.org forums](https://archive.org/iathreads/forums.php) - many recent discussions about how to have websites removed. Ironically posting on a public forum may draw _more_ attention to a sensitive website before it is shut down, so this doesn't seem like a great strategy. However, it does seem to work for some.

* [How to Block Your Website From The Wayback Machine](https://www.fightcyberstalking.org/how-to-block-your-website-from-the-wayback-machine/) - uses `robots.txt` which worked in ~2018 but is no longer sufficient.

* [3 Easy Steps To Removing Your Site From Archive.org Wayback Machine](https://blog.imincomelab.com/remove-site-wayback-machine-archive/) - suggests creating a `robots.txt`, issuing DMCA takedown notice, then sending email.

* [How to Delete Your Site from the Internet Archive](https://www.joshualowcock.com/tips-tricks/how-to-delete-your-site-from-the-internet-archive-wayback-machine-archive-org/) - suggests creating a `robots.txt`, issuing DMCA takedown notice, sending historical records of domain ownership, then sending email.

> **⚠️ WARNING:** This may not be permanent. I'm not sure what will happen if I lose my domain name (and robots.txt file) in the future. It is possible that my site is still being archived while not being displayed on the Wayback Machine, and that some time in the future my site will be re-listed.

**If you have updated information** send me an email so I can update this page! In the meantime, I hope this information will be useful for others interested in curating their historical online presence.
---
title: Leaving WordPress
date: 2020-09-13 18:15:00
tags: php
---

# Leaving WordPress

**After fifteen years using WordPress, I'm leaving it for a simpler alternative: flat markdown files.** There were several reasons for the change. First, I was disappointed by how frequently I had to update WordPress (and upgrade my database) to stay current with security updates. Second, I didn't like how abstract post content was: the text of posts was stored in SQL tables, references to image URLs weren't easily accessible (posts point to content IDs, the URLs of which were stored in another table), and images and media were scattered all over the filesystem because the default image placement changed several times over the years. Finally, I found that logging in to a web front-end just to write a post was a barrier that prevented me from writing more frequently.

<div class="text-center">

[![](github_thumb.jpg)](github.png)

</div>

**I have been [very active on GitHub](https://github.com/swharden) over the last few years** and used their platform to share my code instead of this website. Lots of code and notes belong in repositories there, yes, but sometimes I create neat things which would be better represented as one-off posts on my personal website. Some of my repositories have collected notes like these, so I look forward to migrating a lot of that content here. My hope is that the new system I put together will make it easier to share content by writing it in Markdown using the editors I'm already working in every day.

## Dynamic Markdown Parsing with PHP

**The system I'm using now is pretty simple.** Every post is a folder, and each folder contains a markdown file along with all of the images and files that post references. At the top of the markdown file is a little header with information like title, date, and categories (tags). I use a PHP script to route HTTP requests: if a requested folder lacks index.html but has index.md, I serve the latter using [Parsedown](https://github.com/erusev/parsedown) to convert it to HTML. I also add a few tweaks to do things like convert YouTube links to embedded videos and add syntax highlighting to code blocks. Backups are easy (I just zip the folder), and the website could be committed to source control, though I'm leaning away from that because it's about 1GB (lots of images). Also, the URL is just the path to the folder.

**There's a clear path toward generating a static site.** If a folder lacks index.html, index.md is parsed and served, so switching to a static site can be achieved just by pre-converting all the markdown files to HTML (and switching back by deleting them). I'll probably keep refining the PHP script until the conversions reliably process the way I want, then convert most of the old pages to static files. The cool thing about this method is that it lets me serve some posts statically and others dynamically.

<div align="center">

Wordpress (slow) | Markdown (fast)
---|---
<div style='text-align: center;'><img src="benchmark-slow.png"></div> | <div style='text-align: center;'><img src="benchmark-fast.png"></div>

</div>

## Performing the Conversion

The conversion from WordPress to Markdown was semi-automated, but still labor-intensive. 

* I first dumped the database to a SQL file, parsed out the content and metadata (URL, title, date, and privacy status), then created the filesystem and markdown files.

* I then had to manually inspect every markdown file and reformat it, converting inline HTML to markdown (mostly images, galleries, and divs for alignment formatting). In many cases code formatting had been damaged over the years, so lots of my old code was run through an auto-formatter.

* I also had to hunt down the media (images, MP3s, ZIP files, etc.) for every post, copy it to the same folder, and update the URLs to be relative. This was especially hard for galleries, which only point to meta content IDs (stored in a separate database table), and my database had been damaged somewhere along the way, so I really struggled to find the right content sometimes.

* I also added tags to indicate categories, carefully reviewing content and code and marking posts as "old" if they contained out-of-date examples (lots of Python 2 code) or code that I deemed to be of very poor quality by today's standards. Part of me wanted to delete (hide) old posts with bad code, but I decided to leave them up. It's a reminder of how long I've worked at improving my craft, and my revulsion at code I wrote in the past is an indication of how much I've learned since.

* This process took me about 10 hours a day for 3 days in a row.

Along the way I had a few laughs at the ridiculousness of some of my old content. I think it's probably a good thing to encourage teenagers to have personal websites, but I also encourage professionals and employers not to give too much credence to ramblings written by a person decades ago that Google happens to remember. I didn't delete any content, but I marked most of the posts I made as a teenager as private and only exposed the ones that discuss this website.

## History of this Blog

After reviewing all of my posts I now have a really good understanding of the evolution of the technologies I used to serve my website over the years. Here's a summary of the major events:

* It started as a blog on GeoCities, with the [oldest surviving post](../2001-06-16-geocities-hardentechnologies-1) dating to June 2001. Back then adding content meant editing HTML files and using FTP to upload changes. 

* In 2002 I started hosting my website from a server at my house. Initially it was served with Windows/IIS using ASP for comments pages. On October 19, 2002 I switched to FreeBSD/Apache using PHP for comments pages.

* I started using [Movable Type](https://en.wikipedia.org/wiki/Movable_Type) (a flat-file, Perl-based CMS) on Aug 25, 2003.

* I migrated to [WordPress](https://en.wikipedia.org/wiki/WordPress) (a CMS that stored posts in a database) in 2005.

* In 2020 I converted all my posts to [Markdown](https://en.wikipedia.org/wiki/Markdown) using PHP to dynamically generate HTML (with an avenue to generate flat-file output).