The personal website of Scott W Harden
May 16th, 2021

Deploy a Website with Python and FTPS

Python can be used to securely deploy website content using FTPS. Many people have used an FTP client like FileZilla to drag-and-drop content from their local computer to a web server, but this method requires manual clicking and is error-prone. Writing a script to accomplish this task lowers the effort barrier for deployment (encouraging smaller iterations) and reduces the risk that you'll do something unintentional (like deleting an important folder).

This post reviews how I use Python, keyring, and TLS to securely manage login credentials and deploy builds from my local computer to a remote server using FTP. The strategy discussed here will be most useful in servers that use the LAMP stack, and it's worth noting that .NET and Node have their own deployment paradigms. I hope you find the code on this page useful, but you should carefully review your deployment script and create something specific to your needs. Just as you could accidentally delete an important folder using a graphical client, an incorrectly written deployment script could cause damage to your website or leak secrets.

Use Keyring to Manage Your Password

I recently wrote about several ways to manage credentials with Python.

In these examples I will use the keyring package to store and recall my FTP password securely.

pip install keyring

Storing Credentials

Store your password using an interactive interpreter to ensure you don't accidentally save it in a plain text file somewhere. This only needs to be done once.

>>> import keyring
>>> keyring.set_password("system", "me@swharden.com", "P455W0RD")

Recalling Credentials

import keyring
hostname = "swharden.com"
username = "me@swharden.com"
password = keyring.get_password("system", username)

FTP vs. FTPS vs. SFTP

FTP was not designed to be secure - it transfers login credentials in plain text! Traditionally FTP in Python was achieved using ftplib.FTP from the standard library, but logging in using this protocol allows anyone sniffing traffic on your network to capture your password. Python 2.7 added ftplib.FTP_TLS, which adds transport layer security to FTP (FTPS), improving protection of your login credentials.

# ⚠️ This is insecure (password transferred in plain text)
from ftplib import FTP
with FTP(hostname, username, password) as ftp:
    print(ftp.nlst())

# 👍 This is better (password is encrypted)
from ftplib import FTP_TLS
with FTP_TLS(hostname, username, password) as ftps:
    print(ftps.nlst())

By default ftplib.FTP_TLS only encrypts the control connection, which carries the username and password. You can call prot_p() to also encrypt the data connection (directory listings and file contents), but in this post I'm only interested in encrypting my login credentials.

FTPS (FTP over SSL/TLS) is different from SFTP (the SSH File Transfer Protocol, a separate protocol that runs over SSH), but both encrypt usernames and passwords in transit, making them superior to traditional FTP which transfers these in plain text.

Recursively Delete a Folder with FTP

This method deletes each of the contents of a folder, then deletes the folder itself. If one of the contents is a subfolder, it calls itself. This example uses modern Python practices, favoring pathlib over os.path.

Note that I define the remote path using pathlib.PurePosixPath() to ensure it's formatted as a unix-type path since my remote server is a Linux machine.

import ftplib
import pathlib

def removeRecursively(ftp: ftplib.FTP, remotePath: pathlib.PurePath):
    """
    Remove a folder and all its contents from an FTP server
    """

    def removeFile(remoteFile):
        print(f"DELETING FILE {remoteFile}")
        ftp.delete(str(remoteFile))

    def removeFolder(remoteFolder):
        print(f"DELETING FOLDER {remoteFolder}/")
        ftp.rmd(str(remoteFolder))

    for (name, properties) in ftp.mlsd(remotePath):
        fullpath = remotePath.joinpath(name)
        if name == '.' or name == '..':
            continue
        elif properties['type'] == 'file':
            removeFile(fullpath)
        elif properties['type'] == 'dir':
            removeRecursively(ftp, fullpath)

    removeFolder(remotePath)

if __name__ == "__main__":
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        removeRecursively(ftps, remotePath)

Recursively Upload a Folder with FTP

This method recursively uploads a local folder tree to the FTP server. It first creates the folder tree, then uploads all files individually. This example uses modern Python practices, favoring pathlib over os.walk() and os.path.

Like before I define the remote path using pathlib.PurePosixPath() since the server is running Linux, but I can use pathlib.Path() for the local path and it will auto-detect how to format it based on which system I'm currently running on.

import ftplib
import pathlib

def uploadRecursively(ftp: ftplib.FTP, remoteBase: pathlib.PurePath, localBase: pathlib.PurePath):
    """
    Upload a local folder to a remote path on an FTP server
    """

    def remoteFromLocal(localPath: pathlib.PurePath):
        pathParts = localPath.parts[len(localBase.parts):]
        return remoteBase.joinpath(*pathParts)

    def uploadFile(localFile: pathlib.PurePath):
        remoteFilePath = remoteFromLocal(localFile)
        print(f"UPLOADING FILE {remoteFilePath}")
        with open(localFile, 'rb') as localBinary:
            ftp.storbinary(f"STOR {remoteFilePath}", localBinary)

    def createFolder(localFolder: pathlib.PurePath):
        remoteFolderPath = remoteFromLocal(localFolder)
        print(f"CREATING FOLDER {remoteFolderPath}/")
        ftp.mkd(str(remoteFolderPath))

    createFolder(localBase)
    for localFolder in [x for x in localBase.glob("**/*") if x.is_dir()]:
        createFolder(localFolder)
    for localFile in [x for x in localBase.glob("**/*") if not x.is_dir()]:
        uploadFile(localFile)

if __name__ == "__main__":
    localPath = pathlib.Path(R'C:\my\project\folder')
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        uploadRecursively(ftps, remotePath, localPath)

Minimize Disruption by Renaming

Because walking remote folder trees to delete and upload files can be slow, this process may be disruptive to a website with live traffic. For low-traffic websites this isn't an issue, but as traffic increases (or the size of the deployment increases) it may be worth considering how to achieve the swap faster.

An improved method of deployment could involve uploading the new website to a temporary folder, switching the names of the folders, then deleting the old folder. There is brief downtime between the two FTP rename calls.

remotePath = "/live"
remotePathNew = "/live-new"
remotePathOld = "/live-old"
localPath = R"C:\dev\site\live"

upload(localPath, remotePathNew)
ftpRename(remotePath, remotePathOld)
ftpRename(remotePathNew, remotePath)
delete(remotePathOld)

Speed could be improved by handling the renaming with a shell script that runs on the server. This would require some coordination to execute, but it is worth considering. It could be triggered by an HTTP endpoint.

mv /live /live-old;
mv /live-new /live;
rm -rf /live-old;

Deploy a React App with FTP and Python

You can automate deployment of a React project using Python and FTPS. After creating a new React app add a deploy.py in the project folder that uses FTPS to upload the build folder to the server, then edit your project's package.json to add predeploy and deploy commands.

  "scripts": {
    "start": "react-scripts start",
    "build": "react-scripts build",
    "test": "react-scripts test",
    "eject": "react-scripts eject",
    "predeploy": "npm run build",
    "deploy" : "python deploy.py"
  },

Then you can create a production build and deploy with one command:

npm run deploy

Consider Using Git to Deploy Content

This post focused on how to automate uploading local content to a remote server using FTP, but don't overlook the possibility that this may not be the best method for deployment for your application.

You can maintain a website as a git repository and use git pull on the server to update it. GitHub Actions can be used to trigger the pull step automatically using an HTTP endpoint (e.g., main.yml). If this method is available to you, it should be strongly considered.

This method is very popular, but it (1) requires git to be on the server and (2) requires all the build tools/languages to be available on the server if a build step is required. I'm reminded that only SiteGround's most expensive shared hosting plan even has git available at all.

Resources

Markdown source code last modified on September 12th, 2021
---
Title: Deploy a Website with Python and FTPS
Description: How to use Python, keyring, and TLS to securely deploy website content using FTP (FTPS)
Date: 2021-05-16 11AM EST
Tags: python
---

# Deploy a Website with Python and FTPS

**Python can be used to securely deploy website content using FTPS.** Many people have used an FTP client like [FileZilla](https://en.wikipedia.org/wiki/FileZilla) to drag-and-drop content from their local computer to a web server, but this method requires manual clicking and is error-prone. Writing a script to accomplish this task lowers the effort barrier for deployment (encouraging smaller iterations) and reduces the risk that you'll do something unintentional (like deleting an important folder).

**This post reviews how I use Python, keyring, and TLS to securely manage login credentials and deploy builds from my local computer to a remote server using FTP.** The strategy discussed here will be most useful in servers that use the [LAMP stack](https://en.wikipedia.org/wiki/LAMP_(software_bundle)), and it's worth noting that .NET and Node have their own deployment paradigms. I hope you find the code on this page useful, but you should carefully review your deployment script and create something specific to your needs. Just as you could accidentally delete an important folder using a graphical client, an incorrectly written deployment script could cause damage to your website or leak secrets.

## Use Keyring to Manage Your Password

I recently wrote about [several ways to manage credentials with Python](https://swharden.com/blog/2021-05-15-python-credentials/). 

In these examples I will use the [keyring package](https://pypi.org/project/keyring/) to store and recall my FTP password securely.

```bash
pip install keyring
```

### Storing Credentials

Store your password using an interactive interpreter to ensure you don't accidentally save it in a plain text file somewhere. This only needs to be done once.

```py
>>> import keyring
>>> keyring.set_password("system", "me@swharden.com", "P455W0RD")
```

### Recalling Credentials

```py
import keyring
hostname = "swharden.com"
username = "me@swharden.com"
password = keyring.get_password("system", username)
```
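
`keyring.get_password()` returns `None` when no matching entry is stored, which would otherwise show up later as a confusing login failure. Below is a minimal sketch of failing early instead. The helper name `recallPassword` and its `lookup` parameter are my own illustration (they are not part of keyring); `lookup` exists only so the function can be tried without touching a real keyring.

```python
def recallPassword(service, username, lookup=None):
    """Return the stored password, or raise a clear error if none exists."""
    if lookup is None:
        # keyring is a third-party package (pip install keyring)
        import keyring
        lookup = keyring.get_password
    password = lookup(service, username)
    if password is None:
        raise LookupError(f"no password stored for {username!r} in service {service!r}")
    return password


# demonstrate with an in-memory stand-in instead of the real keyring
fakeStore = {("system", "me@swharden.com"): "P455W0RD"}
fakeLookup = lambda service, username: fakeStore.get((service, username))
print(recallPassword("system", "me@swharden.com", fakeLookup))
```

In a real script the third argument would be omitted so the system keyring is used.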

## FTP vs. FTPS vs. SFTP

**FTP was not designed to be secure - it transfers login credentials in plain text!** Traditionally FTP in Python was achieved using `ftplib.FTP` from the standard library, but logging in using this protocol allows anyone sniffing traffic on your network to capture your password. Python 2.7 added `ftplib.FTP_TLS`, which adds transport layer security to FTP (FTPS), improving protection of your login credentials.

```py
# ⚠️ This is insecure (password transferred in plain text)
from ftplib import FTP
with FTP(hostname, username, password) as ftp:
    print(ftp.nlst())
```

```py
# 👍 This is better (password is encrypted)
from ftplib import FTP_TLS
with FTP_TLS(hostname, username, password) as ftps:
    print(ftps.nlst())
```

By default `ftplib.FTP_TLS` only encrypts the control connection, which carries the username and password. You can call `prot_p()` to also encrypt the data connection (directory listings and file contents), but in this post I'm only interested in encrypting my login credentials.
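
If you do want file contents and directory listings encrypted too, a sketch like the following could be a starting point. The function name `openEncryptedSession` and the `ftpClass` parameter are hypothetical (the latter only makes the function easy to try with a stand-in class). Passing a context from `ssl.create_default_context()` is my addition: it asks Python to verify the server certificate, which may fail on hosts whose certificate does not match the hostname.

```python
import ftplib
import ssl


def openEncryptedSession(hostname, username, password, ftpClass=ftplib.FTP_TLS):
    """Log in over TLS, then switch the data connection to TLS as well."""
    context = ssl.create_default_context()  # verifies the server certificate
    ftps = ftpClass(hostname, username, password, context=context)
    ftps.prot_p()  # encrypt directory listings and file transfers too
    return ftps
```

The returned object can be used like the earlier examples, e.g. `with openEncryptedSession(hostname, username, password) as ftps: print(ftps.nlst())`.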

FTPS (FTP over SSL/TLS) is different from SFTP (the SSH File Transfer Protocol, a separate protocol that runs over SSH), but both encrypt usernames and passwords in transit, making them superior to traditional FTP which transfers these in plain text.

## Recursively Delete a Folder with FTP

This method deletes each of the contents of a folder, then deletes the folder itself. If one of the contents is a subfolder, it calls itself. This example uses modern Python practices, favoring `pathlib` over `os.path`.

Note that I define the remote path using `pathlib.PurePosixPath()` to ensure it's formatted as a unix-type path since my remote server is a Linux machine.

```py
import ftplib
import pathlib

def removeRecursively(ftp: ftplib.FTP, remotePath: pathlib.PurePath):
    """
    Remove a folder and all its contents from an FTP server
    """

    def removeFile(remoteFile):
        print(f"DELETING FILE {remoteFile}")
        ftp.delete(str(remoteFile))

    def removeFolder(remoteFolder):
        print(f"DELETING FOLDER {remoteFolder}/")
        ftp.rmd(str(remoteFolder))

    for (name, properties) in ftp.mlsd(remotePath):
        fullpath = remotePath.joinpath(name)
        if name == '.' or name == '..':
            continue
        elif properties['type'] == 'file':
            removeFile(fullpath)
        elif properties['type'] == 'dir':
            removeRecursively(ftp, fullpath)

    removeFolder(remotePath)

if __name__ == "__main__":
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        removeRecursively(ftps, remotePath)
```
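
The choice of `pathlib.PurePosixPath()` for the remote path matters most when the script runs on Windows. This small sketch compares how the two pure path flavors format the same remote location. The file names are made up for illustration.

```python
import pathlib

# the remote server is Linux, so build its paths with the POSIX flavor
remoteBase = pathlib.PurePosixPath("/the/remote/folder")
remoteFile = remoteBase.joinpath("css", "site.css")
print(remoteFile)  # /the/remote/folder/css/site.css

# the Windows flavor would format the same path with backslashes
windowsStyle = pathlib.PureWindowsPath("/the/remote/folder").joinpath("css", "site.css")
print(windowsStyle)  # \the\remote\folder\css\site.css
```

Because pure paths never touch the file system, both lines behave the same regardless of the operating system running the script.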

## Recursively Upload a Folder with FTP

This method recursively uploads a local folder tree to the FTP server. It first creates the folder tree, then uploads all files individually. This example uses modern Python practices, favoring `pathlib` over `os.walk()` and `os.path`.

Like before I define the remote path using `pathlib.PurePosixPath()` since the server is running Linux, but I can use `pathlib.Path()` for the local path and it will auto-detect how to format it based on which system I'm currently running on.

```py
import ftplib
import pathlib

def uploadRecursively(ftp: ftplib.FTP, remoteBase: pathlib.PurePath, localBase: pathlib.PurePath):
    """
    Upload a local folder to a remote path on an FTP server
    """

    def remoteFromLocal(localPath: pathlib.PurePath):
        pathParts = localPath.parts[len(localBase.parts):]
        return remoteBase.joinpath(*pathParts)

    def uploadFile(localFile: pathlib.PurePath):
        remoteFilePath = remoteFromLocal(localFile)
        print(f"UPLOADING FILE {remoteFilePath}")
        with open(localFile, 'rb') as localBinary:
            ftp.storbinary(f"STOR {remoteFilePath}", localBinary)

    def createFolder(localFolder: pathlib.PurePath):
        remoteFolderPath = remoteFromLocal(localFolder)
        print(f"CREATING FOLDER {remoteFolderPath}/")
        ftp.mkd(str(remoteFolderPath))

    createFolder(localBase)
    for localFolder in [x for x in localBase.glob("**/*") if x.is_dir()]:
        createFolder(localFolder)
    for localFile in [x for x in localBase.glob("**/*") if not x.is_dir()]:
        uploadFile(localFile)

if __name__ == "__main__":
    localPath = pathlib.Path(R'C:\my\project\folder')
    remotePath = pathlib.PurePosixPath('/the/remote/folder')
    with ftplib.FTP_TLS("swharden.com", "scott", "P455W0RD") as ftps:
        uploadRecursively(ftps, remotePath, localPath)
```
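
The local-to-remote path mapping is the part of the upload most worth checking before pointing a script at a live server. Here is a standalone sketch of that mapping that uses `relative_to()` instead of slicing `parts`; this is an alternative I'm assuming is equivalent for files inside the base folder, with the added behavior that it raises `ValueError` for files outside it. The example file name is hypothetical.

```python
import pathlib


def remoteFromLocal(localPath, localBase, remoteBase):
    """Translate a local path inside localBase to the matching remote path."""
    relativePath = localPath.relative_to(localBase)  # ValueError if outside localBase
    return remoteBase.joinpath(*relativePath.parts)


# pure paths are used so this demo prints the same thing on any system
localBase = pathlib.PureWindowsPath(r"C:\my\project\folder")
remoteBase = pathlib.PurePosixPath("/the/remote/folder")
localFile = localBase / "img" / "logo.png"
print(remoteFromLocal(localFile, localBase, remoteBase))  # /the/remote/folder/img/logo.png
```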

## Minimize Disruption by Renaming

Because walking remote folder trees to delete and upload files can be slow, this process may be disruptive to a website with live traffic. For low-traffic websites this isn't an issue, but as traffic increases (or the size of the deployment increases) it may be worth considering how to achieve the swap faster.

An improved method of deployment could involve uploading the new website to a temporary folder, switching the names of the folders, then deleting the old folder. There is brief downtime between the two FTP rename calls.

```py
remotePath = "/live"
remotePathNew = "/live-new"
remotePathOld = "/live-old"
localPath = R"C:\dev\site\live"

upload(localPath, remotePathNew)
ftpRename(remotePath, remotePathOld)
ftpRename(remotePathNew, remotePath)
delete(remotePathOld)
```
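
The two rename calls in the outline above map onto `ftplib.FTP.rename()`. Below is a minimal sketch of just that swap step. The function name `swapLiveFolder` is hypothetical, and the rollback on failure is my own addition rather than something the outline requires.

```python
import ftplib


def swapLiveFolder(ftp: ftplib.FTP, livePath: str, newPath: str, oldPath: str):
    """Swap a freshly uploaded folder into place using two FTP renames."""
    ftp.rename(livePath, oldPath)  # the site is briefly unavailable after this call
    try:
        ftp.rename(newPath, livePath)
    except ftplib.all_errors:
        ftp.rename(oldPath, livePath)  # put the old site back before re-raising
        raise
```

Calling `swapLiveFolder(ftps, "/live", "/live-new", "/live-old")` after the upload finishes leaves the previous site in `/live-old`, ready to be deleted.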

Speed could be improved by handling the renaming with a shell script that runs on the server. This would require some coordination to execute, but it is worth considering. It could be triggered by an HTTP endpoint.

```bash
mv /live /live-old;
mv /live-new /live;
rm -rf /live-old;
```

## Deploy a React App with FTP and Python

**You can automate deployment of a React project using Python and FTPS.** After [creating a new React app](https://reactjs.org/docs/create-a-new-react-app.html) add a `deploy.py` in the project folder that uses FTPS to upload the build folder to the server, then edit your project's `package.json` to add `predeploy` and `deploy` commands.

```json
  "scripts": {
    "start": "react-scripts start",
    "build": "react-scripts build",
    "test": "react-scripts test",
    "eject": "react-scripts eject",
    "predeploy": "npm run build",
    "deploy" : "python deploy.py"
  },
```

Then you can create a production build and deploy with one command:

```
npm run deploy
```

## Consider Using Git to Deploy Content

This post focused on how to automate uploading local content to a remote server using FTP, but don't overlook the possibility that this may not be the best method for deployment for your application.

**You can maintain a website as a git repository and use `git pull` on the server to update it.** GitHub Actions can be used to trigger the pull step automatically using an HTTP endpoint (e.g., [main.yml](https://github.com/ScottPlot/Website/blob/main/.github/workflows/main.yml)). If this method is available to you, it should be strongly considered.

This method is very popular, but it (1) requires `git` to be on the server and (2) requires all the build tools/languages to be available on the server if a build step is required. I'm reminded that [only SiteGround's most expensive shared hosting plan](https://www.siteground.com/shared-hosting-features.htm) even has `git` available at all. 

## Resources
* [ftplib.FTP_TLS official documentation](https://docs.python.org/2/library/ftplib.html#ftplib.FTP_TLS)
* [SFTP vs. FTPS: What's the Best Protocol for Secure FTP?](https://www.goanywhere.com/blog/2011/10/20/sftp-ftps-secure-ftp-transfers)
* [Managing Credentials with Python](https://swharden.com/blog/2021-05-15-python-credentials/)
* [Create React App: Deployment](https://create-react-app.dev/docs/deployment/) is useful but never mentions FTP
* [ftp-deploy](https://www.npmjs.com/package/ftp-deploy) is a Node.js package to help with deploying code using FTP
May 15th, 2021

Managing Credentials with Python

I enjoy contributing to open-source, but I'd prefer to keep my passwords to myself! Python is a great glue language for automating tasks, and recently I've been using it to log in to my web server using SFTP and automate log analysis, file management, and software updates. The Python scripts I'm working on need to know my login information, but I want to commit them to source control and share them on GitHub so I have to be careful to use a strategy which minimizes risk of inadvertently leaking these secrets onto the internet.

This post explores various options for managing credentials in Python scripts in public repositories. There are many different ways to manage credentials with Python, and I was surprised to learn of some new ones as I was researching this topic. This post reviews the most common options, starting with the most insecure and working its way up to the most highly regarded methods for managing secrets.

Plain-Text Passwords in Code

⚠️☠️ DANGER: Never do this

You could put a password or API key directly in your python script, but even if you intend to remove it later there's always a chance you'll accidentally commit it to source control without realizing it, posing a security risk forever. This method is to be avoided at all costs!

username = "myUsername"
password = "S3CR3T_P455W0RD"
logIn(username, password)

Obfuscated Passwords in Code

⚠️☠️ DANGER: Never do this

A slightly less terrible idea is to obfuscate plain-text passwords by storing them as base 64 strings. You won't know the password just by seeing it, but anyone who has the string can easily decode it. Websites like https://www.base64decode.org are useful for this.

"""Demonstrate conversion to/from base 64"""

import base64

def obfuscate(plainText):
    plainBytes = plainText.encode('ascii')
    encodedBytes = base64.b64encode(plainBytes)
    encodedText = encodedBytes.decode('ascii')
    return encodedText


def deobfuscate(obfuscatedText):
    obfuscatedBytes = obfuscatedText.encode('ascii')
    decodedBytes = base64.b64decode(obfuscatedBytes)
    decodedText = decodedBytes.decode('ascii')
    return decodedText

original = "S3CR3T_P455W0RD"
obfuscated = obfuscate(original)
deobfuscated = deobfuscate(obfuscated)

print("original: " + original)
print("obfuscated: " + obfuscated)
print("deobfuscated: " + deobfuscated)

original: S3CR3T_P455W0RD
obfuscated: UzNDUjNUX1A0NTVXMFJE
deobfuscated: S3CR3T_P455W0RD

Passwords in Plain Text Files

⚠️ WARNING: This method is prone to mistakes. Ensure the text file is never committed to source control.

You could store username/password on the first two lines of a plain text file, then use python to read it when you need it.

with open("secrets.txt") as f:
    lines = f.readlines()
    username = lines[0].strip()
    password = lines[1].strip()
    print(f"USERNAME={username}, PASSWORD={password}")

If the text file is in the repository directory you should modify .gitignore to ensure it's not tracked by source control. There is a risk that you may forget to do this, exposing your credentials online! A better idea may be to place the secrets file outside your repository folder entirely.

💡 There are libraries which make this easier. One example is Python Decouple which implements a lot of this logic gracefully and can even combine settings from multiple files (e.g., .ini vs .env files) for environments that can benefit from more advanced configuration options. See the notes below about helper libraries that work with environment variables and .env files.

Passwords in Python Modules

⚠️ WARNING: This method is prone to mistakes. Ensure the secrets module is never committed to source control.

Similar to a plain text file not tracked by source control (ideally outside the repository folder entirely), you could store passwords as variables in a Python module then import it.

from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")

If your secrets file is in an obscure folder, you will have to add it to your path so the module can be found when importing.

import sys
sys.path.append("C:/path/to/secrets/folder")

from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")

Don't name your module secrets because the secrets module is part of the standard library, so the two names would collide and the wrong module may be imported instead.

Passwords as Program Arguments

⚠️ WARNING: This method may store plain text passwords in your command history.

This isn't a great idea because passwords are seen in plain text in the console and also may be stored in the command history. However, you're unlikely to accidentally commit passwords to source control.

import sys
username = sys.argv[1]
password = sys.argv[2]
print(f"USERNAME={username}, PASSWORD={password}")

python test.py myUsername S3CR3T_P455W0RD

Type Passwords in the Console

You could request the user to type their password in the console, but the characters would be visible as they're typed.

# ⚠️ This code displays the typed password
password = input("Password: ")

Python has a getpass module in its standard library made for prompting the user for passwords as console input. Unlike input(), characters are not visible as the password is typed.

# 👍 This code hides the typed password
import getpass
password = getpass.getpass('Password: ')

Extract Passwords from the Clipboard

This is an interesting method. It's fast and simple, but a bit quirky. Downsides are (1) it requires the password to be in the clipboard which may expose it to other programs, (2) it requires installation of a nonstandard library, and (3) it won't work easily in server environments.

Note that I trust pyperclip more than clipboard (which is just another developer wrapping pyperclip)

pip install pyperclip

Run after copying a password to the clipboard:

import pyperclip
password = pyperclip.paste()

Request Credentials with Tk

The Tk graphics library is a cross-platform graphical widget toolkit that comes with Python. A login window that collects username and password can be created programmatically and wrapped in a function for easy inclusion in scripts that otherwise don't have a GUI.

I find this technique particularly useful when the username and password are stored in a password manager.

def getCredentials(defaultUser):
    """Request login credentials using a GUI."""
    import tkinter
    root = tkinter.Tk()
    root.eval('tk::PlaceWindow . center')
    root.title('Login')
    uv = tkinter.StringVar(root, value=defaultUser)
    pv = tkinter.StringVar(root, value='')
    userEntry = tkinter.Entry(root, bd=3, width=35, textvariable=uv)
    passEntry = tkinter.Entry(root, bd=3, width=35, show="*", textvariable=pv)
    btnClose = tkinter.Button(root, text="OK", command=root.destroy)
    userEntry.pack(padx=10, pady=5)
    passEntry.pack(padx=10, pady=5)
    btnClose.pack(padx=10, pady=5, side=tkinter.TOP, anchor=tkinter.NE)
    root.mainloop()
    return [uv.get(), pv.get()]

username, password = getCredentials("user@site.com")

Manage Passwords with a Keyring

The keyring package provides an easy way to access the system's keyring service from python. On macOS it uses Keychain, on Windows it uses the Windows Credential Locker, and on Linux it can use KDE's KWallet or GNOME's Secret Service.

Downsides of keyrings are (1) it requires a nonstandard library, (2) implementation may be OS-specific, (3) it may not function easily in cloud environments.

pip install keyring

# store the password once
import keyring
keyring.set_password("system", "myUsername", "S3CR3T_P455W0RD")

# recall the password at any time
import keyring
password = keyring.get_password("system", "myUsername")

Passwords in Environment Variables

Environment variables are one of the better ways of managing credentials with Python. There are many articles on this topic, including Twilio's How To Set Environment Variables and Working with Environment Variables in Python. Environment variables are one of the preferred methods of credential management when working with cloud providers.

Be sure to restart your console session after editing environment variables before attempting to read them from within python.

import os
password = os.getenv('demoPassword')

There are many helper libraries such as python-dotenv and Python Decouple which can use local .env files to dynamically set environment variables as your program runs. As noted in previous sections, when storing passwords in plain-text in the file structure of your repository be extremely careful not to commit these files to source control!

Example .env file:

demoPassword2=superSecret

The dotenv package can load .env variables as environment variables when a Python script runs:

import os
import dotenv
dotenv.load_dotenv()
password2 = os.getenv('demoPassword2')
print(password2)

Additional Resources

How do you manage credentials in Python? If you wish to share feedback or a creative method you use that I haven't discussed above, send me an email and I can include your suggestions in this document.

Markdown source code last modified on September 12th, 2021
---
Title: Managing Credentials with Python
Description: How to safely work with secret passwords in python scripts that are committed to source control
Date: 2021-05-15 9PM EST
Tags: python
---

# Managing Credentials with Python

**I enjoy contributing to open-source, but I'd prefer to keep my _passwords_ to myself!** Python is a great glue language for automating tasks, and recently I've been using it to log in to my web server using SFTP and automate log analysis, file management, and software updates. The Python scripts I'm working on need to know my login information, but I want to commit them to source control and share them on GitHub so I have to be careful to use a strategy which minimizes risk of inadvertently leaking these secrets onto the internet.

**This post explores various options for managing credentials in Python scripts in public repositories.** There are many different ways to manage credentials with Python, and I was surprised to learn of some new ones as I was researching this topic. This post reviews the most common options, starting with the most insecure and working its way up to the most highly regarded methods for managing secrets.

## Plain-Text Passwords in Code

> **⚠️☠️ DANGER:** Never do this

You could put a password or API key directly in your python script, but even if you intend to remove it later there's always a chance you'll accidentally commit it to source control without realizing it, posing a security risk forever. This method is to be avoided at all costs!

```python
username = "myUsername"
password = "S3CR3T_P455W0RD"
logIn(username, password)
```

## Obfuscated Passwords in Code

> **⚠️☠️ DANGER:** Never do this

A _slightly_ less terrible idea is to obfuscate plain-text passwords by storing them as base 64 strings. You won't know the password just by seeing it, but anyone who has the string can easily decode it. Websites like https://www.base64decode.org are useful for this.

```py
"""Demonstrate conversion to/from base 64"""

import base64

def obfuscate(plainText):
    plainBytes = plainText.encode('ascii')
    encodedBytes = base64.b64encode(plainBytes)
    encodedText = encodedBytes.decode('ascii')
    return encodedText


def deobfuscate(obfuscatedText):
    obfuscatedBytes = obfuscatedText.encode('ascii')
    decodedBytes = base64.b64decode(obfuscatedBytes)
    decodedText = decodedBytes.decode('ascii')
    return decodedText
```

```py
original = "S3CR3T_P455W0RD"
obfuscated = obfuscate(original)
deobfuscated = deobfuscate(obfuscated)

print("original: " + original)
print("obfuscated: " + obfuscated)
print("deobfuscated: " + deobfuscated)
```

```
original: S3CR3T_P455W0RD
obfuscated: UzNDUjNUX1A0NTVXMFJE
deobfuscated: S3CR3T_P455W0RD
```

## Passwords in Plain Text Files

> **⚠️ WARNING:** This method is prone to mistakes. Ensure the text file is never committed to source control.

You could store username/password on the first two lines of a plain text file, then use python to read it when you need it.

```py
with open("secrets.txt") as f:
    lines = f.readlines()
    username = lines[0].strip()
    password = lines[1].strip()
    print(f"USERNAME={username}, PASSWORD={password}")
```


If the text file is in the repository directory you should modify `.gitignore` to ensure it's not tracked by source control. There is a risk that you may forget to do this, exposing your credentials online! A better idea may be to place the secrets file outside your repository folder entirely.
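
Keeping the secrets file outside the repository could look something like this sketch. The function name `readCredentials` and the file location in the user's home folder are assumptions chosen for illustration; any folder that is not inside a repository would serve the same purpose.

```python
import pathlib


def readCredentials(secretsFile: pathlib.Path):
    """Read a username and password from the first two lines of a text file."""
    lines = secretsFile.read_text().splitlines()
    if len(lines) < 2:
        raise ValueError(f"{secretsFile} must contain a username line and a password line")
    return lines[0].strip(), lines[1].strip()


# hypothetical location in the home folder, outside of any repository
defaultSecretsFile = pathlib.Path.home() / ".my-deploy-secrets.txt"
```

A script would then call `username, password = readCredentials(defaultSecretsFile)` wherever it needs to log in.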

💡 There are libraries which make this easier. One example is [Python Decouple](https://pypi.org/project/python-decouple/) which implements a lot of this logic gracefully and can even combine settings from multiple files (e.g., `.ini` vs `.env` files) for environments that can benefit from more advanced configuration options. See the notes below about helper libraries that work with environment variables and `.env` files.

## Passwords in Python Modules

> **⚠️ WARNING:** This method is prone to mistakes. Ensure the secrets module is never committed to source control.

Similar to a plain text file not tracked by source control (ideally outside the repository folder entirely), you could store passwords as variables in a Python module then import it.

```py
from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")
```

If your secrets file is in an obscure folder, you will have to add it to your path so the module can be found when importing.

```py
import sys
sys.path.append("C:/path/to/secrets/folder")

from mySecrets import username, password
print(f"USERNAME={username}, PASSWORD={password}")
```

Don't name your module `secrets` because the [secrets module](https://docs.python.org/3/library/secrets.html) is part of the standard library and will likely be imported instead.

## Passwords as Program Arguments

> **⚠️ WARNING:** This method may store plain text passwords in your command history.

This isn't a great idea because passwords are seen in plain text in the console and also may be stored in the command history. However, you're unlikely to accidentally commit passwords to source control.

```py
import sys
username = sys.argv[1]
password = sys.argv[2]
print(f"USERNAME={username}, PASSWORD={password}")
```

```bash
python test.py myUsername S3CR3T_P455W0RD
```

## Type Passwords in the Console

You could request the user to type their password in the console, but the characters would be visible as they're typed.

```py
# ⚠️ This code displays the typed password
password = input("Password: ")
```

Python has a [getpass module](https://docs.python.org/3/library/getpass.html) in its standard library made for prompting the user for passwords as console input. Unlike `input()`, characters are not visible as the password is typed.

```py
# 👍 This code hides the typed password
import getpass
password = getpass.getpass('Password: ')
```

## Extract Passwords from the Clipboard

This is an interesting method. It's fast and simple, but a bit quirky. Downsides are (1) it requires the password to be in the clipboard which may expose it to other programs, (2) it requires installation of a nonstandard library, and (3) it won't work easily in server environments.

Note that I trust [pyperclip](https://pypi.org/project/pyperclip/) more than [clipboard](https://pypi.org/project/clipboard/) (which is just [another developer wrapping pyperclip](https://github.com/terryyin/clipboard/blob/master/clipboard.py)).

```bash
pip install pyperclip
```

Run after copying a password to the clipboard:

```py
import pyperclip
password = pyperclip.paste()
```

## Request Credentials with Tk

The [Tk graphics library](https://docs.python.org/3/library/tk.html) is a cross-platform graphical widget toolkit that comes with Python. A login window that collects username and password can be created programmatically and wrapped in a function for easy inclusion in scripts that otherwise don't have a GUI.

I find this technique particularly useful when the username and password are stored in a password manager.

<div class="text-center">

![](tk-password-dialog.png)

</div>

```py
def getCredentials(defaultUser):
    """Request login credentials using a GUI."""
    import tkinter
    root = tkinter.Tk()
    root.eval('tk::PlaceWindow . center')
    root.title('Login')
    uv = tkinter.StringVar(root, value=defaultUser)
    pv = tkinter.StringVar(root, value='')
    userEntry = tkinter.Entry(root, bd=3, width=35, textvariable=uv)
    passEntry = tkinter.Entry(root, bd=3, width=35, show="*", textvariable=pv)
    btnClose = tkinter.Button(root, text="OK", command=root.destroy)
    userEntry.pack(padx=10, pady=5)
    passEntry.pack(padx=10, pady=5)
    btnClose.pack(padx=10, pady=5, side=tkinter.TOP, anchor=tkinter.NE)
    root.mainloop()
    return [uv.get(), pv.get()]
```

```py
username, password = getCredentials("user@site.com")
```

## Manage Passwords with a Keyring

The [keyring package](https://pypi.org/project/keyring/) provides an easy way to access the system's keyring service from Python. On macOS it uses [Keychain](https://en.wikipedia.org/wiki/Keychain_%28software%29), on Windows it uses the [Windows Credential Locker](https://docs.microsoft.com/en-us/windows/uwp/security/credential-locker), and on Linux it can use KDE's [KWallet](https://en.wikipedia.org/wiki/KWallet) or GNOME's [Secret Service](https://specifications.freedesktop.org/secret-service/latest).

Downsides of keyrings are (1) they require a nonstandard library, (2) the implementation may be OS-specific, and (3) they may not function easily in cloud environments.

```bash
pip install keyring
```

```py
# store the password once
import keyring
keyring.set_password("system", "myUsername", "S3CR3T_P455W0RD")
```

```py
# recall the password at any time
import keyring
password = keyring.get_password("system", "myUsername")
```

## Passwords in Environment Variables

Environment variables are one of the better ways of managing credentials with Python. There are many articles on this topic, including Twilio's [How To Set Environment Variables](https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html) and [Working with Environment Variables in Python](https://www.twilio.com/blog/environment-variables-python). Environment variables are one of the preferred methods of credential management when working with cloud providers.

<div class="text-center">

![](environment-variables.png)

</div>

Be sure to restart your console session after editing environment variables before attempting to read them from within Python.

```py
import os
password = os.getenv('demoPassword')
```
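`os.getenv` returns `None` when a variable is not set, which can lead to confusing failures later in a script. A minimal sketch of a helper that fails early instead (the name `requireEnv` is hypothetical):

```python
import os


def requireEnv(name):
    """Return the value of an environment variable or raise if it is missing."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"environment variable {name!r} is not set")
    return value
```

It would be called as `password = requireEnv('demoPassword')` in place of the `os.getenv` call above.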

There are many helper libraries such as [python-dotenv](https://pypi.org/project/python-dotenv/) and [Python Decouple](https://pypi.org/project/python-decouple/) which can use local `.env` files to dynamically set environment variables as your program runs. As noted in previous sections, when storing passwords in plain-text in the file structure of your repository be extremely careful not to commit these files to source control!


Example `.env` file:
```txt
demoPassword2=superSecret
```

The `dotenv` package can load `.env` variables as environment variables when a Python script runs:

```py
import os
import dotenv
dotenv.load_dotenv()
password2 = os.getenv('demoPassword2')
print(password2)
```
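To make it clearer what such helper libraries do, here is a simplified sketch of a `.env` loader using only the standard library. It is not how `python-dotenv` is implemented (the real package handles more cases, such as quoted values), and `loadEnvFile` is a hypothetical name.

```python
import os


def loadEnvFile(path=".env"):
    """Set environment variables from KEY=VALUE lines in a text file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, value = line.split("=", 1)
            # do not overwrite variables that are already set
            os.environ.setdefault(key.strip(), value.strip())
```

Using `setdefault` means a variable already defined in the real environment takes priority over the value in the file.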

## Additional Resources
* [Using .env Files for Environment Variables in Python Applications](https://dev.to/jakewitcher/using-env-files-for-environment-variables-in-python-applications-55a1)
* [Environment Variables vs. Secrets In Python](https://www.activestate.com/blog/python-environment-variables-vs-secrets/)
* [How To Set Environment Variables](https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html)
* [Working with Environment Variables in Python](https://www.twilio.com/blog/environment-variables-python)
* [How to Set and Get Environment Variables in Python](https://able.bio/rhett/how-to-set-and-get-environment-variables-in-python--274rgt5)
* [python-dotenv](https://pypi.org/project/python-dotenv/) reads key-value pairs from a .env file and can set them as environment variables. It helps in the development of applications following the 12-factor principles.
* [Python Decouple](https://pypi.org/project/python-decouple/) helps you to organize your settings so that you can change parameters without having to redeploy your app.

_How do you manage credentials in Python? If you wish to share feedback or a creative method you use that I haven't discussed above, send me an email and I can include your suggestions in this document._
September 24th, 2020

Exponential Fit with Python

Fitting an exponential curve to data is a common task and in this example we'll use Python and SciPy to determine parameters for a curve fitted to arbitrary X/Y points. You can follow along using the fit.ipynb Jupyter notebook.

import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt

xs = np.arange(12) + 7
ys = np.array([304.08994, 229.13878, 173.71886, 135.75499,
               111.096794, 94.25109, 81.55578, 71.30187, 
               62.146603, 54.212032, 49.20715, 46.765743])

plt.plot(xs, ys, '.')
plt.title("Original Data")

To fit an arbitrary curve we must first define it as a function. We can then call scipy.optimize.curve_fit which will tweak the arguments (using arguments we provide as the starting parameters) to best fit the data. In this example we will use a single exponential decay function.

def monoExp(x, m, t, b):
    return m * np.exp(-t * x) + b

In biology / electrophysiology biexponential functions are often used to separate fast and slow components of exponential decay which may be caused by different mechanisms and occur at different rates. In this example we will only fit the data to a function with a single exponential component (a monoexponential function), but the idea is the same.

# perform the fit
p0 = (2000, .1, 50) # start with values near those we expect
params, cv = scipy.optimize.curve_fit(monoExp, xs, ys, p0)
m, t, b = params
sampleRate = 20_000 # Hz
tauSec = (1 / t) / sampleRate

# determine quality of the fit
squaredDiffs = np.square(ys - monoExp(xs, m, t, b))
squaredDiffsFromMean = np.square(ys - np.mean(ys))
rSquared = 1 - np.sum(squaredDiffs) / np.sum(squaredDiffsFromMean)
print(f"R² = {rSquared}")

# plot the results
plt.plot(xs, ys, '.', label="data")
plt.plot(xs, monoExp(xs, m, t, b), '--', label="fitted")
plt.title("Fitted Exponential Curve")

# inspect the parameters
print(f"Y = {m} * e^(-{t} * x) + {b}")
print(f"Tau = {tauSec * 1e6} µs")

Y = 2666.499 * e^(-0.332 * x) + 42.494
Tau = 150.422 µs
R² = 0.999107330342064

Extrapolating the Fitted Curve

We can use the calculated parameters to extend this curve to any position by passing X values of interest into the function we used during the fit.

The value at time 0 is simply m + b because the exponential component becomes e^(0) which is 1.

xs2 = np.arange(25)
ys2 = monoExp(xs2, m, t, b)

plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.title("Extrapolated Exponential Curve")

Constraining the Infinite Decay Value

What if we know our data decays to 0? In that case it is better not to fit an exponential decay function that lets the b component float freely. Indeed, our fit from earlier calculated the ideal b to be 42.494 but what if we know it should be 0? The solution is to fit using an exponential function where b is constrained to 0 (or whatever value you know it to be).

def monoExpZeroB(x, m, t):
    return m * np.exp(-t * x)

# perform the fit using the function where B is 0
p0 = (2000, .1) # start with values near those we expect
paramsB, cv = scipy.optimize.curve_fit(monoExpZeroB, xs, ys, p0)
mB, tB = paramsB
sampleRate = 20_000 # Hz
tauSec = (1 / tB) / sampleRate

# inspect the results
print(f"Y = {mB} * e^(-{tB} * x)")
print(f"Tau = {tauSec * 1e6} µs")

# compare this curve to the original
ys2B = monoExpZeroB(xs2, mB, tB)
plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.plot(xs2, ys2B, '--', label="zero B")

Y = 1245.580 * e^(-0.210 * x)
Tau = 237.711 µs

The curves produced are very different at the extremes (especially when time is 0), even though they appear to both fit the data points nicely. Which curve is more accurate? That depends on your application. A hint can be gained by inspecting the time constants of these two curves.

Parameter | Fitted B | Fixed B
---|---|---
m | 2666.499 | 1245.580
t | 0.332 | 0.210
Tau | 150.422 µs | 237.711 µs
b | 42.494 | 0

By inspecting Tau I can gain insight into which method may be better for me to use in my application. I expect Tau to be near 250 µs, leading me to trust the fixed-B method over the fitted-B method. Choosing the correct method has large implications for the value of m (which, when b is 0, is also the value of the curve at time 0).

Markdown source code last modified on April 29th, 2021
---
title: Exponential Fit with Python
date: 2020-09-24 17:45:00
tags: python
---

# Exponential Fit with Python

**Fitting an exponential curve to data is a common task** and in this example we'll use Python and SciPy to determine parameters for a curve fitted to arbitrary X/Y points. You can follow along using the [fit.ipynb](fit.ipynb) Jupyter notebook.

```python
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt

xs = np.arange(12) + 7
ys = np.array([304.08994, 229.13878, 173.71886, 135.75499,
               111.096794, 94.25109, 81.55578, 71.30187, 
               62.146603, 54.212032, 49.20715, 46.765743])

plt.plot(xs, ys, '.')
plt.title("Original Data")
```

<div class="text-center">

![](original.png)

</div>

**To fit an arbitrary curve** we must first define it as a function. We can then call `scipy.optimize.curve_fit` which will tweak the arguments (using arguments we provide as the starting parameters) to best fit the data. In this example we will use a single [exponential decay](https://en.wikipedia.org/wiki/Exponential_decay) function. 

```python
def monoExp(x, m, t, b):
    return m * np.exp(-t * x) + b
```

**In biology / electrophysiology _biexponential_ functions are often used** to separate fast and slow components of exponential decay which may be caused by different mechanisms and occur at different rates. In this example we will only fit the data to a function with a single exponential component (a _monoexponential_ function), but the idea is the same.

```python
# perform the fit
p0 = (2000, .1, 50) # start with values near those we expect
params, cv = scipy.optimize.curve_fit(monoExp, xs, ys, p0)
m, t, b = params
sampleRate = 20_000 # Hz
tauSec = (1 / t) / sampleRate

# determine quality of the fit
squaredDiffs = np.square(ys - monoExp(xs, m, t, b))
squaredDiffsFromMean = np.square(ys - np.mean(ys))
rSquared = 1 - np.sum(squaredDiffs) / np.sum(squaredDiffsFromMean)
print(f"R² = {rSquared}")

# plot the results
plt.plot(xs, ys, '.', label="data")
plt.plot(xs, monoExp(xs, m, t, b), '--', label="fitted")
plt.title("Fitted Exponential Curve")

# inspect the parameters
print(f"Y = {m} * e^(-{t} * x) + {b}")
print(f"Tau = {tauSec * 1e6} µs")
```

<div class="text-center">

![](fitted.png)

</div>

```
Y = 2666.499 * e^(-0.332 * x) + 42.494
Tau = 150.422 µs
R² = 0.999107330342064
```
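The time constant reported above follows from the fitted rate `t`. Assuming the X values are sample indices, `1 / t` is the time constant in samples and dividing by the sample rate converts it to seconds. Here is a small worked sketch using the rounded value of `t` printed above, so the result differs slightly from the 150.422 µs computed from the unrounded parameter. The helper name `tauMicroseconds` is hypothetical.

```python
def tauMicroseconds(t, sampleRate):
    """Convert a decay rate (per sample) into a time constant in microseconds."""
    tauSamples = 1 / t                # time constant in units of samples
    tauSec = tauSamples / sampleRate  # samples -> seconds
    return tauSec * 1e6               # seconds -> microseconds


print(f"{tauMicroseconds(0.332, 20_000):.1f} µs")  # prints 150.6 µs
```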

## Extrapolating the Fitted Curve

**We can use the calculated parameters to extend this curve** to any position by passing X values of interest into the function we used during the fit. 

**The value at time 0** is simply `m + b` because the exponential component becomes e^(0) which is 1.

```python
xs2 = np.arange(25)
ys2 = monoExp(xs2, m, t, b)

plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.title("Extrapolated Exponential Curve")
```

<div class="text-center">

![](fitted2.png)

</div>
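As a quick check of the statement about time 0, evaluating the function at `x = 0` with the rounded parameters reported earlier should return `m + b`:

```python
import numpy as np


def monoExp(x, m, t, b):
    return m * np.exp(-t * x) + b


# rounded parameters from the fit shown earlier
m, t, b = 2666.499, 0.332, 42.494

valueAtZero = monoExp(0, m, t, b)
print(valueAtZero)  # approximately 2708.993 (m + b)
```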

## Constraining the Infinite Decay Value

**What if we know our data decays to 0?** In that case it is better not to fit an exponential decay function that lets the `b` component float freely. Indeed, our fit from earlier calculated the ideal `b` to be `42.494` but what if we know it should be `0`? The solution is to fit using an exponential function where `b` is constrained to 0 (or whatever value you know it to be).

```python
def monoExpZeroB(x, m, t):
    return m * np.exp(-t * x)

# perform the fit using the function where B is 0
p0 = (2000, .1) # start with values near those we expect
paramsB, cv = scipy.optimize.curve_fit(monoExpZeroB, xs, ys, p0)
mB, tB = paramsB
sampleRate = 20_000 # Hz
tauSec = (1 / tB) / sampleRate

# inspect the results
print(f"Y = {mB} * e^(-{tB} * x)")
print(f"Tau = {tauSec * 1e6} µs")

# compare this curve to the original
ys2B = monoExpZeroB(xs2, mB, tB)
plt.plot(xs, ys, '.', label="data")
plt.plot(xs2, ys2, '--', label="fitted")
plt.plot(xs2, ys2B, '--', label="zero B")
```

```
Y = 1245.580 * e^(-0.210 * x)
Tau = 237.711 µs
```

<div class="text-center">

![](fits.png)

</div>
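If the known baseline is not zero, the same idea applies: wrap the three-parameter function so `b` is held at a fixed value while `m` and `t` are fitted. The sketch below uses noise-free synthetic data (not the data from this post) and a hypothetical helper named `makeMonoExpFixedB`, so the fit should recover the parameters used to generate the data.

```python
import numpy as np
import scipy.optimize


def monoExp(x, m, t, b):
    return m * np.exp(-t * x) + b


def makeMonoExpFixedB(b):
    """Return a two-parameter function with b held constant."""
    def monoExpFixedB(x, m, t):
        return monoExp(x, m, t, b)
    return monoExpFixedB


# synthetic, noise-free data with a known baseline of 10
xs = np.arange(12) + 7
ys = monoExp(xs, 1500, 0.25, 10)

p0 = (2000, .1)  # start with values near those we expect
params, cv = scipy.optimize.curve_fit(makeMonoExpFixedB(10), xs, ys, p0)
mFit, tFit = params
print(f"m = {mFit:.3f}, t = {tFit:.3f}")
```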

**The curves produced are very different** at the extremes (especially when time is 0), even though they appear to both fit the data points nicely. Which curve is more accurate? That depends on your application. A hint can be gained by inspecting the time constants of these two curves.

<div class="text-center">

Parameter | Fitted B | Fixed B
---|---|---
m|2666.499|1245.580
t|0.332|0.210
Tau|150.422 µs|237.711 µs
b|42.494|0

</div>

**By inspecting Tau** I can gain insight into which method may be better for me to use in my application. I expect Tau to be near 250 µs, leading me to trust the fixed-B method over the fitted-B method. Choosing the correct method has large implications for the value of `m` (which, when `b` is 0, is also the value of the curve at time 0).
September 23rd, 2020

Signal Filtering in Python

Over a decade ago I posted code demonstrating how to filter data in Python, but there have been many improvements since then. My original posts (1, 2, 3, 4) required creating discrete filtering functions, but modern approaches can leverage NumPy and SciPy to do this more easily and efficiently. In this article we will use scipy.signal.filtfilt to apply low-pass, high-pass, and band-pass filters to reduce noise in an ECG signal (stored in ecg.wav, created as part of my Sound Card ECG project).

Moving-window filtering methods often result in a filtered signal that lags behind the original data (a phase shift). By filtering the signal twice in opposite directions, filtfilt cancels out this phase shift to produce a filtered signal which is nicely aligned with the input data.

import scipy.io.wavfile
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt

# read ECG data from the WAV file
sampleRate, data = scipy.io.wavfile.read('ecg.wav')
times = np.arange(len(data))/sampleRate

# apply a 3-pole lowpass filter at 0.1x Nyquist frequency
b, a = scipy.signal.butter(3, 0.1)
filtered = scipy.signal.filtfilt(b, a, data)

# plot the original data next to the filtered data

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.plot(times, data)
plt.title("ECG Signal with Noise")
plt.margins(0, .05)

plt.subplot(122)
plt.plot(times, filtered)
plt.title("Filtered ECG Signal")
plt.margins(0, .05)

plt.tight_layout()
plt.show()

Cutoff Frequency

The second argument passed into the butter method customizes the cut-off frequency of the Butterworth filter. This value (Wn) is a number between 0 and 1 representing the fraction of the Nyquist frequency to use for the filter. Note that Nyquist frequency is half of the sample rate. As this fraction increases, the cutoff frequency increases. You can get fancy and express this value as 2 * Hz / sample rate.

plt.plot(data, '.-', alpha=.5, label="data")

for cutoff in [.03, .05, .1]:
    b, a = scipy.signal.butter(3, cutoff)
    filtered = scipy.signal.filtfilt(b, a, data)
    label = f"{int(cutoff*100):d}%"
    plt.plot(filtered, label=label)

plt.legend()
plt.axis([350, 500, None, None])
plt.title("Effect of Different Cutoff Values")
plt.show()

Improve Edges with Gustafsson’s Method

Something weird happens at the edges. There's not enough data "off the page" to know how to smooth those points, so what should be done?

Padding is the default behavior, where each edge is extended with extra points derived from the edge data (SciPy's default is an odd extension about the end point) and the trace is smoothed as if those points existed. The drawback of this is that one stray data point at the edge will greatly affect the shape of your smoothed data.

Gustafsson’s Method may be superior to padding. The advantage of this method is that stray points at the edges do not greatly influence the smoothed curve at the edges. This technique is described in a 1994 paper by Fredrik Gustafsson. "Initial conditions are chosen for the forward and backward passes so that the forward-backward filter gives the same result as the backward-forward filter." Interestingly this paper demonstrates the method by filtering noise out of an EKG recording.

# A small portion of data will be inspected for demonstration
segment = data[350:400]

filtered = scipy.signal.filtfilt(b, a, segment)
filteredGust = scipy.signal.filtfilt(b, a, segment, method="gust")

plt.plot(segment, '.-', alpha=.5, label="data")
plt.plot(filtered, 'k--', label="padded")
plt.plot(filteredGust, 'k', label="Gustafsson")
plt.legend()
plt.title("Padded Data vs. Gustafsson’s Method")
plt.show()

Band-Pass Filter

Low-pass and high-pass filters can be selected simply by customizing the third argument passed into butter. The second argument indicates the cutoff frequency (as a fraction of the Nyquist frequency, which is half the sample rate). Passing a list of two values as the second argument allows for band-pass filtering of a signal.

b, a = scipy.signal.butter(3, 0.05, 'lowpass')
filteredLowPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, 0.05, 'highpass')
filteredHighPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, [.01, .05], 'band')
filteredBandPass = scipy.signal.lfilter(b, a, data)

Filter using Convolution

Another way to low-pass a signal is to use convolution. In this method you create a window (typically a bell-shaped curve) and convolve the window with the signal. The wider the window is the smoother the output signal will be. Also, the window must be normalized so its sum is 1 to preserve the amplitude of the input signal.

There are different ways to handle what happens to data points at the edges (see numpy.convolve for details), but setting mode to valid deletes these points to produce an output signal slightly shorter than the input signal.

# create a normalized Hanning window
windowSize = 40
window = np.hanning(windowSize)
window = window / window.sum()

# filter the data using convolution
filtered = np.convolve(window, data, mode='valid')

plt.subplot(131)
plt.plot(window)
plt.title("Window")

plt.subplot(132)
plt.plot(data)
plt.title("Data")

plt.subplot(133)
plt.plot(filtered)
plt.title("Filtered")

Different window functions filter the signal in different ways. Hanning windows are typically preferred because they have a mostly Gaussian shape but touch zero at the edges. For a discussion of the pros and cons of different window functions for spectral analysis using the FFT, see my notes on FftSharp.

Resources

Markdown source code last modified on January 18th, 2021
---
title: Signal Filtering in Python
date: 2020-09-23 21:46:00
tags: python
---

# Signal Filtering in Python

**Over a decade ago I posted code demonstrating how to filter data in Python, but there have been many improvements since then.** My original posts ([1](https://swharden.com/blog/2008-11-17-linear-data-smoothing-in-python/), [2](https://swharden.com/blog/2009-01-21-signal-filtering-with-python/), [3](https://swharden.com/blog/2010-06-20-smoothing-window-data-averaging-in-python-moving-triangle-tecnique/), [4](https://swharden.com/blog/2010-06-24-detrending-data-in-python-with-numpy/)) required creating discrete filtering functions, but modern approaches can leverage NumPy and SciPy to do this more easily and efficiently. In this article we will use [`scipy.signal.filtfilt`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html) to apply low-pass, high-pass, and band-pass filters to reduce noise in an ECG signal (stored in [ecg.wav](ecg.wav), created as part of my [Sound Card ECG](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/) project).

<div class="text-center">

![](signal-lowpass-filter.png)

</div>

Moving-window filtering methods often result in a filtered signal that lags behind the original data (a _phase shift_). By filtering the signal twice in opposite directions, `filtfilt` cancels out this phase shift to produce a filtered signal which is nicely aligned with the input data.

```python
import scipy.io.wavfile
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt

# read ECG data from the WAV file
sampleRate, data = scipy.io.wavfile.read('ecg.wav')
times = np.arange(len(data))/sampleRate

# apply a 3-pole lowpass filter at 0.1x Nyquist frequency
b, a = scipy.signal.butter(3, 0.1)
filtered = scipy.signal.filtfilt(b, a, data)
```

<div class="text-center">

![](signal-lowpass-ecg.png)

</div>

```python
# plot the original data next to the filtered data

plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.plot(times, data)
plt.title("ECG Signal with Noise")
plt.margins(0, .05)

plt.subplot(122)
plt.plot(times, filtered)
plt.title("Filtered ECG Signal")
plt.margins(0, .05)

plt.tight_layout()
plt.show()
```
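The phase shift described above can be illustrated with a synthetic pulse (not the ECG data). In this sketch a Gaussian-shaped pulse centered on sample 500 is filtered once with `scipy.signal.lfilter` (a single forward pass) and once with `filtfilt`. The position of the peak in each output shows whether the result lags the input.

```python
import numpy as np
import scipy.signal

# synthetic pulse centered at sample 500
xs = np.arange(1000)
pulse = np.exp(-0.5 * ((xs - 500) / 20) ** 2)

b, a = scipy.signal.butter(3, 0.05)
forwardOnly = scipy.signal.lfilter(b, a, pulse)       # single pass (lags)
forwardBackward = scipy.signal.filtfilt(b, a, pulse)  # two passes (aligned)

print("input peak:   ", np.argmax(pulse))
print("lfilter peak: ", np.argmax(forwardOnly))
print("filtfilt peak:", np.argmax(forwardBackward))
```

The single-pass peak should land several samples after sample 500, while the `filtfilt` peak should stay at (or within a sample of) the input peak.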

## Cutoff Frequency

The second argument passed into the `butter` method customizes the cut-off frequency of the Butterworth filter. This value (Wn) is a number between 0 and 1 representing the _fraction of the Nyquist frequency_ to use for the filter. Note that [Nyquist frequency](https://en.wikipedia.org/wiki/Nyquist_frequency) is half of the sample rate. As this fraction increases, the cutoff frequency increases. You can get fancy and express this value as 2 * Hz / sample rate.

```python
plt.plot(data, '.-', alpha=.5, label="data")

for cutoff in [.03, .05, .1]:
    b, a = scipy.signal.butter(3, cutoff)
    filtered = scipy.signal.filtfilt(b, a, data)
    label = f"{int(cutoff*100):d}%"
    plt.plot(filtered, label=label)
    
plt.legend()
plt.axis([350, 500, None, None])
plt.title("Effect of Different Cutoff Values")
plt.show()
```

<div class="text-center">

![](signal-lowpass-cutoff.png)

</div>
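The conversion from Hz mentioned above can be wrapped in a small helper. This is a sketch with a hypothetical function name; recent versions of SciPy also accept an `fs` argument to `butter` so the cutoff can be given directly in Hz.

```python
def normalizedCutoff(cutoffHz, sampleRate):
    """Express a cutoff frequency in Hz as a fraction of the Nyquist frequency."""
    nyquist = sampleRate / 2
    wn = cutoffHz / nyquist  # same as 2 * cutoffHz / sampleRate
    if not 0 < wn < 1:
        raise ValueError("cutoff must be between 0 Hz and the Nyquist frequency")
    return wn


# a 50 Hz cutoff for a signal sampled at 1000 Hz
print(normalizedCutoff(50, 1000))  # 0.1
```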

## Improve Edges with Gustafsson’s Method

Something weird happens at the edges. There's not enough data "off the page" to know how to smooth those points, so what should be done? 

**Padding is the default behavior,** where each edge is extended with extra points derived from the edge data (SciPy's default is an odd extension about the end point) and the trace is smoothed as if those points existed. The drawback of this is that one stray data point at the edge will greatly affect the shape of your smoothed data.

**Gustafsson’s Method may be superior to padding.** The advantage of this method is that stray points at the edges do not greatly influence the smoothed curve at the edges. This technique is described in [a 1994 paper by Fredrik Gustafsson](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=492552). "Initial conditions are chosen for the forward and backward passes so that the forward-backward filter gives the same result as the backward-forward filter." Interestingly this paper demonstrates the method by filtering noise out of an EKG recording.

```python
# A small portion of data will be inspected for demonstration
segment = data[350:400]

filtered = scipy.signal.filtfilt(b, a, segment)
filteredGust = scipy.signal.filtfilt(b, a, segment, method="gust")

plt.plot(segment, '.-', alpha=.5, label="data")
plt.plot(filtered, 'k--', label="padded")
plt.plot(filteredGust, 'k', label="Gustafsson")
plt.legend()
plt.title("Padded Data vs. Gustafsson’s Method")
plt.show()
```

<div class="text-center">

![](signal-method-gust.png)

</div>

## Band-Pass Filter

Low-pass and high-pass filters can be selected simply by customizing the third argument passed into `butter`. The second argument indicates the cutoff frequency (as a fraction of the Nyquist frequency, which is half the sample rate). Passing a list of two values as the second argument allows for band-pass filtering of a signal.

```python
b, a = scipy.signal.butter(3, 0.05, 'lowpass')
filteredLowPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, 0.05, 'highpass')
filteredHighPass = scipy.signal.filtfilt(b, a, data)

b, a = scipy.signal.butter(3, [.01, .05], 'band')
filteredBandPass = scipy.signal.lfilter(b, a, data)
```

<div class="text-center">

![](signal-lowpass-highpass-bandpass.png)

</div>

## Filter using Convolution

**Another way to low-pass a signal is to use convolution.** In this method you create a window (typically a bell-shaped curve) and _convolve_ the window with the signal. The wider the window is the smoother the output signal will be. Also, the window must be normalized so its sum is 1 to preserve the amplitude of the input signal.

There are different ways to handle what happens to data points at the edges (see [`numpy.convolve`](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html) for details), but setting `mode` to `valid` deletes these points to produce an output signal slightly shorter than the input signal.

```python
# create a normalized Hanning window
windowSize = 40
window = np.hanning(windowSize)
window = window / window.sum()

# filter the data using convolution
filtered = np.convolve(window, data, mode='valid')
```

<div class="text-center">

![](signal-convolution-filter.png)

</div>

```python
plt.subplot(131)
plt.plot(window)
plt.title("Window")

plt.subplot(132)
plt.plot(data)
plt.title("Data")

plt.subplot(133)
plt.plot(filtered)
plt.title("Filtered")
```
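Two properties of this method can be checked with a synthetic signal: a normalized window preserves amplitude, and `mode='valid'` shortens the output by `windowSize - 1` points. A minimal sketch using a constant signal rather than the ECG data:

```python
import numpy as np

windowSize = 40
window = np.hanning(windowSize)
window = window / window.sum()  # normalize so the sum is 1

signal = np.full(100, 5.0)  # constant signal with amplitude 5
filtered = np.convolve(window, signal, mode='valid')

print(len(signal), len(filtered))  # 100 61
print(np.allclose(filtered, 5.0))  # True
```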

**Different window functions filter the signal in different ways.** Hanning windows are typically preferred because they have a mostly Gaussian shape but touch zero at the edges. For a discussion of the pros and cons of different window functions for spectral analysis using the FFT, see my notes on [FftSharp](https://github.com/swharden/FftSharp).

## Resources

* Sample data: [ecg.wav](ecg.wav)

* [Sound Card ECG](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/)

* Jupyter notebook for this page: [signal-filtering.ipynb](signal-filtering.ipynb)

* SciPy Cookbook: [Filtfilt](https://scipy-cookbook.readthedocs.io/items/FiltFilt.html), [Butterworth Bandpass Filter](https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html)

* SciPy Documentation: [scipy.signal.filtfilt](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html), [scipy.signal.butter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html)

* Numpy Documentation: [numpy.convolve](https://numpy.org/doc/stable/reference/generated/numpy.convolve.html)

* [Savitzky Golay Filtering](https://scipy-cookbook.readthedocs.io/items/SavitzkyGolay.html) - The Savitzky Golay filter is a particular type of low-pass filter, well adapted for data smoothing.
August 20th, 2017

Microcontroller Action Potential Generator

Here I demonstrate how to use a single microcontroller pin to generate action-potential-like waveforms. The output is similar to my fully analog action potential generator circuit, but the waveform here is created in an entirely different way. A microcontroller is at the core of this project and determines when to fire action potentials. Taking advantage of the pseudo-random number generator (rand() in AVR-GCC's stdlib.h), I am able to easily produce unevenly-spaced action potentials which more accurately reflect those observed in nature. This circuit has a potentiometer to adjust the action potential frequency (probability) and another to adjust the amount of overshoot (afterhyperpolarization, AHP). I created this project because I wanted to practice designing various types of action potential measurement circuits, so creating an action potential generating circuit was an obvious prerequisite.

The core of this circuit is a capacitor which is charged and discharged by toggling a microcontroller pin between high, low, and high-Z states. In the high state (pin configured as output, clamped at 5V) the capacitor charges through a series resistor as the pin sources current. In the low state (pin configured as output, clamped at 0V) the capacitor discharges through a series resistor as the pin sinks current. In the high-Z / high impedance state (pin configured as an input and little current flows through it), the capacitor rests. By spending most of the time in high-Z then rapidly cycling through high/low states, triangular waveforms can be created with rapid rise/fall times. Amplifying this transient and applying a low-pass filter using a single operational amplifier stage of an LM-358 shapes this transient into something which resembles an action potential. Wikipedia has a section describing how to use an op-amp to design an active low-pass filter like the one used here.

The code to generate the digital waveform is very straightforward. I'm using PB4 to charge/discharge the capacitor, so the code which actually fires an action potential is as follows:

// rising part = charging the capacitor
DDRB|=(1<<PB4); // make output (low Z)
PORTB|=(1<<PB4); // make high (5V, source current)
_delay_ms(2); // 2ms rise time

// falling part
DDRB|=(1<<PB4); // make output (low Z)
PORTB&=~(1<<PB4); // make low (0V, sink current)
_delay_ms(2); // 2ms fall time
_delay_us(150); // extra fall time for AHP

// return to rest state
DDRB&=~(1<<PB4); // make input (high Z)

I programmed the microcontroller after it was soldered into the device, using test clips attached to my ICSP programmer (USBtinyISP). I only recently started using test clips, and for one-off projects like this it's so much easier than adding header sockets or even wiring up header pins.

I am very pleased with how well this project turned out! I now have an easy way to make irregularly-spaced action potentials, and have a great starting point for future projects aimed at measuring action potential features using analog circuitry.

Notes

  • Action potential half-width (relating to the speed of the action potential) could be adjusted in software by changing the time spent charging and discharging the capacitor. A user control for this was not built into the circuit shown here, but it would be very easy to allow a user to switch between regular and fast-spiking action potential waveforms.
  • I am happy that using the 1N4148 diode on the positive input of the op-amp works, but using two 100k resistors (forming a voltage divider floating around 2.5V) at the input and reducing the gain of this stage may have produced a more reliable result.
  • Action potential frequency (probability) is currently set by reading the analog voltage output by a rail-to-rail potentiometer. However, if you sensed a noisy line instead (simulating random excitatory and inhibitory synaptic input), you could easily make an integrate-and-fire model neuron which fires in response to excitatory input.
  • Discussion of how this "model neuron" relates to other models (e.g., Hodgkin–Huxley) is in the previous post.
  • Something like this would make an interesting science fair project.

Source Code on GitHub

Markdown source code last modified on January 18th, 2021
---
title: Microcontroller Action Potential Generator
date: 2017-08-20 15:12:33
tags: science, circuit, microcontroller, python
---

# Microcontroller Action Potential Generator

__Here I demonstrate how to use a _single_ microcontroller pin to generate action-potential-like waveforms.__ The output is similar to that of [my fully analog action potential generator circuit](https://www.swharden.com/wp/2017-08-12-analog-action-potential-generator-circuit/), but the waveform here is created in an entirely different way. A microcontroller is at the core of this project and determines when to fire action potentials. Taking advantage of the pseudo-random number generator ([rand() in AVR-GCC's stdlib.h](http://www.nongnu.org/avr-libc/user-manual/group__avr__stdlib.html#gae23144bcbb8e3742b00eb687c36654d1)), I am able to easily produce unevenly-spaced action potentials which more accurately reflect those observed in nature. This circuit has a potentiometer to adjust the action potential frequency (probability) and another to adjust the amount of overshoot (afterhyperpolarization, AHP). I created this project because I wanted to practice designing various types of action potential _measurement_ circuits, so creating an action potential _generating_ circuit was an obvious prerequisite.

![](https://www.youtube.com/embed/2s8t3UsONFs)

__The core of this circuit is a capacitor which is charged and discharged by toggling a microcontroller pin between high, low, and high-Z states.__ In the high state (pin configured as output, clamped at 5V) the capacitor charges through a series resistor as the pin sources current. In the low state (pin configured as output, clamped at 0V) the capacitor discharges through a series resistor as the pin sinks current. In the high-Z / high impedance state (pin configured as an _input_ and little current flows through it), the capacitor rests. By spending most of the time in high-Z then rapidly cycling through high/low states, triangular waveforms can be created with rapid rise/fall times. Amplifying this transient and applying a low-pass filter using a single operational amplifier stage of an [LM-358](http://www.ti.com/lit/ds/symlink/lm158-n.pdf) shapes this transient into something which resembles an action potential. Wikipedia has a section describing how to [use an op-amp to design an active low-pass filter](https://en.wikipedia.org/wiki/Low-pass_filter#Active_electronic_realization) like the one used here.
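
The charge/discharge behavior described above can be approximated numerically. The following is a minimal sketch in plain C, meant to run on a desktop rather than on the microcontroller, which steps a first-order RC model through the high, low, and high-Z states. The names `pin_state_t`, `cap_step()`, and `simulate_spike()` are hypothetical, and the resting voltage, supply voltage, and time constant passed in are illustrative assumptions, not values measured from this circuit. The high-Z state is idealized as simply holding the capacitor voltage.

```c
/* The three pin states described in the text */
typedef enum { PIN_HIGH, PIN_LOW, PIN_HIGH_Z } pin_state_t;

/* Advance the capacitor voltage by one small time step (forward Euler).
 * vcc and tau_ms (R*C) are assumed values chosen for illustration. */
double cap_step(double v, pin_state_t state, double dt_ms,
                double vcc, double tau_ms)
{
    double target;
    switch (state) {
    case PIN_HIGH: target = vcc; break; /* pin sources current: charge */
    case PIN_LOW:  target = 0.0; break; /* pin sinks current: discharge */
    default:       return v;            /* high-Z: idealized, voltage holds */
    }
    return v + (target - v) * (dt_ms / tau_ms);
}

/* Simulate one spike: 2 ms high, 2.15 ms low, then high-Z.
 * Stores the voltage at the end of the rising phase in *peak
 * and returns the voltage after the falling phase. */
double simulate_spike(double v_rest, double vcc, double tau_ms, double *peak)
{
    const double dt_ms = 0.01; /* 10 us simulation step */
    double v = v_rest;
    int i;

    for (i = 0; i < 200; i++) /* 200 steps = 2 ms rise */
        v = cap_step(v, PIN_HIGH, dt_ms, vcc, tau_ms);
    if (peak)
        *peak = v;
    for (i = 0; i < 215; i++) /* 215 steps = 2 ms + 150 us fall */
        v = cap_step(v, PIN_LOW, dt_ms, vcc, tau_ms);
    return cap_step(v, PIN_HIGH_Z, dt_ms, vcc, tau_ms); /* back to rest state */
}
```

With an assumed 2.5 V starting level, 5 V supply, and 4 ms time constant, the simulated voltage rises above the starting level during the high phase and ends below it after the low phase. Lengthening the low phase pushes the endpoint lower, which is consistent with the "extra fall time for AHP" delay in the firmware.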

<div class="text-center img-border">

[![](action-potential-generator-circuit_thumb.jpg)](action-potential-generator-circuit.jpg)

</div>

__The code to generate the digital waveform is very straightforward.__ I'm using PB4 to charge/discharge the capacitor, so the code which actually fires an action potential is as follows:

```c
// rising part = charging the capacitor
DDRB|=(1<<PB4); // make output (low Z)
PORTB|=(1<<PB4); // make high (5V, source current)
_delay_ms(2); // 2ms rise time

// falling part
DDRB|=(1<<PB4); // make output (low Z)
PORTB&=~(1<<PB4); // make low (0V, sink current)
_delay_ms(2); // 2ms fall time
_delay_us(150); // extra fall time for AHP

// return to rest state
DDRB&=~(1<<PB4); // make input (high Z)
```
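
The snippet above only covers firing a single action potential. One way the decision of _when_ to fire could be made is sketched below: on each tick, a `rand()` result is compared against the potentiometer reading, so a higher reading means a higher firing probability. This is an assumption about the approach, not the code from the repository; `should_fire()` and the 10-bit `pot_reading` range (0 to 1023) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return 1 if an action potential should fire on this tick, else 0.
 * pot_reading is assumed to be a 10-bit ADC value (0..1023);
 * larger readings give a proportionally higher firing probability. */
int should_fire(uint16_t pot_reading)
{
    /* Reduce rand() to the same 0..1023 range as the ADC reading */
    uint16_t roll = (uint16_t)(rand() % 1024);
    return roll < pot_reading;
}
```

On the microcontroller a function like this would be called once per loop iteration, and the firing sequence shown above would run only when it returns 1. Because each tick is an independent random draw, the intervals between spikes come out uneven.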

__Programming the microcontroller__ was done after it was soldered into the device, using test clips attached to my ICSP programmer ([USBtinyISP](https://www.ebay.com/sch/i.html?&_nkw=USBtinyISP)). I only recently started using test clips, and for one-off projects like this it's so much easier than adding header sockets or even wiring up header pins.

<div class="text-center img-border">

[![](ap-generator-programmer_thumb.jpg)](ap-generator-programmer.jpg)
[![](ap-generator-programmer-close_thumb.jpg)](ap-generator-programmer-close.jpg)

</div>

__I am very pleased with how well this project turned out!__ I now have an easy way to make irregularly-spaced action potentials, and have a great starting point for future projects aimed at _measuring_ action potential features using analog circuitry.


<div class="text-center img-border">

[![](ap-generator-running_thumb.jpg)](ap-generator-running.jpg)
[![](ap-generator-running-2_thumb.jpg)](ap-generator-running-2.jpg)

</div>

### Notes

*   Action potential half-width (relating to the speed of the action potential) could be adjusted _in software_ by changing the time spent charging and discharging the capacitor. A user control for this was not built into the circuit shown here, but it would be very easy to allow a user to switch between regular and fast-spiking action potential waveforms.
*   I am happy that using the 1N4148 diode on the positive input of the op-amp works, but using two 100k resistors (forming a voltage divider floating around 2.5V) at the input and reducing the gain of this stage may have produced a more reliable result.
*   Action potential frequency (probability) is currently set by reading the analog voltage output by a rail-to-rail potentiometer. However, if you sensed a noisy line instead (simulating random excitatory and inhibitory synaptic input), you could easily make an integrate-and-fire model neuron which fires in response to excitatory input.
*   Discussion of how this "model neuron" relates to other models (e.g., Hodgkin–Huxley) is in the [previous post](https://www.swharden.com/wp/2017-08-12-analog-action-potential-generator-circuit/).
*   Something like this would make an interesting science fair project.
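
The integrate-and-fire idea mentioned in the notes can be sketched in a few lines of C. This is a generic leaky integrate-and-fire update, not code from this project; the `lif_t` struct, the `lif_step()` function, and the threshold and leak values are hypothetical. Positive inputs stand in for excitatory synaptic input and negative inputs for inhibitory input.

```c
/* Hypothetical leaky integrate-and-fire state */
typedef struct {
    double v;         /* accumulated "membrane" value */
    double threshold; /* fire when v reaches this level */
    double leak;      /* fraction of v retained each step (0..1) */
} lif_t;

/* Integrate one input sample; return 1 if the neuron fires, else 0. */
int lif_step(lif_t *n, double input)
{
    n->v = n->v * n->leak + input; /* decay, then add new input */
    if (n->v >= n->threshold) {
        n->v = 0.0;                /* reset after firing */
        return 1;
    }
    return 0;
}
```

On the microcontroller, `input` would come from the sampled noisy line, and a return value of 1 would trigger the same pin-toggling sequence used to fire an action potential.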

### Source Code on GitHub

*   <https://github.com/swharden/AVR-projects/>

*   <https://github.com/swharden/AVR-projects/tree/master/ATTiny85%202017-08-19%20action%20potential%20generator>