The personal website of Scott W Harden

Analyzing my Writings with Python

I had some free time in lab today (between steps while of an immunohistochemistry experiment) so I decided to further investigate the field of bioinformatics. I was curious if it may be worth seeking a PhD in bioinformatics if I don't get into dental school. UCF offers a PhD in bioinformatics but it's a new and small department (I think there are only 4 faculty). A degree in bioinformatics combines molecular biology (DNA, proteins, etc), computer science (programming), and statistics.

I came across a paper today: Structural Alignment of Pseudoknotted RNA which is a good example of the practice of bioinformatics. Think about what goes on in a cell... the sequence of a gene (a short region of DNA) is copied (letter-by-letter) onto an RNA molecule. The RNA molecule is later read by an enzyme (called a ribosome) and converted into a protein based on its sequence. (This process is the central dogma of molecular biology) Traditionally, it was believed that RNA molecules' only function was to copy gene sequences from DNA to ribosomes, but recently (the last several years) it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually turn genes on and off, and have been associated with cancer development and other immune diseases. Given the human genome (the ~3 billion letter long sequence all of our DNA), how can we determine what regions form these functional RNA molecules which don't get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict its probability of forming small RNA molecules. Spikes in this trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between normal person and someone with cancer?)

After reading the article I considered how similar my current programming projects are with this one (e.g., my recent DIY ECG projects). The paper shows a trace of score (likelihood that a region of DNA forms an RNA molecule) where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you're golden. How similar is this to the work I've been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?

I got the itch to write my own string-analysis program. It reads the content of my website (exported from a SQL query), splits it up by date, and analyzes it. Ultimately I want to track the usage of certain words, but for now I wrote a script which plots the number of words I wrote.

Pretty cool huh? Check out all those spikes between 2004 and 2005! Not only are they numerous (meaning many posts), but they're also high (meaning many words per post). As you can see by the top trace, the most significant contribution to my site occurred during this time. So, let's zoom in on it.

Here is the code I used to produce this image.

"""Convert SQL backups of my WordPress blog into charts"""

import datetime, pylab, numpy

class blogChrono():
    baseUrl="http://www.SWHarden.com/blog"
    posts=[]
    dates=[]
    def __init__(self,fname):
        self.fname=fname
        self.load()
    def load(self):
        print "loading [%s]..."%self.fname,
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            if "INSERT INTO" in line
            and';' in line[-2:-1]
            and " 'post'," in line[-20:-1]:
                post={}
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])>len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    raw=str(line)
                    raw=raw.replace(line[4],x)
                    line=eval(raw)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        self.dates.sort()
        d=self.dates[:]
        i,newposts=0,[]
        while len(self.posts)>0:
            die=min(self.dates)
            for post in self.posts:
                if post["date"]==die:
                    self.dates.remove(die)
                    newposts.append(post)
                    self.posts.remove(post)
        self.posts,self.dates=newposts,d
        print "read %d posts!n"%len(self.posts)

#d=blogChrono('sml.sql')
d=blogChrono('test.sql')

fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1])
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1])
ltot,wtot=ltot[1:],wtot[1:]

pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()
pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()
pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675
pylab.savefig('out.png',dpi=675/7)
pylab.show()

print "DONE"

I wrote a Python script to analyze the word frequency of the blogs in my website (extracted from an SQL query WordPress backup file) for frequency, then I took the list to Wordie and created a word jumble. Neat, huh?

import datetime, pylab, numpy
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))
words={}
for word in body:
    for i in word:
        if 65< =ord(i)<=90 or 97<=ord(i)<=122: pass
        else: word=None
    if word:
        if not word in words:words[word]=0
        words[word]=words[word]+1
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse()
out= "<b>Out of %d words...n"%tot
xs=[]
for i in range(1000):
    d=data[i]
    out += '<b>"%s"</b> ranks #%d used <b>%d</b> times (%.05f%%)
n'%
                (d[1],i+1,d[0],d[0]/tot)
f=open("dump.html",'w')
f.write(out)
f.close()
print "DONE"</b>

Markdown source code last modified on January 18th, 2021
---
title: Analyzing my Writings with Python
date: 2009-01-29 18:46:22
tags: python, old
---

# Analyzing my Writings with Python

__I had some free time in lab today__ (between steps while of an immunohistochemistry experiment) so I decided to further investigate the field of bioinformatics. I was curious if it may be worth seeking a PhD in bioinformatics if I don't get into dental school. UCF offers a PhD in bioinformatics but it's a new and small department (I think there are only 4 faculty). A degree in bioinformatics combines molecular biology (DNA, proteins, etc), computer science (programming), and statistics.

__I came across a paper today__: [Structural Alignment of Pseudoknotted RNA](http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf) which is a good example of the practice of bioinformatics. Think about what goes on in a cell... the sequence of a gene (a short region of DNA) is copied (letter-by-letter) onto an RNA molecule. The RNA molecule is later read by an enzyme (called a ribosome) and converted into a protein based on its sequence. (This process is the central dogma of molecular biology) Traditionally, it was believed that RNA molecules' only function was to copy gene sequences from DNA to ribosomes, but recently (the last several years) it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually turn genes on and off, and have been associated with cancer development and other immune diseases. Given the human genome (the ~3 billion letter long sequence all of our DNA), how can we determine what regions form these functional RNA molecules which don't get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict its probability of forming small RNA molecules. Spikes in this trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between normal person and someone with cancer?)

__After reading the article__ I considered how similar my current programming projects are with this one (e.g., my recent DIY ECG projects). The paper shows a trace of score (likelihood that a region of DNA forms an RNA molecule) where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you're golden. How similar is this to the work I've been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?

__I got the itch to write my own string-analysis program.__ It reads the content of my website (exported from a SQL query), splits it up by date, and analyzes it. Ultimately I want to track the usage of certain words, but for now I wrote a script which plots the number of words I wrote.

<div class="text-center">

[![](blog_words_thumb.jpg)](blog_words.png)

</div>

__Pretty cool huh? __Check out all those spikes between 2004 and 2005! Not only are they numerous (meaning many posts), but they're also high (meaning many words per post). As you can see by the top trace, the most significant contribution to my site occurred during this time. So, let's zoom in on it.

<div class="text-center">

[![](blog_words_zoom_thumb.jpg)](blog_words_zoom.png)

</div>

__Here is the code__ I used to produce this image.

```python
"""Convert SQL backups of my WordPress blog into charts"""

import datetime, pylab, numpy

class blogChrono():
    baseUrl="http://www.SWHarden.com/blog"
    posts=[]
    dates=[]
    def __init__(self,fname):
        self.fname=fname
        self.load()
    def load(self):
        print "loading [%s]..."%self.fname,
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            if "INSERT INTO" in line
            and';' in line[-2:-1]
            and " 'post'," in line[-20:-1]:
                post={}
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])&gt;len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    raw=str(line)
                    raw=raw.replace(line[4],x)
                    line=eval(raw)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        self.dates.sort()
        d=self.dates[:]
        i,newposts=0,[]
        while len(self.posts)&gt;0:
            die=min(self.dates)
            for post in self.posts:
                if post["date"]==die:
                    self.dates.remove(die)
                    newposts.append(post)
                    self.posts.remove(post)
        self.posts,self.dates=newposts,d
        print "read %d posts!n"%len(self.posts)

#d=blogChrono('sml.sql')
d=blogChrono('test.sql')

fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1])
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1])
ltot,wtot=ltot[1:],wtot[1:]

pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()
pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()
pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675
pylab.savefig('out.png',dpi=675/7)
pylab.show()

print "DONE"
```

__I wrote a Python script to analyze the word frequency __of the blogs in my website (extracted from an SQL query WordPress backup file) for frequency, then I took the list to [Wordie](http://www.wordle.net/) and created a word jumble. Neat, huh?

```python
import datetime, pylab, numpy
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))
words={}
for word in body:
    for i in word:
        if 65&lt; =ord(i)&lt;=90 or 97&lt;=ord(i)&lt;=122: pass
        else: word=None
    if word:
        if not word in words:words[word]=0
        words[word]=words[word]+1
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse()
out= "&lt;b&gt;Out of %d words...n"%tot
xs=[]
for i in range(1000):
    d=data[i]
    out += '&lt;b&gt;"%s"&lt;/b&gt; ranks #%d used &lt;b&gt;%d&lt;/b&gt; times (%.05f%%)
n'%
                (d[1],i+1,d[0],d[0]/tot)
f=open("dump.html",'w')
f.write(out)
f.close()
print "DONE"&lt;/b&gt;
```

<div class="text-center">

[![](wordie2_thumb.jpg)](wordie2.png)

</div>

Signal Filtering with Python

⚠️ SEE UPDATED POST: Signal Filtering in Python

I've been spending a lot of time creating a DIY ECGs which produce fairly noisy signals. I have researched the ways to clean-up these signals, and the results are very useful! I document some of these findings here.

This example shows how I take a noisy recording and turn it into a smooth trace. This is achieved by eliminating excess high-frequency components which are in the original recording due to electromagnetic noise. A major source of noise can be from the AC passing through wires traveling through the walls of my apartment. My original ECG circuit was highly susceptible to this kind of interference, but my improved ECG circuit eliminates much of this noise. However, noise is still in the trace and it needs to be removed.

One method of reducing noise uses the FFT (Fast Fourier Transformation) and its inverse (iFFT) algorithm. Let's say you have a trace with repeating sine-wave noise. The output of the FFT is the breakdown of the signal by frequency. Check out this FFT trace of a noisy signal from a few posts ago. High peaks represent frequencies which are common. See the enormous peak around 60 Hz? That's noise from AC power lines. Other peaks (shown in colored bands) are other electromagnetic noise sources, such as wireless networks, TVs, telephones, and maybe my computer. The heart produces changes in electricity that are very slow (a heartbeat is about 1 Hz), so if we can eliminate higher-frequency sine waves we can get a pretty clear trace. This is called a band-stop filter (we block-out certain bands of frequencies). A band-pass filter is the opposite, where we only allow frequencies which are below (low-pass) or above (high-pass) a given frequency. By eliminating each of the peaks in the colored regions (setting each value to 0), then performing an inverse fast Fourier transformation (going backwards from frequency back to time), the result is the signal trace (seen as light gray on the bottom graph) with those high-frequency sine waves removed! (the gray trace on the bottom graph). A little touch-up smoothing makes a great trace (black trace on the bottom graph).

Here's some Python code you may find useful. The image below is the output of the Python code at the bottom of this entry. This python file requires that test.wav (an actual ECG recording of my heartbeat) exist in the same folder.

  • (A) The original signal we want to isolate. (IE: our actual heart signal)

  • (B) Some electrical noise. (3 sine waves of different amplitudes and periods)

  • (C) Electrical noise (what happens when you add those 3 sine waves together)

  • (D) Static (random noise generated by a random number generator)

  • (E) Signal (A) plus static (D)

  • (F) Signal (A) plus static (D) plus electrical noise (C)

  • (G) Total FFT trace of (F). Note the low frequency peak due to the signal and electrical noise (near 0) and the high frequency peak due to static (near 10,000)

  • (H) This is a zoomed-in region of (F) showing 4 peaks (one for the original signal and 3 for high frequency noise). By blocking-out (set it to 0) everything above 10Hz (red), we isolate the peak we want (signal). This is a low-pass filter.

  • (I) Performing an inverse FFT (iFFT) on the low-pass iFFT, we get a nice trace which is our original signal!

  • (J) Comparison of our iFFT with our original signal shows that the amplitude is kinda messed up. If we normalize each of these (set minimum to 0, maximum to 1) they line up. Awesome!

  • (K) How close were we? Graphing the difference of iFFT and the original signal shows that usually we're not far off. The ends are a problem though, but if our data analysis trims off these ends then our center looks great.

  • Note: these ends can be fixed by applying a windowing function to the original data. The FFT works best if the input data starts and ends at zero.

import numpy, scipy, pylab, random

# This script demonstrates how to use band-pass (low-pass)
# filtering to eliminate electrical noise and static
# from signal data!

##################
### PROCESSING ###
##################

xs=numpy.arange(1,100,.01) #generate Xs (0.00,0.01,0.02,0.03,...,100.0)
signal = sin1=numpy.sin(xs*.3) #(A)
sin1=numpy.sin(xs) # (B) sin1
sin2=numpy.sin(xs*2.33)*.333 # (B) sin2
sin3=numpy.sin(xs*2.77)*.777 # (B) sin3
noise=sin1+sin2+sin3 # (C)
static = (numpy.random.random_sample((len(xs)))-.5)*.2 # (D)
sigstat=static+signal # (E)
rawsignal=sigstat+noise # (F)
fft=scipy.fft(rawsignal) # (G) and (H)
bp=fft[:]
for i in range(len(bp)): # (H-red)
    if i&gt;=10:bp[i]=0
ibp=scipy.ifft(bp) # (I), (J), (K) and (L)

################
### GRAPHING ###
################

h,w=6,2
pylab.figure(figsize=(12,9))
pylab.subplots_adjust(hspace=.7)

pylab.subplot(h,w,1);pylab.title("(A) Original Signal")
pylab.plot(xs,signal)

pylab.subplot(h,w,3);pylab.title("(B) Electrical Noise Sources (3 Sine Waves)")
pylab.plot(xs,sin1,label="sin1")
pylab.plot(xs,sin2,label="sin2")
pylab.plot(xs,sin3,label="sin3")
pylab.legend()

pylab.subplot(h,w,5);pylab.title("(C) Electrical Noise (3 sine waves added together)")
pylab.plot(xs,noise)

pylab.subplot(h,w,7);pylab.title("(D) Static (random noise)")
pylab.plot(xs,static)
pylab.axis([None,None,-1,1])

pylab.subplot(h,w,9);pylab.title("(E) Signal + Static")
pylab.plot(xs,sigstat)

pylab.subplot(h,w,11);pylab.title("(F) Recording (Signal + Static + Electrical Noise)")
pylab.plot(xs,rawsignal)

pylab.subplot(h,w,2);pylab.title("(G) FFT of Recording")
fft=scipy.fft(rawsignal)
pylab.plot(abs(fft))
pylab.text(200,3000,"signals",verticalalignment='top')
pylab.text(9500,3000,"static",verticalalignment='top',
        horizontalalignment='right')

pylab.subplot(h,w,4);pylab.title("(H) Low-Pass FFT")
pylab.plot(abs(fft))
pylab.text(17,3000,"sin1",verticalalignment='top',horizontalalignment='left')
pylab.text(37,2000,"sin2",verticalalignment='top',horizontalalignment='center')
pylab.text(45,3000,"sin3",verticalalignment='top',horizontalalignment='left')
pylab.text(6,3000,"signal",verticalalignment='top',horizontalalignment='left')
pylab.axvspan(10,10000,fc='r',alpha='.5')
pylab.axis([0,60,None,None])

pylab.subplot(h,w,6);pylab.title("(I) Inverse FFT")
pylab.plot(ibp)

pylab.subplot(h,w,8);pylab.title("(J) Signal vs. iFFT")
pylab.plot(signal,'k',label="signal",alpha=.5)
pylab.plot(ibp,'b',label="ifft",alpha=.5)

pylab.subplot(h,w,10);pylab.title("(K) Normalized Signal vs. iFFT")
pylab.plot(signal/max(signal),'k',label="signal",alpha=.5)
pylab.plot(ibp/max(ibp),'b',label="ifft",alpha=.5)

pylab.subplot(h,w,12);pylab.title("(L) Difference / Error")
pylab.plot(signal/max(signal)-ibp/max(ibp),'k')

pylab.savefig("SIG.png",dpi=200)
pylab.show()
Markdown source code last modified on January 18th, 2021
---
title: Signal Filtering with Python
date: 2009-01-21 16:25:04
tags: diyECG, python, old
---

# Signal Filtering with Python

> **⚠️ SEE UPDATED POST:** [**Signal Filtering in Python**](https://swharden.com/blog/2020-09-23-signal-filtering-in-python/)

I've been spending a lot of time creating a [DIY ECGs](http://www.swharden.com/blog/category/diy-ecg-home-made-electrocardiogram/) which produce fairly noisy signals. I have researched the ways to clean-up these signals, and the results are very useful! I document some of these findings here.

<div class="text-center">

[![](filtering_thumb.jpg)](filtering.png)

</div>

__This example shows how I take__ __a noisy recording and turn it into a smooth trace. __This is achieved by eliminating excess high-frequency components which are in the original recording due to electromagnetic noise. A major source of noise can be from the AC passing through wires traveling through the walls of my apartment. My [original ECG circuit](http://www.swharden.com/blog/images/opampecg.gif) was highly susceptible to this kind of interference, but my [improved ECG circuit](http://www.swharden.com/blog/images/bigsch.gif) eliminates much of this noise. However, noise is still in the trace and it needs to be removed.

__One method of reducing noise uses the FFT (Fast Fourier Transformation) and its inverse (iFFT) algorithm__. Let's say you have a trace with repeating sine-wave noise. The output of the FFT is the breakdown of the signal by frequency. Check out [this FFT trace of a noisy signal](http://www.swharden.com/blog/images/diy_ecg4.png) from a few posts ago. High peaks represent frequencies which are common. See the enormous peak around 60 Hz? That's noise from AC power lines. Other peaks (shown in colored bands) are other electromagnetic noise sources, such as wireless networks, TVs, telephones, and maybe my computer. The heart produces changes in electricity that are very slow (a heartbeat is about 1 Hz), so if we can eliminate higher-frequency sine waves we can get a pretty clear trace. This is called a band-stop filter (we block-out certain bands of frequencies). A band-pass filter is the opposite, where we only allow frequencies which are below (low-pass) or above (high-pass) a given frequency. By eliminating each of the peaks in the colored regions (setting each value to 0), then performing an inverse fast Fourier transformation (going backwards from frequency back to time), the result is the signal trace (seen as light gray on the bottom graph) with those high-frequency sine waves removed! (the gray trace on the bottom graph). A little touch-up smoothing makes a great trace (black trace on the bottom graph).

__Here's some Python code you may find useful.__ The image below is the output of the Python code at the bottom of this entry. This python file requires that [test.wav](http://www.SWHarden.com/tmp/test.wav) (an actual ECG recording of my heartbeat) exist in the same folder.

<div class="text-center img-border">

[![](sig_thumb.jpg)](sig.png)

</div>

*   (A) The original signal we want to isolate. (IE: our actual heart signal)
*   (B) Some electrical noise. (3 sine waves of different amplitudes and periods)
*   (C) Electrical noise (what happens when you add those 3 sine waves together)
*   (D) Static (random noise generated by a random number generator)
*   (E) Signal (A) plus static (D)
*   (F) Signal (A) plus static (D) plus electrical noise (C)
*   (G) Total FFT trace of (F). Note the low frequency peak due to the signal and electrical noise (near 0) and the high frequency peak due to static (near 10,000)
*   (H) This is a zoomed-in region of (F) showing 4 peaks (one for the original signal and 3 for high frequency noise). By blocking-out (set it to 0) everything above 10Hz (red), we isolate the peak we want (signal). This is a low-pass filter.
*   (I) Performing an inverse FFT (iFFT) on the low-pass iFFT, we get a nice trace which is our original signal!
*   (J) Comparison of our iFFT with our original signal shows that the amplitude is kinda messed up. If we normalize each of these (set minimum to 0, maximum to 1) they line up. Awesome!
*   (K) How close were we? Graphing the difference of iFFT and the original signal shows that usually we're not far off. The ends are a problem though, but if our data analysis trims off these ends then our center looks great.

*   _Note: these ends can be fixed by applying a windowing function to the original data. The FFT works best if the input data starts and ends at zero._

```python
import numpy, scipy, pylab, random

# This script demonstrates how to use band-pass (low-pass)
# filtering to eliminate electrical noise and static
# from signal data!

##################
### PROCESSING ###
##################

xs=numpy.arange(1,100,.01) #generate Xs (0.00,0.01,0.02,0.03,...,100.0)
signal = sin1=numpy.sin(xs*.3) #(A)
sin1=numpy.sin(xs) # (B) sin1
sin2=numpy.sin(xs*2.33)*.333 # (B) sin2
sin3=numpy.sin(xs*2.77)*.777 # (B) sin3
noise=sin1+sin2+sin3 # (C)
static = (numpy.random.random_sample((len(xs)))-.5)*.2 # (D)
sigstat=static+signal # (E)
rawsignal=sigstat+noise # (F)
fft=scipy.fft(rawsignal) # (G) and (H)
bp=fft[:]
for i in range(len(bp)): # (H-red)
    if i&gt;=10:bp[i]=0
ibp=scipy.ifft(bp) # (I), (J), (K) and (L)

################
### GRAPHING ###
################

h,w=6,2
pylab.figure(figsize=(12,9))
pylab.subplots_adjust(hspace=.7)

pylab.subplot(h,w,1);pylab.title("(A) Original Signal")
pylab.plot(xs,signal)

pylab.subplot(h,w,3);pylab.title("(B) Electrical Noise Sources (3 Sine Waves)")
pylab.plot(xs,sin1,label="sin1")
pylab.plot(xs,sin2,label="sin2")
pylab.plot(xs,sin3,label="sin3")
pylab.legend()

pylab.subplot(h,w,5);pylab.title("(C) Electrical Noise (3 sine waves added together)")
pylab.plot(xs,noise)

pylab.subplot(h,w,7);pylab.title("(D) Static (random noise)")
pylab.plot(xs,static)
pylab.axis([None,None,-1,1])

pylab.subplot(h,w,9);pylab.title("(E) Signal + Static")
pylab.plot(xs,sigstat)

pylab.subplot(h,w,11);pylab.title("(F) Recording (Signal + Static + Electrical Noise)")
pylab.plot(xs,rawsignal)

pylab.subplot(h,w,2);pylab.title("(G) FFT of Recording")
fft=scipy.fft(rawsignal)
pylab.plot(abs(fft))
pylab.text(200,3000,"signals",verticalalignment='top')
pylab.text(9500,3000,"static",verticalalignment='top',
        horizontalalignment='right')

pylab.subplot(h,w,4);pylab.title("(H) Low-Pass FFT")
pylab.plot(abs(fft))
pylab.text(17,3000,"sin1",verticalalignment='top',horizontalalignment='left')
pylab.text(37,2000,"sin2",verticalalignment='top',horizontalalignment='center')
pylab.text(45,3000,"sin3",verticalalignment='top',horizontalalignment='left')
pylab.text(6,3000,"signal",verticalalignment='top',horizontalalignment='left')
pylab.axvspan(10,10000,fc='r',alpha='.5')
pylab.axis([0,60,None,None])

pylab.subplot(h,w,6);pylab.title("(I) Inverse FFT")
pylab.plot(ibp)

pylab.subplot(h,w,8);pylab.title("(J) Signal vs. iFFT")
pylab.plot(signal,'k',label="signal",alpha=.5)
pylab.plot(ibp,'b',label="ifft",alpha=.5)

pylab.subplot(h,w,10);pylab.title("(K) Normalized Signal vs. iFFT")
pylab.plot(signal/max(signal),'k',label="signal",alpha=.5)
pylab.plot(ibp/max(ibp),'b',label="ifft",alpha=.5)

pylab.subplot(h,w,12);pylab.title("(L) Difference / Error")
pylab.plot(signal/max(signal)-ibp/max(ibp),'k')

pylab.savefig("SIG.png",dpi=200)
pylab.show()
```

January 15th, 2009

DIY ECG Trial 1

⚠️ Check out my newer ECG designs:

I've succeeded in building my own electrocardiograph (ECG) to record the electrical activity of my own heart! Briefly, I built a micropotential amplifier using an op-amp and attached it to makeshift electrodes on my chest (pennies and shampoo), fed the amplified signal into my sound card, and recorded it as a WAV. The signal is very noisy though. I was able to do a great job at removing this noise using band/frequency filters in GoldWave (audio editing software designed to handle WAV files). I band-blocked 50-70 Hz (which removed the oscillations from the 60 Hz AC lines running around my apartment). I then wrote the Python code (at the bottom of this entry) to load this WAV file as a single list of numbers (voltage potentials). I performed a data condensation algorithm (converting 100 points of raw WAV data into a single, averaged point, lessening my processing load by 100x), followed by two consecutive moving window averages (20-point window, performed on the condensed data). The result was a voltage reading that had most of the noise removed and a beautiful ECG signal emerged! I also tossed in some code to determine the peak of the R wave, label it (yellow dot), and use the inverse R-R time distance to calculate heart rate.

This is my actual ECC signal as record by a circuit similar to the one in the previous entry, recorded through my sound card, and processed with the Python script below. You can start to see the Q, R, S, and T components. I can't wait to solder-up a prototype (it's currently breadboarded) and try to analyze hours of data rather than just a few seconds. I'll take pictures of this device soon.

And here's the code I used: note that it relies on the WAV file I recorded. This code has extra functions not required to produce the image above, but I left them in in case they may be useful.

import wave, struct, numpy, pylab, scipy

def readwave(wavfilename):
    """load raw data directly from a WAV file."""
    global rate
    w=wave.open(wavfilename,'rb')
    (nchannel, width, rate, length, comptype, compname) = w.getparams()
    print "[%s] %d HZ (%0.2fsec)" %(wavfilename, rate, length/float(rate))
    frames = w.readframes(length)
    return numpy.array(struct.unpack("%sh" %length*nchannel,frames))

def shrink(data,deg=100):
    """condense a linear data array by a multiple of [deg]."""
    global rate
    small=[]
    print "starting with", len(data)
    for i in range(len(data)/deg):
        small.append(numpy.average(data[i*deg:(i+1)*deg]))
    print "ending with", len(small)
    rate = rate/deg
    return small

def normalize(data):
    """make all data fit between -.5 and +.5"""
    data=data-numpy.average(data)
    big=float(max(data))
    sml=float(min(data))
    data=data/abs(big-sml)
    data=data+float(abs(min(data)))-.47
    return data

def smooth(data,deg=20):
    """moving window average (deg = window size)."""
    for i in range(len(data)-deg):
        if i==0: cur,smooth=sum(data[0:deg]),[]
        smooth.append(cur/deg)
        cur=cur-data[i]+data[i+deg]
    return smooth

def makeabs(data):
    """center linear data to its average value."""
    for i in range(len(data)): data[i]=abs(data[i])
    return data

def invert(data):
    """obviously."""
    for i in range(len(data)): data[i]=-data[i]
    return data

def loadwav(fname='./wavs/bandpassed.wav'):
    """a do-everything function to get usable, smoothed data from a WAV."""
    wav=readwave(fname)
    wav=shrink(wav)
    wav=invert(wav)
    wav=smooth(wav)
    wav=smooth(wav,10)
    wav=normalize(wav)
    Xs=getXs(wav)
    return Xs,wav

def getXs(datalen):
    """calculate time positions based on WAV frequency resolution."""
    Xs=[]
    for i in range(len(datalen)):
        Xs.append(i*(1/float(rate)))
    print len(datalen), len(Xs)
    return Xs

def integrate(data):
    """integrate the function with respect to its order."""
    inte=[]
    for i in range(len(data)-1):
        inte.append(abs(data[i]-data[i+1]))
    inte.append(inte[-1])
    return inte

def getPoints(Xs,data,res=10):
    """return X,Y coordinates of R peaks and calculate R-R based heartrate."""
    pXs,pYs,pHRs=[],[],[]
    for i in range(res,len(data)-res):
        if data[i]&gt;data[i-res]+.1 and data[i]&gt;data[i+res]+.1:
            if data[i]&gt;data[i-1] and data[i]&gt;data[i+1]:
                pXs.append(Xs[i])
                pYs.append(data[i])
                if len(pXs)&gt;1:
                    pHRs.append((1.0/(pXs[-1]-pXs[-2]))*60.0)
    pHRs.append(pHRs[-1])
    return pXs,pYs,pHRs

Xs,Ys=loadwav()
px,py,pHR=getPoints(Xs,Ys)

pylab.figure(figsize=(12,6))
pylab.subplot(2,1,1)
#pylab.axhline(color='.4',linestyle=':')
pylab.plot(Xs,Ys,'b-')
pylab.plot(px,py,'y.')
pylab.axis([None,None,-.6,.6])
pylab.title("DIY Electrocardiogram - Trial 1",fontsize=26)
pylab.ylabel("Normalized Potential",fontsize=16)
#pylab.xlabel("Time (sec)")
ax=pylab.axis()
pylab.subplot(2,1,2)
pylab.plot(px,pHR,'k:')
pylab.plot(px,pHR,'b.')
pylab.axis([ax[0],ax[1],None,None])
pylab.ylabel("Heart Rate (BPM)",fontsize=16)
pylab.xlabel("Time (seconds)",fontsize=16)
pylab.savefig("test.png",dpi=120)
pylab.show()
print "DONE"
Markdown source code last modified on January 18th, 2021
---
title: DIY ECG Trial 1
date: 2009-01-15 00:39:45
tags: diyECG, python, old
---

# DIY ECG Trial 1

> **⚠️ Check out my newer ECG designs:** 
* [**Sound Card ECG with AD8232**](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/)
* [**Single op-amp ECG**](https://swharden.com/blog/2016-08-08-diy-ecg-with-1-op-amp/)

__I've succeeded in building my own electrocardiograph (ECG) to record the electrical activity of my own heart!__ Briefly, I built a micropotential amplifier using an op-amp and attached it to makeshift electrodes on my chest (pennies and shampoo), fed the amplified signal into my sound card, and recorded it as a WAV. The signal is very noisy though. I was able to do a great job at removing this noise using band/frequency filters in GoldWave (audio editing software designed to handle WAV files). I band-blocked 50-70 Hz (which removed the oscillations from the 60 Hz AC lines running around my apartment). I then wrote the Python code (at the bottom of this entry) to load this WAV file as a single list of numbers (voltage potentials). I performed a data condensation algorithm (converting 100 points of raw WAV data into a single, averaged point, lessening my processing load by 100x), followed by two consecutive moving window averages (20-point window, performed on the condensed data). The result was a voltage reading that had most of the noise removed and a beautiful ECG signal emerged! I also tossed in some code to determine the peak of the R wave, label it (yellow dot), and use the inverse R-R time distance to calculate heart rate.

<div class="text-center">

[![](diy_ecg2_thumb.jpg)](diy_ecg2.png)

</div>

__This is my actual ECC signal__ as record by a circuit similar to the one in the previous entry, recorded through my sound card, and processed with the Python script below. You can start to see the Q, R, S, and T components. I can't wait to solder-up a prototype (it's currently breadboarded) and try to analyze hours of data rather than just a few seconds. I'll take pictures of this device soon.

<div class="text-center">

[![](diy_ecg1_thumb.jpg)](diy_ecg1.png)

</div>

__And here's the code I used:__ note that it relies on the WAV file I recorded. This code has extra functions not required to produce the image above, but I left them in in case they may be useful.

```python
import wave, struct, numpy, pylab, scipy

def readwave(wavfilename):
    """load raw data directly from a WAV file."""
    global rate
    w=wave.open(wavfilename,'rb')
    (nchannel, width, rate, length, comptype, compname) = w.getparams()
    print "[%s] %d HZ (%0.2fsec)" %(wavfilename, rate, length/float(rate))
    frames = w.readframes(length)
    return numpy.array(struct.unpack("%sh" %length*nchannel,frames))

def shrink(data,deg=100):
    """condense a linear data array by a multiple of [deg]."""
    global rate
    small=[]
    print "starting with", len(data)
    for i in range(len(data)/deg):
        small.append(numpy.average(data[i*deg:(i+1)*deg]))
    print "ending with", len(small)
    rate = rate/deg
    return small

def normalize(data):
    """make all data fit between -.5 and +.5"""
    data=data-numpy.average(data)
    big=float(max(data))
    sml=float(min(data))
    data=data/abs(big-sml)
    data=data+float(abs(min(data)))-.47
    return data

def smooth(data,deg=20):
    """moving window average (deg = window size)."""
    for i in range(len(data)-deg):
        if i==0: cur,smooth=sum(data[0:deg]),[]
        smooth.append(cur/deg)
        cur=cur-data[i]+data[i+deg]
    return smooth

def makeabs(data):
    """center linear data to its average value."""
    for i in range(len(data)): data[i]=abs(data[i])
    return data

def invert(data):
    """obviously."""
    for i in range(len(data)): data[i]=-data[i]
    return data

def loadwav(fname='./wavs/bandpassed.wav'):
    """a do-everything function to get usable, smoothed data from a WAV."""
    wav=readwave(fname)
    wav=shrink(wav)
    wav=invert(wav)
    wav=smooth(wav)
    wav=smooth(wav,10)
    wav=normalize(wav)
    Xs=getXs(wav)
    return Xs,wav

def getXs(datalen):
    """calculate time positions based on WAV frequency resolution."""
    Xs=[]
    for i in range(len(datalen)):
        Xs.append(i*(1/float(rate)))
    print len(datalen), len(Xs)
    return Xs

def integrate(data):
    """integrate the function with respect to its order."""
    inte=[]
    for i in range(len(data)-1):
        inte.append(abs(data[i]-data[i+1]))
    inte.append(inte[-1])
    return inte

def getPoints(Xs,data,res=10):
    """return X,Y coordinates of R peaks and calculate R-R based heartrate."""
    pXs,pYs,pHRs=[],[],[]
    for i in range(res,len(data)-res):
        if data[i]&gt;data[i-res]+.1 and data[i]&gt;data[i+res]+.1:
            if data[i]&gt;data[i-1] and data[i]&gt;data[i+1]:
                pXs.append(Xs[i])
                pYs.append(data[i])
                if len(pXs)&gt;1:
                    pHRs.append((1.0/(pXs[-1]-pXs[-2]))*60.0)
    pHRs.append(pHRs[-1])
    return pXs,pYs,pHRs

Xs,Ys=loadwav()
px,py,pHR=getPoints(Xs,Ys)

pylab.figure(figsize=(12,6))
pylab.subplot(2,1,1)
#pylab.axhline(color='.4',linestyle=':')
pylab.plot(Xs,Ys,'b-')
pylab.plot(px,py,'y.')
pylab.axis([None,None,-.6,.6])
pylab.title("DIY Electrocardiogram - Trial 1",fontsize=26)
pylab.ylabel("Normalized Potential",fontsize=16)
#pylab.xlabel("Time (sec)")
ax=pylab.axis()
pylab.subplot(2,1,2)
pylab.plot(px,pHR,'k:')
pylab.plot(px,pHR,'b.')
pylab.axis([ax[0],ax[1],None,None])
pylab.ylabel("Heart Rate (BPM)",fontsize=16)
pylab.xlabel("Time (seconds)",fontsize=16)
pylab.savefig("test.png",dpi=120)
pylab.show()
print "DONE"
```

January 15th, 2009

Circuits vs. Software

⚠️ Check out my newer ECG designs:

Would I rather design circuits or software? I'm a software guy (or at least I know more about software than circuits) so I'd rather record noisy signals and write software to eliminate the noise, rather than assembling circuits to eliminate the noise in hardware. In the case of my DIY ECG machine, I'd say I've done a surprisingly good job of eliminating noise using software. Most DIY ECG circuits on the net use multiple op-amps and filters to do this. Instead of all that fancy stuff, I made a crude circuit (a single op-amp and two resisters) that is capable of record my ECG and filtered it in software. The output is pretty good!

The first step in removing noise is understanding it. Most of the noise in my signal came from sine waves caused by my electrodes picking up radiated signals in the room. Since this type of interference is consistent through the entire recording, power-spectral analysis could be applied to determine the frequencies of the noise so I could selectively block them out. I used the fast Fourier transform algorithm (FFT) on the values to generate a plot of the spectral components of my signal (mostly noise) seen as sharp peaks. I manually band-stopped certain regions of the spectrum that I thought were noise-related (colored bands). This is possible to do electronically with a more complicated circuit, but is interesting to do in software. I think performed an inverse FFT on the trace. The result was a trace with greatly reduced noise. After a moving window smoothing algorithm was applied the signal was even better! Note that I recorded the WAV file with "sound recorder" (not GoldWave) and did all of the processing (including band-pass filtering) within Python.

The ECG came out better than expected! The graph above shows the power spectral analysis with band-stop filters applied at the colored regions. Below is the trace of the original signal (light gray), the inverse-FFT-filtered trace (dark gray), and the smoothed filtered trace (black) - the final ECG signal I intend to use.

This is a magnified view of a few heartbeats. It looks pretty good! Here's the code I used to do all the calculations:

import wave, struct, numpy, pylab, scipy

fname='./success3.wav'

def readwave(wavfilename):
    """load raw data directly from a WAV file."""
    global rate
    w=wave.open(wavfilename,'rb')
    (nchannel, width, rate, length, comptype, compname) = w.getparams()
    print "[%s] %d HZ (%0.2fsec)" %(wavfilename, rate, length/float(rate))
    frames = w.readframes(length)
    return numpy.array(struct.unpack("%sh" %length*nchannel,frames))

def shrink(data,deg=100):
    """condense a linear data array by a multiple of [deg]."""
    global rate
    small=[]
    print "starting with", len(data)
    for i in range(len(data)/deg):
        small.append(numpy.average(data[i*deg:(i+1)*deg]))
    print "ending with", len(small)
    rate = rate/deg
    #return small[40000:50000]
    return small

def normalize(data):
    """make all data fit between -.5 and +.5"""
    data=data-numpy.average(data)
    big=float(max(data))
    sml=float(min(data))
    data=data/abs(big-sml)
    data=data+float(abs(min(data)))-.47
    return data

def smooth(data,deg=20,expand=False):
    """moving window average (deg = window size)."""
    for i in range(len(data)-deg):
        if i==0: cur,smooth=sum(data[0:deg]),[]
        smooth.append(cur/deg)
        cur=cur-data[i]+data[i+deg]
    if expand:
        for i in range(deg):
            smooth.append(smooth[-1])
    return smooth

def smoothListGaussian(list,degree=10,expand=False):
    window=degree*2-1
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    if expand:
        for i in range((degree*2)-1):
            smoothed.append(smoothed[-1])
    return smoothed

def goodSmooth(data):
    #data=smooth(fix,20,True)
    data=smooth(fix,100,True)
    #data=smooth(fix,20,True)
    return data

def makeabs(data):
    """center linear data to its average value."""
    for i in range(len(data)): data[i]=abs(data[i])
    return data

def invert(data):
    """obviously."""
    for i in range(len(data)): data[i]=-data[i]
    return data

def loadwav(fname):
    """a do-everything function to get usable, smoothed data from a WAV."""
    wav=readwave(fname)
    wav=shrink(wav)
    wav=invert(wav)
    wav=smooth(wav)
    wav=smooth(wav,10)
    wav=normalize(wav)
    Xs=getXs(wav)
    return Xs,wav

def getXs(datalen):
    """calculate time positions based on WAV frequency resolution."""
    Xs=[]
    for i in range(len(datalen)):
        Xs.append(i*(1/float(rate)))
    print len(datalen), len(Xs)
    return Xs

def integrate(data):
    """integrate the function with respect to its order."""
    inte=[]
    for i in range(len(data)-1):
        inte.append(abs(data[i]-data[i+1]))
    inte.append(inte[-1])
    return inte

def getPoints(Xs,data,res=10):
    """return X,Y coordinates of R peaks and calculate R-R based heartrate."""
    pXs,pYs,pHRs=[],[],[]
    for i in range(res,len(data)-res):
        if data[i]&gt;data[i-res]+.1 and data[i]&gt;data[i+res]+.1:
            if data[i]&gt;data[i-1] and data[i]&gt;data[i+1]:
                pXs.append(Xs[i])
                pYs.append(data[i])
                if len(pXs)&gt;1:
                    pHRs.append((1.0/(pXs[-1]-pXs[-2]))*60.0)
    pHRs.append(pHRs[-1])
    return pXs,pYs,pHRs

def bandStop(fft,fftx,low,high,show=True):
    lbl="%d-%d"%(low,high)
    print "band-stopping:",lbl
    if show:
        col=pylab.cm.spectral(low/1200.)
        pylab.axvspan(low,high,alpha=.4,ec='none',label=lbl,fc=col)
        #pylab.axvspan(-low,-high,fc='r',alpha=.3)
    for i in range(len(fft)):
        if abs(fftx[i])&gt;low and abs(fftx[i])&lt;high :
            fft[i]=0
    return fft

def getXs(data):
    xs=numpy.array(range(len(data)))
    xs=xs*(1.0/rate)
    return xs

def clip(x,deg=1000):
    return numpy.array(x[deg:-deg])

pylab.figure(figsize=(12,8))
raw = invert(shrink(readwave(fname),10))
xs = getXs(raw)
fftr = numpy.fft.fft(raw)
fft = fftr[:]
fftx= numpy.fft.fftfreq(len(raw), d=(1.0/(rate)))

pylab.subplot(2,1,1)
pylab.plot(fftx,abs(fftr),'k')

fft=bandStop(fft,fftx,30,123)
fft=bandStop(fft,fftx,160,184)
fft=bandStop(fft,fftx,294,303)
fft=bandStop(fft,fftx,386,423)
fft=bandStop(fft,fftx,534,539)
fft=bandStop(fft,fftx,585,610)
fft=bandStop(fft,fftx,654,660)
fft=bandStop(fft,fftx,773,778)
fft=bandStop(fft,fftx,893,900)
fft=bandStop(fft,fftx,1100,max(fftx))
pylab.axis([0,1200,0,2*10**6])
pylab.legend()
pylab.title("Power Spectral Analysis",fontsize=28)
pylab.ylabel("Power",fontsize=20)
pylab.xlabel("Frequency (Hz)",fontsize=20)

pylab.subplot(2,1,2)
pylab.title("Original Trace",fontsize=28)
pylab.ylabel("Potential",fontsize=20)
pylab.xlabel("Time (sec)",fontsize=20)
pylab.plot(clip(xs),clip(raw),color='.8',label='1: raw')

fix = scipy.ifft(fft)
pylab.plot(clip(xs),clip(fix)+5000,color='.6',label='2: band-stop')
pylab.plot(clip(xs),clip(goodSmooth(fix))-5000,'k',label='3: smoothed')
pylab.legend()
pylab.title("Band-Stop Filtered Trace",fontsize=28)
pylab.ylabel("Potential",fontsize=20)
pylab.xlabel("Time (sec)",fontsize=20)

pylab.subplots_adjust(hspace=.5)
pylab.savefig('out.png',dpi=100)
pylab.show()
print "COMPLETE"
Markdown source code last modified on January 18th, 2021
---
title: Circuits vs. Software
date: 2009-01-15 17:47:52
tags: diyECG, old, python
---

# Circuits vs. Software

> **⚠️ Check out my newer ECG designs:** 
* [**Sound Card ECG with AD8232**](https://swharden.com/blog/2019-03-15-sound-card-ecg-with-ad8232/)
* [**Single op-amp ECG**](https://swharden.com/blog/2016-08-08-diy-ecg-with-1-op-amp/)

__Would I rather design circuits or software?__ I'm a software guy (or at least I know more about software than circuits) so I'd rather record noisy signals and write software to eliminate the noise, rather than assembling circuits to eliminate the noise in hardware. In the case of my DIY ECG machine, I'd say I've done a surprisingly good job of eliminating noise using software. Most DIY ECG circuits on the net use multiple op-amps and filters to do this. Instead of all that fancy stuff, I made a crude circuit (a single op-amp and two resisters) that is capable of record my ECG and filtered it in software. The output is pretty good!

__The first step in removing noise is understanding it.__ Most of the noise in my signal came from sine waves caused by my electrodes picking up radiated signals in the room. Since this type of interference is consistent through the entire recording, power-spectral analysis could be applied to determine the frequencies of the noise so I could selectively block them out. I used the [fast Fourier transform algorithm (FFT)](http://en.wikipedia.org/wiki/Fft) on the values to generate a plot of the spectral components of my signal (mostly noise) seen as sharp peaks. I manually band-stopped certain regions of the spectrum that I thought were noise-related (colored bands). This is possible to do electronically with a more complicated circuit, but is interesting to do in software. I think performed an inverse FFT on the trace. The result was a trace with greatly reduced noise. After a moving window smoothing algorithm was applied the signal was even better! Note that I recorded the WAV file with "sound recorder" (not GoldWave) and did all of the processing (including band-pass filtering) within Python.

<div class="text-center">

[![](diy_ecg4_thumb.jpg)](diy_ecg4.png)

</div>

__The ECG came out better than expected!__ The graph above shows the power spectral analysis with band-stop filters applied at the colored regions. Below is the trace of the original signal (light gray), the inverse-FFT-filtered trace (dark gray), and the smoothed filtered trace (black) - the final ECG signal I intend to use.

<div class="text-center">

[![](diy_ecg3_thumb.jpg)](diy_ecg3.png)

</div>

__This is a magnified view of a few heartbeats__. It looks pretty good! Here's the code I used to do all the calculations:

```python
import wave, struct, numpy, pylab, scipy

fname='./success3.wav'

def readwave(wavfilename):
    """load raw data directly from a WAV file."""
    global rate
    w=wave.open(wavfilename,'rb')
    (nchannel, width, rate, length, comptype, compname) = w.getparams()
    print "[%s] %d HZ (%0.2fsec)" %(wavfilename, rate, length/float(rate))
    frames = w.readframes(length)
    return numpy.array(struct.unpack("%sh" %length*nchannel,frames))

def shrink(data,deg=100):
    """condense a linear data array by a multiple of [deg]."""
    global rate
    small=[]
    print "starting with", len(data)
    for i in range(len(data)/deg):
        small.append(numpy.average(data[i*deg:(i+1)*deg]))
    print "ending with", len(small)
    rate = rate/deg
    #return small[40000:50000]
    return small

def normalize(data):
    """make all data fit between -.5 and +.5"""
    data=data-numpy.average(data)
    big=float(max(data))
    sml=float(min(data))
    data=data/abs(big-sml)
    data=data+float(abs(min(data)))-.47
    return data

def smooth(data,deg=20,expand=False):
    """moving window average (deg = window size)."""
    for i in range(len(data)-deg):
        if i==0: cur,smooth=sum(data[0:deg]),[]
        smooth.append(cur/deg)
        cur=cur-data[i]+data[i+deg]
    if expand:
        for i in range(deg):
            smooth.append(smooth[-1])
    return smooth

def smoothListGaussian(list,degree=10,expand=False):
    window=degree*2-1
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    if expand:
        for i in range((degree*2)-1):
            smoothed.append(smoothed[-1])
    return smoothed

def goodSmooth(data):
    #data=smooth(fix,20,True)
    data=smooth(fix,100,True)
    #data=smooth(fix,20,True)
    return data

def makeabs(data):
    """center linear data to its average value."""
    for i in range(len(data)): data[i]=abs(data[i])
    return data

def invert(data):
    """obviously."""
    for i in range(len(data)): data[i]=-data[i]
    return data

def loadwav(fname):
    """a do-everything function to get usable, smoothed data from a WAV."""
    wav=readwave(fname)
    wav=shrink(wav)
    wav=invert(wav)
    wav=smooth(wav)
    wav=smooth(wav,10)
    wav=normalize(wav)
    Xs=getXs(wav)
    return Xs,wav

def getXs(datalen):
    """calculate time positions based on WAV frequency resolution."""
    Xs=[]
    for i in range(len(datalen)):
        Xs.append(i*(1/float(rate)))
    print len(datalen), len(Xs)
    return Xs

def integrate(data):
    """integrate the function with respect to its order."""
    inte=[]
    for i in range(len(data)-1):
        inte.append(abs(data[i]-data[i+1]))
    inte.append(inte[-1])
    return inte

def getPoints(Xs,data,res=10):
    """return X,Y coordinates of R peaks and calculate R-R based heartrate."""
    pXs,pYs,pHRs=[],[],[]
    for i in range(res,len(data)-res):
        if data[i]&gt;data[i-res]+.1 and data[i]&gt;data[i+res]+.1:
            if data[i]&gt;data[i-1] and data[i]&gt;data[i+1]:
                pXs.append(Xs[i])
                pYs.append(data[i])
                if len(pXs)&gt;1:
                    pHRs.append((1.0/(pXs[-1]-pXs[-2]))*60.0)
    pHRs.append(pHRs[-1])
    return pXs,pYs,pHRs

def bandStop(fft,fftx,low,high,show=True):
    lbl="%d-%d"%(low,high)
    print "band-stopping:",lbl
    if show:
        col=pylab.cm.spectral(low/1200.)
        pylab.axvspan(low,high,alpha=.4,ec='none',label=lbl,fc=col)
        #pylab.axvspan(-low,-high,fc='r',alpha=.3)
    for i in range(len(fft)):
        if abs(fftx[i])&gt;low and abs(fftx[i])&lt;high :
            fft[i]=0
    return fft

def getXs(data):
    xs=numpy.array(range(len(data)))
    xs=xs*(1.0/rate)
    return xs

def clip(x,deg=1000):
    return numpy.array(x[deg:-deg])

pylab.figure(figsize=(12,8))
raw = invert(shrink(readwave(fname),10))
xs = getXs(raw)
fftr = numpy.fft.fft(raw)
fft = fftr[:]
fftx= numpy.fft.fftfreq(len(raw), d=(1.0/(rate)))

pylab.subplot(2,1,1)
pylab.plot(fftx,abs(fftr),'k')

fft=bandStop(fft,fftx,30,123)
fft=bandStop(fft,fftx,160,184)
fft=bandStop(fft,fftx,294,303)
fft=bandStop(fft,fftx,386,423)
fft=bandStop(fft,fftx,534,539)
fft=bandStop(fft,fftx,585,610)
fft=bandStop(fft,fftx,654,660)
fft=bandStop(fft,fftx,773,778)
fft=bandStop(fft,fftx,893,900)
fft=bandStop(fft,fftx,1100,max(fftx))
pylab.axis([0,1200,0,2*10**6])
pylab.legend()
pylab.title("Power Spectral Analysis",fontsize=28)
pylab.ylabel("Power",fontsize=20)
pylab.xlabel("Frequency (Hz)",fontsize=20)

pylab.subplot(2,1,2)
pylab.title("Original Trace",fontsize=28)
pylab.ylabel("Potential",fontsize=20)
pylab.xlabel("Time (sec)",fontsize=20)
pylab.plot(clip(xs),clip(raw),color='.8',label='1: raw')

fix = scipy.ifft(fft)
pylab.plot(clip(xs),clip(fix)+5000,color='.6',label='2: band-stop')
pylab.plot(clip(xs),clip(goodSmooth(fix))-5000,'k',label='3: smoothed')
pylab.legend()
pylab.title("Band-Stop Filtered Trace",fontsize=28)
pylab.ylabel("Potential",fontsize=20)
pylab.xlabel("Time (sec)",fontsize=20)

pylab.subplots_adjust(hspace=.5)
pylab.savefig('out.png',dpi=100)
pylab.show()
print "COMPLETE"
```

Compress Strings and Store to Files in Python

While writing code for my graduate research thesis I came across the need to lightly compress a huge and complex variable (a massive 3D data array) and store it in a text file for later retrieval. I decided to use the zlib compression library because it's open source and works pretty much on every platform. I ran into a snag for a while though, because whenever I loaded data from a text file it wouldn't properly decompress. I fixed this problem by adding the "rb" to the open line, forcing python to read the text file as binary data rather than ascii data. Below is my code, written in two functions to save/load compressed string data to/from files in Python.

import zlib  

def saveIt(data,fname):  
    data=str(data)  
    data=zlib.compress(data)  
    f=open(fname,'wb')  
    f.write(data)  
    f.close()  
    return  

def openIt(fname,evaluate=True):  
    f=open(fname,'rb')  
    data=f.read()  
    f.close()  
    data=zlib.decompress(data)  
    if evaluate: data=eval(data)  
    return data  

Oh yeah, don't forget the evaluate option in the openIt function. If set to True (default), the returned variable will be an evaluated object. For example, [[1,2],[3,4]] will be returned as an actual 2D list, not just a string. How convenient is that?

Markdown source code last modified on January 18th, 2021
---
title: Compress and Store Files in Python
date: 2008-11-24 17:48:16
tags: python, old
---

# Compress Strings and Store to Files in Python

 __While writing code for my graduate research thesis__ I came across the need to lightly compress a huge and complex variable (a massive 3D data array) and store it in a text file for later retrieval.  I decided to use the [zlib](http://en.wikipedia.org/wiki/Zlib) compression library because it's open source and works pretty much on every platform.  I ran into a snag for a while though, because whenever I loaded data from a text file it wouldn't properly decompress.  I fixed this problem by adding the "rb" to the open line, forcing python to read the text file as binary data rather than ascii data.  Below is my code, written in two functions to save/load compressed string data to/from files in Python.

```python
import zlib  
  
def saveIt(data,fname):  
    data=str(data)  
    data=zlib.compress(data)  
    f=open(fname,'wb')  
    f.write(data)  
    f.close()  
    return  
  
def openIt(fname,evaluate=True):  
    f=open(fname,'rb')  
    data=f.read()  
    f.close()  
    data=zlib.decompress(data)  
    if evaluate: data=eval(data)  
    return data  
```

__Oh yeah, don't forget__ the evaluate option in the openIt function.  If set to True (default), the returned variable will be an evaluated object.  For example, `[[1,2],[3,4]]` will be returned as an actual 2D list, not just a string.  How convenient is that?
Pages