SWHarden.com

The personal website of Scott W Harden

The Internet was Moved?

I’m working late in lab (as usual) and I went to open Firefox (my default web browser), so I clicked Start and selected “Mozilla Firefox” at the top (where your default browser usually goes in XP). Nothing happened… then this message popped up! I had to take a screen shot because it’s so bizarre.

I swear this is straight from my screen (PrntScrn -> mspaint (pasted) -> PNG) with no modifications. I especially love how the computer is trying to tell me what could have gone wrong; the internet might have been moved, renamed, or removed. Slick.


Analyzing my Writings with Python

I had some free time in lab today (between steps of an immunohistochemistry experiment) so I decided to further investigate the field of bioinformatics. I was curious whether it might be worth seeking a PhD in bioinformatics if I don’t get into dental school. UCF offers a PhD in bioinformatics, but it’s a new and small department (I think there are only 4 faculty). A degree in bioinformatics combines molecular biology (DNA, proteins, etc.), computer science (programming), and statistics.

I came across a paper today, Structural Alignment of Pseudoknotted RNA, which is a good example of bioinformatics in practice. Think about what goes on in a cell… the sequence of a gene (a short region of DNA) is copied letter-by-letter onto an RNA molecule. The RNA molecule is later read by a ribosome (a molecular machine made of RNA and protein) and translated into a protein based on its sequence. (This flow of information is the central dogma of molecular biology.) Traditionally, it was believed that the only function of RNA molecules was to carry gene sequences from DNA to ribosomes, but in the last several years it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually switch genes off, and small RNAs have been associated with cancer development and other diseases.

Given the human genome (the ~3 billion letter sequence of all our DNA), how can we determine which regions form these functional RNA molecules that don’t get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict their probability of forming small RNA molecules. Spikes in the resulting trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between a healthy person and someone with cancer?)

After reading the article I considered how similar my current programming projects are to this one (e.g., my recent DIY ECG projects). The paper shows a trace of score (the likelihood that a region of DNA forms an RNA molecule) where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you’re golden. How similar is this to the work I’ve been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?
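The "find the peaks in a trace" step applies equally to a score trace and an ECG trace. Here is a minimal sketch of one way it could be done. The function name `find_peaks_simple`, the threshold value, and the sample scores are all made up for illustration; this is not the method used in the paper or in my ECG scripts.

```python
import numpy as np

def find_peaks_simple(trace, threshold):
    """Return indices of local maxima in `trace` that exceed `threshold`."""
    trace = np.asarray(trace, dtype=float)
    # A point counts as a peak if it is higher than its left neighbor,
    # at least as high as its right neighbor, and above the threshold.
    middle = trace[1:-1]
    is_peak = (middle > trace[:-2]) & (middle >= trace[2:]) & (middle > threshold)
    return np.nonzero(is_peak)[0] + 1  # +1 because `middle` starts at index 1

# Hypothetical score trace: a low baseline with two bumps
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.7, 0.8, 0.4, 0.1]
peaks = find_peaks_simple(scores, threshold=0.5)
print(peaks)
```

The threshold keeps small wiggles in the baseline from being reported as peaks. On a real recording it would need to be tuned, and the trace would probably need smoothing first.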

I got the itch to write my own string-analysis program. It reads the content of my website (exported from a SQL query), splits it up by date, and analyzes it. Ultimately I want to track the usage of certain words, but for now I wrote a script which plots the number of words I wrote.

Pretty cool, huh? Check out all those spikes between 2004 and 2005! Not only are they numerous (meaning many posts), but they’re also high (meaning many words per post). As you can see from the top trace, the most significant contribution to my site occurred during this time. So, let’s zoom in on it.

Here is the code I used to produce this image.

"""Convert SQL backups of my WordPress blog into charts"""

import datetime, pylab, numpy

class blogChrono():
    baseUrl="https://swharden.com/blog"
    posts=[]
    dates=[]
    def __init__(self,fname):
        self.fname=fname
        self.load()
    def load(self):
        print("loading [%s]..." % self.fname)
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            # only parse complete INSERT statements that describe a post
            if ("INSERT INTO" in line
                    and ';' in line[-2:-1]
                    and " 'post'," in line[-20:-1]):
                post={}
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])>len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    raw=str(line)
                    raw=raw.replace(line[4],x)
                    line=eval(raw)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        # sort posts chronologically and keep a matching sorted list of dates
        self.posts.sort(key=lambda post: post["date"])
        self.dates.sort()
        print("read %d posts!\n" % len(self.posts))

#d=blogChrono('sml.sql')
d=blogChrono('test.sql')

fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1])
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1])
ltot,wtot=ltot[1:],wtot[1:]

pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()
pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()
pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675 # desired output width in pixels
pylab.savefig('out.png',dpi=width/7.0) # the figure is 7 inches wide
pylab.show()

print("DONE")

I wrote a Python script to analyze the word frequency of the posts on my website (extracted from a WordPress SQL backup file), then I took the list to Wordie and created a word jumble. Neat, huh?

import datetime, pylab, numpy
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))
words={}
for word in body:
    for i in word:
        if 65<=ord(i)<=90 or 97<=ord(i)<=122: pass
        else: word=None
    if word:
        if not word in words:words[word]=0
        words[word]=words[word]+1
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse()
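out="<b>Out of %d words...</b>\n"%tot
xs=[]
for i in range(min(1000,len(data))): # guard against fewer than 1000 distinct words
    d=data[i]
    out += '<b>"%s"</b> ranks #%d used <b>%d</b> times (%.05f%%)<br>\n'%\
                (d[1],i+1,d[0],100*d[0]/tot) # scale the fraction to a true percentage
f=open("dump.html",'w')
f.write(out)
f.close()
print("DONE")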
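
For comparison, here is a shorter sketch of the same word-counting idea using `collections.Counter` from the standard library. The function name `word_frequencies` and the sample sentence are made up for illustration. Unlike the script above, this version splits on any whitespace and computes percentages against the number of alphabetic words only, so its numbers will not match exactly.

```python
import re
from collections import Counter

def word_frequencies(text, top=10):
    """Count purely alphabetic words in `text`, ignoring case."""
    words = text.lower().split()
    # keep only tokens made entirely of the letters a-z
    words = [w for w in words if re.fullmatch(r"[a-z]+", w)]
    counts = Counter(words)
    return len(words), counts.most_common(top)

sample = "the cat and the dog and the bird"  # made-up sample text
total, ranked = word_frequencies(sample, top=2)
for rank, (word, count) in enumerate(ranked, start=1):
    print('"%s" ranks #%d, used %d times (%.2f%%)'
          % (word, rank, count, 100.0 * count / total))
```

`Counter.most_common` does the sorting that the script above does by hand with `data.sort()` and `data.reverse()`.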


Signal Filtering with Python

⚠️ SEE UPDATED POST: Signal Filtering in Python

I’ve been spending a lot of time building DIY ECGs which produce fairly noisy signals. I have researched ways to clean up these signals, and the results are very useful! I document some of these findings here.

This example shows how I take a noisy recording and turn it into a smooth trace. This is achieved by eliminating excess high-frequency components which are in the original recording due to electromagnetic noise. A major source of noise is the AC passing through wires in the walls of my apartment. My original ECG circuit was highly susceptible to this kind of interference, but my improved ECG circuit eliminates much of this noise. However, noise is still in the trace and it needs to be removed.

One method of reducing noise uses the FFT (fast Fourier transform) and its inverse (iFFT). Let’s say you have a trace with repeating sine-wave noise. The output of the FFT is the breakdown of the signal by frequency. Check out this FFT trace of a noisy signal from a few posts ago. High peaks represent frequencies which are strongly present in the signal. See the enormous peak around 60 Hz? That’s noise from AC power lines. Other peaks (shown in colored bands) are probably other electromagnetic noise sources, such as wireless networks, TVs, telephones, and maybe my computer.

The heart produces changes in electricity that are very slow (a heartbeat is about 1 Hz), so if we can eliminate higher-frequency sine waves we can get a pretty clear trace. Blocking out certain bands of frequencies is called band-stop filtering. A band-pass filter is the opposite: it only lets through frequencies inside a given band. A low-pass filter keeps everything below a given frequency, and a high-pass filter keeps everything above it. By eliminating each of the peaks in the colored regions (setting each value to 0), then performing an inverse fast Fourier transform (going backwards from frequency to time), the result is the signal trace with those high-frequency sine waves removed (the light gray trace on the bottom graph). A little touch-up smoothing makes a great trace (black trace on the bottom graph).
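
The band-stop idea (zero a band of FFT bins, then invert) can be sketched in a few lines. This is a simplified illustration rather than the code I used on my recordings: the function name `band_stop_fft`, the 1000 Hz sample rate, and the synthetic 1 Hz "heartbeat" are all assumptions chosen to keep the example small.

```python
import numpy as np

def band_stop_fft(samples, rate, low_hz, high_hz):
    """Zero every FFT bin between low_hz and high_hz, then invert."""
    spectrum = np.fft.rfft(samples)                       # real-input FFT
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)   # frequency of each bin (Hz)
    spectrum[(freqs >= low_hz) & (freqs <= high_hz)] = 0  # block the band
    return np.fft.irfft(spectrum, n=len(samples))         # back to the time domain

rate = 1000                                  # samples per second (assumed)
t = np.arange(2 * rate) / float(rate)        # two seconds of time points
heartbeat = np.sin(2 * np.pi * 1 * t)        # slow 1 Hz stand-in for an ECG
hum = 0.5 * np.sin(2 * np.pi * 60 * t)       # 60 Hz power-line interference
cleaned = band_stop_fft(heartbeat + hum, rate, 55, 65)
print("largest remaining error: %.6f" % np.max(np.abs(cleaned - heartbeat)))
```

Using `rfft`/`irfft` keeps the output real-valued, so there are no mirrored negative-frequency bins to worry about. With real recordings the interference rarely sits in exactly one bin, so the blocked band usually has to be a little wider than the peak itself.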

Here’s some Python code you may find useful. The image below is the output of the Python code at the bottom of this entry. This python file requires that ecg.wav (an actual ECG recording of my heartbeat) exist in the same folder.

import numpy, scipy, pylab, random

# This script demonstrates how to use low-pass
# filtering to eliminate electrical noise and static
# from signal data!

##################
### PROCESSING ###
##################

xs=numpy.arange(1,100,.01) #generate Xs (1.00,1.01,1.02,...,99.99)
signal=numpy.sin(xs*.3) # (A)
sin1=numpy.sin(xs) # (B) sin1
sin2=numpy.sin(xs*2.33)*.333 # (B) sin2
sin3=numpy.sin(xs*2.77)*.777 # (B) sin3
noise=sin1+sin2+sin3 # (C)
static = (numpy.random.random_sample((len(xs)))-.5)*.2 # (D)
sigstat=static+signal # (E)
rawsignal=sigstat+noise # (F)
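fft=numpy.fft.fft(rawsignal) # (G) and (H); scipy.fft is a module, not a function, in newer SciPy
bp=fft.copy() # copy so that zeroing bins does not also alter fft
bp[10:]=0 # (H-red) low-pass: discard every bin from index 10 upward
ibp=numpy.fft.ifft(bp).real # (I), (J), (K) and (L); keep the real part for plotting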

################
### GRAPHING ###
################

h,w=6,2
pylab.figure(figsize=(12,9))
pylab.subplots_adjust(hspace=.7)

pylab.subplot(h,w,1);pylab.title("(A) Original Signal")
pylab.plot(xs,signal)

pylab.subplot(h,w,3);pylab.title("(B) Electrical Noise Sources (3 Sine Waves)")
pylab.plot(xs,sin1,label="sin1")
pylab.plot(xs,sin2,label="sin2")
pylab.plot(xs,sin3,label="sin3")
pylab.legend()

pylab.subplot(h,w,5);pylab.title("(C) Electrical Noise (3 sine waves added together)")
pylab.plot(xs,noise)

pylab.subplot(h,w,7);pylab.title("(D) Static (random noise)")
pylab.plot(xs,static)
pylab.axis([None,None,-1,1])

pylab.subplot(h,w,9);pylab.title("(E) Signal + Static")
pylab.plot(xs,sigstat)

pylab.subplot(h,w,11);pylab.title("(F) Recording (Signal + Static + Electrical Noise)")
pylab.plot(xs,rawsignal)

pylab.subplot(h,w,2);pylab.title("(G) FFT of Recording")
fft=numpy.fft.fft(rawsignal)
pylab.plot(abs(fft))
pylab.text(200,3000,"signals",verticalalignment='top')
pylab.text(9500,3000,"static",verticalalignment='top',
        horizontalalignment='right')

pylab.subplot(h,w,4);pylab.title("(H) Low-Pass FFT")
pylab.plot(abs(fft))
pylab.text(17,3000,"sin1",verticalalignment='top',horizontalalignment='left')
pylab.text(37,2000,"sin2",verticalalignment='top',horizontalalignment='center')
pylab.text(45,3000,"sin3",verticalalignment='top',horizontalalignment='left')
pylab.text(6,3000,"signal",verticalalignment='top',horizontalalignment='left')
pylab.axvspan(10,10000,fc='r',alpha=.5)
pylab.axis([0,60,None,None])

pylab.subplot(h,w,6);pylab.title("(I) Inverse FFT")
pylab.plot(ibp)

pylab.subplot(h,w,8);pylab.title("(J) Signal vs. iFFT")
pylab.plot(signal,'k',label="signal",alpha=.5)
pylab.plot(ibp,'b',label="ifft",alpha=.5)

pylab.subplot(h,w,10);pylab.title("(K) Normalized Signal vs. iFFT")
pylab.plot(signal/max(signal),'k',label="signal",alpha=.5)
pylab.plot(ibp/max(ibp),'b',label="ifft",alpha=.5)

pylab.subplot(h,w,12);pylab.title("(L) Difference / Error")
pylab.plot(signal/max(signal)-ibp/max(ibp),'k')

pylab.savefig("SIG.png",dpi=200)
pylab.show()

DIY ECG Detected an Irregular Heartbeat

⚠️ Check out my newer ECG designs:

Am I going to die? It’s unlikely. Upon analyzing ~20 minutes of heartbeat data (some of which is depicted in the previous entry) I found a peculiarity. Technically this could be some kind of noise (a ‘pop’ in the microphone signal due to the shuffling of wires, a momentary disconnect from the electrodes, or perhaps even a static shock to my body from something), but because this peculiarity happened only once in 20 minutes I’m not ruling out the possibility that this is the first irregular heartbeat I captured with my DIY ECG. Note that single-beat irregularities are common, and that this does not alarm me so much as fascinate me. Below is the section of the data which contains this irregular beat.

In the spirit of improvement, I wonder how much more interesting this project would be if I combined the already-designed ECG machine with a sensor to detect the physical effect of the heart’s beating on my vasculature. In other words, can I combine my electrical traces with physical traces (blood pressure or blood flow)? I found an interesting site that shows how someone built a DIY blood flow meter using a piezo film pulse sensor. Pretty clever, I must say… but I think I’ll draw the line at what I’ve done. Although blood flow would be interesting to analyze (does the irregular beat depicted above produce an alteration in normal blood flow?), it’s not worth the time, hassle, or expense of building.


DIY ECG Improvements

⚠️ Check out my newer ECG designs:

Instead of using a single op-amp circuit like the previous entries, which gave me decent but staticky traces, I decided to build a more advanced ECG circuit documented by Jason Nguyen which uses 6 op-amps! Luckily I got a couple of LM324 quad op-amps from RadioShack ($1.40 each), so I had everything I needed.

The results look great! Noise is almost zero, so true details of the trace are visible. I can now clearly see the PQRST features in the wave. I’ll detail how I did this in a later entry. For now, here are some photos of the little device.

UPDATE: After analyzing ~20 minutes of heartbeat data I found a peculiarity. Technically this could be some kind of noise (a ‘pop’ in the microphone signal), but because this peculiarity happened only once in 20 minutes I’m not ruling out the possibility that this is the first irregular heartbeat I captured with my DIY ECG. Note that single-beat irregularities are common in healthy people, and that this does not alarm me so much as fascinate me.