This is an AI Generated Summary
Many of the blog posts I wrote when I was young are no longer available on this website. However, AI was used to summarize the content of the original post into a single sentence, as described in my 2023 article, Summarizing Blog Posts with AI.
Summary:
The blog post describes how to compress strings and store them to files using Python's `zlib` module, with examples of how to both compress and decompress data.
This summary was generated in 60.06 seconds from an original post containing 197 words.
This is an AI Generated Summary
Many of the blog posts I wrote when I was young are no longer available on this website. However, AI was used to summarize the content of the original post into a single sentence, as described in my 2023 article, Summarizing Blog Posts with AI.
Summary:
The author of the blog is proud of their writing style, which they describe as casual yet indistinctly formal, and they are grateful for having a quasi-organized account of their thoughts going back to 2001.
This summary was generated in 78.49 seconds from an original post containing 343 words.
⚠️ WARNING: This page is obsolete
Articles typically receive this designation when the technology they describe is no longer relevant, code provided is later deemed to be of poor quality, or the topics discussed are better presented in future articles. Articles like this are retained for the sake of preservation, but their content should be critically assessed.
While writing code for my graduate research thesis I came across the need to lightly compress a huge and complex variable (a massive 3D data array) and store it in a text file for later retrieval. I decided to use the zlib compression library because it's open source and works on pretty much every platform. I ran into a snag for a while though, because whenever I loaded data from a text file it wouldn't properly decompress. I fixed this problem by adding "rb" to the open line, forcing Python to read the text file as binary data rather than ASCII data. Below is my code, written as two functions to save/load compressed string data to/from files in Python.
import zlib

def saveIt(data, fname):
    # store any variable as a compressed string in a file
    data = str(data)
    data = zlib.compress(data.encode())  # zlib operates on bytes, not text
    f = open(fname, 'wb')
    f.write(data)
    f.close()

def openIt(fname, evaluate=True):
    # load a compressed file and recover the original variable
    f = open(fname, 'rb')  # "rb" reads the file as binary rather than ASCII
    data = f.read()
    f.close()
    data = zlib.decompress(data).decode()
    if evaluate:
        data = eval(data)  # turn the string back into a live Python object
    return data
Oh yeah, don’t forget the evaluate option in the openIt function. If set to True (default), the returned variable will be an evaluated object. For example, [[1,2],[3,4]]
will be returned as an actual 2D list, not just a string. How convenient is that?
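For example, a minimal round trip (a hypothetical snippet; the filename is just an illustration) might look like this:

# hypothetical round trip using the two functions above
matrix = [[1, 2], [3, 4]]
saveIt(matrix, "matrix.zlib")     # writes the compressed string to disk
restored = openIt("matrix.zlib")  # evaluate=True (the default) rebuilds the object
print(restored[1][0])             # prints 3, since restored is a real 2D list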
I accidentally nuked my laptop's 80 GB hard drive this morning (D'OH!) while shuffling around partitions. Supposedly there's a valid Windows XP installation on there still that's about 20 GB. I'd love to repair it so I can use it today while I'm in the confocal room, but I don't have an Ubuntu CD, Windows CD, or any CD for that matter! I looked around, but I guess blank CD-Rs aren't something that's standard in molecular biology laboratories. Anyhow, I wanted to install the new Ubuntu 8.10 Linux distribution, and I've downloaded the ISO, but since I can't find a CD to burn it to I decided to try booting from a USB drive (something I've never done before). I found an AWESOME program that specializes in putting ISO files onto bootable USB drives. It's called UNetBootin and it's free (of course), runs on Linux or Windows, and has some built-in options for various Linux distributions. I can repair my PC now! Yay!
⚠️ WARNING: This page is obsolete
Articles typically receive this designation when the technology they describe is no longer relevant, code provided is later deemed to be of poor quality, or the topics discussed are better presented in future articles. Articles like this are retained for the sake of preservation, but their content should be critically assessed.
⚠️ SEE UPDATED POST: Signal Filtering in Python
import numpy

def smoothListGaussian(list, degree=5):
    window = degree * 2 - 1
    weight = numpy.array([1.0] * window)
    weightGauss = []
    for i in range(window):
        i = i - degree + 1        # center the index on the middle of the window
        frac = i / float(window)  # fractional distance from the window center
        gauss = 1 / (numpy.exp((4 * frac) ** 2))  # Gaussian falloff
        weightGauss.append(gauss)
    weight = numpy.array(weightGauss) * weight
    smoothed = [0.0] * (len(list) - window)
    for i in range(len(smoothed)):
        # weighted average of the points inside the window
        smoothed[i] = sum(numpy.array(list[i:i + window]) * weight) / sum(weight)
    return smoothed
Provide a list and it will return a smoother version of the data. The Gaussian smoothing function I wrote is leagues better than a moving window average method, for reasons that are obvious when viewing the chart below. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for large numbers of data points, the Gaussian function should perform better.
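As a quick illustration of calling it (a hypothetical snippet; the noisy signal here is fabricated for demonstration):

import numpy

# smooth a noisy sine wave with the smoothListGaussian function above
noisy = [numpy.sin(x / 3.0) + numpy.random.random() * 0.2 for x in range(100)]
smooth = smoothListGaussian(noisy, degree=5)  # 9-point Gaussian window
print(len(noisy), len(smooth))                # 100 91 -- output shrinks by the window size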
import pylab
import numpy

def smoothList(list, strippedXs=False, degree=10):
    # simple moving window average over "degree" points
    if strippedXs == True:
        return Xs[0:-(len(list) - (len(list) - degree + 1))]  # note: relies on a global Xs
    smoothed = [0] * (len(list) - degree + 1)
    for i in range(len(smoothed)):
        smoothed[i] = sum(list[i:i + degree]) / float(degree)
    return smoothed

def smoothListTriangle(list, strippedXs=False, degree=5):
    # moving window average with triangular weights
    weight = []
    window = degree * 2 - 1
    smoothed = [0.0] * (len(list) - window)
    for x in range(1, 2 * degree):
        weight.append(degree - abs(degree - x))  # weights rise to "degree" then fall
    w = numpy.array(weight)
    for i in range(len(smoothed)):
        smoothed[i] = sum(numpy.array(list[i:i + window]) * w) / float(sum(w))
    return smoothed

def smoothListGaussian(list, strippedXs=False, degree=5):
    # moving window average with Gaussian weights
    window = degree * 2 - 1
    weight = numpy.array([1.0] * window)
    weightGauss = []
    for i in range(window):
        i = i - degree + 1
        frac = i / float(window)
        gauss = 1 / (numpy.exp((4 * frac) ** 2))
        weightGauss.append(gauss)
    weight = numpy.array(weightGauss) * weight
    smoothed = [0.0] * (len(list) - window)
    for i in range(len(smoothed)):
        smoothed[i] = sum(numpy.array(list[i:i + window]) * weight) / sum(weight)
    return smoothed
### DUMMY DATA ###
data = [0]*30 # 30 "0"s in a row
data[15] = 1 # the middle one is "1"
### PLOT DIFFERENT SMOOTHING FUNCTIONS ###
pylab.figure(figsize=(550/80, 700/80))
pylab.suptitle('1D Data Smoothing', fontsize=16)
pylab.subplot(4, 1, 1)
p1 = pylab.plot(data, ".k")
p1 = pylab.plot(data, "-k")
a = pylab.axis()
pylab.axis([a[0], a[1], -.1, 1.1])
pylab.text(2, .8, "raw data", fontsize=14)
pylab.subplot(4, 1, 2)
p1 = pylab.plot(smoothList(data), ".k")
p1 = pylab.plot(smoothList(data), "-k")
a = pylab.axis()
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving window average", fontsize=14)
pylab.subplot(4, 1, 3)
p1 = pylab.plot(smoothListTriangle(data), ".k")
p1 = pylab.plot(smoothListTriangle(data), "-k")
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving triangle", fontsize=14)
pylab.subplot(4, 1, 4)
p1 = pylab.plot(smoothListGaussian(data), ".k")
p1 = pylab.plot(smoothListGaussian(data), "-k")
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving gaussian", fontsize=14)
# pylab.show()
pylab.savefig("smooth.png", dpi=80)
This data needs smoothing. Below is a visual representation of the differences between the smoothing methods.
The degrees of window coverage for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5, respectively. Also note that (due to how the "degree" variable is handled by the different functions) the actual numbers of data points assessed by these three functions are 10, 9, and 9, respectively. The degree for the last two functions represents "spread" from each point, whereas for the first one it represents the total number of points to be averaged by the moving average.
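To make the window arithmetic concrete, here is a short sketch (mirroring the formulas in the functions above) of the weights each method applies at degree=5:

import numpy

degree = 5
window = degree * 2 - 1  # 9 data points assessed per output sample
triangle = [degree - abs(degree - x) for x in range(1, 2 * degree)]
print(triangle)          # [1, 2, 3, 4, 5, 4, 3, 2, 1]
gauss = [1 / numpy.exp((4 * ((i - degree + 1) / float(window))) ** 2)
         for i in range(window)]
print(["%.2f" % g for g in gauss])  # peaks at the center and tapers toward the edges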