SWHarden.com

The personal website of Scott W Harden

Compress Strings and Store to Files in Python

Summary: The blog post describes how to compress strings and store them to files using Python's `zlib` module, with examples of how to both compress and decompress data.
This summary was generated in 60.06 seconds from an original post containing 197 words.

Is Descriptive Eloquence Inherited?

Summary: The author of the blog is proud of their writing style, which they describe as casual yet indistinctly formal, and they are grateful for having a quasi-organized account of their thoughts going back to 2001.
This summary was generated in 78.49 seconds from an original post containing 343 words.

Compress and Store Files in Python

While writing code for my graduate research thesis I came across the need to lightly compress a huge and complex variable (a massive 3D data array) and store it in a text file for later retrieval. I decided to use the zlib compression library because it’s open source and works pretty much on every platform. I ran into a snag for a while though, because whenever I loaded data from a text file it wouldn’t properly decompress. I fixed this problem by adding the “rb” to the open line, forcing python to read the text file as binary data rather than ascii data. Below is my code, written in two functions to save/load compressed string data to/from files in Python.

import zlib  
  
def saveIt(data,fname):  
    data=str(data)  
    data=zlib.compress(data)  
    f=open(fname,'wb')  
    f.write(data)  
    f.close()  
    return  
  
def openIt(fname,evaluate=True):  
    f=open(fname,'rb')  
    data=f.read()  
    f.close()  
    data=zlib.decompress(data)  
    if evaluate: data=eval(data)  
    return data  

Oh yeah, don’t forget the evaluate option in the openIt function. If set to True (default), the returned variable will be an evaluated object. For example, [[1,2],[3,4]] will be returned as an actual 2D list, not just a string. How convenient is that?


Run Ubuntu Live CD From a USB Drive

I accidentally nuked my laptop’s 80G hard drive this morning (D’OH!) while shuffling around partitions. Supposedly there’s a valid windows (XP) installation on there still that’s about 20G. I’d love to repair it so I can use it today while I’m in the confocal room, but I don’t have an Ubuntu CD, Windows CD, or any CD for that matter! I looked around, but I guess blank CD-Rs aren’t something that’s standard in molecular biology laboratories. Anyhow, I wanted to install the new Ubuntu 8.10 Linux distribution, and I’ve downloaded the ISO, but since I can’t find a CD to burn it to I decided to try booting from a USB drive (something I’ve never done before). I found an AWESOME program which specialized in putting ISO files onto bootable USB drives. It’s called UNetBootin and it’s free (of course), runs on Linux or Windows, and has some built-in options for various linux distributions. I can repair my PC now! Yay!


Linear Data Smoothing in Python

⚠️ SEE UPDATED POST: Signal Filtering in Python

def smoothListGaussian(list, degree=5):
    window = degree*2-1
    weight = numpy.array([1.0]*window)
    weightGauss = []
    for i in range(window):
        i = i-degree+1
        frac = i/float(window)
        gauss = 1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight = numpy.array(weightGauss)*weight
    smoothed = [0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i] = sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed

Provide a list and it will return a smoother version of the data. The Gaussian smoothing function I wrote is leagues better than a moving window average method, for reasons that are obvious when viewing the chart below. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for large numbers of data points, the Gaussian function should perform better.

import pylab
import numpy

def smoothList(list, strippedXs=False, degree=10):
    if strippedXs == True:
        return Xs[0:-(len(list)-(len(list)-degree+1))]
    smoothed = [0]*(len(list)-degree+1)
    for i in range(len(smoothed)):
        smoothed[i] = sum(list[i:i+degree])/float(degree)
    return smoothed

def smoothListTriangle(list, strippedXs=False, degree=5):
    weight = []
    window = degree*2-1
    smoothed = [0.0]*(len(list)-window)
    for x in range(1, 2*degree):
        weight.append(degree-abs(degree-x))
    w = numpy.array(weight)
    for i in range(len(smoothed)):
        smoothed[i] = sum(numpy.array(list[i:i+window])*w)/float(sum(w))
    return smoothed

def smoothListGaussian(list, strippedXs=False, degree=5):
    window = degree*2-1
    weight = numpy.array([1.0]*window)
    weightGauss = []
    for i in range(window):
        i = i-degree+1
        frac = i/float(window)
        gauss = 1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight = numpy.array(weightGauss)*weight
    smoothed = [0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i] = sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed

### DUMMY DATA ###
data = [0]*30  # 30 "0"s in a row
data[15] = 1  # the middle one is "1"

### PLOT DIFFERENT SMOOTHING FUNCTIONS ###
pylab.figure(figsize=(550/80, 700/80))
pylab.suptitle('1D Data Smoothing', fontsize=16)
pylab.subplot(4, 1, 1)
p1 = pylab.plot(data, ".k")
p1 = pylab.plot(data, "-k")
a = pylab.axis()
pylab.axis([a[0], a[1], -.1, 1.1])
pylab.text(2, .8, "raw data", fontsize=14)
pylab.subplot(4, 1, 2)
p1 = pylab.plot(smoothList(data), ".k")
p1 = pylab.plot(smoothList(data), "-k")
a = pylab.axis()
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving window average", fontsize=14)
pylab.subplot(4, 1, 3)
p1 = pylab.plot(smoothListTriangle(data), ".k")
p1 = pylab.plot(smoothListTriangle(data), "-k")
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving triangle", fontsize=14)
pylab.subplot(4, 1, 4)
p1 = pylab.plot(smoothListGaussian(data), ".k")
p1 = pylab.plot(smoothListGaussian(data), "-k")
pylab.axis([a[0], a[1], -.1, .4])
pylab.text(2, .3, "moving gaussian", fontsize=14)
# pylab.show()
pylab.savefig("smooth.png", dpi=80)

This data needs smoothing. Below is a visual representation of the differences in the methods of smoothing.

The degree of window coverage for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5 respectively. Also note that (due to the handling of the “degree” variable between the different functions) the actual number of data points assessed in these three functions are 10, 9, and 9 respectively. The degree for the last two functions represents “spread” from each point, whereas the first one represents the total number of points to be averaged for the moving average.