Removing Textile Markup From WordPress Entries
⚠️ WARNING: This page is obsolete
Articles typically receive this designation when the technology they describe is no longer relevant, code provided is later deemed to be of poor quality, or the topics discussed are better presented in future articles. Articles like this are retained for the sake of preservation, but their content should be critically assessed.
I realized that the C code from yesterday wasn’t showing up properly because of Textile, a rapid, inline, tag-based formatting system. While it’s fun and convenient to use, it’s not always practical. The problem I was having was that in C code, variable names (such as delay) were becoming irrevocably italicized, because Textile treats underscores as emphasis markers, and nothing I did could make Textile ignore code while styling text. The kicker is that I couldn’t disable it easily, because I’ve been writing in this style for over four years!

I decided that the time was now to put my mad Python skills to the test and write code to handle the conversion from Textile format to raw HTML. Yeah, I could have done hours of research to find a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database using phpMyAdmin to a single “x.sql” file. I then wrote a Python script to parse this [massive] file and output “o.sql”, the same data but with all of the Textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:
## This Python script removes *SOME* Textile formatting from WordPress
## backups in plain-text SQL format (dumped from phpMyAdmin). Specifically,
## it converts bold and italic markers and corrects links. It should be easy
## to expand if you need to do something else with it.

infile = 'x.sql'

# These are the easy replacements, applied in order. Line breaks inside a
# phpMyAdmin dump appear as the literal two-character sequences \r and \n,
# so those patterns are written with an escaped backslash (an assumption
# about the dump format).
replacements = [
    ["\\r", " "], ["\\n", " \\n "], ["*:", "* :"], ["_:", "_ :"],
    ["\\n", "<br>\\n"], [">*", "> *"], ["*< ", "* <"],
    [">_", "> _"], ["_< ", "_ <"],
    [" *", " <b>"], ["* ", "</b> "], [" _", " <i>"], ["_ ", "</i> "],
]

def fixLinks(line):
    ## replace ["links":URL] with [<a href="URL">links</a>]. ##
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            # Walk backwards until the opening quote of the link text is found.
            upto = 1
            while word.count('"') < 2 and upto <= i:
                word = words[i-upto] + " " + word
                upto += 1
            if word.count('"') < 2:
                continue  # no opening quote on this line, leave it alone
            word_orig = word
            extra = ""
            word = word.split('":')
            word[0] = word[0][1:]  # drop the opening quote
            # Keep trailing punctuation outside of the link.
            for char in ".),'":
                if word[1].endswith(char): extra = char
            if len(extra) > 0: word[1] = word[1][:-1]
            word_new = '<a href="%s">%s</a>' % (word[1], word[0]) + extra
            line = line.replace(word_orig, word_new)
    return line

def stripTextile(orig):
    ## Handle the replacements and link fixing for each line. ##
    if not orig.count("', '") == 13: return orig  # non-normal post
    line = orig.split("', '", 5)[2]  # the field holding the post body
    if len(line) < 10: return orig  # non-normal post
    origline = line
    line = " " + line  # leading space so a marker at the very start matches
    for replacement in replacements:
        line = line.replace(replacement[0], replacement[1])
    line = fixLinks(line)
    line = orig.replace(origline, line)
    return line

f = open(infile)
raw = f.readlines()
f.close()

posts = 0
for raw_i in range(len(raw)):
    if raw[raw_i][:11] == "INSERT INTO":
        if "wp_posts" in raw[raw_i]:  # if it's a post, handle it!
            posts += 1
            print("on post %d" % posts)
            raw[raw_i] = stripTextile(raw[raw_i])

print("WRITING...")
out = "".join(raw)
f = open('o.sql', 'w')
f.write(out)
f.close()
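
The word-by-word search for links can also be written as a single regular expression. What follows is only a minimal sketch of that alternative, not the code I ran on this site: the function name `textile_links_to_html` and the pattern are made up for illustration, and it only handles the simple `"text":URL` form, leaving trailing punctuation outside of the link.

```python
import re

# Hypothetical pattern: "link text":URL, where the URL runs up to the next
# whitespace and any trailing . , ) ' characters stay outside the link.
LINK_PATTERN = re.compile(r'"([^"]+)":(\S+?)([.,)\']*)(?=\s|$)')


def textile_links_to_html(text):
    """Replace Textile-style "text":URL links with HTML anchors."""
    return LINK_PATTERN.sub(r'<a href="\2">\1</a>\3', text)


sample = 'See "my site":http://example.com/page, then "this":http://example.org.'
print(textile_links_to_html(sample))
```

Plain quoted text that is not followed by a colon and a URL is left untouched, since the pattern requires the `":` pair right after the closing quote.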
I certainly held my breath while the thing ran. The script rewrites the SQL dump that the whole site is restored from, so when I uploaded the “corrected” versions, I kept breaking the site until I got all the bugs worked out. Here’s an image from earlier today when my site was totally dead (0 blog posts).
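
One way to catch that kind of breakage before uploading would be to compare the two dumps line by line and confirm that only `wp_posts` inserts changed. This is a rough sketch under that assumption, not something I ran at the time; `unexpected_changes` is a hypothetical helper, and the short lists below stand in for the lines of “x.sql” and “o.sql”.

```python
def unexpected_changes(original_lines, converted_lines):
    """Return indices of lines that changed but are not wp_posts INSERTs."""
    if len(original_lines) != len(converted_lines):
        raise ValueError("line counts differ; the dump structure was altered")
    flagged = []
    for index, (before, after) in enumerate(zip(original_lines, converted_lines)):
        if before == after:
            continue
        is_post = before.startswith("INSERT INTO") and "wp_posts" in before
        if not is_post:
            flagged.append(index)
    return flagged


# Tiny in-memory stand-ins for the original and converted dumps.
before = [
    "CREATE TABLE wp_posts (id int);\n",
    "INSERT INTO wp_posts VALUES ('a *bold* word');\n",
    "INSERT INTO wp_users VALUES ('admin');\n",
]
after = [
    before[0],
    "INSERT INTO wp_posts VALUES ('a <b>bold</b> word');\n",
    before[2],
]
print(unexpected_changes(before, after))  # an empty list means nothing else changed
```

An empty result does not prove the converted posts are correct, only that the script left every other table and statement alone.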