Removing Textile Markup From WordPress Entries
⚠️ WARNING: This page is obsolete
Articles typically receive this designation when the technology they describe is no longer relevant, code provided is later deemed to be of poor quality, or the topics discussed are better presented in future articles. Articles like this are retained for the sake of preservation, but their content should be critically assessed.
I realized that the C code from yesterday wasn’t showing up properly because of Textile, a rapid, inline, tag-based formatting system. While it’s fun and convenient to use, it’s not always practical. The problem was that in C code, variable names (such as delay) were becoming irrevocably italicized, and nothing I did could make Textile leave the code alone while still styling the surrounding text. The kicker is that I couldn’t simply disable it, because I’ve been writing in this style for over four years!
I decided it was time to put my mad Python skills to the test and write code to handle the conversion from Textile format to raw HTML. I accomplished this feat in a number of steps. Yeah, I could have done hours of research to find a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database using phpMyAdmin to a single “x.sql” file. I then wrote a Python script to parse this [massive] file and output “o.sql”, the same data but with all of the Textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:
```python
## This Python script removes *SOME* Textile formatting from WordPress
## backups in plain-text SQL format (dumped from phpMyAdmin). Specifically,
## it converts bold and italic text and fixes links. It should be easy
## to expand if you need to do something else with it.

infile = 'x.sql'

# phpMyAdmin writes line breaks inside string values as the literal escape
# sequences \r\n, so the backslashes are doubled here to match them as text.
replacements = [["\\r", " "], ["\\n", " \\n "], ["*:", "* :"], ["_:", "_ :"],
                ["\\n", "<br>\\n"], [">*", "> *"], ["*<", "* <"],
                [">_", "> _"], ["_<", "_ <"],
                [" *", " <b>"], ["* ", "</b> "], [" _", " <i>"], ["_ ", "</i> "]]
# These are the easy replacements


def fixLinks(line):
    ## replace ["link text":URL] with [<a href="URL">link text</a>]
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            # walk backwards until the opening quote of the link text is included
            upto = 1
            while word.count('"') < 2 and i - upto >= 0:
                word = words[i - upto] + " " + word
                upto += 1
            word_orig = word
            extra = ""
            text, url = word.split('":', 1)
            text = text.lstrip('"')
            # keep trailing punctuation outside of the link
            for char in ".),'":
                if url.endswith(char):
                    extra = char
            if len(extra) > 0:
                url = url[:-1]
            word_new = '<a href="%s">%s</a>' % (url, text) + extra
            line = line.replace(word_orig, word_new)
    return line


# Index of post_content among the "', '"-separated pieces of an INSERT line.
# Assumes the column order ID, post_author, post_date, post_date_gmt, post_content,
# with ID and post_author written as unquoted numbers.
CONTENT_FIELD = 2


def stripTextile(orig):
    ## Handle the replacements and link fixing for one post.
    if orig.count("', '") != 13:
        return orig  # non-normal post
    content = orig.split("', '")[CONTENT_FIELD]
    if len(content) < 10:
        return orig  # too short to bother converting
    line = " " + content
    for old, new in replacements:
        line = line.replace(old, new)
    line = fixLinks(line)
    return orig.replace(content, line)


with open(infile) as f:
    raw = f.readlines()

posts = 0
for raw_i in range(len(raw)):
    if raw[raw_i].startswith("INSERT INTO") and "wp_posts" in raw[raw_i]:
        # if it's a post, handle it!
        posts += 1
        print("on post", posts)
        raw[raw_i] = stripTextile(raw[raw_i])

print("WRITING...")
with open('o.sql', 'w') as f:
    f.write("".join(raw))
```
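For comparison, here is a minimal sketch of the same three conversions (bold, italic and links) written with Python’s re module instead of plain string replacement. It is not the script above: the pattern names and textile_to_html are hypothetical, it works on ordinary text rather than on escaped SQL dump lines, and it only covers the simple Textile forms *bold*, _italic_ and "text":URL. The lookbehind and lookahead assertions require each marker to touch non-word characters on its outer side, which is what keeps an identifier such as _delay_ms(100) (an illustrative example) from being italicized.

```python
import re

# Hypothetical regex-based conversion of a small Textile subset; not the original script.
# A marker only counts when it is not glued to a word character on its outer side
# and the wrapped text does not start or end with whitespace.
BOLD = re.compile(r'(?<![\w*])\*(?=\S)(.+?)(?<=\S)\*(?![\w*])')
ITALIC = re.compile(r'(?<![\w_])_(?=\S)(.+?)(?<=\S)_(?![\w_])')
# "link text":URL, with one trailing . , ) or ' kept outside the link
LINK = re.compile(r'"([^"]+)":(\S+?)([.,)\']?)(?=\s|$)')


def textile_to_html(text):
    """Convert *bold*, _italic_ and "text":URL to HTML; leave everything else alone."""
    text = BOLD.sub(r'<b>\1</b>', text)
    text = ITALIC.sub(r'<i>\1</i>', text)
    text = LINK.sub(r'<a href="\2">\1</a>\3', text)
    return text
```

Running textile_to_html on 'This is *bold*, _italic_ and "a link":http://example.com.' gives 'This is <b>bold</b>, <i>italic</i> and <a href="http://example.com">a link</a>.', while '_delay_ms(100);' and 'x = a * b * c' come back unchanged.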
I certainly held my breath while the thing ran. Remember, its output was an SQL dump that replaced the tables behind my live site. Therefore, every time I uploaded a “corrected” version, I broke the site until I got all the bugs worked out. Here’s an image from earlier today when my site was totally dead (0 blog posts).
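Since each bad o.sql took the site down, a quick structural comparison of the two dumps before uploading could catch some of those bugs early. This is a suggested safeguard, not something the original workflow included; compare_dumps and compare_dump_files are hypothetical helpers, and the checks assume the converted dump keeps one statement per line, as the script above does when it rewrites lines in place. They cannot tell whether the HTML is right, only whether the SQL structure (line count, INSERT lines, the "', '" field separators and trailing semicolons) survived.

```python
# Hypothetical pre-upload sanity check: compares the original and converted dumps
# line by line and reports anything that changed the SQL structure.


def compare_dumps(before_lines, after_lines):
    """Return a list of human-readable problems; an empty list means no structural change found."""
    problems = []
    if len(before_lines) != len(after_lines):
        problems.append("line count changed: %d -> %d" % (len(before_lines), len(after_lines)))
        return problems
    for n, (old, new) in enumerate(zip(before_lines, after_lines), start=1):
        if old.startswith("INSERT INTO") != new.startswith("INSERT INTO"):
            problems.append("line %d: INSERT statement added or lost" % n)
        elif old.count("', '") != new.count("', '"):
            # an unescaped quote introduced by a replacement would shift the fields
            problems.append("line %d: field separator count changed" % n)
        elif old.rstrip().endswith(";") != new.rstrip().endswith(";"):
            problems.append("line %d: statement terminator changed" % n)
    return problems


def compare_dump_files(before_path, after_path):
    """Read both dump files and compare them with compare_dumps."""
    with open(before_path, errors="replace") as f:
        before = f.readlines()
    with open(after_path, errors="replace") as f:
        after = f.readlines()
    return compare_dumps(before, after)
```

For example, compare_dump_files('x.sql', 'o.sql') returning an empty list means nothing obvious changed shape; any entry names the line to inspect before uploading.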