SWHarden.com

The personal website of Scott W Harden

Generate Apache-Style HTTP Access Logs via SQL and PHP

Does your web hosting company block access to access.log, the text file containing raw website log files? If so, you’re like me, and it sucks. There’s a plethora of gorgeous and extremely insightful website traffic analyzers, but all of them require access to raw HTTP access logs. Today I propose a semi-efficient way to generate such logs utilizing PHP to determine page load data (time, user IP, requested page, referring page, user client, etc) and SQL to save such data for easy retrieval later. Note that this method is a HUGE improvement of my previous project which used PHP scripts to store HTTP access logs as flat files. Although it worked in theory, in all practicality the process of opening, writing to, and closing a text file (which grew a few MB a week) was too cumbersome for my server to comfortable handle. The method described on this page utilizes SQL, a database engine well-suited to meet these exact demands. When we’re done, you’ll be able to use a web interface to view your access log (pictured, converting long, complicated search queries to web search and image search strings automatically), or have the option to export it directly to an access.log text file in a standard Apache-style format.

First, make sure your database is structured appropriately. This page is written for those with a working knowledge of PHP and SQL, but if you’re new to the field I encourage you to learn! W3Schools.com is an awesome resource to rapidly learn new languages. Also, when starting-out with SQL (like me), phpMyAdmin is a awesome. The code, as it’s currently written (below) is designed to store data in the “nibjb” database under the “logs” table. Briefly, it uses PHP to determine user data (time, ip, requested page, etc.) and injects this information into the SQL database. In fact, it’s doing it to you right now! Don’t believe me? View the source of this web page and scroll to the bottom. BAM! There you are.

// logme.php
<?php

if ( !isset($wp_did_header) ) {
    $wp_did_header = true;
    require_once( '/home/content/n/i/b/nibjb/html/blog/wp-load.php' );
    //wp();
    //require_once( '/home/content/n/i/b/nibjb/html/blog/wp-includes/template-loader.php' );
}

function logwriter_handlevar($varname,$defaultvalue){
    $tempvar = getenv($varname);
    if(!empty($tempvar)) {
        return $tempvar;
    } else {
        return $defaultvalue;
    }
}

if (!empty($REMOTE_HOST)) {
$logwriter_remote_vistor = $REMOTE_HOST;
}else{
$logwriter_remote_vistor = logwriter_handlevar("REMOTE_ADDR","-");
}

$logwriter_remote_ident = logwriter_handlevar("REMOTE_IDENT","-");
$logwriter_remote_user = logwriter_handlevar("REMOTE_USER","-");
$logwriter_date = date("d/M/Y:H:i:s");
$logwriter_request_method = logwriter_handlevar("REQUEST_METHOD","GET");
$logwriter_request_uri = logwriter_handlevar("REQUEST_URI","");
$logwriter_server_protocol = logwriter_handlevar("SERVER_PROTOCOL","HTTP/1.1");
$logwriter_http_referer = logwriter_handlevar("HTTP_REFERER","-");
$logwriter_http_user_agent = logwriter_handlevar("HTTP_USER_AGENT","");
$logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] "$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol" 200 - "$logwriter_http_referer" "$logwriter_http_user_agent"n";
?>

<?php
$username="YOUR_USERNAME";
$password="YOUR_PASSWORD";
$database="nibjb";
mysql_connect('mysql157.secureserver.net',$username,$password);
//mysql_connect(localhost,$username,$password);

$query = "INSERT INTO logs VALUES ('','$logwriter_date','$logwriter_remote_vistor','$logwriter_request_method','$logwriter_request_uri','$logwriter_server_protocol','$logwriter_http_referer','$logwriter_http_user_agent')";
mysql_query($query);
mysql_close();
?>

<!--
LOG DETAILS:
time: <?php echo($logwriter_date); ?>
vistor: <?php echo($logwriter_remote_vistor); ?>
method: <?php echo($logwriter_request_method); ?>
request: <?php echo($logwriter_request_uri); ?>
protocol: <?php echo($logwriter_server_protocol); ?>
referrer: <?php echo($logwriter_http_referer); ?>
agent: <?php echo($logwriter_http_user_agent); ?>
HTML LOG LINE:
<?php echo($logwriter_logstring); ?>
-->

All right, that was easy. Every time we load logme.php, it adds the data to the SQL database. To add data every time you go to a particular web page, you could use a PHP include() statement in each webpage, or you could take advantage of the PHP’s auto_append_file feature! Simply insert the following line into your php.ini file if you have access to yours:

auto_append_file = "/path/to/html/logme.php"

How do we access this data once it’s been loaded into the database? There are many different ways, but I’ve chosen to get a little creative with a sleek, yet minimalistic web-based fronted. It basically just shows the last [x] number of entries in the access log. You can adjust the number of entries displayed by slapping on some arguments to the URL, transforming viewLast.php into viewLast.php?limit=123 or something (see the screenshot above). I won’t discuss the details of this script. It’s self-explanatory.

// viewLast.php
<html>
<head>
<style type="text/css">
td {
font-family: verdana, arial;
font-size:10px;
}
</style>
</head>
<body>
<?php

$limit = (int)$_GET['limit'];
if ($limit===0) {$limit=25;}

$username="YOUR_USERNAME";
$password="YOUR_PASSWORD";
$database="nibjb";
mysql_connect('mysql157.secureserver.net',$username,$password);
mysql_select_db($database) or die( "Unable to select database");
$query="
SELECT * FROM logs WHERE
request NOT LIKE "%testlog.php%"
AND request NOT LIKE  "%/logs/%"
AND request NOT LIKE "%/wp-admin/%"
ORDER BY ID DESC LIMIT 0,$limit
";
//$query="SELECT * FROM logs WHERE referrer LIKE "%&q=%" or referrer LIKE "%&prev=%" ";
$result=mysql_query($query);
$num=mysql_numrows($result);
mysql_close();
?>

<b><?php echo($query); ?></b>
<table border="1">
<tr>
<td>id</td>
<td>time</td>
<td>visitor</td>
<td>request</td>
<td>referrer</td>
</tr>

<?php
$i=1;
while ($i<$num) {
$id=mysql_result($result,$i,"id");
$time=mysql_result($result,$i,"time");
$visitor=mysql_result($result,$i,"visitor");
$method=mysql_result($result,$i,"method");
$request=mysql_result($result,$i,"request");
$protocol=mysql_result($result,$i,"protocol");
$referrer=mysql_result($result,$i,"referrer");
$referrer2=str_replace("&", "& ", $referrer);
$agent=mysql_result($result,$i,"agent");
$searchWords="";
$searchEngine="";
if (strpos($referrer, "q=")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Web Search: ";}
if (strpos($referrer, "prev=/images")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Image Search: ";}

// SEARCH EXTRACTION //
$j=0;
$rTemp=str_replace("prev=/images%3Fq%3D", "q=", $referrer);
$rTemp=str_replace("?q=","&q=", $rTemp);
$rTemp=str_replace("%2B"," ", $rTemp);
$rTemp=str_replace("%26"," ", $rTemp);
$rTemp=str_replace("%3D"," ", $rTemp);
$rTemp=str_replace("+"," ", $rTemp);
$wvars=split("&",$rTemp);
while ($j<count($wvars)){
    if (substr($wvars[$j],0,2) === "q=") {
        $searchWords = $searchWords . $wvars[$j] . " ";
        }
    $j++;
}

$searchWords=substr($searchWords,strpos($searchWords, "q=")+2);
if (strlen($searchWords)<3) {$searchWords=$referrer;}

echo "
<tr>
<td>$id</td>
<td>$time</td>
<td>$visitor</td>
<td><a href='$request'>$request</a></td>
<td>$searchEngine <a href='$referrer'>$searchWords</a></td>
</td>
";
$i++;
}
?>
</table>
</body>
</html>

And you’re done! This example is a simplified, bare bones example. You can take this a long way if you’d like. My goal is lite & flexible. A quick query from Python and Matplotlib (for example) yields gorgeous visual representations of otherwise-convoluted data!

If you have any questions, or end-up developing something awesome with this code, shoot me an email! It’s not luxurious, but this code works for me, and I share it with the best of intentions.


PySquelch: A Python-Based Frequency Audio Activity Monitor

I’m pretty much done with this project so it’s time to formally document it. This project is a collaboration between Fred, KJ4LFJ who supplied the hardware and myself, Scott, KJ4LDF who supplied the software. Briefly, a scanner is set to a single frequency (147.120 MHz, the output of an active repeater in Orlando, FL) and the audio output is fed into the microphone hole of a PC sound card. The scripts below (run in the order they appear) detect audio activity, log the data, and display such data graphically.

Here is some sample output:

Live-running software is current available at: Fred’s Site. The most current code can be found in its working directory. For archival purposes, I’ll provide the code for pySquelch in ZIP format. Now, onto other things…


Microcontroller-Powered Prime Number Generator

My microcontroller-powered prime number calculator is complete! Although I’m planning on improving the software (better menus, the addition of sound, and implementation of a more efficient algorithm) and hardware (a better enclosure would be nice, battery/DC wall power, and a few LEDs on the bottom row are incorrectly wired), this device is currently functional therefore I met my goal!

This device generates large prime numbers (v) while keeping track of how many prime numbers have been identified (N). For example, the 5th prime number is 11. Therefore, at one time this device displayed N=5 and V=11. N and V are displayed on the LCD. In the photo the numbers mean the 16,521,486th prime is 305,257,039 (see for yourself!). The LCD had some history. In December, 2003 (6 years ago) I worked with this SAME display, and I even located the blog entry (November 25, 2003) where I mentioned I was thinking of buying the LCD (it was $19 at the time). Funny stuff. Okay, fast forward to today. Primes (Ns and Vs) are displayed on the LCD.

In addition to the LCD, numbers are displayed in binary: Each row of LEDs represents a number. Each row of 30 LEDs allows me to represent numbers up to 2^31-1 (2,147,483,647, about 2.15 billion) in the binary numeral system. Since there’s no algorithm to simply generate prime numbers (especially the Nth prime), the only way to generate large Nth primes is to start small (2) and work up (to 2 billion) testing every number along the way for primeness. The number being tested is displayed on the middle row (N_test). The last two digits of N_test are shown on the top left. To test a number (N_test) for primeness, it is divided by every number from 2 to the square root of N_test. If any divisor divides evenly (with a remainder of zero) it’s assumed not to be prime, and N_test_ is incremented. If it can’t be evenly divided by any number, it’s assumed to be prime and loaded into the top row. In the photo (with the last prime found over 305 million) the device is generating new primes every ~10 seconds.

I’d like to emphasize that this device is not so much technologically innovative as it is creative in its oddness and uniqueness. I made it because no one’s ever made one before. It’s not realistic, practical, or particularly useful. It’s just unique. The brain behind it is an ATMEL ATMega8 AVR microcontroller (What is a microcontroller?), the big 28-pin microchip near the center of the board. (Note: I usually work with ATTiny2313 chips, but for this project I went with the ATMega8 in case I wanted to do analog-to-digital conversions. The fact that the ATMega8 is the heart of the Arduino is coincidental, as I’m not a fan of Arduino for purposes I won’t go into here).

I’d like to thank my grandmother’s brother and his wife (my great uncle and aunt I guess) for getting me interested in microcontrollers almost 10 years ago when they gave me BASIC Stamp kit (similar to this one) for Christmas. I didn’t fully understand it or grasp its significance at the time, but every few years I broke it out and started working with it, until a few months ago when my working knowledge of circuitry let me plunge way into it. I quickly outgrew it and ventured into directly programming cheaper microcontrollers which were nearly disposable (at $2 a pop, compared to $70 for a BASIC stamp), but that stamp kit was instrumental in my transition from computer programming to microchip programming.

The microcontroller is currently running at 1 MHz, but can be clocked to run faster. The PC I’m writing this entry on is about 2,100 MHz (2.1 GHz) to put it in perspective. This microchip is on par with computers of the 70s that filled up entire rooms. I program it with the C language (a language designed in the 70s with those room-sized computers in mind, perfectly suited for these microchips) and load software onto it through the labeled wires two pictures up. The microcontroller uses my software to bit-bang data through a slew of daisy-chained shift registers (74hc595s, most of the 16-pin microchips), allowing me to control over 100 pin states (on/off) using only 3 pins of the microcontroller. There are also 2 4511-type CMOS chips which convert data from 4 pins (a binary number) into the appropriate signals to illuminate a 7-segment display. Add in a couple switches, buttons, and a speaker, and you’re ready to go!

I’ll post more pictures, videos, and the code behind this device when it’s a little more polished. For now it’s technically complete and functional, and I’m very pleased. I worked on it a little bit every day after work. From its conception on May 27th to completion July 5th (under a month and a half) I learned a heck of a lot, challenged my fine motor skills to complete an impressive and confusing soldering job, and had a lot of fun in the process.


Summer's End is Nearing

My favorite summer yet is reaching its end. With about a month and a half before I begin dental school, I pause to reflect on what I’ve done, and what I still plan to do. Unlike previous summers where my time was devoted to academic requirements, this summer involved a 9-5 job with time to do whatever I wanted after. I made great progress in the realm of microcontroller programming, and am nearing the completion of my prime number calculator. I’m very happy with its progress.

Most of the LEDs are working but I’m still missing a few shift registers. It’s not that they’re missing, so much as I broke them. (D’oh!) I have to wait for a dozen more to come in the mail so I can continue this project. Shift registers are also responsible for powering the binary-to-7-segment chips on the upper left, whose sockets are currently empty.

Since this project is on pause, I began work hacking a VFD I heard about at Skycraft. It’s a 20x2 character display (forgot to photograph the front) and if I can make it light up, it will be gorgeous.

Here’s a high resolution photo of the back panel of the VFD. I believe it used to belong to an old cash register, and it has some digital interfacing circuitry between the driver chips (the big OKI ones) and the 9-pin input connector. I think my best bet for being able to control this guy as much as I want is to attack those driver chips, with help from the Oki C1162A datasheet. It looks fairly straightforward. As long as I don’t screw up my surface-mount soldering, and assuming that I come up with 65 volts to power the thing (!) I think it’s a doable project.


Prime Number Generator Schematics

Here’s a schematic of the prime number calculator I’m working on. Last night I finished wiring all 12 shift registers for the primary display, so now it’s time to start working on software. Notice that we have a lot of pins free still. This will be advantageous if I decide to go crazy adding extraneous functionality, such as fancy displays (LCD?, 7-segment LEDs?, VFD?, all 3?!) or creative input systems (how about a numerical keypad?).

After feeling the stink of paying almost $15 for 100’ of yellow, 24 gauge, solid-core wire from DigiKey I was relieved (and a little embarrassed) to find I could score 1,000’ of yellow, 24 gauge, threaded wire for $10 at Skycraft! Anyway, here’s the current schematic.