I love Python’s logging module. I use it all the time to log a wide variety of information: messages to help me debug as well as informative messages for the user. Though you can toggle which messages get printed, whenever the Python interpreter encounters a logging method call it still builds the string for the log message (the argument to the method), even if that message is then discarded (sadly, Python doesn’t have lazy evaluation like Haskell). If creating this string is expensive, then your application’s performance may suffer. Unfortunately, there is no Python preprocessor (like C’s cpp … though preprocess might be able to do it), so it is difficult to automatically remove a large number of logging statements prior to running an application in a production environment.
The best solution I’ve seen is to prefix logging statements with if __debug__: so that they are optimized away by python -O (see this post on StackOverflow). I like it, but it unfortunately requires this statement to be prefixed to every logging statement I don’t want in a production environment. That’s a lot of ugly extra code and it isn’t easy to change which statements it applies to either.
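To make the cost concrete, here is a small sketch of the idiom, written for Python 3. It is only an illustration: the logger name and the `expensive_summary` helper are made up for this example. It also shows a runtime alternative, `Logger.isEnabledFor`, which skips the work without needing `python -O`.

```python
import logging

log = logging.getLogger("example")  # hypothetical logger name
log.setLevel(logging.INFO)          # DEBUG messages are filtered out

calls = []  # records each time the expensive helper actually runs


def expensive_summary(items):
    """Stand-in for a costly string-building step (hypothetical helper)."""
    calls.append(len(items))
    return ", ".join(str(i) for i in items)


def unguarded(items):
    # The argument is built before debug() is even called, so the cost is
    # paid even though the message is then thrown away.
    log.debug("items: " + expensive_summary(items))


def guarded(items):
    # Under "python -O", __debug__ is False and this block is dropped at
    # compile time; without -O it behaves exactly like unguarded().
    if __debug__:
        log.debug("items: " + expensive_summary(items))


def level_checked(items):
    # Runtime alternative: only do the work when DEBUG is enabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("items: %s", expensive_summary(items))
```

The `__debug__` guard costs nothing at run time once optimized away, while the `isEnabledFor` check still costs one method call per statement but can be toggled without restarting the interpreter with different flags.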
I decided to write a script which automatically parses a Python file and replaces logging statements of a particular level with a pass statement and a commented out copy of the logging code. It can also do the reverse operation. It has some limitations (see the code, or run the script with the --help option), but it should work for most Python files. I used it for the VNS project and it successfully operated on every file in the project. It also improved performance dramatically – the maximum throughput of the VNS simulator increased by 25%! In comparison, running the code with Psyco only garnered a 6% improvement (though pretty substantial for the minimal 13 lines I had to add to take advantage of it).
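The core transformation can be sketched in a few lines of Python 3. This is not the script itself, just a rough illustration of the idea under simplifying assumptions: it only handles logging calls that fit on one line, it assumes the logger is named `logging`, `logger`, or `log`, and the `#logging#` marker is a made-up tag for this example.

```python
import re

MARKER = "#logging#"  # hypothetical tag used to find disabled lines again


def _call_pattern(level):
    # Matches one-line calls such as "    log.debug(...)". The receiver
    # names (logging, logger, log) are an assumption of this sketch.
    return re.compile(
        r"^(\s*)((?:logging|logger|log)\." + re.escape(level) + r"\(.*\))\s*$"
    )


_DISABLED = re.compile(r"^(\s*)pass " + re.escape(MARKER) + r" (.*)$")


def disable(lines, level="debug"):
    """Swap matching logging calls for 'pass' plus a commented-out copy."""
    pattern = _call_pattern(level)
    out = []
    for line in lines:
        m = pattern.match(line)
        if m:
            indent, call = m.groups()
            out.append("%spass %s %s" % (indent, MARKER, call))
        else:
            out.append(line)
    return out


def enable(lines):
    """Reverse disable(): restore every line carrying the marker."""
    out = []
    for line in lines:
        m = _DISABLED.match(line)
        out.append(m.group(1) + m.group(2) if m else line)
    return out
```

Keeping the `pass` on the same line as the commented-out call means line numbers in tracebacks do not shift, and a block whose only statement was a logging call stays syntactically valid.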
I think this script is worth using before running your code in a production environment if you are a heavy user of the logging module like I am. You can find the code here (it is hosted on Siafoo, a neat site for sharing code).
David Underhill Coding, Python comment, logging, performance, preprocess, Python, script
Twisted is an awesome event-driven networking engine. Unfortunately, it does not have good support for interfacing with raw sockets (unlike its support for many network protocols, which is amazing). Anyway, I recently needed to work with raw sockets so I had to find a way to make it work with Twisted. Though Twisted does have a module (twisted.pair) which tries to provide some support for raw sockets, the module is poorly documented and requires a library which is not readily available.
Luckily, I stumbled on a module which works on top of the libpcap packet capture library called pcapy. It is simple to use, and thread-safe — and easy to integrate into a Twisted-based project.
I put together a short sample (see below) which shows how to capture raw packets alongside the main Twisted event loop. It would be trivial to extend this example to also write to a raw socket (using an ordinary Python socket). This example can also be downloaded here.
# This sample shows how to run a libpcap-based packet sniffer concurrently with
# the Twisted framework. The Twisted component is an "Echo" TCP server
# (listening on port 9999) which prints everything it receives. When a client
# connects, it starts the pcap thread. When the pcap thread receives a packet,
# it sends a message to the client telling it the size of the received packet.
# Finally, when the client disconnects the program is terminated.
#
# To try this contrived example out, run this script as root (so that it can use
# pcap) and then connect to the echo server (e.g., telnet localhost 9999). Note
# that the pcap parameters are hard-coded. This code uses twisted 8.0.2 and
# pcapy-0.10.4.

import os

from pcapy import open_live
from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor

# pcap settings
DEV = 'eth0'        # interface to listen on
MAX_LEN = 1514      # max size of packet to capture
PROMISCUOUS = 1     # promiscuous mode?
READ_TIMEOUT = 100  # in milliseconds
PCAP_FILTER = ''    # empty => get everything (or we could use a BPF filter)
MAX_PKTS = -1       # number of packets to capture; -1 => no limit

def run_pcap(f):
    # the method which will be called when a packet is captured
    def ph(hdr, data):
        print 'pcap heard: when=%s sz=%dB' % (hdr.getts(), len(data))
        # thread safety: call from the main twisted event loop
        reactor.callFromThread(f, len(data))

    # start the packet capture
    p = open_live(DEV, MAX_LEN, PROMISCUOUS, READ_TIMEOUT)
    p.setfilter(PCAP_FILTER)
    print "Listening on %s: net=%s, mask=%s" % (DEV, p.getnet(), p.getmask())
    p.loop(MAX_PKTS, ph)

# a silly echo server which prints what it receives and sends info about the
# size of each packet captured on DEV
class Echo(Protocol):
    def connectionLost(self, reason):
        os._exit(0)  # kill the whole process

    def connectionMade(self):
        # run pcap in another thread (it will run forever)
        reactor.callInThread(run_pcap, self.pcapDataReceived)

    def dataReceived(self, data):
        print 'echo got: %s' % data

    def pcapDataReceived(self, sz):
        self.transport.write('pcap got: %uB\n' % sz)

# starts the silly echo server on port 9999
def main():
    factory = Factory()
    factory.protocol = Echo
    reactor.listenTCP(9999, factory)
    reactor.run()

if __name__ == "__main__":
    main()
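For the writing side mentioned above, a rough sketch with an ordinary Python socket might look like the following. Note the assumptions: it is written for Python 3 (unlike the Python 2 sample), it relies on Linux-only `AF_PACKET` sockets (which need root, just like pcap), and the helper names `build_ethernet_frame` and `send_raw` are invented for this illustration. It does not pad short frames to the Ethernet minimum.

```python
import socket
import struct


def build_ethernet_frame(dst_mac, src_mac, ethertype, payload):
    """Pack a minimal Ethernet II frame (no padding, no checksum)."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    return header + payload


def send_raw(dev, frame):
    """Write one frame to interface `dev` (Linux only, needs root)."""
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    try:
        s.bind((dev, 0))
        return s.send(frame)
    finally:
        s.close()
```

Since `send` on a raw socket is a quick blocking call, one option is to invoke `send_raw` from the reactor thread directly; if that ever proves too slow, `reactor.callInThread` (already used in the sample) is another possibility.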
David Underhill Coding, Python pcap, pcapy, Python, raw socket, sniffer, twisted
I recently wanted to put a footnote reference inside a table. Unfortunately, LaTeX makes it somewhat difficult to add footnotes inside tables (e.g., inside a tabular environment). If you try to put a footnote inside a tabular, then pdflatex will show the reference but not the footnote itself! I came across several suggestions for fixing this:
One idea is to put the table in a minipage. This causes the footnote to show up at the bottom of the table (in its own numbering system) — but I wanted the footnote to show up at the bottom of the page like other footnotes!
Another idea was to manually specify the footnote number inside the text and then use the \footnotetext command (outside the tabular) to manually add the footnote text. Unfortunately, this is not a robust solution since it forces you to manually maintain this footnote number inside the tabular.
Building on the previous idea, I discovered a way to make footnotes appear inside tabulars without breaking the automatic numbering of footnotes. Here is my approach:
- Include the “fmtcount” package so that you can display the values of counters (e.g., the footnote counter):
\usepackage{fmtcount}
- Immediately before your tabular, increment the footnote counter:
\addtocounter{footnote}{1}
- Next, specify the contents of the footnote:
\footnotetext[\value{footnote}]{your text here}
- Finally, add a reference to the footnote inside the table:
$^{\decimal{footnote}}$
You can extend this idea to add multiple footnotes within a single tabular by adjusting the counters (using \addtocounter) appropriately. Here is a complete example of how to add two footnotes inside a single tabular (you can see the PDF output here):
\documentclass[12pt]{article}
\usepackage{fmtcount} % displaying latex counters
\begin{document}
\title{An Example of Footnotes Inside a Tabular}
\author{David Gridley Underhill}
\maketitle
% manually add a footnote which exists inside the table
\addtocounter{footnote}{1}
\footnotetext[\value{footnote}]{my first footnote}
% add another footnote
\addtocounter{footnote}{1}
\footnotetext[\value{footnote}]{my second footnote}
% reset the counter to the first footnote's value
\addtocounter{footnote}{-1}
\begin{tabular}{|l|l|}
\hline
% this next row references the first footnote I added above, and then
% advances the counter to the next footnote.
{\bf First Column} & {\bf Second Column}$^{\decimal{footnote}}$\addtocounter{footnote}{1} \\
\hline
% now reference the second footnote from above -- don't increment the footnote
% counter beyond the last footnote!
X & Y$^{\decimal{footnote}}$ \\
\hline
\end{tabular}
\end{document}
David Underhill Coding, LaTeX caption, figure, float, footnote, latex, table, tabular
I’ve been looking for a way to have github, my favorite repository hosting service, send emails to interested parties whenever someone pushes new commits into the repository. They don’t seem to provide this service directly, but they do provide both an API for programmatically querying the site and post-receive callbacks. The latter sends a POST request to URL(s) of your choice. The POST request includes JSON data which contains information about the repository and the new commits (for details, see here). Using this, I was able to put together a relatively simple PHP script which repackages this information into a human-readable form and sends it off in an email (source code here).
I wanted to use the github API to get a list of those who watch the project, and then send an email to those people. Unfortunately the API does not let you query that yet. Instead, my script lets you specify the recipients manually. Alternatively, a simple mailing list implementation is provided so people can sign up for the post-receive emails through a webpage instead.
The only downside is that the JSON github provides does not include information about how many lines were modified, just files. In this respect, Andy Parkins’ post-receive script produces slightly more informative e-mails (but of course it doesn’t work with github).
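To give a feel for the repackaging step, here is a sketch in Python 3 rather than PHP. It is not the script linked above. The field names it reads (`ref`, `repository.name`, and `id`, `message`, `author`, `added`, `modified`, `removed` on each commit) are assumptions about the payload layout and should be checked against the payload documentation; `build_email` is a made-up helper name.

```python
import json
from email.message import EmailMessage


def build_email(payload_json, sender, recipients):
    """Turn a post-receive JSON payload into a plain-text email message."""
    payload = json.loads(payload_json)
    repo = payload["repository"]["name"]      # assumed field name
    commits = payload.get("commits", [])      # assumed field name

    lines = []
    for c in commits:
        author = c["author"]
        lines.append("%s  %s <%s>" % (c["id"][:7], author["name"], author["email"]))
        lines.append("    " + c["message"])
        # Only file lists are available, not per-line change counts.
        for kind in ("added", "modified", "removed"):
            for path in c.get(kind, []):
                lines.append("    %s: %s" % (kind, path))

    msg = EmailMessage()
    msg["Subject"] = "[%s] %d new commit(s) pushed to %s" % (
        repo, len(commits), payload.get("ref", "?"))
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(lines))
    return msg
```

Actually sending the message is left out; with the standard library that would be a call to `smtplib.SMTP(...).send_message(msg)` against whatever mail server you have available.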
David Underhill Coding, PHP, version control git, github, JSON, post-receive
I’ve revamped the front page to list current projects instead of past ones. I’ve also updated the Projects page to include some more recent projects including jToolbar, a minimum spanning tree algorithm library, and the ENVI network visualization and control framework. I also updated ltprotocol to provide additional and more symmetric (between the client and server) callbacks.
David Underhill Coding, Software project, update
I recently needed to rewrite a git repository’s history. This isn’t generally a very good idea, though it is useful if your repository contains files it should not (such as unneeded large binary files or copyrighted material). I also used it because I had a branch from which I only wanted to merge a subset of files back into master (though there are probably better ways of doing this). Anyway, it is not very hard to rewrite history thanks to the excellent git-filter-branch tool which comes with git. However, if your goal is to reduce a large repository’s size, then git-filter-branch does not quite finish the job, since it keeps backups of the filtered-out files (the original refs and the reflog still point at them). To remove those, you need to do a little more work. To make it easier to permanently remove files, I wrapped it all in a little bash script, git-remove-history (also shown below): simply go to the root of your repository and run the script with the list of files you want to delete and it will do the rest. There is an interesting thread about doing this here on KernelTrap.
#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-remove-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune
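One thing to watch for: because the script splices `$files` straight into the filter string, paths containing spaces or shell metacharacters will be split or misread. If that matters for your repository, the command can be built with explicit quoting instead. Here is a Python 3 sketch of just that step; `filter_branch_command` is a made-up helper name, and the sketch only builds the argument list, it does not run git.

```python
import shlex


def filter_branch_command(paths):
    """Return the argv list for removing `paths` from the history.

    Each path is shell-quoted because git hands the --index-filter
    string to a shell for every commit it rewrites.
    """
    if not paths:
        raise ValueError("no paths given")
    quoted = " ".join(shlex.quote(p) for p in paths)
    index_filter = "git rm -rf --cached --ignore-unmatch " + quoted
    return ["git", "filter-branch", "--index-filter", index_filter, "HEAD"]
```

From the root of the repository, the result could then be passed to `subprocess.run(cmd, check=True)`, followed by the same cleanup steps as in the bash script.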
David Underhill Coding, bash, version control bash, binary file, git, history, remove, repository, version control
It seems like I often have a need to work with a simple TCP protocol whose messages have a header which starts with the length of the message and an integer representing the message type (OpenFlow is one of many such protocols). To save myself the trouble of creating and debugging a very similar custom implementation each time I have this need, I decided to package it as a simple Python framework which does this for me. It is based on the event-driven Twisted networking engine. Using this simple extension on top of Twisted has a number of benefits:
- Automatic handling of the length and type fields when sending and receiving messages.
- Automatic unpacking of messages based on type.
- Client automatically tries to reconnect if the connection is lost.
- Server can handle any number of clients simultaneously.
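To illustrate the kind of bookkeeping this takes off your hands, here is a standalone Python 3 sketch of length-and-type framing. It does not use the package or Twisted and is not the package's API. The header layout (a 16-bit length that counts the header itself, followed by an 8-bit type) and the names `pack_message` and `Framer` are assumptions made up for this example.

```python
import struct

HEADER = struct.Struct("!HB")  # assumed layout: uint16 length, uint8 type


def pack_message(msg_type, body):
    """Prefix `body` with a length (header included) and a type field."""
    return HEADER.pack(HEADER.size + len(body), msg_type) + body


class Framer:
    """Accumulates stream data and returns complete (type, body) pairs."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        self._buf += data
        messages = []
        while len(self._buf) >= HEADER.size:
            length, msg_type = HEADER.unpack_from(self._buf)
            if length < HEADER.size:
                raise ValueError("invalid length field: %d" % length)
            if len(self._buf) < length:
                break  # wait for the rest of this message
            messages.append((msg_type, self._buf[HEADER.size:length]))
            self._buf = self._buf[length:]
        return messages
```

Because TCP is a byte stream, a single read may contain half a message or several messages at once; the loop in `feed` is what copes with both cases, and it is exactly the part that is tedious to rewrite and debug for every new protocol.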
You can view the official package on the PyPI website here. My local page for the package is here; please see it for an example of how to use this package.
David Underhill Coding, Python, WordPress client, protocol, Python, server, tcp, twisted
Today I decided that I did not really like how WordPress handled user logins. Whenever you want to log in, it whisks you away from what you were reading and onto a very empty login page. Once you have logged in, it tends to whisk you off somewhere new. Worse, when you log out it again takes you away from the page you were on to show you a blank login page. Thus I headed back to the WordPress plugins directory in search of something better.
What I found was a nifty plugin named AJAX Login which (surprise) used AJAX to handle almost all login processing within the page the user was on. Unfortunately, it had not been updated in over a year and was no longer compatible with the latest version of WordPress. Thus I started hacking on it and ended up making a number of improvements to its UI and how it handled AJAX calls. Anyway, I decided to package it up as a new plugin — you can get the plugin and read all the details about what it does here.
Its official location in the WordPress plugins directory is at http://wordpress.org/extend/plugins/ajax-login-widget/!
David Underhill Coding, WordPress AJAX Login Widget++, login form, plugin, WordPress