<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Underhill &#187; Python</title>
	<atom:link href="http://dound.com/category/coding/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://dound.com</link>
	<description>dound&#039;s space on the web</description>
	<lastBuildDate>Tue, 31 Jan 2012 10:57:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>PostgreSQL UPSERT (in Python)</title>
		<link>http://dound.com/2011/01/postgresql-upsert-in-python/</link>
		<comments>http://dound.com/2011/01/postgresql-upsert-in-python/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 00:53:25 +0000</pubDate>
		<dc:creator>David Underhill</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://dound.com/?p=492</guid>
		<description><![CDATA[PostgreSQL does not yet support the UPSERT command (though it is on their Todo list). If you have a row you want to update (if it already exists in the database) or insert (if it doesn&#8217;t exist yet), then PostgreSQL unfortunately makes you implement the logic yourself. Other popular databases like SQLite (INSERT OR IGNORE) [...]]]></description>
			<content:encoded><![CDATA[<p>PostgreSQL does not yet support the <code>UPSERT</code> command (though it is on their <a href="http://wiki.postgresql.org/wiki/Todo">Todo list</a>).  If you have a row you want to update (if it already exists in the database) or insert (if it doesn&#8217;t exist yet), then PostgreSQL unfortunately makes you implement the logic yourself.  Other popular databases like SQLite (<code>INSERT OR IGNORE</code>) and MySQL (<code>ON DUPLICATE KEY UPDATE</code>) both support upserting.  I haven&#8217;t run across a <em>generic</em> PL/pgSQL function which can do this, but you could write a trigger (like <a href="http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html">this one</a>) for each table where this functionality is needed.</p>
<p>Unfortunately, this is a bit of a pain if you want to use <code>UPSERT</code> on many tables, so I wrote a <a href="https://gist.github.com/772171">Python method</a> which takes care of the <code>UPSERT</code> logic generically.  To use it, you call it with a cursor connected to your database, the schema and table name, a list of primary key field names, and the key-value pairs for each field.</p>
<p>For example, let&#8217;s say you have a table which tracks scores (and only the last score counts):</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> MySchema<span style="color: #66cc66;">.</span>Scores <span style="color: #66cc66;">&#40;</span>
    user_id integer <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
    score   integer <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>To <code>UPSERT</code> a row into this table you would:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> psycopg2
&nbsp;
db_conn = psycopg2.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;...&quot;</span><span style="color: black;">&#41;</span>
db_cur = db_conn.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
upsert<span style="color: black;">&#40;</span>db_cur, <span style="color: #483d8b;">'Scores'</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'user_id'</span>,<span style="color: black;">&#41;</span>, schema=<span style="color: #483d8b;">'MySchema'</span>, user_id=..., score=...<span style="color: black;">&#41;</span>
db_conn.<span style="color: black;">commit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Here&#8217;s the code for the Python-based <code>upsert</code> method:</p>
<p><script src="https://gist.github.com/772171.js"> </script></p>
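<p>If the embedded gist does not load, the update-then-insert idea can be sketched roughly as follows.  This is <em>not</em> the code from the gist: the helper name <code>build_upsert_sql</code> and the details of the SQL it builds are my own assumptions, chosen only to match the call shown above.  It expects a DB-API cursor (such as one from psycopg2) that uses <code>%s</code> placeholders, and at least one non-key field.</p>

```python
def build_upsert_sql(table, key_fields, schema=None, **fields):
    """Build the UPDATE and INSERT statements for one row (hypothetical helper).

    Table, schema and column names are interpolated directly into the SQL,
    so they must come from trusted code; only the values are parameterized.
    """
    target = '%s.%s' % (schema, table) if schema else table
    keys = list(key_fields)
    others = [name for name in sorted(fields) if name not in keys]

    where = ' AND '.join('%s = %%s' % name for name in keys)
    assignments = ', '.join('%s = %%s' % name for name in others)
    update_sql = 'UPDATE %s SET %s WHERE %s' % (target, assignments, where)
    update_params = [fields[n] for n in others] + [fields[k] for k in keys]

    columns = keys + others
    insert_sql = 'INSERT INTO %s (%s) VALUES (%s)' % (
        target, ', '.join(columns), ', '.join(['%s'] * len(columns)))
    insert_params = [fields[c] for c in columns]
    return update_sql, update_params, insert_sql, insert_params


def upsert(cur, table, key_fields, schema=None, **fields):
    """Try to UPDATE the row; INSERT it if no row was updated."""
    update_sql, update_params, insert_sql, insert_params = build_upsert_sql(
        table, key_fields, schema=schema, **fields)
    cur.execute(update_sql, update_params)
    if cur.rowcount == 0:
        # No existing row matched the key, so create it.
        cur.execute(insert_sql, insert_params)
```

<p>Note that this sketch is not atomic: two concurrent sessions could both see zero updated rows and both try to insert, so a real implementation would need locking or a retry when the insert fails on a duplicate key.</p>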
]]></content:encoded>
			<wfw:commentRss>http://dound.com/2011/01/postgresql-upsert-in-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Python logging and performance: how to have your cake and eat it too</title>
		<link>http://dound.com/2010/02/python-logging-performance/</link>
		<comments>http://dound.com/2010/02/python-logging-performance/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 05:08:48 +0000</pubDate>
		<dc:creator>David Underhill</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[comment]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[preprocess]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://dound.com/?p=268</guid>
		<description><![CDATA[I love Python&#8217;s logging module. I use it all the time to log a wide variety of information: messages to help me debug as well as informative messages for the user. Though you can toggle which messages you want to be printed, if the Python interpreter encounters a logging method call it still creates [...]]]></description>
			<content:encoded><![CDATA[<p>I love <a href="http://www.python.org">Python</a>&#8217;s <a href="http://docs.python.org/library/logging.html">logging module</a>.  I use it all the time to log a wide variety of information: messages to help me debug as well as informative messages for the user.  Though you can toggle which messages you want to be printed, if the Python interpreter encounters a logging method call it still creates the string for the log message (the argument to the method), even when that message will never be printed.  Sadly, Python doesn&#8217;t have <a href="http://www.haskell.org/haskellwiki/Performance/Laziness">lazy evaluation</a> like <a href="http://www.haskell.org/">Haskell</a>.  If creating this string is expensive, then your application&#8217;s performance may suffer.  Unfortunately, there is no Python preprocessor (like C&#8217;s <a href="http://gcc.gnu.org/onlinedocs/cpp/">cpp</a> &#8230; though <a href="http://code.google.com/p/preprocess/">preprocess</a> might be able to do it) so it is difficult to automatically remove a large number of logging statements prior to running an application in a production environment.</p>
<p>The best solution I&#8217;ve seen is to prefix logging statements with <code>if __debug__:</code> so that they are optimized away by <code>python -O</code> (see <a href="http://stackoverflow.com/questions/2006190/python-equivalent-of-define-func-or-how-to-comment-out-a-function-call-in-p">this post on StackOverflow</a>).  I like it, but it unfortunately requires this statement to be prefixed to every logging statement I don&#8217;t want in a production environment.  That&#8217;s a lot of ugly extra code and it isn&#8217;t easy to change which statements it applies to either.</p>
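<p>To make the cost concrete, here is a small sketch (the function names and the <code>calls</code> counter are made up for illustration) comparing an unguarded call with two guarded ones.  The <code>isEnabledFor</code> check is a standard <code>logging</code> method and is an alternative to the <code>__debug__</code> guard, not something the StackOverflow post necessarily recommends.</p>

```python
import logging

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)   # debug messages are filtered out

calls = []  # records each time the expensive message is built


def expensive_summary():
    calls.append(1)
    return "summary of a large data structure"


def eager():
    # The message string is built before debug() can decide to drop it.
    logger.debug("state: %s" % expensive_summary())


def guarded_by_level():
    # Skips the expensive work unless DEBUG is actually enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("state: %s", expensive_summary())


def guarded_by_debug_flag():
    # The whole block is removed from the bytecode under "python -O".
    if __debug__:
        logger.debug("state: %s", expensive_summary())
```

<p>Calling <code>eager()</code> adds an entry to <code>calls</code> even though nothing is logged, while <code>guarded_by_level()</code> does not.  <code>guarded_by_debug_flag()</code> still builds the message in a normal run and only skips it under <code>python -O</code>.</p>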
<p>I decided to write a script which automatically parses a Python file and replaces logging statements of a particular level with a <code>pass</code> statement and a commented out copy of the logging code.  It can also do the reverse operation.  It has some limitations (see the code, or run the script with the <code>--help</code> option), but it should work for most Python files.  I used it for the <a href="http://yuba.stanford.edu/vns">VNS</a> project and it successfully operated on every file in the project.  It also improved performance dramatically &#8211; the maximum throughput of the VNS simulator <a href="http://yuba.stanford.edu/vns/2010/02/fairness/">increased by 25%</a>!  In comparison, running the code with <a href="http://psyco.sourceforge.net/">Psyco</a> only garnered a 6% improvement (though pretty substantial for the minimal <a href="http://github.com/dound/vns/commit/54d5415a43043062ae6195b828707692b9231aab">13 lines</a> I had to add to take advantage of it).</p>
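<p>The core transformation can be illustrated with a much-simplified sketch.  This is not the actual script: the <code>LOGGING-OFF</code> marker, the function names, and the restriction to single-line <code>logging.debug(...)</code> calls are all assumptions made to keep the example short.</p>

```python
import re

# Matches a single-line "logging.debug(...)" statement and captures its indent.
LOG_CALL = re.compile(r"^(\s*)(logging\.debug\(.*\))\s*$")
MARKER = "pass  # LOGGING-OFF: "


def strip_debug_logging(source):
    """Replace each debug call with 'pass' plus a commented-out copy."""
    out = []
    for line in source.splitlines():
        match = LOG_CALL.match(line)
        if match:
            out.append(match.group(1) + MARKER + match.group(2))
        else:
            out.append(line)
    return "\n".join(out)


def restore_debug_logging(source):
    """Reverse strip_debug_logging by uncommenting the saved calls."""
    out = []
    for line in source.splitlines():
        body = line.lstrip()
        indent = line[:len(line) - len(body)]
        if body.startswith(MARKER):
            out.append(indent + body[len(MARKER):])
        else:
            out.append(line)
    return "\n".join(out)
```

<p>Keeping a <code>pass</code> statement in place of the call means a block whose only statement was a logging call remains valid Python.</p>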
<p>I think this script is worth using before running your code in a production environment if you are a heavy user of the logging module like I am.  You can find the code <a href="http://www.siafoo.net/snippet/348">here</a> (it is hosted on <a href="http://www.siafoo.com">Siafoo</a>, a neat site for sharing code).  Here&#8217;s the latest version of the code:<br />
<script type='text/javascript' src='http://www.siafoo.net/snippet/348/embed.js?nolinenos'></script></p>
]]></content:encoded>
			<wfw:commentRss>http://dound.com/2010/02/python-logging-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Integrating Twisted with a pcap-based Python packet sniffer</title>
		<link>http://dound.com/2009/09/integrating-twisted-with-a-pcap-based-python-packet-sniffer/</link>
		<comments>http://dound.com/2009/09/integrating-twisted-with-a-pcap-based-python-packet-sniffer/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 00:53:34 +0000</pubDate>
		<dc:creator>David Underhill</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[pcap]]></category>
		<category><![CDATA[pcapy]]></category>
		<category><![CDATA[raw socket]]></category>
		<category><![CDATA[sniffer]]></category>
		<category><![CDATA[twisted]]></category>

		<guid isPermaLink="false">http://dound.com/?p=223</guid>
		<description><![CDATA[Twisted is an awesome event-driven networking engine. Unfortunately, it does not have good support for interfacing with raw sockets (unlike its support for many network protocols, which is amazing). Anyway, I recently needed to work with raw sockets so I had to find a way to make it work with Twisted. Though Twisted does have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twistedmatrix.com/">Twisted</a> is an awesome event-driven networking engine.  Unfortunately, it does not have good support for interfacing with <a href="http://en.wikipedia.org/wiki/Raw_socket">raw sockets</a> (unlike its <a href="http://twistedmatrix.com/documents/8.1.0/api/twisted.protocols.html">support</a> for many network protocols, which is amazing).  Anyway, I recently needed to work with raw sockets so I had to find a way to make it work with Twisted.  Though Twisted does have a module (<a href="http://twistedmatrix.com/trac/wiki/TwistedPair">twisted.pair</a>) which tries to provide some support for raw sockets, the module is poorly documented and requires a library which is not readily available.</p>
<p>Luckily, I stumbled on a module which works on top of the <a href="http://www.tcpdump.org/">libpcap</a> packet capture library called <a href="http://oss.coresecurity.com/projects/pcapy.html">pcapy</a>.  It is simple to use, and thread-safe &#8212; and easy to integrate into a Twisted-based project.</p>
<p>I put together a short sample (see below) which shows how to capture raw packets alongside the main Twisted event loop.  It would be trivial to extend this example to also write to a raw socket (using an ordinary <a href="http://docs.python.org/library/socket.html">Python socket</a>).  This example can also be downloaded <a href="http://dound.com/wp/files/twisted_and_pcap_together.py">here</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># This sample shows how to run a libpcap-based packet sniffer concurrently with</span>
<span style="color: #808080; font-style: italic;"># the Twisted framework.  The Twisted component is an &quot;Echo&quot; TCP server</span>
<span style="color: #808080; font-style: italic;"># (listening on port 9999) which prints everything it receives.  When a client</span>
<span style="color: #808080; font-style: italic;"># connects, it starts the pcap thread.  When the pcap thread receives a packet,</span>
<span style="color: #808080; font-style: italic;"># it sends a message to the client telling it the size of the received packet.</span>
<span style="color: #808080; font-style: italic;"># Finally, when the client disconnects the program is terminated.</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># To try this contrived example out, run this script as root (so that it can use</span>
<span style="color: #808080; font-style: italic;"># pcap) and then connect to the echo server (e.g., telnet localhost 9999).  Note</span>
<span style="color: #808080; font-style: italic;"># that the pcap parameters are hard-coded.  This code uses twisted 8.0.2 and</span>
<span style="color: #808080; font-style: italic;"># pcapy-0.10.4.</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> pcapy <span style="color: #ff7700;font-weight:bold;">import</span> open_live
<span style="color: #ff7700;font-weight:bold;">from</span> twisted.<span style="color: black;">internet</span>.<span style="color: black;">protocol</span> <span style="color: #ff7700;font-weight:bold;">import</span> Protocol, Factory
<span style="color: #ff7700;font-weight:bold;">from</span> twisted.<span style="color: black;">internet</span> <span style="color: #ff7700;font-weight:bold;">import</span> reactor
&nbsp;
<span style="color: #808080; font-style: italic;"># pcap settings</span>
DEV          = <span style="color: #483d8b;">'eth0'</span>  <span style="color: #808080; font-style: italic;"># interface to listen on</span>
MAX_LEN      = <span style="color: #ff4500;">1514</span>    <span style="color: #808080; font-style: italic;"># max size of packet to capture</span>
PROMISCUOUS  = <span style="color: #ff4500;">1</span>       <span style="color: #808080; font-style: italic;"># promiscuous mode?</span>
READ_TIMEOUT = <span style="color: #ff4500;">100</span>     <span style="color: #808080; font-style: italic;"># in milliseconds</span>
PCAP_FILTER  = <span style="color: #483d8b;">''</span>      <span style="color: #808080; font-style: italic;"># empty =&gt; get everything (or we could use a BPF filter)</span>
MAX_PKTS     = -<span style="color: #ff4500;">1</span>      <span style="color: #808080; font-style: italic;"># number of packets to capture; -1 =&gt; no limit</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> run_pcap<span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># the method which will be called when a packet is captured</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> ph<span style="color: black;">&#40;</span>hdr, data<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'pcap heard: when=%s sz=%dB'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>hdr.<span style="color: black;">getts</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># thread safety: call from the main twisted event loop</span>
        reactor.<span style="color: black;">callFromThread</span><span style="color: black;">&#40;</span>f, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #808080; font-style: italic;"># start the packet capture</span>
    p = open_live<span style="color: black;">&#40;</span>DEV, MAX_LEN, PROMISCUOUS, READ_TIMEOUT<span style="color: black;">&#41;</span>
    p.<span style="color: black;">setfilter</span><span style="color: black;">&#40;</span>PCAP_FILTER<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Listening on %s: net=%s, mask=%s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>DEV, p.<span style="color: black;">getnet</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, p.<span style="color: black;">getmask</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    p.<span style="color: black;">loop</span><span style="color: black;">&#40;</span>MAX_PKTS, ph<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># a silly echo server which prints what it receives and sends info about the</span>
<span style="color: #808080; font-style: italic;"># size of each packet captured on DEV</span>
<span style="color: #ff7700;font-weight:bold;">class</span> Echo<span style="color: black;">&#40;</span>Protocol<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> connectionLost<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, reason<span style="color: black;">&#41;</span>:
        <span style="color: #dc143c;">os</span>._exit<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># kill the whole process</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> connectionMade<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #808080; font-style: italic;"># run pcap in another thread (it will run forever)</span>
        reactor.<span style="color: black;">callInThread</span><span style="color: black;">&#40;</span>run_pcap, <span style="color: #008000;">self</span>.<span style="color: black;">pcapDataReceived</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> dataReceived<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, data<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'echo got: %s'</span> <span style="color: #66cc66;">%</span> data
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> pcapDataReceived<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, sz<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">transport</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'pcap got: %uB<span style="color: #000099; font-weight: bold;">\n</span>'</span> <span style="color: #66cc66;">%</span> sz<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># starts the silly echo server on port 9999</span>
<span style="color: #ff7700;font-weight:bold;">def</span> main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    factory = Factory<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    factory.<span style="color: black;">protocol</span> = Echo
    reactor.<span style="color: black;">listenTCP</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">9999</span>, factory<span style="color: black;">&#41;</span>
    reactor.<span style="color: black;">run</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
    main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://dound.com/2009/09/integrating-twisted-with-a-pcap-based-python-packet-sniffer/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Python + Twisted Length-Type-Based Protocol Client / Server</title>
		<link>http://dound.com/2009/03/python-and-twisted-length-type-based-protocol-client-server/</link>
		<comments>http://dound.com/2009/03/python-and-twisted-length-type-based-protocol-client-server/#comments</comments>
		<pubDate>Sun, 08 Mar 2009 09:25:12 +0000</pubDate>
		<dc:creator>David Underhill</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[client]]></category>
		<category><![CDATA[protocol]]></category>
		<category><![CDATA[server]]></category>
		<category><![CDATA[tcp]]></category>
		<category><![CDATA[twisted]]></category>

		<guid isPermaLink="false">http://dound.com/?p=108</guid>
		<description><![CDATA[I often have a need to work with a simple TCP protocol whose messages have a header which starts with the length of the message and an integer representing the message type.  To save myself the trouble of creating and debugging a very similar custom implementation each time I have this need, I decided to package it as a simple <a href="http://www.python.org">Python</a> framework which does this for me.  It is based on the event-driven <a href="http://www.twistedmatrix.com">Twisted</a> networking engine.]]></description>
			<content:encoded><![CDATA[<p>I often need to work with a simple TCP protocol whose messages have a header which starts with the length of the message and an integer representing the message type (<a href="http://openflowswitch.org/">OpenFlow</a> is one of many such protocols).  To save myself the trouble of creating and debugging a very similar custom implementation each time I have this need, I decided to package it as a simple <a href="http://www.python.org">Python</a> framework which does this for me.  It is based on the event-driven <a href="http://www.twistedmatrix.com">Twisted</a> networking engine.  Using this simple extension on top of Twisted has a number of benefits:</p>
<ol>
<li>Automatic handling of the length and type fields when sending and receiving messages.</li>
<li>Automatic unpacking of messages based on type.</li>
<li>Client automatically tries to reconnect if the connection is lost.</li>
<li>Server can handle any number of clients simultaneously.</li>
</ol>
<p>You can view the official package on the <a href="http://pypi.python.org">PyPI</a> website <a href="http://pypi.python.org/pypi/ltprotocol">here</a>.  My local page for the package is <a href="http://dound.com/projects/python/ltprotocol/">here</a>; please <a href="http://dound.com/projects/python/ltprotocol/">view it</a> for an example of how to use this package.</p>
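<p>For readers unfamiliar with this kind of protocol, the framing itself can be sketched with the standard <code>struct</code> module.  This does not use the package&#8217;s API: the header layout (a 2-byte length that includes the header, then a 1-byte type) and the names <code>pack_message</code> and <code>Framer</code> are assumptions for illustration only.</p>

```python
import struct

# Assumed header: 2-byte total length (header included), then 1-byte type,
# both in network byte order.
HEADER = struct.Struct("!HB")


def pack_message(msg_type, payload):
    """Prefix a payload with its length and type."""
    return HEADER.pack(HEADER.size + len(payload), msg_type) + payload


class Framer(object):
    """Reassembles complete messages from arbitrary chunks of a TCP stream."""

    def __init__(self):
        self.buf = b""

    def feed(self, data):
        """Add received bytes; return a list of (type, payload) tuples."""
        self.buf += data
        messages = []
        while len(self.buf) >= HEADER.size:
            total_len, msg_type = HEADER.unpack_from(self.buf)
            if HEADER.size > total_len:
                raise ValueError("length field is smaller than the header")
            if total_len > len(self.buf):
                break  # wait for the rest of this message
            messages.append((msg_type, self.buf[HEADER.size:total_len]))
            self.buf = self.buf[total_len:]
        return messages
```

<p>In a Twisted protocol, something like <code>feed</code> would be called from <code>dataReceived</code>, since TCP may deliver a message in several pieces or several messages at once.</p>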
]]></content:encoded>
			<wfw:commentRss>http://dound.com/2009/03/python-and-twisted-length-type-based-protocol-client-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

