be Groovie

About

Ben Bangert is a San Francisco Bay Area programmer, best known for his open-source work creating and contributing to Python libraries such as Pylons, Beaker, and Routes.
He currently works at Mozilla.

Sep 17 2012

Trying out SmartOS and OpenIndiana

After building my new server capable of running SmartOS, it was time to give it a spin!

If you’ve only built desktop machines, it’s hard to express how awesome IPMI KVM is. No longer do you need to grab another keyboard / video monitor / mouse (the KVM); you just plug the IPMI Ethernet port on the motherboard into your switch and hit the web server it’s running. It then lets you remotely access the machine as if you had it hooked up directly. You can get into the BIOS, boot from ISOs on your local machine, hard reset, power down, power up, etc. It’s very slick and means I can stick the computer in the rack without needing to go near it to do everything that used to require a portable set of additional physical hardware.

Note

This post assumes some basic knowledge of OS virtualization, in this case QEMU/KVM (which was ported by Joyent to run on SmartOS) and Zones. I generally refer to them as VMs and will differentiate when I add a Zone vs. a KVM instance.

First Go at SmartOS

Installation is ridiculously easy: there is none. You download SmartOS, put it on a USB stick or CD-ROM, and boot the computer from it. I was feeling especially lazy and used the motherboard’s IPMI KVM interface to remotely mount the ISO image directly from my Mac.

Once SmartOS booted, it asked me to set up the main ZFS pool, and it was done. SmartOS runs a lot like a VMware ESXi hypervisor, with the assumption that the machine will only be booting VMs. So the entire ZFS pool is just for your VMs, which I appreciate greatly. After playing with it a little bit, it almost felt.... too easy.

I had really allocated at least a week or two of my spare time to fiddle around with the OS before I wanted it to just work, and having it running so quickly was almost disappointing.

The only bit that was slightly annoying was that retaining settings in the GZ (Global Zone) is kind of a pain: you have to drop in a service file (which is XML, joy!) on a path which SmartOS will then load and run on startup. Some folks on the IRC channel suggested I give OpenIndiana a spin, which is aimed more at a home server / desktop scenario. There was also a suggestion that I try Sophos UTM instead of pfsense for the firewall / router VM.
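For the curious, that service file is just a standard SMF manifest. Here’s a minimal sketch of the kind of thing I mean (I believe the path SmartOS imports from is /opt/custom/smf, and /opt/custom/bin/netconfig.sh is a placeholder for whatever script re-applies your settings):

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Minimal sketch only; the import path and script name are my assumptions -->
<service_bundle type="manifest" name="custom-settings">
  <service name="site/custom-settings" type="service" version="1">
    <create_default_instance enabled="true"/>
    <single_instance/>
    <!-- Run once at boot to re-apply settings in the GZ -->
    <exec_method type="method" name="start"
                 exec="/opt/custom/bin/netconfig.sh" timeout_seconds="60"/>
    <exec_method type="method" name="stop" exec=":true" timeout_seconds="60"/>
    <property_group name="startd" type="framework">
      <propval name="duration" type="astring" value="transient"/>
    </property_group>
  </service>
</service_bundle>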

OpenIndiana

Since OpenIndiana has SmartOS’s QEMU/KVM functionality (needed to run other OSes like Linux/BSD/Windows under an illumos-based distro), it seemed worth giving it a go. It actually installs itself on the system, unlike SmartOS, so I figured it’d take a little more space. No big deal. Until I installed it.

Then I saw that the ZFS boot pool can’t have disks in it larger than 2TB (well, it can, but it only lets you use 2TB of the space). Doh. After chatting with some IRC folks again, it turns out it’s common to use two small disks in a mirror as a ZFS boot pool and then have the much larger storage pool separately. Luckily I had a 250GB drive around so I could give this a spin, though I was bummed to have to use one of my drive bays just for a boot disk.

Installation went smoothly, but upon trying to fire up a KVM instance I was struck by how clunky it is in comparison to SmartOS. Again, this difference comes down to SmartOS optimizing the heck out of its major use-case.... virtualizing in the data-center. In SmartOS there’s a handy imgadm tool to manage available images, and vmadm to manage VMs. These don’t seem to exist for OpenIndiana (maybe as an add-on package?), so you have to use the less friendly QEMU/KVM tools directly.
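For a taste of how streamlined the SmartOS side is, the whole image-and-VM workflow is basically this (the UUID and manifest name are placeholders):

imgadm avail                  # list images published in the Joyent repo
imgadm import <image-uuid>    # pull an image down into the local zpool
vmadm create -f router.json   # create a Zone or KVM instance from a JSON manifest
vmadm list                    # see what's defined and running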

Then the KVM failed to start. Apparently the QEMU/KVM support in OpenIndiana (at least for my Sandy Bridge based motherboard) has been broken in the last three OpenIndiana releases over the past five months. There’s a work-around to install a specific set of packages, but claiming QEMU/KVM support with such a glaring bug on a fairly prominent motherboard chip-set isn’t a good first impression.

My first try at installing the specific packages failed when my server kernel-panicked halfway through the QEMU/KVM package installation. Upon restarting, the package index was apparently corrupted. The only way to fix it is to re-install OpenIndiana... or roll back the boot environment (a feature built on ZFS snapshots). Boot environments and the beadm tool to manage them are a bit beyond the scope of this entry, but the short version is that it let me roll back the boot file-system, package index included, to a non-mangled state (very cool!).
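For anyone who hasn’t used boot environments before, the rollback itself is only a couple of commands, roughly (the BE name is whatever beadm list shows for your pre-breakage environment):

beadm list                    # show boot environments; the active one is flagged
beadm activate openindiana-1  # mark the known-good BE to be used on next boot
init 6                        # reboot into it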

With QEMU / KVM finally installed and working, I installed and configured Sophos UTM in a KVM and was off and running. Except it seemed to run abysmally slow... oh well, I was about to go on vacation anyway. I set the KVM to load at boot-time and restarted.

Upon loading the KVM at boot, the machine halted. This issue is apparently related to the broken QEMU / KVM packages. It was about time for my vacation, and I had now played with an OS with some rather rough edges in my spare time for a week. So I powered it off, took out the boot drive, and went on my vacation.

Back to SmartOS

When I got back from my vacation, I was no longer in the mood to deal with failures in the OS distribution. I rather like the OpenIndiana community, but now I just wanted my server to work. SmartOS fit the bill, and didn’t require boot drives, which was greatly appreciated. It also has a working QEMU / KVM, since it’s rather important to Joyent. :)

In just a day, I went from a blank slate to a smoothly running SmartOS machine. As before, installation was dead simple, and my main ZFS pool, zones (named as such by SmartOS), was ready for VMs. Before I added a VM I figured I should have an easy way to access the ZFS file-system. I turned on NFS for the file-systems I wanted to access and gave my computer’s IP write privileges and the rest of the LAN read-only access. This is insanely easy in ZFS:

zfs set sharenfs=rw=MYIP,ro=192.168.2.0 zones/media/Audio

To say the least, I love ZFS. Every other file-system / volume manager feels like a relic of the past in comparison. Mounting NFS file-systems on OSX used to suck, but now it’s a breeze. They work fast and reliably (thus far at least).
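For reference, mounting one of those shares by hand from the Mac is just this (the server IP and mount point are examples; Finder’s “Connect to Server” does the same thing):

sudo mkdir -p /Volumes/Audio
# -o resvport makes OSX use a privileged source port, which some NFS servers insist on
sudo mount -t nfs -o resvport 192.168.2.5:/zones/media/Audio /Volumes/Audio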

Setting Up the Router KVM

First, I needed my router / firewall KVM. I have a DSL connection, so I figured I’d wire that into one NIC and have the other NIC on the motherboard go to the LAN. SmartOS virtualizes these so that each VM gets its own Virtual NIC (VNIC); this is part of the Solaris feature set called Crossbow. Setting up the new KVM instance for Sophos UTM was simple: I gave it a VNIC on the physical interface connected to the DSL modem and another on the physical interface connected to my switch.

Besides the fact that the VM was working without any of the issues I had in OpenIndiana, I noticed it was much faster as well. Unfortunately, for some reason it wasn’t actually routing my traffic. It took me about an hour (and clearing my head while walking the dog) to see that I was missing several important VNIC config options, such as dhcp_server, allow_ip_spoofing, allow_dhcp_spoofing, and allow_restricted_traffic.

These settings are needed for a VM that intends to act as a router so that it can move packets and NAT them as appropriate across the VNICs. Once I set those, everything ran smoothly.
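For anyone attempting the same thing, the relevant chunk of the vmadm manifest ends up looking roughly like this (the nic_tags, addresses, and sizes are made up for illustration, and the exact properties you need will depend on your setup):

{
  "brand": "kvm",
  "alias": "router",
  "vcpus": 2,
  "ram": 2048,
  "disks": [
    {"boot": true, "model": "virtio", "size": 20480}
  ],
  "nics": [
    {
      "nic_tag": "external0",
      "model": "virtio",
      "ip": "dhcp",
      "allow_ip_spoofing": true,
      "allow_dhcp_spoofing": true,
      "allow_restricted_traffic": true
    },
    {
      "nic_tag": "internal0",
      "model": "virtio",
      "ip": "192.168.2.1",
      "netmask": "255.255.255.0",
      "dhcp_server": true,
      "allow_ip_spoofing": true,
      "allow_restricted_traffic": true
    }
  ]
}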

So far, this only took me about 3 hours and was rather simple so I decided to keep going and get a nice network backup for the two OSX machines in the house.

Setting Up Network Backups

After some research I found out the latest version of netatalk would work quite nicely for network Time Machine backups. I created a zones/tmbackups ZFS file-system, and two nested file-systems under that for my wife’s MacBook and my own Mac Mini. Then I told ZFS that zones/tmbackups should have compression enabled (Time Machine doesn’t actually compress its backups, transparent ZFS file compression FTW!) and I set quotas on each nested file-system to prevent Time Machine from expanding forever.
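The ZFS side of that is only a handful of commands, roughly (the nested dataset names and quota sizes here are just examples):

zfs create zones/tmbackups
zfs set compression=on zones/tmbackups
zfs create zones/tmbackups/macbook
zfs create zones/tmbackups/macmini
zfs set quota=500G zones/tmbackups/macbook
zfs set quota=300G zones/tmbackups/macmini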

Next I created a Zone with a SmartOS Standard dataset. Technically, the KVM instances run in a Zone for additional resource constraints and security, while I wanted to use just a plain Zone for the network backups. This was mainly because I wanted to make the zones/tmbackups file-system directly available to it without having to NFS mount it into a KVM.

If you’ve ever compiled anything from source on Solaris, you’re probably wondering how many days it took me to get netatalk running in a Zone. Thankfully Joyent has done an awesome job bringing a lot of the common GNU compiler toolchain to SmartOS. It only took me about an hour to get netatalk running and recognized by both Macs as a valid network Time Machine backup volume.

Unfortunately I can’t remember how exactly I set it up, but here are the pages that gave me the guidance I needed:

I’ve heard that netatalk 3.x is faster, and will likely upgrade one of these days.

Setting Up the Media Server KVM

One of the physical machines I wanted to get rid of was the home theater PC I had built a few years back. It was rarely used, not very energy efficient, and XBMC was nowhere near spouse-friendly enough for my wife. We have an AppleTV and Roku, and I figured I’d give Plex a try on the Roku since the UI was so simple.

I set up a KVM instance and installed Ubuntu 12.04 Server on it. Then I added the Plex repos and installed their Media Server packages. I fired it up, pointed Plex at my video folders, and it was ready to go. The Roku interface is slick and makes it a breeze to navigate. Being based on XBMC means that it can play all the same media and transcodes it as necessary for the other network devices that want to play it.

At first Plex ran into CPU problems in the KVM... which I quickly realized was because I hadn’t changed the default resource constraints. The poor thing only had a single virtual CPU... after giving it a few more, it easily had enough CPU allocated to do the video transcoding.
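Bumping the allocation is a one-liner with vmadm (the UUID comes from vmadm list, and four vCPUs is just what I’d guess a couple of simultaneous transcodes need):

vmadm list                   # grab the Plex KVM's UUID
vmadm update <uuid> vcpus=4  # give it more virtual CPUs
vmadm reboot <uuid>          # I believe KVM instances need a restart to pick this up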

While KVM runs CPU-bound tasks at bare-metal speed, disk I/O is virtualized. To reduce this problem I have Plex writing its transcoded files to the ZFS file-system directly via an NFS mount. The media folders are also NFS mounted into the Media Server KVM.

I threw some other useful apps onto this KVM that I was running on the home theater PC and left it alone.

SmartOS Rocks

I now have a nice little home SmartOS server running that does a great job taking on jobs previously done by two other pieces of hardware. I still need to set up a base Ubuntu image to use for other development KVMs, which I’ll blog about when I get that going. Despite being intended for the data-center, SmartOS works great for a home NAS / Media Server / Router system. I’m sure I’ll be even happier as I start to ramp up my use of development VMs.

OpenIndiana is a small community taking on a big job. It’s a great community and people are very friendly. But if you use it, you should expect to be hacking on the OS itself very early on, rather than playing with the things you actually wanted to run on it. The SmartOS community is doing great too, and there are more than a few forks that add some additional home-centric functionality. So far I haven’t needed any of those badly enough to try them out.

Anything else I should blog about regarding SmartOS or the rest of my setup?

Sep 16 2012

Building A SmartOS Server

I’ve been reading about SmartOS for a while now and have wanted to build a home server that would let me run VMs with ZFS as the main file-system. Getting rid of my home theater PC and wireless router (which has been annoying me with its flakiness for months) was also a goal. Running something like pfsense in a VM would give me more options and theoretically be more stable than the fairly crappy software that seems to plague consumer-grade wireless routers.

So after a month or so of research in my spare time, it seemed like SmartOS was going to be the best bet. Even though it’s generally intended for use in the datacenter, it had all the features I wanted (which I’ll blog about separately in my next post). Now I just needed a parts list that had already been verified to work with SmartOS, which is a bit pickier about hardware than the Linux/BSD distributions.

Equipment

Here’s what I ended up with:

  • CPU: Intel Xeon E3-1230 V2
  • Motherboard: SUPERMICRO MBD-X9SCL-F-O
  • Case: NORCO RPC-2212 Black 2U Rackmount Server Case with 12 Hot-Swappable SATA/SAS Drive Bays
  • HBA: LSI Internal SATA/SAS 9211-8i (Hooks up to 2 of the back-plane connectors in the case for 8 drives)
  • RAM: 16GB ECC (The 8 GB unbuffered sticks were unfortunately not around at the time or I would’ve gotten two of those to begin with)

I already had a 2TB and a 3TB drive, so I bought one more of each so that I could run a ZFS storage pool with two mirrored vdevs, as Constantin Gonzalez recommends in his post on RAID vs. mirrors.
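The layout boils down to two mirrored pairs in a single pool, which in ZFS terms is simply the following (device names are illustrative; on SmartOS the installer actually builds the zones pool for you when you pick the disks):

# one pool, two vdevs: the pair of 2TB drives mirrored, and the pair of 3TB drives mirrored
zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0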

In retrospect, and after reading a bit more, I think I would’ve gotten one of the larger Norco 4U cases. Not because I need or want 20+ hot-swap bays, but because you can easily use a ‘desktop’ grade 80+ Titanium rated power supply. Finding a 2U 80+ PSU is difficult, and finding an 80+ Titanium rated one that puts all its power out on a single 5v rail is almost impossible. The Norco 4U case plus a good desktop-grade PSU works out to about the same cost as my case with the more expensive 2U PSU.

I also bought a rack to put the server in along with my other home networking gear, so that it’d all be nicely packed away in a corner of the garage. Here’s a photo of the completed setup:

My home-server rack

I have one of the cheaper Cisco SG300-10 switches, which conveniently came with rack-mounts, and Monoprice had a very affordable patch panel and blank plates to make it look tidy.

Overall cost: ~$2200

That includes the nice Tripp Lite SR12UB 12U Rack Enclosure, which I’ve found handy to lock to ensure my toddler doesn’t yank out hard drives (he figured out how to pull out a hot-swap drive in all of 20 seconds when I was assembling it). Not that I let him run around the garage, but keeping everything locked is handy just in case.

OS Choice

When I was assembling the machine and preparing to install SmartOS, some people on IRC mentioned that OpenIndiana might be a better choice for a home server. Suffice it to say it didn’t work out well, while SmartOS has been flawless and running smoothly for the past two months.

My next post will have a lot more details on my OpenIndiana experience as well as how I have the SmartOS box setup.

Mar 26 2012

New Blog Software Again!

I’ve been using tumblr for a while, and while it’s useful when posting random stuff I should’ve posted to Facebook instead (images, links, videos, etc.), writing my text in HTML was just icky. Markdown isn’t a huge improvement, and I was really itching to write all my posts in reST as I already know it quite well from writing my docs using Sphinx.

I considered using blogofile, but it seems to be abandoned and it’s not trivial to add normal reST style code highlighting. Then I saw tinkerer, which is basically just a few extensions on top of Sphinx... perfect!

The Migration

For anyone considering migrating from tumblr, here’s my simple dump script that pulled all the posts I cared about out of tumblr and dropped them into directories for tinkerer:

import json
import re
import os
import subprocess

import requests


date_regex = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')


def pull_posts(blog, api_key):
    api_url = 'http://api.tumblr.com/v2/blog/%s/posts?api_key=%s&limit=2000'
    call_url = api_url % (blog, api_key)
    r = requests.get(call_url)
    posts = json.loads(r.content)
    return posts


def html2rst(html):
    p = subprocess.Popen(['pandoc', '--from=html', '--to=rst'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    return p.communicate(html)[0]


def dump_posts(posts):
    post_links = []
    for post in posts:
        if post['type'] not in ['text']:
            continue
        d = date_regex.match(post['date']).groupdict()
        os.system("mkdir -p %s/%s/%s" % (d['year'], d['month'], d['day']))
        slug = post['post_url'].split('/')[-1].replace('-', '_')
        link = '%s/%s/%s/%s' % (d['year'], d['month'], d['day'], slug)
        post_links.append(link)
        bar = '=' * len(post['title'])
        with open('%s.rst' % link, 'wb') as f:
            print post
            f.writelines([post['title'].encode('utf-8'), '\n', bar, '\n\n'])
            if post['type'] == 'text':
                body = html2rst(post['body'].encode('utf-8').replace('’', "'"))
                f.writelines([body, '\n\n'])
            elif post['type'] == 'link':
                desc = html2rst(post['description'].replace('’', "'").encode('utf-8'))
                f.writelines(['Link: `%s <%s>`_\n\n' % (post['title'].encode('utf-8'), post['url'].encode('utf-8'))])
            f.write(desc + '\n\n')
            f.writelines([
                '.. author:: default\n', '.. categories:: %s\n' % ', '.join(post['tags']),
                '.. comments::\n', '   :url: %s' % post['post_url']
            ])
    return post_links
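
I don’t remember the exact driver I wrapped around those two functions, but it was essentially this (the blog host and API key are placeholders; note that the Tumblr v2 API nests the actual posts under response['posts']):

if __name__ == '__main__':
    data = pull_posts('YOURBLOG.tumblr.com', 'YOUR_API_KEY')
    # The v2 API returns {"meta": ..., "response": {"posts": [...]}}
    links = dump_posts(data['response']['posts'])
    print '\n'.join(links)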

It’s quite nice that I get all the Sphinx extensions for use in my blog, and there’s no more mental context switching to write a blog post vs. writing project documentation for my open-source projects.

Dropping a graphviz diagram into my blog also became trivial.

digraph foo {
    "awesome" -> "sphinx";
    "awesome" -> "tinkerer";
}

The Bad News

It’s not perfect; tinkerer is still very beta. But I can wrap my head around it, and it’s easy to extend. I’ve already made a little modification in my own fork which allows me to specify URLs for the comments, to ensure I get the right Disqus threads on the old blog posts I ported. This wasn’t a flawless process: the reaction and comment counts shown on the main pages are a bit off for the legacy posts... but at least the comments themselves show up fine once you click in, so I’ll live with it.

There are no category-specific RSS feeds at the moment, so I’ll need to hack that in so that I can get relisted on the Python aggregators. I’ll also likely update the theme; right now I’m just using ‘minimal’, which isn’t bad.

Since this is more for just posts, I dropped the other tumblr content types like links and videos and kept only the text content. I don’t think this is too big a loss, but some might want all the types tumblr supports.

Overall, I’m quite happy with it thus far. We’ll see how I feel in a few months, and hopefully I’ll be blogging more now that there’s less friction and I get to use the Sphinx tools I’m quite familiar with.

Nov 13 2010

Notes on the Pylons & repoze.bfg Merger

Some folks might not have time to follow the Pylons-discuss mail list, so this might be news to them, but I’m thrilled to announce that the Pylons and repoze.bfg web frameworks are merging. If this is the first you’ve heard about it, don’t worry, it was only announced a week ago now on the Pylons mail list.

In the time since the announcement, I’ve heard a lot of varying feedback. Some people took a look at Pyramid (the core package that will be equivalent to ‘Pylons 2.0’) and were quick to respond, usually in a knee-jerk fashion. I think some of this was due to miscommunication, and partly because there was so much already done. When frameworks have merged in other languages, such as Rails merging with Merb, the announcement was just that: an announcement. There was no code at the time to show, just a promise that when it was ready, it would be awesome.

This merger, in contrast, already had a working foundation for a huge chunk of the core features. As a result, people assumed that what we had was already ‘finished’, or close to it. The polish of much of the documentation made it feel odd that there was no “Porting Pylons 1.0 to Pyramid” guide done. In reality, Pyramid is definitely not done; there is still quite a bit of work left before Pyramid will meet the expectations that many Pylons users have. There are still refinements to be made to Pyramid, and additional packages to write that Pylons users will most likely always use with it for the feature-set they’re accustomed to.

I’ve summed up a few thoughts on when Pylons users should port to Pyramid to try and help manage expectations better in the future. I’ll make more announcements when packages are ready to ease the transition and a “Porting Guide” is ready.

What is Pylons?

Many Pylons users don’t realize which of the features they enjoy come from the ‘pylons’ package vs. the other packages that Pylons depends on. Contrary to popular belief, the majority of features present in Pylons actually come from other packages. This mistaken belief that most of the features come from the pylons package led some to think that because a lot of my future development time will be spent on adding features/packages around Pyramid, Pylons is somehow dead. This is not the case.

First, Pylons the web framework is mainly a small (~1000 LoC) glue layer between Paste, PasteScript, PasteDeploy, WebOb, WebError, Routes, WebHelpers, Mako, and SQLAlchemy. Some people end up swapping out Mako/SQLAlchemy, but by and large this is the common ‘Pylons Stack’. Most of the new features in Pylons over the past several years actually came from additions to WebHelpers, WebError, or Routes. All of these packages continue to get the same development attention they always have, so no ‘death’ is occurring.

Second, for over six months now, there’s been very little in the way of patches submitted, bugs reported, or other feature requests. In many ways Pylons is ‘done’ as far as adding more features to the core package itself. As I announced on the Pylons-discuss mail list, the Pylons code-base hit some design issues. Adding the features I heard requested from quite a few users (and needed myself) regarding extensibility couldn’t be retro-fitted into the existing design. I encourage anyone curious to read my prior entry on sub-classing for extensibility as a preview of some future blog posts. I’ll be writing more about design patterns in Python that handle extensibility, which many popular Python web frameworks are also struggling with.

The Future

I’m very excited about the future for the Pylons Project, which is the new over-arching organization that will be developing Python web framework technologies. The core will be Pyramid, with additional features and functionality building around that. We’re already quickly expanding the developer team with some long-time contributors and having a combined team has definitely helped us progress rapidly.

One of my main goals is to encourage and ease contributions from the community. To that end I’ve been filling in the contributing section for the Pylons Project as much as possible. I believe this is an area that will quickly set us apart from other projects as we emphasize a higher standard of Python development.

Django did a good job setting the bar high for its documentation of how to contribute to Django, and it deserves a lot of credit for clearly defining community policies. But it’s missing a portion we consider extremely valuable, and which core developers generally get very picky about when accepting patches… how to test your code. The Pylons Project adapted the rather thorough testing dogma noted by Tres Seaver, which I personally can’t recommend highly enough when it comes to writing unit tests. It’d be nice to see more posts expand on exactly how to test your code. Many developers (including myself) can write code that passes 100% test coverage… but is it brittle test code? Prone to failure if some overly clever macro it uses fails? Seeing a well-written set of examples on designing unit tests to avoid common gotchas is definitely something anyone contributing (and developers in general) should be familiar with.

For those wanting a gentler introduction to Pyramid (the docs are very verbose and detailed, not at all opinionated), I’ll be blogging more about new features and how to utilize them. Please be patient, I think a lot of people are going to be excited at what’s in store.

Oct 19 2010

Why Extending Through Subclassing (a framework’s classes) is a Bad Idea

Ok, I’ll admit it: that’s an overly ambitious blog post title. So I’ll refine it a little now. This is intended mainly as my thoughts on why, as a tool developer (one who makes tools/frameworks that other programmers then use), it’s a bad idea to implement extensibility by having developers subclass your classes. This is actually how the web framework I wrote - Pylons - provides its extensibility to developers and lets them change how the framework functions.

Please excuse the short and possibly incomplete description; this is mainly a quick post to illustrate a tweet I recently made.

First, some background…

One of the things that Pylons 1.0 and prior is missing is a way to easily extend a Pylons project. While it can be done, it’s very ad-hoc, kludgy, and generally not very well thought out. Or thought out at all, really. What was somewhat thought out was how a developer was supposed to extend and customize the framework.

In a Pylons project, the project creates a PylonsApp WSGI object, and all the project’s controllers subclass WSGIController. This seemed to work quite well, and indeed many users happily imported PylonsApp and subclassed it to extend/override the methods they needed for customization, or changed how their WSGIController subclass worked to change how individual actions would be called.

Everything seemed just fine…. until…

Improving Pylons

When I had some free time a little while back, I set about looking into how to extend and improve Pylons to make up for where it was lacking: extensibility. I quickly realized that I’d need to change rather drastically how Pylons dispatch worked, and how controller methods were called, to make them more easily extendable. But then, with a certain feeling of dread, the subclassing issue nipped me. All my implementations of PylonsApp and WSGIController were effectively frozen.

Since every single developer using Pylons subclasses WSGIController, and to a much lesser extent PylonsApp, any change to any of the main methods would result in immediate breakage of every Pylons user’s app that happened to customize them (the very reason subclassing was used!). This meant that I couldn’t very well change the implementation of the actual classes to fix their design, because that would just cause complete breakage. Ugh!

So after looking into it more, I’ve ended up with this short list of the obvious reasons this shouldn’t be done. BTW, in Pylons 2, controllers don’t subclass anything, and customization is all done with hooks into the framework, no subclassing in sight! (See the sketch at the end of this post.)

Short List of Why It’s Bad

From a framework maintainers point of view…

  1. Implementations of the classes are effectively frozen, because all the class methods are the API.
  2. Correcting design flaws or implementation flaws are much more difficult if not impossible without major breakage due to point #1.
  3. Heavily sub-classed/large hierarchy classes can have performance penalties.

From a developer ‘extending’ the classes point of view…

  1. Figuring out how to unit-test is more difficult, as the full implementation is not in your own code… it’s in the framework code.
  2. When using mix-ins and other classes that also subclass the framework, strange conflicts or overrides occur that aren’t at all obvious or easy to debug/troubleshoot.

I think there were a few more reasons I came across as well, but I can’t recall them at the moment. In short, I’m now of the rather firm opinion that the only classes you should ever subclass are your own classes. Preferably in the same package, or nearby.
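
To make that concrete, here’s a tiny sketch (toy code, not actual Pylons or Pyramid) of the difference between handing developers a class to subclass versus handing them a hook to register:

class SubclassStyleApp(object):
    """Subclassing style: every method here is effectively frozen,
    since someone's subclass may override or call any of them."""
    def dispatch(self, environ):
        return '200 OK'

    def handle_request(self, environ):
        return self.dispatch(environ)


class HookStyleApp(object):
    """Hook style: internals stay private; extension happens at a
    well-defined registration point."""
    def __init__(self):
        self._request_hooks = []

    def add_request_hook(self, func):
        # func takes the environ dict and returns a (possibly modified) one
        self._request_hooks.append(func)

    def handle_request(self, environ):
        for hook in self._request_hooks:
            environ = hook(environ)
        return self._dispatch(environ)

    def _dispatch(self, environ):
        # private, so it can be redesigned without breaking anyone
        return '200 OK'


app = HookStyleApp()
app.add_request_hook(lambda environ: dict(environ, authenticated=True))
print app.handle_request({'PATH_INFO': '/'})

With the hook style, I can gut _dispatch tomorrow and nobody’s app breaks, because the only contract is the hook signature.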

Feb 06 2010

Pylons 0.10 and 1.0 Beta 1 Released

Without further ado,

I’m pleased to announce that Pylons 0.10b1 and 1.0b1 are now out. I have not put them on the Cheeseshop, to ensure they’re not downloaded accidentally.

Upgrading / Installing

I have updated upgrading instructions here: http://pylonshq.com/docs/en/1.0/upgrading/

The instructions to install from scratch on Pylons 1.0b1: http://pylonshq.com/docs/en/1.0/gettingstarted/#installing

The upgrading page covers the important upgrading instructions that Mike Orr touched briefly on before.

Note that these are beta releases, intended for us to discover remaining issues and continue updating any other documentation where applicable. Very little has actually changed in Pylons since 0.9.7, apart from 1.0 dropping all of the legacy functionality and a few explicit clean-ups.

Updates

Routes, Beaker, and WebHelpers, however, have seen quite a few updates over the life of Pylons 0.9.7, so no one should think that the developers working on Pylons and its related parts have been hanging out doing nothing. :)

Since Pylons 0.9.7 was released on February 23, 2009, almost one year ago now:

  • Routes 1.11 was released, and 1.12 with some great updates will be out shortly
  • Beaker has gone from 1.2.2 -> 1.5 with 3 major updates substantially increasing its ease of use and reliability
  • WebHelpers is now at 1.0b4 with major updates, core functions rewritten, and new docs up
  • SQLAlchemy has gone from 0.4 to 0.5 (with 0.6 in beta)

I believe this speaks a great deal to the benefits of keeping the core Pylons functionality separate from the other parts, as a variety of bug fixes and features can land without requiring new Pylons releases to address bug reports.

How to Help!

To bring Pylons to 1.0, many docs likely need very small changes. Also, it would be great to take care of reference docs where people have commented about problems/tips. Helping is fairly easy, especially if you’re familiar with restructured text.

First: Clone the Pylons repository on Bitbucket: http://bitbucket.org/bbangert/pylons/

Then: Edit the documentation files under pylons/docs/en/ to read as appropriate, commit the fix, and push it to bitbucket.

Finally: Issue a pull request on bitbucket so that we’ll know your fix is ready. Ideally you should include a note in it about what your fix remedies.

Bug Reports

Did your upgrade not go according to plan? Was there something missing that you needed to do from the upgrading docs?

Let us know by filing a bug report (mark the component as documentation, and the milestone as 0.10): http://pylonshq.com/project/pylonshq/newticket

You’ll need to login to file a bug report, or feel free to reply to this announcement with the issue.

Thanks (in alphabetical order) to Mike Bayer, Ian Bicking, Mike Burrows, Graham Higgins, Phil Jenvey, Mike Orr, and anyone else I missed for all their hard work on making Pylons and its various components what they are today.

Jan 07 2010

Deploying Python web apps with toppcloud

Ian Bicking recently released a rather interesting package called toppcloud that aims to tackle what I see as a growing need for those of us deploying Python webapps.

I was actually interested in easing my own deployment woes before I saw Ian’s announcement about his package, and was halfway through a rather hefty amount of research on automating server deployments with tools like Chef and Puppet, but toppcloud is a bit different. It not only tackles provisioning a new system on the fly from the ‘cloud’ (using libcloud), but it also handles easy Python (and now PHP) web application deployments.

With such a tantalizing set of goals, I couldn’t really resist getting my feet wet. Boy am I glad I did. PylonsHQ is now running on toppcloud, and I won’t be surprised when more people get it running for them. When many shared hosting providers are $5-10 a month, it’s rather nice to pay $10/month for an automatically configured VPS with one-command deployments. Though of course, unless you go to quite a bit of work yourself, most shared hosting providers don’t give you one-command deployments.

Before I continue on to describe how I setup Kai - the source code behind PylonsHQ - I should provide a few caveats about using toppcloud:

  • toppcloud is alpha software; there are no releases of it yet, so you will be checking it out from source code
  • currently, only Rackspace Cloud is known to work, though since it uses libcloud, in theory any cloud provider that libcloud supports should be usable
  • toppcloud is changing rapidly, so be ready to keep up on the commit log to see what’s changing
  • there are no unit tests, most likely because it’s very tedious to build the rather significant number of mock objects required to test the various local/remote commands, and because it’s changing so fast the tests would probably be obsolete in a week

toppcloud philosophy

I’m half guessing here, based on talking with Ian and reading the docs myself, but the philosophy of toppcloud is about providing a common deployment platform, a la Google App Engine, except of course with Postgres or other ‘services’ that you can request to use. At the moment the only services that toppcloud comes with are CouchDB, Files (to store/serve files for your app on the filesystem), and Postgres with PostGIS extensions. For those diving into the source code, it shouldn’t be too hard to see how to create additional services, and I hope to see more get added as people get more interested.

Therefore, toppcloud is not expected to be everything to everyone; it is expected to be at least an 80+% solution for deploying web apps with a Google App Engine style ease-of-deployment process. toppcloud itself then ensures that, depending on which services you asked for, they’re set up and ready for use on the server when your app is deployed.

If that sounds like something you’re dying to try out, you’re in luck!

Setting up a Pylons App

I’m not going to cover how to check out the toppcloud source, except to mention that the directions are in the docs/index.txt file in the toppcloud source (which should be read in its entirety!). Once you have toppcloud installed on your computer, getting from zero to a running website on a VPS is remarkably quick:

  1. $ toppcloud create-node --image-id=14362 baseimage

    Wait until the email arrives indicating that the server is up and ready

  2. $ toppcloud setup-node baseimage

    Server is now all set up to run web apps!

  3. $ toppcloud init myapp

    Create an app to deploy to our server

  4. $ source myapp/bin/activate

    Activate the virtualenv used for this app

  5. $ pip install Pylons
  6. $ cd myapp/src
  7. $ paster create -t pylons awesomeapp

    Create our Pylons app, or check out an existing one from your VCS here

  8. $ cd ../..
  9. $ ln -s myapp/src/awesomeapp/awesomeapp/public/ myapp/static

    toppcloud will make things in the static directory available without hitting the webapp

  10. Configure myapp/app.ini similarly to (taken from pylonshq site):

    [production]
    app_name = pylonshq
    runner = src/kai/production.ini
    version = 1
    update_fetch = /sync_app
    service.couchdb =
    service.files =
    default_host = pylonshq.com

    In this case, the Pylons site needs CouchDB set up, and the files service (all Pylons sites should use the files service to store cached files, templates, etc.)

  11. $ toppcloud update myapp/

    Note that we’re one directory above myapp and that we have a trailing slash on myapp; this is needed because an rsync is done to copy it to the server, so don’t leave off the trailing slash! (This will prolly be fixed at some point.)

That’s it! Eleven easy steps to go from zero to a running deployed website.

I should also note that you will need to make a production.ini file, and that it should have a few important changes in it. All the references to %(here)s should be changed to %(CONFIG_FILES)s, since that’s the persistent location where files can be stored for a toppcloud app between deployments. Other configuration information provided by services (Couch supplies the db/host; Postgres has its host/user/pass info) can be accessed via CONFIG_ vars as well. The services docs have some more info.

More apps can be added to a host as desired, until the RAM runs out of course. At the moment toppcloud uses a process pool of 5 with 1 thread under mod_wsgi for each application. This can use a bit of RAM if you have multiple heavy Pylons processes; hopefully there will shortly be a way to ask toppcloud to use a single process with multiple threads, which will help cut the RAM profile a bit.

Using Django and PHP

Unfortunately I haven’t actually tried this myself, but there’s nothing preventing it. You’ll need to change the app.ini so that instead of using ‘src/kai/production.ini’ as the runner, it uses a Python file in the directory, say main.py, that loads the Django app as a WSGI application and returns it. Note that the config vars Django needs for its database should then be present in os.environ when setting up the settings.py that Django uses.
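I haven’t tested it, but that main.py would presumably be little more than the standard Django WSGI boilerplate of the era, something like this (the settings module name is made up, and I’m assuming toppcloud looks for a module-level WSGI application callable):

import os

# toppcloud exposes service config (e.g. the Postgres host/user/pass) via
# environment variables, so settings.py can read them from os.environ.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

import django.core.handlers.wsgi

# Assumption: the runner wants a module-level WSGI callable named 'application'.
application = django.core.handlers.wsgi.WSGIHandler()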

If you’re looking through the toppcloud source code by now, you may have also noticed there’s an example app that uses PHP. There’s nothing holding back toppcloud from setting up mod_passenger and deploying Ruby apps at some point either should someone wish to add that feature.

dumb pipes

There have been numerous mentions on various blogs and in the news about how much the cell phone carriers hate the concept of being nothing more than “dumb pipes” for wireless Internet and phone use. That means, of course, that they’d no longer be competing on what phones you could use but instead solely on service quality and price… I think the same type of transition is in store for shared hosting providers and some of the boutique app deployment shops like Heroku.

It’ll take a while; toppcloud is very rough right now. But when you can download a nice little open-source package, run it, choose your cloud provider (based on price + quality!), have it automatically set things up for you to deploy your apps to, and then deploy apps with a single command… you’ve already done the vast majority of what a service like Heroku does, except you could still modify toppcloud if there was something lacking you really needed. And a reason like that is why toppcloud exists to begin with.

Oh, and thanks Ian for writing this before I wasted more time making it myself.

Aug 13 2009

Advanced Caching with Django and Beaker

After seeing more than a few blog posts and packages attempt to provide more advanced caching capability for Django, it occurs to me I should actually just blog about how to use Beaker in Django, rather than keep mumbling about how “Beaker already does that”. So, if you’ve needed caching in Django that goes beyond using just one backend at a time, or that can actually cope with the Dog-Pile Effect, this is the blog entry for you (until I flesh it out further into actual docs on the Beaker site).

Install Beaker

This is simple enough; if you have easy_install available, just:

easy_install -U Beaker

Or if you prefer to download tar files, grab the Beaker 1.4 tar.gz file

Configuring the Cache

Setting up Beaker’s cache for your Django app is pretty easy. Since only a single cache instance is needed for an app, we’ll set it up as a module global.

Create a beakercache.py file in your Django project with the following contents:

from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

cache_opts = {
    'cache.type': 'file',
    'cache.data_dir': 'cache/data',
    'cache.lock_dir': 'cache/lock'
}

cache = CacheManager(**parse_cache_config_options(cache_opts))

There are a lot more options available, such as memcached, configuring multiple cache backends at once, etc. Now that you know how to provide the configuration options, further customization can be done as needed using the Beaker configuration docs. (Note the very handy cache region configurations, which make it easy to toggle cache backend configurations on the fly!)

Using the Cache

Beaker provides a convenient decorator API to make it easy to cache the results of functions. In this example we’ll just sleep and build a string including the time; add this to your views.py:

import time
from datetime import datetime

from django.http import HttpResponse

from YOURPROJECT.beakercache import cache

def hello(request):
    @cache.cache(expire=10)
    def fetch_data():
        time.sleep(4)
        return 'Hello world, its %s' % datetime.now()
    results = fetch_data()
    return HttpResponse(results)

In this case, the cached data is in a nested function with the decorator. It could of course be in the module elsewhere as well.

Hook the view function up in your urls.py, and hit the view. The first time it will wait a few seconds, then it will return the old time until the cache expires (10 seconds in this case).
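
For completeness, hooking the view up in urls.py is the usual one-liner (this uses the Django 1.x patterns() style current as of this writing; swap in your own project name):

from django.conf.urls.defaults import patterns, url

from YOURPROJECT.views import hello

urlpatterns = patterns('',
    url(r'^hello/$', hello),
)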

The cached function can also accept positional (non-keyword) arguments, which will be used to key the cache. That is, different argument values result in separate cached copies, keyed by those arguments.
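For example, something like this keeps a separate cached copy per username (a toy function, reusing the imports and cache object from the views.py above):

@cache.cache(expire=10)
def fetch_profile(username):
    time.sleep(4)  # pretend this is an expensive lookup
    return 'Profile for %s as of %s' % (username, datetime.now())

fetch_profile('alice')  # computed and cached
fetch_profile('alice')  # served from the cache
fetch_profile('bob')    # different argument, separate cache entry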

That’s it, it’s really quite easy to use.

Update: It occurs to me this post does say ‘advanced’, and that example wasn’t very advanced, so here’s something a bit more interesting. Let’s configure cache regions to make it easy to toggle how long and where something is cached. Cache regions allow you to arbitrarily configure batches of settings, a ‘region’. Later you can indicate that you want to use that region, and it uses the settings you configured. This also makes it easy to set up cache policies and change them in a single location.

In this case, we’ll have ‘long_term’ and ‘short_term’ cache settings, though you can of course come up with as many regions as desired, with the names of your choice. We’ll have the long_term settings use the filesystem, since we want to retain the results for quite a while and not have them pushed out like memcached does. The short_term settings will use memcached and be cached for only 20 minutes (per the config below), long enough to help out on those random slashdot/digg hits.

In the beakercache.py file:

from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

cache_opts = {
    'cache.type': 'file',
    'cache.data_dir': 'cache/data',
    'cache.lock_dir': 'cache/lock',
    'cache.regions': 'short_term, long_term',
    'cache.short_term.type': 'ext:memcached',
    'cache.short_term.url': '127.0.0.1:11211',
    'cache.short_term.expire': '1200',
    'cache.long_term.type': 'file',
    'cache.long_term.expire': '86400',
}

cache = CacheManager(**parse_cache_config_options(cache_opts))

Now in our views.py:

import time
from datetime import datetime

from django.http import HttpResponse

from testdjango.beakercache import cache

def hello(request):
    @cache.region('long_term')
    def fetch_data():
        time.sleep(15)
        return 'Hello world, its %s' % datetime.now()
    results = fetch_data()
    return HttpResponse(results)

def goodbye(request):
    @cache.region('short_term')
    def fetch_data():
        time.sleep(4)
        return 'Bye world, its %s' % datetime.now()
    results = fetch_data()
    return HttpResponse(results)

Jul 24 2009

Beaker 1.4 Released

Beaker 1.4 has now been released, and addresses several fairly important bugs. First, the full changelog:

  • Fix bug with hmac on Python 2.4. Patch from toshio, closes ticket #2133 from the TurboGears2 Trac.
  • Fix bug with occasional ValueError from FileNamespaceManager.do_open. Fixes #10.
  • Fixed bug with session files being saved despite being new and not saved.
  • Fixed bug with CacheMiddleware overwriting configuration with default arguments despite prior setting.
  • Fixed bug with SyntaxError not being caught properly in entry point discovery.
  • Changed to using BlobProperty for Google Datastore.
  • Added domain/path properties to the session. This allows one to dynamically set the cookie’s domain and/or path on the fly, which will then be set on the cookie for the session.
  • Added support for cookie-based sessions in Jython via the JCE (Java Cryptography Extensions). Patch from Alex Grönholm.
  • Update Beaker database extensions to work with SQLAlchemy 0.6 PostgreSQL, and Jython.

Note that the beaker database extension now works on Jython, and the cookies for sessions can be set dynamically during a request (for sites that operate across multiple domains/sub-domains).

Most importantly though, a bug in the import of the Google back-end has been fixed, which caused installation failures on Beaker 1.3.x.

Docs can be found on the Beaker site.

To upgrade your Beaker with easy_install:

easy_install -U Beaker

This release is also notable as the majority of the fixes were contributed by several web framework communities. Thanks for the patches!

Jun 29 2009

Comments and Web Services

I’m trying out Disqus for comments on the blog, for one main reason: I just didn’t feel like implementing comments myself. I am somewhat wary of services that seem integral to a blog, like where the comments live, which is why I’ve been leery of using services like this for some time.

What happens if Disqus runs out of VC money and goes belly-up? I’ve seen this with a few other web services, and no one using them is very happy when it occurs. On the other hand, I did find some irony in the fact that one of the features Disqus pitches to users is, “Don’t lose your comments if the blog disappears”.

On the other hand, I really appreciate the capabilities a central service-based comment system brings. I’ve inadvertently used it on other blogs, and was very pleasantly surprised to get updates and actually be able to keep up on what was happening on the blogs I commented on.

Are other people worried about using services by new companies or am I just overly paranoid?

I’d almost feel better about it if I could pay five bucks a year or something, as I’d at least have some better reassurance that the company is actually making money, like I do with Flickr. Or maybe it’d be nice for companies that are profitable to say so, though it occurs to me that even that isn’t a guarantee, as someone could come along and buy them up, then decide to terminate the service (like that one site that people used and were pissed about when Six Apart bought it and shut it down, but for some reason I can’t remember the name of it at all).