
Antipattern: chaining stateless protocol requests

published on 2008|09|24

As we all know, HTTP is a stateless protocol. We do all sorts of hacks to add state, like ext/session in PHP. While such hacks work great for a lot of use cases, we should remind ourselves that they are hacks. There is a phenomenon of state creep: coupling unrelated HTTP requests. Think of a page that references a thumbnail in an <img/>-tag where the picture is generated as needed: it is possible to generate that image in the context of the request that embeds it. So the template calls a helper to generate the thumbnail, and the thumbnail is written to the file system.

While this works well for a single host, your personal weblog about cooking and cats, it won’t work for something serious. When you start load balancing between two webserver nodes you get burned, as you can’t guarantee that the image is present on the node that receives the image request (besides, you end up generating the image n times, where n is the number of nodes). The solution is not that hard: pregenerate all the images with a queuing system and display “This image is currently not available” placeholders as long as they are not ready or, if there are only a few image uploads, generate them when the image is uploaded. The other option is to generate them on the fly when they are requested. If you do the latter, do it in the context of the request that tries to retrieve the image, not in the embedding context (the page that embeds the image). Generating on the fly means that you deliver your files through PHP or something similar: this is fine as long as you have an HTTP accelerator in place.
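
The on-the-fly variant could look roughly like the sketch below. The function name fetchThumbnail(), the cache directory and the $generate callback are assumptions made for illustration; the callback stands in for whatever actually scales the image.

```php
<?php
// Hypothetical helper: returns the thumbnail bytes and generates them on the
// first request only. $generate is a placeholder for the real scaling code.
function fetchThumbnail(string $cacheDir, string $name, callable $generate): string
{
    // basename() keeps a crafted name from escaping the cache directory.
    $path = rtrim($cacheDir, '/') . '/' . basename($name);

    if (!is_file($path)) {
        // Write to a temporary file and rename it afterwards, so a parallel
        // request never reads a half-written thumbnail.
        $tmp = $path . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $generate($name));
        rename($tmp, $path);
    }

    return file_get_contents($path);
}
```

A script answering the image URL would call this and echo the result, so the work happens in the request for the image and not in the request for the embedding page. With several nodes the cache directory would have to live on shared storage, or each node generates its own copy once.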

One of the systems that does it in the way described above is Drupal. I’ve implemented MogileFS for image storage and retrieval for Drupal and let me say, it was not a pleasure.

On a side note: HTTP 1.1 allows resources to be fetched in parallel, which makes generating images in the wrong context even worse from a user experience point of view, as the page will not show up until each thumbnail is generated.


8 Hints out of Testing-Turmoil

published on 2008|09|19
  1. Have a continuous integration solution in place. Really. If you don’t, you just burn money by writing tests. I would go so far as to say: if you don’t have continuous integration, you should stop writing unit tests and do click testing. Let your CI system generate API docs, high level docs, code coverage reports, testdox and every piece of static analysis info you generate.
  2. The definition of “tests pass” is “tests pass on the continuous integration system”. “Works for me” has no place in the bugtracker, nor anywhere else.
  3. If you can’t test it, the architecture is most likely wrong (exceptions are session and caching related code, which is generally hard to test). Testability should be your main concern when writing code. What’s the use of fast or wonderful looking code if you can’t repeatably prove it is working?
  4. Prefer method calls over annotations. A typo in setExpectedException will trigger an obvious error, while a typo in the @expectedException annotation will lead to an Obscure Test, and most likely a Mystery Guest.
  5. Run the whole test harness twice. This will help to identify setup/teardown bugs. Create a randomly ordered test suite to identify the hard to track mistakes.
  6. Run your test suite really often. We run it with 15 seconds delay every minute and I’m pretty happy with it.
  7. Use good test names that describe the behavior of the unit. The behavior is not the unit under test itself (that’s what I see in the code); it is something like “calling register changes the status of the user to foobar”, so a good test name would be “testRegisterChangesTheStatus …”.
  8. Aim for 100% code coverage. 95% is nothing to be proud of: I can guarantee the missing 5% will be the hardest part.
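
Hint 5 can be sketched without any framework. The runner below is a toy and not PHPUnit: runShuffled() and runTwice() are made-up names, and a "test" is simply a callable that returns true on success.

```php
<?php
// Hypothetical mini-runner: $tests maps a test name to a callable that
// returns true when the test passes.
function runShuffled(array $tests, int $seed): array
{
    $names = array_keys($tests);
    mt_srand($seed);   // a fixed seed lets you replay an order that failed
    shuffle($names);

    $failed = [];
    foreach ($names as $name) {
        if ($tests[$name]() !== true) {
            $failed[] = $name;
        }
    }
    return $failed;
}

function runTwice(array $tests, int $seed): array
{
    // The second pass catches state that leaked out of the first one.
    return array_merge(
        runShuffled($tests, $seed),
        runShuffled($tests, $seed + 1)
    );
}
```

A test that only passes on a fresh process shows up in the second pass, and a test that depends on a neighbour running first shows up as soon as the shuffle puts it in a different position.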

Testing PHP 5.3 alpha1

published on 2008|08|03

Finally Johannes Schlüter baked a first alpha-tarball for PHP 5.3. The new version contains a huge amount of new features, like closures, namespaces and late static binding. Such a huge amount requires thorough testing: if you are using a PHP application you would like to see fully working with our brand new version or you are developing a PHP application, this is your chance to make sure everything will go smoothly. If you are a web hosting provider, do your performance benchmarks now!
If you happen to be using the Gentoo Linux distribution, I have something for you: in my personal PHP overlay you can find an ebuild for PHP 5.3.0_alpha1. A few warnings: currently, ext/fileinfo does not compile because of #45636 and of course I did not test all possible USE-flag combinations. If you experience problems with it, just leave a comment here.



Specific env vars for Gentoo packages

published on 2008|08|02

Since Gentoo Portage introduced the package specific configuration in /etc/portage there was one thing I always missed: specifying environment variables per package. Some environment variables you might want to specify per package are CFLAGS, CXXFLAGS and FEATURES. Especially when you do debugging, some packages should not be stripped, which is the perfect use case for the FEATURES environment variable. While specifying USE-flags and keywords per package is easy, the rest is not. Christian Hoffmann dropped me a link to this mail: the tip there works fine. I’ve played around with it and implemented it slightly differently: first, I would like to be informed which environment files are read, and second, I changed the resolution order so that the more specific configuration inherits from the more generic one. So this is what my /etc/portage/bashrc looks like now:
[geshi lang=bash]
for conf in ${PN} ${PN}-${PV} ${PN}-${PV}-${PR}; do
    env=/etc/portage/env/${CATEGORY}/${conf}.env
    if [[ -f ${env} ]]; then
        einfo "Reading specific environment from ${env}"
        . ${env}
    fi
done
[/geshi]
For dev-lang/php-5.2.6-r3 I can use three different files to customize the build environment: /etc/portage/env/dev-lang/php.env applies to all PHP ebuilds, /etc/portage/env/dev-lang/php-5.2.6.env to all revisions of the ebuild for 5.2.6 and /etc/portage/env/dev-lang/php-5.2.6-r3.env to that exact ebuild. My /etc/portage/env/dev-lang/php.env file now looks like this, to disable stripping the binaries after emerging them and to keep the working directory for better backtraces:
[geshi lang=bash] FEATURES="${FEATURES} nostrip keepwork" [/geshi]


Antipattern: the verbose constructor

published on 2008|07|31

Constructors are often used to shortcut dependency injection and parameter passing on instantiation. This is a valid practice and often leads to shorter code. Consider the following example (a simple value object, often used to not mess around with floats and to keep currency and amount together):

class Money
{
    protected $_amount;
    protected $_currency;
    protected $_divisor;
    public function __construct(
        $amount = null, $currency = null, $divisor = null)
    {
        if ($amount !== null)
            $this->setAmount($amount);
        if ($currency !== null)
            $this->setCurrency($currency);
        if ($divisor !== null)
            $this->setDivisor($divisor);
    }
    // ... setters and getters ...
}

Now consider instantiating this object. Instead of creating a new instance of “Money” and calling three setters, everything can be done compactly in the constructor.

$money = new Money(13200, 'EUR', 100);

So for the money object this works pretty well. The code is easy to read, but wait: the first argument can be grasped easily, the second too, but the third? It is not obvious that a divisor is passed. An alternative would be changing the constructor to accept an array. This is a replacement for true named arguments, as supported by e.g. Python. Solar uses that a lot, as does the Zend Framework.

$money = new Money(
    array(
        'amount' => 13200,
        'currency' => 'EUR',
        'divisor' => 100
    )
);

Much more readable, but does your IDE’s code completion work? And what happens if you pass “amoµnt”, because your fingers are as clumsy as mine? Exactly, the parameter will be silently ignored.
But look at this:

$money = new Money();
$money->setAmount(13200);
$money->setCurrency('EUR');
$money->setDivisor(100);

It is at least equally short and readable, your IDE works, and if you have problems with the dimensions of the keys on your keyboard (they are too small, it has nothing to do with your fingers) you will be warned. But we could have an even shorter example while maintaining the readability. With fluent interfaces we would get the following:

$money = new Money();
$money->setAmount(13200)->setCurrency('EUR')->setDivisor(100);

Wonderful! If you want, you can add a newline between each object operator and you would have the same amount of lines but less dense code (sad that we don’t have fluent constructors, isn’t it?). Sometimes setters are so elegant.
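
For the chain above to work, each setter has to return the object itself. A minimal sketch of how the Money setters might look; the getters are assumptions added for illustration, as the original class only hints at them.

```php
<?php
// Sketch of the fluent variant: every setter returns $this so that calls
// can be chained.
class Money
{
    protected $_amount;
    protected $_currency;
    protected $_divisor;

    public function setAmount($amount)
    {
        $this->_amount = $amount;
        return $this;
    }

    public function setCurrency($currency)
    {
        $this->_currency = $currency;
        return $this;
    }

    public function setDivisor($divisor)
    {
        $this->_divisor = $divisor;
        return $this;
    }

    // Assumed getters, only here to make the sketch observable.
    public function getAmount()   { return $this->_amount; }
    public function getCurrency() { return $this->_currency; }
    public function getDivisor()  { return $this->_divisor; }
}

$money = new Money();
$money->setAmount(13200)
      ->setCurrency('EUR')
      ->setDivisor(100);
```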

So by now one thing should be clear: it is not just about easily writing the code, but about the next guy understanding it too. Because you never write code for yourself. Never. But let’s investigate a real-life example. I work with a framework that allows me to define really nifty business logic by just sticking together a bunch of fields, with every field having a bunch of validators and filters attached.

class User extends Model
{
    protected function _define(Definition $definition)
    {
        $definition->addField(new StringField('username', true, null, true));
    }
    protected function _getStorageClass()
    {
        return 'UserStorage';
    }
}

Every time I write such a definition, I need to look into the code to check the order of the parameters. I can remember the first parameter, but the rest are too similar. To explain it: the second parameter specifies whether the field is required, the third expects a default value and the fourth indicates whether the value can be changed after it has been set once. I’ve talked about filters and validators, right?

class User extends Model
{
    protected function _define(Definition $definition)
    {
        $definition->addField(new StringField('username', true, null, true))
            ->addValidator(new UniqueUserValidator())
            ->addFilter(new LowercaseFilter())
            ->addValidator(new RegexValidator('/^[a-z]+$/'));
    }
}

Definition::addField() returns the passed field object to allow adding validators and filters. What works for validators and filters, should work for the rest too, shouldn’t it?

class User extends Model
{
    protected function _define(Definition $definition)
    {
        $definition->addField(new StringField('username'))
            ->setRequired(true)
            ->setReadonly(true);
    }
}

I admit, it is a bit more code to write, but a huge improvement in readability and therefore in maintainability. Another variant, for cases where setters are not a good solution, is to create an expressive factory. We have, for example, a Criteria object that creates and orders Criterion objects internally. Because we don’t have a fluent constructor, we have a static create-method for the Criteria object.

$criteria = Criteria::create('User')->field('id')->equal(1);

The alternative of just utilizing the constructor would be horrible to read and would have limitations regarding the parameter parsing capabilities (unless func_get_args() is used, which is the total opposite of the paradigm of strict APIs). But back to the constructor-only example:

$criteria = new Criteria('User', array('id' => 1));

And how would you express “id not equal 1” with it? So that’s where expressive factories are an alternative.
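
To make the idea concrete, here is a rough sketch of such a factory. It is not the real Criteria class, which the article does not show: notEqual(), toArray() and the internal array layout are assumptions for illustration.

```php
<?php
// Hypothetical sketch of an expressive factory with a static create-method.
class Criteria
{
    private $model;
    private $currentField = null;
    private $criterions = [];

    public function __construct($model)
    {
        $this->model = $model;
    }

    // Stands in for the missing "fluent constructor".
    public static function create($model)
    {
        return new self($model);
    }

    public function field($name)
    {
        $this->currentField = $name;
        return $this;
    }

    public function equal($value)
    {
        return $this->add('=', $value);
    }

    public function notEqual($value)
    {
        return $this->add('!=', $value);
    }

    private function add($operator, $value)
    {
        if ($this->currentField === null) {
            throw new LogicException('Call field() before adding a comparison.');
        }
        $this->criterions[] = [$this->currentField, $operator, $value];
        return $this;
    }

    // Assumed accessor so the collected criterions can be inspected.
    public function toArray()
    {
        return $this->criterions;
    }
}

$criteria = Criteria::create('User')->field('id')->notEqual(1);
```

“id not equal 1” is now just another method name, which the array-based constructor could only express with some extra convention inside the array.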

Constructors, like any other method, should have as few parameters as possible but as many as needed. Obvious. The constructor should only allow setting vital information for the object (if the object has a name, there is a good chance that the name is a parameter of the class’ constructor because it is considered vital). And the ease of use depends heavily on whether the parameters passed can be intuitively distinguished by looking at their values. This holds when the code is first written as well as when you maintain it for the rest of your life.

(There are a bunch of other tricks to make parameters more readable, like using class constants as parameters, but this is out of scope of this article).



Over abbreviated

published on 2008|06|30


© Giant Ginkgo

Matthew Weier O’Phinney announced Zend’s naming scheme for the Zend Framework from the point where PHP 5.3 namespaces are used. The issue is that the PHP parser allows neither class Abstract nor interface Interface, as both “abstract” and “interface” are reserved keywords. So Zend suggests prefixing interfaces with “I” and abstract classes with “A”. Hungarian notation for classes and interfaces.

One of the bullet points in the list of “what makes a name a good name?” is and will forever be “as short as possible, as verbose as needed”; another is “you must understand the name without studying specific rules before”. The latter is why Hungarian notation sucks so tremendously. The IFoo/ABar scheme violates two of those criteria: first, it is not as verbose as it could be with just a few keystrokes more: AbstractBar would work fine and is much clearer. Second, it introduces a special notation you have to grasp beforehand. While AbstractBar would be as descriptive as possible, ABar is cryptic for those who are not lucky enough to practice Python programming.

If we are at it, the scheme makes it impossible to have grammatically correct names: IFoo would be read as InterfaceFoo which really should be FooInterface. And no, the fix is not FooI.


Join us

published on 2008|06|04


You were a bit bored lately: you wanted to have time and infrastructure for unit tests and continuous integration but it was “too expensive”, you wanted a more grown up, professional structure for development, a coding style, a build system, lots of books for training, augmentative thinking about architecture and object orientation or – more general – work you can both take pride in and sleep well with. This is how we want to develop software (and we are close to it and continuously improving). We are offering two positions: senior developer and another one more suited for career starters.
You will work on various projects, including a not yet released open source framework based on the Zend Framework (and yes, we are using PHP 5.3 for development). You should be fluent in PHP 5, at least know what unit tests are and you have a good understanding of object oriented programming. And no, you don’t need to know who’s invented the pepper mill or the handcar or PHP.
Additional benefits include table football, a Wii, free water and coffee, and quiet workplaces.
So the ball’s in your court: if you are interested, drop me a message.


Security "to go"?

published on 2008|05|20

I’m a huge fan of PHP-IDS. Mario Heiderich and Christian Matthies did an incredible job polishing this tool, adding new features and trying to catch every esoteric attack signature. However, I have the feeling there is some confusion (German) about what intrusion detection is for. On a server, intrusion detection is used to diagnose a break-in. First of all you do everything not to let your server go down. You have a firewall, you try not to expose services to the outside, you do SSH with port knocking, you put risky services into jails or chroots, you use the Suhosin patchset and so on. There are various strategies for hardening a server. The hardening is the barrier against break-in attempts.
If hell freezes over, the intrusion detection mechanism comes into play to make sure the attempt is not overlooked and the machine does not become yet another zombie in a botnet. PHP-IDS is an intrusion detection tool on the application level. Application firewalls know about a certain protocol and its structure (e.g. HTTP) and inspect the protocol to detect attack patterns. Some of them are even capable of learning from usual request signatures and enforcing rules based on the learned data. There are various commercial products to achieve application firewalling. PHP-IDS does the same for free and sits directly on the webserver in the scope of the application. For personal usage or projects with a lower budget that can’t afford expensive products, it might be a good supplement. Besides being a supplement, application firewalls are a valid choice when security becomes an urgent problem: a lot of heavily flawed software is designed (often it is not even designed) and developed without a developer ever having heard about security: “Yes you can inject HTML, that’s a feature!”, “‘ OR true/* lists you every item, isn’t that cool?”. If such projects become popular, application firewalls might be an option to hotfix the disaster. But nevertheless the application needs to be fixed.
The inherent issue with application firewalls is that no place knows exactly what proper incoming data for the application looks like, except the application itself. That’s why application firewalls can never be perfect. IDS is needed for the 2% the developer forgot. So it is not like coffee to go. It is like having the coffee and adding milk or sugar. Having milk without coffee seems pointless to me anyway.


PHP Unconference Hamburg - Day 1

published on 2008|04|27

The first day at the PHP Unconference in Hamburg was quite nice. The day started with a slightly confused registration, followed by the notorious voting for sessions. Our planned talk was magically lost but I was too tired to object.
I attended two sessions. The first was “Security Development Lifecycle”, a process model developed by Microsoft to strengthen the focus on security during development. While the entire process is pretty complex, there are a few ideas and basic rules that are worth adopting. Treating security problems as show-stoppers should be obvious, classifying attack surfaces, scenarios and privacy impacts is a thankless job, and regular security training for the development team is a good idea, but do you really do it? The second session was “Ask the core developer” by Johannes Schlüter. It ended up in pitying one another and whining a bit about missing innovation in core, an impression I don’t share.
The interesting parts were not the sessions but the corridor conversations. It’s always interesting to hear how others do PHP.


NOWDOC + double quotes = HEREDOC

published on 2008|04|12

PHP 5.3 introduces a new syntax element, NOWDOC. If you know HEREDOC, NOWDOC is easy to understand: it is in fact HEREDOC taken literally. While variables are expanded in HEREDOC, in NOWDOC they are not. Just to remind us, a small HEREDOC example:

$value = "Hello World!";
$var = <<<LABEL
$value
LABEL;

$var will contain “Hello World!” now.

<?php
$value = "Hello World!";
$var = <<<'LABEL'
$value
LABEL;

$value is not expanded, so $var contains literally “$value”.

For consistency and the sake of completeness, an alternative syntax has been introduced:

<?php
$value = "Hello World!";
$var = <<<"LABEL"
$value
LABEL;

Guess how it behaves …
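
If you would rather check than guess, the three variants can be put side by side. The variable names below are made up for this comparison.

```php
<?php
$value = "Hello World!";

// Classic HEREDOC: $value is expanded.
$heredoc = <<<LABEL
$value
LABEL;

// NOWDOC: single-quoted label, nothing is expanded.
$nowdoc = <<<'LABEL'
$value
LABEL;

// Double-quoted label: the alternative syntax from above.
$quoted = <<<"LABEL"
$value
LABEL;

var_dump($quoted === $heredoc); // bool(true)
var_dump($nowdoc === '$value'); // bool(true)
```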



Best practice: Convenient notifications

published on 2008|04|08

Community websites often notify users about events taking place on the website. These notifications, be it a classic email or a cooler approach like instant messaging, can be good or bad. Most of the time they are really inconvenient. The following list provides a few ideas how to do it better.

Let me adjust

First of all, let me decide which notifications I get. This will make me feel better and making me – the user – feel better is always a good thing. It is sad that not everyone is doing it. And yes, per default every notification – maybe except the notification about new messages – is disabled.
This implicitly

Let me decide

I hate HTML emails. I really hate them. They are seldom rendered correctly and hide the information I want to know in an unnecessary layout. Emails are just text and that’s fine. But I know people who really prefer this colorful stuff. Some people just weigh a well formatted mail higher. So give them HTML. And text for me. The email standard provides us a way to do this: multipart. My mail client will render text, theirs will render HTML. We are both happy.
There are also libraries encapsulating the multipart standard. Zend_Mail from the Zend Framework is one of those, ezMail from ezComponents is another option.
By the way: there is no need to keep the mail template twice, one for text and one for HTML. Use reStructuredText or Markdown to do both in one go.
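
To show what the libraries do under the hood, here is a hand-rolled sketch of a multipart/alternative body. buildAlternativeMail() is a made-up helper, not part of Zend_Mail or ezMail, and it skips everything a real library takes care of, such as transfer encoding for non-ASCII content.

```php
<?php
// Hypothetical helper that assembles a multipart/alternative message by hand.
// Returns an array of [headers, body].
function buildAlternativeMail(string $text, string $html): array
{
    $boundary = 'alt-' . bin2hex(random_bytes(8));

    // The plain text part comes first and the HTML part last: clients pick
    // the last alternative they are able to display.
    $body = "--{$boundary}\r\n"
          . "Content-Type: text/plain; charset=UTF-8\r\n\r\n"
          . $text . "\r\n"
          . "--{$boundary}\r\n"
          . "Content-Type: text/html; charset=UTF-8\r\n\r\n"
          . $html . "\r\n"
          . "--{$boundary}--\r\n";

    $headers = [
        'MIME-Version' => '1.0',
        'Content-Type' => "multipart/alternative; boundary=\"{$boundary}\"",
    ];

    return [$headers, $body];
}
```

In practice I would still reach for one of the libraries above; the sketch only illustrates that both renderings travel in the same message.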

Authenticate me

When I click a link in the notification email I do not want to get a 403. I want to be logged in seamlessly. I know this is a trade-off between convenience and security, but generating a secure token for each sent notification is not that hard. When I open my messagebox, let me just click the link host.com/messages/abcdefg123456 and my mailbox appears and I’m logged in. Technically that’s not hard to do: one authentication token per mail with a TTL of a week or so. This applies to instant messaging too.
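
A minimal sketch of such a token store, under the assumption that tokens are kept in memory; a real site would persist them in its database. The class name NotificationTokens and its methods are invented for this example.

```php
<?php
// Hypothetical token store: one random token per notification, valid for a
// limited time.
class NotificationTokens
{
    private $tokens = [];
    private $ttl;

    public function __construct(int $ttlSeconds = 604800) // one week
    {
        $this->ttl = $ttlSeconds;
    }

    public function issue(int $userId, int $now): string
    {
        $token = bin2hex(random_bytes(16)); // 128 random bits as 32 hex chars
        // Store only a hash, so a leaked table does not reveal usable links.
        $this->tokens[hash('sha256', $token)] = [$userId, $now + $this->ttl];
        return $token;
    }

    // Returns the user id for a valid token, null otherwise.
    public function redeem(string $token, int $now): ?int
    {
        $key = hash('sha256', $token);
        if (!isset($this->tokens[$key])) {
            return null;
        }
        list($userId, $expires) = $this->tokens[$key];
        if ($now > $expires) {
            unset($this->tokens[$key]);
            return null;
        }
        return $userId;
    }
}
```

The token goes into the link in the mail, and the controller behind that link calls redeem() and logs the user in when it gets an id back. Whether a token may be used more than once is a policy decision this sketch leaves open.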

Don’t try to track my behaviour

It’s alright if you send me a mail if I haven’t logged in for two weeks (unless I disable it) or if you offer me cheaper deals for your paid account (unless I disable it …). But please, do not try to track if I read the mail or what link I have clicked or whatever you might want to know. My inbox is private and when the mail is in my inbox, you are no longer allowed to do anything with it.
The data you get from this kind of tracking will be flawed and wrong anyway, see the chapter about formats and my preference for text above. Also a number of widely used webmail clients like Google Mail will not render your tracking pixels anyway, so stop doing that.
And no, it is not cool to use the XHTML extension of Jabber to load tracking pixels (alright, me, being a geek, would find it sort of cool).

My mailclient is not your billboard

If you provide a free service for me and do advertising, even context sensitive advertising, it might be a fair deal for me. But this does not apply to messages you send to me. Again: my mailbox is my mailbox, so it is an act of grace to allow you to send me emails. A lot of people more important than you are not allowed to do that, so feel honored. It is just an act of courtesy to not include advertising material from you or your partners in a notification.

Show me your guts

I would really love to see the mails you have sent me and what the status is. If your system tells me the notification mail for XYZ has been delayed, because the receiving SMTP-server does greylisting I want to know that I have to wait a few minutes. You will have that information anyway so why shouldn’t you make the process transparent to me? And no, you are not going to tell me the exact SMTP status, assume I’m not the nerd I am, assume I am a customer who wants to know what’s going on. It’s not that hard to translate an SMTP status code into a human readable (and understandable) message which can be displayed to the user.
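
Such a translation can be as coarse as looking at the first digit of the reply code. The function name describeDelivery() and the wording of the messages are assumptions for illustration.

```php
<?php
// Hypothetical mapping from an SMTP reply code to a message for customers.
function describeDelivery(int $smtpCode): string
{
    if ($smtpCode >= 200 && $smtpCode < 300) {
        return 'Your mail server accepted the message.';
    }
    if ($smtpCode >= 400 && $smtpCode < 500) {
        // 4xx replies are temporary failures; greylisting servers answer
        // the first delivery attempt this way.
        return 'Delivery is delayed. We will retry in a few minutes.';
    }
    if ($smtpCode >= 500 && $smtpCode < 600) {
        return 'Your mail server rejected the message. Please check your address.';
    }
    return 'The delivery status is unknown.';
}

echo describeDelivery(451), "\n";
```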


XHR request signatures and Dojo

published on 2008|03|25

Recently I discussed Zend Framework’s XHR integration. As a result of my research I’ve filed a bug to let Dojo send the X-Requested-With-header per default. A patch has landed in Dojo and will be part of the 1.1 release.


Preorder!

published on 2008|02|26


Garvin Hicking asked me to promote his book featuring Serendipity, the world’s best weblog engine. Garvin is the maintainer of Serendipity and you can assume he spent no less energy on writing this book than on coding Serendipity. So it must be worth buying.

So preorder now!

Full disclosure: I’m using Serendipity, I like this project and maybe I’m getting a copy of this book for free.



New magic constant in PHP 5.3

published on 2008|02|22

In PHP 5.3 there will be another magic constant: __DIR__. Until 5.3 a typical pattern to include files was to build the path with dirname(__FILE__). With __DIR__ the extra dirname()-call becomes unnecessary.

__DIR__ always references the directory which contains the current file. In case of /var/www/host/app/foo.php, __DIR__ will reference /var/www/host/app.
To allow this, the internal function php_dirname() has been moved into the Zend Engine and is now called zend_dirname(). Nevertheless an alias still exists.
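
Side by side, the two spellings look like this. The relative path lib/bootstrap.php is a placeholder chosen for the example.

```php
<?php
// Before PHP 5.3: derive the directory from the path of the current file.
$old = dirname(__FILE__) . '/lib/bootstrap.php';

// PHP 5.3 and later: __DIR__ already is that directory.
$new = __DIR__ . '/lib/bootstrap.php';

var_dump($old === $new); // bool(true)
```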


Dojo and the Zend Framework

published on 2008|02|08

The Zend Framework provides a neat function in its request object called isXmlHttpRequest(). The following is therefore possible:

public function someAction()
{
    if ($this->getRequest()->isXmlHttpRequest()) {
        // AJAX request specific parts
    }
}

Zend_Controller_Request_Http::isXmlHttpRequest() checks internally if the request header X-Requested-With is set to “XMLHttpRequest”. If this condition is fulfilled, the request is considered an XHR. However, Prototype, jQuery and YUI set this header in their XHR abstractions, Dojo does not. The following snippet helps to set this in Dojo:

dojo.xhrGet({
    url: <url>,
    load: <callback>,
    headers: {"X-Requested-With": "XMLHttpRequest"}
});

This sticks the X-Requested-With header into the XMLHttpRequest and isXmlHttpRequest() will return bool(true).

What the others do

Interesting to see that Rails is a bit more generous in what it accepts as an XHR. As this header is a pseudo standard it would be worth doing it exactly the same way as Rails does it. Nothing is worse than a pseudo standard with slightly different implementations.

def xml_http_request?
  !(@env['HTTP_X_REQUESTED_WITH'] !~ /XMLHttpRequest/i)
end

The RequestHandlerComponent from CakePHP provides a method isAjax() in the stricter Zend Framework variant. Django, Pylons and TurboGears seem not to provide such helpers.
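
The difference between the strict and the generous variant is easy to see in plain PHP. Both functions below are sketches written for this comparison, not the actual Zend Framework or Rails code; they take the $_SERVER array as a parameter so they can be tried out without a web server.

```php
<?php
// Strict variant: the header has to equal "XMLHttpRequest" exactly.
function isXhrStrict(array $server): bool
{
    return isset($server['HTTP_X_REQUESTED_WITH'])
        && $server['HTTP_X_REQUESTED_WITH'] === 'XMLHttpRequest';
}

// Generous variant, modelled on the Rails snippet above: a case-insensitive
// match anywhere in the header value.
function isXhrLenient(array $server): bool
{
    return isset($server['HTTP_X_REQUESTED_WITH'])
        && preg_match('/XMLHttpRequest/i', $server['HTTP_X_REQUESTED_WITH']) === 1;
}
```

A client sending “xmlhttprequest” in lowercase passes the generous check and fails the strict one, which is exactly the kind of slight difference that makes a pseudo standard painful.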
