geek-isms: PHP & multiviews

Apache, PHP and MultiViews

A dedicated webmaster

As a dedicated webmaster, and a true geek, I try to keep the code and organization of bmt-online clean and straightforward. So, after installing Apache 1.3 on my Windows box, installing PHP4, and validating all the content with the W3C validators, I started to think about how to organize my documents on the site.

While doing the validation, I came across an interesting article on the W3C site entitled "Cool URIs don't change". The title alone says it all! Suffice it to say that I was convinced, and decided to take advantage of the facilities offered by Apache.

Thus, the current page, as I am writing this article, is a PHP script. Its canonical URL is:

http://www.bmt.dnsalias.org/geekisms

However, it would be equally valid to access it at:

http://www.bmt.dnsalias.org/geekisms.php

All this is fine, and I am pretty happy with my setup, except for one little detail: the CSS validation of the documents, when referred to by their "good looking" URIs, was broken. Instead of congratulating me on my valid CSS, the validator now spat a short, hard error in my face:

I/O Error: http://www.bmt.dnsalias.org/: Not Acceptable .

Investigation...

After investigation, it turns out that this is the result of a combination of several independent factors. So, let's look at the technical details of my setup:

  1. I set up PHP for Apache as a module. This apparently requires that a pseudo MIME type, application/x-httpd-php, be declared so that the module correctly handles PHP files. The directive
        AddType application/x-httpd-php .php
    in my Apache configuration file in effect makes all files with the extension .php be considered of type application/x-httpd-php.
  2. To achieve the "good looking" URIs, I used the MultiViews option of Apache. The description, taken verbatim from the Apache documentation, is:

    The effect of MultiViews is as follows: if the server receives a request for /some/dir/foo, if /some/dir has MultiViews enabled, and /some/dir/foo does not exist, then the server reads the directory looking for files named foo.*, and effectively fakes up a type map which names all those files, assigning them the same media types and content-encodings it would have if the client had asked for one of them by name. It then chooses the best match to the client's requirements.

    The last part is important. Remembering that we assigned the media type application/x-httpd-php to PHP files, we see that while processing the content negotiation request of the client, the server considers the PHP files as being of their own private type.
    This is somewhat odd, given that the actual output of the PHP script is a genuine HTML page, with a correctly set text/html content type, but, oh well, it doesn't do much harm. Or does it?
  3. The CSS validator at W3C is a picky web client. A capture of the packets exchanged between the validator and my server shows that it sends quite a peculiar Accept: field in its HTTP request:
        GET /geekisms HTTP/1.1
        Cache-Control: no-cache
        Date: Thu, 10 Jul 2003 00:06:02 GMT
        Pragma: no-cache
        Accept: text/css,text/html,text/xml,     \
          application/xhtml+xml,application/xml, \
          image/svg+xml,*/*;q=0
        Accept-Language: en-us
        Host: www.bmt.dnsalias.org
        User-Agent: Jigsaw/2.2.0 W3C_CSS_Validator_JFouffa/2.0
    So, wait a minute, what? This means that the validator accepts CSS, HTML, XML and a few other standard web publishing formats and, with */*;q=0, nothing else?!?

Yes indeed, that's what it means. This client is picky, but it is within its rights. It is not violating any standard, nor any rule of good conduct. It is just enforcing the fact that, being a CSS validator, it has to be served an *ML document.
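To make the effect of */*;q=0 concrete, here is a minimal Python sketch of how a server might score a media type against the validator's Accept header. The names parse_accept and quality_for are made up for this illustration, and the matching is deliberately simplified (it ignores every parameter other than q); Apache's real negotiation code is more involved.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when it is not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range, q))
    return ranges


def quality_for(media_type, header):
    """Return the q value of the most specific range matching media_type.

    A type that matches no range at all gets 0.0, i.e. not acceptable.
    """
    main_type = media_type.split("/")[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in parse_accept(header):
        if media_range == media_type:
            rank = 2          # exact match, most specific
        elif media_range == main_type + "/*":
            rank = 1          # type wildcard
        elif media_range == "*/*":
            rank = 0          # catch-all, least specific
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


# The Accept header captured above, joined onto one line.
VALIDATOR_ACCEPT = (
    "text/css,text/html,text/xml,"
    "application/xhtml+xml,application/xml,"
    "image/svg+xml,*/*;q=0"
)

for media_type in ("text/html", "application/x-httpd-php"):
    print(media_type, quality_for(media_type, VALIDATOR_ACCEPT))
```

With the validator's header, text/html scores 1.0 while application/x-httpd-php only matches the */* catch-all and scores 0.0, which is exactly the "nothing else" reading.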

Now, during content negotiation, triggered by the fact that the client asked for a content-type-generic URI, the server found out that it was asked for *ML-only content, while it was only able to deliver application/x-httpd-php content. Thus, complying with the HTTP/1.1 standard, it returned a 406 Not Acceptable response to the client, which in turn displayed it on its result page as the dry error above.
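The whole chain of events can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: ADD_TYPE, multiviews_variants, acceptable_q and negotiate are hypothetical names, the Accept header is given already parsed as a dictionary of q values, and the real mod_negotiation algorithm weighs more dimensions (language, encoding, source quality) than this does.

```python
import os

# Extension-to-type map, mimicking the AddType line shown earlier,
# plus one ordinary mapping for comparison (an assumption).
ADD_TYPE = {
    ".php": "application/x-httpd-php",
    ".html": "text/html",
}


def multiviews_variants(requested, directory_listing):
    """Fake up a type map: every file named requested.* with a known type."""
    variants = []
    for filename in directory_listing:
        base, ext = os.path.splitext(filename)
        if base == requested and ext in ADD_TYPE:
            variants.append((filename, ADD_TYPE[ext]))
    return variants


def acceptable_q(media_type, accept):
    """Look up q for a type: exact match first, then type/*, then */*."""
    main_type = media_type.split("/")[0]
    for key in (media_type, main_type + "/*", "*/*"):
        if key in accept:
            return accept[key]
    return 0.0


def negotiate(requested, directory_listing, accept):
    """Return (status, filename) roughly the way a negotiating server might."""
    variants = multiviews_variants(requested, directory_listing)
    if not variants:
        return 404, None  # nothing named requested.* at all
    best = None
    for filename, media_type in variants:
        q = acceptable_q(media_type, accept)
        if q > 0 and (best is None or q > best[0]):
            best = (q, filename)
    if best is None:
        return 406, None  # variants exist, but none is acceptable
    return 200, best[1]


# The validator's Accept header, pre-parsed into q values.
validator = {
    "text/css": 1.0, "text/html": 1.0, "text/xml": 1.0,
    "application/xhtml+xml": 1.0, "application/xml": 1.0,
    "image/svg+xml": 1.0, "*/*": 0.0,
}
# A hypothetical browser that accepts anything.
browser = {"text/html": 1.0, "*/*": 0.8}

files = ["geekisms.php", "index.php"]
print(negotiate("geekisms", files, validator))
print(negotiate("geekisms", files, browser))
```

The first call yields a 406 and the second a 200 for geekisms.php: the same file on disk is refused or served depending only on whether the client tolerates the private application/x-httpd-php type.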

Where to go from here?

When I found out about that problem, I started, of course, researching it. There are several FAQs and bug reports around describing the problem.

I have tried to bring up the matter on comp.lang.php, to no avail. On the #php channel on IRC, I have been told to read the fscking manual :-(.

In any case, nobody was able to tell me whether there was a known workaround for the issue, a way to install PHP for Apache as a module invoked as a handler, or any way to modify the dreaded MIME type application/x-httpd-php and still have the scripts correctly processed by PHP.

I have had a look at the code of the PHP module. The MIME type is very much hard-coded in there, as evidenced by the following excerpt:

    handler_rec php_handlers[] =
    {
      {"application/x-httpd-php", send_parsed_php},
      {"application/x-httpd-php-source", send_parsed_php_source},
      {"text/html", php_xbithack_handler},
      {NULL}
    };

I thought there might be some hope with the line containing text/html. It handles the so-called x bit hack, which allows PHP to "parse files with executable bit set as PHP regardless of their file ending". However, it is no more than a hack, as its name indicates, and it does not work as expected on a Windows system.

A story in the making

As it is, I still haven't found a satisfactory way of solving the issue. If a certified PHP geek is able to devise a viable workaround, they are more than welcome to contact me and let me know about it.

wed 2003-07-09