I can't say that I'm wildly familiar with web servers in general, but this is a small, simple server and I use it a fair amount as a test server, as it requires no configuration -- just a folder to work from.

I have some plans to develop it over in projects, but they've been stalled by the fact that, as usual, my style of coding is less than compatible. Since I'm pretty much fixed on using DragonFly as my primary server platform from here on out, it's probably worth re-working the basic idea (i.e. a non-forking static server) to do the job. Anyway, I digress.

With the help of ikiwiki.info/joeyh it became clear that thttpd's sanitization of the client headers was not only dubious but wrong. It was stripping the port number from the Host: header, and so ikiwiki was struggling with the CGI processing; or more specifically (as Joey put it) perl's CGI module was struggling, and since perl's CGI module pretty much defined CGI it was a fair bet that thttpd was buggering it.
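To see why a missing port hurts, here is a minimal sketch in Python (not Perl, and not a claim about what perl's CGI module does internally) of a CGI program rebuilding its own URL from the environment. The function name `self_url`, the script name `/ikiwiki.cgi` and the port 8080 are made up for illustration; the fallback to SERVER_NAME and SERVER_PORT follows the URL reconstruction recipe in PEP 3333.

```python
def self_url(environ, scheme="http"):
    """Rebuild the URL of the running script from CGI-style variables."""
    host = environ.get("HTTP_HOST")
    if not host:
        # No Host: value was passed on, so fall back to the server's own idea.
        host = environ["SERVER_NAME"]
        port = environ.get("SERVER_PORT", "80")
        if port != "80":
            host += ":" + port
    return scheme + "://" + host + environ.get("SCRIPT_NAME", "")


# Hypothetical request to a server listening on port 8080.
intact = {"HTTP_HOST": "my.server.rox:8080", "SCRIPT_NAME": "/ikiwiki.cgi"}
stripped = {"HTTP_HOST": "my.server.rox", "SCRIPT_NAME": "/ikiwiki.cgi"}

print(self_url(intact))    # http://my.server.rox:8080/ikiwiki.cgi
print(self_url(stripped))  # http://my.server.rox/ikiwiki.cgi
```

With the port stripped, every link and form action the application generates points at port 80, where nothing may be listening.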

A quick glance at the CGI specification shows that the HTTP_ variables are built from the actual client request headers, so we'd better check those, and a quick server hack

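# Python 3 (in Python 2 this module was called BaseHTTPServer)
from http.server import BaseHTTPRequestHandler, HTTPServer

class http_handler( BaseHTTPRequestHandler ) :
    def do_GET( self ) :
        # dump the request headers exactly as the client sent them
        print( self.headers )

if __name__ == "__main__" :
    http_addr = ( 'my.server.rox', 80 )
    httpd = HTTPServer( http_addr,
            http_handler )
    # without this the script exits before handling a single request
    httpd.serve_forever()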

was all that was required to confirm that we were getting a port number in the Host: header.
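For reference, the rule in the CGI specification (RFC 3875, section 4.1.18) is mechanical: upper-case the header name, replace "-" with "_" and prepend "HTTP_". A small sketch of that mapping follows; the helper names `cgi_name` and `cgi_vars` and the sample headers are my own, not anything from thttpd.

```python
def cgi_name(header_name):
    """Map an HTTP request header name to its CGI meta-variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def cgi_vars(headers):
    """Build the HTTP_* part of a CGI environment from (name, value) pairs."""
    return {cgi_name(name): value for name, value in headers}


# Hypothetical request headers as a client might send them.
request_headers = [("Host", "my.server.rox:8080"), ("User-Agent", "curl")]
print(cgi_vars(request_headers))
# {'HTTP_HOST': 'my.server.rox:8080', 'HTTP_USER_AGENT': 'curl'}
```

Note that the rule only renames the header; it leaves the value alone.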

Now we jump to the thttpd source, and it's pretty clear that it's deliberately removing the port.

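httpd_parse_request( httpd_conn *hc )
    if ( strncasecmp( buf, "Host:", 5 ) == 0 )
        {
        cp = &buf[5];                       /* skip past "Host:" */
        cp += strspn( cp, " \t" );          /* skip leading whitespace */
        hc->hdrhost = cp;
        cp = strchr( hc->hdrhost, ':' );    /* look for the port separator */
        if ( cp != (char*) 0 )
            *cp = '\0';                     /* and cut the value off there */
        }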

but that's obviously not what Apache and the rest of the interwebs do.
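The effect of those last three lines of C is easy to reproduce. This is a rough Python rendering of the truncation only, with a made-up function name and sample values; it is not thttpd code.

```python
def strip_port_like_thttpd(host_header):
    """Mimic the C snippet: cut the value at the first colon, if any."""
    colon = host_header.find(":")
    if colon != -1:
        return host_header[:colon]
    return host_header


print(strip_port_like_thttpd("my.server.rox:8080"))  # my.server.rox
print(strip_port_like_thttpd("my.server.rox"))       # my.server.rox
print(strip_port_like_thttpd("[::1]:8080"))          # [
```

The last case is a side effect worth noting: cutting at the first colon also mangles a bracketed IPv6 literal, since the address itself contains colons.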

In truth, it's not precisely defined what a server should do (and modifying the header seems wrong) but whatever logic was behind this code has been lost. So we cut it. Simple.

I was also tempted to remove the other check, which looks for a '/' anywhere in the host or a leading '.', but those conditions send a 400 error back to the client rather than silently rewriting the header, so we'll leave them for now.
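For comparison, the check that stays behind can be sketched the same way. Again this is a Python rendering of the condition visible in the diff, with a hypothetical function name and sample values, not the server's actual code path.

```python
def host_is_rejected(host):
    """Mirror the remaining check: a '/' anywhere or a leading '.' is refused."""
    return "/" in host or host.startswith(".")


for sample in ("my.server.rox:8080", "../etc", "some/path", ""):
    # A rejected value gets a 400 response; anything else passes through as-is.
    print(repr(sample), host_is_rejected(sample))
```

The difference from the port stripping is that a bad value is refused outright, so the application never sees a header the server has quietly altered.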

I'd suggest that a server do nothing and that applications presume nothing. The header fields are defined as client-supplied entities; however, the decision would require a degree of investigation, as it seems sensible that the Host: be something that the server can "expect" (i.e. sanitise).

It is an interesting discussion though. What level of sanitization should a server perform on the CGI variables it passes? Of course since CGI is largely dead (in favour of WSGI), does it really matter? Or, alternatively, is it so well established in convention that theory is irrelevant?

Pretty sure the answer to that last question is, 'Yes'.


diff --git a/libhttpd.c b/libhttpd.c
index 73689be..039b7e3 100644
--- a/libhttpd.c
+++ b/libhttpd.c
@@ -2074,9 +2074,6 @@ httpd_parse_request( httpd_conn* hc )
        cp = &buf[5];
        cp += strspn( cp, " \t" );
        hc->hdrhost = cp;
-      cp = strchr( hc->hdrhost, ':' );
-      if ( cp != (char*) 0 )
-          *cp = '\0';
        if ( strchr( hc->hdrhost, '/' ) != (char*) 0 || hc->hdrhost[0] == '.' )
            httpd_send_err( hc, 400, httpd_err400title, "", httpd_err400form, "" );