It's a web server, not a task multiplexer

I spent an exciting part of today debugging a service which was failing in production, but not in any demo, UAT or dev installations. In doing so, I discovered one of those architectures which requires me to go and make a cup of tea in order to distract my hands from doing the double facepalm they so instinctively want to do.

The gist of said architecture is this. You have a web site or service, with some public resources like /puppies and /kittens and so on... and then in addition to that is a special hidden resource like /longrunningpoocleanuptask. What's that? Oh, there's a scheduled task running on a box somewhere that makes a POST to /longrunningpoocleanuptask every 10 minutes. (Or worse, a global handler decides on every request to the server whether it's time to run the task.) The handler creates a new thread to run the task, then returns a 200 OK. It's fine, we tested it in demo, and even though it sometimes doesn't work, the next time the task runs it picks up all the additional poo just fine.
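To make the shape of the thing concrete, here's a minimal sketch of that handler in Python rather than the ASP.NET original; the names and the framework-free style are mine:

```python
import threading
import time


def cleanup_task():
    # Stand-in for ten minutes' worth of poo-scooping.
    time.sleep(0.1)


def handle_post_longrunningpoocleanuptask():
    # The anti-pattern in miniature: fire the work off on a
    # background thread...
    threading.Thread(target=cleanup_task, daemon=True).start()
    # ...then immediately claim success. Nothing tracks whether the
    # task actually finishes, and the host is free to tear the
    # process down the moment the response has gone out.
    return 200, "OK"
```

The 200 OK tells the scheduled task everything went fine; whether the cleanup actually happened is anybody's guess.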

This is a crushingly stupid idea.

Your web server is designed around the concept of serving requests. A request is a short unit of work that finishes by returning a response. That's important. Not, "finishes some time after returning a response, when all the long-running asynchronous tasks it invited to the party have got bored and gone home." IIS in particular will assume anything you left on the thread pool is a mistake, and thus happily kill said threads if it needs them for serving a new request. This is why it took a production environment to uncover this bug - demo servers just don't see enough requests for IIS to be regularly picking threads out of the pool.
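You don't need an IIS installation to watch the work evaporate. The sketch below is plain Python used as an analogy: a daemon thread stands in for the orphaned request thread, and process exit stands in for the host reclaiming it once the response has gone out:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Child process: fires "work" off on a daemon thread, then exits --
# a stand-in for a worker process being recycled after the response.
child = textwrap.dedent("""
    import sys, threading, time

    def work(path):
        time.sleep(2)                      # the "long-running task"
        with open(path, "w") as f:
            f.write("done")

    threading.Thread(target=work, args=(sys.argv[1],), daemon=True).start()
    # The "handler" returns here; the process exits; the thread dies.
""")

marker = os.path.join(tempfile.mkdtemp(), "cleanup-finished")
subprocess.run([sys.executable, "-c", child, marker], check=True)
print(os.path.exists(marker))  # prints False: the work silently vanished
```

The child process reported success and exited cleanly; the marker file was never written. Swap "process exit" for "IIS reclaiming the thread" and you have the production bug.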

Phil Haack explains in a lot more detail why this is a bad idea, and I recommend you heed that advice rather than his suggestion on how to make it work. (Which, I will admit, is very useful if the pattern already exists in your project and you're stuck with it.)

So why does this thing keep cropping up? This is far from the first time I've had to rescue a web server from being forced to run long background tasks, or patch up the damage if I got to the project too late.

In my experience, it's the good old hammer problem: I have something running in a web server, therefore I shall implement all of my functionality inside said web server. Even when I find myself adding three sprints to the project to write a half-arsed event queue, and wasting a day trying to find out where an exception went, I'll convince myself that what I'm doing is "simpler" than having a small console app which slurps events from a ready-made queue or, hell, even just runs on a schedule to pick up whatever needs cleaning. (Another useful benefit: you can easily check whether an app is already running, whereas checking whether the web server is still handling your previous POST is rather more of a challenge.)

The sad thing is that I'm not even advocating anything complicated to run the odd long-running task: just boring old, well-understood technology like cron jobs, shell scripts and small console applications. If you have a decent CI/CD system then none of this should add any day-to-day deployment burden, and if you haven't, it's still only a matter of a deployment script. Let the web server get on with being a web server, and don't load it up with things it's not designed to do.
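For the scheduling half, the boring option really is one line. A hypothetical crontab entry (the paths and the poo-cleanup name are invented for illustration), with flock -n skipping a run if the previous one is still going:

```shell
# Run the cleanup every 10 minutes; flock -n means a slow run
# simply causes the next one to be skipped, not stacked up.
*/10 * * * * flock -n /var/lock/poo-cleanup.lock /usr/local/bin/poo-cleanup >> /var/log/poo-cleanup.log 2>&1
```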