I have been pretty anti-AI/LLM for a long while, particularly when it comes to LLM applications that try to handle artistic things or diminish human experiences and pursuits. A short example of where I'll draw the line: I haven't used any LLM to correct what I've written here. Enjoy some fresh, artisanal, human-only prose. However, I've been intrigued by what LLMs can achieve when facing logical or "menial" tasks that humans usually try to automate but haven't figured out good solutions for yet. I have explored using [GitHub Copilot tasks](https://github.blog/changelog/2025-10-28-a-mission-control-to-assign-steer-and-track-copilot-coding-agent-tasks/) for some tedious tasks on my open source projects, and I have found it really useful for keeping them tidy.

# Background

Let me give some context before we dive in. I love writing open source projects in my spare time. Either they are [solving bugbears I've had for years](https://github.com/phalt/clientele) or they are [long-term API projects for hobbies I have](https://github.com/PokeAPI/pokeapi). I also do a fair bit of "maker" or [home automation with software](https://github.com/phalt/calcifer) that I write myself.

I am now 36 years old (at the time of writing). I am also a time-poor parent, and I prefer to spend my spare time with my family and my friends rather than sat at a computer (because I've done that for nearly 20 years now and work has made it a bit less fun).

My most recent project that has shown some promise is [Clientele](https://github.com/phalt/clientele). It could be more successful, but I don't have the time to do all the menial, tedious work involved in maintaining it. I think Clientele is only a few more features away from being great, but like I said - I don't have the time to invest in it, and when I do get around to working on it I am faced with a huge pile of tedious things I need to resolve.

# Using LLMs to complete menial tasks

I read a lot about LLMs, and it is clear that the technology has a huge mountain to climb to meet the standards of a human; I personally don't think it'll ever get there. Actually, to take a small tangent: I personally feel we'll experience the [Jevons paradox](https://en.wikipedia.org/wiki/Jevons_paradox). Instead of LLMs taking away software engineering jobs, we'll all become more productive by using them, and as the cost of building software goes down we'll get the opposite - an explosion in software engineering jobs.

Anyway, back to menial tasks. These are the sorts of things we _have to do_ to keep a project current, but they are boring, tedious, and not worth the time - usually because they require no problem solving, just tinkering until things work. I have been [maintaining a list of things](https://github.com/phalt/clientele/issues?q=is%3Aissue%20state%3Aclosed) I want to do with Clientele, and I decided to put GitHub's Tasks to the challenge of solving as many of them as possible.

Here is the full list of things it has done for my project:

- Updated the CI pipeline to cover more versions of Python in my tests.
- Added mypy and other linting to my CI pipeline.
- Addressed some basic programming inefficiencies in my code, like string concatenation in loops and other big-O problems (see the sketch after this list).
- Swapped out Poetry for UV for managing the project's dependencies.
- Set up a small set of tools to help me publish the package to Brew.sh.
- Updated the project to support the newest versions of Python.
- Refactored a bunch of imports to use module-level imports in all files.
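To give a flavour of what I mean by those inefficiency fixes, here is a minimal, hypothetical sketch in Python. The function names are invented for illustration and are not from the Clientele codebase; it just shows the classic "string concatenation in a loop" pattern next to the join-based version an agent can mechanically swap in.

```python
# A hypothetical before-and-after of the kind of inefficiency fix I mean.
# The names are made up for illustration; they are not from Clientele.

def render_lines_slow(lines: list[str]) -> str:
    # Before: += builds a brand new string on every iteration,
    # so the total work grows quadratically with the number of lines.
    output = ""
    for line in lines:
        output += line + "\n"
    return output


def render_lines(lines: list[str]) -> str:
    # After: generate the pieces and join them once at the end.
    return "".join(line + "\n" for line in lines)
```

Both functions produce identical output; the point is that this kind of already-solved, mechanical change is exactly what I was happy to hand over.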
For all of these tasks I supervised the LLM and redirected it if it ever veered off course from the goal I gave it. I also regularly added extra instructions like "make sure you update the change log". I set it to work in the background and got on with my day. Sometimes I even used the app on my phone to direct it and review the code changes, all while I was out and about, or on the loo!

Without any direct prompting from me, the LLM figured out how to use my Makefile, run tests, and do things like linting. I guess this is a core feature of Copilot itself and not something you get from generic LLMs. At the end of each task it creates a pull request, which I review like any other. For me this means a pretty strict review, and I would give it prompts on what to do next if it had made a mistake.

All these little things were very successful, so I decided to test it a bit more - not with anything requiring _too_ much individual thought, but with stuff that was easy enough to figure out from patterns and established conventions in the language itself:

- Added a whole new feature to the client generator based on the established patterns in the codebase.
- Fixed bugs raised by other people about some poorly handled implementations against a known, defined standard (the Swagger spec).

# So why does this work?

I think I've convinced myself that this is a good use case for LLM agents. All of the things I asked it to do are tedious, time-consuming grunt work that I'd barely let a junior do, because even for a junior it is a waste of their time. None of these tasks require divine inspiration, ingenuity, or diverse thinking to make the changes. The changes are mostly well-established patterns that many projects share, so the chances of it making the wrong suggestions are lower than average. Another way to put it: the problems are already solved. Being logical coding problems, it is easy for the LLM to figure out what is correct.

The project I asked it to help with is small enough for it to gain full context, and it is also organised well enough that Copilot can figure out the patterns and ways of contributing to the code. So it does require some human curation (aka software engineering) to allow the LLM to operate well - I think a poorly organised project would be far more confusing for it. The fact that I supervised and reviewed everything it had done before merging the changes was enough for me to be confident the changes were correct.

I think this is great. It means the project is well maintained and current, and any free time I might have to dedicate to it can now be used for valuable new input rather than maintenance.

Just as a final note - I was charged around $5 USD for doing all of this over a few days. That is absolutely worth it in my opinion.

You can see all of the contributions, including the prompts I gave to Copilot, here: https://github.com/phalt/clientele/pulls?q=is%3Apr+is%3Aclosed

Thanks for reading this article. If you'd like to get in touch you can reach me through:

- [Bluesky](https://bsky.app/profile/paulwrites.software)
- [Email](mailto:[email protected])
- [GitHub](https://github.com/phalt)

Paul