Swift re-write for QuickPkg and some thoughts on LLM aided coding

I have created a new version of quickpkg; you can get it here.

If all you care about is the new version of quickpkg, just follow the link. There are also a few new features. I hope you like them.

Full disclosure: I used Claude Code to create this new version. I recently got access to Claude Code through work and I chose quickpkg as an experiment to understand where modern “agentic” coding tools are and how they fit in my workflows, coding and learning processes.

I have been and (spoiler) remain a skeptic of the modern “AI” hype and the companies whose business it is. I am not a skeptic in regards to there being useful aspects to Large Language Models (LLMs) and machine-learning based solutions in general. For example, I have been living in countries where the main language is not my first for more than twenty years, and the recent progress of translation software, whether text, visual, or audio based, has massively simplified that experience.

I have been trying out various LLM based tools over the past years. I always got frustrated very quickly. I was told a lot that I was “holding them wrong,” but the frustration always seemed to outgrow the benefit in short order. None of the upsides outweighed my concerns about the social, economic, ecological, and ethical impact of the tech. (More on that later.) Certainly not enough to purchase any of the subscriptions which would give me access to the better models, which would be so much better, I was repeatedly told.

I have always believed that I should know and understand the things I criticize, so it was time for an experiment.

Why quickpkg?

This seemed like the perfect experimental project to me. quickpkg addresses a very specific problem that I happen to know quite a bit about. It is simple, but not trivially so. And it is a command line tool, which is far less complex than a tool with a graphical interface.

quickpkg was originally written in Python 2 and when the demise of that version of Python was evident, I put in minimal effort to make it work with Python 3. Re-building it with Swift to remove that dependency had been on my to-do list for a long time, but it never made it high enough on my priority list.

Converting code from one programming language to another is tedious for humans (part of the reason I procrastinated on this) but something that coding assistants are supposedly very good at. On the other hand, building macOS installer packages is something that is woefully under-documented, so I expected a bit of a struggle there.

How it went: the translation

To prepare the project, I created a new branch on the existing repository and created a template Swift Package Manager folder structure for a ‘tool’ (a command line executable using the swift-argument-parser library). I set the Swift language version in the Package.swift to 6.0, expecting/hoping that this would make it use the latest Swift concurrency. I told the agent that I wanted to translate the Python code to Swift using swift-argument-parser and the new swift-subprocess package.

The agent went off for a few minutes to analyze the existing project, created a Claude.md file with its findings, and presented me with a plan for how it would split the functionality contained in a single Python file into various Swift files. The plan looked reasonable to me, so I told it to go ahead, and it started its work. I could watch the code it generated, and it asked for a few confirmations.

I had to interrupt it at this point, since it apparently had no idea about the swiftlang/swift-subprocess package I had asked it to use and kept choosing either an older, long-unmaintained subprocess repo hosted on the Apple GitHub or one from Jamf, which uses Foundation.Process for running shell commands. Then the agent even preferred building its own functions (also with Foundation.Process) over using the subprocess package I wanted. I had to explicitly add the swiftlang subprocess repo to the Package.swift myself and reference its documentation before the agent consistently used it over the alternatives.
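For reference, the manifest changes amount to only a few lines. This is a sketch of what such a Package.swift can look like; the version numbers, branch, and platform requirement are illustrative, not the exact values from my project:

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "quickpkg",
    // Illustrative minimum platform; swift-subprocess may require a newer macOS.
    platforms: [.macOS(.v13)],
    dependencies: [
        .package(url: "https://github.com/apple/swift-argument-parser", from: "1.5.0"),
        // The new swiftlang subprocess package, not the older Apple or Jamf repos:
        .package(url: "https://github.com/swiftlang/swift-subprocess", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "quickpkg",
            dependencies: [
                .product(name: "ArgumentParser", package: "swift-argument-parser"),
                .product(name: "Subprocess", package: "swift-subprocess"),
            ]
        )
    ]
)
```

With the dependency pinned explicitly like this, the agent had a concrete package to resolve against instead of guessing at similarly named repositories.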

Once I had overcome that problem, the rest of the translation went fairly smoothly. It took maybe 10-15 minutes, which is obviously far faster than I could have done it.

Towards the end of that process, I could watch the agent repeatedly compiling the command line tool and fixing errors that occurred. This seemed a very human approach to me. When the compile succeeded, it started running the command line tool with a local app to test if it actually did something. The only outcome it tested for was whether a pkg file with the expected name existed, not whether it was a valid installer pkg file. It’s a good start, but there are obviously more things that would need to be tested.

It even ran the correct security command to determine a Developer ID certificate to test the --sign option. Then I realized I had documented that command in the ReadMe file for the Python tool, which gave me insight into where it had gotten the information.

The local application the agent chose to re-package was /System/Applications/Calculator.app, which is a poor example for many reasons, but works for generating the pkg file. The resulting pkg file is useless because that folder is part of the signed system volume. I wondered for a moment whether it had picked that up from the ReadMe, too, but I had used /Applications/Numbers.app in those examples. I had Numbers.app installed on the machine I was running this on, so why it didn’t use that information from the documentation remains a mystery.

Once the agent told me it was ready, I did some more detailed tests, trying a few more input file types and several combinations of options. Since one of the main use cases I use quickpkg for is re-packaging Xcode, which is also the only real-world example of an app delivered in a xip archive, this took a while, even on an M4 MacBook Pro. Overall, about 90 minutes after giving the first set of instructions to Claude, I determined that the translation had worked.

Success?

Remember that Claude had a working Python script to start out with. Nevertheless, aside from getting Claude to accept the (admittedly quite new) subprocess repository, this was a smooth process. I could and probably should have written up a list of commands and sample apps to use for testing, and Claude would have run those for me as well, saving some time in between, as I invariably got distracted while larger packages built.

At this point, I could have stopped and called it a success. The code works. I can’t tell for sure how long the translation would have taken me manually (more on that later) but I am certain that I wouldn’t be able to do it in 90 minutes, let alone 15.

So, huge gain in efficiency, right?

Technical and cognitive debt

When I mentor people on scripting and coding, I always stress that “working” is the most fundamental success criterion, and everyone should be proud when they achieve it.

However, passing “it works” is only the first step along the way. If you plan to support, maintain, and possibly build on the code going forward, you need to take the time to clean up, refactor, and document the code. Especially if you are planning to share the code.

Since the tool was working, I really wanted to publish and share it on my GitHub. But that means that I will be responsible for supporting the tool and the code going forward. Regardless of how the code for the tool was created, it is now my responsibility. So, I have the obligation to review and understand the code. This is another reason I chose a small project with a limited scope: I anticipated that I wouldn’t have the time and energy to review and understand the code for a larger project that an agent could have generated in a fairly short time.

I actually started with the code review while I was testing whether the package build process was working as it should. As I said, some of those packages take a long time to build. Unfortunately, I started editing the generated code immediately, without creating a commit in the repo. I regret this now as I cannot link to these first changes.

Most of the code was good. There were a few cases of code repetition, as if a lazy programmer had copy/pasted certain code instead of abstracting it into a function or method. I have certainly been guilty of this a lot. But this is exactly what the “clean up” phase of a project is for.

There was one big four-way if-then clause in the ShellExecutor type that was partially redundant. It checked for a nil value on workingDirectory and used two different calls to Subprocess.run, even though that function already takes an optional value. Then it did the same check for input, resulting in a big, unwieldy if-then clause with four calls to Subprocess.run that were only slightly different. Not wrong: the code did the right thing, but it was very hard to read.
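Since the function accepts an optional directly, the four branches can collapse into a single call. A minimal sketch of the idea; the parameter names here are my approximation of the swift-subprocess API, not verified signatures:

```swift
import Subprocess

// Sketch only: pass the optional straight through instead of branching on nil.
// Parameter names approximate the swift-subprocess API; treat them as illustrative.
func runTool(
    _ tool: String,
    arguments: [String],
    workingDirectory: FilePath? = nil
) async throws {
    _ = try await run(
        .name(tool),
        arguments: Arguments(arguments),
        workingDirectory: workingDirectory  // nil is handled by the API itself
    )
}
```

The same applies to the input parameter: when the API already models “absent” with an optional, checking for nil at the call site only duplicates what the library does anyway.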

I actually think the entire ShellExecutor type is redundant, a holdover from the very many projects that use Process to run shell commands and need a wrapper type. At that moment, I was happy to fix only the most egregious issues. (I have since refactored and removed the ShellExecutor type for the 2.0.1 release.)

Again, the code was working before. This is cleanup and refactoring to make the code more readable and understandable. I strongly believe more readable, clean code is easier to understand, maintain, and extend at a later time. I value putting in this extra effort, whether I have written the code myself, or get it from somewhere else. This process also forces me to understand the code, not just read over it and nod and feel “that’s good.”

Until this point, I was mostly editing the code myself. The connection from thinking about a code change to making it myself in the editor is a long-trained habit for me. But then I remembered that I could tell Claude to do the refactoring. This worked surprisingly well. However, for small code changes, it felt slower and more complicated to phrase the change in ‘normal’ English, rather than just applying the change myself.

For example, I told the agent to create an extension on URL to wrap isFileURL and FileManager.default.fileExists(atPath:) to make all the checks for whether a file exists more readable. It did that and replaced all the uses of the less readable FileManager.default.fileExists(atPath:) method. But I needed three attempts to phrase the request correctly, and I feel I would have been faster just writing the extension myself and using find and replace.
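The extension itself amounts to just a few lines. A sketch of what such a wrapper can look like; the property name is illustrative, and the generated code may have differed:

```swift
import Foundation

extension URL {
    /// True when this is a file URL and an item actually exists at its path.
    /// (Property name is illustrative, not necessarily what the agent generated.)
    var fileExists: Bool {
        isFileURL && FileManager.default.fileExists(atPath: path)
    }
}
```

After this, a check like `FileManager.default.fileExists(atPath: appURL.path)` becomes `appURL.fileExists`, which reads much better at the many call sites.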

The run() function the agent originally generated was very long (again, something I have been guilty of a lot), so I asked it to refactor the code into smaller functions to make it more readable. The result was quite good, but I needed to review these changes again to understand them and to be sure the code and functionality remained the same, and I feel that took at least as much time as doing it myself.

After a bit of refactoring and cleanup, I felt I understood the code that was generated. There was more cleanup to be done, which I put in the 2.0.1 update. But I was itching to add a few features that I wanted an updated version of quickpkg in 2026 to have.

  • quarantine flags are removed from the payload before packaging
  • minimum OS version is picked up from the app bundle and applied to the pkg
  • pkgbuild’s compression option is set to latest, with a command line option to revert to legacy
  • quickpkg now builds distribution packages/product archives by default

These weren’t complicated additions, and the agent handled them just fine. I really appreciated that it often (but not always) would update the ReadMe file to match the new options. The inconsistency was a bit frustrating.
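As one example of how small these additions are, picking up the minimum OS version comes down to reading the standard LSMinimumSystemVersion key from the app bundle’s Info.plist and handing the value to pkgbuild’s --min-os-version option. This helper is my sketch, not the generated code:

```swift
import Foundation

// Sketch (not the generated code): read the minimum macOS version from an
// app bundle so it can be passed on to pkgbuild's --min-os-version option.
func minimumOSVersion(forAppAt appURL: URL) -> String? {
    // LSMinimumSystemVersion is the standard Info.plist key for this value.
    Bundle(url: appURL)?.object(forInfoDictionaryKey: "LSMinimumSystemVersion") as? String
}
```

When the key is missing, the function returns nil and the option is simply omitted from the pkgbuild invocation.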

Packaging the tool

I did try to use Claude to build a script which compiles, packages, and notarizes the command line tool, which quickly turned into a frustrating experience. If the LLM could feel frustration, I am sure it would have been mutual. Building, signing, and notarizing are famously under-documented tasks, even though my articles on the subject have been around for a while.

I gave up on that and copied the pkgAndNotarize script from another project. I couldn’t let it be and asked Claude for suggestions on how to improve that script and it suggested checking whether the signature certificates and keychain profile entries actually existed, which I thought was a good idea.

However, it confabulated a notarytool store-credentials --list command to determine whether the keychain entry exists, and I didn’t catch that until later, when I actually tried to build the final pkg. That should teach me not to trust the LLM at the edge of its competence.

Efficiency?

Compared to my earlier experiments with LLMs for coding, I was surprised how far the ‘agentic coding models’ have come. You cannot argue that they are completely useless anymore.

Translating working code from one language to another is an easier task than generating code from scratch, but still: the fifteen minutes or so it took to generate a working Swift version is impressively fast.

Human developers are generally quite bad at judging how long a task will take. They are also very bad at judging how long a task would have taken them with or without LLM support, compared to how they actually did it. There is research supporting this claim.

So, take my estimates with a grain of salt, but I estimated (before I started on the Claude project) that re-writing quickpkg and adding the new features would take me four to eight hours.

Now that I have seen and reviewed the generated code, I could re-create that much faster than my original estimate. Had I done the translation by hand before I put the agent to that task, the prompts would have been different and my review of the generated code would have been faster, because I would have had an idea of what to expect. So, either way, there is no fair control test.

Fifteen minutes compared to four to eight hours. I can see how someone might get excited at this point, call it a day and claim a huge efficiency gain.

There is a word for trusting the output of a coding agent without testing and verification: “vibe coding.” I consider it a horrendous lack of standards.

It took me more than an hour to verify that the generated code was actually doing what it was supposed to. I consider this really important since it generates package installers that install files on potentially thousands of devices. I might have been able to save some time by giving the agent more detailed instructions on how to test. Automating tests is good. But it wouldn’t have been much faster and defining the tests would have taken quite some time, as well. Re-packaging Xcode simply takes a long time and is an essential test. Also, I would still have had to verify that the agent was performing and evaluating the tests properly.

Then it took me another three to four hours to understand, review, and clean up the code.

I would have had to test, review, and clean up the code if I had done the translation myself, but much of that would have happened during the re-writing, so it is part of my original estimate. And, of course, I understand and trust code I wrote myself much better than code I get from elsewhere.

I do not dare to declare my code as always perfect, but neither is LLM generated code, so that’s a fair comparison. When I have to debug issues in the future though, I will be faster understanding the issue when it is my own code, or when I invested the time to review, understand, and clean up the code.

In the end, we have five and a half hours of time spent with Claude versus the four-to-eight-hour estimate without. Much less exciting.

There’s a lot of discussion that could be had here. How good is my estimate? Would I be more efficient with an agent if I spent more time learning the tool and how to write proper prompts? Will future models or agents be much better? Is it necessary to review and understand the generated code, as long as it works?

A comparison

Indulge me for a moment. I will get back to the topic.

For my lunch break, I usually go for a walk. There is a shopping area nearby, with a supermarket and a bakery, so I usually pick up some groceries. Depending on how much time I have available, I walk either a 2km, 3km or 5km loop. This gets me out of the house for some scenery, sun (weather permitting), and fresh air, provides some exercise, allows—no, forces—me to disconnect for a while from whatever I am doing at the desk and screens. It keeps the groceries stocked and I also get something nice from the bakery for lunch.

I could go get the groceries with the car. It’d be faster, so if time spent is your metric, it would be “more efficient.”

Yet I have no desire at all to replace my walk with a car trip. Less time is not what I value for my lunch break.

A car trip would have several downsides. Instead of a relaxing walk through parks and backstreets, I’d have to focus on the road and traffic, bikes, and pedestrians while driving and looking for a parking spot. I wouldn’t get the exercise, little as it is. I wouldn’t get a mental break, which I know will reduce my focus and productivity in the afternoon and evening. I couldn’t enjoy the sun. (Or rain, as it may be.) A car trip would also use far more energy and be more of a burden on the environment.

If I really wanted to optimize my grocery shopping for time spent, I could go to the big supermarket once per week and not leave the house at all during the week.

It’s not that taking a walk or the car, or going to the big store once a week are “better” or “worse” solutions. Each is an optimization for a different goal. Each has a different metric, different values that it is more optimal for.

quickpkg is a simple project. This was an intentional choice for this experiment, since I didn’t want to spend too much time on it. The quickpkg rewrite was also the first time I used the new Subprocess package in one of my projects, so one of my goals was to learn how that worked. Had I let the agent use the old Process way of launching shell commands that it wanted to use initially, or had I not reviewed and cleaned up the generated code afterwards, I would have learned nothing about the new Subprocess package.

There are other code projects I am currently working on, which are far more complex than quickpkg. Yet I feel no desire to use the assistance of an agent on these projects. For these projects, my main goal is to have full ownership and understanding of the code and their workflows. I am learning a lot about how I can control aspects of the system with Swift code and the macOS native frameworks. A lot of this is new to me, or I am re-visiting things that I thought I knew from a different perspective and challenging my knowledge.

Obviously, a result has to be delivered eventually, but gathering knowledge about the system and how to code these particular problems, and exploring the limits of what is possible and, more importantly, what is not possible, has been a goal of these projects from the very beginning. In the course of this project, we have already found some limitations we hadn’t anticipated, but also found solutions we had thought completely out of reach when we started planning.

If I didn’t challenge myself to explore the possibilities and craft the code and design the workflows, I believe the project would be far less useful than it is right now. I also believe I will be better at my profession and at implementing future projects because of these experiences.

Keep in mind that, as consultants in the Mac management and system administration space, we live very much on the edge of what is commonly documented. Since LLMs work on probabilistic data from large data sets, they get worse when there is less documentation. I could tell that Claude was fairly solid with common tasks, such as building a command line tool and refactoring code, but started confabulating with pkgbuild and notarytool. When your project is within well-documented domains, you will have better results.

This is also the reason I don’t use LLMs for writing. For me, the process of writing is a fundamental part of sorting out, challenging, and clarifying the half formed ideas contained in my head. I also generally enjoy the process, or at least gain satisfaction from the finished text. I would not and could not ask another person to do this process for me. How could I ask a machine? Why would I ask a machine?

Why would I take a car trip for my lunch break?

The upside

However, I will admit that I have used the built-in Xcode LLM functionality on a few occasions and found it helpful.

The first situation was a gnarly SwiftUI layout problem that I couldn’t find a solution for on the web. When I asked the Xcode 26 ChatGPT integration, it built a solution that worked, even though it seemed quite elaborate. Just last week, I found a weird crash that would happen when the window was resized a certain way and I couldn’t understand why. I fed the crash log back into the ChatGPT assistant and it pointed to a recursion generated by the interaction of the generated layout code and a seemingly unrelated, different view object. The suggestions to fix the issue from the assistant turned out to be dead ends, but it would have taken me much longer to identify the problem without the agent’s analysis. (I was able to remove the problem by reviewing, refactoring and simplifying the code. At least I hope so…)

When you have ‘Coding Intelligence’ enabled in Xcode, there will be a “Generate Fix for this Issue” button next to the error, and those can be very helpful for explaining obscure compiler errors. SwiftUI certainly generates a few of those. Even though I rarely use the suggested fixes, the explanations of the issues are usually very helpful.

I believe it says more about the sad state of modern IDEs, systems, and frameworks than about the supposed “intelligence” of the model when you need a large language model built with thousands of GPUs and hundreds of billions of tokens to understand a crash log or compiler error. But I will admit that it has saved me a ton of time and frustration.

Should we focus on improving the frameworks, logs, and developer environments, rather than building monstrous data centers? Well, I guess that depends, like my lunch break walk, on what you are optimizing for…

Conclusion

I have been talking about efficiency and how we measure it, or don’t. I have not addressed all the other externalities that concern me with regards to LLMs and the general AI business these days.

My example illustrates that different solutions can be “best” when you value different outcomes. I think a lot of the discussion around coding agents and LLM help in general is based on a mismatch of values.

You may care more about “immediate time spent,” with no concern for future ramifications and the time you may have to spend later on improving the code. Technical and cognitive debt may not be part of your metrics. (They are difficult to measure.) You may not value the habit of building a tool as a means to learn about a particular topic. You may not care about the exploitative practices of the AI industry, which gathered and stole source material from wherever it could with no regard to ownership and licensing and now wants to re-sell the digested slop back to us. You may not care about the unintentional—or sometimes fully intentional—political, ethnic, sexist, and countless other biases in the data models. You may not care about the impact on your personal learning and growth, and on education in general. You may not care how the next generation of experts is supposed to build their experience. You may not care about the ecological impact of the industry and the massive data centers it is planning to build. You may not care about the skewed and possibly fraudulent economics, as the infusion of absolutely insane amounts of venture capital is papering over the actual costs. You may be starting to care about the secondary economic impacts of the bubble, as prices for RAM and other components are skyrocketing.

You may disagree on some, or even all, of these points, which will change your evaluation of this technology.

The benefits you gain from this technology also depend very much on what you are using it for. The more data about a certain topic the LLM has ingested, the better its recommendations will be. When you ask it for code to build web solutions and related automations, the recommendations will be much better than when you ask it about building package installers for macOS, since there are orders of magnitude more data for the former than the latter.

The agent was very prone to inventing options for pkgbuild, productbuild and notarytool, even after I had instructed it to consider the man pages. This is a very important warning for people using agents to write automations in the Mac Admins space. Also, for the same reason, LLMs are “weak” on recent developments, so you may get code that would have worked fine five years ago, but doesn’t take modern changes to macOS and Apple platform deployment into account.

I am glad I did this experiment. For the first time, working with the agent felt really useful. I am not sure I would have ever overcome the writer’s block inherent in the tedious process of translating code. Using the agent to overcome that block was freeing. I experienced the wonder of a fascinating new technology. I can see how that can overshadow the concerns.

I believe the technology has merit. There is undoubtedly a usefulness to it. But in its current form, I think it is irresponsible to focus solely on the technical features and ignore all the other negative side effects. The benefits, when put under scrutiny, are much smaller than they initially appear.

I have to hope that society will eventually find a way to build and use these tools in an effective, ethical, and responsible way. I don’t believe this is the case today. I don’t think the benefits outweigh the downsides. For now, I will continue to stay away.

Published by

ab

Mac Admin, Consultant, and Author
