Save up to 25% pkg file size with this weird Monterey trick!

macOS 12 Monterey brings with it a lot of new features, both for admins and users. You will probably be busy learning and experimenting with them right now, unless you already did that during the beta phase.

But, as usual, there are also many undocumented new changes and features hidden away in the update. I discovered one in the pkgbuild man page. pkgbuild, as the name implies, is used by developers and admins to build installer packages, or pkg files. Even when you are using a tool like munki-pkg or AutoPkg, it is probably using pkgbuild to assemble the installer package.

If you want to learn more about pkgbuild and creating installer packages, read my book: “Packaging for Apple Administrators.”

Large Payloads and Minimum Versions

“Discovered” is a strong word here. I stumbled over this as I was looking for a different new option for pkgbuild in Monterey. In a conversation with the ever-awesome Duncan McCracken, he mentioned that the tool had gained a new option, --large-payload, which allows individual files in the payload to be larger than 8GB.

This is a significant change to the pkg file format, so pkg installers created with this option will not work on systems older than 12.0. “This option requires the user to pass --min-os-version 12.0 or later to acknowledge this requirement.” (quote from the man page)

This comment led me to look for the description of the --min-os-version option and right between --large-payload and --min-os-version I stumbled over --compression.

There is no indication that these are new options in the man page. There is also no mention of any of these new options in the Developer Release Notes or the AppleSeed for IT release notes.

Payload Compression

The description for the --compression option reads:

--compression compression-mode
Allows control over the compression used for the package. This option does not affect the compression used for plugins or scripts. Not specifying this option will leave the chosen compression algorithm up to the operating system. Two compression-mode arguments are supported:

• legacy forces a 10.5-compatible compression algorithm for the package.

• latest enables pkgbuild to automatically select newer, more efficient compression algorithms based on what is provided to [--min-os-version <version>].

With this new option, in combination with the --min-os-version option, we can influence the compression algorithm used for the payload inside the pkg file. Other than that, we are left in the dark. What kind of compression algorithms? And which minimum macOS versions use which compression algorithms?

The man page is silent on this, so we need to experiment!

Lots of pkgs

After some less organized experimentation, I put together this one-liner:

for x in 10.{5..15} 11 12; do caffeinate time pkgbuild --component /Applications/Numbers.app --min-os-version $x --compression latest Numbers-min$x.pkg; done 

This is very, well, compressed, so I will explain it in steps:

Brace expansion

for x in 10.{5..15} 11 12; do

If I used for x in 10 11 12; do the shell would loop through the list using the values 10, 11, and 12. However, I also need the eleven versions that start with 10., so I use ‘brace expansion’: 10.{5..15} will expand to 10.5, 10.6, 10.7, … until 10.15.

Brace expansion is rarely used but a very useful shell feature.

This for loop will loop through 10.5 through 10.15, and then 11 and 12, as well.
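To check what the brace expansion produces before handing it to pkgbuild, you can print it first, or print the commands without running them. This is a minimal sketch: the function names os_versions and dry_run are hypothetical, and the {5..15} range expansion works in zsh and bash, but not in a strict POSIX sh.

```shell
#!/bin/bash
# Hypothetical helper: print the --min-os-version values, one per line.
# {5..15} is zsh/bash brace expansion, not POSIX sh.
os_versions() {
  printf '%s\n' 10.{5..15} 11 12
}

# Hypothetical dry run: echo the pkgbuild commands from the one-liner
# instead of executing them, so nothing is built.
dry_run() {
  local x
  for x in $(os_versions); do
    echo pkgbuild --component /Applications/Numbers.app \
      --min-os-version "$x" --compression latest "Numbers-min$x.pkg"
  done
}
```

Running dry_run lists the 13 commands; removing the echo turns it back into the real loop.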

caffeinate

I did not want the MacBook I was testing on to fall asleep during the test. The caffeinate command prevents a Mac from sleeping. Most people use it as a standalone command, where it prevents the Mac from sleeping indefinitely. Maybe you have used caffeinate -t 3600 to prevent sleep for a certain time.

But you can also use caffeinate together with a second command, and then caffeinate will prevent sleep for as long as the second command is running. For example:

caffeinate system_profiler

will prevent the Mac from falling asleep while system_profiler does its thing, which always seems like it takes ages.

The default mode of caffeinate will only prevent system sleep. The display may still dim, sleep, or even lock. If you want to prevent that as well, use caffeinate -di.
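If a build script is shared between Macs and other systems (a CI runner, for example), a small wrapper can use caffeinate when it is available and otherwise just run the command. This is a sketch; the function name keep_awake is hypothetical. The -i flag asks caffeinate to prevent idle system sleep, which is also what it does when no flag is given; add -d if the display should stay on, too.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command while preventing idle sleep.
# caffeinate only exists on macOS, so fall back to running the
# command directly when it is not found.
keep_awake() {
  if command -v caffeinate >/dev/null 2>&1; then
    # the sleep assertion is held only while "$@" is running
    caffeinate -i "$@"
  else
    "$@"
  fi
}
```

Usage would look like keep_awake pkgbuild --component /Applications/Numbers.app Numbers.pkg.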

time

I was interested in the duration each pkgbuild run would take. The time command will give you that information. For example, when you run time system_profiler, the system_profiler command will run, showing all its output, but the time command will add this at the end:

system_profiler  11.64s user 7.35s system 53% cpu 35.493 total

The last number, labeled ‘total’, is the time that elapsed on the clock from the start to the end of the process (other tools call this the ‘real’ time). The ‘user’ value is also interesting, as it gives the CPU time the process itself spent actively running (rather than waiting for other processes). Confusingly, the user time may be larger than the real time. This means the process was running on multiple CPU cores at once.

This is the output format of the time keyword built into zsh. In the one-liner above, time is an argument to caffeinate, so the standalone /usr/bin/time tool runs instead, and it prints the values labeled ‘real’, ‘user’, and ‘sys’, in that order.
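Dividing the user time by the real time gives a rough indication of how many CPU cores a process kept busy on average. This sketch parses a line in the format the standalone /usr/bin/time tool prints (for example 44.87 real 207.40 user 8.70 sys). The function name core_ratio is hypothetical, and it ignores the sys time.

```shell
#!/bin/bash
# Hypothetical helper: read a /usr/bin/time result line such as
#   "44.87 real 207.40 user 8.70 sys"
# from stdin and print user/real, rounded to one decimal.
# Values well above 1 mean the work was spread over several cores.
core_ratio() {
  awk '$2 == "real" && $4 == "user" && $1 > 0 { printf "%.1f\n", $3 / $1 }'
}
```

Applied to the two productbuild runs later in this post, it yields about 0.8 for the default run and 4.6 for the run with --component-compression auto.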

pkgbuild

And then we have the actual pkgbuild command using the --compression and the --min-os-version to build a pkg installer from the Numbers application on my system. I chose Numbers.app because it is fairly large (589MB) so the compression algorithm has something to do.

Results

I ran this on a MacBook Air M1 with 8GB of RAM. While pkgbuild was doing its work, I kept doing other tasks on the Mac, so the system was experiencing memory pressure as the commands ran. You can see that the run times vary by a few seconds. This test wasn’t very accurate, but, as you will see, the results are clear enough to draw conclusions.

After running the above command, I had 13 pkg files, and I created a chart with the output from the time command and the pkg file sizes.
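To chart the file sizes, you first need them in a table. As a sketch, the hypothetical function pkg_sizes prints a CSV row with the size in bytes for every pkg in a directory, which a spreadsheet can import. It uses wc -c instead of the macOS-specific stat -f %z so it runs on other systems as well.

```shell
#!/bin/bash
# Hypothetical helper: print "file,bytes" CSV rows for each pkg in a
# directory (default: the current directory).
pkg_sizes() {
  local dir="${1:-.}"
  local f
  echo "file,bytes"
  # In zsh, write "$dir"/*.pkg(N) so an empty directory is not an error.
  for f in "$dir"/*.pkg; do
    [ -e "$f" ] || continue   # bash: no match leaves the pattern as-is
    printf '%s,%s\n' "$(basename "$f")" "$(wc -c < "$f" | tr -d ' ')"
  done
}
```

For example, pkg_sizes > sizes.csv in the folder with the Numbers-min*.pkg files.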

When you chart the resulting file size against the --min-os-version, you see a distinct change at 10.10:

The file sizes vary by a few bytes, but I presume that stems from different timestamps and a different minimum OS version set in the metadata of each package.

The same chart with the time required to create the pkg file:

When you use a --min-os-version value of 10.10 or higher, the file size drops by about 24% but the creation time (the ‘real’ value) nearly doubles. The ‘user’ time value increases nearly ten-fold and you may wonder how the user time can be so much higher than the ‘real’ time elapsed. The explanation is that the legacy compression algorithm uses only a single core, while the 10.10+ compression algorithm uses all available cores.
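The percentage is the size difference divided by the original size. A minimal sketch, using the hypothetical function name size_reduction, which takes the legacy size and the new size in bytes:

```shell
#!/bin/bash
# Hypothetical helper: percentage saved going from size $1 (legacy
# compression) to size $2 (new compression), rounded to one decimal.
size_reduction() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old <= 0) exit 1   # avoid division by zero
    printf "%.1f\n", (old - new) * 100 / old
  }'
}
```

For example, size_reduction 1000 760 prints 24.0.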

From 10.10 (Yosemite) on, the numbers stay fairly constant, all the way to 12.0, so my assumption is the compression algorithm stays the same.

Compression Algorithm

So which compression algorithms are actually used?

To figure this out, I first wanted to know if the file type of the pkg file itself changed:

> file *.pkg    
Numbers-min10.5.pkg:   xar archive compressed TOC: 709, SHA-1 checksum, contains zlib compressed data
...
Numbers-min11.pkg:     xar archive compressed TOC: 706, SHA-1 checksum, contains zlib compressed data
Numbers-min12.pkg:     xar archive compressed TOC: 708, SHA-1 checksum, contains zlib compressed data
Numbers-minLegacy.pkg: xar archive compressed TOC: 708, SHA-1 checksum, contains zlib compressed data

As you can see the format of the wrapping archive of the pkg installer remains the same. This is probably necessary so that old versions of macOS can read the metadata inside the pkg.

But inside the pkg is another compressed archive. You can see this when you run

> pkgutil --expand Numbers-min10.5.pkg 10.5Pkg
> ls 10.5Pkg
Bom         PackageInfo Payload
> file 10.5Pkg/Payload    
10.5Pkg/Payload: gzip compressed data, from Unix, original size modulo 2^32 594074112

Up to and including 10.9 the Payload is a gzip archive. From 10.10 upward the file command returns only data:

> file 10.10Pkg/Payload 
10.10Pkg/Payload: data
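Like file, you can tell the two payload types apart by their first bytes: gzip data always starts with the bytes 1f 8b. The sketch below (the function name payload_kind is hypothetical) only checks for that gzip signature. It does not verify that a non-gzip payload really is an Apple Archive, so read ‘other’ as ‘try aa’.

```shell
#!/bin/bash
# Hypothetical helper: print "gzip" if the file starts with the gzip
# magic bytes 1f 8b, otherwise "other" (on Big Sur or later, a
# candidate for: aa list -i <file>).
payload_kind() {
  local magic
  magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
  if [ "$magic" = "1f8b" ]; then
    echo gzip
  else
    echo other
  fi
}
```

Given the file output above, payload_kind 10.5Pkg/Payload should print gzip and payload_kind 10.10Pkg/Payload should print other.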

Making an educated guess, I tried to list the contents of the 10.10 payload with the aa command, which reads and writes the poorly documented ‘Apple Archive’ format:

> aa list -i 10.10Pkg/Payload    
.
./Numbers.app
./Numbers.app/Contents
./Numbers.app/Contents/_CodeSignature
./Numbers.app/Contents/_CodeSignature/CodeResources
...

(Note: the aa command is available on macOS Big Sur and higher.)

Expanding

When I saw how much more compute intensive the compression was, I was a bit concerned the decompression might be compute intensive on old hardware. To see if that would be a problem I used my 2012 Mac mini (the server model) running Catalina 10.15.7.

Catalina does not have the aa command line tool (it was added in Big Sur), but pkgutil has an undocumented --expand-full option, which expands both the pkg and the payload. So, I used that as a comparison:

> time pkgutil --expand-full Numbers-min10.10.pkg 10.10ExpandFull/

The M1 MacBook Air took 8.97 seconds for this operation; the 2012 Mac mini took 15.46 seconds. While that is slower, it is not a dramatic difference.

For comparison, expanding a legacy compressed file took 1.76 seconds (M1 MacBook Air) and 3.98 seconds (2012 Mac mini). Decompression is roughly four to five times slower with the new compression.

But what about productbuild?

For admins, component pkgs built with pkgbuild are sufficient for most tasks, but sometimes a distribution package is required. Developers generally prefer to build distribution packages.

For details on the differences of the package types and when you need which type, watch my MacDevOps YVR 2021 presentation: The Encyclopedia of Packages

The productbuild command line tool builds distribution packages. Since distribution pkgs can be far more complex than component packages, this tool has many more options. (Read my Packaging book for details.) But it has a similar mode to quickly build a distribution pkg from an app bundle:

> caffeinate time productbuild --component /Applications/Numbers.app NumbersDistDefault.pkg
productbuild: Adding component at /Applications/Numbers.app
productbuild: Inferred install-location of /Applications
productbuild: Wrote product to NumbersDistDefault.pkg
productbuild: Supported OS versions: [Min: 11.0, Before: None]
       28.85 real        22.43 user         3.42 sys

From the timing, we can guess that it is creating a legacy compressed payload.

When we dig into the man page for productbuild on Monterey, we find a --component-compression option, which sounds promising. It accepts three values: legacy, auto, and default. The man page states that default behaves the same as legacy but that this “may change in future releases of OS X.”

> caffeinate time productbuild --component /Applications/Numbers.app --component-compression auto NumbersDist.pkg
productbuild: Adding component at /Applications/Numbers.app
productbuild: Inferred install-location of /Applications
productbuild: Wrote product to NumbersDist.pkg
productbuild: Supported OS versions: [Min: 11.0, Before: None]
       44.87 real       207.40 user         8.70 sys

In this case the time suggests this uses the Apple Archive compression. But we didn’t have to provide a minimum OS version. The trick here is in the last output line of productbuild. There we see that productbuild automatically determined a minimum OS version from the application bundle. It reads the LSMinimumSystemVersion key from the app bundle’s Info.plist for this.

This is even more flexible than generically setting a min OS version of 10.10.

However, this will only work with the --component option of productbuild. Usually you have to build the components individually with pkgbuild and combine them with productbuild, and in that workflow you will have to provide the minimum OS version for each component, or determine it dynamically from the source, which is even more flexible.
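Determining the minimum OS version dynamically could look roughly like this sketch. It reads LSMinimumSystemVersion from the app’s Info.plist with PlistBuddy (macOS only) and chooses --compression latest when that version is 10.10 or later, following the results above. All function names are hypothetical, and apps without an LSMinimumSystemVersion key are not handled specially.

```shell
#!/bin/bash
# Hypothetical: read LSMinimumSystemVersion from an app bundle (macOS only).
app_min_os() {
  /usr/libexec/PlistBuddy -c 'Print :LSMinimumSystemVersion' \
    "$1/Contents/Info.plist"
}

# Succeeds if version $1 (e.g. "10.9", "10.15.7", "11") is 10.10 or later.
is_10_10_or_later() {
  local major="${1%%.*}" minor="${1#*.}"
  [ "$minor" = "$1" ] && minor=0   # no dot, e.g. "11"
  minor="${minor%%.*}"             # drop patch level: "15.7" -> "15"
  [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 10 ]; }
}

# Print pkgbuild compression arguments for minimum OS version $1.
compression_args() {
  if is_10_10_or_later "$1"; then
    printf '%s\n' "--compression latest --min-os-version $1"
  else
    printf '%s\n' "--compression legacy"
  fi
}
```

On Monterey, a build step could then run pkgbuild --component "$app" $(compression_args "$(app_min_os "$app")") out.pkg; the unquoted command substitution is split into separate arguments in both bash and zsh. The explicit legacy branch is a design choice: according to the man page, latest already picks the algorithm from --min-os-version, so the branch mainly makes the decision visible in build logs.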

Conclusion

We have learned that when you use --compression latest with a --min-os-version of 10.10 or higher, pkgbuild uses Apple Archive compression for the payload, leading to smaller pkg file sizes. I did a few more tests with some other apps and the file size improvements were between 20% and 25%.

When I set out to explore this, I did not expect a new compression algorithm to be available, except maybe in the latest macOS releases. That would have meant admins (who usually need to support at least two or three versions back) would have had to wait a few years before they could use a new compression algorithm, unless they were pushing the bleeding edge. However, a minimum macOS version of 10.10 means that a large majority of Apple admins should be able to use this.

Most of the software deployed will have higher system requirements than OS X Yosemite. The minimum OS version for the package should be determined dynamically from the contents, so that pkgbuild and productbuild will use the appropriate compression. The --component-compression auto option for productbuild has this dynamic behavior, and it should not be too complicated to add similar logic to your package creation workflows.

You might ask if a ~20-25% reduction in file size is really worth the extra effort of updating your packaging workflows. Since many management systems are now hosted in the cloud, every bit you can save in upload and download might have a noticeable price, if not in bandwidth costs, then in time for users downloading the pkg. The savings are multiplied by the number of clients, which adds up quickly with large fleets.
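To put a number on it, multiply the data saved per download by the number of clients. This back-of-the-envelope sketch uses made-up example values (a 600 MB pkg, a 24% reduction, 1,000 clients); the function name fleet_savings_gb is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: data saved across a fleet, in (decimal) GB.
#   $1 = pkg size in MB, $2 = reduction in percent, $3 = number of clients
fleet_savings_gb() {
  awk -v mb="$1" -v pct="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", mb * pct / 100 * n / 1000 }'
}
```

With the example values, fleet_savings_gb 600 24 1000 prints 144.0, so about 144 GB less traffic for a single update of a single app.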

I think the most effective application of this knowledge would be to have an option in your packaging workflow to use this better compression. For that, the packaging workflow will have to run on Monterey. AutoPkg is the best example of such a workflow, but there are other tools, like Packages.app or munki-pkg, which could profit from this as well.
