Anatomy of a CI Bug - The Great CI Mystery

You know that sinking feeling when you push a seemingly innocent commit after two weeks of relative peace, only to watch your CI pipeline explode in spectacular fashion? Welcome to my world last Tuesday.

I’d joined a project to help some colleagues get it across the finish line. Everything seemed straightforward — just a standard Remix/Vite SPA project. What could go wrong with a simple string fix?

The Crime Scene

The error message that greeted me when I checked the CI logs was as cryptic as it was frustrating:

bash: line 1: pnpm: command not found
Error: Process completed with exit code 2.

Pnpm? I stared at the screen in confusion. This project uses npm. I’d been working with package-lock.json files for weeks. Where was this pnpm nonsense coming from?

My detective instincts kicked in. Time to put on the Sherlock Holmes hat.

Following the Breadcrumbs

First stop: the CI configuration. Maybe someone had recently updated our workflow file? A quick scan of ci.yml revealed we were using:

- uses: jsmrcaga/action-netlify-deploy@v2.2.0

Aha! An outdated action version. The latest was 2.4.0, so naturally I assumed the culprit was here, or at least somewhere nearby. Version 2.3.0 had added automatic Netlify CLI installation because the Ubuntu 24 (ubuntu-latest) GitHub runner image no longer includes it by default. Version 2.4.0 pinned the CLI to 17.36.1 because of some mysterious bug that the linked issues couldn’t properly explain - dead links everywhere.

But here’s where it got interesting: our CI workflow was already installing the Netlify CLI manually. Someone had probably encountered this issue before and worked around it.
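The install step looked something like this (reconstructed from memory, step name approximate) - note the missing version pin:

# Manual Netlify CLI install - no pin, so every run pulled whatever the latest release was
- name: Install Netlify CLI
  run: npm install -g netlify-cli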

Meanwhile, the original developer’s solution? Just install pnpm in the CI workflow and call it a day.
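In practice that meant dropping a pnpm setup step into the workflow, something along these lines (the exact versions here are illustrative, not what the team actually committed):

# Workaround: put pnpm on the runner’s PATH so the stray pnpm invocation stops failing
- uses: pnpm/action-setup@v4
  with:
    version: 9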

Problem solved, right? Well, not quite.

The Plot Thickens

While the quick fix worked, I couldn’t shake the feeling that we were treating symptoms rather than the disease. A problem isn’t truly solved until you understand its root cause.

I dove deeper into the GitHub Actions runner images, looking for breaking changes in the past month. There were some updates - default Node versions had changed - but nothing about the Netlify CLI. And the last release of the action we were using (2.4.0) had shipped in November 2024, well outside our problem window.

That’s when I shifted focus to the Netlify CLI itself. We were installing it without version pinning, which meant we were getting whatever the latest version happened to be.

The Smoking Gun

Digging through the Netlify CLI release notes (version 22.1.1 was current), I found the smoking gun in version 21.0.0 under “Potentially breaking changes”:

“Run build before deploy (#7195)”

The Netlify CLI had started running its own build process before deployment. Since our GitHub Action was already handling the build, we were essentially building the project twice. But that still didn’t explain the pnpm mystery.

The Real Culprit

The final piece of the puzzle lay in the Netlify CLI’s package manager detection logic. It turns out the CLI decides which package manager to build with by checking for lockfiles. The priority order? pnpm-lock.yaml comes first, even if you also have a package-lock.json.

And guess what was lurking in our project root? A forgotten pnpm-lock.yaml file, probably left behind by an earlier developer who had experimented with pnpm and forgotten to clean up after themselves.

The Netlify CLI saw this file and decided, “Oh, this is a pnpm project!” Hence the mysterious pnpm command not found error.
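To make that concrete, here’s a tiny sketch of that kind of detection order (my own illustration, not the CLI’s actual code) - pnpm wins whenever pnpm-lock.yaml exists, regardless of what else is sitting next to it:

# Illustration only: lockfile-based detection with pnpm-lock.yaml checked first
- name: Which package manager would be detected?
  run: |
    if [ -f pnpm-lock.yaml ]; then
      echo "pnpm"
    elif [ -f package-lock.json ]; then
      echo "npm"
    else
      echo "npm (no lockfile found)"
    fi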

The Aftermath

I shared my findings in our Slack channel, complete with screenshots and links to the relevant changelogs. The team acknowledged the detective work but decided to stick with the pnpm installation workaround “for now.”

The next day, we discovered our staging environment was behaving strangely. The process.env.BASE_URL was pointing to the wrong location. Further investigation revealed that the second build (the one triggered by Netlify CLI) was pulling environment variables from GitHub’s environment settings rather than our CI workflow variables.

The Resolution

In the end, we updated our workflow to work with the Netlify CLI’s new build-first approach rather than fighting against it. We also cleaned up that rogue pnpm-lock.yaml file that had been silently waiting to cause chaos.
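I won’t reproduce the whole workflow here, but the shape of it was: pin the CLI, let it run the build, and hand that build the same variables the app expects. A minimal sketch (secret and variable names are illustrative):

# Sketch: pin the Netlify CLI and let it own the build-and-deploy step
- name: Install Netlify CLI
  run: npm install -g netlify-cli@22.1.1
- name: Build and deploy
  run: netlify deploy --prod
  env:
    NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
    NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
    BASE_URL: ${{ vars.BASE_URL }}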

Lessons Learned

This debugging adventure reinforced a few key principles:

Always dig deeper than the quick fix

While installing pnpm solved the immediate problem, understanding why it was needed prevented future issues and helped us catch the environment variable problem before it hit production.

Lock your dependencies

If we’d been pinning our Netlify CLI version, this breaking change wouldn’t have caught us off guard.

Clean up after experiments

That forgotten lock file was a ticking time bomb. Always check what files you’re leaving behind when switching tools or package managers.
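A cheap guard also helps: fail the job early if competing lockfiles coexist, before any tool gets a chance to guess wrong. A minimal sketch:

# Fail fast if more than one package manager’s lockfile is present
- name: Guard against stray lockfiles
  run: |
    if [ -f pnpm-lock.yaml ] && [ -f package-lock.json ]; then
      echo "Both pnpm-lock.yaml and package-lock.json exist - keep only one" >&2
      exit 1
    fi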

Trust but verify your CI

Those failed builds from the past two weeks that nobody noticed? They were trying to tell us something was wrong.

The Takeaway

Sometimes the most innocuous commits reveal the deepest problems. In this case, a simple string fix exposed a chain of issues spanning package managers, CI tools, and environment configuration. It’s a reminder that in software development, everything is connected and sometimes the smallest thread can unravel the whole sweater.