npm scripts, Windows and Environment Variables - a hairy, hairy yak

The other day a Headspring colleague asked for help with an weird issue he was running setting up a Github Actions Workflow for Totem and he was getting this error when trying to run the app from within a npm script:

Start-Process : Item has already been added. Key in dictionary: 'NPM_CONFIG_CACHE'  Key being added: 'npm_config_cache'
At line:1 char:1
+ Start-Process 'dotnet' -nnw -argumentlist 'run -p ../Totem/Totem.cspr ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Start-Process], ArgumentException
    + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.StartProcessCommand

In the project.json file he had a npm script defined as:

{ 
    "scripts" : {
        "runtestps": "@powershell Start-Process 'dotnet' -nnw -argumentlist 'run -p ../Totem/Totem.csproj --launch-profile \"Testing\" -v diag'"
    }
}

When he asked for help to troubleshoot this I got immediately nerd snipped to not only find a workaround but to actually get to the root cause of what was causing it.

Troubleshooting

There isn’t a right or wrong way to troubleshoot something and different people have different ways they prefer to do it. The steps that I usually take are roughly:

  1. Make sure I can reproduce the issue with almost to no changes.
  2. Poke around to get myself familiarized with the context, tools, all variables involved, usually bumping log levels and adding a lot of log statements to give me more details of what is going on.
  3. Eliminate as much as possible variables that are not directly related to the issue at hand and that are not required to repro the issue, with the goal of getting to the bare minimum setup still being able to repro the issue
  4. With a minimum repro setup, now is the time to form hipothesis, especulate what might be causing the issue, do a lot of resarch and, hopefully, find the root cause and a solution or workaround to the problem.

For this particular problem I first set out to understand what were all the variables involved in the original setup:

So at this point we have tons of moving parts: GitHub Actions Virtual Environment; the Previous GitHub Actions in the Workflow; Powershell; npm; dotnet run. It was time to bump the logs and see what was going on. In doing so I found out that there were two NPM_CONFIG environment variables setup in the virtual environment before any step ran: NPM_CONFIG_CACHE and NPM_CONFIG_PREFIX.

After some time googling I found an old issue on the npm repo that gave me a north pointing to what could be causing the issue and also led me to a particular sentence in node config docs regarding environment variables, enphasis mine:

Config values are case-insensitive, so NPM_CONFIG_FOO=bar will work the same [as npm_config_foo=bar]. However, please note that inside npm-scripts npm will set its own environment variables and Node will prefer those lowercase versions over any uppercase ones that you might set.

If you go read the issue, the gist of it is that within npm-scripts it will set all npm_config_* environment variables, all using lower case only, and they will supersede any upper case NPM_CONFIG_* environment variable that was previously defined. Although surprising, this is not a big problem per-se, or so I thought. The problem is that it does that, even when running on Windows, and it doesn’t delete the upper case variable, it simply add the lower case one. This become a big problem on Windows because Environment Variables on Windows are **Case Insentitive** and there isn’t even a way with Windows standard tooling to setup duplicate environment variables on Windows.

Reducing the variables

With a hypothesis that I was fairly confident was the root cause for the issue, I went to eliminate all variables that were not directly related to the issue. Removed the dotnet project; the dotnet related GitHub Actions; reduced the build script to a bare minimum; and finally removed GitHub Actions altogether and got to a powershell one-liner that I could repro the issue locally on Windows:

In the directory with this package.json:

{
    "scripts" : {
        "test" : "npm run env && @powershell Get-Item Env:"
    }
}

Run this Powershell one-liner

$Env:NPM_CONFIG_CACHE='c:\temp';npm run test

The script will output the list of environment variables available in the context of npm-script and then would throw the same error when trying to list the environment variables using Powershell.

In the output of npm run env it lists this two environment variables:

NPM_CONFIG_CACHE=C:\temp
npm_config_cache=C:\temp

Again, as we are on Windows, this is not supposed to be a possible state. At this point all bets are off.

Fixing the issue, or at least having a workaround

To recap the cause of the original issue:

I opened an issue with npm/cli to see if they can avoid this creating duplicate env variables on Windows to being with. In the mean time, to work around the issue, we can simply remove the UPPER CASE environment variables in our powershell build script before calling npm run:

gci Env: | where Name -clike 'NPM_CONFIG_*' | remove-item

...

npm run test

And now I can go back to what I was supposed to be doing :)

comments powered by Disqus