The nightmare of retrofitting secrets management

A startup CTO on what a secrets management retrofit actually requires: configuration tangles, production risk, and why an afternoon of setup at five engineers becomes months of effort at fifty.

By Tim Olshansky, Co-Founder, Fencer

Secrets management is one of those things that's genuinely easy to get right at the start and genuinely hard to fix later. Not "hard" in the sense that the right approach is complicated (it isn't) but hard in the sense that once your application is running in production, once you have engineers who've been doing things a certain way, once your infrastructure settings are baked into how your product runs, changing all of that is a significant project.

I learned this lesson the hard way at a previous startup. I had to rebuild how we were managing the application's settings from the ground up. The situation was dire: secrets living in places they shouldn't, engineers using infrastructure providers in a way that sort of worked but left things exposed, and no secrets vault in sight. Getting from that state to something solid wasn't a weekend task. It required re-architecting parts of how the product ran, convincing engineers to change longrunning habits, and doing all of it without taking down a live application that our customers depended on.

If you're reading this before you've shipped anything significant, that's exactly where you want to be. The decisions you make in the next hour are worth weeks of migration work later.

What "doing it wrong" actually looks like

The typical setup I see is a combination of three things, usually all present at the same time.

Development secrets in source code

On its own, a dev credential in the codebase isn't the most critical exposure (it's probably not a production secret). The problem is what the pattern communicates to the team: secrets belong in code. Once that norm exists, it's only a matter of time before someone applies the same habit to a production credential, because nothing in the environment has signaled otherwise. The concern isn't the dev secret itself; it's the precedent that makes a worse mistake more likely.

Secrets in environment variables on PaaS platforms

The access controls here are usually weaker than engineers assume. Running on your own VM, this was less of an issue; a value in a process environment has a limited blast radius. When teams move to Vercel, Supabase, or similar platforms and start granting teammates access to manage deployments, those same environment variables become readable by anyone with dashboard access. The permissions model wasn't designed with secrets isolation in mind, and teams don't discover this until someone starts auditing who can see what.

No structured configuration management

The problem isn't only that there's no vault. There's also no decision about where settings live, how they're categorized, or what the rules are for adding new ones. Each engineer adds configuration in whatever way makes sense to them at the time. Over time, settings become coupled to specific files, to deployment scripts, to assumptions that have calcified from being true long enough to go unquestioned. By the time you want to introduce proper secrets management, the task isn't just adding a vault; it's untangling a structure that's been accumulating for years.

The moment you realize you have a problem

When I had to retrofit secrets at a previous startup, the trigger wasn't an internal audit or a security review. An upstream vendor got breached, and we had to rotate our credentials and tokens. That should have been a mechanical task: identify the affected values, update them, and verify propagation. Instead it turned into a scramble, tracing configuration through multiple files and multiple layers to figure out why a rotated secret hadn't been updated everywhere it was supposed to be.

That scramble made the underlying problem clear. Not because we'd been breached ourselves, but because rotating credentials under ordinary conditions was already painful enough to expose how fragile the setup was. If this had been our own breach, or a mistake that required an emergency rotation in the middle of the night, the process would have been disruptive enough to risk an outage.

We decided to address it proactively. The security case was obvious, but there was also a strategic reason: the configuration disorder making secrets rotation painful was the same configuration disorder blocking us from doing infrastructure as code properly. Fixing one meant fixing the other, which reframed the work from security maintenance into foundational investment that would unblock things we actually wanted to do.

What retrofitting secrets management actually requires

You can't skip the normalization step

Years of development had left us with a set of layered files that nested and overrode each other in ways that weren't fully documented and weren't always predictable. We were reading from the environment in some places, from config files in others, and from defaults that had been quietly in place long enough that nobody thought to question them.

The first step wasn't migrating secrets into a vault. It was normalizing the configuration itself: establishing which values lived where, standardizing how the layers worked, and getting to a state where the system was coherent enough to actually migrate. That work took longer than expected, because it required tracing every configuration value through the code to understand where it was used and what it might affect if changed.

Our destination was HashiCorp Vault, which would have fixed the biggest gap: any engineer with production infrastructure access could read all environment variables, including every production credential in the system. That was a security problem and an operational bottleneck. We kept production access limited to a small group precisely because we couldn't limit what they could see once they had it.

What breaks during the migration

The migration was where things fell apart. The configuration layers, built up over years, were tangled in ways we hadn't fully mapped before starting. Changing one value would break something else in a subtle way: something untested, something that depended on an assumption that had been true long enough to become invisible. We spent cycles chasing regressions in build and deployment processes that wouldn't surface immediately, because the failure mode wasn't apparent until something tried to run.

When an outside event forces a rotation

When we had to rotate credentials mid-migration because another upstream vendor was breached, that was the worst stretch. Rotating secrets in a system you're actively restructuring, where the wiring isn't fully understood, while a live application is running and customers depend on it: there is no clean version of that. You get through it, but it costs more time and more caution than it should.

That's the shape of a secrets management retrofit. It's not one project; it's two: the normalization work and then the migration, with external pressure that doesn't check your calendar. If you're weighing whether to do this now or later, that's what "later" actually looks like.

Why it's harder at 30 people than at 3

At three people, changing how secrets work means updating a few local setup files and having a brief conversation. At thirty people, it means coordinating with everyone who touches production, running migrations on live infrastructure, managing the engineers who have working setups they don't want to change, and doing all of it while the product keeps shipping. The security problem is identical at both sizes, but the cost of fixing it is not.

When no one on the team has done this before

A lot of this comes down to whether anyone on the team has been through it before. If you've lived through a painful retrofit, you're far more likely to set things up correctly from the start. Vibe-coded applications compound this: they can reach production fast, and the code they produce regularly embeds architectural assumptions that an experienced engineer would catch. If nobody on the team has dealt with secrets management before, the setup that comes out of that process reflects that inexperience. The faster you ship, the faster those assumptions get baked into how everything runs.

When the framework defaults have never been questioned

Express, Django, and Rails set up the same trap. The single-settings-file default each offers is ergonomic when the project is small: easy to extend, easy to read, and functional enough that nobody questions it. These frameworks also come with built-in tools to detect which environment they're running in, and conditional configuration logic follows from there: different settings, different secrets, different behavior depending on context. The whole structure keeps getting extended, and by the time anyone stops to ask whether it still makes sense, the conditional logic has become load-bearing in how the application runs.

The argument for getting this right early isn't really about security risk in the abstract. It's about what you're signing up for if you don't.

At the start, when you have three engineers, no live customers, and a codebase that fits in one person's head, the right approach takes an afternoon or two. You set up a vault, you decide where secrets live, you build the habit before there's anything to retrofit.

That's the whole trade. A few hours now, or a months-long project later, with all the coordination, the migration risk, and the production anxiety that comes with it.

If you're working through where to start, the Startup Security Field Guide covers how to get secrets management in place early, before there's a live system to work around.

You might also be interested in:

Take Fencer for a spin

See what full-stack security looks like, built for your stage and your stack. 
Connect your tools and get a complete, prioritized security roadmap in minutes.