Host Commentary

For this Conversations episode, I wanted to stay anchored on something I think a lot of teams feel when they talk about “modernizing CI/CD,” but do not always say out loud.

A lot of the time, they are not really asking for a newer tool.

They are asking for a delivery system that is less weird.

Less shared-state nonsense.
Less pipeline tribal knowledge.
Less unpredictability.
Less waiting around for infrastructure quirks to decide whether a build passes or fails.

That is what I liked about Stephane Moser’s story. It is easy to reduce this to “Pipedrive moved from Jenkins to GitHub Actions,” but that misses the point. The real issue was that Jenkins had become painful in ways that compound over time: Groovy was not a natural fit for a team working mostly in TypeScript and Go, shared VMs created noisy-neighbor problems, and the whole thing had become harder to reason about and harder to scale cleanly.

What makes this episode useful is that they did not just swap one logo for another.

They changed the operating model.

They used Kubernetes because it was already a language they knew well. They used Actions Runner Controller because it fit that model. They standardized runner size more aggressively than a lot of people would. They used Karpenter to scale nodes faster. And they brought the same observability mindset they already trusted in production back into the CI environment instead of treating CI like some magical side box that did not need real engineering discipline.
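As a rough illustration of what aggressive runner-size standardization buys you, here is a minimal TypeScript sketch. The tier names and limits are made up for illustration, not Pipedrive's actual sizes: the idea is that jobs request resources, but requests snap to a small set of standard tiers, which keeps node bin-packing and Karpenter's scaling decisions predictable.

```typescript
// Hypothetical sketch of "standardize runner size aggressively":
// instead of arbitrary per-team runner shapes, every request snaps to
// one of a few fixed tiers. Fewer shapes means more predictable
// Kubernetes bin-packing and simpler autoscaling behavior.
const TIERS = [
  { name: "small", cpu: 2, memGb: 4 },
  { name: "medium", cpu: 4, memGb: 8 },
  { name: "large", cpu: 8, memGb: 16 },
] as const;

// Return the smallest standard tier that satisfies the request.
function pickTier(cpu: number, memGb: number): string {
  const tier = TIERS.find((t) => cpu <= t.cpu && memGb <= t.memGb);
  if (!tier) throw new Error("request exceeds largest standard runner");
  return tier.name;
}
```

The trade-off is deliberate: some jobs get slightly more machine than they need, in exchange for a fleet that is much easier to reason about and scale.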

That part hit home for me, because a lot of CI conversations still get stuck at the YAML layer.

People argue about pipeline syntax, workflow reuse, or whether GitHub Actions is better than Jenkins or GitLab CI or whatever else. But the deeper issue is whether the system is predictable, isolated, observable, and understandable enough that engineers trust it. That is a much more important bar than whether your pipeline file looks cleaner.

I also liked how pragmatic the migration path was.

They did not begin by trying to move the whole company at once. They started by replacing pull request validation, which was running in CodeShip, because it was a smaller, more isolated slice of the bigger problem. That was the wedge. Then they used that work to build toward the bigger platform shift. That is a good pattern in general. Pick the part of the flow that has the lowest blast radius and the clearest upside, and prove it there first.

That same pragmatism shows up again in how they chose tools.

They did not just assume the shiny thing wins. They compared GitHub Actions with Argo Workflows and Tekton on the CI side, and Argo CD with Flux on the deployment side. They even took a shot at Spinnaker and basically decided it was too messy to justify. GitHub Actions won partly because it was easier to customize in languages they already used, and partly because the workflows and logs lived right next to the repo, which meant fewer clicks and less context switching for developers. Argo CD won because of the UI and the ability to show developers useful deployment status without giving them unsafe write access into the cluster.

That is another thing I appreciated here.

Stephane keeps coming back to the developer experience angle, but not in a fluffy way. Not “developer joy” as a slogan. More like, if the system is awkward to use, people will avoid it. If they have to jump between too many tools, they lose context. If they cannot see what is happening, they open tickets or start guessing. So the platform has to be legible. That matters just as much as the underlying architecture.

And then there is the part I really liked.

GitHub itself was not enough.

At their scale, repository-level visibility only went so far. They had hundreds of services, and leadership wanted real answers: what is failing, what is slow, what needs optimization, what deployment health looks like across the org. So they built their own internal observability and deployment registration layer around GitHub Actions events. That is a very real lesson. Sometimes the vendor product gives you enough to get started, but not enough to operate at scale. If you are serious about platform engineering, you eventually wind up building the missing context layer yourself.
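To make the "missing context layer" concrete, here is a hedged TypeScript sketch of the kind of aggregation such a layer might do over GitHub Actions `workflow_run` events. The event shape and field names here are simplified assumptions for illustration, not Pipedrive's actual schema; real webhook payloads carry far more fields.

```typescript
// Hypothetical, simplified shape of a workflow_run-style event.
type WorkflowRunEvent = {
  repo: string;                         // e.g. "org/some-service" (made up)
  conclusion: "success" | "failure";
  durationMs: number;
};

type ServiceStats = {
  runs: number;
  failures: number;
  avgDurationMs: number;
};

// Fold a stream of events into an org-wide view keyed by repository,
// answering "what is failing, what is slow" without opening each repo.
function aggregate(events: WorkflowRunEvent[]): Map<string, ServiceStats> {
  const stats = new Map<string, ServiceStats>();
  for (const e of events) {
    const s = stats.get(e.repo) ?? { runs: 0, failures: 0, avgDurationMs: 0 };
    // Incremental mean keeps the fold single-pass.
    s.avgDurationMs = (s.avgDurationMs * s.runs + e.durationMs) / (s.runs + 1);
    s.runs += 1;
    if (e.conclusion === "failure") s.failures += 1;
    stats.set(e.repo, s);
  }
  return stats;
}
```

The point is not the code itself but the shape of the solution: consume the events the vendor already emits, and build the cross-repo view the vendor does not give you.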

The migration story itself is probably the strongest part of the whole episode.

They dogfooded first, migrating their own services before anyone else's. Then they used more platform-savvy internal teams as an open beta. Then they rolled out in batches, starting with lower-criticality services and moving upward. And eventually the process got polished enough that teams later in the queue started migrating on their own, because they had already watched it happen elsewhere. That is exactly what you want. Not just a migration that technically works, but a migration model that creates confidence and spreads knowledge as it goes.

That ties into something Stephane says near the end that I think is probably the cleanest lesson in the whole conversation.

If you build tools for developers, use them yourself first.

That sounds obvious, but a lot of internal platforms still skip that. They build something for everybody else, but the platform team itself never really lives inside the system the way normal engineers do. Then they wonder why adoption is weird or why the rough edges only show up later. Dogfooding is not just a nice principle. It is one of the fastest ways to find out whether your platform is actually usable.

I also liked that he was honest about what happens when the migration succeeds.

Success creates new load.

Once the system got smooth enough, people trusted it more. Bots started opening PRs for maintenance work. Dependency updates could move automatically. More deployments started happening in parallel. And then they discovered the next problem, which is the platform version of “great, now we have traffic.” They had to think about queueing, fairness, protecting capacity for humans, and fixing the fact that some deployment steps were not actually FIFO. That is such a real platform lesson. Solving one bottleneck does not end the story. It just moves the pressure somewhere else.

The mobile side of the episode was good too, mostly because it shows how messy “just migrate it” can get once you leave the clean happy path.

The mobile team had Mac minis, runner drift, different toolchains, and all the usual weirdness that shows up when physical machines and language-specific build chains get involved. I liked that he approached it almost like a real research project. Test a few hypotheses. Timebox them. See what is actually viable. He tried different directions, including Mac virtualization options, Nix, AWS, and outsourcing the runners, and the answer wound up being more practical than exotic. In their case, GitHub-hosted ended up being cheap enough relative to the engineering time being burned on the old setup. That is a good reminder that the “purest” architecture is not always the best one. Sometimes the right answer is the one that stops wasting expensive human time.

And then there is the AI thread, which I think is interesting here precisely because it was not treated like magic.

Stephane does not present AI as “press button, migration complete.” He uses it more like a force multiplier. Convert flowcharts into first-draft workflows. Help understand Ruby in Fastlane when you do not live in Ruby. Help investigate build failures. Help search for likely causes faster. That feels a lot more believable than the hype version. AI sped parts of the move up, especially in the mobile migration, but it still sat inside a very human process of evaluation, review, correction, and rollout.

So if I had to boil this episode down to one takeaway, it would be this:

A good CI/CD migration is not really about replacing one tool with another.

It is about turning delivery into a product.

That means isolation.
Observability.
Reusable building blocks.
Safer deployment mechanics.
A rollout plan that respects blast radius.
And a user experience good enough that engineers eventually stop needing hand-holding.

That is the part worth copying.

Show Notes

This is a guest conversation episode of Ship It Weekly, separate from the weekly news recaps.

In this Ship It: Conversations episode, I talk with Stephane Moser about Pipedrive’s move from Jenkins to GitHub Actions, building self-hosted runners on Kubernetes, shifting deployments toward GitOps with Argo CD, and what it actually takes to roll out a big CI/CD change across a large engineering org.

We talk about why Jenkins had become painful, from Groovy friction to noisy-neighbor problems on shared VMs, why GitHub Actions fit better, how reusable workflows and custom actions helped, why Argo CD beat out Flux for their use case, and how they had to build better observability and internal deployment visibility around GitHub as they scaled.

The bigger theme here is that this was not just a tooling swap. It was a product and platform migration. Isolation, repeatability, self-service, rollout strategy, and observability mattered just as much as the actual CI/CD tools.

Highlights

• Why Jenkins stopped working well for them: Groovy friction, shared VM contention, and poor predictability

• Replacing CodeShip pull request validation first as the low-blast-radius starting point

• Using Actions Runner Controller on Kubernetes with EKS and Karpenter for self-hosted runners

• Why reusable workflows and custom actions helped cut repetition across hundreds of services

• Comparing GitHub Actions against Argo Workflows and Tekton for CI, and Argo CD against Flux for deployment, plus a short Spinnaker attempt

• Moving from push-based deploys toward GitOps for better isolation and safer credentials handling

• Building internal observability because GitHub’s workflow visibility was not enough at their scale

• Dogfooding first, then rolling migration out in batches until teams could self-serve the move

• What broke when the new system actually worked too well: bot-driven deploy volume, queueing, and fairness

• The mobile side of the story: Mac minis, unstable runners, GitHub-hosted runners, and a very different migration path

• How AI sped up parts of the mobile migration and troubleshooting, without making the migration trivial

• Stephane’s advice for big CI/CD shifts: start small, reduce blast radius, and use your own platform first

Stephane’s links

• LinkedIn: https://www.linkedin.com/in/moserss/

• Talk video: https://www.youtube.com/watch?v=VrE1dh-1zEY

• Blog post Part 1: https://medium.com/pipedrive-engineering/so-long-jenkins-hello-github-actions-pipedrives-big-ci-cd-switch-03be29c75f63

• Blog post Part 2: https://medium.com/pipedrive-engineering/all-aboard-the-github-actions-express-pipedrives-big-ci-cd-switch-part-2-fcacf834afd2

• GitHub: https://github.com/moser-ss

Our links

More episodes + show notes + links: https://shipitweekly.fm

On Call Brief: https://oncallbrief.com