Are Microservices for You? Let's Discover Together.


When That Meetup in Reggio Emilia Changed How I Think About Architecture

I remember the first time I attended a Fullstack Meetup in Reggio Emilia. I was standing in a circle of maybe 12 developers when a grizzled senior architect started ranting. “You just need two lines of bash,” he said. “Two lines, and you can deploy a monolith that serves millions.”

He wasn’t wrong. But the discomfort in that circle was palpable. We engineers have a beautiful, sometimes toxic trait: we crave complexity. If a solution feels too simple, we assume it’s “junior.” We convince ourselves we need to handle every possible scenario, every hypothetical scaling issue that hasn’t happened yet, every edge case that lives only in our architectural anxiety dreams.

One year and countless production incidents later, I’ve learned a humbling truth:

The Hard Truth

Microservices aren’t a destination, they’re a deliberate choice of problems.

And the question nobody wants to ask is: Is your team actually big enough to justify the pain I’m about to describe?


When Amazon Said “Actually, Never Mind”

You know your industry has a problem when Amazon—the company that literally invented the cloud platform microservices run on—publishes a blog post titled “Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%.”

Their monitoring team had built the “perfect” modern system: serverless, decoupled, distributed. AWS Step Functions orchestrating Lambda functions. The kind of architecture diagram that gets you upvotes on Reddit.

Except it was hemorrhaging money and couldn’t scale.

The problem? Every. Single. Network. Hop. They were serializing data, storing it in S3, retrieving it for the next function, over and over. The orchestration overhead alone was killing them. So they did the unthinkable: they refactored back into a monolith.

The results were staggering:

  • 90% cost reduction (not a typo)
  • Massive scalability improvements from in-memory processing
  • Simpler code: no more orchestration-layer gymnastics
Tip

Think about this: if Amazon, whose business model is literally renting you distributed infrastructure, is consolidating services, you need to ask yourself why you’re rushing to fragment yours.


The Hidden Tax of the Distributed Dream

We often talk about the “Microservice Premium”: the inherent tax paid in operational complexity and cognitive load. This tax is collected in the small, painful moments of a developer’s day:

1. The Night RabbitMQ Made Me Question My Career Choices

Let me tell you about the time I spent an entire evening debugging what turned out to be… a keepalive timeout.

We had RabbitMQ running with @cloudamqp/amqp-client in Node.js. Everything looked perfect in the logs: messages flowing, services communicating beautifully. Then I noticed something weird: the connection was dropping and reconnecting every 2 minutes. Not “roughly 2 minutes.” Not “between 1-3 minutes.” Exactly 120 seconds. Like clockwork.

Here’s what I learned, face-first into CloudAMQP documentation:

The WebSocket Keepalive Problem:

  • The client library was using WebSockets under the hood
  • WebSockets require periodic keepalive packets or they time out
  • If you’re not actively sending/receiving messages, the connection appears “idle” to intermediate proxies
  • The 2-minute timeout was the proxy giving up on our “dead” connection (the fix is sketched right after this list)
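
The fix, once I finally understood the problem, was embarrassingly small: configure a heartbeat shorter than the proxy’s idle timeout so the connection never looks dead. Here’s a minimal sketch with @cloudamqp/amqp-client; the heartbeat URL parameter and the 30-second value are assumptions on my part, so check how your client version expects heartbeats to be configured before copying this.

```ts
// Minimal sketch: keep the AMQP connection from looking "idle" to proxies.
// Assumption: this client version honors a `heartbeat` URL parameter --
// verify against your @cloudamqp/amqp-client docs before relying on it.
import { AMQPClient } from "@cloudamqp/amqp-client";

async function connectWithHeartbeat() {
  // Heartbeat (30s) well below the 120-second proxy idle timeout,
  // so something is always flowing even when no messages are.
  const amqp = new AMQPClient(
    "amqps://user:password@your-host.cloudamqp.com/vhost?heartbeat=30"
  );
  const conn = await amqp.connect();
  const channel = await conn.channel();

  // Declare queues once, with the same flags every service uses.
  await channel.queue("orders", { durable: true });
  return { conn, channel };
}

connectWithHeartbeat().catch((err) => {
  console.error("AMQP connection failed:", err);
  process.exit(1);
});
```

The heartbeat doesn’t eliminate reconnects; it just stops the proxy from declaring a perfectly healthy connection dead. You still want reconnection handling for real network failures.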
The Latency Penalty

RabbitMQ calls add physical distance and network hops. Development feels fast locally, but production bleeds performance.

Code → Network → Queue → Service B (Latency ↑)

And the fun doesn’t stop there:

  • The reconnection dance: Every disconnect triggered a full reconnection cycle: re-establishing channels, re-declaring exchanges, re-binding queues. All that overhead, every two minutes.
  • Configuration drift hell: If Service A declares an exchange as durable: true and Service B declares the same exchange with durable: false, RabbitMQ throws:
    PRECONDITION_FAILED
    and closes the channel, shutting down your entire pipeline (reproduced in the sketch below)
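
Here’s how easy that drift is to reproduce. This is a sketch against @cloudamqp/amqp-client; the exchangeDeclare option names reflect my reading of the client API, so treat them as an assumption and verify them against your version.

```ts
// Sketch of the configuration-drift failure mode described above.
// Assumption: exchangeDeclare options look like this in your client version.
import { AMQPClient } from "@cloudamqp/amqp-client";

async function reproduceDrift() {
  const amqp = new AMQPClient("amqp://localhost");
  const conn = await amqp.connect();

  // Service A declared the exchange first, as durable.
  const serviceA = await conn.channel();
  await serviceA.exchangeDeclare("orders-exchange", "topic", { durable: true });

  // Service B ships with a slightly different default...
  const serviceB = await conn.channel();
  try {
    await serviceB.exchangeDeclare("orders-exchange", "topic", { durable: false });
  } catch (err) {
    // ...and the broker answers with PRECONDITION_FAILED (406) and closes the channel.
    console.error("Broker rejected the declaration:", err);
  }
}

reproduceDrift().catch(console.error);
```

The boring fix is to centralize topology: one shared module (or a deploy-time script) owns every exchange and queue declaration, so two services physically can’t disagree.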
Lesson Learned

I once spent three hours tracking down why a service wouldn’t start, only to discover it was an exchange auto-delete flag mismatch. The error message was useless. The documentation was sparse. Stack Overflow had nothing. I felt very alone.

2. The Dependency Saga

The “Technology Diversity” promised by microservices often turns into a polyglot nightmare.

Case in point: The node-canvas library:

  • Installing it requires deep system-level dependencies like Cairo, Pango, and libjpeg
  • If you develop on a Mac but deploy to an Alpine Linux container, pre-built binaries fail due to the glibc vs. musl conflict
  • You are forced to install Python, g++, and make into your production images, bloating them and increasing build times from seconds to 10+ minutes
Caution

This isn’t “technology diversity”; it’s dependency hell with extra steps.

3. The Windows Container Incident (Or: Why You Should Develop on Linux)

If you’ve never had to run Docker on Windows Server, congratulations. You are blessed.

For the rest of us who’ve lived through it: remember that time you tried to delete a corrupted Docker image and Windows literally said “Access Denied” to the Administrator account?

Here’s the nightmare sequence:

  1. Docker’s windowsfilter directory holds a file lock
  2. Your antivirus also holds a file lock (why? who knows)
  3. The layer becomes corrupted
  4. You try to delete it: Access Denied
  5. You log in as Administrator: Access Denied
  6. You cry a little 😢
  7. You Google for 45 minutes and find a sketchy PowerShell script that forcibly seizes ownership from the SYSTEM account
  8. You run it, praying you don’t brick your entire Docker installation
  9. It works 🎉
  10. You immediately start researching Linux alternatives
Info

I wish I were exaggerating. I still have that PowerShell script bookmarked.

4. Ops is Not Optional

If you adopt microservices, you cannot do manual deployments. CI/CD becomes non-negotiable.

Here’s what happens when you try to manually deploy 12 microservices:

  1. You SSH into the first server and pull the latest Docker image
  2. Service #1 starts successfully
  3. You move to Service #2, realize you forgot to update the environment variables
  4. Service #2 crashes on startup
  5. Service #1 is now calling the old version of Service #2’s API
  6. Users start reporting errors
  7. You fix Service #2, but now Service #3 depends on both, and you’re not sure which version combinations are compatible
  8. It’s 11 PM. You’re still deploying.

We use Azure DevOps, and once we set up self-hosted agents, the quality of life improvement was massive. But there’s a catch: if you self-host agents, you own the build environment. That means:

  • Managing Docker versions across agents
  • Keeping build tools updated (Node, .NET SDK, Python)
  • Debugging why Agent #3 builds successfully but Agent #1 fails with the exact same code
  • Monitoring disk space because Docker layers fill up drives faster than you expect

5. Logging: The Black Hole

In the “old days,” I could SSH into a server and tail -f a log file. Simple. Direct. Effective.

In a microservices world, here’s what debugging a single user request looks like:

  • Step 1: User reports error at 14:23:47
  • Step 2: Check API Gateway logs… request routed to Order Service
  • Step 3: SSH into Order Service container… log says it called Payment Service
  • Step 4: SSH into Payment Service… no obvious errors, but it called Inventory Service
  • Step 5: SSH into Inventory Service… nothing unusual, but wait—timestamps are in UTC, Order Service was in CET, Payment Service was in PST because someone changed the timezone in that Docker image for “testing”
  • Step 6: Spend 45 minutes mentally converting timezones to correlate three log streams
  • Step 7: Finally find the error: a 500ms timeout in Inventory Service that only happens under load

The brutal truth:

  • Without structured logging (JSON), you can’t parse logs programmatically
  • Without correlation IDs, you can’t trace requests across services
  • Without centralized logging (Sentry, Datadog, ELK), you’re playing archaeology with text files

If you don’t invest in observability from Day 1, you’re flying blind. And trust me, manually correlating timestamps across different log streams at 2 AM is a special kind of hell.
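
For what it’s worth, that Day-1 investment doesn’t have to be a huge platform project. Here’s a minimal sketch of structured logging plus correlation IDs as an Express middleware; the header name and the log fields are conventions I’m assuming, not something your gateway will magically provide.

```ts
// Minimal sketch: structured JSON logs + a correlation ID on every request.
// The `x-correlation-id` header is a common convention, not a standard --
// make sure your gateway and downstream services agree on the same name.
import express from "express";
import { randomUUID } from "node:crypto";

const app = express();

app.use((req, res, next) => {
  // Reuse the caller's correlation ID if present, otherwise mint one at the edge.
  const correlationId = req.header("x-correlation-id") ?? randomUUID();
  res.setHeader("x-correlation-id", correlationId);

  res.on("finish", () => {
    // One JSON line per request: machine-parseable for ELK/Datadog, greppable for humans.
    console.log(
      JSON.stringify({
        ts: new Date().toISOString(), // ISO 8601, always UTC: no timezone archaeology
        level: "info",
        correlationId,
        method: req.method,
        path: req.path,
        status: res.statusCode,
      })
    );
  });

  next();
});

app.get("/health", (_req, res) => res.json({ ok: true }));

app.listen(3000);
```

Pass that same correlationId along in the headers of every outgoing call (and in your queue message properties), and the 14:23:47 bug hunt above becomes a single search instead of an SSH tour.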


The Orchestration Cliff: K8s vs. Rancher

Once you have containers, you must run them. Kubernetes (K8s) has won the war, but it is a “beast” with a vertical learning curve.

K8s vs. Rancher: A Quick Comparison

Feature Category | Managed K8s (AKS/EKS) | SUSE Rancher
Control Plane | ✅ Managed by the cloud provider | ⚠️ Managed by you (single point of failure)
Developer UI | ❌ Often slow and disjointed | ✅ Unified “Cluster Explorer” view
Multi-Cloud | ❌ Limited to the provider | ✅ “Single pane of glass” across Azure, AWS, and on-prem

My take: If you’re a small team (< 20 devs), managed K8s (AKS/EKS) is worth the cost. If you need multi-cloud or have strong ops expertise, Rancher gives you more control—but you’re also responsible when things break.

Reality Check

Most developers (myself included at the start) admit they only know enough to “edit some damn YAML files.” This creates a dangerous single point of failure: the entire team is blocked, waiting for the one “K8s expert” to come back from vacation and explain why the pod is stuck in CrashLoopBackOff even though it “works on my machine.”


The “Boring Technology” Philosophy

Dan McKinley’s “Choose Boring Technology” essay is a guiding light. Every company has finite “innovation tokens”.

The Choice:

  • Boring (Postgres, Monoliths, C#): Known failure modes; you can Google the error and find the answer
  • Exciting (Bleeding-edge Service Meshes): “Unknown unknowns” that force you to debug the vendor’s code instead of your business logic
Tip

Spend your innovation tokens wisely. You don’t get many.


The Verdict: Start with Modular Monolith. Seriously.

Here’s my advice, earned through production scars:

Start with a Modular Monolith. Not because you’re being “cautious” or “junior”—because you’re being smart.

The Path I Actually Recommend:

  1. Build a Modular Monolith First: Use strict namespaces, clear boundaries, and domain-driven design principles in a single deployable unit
  2. Treat Boundaries Like APIs: Even within your monolith, design module interfaces as if they might become network calls someday (sketched after this list)
  3. Invest in Observability Early: Structured logging, correlation IDs, and metrics are valuable whether you’re distributed or not
  4. Deploy it. Ship it. Make money with it.
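
To make point 2 concrete, here’s roughly what “boundaries like APIs” looks like in TypeScript. All the names here (PaymentsPort, OrdersService, the folder layout) are hypothetical; the only rule that matters is that modules talk to each other through an interface, never through each other’s internals.

```ts
// payments/port.ts -- the ONLY thing other modules may import from "payments".
export interface PaymentsPort {
  charge(orderId: string, amountCents: number): Promise<{ ok: boolean }>;
}

// payments/module.ts -- today this is a plain in-process implementation.
export class PaymentsModule implements PaymentsPort {
  async charge(orderId: string, amountCents: number) {
    // Real business logic lives here, hidden behind the interface.
    return { ok: amountCents > 0 };
  }
}

// orders/service.ts -- depends on the port, not on payments internals.
export class OrdersService {
  constructor(private readonly payments: PaymentsPort) {}

  async placeOrder(orderId: string, amountCents: number) {
    const result = await this.payments.charge(orderId, amountCents);
    if (!result.ok) throw new Error(`Payment failed for order ${orderId}`);
    return { orderId, status: "confirmed" as const };
  }
}

// If payments ever has to become a real service, you swap the adapter behind
// PaymentsPort for an HTTP or queue client; OrdersService never notices.
```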

Then, only if you hit one of these hard constraints:

  • You have 100+ developers and teams are blocking each other on deployments
  • You need genuinely different scaling profiles (your video processing needs 32-core machines while your API runs fine on 2-core instances)
  • You have compliance requirements that mandate physical service isolation

…consider microservices.

But if you go that route, embrace the complexity:

  • You’re signing up for everything I described above: the RabbitMQ debugging sessions, the Docker file permission nightmares, the deployment pipeline complexity
  • You need dedicated DevOps expertise, not “Bob who knows Docker”
  • You need a logging/observability platform from Day 1, not “when we get around to it”
  • You need to accept that your junior developers will spend their first month just understanding how to run the entire system locally
The Hard Truth

Microservices don’t make your system simpler. They trade code complexity for operational complexity. Make sure you’re ready for that trade.


Sometimes, the most “senior” decision you can make is to write two lines of bash, deploy a monolith, and go home on time.

Tip

Trust me, your future self, the one getting a phone call because service mesh certificates expired, will thank you.