Let’s start with myself, reading a newsletter, sitting on the couch in my lovely living room.
Then chaos started out of nowhere.
At the exact same second:
on my left, my daughter started whining very loudly about some game she was playing
in front of me, the baby got stuck and couldn’t move back down on his knees
on my right, my husband started venting about his long day at work
All at once.
My brain went into panic mode and shut down.
Nope, no way it could process that explosion of audio and visual inputs. Not without some warm-up, or slow easing into the chaos.
So I did the only thing I could do to get out of this, though I wasn’t even thinking, just reacting: I shouted STOP STOP STOP!
They were so surprised!
And then I went one by one.
Addressed the needs.
Starting with the baby (the one that really couldn’t wait, safety first). Then my daughter, my husband…and the cat (he had joined the queue in the meantime, because why not).
Finally, I sat back down and took a deep breath to calm myself down.
Where am I going with this?
Clearly, I was overwhelmed, going from calm and quiet to chaos in less than a second.
It hit me a bit later when my brain had the time to process and reflect.
My brain was a system. And it got overloaded.
Nothing new here, it can happen to all systems.
What about software systems?
How would that happen? With what consequences? How do they recover? What’s the impact?
In software, a system overload usually occurs when a system, such as a server, application, or network, receives more requests than it has the capacity to process.
This can lead to issues, such as slowed performance, errors, or even crashes.
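To make "more requests than it can handle" concrete, here is a minimal sketch of what happens to the waiting queue when arrivals exceed capacity. The function name and the numbers are made up for illustration, and real traffic is never this regular.

```python
def simulate_backlog(arrivals_per_sec: int, capacity_per_sec: int, seconds: int) -> list[int]:
    """Return the number of requests left waiting at the end of each second.

    Assumes (for illustration only) a constant arrival rate and a constant
    processing capacity.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        # New requests arrive, the system processes what it can,
        # and whatever is left over stays in the queue.
        backlog = max(0, backlog + arrivals_per_sec - capacity_per_sec)
        history.append(backlog)
    return history


# Under capacity: the queue stays empty.
print(simulate_backlog(80, 100, 3))   # [0, 0, 0]

# Over capacity: the queue grows by 50 requests every second.
print(simulate_backlog(150, 100, 5))  # [50, 100, 150, 200, 250]
```

A growing backlog is why overload first shows up as slowness: every new request waits behind everything already queued, until timeouts and errors follow.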
Key factors:
High Traffic Volume: Excessive simultaneous requests or transactions can overwhelm the system.
Resource Limitations: Insufficient CPU, memory, or bandwidth to handle the load.
Inefficient Code: Poorly optimized software or algorithms that cannot efficiently manage high loads.
Concurrent Processes: Too many parallel processes or threads competing for resources.
Denial of Service (DoS) Attacks: Malicious attempts to overload the system with excessive requests.
It can be only one factor, or sometimes it’s a combination of a few (otherwise, where’s the fun?).
How do we see it coming?
A few things can happen (hopefully you have some kind of monitoring system to keep an eye on this).
Symptoms:
Slow Response Times: The system takes longer than usual to process requests.
Increased Error Rates: More frequent errors or failures in transactions.
System Crashes: Complete shutdown or restart of the system.
High CPU/Memory Usage: Resources are consistently at or near maximum capacity.
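The symptoms above can be turned into a simple check. This is only a sketch: the `Snapshot` fields and every threshold value are hypothetical, and in practice the numbers would come from your monitoring system and be tuned to your own baseline. A full crash is not covered here, since a crashed system cannot report on itself.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of system health (hypothetical fields for illustration)."""
    p95_latency_ms: float      # 95th percentile response time
    error_rate: float          # fraction of failed requests, 0.0 to 1.0
    cpu_utilization: float     # 0.0 to 1.0
    memory_utilization: float  # 0.0 to 1.0


# Assumed thresholds, not recommendations: tune them to your own system.
MAX_P95_LATENCY_MS = 500.0
MAX_ERROR_RATE = 0.05
MAX_UTILIZATION = 0.90


def overload_signals(s: Snapshot) -> list[str]:
    """Return the list of overload symptoms present in this snapshot."""
    signals = []
    if s.p95_latency_ms > MAX_P95_LATENCY_MS:
        signals.append("slow response times")
    if s.error_rate > MAX_ERROR_RATE:
        signals.append("increased error rate")
    if s.cpu_utilization > MAX_UTILIZATION:
        signals.append("high CPU usage")
    if s.memory_utilization > MAX_UTILIZATION:
        signals.append("high memory usage")
    return signals


calm = Snapshot(p95_latency_ms=120, error_rate=0.001, cpu_utilization=0.40, memory_utilization=0.50)
chaos = Snapshot(p95_latency_ms=900, error_rate=0.08, cpu_utilization=0.97, memory_utilization=0.50)
print(overload_signals(calm))   # []
print(overload_signals(chaos))  # ['slow response times', 'increased error rate', 'high CPU usage']
```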
Ok, and how do we fix it?
Or even better, how do we avoid getting there in the first place?
What can we do?
Don’t go viral! 😂 (just kidding)
Mitigation Strategies:
Load Balancing: Distributing traffic across multiple servers to prevent any single server from becoming overwhelmed.
Scaling: Adding more resources to an existing server (vertical scaling) or adding more servers (horizontal scaling) to handle increased load.
Optimization: Improving code efficiency and optimizing database queries.
Caching: Using caches to reduce the load on the backend by storing frequently accessed data in memory.
Rate Limiting: Limiting the number of requests a user or system can make in a given timeframe.
Monitoring: Continuously monitoring system performance to detect and address potential overloads before they become critical.
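As an example of one of these strategies, rate limiting is often implemented with a token bucket. The sketch below is a minimal, single-threaded version under my own assumptions (the class name, parameters, and injectable clock are illustrative choices, not a specific library's API): each request spends one token, and tokens refill at a steady rate up to a maximum.

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, not thread-safe)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so it can be tested without sleeping
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill based on the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: allow bursts of up to 3 requests, refilling 1 token per second.
bucket = TokenBucket(rate=1.0, capacity=3)
for i in range(5):
    print(i, "accepted" if bucket.allow() else "rejected (try again later)")
```

Rejecting a request quickly is the software version of shouting STOP: the caller gets a clear answer right away instead of waiting on a system that is already struggling.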
I’ve seen a few incidents happen in the 3 years I’ve been at my current company.
Usually, it was solved by increasing resources, and then improving our observability and metrics so we wouldn’t get caught off guard again.
Some of them could have been prevented: in our product area, some deadlines are very sensitive and lead to high traffic for a few days. (Know your industry well!)
Thank you for reading!
Adeline.
The Rabbit Hole:
More about Load Shedding. This is completely new to me.
The Serendipity Trap:
I have nothing to share 🥲
How does that happen??
My mind is blank, stuck….I guess I’m still in shock from earlier 😂
To know what’s happening in your system, you need eyes 👀
I've been meaning to learn this tool https://www.svgator.com/ for complex SVG animations, but I haven't found the time to get started.
If you need a resource to share on the topic of monitoring or better observability, I have the article for you: https://cloudnativeengineer.substack.com/p/master-observability-with-logs