Planning the capacity of computer systems: it's hard, but how hard?


Reader survey  The technology of the 2020s is very forgiving, especially if our processing is done in the cloud. By this we mean that if things start to run sub-optimally, the problem is usually pretty easy to fix.

Right click -> Add storage. Right click -> Add RAM.

Job done.

Which is good, but it leads to a temptation: we stop doing capacity planning because the need to do so seems to have gone.

This is the case throughout computing, of course. We get sloppy with algorithm design because today's super-fast processor cores save our bacon through sheer speed. We don't index our databases properly because solid-state storage bails us out when our queries end up doing full table scans. The point is, we get away with this approach most of the time, but certainly not all of the time.

In this survey – see below – we want to know how our readers have dealt with changes in demand for capacity in their systems and, more importantly, how they have handled the capacity planning process. Many of us have had to evolve our systems – especially things like virtual office and VPN services – as users were sent home to work during COVID-19 lockdowns.

But some organizations will have kept their capacity at around the same levels, and it is likely that some will have reduced their capacity, perhaps by exploiting the opportunities to finally dismantle resource-hungry legacy systems.


We are also interested in the science of performance and capacity planning. Most of us have come across systems that performed fine in the test environment but then fell over when brought online – often because the production database was ten times the size of the test one. But did we do anything to predict this?

Did we ask users whether the app was fast enough during testing? Have we, for that matter, taken electronic measurements of performance and resource usage, or perhaps simulated the actions of hundreds of users with automation tools?

This correspondent was a performance tester in a previous life, and can confirm how nice it is to know that an app will scale to 250 users thanks to the stats gathered by a test harness that simulated 250 users hammering it. And after commissioning, did we keep asking users and/or continue our electronic monitoring to measure behavior against expected performance?
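For readers who have never built such a harness, the idea is less exotic than it sounds. Below is a minimal sketch in Python, using only the standard library and a hypothetical login URL, user count, and request count; a real exercise would reach for a purpose-built tool such as JMeter, Locust, or k6 rather than hand-rolled threads.

```python
# Minimal sketch: simulate N "users" hammering an endpoint concurrently
# and report response-time stats. TARGET, USERS, and REQUESTS_PER_USER
# are hypothetical values for illustration only.
import time
import statistics
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "https://app.example.com/login"   # hypothetical endpoint
USERS = 250                                # concurrent simulated users
REQUESTS_PER_USER = 10

def one_user(_):
    """Fire a burst of requests and record each response time in seconds."""
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(TARGET, timeout=10) as resp:
                resp.read()
            timings.append(time.perf_counter() - start)
        except Exception:
            timings.append(float("inf"))   # treat failures as "too slow"
    return timings

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=USERS) as pool:
        results = [t for user in pool.map(one_user, range(USERS)) for t in user]
    ok = [t for t in results if t != float("inf")]
    print(f"requests: {len(results)}, failed: {len(results) - len(ok)}")
    if len(ok) > 1:
        print(f"median: {statistics.median(ok):.3f}s, "
              f"p95: {statistics.quantiles(ok, n=20)[18]:.3f}s")
```

The number worth watching isn't the average response time but the tail (the p95 or p99), because that is what the unluckiest users actually experience once the system is loaded.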

And finally, what do we do in the long term? If you designed a user feedback or software-based monitoring regime during development testing, did you continue to use these tools – or something similar – in the medium to long term? Proactive assessment has obvious benefits, especially if systems are at a point where further scaling would require new hardware or increased costs.
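A "software-based monitoring regime" can also start out very modestly. The sketch below, assuming a Unix host and a made-up log path, simply samples load average and disk headroom once a minute into a CSV that can be graphed later; most shops would instead use Prometheus, Grafana, or their cloud provider's own metrics service, but the principle is the same.

```python
# Minimal sketch: periodically sample basic capacity metrics and append
# them to a CSV for later trend analysis. LOG_FILE and DATA_PATH are
# hypothetical; os.getloadavg() is Unix-only.
import csv
import os
import shutil
import time
from datetime import datetime, timezone

LOG_FILE = "/var/log/capacity_samples.csv"   # hypothetical location
DATA_PATH = "/"                              # filesystem to watch

def sample():
    load1, _load5, _load15 = os.getloadavg()
    usage = shutil.disk_usage(DATA_PATH)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "load_1min": round(load1, 2),
        "disk_used_pct": round(100 * usage.used / usage.total, 1),
    }

if __name__ == "__main__":
    while True:
        row = sample()
        write_header = not os.path.exists(LOG_FILE)
        with open(LOG_FILE, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=row.keys())
            if write_header:
                writer.writeheader()
            writer.writerow(row)
        time.sleep(60)   # one sample per minute
```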

Let us know about your approach, warts and all, by taking part in our short survey below. There are three questions to answer. We'll run the survey for a few days, then post a summary on The Register afterwards.

Don’t feel bad if you tick all of the “we don’t do this” boxes, as there can be many reasons (not least time and cost) for not having a huge monitoring and capacity planning regime. And if you tick all the “we do this in spades” boxes, try not to be too smug … ®
