Deploying
a scale-out solution has required significant engineering effort; the system
developer often needs to handcraft the mechanisms for data partitioning and
reassembly, not to mention the logic to schedule the work across the cluster
and handle individual machine failures.
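To make that effort concrete, the following sketch shows a deliberately simplified version of the plumbing the developer would otherwise have to build by hand: partitioning the input, dispatching the pieces to workers, retrying failed work, and reassembling the results. It is only an illustration; local worker processes stand in for cluster machines, and names such as process_chunk and run_job are invented for the example.

```python
# A minimal sketch (not production code) of hand-rolled scale-out plumbing:
# partition the input, schedule the pieces across workers, retry failures,
# and reassemble the results. Local processes stand in for cluster machines;
# process_chunk and run_job are invented names for this illustration.
from concurrent.futures import ProcessPoolExecutor, as_completed


def process_chunk(chunk):
    # Stand-in for the real per-machine work, e.g. counting words.
    return sum(len(line.split()) for line in chunk)


def partition(records, num_chunks):
    # Naive partitioning: split the input into roughly equal slices.
    size = max(1, len(records) // num_chunks)
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_job(records, num_workers=4, max_retries=2):
    chunks = partition(records, num_workers)
    results = {}
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # Track each in-flight piece of work and how often it has failed.
        pending = {pool.submit(process_chunk, c): (i, c, 0)
                   for i, c in enumerate(chunks)}
        while pending:
            for future in as_completed(list(pending)):
                index, chunk, attempts = pending.pop(future)
                try:
                    results[index] = future.result()
                except Exception:
                    # Crude failure handling: resubmit the chunk, giving up
                    # after max_retries attempts.
                    if attempts >= max_retries:
                        raise
                    retry = pool.submit(process_chunk, chunk)
                    pending[retry] = (index, chunk, attempts + 1)
    # Reassembly: combine the partial results in their original order.
    return sum(results[i] for i in sorted(results))


if __name__ == "__main__":
    data = ["the quick brown fox jumps over the lazy dog"] * 1000
    print(run_job(data))  # total word count across all partitions
```

Even this toy version ignores the harder problems a real cluster raises, such as data locality, stragglers, and partial node failures, which only adds to the engineering burden described above.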
The
traditional approaches to scale-up and scale-out have not been widely adopted
outside large enterprises, government, and academia. The purchase costs are
often high, as is the effort to develop and manage the systems. These factors
alone put them out of the reach of many smaller businesses. In addition, the
approaches themselves have had several weaknesses that have become apparent
over time:
Firstly, as scale-out systems grow large, or as scale-up systems span multiple CPUs, the complexity of managing concurrency becomes a significant burden. Effectively utilizing multiple hosts or CPUs is a very difficult task, and implementing a strategy that keeps them working efficiently throughout the execution of the desired workloads can entail enormous effort.
Secondly, hardware advances—often couched in terms of Moore's law—have begun to highlight discrepancies in system capability. CPU power has grown much faster than network or disk speeds; CPU cycles were once the most valuable resource in the system, but that no longer holds today. Whereas a modern CPU may be able to execute millions of times as many operations as a CPU from 20 years ago,
memory and hard disk speeds have only increased by factors of thousands or even
hundreds. It is quite easy to build a modern system with so much CPU power that
the storage system simply cannot feed it data fast enough to keep the CPUs
busy.
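A back-of-envelope calculation illustrates the imbalance. The figures below (a 1 TB dataset, roughly 150 MB/s of sequential throughput from a single disk, and a CPU sustaining on the order of a billion simple operations per core per second) are assumed round numbers for illustration, not measurements.

```python
# Back-of-envelope illustration of the CPU/disk imbalance. All figures are
# assumed round numbers for illustration, not measurements.
DATASET_BYTES = 1 * 10**12        # a 1 TB dataset
DISK_BYTES_PER_SEC = 150 * 10**6  # sequential throughput of a single disk
CPU_OPS_PER_SEC = 10**9           # simple operations per core per second
CORES = 16

scan_seconds = DATASET_BYTES / DISK_BYTES_PER_SEC
cpu_ops_available = scan_seconds * CPU_OPS_PER_SEC * CORES
ops_per_byte = cpu_ops_available / DATASET_BYTES

print(f"One disk needs {scan_seconds / 3600:.1f} hours to deliver the data")
print(f"CPU operations available in that time: {cpu_ops_available:.1e}")
print(f"That is roughly {ops_per_byte:.0f} operations per byte read")
```

Under these assumptions the cores could perform around a hundred operations for every byte the disk delivers; unless the processing is that heavy, they spend most of their time waiting on storage.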
As just hinted, taking a scale-up approach to scaling is not an open-ended tactic. There is a limit to the size of individual servers that can be purchased from mainstream hardware suppliers, and even the more niche vendors cannot offer an arbitrarily large server. At some point, the workload will increase beyond the capacity of the single, monolithic scale-up server, so then what? The unfortunate answer is that the best approach is to have two large servers instead of one. Then, later, three, four, and so on. Or, in other words, the natural tendency of scale-up architecture is—in extreme cases—to add a scale-out strategy to the mix.
Though this gives some of the benefits of both approaches, it also compounds their costs and weaknesses; instead of requiring either very expensive hardware or the effort to manually develop the cross-cluster logic, this hybrid architecture requires both.
As a consequence of this end-game tendency
and the general cost profile of scale-up architectures, they are rarely used in
the big data processing field, and scale-out architectures are the de facto standard.