a scale-out solution has required significant engineering effort; the system
developer often needs to handcraft the mechanisms for data partitioning and
reassembly, not to mention the logic to schedule the work across the cluster
and handle individual machine failures.
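To make the scale of that handcrafted effort concrete, the following is a minimal single-machine sketch of the three pieces just named: partitioning, scheduling with failure handling, and reassembly. It is an illustration only, not how any particular system does it. The names `partition`, `run_with_retry`, and `scale_out_sum` are hypothetical, threads stand in for cluster hosts, and a `RuntimeError` stands in for a machine failure.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run_with_retry(task, chunk, max_attempts=3):
    """Re-run a failed task; a stand-in for rescheduling work on another host."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except RuntimeError:
            # Assumption: RuntimeError models an individual machine failure.
            if attempt == max_attempts:
                raise


def scale_out_sum(data, n_workers=4, task=sum):
    """Partition the data, schedule chunks on workers, and reassemble the result."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: run_with_retry(task, c), chunks))
    # Reassembly: combine the per-chunk partial results into one answer.
    return sum(partials)
```

Even this toy version must decide how to split the data, what to do when a worker fails, and how to combine partial results. A real cluster adds data movement between hosts, uneven machine speeds, and partial failures on top of that.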
Traditional approaches to scale-up and scale-out have not been widely adopted
outside large enterprises, government, and academia. The purchase costs are
often high, as is the effort to develop and manage the systems. These factors
alone put them out of the reach of many smaller businesses. In addition, the
approaches themselves have several weaknesses that have become apparent.
As scale-out systems get large, or as scale-up systems deal with multiple CPUs,
the difficulties caused by the complexity of concurrency in the systems
become significant. Effectively utilizing multiple hosts or CPUs is a very
difficult task, and implementing the necessary strategy to maintain efficiency
throughout execution of the desired workloads can entail enormous effort.
Hardware advances—often couched in terms of Moore's law—have begun to highlight
discrepancies in system capability. CPU power has grown much faster than
network or disk speeds have; once CPU cycles were the most valuable resource in
the system, but today, that no longer holds. Whereas a modern CPU may be able
to execute millions of times as many operations as a CPU 20 years ago would,
memory and hard disk speeds have only increased by factors of thousands or even
hundreds. It is quite easy to build a modern system with so much CPU power that
the storage system simply cannot feed it data fast enough to keep the CPUs busy.
As just hinted, taking a scale-up approach to scaling is not an open-ended tactic.
There is a limit to the size of individual servers that can be purchased from
mainstream hardware suppliers, and even more niche players can't offer an
arbitrarily large server. At some point, the workload will increase beyond the
capacity of the single, monolithic scale-up server, so then what? The
unfortunate answer is that the best approach is to have two large servers
instead of one. Then, later, three, four, and so on. Or, in other words, the
natural tendency of scale-up architecture is—in extreme cases—to add a
scale-out strategy to the mix.
Though this gives some of the benefits of both
approaches, it also compounds the costs and weaknesses; instead of very
expensive hardware or the need to manually develop the cross-cluster logic,
this hybrid architecture requires both.
As a consequence of this end-game tendency and the general cost profile of
scale-up architectures, they are rarely used in the big data processing field,
and scale-out architectures are the de facto