Big Data Strategies: Share Nothing Approach
Anyone with children will have spent considerable time teaching the little ones that
it's good to share. This principle does not extend into data processing
systems, and the share-nothing idea applies to both data and hardware.
The conceptual view of a scale-out architecture in particular shows individual
hosts, each processing a subset of the overall data set to produce its portion
of the final result. Reality is rarely so straightforward. Instead, hosts may
need to communicate between each other, or some pieces of data may be required
by multiple hosts. These additional dependencies create opportunities for the
system to be negatively affected in two ways: bottlenecks and an increased risk of
failure. If a single piece of data or individual server is required by every calculation in the
system, there is a likelihood of contention and delays as the competing clients
access the common data or host. If, for example, in a system with 25 hosts
there is a single host that must be accessed by all the rest, the overall
system performance will be bounded by the capabilities of this key host.
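The throughput cap imposed by such a "key" host can be made concrete with a toy model; the host count, per-host rates, and request figures below are invented purely for illustration:

```python
# Hypothetical model: 25 worker hosts, each capable of 1,000 requests/s,
# but every request must also touch one shared "key" host that can serve
# only 2,000 requests/s. All figures are illustrative, not measurements.

def effective_throughput(workers, per_worker_rps, shared_host_rps):
    # Aggregate worker capacity scales with cluster size...
    worker_capacity = workers * per_worker_rps
    # ...but the shared dependency caps the whole system.
    return min(worker_capacity, shared_host_rps)

print(effective_throughput(25, 1000, 2000))  # capped at 2000, not 25000
```

Adding more workers past the cap changes nothing; only removing the shared dependency does.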
Worse still, if this "hot" server or storage system holding the key data
fails, the entire workload will collapse in a heap. Earlier cluster solutions
often demonstrated this risk; even though the workload was processed across a
farm of servers, they often used a shared storage system to hold all the data.
Instead of sharing resources, the individual components of a system should be as
independent as possible, allowing each to proceed regardless of whether others
are tied up in complex work or are experiencing failures.
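One common way to achieve this independence is to partition the data so that each host owns a disjoint subset and needs nothing from its peers until the final combine step. A minimal sketch, with made-up host names and records:

```python
from hashlib import md5

# Share-nothing partitioning sketch: each record is routed to exactly one
# host by hashing its key, so hosts can process their partitions without
# touching shared state. Host names and records are invented.

HOSTS = ["host-0", "host-1", "host-2", "host-3"]

def owner(key):
    # A stable hash keeps each key on the same host across runs.
    return HOSTS[int(md5(key.encode()).hexdigest(), 16) % len(HOSTS)]

records = ["alpha", "bravo", "charlie", "delta", "echo"]
partitions = {h: [] for h in HOSTS}
for r in records:
    partitions[owner(r)].append(r)

# Each host computes its partial result independently; the partials are
# combined only at the end.
partials = {h: len(p) for h, p in partitions.items()}
print(sum(partials.values()))  # 5 — every record counted exactly once
```

Because no record lives on two hosts, a slow or busy host delays only its own partition, not the whole computation.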
In software development, this approach of embracing failure is often one of the
most difficult aspects of big data systems for developers to fully appreciate.
This is also where the approach diverges most strongly from scale-up
architectures. One of the main reasons for the high cost of large scale-up servers
is the amount of effort that goes into mitigating the impact of component
failures. Even low-end servers may have redundant power supplies, but in a big
iron box, you will see CPUs mounted on cards that connect across multiple
backplanes to banks of memory and storage systems. Big iron vendors have often
gone to extremes to show how resilient their systems are by doing everything
from pulling out parts of the server while it's running to actually shooting a
gun at it. But if the system is built in such a way that, instead of treating
every failure as a crisis to be mitigated, each failure is reduced to
irrelevance, a very different architecture emerges.
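What "reducing failure to irrelevance" looks like in practice can be sketched as a scheduler that simply retries a task when a host is lost, rather than halting the job. The failure rate, seed, and task names below are invented for illustration:

```python
import random

# Sketch of treating failure as routine: a task that fails on one host is
# rescheduled instead of aborting the whole workload. The 30% failure
# rate and the task list are illustrative assumptions.

random.seed(42)

def run_on_host(task):
    # Simulate an unreliable host: roughly 30% of attempts fail.
    if random.random() < 0.3:
        raise RuntimeError(f"host lost while running {task}")
    return f"done:{task}"

def run_with_retries(task, attempts=5):
    for _ in range(attempts):
        try:
            return run_on_host(task)
        except RuntimeError:
            continue  # reschedule; the individual failure is irrelevant
    raise RuntimeError(f"{task} failed on every attempt")

results = [run_with_retries(t) for t in ["t1", "t2", "t3", "t4"]]
print(results)  # all four tasks complete despite transient host failures
```

The job's correctness no longer depends on any single host surviving; failures cost a retry, not an outage.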