Anyone with children will have spent considerable time teaching the little ones that it's good to share. That principle does not carry over to data processing systems, where sharing is best avoided for both data and hardware.
The conceptual view of a scale-out architecture shows individual hosts, each processing a subset of the overall data set to produce its portion of the final result. Reality is rarely so straightforward. Instead, hosts may need to communicate with each other, or some pieces of data may be required by multiple hosts. These additional dependencies can hurt the system in two ways: bottlenecks and increased risk of failure.
If a piece of data or an individual server is required by every calculation in the system, contention and delays are likely as the competing clients access the common data or host. If, for example, a system with 25 hosts has a single host that must be accessed by all the rest, the overall system performance will be bounded by the capabilities of this key host.
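The bound described above can be illustrated with a little arithmetic. The following is a minimal sketch, not a performance model: the function name, its parameters, and all the rates are hypothetical, and it assumes every task makes a fixed number of requests to the shared host.

```python
def effective_throughput(workers, per_worker_rate, shared_capacity, shared_calls_per_task):
    """Return the tasks per second the whole system can complete.

    All parameters are illustrative assumptions:
      workers               -- number of hosts doing the processing
      per_worker_rate       -- tasks/s one worker could manage on its own
      shared_capacity       -- requests/s the single shared host can serve
      shared_calls_per_task -- requests each task makes to the shared host
    """
    # What the workers could achieve if nothing were shared.
    unconstrained = workers * per_worker_rate
    if shared_calls_per_task == 0:
        return unconstrained
    # The shared host caps how many tasks per second can be served.
    shared_limit = shared_capacity / shared_calls_per_task
    return min(unconstrained, shared_limit)


# 24 workers plus one key host that serves 600 requests/s (made-up figures).
print(effective_throughput(24, 100, 600, 1))  # capped by the key host
print(effective_throughput(48, 100, 600, 1))  # doubling the workers changes nothing
print(effective_throughput(24, 100, 600, 0))  # no shared host: scales with workers
```

With these made-up figures, adding workers beyond the point where the key host saturates buys no extra throughput, which is the sense in which the system is bounded by that host.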
Worse still, if this "hot" server or storage system holding the key data fails, the entire workload will collapse in a heap. Earlier cluster solutions often demonstrated this risk; even though the workload was processed across a farm of servers, they often used a shared storage system to hold all the data.
Instead of sharing resources, the individual components of a system should be as independent as possible, allowing each to proceed regardless of whether others are tied up in complex work or are experiencing failures.
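As a rough illustration of this independence, the sketch below processes each partition of data using only that partition's own records, so a failure in one does not stop the others. It is a toy under stated assumptions: the host names, the data, and both function names are hypothetical, and a simple loop stands in for what would be separate machines in a real cluster.

```python
def process_partition(records):
    """Work that depends only on this partition's own records."""
    return sum(records)


def run_independently(partitions):
    """Process every partition; record failures instead of aborting the run."""
    results, failures = {}, {}
    for host, records in partitions.items():
        try:
            results[host] = process_partition(records)
        except Exception as exc:
            # Only this partition is affected; the others carry on.
            failures[host] = type(exc).__name__
    return results, failures


# Hypothetical data: host-2 holds a corrupt record that makes its work fail.
partitions = {
    "host-1": [1, 2, 3],
    "host-2": [4, None, 6],
    "host-3": [7, 8, 9],
}

results, failures = run_independently(partitions)
print(results)   # partitions that completed
print(failures)  # partitions that failed, with the error type
```

Because no partition reads another's data or waits on a common resource, the failed partition can be retried elsewhere without disturbing the work already completed.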
Accepting that components will fail, rather than trying to prevent every failure, is often one of the most difficult aspects of big data systems for developers to fully appreciate. This is also where the approach diverges most strongly from scale-up architectures. One of the main reasons for the high cost of large scale-up servers is the amount of effort that goes into mitigating the impact of component failures. Even low-end servers may have redundant power supplies, but in a big iron box, you will see CPUs mounted on cards that connect across multiple backplanes to banks of memory and storage systems. Big iron vendors have often gone to extremes to show how resilient their systems are, doing everything from pulling out parts of the server while it's running to actually shooting a gun at it. But if the system is built so that each failure is reduced to irrelevance, instead of being treated as a crisis to be mitigated, a very different architecture emerges.