Big Data: ACID versus BASE
Most of us probably remember studying the difference between “ACID” and “BASE” in school chemistry. There we used to tell the two apart with certain chemical reactions and compounds (the litmus paper test). Here we look at the same pair of names again, but this time the difference is about data storage: its characteristics, properties and design.
One hallmark of relational database systems is something known as ACID compliance. As you might have guessed, ACID is an acronym. Each letter stands for a property of an individual database transaction, as described in the following list:
· Atomicity: Every database transaction must either succeed completely or fail completely. Partial success is not allowed.
· Consistency: A database transaction must take the RDBMS from one valid state to another. The state is never invalid or unknown.
· Isolation: Each user’s database transaction must execute in isolation from the transactions of other users working with the RDBMS at the same time.
· Durability: Once a transaction completes successfully, its data changes must be persisted to non-volatile storage (secondary storage that retains information even when powered off, such as a hard disk). Transaction failures cannot leave the data in a partially committed state.
A good use case for RDBMSs is online transaction processing, which depends on ACID-compliant transactions between the users and the RDBMS for the system to function correctly. A classic example of an ACID-compliant transaction is a transfer of funds from one bank account to another. The transfer breaks down into two database operations: a withdrawal from the originating account and a deposit into the destination account. Both operations have to be bound together in a single transaction, so that if either of them fails, the whole transaction fails and both balances remain valid.
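The funds transfer can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration under stated assumptions: the `accounts` table, the account names, and the helper functions `make_bank`, `transfer` and `balances` are all made up for this example, and a `CHECK` constraint stands in for the bank's "no overdraft" rule. The deposit is deliberately applied first, so that a failed withdrawal shows the rollback undoing work that had already been done.

```python
import sqlite3

def make_bank():
    """Create a hypothetical in-memory bank with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as a single atomic transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The withdrawal violated the CHECK constraint; the deposit
        # made just before it has been rolled back as well.
        return False

def balances(conn):
    """Return the current balances as a dict of account id to balance."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 30)` on a fresh bank moves the money and returns `True`. Calling it with an amount larger than the source balance returns `False` and leaves both balances exactly as they were, which is the atomicity property at work.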
The key idea behind NoSQL data stores is that not every application truly requires ACID-compliant transactions. Relaxing certain ACID properties (and moving away from the relational model) opens up a wealth of possibilities, and it has enabled many NoSQL data stores to achieve high scalability and performance for their niche applications. ACID defines the key characteristics needed for reliable transaction processing; the NoSQL world, by contrast, needs different characteristics to provide flexibility and scalability. These opposing properties are cleverly captured in the acronym BASE:
· Basically Available: The system is guaranteed to be available for querying by all clients at all times. (No isolation here.)
· Soft State: The data stored in the system may change over time because of the eventual consistency model, as explained in the next bullet.
· Eventually Consistent: As data is added to the system, the system’s state is gradually replicated across all nodes. For instance, in Hadoop, when a file is written to HDFS, the replicas (copies) of the data blocks are created on different data nodes after the original data blocks have been written. For the short period before the blocks are replicated, the state of the file system isn’t consistent.
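The eventual consistency idea can be illustrated with a toy in-memory model. This is a minimal sketch and not how HDFS or any particular NoSQL store actually replicates data: the `Replica` and `Cluster` classes and their methods are hypothetical names invented for this example. A write lands on one replica immediately and is copied to the others only when `replicate()` runs, so reads from different replicas can disagree for a short while.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy store: writes hit the first replica and reach the rest later."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []  # (replica, key, value) updates not yet applied

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value  # visible on the primary right away
        # Queue the update for every other replica instead of applying it.
        self.pending.extend((r, key, value) for r in others)

    def read(self, replica_index, key):
        # Always answers, even if this replica has not caught up yet.
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        """Apply all queued updates, bringing every replica up to date."""
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()

    def is_consistent(self):
        first = self.replicas[0].data
        return all(r.data == first for r in self.replicas)
```

After `cluster.write("k", "v")` on a three-node cluster, `cluster.read(0, "k")` returns the new value while `cluster.read(2, "k")` still returns `None`. Once `cluster.replicate()` has run, every replica returns the same answer. Every read gets a response (basically available), the answer a replica gives can change without a new write to it (soft state), and the replicas converge in the end (eventually consistent).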
The BASE acronym is a bit contrived, as the majority of NoSQL data stores don’t abandon the ACID properties completely. In simple words, BASE is not really the polar opposite of ACID that the name implies. Also, the Soft State and Eventually Consistent properties amount to much the same thing. The important point is that by relaxing consistency, the system can be scaled horizontally (across many nodes) while still ensuring availability.