Big Data: NoSQL Data Stores

NoSQL data stores grew out of the slogan “Just Say No to SQL,” a reaction to the perceived limitations of SQL-based relational database management systems (RDBMSs). It is not that these developers hated SQL; they were tired of forcing square pegs into round holes by solving problems that relational databases weren’t designed for. A relational database is a very powerful tool, but for several kinds of data (for example, key-value pairs or graphs) and a few usage patterns (such as extremely large-scale storage), a relational database just isn’t practical. For high-volume storage in particular, relational databases can be very costly, in terms of both database license costs and hardware costs. (Relational databases are typically engineered to run on enterprise-grade hardware.) So, with the NoSQL movement, developers built dozens of solutions for distinct types of thorny data storage and processing problems. These NoSQL databases typically provide massive scalability through clustering, and are often architected for high throughput and low latency.

The NoSQL data stores currently available can be broken down into four categories, based on their design and purpose:

Key-value stores:

This kind of data store provides a mechanism to store any kind of data without having to use a schema. In contrast, in a relational database we need to define the schema (the table structure) before inserting any data. Because key-value stores don’t need a schema, they offer great flexibility to store data in many formats. In a key-value store, a row (or record) simply consists of a key (an identifier) and a value, which can be anything from an integer to a large binary string. Several implementations of key-value stores are based on Amazon’s Dynamo paper. Redis and Riak are widely used key-value data stores.
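To make the key/value idea concrete, here is a minimal in-memory sketch in Python. The `KeyValueStore` class and its `put`, `get`, and `delete` methods are hypothetical names chosen for illustration; they are not the API of Redis, Riak, or any other product, and the sketch ignores persistence and clustering entirely.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only, not a real product's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # No schema check: the value is stored as an opaque object.
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Look the value up by its key; return `default` if the key is absent.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Return True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("page:home:hits", 1024)                              # an integer
store.put("user:42:avatar", bytes([0x89, 0x50, 0x4E, 0x47]))    # raw binary data
store.put("user:42:profile", {"name": "Ada", "langs": ["en"]})  # a nested structure
```

Note that the three values have completely different shapes, yet nothing had to be declared up front; the only thing the store knows about each entry is its key.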

Column family stores:

These are databases in which columns are grouped into column families and stored together on disk. Strictly speaking, many of these databases aren’t column-oriented, because they’re based on Google’s BigTable paper, which stores data as a multidimensional sorted map. Examples include Cassandra and HBase.
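The “multidimensional sorted map” can be pictured as a map from (row key, column family, column qualifier, timestamp) to a value, kept in key order. The Python sketch below is a simplified illustration under that assumption; `SortedMapTable` and its methods are hypothetical names, and real systems add on-disk storage, compaction, and distribution that are not shown here.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (row key, column family, column qualifier, timestamp)
CellKey = Tuple[str, str, str, int]


class SortedMapTable:
    """Toy BigTable-style table: a map of cell keys to values, read in sorted order."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are declared up front; qualifiers inside them are free-form.
        self._families = set(families)
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, family: str, qualifier: str,
            timestamp: int, value: bytes) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._cells[(row, family, qualifier, timestamp)] = value

    def get_latest(self, row: str, family: str, qualifier: str) -> Optional[bytes]:
        # Collect every version of the cell and return the newest one.
        versions = [(key[3], value) for key, value in self._cells.items()
                    if key[:3] == (row, family, qualifier)]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, start_row: str, end_row: str) -> List[Tuple[CellKey, bytes]]:
        # Return cells whose row key is in [start_row, end_row), in sorted key order.
        hits = [(key, value) for key, value in self._cells.items()
                if start_row <= key[0] < end_row]
        return sorted(hits, key=lambda item: item[0])


table = SortedMapTable(["contents", "anchor"])
table.put("com.example.www", "contents", "html", 1, b"<html>v1</html>")
table.put("com.example.www", "contents", "html", 2, b"<html>v2</html>")
table.put("com.example.blog", "anchor", "com.example.www", 1, b"Home")
```

Because cells are ordered by row key first, a range scan over neighbouring row keys touches adjacent entries, which is why row key design matters so much in this family of stores.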

Document stores:

These kinds of data stores rely on collections of similarly encoded and formatted documents to improve efficiency. Document stores allow individual documents in a collection to include only a subset of fields, so only the data that’s required is stored. For sparse data sets, in which many fields are often not populated, this can translate into significant space savings. In contrast, empty columns in relational database (RDBMS) tables do take up space. Document stores also provide schema flexibility, since only the fields that are required are stored, and new fields can be added at any time. Again, this contrasts with relational databases, where table structures and schemas are defined up front before data is stored, and changing columns is a messy task that impacts the entire data set. JSON is a very popular format for document-based data stores and is widely used in MongoDB, a document-based NoSQL database.
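A small Python sketch can show both points: documents in one collection need not share the same fields, and unpopulated fields simply aren’t stored. The `Collection` class, its `insert` and `find` methods, and the `as_fixed_row` helper are hypothetical names for illustration only; this is not MongoDB’s API.

```python
import json
from typing import Any, Dict, List


class Collection:
    """Toy document collection: each document is a dict with its own set of fields."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def insert(self, doc: Dict[str, Any]) -> None:
        # Store a copy; no schema is enforced, so any field names are accepted.
        self._docs.append(dict(doc))

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Return documents that contain every requested field with the given value.
        return [doc for doc in self._docs
                if all(k in doc and doc[k] == v for k, v in criteria.items())]


def as_fixed_row(doc: Dict[str, Any], columns: List[str]) -> Dict[str, Any]:
    # Mimic a fixed table layout: every column is present, missing ones become null.
    return {column: doc.get(column) for column in columns}


people = Collection()
people.insert({"name": "Ada", "city": "London"})
people.insert({"name": "Lin", "city": "Pune", "phone": "555-0100"})
people.insert({"name": "Sam", "nickname": "sammy"})  # a new field, no migration

sparse = {"name": "Sam", "nickname": "sammy"}
padded = as_fixed_row(sparse, ["name", "city", "phone", "nickname"])
sparse_size = len(json.dumps(sparse))
padded_size = len(json.dumps(padded))
```

Here `padded` carries explicit nulls for `city` and `phone`, so its JSON encoding is longer than that of `sparse`, which is a rough analogue of the space argument made above.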

Graph databases:

These are databases that store graph structures: representations of collections of objects (vertices or nodes) and their relationships (edges) with each other. These structures make graph databases extremely well suited for storing complex structures, such as the linking relationships between all known web pages. (For example, individual web pages act as nodes, and the edges connecting them are the links from one page to another.) Google, of course, is all over graph technology, and implemented a graph processing engine known as Pregel to power its PageRank algorithm. In the Hadoop ecosystem, there’s an Apache project called Giraph (based on the Pregel paper), which is a graph processing engine designed to process graphs stored in HDFS. A well-known example of a graph-based data store is Neo4j.
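The web-page example can be sketched in Python as an adjacency list, with a simplified textbook version of PageRank run over it. This is only an illustration: the `pagerank` function, the damping factor of 0.85, the iteration count, and the page names are assumptions made for the sketch, and it is not Google’s implementation, Pregel, Giraph, or Neo4j.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Simplified PageRank by repeated iteration over an adjacency list."""
    # Nodes are every page that links out or is linked to.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    # Start with rank spread evenly across all pages.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small base share, independent of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page passes its rank in equal parts along its outgoing edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Nodes are pages; each list holds the pages that a page links to (its edges).
web = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
```

In this tiny graph, `c.html` is linked from both other pages, so it ends up with the highest score; the scores across all pages sum to 1.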


zack mathews
