RDBMS vs NOSQL for Data Warehouse

According to the recent IBM research, the 90% of the data that exist on this planet has been created in last two years. We are creating everyday around 2.5 quintillion bytes of data . Data that is coming from everywhere and everything that we do. From doing bank transaction to social networking to social media, every moment we are making this data grow. Bigger , bigger and even more bigger everyday. Now, extracting information out of this data is also becoming interesting. For ages Codd’s twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases, designed to define what is required from a database management system in order for it to be considered relational are NO more required for a lot of application of modern web. Because these rules have been written assuming databases to be Atomic , Consistent , isolation and Durable. Now, we need database that can be V3, the new rule of modern databases :

1. V1 : Volume of data
1. Database should be capable enough to handle the large volume of data starting from several terabytes to several pentabytes.

2. V2 : Velocity : Velocity of data
1. The database should be able to store data without any failure even if the data is coming in with high velocity. The database should be good enough to handle data coming at a speed ranging from several MB per second to several GB per seconds.

3. V3 : Variety of Data
1. The database should be able to handle a wide variety of data. It can be primitive database or can be a complex JSON or any image, it needs to be equally good in handling all type of data.

Over and above these V3 modern database should be easy to scale almost to infinity. But the big question is how to make this ACID to V3 transaction. The question is more important, because sometimes application need to be ACIDic rather then V3idc because :
1. Some application need to be accurate about the time of data arrival, like banks, ticket booking website etc.
2. Some application are more concerned about the velocity at which data is being written to the database like social networking website. The data timestamp does matter much, but data need to be updated faster.
3. Some application want faster access to data and won’t bother about frequent update of database.

So, V3 database are good for you, if you need your database to be more fast at loading / pulling data, i.e. read/write on database are more frequent then the update then you need a database that is following v3 principal. On the other hand if you are developing a application which to more frequent in update then read/write, then ACID based database is more better to use.

Here is a short analysis of performance of MongoDB (NoSQL database) and MySQL (Relational Database) on Debain 64 Bit OS powered with Intel i3 processor with some sample data.

Red : MongoDB
Blue : MySQL

Red : MongoDB
Blue : MySQL

Red : MongoDB
Blue : MySQL

Green : MongoDB
Blue : MySQL

Red : MongoDB
Blue : MySQL

We can see that when it comes to insert and update operation then MongoDB is way faster then MySQL thats why it is sometime more apt for data warehouse, where we generally only insert and read data, rather then any data updates.

  • leelakrishna

    I came across this doing some research on NoSQL DWH. This metrics comparison is all nice but where is the Reporting piece of DWH measured. What does it take to build a multi-dimensional OLAP Cube in mongo db and query it multiple times at distint dimesions.? Would be nice to see a comparison there. Mongo discourages joins in favour of denormalized doc stores. How will this workout for an OLAP cube? Just a thought.