RDBMS vs NoSQL for Data Warehouse

According to recent IBM research, 90% of the data that exists on this planet was created in the last two years. We create around 2.5 quintillion bytes of data every day, and it comes from everywhere and from everything that we do. From bank transactions to social networking to social media, every moment we make this data grow bigger and bigger. Extracting information out of this data is becoming interesting too.

For ages, relational systems were judged against Codd's twelve rules, a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases, to define what is required of a database management system for it to be considered relational. Many modern web applications no longer need these rules, because the rules were written with databases in mind that are Atomic, Consistent, Isolated and Durable (ACID). What we need now is a database that satisfies V3, the new rule of modern databases:

1. V1 : Volume of data
1. The database should be capable of handling large volumes of data, from several terabytes to several petabytes.

2. V2 : Velocity of data
1. The database should be able to store data without any failure even when the data arrives at high velocity. It should handle data coming in at speeds ranging from several MB per second to several GB per second.

3. V3 : Variety of data
1. The database should be able to handle a wide variety of data. Whether it is a primitive data type, a complex JSON document or an image, the database needs to handle all types of data equally well.
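To put the volume and velocity figures in perspective, here is a small back-of-the-envelope calculation. It only converts the 2.5 quintillion bytes per day quoted earlier into a per-second rate; the helper names are made up for this illustration.

```python
# Convert the worldwide daily data volume into a per-second rate.
BYTES_PER_DAY = 2.5e18           # 2.5 quintillion bytes, the figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def bytes_per_second(bytes_per_day: float) -> float:
    """Average rate if the daily volume arrived evenly over the day."""
    return bytes_per_day / SECONDS_PER_DAY


def human(n: float) -> str:
    """Format a byte count using decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} EB"


rate = bytes_per_second(BYTES_PER_DAY)
print(human(rate) + " per second")
```

That works out to roughly 29 TB per second across the whole planet. A single database sees only a tiny slice of that, which is where the "several MB to several GB per second" range for V2 comes from.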

On top of these three Vs, a modern database should be easy to scale almost without limit. But the big question is how to make the transition from ACID to V3. The question matters because some applications need to be ACID-compliant rather than V3-oriented, and requirements differ from one application to the next:
1. Some applications need to be accurate about the time of data arrival, such as banks and ticket-booking websites.
2. Some applications are more concerned about the velocity at which data is written to the database, such as social networking websites. The data timestamp does not matter much, but the data needs to be written quickly.
3. Some applications want faster access to data and are not concerned with frequent updates to the database.

So a V3 database is a good fit if you need your database to be fast at loading and pulling data, i.e. if inserts and reads are more frequent than updates. On the other hand, if you are developing an application in which updates are more frequent than inserts and reads, an ACID-based database is the better choice.
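To make the ACID side of this trade-off concrete, here is a minimal sketch of an atomic update, the kind of guarantee a bank needs. It uses Python's built-in `sqlite3` module purely as a convenient relational stand-in; the `accounts` table, the names and the amounts are invented for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves something to roll back.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; both updates are undone.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, rolled back
print(ok, failed, balances(conn))
```

The second transfer credits bob and then fails on the debit, yet bob's balance is unchanged afterwards. That all-or-nothing behaviour is what an update-heavy application pays for when it chooses an ACID database.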

Here is a short performance analysis of MongoDB (a NoSQL database) and MySQL (a relational database) on a 64-bit Debian OS running on an Intel i3 processor, using some sample data.

[Five charts comparing MongoDB and MySQL. Legend: MongoDB in red (green in the fourth chart), MySQL in blue.]

We can see that for insert and update operations MongoDB is much faster than MySQL. That is why it is sometimes a better fit for a data warehouse, where we generally only insert and read data rather than update it.
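If you want to run a comparison like this yourself, the measurement itself is simple. The sketch below is not the benchmark used for the charts; it is a generic timing harness, demonstrated against an in-memory SQLite database only because that ships with Python. The table name, the row count `N` and the payload strings are arbitrary, and you would replace the three workload functions with calls to your own MongoDB and MySQL clients.

```python
import sqlite3
import time

N = 10_000  # arbitrary number of rows for the sample workload


def timed(label, fn):
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return elapsed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")


def bulk_insert():
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO events VALUES (?, ?)",
            ((i, f"row-{i}") for i in range(N)),
        )


def bulk_update():
    with conn:
        conn.executemany(
            "UPDATE events SET payload = ? WHERE id = ?",
            ((f"upd-{i}", i) for i in range(N)),
        )


def full_read():
    return conn.execute("SELECT * FROM events").fetchall()


results = {
    "insert": timed("insert", bulk_insert),
    "update": timed("update", bulk_update),
    "read": timed("read", full_read),
}
```

Run each workload several times and on the same machine for both databases, since a single run on a small data set says very little.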

NoSQL and SQL Databases – Presentation Transcript

  1. Seminar and Progress Report: A comparison between SQL (Conventional) & NoSQL (WebScale) Databases using various scenarios. Gaurav Paliwal, 0071641507, B.Tech (Information Technology), 8th Semester
  2. What is NoSQL: This does NOT mean “No SQL”
  3. What is NoSQL: This does mean “Not ONLY SQL”
  4. Why is Not only SQL: Google once ran off of 40,000 MySQL installations
  5. Why is Not only SQL: Facebook was at one point spending $1M per month for specialized database hardware to serve their pictures.
  6. Why is Not only SQL: These unviable solutions led to a re-evaluation of existing database technologies and led to the Not-Only-SQL (NoSQL) movement.
  7. The rise of Not only SQL – 1: Google's Way
  8-11. The rise of Not only SQL – 1
      1. Google invented the BigTable database.
      2. BigTable maps two arbitrary string values (row key and column key) and a timestamp (hence a three-dimensional mapping) into an associated arbitrary byte array.
      3. It is not a relational database and can be better defined as a sparse, distributed multi-dimensional sorted map.
      4. BigTable is designed to scale into the petabyte range across “hundreds or thousands of machines, and to make it easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration”.
  12. The rise of Not only SQL – 2: Facebook's Way
  13-16. The rise of Not only SQL – 2
      1. Cassandra is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature.
      2. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
      3. Cassandra is an open source distributed database management system.
      4. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.
  17. The rise of Not only SQL – Others: Hadoop / HBase, Hypertable, Amazon SimpleDB, MongoDB, Terrastore, CouchDB, MemcacheDB, and many others (the list is endless).
  18. Benefits of NoSQL Databases 1. Elastic scaling
  19. Benefits of NoSQL Databases 2. Big data
  20. Benefits of NoSQL Databases 3. Goodbye DBAs
  21. Benefits of NoSQL Databases 4. Economics
  22. Benefits of NoSQL Databases 5. Flexible data models
  23. NoSQL comparison with SQL 1. ACID
  24. NoSQL comparison with SQL 2. CAP
  25. NoSQL comparison with SQL 3. Maturity
  26. NoSQL comparison with SQL 4. Support
  27. NoSQL comparison with SQL 5. Analytics and business intelligence
  28. NoSQL comparison with SQL 6. Administration
  29. NoSQL comparison with SQL 7. Expertise
  30. NoSQL comparison with SQL Practical “Head On”
  31. NoSQL comparison with SQL Questions
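
The slides describe BigTable as a sparse, sorted map from (row key, column key, timestamp) to an arbitrary byte array. Here is a toy, single-process sketch of that data model to show what the three-dimensional mapping means in practice. It is in no way Google's implementation, it is not distributed, and the class name, method names and sample keys are all invented for illustration.

```python
import bisect


class SparseSortedMap:
    """Toy model of the mapping (row key, column key, timestamp) -> bytes."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> bytes; absent cells take no space

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (latest version if None)."""
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect.bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_row(self, row):
        """Yield (column, timestamp, value) for one row, in sorted order."""
        i = bisect.bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            r, column, neg_ts = self._keys[i]
            yield column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SparseSortedMap()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 3, b"<html>v3</html>")
table.put("com.example/index", "anchor:home", 2, b"Home")

print(table.get("com.example/index", "contents:html"))               # latest version
print(table.get("com.example/index", "contents:html", timestamp=2))  # version as of t=2
print(list(table.scan_row("com.example/index")))
```

Because the keys are kept sorted, all cells of one row sit next to each other, and a cell that was never written simply does not exist. Those two properties are what "sorted" and "sparse" refer to.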