RDBMS vs NoSQL for Data Warehouse

According to recent IBM research, 90% of the data that exists on this planet was created in the last two years. We create around 2.5 quintillion bytes of data every day. This data comes from everywhere and from everything that we do. From bank transactions to social networking to social media, every moment we are making this data grow bigger and bigger. Extracting information out of this data is also becoming interesting.

For ages, relational databases were judged against Codd's twelve rules, a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases, to define what is required from a database management system in order for it to be considered relational. These rules are no longer required for a lot of modern web applications, because they were written assuming databases to be ACID: Atomic, Consistent, Isolated and Durable. Now we need databases that are V3, the new rule of modern databases :

1. V1 : Volume of data
1. The database should be capable of handling large volumes of data, from several terabytes to several petabytes.

2. V2 : Velocity of data
1. The database should be able to store data without any failure even if the data arrives at high velocity. It should handle data arriving at rates ranging from several MB per second to several GB per second.

3. V3 : Variety of data
1. The database should be able to handle a wide variety of data. Whether it is primitive data, a complex JSON document or an image, the database needs to be equally good at handling all types of data.
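To get a feel for how the volume and velocity figures relate, here is a minimal back-of-the-envelope sketch. The class name, the decimal unit constants and the sample rates are illustrative assumptions, not measurements from this post.

```java
import java.time.Duration;

/** Back-of-the-envelope sizing for the volume and velocity figures above. */
final class IngestEstimate {

    // Decimal (SI) units; choosing decimal rather than binary units is an assumption.
    static final long MB = 1_000_000L;
    static final long GB = 1_000L * MB;
    static final long TB = 1_000L * GB;
    static final long PB = 1_000L * TB;

    /** Time needed to write totalBytes at a sustained rate of bytesPerSecond. */
    static Duration timeToIngest(long totalBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        // Integer division: any fraction of a second is dropped.
        return Duration.ofSeconds(totalBytes / bytesPerSecond);
    }

    public static void main(String[] args) {
        System.out.println("1 TB at 10 MB/s: " + timeToIngest(TB, 10 * MB).toHours() + " hours");
        System.out.println("1 PB at 1 GB/s:  " + timeToIngest(PB, GB).toDays() + " days");
    }
}
```

Even at a sustained 1 GB per second, a single petabyte takes about a million seconds (more than eleven days) to load, which is why volume and velocity have to be considered together.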

Over and above these three Vs, a modern database should be easy to scale almost to infinity. But the big question is how to make the transition from ACID to V3. The question is important because applications have different needs, and some need to be ACID rather than V3 :
1. Some applications need to be accurate about the time of data arrival, such as banks and ticket booking websites.
2. Some applications are more concerned about the velocity at which data is written to the database, such as social networking websites. The data timestamp does not matter much, but the data needs to be written faster.
3. Some applications want faster access to data and are not concerned with frequent updates to the database.

So, a V3 database is good for you if you need your database to be fast at loading and pulling data, i.e. if reads and writes (inserts) on the database are more frequent than updates. On the other hand, if you are developing an application that updates data more frequently than it reads or writes it, then an ACID-based database is the better choice.
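The rule of thumb above can be written down as a small decision helper. This is only a sketch: the class, the enum and the simple "more updates than reads and writes" threshold are hypothetical names and assumptions chosen for illustration, and a real decision would weigh many more factors.

```java
/** Hypothetical helper that encodes the rule of thumb from the text above. */
final class StoreChooser {

    enum StoreKind { ACID, V3 }

    /**
     * @param needsStrictAccuracy true for bank-like or booking-like applications
     * @param readsAndWrites      expected number of reads plus inserts
     * @param updates             expected number of in-place updates
     */
    static StoreKind recommend(boolean needsStrictAccuracy, long readsAndWrites, long updates) {
        if (readsAndWrites < 0 || updates < 0) {
            throw new IllegalArgumentException("operation counts must not be negative");
        }
        // Assumption: a tie between the two counts falls on the V3 side.
        if (needsStrictAccuracy || updates > readsAndWrites) {
            return StoreKind.ACID;
        }
        return StoreKind.V3;
    }

    public static void main(String[] args) {
        System.out.println("Read-heavy feed:   " + recommend(false, 9_000, 1_000));
        System.out.println("Update-heavy app:  " + recommend(false, 1_000, 9_000));
        System.out.println("Banking ledger:    " + recommend(true, 9_000, 1_000));
    }
}
```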

Here is a short performance analysis of MongoDB (a NoSQL database) and MySQL (a relational database) on a 64-bit Debian OS running on an Intel i3 processor, with some sample data.

[Five charts comparing MongoDB and MySQL. Legend: MongoDB in red (green in the fourth chart), MySQL in blue.]

We can see that when it comes to insert and update operations, MongoDB is way faster than MySQL. That is why it is sometimes more apt for a data warehouse, where we generally only insert and read data rather than update it.

Hibernate Mapping file #tutorial

In this small tutorial I am going to show you how a Hibernate mapping file can be created when a SQL relation is given to you.

Let us say you have the following :

CREATE TABLE `feedback_feedback` (
`feedback_id` int(11) NOT NULL AUTO_INCREMENT,
`creator` int(11) NOT NULL,
`subject` varchar(255) COLLATE utf8_bin NOT NULL,
`content` varchar(5000) COLLATE utf8_bin NOT NULL,
`severity` varchar(25) COLLATE utf8_bin NOT NULL,
`comment` varchar(5000) COLLATE utf8_bin DEFAULT NULL,
`status` varchar(25) COLLATE utf8_bin DEFAULT NULL,
`date_created` date NOT NULL,
`date_changed` date DEFAULT NULL,
PRIMARY KEY (`feedback_id`)
)

 

and the following POJO :

package org.openmrs.module.feedback;

import java.util.Date;

/*
Pojo file for feedback_feedback relation in Feedback Module
*/

public class FeedbackFeedback implements java.io.Serializable {

    private Integer feedbackId;
    private int creator;
    private String subject;
    private String content;
    private String severity;
    private String comment;
    private String status;
    private Date dateCreated;
    private Date dateChanged;

    /*
    Default no-argument constructor
    */

    public FeedbackFeedback() {
    }

    /*
    Constructor with only the arguments that can't be null
    */

    public FeedbackFeedback(int creator, String subject, String content, String severity, Date dateCreated) {
        this.creator = creator;
        this.subject = subject;
        this.content = content;
        this.severity = severity;
        this.dateCreated = dateCreated;
    }

    /*
    Constructor with all arguments
    */

    public FeedbackFeedback(int creator, String subject, String content, String severity, String comment, String status, Date dateCreated, Date dateChanged) {
        this.creator = creator;
        this.subject = subject;
        this.content = content;
        this.severity = severity;
        this.comment = comment;
        this.status = status;
        this.dateCreated = dateCreated;
        this.dateChanged = dateChanged;
    }

    public Integer getFeedbackId() {
        return this.feedbackId;
    }

    public void setFeedbackId(Integer feedbackId) {
        this.feedbackId = feedbackId;
    }

    public int getCreator() {
        return this.creator;
    }

    public void setCreator(int creator) {
        this.creator = creator;
    }

    public String getSubject() {
        return this.subject;
    }

    public void setSubject(String subject) {
        this.subject = subject;
    }

    public String getContent() {
        return this.content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public String getSeverity() {
        return this.severity;
    }

    public void setSeverity(String severity) {
        this.severity = severity;
    }

    public String getComment() {
        return this.comment;
    }

    public void setComment(String comment) {
        this.comment = comment;
    }

    public String getStatus() {
        return this.status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    public Date getDateCreated() {
        return this.dateCreated;
    }

    public void setDateCreated(Date dateCreated) {
        this.dateCreated = dateCreated;
    }

    public Date getDateChanged() {
        return this.dateChanged;
    }

    public void setDateChanged(Date dateChanged) {
        this.dateChanged = dateChanged;
    }
}

Now, if you want to create a Hibernate mapping file for the relation given above, the mapping file will look something like this (explanations of the important elements are given at the end) :

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="org.openmrs.module.feedback">
    <class name="org.openmrs.module.feedback.FeedbackFeedback" table="feedback_feedback">
        <id name="feedbackId" type="java.lang.Integer">
            <column name="feedback_id" />
            <!-- The DTD requires a class attribute on generator; "native" is an assumption that suits the AUTO_INCREMENT column. -->
            <generator class="native" />
        </id>
        <property name="creator" type="int">
            <column name="creator" not-null="true" />
        </property>
        <property name="subject" type="string">
            <column name="subject" not-null="true" />
        </property>
        <property name="content" type="string">
            <column name="content" length="5000" not-null="true" />
        </property>
        <property name="severity" type="string">
            <column name="severity" length="25" not-null="true" />
        </property>
        <property name="comment" type="string">
            <column name="comment" length="5000" />
        </property>
        <property name="status" type="string">
            <column name="status" length="25" />
        </property>
        <property name="dateCreated" type="date">
            <column name="date_created" length="10" not-null="true" />
        </property>
        <property name="dateChanged" type="date">
            <column name="date_changed" length="10" />
        </property>
    </class>
</hibernate-mapping>

Comments :

1.      <class name="org.openmrs.module.feedback.FeedbackFeedback" table="feedback_feedback" >
The name attribute tells which POJO class your relation is mapped to, and the table attribute tells which relation you are referring to.

2.      <id name="feedbackId" type="java.lang.Integer">
This tells which field in the class you are referring to and its data type. The id tag tells that it is the primary key.

3.       <column name="feedback_id" />
This tells which attribute in the relation the class field will be mapped to.

4.       <generator />
The generator element tells how the value of the primary key will be generated. In a complete mapping it carries a class attribute that names the generation strategy.

5.       <property name="severity" type="string">
This tells which field in the class you are referring to and its data type. This is just an attribute in the table, not a primary key.
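To see the field-to-column correspondence described in these comments, the mapping can be read with the XML parser that ships with the JDK. This is a minimal sketch and not how Hibernate itself loads mappings: the class name `MappingInspector` is hypothetical, the embedded XML is a trimmed copy of the mapping, and the DOCTYPE is left out on purpose so that the parser does not try to fetch the DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Lists which POJO field maps to which table column in a mapping document. */
final class MappingInspector {

    // Trimmed copy of the mapping above, without the DOCTYPE declaration.
    static final String MAPPING =
        "<hibernate-mapping package=\"org.openmrs.module.feedback\">"
      + "<class name=\"FeedbackFeedback\" table=\"feedback_feedback\">"
      + "<id name=\"feedbackId\" type=\"java.lang.Integer\"><column name=\"feedback_id\"/></id>"
      + "<property name=\"severity\" type=\"string\"><column name=\"severity\" length=\"25\"/></property>"
      + "<property name=\"dateCreated\" type=\"date\"><column name=\"date_created\"/></property>"
      + "</class></hibernate-mapping>";

    /** Returns field name -> column name, ids first, then properties. */
    static Map<String, String> fieldToColumn(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("mapping is not well-formed XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String tag : new String[] {"id", "property"}) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                Element column = (Element) element.getElementsByTagName("column").item(0);
                if (column != null) {
                    result.put(element.getAttribute("name"), column.getAttribute("name"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        fieldToColumn(MAPPING).forEach((field, column) ->
            System.out.println(field + " -> " + column));
    }
}
```

Each `name` on an `id` or `property` element is a field of the POJO, and the `name` on the nested `column` element is the table column it is stored in.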

For a more detailed view, see the official page : http://docs.jboss.org/hibernate/core/3.3/reference/en/html/mapping.html

NoSQL and SQL

 

NoSQL and SQL Databases – Presentation Transcript

  1. Seminar and Progress Report: A comparison between SQL (Conventional) & NoSQL (WebScale) Databases using various scenarios. Gaurav Paliwal, 0071641507, B.Tech (Information Technology), 8th Semester
  2. What is NoSQL: This does NOT mean “No SQL”
  3. What is NoSQL: This does mean “Not ONLY SQL”
  4. Why is Not only SQL: Google once ran off of 40,000 MySQL installations
  5. Why is Not only SQL: Facebook was at one point spending $1M per month for specialized database hardware to serve their pictures.
  6. Why is Not only SQL: These unviable solutions led to a re-evaluation of existing database technologies and led to the Not-Only-SQL (NoSQL) movement.
  7. The rise of Not only SQL – 1: Google's Way
  8. The rise of Not only SQL – 1: 1. Google invented the BigTable database.
  9. The rise of Not only SQL – 1: 1. Google invented the BigTable database. 2. BigTable maps two arbitrary string values (row key and column key) and a timestamp (hence three-dimensional mapping) into an associated arbitrary byte array.
  10. The rise of Not only SQL – 1: 1. Google invented the BigTable database. 2. BigTable maps two arbitrary string values (row key and column key) and a timestamp (hence three-dimensional mapping) into an associated arbitrary byte array. 3. It is not a relational database and can be better defined as a sparse, distributed multi-dimensional sorted map.
  11. The rise of Not only SQL – 1: 1. Google invented the BigTable database. 2. BigTable maps two arbitrary string values (row key and column key) and a timestamp (hence three-dimensional mapping) into an associated arbitrary byte array. 3. It is not a relational database and can be better defined as a sparse, distributed multi-dimensional sorted map. 4. BigTable is designed to scale into the petabyte range across “hundreds or thousands of machines, and to make it easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration”.
  12. The rise of Not only SQL – 2: Facebook's Way
  13. The rise of Not only SQL – 2: 1. Cassandra is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature.
  14. The rise of Not only SQL – 2: 1. Cassandra is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. 2. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
  15. The rise of Not only SQL – 2: 1. Cassandra is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. 2. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure. 3. Cassandra is an open source distributed database management system.
  16. The rise of Not only SQL – 2: 1. Cassandra is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. 2. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure. 3. Cassandra is an open source distributed database management system. 4. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.
  17. The rise of Not only SQL – Others: Hadoop / HBase, Hypertable, Amazon SimpleDB, MongoDB, Terrastore, CouchDB, MemcacheDB, and many others (the list is endless).
  18. Benefits of NoSQL Databases 1. Elastic scaling
  19. Benefits of NoSQL Databases 2. Big data
  20. Benefits of NoSQL Databases 3. Goodbye DBAs
  21. Benefits of NoSQL Databases 4. Economics
  22. Benefits of NoSQL Databases 5. Flexible data models
  23. NoSQL comparison with SQL 1. ACID
  24. NoSQL comparison with SQL 2. CAP
  25. NoSQL comparison with SQL 3. Maturity
  26. NoSQL comparison with SQL 4. Support
  27. NoSQL comparison with SQL 5. Analytics and business intelligence
  28. NoSQL comparison with SQL 6. Administration
  29. NoSQL comparison with SQL 7. Expertise
  30. NoSQL comparison with SQL Practical “Head On”
  31. NoSQL comparison with SQL Questions
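
The data model on slide 11 (row key, column key and timestamp mapped to an arbitrary byte array, kept as a sparse sorted map) can be sketched on a single machine with nested sorted maps. This is only an illustration of the three-dimensional mapping: the class and method names are hypothetical, and it does not attempt the distribution across machines that the real BigTable provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

/** Single-machine sketch of a sparse, sorted (row, column, timestamp) -> bytes map. */
final class SparseSortedTable {

    // row key -> column key -> timestamp -> value; every level is kept sorted.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> rows = new TreeMap<>();

    /** Stores one version of a cell; absent cells take no space (sparse). */
    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell is absent. */
    byte[] getLatest(String rowKey, String columnKey) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, byte[]> versions = columns.get(columnKey);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        SparseSortedTable table = new SparseSortedTable();
        // Row and column names here are made-up sample data.
        table.put("com.example/index", "contents", 1L, "old page".getBytes(StandardCharsets.UTF_8));
        table.put("com.example/index", "contents", 2L, "new page".getBytes(StandardCharsets.UTF_8));
        byte[] latest = table.getLatest("com.example/index", "contents");
        System.out.println(new String(latest, StandardCharsets.UTF_8));
    }
}
```

Because the value is an uninterpreted byte array, the table does not care whether a cell holds text, JSON or an image, which is the variety property a relational schema has to work harder for.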