In the world of database technology, there are two main types of databases: SQL and NoSQL—or,
relational databases and non-relational databases. The difference
speaks to how they’re built, the type of information they store, and how
they store it. Relational databases are structured, like phone books
that store phone numbers and addresses. Non-relational databases are
document-oriented and distributed, like file folders that hold
everything from a person’s address and phone number to their Facebook
likes and online shopping preferences.
We call them SQL and NoSQL, referring to whether or not they’re
written solely in structured query language (SQL). In this article,
we’ll explore what SQL is, how it makes these databases different, and
how each type structures the data it holds so you can easily determine
which type is right for you.
In the world of database technology, there are two main types of databases: SQL and NoSQL—or,
relational databases and non-relational databases. The difference
speaks to how they’re built, the type of information they store, and how
they store it. Relational databases are structured, like phone books
that store phone numbers and addresses. Non-relational databases are
document-oriented and distributed, like file folders that hold
everything from a person’s address and phone number to their Facebook
likes and online shopping preferences.
We call them SQL and NoSQL, referring to whether or not they’re
written solely in structured query language (SQL). In this article,
we’ll explore what SQL is, how it makes these databases different, and
how each type structures the data it holds so you can easily determine
which type is right for you.
SQL: RELATIONAL DATABASES
First, let’s take a look at one of the main features that separates these two systems: the way they structure data. A relational database—or,
an SQL database, named for the language it’s written in, Structured
Query Language (SQL)—is the more rigid, structured way of storing data,
like a phone book. Developed by IBM in the 1970s, a relational database
consists of two or more tables with columns and rows. Each row
represents an entry, and each column sorts a very specific type of
information, like a name, address, and phone number. The relationship
between tables and field types is called a schema. In a relational database, the schema must be clearly defined before any information can be added.
For a relational database to be effective, the data you’re storing in
it has to be structured in a very organized way. A well-designed schema
minimizes data redundancy and prevents tables from becoming
out-of-sync, a critical feature for many businesses, especially those
that record financial transactions. A poorly designed schema can result
in organizational headaches due to its rigidity. For example, a column
designed to store U.S. phone numbers might require 10 digits because
that’s the standard for phone numbers in the U.S. This has the advantage
of rejecting any invalid values (for example, if a number is missing an
area code). However, if you need to change the schema (for instance, if
you need to include an international phone number entry with more than
10 digits), then the entire database needs to be edited. Key takeaway:
excellent organization results in a compromise in flexibility with a
relational database. Structured Query Language (SQL) is a programming
language used by database architects to design relational databases. In
an SQL database like MySQL, Sybase, Oracle, or IBM DM2, SQL executes
queries, retrieves data, and edits data by updating, deleting, or
creating new records. SQL is a lightweight, declarative language that
does a lot of heavy lifting for the relational database, acting like a
database’s version of a server-side script. One particular advantage of
SQL is its simple-yet-powerful JOIN clause, which allows developers to
retrieve related data stored across multiple tables with a single
command.
Another reason SQL databases remain popular is that they fit
naturally into many venerable software stacks, including LAMP and
Ruby-based stacks. These databases are well understood and widely
supported, which can be a major advantage if you run into problems.
Popular SQL databases and RDBMS’s
MySQL—the most popular open-source database, excellent for CMS sites and blogs.
Oracle—an object-relational DBMS written in the C++
language. If you have the budget, this is a full-service option with
great customer service and reliability. Oracle has also released an
Oracle NoSQL database.
IMB DB2—a family of database server products from IBM that are built to handle advanced “big data” analytics.
Sybase—a relational model database server product
for businesses primarily used on the Unix OS, which was the first
enterprise-level DBMS for Linux.
MS SQL Server—a Microsoft-developed RDBMS for enterprise-level databases that supports both SQL and NoSQL architectures.
Microsoft Azure—a cloud computing platform that
supports any operating system, and lets you store, compute, and scale
data in one place. A recent survey even put it ahead of Amazon Web
Services and Google Cloud Storage for corporate data storage.
MariaDB—an enhanced, drop-in version of MySQL.
PostgreSQL—an enterprise-level, object-relational DBMS that uses procedural languages like Perl and Python, in addition to SQL-level code.
NOSQL DATABASES: NON-RELATIONAL & DISTRIBUTED DATA
If your data requirements aren’t clear at the outset or if you’re
dealing with massive amounts of unstructured data, you may not have the
luxury of developing a relational database with clearly defined schema.
Enter non-relational databases, which offer much greater flexibility
than their traditional counterparts. Think of non-relational databases
more like file folders, assembling related information of all types. If a
WordPress blog used a NoSQL database, each file could store data for a blog post: social likes, photos, text, metrics, links, and more.
Unstructured data from the web can include sensor data, social
sharing, personal settings, photos, location-based information, online
activity, usage metrics, and more. Trying to store, process, and analyze
all of this unstructured data led to the development of schema-less
alternatives to SQL. Taken together, these alternatives are referred to
as NoSQL, meaning “Not only SQL.” While the term NoSQL encompasses a
broad range of alternatives to relational databases, what they have in
common is that they allow you to treat data more flexibly.
How do NoSQL databases work? Instead of tables, NoSQL databases are document-oriented.
This way, non-structured data (such as articles, photos, social media
data, videos, or content within a blog post) can be stored in a single
document that can be easily found but isn’t necessarily categorized into
fields like a relational database does. It’s more intuitive, but note
that storing data in bulk like this requires extra processing effort and
more storage than highly organized SQL data. That’s why Hadoop, an open-source computing and data analysis platform capable of processing huge amounts of data in the cloud, is so popular in conjunction with NoSQL database stacks.
NoSQL databases offer another major advantage, particularly to app
developers: ease of access. Relational databases have a fraught
relationship with applications written in object-oriented programming
languages like Java, PHP, and Python. NoSQL databases are often able to
sidestep this problem through APIs, which allow developers to execute
queries without having to learn SQL or understand the underlying
architecture of their database system.
Common Types of NoSQL Databases
Key-value model—the least complex NoSQL option, which stores data in a schema-less way that consists of indexed keys and values. Examples: Cassandra, Azure, LevelDB, and Riak.
Column store—or, wide-column store, which stores
data tables as columns rather than rows. It’s more than just an inverted
table—sectioning out columns allows for excellent scalability and high
performance. Examples: HBase, BigTable, HyperTable.
Document database—taking the key-value concept and
adding more complexity, each document in this type of database has its
own data, and its own unique key, which is used to retrieve it. It’s a
great option for storing, retrieving and managing data that’s
document-oriented but still somewhat structured. Examples: MongoDB, CouchDB.
Graph database—have data that’s interconnected and best represented as a graph? This method is capable of lots of complexity. Examples: Polyglot, Neo4J.
Popular NoSQL Databases
MongoDB—the most popular NoSQL system, especially
among startups. A document-oriented database with JSON-like documents in
dynamic schemas instead of relational tables that’s used on the back
end of sites like Craigslist, eBay, Foursquare. It’s open-source, so
it’s free, with good customer service. Read more in Should You Use MongoDB? A Look at the Leading NoSQL Database.
Apache’s CouchDB—a true DB for the web, it uses the
JSON data exchange format to store its documents; JavaScript for
indexing, combining and transforming documents; and, HTTP for its API.
HBase—another Apache project, developed as a part
of Hadoop, this open-source, non-relational “column store” NoSQL DB is
written in Java, and provides BigTable-like capabilities.
Oracle NoSQL—Oracle’s entry into the NoSQL category.
Apache’s Cassandra DB—born at Facebook, Cassandra
is a distributed database that’s great at handling massive amounts of
structured data. Anticipate a growing application? Cassandra is
excellent at scaling up. Examples: Instagram, Comcast, Apple, and
Spotify.
Riak—an open-source key-value store database
written in Erlang. It has fault-tolerance replication and automatic data
distribution built in for excellent performance.
What database solution is right for you?
Reasons to use a SQL database
When it comes to database technology, there’s no one-size-fits-all
solution. That’s why many businesses rely on both relational and
nonrelational databases for different tasks. Even as NoSQL databases
gain popularity for their speed and scalability, there are still
situations where a highly structured SQL database may be preferable.
Here are a few reasons you might choose an SQL database:
You need to ensure ACID compliancy (Atomicity, Consistency, Isolation, Durability).
ACID compliancy reduces anomalies and protects the integrity of your
database by prescribing exactly how transactions interact with the
database. Generally, NoSQL databases sacrifice ACID compliancy for
flexibility and processing speed, but for many e-commerce and financial
applications, an ACID-compliant database remains the preferred option.
Your data is structured and unchanging. If your
business is not experiencing massive growth that would require more
servers and you’re only working with data that’s consistent, then there
may be no reason to use a system designed to support a variety of data
types and high traffic volume.
Reasons to use a NoSQL database
When all of the other components of your server-side application are
designed to be fast and seamless, NoSQL databases prevent data from
being the bottleneck. Big data is the real NoSQL motivator here, doing
things that traditional relational databases cannot. It’s driving the
popularity of NoSQL databases like MongoDB, CouchDB, Cassandra, and
HBase.
Storing large volumes of data that often have little to no structure.
A NoSQL database sets no limits on the types of data you can store
together, and allows you to add different new types as your needs
change. With document-based databases, you can store data in one place
without having to define what “types” of data those are in advance.
Making the most of cloud computing and storage.
Cloud-based storage is an excellent cost-saving solution, but requires
data to be easily spread across multiple servers to scale up. Using
commodity (affordable, smaller) hardware on-site or in the cloud saves
you the hassle of additional software, and NoSQL databases like
Cassandra are designed to be scaled across multiple data centers out of
the box without a lot of headaches.
Rapid development. If you’re developing within
two-week Agile sprints, cranking out quick iterations, or needing to
make frequent updates to the data structure without a lot of downtime
between versions, a relational database will slow you down. NoSQL data
doesn’t need to be prepped ahead of time.
SQL vs. NoSQL- Which to Use?
The idea that SQL and NoSQL are in direct opposition and competition
with each other is flawed one, not in the least because many companies
opt to use them concurrently. As with all of the technologies I’ve previously discussed,
there really isn’t a ‘one-system-fits-all’ approach; choosing the right
technology hinges on the use case. If your data needs are changing
rapidly, you need high throughput to handle viral growth,
or your data is growing fast and you need to be able to scale out
quickly and efficiently, maybe NoSQL is for you. But if the data you
have isn’t changing in structure and you’re experiencing moderate,
manageable growth, your needs may be best met by SQL technologies.
Certainly, SQL is not dead yet.
The Benefits of NoSQL
When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
Large volumes of rapidly changing structured, semi-structured, and unstructured data
Agile sprints, quick schema iteration, and frequent code pushes
Object-oriented programming that is easy to use and flexible
Geographically distributed scale-out architecture instead of expensive, monolithic architecture
NoSQL vs. SQL Summary
SQL Databases
NOSQL Databases
Types
One type (SQL database) with minor variations
Many different types including key-value stores, document databases, wide-column stores, and graph databases
Development History
Developed in 1970s to deal with first wave of data storage applications
Developed
in late 2000s to deal with limitations of SQL databases, especially
scalability, multi-structured data, geo-distribution and agile
development sprints
Examples
MySQL, Postgres, Microsoft SQL Server, Oracle Database
MongoDB, Cassandra, HBase, Neo4j
Data Storage Model
Individual
records (e.g., 'employees') are stored as rows in tables, with each
column storing a specific piece of data about that record (e.g.,
'manager,' 'date hired,' etc.), much like a spreadsheet. Related data is
stored in separate tables, and then joined together when more complex
queries are executed. For example, 'offices' might be stored in one
table, and 'employees' in another. When a user wants to find the work
address of an employee, the database engine joins the 'employee' and
'office' tables together to get all the information necessary.
Varies
based on database type. For example, key-value stores function
similarly to SQL databases, but have only two columns ('key' and
'value'), with more complex information sometimes stored as BLOBs within
the 'value' columns. Document databases do away with the table-and-row
model altogether, storing all relevant data together in single
'document' in JSON, XML, or another format, which can nest values
hierarchically.
Schemas
Structure
and data types are fixed in advance. To store information about a new
data item, the entire database must be altered, during which time the
database must be taken offline.
Typically
dynamic, with some enforcing data validation rules. Applications can add
new fields on the fly, and unlike SQL table rows, dissimilar data can
be stored together as necessary. For some databases (e.g., wide-column
stores), it is somewhat more challenging to add new fields dynamically.
Scaling
Vertically,
meaning a single server must be made increasingly powerful in order to
deal with increased demand. It is possible to spread SQL databases over
many servers, but significant additional engineering is generally
required, and core relational features such as JOINs, referential
integrity and transactions are typically lost.
Horizontally,
meaning that to add capacity, a database administrator can simply add
more commodity servers or cloud instances. The database automatically
spreads data across servers as necessary.
Development Model
Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database)
Open-source
Supports Transactions
Yes, updates can be configured to complete entirely or not at all
In certain circumstances and at certain levels (e.g., document level vs. database level)
Data Manipulation
Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE…
Through object-oriented APIs
Consistency
Can be configured for strong consistency
Depends
on product. Some provide strong consistency (e.g., MongoDB, with
tunable consistency for reads) whereas others offer eventual consistency
(e.g., Cassandra).
0 comments:
Post a Comment