If you would like to read the other parts in this article series please go to:
- Doing your Database on AWS (Part 1)
- Doing your Database on AWS (Part 2)
- Doing your Database on AWS (Part 4)
In Part 1 of this series, we discussed the evolution of the digital database and the benefits and drawbacks of the relatively new phenomenon of the cloud-based database, or DBaaS (Database as a Service). Then we provided a brief overview of Amazon Relational Database Service (RDS), Amazon’s offering for traditional structured databases built on the Structured Query Language (SQL). In Part 2, we continued that discussion with an examination of how to work with an RDS instance in a Virtual Private Cloud (VPC) and how to move a database instance into a VPC if it isn’t in one already.
Saying “No” to SQL
The Structured Query Language (SQL) has been the foundation of traditional databases since it was developed by IBM and then leveraged by Relational Software, which later became Oracle, in the first commercial implementation. SQL was adopted as a standard by ANSI (American National Standards Institute) in 1986 and by ISO (International Organization for Standardization) in 1987. Oracle still holds the top spot in database licensing numbers. Microsoft’s SQL Server was first released in 1989 and has been a popular database platform for decades as well. MySQL is an open source alternative that gained a lot of ground after its release in 1995; it was acquired by Sun in 2008, and Sun was in turn acquired by Oracle in 2010.
SQL-based databases have many adherents and millions of installations. However, they are not the best choice for all types of data. Thus the rise of NoSQL – a non-relational database model that allows you to store and retrieve data in ways other than the traditional tabular method. There are a number of different types of NoSQL databases, including document, column, key-value and graph, as well as multi-model databases.
NoSQL doesn’t necessarily mean SQL isn’t used; in some implementations the “No” is interpreted as “Not Only,” meaning those databases can also support structured queries. NoSQL has performance advantages in some cases, although relational databases can be faster in other circumstances. NoSQL tends to be more appropriate for big data storage.
Saying “Hello” to Dynamo
DynamoDB is a NoSQL database service that supports two of the NoSQL types: the document and key-value data models. DynamoDB Local is a small client-side version of DynamoDB that you can install on your local computer if you’re a developer, to make it easier to create applications that use the DynamoDB API. This lets you enjoy the performance benefit of working on your local machine and saves you money on data storage and transfer fees while you’re developing the application. You don’t even need an Internet connection during development. Then, when you’re ready to deploy, you can redirect your application to the DynamoDB service. You can download DynamoDB Local as an executable Java archive that will work on Windows, Mac or Linux computers that have the Java Runtime Environment (JRE) version 6 or later installed. For more information about DynamoDB Local, see the AWS web site.
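Redirecting your code between DynamoDB Local and the live service is largely a matter of changing the endpoint. As a minimal sketch, assuming the AWS SDK for Python (boto3) and DynamoDB Local running on its default port 8000, the connection settings might look like this (DynamoDB Local does not validate credentials, so the dummy values are placeholders):

```python
# Connection settings for DynamoDB Local, expressed as the keyword
# arguments you would pass to a boto3 client. Port 8000 is the
# DynamoDB Local default; adjust it if you started the server with
# a different -port flag.
local_client_kwargs = {
    "service_name": "dynamodb",
    "endpoint_url": "http://localhost:8000",  # DynamoDB Local default endpoint
    "region_name": "us-east-1",               # ignored locally, but required
    "aws_access_key_id": "dummy",             # DynamoDB Local accepts any credentials
    "aws_secret_access_key": "dummy",
}
# With boto3 installed: client = boto3.client(**local_client_kwargs)
```

To deploy against the real service, you would drop the `endpoint_url` (and use real credentials) so the SDK targets the regional DynamoDB endpoint.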
With AWS and DynamoDB, you can request a throughput capacity and the resources will automatically be provisioned to achieve that rate. You decide how much capacity you’ll need for read and write operations when you create a table in DynamoDB, and pay a flat hourly rate based on your selected reserved capacity.
In DynamoDB, the database consists of a collection of tables, each of which consists of items, and each item consists of a group of attributes. Unlike a relational database, the tables in DynamoDB don’t have a schema other than a primary key (which we’ll talk about later).
To use DynamoDB, you first need to create the tables for the storage of your data. Because it’s a NoSQL database, different items in your table can have any number of attributes. After you create a table, you’ll need to write code for retrieving items from it. If you want to be able to query the table, you’ll need to give the primary key the hash and range type.
If you already have tables that you’ve created in DynamoDB, the console will display a list of those tables. If not, you’ll need to create them. If this is your first time to use DynamoDB and you don’t have any tables created, when you open the DynamoDB console the Amazon DynamoDB Getting Started wizard will appear and you can click the Create Table button to get started. This page also contains links to more information about picking the primary key, setting provisioned throughput and creating a table with alarms.
Clicking the Create Table button brings up the Create Table wizard, where you can enter a name for the table and select the type of primary key (hash, or hash and range). If you select Hash and Range, you will then need to enter an attribute name and type for both the hash attribute and the range attribute. If you choose the hash type, you just need to specify the hash attribute name and type.
With the hash primary key, the key only has a single attribute (hash attribute), upon which DynamoDB creates an unordered hash index. With the hash and range primary key, there are two attributes: the hash attribute and the range attribute. An unordered hash index is created on the hash attribute. A sorted range index is created on the range attribute.
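The same table definition can be expressed through the API. Below is a sketch of a CreateTable request with a hash-and-range primary key; the table and attribute names ("Music", "Artist", "SongTitle") and the capacity numbers are illustrative assumptions, not values from the article:

```python
# CreateTable request parameters for a hash-and-range primary key.
# Only the key attributes are declared up front; everything else is
# schemaless.
create_table_params = {
    "TableName": "Music",
    "AttributeDefinitions": [
        {"AttributeName": "Artist", "AttributeType": "S"},     # S = string
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},      # hash attribute
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # range attribute
    ],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}
# With boto3: boto3.client("dynamodb").create_table(**create_table_params)
```

For a hash-only key, you would drop the RANGE entry from `KeySchema` and the second attribute definition.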
Depending on what kind of table you’re creating, you might need to create a local secondary index to allow you to run queries against an attribute that’s not part of the primary key. You do this by completing the Index Type, Index Range Key and Index Name fields and adding the index to the table.
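In API terms, a local secondary index is supplied at table-creation time and must reuse the table's hash attribute. A hypothetical index definition, continuing the made-up "Music" example, might look like this (the indexed attribute would also need an entry in the table's AttributeDefinitions):

```python
# A local secondary index: same hash key as the table, with a
# different range attribute so queries can sort and filter on it.
local_secondary_index = {
    "IndexName": "AlbumTitleIndex",
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},       # must match the table's hash key
        {"AttributeName": "AlbumTitle", "KeyType": "RANGE"},  # the non-key attribute to query on
    ],
    "Projection": {"ProjectionType": "ALL"},  # copy all attributes into the index
}
# Passed as: create_table(..., LocalSecondaryIndexes=[local_secondary_index])
```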
Next you enter your desired provision throughput as discussed above. You can check the box to Help me estimate Provisioned Throughput or enter your own estimates in the Read Capacity Units and Write Capacity Units fields. Consider item sizes, the expected rate of read and write requests, consistency, and whether you have local secondary indexes. You can find detailed information on estimating throughput capacity on the AWS web site.
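As a back-of-the-envelope example, assuming the documented unit sizes (one read capacity unit covers a strongly consistent read of an item up to 4 KB per second; one write capacity unit covers a write of an item up to 1 KB per second, with item sizes rounded up), an estimate might go like this; the item size and request rates are made-up numbers:

```python
import math

item_size_kb = 6          # average item size (example value)
reads_per_second = 100    # expected strongly consistent reads
writes_per_second = 10    # expected writes

# Items larger than the unit size consume multiple units, rounded up.
read_units = math.ceil(item_size_kb / 4) * reads_per_second    # 2 units/read * 100
write_units = math.ceil(item_size_kb / 1) * writes_per_second  # 6 units/write * 10
```

Eventually consistent reads cost half as much, so if your application tolerates them you could roughly halve the read figure.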
You can configure throughput alarms via CloudWatch that will let you know via email when you get to 80 percent of your configured throughput.
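A hedged sketch of such an alarm via the CloudWatch API follows. The table name, SNS topic ARN and five-minute period are assumptions; because ConsumedWriteCapacityUnits is reported as a sum over the period, the 80 percent threshold is 0.8 × provisioned units × seconds in the period:

```python
provisioned_write_units = 10   # example provisioned write capacity
period_seconds = 300           # evaluate over five-minute windows

# Alarm that fires when consumed write capacity reaches 80 percent
# of what is provisioned for the table.
alarm_params = {
    "AlarmName": "MusicTableWriteThroughput80",
    "Namespace": "AWS/DynamoDB",
    "MetricName": "ConsumedWriteCapacityUnits",
    "Dimensions": [{"Name": "TableName", "Value": "Music"}],
    "Statistic": "Sum",
    "Period": period_seconds,
    "EvaluationPeriods": 1,
    "Threshold": 0.8 * provisioned_write_units * period_seconds,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    # Hypothetical SNS topic that emails the team when the alarm fires.
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:my-alerts"],
}
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```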
When the table status is shown as Active, you’re done. After you have created your tables, you can increase or reduce the read and write throughput values that you specified when you created your tables, using an UpdateTable request.
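An UpdateTable request only needs the table name and the new capacity figures; the numbers below are illustrative assumptions continuing the earlier example:

```python
# Raise a table's provisioned throughput after creation.
update_table_params = {
    "TableName": "Music",
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 20,   # increased from an initial 5
        "WriteCapacityUnits": 10,
    },
}
# With boto3: boto3.client("dynamodb").update_table(**update_table_params)
```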
Items and attributes
An item, in the context of DynamoDB tables, is a group of attributes, with each of the attributes having both a name and at least one value. An attribute can have multiple values (a value set).
Attributes can be of three data types: scalar (number, string, binary, Boolean, null), document (list and map) or multi-value (string set, number set, binary set). The number of attributes that an item can have is limited only by the 400 KB limit on item size.
There is no schema for any of the items in a table except for the primary key.
Items are read and written via the GetItem and PutItem operations. You can also change an existing item’s attributes with the UpdateItem operation, and you can remove an item with the DeleteItem operation. You have to specify the entire primary key for each of these operations.
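In the low-level request format, every attribute value is typed: "S" for string, "N" for number (sent as a string), and so on. A sketch of a write and the matching read, using the same made-up "Music" table as above, might look like this; note that both halves of the hash-and-range primary key appear in the Key:

```python
# PutItem: write a complete item, including the full primary key.
put_item_params = {
    "TableName": "Music",
    "Item": {
        "Artist": {"S": "No One You Know"},    # hash attribute
        "SongTitle": {"S": "Call Me Today"},   # range attribute
        "Year": {"N": "2015"},                 # numbers are sent as strings
    },
}

# GetItem: the entire primary key is required to address the item.
get_item_params = {
    "TableName": "Music",
    "Key": {
        "Artist": {"S": "No One You Know"},
        "SongTitle": {"S": "Call Me Today"},
    },
}
# With boto3: client.put_item(**put_item_params); client.get_item(**get_item_params)
```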
You can also use the BatchGetItem request to read up to one hundred items, and BatchWriteItem to put or delete multiple items. Conditional writes let you set conditions that must be met before an operation can complete, which is useful when multiple clients might access and try to change the same items at the same time.
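For instance, a conditional PutItem can refuse to overwrite an item another client has already created. The sketch below reuses the hypothetical table names from earlier; the condition succeeds only when no item with this primary key exists:

```python
# Conditional write: put the item only if nothing with this primary
# key exists yet, so two clients cannot silently clobber each other.
conditional_put_params = {
    "TableName": "Music",
    "Item": {
        "Artist": {"S": "No One You Know"},
        "SongTitle": {"S": "Call Me Today"},
    },
    "ConditionExpression": "attribute_not_exists(Artist)",
}
# If the condition fails, DynamoDB rejects the write with a
# ConditionalCheckFailedException instead of overwriting the item.
```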
You can learn more about reading and writing items using expressions on the AWS web site.
Query and scan
You can search the data in DynamoDB tables using the Query and Scan operations. Query searches by primary key attribute values and supports a set of comparison operators. You can use filter expressions to narrow the results. The Scan operation examines all items in the table or secondary index and returns all data attributes for every item unless you use the ProjectionExpression parameter to limit the returns.
You can use filters with Scan requests in the same way as with Query requests. You can use the Query and Scan operations on tables or on secondary indexes. You can also use a Limit value to control the number of items in the returned results.
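The contrast shows up clearly in the request shapes. A Query must name an exact hash-key value and may add a comparison on the range key, while a Scan reads every item and filters afterward; the table and attribute names below are the same illustrative assumptions used earlier:

```python
# Query: exact hash-key match plus a range-key comparison operator.
query_params = {
    "TableName": "Music",
    "KeyConditionExpression": "Artist = :a AND begins_with(SongTitle, :t)",
    "ExpressionAttributeValues": {
        ":a": {"S": "No One You Know"},
        ":t": {"S": "Call"},
    },
}

# Scan: examines every item; the filter is applied after items are read.
scan_params = {
    "TableName": "Music",
    "FilterExpression": "#y > :y",
    "ProjectionExpression": "Artist, SongTitle",    # return only these attributes
    "ExpressionAttributeNames": {"#y": "Year"},     # Year is a reserved word
    "ExpressionAttributeValues": {":y": {"N": "2000"}},
    "Limit": 25,  # stop after examining 25 items
}
# With boto3: client.query(**query_params); client.scan(**scan_params)
```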
Query is generally faster than Scan because the Scan operation scans the entire table or secondary index and then filters out values, so performance may suffer on large tables and indexes. To learn more about using the Query and Scan operations, see the AWS web site.
In this, Part 3 of this series on AWS database options, we introduced you to Amazon’s NoSQL offering, DynamoDB, and provided a brief overview of how to create and work with tables, items and attributes and how to use the Query and Scan operations to search the database. In Part 4, we will take a look at Amazon Redshift, which is designed to handle Big Data.