In this tutorial, you will learn how to use the SQL GROUP BY statement ot group your data into buckets according to their value in a specific column. You will also learn you to use aggregate functions to perform calculations on database data based on the data's category iin the GROUP BY statement.
GROUP BY statements are one of the most important concepts for SQL programmers to understand. They are also fairly complex. Accordingly, be sure to understand the concepts from this lesson before proceeding through this course.
The SQL GROUP BY statement allows us to easily aggregate columns into categories. Let's explore this idea with a simple example from our DVD Rental database.
We saw in our last lecture that within the
film table, there is a column called
rental_rate that contains the rental prices for each movie in the table. The
rental-rate column only holds three different values:
We can treat each of these rental rates as its own category using a GROUP BY statement. Once we have used the GROUP BY statement to create categories, we use an aggregate function to perform calculations on the values within those categories.
Let's consider a generalized example. Here is the syntax for a typical GROUP BY statement:
SELECT category_column, AGG(data_column) FROM table_name GROUP BY category_column;
In this SQL example,
category_column is the column that we are using to define our categories.
data_column is the column that we are performing aggregate functions on, while AGG is a hypothetical aggregate function.
table_name is the table that contains both
It is important to note that a GROUP BY clause must appear right after a FROM statement or a WHERE statement. If this is not the case, your query will return an error.
This will make more sense once we have actually learned some aggregate functions. Let's explore SQL aggregate functions next.
You cannot use the SQL GROUP BY statement without pairing it with an aggregate function. With that in mind, let's learn what aggregate functions are and how they interact with the GROUP BY clause.
The purpose of aggregate functions is to take multiple inputs and return a single output.
PostgreSQL includes a number of aggregate functions. You can read the PostgreSQL documentation for a complete list of aggregate functions
In this section, I will introduce some of the most important aggregate functions and provide examples of how they function using our DVD Rental database. Specifically, I will introduce the following functions:
The SQL AVG function allows you to easily calculate the average value of section of values.
For example, here is how you would calculate the average value of the
rental_rate column from the
film table in our DVD Rental database:
SELECT AVG(rental_rate) FROM film;
Here is what the output of this SQL query looks like in pgAdmin:
As you can see, the average value in the
rental_rate column is $2.98.
We can pair the AVG function with the GROUP BY clause to generate average values within categories.
As an example, consider the
rental_duration column within the
film table. This column contains integer values from 3 to 7 that describe how long a customer can rent a specific movie for.
Let's write a SQL query using GROUP BY that calculates the average value of
rental_rate for movies in each
SELECT rental_duration, ROUND(AVG(rental_rate),2) FROM film GROUP BY rental_duration;
Note that I have also wrapped the AVG function in a ROUND function with a second argument of
2 to make the output of the query more readable.
Here's what the output of this query looks like:
For the other aggregate functions that we explore in this lesson, we will follow a similar recipe of categorizing movies into different buckets based on
rental_duration, and then applying aggregate functions to those categories.
The SQL COUNT function combined with the GROUP BY clause allows you to easily count the number of items contained in each category.
We have already worked with the SQL COUNT function earlier in this course to count all of the rows within a specific column. However, we have not yet paired it with the GROUP BY statement, which has limited its capability.
The COUNT function simply counts the number of rows returned by a given SQL query. This means it does not matter which column the COUNT method is applied to. By convention, it is considered a best practice to simply use
COUNT(*) in any SQL query.
Let's look at an example where we categorize the rows of the
film table based on
rental_duration and count the number of rows in each category:
SELECT rental_duration, COUNT(*) FROM film GROUP BY rental_duration;
Here is the output of this SQL query:
The SQL MAX function allows you to easily determine the maximum value of a column.
As an example, we could determine the maximum value of the
length column of the
film table with the following SQL query:
SELECT MAX(length) FROM film;
Here is the output of this query:
As you can see, the film with the maximum length is 185 minutes long.
The SQL MAX function can be used with GROUP BY to determine the maximum values of different categories. Let's again divide our
film table into categories based on
rental_duration and try and find the longest film within each category.
We can answer this question with the following query:
SELECT rental_duration, MAX(length) FROM film GROUP BY rental_duration;
The output of this code is below:
Like the SQL MAX function, the SQL MIN Function allows you easily find the smallest value within a table column.
As an example, we can find the shortest movie in the
film table with the following query:
SELECT MIN(length) FROM film;
As you can see, the shortest film in our DVD Rental database is 46 minutes long.
Let's explore how we can pair the SQL MIN function with the GROUP BY clause by finding the shortest film in each
rental_duration category. The following query would answer this question for us:
SELECT rental_duration, MIN(length) FROM film GROUP BY rental_duration;
Here is the output of this query:
The SQL SUM function performs similarly to the other aggregate functions explored in this lesson. Namely, it allows us to calculate the sum of either a column (when applied to an entire column) or a category (when combined with the GROUP BY clause).
Here is an example of how we could use the SUM function to calculate how long it would take to watch every movie in the DVD Rental database:
SELECT SUM(length) FROM film;
Here is the output of this query:
As you can surely imagine by now, we can also combine the SUM function with the GROUP BY clause to calculate the sum of different categories within our table.
Here is how you could calculate the sum of the
length column for every
rental_duration category in a SQL query:
SELECT rental_duration, SUM(length) FROM film GROUP BY rental_duration;
Here is the output of this code:
In this lesson, we explored how to use aggregate functions to take in multiple values, perform a calculation, and return a single value. Specifically, we worked with the following aggregate functions:
In the next section of this course, we'll explore the SQL HAVING keyword in detail.