Lesson 4: Summarizing and Grouping Data

What Will I Learn in this Lesson?

In this lesson, you will continue your study of SQL. You will practice some new data retrievals from our sample database Sales. The focus in this lesson is on SQL statements that allow you to summarize numerical data and group data.

Reviewing the Example Database

We repeat our example database design diagram here for your easy reference.

Creating Calculated Fields

On occasion we have a need to calculate values based on entries in a database. For example, we may wish to compute the total cost of an order based on the unit price of the item and the number of units ordered. Or we may wish to compute the average price of the items in our Products table. SQL allows us to make calculations based on database entries and create temporary fields in our retrievals. As an example, here's how we'd compute the average price of the items in the Products table.

SELECT AVG(unitprice) AS average_price
FROM Products;

Note that we use the SQL built-in function AVG to do the calculation, then store it as a new field named average_price. When this retrieval is made it returns a single value.

Here's a list of commonly used SQL functions:

Function
Description
AVG()
returns the average value of a column's data
COUNT()
returns the number of rows for a column
MAX()
returns the maximum value for a column
MIN()
returns the minimum value for a column
SUM()
returns the sum of a column's data

Here's how we would compute the total number of items ordered by customer 201.

SELECT SUM(units) AS total_units
FROM Orders
WHERE custnum = 201;

We've used the SQL built-in function SUM to do the calculation, then store it as a new field named total_units. Again, when this retrieval is made it returns a single value. Notice that we can combine calculated fields with WHERE clauses to filter the data that serves as input for the calculation.

Grouping Data

Grouping data allows you to perform summary calculations, such as COUNT() or SUM(), on logical groupings of a data set. For example, we may want to report the number of items ordered by each of our customers.

SELECT custnum, COUNT(*) as cust_units
FROM Orders
GROUP BY custnum;

The SELECT statement specifies two columns, custnum and the calculated field cust_units. The COUNT(*) function counts all the rows in each group defined by the GROUP BY clause. The GROUP BY clause specifies a column name and the database system groups the data by distinct custnum values to perform the various COUNT(*) calculations.

A GROUP BY clause must come after any WHERE clause and before any ORDER BY clause. GROUP BY clauses can contain as many columns as you want. Multiple columns create a nested grouping, but data is summarized only for the last specified column or group.

In addition to grouping data, SQL gives you the power to filter which data to include in your groups. For example, suppose we want to repeat the count of units ordered for customers, but this time we only want to include those customers who made at least 2 orders. We do this by using a HAVING clause in conjunction with our GROUP BY clause.

SELECT custnum, COUNT(*) as cust_units
FROM Orders
GROUP BY custnum
HAVING COUNT(*) >= 2;

Activity 4.1: Summarizing and Grouping Data

In this activity you will again use MySQL to experiment with SQL queries on the Sales example database we defined above.

  1. Open MySQL and then open the database Sales.
  2. Execute the SQL command:

    SELECT AVG(unitprice) AS average_price
    FROM Products;

    Validate the results against the database contents.

  3. Execute the SQL command:

    SELECT SUM(units) AS total_units
    FROM Orders
    WHERE custnum = 201;


    V alidate the results against the database contents.

  4. Execute the SQL command:

    SELECT custnum, COUNT(*) as cust_units
    FROM Orders
    GROUP BY custnum;


    Validate the results against the database contents.

  5. Execute the SQL command:

    SELECT custnum, COUNT(*) as cust_units
    FROM Orders
    GROUP BY custnum
    HAVING COUNT(*) >= 2;


    Validate the results against the database contents. Notice how the retrieved data differs from that in the previous retrieval.

  6. Write, run, and check a SQL query to retrieve the maximum number of units ordered on any order by each of the customers.
  7. Repeat the above query, but find the maximum only for those customers who ordered at least 10 units on some order.
  8. Write, run, and check a SQL query to retrieve the average number of units ordered for each product.
  9. Write, run, and check a SQL query to retrieve the total number of orders for each month.
  10. Write, run, and check a SQL query to retrieve the total number of units ordered for each month for which at least 30 units were ordered.

** End of Lesson 4 **