SQL: Using Aggregate Functions
In previous posts, we have gone over creating databases and tables, how to add, update, and delete data from those tables, and a number of string functions that can be used in select queries. After going over ways of using strings to understand the data, we will look more deeply into aggregate functions to gain a better understanding through the integer columns.
There are a good number of aggregate functions, and we will go over some of the most commonly used ones. There are functions such as SUM()
, AVG()
, MIN()
, MAX()
, and COUNT()
. We will also look into how we can separate the data into different groups for more analysis.
The COUNT() Function
As it might be expected in SQL, the COUNT()
function does exactly what it sounds like. It counts the number of observations in a given selection. For example, if we used the COUNT()
function on the entire data set, the function would return the total number of observations within the dataset.
SELECT COUNT(*) FROM data_table;
This can be used in more queries using the WHERE statement. If we had a table of clients, that data table could have information such as the state and town the individual lives in. By using the WHERE statement with the COUNT()
function, one could understand which states have the most clients. Or a DISTINCT statement could be used with the COUNT()
to determine the number of unique individuals who have made orders in a given time period. These are just some of the possibilities the COUNT()
function can be used for.
The MAX() Function
The MAX()
function does what it sounds like. Given a selection of data, the function returns the max value found in that data. Some simple examples we might use the MAX()
function for are finding the item with the greatest cost in a store or finding the most watched video of a given streamer.The formula for using the function is the following:
SELECT MAX(<column_name>) FROM <table_name>;
The MIN() Function
The MIN()
function is very similar to the previous function, however instead of return the max number, the function returns the opposite: the minimum number. One simple example is if there is a data table of products which has a column for the number of times a product was ordered and we wanted to know which product was selling the least, we can use the MIN()
function. The formula for this function is the same as above, just switching MAX for MIN.
SELECT MIN(<column_name>) FROM <table_name>;
The AVG() Function
The AVG()
function stands for the average of a selected column in a given data set. As a reminder, the average is also considered the arithmetic mean or in other words the sum of the observations divided by the number of observations. You can probably think of multiple ways this might be needed in the analysis of a data set. For example we could use our client table. To sign up for certain services, it is required for the individual to give their birthdate. From that a column of age could be extrapolated and with the use of the AVG()
function, we could find the average age of the users of the services. The formula for this function is just like the ones before, just replacing the function with AVG()
.
SELECT AVG(<column_name>) FROM <table_name>
The SUM() Function
The SUM()
function adds together all the observations in a given column together. A simple example would be using the SUM()
function to find the total number of products are in stock. The formula is in the same format as the previous functions.
SELECT SUM(<column_name>) FROM <table_name>
The GROUP BY Statement
The GROUP BY
statement groups different observations together by similar values into summary rows. An example of this uses the example using the client table from the COUNT()
function. There are clients from different states. The GROUP BY
statement could give a summary of the number of clients in each state. It could look like the following:
SELECT state, COUNT(client_id) FROM client_table GROUP BY state;
The state column would give us the states for each corresponding value. The COUNT function would count the number of clients in each summary row, and the GROUP BY would do the work of grouping together the data into those rows.
Another piece to note is that the data can be grouped by multiple columns by using a comma in between the different columns. For example, if you are working for an international company, the data may need to be grouped first by country and then by the next level such as State or Province.
I hope this post is able to give you a strong foundation for using different aggregate functions to gain a better understanding of a data table. Thanks for reading.
You can find more information about the aggregate functions described here and others in the following link.