SQL Functions
Introduction to Functions in SQL
SQL functions are built-in functions that are used to perform operations on data in a database. They take one or more arguments as input, perform a specific operation, and return a result. Functions can be used in SELECT, WHERE, HAVING, and ORDER BY clauses of a SQL query.
There are many different types of functions in SQL, including aggregate functions, scalar functions, date and time functions, and string functions. Each type of function performs a specific operation on data and returns a result.
Overall, SQL functions are essential for data analysis because they allow you to perform complex calculations, filter data based on specific criteria, clean up messy data, aggregate data to provide insights into trends and patterns, and transform data from one format to another. By mastering SQL functions, you can become a more effective data analyst and make more informed decisions based on your data.
Overview of Different Types of Functions and Examples
Aggregate Functions
Aggregate functions are used to perform calculations on groups of rows and return a single value. Some common aggregate functions are:
COUNT()
: returns the number of rows in a table or the number of non-null values in a column.
-
This query calculates the total salary of all employees in the "salaries" table and assigns the name "total_salary" to the output column.SUM()
: returns the sum of values in a column. -
This query calculates the average salary of all employees in the "salaries" table and assigns the name "avg_salary" to the output column.AVG()
: returns the average value of a column. -
This query finds the highest salary in the "salaries" table and assigns the name "max_salary" to the output column.MAX()
: returns the maximum value in a column. MIN()
: returns the minimum value in a column.
Scalar Functions
Scalar functions are used to perform operations on individual values and return a single value. Some common scalar functions are:
UPPER()
: converts a string to uppercase. This query converts the first name of all employees in the "employees" table to uppercase and assigns the name "upper_first_name" to the output column.-
This query converts the last name of all employees in the "employees" table to lowercase and assigns the name "lower_last_name" to the output column.LOWER()
: converts a string to lowercase. -
LENGTH()
: returns the length of a string.
Date and Time Functions
Date and time functions are used to perform operations on date and time values. Some common date and time functions are:
-
This query extracts the date part from the "hire_date" column of the "employees" table and assigns the name "hire_date_only" to the output column.DATE()
: extracts the date part from a datetime value. -
YEAR()
: returns the year from a date value.
Same MONTH()
function :
String Functions
String functions are used to perform operations on string values. Some common string functions are:
- CONCAT(): concatenates two or more strings together. This query combines the first name and last name columns of the "employees" table and assigns the name "full_name" to the output column.
- LEFT(): returns the leftmost characters of a string. This query returns the first name of all employees in the "employees" table and extracts the first three characters of each name. The name "initial" is assigned to the output column.
- REPLACE(): replaces a substring in a string with another substring. This query returns the email column of the "employees" table and replaces the substring 'gmail' with 'yahoo' in each email address. The name "new_email" is assigned to the output column.
Mix up
Let's take a look at three examples of SQL queries that use a mix of functions on the "employees" database again.
Find the average salary of employees by department, and round the results to two decimal places:
SELECT department, ROUND(AVG(salary), 2) as avg_salary
FROM employees
JOIN dept_emp ON employees.emp_no = dept_emp.emp_no
JOIN departments ON dept_emp.dept_no = departments.dept_no
GROUP BY department;
Don't worry about the JOIN
clause we will get to it in detail later.
Find the top 10 most common first names among employees, and show the number of employees with each name:
SELECT first_name, COUNT(*) as num_employees
FROM employees
GROUP BY first_name
ORDER BY num_employees DESC
LIMIT 10;
Find the number of employees hired in each year, and show the results as a percentage of the total number of employees:
SELECT YEAR(hire_date) as hire_year, COUNT(*) / (SELECT COUNT(*) FROM employees) * 100 as percentage
FROM employees
GROUP BY hire_year;
This query uses the YEAR() function to extract the year from the hire date of each employee, and the COUNT() function to count the number of employees hired in each year. The subquery (SELECT COUNT(*) FROM employees) is used to calculate the total number of employees in the "employees" table. The percentage of employees hired in each year is calculated by dividing the count by the total number of employees and multiplying by 100. The output includes the hire year and the percentage of employees hired in that year.
Find the number of employees born in each month, and sort the results by month:
SELECT MONTH(birth_date) as birth_month, COUNT(*) as num_employees
FROM employees
GROUP BY birth_month
ORDER BY birth_month;
This query uses the MONTH() function to extract the month from the birth date of each employee, and the COUNT() function to count the number of employees born in each month. The GROUP BY clause is used to group the results by birth month, and the ORDER BY clause is used to sort the results by month.
Find the number of employees who were hired in each year, and show the results as a bar chart:
This query uses the YEAR() function to extract the year from the hire date of each employee, and the COUNT() function to count the number of employees hired in each year. The GROUP BY clause is used to group the results by hire year. You can visualize the results as a bar chart in a data visualization tool, such as Tableau or Power BI, to see the distribution of hires over time.
Conclusion
SQL functions are powerful tools that allow you to perform operations on data and return meaningful results. They can be used to calculate aggregate values, manipulate strings, and work with date and time values. By understanding the different types of functions available in SQL and how to use them in queries, you can perform complex data analysis and retrieve valuable insights from your data.