
The 5 SQL date functions biz ops teams need to know | Census

Terence Shin September 07, 2021

Terence is a data enthusiast and professional working with top Canadian tech companies. He received his MBA from Quantic School of Business and Technology and is currently pursuing a master's in computational analytics at Georgia Tech. Markham, Ontario, Canada

Key learnings from this article: In this post, you'll get a breakdown of the syntax for each of the date functions below, as well as an example of their use cases.

  • DATE_TRUNC()
  • DATE_DIFF()
  • DATE_ADD()/ DATE_SUB()
  • EXTRACT()
  • CURRENT_DATE()

What makes a good SQL expert? There are a lot of things (we’re pretty awesome), but one of our superpowers is the ability to manipulate data in any way we like, especially dates.

Date manipulation allows the businesses we support to compare and assess performance across different periods of time. It should come as no surprise that this ability informs a variety of top-tier business ops and reporting. Why? Different business functions have different needs across their datasets, including dates. Typically, finance cares about numbers on a quarterly basis, while customer success might care about numbers on a daily, or even hourly, basis. Being able to manipulate dates for different business users is therefore essential.

In this article, we’ll dive into five of the most important and most useful DATE functions in SQL (along with some of the practical business cases you can use them for) including:

  • DATE_TRUNC()
  • DATE_DIFF()
  • DATE_ADD()/ DATE_SUB()
  • EXTRACT()
  • CURRENT_DATE()

To help illustrate our point and anchor our examples in the real data and business worlds, we’ll be using an example scenario throughout. Let’s say we’re analysts working for a company called Sandy Shores (if you caught our webinar on operational analytics at every stage, you know this company well) that rents beach chairs. We might want to manipulate the dates in our data warehouse so that we can analyze the amount of time between a customer’s rentals, along with other important data points that help us measure growth.

Let’s dive in.

1. DATE_TRUNC()

DATE_TRUNC() truncates a date to the specified date part so you can aggregate numbers at different granularities (i.e. daily, monthly, annually). DATE_TRUNC is a necessity when conducting cohort analyses, where you typically group users by month.
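For instance, a minimal monthly-cohort grouping might look like the sketch below (the users table and its user_id and signup_date columns are hypothetical names, not from our Sandy Shores example):

```sql
-- Bucket users into monthly cohorts by their signup date
SELECT
  DATE_TRUNC(signup_date, MONTH) AS cohort_month,
  COUNT(user_id) AS users_in_cohort
FROM users
GROUP BY cohort_month
```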

It has the following syntax:


DATE_TRUNC(date_expression, date_part)

Thinking back to our example company, our fictional analyst might use the code below to truncate the date June 28, 2021 to the beginning of June, i.e. June 1, 2021. In the example below, we specified date_part = MONTH, so DATE_TRUNC() truncates the date to the first day of the month. Here’s the code:


DATE_TRUNC('2021-06-28', MONTH) = '2021-06-01'

To give another example, if our analyst specified date_part = YEAR, then they’d get the first day of the year of the specified date. Here’s the code again with this change:


DATE_TRUNC('2021-06-28', YEAR) = '2021-01-01'

Let’s look at some more examples Sandy Shores might run into.

Example 1: Suppose our analyst wanted to get a weekly sum of rental sales from a sales table with a date column and a daily_sales column:

To do so, they would run the following code. In the block below, we’re truncating all dates to the beginning of each week, so that we can group together all sales that happen in the same week:


WITH sales_data AS (
  SELECT
    DATE_TRUNC(date, WEEK) AS week_date,
    daily_sales
  FROM sales
)

SELECT
  week_date,
  SUM(daily_sales) AS weekly_sales
FROM sales_data
GROUP BY week_date

This would return one row per week with its total weekly_sales.

Now our analyst can confidently tell their business teams which week had the highest sales from chair rentals, which can feed into later analysis. Now, what if our analyst wanted to compare dates to understand the time between rentals? That’s where DATE_DIFF() comes into play.

2. DATE_DIFF()

DATE_DIFF() compares two dates and returns the difference in date_part between them.

For example, if date_part = DAY, then DATE_DIFF() returns the number of DAYs between the two dates. If date_part = MONTH, then DATE_DIFF() returns the number of MONTHs between the two dates.

In the code below, our date_part = DAY, so we’re told that one day passes between January 1, 2021 and January 2, 2021.


DATE_DIFF('2021-01-02', '2021-01-01', DAY) = 1
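Likewise, with date_part = MONTH, DATE_DIFF() counts the month boundaries crossed between the two dates:

```sql
DATE_DIFF('2021-03-01', '2021-01-01', MONTH) = 2
```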

DATE_DIFF() is useful when you want to compare two dates, for example, when a package was shipped and when a package was delivered, or when a user registered and when they canceled. DATE_DIFF() is also useful in the WHERE clause if you want to filter dates that happened X periods ago (eg. 5 days ago, 2 weeks ago, or last month).

It has the following syntax:


DATE_DIFF(date_expression_1, date_expression_2, date_part)

Let’s take a look at an example that Sandy Shores might run into.

Example 1: Suppose you wanted to get the time between when a rental chair was picked up and when it was returned to a given Sandy Shores location.

To do so, you would run the following query:


SELECT order_id
, DATE_DIFF(date_returned, date_picked_up, DAY) as rental_duration
FROM orders

This would return each order alongside its rental duration in days.

Example 2: Using the same table above, suppose you wanted to get all orders where the return time was less than 10 days:


SELECT *
FROM orders
WHERE DATE_DIFF(date_returned, date_picked_up, DAY) < 10

This time, the results would include only the orders that were returned within 10 days of pickup.

Now, our Sandy Shores data analyst has a better understanding of how long different customers use their rental chairs each time. This information can help the company better tailor its prices to the needs of customers, or project how many rental chairs they should keep on hand for new customers on a given day.

Now, what if they want to add a time period to the dates they’re working with? Let’s take a look.

3. DATE_ADD() / DATE_SUB()

DATE_ADD() adds a specified number of date parts to a date. Conversely, DATE_SUB() subtracts a specified number of date parts from a date.


DATE_ADD('2021-01-01', INTERVAL 3 DAY) = '2021-01-04'


DATE_SUB('2021-01-04', INTERVAL 3 DAY) = '2021-01-01'


DATE_ADD('2021-01-01', INTERVAL 1 MONTH) = '2021-02-01'


Like DATE_DIFF(), DATE_ADD() and DATE_SUB() can be used in the WHERE clause to filter for dates that happened X periods ago or X periods in the future.

They have the following syntax:


DATE_ADD(date_expression, INTERVAL int64 date_part)


DATE_SUB(date_expression, INTERVAL int64 date_part)
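To sketch the WHERE-clause pattern mentioned above: assuming the same orders table, the query below would keep only rentals picked up in the 30 days before June 28, 2021 (the cutoff date here is just an illustrative placeholder for "today"):

```sql
SELECT order_id
, date_picked_up
FROM orders
WHERE date_picked_up >= DATE_SUB('2021-06-28', INTERVAL 30 DAY)
```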

Let’s take another look at a use case that might be helpful for Sandy Shores.

Example 1: Suppose you wanted to add 7 days to the date_picked_up values to get an estimated return date (assuming that Sandy Shores determined that vacationers rent beach chairs for an average of 7 days).

To do so, you would run the following query:


SELECT order_id
, date_picked_up
, DATE_ADD(date_picked_up, INTERVAL 7 DAY) as estimated_date_returned
FROM orders

This would return each order with an estimated return date seven days after pickup.

This function helps Sandy Shores understand when they should expect to see their chairs back, and follow up with customers if their return window exceeds that date. But what if they wanted to get specific about the number of rentals they had on a specific date for a specific location? That’s where EXTRACT() comes in.

4. EXTRACT()

EXTRACT() returns the value that corresponds to the specified date part.

For example, we can see below how we can use this function to pull out a specific part of a full date, such as a month, day, or year.


EXTRACT(DAY FROM '2021-01-03') = 3


EXTRACT(MONTH FROM '2021-01-03') = 1


EXTRACT(YEAR FROM '2021-01-03') = 2021

It has the following syntax:


EXTRACT(part FROM date_expression)

Using EXTRACT() is an easy way to get specific components of a date you’re interested in. For example, if your company does reporting by week numbers, you can use EXTRACT() to get the week number for a given date for a given record. EXTRACT() also allows you to get the month number or year from the date, both of which are useful for machine learning models.
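For example, here’s what pulling the week number looks like (this uses ISOWEEK, one of the date parts BigQuery supports in EXTRACT; June 28, 2021 falls in ISO week 26):

```sql
EXTRACT(ISOWEEK FROM '2021-06-28') = 26
```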

Let’s see how Sandy Shores can use EXTRACT() to get specific about their rental records.

If Sandy Shores wanted to get more insights on the number of sales in each year, or sales by month, EXTRACT() can come in handy because it provides an easy way to group data.

To get the year and month from each date in the orders table, you could run the following query:


SELECT order_id
, date_picked_up
, EXTRACT(YEAR FROM date_picked_up) as year
, EXTRACT(MONTH FROM date_picked_up) as month
FROM orders

This would return each order with its pickup year and month as separate columns.

With the four date functions above, the analyst at Sandy Shores can now manipulate dates however she likes for business reporting and insights. However, there’s one more useful date function that will allow our analyst to write dynamic queries, which is CURRENT_DATE().

5. CURRENT_DATE()

CURRENT_DATE() returns (you guessed it) the current date at the time the query is executed, in the specified timezone. Note that the timezone parameter is optional and does not need to be specified, as it defaults to UTC.

Using CURRENT_DATE() offers an easier way of referencing today’s date as opposed to a hard-coded date, which is especially useful if it’s a query scheduled on Airflow or a query you use often.
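For example, the one-liner below shows the default call alongside an explicit timezone (the timezone name is just an illustration; any valid timezone string works):

```sql
SELECT CURRENT_DATE() AS today_utc
, CURRENT_DATE('America/Toronto') AS today_toronto
```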

It has the following syntax:


CURRENT_DATE([time_zone])

Here’s an example that Sandy Shores might use.

Example 1: Suppose today is January 14, 2021, and Sandy Shores wanted to get all rentals that were picked up in the past week:

To do so, they would run the following query:


SELECT order_id
, date_picked_up
, date_returned
FROM orders
WHERE DATE_DIFF(CURRENT_DATE(), date_picked_up, DAY) < 7

This would return each chair rental picked up in the last seven days, so we can see which customers are active and how many of our chairs are out enjoying a beach day somewhere.

Following in the steps of Sandy Shores: What next?

Now that you understand the top five DATE functions in SQL as well as our fine analysts at Sandy Shores do, it’s time to put your knowledge to the test. For you adventurous few, there are two resources I’d recommend checking out:

  1. Leetcode is a website with hundreds of coding problems you can solve. The problem called ‘Rising Temperature’ is a good challenge for your ability to work with dates.
  2. w3resource is another great resource, with 21 practice problems focused on datetime.

Looking for even more ways to improve your SQL skills? Check out some of the other great resources from the Census team.
