
Your guide to revenue cohort analysis

Sylvain Giuliani, January 19, 2021

Syl is the Head of Growth & Operations at Census. He's a revenue leader and mentor with a decade of experience building go-to-market strategies for developer tools.

Understanding your customers and their behaviors is key to any successful startup, and that's exactly what cohort analyses are for. A cohort analysis is an extremely useful tool that lets you gather insights into customer churn, lifetime value, product engagement, stickiness, and more.

Cohort analyses are especially useful for improving user onboarding, product development, and marketing tactics. What makes them so powerful is that they're essentially a 3-dimensional visualization: you can compare a value or metric across different segments over time.

By the end of this article, you’ll learn how to create something like this:

If you’re not exactly sure what you’re looking at or why this would be useful, stay tuned and keep reading.

What is a Cohort Analysis?

In order to understand what a cohort analysis is, you first need to understand what a cohort is. A cohort is simply a group of people who have something in common. Therefore, a cohort analysis is simply an analysis of several different cohorts (i.e. groups of customers) to get a better understanding of behaviors, patterns, and trends. If this doesn't make complete sense yet, keep reading and I promise you it will!

Types of Cohorts

One of the most common types of cohort analysis looks at time-based cohorts, which group users or customers by specific time frames. For example, a company might want to see how its customers who started using the product or started paying in January compare to February's customers.

Segment-based cohorts represent groups of customers that used or purchased a particular product or service. For example, you could segment your users based on how often they log into your app each week.

Another type of cohort is a size-based cohort, which segments customers by their monetary value. This is a common practice in the gaming industry (free users vs. whales) and in the SaaS world, where customers are segmented by their LTV or the plan they're on.
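
As a rough sketch of what that kind of bucketing could look like in SQL (the customer_ltv table, its lifetime_revenue column, and the thresholds are all hypothetical, purely for illustration):

-- Hypothetical example of size-based bucketing;
-- customer_ltv, lifetime_revenue, and the thresholds are made up for illustration.
SELECT  customerId
        , CASE
            WHEN lifetime_revenue >= 10000 THEN 'whale'
            WHEN lifetime_revenue >= 1000  THEN 'mid-spend'
            ELSE 'free / low-spend'
          END as size_cohort
FROM customer_ltv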

For the remainder of this article, we're going to focus solely on implementing time-based revenue cohort analyses.

Required Data for a Cohort Analysis

Before starting a cohort analysis, the following data is required:

  1. Revenue data linked to the date of purchase
  2. A unique identifier for your users, like a customer ID or account ID
  3. An initial start date for each user, whether that's the signup date or the date of the first payment.

For example, you might have a table that looks similar to this:
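
Here's a minimal, hypothetical sketch of such an orders table, using the column names the SQL below assumes (the types and sample values are made up for the example):

-- Illustrative sketch of the orders table used in the queries below;
-- the types and sample values are made up for the example.
CREATE TABLE orders (
    customerId  INTEGER,         -- unique identifier for the customer
    orderDate   DATE,            -- date of purchase
    revenue     DECIMAL(10, 2)   -- revenue from that order
);

INSERT INTO orders (customerId, orderDate, revenue) VALUES
    (1, '2020-01-15',  50.00),
    (1, '2020-02-03',  30.00),
    (2, '2020-01-20', 120.00);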

When conducting a cohort analysis in SQL, the goal is generally to get the data into a particular format and then feed the result into a BI tool, like Sisense, Mode, Looker, etc.

Say we're working with the table above. We can use SQL to manipulate the data in the following fashion (each step below adds a CTE to a single query, so the snippets chain together into one statement):

1. Create Cohorts

First, we want to bucket the users into cohorts - in this case, we want to bucket them by the month of their first order.


with user_cohorts as (
    -- each customer's cohort is the month of their first order
    SELECT  customerId
            , MIN(DATE_TRUNC('MONTH', orderDate)) as cohortMonth
    FROM orders
    GROUP BY 1
)

Next, we want to create a month_number for each payment: the number of months between the payment and the customer's cohort month. For example, a payment that a customer made one month after their initial payment would have a month_number of 1 (the initial month is month 0).


, order_month as (
    -- number of months between each order and the customer's cohort month
    -- (month 0 = the cohort month itself); note that the date-diff function
    -- name and argument order vary by warehouse (e.g. DATEDIFF in
    -- Snowflake/Redshift, DATE_DIFF in BigQuery)
    SELECT  customerId
            , DATE_DIFF(
                MONTH,
                cohortMonth,
                DATE_TRUNC('MONTH', orderDate)
            ) as month_number
            , SUM(revenue) as revenue
    FROM orders
    LEFT JOIN user_cohorts USING(customerId)
    GROUP BY 1, 2
)

Taking a step back, we can now compute each cohort's baseline revenue (cohort_size) and aggregate revenue by the cohortMonth we created in the first step, which lets us build our retention_table.


, cohort_size as (
    -- baseline revenue for each cohort: the revenue generated in its first month (month 0)
    SELECT  cohortMonth
            , SUM(revenue) as revenue
    FROM orders
    LEFT JOIN user_cohorts USING (customerId)
    WHERE DATE_TRUNC('MONTH', orderDate) = cohortMonth
    GROUP BY 1
)

, retention_table as (
    SELECT  c.cohortMonth
            , o.month_number
            , SUM(o.revenue) as revenue
    FROM order_month o
    LEFT JOIN user_cohorts c USING (customerId)
    GROUP BY 1, 2
)

SELECT  r.cohortMonth
        , s.revenue as totalRevenue
        , r.month_number
        -- cast or multiply by 1.0 if revenue is stored as an integer,
        -- so this division doesn't truncate to zero
        , r.revenue / s.revenue as percentage
FROM retention_table r
LEFT JOIN cohort_size s USING (cohortMonth)
WHERE r.cohortMonth IS NOT NULL
ORDER BY 1, 3

And now, you should have your data in a format that is digestible by a business intelligence tool that supports cohort analyses (or there is always Excel 🤣)
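
If your tool doesn't build the cohort grid for you, one option is to pivot the result in SQL with conditional aggregation. Here's a minimal sketch, assuming the full query above is wrapped in a CTE named final and you only care about the first few months:

-- Hypothetical pivot of the final result into one row per cohort;
-- assumes the query above is available as a CTE named final.
-- Extend the CASE expressions for as many months as you need.
SELECT  cohortMonth
        , MAX(totalRevenue) as cohort_revenue
        , MAX(CASE WHEN month_number = 0 THEN percentage END) as month_0
        , MAX(CASE WHEN month_number = 1 THEN percentage END) as month_1
        , MAX(CASE WHEN month_number = 2 THEN percentage END) as month_2
FROM final
GROUP BY 1
ORDER BY 1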

Here are some tips to help you navigate this visualization:

  • Each row represents a cohort. In this case, a cohort represents a given month-year.
  • The cohort size (bar on the left) shows the baseline revenue for each cohort, aka the total revenue from month 0.
  • The percentages are relative to each cohort's revenue in month 0. For example, if a cohort generated $10,000 in month 0 and $2,500 of its revenue came in month 2, the month-2 cell would show 25%.

The Value in Conducting a Cohort Analysis

“A picture is worth a thousand words”

Cohort analyses are not only useful for measuring and evaluating revenue-related trends such as net MRR retention, customer churn, and lifetime revenue, but they can also help with the following:

  • User behavior: a cohort analysis lets you follow a cohort over its lifetime, so you get a better understanding of how customers engage with a business over time. In the example above, we can immediately notice that repeat sales are generally higher around Christmas.
  • Compare groups: whether you segment cohorts by time periods, products, or something else, a cohort analysis allows you to compare different groups. Notice in the example above that the first cohort has a much higher percentage of repeat sales compared to the other cohorts. By identifying that, you can then look into possible reasons why the first cohort is purchasing more.
  • Customer churn & lifetime value: Lastly, a cohort analysis allows you to calculate useful metrics like churn and lifetime value. By understanding how churn and lifetime value differ across cohorts, you can learn from the cohorts with a low churn rate and those with a high lifetime value.

What's Next?

With Census, you can send cohort membership to an email marketing tool or your CRM and take action on that data - for example, sending a reactivation or education campaign to a cohort that isn't performing very well. You can do all of that with no code. If you're interested in learning more, contact us! Or want to check out Census yourself? Start your free trial.
