Best Practices

Graduating to the Modern Data Stack for Startups | Census

Sylvain Giuliani · June 08, 2020

Syl is the Head of Growth & Operations at Census. He's a revenue leader and mentor with a decade of experience building go-to-market strategies for developer tools. San Francisco, California, United States

For the past six months, I've been helping dozens of companies and newly formed RevOps & data teams that are hitting the limits of what I call the "Founder Data Stack": a single marketing or sales tool that holds most of your customer data, usually something like Intercom or HubSpot with events forwarded by Segment.

Overall, the founder stack looks something like this 👇

[Diagram: the founder data stack]

🤔 Pretty simple right? So what is the problem?

More People, More Tools, More Data, More Problems

As your company grows, new employees add new tools to the stack to help them do their jobs:

  • Salesforce for sales
  • HubSpot, Marketo, or Customer.io for marketing automation
  • Pendo for product analytics
  • Outreach, Salesloft, or Reply.io for sales engagement
  • People.ai, Mixmax, or Yesware for sales productivity
  • Livestorm, Zoom, or Crowdcast for webinars
  • WordPress, Ghost, or Gatsby for the blog

... and the list goes on. Six months later, your stack of tools looks more like this.

[Diagram: the expanded data stack]

This explosion of tools is not a problem in itself. The real issue lies in the fact that all of these dotted lines represent point-to-point integrations. This means that there's no unified control plane for how data gets to all the tools. If something goes wrong, or you need to make a change, nobody knows where to find it. These applications were never designed with this integration outcome in mind.

Even worse, these point-to-point integrations are usually built with custom one-off code, which makes it hard to maintain over time. This mess creates inconsistent data, duplication, and leads ultimately to data you can't trust.

Don't think you have this problem? Here are some classic symptoms. Can you answer "no" to all of these?

  • The Sales / CS / Support teams have 3-5 tabs open just to understand what a customer is doing.
  • One of your tools says the customer is on a free plan, while another shows they are paying $399/month.
  • You have sent automated emails to the wrong customers, or multiple copies of the same email to one customer.
  • It takes 2+ weeks to run any in-depth report or analysis that combines product & customer data. By the time you finish, the source data (usually CSV exports) is out of date.
[Image: tabs open across many data stack tools]

Problem: No Single Source of Truth

Without a single source of truth, you get garbage in, garbage out across your whole stack. Teams won't trust their reports and analyses because they suspect the data is inaccurate. Automations and workflows will misbehave, sending the wrong emails to the wrong people with the wrong personalized values. Incompatible IDs will generate tons of duplicates, which lead to even more bad data. The end result: a recurring calendar task where you spend a day manually cleaning up the mess across your tools.

The solution is straightforward. As your company and the number of tools grow, you need to maintain a single source of truth that:

  • enforces data consistency
  • syncs data across all of your tools
  • can be queried when in doubt

Solution: The Modern Data Stack

There's already a tool perfectly suited to storing massive amounts of data, that can be queried easily, and that connects to everything: a database or data warehouse. You probably already have one running in your company that you can reuse, so you don't need to buy another CRM, CDP, DMP, MAP, or any other acronym.

Building around a data warehouse as part of a modern data stack has additional benefits such as:

  • You own your data. It helps you comply with different regulations.
  • Get value quicker. It is 10x easier to dump historical data into a database than to import it into yet another tool.
  • Easier to sync with other tools. Databases integrate with everything, unlike SaaS tools with limited APIs (and please don't get me started on APIs like Marketo's).
  • Reusability. Other teams in the company can use this trusted source of truth.

In addition to a data warehouse, you will need 4 other key components for your modern data stack:

  1. An event tracking tool. You can keep using Segment here. It does the job well and lets you collect events across all of your websites & apps.
  2. A data loader. I recommend Fivetran. It's easy to set up in a couple of clicks and amazingly reliable.
  3. A data modeling tool. dbt is the new power tool here. It allows you to transform and model your data.
  4. An integration platform. I'm 100% biased here, but I recommend Census. We integrate well with dbt and enable you to sync your clean, unified data models back to all of your other tools.
[Diagram: the modern data stack]
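
To make the "single source of truth" idea concrete, here is a minimal sketch of the kind of unified customer model you'd build in the warehouse. It uses an in-memory SQLite database as a stand-in for Snowflake/BigQuery/Redshift, and the table and column names are invented for illustration; in practice Fivetran would load the raw tables and dbt would own the modeling SQL.

```python
import sqlite3

# Stand-in for the warehouse: a throwaway SQLite database. In production this
# would be your actual warehouse, loaded by Fivetran and modeled with dbt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing_accounts (account_id TEXT, plan TEXT, mrr INTEGER);
    CREATE TABLE product_events (account_id TEXT, event TEXT);

    INSERT INTO billing_accounts VALUES ('acme', 'pro', 399), ('initech', 'free', 0);
    INSERT INTO product_events VALUES
        ('acme', 'report_created'), ('acme', 'invite_sent'), ('initech', 'login');
""")

# A unified customer model: one row per account combining billing and product
# usage, so every downstream tool reads the same answer to "what plan is this
# customer on, and are they active?"
unified = conn.execute("""
    SELECT b.account_id,
           b.plan,
           b.mrr,
           COUNT(e.event) AS event_count
    FROM billing_accounts b
    LEFT JOIN product_events e ON e.account_id = b.account_id
    GROUP BY b.account_id, b.plan, b.mrr
    ORDER BY b.account_id
""").fetchall()

print(unified)
```

Once a model like this exists, a tool such as Census can sync it out, and the "free plan here, $399/month there" contradiction disappears because every tool is fed from the same rows.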

As a bonus, you can replace Amplitude with a BI tool like Mode or Chart.io, which are cheap and as good as Looker.

What do you get?

Done right, this modern data stack helps you centralize all of your data in one accessible place so you can create unified models and sync them to the tools each business team uses. Here are some of the ways you can put the resulting data to work:

  • Create segments of users based on the features they use, to power personalized education campaigns
  • Create account health scores and surface them in your CRM to help your CSM team prioritize their time
  • Use the same customer view across all of your tools
  • Notify sales reps when there is activity in an account that is currently in a trial
  • Retarget disengaged users on Facebook & Google based on actual product data
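
As a sketch of what an account health score might look like, here is a toy scoring function. It assumes per-account usage fields have already been modeled in the warehouse; the weights, caps, and field names are all invented for illustration, not a prescribed formula.

```python
def health_score(logins_30d, features_used, seats_active, seats_total):
    """Toy account health score on a 0-100 scale (weights are illustrative)."""
    usage = min(logins_30d / 20, 1.0)        # cap at 20 logins per month
    breadth = min(features_used / 5, 1.0)    # out of 5 key features
    adoption = seats_active / seats_total if seats_total else 0.0
    # Weighted blend of the three signals, scaled to 0-100.
    return round(100 * (0.4 * usage + 0.3 * breadth + 0.3 * adoption))

score = health_score(logins_30d=10, features_used=4, seats_active=6, seats_total=10)
print(score)
```

Synced into a CRM field, a number like this lets a CSM sort their book of business by risk instead of guessing from five open tabs.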

Bonus: everyone gets to keep using the tools they love.

Finally, despite how it looks, this modern data stack is straightforward to deploy and requires near-zero maintenance. Best of all, it is pretty affordable, and costs scale with you.

How to get started?

I would encourage you to start small. Take one end-to-end data flow and build the modern data stack to solve that use case. It could be pushing aggregate product usage from your web app to Salesforce to give your sales team visibility on product adoption. Or it could be syncing payment data to your marketing automation tool to send emails to your best customers and turn them into champions.
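
The first example above, pushing aggregate product usage to a CRM, can be sketched in a few lines: roll raw events up to one record per account, shaped the way a CRM sync expects. The event and field names here are hypothetical, and in a real deployment this aggregation would live in a dbt model with a tool like Census handling the sync.

```python
from collections import Counter

# Raw product events as they might land in the warehouse (fields are illustrative).
events = [
    {"account_id": "acme", "event": "report_created"},
    {"account_id": "acme", "event": "report_created"},
    {"account_id": "acme", "event": "invite_sent"},
    {"account_id": "initech", "event": "login"},
]

def usage_rollup(events):
    """Aggregate raw events into one usage record per account."""
    totals = Counter(e["account_id"] for e in events)
    # One record per account, keyed on an external ID the CRM can match on.
    # The field names below are hypothetical custom fields, not a real schema.
    return [
        {"External_Account_ID__c": account, "Usage_Events_30d__c": count}
        for account, count in sorted(totals.items())
    ]

records = usage_rollup(events)
print(records)
```

Start with one flow like this, prove it out end to end, and then extend the same pattern to the next use case.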

If you have any questions, don't hesitate to contact us. We are happy to do a full review of your existing stack for free!

Or, if you'd rather dive in yourself, you can check out Census for free for your modern data stack.
