Interviews

Data industry facts according to Jamie Quint

Sylvain Giuliani March 18, 2021

Syl is the Head of Growth & Operations at Census. He's a revenue leader and mentor with a decade of experience building go-to-market strategies for developer tools.

We recently talked to Jamie Quint about his ideal analytics stack for founders. In discussing which tools are best in class right now, the conversation naturally led to which tools will be best in the future.

The short answer is, nobody can know right now. But during the conversation about what’s possible, Jamie kept returning to two main themes: 1) There is a clear gap in the ideal analytics stack that hasn’t been filled yet; and 2) There is a new responsibility placed on data teams that’s still being defined.

The way Jamie breaks these two concepts down is useful for anyone building their own analytics stack and for anyone trying to wrap their head around the future of data.

The Functionality of the Ideal Analytics Stack Is Here to Stay

As Jamie described his ideal analytics stack, he was careful to distinguish between the core functionality of the different pieces and the specific tools he uses right now. Why? Because, he said, that function – what each tool does – will largely remain unchanged, even if the tools themselves do. If you invest in best-in-class tools for each function, your stack will stay cutting-edge and help you adapt to whatever the future holds.

Jamie breaks down the functionality of each tool in his analytics stack like this:

  1. “Get the data into your warehouse” with Fivetran and Segment.
  2. “Transform the data to be useful” with Snowflake and dbt.
  3. “Analyze the data” with Amplitude and Mode.
  4. “Get the data out into other platforms with Census, where it can be used and utilized to add value to the business.”
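
To make that pipeline concrete, here's a minimal Python sketch of the four functions, using sqlite3 as a stand-in for the warehouse. The table names and the push_to_crm() helper are hypothetical; in a real stack each step is handled by the tools Jamie names (Fivetran/Segment, dbt on Snowflake, Mode/Amplitude, and Census).

```python
# A minimal sketch of the four functions above. sqlite3 stands in for the
# warehouse; table names and push_to_crm() are illustrative only.
import sqlite3

warehouse = sqlite3.connect(":memory:")

# 1. Get the data into the warehouse (ingestion, e.g. Fivetran/Segment).
warehouse.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT)")
warehouse.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", "signup"), ("u1", "invite_sent"), ("u2", "signup")],
)

# 2. Transform the data to be useful (modeling, e.g. a dbt model).
warehouse.execute("""
    CREATE TABLE user_activity AS
    SELECT user_id, COUNT(*) AS event_count
    FROM raw_events
    GROUP BY user_id
""")

# 3. Analyze the data (e.g. the query behind a Mode report).
rows = warehouse.execute(
    "SELECT user_id, event_count FROM user_activity ORDER BY event_count DESC"
).fetchall()
print("Most active users:", rows)

# 4. Get the data out into other platforms (reverse ETL, e.g. Census).
def push_to_crm(record):
    # Hypothetical placeholder for a sync into a CRM or marketing tool.
    print(f"Would sync to CRM: {record}")

for user_id, event_count in rows:
    push_to_crm({"user_id": user_id, "event_count": event_count})
```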

But which tools best serve these functions in the future might change. “The functionality of all of these tools is something that people are going to continue to need for an indefinite amount of time,” Jamie says. “The question is always ‘Will a new tool come along that blows the other ones away?’ ... The ideal tech stack always changes over time. Five years ago, the answer would have been different than it is today.”

It’s impossible to predict which tools will be best in class in the future, but, according to Jamie, we do know for certain which tools are best in class right now: Amplitude, Mode, Segment, Fivetran, Snowflake, dbt, and Census. In five years’ time, you can adapt your analytics stack by focusing on whichever tools best serve those same core functions.

Current Challenge: Ensuring Data Quality

The one often overlooked functionality that Jamie admits is missing from his ideal analytics stack is “data quality monitoring and meta analytics on your data.” Addressing data quality is one of the next big challenges for data teams.

Jamie describes this functionality as answering the question, “Is your data in good shape?” Being able to answer this question has become more and more important in recent years, thanks to privacy regulations, like the CCPA; industry shifts, like Apple’s anti-tracking measures; and the sheer amount of data flowing throughout businesses of all sizes.

Regulations and industry shifts make it more difficult to collect the type of data that can help solve growth marketing issues, like multi-touch attribution or customer profiles. They also put the onus on you to use the data you do have responsibly.

Not to mention that, with the rise of approaches to data like operational analytics, data is being used in more ways than ever before. You can build your own CDP, prioritize support tickets automatically, do complex growth modeling in Google Sheets, and much more. Across all these use cases, the data you have needs to be accurate.
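
For instance, "prioritize support tickets automatically" might look something like the toy sketch below, where fields such as arr and usage_trend stand in for account attributes synced out of the warehouse (the field names and thresholds are made up for illustration):

```python
# A hypothetical sketch of automatic ticket prioritization driven by
# warehouse-derived account data. Field names and thresholds are invented.
def ticket_priority(ticket, account):
    """Return a priority label from simple warehouse-derived signals."""
    score = 0
    if account.get("arr", 0) > 50_000:             # high-value account
        score += 2
    if account.get("usage_trend") == "declining":  # churn risk
        score += 2
    if ticket.get("subject", "").lower().startswith("urgent"):
        score += 1
    return "P1" if score >= 3 else "P2" if score >= 1 else "P3"

ticket = {"id": 42, "subject": "Urgent: sync failing"}
account = {"arr": 80_000, "usage_trend": "declining"}
print(ticket_priority(ticket, account))  # -> "P1"
```

If the account attributes feeding this logic are stale or wrong, the automation quietly misprioritizes tickets, which is exactly why the monitoring gap matters.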

Companies need a way to monitor the quality of their data to ensure compliance, accuracy, and usability. Tools like Monte Carlo are starting to address this function.
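
As a rough illustration of what that monitoring involves, a data team might start with simple volume, null-rate, and freshness checks along these lines (the table, column, and threshold below are hypothetical; observability tools automate and scale this kind of check):

```python
# A rough sketch of the checks a data quality / observability tool automates.
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical warehouse with one table; sqlite3 stands in for Snowflake.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (id INTEGER, email TEXT, loaded_at TEXT)")
load_time = datetime.now(timezone.utc)
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a@example.com", load_time.isoformat()),
     (2, None, (load_time - timedelta(hours=3)).isoformat())],
)

def check_quality(conn, table, not_null_column, max_staleness_hours=24):
    """Return simple data quality signals: volume, null rate, freshness."""
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    null_count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {not_null_column} IS NULL"
    ).fetchone()[0]
    latest = conn.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()[0]
    age = datetime.now(timezone.utc) - datetime.fromisoformat(latest)
    return {
        "table": table,
        "row_count": row_count,
        "null_rate": null_count / row_count if row_count else None,
        "is_stale": age > timedelta(hours=max_staleness_hours),
    }

print(check_quality(warehouse, "orders", "email"))
```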

The Role of Data Teams Is in Flux

Data teams have long had a core responsibility of managing data infrastructure. But, Jamie says, as access to data spreads around the organization, a new responsibility will emerge that manages how data stakeholders interact with that infrastructure.

According to Jamie, the core responsibility of a data team should be to “ensure that all the data exists in a single place where it can be easily queried. Then, [data stakeholders like the] product team either integrate with that [infrastructure] directly or they'll have embedded data analysts on their teams.” When it comes to the flow of data, there is an exchange between teams—the data team provides data, and the stakeholder team puts it to use.

For large companies, data teams should be more defined and set apart from the rest of the organization. Jamie says that while he was at Reddit, the data team’s responsibility was “data extraction and transformation. Then they’d make the transformed data available to [other] teams or to the data team itself for doing analysis.”

Smaller companies, like Notion when Jamie was there, will most likely have a one- or two-person team serving as both the data team and the stakeholder team. He says that as a one-person team at Notion, this collaboration was “singularly encapsulated in my job.”

Either way, there’s a flow of data and communication between the data team and the stakeholder team, explaining what’s needed from and what’s possible with the data. This back-and-forth allows the data team to build a tailored analytics stack, but it raises an issue: Who manages the flow of data and communication between the data team and the stakeholders?

Current Challenge: Who Owns the Flow of Data?

In this collaboration between data teams and stakeholder teams, the question of who owns the flow of data and communication remains. There is no easy answer yet, but emerging trends in the world of data, like the rise of the analytics engineer role, can start to answer this question.

Jamie explains the challenge of data ownership this way: “There are more people who expect to have access to data that they know exists. Salespeople are like, ‘What is the activity on this account? Is the account growing?’ Now that folks know that that data exists, they just want it and want access to it.”

Data engineers can make sure the data is ready for use, but the sales team needs a way to send requests to the data team, and the data team needs to communicate to the sales team what answers they can get from the data.

Questions like these sometimes go unanswered, Jamie says, because data leaders focus too much on the data quality and availability challenge. “People running data teams are thinking more about data quality and ‘Do I have all the data?’ and not so much about ‘Are we providing maximum value to other functions in the org?’”

Proving the value of data requires that you understand what stakeholders would find valuable in the first place. There is rarely a formalized way of communicating such things. As Jamie says, “I don't think there's any standard yet for how internal teams are structured to fully leverage the ideal analytics stack. … You need someone who's thinking about data from a stakeholder team perspective, which I think is still a role that's missing at a number of companies.”

Emerging roles like the analytics engineer can start to own this flow of communication using methods like operational analytics. Analytics engineers interact with the analytics stack and the stakeholders who will benefit from the stack, creating a bridge that doesn’t formally exist yet in many companies.

Start Building Your Ideal Analytics Stack

No one person and no one tool will have all the answers when it comes to how you should approach data stacks and processes. But by focusing on fundamental functionalities and communication flows, you can build an adaptable stack of tools that are not only best in class but also optimized for your needs.

The best way to start building your own analytics stack is to use Jamie’s ideal analytics stack as a guideline. Focus on the functionalities he lays out, and move on from there. Schedule a Census demo today and we can show you how it all works.
