
Build or Buy? 4 Reasons NOT to Build Your Own Reverse ETL

Allie Beazell · December 07, 2022

Allie Beazell is director of developer marketing @ Census. She loves getting to connect with data practitioners about their day-to-day work, helping technical folks communicate their expertise through writing, and bringing people together to learn from each other.

Welcome to part two of our series on evaluating reverse ETL tools. This article was updated December 7, 2022 to better capture the current state of reverse ETL tooling. You can read the updated part one here and grab the new-and-improved companion guide, The definitive guide for evaluating reverse ETL tools.

In this installment, you’ll learn why reverse ETL is so challenging for many teams to build. We’ll break down:

  • The cost to build 🏗️
  • The cost to maintain 🔧
  • The opportunity cost 💸
  • Why reverse ETL isn’t just another data pipeline 🚰

We’re assuming you already have some savvy data and engineering talent on staff. You may be asking yourself: Why can’t these existing resources within my company just build reverse ETL for our team themselves?

We often see companies wrestle with this. On the one hand, you can spin up an MVP of a reverse ETL tool yourself without too much trouble using custom Python scripts and open-source options. But the real labor isn’t the initial development cost; it’s the ongoing time and resources that maintaining the tool will demand. A DIY build will lock your engineers right back into API purgatory and ensure that your data team is always busy, but not necessarily productive.
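To make that concrete, here’s roughly what a first-pass DIY sync script looks like. This is a minimal sketch, not production code: the warehouse table, CRM endpoint, and environment variables are all hypothetical stand-ins.

```python
# Minimal DIY reverse ETL sketch: pull rows from the warehouse, push them
# to a CRM over HTTP. Table name, endpoint, and env vars are hypothetical.
import os

import psycopg2  # swap in your warehouse's driver (snowflake-connector, etc.)
import requests

CRM_URL = "https://api.example-crm.com/v1/contacts"  # hypothetical endpoint

def sync_contacts() -> None:
    conn = psycopg2.connect(os.environ["WAREHOUSE_DSN"])
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT email, first_name, plan FROM analytics.dim_users")
            for email, first_name, plan in cur.fetchall():
                # One request per row: easy to write, painful to operate.
                resp = requests.post(
                    CRM_URL,
                    headers={"Authorization": f"Bearer {os.environ['CRM_API_KEY']}"},
                    json={"email": email, "first_name": first_name, "plan": plan},
                    timeout=30,
                )
                resp.raise_for_status()
    finally:
        conn.close()

if __name__ == "__main__":
    sync_contacts()
```

Notice everything this version lacks: retries, rate limiting, incremental state, batching, alerting. Each of those gaps is a future maintenance ticket.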

On the flip side, when you use a pre-built tool like Census, you don’t have to spend time worrying about the ins and outs of each API your team uses (and you can reuse the data modeling and logic work you’ve already built for tools like dbt and Airflow). Even better, you can go from request to deployment within the same day. Sounds great, right?

But in case you’re not convinced, let’s take a look at some numbers around the incurred cost of the classic build vs. buy decision (since we all love numbers, right?). Here are four reasons you don’t want to build your own reverse ETL solution.

1. Cost to build 🏗️

If you choose to build reverse ETL yourself, consider the labor cost of whoever builds and maintains your data integrations. This could be a data engineer, a data analyst, or an integrations specialist. For this exercise, we’ll assume that the task would fall on a data engineer.

To start, let’s take the average annual salary of a data engineer in San Francisco: $157,494 (but keep in mind that a good data engineer’s annual salary quickly rises above $200k).

For ease of calculations, we’ll assume your engineers are in that $200k range. Let’s do some back-of-the-napkin math to get a ballpark estimate of how much it would cost your organization to build a bespoke reverse ETL tool (not including the potentially infinite number of maintenance hours on that tool after it ships – we’ll get to that in a second):

Let’s say you’re starting out with three CRM connectors: Salesforce, Marketo, and Zendesk. And let’s say, for the sake of napkin science, it takes a (proficient) engineer about 2 weeks to build a custom DIY connector (or ~$8k in engineering time per connector). That’s just for spinning up one connector. All three would run you about $24k and 6 weeks in initial engineering time alone. And that’s the cheap part. The expensive part is maintaining them and hoping that, if a connector fails, someone notices quickly and fixes it before your business teams spend months working off stale data.
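If you want to check the napkin, here’s the arithmetic spelled out, using only the assumptions above:

```python
# Napkin math from the assumptions above: a $200k/year engineer,
# 2 weeks per connector, 3 connectors.
annual_salary = 200_000
weeks_per_year = 52
weeks_per_connector = 2
connectors = 3

per_connector = annual_salary / weeks_per_year * weeks_per_connector
print(f"~${per_connector:,.0f} per connector")           # ~$7,692 -> call it $8k
print(f"~${per_connector * connectors:,.0f} all three")  # ~$23,077 -> call it $24k
print(f"{weeks_per_connector * connectors} weeks of engineering time")  # 6 weeks
```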

Compare that ☝ to a reverse ETL tool where it takes anyone 15 minutes to set up an initial sync with no upfront cost. With most reputable reverse ETL tools, you can get started for free, and yearly subscriptions that include unlimited connectors are available for less than $10K per year. Now that napkin math just makes sense.

2. Cost to maintain 🔧

We probably don’t have to tell you this, but the cost of DIY software doesn’t stop when the integration goes live. Instead, it only gets more expensive to deal with the further it falls out of date. And with new tools and API updates always around the corner, every new team member has to spend time just making heads or tails of it.

This cost is much harder to estimate with a high-level formula. People will inevitably leave your company or move teams, APIs will change, new features will be needed, and the organization will grow. Add to that the pressure to fix these pipelines quickly, since they’re transporting business-critical data. If updates are relegated to the back burner, there are real-world, real-time impacts, like new trial users not receiving welcome emails.

As your integrations and organization become more complex, you’ll end up with a dangerous, interdependent web of connectors to maintain and debug.

So how does a pre-built, dedicated reverse ETL tool eliminate this risk and stress? There are four main ways:

1. Increased uptime: Your business depends on customer data. If your integrations aren’t performing as needed, or they’re not consistently built using up-to-date best practices, you’ll lose time and money quickly: for example, if you’re not automatically de-duping your ad platform audiences, or if your sync is running up your API bills by syncing unnecessary fields.

2. Data quality assurance: Trust in data is an integral part of building a modern, data-driven organization. If your tools aren’t solving this (or, at the very least, not making it worse), you’re going to come out net negative. For example, if your sales team consistently gets out-of-date or incorrect contact information for prospects, they’ll hesitate to leverage the data at their disposal and fail to perform as well as they could.

3. Scalability: The cost of building and maintaining custom features for your tooling will quickly add up. A quality pre-built reverse ETL tool will come out of the box with free core features to get you up and running (and running some more) from day one. Plus, a quality pre-built reverse ETL tool will scale arbitrarily to meet your data volumes. For example, our free plan includes 50 destination fields, unlimited destination connections, unlimited SQL, dbt, and Looker models, incremental syncing, and more from day one.

4. API quotas: Do you know how close you are to hitting the API quota of your Salesforce, Braze, or insert-name-of-costly-SaaS API? Do you want to be thinking about that every day? These are concerns you adopt when you trigger a Python script on a cron job. Put a * in the wrong place in your crontab, or forget to use a batch API, and you might exceed your API quotas by orders of magnitude (see the sketch after this list).
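On that last point, the gap between a naive per-record loop and a batched write is easy to see in code. The endpoints and batch size below are hypothetical; the quota math is the point:

```python
# Same 10,000-row sync, two ways. Endpoints and batch size are hypothetical.
import requests

RECORD_URL = "https://api.example-saas.com/v1/records"       # hypothetical
BATCH_URL = "https://api.example-saas.com/v1/records/batch"  # hypothetical
BATCH_SIZE = 200

def sync_one_by_one(session: requests.Session, rows: list[dict]) -> None:
    # 10,000 rows -> 10,000 API calls. Scheduled hourly, that's
    # 240,000 calls a day against your quota.
    for row in rows:
        session.post(RECORD_URL, json=row, timeout=30).raise_for_status()

def sync_batched(session: requests.Session, rows: list[dict]) -> None:
    # 10,000 rows at 200 per request -> 50 API calls for the same data.
    for i in range(0, len(rows), BATCH_SIZE):
        payload = {"records": rows[i : i + BATCH_SIZE]}
        session.post(BATCH_URL, json=payload, timeout=30).raise_for_status()
```

Both functions move the same rows; one of them makes 200 times fewer API calls.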

If you weigh all these considerations and still decide it’s worth the ongoing cost to build your reverse ETL tooling in-house, more power to you. But, if you’re like us, you’d rather spend that time and those resources building your core product.

3. Opportunity cost 💸

What could your talent build if all the busy work and distractions of maintenance were taken off their plate? Ultimately, the real cost of building a reverse ETL solution in-house and dedicating engineering effort to maintaining your own connectors isn’t the dollar amount of the time spent coding the project, it’s the cost of the work they’re not doing instead.

Not only that, it’s the cost of the work your analysts could do with the right tooling but are blocked from today. If data professionals are constantly busy with maintenance, they’re less likely to innovate. After all, you want your team adding value using the pipelines, not building reverse ETL pipelines. Tools like Census don’t just let data engineers do better work. They give your analysts an avenue to greater business impact by letting them build data pipelines themselves with just SQL, rather than the specialized skills needed to instrument Airflow or custom scripts.

On the data engineering side, practitioners want to spend less time writing integrations and more time unlocking value in data through models and applications. But they can’t do that if all their time is spent troubleshooting a complicated integration they inherited from another engineer. So, by investing in reverse ETL, you cut your troubleshooting time, invest in your data, and, as a result, invest in your product. After all, an uphill battle doesn’t scale for anyone.

4. Reverse ETL is not just another pipeline 🚰

Now that we’ve weighed the costs of building an in-house reverse ETL tool, the final point we want to make is that there is significant complexity in building a reverse ETL tool that goes beyond just building pipelines. It’s easy to think that reverse ETL is “just another data pipeline” but here are a few reasons why that’s not true:

  • Reverse ETL is designed for a different use case. Data being synced to downstream operational tools has a different purpose than data landing in your data warehouse; it’s a completely different use case. With reverse ETL you’re not just putting a data tool in place for the data team, you’re bridging the organizational gap between that data team and your BizOps teams. Operational data is used to power workflows and drive decision-making, and your reverse ETL tool needs to be carefully designed with these end-use cases and users in mind. This affects many requirements such as data quality, alerting, and usability for business users.
  • Writing data is a different animal than reading it. Reverse ETL writes business-critical data into hundreds of destination tools at scale, each with its own nuances. Read APIs on most SaaS tools tend to be relatively straightforward, whereas write APIs are much more complex (e.g., each tool handles deletes differently; see the sketch after this list). TL;DR: It’s a harder problem to solve.
  • Reverse ETL requires a new data governance strategy. Reverse ETL tools interact with any application they’re connected to and must be well-governed to ensure there’s a tight feedback loop throughout the organization. Good data stewardship is a uniquely important virtue for reverse ETL tools, making it an overall riskier endeavor.
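To make the second bullet concrete, here’s a hypothetical sketch of how a single logical operation, deleting a contact, can diverge across destinations. Every endpoint and behavior below is invented for illustration; real connectors multiply this across creates, updates, and merges too:

```python
# Hypothetical illustration: the "same" delete means three different things
# in three different destination tools. All endpoints here are invented.
import requests

def delete_contact(session: requests.Session, tool: str, contact_id: str) -> None:
    if tool == "crm_a":
        # Tool A supports a true hard delete.
        resp = session.delete(
            f"https://api.crm-a.example/contacts/{contact_id}", timeout=30
        )
        resp.raise_for_status()
    elif tool == "marketing_b":
        # Tool B has no delete endpoint; the convention is to archive.
        resp = session.patch(
            f"https://api.marketing-b.example/people/{contact_id}",
            json={"archived": True},
            timeout=30,
        )
        resp.raise_for_status()
    elif tool == "support_c":
        # Tool C only offers an async redaction job you must poll to confirm.
        resp = session.post(
            "https://api.support-c.example/redactions",
            json={"contact_id": contact_id},
            timeout=30,
        )
        resp.raise_for_status()
        # ...then poll resp.json()["job_id"] until the job reports done.
    else:
        raise ValueError(f"No delete strategy implemented for {tool!r}")
```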

We know that’s a lot to think about when deciding if a managed reverse ETL solution is for you. So here’s a recap:

  • The cost of building your own tool is often more than it makes sense for a company to take on. Our best-case, back-of-the-napkin estimates put it at almost $24K and six weeks to build just three connectors, and that doesn’t come close to factoring in maintenance costs.
  • The cost of maintaining a bespoke reverse ETL tool often balloons well beyond initial estimates and never really stops. Outsourcing your reverse ETL tool to a vendor like Census lets you do higher leverage work with that time and gain some peace of mind.
  • The opportunity cost of having your talented data engineers focus on building and maintaining a DIY tool (and the wasted potential of your analysts) is more than enough to offset the annual cost of a dedicated reverse ETL tool.
  • Reverse ETL is not just another pipeline. There’s significant complexity in building a tool that satisfies business use cases and reliably meets technical requirements like dealing with write APIs.

Hopefully, by now you’re in the camp of folks who know they want to invest in a reverse ETL tool from a dedicated, expert vendor like Census. If you’re ready to start the journey toward operational analytics (AKA doing more with your data (and time)), there are some key considerations to keep in mind as you search for the best reverse ETL tool.

Ready to learn more? Check out part 3.1 of our series on evaluating reverse ETL tools: 4 key considerations for finding the best reverse ETL tool. Or, if you want to skip ahead, grab The definitive guide to evaluating reverse ETL tools.
