How Airbnb democratized their data to empower their employees

Boris Jabes October 13, 2020

Boris is the CEO of Census. Previously, he was the CEO of Meldium, acquired by LogMeIn. He is an advisor and alumnus of Y Combinator. He enjoys nerding out about data and technology, 8-bit graphics, and helping other startup founders.

Airbnb is one of the most data-driven companies in the world, largely due to how well it scales data-driven decision-making. It got there by building sophisticated data infrastructure that gave everyone in the company access to information.

A lot has already been written about how Airbnb approached growth (check out this great interview). One of the lesser-known pillars of its data-driven decision-making is Data University, a program the company had to iterate on internally before it paid off.

Although Airbnb has open-sourced many of its tools and techniques, most companies are still playing catch-up. Here's how you can learn from its history and take similar steps toward transparency and education in your organization.

Data Democratization’s History at Airbnb

The Data University program, founded in 2016, was modeled after Google’s internal team training programs. Airbnb’s original motivation for educating its teams was to give a “voice” to its customers through data. Through product analytics and accessible datasets of customer actions, Airbnb enabled data-driven decision-making across the company.

Essential to this effort was making the data accessible and easy to understand for everyone, no matter their role at the company. In other words, all employees needed to be able to analyze the data independently. So Airbnb prioritized developing 100-level, foundational classes over the more advanced classes (which were created later for developers). All told, Airbnb launched Data University in the first year with 30 classes.

Airbnb had tried to democratize data and roll out the tools to analyze it to all teams three times before. Why was it successful the fourth time?

The successful launch of Airbnb’s Data University can be traced back to three factors:

  • The curriculum was approachable for each employee, regardless of job function. In prior attempts, there wasn’t an emphasis on equipping beginners with tools to succeed.
  • Airbnb’s leadership set clear expectations for teams in data literacy. From the get-go, managers were expected to discuss progression through Data University with their direct reports and to make clear that data-driven insights were the expectation upon completion of the program. The value of this education was reinforced from the top down.
  • They found ways to measure the success of the program. The main metric was weekly active users (WAU) of the data platform, alongside the number of Airbnb employees who took classes and the NPS scores from attendees. Notably, these measurements continue to chart success over time.
“We used a metric of weekly active users (WAUs) of our data platform as a proxy to how ‘data informed’ we were as an organization. At the beginning of Q3 2016, only about 30% of Airbnb employees were a WAU of our data platform, which was significantly lower than other hypergrowth internet company peers we benchmarked with like Facebook and Dropbox.” — Jeff Feng, PM Lead for Data at Airbnb

After the first six months of the program in 2016, 45% of Airbnb employees were weekly active users (WAU) of the data platform, up from about 30%, a relative gain of roughly 50%. Around the same time, in the second half of 2016, Airbnb became profitable for the first time. That profitability meant more hires, more hosts and stays, and thus more data to analyze.
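As a rough illustration of how a WAU share like this could be computed, here is a minimal Python sketch. It assumes a hypothetical activity log of (employee, date) pairs and a known headcount; the function name, the log format, and the sample data are illustrative assumptions, not Airbnb's actual tooling.

```python
from datetime import date, timedelta


def weekly_active_share(events, headcount, week_start):
    """Fraction of employees with at least one data-platform event
    in the 7 days starting at week_start (hypothetical log format)."""
    week_end = week_start + timedelta(days=7)
    # A set counts each employee once, however many times they were active.
    active = {emp for emp, day in events if week_start <= day < week_end}
    return len(active) / headcount


# Hypothetical sample log: (employee_id, date of a query or dashboard view)
events = [
    ("ana", date(2016, 7, 4)),
    ("ana", date(2016, 7, 6)),   # repeat activity counts once
    ("ben", date(2016, 7, 8)),
    ("cy", date(2016, 7, 12)),   # falls in the following week
]

share = weekly_active_share(events, headcount=10, week_start=date(2016, 7, 4))
print(f"{share:.0%}")  # prints 20%
```

Tracking this share week over week gives the kind of trend line the program used as its proxy for how data-informed the organization was.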

Data Scientists Scaled by Training Thousands

The key to democratizing your organization’s data lies with your own employees, especially those who work with the data regularly. If you want maximum impact, your data science team should dedicate some bandwidth to educating the rest of the company. That’s right: you don’t need to hire “educators.” Data scientists can, and should, teach your product managers and operations teams directly if you want to make more data-driven decisions. Fortunately, you don’t need many data champions for their guidance to significantly improve the insights your teams uncover.

Airbnb’s data analysts and scientists made up just 1.6% of the total workforce in 2016-17, but because of Data University, employees across the company were able to inform their decisions with data. Proper training of any type ensures that the workforce has the tools it needs to succeed, so Airbnb set out to arm its workforce with Data University to increase transparency with data and scale the impact of its small but mighty data team. The company was determined to prevent data from holding back its growth.

This meant amplifying the impact of that small, mighty data team beyond headcount. Said Feng at the outset, “In order to inform every decision with data, it wouldn’t be possible to have a data scientist in every room—we needed to scale our skillset.”

This system of using data-scientist employees as educators to scale impact across the company was a page out of Google’s internal training book. Per Google: “Your own employees are perhaps the most qualified instructors available to you.”

What did this look like a year later, mid-2017, after the launch of Data University? By then, Data University was powered by 30 volunteer data science “faculty members.”

As of May 2017, these 30 volunteers had scaled their impact to 500 unique people who had participated in at least one class, nearly one-eighth of all Airbnb employees. Given those successes, Airbnb began to measure a new metric: depth of engagement. As of May 2017, employees who had participated took more than four classes on average, for a total of more than 2,100 “butts in seats,” as Feng called them.

He added, “Every class offered thus far has a net promoter score of +55 or higher,” another key metric to mark Data University's success.
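The program metrics above (unique participants, depth of engagement, and NPS) are simple to compute. The sketch below is a minimal Python illustration, assuming hypothetical enrollment records and 0-10 survey ratings; the function names and sample data are assumptions for illustration only. The NPS function uses the standard definition: the percentage of promoters (ratings of 9-10) minus the percentage of detractors (ratings of 0-6).

```python
def engagement_summary(enrollments):
    """enrollments: list of (employee_id, class_id) pairs, one per seat taken."""
    participants = {emp for emp, _ in enrollments}
    seats = len(enrollments)
    return {
        "unique_participants": len(participants),
        "seats": seats,
        # Depth of engagement: average classes taken per participant.
        "classes_per_participant": seats / len(participants) if participants else 0.0,
    }


def net_promoter_score(ratings):
    """Standard NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical sample data
enrollments = [("ana", "sql-101"), ("ana", "dash-101"), ("ben", "sql-101")]
summary = engagement_summary(enrollments)
print(summary["classes_per_participant"])  # prints 1.5

nps = net_promoter_score([10, 9, 9, 8, 7, 10, 6, 9, 10, 9])
print(nps)  # prints 60.0
```

Applied to the figures quoted above, 2,100 seats across 500 participants works out to 4.2 classes per person, which matches "more than four classes on average."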

Powered by the internal data minds at Airbnb and a shared goal of bringing learning back to the workplace, the company laid the foundational work to establish data-driven decision-making as part of “business as usual” at Airbnb.

The 3 Competencies of Data-Driven Organizational Decision-Making

A commitment to data democratization, according to Feng, “not only help[s] ensure that decisions are grounded in data, but it enables people to make decisions autonomously. This is important because the person asking the question always has the best context on the question they are trying to answer, and it reduces the feedback loop to answering questions. This also has the side benefit of freeing up some of the data science team’s time.” Making that conscious commitment as an organization is the first step.

Both a method and a company culture, data-driven decision-making becomes possible only once all employees are trained as “citizen data scientists” and well-versed in three areas. The three competencies to ensure people make good, data-informed decisions as imagined by Airbnb are SQL/data proficiency (data education), access & documentation, and tools.

Data Democratization Program Grows with Airbnb

In the years since its 2016 inception, Airbnb has iterated on and created additional programs to augment Data University (Data U).

In December 2018, Airbnb introduced Data U Intensive, a condensed two- to three-day training, to address two needs the original university did not meet:

  • Tailoring training content to the specific needs of a department or team
  • Cross-training more data champions who are members of the teams they serve

As of January 2019, over 400 Data U Intensive courses had been taught to thousands of Airbnb employees by 55 faculty members. The ROI speaks for itself: the number of daily SQL users leapt after the first Data U Intensive trainings.

After the rollout of Data U Intensive courses, which taught users why and how to use SQL, the share of daily SQL users spiked from about 7% to a peak of about 62% less than one month later. Usage then settled into a sustained 30% year-over-year increase in SQL usage from 2017 to 2018.

Airbnb’s most recent advancement is the Engineering Empowered Data Science program to deepen their data scientists’ knowledge of the engineering landscape and streamline collaboration with engineers. New use-cases for this educational framework arise constantly across teams, and Airbnb is prepared to evolve to meet those needs.

Once data insights are surfaced and readily available to your team, hundreds of ways emerge to analyze them, combine them for new learnings, and open up new avenues to explore.

Airbnb Sets the Standard for Forward-Looking Organizations

Data is power, and the more organizations can adopt this system of radical transparency, the more quickly they can respond to market fluctuations and changes in the economy. The stakes couldn’t be higher for those organizations that don’t take employee training seriously.

For Airbnb, trial and error was, and continues to be, part of the process. The company continually learns from its successes and failures, adapting quickly to its own growth and communicating what it learns about its customers through data.

It doesn’t take establishing your own “university” to follow Airbnb’s example. Take your own steps toward data democratization right now.

  • Set up an internal presentation series. Your data team should budget at least 1-2 hours per month to share their knowledge with the broader organization. You can do this by soliciting a data-driven presentation from an internal expert for a lunch-and-learn or by sponsoring knowledge-sharing sessions like “professional book club” discussions.
  • Introduce 1:1 tutoring. Your own team members can be the most powerful instructors. Randomly match data scientists with members of the team to cover bite-sized topics together in monthly 30-minute 1:1s. Survey the pairs after each round to compile topics for future courses.
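The random matching for 1:1 tutoring can be sketched in a few lines of Python. This is a minimal illustration, assuming you have two plain lists of names; the function name, the round-robin assignment, and the sample names are assumptions made for this example, not a prescribed method.

```python
import random


def match_tutoring_pairs(data_scientists, teammates, seed=None):
    """Randomly assign each teammate a data scientist, spreading load evenly."""
    if not data_scientists:
        raise ValueError("need at least one data scientist")
    rng = random.Random(seed)  # pass a seed to make a round reproducible
    tutors = list(data_scientists)
    learners = list(teammates)
    rng.shuffle(tutors)
    rng.shuffle(learners)
    # Cycle through the shuffled tutors so nobody gets more than
    # one learner beyond anyone else.
    return [(tutors[i % len(tutors)], learner) for i, learner in enumerate(learners)]


# Hypothetical names for one monthly round
pairs = match_tutoring_pairs(["Dee", "Eli"], ["Pat", "Quinn", "Ray", "Sam"], seed=7)
for tutor, learner in pairs:
    print(f"{tutor} tutors {learner}")
```

Rerunning with a different seed each month reshuffles the pairs, so people meet new tutors over time while the data team's load stays balanced.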

Want to become more like Airbnb? Boiled down, you need two things: the right tools and a company that knows SQL. It will pay off. Census helps operationalize your data democratization efforts to make customer data insights accessible for your employees. Forming your own coalition of citizen data scientists is only a demo away.
