The Data Source #1 | Welcome + What’s New in Experimentation 🌟
Welcome to The Data Source, a monthly newsletter covering the inputs and outputs of the data science, data engineering, and developer tools ecosystem.
I’m Priyanka, an investor at Work-Bench covering data management and developer tools. For context, Work-Bench is an early-stage, enterprise software-focused VC firm based in NYC. Our sweet spot for investment is the Seed II stage, which typically coincides with building out a startup’s early go-to-market motion. In the data world, we’ve been fortunate to invest in companies like Cockroach Labs, ArthurAI, Algorithmia, Datalogue, Alkymi, x.ai, and others.
Through this newsletter, I’ll be sharing resources that inform my research on the data topics I’m tracking and offering an investor’s perspective on what they mean for the enterprise. I want this to be a collaborative effort with the community, so if you’re an operator, leader, or contributor focused on data, I’d love to hear from you. 📩
This month we dive into the world of experimentation! 🧑🏽‍🔬
⏳ Product Experimentation in 1 minute
Companies across the board have embraced a test-and-learn philosophy when it comes to improving their product design. The basic premise of product experimentation is that for every feature or product you release, you need to quantitatively measure its impact on users so you can keep improving the user experience.
A/B testing puts that premise into practice: users are randomly split into two groups, A and B, one group sees the new feature while the other serves as the baseline, and the difference in outcomes measures the effectiveness of each product release.
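To make the mechanics concrete, here’s a minimal, self-contained Python sketch (simulated data; the 10% vs. 11% conversion rates are made-up numbers) of randomizing users into groups A and B and comparing conversion rates with a two-proportion z-test:

```python
# Minimal A/B test sketch (illustrative only): randomly split users into a
# control group (A) and a treatment group (B), then compare conversion rates
# with a two-proportion z-test. Outcomes are simulated.
import math
import random

random.seed(42)
users = [f"user_{i}" for i in range(10_000)]

# Randomize users into the two groups.
groups = {u: random.choice(["A", "B"]) for u in users}

# Simulated outcomes: assume the new feature lifts conversion from 10% to 11%.
def converted(user: str) -> bool:
    rate = 0.10 if groups[user] == "A" else 0.11
    return random.random() < rate

outcomes = {u: converted(u) for u in users}

# Aggregate conversions per group.
n = {g: sum(1 for u in users if groups[u] == g) for g in ("A", "B")}
conv = {g: sum(outcomes[u] for u in users if groups[u] == g) for g in ("A", "B")}
p = {g: conv[g] / n[g] for g in ("A", "B")}

# Two-proportion z-test for the difference in conversion rates.
p_pool = (conv["A"] + conv["B"]) / (n["A"] + n["B"])
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n["A"] + 1 / n["B"]))
z = (p["B"] - p["A"]) / se
print(f"conversion A={p['A']:.3f}  B={p['B']:.3f}  z={z:.2f}")
```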
In cases where it’s hard to achieve the level of randomization a typical A/B test requires, companies have turned to quasi-experiments, where the experimental and control groups are defined by a criterion pre-selected by the researcher rather than by random assignment. Even though quasi-experiments are not as precise as A/B tests, they usually provide reasonable estimates of cause and effect. For more, see how it’s being done at Netflix and Shopify.
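One common quasi-experimental design is difference-in-differences. Here’s a toy sketch, with entirely made-up numbers, where the treated and control groups are chosen by a pre-selected criterion (hypothetical markets) instead of randomization:

```python
# Toy difference-in-differences sketch: the "treated" and "control" groups are
# chosen by a pre-selected criterion (here, hypothetical markets) rather than
# by randomization, and the effect is estimated from before/after changes in
# each group. All numbers are made up for illustration.

# Average weekly orders per user, before and after the rollout.
metrics = {
    "treated": {"before": 2.40, "after": 2.75},   # markets that got the feature
    "control": {"before": 2.35, "after": 2.45},   # comparable markets that did not
}

change_treated = metrics["treated"]["after"] - metrics["treated"]["before"]
change_control = metrics["control"]["after"] - metrics["control"]["before"]

# The control group's change proxies for what would have happened anyway;
# subtracting it out isolates the estimated effect of the rollout.
did_estimate = change_treated - change_control
print(f"difference-in-differences estimate: {did_estimate:+.2f} orders/user/week")
```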
But designing experiments in an iterative way is just one part of the picture; having the right infrastructure to compute key metrics, validate hypotheses, and instrument experiments at scale matters just as much. Today, tools like Optimizely and Split, along with open-source libraries like PlanOut (Facebook) and Wasabi (Intuit), are being widely adopted across the enterprise as the underlying testing framework for data scientists.
The list of open-source tools keeps growing; check out Wix’s Petri and Indeed’s Proctor.
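At the core of most of these frameworks is deterministic, hash-based assignment: the same user always lands in the same variant without any stored assignment state. The sketch below is a generic illustration of that idea, not the actual API of PlanOut or any of the tools above; the experiment name and variants are hypothetical:

```python
# Generic sketch of deterministic, hash-based variant assignment -- the core
# idea behind experimentation frameworks -- so the same user always gets the
# same variant with no assignment storage. Names here are hypothetical.
import hashlib

def assign_variant(experiment: str, user_id: str, variants: list[str]) -> str:
    """Hash (experiment, user) into a stable bucket, then map it to a variant."""
    digest = hashlib.sha1(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000          # 1000 fine-grained buckets
    return variants[bucket % len(variants)]  # equal-sized variant splits

print(assign_variant("new_checkout_flow", "user_123", ["control", "treatment"]))
```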
Get One Level Deeper 👇🏼
🔹 LinkedIn’s T-REX, or “Targeting, Ramping, and Experimentation,” broadens the scope of A/B testing to address fairness and parity in AI. It does this by applying the Atkinson index, an income-inequality measure, to assess whether a particular feature or product has a disproportionately negative impact on some of its users.
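For intuition, here’s a small sketch of the Atkinson index itself (not LinkedIn’s implementation; the per-user lift values are made up): a value near 0 means a metric is distributed evenly across users, while values closer to 1 mean the benefit or harm is concentrated in a subset of users.

```python
# Minimal Atkinson index sketch: an inequality measure over positive metric
# values. 0 = perfectly even distribution; closer to 1 = highly concentrated.
# Sample values below are hypothetical, not real experiment data.
import math

def atkinson(values: list[float], epsilon: float = 0.5) -> float:
    """Atkinson inequality index for positive metric values."""
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        geo_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geo_mean / mean
    ede = (sum((v / mean) ** (1.0 - epsilon) for v in values) / n) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede

even_lift = [1.0, 1.1, 0.9, 1.0, 1.05]     # feature helps everyone about equally
skewed_lift = [0.1, 0.1, 0.1, 0.2, 4.5]    # gains concentrated in one user
print(atkinson(even_lift), atkinson(skewed_lift))  # the skewed case scores higher
```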
I think applying A/B testing to responsible design is very impactful, especially as big tech grapples with growing concerns around algorithmic bias, and I’m excited to see how other companies approach it and bake ethics and fairness into their experimentation practices.
🔹 Spotify’s experimentation platform, ABBA (yup, named after the Swedish pop group!), introduces an alternative approach to feature flagging. Instead of “flags,” the platform configures “properties” on client services, which programmatically control the type of experience a user is exposed to during the experiment. This matters because it not only reduces the configuration errors that tend to creep in during experimental testing, it also lets you run a single experiment across different systems, and to me that’s the key to iterating on new ideas.
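To illustrate the “properties, not flags” idea in the abstract (this is not ABBA’s actual API; every name below is hypothetical): rather than a boolean flag per experiment, the client reads typed property values resolved per user, so one experiment can reconfigure several behaviors at once.

```python
# Toy sketch of property-based experiment configuration: the client consumes
# typed properties resolved per user instead of boolean flags. NOT Spotify's
# ABBA API -- all names and values here are hypothetical.
from dataclasses import dataclass

@dataclass
class PlaybackProperties:
    preload_seconds: int = 5
    recommendation_model: str = "baseline"

def resolve_properties(user_id: str, in_treatment: bool) -> PlaybackProperties:
    """Resolve the property values a client should use for this user."""
    if in_treatment:
        # A single experiment can adjust several systems' behavior via properties.
        return PlaybackProperties(preload_seconds=15, recommendation_model="candidate_v2")
    return PlaybackProperties()

print(resolve_properties("user_123", in_treatment=True))
```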
🔹 Stitch Fix’s experimentation platform stands out to me for a few reasons:
- It’s a unified system where different teams collaboratively design, test, and run experiments and visualize the impact of each rollout, which means getting the most value out of one platform: no more unnecessary context switching, and a step change in productivity!
- It leverages metadata across experiments, which gives you a historical view of your data, so if something breaks it’s easy to go back to the failed experiments and do a root-cause analysis.
- There is a real governance structure around the way metrics are stored and served, which is such good hygiene when it comes to reproducing and standardizing metrics, especially in statistical testing.
Centralized metric definitions and unified decision-making have been a hot topic in data science, and I think that’s the way to go with product experimentation, where there needs to be a certain level of discipline in test methods. Check out DoorDash’s experimental analysis platform, Curie, which reflects many of the same design considerations as Stitch Fix’s.
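As a generic illustration of what centralized metric definitions can look like (not Stitch Fix’s or DoorDash’s actual implementation; the registry, metric names, and fields are hypothetical): each metric is declared once, with its aggregation and direction, and every experiment readout pulls from the same registry instead of re-deriving metrics ad hoc.

```python
# Sketch of a centralized metric registry: metrics are defined once and reused
# by every experiment analysis, so results stay comparable and reproducible.
# Generic illustration only; all names and definitions are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    aggregate: Callable[[list[float]], float]   # how raw per-user values roll up
    higher_is_better: bool

METRIC_REGISTRY = {
    "conversion_rate": MetricDefinition(
        name="conversion_rate",
        description="Share of exposed users who converted",
        aggregate=lambda xs: sum(xs) / len(xs),
        higher_is_better=True,
    ),
    "p95_latency_ms": MetricDefinition(
        name="p95_latency_ms",
        description="95th percentile request latency",
        aggregate=lambda xs: sorted(xs)[int(0.95 * (len(xs) - 1))],
        higher_is_better=False,
    ),
}

# Every experiment readout pulls the same definition, so results are comparable.
conv = METRIC_REGISTRY["conversion_rate"]
print(conv.aggregate([1, 0, 0, 1, 1]))  # 0.6
```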
💡 Ongoing Work in Experimentation and Why It Matters
As the appetite for data-driven decisions keeps growing, organizations are putting experimentation front and center of their product development cycle. It’s been interesting to see how experimentation continues to evolve to help teams iterate on new ideas faster and more confidently, as well as proactively tackle bias to mitigate the unintended consequences of poor product design.
And that’s a wrap, folks! To all the founders and VCs out there, I’d love to swap notes if this is a space you’re tackling. As always, my Twitter DMs are open and you can 📩 me at priyanka@work-bench.com!