Announcing Streamdal: Data Monitoring for Your Distributed Streaming Systems 🔦
Welcome to The Data Source, your monthly (ish) newsletter covering the top innovation in data infrastructure, analytics and developer-first tooling.
Welcome back to The Data Source! Today, I’m kicking off our regular programming with the announcement of our latest investment at Work-Bench 🌟
As an Associate at Work-Bench, an early-stage, enterprise-focused VC firm based in NYC, I focus on our investments in data and cloud-native infrastructure.
For the past year, the evolution of modern data architectures and the need for data monitoring in distributed systems have been an important investment focus for us at Work-Bench. Today, I’m excited to announce Streamdal, our latest investment in the data infrastructure category and one that is in line with our broader research.
We led Streamdal’s $5.4M Seed round with participation from Crosscut Ventures and Verissimo Ventures.
Streamdal, formerly known as Batch.sh, is an end-to-end streaming data performance solution that monitors, traces, and governs data in any event-driven architecture at any scale.
As newer data architectures are created, we believe that developers will leverage sophisticated data encodings and protocols to ensure quality guarantees across their data streams. Streamdal works with popular messaging systems (Kafka, RabbitMQ, GCP Pub/Sub, etc.) and supports data encoded in JSON, Protobuf, and Avro, with plans to support others such as FlatBuffers and Thrift. Given that companies tend to implement their distributed systems differently, we are excited for Streamdal to build a solution that is accessible to all.
The Problem
Application performance monitoring (APM) and software observability have been a core focus for practitioners over the years given how critical they are for the proper upkeep and functioning of the infrastructure and application layers. Given the proliferation of distributed systems, APM and software observability as a market has evolved into several sub-categories of solutions designed for today's modern system architectures.
Metrics-based APM: Tools in this category include Datadog, New Relic, AppDynamics and others. They focus on capturing key real-time metrics such as application availability, memory, error rates, etc. to help engineers understand all that is happening inside their applications.
Code-level APM: Tools in this category include Sentry, Dynatrace, Splunk, Lightstep, Pyroscope and others. They focus on providing code-level visibility into the application. These solutions specialize in distributed tracing, profiling and more to capture the nitty-gritty details of what’s happening at the code level and help thwart incidents.
Network-level APM: Tools in this category include LogicMonitor, New Relic and others. They measure application performance by analyzing network traffic and capturing data around how applications are being delivered through a network.
However, current APM and observability offerings, while great at capturing key operational metrics, only offer a partial view of the true state of an application.
As modern data architectures continue to evolve and grow in complexity, teams are being tasked with decoding their distributed systems to understand the data that is being fed into each application. Applications often break if bad instructions (i.e. bad data) are passed to them, which is why it is critical to make sure that the right data reaches its intended destination.
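To make the "bad data" problem concrete, here is a minimal sketch of the kind of check teams end up writing by hand for in-flight events. It is not Streamdal's implementation; the field names, the `REQUIRED_FIELDS` schema and the `validate_event` helper are all assumptions made up for illustration, and it only handles JSON.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"order_id": str, "amount_cents": int}


def validate_event(raw: bytes) -> list[str]:
    """Return the problems found in one JSON-encoded event (empty list if none)."""
    try:
        event = json.loads(raw)
    except ValueError as exc:  # covers invalid JSON and invalid UTF-8
        return [f"not valid JSON: {exc}"]
    if not isinstance(event, dict):
        return ["event is not a JSON object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif type(event[field]) is not expected:
            actual = type(event[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    return problems


good = b'{"order_id": "A1", "amount_cents": 500}'
bad = b'{"order_id": "A1", "amount_cents": "500"}'  # amount sent as a string

print(validate_event(good))
print(validate_event(bad))
```

A consumer that expects an integer amount would likely fail on the second event, and a metrics dashboard alone would not say why. Catching it requires looking at the payload itself.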
Software observability for data infrastructure systems is not a new concept - solutions such as Monte Carlo, Bigeye, Anomalo and others specialize in observing data at rest in the database and data workflows feeding storage systems as well as downstream applications to ensure that nothing is broken. But when it comes to observing streaming data infrastructures and providing granular views of in-flight data, we find current offerings to be lacking.
Data in motion is encoded, i.e. it is transformed into specific formats, to ensure that it is properly and safely consumed upon transmission to distributed, event-driven systems. But distributed systems are, by default, "black boxes," and therefore require a highly sophisticated tool to decode and understand the data that these applications are using. Today, the companies that have successfully implemented a streaming data performance solution are the FAANG organizations that have built their own in-house solutions and those actively leveraging proprietary solutions like Confluent. For the rest of the world, it means having engineering teams write and maintain brittle code to accomplish this task.
As encodings and compression formats such as Protobuf, Avro, FlatBuffers, Thrift and Gzip continue to penetrate the market to support the different types of data architectures being built today, teams will require a dedicated tool to continuously observe their streaming data pipelines. This is where we see Streamdal filling a very important gap in software observability.
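As a rough illustration of why encoded data in motion is hard to inspect, the sketch below compares a self-describing JSON message with a compact binary one. Python's standard `struct` module stands in for a real schema-based encoding such as Protobuf or Avro here; the `LAYOUT` string, field names and helper functions are assumptions chosen for the example, not part of any product.

```python
import json
import struct

# Hypothetical record layout, standing in for a real schema file:
# little-endian unsigned 32-bit user id followed by a 64-bit float.
LAYOUT = "<Id"


def encode(user_id: int, temperature: float) -> bytes:
    """Pack one record into 12 bytes with no field names attached."""
    return struct.pack(LAYOUT, user_id, temperature)


def decode(payload: bytes) -> dict:
    """Unpack a record; this only works if the reader knows LAYOUT."""
    user_id, temperature = struct.unpack(LAYOUT, payload)
    return {"user_id": user_id, "temperature": temperature}


record = {"user_id": 42, "temperature": 21.5}

as_json = json.dumps(record).encode()  # field names travel with the data
as_binary = encode(record["user_id"], record["temperature"])

print(as_json)            # readable by anyone tapping the stream
print(as_binary.hex())    # opaque bytes without the layout
print(decode(as_binary))  # readable again once the layout is known
```

The binary message is smaller and stricter, but anyone observing the stream needs the matching schema to make sense of it, which is the gap a dedicated tool has to close.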
The Solution
Streamdal is defining a new category in APM for streaming data infrastructure. It enables engineering teams to observe and understand data in any stream, in any encoding, and at any scale.
It is no surprise that the streaming data infrastructure ecosystem is growing as fast as its batch counterpart and has the potential to become an incredibly important fabric underpinning the next generation of applications. We are starting to see a growing number of products being built around streaming data infrastructure, such as Vectorized, Materialize, Decodable, StarTree and more, which is an indication that the market for the modern streaming data stack is finally crystallizing.
With Streamdal, tracking, reviewing and replaying events on demand is easy. The tool provides engineering teams with a Protobuf-first experience and enables them to see beyond metrics and perform root cause analysis during outages.
Their open source offering, Plumber, is a command-line interface (CLI) tool for inspecting, piping, messaging and redirecting data into the messaging systems and is currently being used by organizations including Recharge, ParkMobile, Octane, Elude and others.
The Team
Streamdal was founded by Ustin Zarubin and Daniel Selans, who previously created highly scalable and distributed systems at companies such as DigitalOcean, New Relic, Community.com, InVision and others. They’ve seen firsthand the challenges around implementing proper monitoring and observability guardrails to prevent bad data.
When we first met Ustin and Dan, the thing that stood out was their exciting vision to re-imagine the way that organizations monitor their distributed infrastructure, and we are excited to watch them bring the product to life.
Congrats to Ustin, Dan, and the Streamdal team on their launch today and we are pumped to partner up with them on this amazing journey.
See their coverage in TechCrunch here!