The Data Source #18 | Articles I'm reading this week 📙
Welcome to The Data Source, your monthly (ish) newsletter covering the top trends and innovation in software infrastructure and developer-first tooling.
In this edition of The Data Source, GitHub re-founds itself on Copilot, a move that acknowledges AI's transformative impact on developer workflows. AWS introduces S3 Express One Zone, a storage class that trades higher cost for lower latency and may appeal to startups. And catch a throwback post from Brendan Gregg on the challenges of adapting eBPF observability tools for security monitoring.
📚 How AI Changes Workflows by Matt Rickard
GitHub recently said it was “re-founding” itself on Copilot instead of git. GitHub has always been about the workflow — there are plenty of other hosted git providers, but GitHub was the first to put together pull requests, issues, and collaboration into a single workflow. Re-founding on Copilot is a way to acknowledge that AI will drastically change the developer workflow.
GitHub is undergoing a significant transformation by shifting its focus from Git to Copilot, marking a pivotal moment that underscores the far-reaching impact of AI on developer workflows.
Today, developers execute familiar tasks faster thanks to features such as code autocompletion, AI-assisted code reviews, and AI-generated commit messages. These advancements not only streamline the software development process but also reimagine traditional developer workflows. AI can help developers identify low-risk changes that can merge without manual review. It can also automate conflict resolution and style checks, sparing developers these routine tasks so they can focus on more strategic work.
But that's not all. In this post, Matt envisions a future where AI makes enterprise platforms adaptable to diverse workflows. As a copilot, AI could let non-technical users generate the code they need, whether in a platform-specific language or a general-purpose one, and tailor platforms to their own requirements. GitHub re-founding itself on Copilot introduces a fresh paradigm in software development, and it’ll be interesting to see the innovation that comes out of this shift.
📚 S3 Express One Zone, Not Quite What I Hoped For by Jack Vanlightly
S3 Standard can be cheap and most definitely is highly durable. Its Achilles heel is the high, unpredictable latency. Cheap, durable storage makes it the best place to store large volumes of data and many systems today already do that. However, the high latency is a problem and depending on the workload, data system builders must go through many hoops to integrate S3 into the architecture to benefit from the economical storage but dodging the latency bullet.
Fresh from re:Invent this week, AWS unveiled S3 Express One Zone, a new storage class that promises lower latency but comes with a hefty price tag.
The right cloud storage solution has to balance cost-effectiveness, durability, and low latency. While S3 Standard offers affordable, durable storage and is ideal for large volumes of data, its high and unpredictable latency makes it hard to serve low-latency workloads. Data infrastructure teams typically resort to implementing replicated, fault-tolerant caches atop S3 to meet their systems' low-latency demands. Does the advent of S3 Express One Zone signal the end of these replication layers? Not quite.
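To make the caching idea concrete, here is a minimal sketch of a read-through cache sitting in front of a slow object store. It is a single-process illustration only: the `ReadThroughCache` class and the `fetch` callable are hypothetical names, the in-memory dict stands in for S3, and the replication and fault tolerance that real systems need are deliberately left out.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU read-through cache in front of a slower backing store."""

    def __init__(self, fetch, capacity=2):
        # fetch is a hypothetical callable (key -> value) standing in for a slow object GET.
        self.fetch = fetch
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark the key as most recently used and skip the slow path.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: pay the backing-store latency once, then remember the value.
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry to stay within capacity.
            self.entries.popitem(last=False)
        return value


# An in-memory dict stands in for the object store (assumption for illustration).
backing = {"a": b"1", "b": b"2", "c": b"3"}
cache = ReadThroughCache(backing.__getitem__, capacity=2)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
```

Only the second read of "a" is served from memory here. Everything that makes the production version hard (replicating the cache, surviving node failures, keeping it consistent with the store) is exactly the layer teams would like a low-latency storage class to replace.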
While the tech itself is revolutionary and is a leap toward the perfect cloud storage solution, the cost of deploying it will make it challenging for teams to adopt it. But as Jack points out, Express One Zone might find a niche among startups prioritizing time over expenses. They may be more inclined to embrace Express One Zone rather than layering S3 with replication mechanisms and building it all themselves. That said, I do think we are not far from a storage primitive that is durable, supports low latency and is cost-effective. It’s just a matter of time.
📚 eBPF Observability Tools Are Not Security Tools by Brendan Gregg
Observability tools are designed to have the lowest overhead possible so that they are safe to run in production while analyzing an active performance issue. Keeping overhead low can require tradeoffs in other areas: tcpdump(8), for example, will drop packets if the system is overloaded, resulting in incomplete visibility. This creates an obvious security risk for tcpdump(8)-based security monitoring: An attacker could overwhelm the system with mostly innocent packets, hoping that a few malicious packets get dropped and are left undetected. Long ago I encountered systems which met strict security auditing requirements with the following behavior: If the kernel could not log an event, it would immediately halt! While this was vulnerable to DoS attacks, it met the system's security auditing non-repudiation requirements, and logs were 100% complete.
This is an excellent post from Brendan Gregg that explores the intricacies of using eBPF observability tools for security monitoring. Dropping observability tools into a security monitoring framework without adapting them is a mistake. eBPF observability solutions excel at minimizing overhead so they are safe to run in production during active performance analysis. But that low overhead comes from tradeoffs such as dropping events under load, and an attacker can exploit the resulting gaps in visibility to slip malicious activity past the monitor.
The distinction between security monitoring and operational monitoring has to be made clear. Teams must recognize the need for dedicated tools tailored to the unique demands of security, rather than attempting to repurpose existing operational tools. Brendan's post also highlights an exciting opportunity shaping up around eBPF-powered security products.
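The tradeoff in Brendan's excerpt can be sketched with a toy bounded event buffer that supports two overflow policies: drop silently, as a low-overhead observability tool might, or fail closed, like the auditing systems he describes that halt rather than lose an event. All names here (`EventBuffer`, `AuditOverflow`, `flood`) are hypothetical, and the model is not how tcpdump or any eBPF tool is actually implemented.

```python
from collections import deque


class AuditOverflow(Exception):
    """Raised when a fail-closed buffer cannot record an event."""


class EventBuffer:
    """Bounded event buffer with a configurable overflow policy (illustrative only)."""

    def __init__(self, capacity, fail_closed):
        self.capacity = capacity
        self.fail_closed = fail_closed
        self.events = deque()
        self.dropped = 0

    def record(self, event):
        if len(self.events) >= self.capacity:
            if self.fail_closed:
                # Auditing policy: refuse to continue rather than lose an event.
                raise AuditOverflow(f"cannot record {event!r}")
            # Observability policy: keep overhead low by dropping the event.
            self.dropped += 1
            return False
        self.events.append(event)
        return True


def flood(buffer):
    # An attacker pads the stream with benign events so the malicious one arrives last.
    for event in ["benign"] * 4 + ["malicious"]:
        buffer.record(event)


lossy = EventBuffer(capacity=4, fail_closed=False)
flood(lossy)

strict = EventBuffer(capacity=4, fail_closed=True)
try:
    flood(strict)
    halted = False
except AuditOverflow:
    halted = True
```

In the lossy buffer the malicious event is the one that gets dropped, so the monitor never sees it. The strict buffer halts instead, which keeps the log complete at the cost of being vulnerable to denial of service.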
Are these topics / themes relevant to you? Let’s Chat! ☎️
Practitioners and startup builders, if you’re focusing on developer tools, distributed systems and cloud infrastructure, I’d 💜 to chat! My Twitter DM and 📩 (priyanka@work-bench.com) are always open!
Priyanka 🦋
…
I’m an Associate at Work-Bench, an early-stage, enterprise software-focused VC firm based in NYC. Our sweet spot for investment is the Seed stage, when startups are building out their early go-to-market motions. In the data world, we’ve invested in companies like Cockroach Labs, Arthur, Alkymi, Streamdal and others.