Season 10, Episode 165

The Evolution Of Platform Engineering With Massdriver CEO Cory O’Daniel

Hosts

Danny Allan

Guests

Cory O’Daniel

Episode Summary

Dive into the ever-evolving world of platform engineering with Cory O’Daniel, CEO and co-founder of Massdriver. This episode explores the journey of DevOps, the challenges of building and scaling infrastructure, and the crucial role of creating effective abstractions to empower developers. Cory shares his insights on the shift towards platform engineering as a means to build more secure and efficient software by default.

Show Notes

In this episode of The Secure Developer, host Danny Allan sits down with Cory O’Daniel, CEO and co-founder of Massdriver, to discuss the dynamic landscape of platform engineering. Cory, a seasoned software engineer and first-time CEO, shares his extensive experience in the Infrastructure as Code (IaC) space, tracing his journey from early encounters with EC2 to founding Massdriver. He offers candid advice for developers aspiring to become CEOs, emphasizing the importance of passion and early customer engagement.

The conversation delves into the evolution of DevOps over the past two decades, highlighting the constant changes in how software is run, from mainframes to serverless containers and now AI. Cory argues that the true spirit of DevOps lies in operations teams producing products that developers can easily use. He points out the challenge of scaling operations expertise, suggesting that IT and Cloud practices need to mature in software development to create better abstractions for developers, rather than expecting developers to become infrastructure experts.

A significant portion of the discussion focuses on the current state of abstractions in IaC. Cory contends that existing public abstractions, like open-source Terraform modules, are often too generic and don't account for specific business logic, security, or compliance requirements. He advocates for operations teams building their own prescriptive modules that embed organizational standards, effectively shifting security left by design rather than by burdening developers. The episode also touches upon the potential and limitations of AI in the operations space, with Cory expressing skepticism about AI's current ability to handle the contextual complexities of infrastructure without significant, organization-specific training data. Finally, Cory shares his optimism for the future of platform engineering, viewing it as a return to the original intentions of DevOps, where operations teams ship software with ingrained security and compliance, leading to more secure systems by default.

“Cory O’Daniel: That thing is also hard for engineers. It’s like, you got to get out and talk to people early and frequently. You are talking a lot, listening more, you’re doing both of those, way more than you’re developing, and I think that’s one of the keys. Like it’s very easy to like, “Yeah, I got a cool idea.” And just spend like, you know, your entire weekend hacking away on it.

But like, until you’ve vetted and validated that other people care about that and will pay you money for it, like, you don’t – you don’t even have an inkling of a business, right? And so, that was the part that was very hard for us early on is like figuring out like where to find that feedback that we’re looking for, right? Not feedback that validates your idea, right? You want some feedback that’s going to invalidate that idea a bit.

Go about it in a scientific approach, and then make sure that, like, you’re incorporating it into the thing that you want to build, right? Because you’re going to get feedback, and some of it you’re not going to love. Some of it is going to be extremely valid if you don’t love it, some of it is going to be extremely valid if you do love it, but like, that’s the key, is just getting out there, and just finding those early customers, iterating on ideas, getting feedback from people, and just devoting your entire life to it, almost.”

[INTRODUCTION]

[0:01:06.1] ANNOUNCER: You are listening to The Secure Developer, where we speak to industry leaders and experts about the past, present and future of DevSecOps and AI security. We aim to help you bring developers and security together to build secure applications while moving fast and having fun.

This podcast is brought to you by Snyk. Snyk’s developer security platform helps developers build secure applications without slowing down. Snyk makes it easy to find and fix vulnerabilities in code, open source dependencies, containers, and infrastructure as code, all while providing actionable security insights and administration capabilities. To learn more, visit snyk.io/tsd.

[INTERVIEW]

[0:01:46.3] Danny Allan: Good morning, good afternoon, and welcome, everyone, to another episode of The Secure Developer. I’m Danny Allan, very excited to be back with you today because we have a very special guest, and that is the CEO and co-founder of Massdriver, Cory O’Daniel. Cory, welcome to the show. How are you?

[0:01:59.9] Cory O’Daniel: I’m doing pretty good. Thanks for having me.

[0:02:02.0] Danny Allan: Yeah, it’s exciting to get to talk with you. We were chatting a bit before, and you have a really interesting background in a lot of different areas. It seems to be mostly the IaC, Infrastructure as Code space, but why don’t you introduce yourself to the audience?

[0:02:15.6] Cory O’Daniel: Cory O’Daniel, CEO and co-founder of Massdriver. Software engineer, so first-time CEO. I’ve been in the space for quite a while, original background was in healthcare information systems, moved to California in the early aughts, as they call them now, to chase being in a startup and getting rich and famous, which has not happened, but yeah, I joined a startup in the early 2000s, right around the time that there is this – there was this, like, book and shoe company that was launching these cloud computers, called Amazon.

So, my boss walks by me one day, and you know, this is 2005, 2006, after EC2 first launched, and I was the only person on the team that was a software developer that had also worked in a data centre, and our data centre’s lease was expiring, and he wanted to move all of our stuff to EC2, and so he just kind of like, surprised the entire team, including myself with the – me, leading this project as a budding 24, 25 year old, who had no idea what he was doing.

And so, you know, I’ve just kind of been pigeonholed into what would be soon to be called like, the DevOps role, and I put big old air quotes on that. Yeah, so I kind of worked in the DevOps space for most of my career ever since. So, I just kind of got pigeonholed into that place, been working in startups for most of my life, and then started Massdriver about four years ago.

[0:03:34.2] Danny Allan: So, first time CEO, you set up Massdriver. What’s the – what’s your TLDR in four years? What advice to give to developers who want to be CEO of a startup company?

[0:03:46.1] Cory O’Daniel: Do it young, do it young. If you’re over 40 and have kids already, you're going to be exhausted; that’s my number one tip. You know, it’s funny, I was like, “Yeah, this would be easy.” Starting on all these. I’ve been a part of so many of them. Like, “Yeah, I’ll do some CEOing and see how this goes.” It’s exhausting. I don't know, it’s interesting. I feel like my advice may be deprecated, honestly.

Like, the world has just changed so much since we started Massdriver, like ZIRP’s gone, right? Like, we’re not making like that. They got like, easy money is gone, AI is here, and everybody is throwing their money at AI. Find something that you’re going to love being CEO of, right? Like, I’m working on a developer tool, and it is an idea that I love. It’s something I’ve been passionate about for decades.

I think that’s the number one thing is like, you’re going to be in it, you’re going to be in the grind, make sure it’s something you’re super passionate about, but I think one of the other things that’s a bit – that I think is an evergreen piece of advice, the thing that’s also hard for engineers is like, you got to get out and talk to people early and frequently. You are talking a lot, listening more, you’re doing both of those way more than you’re developing, and I think that’s one of the keys.

It’s very easy to be like, “I got a cool idea.” And just spend like, you know, your entire weekend hacking away on it, but like, until you’ve vetted and validated that other people care about that and will pay you money for it, like, you don’t – you don’t even have an inkling of a business, right? And so, that was the part that was very hard for us early on is like figuring out like where to find that feedback that we’re looking for, right?

Not feedback that validates your idea, right? You want some feedback that’s going to invalidate that idea a bit. Go about it in a scientific approach, and then make sure that, like, you’re incorporating it into the thing that you want to build, right? Because you’re going to get feedback, and some of it you’re not going to love, some of it is going to be extremely valid if you don’t love it. Some of it is going to be extremely valid if you do love it.

But, like, that’s the key, is just getting out there, and just, finding those early customers, iterating on ideas, getting feedback from people, and just devoting your entire life to it, almost.

[0:05:46.6] Danny Allan: So, you’ve always been in the infrastructure as code space, is that a fair statement? And then you started in EC2, but then, your evolution has been very much on Infrastructure as Code, is that true?

[0:05:57.5] Cory O’Daniel: Yeah. So, I mean, I’ve pretty much worked on like teams that were, you know, delivering self-service to engineers, most of my career. So, it’s like, how do we take the infrastructure requests of developers and make it happen in a fashion that’s easy for them to get it without them like, asking ops to set something up for them or asking us to write some Terraform, or asking us how to troubleshoot something.

Like, my goal has always been to make sure that developers can do that on their own without necessarily becoming me, and even like, getting started, when we first started getting on EC2, like, one of my first open source projects was a tool for provisioning infrastructure, and like, managing and configuring infrastructure. So, it’s like, I think it was right around the time that Ansible was coming into being, and so like, I worked on this thing for a while.

And then I was like, “Oh, there’s other – there’s hundreds of people working on this thing.” I'm not going to be no JS about it and be like, I’m going to keep working on mine, like, forget this, I’m going to go use Ansible, and then, you know, Chef came out, and going from Ansible to Chef to Terraform and –

[0:06:52.8] Danny Allan: Yeah, no, it’s an interesting space, and the platform engineering has evolved, I’ll say, in general, over the past 20 years. I guess, maybe you can give a flavour of that to the audience. So, we have an interesting audience, I’ll say, in that it spans from kind of traditional developers in C and C++ to modern developers that are doing pipeline development to truly DevOps-type engineers.

But you have a different perspective because you come from the DevOps world, from the platform engineering world. How have you seen that evolve over the last 20 years? Like, what has changed, what has stayed the same?

[0:07:25.9] Cory O’Daniel: Jeez, I mean, so much has changed. I feel like – I feel like the only thing that’s been consistent is that none of us can agree on what DevOps is, right? And like –

[0:07:32.7] Danny Allan: Right.

[0:07:34.6] Cory O’Daniel: I mean, I think that’s probably like, the root cause of like, how we’ve gotten to where we are today, as like, there was never like a concrete definition of it per se. It was like, “Ah, we’re going to tear down the barriers” and like – or tear down the walls and like, nobody knew what that meant, right? And so, some people went down this path, which was like, “Hey, that means developers do everything.”

And other people are like, “Well, that means we make a separate team.” That’s like this, like, conduit for, like, handling developer requests, and then people are like, “Hey, well, you know, do the…” I’m throwing air quotes around it, “The platform engineering approach,” which is like, we’re Ops teams, but we’re also software developers, and so, we’re going to build APIs and tooling to make it easier for developers.

And like, that’s always been the approach of DevOps that I’ve seen the most value in, is like, getting that operations team to produce a product that developers can then use to access what they need to. I’ve always felt that was the spirit of the original definition of DevOps. I think that a couple of things have obviously changed over the years. It’s like that term is fragmented greatly.

I have a blog post I wrote a few years ago called DevOps is Bullshit, which kind of like, tackles the idea of like, how much it’s sprawled, and then I think the other thing that’s changed a lot though is, you know, I have a piece on this too, it’s called Elephant in the Cloud, it’s like, the way we write software really hasn’t changed much since like, the old days of COBOL. Like, we have conditionals and loops and variables.

Different syntaxes, yeah, but like, the way that we run stuff has changed fundamentally, multiple times in the past 25 years. We went from mainframes to servers, to VMs, to serverless, to containers, to serverless containers, and now, we’re all just making OpenAI calls, right? Like, it just keeps changing, right? So, it’s like, and that’s one of the things that sucks. It’s like, it’s, you know, when you're looking at like, the operational side of the world, when that whole world is just changing like every couple of years.

Like, that’s one of the things that makes it very hard for I think, developers to stay up on like the cloud, right? Like, I can sit and I can know my Node, I can know my Ruby, I can talk to a business person and like, figure out how to build a feature, but when like, the way that my software is running is changing every couple of years, like, that’s a lot to stay on top of, right? That is, in and of itself, a full-time job just to understand the cloud.

Yeah. I mean, I think, I think a ton has changed, and you know, that one through line is that it keeps changing.

[0:09:53.9] Danny Allan: You’re right, though, it changes. It has changed at a faster velocity than software development itself, but would you say that the DevOps engineer or the platform engineer is an evolution of IT? So, if you go back 25 years before the cloud, and before these things occurred, essentially, developers are writing code, throwing it over the wall to IT, and IT was running it.

Do you see DevOps and platform engineering as the next iteration of that IT role, or do you truly see it as a hybrid blend between development and IT operations?

[0:10:26.8] Cory O’Daniel: I mean, I think, honestly, if we want to scale it, and I think we have a huge scale problem, and this is hard. This is very hard for – if you’re – if you're me or if you’re somebody in this operations role, it’s very hard. It can be very hard to see this scale problem that we have, but we actually have – we’re creating a ton of developers year in, year out, right now, and we are not creating a lot of operations folk.

From sys admins to database administrators, cloud, et cetera, right? Like, a lot of that experience comes from being in production, right? Like, nobody teaches you how to like, troubleshoot a massive like, multi-primary database failure in prod for millions of transactions per second in school. Like, you learn that from like –

[0:11:07.5] Danny Allan: Yup.

[0:11:08.4] Cory O’Daniel: Going through it, right? And we’re just not producing people with that expertise at a rate fast enough, right? So, we have to figure out how do we scale that. Now, we can say, “Hey, we’ll just start teaching developers that.” But it’s like, “Yeah, it’s hard to teach it,” and when you see organizations that do it well, it’s like, they have like apprenticeship programs or something like that or there’s some sort of like, you know, DevOps hybrid team, where like, there’s a person like, embedded in your team that you can like, learn from a bit.

But it’s hard to just like, throw a book at somebody and be, “Go give yourself a DevOp.” Like, it’s rough. Like, you can learn Docker, you can learn Terraform, but like, you don’t understand like how to operate that cloud service at scale, and so I think one of the things that’s really important for this role is for it to become a bit more mature in software development. It’s too much to ask software development to become more mature in Cloud.

I think Cloud and IT needs to become more mature in software, and so it’s starting to go from things, instead of just saying like, “Hey, I’m going to configure, I’m going to write you some YAML, software developer,” it’s, “How can I build software to ease the burden on a developer to make self-service plausible to them, without them having to learn my entire toolset?” And that requires us as operations engineers to start adopting more software principles and become better at producing software and software artefacts, rather than just producing configuration.

[0:12:24.6] Danny Allan: Do the abstractions exist to enable that to take place? So, if you want to go into this world where we have all of the infrastructure over here, and the software over here, do the abstractions exist to enable those two worlds to marry together?

[0:12:36.9] Cory O’Daniel: No. I don’t think so. No, and like, no. No, and that’s another part of the problem, right? Like, this is one of the things that I have – this is a very hot take that I have and I’ll – this is a hill I will happily die on. When you first get your feet wet in Infrastructure as Code, it’s funny, like, one of the first things you’ll do, like let’s say – let’s say you’re trying to deploy Postgres on AWS.

You’ll go, “Okay, like, I don't know how to write the Terraform for it,” or maybe, “I’m looking for some docs on, like, how RDS’s Terraform module works,” or Terraform, sorry, resource. You go Google it, and boom, the first thing that comes up is the public RDS module, and you’re like, “Oh, somebody’s already done this for me.” It’s like, “Well, well, they haven’t. You still have to write your own Terraform to consume that Terraform.”

So, you have to do something, and then, like, that module is literally a thousand-plus companies’ opinions on how RDS should work, not how Postgres should work, how RDS should work, and the problem with a thousand people’s opinions is no one agrees, right? So, the module itself becomes extremely watered down, and like almost like an anaemic abstraction. So now, I’m not learning Terraform or even this resource, or even how all these resources work together.

I’m learning the interface that was exposed to me that somebody else wrote, right? And so like – and those abstractions, like, they’re not – they’re anaemic. It’s like, “Hey, like, here is how you’re going to deploy some RDS,” and it’s like, “Oh, I have to, there’s a hundred parameters I have to think about,” right? But the reality is, your business has probably made a lot of decisions about how those values should be set, right?

So, like, you might say, “Hey, you know what? If it’s a production workload, we always have hourly snapshots.” And you know, or maybe if you’re using serverless, you’re doing like, point in time recovery, right? And you’d probably say, “Hey, you know what? Even if it’s a development workload, we encrypt at rest,” right? I don’t even want to expose that. I don’t want the ability for a developer to accidentally set encryption false, and have a SOC 2 landmine go off on somebody in the future.

Like, I just want to hardcode that in the module, right? Like, there are things that you have decided as an operational organisation that you want this to run like, and that’s not going to be in that module, and you could wrap it, but now, you have two layers of indirection, right? And so, like, I see that there’s not a ton of value, personally, in those open source modules, besides getting a footprint idea of like, how all the resources connect together for that concept, right?
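
To make the idea of a prescriptive module concrete, here is a minimal sketch in Python rather than Terraform. It is not Massdriver's implementation or any real module's interface: the names `PostgresSettings` and `postgres_settings`, the setting keys, and the daily snapshot interval for non-production workloads are all assumptions made for illustration. The point it tries to show is that organizational decisions (encryption at rest always on, hourly snapshots in production) live inside the module and are never exposed as inputs.

```python
from dataclasses import dataclass


# Hypothetical settings object; the field names are illustrative and do not
# correspond to any real Terraform module's variables.
@dataclass(frozen=True)
class PostgresSettings:
    environment: str
    storage_encrypted: bool
    snapshot_interval_hours: int


def postgres_settings(environment: str) -> PostgresSettings:
    """Return database settings with organizational policy baked in."""
    if environment not in ("production", "development"):
        raise ValueError(f"unknown environment: {environment!r}")
    return PostgresSettings(
        environment=environment,
        # Encryption at rest is hardcoded: there is no parameter to disable it.
        storage_encrypted=True,
        # Production always gets hourly snapshots. The daily interval for
        # other environments is an assumption made for this sketch.
        snapshot_interval_hours=1 if environment == "production" else 24,
    )
```

A developer calling `postgres_settings("development")` still gets an encrypted database, because the only question the interface asks is which environment the workload belongs to.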

And I think that’s one of the things that’s hard. It’s like, there’s never going to be great public abstractions, because they don’t take into account your business and your operations team’s rules, security, and compliance, and I think that’s one of the things that we just have to do as operations engineers.

One of the things that’s really nice about, like, trying to build self-service is we have an incredible set of frameworks and, like, tool chain to build it out of, having things like Terraform, OpenTofu, Helm, et cetera, that’s already built. We just got to put some stuff around it to make it easier for developers, right? And if we can lower that learning curve for them, and make it the language that they understand, right? Like, developers don’t necessarily care about the availability zones, right?

If I’m like, making a new service, I’m like, “Oh, you know, I need – I need Redis for caching or I need Redis for user sessions.” I don’t want to stop to think about availability zones, and the operations team is probably like, “Hey, if it’s production, we need like at least two,” right? Maybe they picked three. Yeah, I’ve seen developers be like, “Uh, how many? How many zones are available? Oh, six? Six, is that highly available?”

It’s like, “No, that’s expensive, that’s expensive is what you just did.” It’s not hot. I mean, sure, it’s highly available, maybe too highly available, right? And so, like, that’s the thing that sucks, it’s like, you’re just constantly shifting that on to them if you’re not designing these abstractions yourself.
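
The availability-zone example can be sketched the same way. In this hypothetical Python helper, the developer never chooses a zone count; the operations team's rule does. The function name `availability_zone_count` and the single-zone default for non-production are assumptions for illustration, and the production value of two reflects the "at least two" rule mentioned above.

```python
def availability_zone_count(environment: str, zones_in_region: int) -> int:
    """Pick the zone count from policy instead of asking the developer."""
    # "At least two" for production comes from the conversation; one zone for
    # everything else is an assumption made for this sketch.
    wanted = 2 if environment == "production" else 1
    if zones_in_region < wanted:
        raise ValueError(
            f"region offers {zones_in_region} zones, policy needs {wanted}"
        )
    # Even if six zones are available, policy decides, not availability.
    return wanted
```

With six zones on offer, a production request resolves to two zones, which avoids the "highly available, but expensive" outcome described above.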

[0:16:20.0] Danny Allan: How prescriptive does that – do those modules have to be for the organisation, and is the prescriptive nature of, you know, how many databases you have or how much redundancy you have or how often you do backups, is that dependent on the vertical or the company, or is it truly down to the organisation? Like, where does that abstraction, where does it get set?

[0:16:39.8] Cory O’Daniel: I think it comes to the organisation, but I think the funny thing with this is, well, many organisations do this already, they just don’t do it in software, right? Which is the thing that’s – it’s maddening to me. Like, you start out in one of two states, you’re either saying, developers do all this Terraforming cloud configuration stuff themselves, or operations does it or helps with it, right? And if they’re doing it themselves, and you’re a huge company, you probably have a bit of chaos and not the standards, right?

[0:17:09.0] Danny Allan: A lot of fragmentation.

[0:17:10.2] Cory O’Daniel: Right? But that standardisation, somebody is coming in like, “Oh, we want some standardisation.” It’s like, “Excuse me? You want some what?” Well, then you got to have questions you ask before, right? And it’s funny, like, the – kind of like the realisation of this for me was a few years ago, where I had this monolith that we were breaking down, and as part of this, it’s like, “Okay, well, the database kind of has to be, like, ripped into, like, three or four pieces.”

And over a couple of days, the leads on these different projects, where we were kind of splitting this monolith up (we weren’t splitting it into a thousand pieces; reasonable-size services, not microservices), over about a week, the leads of each of these teams asked me almost the exact same questions about Postgres.

And it was funny because I’m like, I’m literally having this conversation, like, four times, and they’re, like, “Hey, what size? What instance class do I pick for this?” And it’s like, my response is, me, asking them questions about their expected workload, right? And it’s like, so I’m having a conversation with them. I’m doing a little bit of this DevOps thing, where we’re collaborating, right? Like, we’re having a conversation about it.

I’m receiving information about their workload, I’m starting to understand it, and then I’m going to take that, run it through the computer that is my brain, and instance class is going to come out the other side, right? And I was like, “I’ve been asked this four times,” and they’re typing in instance class some place into Terraform, or maybe they’re clicking it in the AWS console, and no one’s going to know why they picked that instance class after they hit submit, right?

But the funny thing is, this is like, I have like a very reasonable way I go about figuring out what an instance class is, right? Now, it’s like, “If my Terraform module was the questions that I asked them when they ask me what instance class, and I wrote some logic,” and this was very hard four years ago in Terraform. It’s much easier now, but like, back then, it was like, “Okay, let’s see if I can take some input and like pick an instance class,” right?

There’s also stuff where it’s like, we don’t ever want to use, like T2 or T3 instance classes, because they’re burstable and we’re running out of compute, right? So, it’s like, if it’s production workload, like, we just don’t ever want that to be an option. So, like, we’re able to like, write that into our Terraform, and then the upside of this, all of a sudden, is somebody asked me a question. I’m like, “Hey, you don’t have to ask me these questions anymore.”

“The module asked you the questions I was going to ask you.” And now, you actually have self-service. You can be like, “Oh, I know what my expected base data is and what I think it’s going to expand to each month.” Great. Filling this in, picks an instance size, and now, you have documentation as to how you got to that instance size, which is pretty nice, right? Like –
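
The question-driven interface described here might look something like the following Python sketch. It is only an illustration under stated assumptions: the sizing thresholds, the 12-month planning horizon, and the names `Sizing`, `SIZING_TABLE`, and `pick_instance_class` are invented for the example, and sizing purely on projected data volume is a deliberate simplification. The two ideas taken from the conversation are that burstable classes are never offered for production, and that the answer comes with a record of how it was reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    instance_class: str
    rationale: str  # documentation of how the class was chosen


# Hypothetical table of (max projected GB, instance class).
# The thresholds are illustrative only, not sizing guidance.
SIZING_TABLE = [
    (50, "db.t3.medium"),
    (500, "db.m5.large"),
    (2000, "db.m5.xlarge"),
]
LARGEST_CLASS = "db.m5.2xlarge"
BURSTABLE_PREFIXES = ("db.t2.", "db.t3.")


def pick_instance_class(
    base_data_gb: float,
    monthly_growth_gb: float,
    environment: str,
    horizon_months: int = 12,  # assumed planning horizon
) -> Sizing:
    """Turn the questions ops would ask into an instance class."""
    projected_gb = base_data_gb + monthly_growth_gb * horizon_months
    chosen = LARGEST_CLASS
    for max_gb, candidate in SIZING_TABLE:
        # Burstable classes are never an option for production workloads.
        if environment == "production" and candidate.startswith(BURSTABLE_PREFIXES):
            continue
        if projected_gb <= max_gb:
            chosen = candidate
            break
    rationale = (
        f"{environment}: {base_data_gb} GB base + {monthly_growth_gb} GB/month "
        f"over {horizon_months} months = {projected_gb} GB projected"
    )
    return Sizing(instance_class=chosen, rationale=rationale)
```

The developer answers questions about the workload, and the `rationale` field preserves why that instance class was picked, which is the documentation that is lost when someone types a class name straight into a console.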

[0:19:44.5] Danny Allan: And where do you see that going? So, obviously, right now, that’s a manual questionnaire that you’re going through where it’s asking the questions, generating the YAML, and it’s going out. Do you think AI replaces that in the future? Because everyone has this magical promise of AI that it’s going to do the assessment for you, and people are going to go away.

Do you think that’s a realistic expectation around the Infrastructure as Code or no, that, you know, we’re five years, 10 years away from that?

[0:20:06.3] Cory O’Daniel: Yeah, so when we designed that, that was actually, that was the interface to the Terraform modules was those questions. So, it’s like it wasn’t, it wasn’t like, “Hey, here’s all the attributes of AWS’s service.” It was like, “Here is what the ops team would have asked you to configure this thing, and we just compute the values for you, right?” So, I mean, as far as like AI and ops, you know, I think this is going to be –

I think it’s an interesting space to watch. I think we ought to have our expectations, like, checked, though. Like, I use, right? Because I’m going to say like, “Ah, I don’t think it’s there yet,” but at the same time, I use AI every day. Like, I use it a ton in the places where I know I’m an expert, and this thing can save me some time, and I can see a result, and trust it. I can confirm it, right?

Like, the thing that sucks is, you’d be like, “What’s Abraham Lincoln’s birthday?” And it’s like, it might not be right. If I pop in there and ask the thing a hundred times, there are going to be wrong ones, right? It’s like, if you don’t, you know, we can go to Wikipedia, right? Like, every single time you get an answer, right? So, it’s like it’s hard to ask AI of something where you don’t have deep domain knowledge because you’re never quite sure if it’s right or not, right?

This is just like a model, right? So, you know, sitting down and saying like, “Hey, AI, we don’t have an ops person anymore because you’re here. Make my Infrastructure SOC 2 compliant.” [inaudible 0:21:22.7] What does that even mean?

[0:21:25.3] Danny Allan: Okay, right.

[0:21:25.8] Cory O’Daniel: Like, it’s a weird question, but like developers are like, “This is SOC 2 compliant, like, Checkov told me it was,” right? But like, what is SOC 2 compliance? A big old bag of policies that your company’s decided you want to do. Like, if it doesn’t know what your policies are, you’re at a stalemate, right? So, it’s like we have to put, right? The old garbage in, garbage out.

It’s like most of AI is trained on garbage, I’m sorry, right? Like, especially when it comes to Infrastructure as Code. Again, like it’s one thing to train it on, you know, some Golang, right? We got blogs on like, “Hey, best practices for writing, you know, idiomatic Go,” right? And then we’ve got – it could crawl the entirety of the Kubernetes source code, and get an idea of like, how hundreds of people are contributing together, and get an idea of like how to write code, but then, when you get to Terraform –

[0:22:15.3] Danny Allan: There’s so much fragmentation, there’s just no—

[0:22:19.7] Cory O’Daniel: Yeah.

[0:22:20.0] Danny Allan: It’s all over the place, yeah.

[0:22:20.7] Cory O’Daniel: And everything is anaemic. So, it’s just like, “Oh, the resource, the resource has a field called instance class.” So, the variable, we’ll have a variable called instance class. I go, that’s of zero value again, right? And so, like I think this is the thing we’re missing is like for AI to be truly good for operations, we need to train it on really good operation stuff, but the catch is like, “Am I going to train AI?”

Like, is OpenAI going to get access to everybody’s metrics? No. Can I go fine-tune a model with my company’s metrics? Yes, I can, right? And so, I think the reality is if you’re a company that has big enough problems and big enough budget to start using AI to do the really hard parts of operations, you can probably do that, but I don’t see an organization coming along and saying like, “Hey, we’re going to be the AI ops company that’s going to crush it for you guys because we have all this information.”

It’s like, that’s some of like the most nitty gritty like private details of an organisation. Like, I don’t want to pipe my CloudWatch metrics out to OpenAI so they can build me like a, you know, a better configurator. Like A, that is a lot of data to egress, but also like, I don’t – do you want – do you want a third party knowing everything or having besides Datadog, or knowing everything about how your thing works, right?

And they might honestly be one of the few people that could probably do it, but now are they going to start stepping into the Infrastructure as Code space, right? Like, I mean, maybe they do. I don’t know, but like that’s the real rub is like, you got to get that really good information to that model and start training it so you can get your metrics, but beyond that, you have to have an idea of like, what my request responses are like, what’s my Infrastructure like. What’s my architecture like, right?

Like, you look at a Lambda that’s called 80,000 times a second, what is that thing doing? You know it’s getting called 80,000 times a second, why? Is it a loop, right? Is it a loop that’s gotten out of control? Are there 80,000 requests a second, right? Like, there’s context that really matters there, like you can’t just look at the raw metrics, right? So, there is a lot of organisational information that you would have to push out into one of these models to make it perform well as an ops person, which I just don’t think – I don’t think companies are too super-savvy on giving all that information away.

[0:24:33.9] Danny Allan: Well, it’s interesting, I always say that AI gives a huge unfair advantage to the large organisations that can train on a massive amount of data. For small companies that are super small, it probably doesn’t matter because they’re so small that individuals can do it. It’s the midsize markets that get penalised by this because they’re bigger. They’re big enough that they can’t depend on one or two people.

But they’re not big enough to have the datasets to actually use AI in a meaningful way going forward.

[0:24:57.5] Cory O’Daniel: Yeah, yeah. It will be interesting to see, like, if AWS starts to do anything there too, right? I mean, they’re the other company that has plenty of information about your metrics, but again, like what is causing those metrics to go up and down is the other question, right? It’s hard to just look at something and be like, “Oh, well, this is a container.” Okay, what is it processing?

Is it processing a user request? Is it processing a queue? Like, that’s very different behaviours of like, how many times this container is called if it’s coming from people or coming from a queue, right? So, I think like, without that context of understanding, like how and why your architecture is the way it is, like it’s hard to just look at overall numbers, and be like, “Oh, that should be a bigger instance size,” right?

Like, maybe you need to re-architect, maybe you need a bigger instance size, maybe you need some other change, but it’s like that’s the hard part. Like, that’s where the operational, like I’ve worked in prod, I’ve worked at a similar company to this, and I joined, like that’s where the expertise comes from, and we don’t – these systems aren’t experts. They just have tons of exposure, right? And that context support –

[0:25:57.3] Danny Allan: And they give you the exact average of that, and I mean, it’s still the grey side. We did a study – oh, we didn’t do a study. I looked at a study that came out of Cornell last month, and it said that of code that was created by AI assistants, 27% of the code had vulnerabilities. Now, imagine, and code is a pretty structured thing. Like you say, it’s an if-else or for, while, or do-while loops.

Infrastructure as Code is way less structured than code is itself, so you can imagine the mistakes or the hallucinations that you end up with by trying to do that with Infrastructure as Code.

[0:26:29.7] Cory O’Daniel: Yeah.

[0:26:30.4] Danny Allan: It just seems like a much bigger problem space.

[0:26:32.2] Cory O’Daniel: Yeah, and it’s like – I mean, the other thing is like, the bit that’s weird about Infrastructure as Code again is like, it’s not – you can’t. It’s not just generating the code, like that’s never –

[0:26:42.7] Danny Allan: Right.

[0:26:43.1] Cory O’Daniel: That’s never the hard part, right? And like, when I first started my company, people were like, “Oh, why don’t you make a tool that like generates Terraform?” I’m like, “The people I know in this space that know the cloud, their biggest problem isn’t writing some Terraform, like, right?” Like, the biggest problem is like, how. Like, how do I make myself more effective?

And like sitting there and being like, “Ah, it generated the Terraform for me.” Like, that’s not it, right? But becoming more effective is, “How do I get all these developer requests handled without me being involved?” Right? Like that, but I mean, you, like, I want to find –

[0:27:21.7] Danny Allan: I’m getting behind this.

[0:27:22.5] Cory O’Daniel: Yeah, I’m looking for some real yields, not like, “Oh, I saved myself like 20 minutes here or there,” right? Like, especially when you get the code, you’re like, “This code sucks,” right? Like that’s just – and like the thing is, Infrastructure as Code too is not as linear as code-code, right? I can look at a Python function, read it step by step, understand what’s going on.

But like, looking at Infrastructure as Code is like, I look at it, like A, figure out where it’s hallucinated something weird, pull that out, but then it’s like, you know, there’s a DAG in Terraform. I got to figure out like which one of these resources is called first. Like, oh shit, it threw a resource in there. Why? Is this resource in here because I actually need it, or did it make an assumption and just like, toss another resource in?

I’ve seen that plenty of times, so like it’s just not as straightforward, and again, I don’t think that’s where like the big yields are in making these operations folk, which are you know, again, shrinking relative to the size of engineers. That’s not the thing that’s going to make them massively more efficient.
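To make the DAG point concrete, here is a minimal sketch in Python rather than Terraform. The resource names and the dependency map are hypothetical, invented for illustration, and this is not how Terraform itself is implemented. It only shows that the order in which resources are created comes from the dependency graph, not from reading the file top to bottom, and that a resource nothing depends on stands out once the graph is walked.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: resource -> set of resources it depends on.
# The names imitate Terraform addresses but are made up for this sketch.
dependencies = {
    "aws_vpc.main": set(),
    "aws_subnet.db": {"aws_vpc.main"},
    "aws_security_group.db": {"aws_vpc.main"},
    "aws_db_instance.orders": {"aws_subnet.db", "aws_security_group.db"},
    "aws_iam_role.mystery": set(),  # a resource nobody asked for
}


def creation_order(deps):
    """Return one valid order in which the resources could be created."""
    return list(TopologicalSorter(deps).static_order())


def unreferenced(deps, wanted):
    """Return resources that nothing in `wanted` depends on, even transitively."""
    needed = set()
    stack = list(wanted)
    while stack:
        resource = stack.pop()
        if resource not in needed:
            needed.add(resource)
            stack.extend(deps.get(resource, ()))
    return sorted(set(deps) - needed)


if __name__ == "__main__":
    print(creation_order(dependencies))
    print(unreferenced(dependencies, {"aws_db_instance.orders"}))
```

Under these assumptions, `unreferenced` flags `aws_iam_role.mystery` as the resource that was "tossed in": it is something a reviewer has to trace through the graph to spot, which is the review cost being described here.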

[0:28:17.7] Danny Allan: What do you – what’s your perspective on this helping in building security into the environment? Because one of the problems that we’ve had, I will say for over 20 years now, is we’ve always said, “Hey, we need to build security in,” and I can see Infrastructure as Code as a way to build security in or reliability or resiliency into the systems, but it hasn’t worked for the last 20 years.

Do you think that there is something different today that we can provide the quality, the security, the efficiency, the whatever, into our systems by using modules of this type?

[0:28:50.4] Cory O’Daniel: I think so, and I think – so, I mean, like AI, I’m positive. I’m sure there’s going to be great applications for AI and security, but it’s funny. It’s funny, like, we love shifting things left. I feel like this is like, “Well, I will just like…” Have you seen the meme where, like, the DevOps title just becomes longer and longer? It’s like DevOps DevSecFinAIOps. It’s just this like huge amount of names.

It’s just like, yeah, it’s the developers. They have a keyboard, right? So, they can do security, great, they’re DevSecOps now, right? It’s like, we want to shift all this stuff left, and it’s like, in my opinion, like A, these are different things. Like, there’s ML Ops, and there’s Sec, and there’s AI, right? People are going to shift it all left, but like when we’re talking about security, right? I don’t think you shift security left by saying, “Well, developers do it.”

Like, again, like there are people with security expertise, and there’s people that are far bigger security experts than I am, right? And like, if you have the team where you have that security expertise, like you want them doing that, and you want your developers to be thinking securely, but the reality is like, are they all going to become, you know, right? Hackers overnight? No, like you’re paying these people to build features that create money for your company.

Like, you don’t want them becoming an expert in everything that requires a keyboard, right? You want them to have their specialities, you want them to understand their domain, you want to be a profitable business, that’s why we’re all here, right? And so, the funny thing is like, when you’re shifting security left, like you can start to do this with like, really good abstractions, or you can start to take this enough into account, right?

So, like, here’s one of the things that’s funny is when you look at Infrastructure as Code, it’s like okay, talk back to Postgres or even Redis. It’s like, okay, we’re going to – we’re a company, we’re going to have 30 different Postgres databases because we have microservices, or whatever. Great, you’ve probably made a decision about how you want to authenticate to that, right? Maybe we want to use.

We’re in AWS, we want to use IAM roles all the time. Great, that’s like, we’ve all decided that. We’re all sitting in this boardroom, we’ve all said that. This is fantastic, let’s put it in a runbook, in a README, and tell developers, “Make sure when you make a Postgres database to use IAM policies and IAM role access.” That’s like, or you can just encode the thing, and that’s the only way you can authenticate to it.

And it’s just like the – but we can put these security standards into our modules. Like, we can actually shift it left. The catch is it’s not the developer doing it, it is the team that would, you know, typically be called the platform team, like saying, “Okay, this is how we will authenticate to the databases. We want to use IAM roles, and we’re going to actually design our modules to have that built in, and there is no way to change it.”

“We’ve taken away all. Setting the password? Nope, you don’t even get that as an option. It is going to be IAM-based authentication for all RDS services,” and that’s all there is to it, and you might have somebody come along in a year, and this is like the biggest – it’s funny, it’s a funny piece of pushback but this is the most common piece of pushback I get on this idea of abstraction.

Somebody is like, “Well, sometimes somebody needs the configuration to be different.” It’s like, that’s fine. That’s great actually, because when you have somebody come along a year later and they’re like, “Hey, you know what? All of our databases authenticate using IAM, but we’re using this open-source service now that needs Postgres, and the tool itself can’t do IAM-based authentication. It needs an actual username and password.”

It’s like, “Okay, okay. We got a new – we’ve got ourselves a new use case. That’s great.” But guess what? In the meantime, in that year, nobody’s had to think about it, and that security was handled by IAM, and so now, we have something special, and that’s great. Now, do we change that module to make it less secure? No, we say, “You’ve got an edge case.” We have edge cases, right? And you make a second module.

These things are not this like, just carved from marble, like amazing pieces of software. Dude, it’s a bag of JSON that you hit an API with. Just make a second one, call it less secure Postgres module, and people are going to see it, and again be like, “Well, that’s not the one I want to use,” right? Right, and they’re going to use the one called more secure Postgres module, right? I mean, you’d probably give it a better name than that, but like, that is so funny.

It’s like, well, we already wrote a module. Like, we had this in software, where we’re like SOLID principles, and everybody was obsessed with DRY for like a decade, and everything was just so DRY. We’re like, “Our software got really sucky,” and then, like, WET came about, like write everything twice or write everything three times, I can’t remember, right? People are like, “Yeah, maybe we shouldn’t just like, collapse everything when we see it the second time,” right?

It’s like, we haven’t had that. We haven’t had that moment in Infrastructure as Code. Yeah, we’re like, “No, we have one RDS module,” and it’s like, “Why?” “Because we’re doing it DRY.”
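The prescriptive-module idea described above can be sketched in a few lines. This is an illustration in Python, not Massdriver’s implementation and not real Terraform: the function names, the `PostgresSpec` type, and the dictionary keys are all hypothetical, with key names loosely modelled on common RDS settings. The point is that the secure module has no password input at all, and the edge case gets its own clearly named second module instead of a new flag on the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresSpec:
    """The only knobs a developer gets (hypothetical, for illustration)."""
    name: str
    instance_class: str = "db.t3.medium"
    storage_gb: int = 20


def secure_postgres(spec: PostgresSpec) -> dict:
    """Prescriptive module: IAM auth and encryption are fixed, not inputs."""
    return {
        "identifier": spec.name,
        "engine": "postgres",
        "instance_class": spec.instance_class,
        "allocated_storage": spec.storage_gb,
        "iam_database_authentication_enabled": True,  # not configurable
        "storage_encrypted": True,                    # not configurable
    }


def password_auth_postgres(spec: PostgresSpec, password_secret_ref: str) -> dict:
    """Second module for the edge case: written out again, not a flag on the first."""
    return {
        "identifier": spec.name,
        "engine": "postgres",
        "instance_class": spec.instance_class,
        "allocated_storage": spec.storage_gb,
        "iam_database_authentication_enabled": False,
        "storage_encrypted": True,
        "password_secret_ref": password_secret_ref,  # reference to a secret, never the value
    }


if __name__ == "__main__":
    print(secure_postgres(PostgresSpec(name="orders")))
    print(password_auth_postgres(PostgresSpec(name="legacy-tool"), "secrets/legacy-tool/db"))
```

Calling `secure_postgres(spec, password="...")` raises a `TypeError`, because the option simply does not exist. The two functions deliberately repeat each other rather than share a configurable core, in the spirit of the "write everything twice" remark.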

[0:33:28.7] Danny Allan: Yeah, people are way too binary.

[0:33:30.2] Cory O’Daniel: Yeah.

[0:33:30.3] Danny Allan: Black, white, and that is not the reality. Yes, lots of black and white but you should have shades of grey. I talk about this concept of policies, where you have green light policies, red light policies, but if you don’t have yellow light policies, you are going to fail because not everything is going to fall in that like green light, red light.

[0:33:44.9] Cory O’Daniel: Yup, yeah.

[0:33:45.7] Danny Allan: Type approach. So, well, it is awesome, Cory, having this conversation with you. I rarely run into CEOs that go as deep or as technical as you do. So, clearly, you are hands-on and continue to be hands-on at Massdriver.

[0:34:00.0] Cory O’Daniel: Too hands-on, too hands-on. Yeah. We joke sometimes that CEO means Chief Elixir Officer because we’re an Elixir-based company, but you know, so I need to be more executive and less Elixir.

[0:34:13.8] Danny Allan: Yes, you’re a celebrated author of Bonny, which is an Elixir-based open-sourced component, correct?

[0:34:20.6] Cory O’Daniel: It is. Whoever is celebrating me, like, invite me to the party. I haven’t been there yet. Yeah-yeah-yeah. So, it’s an operator framework for extending Kubernetes in Elixir.

[0:34:30.8] Danny Allan: Yeah.

[0:34:31.0] Cory O’Daniel: And this is very important, like, people are like, “Why does this exist?” Well, because seven years ago, I didn’t want to write Golang. It was just like, I was working a lot, I was consulting, I was working a lot in Kubernetes, I write mostly in Elixir, and Erlang these days, or those days. Now, I write a lot of Golang. So, I’m just like, “What did I do?” But yeah, so, it’s funny, like seven years ago, it was like, “I’ll do anything not to learn Go.” I’m like, “I guess I’ll just write my own operator framework from scratch.”

[0:34:58.5] Danny Allan: I don’t know if that’s the fastest way to do it or not.

[0:35:00.4] Cory O’Daniel: No, but that is – that is the way that we do things, right? We see – we see an open source project, and we’re like, “That’s not the way I would do it,” and you have to rewrite the entire thing from scratch. It’s the only way to develop.

[0:35:12.5] Danny Allan: Well, I’m a hands-on learner myself, and so when I want to learn something, I do the exact same thing.

[0:35:16.5] Cory O’Daniel: Yeah.

[0:35:16.8] Danny Allan: So, I totally get it and appreciate it.

[0:35:18.7] Cory O’Daniel: I’ll tell you what, people are like, “How did you learn so much about Kubernetes?” I’m like, “Writing an operator framework.” Like, you end up like, just deep in how the thing works, and I’m just like, “The best way to learn things is to read the kubernetes.io docs or just go write your own framework, and you will learn a lot.” You’ll get these.

[0:35:37.0] Danny Allan: Yes.

[0:35:36.9] Cory O’Daniel: You get a lot of these, but –

[0:35:38.2] Danny Allan: Wow, believe me, I know a lot about the wrinkles on the forehead and all over. Last question. I guess, I’m curious Cory, you’ve seen a lot in the Infrastructure as Code space, what makes you most optimistic about the future? Like, where do you think we’re going and – well, maybe it doesn’t make you optimistic, but what do you think is most exciting about where the industry is going?

[0:35:57.9] Cory O’Daniel: I think we’re starting to get to the point where we’re having that – and this is a very odd thing to be excited about. I’m not floored by AI, it’s not blowing me away right now. I think we’ll probably get to a point where it’s pretty substantial, but like, again, like you were saying earlier, like, if you’re not one of the huge companies, like, there’s – it’s hard to get really good operational yield out of AI today.

I think one of the things that is most important and most exciting is there’s a lot of people that are starting to beat this like drum of platform engineering, and like I could go into that a bit more. Like, I hate – I’m the host of the Platform Engineering Podcast, I hate that we have to use this term to like move forward, but that’s where we are, right? Like, platform engineering is starting to become big.

And people are starting to, you know, associate it with this idea of the operations team, or like the operations-minded folk producing actual software for people to consume, and I think that was the original intention of where we were with DevOps 20 years ago or 18 years ago or whatever, right? And like, the idea that like, that is starting to come around, we’ve had to create a new term just to like, kind of separate it from what DevOps has become today, that’s fine.

It sucks that we just had to rebrand the same idea, but here is – well, this is where we are, and the fact that people are starting to bring in some of these software principles, right? Like – I’m not the only person who is talking about abstractions and starting to take some of these principles and like, moving away from DRY, and like, the fact there’s a bunch of voices starting to talk about this, I think that we’re close to getting to that software maturity point that we kind of hit in the early 2000s, right?

I mean, I think for people that were like, writing software in the like, late 90s, early 2000s, it sucked, right? Like, it just –

[0:37:29.2] Danny Allan: Yes.

[0:37:30.9] Cory O’Daniel: It was fun, but it was just like, every code base was a mess, right? And then, you know, and then we started having principles, like 12 factor, and like people starting to take these bigger ideas of like, how to write good software together, and I think we’re starting to get to that point in operations, and the big boon there is if we can start doing that and provide these better APIs for developers, better DevEx, better self-service, we’re going to get better compliance.

We’re going to get better security because that will be ingrained in the product that we’re creating, rather than shoved off onto some random person to figure out, right? And I think one of the things that’s more important than anything else right now, especially in the world that we’re entering slowly is we need to have more secure software by default, and unfortunately, when you look at the cloud, like, the cloud’s goal is not to give you secure stuff by default.

It is to give you every option under the sun, and then to sell you additional security on top of it, right? Go turn on all the security stuff in AWS, right? Like, why is encryption even an option, right? Like, they have to make it where it works for everybody, no matter what their use case is. They’re not necessarily selling you security, right? And so, we need to produce more secure systems.

That starts with operations, starting to encapsulate and truly shift security left by starting to lean into platform engineering and these ideas of shipping software, instead of shipping configuration.

[0:38:49.6] Danny Allan: Well, I love that because it totally fits into the theme of The Secure Developer, which is fast by design and secure by default. Make sure that you take away the option for them to do something that is insecure, and you build it into the framework itself.

[0:39:02.7] Cory O’Daniel: Yeah.

[0:39:02.9] Danny Allan: Well, Cory, it’s been great to have you on the podcast, thank you. I love what you're doing, and I love all the passion for the infrastructure as code space, and thank you to everyone for joining us on The Secure Developer. We’ll be with you again next time. Thank you, Cory.

[0:39:17.1] Cory O’Daniel: Thank you.

[END OF INTERVIEW]

[0:39:20.6] Guy Podjarny: Thanks for tuning in to The Secure Developer, brought to you by Snyk. We hope this episode gave you new insights and strategies to help you champion security in your organisation. If you liked this conversation, please leave us a review on iTunes, Spotify, or wherever you get your podcasts, and share the episode with fellow security leaders who might benefit from our discussions.

We’d love to hear your recommendations for future guests, topics, or any feedback you might have to help us get better. Please contact us by connecting with us on LinkedIn under our Snyk account or by emailing us at thesecuredev@snyk.io. That’s it for now, I hope you join us for the next one.
