Nutanix Weekly
Join XenTegra on a journey through the transformative world of Nutanix’s hyper-converged infrastructure. Each episode of our podcast dives into how Nutanix’s innovative technology seamlessly integrates into your hybrid and multi-cloud strategy, simplifying management and operations with its one-click solutions. Whether you're operating on-premises or in the cloud, discover how Nutanix enables always-on availability, intelligent automation, and the operational simplicity that drives business forward. Tune in for expert insights, real-world success stories, and interactive discussions. Engage with us as we explore how to harness the full potential of your IT environment in this rapidly evolving digital landscape.
Nutanix Weekly: Running NVIDIA Triton Inference Server on Nutanix
Artificial intelligence in the datacenter is one of the more exciting announcements from Nutanix in the last couple of years, and the partnership with NVIDIA is growing. In this episode, we walk through what it takes to run the NVIDIA Triton Inference Server on Nutanix and how that works, with real-world implications that you can test yourself.
Reference Blog: https://www.nutanix.dev/2024/01/15/running-nvidia-triton-inference-server-on-nutanix/
Host: Phil Sellers
Co-Host: Harvey Green
Co-Host: Jirah Cox
WEBVTT
1
00:00:03.270 --> 00:00:19.899
Philip Sellers: Hello and welcome to another episode of Nutanix Weekly, one of the many XenTegra podcasts with context. This is Phil Sellers, your host again for this episode, and I wanna say welcome, and thanks for spending a little time with us.
2
00:00:20.701 --> 00:00:32.900
Philip Sellers: We know you probably have very little free time. So while you're on a jog or treadmill, or whatever it is that you're doing listening to us, we appreciate you spending a little time with us.
3
00:00:33.494 --> 00:00:44.530
Philip Sellers: You know, we call these things podcasts with context because we try to bring the real-world experience and the real-world conversations we're having with customers
4
00:00:44.670 --> 00:01:00.339
Philip Sellers: to the conversation here that you're listening to. And, as always, the only way I can do that is with other great technologists and minds. On the phone with us today I've got Mr. Harvey Green, CEO of XenTegra Gov.
5
00:01:00.500 --> 00:01:02.800
Philip Sellers: Harvey, how's the world treating you?
6
00:01:03.790 --> 00:01:09.849
Harvey Green III: Pretty good, pretty good. I can't believe that you would imply that people are doing other things while they listen to our podcast.
7
00:01:10.699 --> 00:01:11.549
Philip Sellers: Hey!
8
00:01:11.970 --> 00:01:12.390
Harvey Green III: We're.
9
00:01:12.922 --> 00:01:15.587
Philip Sellers: Multitaskers in this world, right?
10
00:01:17.370 --> 00:01:20.378
Harvey Green III: Yeah, alright, I guess that's true.
11
00:01:21.120 --> 00:01:31.899
Jirah Cox: Are you implying our user base is not, like, sitting with seventies hi-fi headphones and the rack-mount amp, you know, sitting there on the beanbag, just totally meditating on only our spoken words?
12
00:01:32.210 --> 00:01:32.870
Harvey Green III: Well.
13
00:01:32.870 --> 00:01:33.329
Philip Sellers: You know.
14
00:01:33.330 --> 00:01:34.360
Harvey Green III: That the way.
15
00:01:34.490 --> 00:01:41.864
Philip Sellers: I've heard my voice. So yeah, I mean, just, you know, I'm not thinking I've got the ASMR thing going on.
16
00:01:42.350 --> 00:01:45.650
Philip Sellers: So I'm just appreciative. Somebody's listening at the end of the day.
17
00:01:46.417 --> 00:01:53.402
Philip Sellers: But yeah, I mean, other than self-deprecating humor, we're also joined by.
18
00:01:53.940 --> 00:02:05.069
Jirah Cox: I bring that too. That's fine, I'm a team player. But no, grateful to be a part of the XPU, the XenTegra podcast universe. It's great to be here.
19
00:02:06.165 --> 00:02:28.959
Philip Sellers: Well, Jirah, we're glad to have you joining us, you know, and bringing the knowledge at the end of the day. That's one of the things I've really enjoyed. It's been over 18 months I've been here at XenTegra, and I wanna say I got to meet you week one, 'cause I think week one I was on a Nutanix podcast with you.
20
00:02:28.960 --> 00:02:31.200
Harvey Green III: You sure were I made sure of it?
21
00:02:33.880 --> 00:02:43.990
Philip Sellers: And honestly, I mean, it's great. I've learned so much, and I think everybody listening has learned so much. So just really appreciate what you bring to the table.
22
00:02:45.180 --> 00:02:52.580
Philip Sellers: And speaking of today, we want to talk a little bit about NVIDIA on Nutanix. And so,
23
00:02:53.060 --> 00:03:20.730
Philip Sellers: bringing to the table, we've got a great blog post from the Nutanix.dev blog. For everybody watching on YouTube replays, you can see it on screen. But the title of the blog is Running NVIDIA Triton Inference Server on Nutanix, and this is written by Laura Giordano, technical marketing engineering director at Nutanix. So shout out to Laura, thanks so much for posting this and giving us something to talk about today.
24
00:03:21.620 --> 00:03:37.739
Jirah Cox: Fantastic. One of the first SEs at Nutanix, and now a huge player in tech marketing for us. And if you've been to a .NEXT conference, or one of the bigger .NEXT on Tour events, you've actually probably seen Laura in person. Usually, you know,
25
00:03:38.290 --> 00:03:45.880
Jirah Cox: running some awesome demos right there from stage, of something that just gets announced, and like two minutes later you see Laura giving a demo of it. It's awesome.
26
00:03:46.550 --> 00:03:52.645
Philip Sellers: That's awesome shout out to Laura, thanks so much for giving us fun stuff to talk about.
27
00:03:53.310 --> 00:04:00.329
Philip Sellers: So I mean, this is big news for Nutanix and across the space, right? Doing
28
00:04:02.060 --> 00:04:06.300
Philip Sellers: doing AI AI type things on prem.
29
00:04:06.856 --> 00:04:14.190
Philip Sellers: this was a huge topic. As we were out at New tanks next a few months ago in Barcelona.
30
00:04:14.910 --> 00:04:16.110
Philip Sellers: I mean
31
00:04:16.339 --> 00:04:25.569
Philip Sellers: a as you guys have thought about this. I mean, what are the used cases, or what are the things you've had conversations following up of of on prem AI.
32
00:04:26.850 --> 00:04:28.170
Jirah Cox: Oh, we have, I mean.
33
00:04:28.310 --> 00:04:30.150
Jirah Cox: huge and and
34
00:04:30.170 --> 00:04:46.199
Jirah Cox: pretty lengthy, right? But some, of course, some of the top ones. I even have customers that already are running things like on prem copilot right to get help. Their software developers write code faster and better, but of course, fully, privately, and of course, with performance.
35
00:04:46.670 --> 00:04:52.419
Jirah Cox: any kind of thing stuff like inference, image detection computer vision
36
00:04:52.570 --> 00:04:58.710
Jirah Cox: chat bots. Of course, we've actually launched a number of them, even internally, here at Nutanix. Right? That we're sort of cutting our teeth on around.
37
00:04:58.720 --> 00:05:11.190
Jirah Cox: you know. Help, query, knowledge bases get answers faster. Help, support customer service kind of outcomes. So yeah, it's the the list is almost limited only by your imagination of what it could help.
38
00:05:11.310 --> 00:05:12.480
Jirah Cox: You do
39
00:05:12.740 --> 00:05:16.150
Jirah Cox: right. Usually there's more questions around, how do I start doing it?
40
00:05:16.420 --> 00:05:17.080
Philip Sellers: Yeah.
41
00:05:17.580 --> 00:05:22.160
Philip Sellers: yeah. Harvey, you and I, I know, spent some time doing hands on labs
42
00:05:22.490 --> 00:05:26.248
Philip Sellers: trying to understand. How. How do you do that I mean.
43
00:05:26.850 --> 00:05:34.679
Philip Sellers: You know what what kind of stands out to you as as customers start to have those conversations. And the how do you do it?
44
00:05:35.840 --> 00:05:37.991
Harvey Green III: Yeah, I mean, I think that
45
00:05:38.780 --> 00:05:52.240
Harvey Green III: there's a lot of talk about this overall. And then when you get past the overall talk and we start talking about. Should we do it and get past that into now, how do we do it?
46
00:05:53.055 --> 00:05:54.690
Harvey Green III: You know the
47
00:05:54.710 --> 00:05:59.060
Harvey Green III: the picture gets deeper and deeper each time. But
48
00:05:59.606 --> 00:06:18.360
Harvey Green III: this is this covers one of those top level decisions of, are we going to do this internally and keep our data here or externally, and use, you know, something that's already fabricated in there. This gives you the option to do that internally.
49
00:06:18.450 --> 00:06:26.270
Harvey Green III: Keep your data where it is maintain. You know, the chain of custody over that. So it's not going anywhere.
50
00:06:26.670 --> 00:06:32.790
Harvey Green III: And I'm I'm sure that makes security people just a little bit happier. In some cases.
51
00:06:33.770 --> 00:06:35.260
Philip Sellers: I think, you know.
52
00:06:35.410 --> 00:06:44.549
Philip Sellers: tailing on to what you just said, there's certain data, use cases where we could never, that that data was never leaving our data center so we couldn't tap into.
53
00:06:44.550 --> 00:06:45.210
Harvey Green III: Then.
54
00:06:46.090 --> 00:06:51.350
Philip Sellers: We couldn't tap into that. And and we couldn't do things on that. So that definitely
55
00:06:51.360 --> 00:06:54.189
Philip Sellers: opens new use cases in and of itself.
56
00:06:54.660 --> 00:06:58.869
Harvey Green III: Yeah, I mean, one of the use cases you've already brought up is, you know, being able to
57
00:06:59.140 --> 00:07:03.619
Harvey Green III: go through that internally and be able to help it.
58
00:07:04.275 --> 00:07:07.428
Harvey Green III: Help the developers write better code.
59
00:07:09.170 --> 00:07:13.040
Harvey Green III: that in and of itself you've got developers developing
60
00:07:13.080 --> 00:07:17.416
Harvey Green III: against something that is not out there, not, you know,
61
00:07:18.260 --> 00:07:22.629
Harvey Green III: not out to users, not, you know, generally available.
62
00:07:23.094 --> 00:07:27.215
Harvey Green III: That you're still doing testing on that. I mean, definitely has
63
00:07:27.690 --> 00:07:32.670
Harvey Green III: some some business intelligence in there. You know some some of the
64
00:07:33.277 --> 00:07:40.820
Harvey Green III: intellectual property that you want to keep safe while you're still in the process of actually making it something
65
00:07:41.955 --> 00:07:46.210
Harvey Green III: and keeping that in house is very important.
66
00:07:48.340 --> 00:08:00.070
Philip Sellers: I, I 100% agree so tactically, one of the ways that we can do this now is is tapping into the Nvidia Gpus in cluster. You know you and I both
67
00:08:00.490 --> 00:08:08.740
Philip Sellers: worked on some kubernetes, or I I guess it was technically a Mini K 8 cluster, just a kind of a small scale
68
00:08:09.276 --> 00:08:19.499
Philip Sellers: doing labs during dot next. And so a lot of this stuff is written in Cloud native. It's it's already released as containerized workloads.
69
00:08:19.520 --> 00:08:30.029
Philip Sellers: So you gotta have a basis in that. And then, or not necessarily. In this case, you know, the article we're talking about is is creating a Gpu enabled Vm.
70
00:08:30.507 --> 00:08:43.609
Philip Sellers: but then, basically, you, you take some open source projects or some commercial software. And you you bring that all together. So let's kind of talk a little bit through this one. And
71
00:08:44.041 --> 00:08:56.260
Philip Sellers: it. It's very, I'm gonna say, cut and dry. It's a step by step build process for running the Nvidia inference server, the Triton inference server on prem.
72
00:08:56.870 --> 00:08:57.530
Philip Sellers: But
73
00:08:58.410 --> 00:09:05.629
Philip Sellers: I I guess the introduction walks us through a couple of things. Key. Announce things like that, Jair, do you wanna kinda set the stage for us.
74
00:09:05.920 --> 00:09:24.471
Jirah Cox: Totally, right? So this is gonna be, to your point, Philip, a very prescriptive walkthrough: do this, do this, do this, and then at the end of the cooking show you're gonna have a similar kind of, you know, roast turkey; in this case, a Triton inference server. What this brings together, right, is of course the awesome
75
00:09:24.770 --> 00:09:37.009
Jirah Cox: platform, right? How do I deploy vms as an automated service, a substrate that I need like a gpu like you mentioned. Of course, being in video certified great ingredient for a wonderful recipe there.
76
00:09:37.480 --> 00:10:01.129
Jirah Cox: so having nutanix nodes that are on the Nvidia certified systems list. And of course, lastly, something to do right? So this is gonna show deploying, hugging face. Who actually, more recently than this article, right? As we got like, we also learned together in Barcelona, New Nutanix technology partner, right? So you can now totally get wonderful all the way joint support from Nutanix and hugging face for deploying things like this.
77
00:10:01.738 --> 00:10:05.223
Jirah Cox: And of course, the last thing to probably say that
78
00:10:06.246 --> 00:10:22.840
Jirah Cox: just being a good corporate citizen. This is, of course, one way to deploy AI models onto Nutanix. We actually announced@our.next show right coming soon that Nutanix gpt in a box will be will be an even easier, like more of a more of a product than a process. Right? Click, click next finish.
79
00:10:22.950 --> 00:10:24.840
Jirah Cox: Now you're running AI in production
80
00:10:25.050 --> 00:10:34.789
Jirah Cox: style of outcome. This is more of a pop. Let's pop the hood. Let's do it from the Cli and get that all kind of running in a bit more of a manual, controllable fashion as well.
81
00:10:35.540 --> 00:10:41.060
Philip Sellers: Yeah. And I think that's important. Like, Gpt in a box came out over a year ago. Version one
82
00:10:41.080 --> 00:11:03.739
Philip Sellers: Gpt. Innovation 2 was announced during dot. Next, moving it more from a project product and helping users adopt it easily with workflows and engines to to get value out of it more quickly, and hugging face. As you said. You know it's a community with all the tools and things to help you with your AI development. So
83
00:11:03.740 --> 00:11:18.210
Philip Sellers: it's bring that side of the equation, the development, the the software side of it to to what we need. So probably not going to need to go into depth as we go through this, but I mean.
84
00:11:18.210 --> 00:11:19.730
Jirah Cox: Principles will still apply.
85
00:11:19.860 --> 00:11:21.213
Philip Sellers: Yeah. Principles.
86
00:11:22.100 --> 00:11:29.640
Philip Sellers: first, st step up. We're going to create a Gpu enabled. Vm, obviously, you've got to be able to consume the Gpu cycles. So
87
00:11:29.710 --> 00:11:32.598
Philip Sellers: that makes a lot of sense. But.
88
00:11:32.960 --> 00:11:51.290
Jirah Cox: Cool 2 neat takeaways here with the way Laura does this, that you know, especially if you're coming to New Tan X, and you have more of like what I used to have a very traditional windows. Server admin background couple of really cool technologies here. One is that a lot of your Linux vendors. In this case we're gonna show using ubuntu in the late Lts. There, 22 0, 4.
89
00:11:51.330 --> 00:12:07.820
Jirah Cox: The Linux vendors will give you. Here is a full on bootable, virtual disk image. It's not an iso for installation. It's a virtual CD virtual hard drive. You can just power it on right just download it, turn it on one of the rap most rapid ways to get a workload up and running right just download a Linux image.
90
00:12:07.860 --> 00:12:23.439
Jirah Cox: Second one, though, is showing the cloud in it file, which is a way to as you're creating the Vm. Paste in a little block of text that says, essentially like an on 1st boot. Personalize the Vm. This way. So in this case, Laura's doing that to turn on Ssh.
91
00:12:23.440 --> 00:12:44.219
Jirah Cox: so the Vm. Can then be accessed over Ssh. By a putty, or the windows command line or Mac command line, and secondly, set a password for that. So now we can do that do that securely as well. So some really neat tricks to take away here from just a general. Sys admin point of view of ways to get Linux up and running on Nutanix rapidly. Whether we do a Gpu or not, whether we're doing AI or not.
92
00:12:45.180 --> 00:12:53.492
Philip Sellers: Yeah, absolutely. You know, the key here is that that pass through. That's what we're tapping into in the virtual machine.
93
00:12:54.980 --> 00:12:56.400
Philip Sellers: But the guest system.
94
00:12:56.400 --> 00:12:59.309
Jirah Cox: Have a Gpu step one, pass it through.
95
00:13:01.510 --> 00:13:09.719
Philip Sellers: Yeah. And then again, like, you're pointing out cloud in it. The scripting language to just onboard that make it easier to
96
00:13:09.850 --> 00:13:13.159
Philip Sellers: to do this in mass, right? I mean.
97
00:13:13.450 --> 00:13:32.369
Philip Sellers: automation has always been a core component of nutanix. Being able to pass that through onto the software layers is now a critical success factor as you try to do projects like AI inference engines on the infrastructure. So just following through that same thread of
98
00:13:33.840 --> 00:13:42.160
Philip Sellers: heavy, I'm gonna say heavy but I don't mean heavy in a bad way. I mean heavy in a good way. Very orchestrated workflows.
99
00:13:43.150 --> 00:13:48.230
Jirah Cox: Yeah, that's a good point, like, like thinking about like zooming in from like the house to like a brick
100
00:13:48.280 --> 00:13:53.170
Jirah Cox: like this could be a work flow where, like my Vm creation, automation might
101
00:13:53.550 --> 00:14:10.350
Jirah Cox: randomly generate that password and stash it in a vault or an itsm versus. In this case, use it as hard coded right, but it shows where you could stick that variable in and use this for a a more repeatable, more broad more secure automated process.
102
00:14:10.520 --> 00:14:10.930
Philip Sellers: Yeah.
103
00:14:11.490 --> 00:14:24.320
Philip Sellers: yeah, it looks like in this example, there's still some manual stuff that we have to do once again created nothing going. We have to install some drivers. It has to know how to talk to the Nidia Gpu at the end of the day.
104
00:14:26.180 --> 00:14:33.039
Philip Sellers: So what else are we doing here? It looks like we. We install some drivers, and then what's next?
105
00:14:34.400 --> 00:14:41.120
Jirah Cox: We're installing drivers, verifying the output of that, of course, installing docker right? When I have a way to to run containerize applications.
106
00:14:41.350 --> 00:14:47.650
Jirah Cox: So installing docker, installing the Nvidia container toolkit. That shows that coming out of the Nvidia Github
107
00:14:47.820 --> 00:14:49.310
Jirah Cox: Repo, there.
108
00:14:50.400 --> 00:15:02.209
Philip Sellers: And that's that's something that's stands out to me like we're going after repos and pulling this stuff. We're not writing this stuff, but we're just tapping in and pulling it down. We're consuming it. At the end of the day.
109
00:15:02.610 --> 00:15:03.720
Jirah Cox: Yeah, totally true.
110
00:15:06.012 --> 00:15:06.785
Jirah Cox: The
111
00:15:08.860 --> 00:15:29.029
Jirah Cox: installing mini conda small, which I just checked out. They're good their repo as well to see what that is, Mini Conda is a free minimal installer for Conda, which is a Bootstrap version of Anaconda that includes Conda Python. Other things like that Pip. C. Lib, and a few others. So application, kind of framework runtime, to get python up and running in a kind of predictable state. There.
112
00:15:29.230 --> 00:15:29.950
Philip Sellers: Okay.
113
00:15:30.550 --> 00:15:32.100
Jirah Cox: Selling, many, many condo.
114
00:15:32.100 --> 00:15:34.194
Philip Sellers: What's Minicanda? So
115
00:15:36.560 --> 00:15:42.489
Philip Sellers: you know, as an infrastructure professional coming into this, there's so many different
116
00:15:43.242 --> 00:16:03.890
Philip Sellers: tools, libraries, things like that, just understanding all the different components that we're tapping into is sometimes a difficulty as you approach these projects. But I think, that's where a quick start like this definitely helps us to to see it and then start to build on what we've successfully done to do. The next thing.
117
00:16:04.080 --> 00:16:11.440
Jirah Cox: Totally. In some ways, I think you know, standing up AI or Gpt or Llm based solutions in a data center are.
118
00:16:11.894 --> 00:16:38.110
Jirah Cox: maybe the most hardware defined thing that some of us have done in 20 years. Right? We've been doing things in vms, right things where it's just like, define your resources in an abstract way of CPU memory storage and this is kind of getting back more, very close to the metal of like no, you. D! Thou shalt have this Gpu which needs this driver to talk to it. And here's how you how you stand that up. Kind of stuff much more prescriptive than some so than what some of us are very used to for our day jobs.
119
00:16:39.659 --> 00:17:06.910
Jirah Cox: So getting that the Trident Inference Service stood up itself again. Here again we see that that pulls down from Github. The inference server is, of course, like Nvidia's packaged way of running this kind of as a larger, like framework of like, how can I run models and Gpt's and Llms in my data center? I could run tens of thousands of them, and then I could do training. I could run one or 2, or 3, and then I could do inference, or run an app or do a proof of concept.
120
00:17:07.274 --> 00:17:11.960
Jirah Cox: So this, of course, is going to be a single node. But there's no reasons. Couldn't be scalable as well.
121
00:17:12.599 --> 00:17:17.809
Jirah Cox: So pretty simple. There, get clone, and then run a script that it provides there locally. There.
122
00:17:18.050 --> 00:17:18.359
Philip Sellers: Yeah.
123
00:17:18.369 --> 00:17:31.189
Jirah Cox: And then, of course, we launch it as a as a docker run right? So here we see that docker framework standing up the containerized micro service that it provides there to run the Triton server and expose that on a web server.
124
00:17:31.610 --> 00:17:55.840
Philip Sellers: Yeah. And so I, I think it's important. You know, as we talk about this, we're looking at Docker. This is a proof of concept. This is not a production scale thing. Likely the same container would run on a Kubernetes infrastructure at scale with other things configured. This is more of a Poc type proving ground where you can easily get things up and running. But this is a
125
00:17:55.910 --> 00:17:57.060
Philip Sellers: a single
126
00:17:57.090 --> 00:18:01.200
Philip Sellers: instance sort of thing, not not something that's ready for production.
127
00:18:03.690 --> 00:18:06.778
Philip Sellers: So we've got a talker container up.
128
00:18:07.740 --> 00:18:11.070
Philip Sellers: you know. Then we send an inference request.
129
00:18:12.790 --> 00:18:13.869
Philip Sellers: What is that.
130
00:18:15.120 --> 00:18:20.199
Jirah Cox: The inference request. I'm catching up with you here as well is
131
00:18:20.650 --> 00:18:27.379
Jirah Cox: actually passing. It looks like a Jpeg here that's asking it to to infer right
132
00:18:27.840 --> 00:18:50.040
Jirah Cox: spoiler inference. Server infers. What's in this picture? I just gave you right. And what's really neat is so of course this is. This is to your point, Philip. Very proof of concept. Ish very kind of not quite rudimentary, but very low level. Right? I give it an image. I get back text about what's in the image right? And like confidence scores. I even. I think that's what the numbers there imply
133
00:18:50.040 --> 00:18:58.790
Jirah Cox: coffee, mug, cup coffee pot, right? What was in what I just gave you? So image class image classification type activities here.
134
00:18:59.170 --> 00:19:01.650
Philip Sellers: So I'm scrolling back up a little bit so like
135
00:19:01.690 --> 00:19:03.140
Philip Sellers: that.
136
00:19:03.840 --> 00:19:09.500
Philip Sellers: That seems like magic to me right? I've just pulled a couple of things from Github. I've I've
137
00:19:09.640 --> 00:19:14.100
Philip Sellers: says spell. And suddenly it's giving me this text. It says it's coughed up.
138
00:19:14.460 --> 00:19:21.769
Philip Sellers: But that's not actually what's going on above. When we launch this, we and I think we, we
139
00:19:21.830 --> 00:19:36.440
Philip Sellers: may skipped over this, but we fetched a model. And and I think that's an important concept for people to talk about when we talk about AI is that there's a model that we pulled as well. That model was trained somewhere else.
140
00:19:36.670 --> 00:19:43.960
Philip Sellers: That model is being consumed. But we're benefiting from the knowledge of that model that
141
00:19:44.170 --> 00:19:47.451
Philip Sellers: that happened. That training happened somewhere else.
142
00:19:48.431 --> 00:19:51.199
Philip Sellers: and so it's not black magic.
143
00:19:51.390 --> 00:19:55.899
Philip Sellers: Somebody spent a lot of CPU cycles creating that model that we're now consuming.
144
00:19:56.090 --> 00:19:56.420
Harvey Green III: Yeah.
145
00:19:56.420 --> 00:19:57.150
Jirah Cox: Oh, wait!
146
00:19:57.150 --> 00:20:10.050
Harvey Green III: I kinda liken this to encyclopedias only because I I tell people all the time my kids will never know what it's like to pull an encyclopedia off the shelf and have to look something up.
147
00:20:10.808 --> 00:20:16.730
Harvey Green III: Th. This is like bringing in the encyclopedia and basically teaching
148
00:20:16.940 --> 00:20:23.119
Harvey Green III: what you needed to know in order to accomplish the things that you're planning on doing in the near future.
149
00:20:23.510 --> 00:20:24.130
Philip Sellers: Yeah.
150
00:20:24.680 --> 00:20:25.993
Jirah Cox: Totally the
151
00:20:27.430 --> 00:20:29.769
Jirah Cox: Yeah. Fully agree with you, Harvey. Like, if you're listening.
152
00:20:29.770 --> 00:20:31.840
Harvey Green III: May have dated myself there. No.
153
00:20:31.840 --> 00:20:32.370
Jirah Cox: Well, that?
154
00:20:32.750 --> 00:20:43.599
Jirah Cox: Well, the the phone, the phone version I'm doing to my kids lately, they're 8 and 11 is like, if we're driving along. I'll point to like a telephone pole and go kids. What do you think that is? And why does it exist? And what does it do.
155
00:20:43.770 --> 00:20:45.259
Harvey Green III: Right, God.
156
00:20:45.390 --> 00:20:49.660
Jirah Cox: It is just not part of their life in any degree whatsoever.
157
00:20:49.840 --> 00:20:50.240
Harvey Green III: Right.
158
00:20:50.640 --> 00:20:59.149
Philip Sellers: Not to completely take us in a rabbit hole, but we call them telephone calls, but they run electric wires. They don't run telephone wires for the most part anymore. Right?
159
00:20:59.180 --> 00:21:06.059
Philip Sellers: I mean, I I that's a complete. They're both looking at me.
160
00:21:07.520 --> 00:21:12.780
Jirah Cox: The I wouldn't say, Yeah, they're not used very much. But no, you're you're right, Harvey. If you're listening to this, like
161
00:21:12.830 --> 00:21:22.950
Jirah Cox: the vast majority of customers like, if you are a customer who as as part of running it, you consume power, you don't generate power.
162
00:21:22.960 --> 00:21:31.809
Jirah Cox: You're probably gonna consume AI models. You're not gonna train AI models. Right? Same kind of thing is, it is done elsewhere. And you get the benefit of it within your 4 walls.
163
00:21:31.960 --> 00:21:32.580
Jirah Cox: Yeah.
164
00:21:32.580 --> 00:21:39.529
Philip Sellers: So people talk about the energy consumption of AI, right? You know.
165
00:21:39.780 --> 00:21:41.920
Philip Sellers: that's not what this is.
166
00:21:41.930 --> 00:21:45.689
Philip Sellers: That's the other thing. That's the training. That's where
167
00:21:45.750 --> 00:21:52.229
Philip Sellers: these models take enormous amounts of power and processing to to be created.
168
00:21:52.510 --> 00:22:00.450
Jirah Cox: And maybe literally dozens of thousands of Gpus, all working in parallel for weeks or months on end. To do that training, activity.
169
00:22:00.630 --> 00:22:04.555
Philip Sellers: Yeah. And and I think that's a a really important call out,
170
00:22:05.180 --> 00:22:09.440
Philip Sellers: because, you know, it's not just happening in the box. But
171
00:22:09.610 --> 00:22:15.829
Philip Sellers: there are things certainly happening in the box that we're talking about. It's just not training the model.
172
00:22:17.483 --> 00:22:18.130
Philip Sellers: So
173
00:22:18.620 --> 00:22:34.419
Philip Sellers: then we close this out. This example of the coffee cup, and then we move on to another example loading a hugging face model in trident. So now we're gonna use another model loaded into the trident engine. What? What comes next?
174
00:22:35.500 --> 00:23:00.159
Jirah Cox: Yeah. So now that we've sort of tested like we, we have proof of life. Right? We've done like a hello world like that's neat. I can do that one time as a command line. Standing it up. What can I do? That's more business useful. So this shows deploying hugging face itself as a model into trident, right? So there's gonna be one thing now that the trident server like vens to the business or for our Dev teams to go consume
175
00:23:02.350 --> 00:23:10.360
Jirah Cox: So we have the text block here that by a python stands up and deploys a hugging face into
176
00:23:10.724 --> 00:23:14.139
Jirah Cox: this is, of course, using that conda that we previously installed there
177
00:23:14.390 --> 00:23:20.699
Jirah Cox: that ultimately gives me a python based pipeline that I can then use for.
178
00:23:21.220 --> 00:23:34.249
Jirah Cox: This is showing an example where it actually does it via text, right? So like, give me some text generation here, where, if I provide it with like an AI, you know. Usually the word is prompt of my favorite food is.
179
00:23:34.450 --> 00:23:35.570
Jirah Cox: and just let it
180
00:23:35.620 --> 00:23:50.302
Jirah Cox: riff right? What does it think like a lot of take a a guess with 0 data whatsoever what your favorite food is. And we see here the sample return is like iced coffee and chocolate, which actually, it's we're so far pretty pretty accurate. This all sounds amazing.
181
00:23:51.256 --> 00:24:11.313
Jirah Cox: and almond sugar, I mean, yeah, that's pretty neat. So so this one is showing a text based. Use case of when I pass it. Prompt finish the prompt for me. Right? So you can picture that working with Kv articles, or, you know, user provided questions or answers, or what's the way to configure this or whatever? What's our sop for a given activity?
182
00:24:18.795 --> 00:24:19.170
Harvey Green III: Stuff.
183
00:24:20.340 --> 00:24:21.739
Jirah Cox: So then, beyond that.
184
00:24:21.780 --> 00:24:27.772
Jirah Cox: this is showing ways to store multiple models right? Like hugging face like others
185
00:24:28.360 --> 00:24:30.360
Jirah Cox: elsewhere. For
186
00:24:30.540 --> 00:24:31.730
Jirah Cox: the
187
00:24:32.010 --> 00:24:35.599
Jirah Cox: inference engine to then go consume. Right? So this is where
188
00:24:36.100 --> 00:24:59.023
Jirah Cox: we've said it before on the podcast, but worth it worth reiterating. When you run things in, say, like a container has environment. The containers themselves are, of course, fully immutable and don't store data within them. So you'd be putting data that you want to infer on or storing models somewhere else in some other directory. Right? You can do it locally on the Vm. You can do it on an Nas. Nfs. Smb, you can do it on
189
00:24:59.490 --> 00:25:06.999
Jirah Cox: You could do it on other storage. You could do it on block storage. Lots of ways you could solve for that. So this is showing other ways I can configure a repository,
190
00:25:07.050 --> 00:25:16.980
Jirah Cox: and how do I set up my directory structure there, and my indexing and naming, or nomenclature, to get that aligned the way that the application expects.
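For readers following along, here's a minimal sketch of the model-repository layout and naming convention Triton expects, built with pathlib so the structure is explicit. The model name "text_generation" is an illustrative assumption, not necessarily what the blog post uses.

```python
import pathlib
import tempfile

# Build the layout in a throwaway directory so the sketch is safe to run anywhere.
repo = pathlib.Path(tempfile.mkdtemp()) / "model_repository"

# Triton's convention: <repository>/<model_name>/<version>/model.py, plus a
# config.pbtxt per model that names the backend, inputs, and outputs.
model_dir = repo / "text_generation" / "1"
model_dir.mkdir(parents=True)
(model_dir / "model.py").write_text("# Python backend model code goes here\n")
(repo / "text_generation" / "config.pbtxt").write_text(
    'name: "text_generation"\nbackend: "python"\n'
)

# Triton is then pointed at the repository root, e.g.:
#   tritonserver --model-repository=/path/to/model_repository
for path in sorted(repo.rglob("*")):
    print(path.relative_to(repo))
```

The same layout works whether the repository lives on the VM's local disk, on NFS/SMB, or on block storage, as discussed above.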
191
00:25:18.186 --> 00:25:23.049
Jirah Cox: And then finally, our model code, right, to say, how do I run different models there
192
00:25:24.230 --> 00:25:34.290
Jirah Cox: within the inference engine. It talks about importing required libraries, right? Triton's Python backend utils, which is required for every Triton Python model.
193
00:25:34.830 --> 00:25:44.676
Jirah Cox: It calls out numpy, a specific Python library there that does multi-dimensional array computing, and then transformers, that's the library that Hugging Face provides,
194
00:25:45.120 --> 00:25:50.190
Jirah Cox: and it also supports, you know, downloading and training models itself. Right? A transformer is like one form of
195
00:25:50.200 --> 00:25:54.759
Jirah Cox: an LLM, usually text-based, I think, from my understanding.
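To make this concrete for readers, here's a sketch of how those libraries come together in a Triton Python-backend model.py. The tensor names ("text_input", "text_output") and the gpt2 model choice are illustrative assumptions, not necessarily what the blog uses, and triton_python_backend_utils is only available inside the Triton container, so the imports are guarded here.

```python
try:
    import numpy as np
    import triton_python_backend_utils as pb_utils  # required for every Triton Python model
except ImportError:
    np = pb_utils = None  # outside the Triton container, treat this as a sketch only


class TritonPythonModel:
    """Triton loads a class with this exact name from model.py."""

    def initialize(self, args):
        # Load the Hugging Face transformers pipeline when Triton actually
        # loads the model into the inference engine.
        from transformers import pipeline
        self.generator = pipeline("text-generation", model="gpt2")

    def execute(self, requests):
        # Each request carries input tensors; return one response per request.
        responses = []
        for request in requests:
            prompt_bytes = pb_utils.get_input_tensor_by_name(request, "text_input").as_numpy()[0]
            text = self.generator(prompt_bytes.decode("utf-8"), max_length=30)[0]["generated_text"]
            out = pb_utils.Tensor("text_output", np.array([text.encode("utf-8")], dtype=np.object_))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

This file is what would sit in the version directory of the model repository discussed above.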
196
00:25:58.500 --> 00:26:05.421
Philip Sellers: A lot of the code that we're looking at on screen, and I know that folks listening can't really see this, but
197
00:26:05.710 --> 00:26:06.979
Harvey Green III: I'll read it to y'all.
198
00:26:10.682 --> 00:26:12.800
Philip Sellers: I was gonna say, it is.
199
00:26:12.800 --> 00:26:14.770
Jirah Cox: Hard on episode length. Yeah.
200
00:26:15.720 --> 00:26:19.619
Philip Sellers: Pretty human-readable, though, I mean, you know, it's
201
00:26:19.770 --> 00:26:21.850
Philip Sellers: it's approachable.
202
00:26:22.060 --> 00:26:23.659
Philip Sellers: Yes, and yes.
203
00:26:23.660 --> 00:26:31.849
Jirah Cox: What this is translating is: what we did before was like a single Linux command line of, look at this JPEG, tell me what's in it, and coffee cup and whatnot.
204
00:26:32.090 --> 00:26:48.250
Jirah Cox: Let's evolve that into: what can I consume, you know, securely over, say, a web call, right? Or an API call, more scalable for the enterprise. Doesn't need to be at the command line of the Linux VM, but ultimately kind of the same output, right? I wanna pass an input to an LLM and get the response back out of it.
205
00:26:48.570 --> 00:27:11.990
Philip Sellers: As a consumer. So, you know, IT professionals, infrastructure professionals, we all consume applications all day long, whether it's SaaS for our, I don't know, time cards or our payroll, or, you know, ServiceNow for our IT tickets and things like that. We're all consumers. So what actually is being done here,
206
00:27:12.130 --> 00:27:30.449
Philip Sellers: to Jirah's point, is more of an API-call type thing. A lot of these engines and services, they're exposed as API calls so that other services can interact with them. And I think that's another important thing: the power of this is underneath the covers.
207
00:27:30.450 --> 00:27:44.420
Philip Sellers: But there has to be that interface. And it's not gonna be just a generic web page like we're accustomed to coming to, you know, to interact and do things, though a web page may then talk to this on the back end. But a lot of these
208
00:27:44.440 --> 00:27:56.779
Philip Sellers: models then get used and consumed through other web services, through the API. So yeah, I mean, all this stuff sounds neat. And I think we
209
00:27:57.000 --> 00:28:05.235
Philip Sellers: may incorrectly think that this is ChatGPT, and there's gonna be a nice web interface, and we type in a box, and off we go.
210
00:28:05.960 --> 00:28:10.709
Philip Sellers: That's just one way of interacting with this, and more than likely
211
00:28:10.820 --> 00:28:16.580
Philip Sellers: in the enterprise, it'll use an API from other places to actually talk to
212
00:28:16.790 --> 00:28:22.970
Philip Sellers: what we're doing with these inference servers and services that we deploy in the enterprise.
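As a rough illustration of the API call being described, here's a sketch of the KServe v2-style HTTP request that Triton exposes. The hostname, port, model name, and tensor name are all illustrative assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

# KServe v2-style inference request body. "text_input" must match whatever the
# model's config.pbtxt declares; BYTES is the datatype Triton uses for strings.
payload = {
    "inputs": [
        {
            "name": "text_input",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My favorite food is"],
        }
    ]
}

# Construct the POST request a consuming service would send to the server.
req = urllib.request.Request(
    "http://triton.example.local:8000/v2/models/text_generation/infer",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# with urllib.request.urlopen(req) as resp:          # would hit a live server
#     print(json.load(resp)["outputs"][0]["data"])   # the generated text
```

Any service that can make an HTTP call, a ticketing system, a chat bot, another app, can consume the model this way, which is the point being made here.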
213
00:28:23.200 --> 00:28:23.860
Harvey Green III: Yes, we.
214
00:28:23.860 --> 00:28:24.760
Jirah Cox: Yeah, and I can picture.
215
00:28:24.760 --> 00:28:28.685
Harvey Green III: And there's a command-line option, and there's a GUI option.
216
00:28:29.340 --> 00:28:46.980
Jirah Cox: Yeah, I can picture most of our customers, right? If we walked in with this tomorrow, you know, the first question we'd get is: can you make this connect to Teams or Slack or whatnot? So, you know, not a web server, but put it in the application that I'm already using. So that kind of connectivity, right, is usually that last-mile dial tone, if you will.
217
00:28:47.750 --> 00:28:52.210
Philip Sellers: And I think that's the key takeaway. You know,
218
00:28:53.090 --> 00:28:58.629
Philip Sellers: we have to think about not only what happens in the processing, but how do we interface
219
00:28:58.870 --> 00:29:02.510
Philip Sellers: with these kinds of services moving forward? And I
220
00:29:02.640 --> 00:29:19.450
Philip Sellers: think you hit the nail on the head. We want to consume it where we already are, because I need another channel to check like I need a hole in my head. I've already got Teams and Slack and email and text messages and everything else. We just wanna consume it where we already are.
221
00:29:22.600 --> 00:29:26.650
Philip Sellers: So this is a great model and a great example.
222
00:29:27.120 --> 00:29:34.830
Philip Sellers: This is inference-focused. This is GPT, it's that generative AI sort of stuff.
223
00:29:36.440 --> 00:29:47.152
Philip Sellers: Through the examples we've looked at, you know, it's taking image data or things like that, helping us understand it, pull context out of these data sets.
224
00:29:47.860 --> 00:29:48.750
Philip Sellers: what?
225
00:29:49.410 --> 00:29:51.780
Philip Sellers: I guess. What are?
226
00:29:53.580 --> 00:29:57.209
Philip Sellers: I guess, taking something like this, what would be the next step
227
00:29:57.490 --> 00:30:02.250
Philip Sellers: to take it out of proof of concept and move forward for the enterprise?
228
00:30:03.540 --> 00:30:19.170
Jirah Cox: I would say, look, having kind of read through the blog post, as we've described it to our audio-only listeners, this one's one that's worth clicking through into, right? Find this article, you know, Running NVIDIA Triton Inference Server on Nutanix. It should be the first Google hit
229
00:30:19.460 --> 00:30:25.950
Jirah Cox: for that. But when you do, this is actually really accessible. It's easy to get up and running in a lab environment, in a home-lab environment.
230
00:30:25.990 --> 00:30:27.670
Jirah Cox: You're looking at less than
231
00:30:28.040 --> 00:30:43.044
Jirah Cox: 150-ish lines of text, maybe under 100. All of it's, you know, thoroughly copy-pastable, so easy to grasp. A really good way to cut your teeth on this. Beyond that, I think for most IT professionals, it's
232
00:30:44.394 --> 00:31:05.729
Jirah Cox: partnering with the business, right, to understand: where are we looking to use this? Where do we want to consume, you know, full GPT LLMs in the cloud, where there's not sensitive data in play? Where do we need to have a strategy around private, performant AI that lives only behind our firewall, where we do want to have it touching data we don't want leaving our four walls?
233
00:31:05.980 --> 00:31:06.800
Jirah Cox: Yeah.
234
00:31:07.450 --> 00:31:10.419
Philip Sellers: And then, you know, is it consumable,
235
00:31:10.620 --> 00:31:14.800
Philip Sellers: you know, or do we need to partner with a 3rd party?
236
00:31:15.190 --> 00:31:19.260
Philip Sellers: I would add that one to it. So there's lots of different
237
00:31:20.650 --> 00:31:48.239
Philip Sellers: technologies and ISV software out there. So if you're trying to do something with video, maybe, you know, safety is your core use case there, and you're trying to spot bad things in video feeds. An inferencing engine like this could play a huge role in that, but it may need layers of software developed on top of it to be able to do what you're looking to solve for. So the third-party market's probably
238
00:31:48.400 --> 00:31:49.990
Philip Sellers: a big
239
00:31:50.000 --> 00:31:53.200
Philip Sellers: component in a lot of these cases, too.
240
00:31:55.650 --> 00:32:09.209
Philip Sellers: Closing thoughts, guys? I mean, this is interesting. It's cool to geek out and kind of talk through this. It's really cool that we have the capability, I think, more than anything for me. What's your key takeaways?
241
00:32:11.650 --> 00:32:21.749
Jirah Cox: Yeah, I think it's really neat how democratizing the technology is, right? Like, never before have you been able to do so much with, kind of, so little. Like, I just have an old GPU laying around, I can go play with this tomorrow.
242
00:32:21.970 --> 00:32:22.600
Philip Sellers: Yeah.
243
00:32:24.090 --> 00:32:31.699
Harvey Green III: Yeah. And I think being able to have a model like this that's approachable on a platform you're already using.
244
00:32:32.055 --> 00:32:46.260
Harvey Green III: It's just something you can go in... for the audio-only people, they at least have to come in here and look at some code, since Phil and Jirah wouldn't let me read it. But it's very approachable. It's something that
245
00:32:46.350 --> 00:32:50.704
Harvey Green III: you can do and actually see the end of it.
246
00:32:51.610 --> 00:33:00.869
Harvey Green III: you know, within a very short period of time. Phil talked about, we did it, you know, in a hands-on lab. So, you know,
247
00:33:00.970 --> 00:33:08.340
Harvey Green III: you can see the end and see the results at the end of it within a day, in some cases. So
248
00:33:08.530 --> 00:33:11.140
Harvey Green III: you definitely have the
249
00:33:11.420 --> 00:33:13.559
Harvey Green III: the power and the ability
250
00:33:13.891 --> 00:33:17.938
Harvey Green III: to get something like this going internally. Take a look at it, and
251
00:33:18.440 --> 00:33:20.811
Harvey Green III: pull it all from Hugging Face.
252
00:33:21.564 --> 00:33:38.965
Philip Sellers: Right, that's a great way to bring it back around. And now, for all of our sleep-challenged listeners, we're gonna take you to the next section of our podcast episode, and Harvey's gonna read you code. So...
253
00:33:39.380 --> 00:33:41.290
Jirah Cox: Better than medication. That's our tagline.
254
00:33:41.290 --> 00:33:48.171
Harvey Green III: I'm gonna need you to help me with some of these words I can't pronounce.
255
00:33:48.880 --> 00:34:11.060
Philip Sellers: Well, guys, as always, appreciate you joining us, talking through capabilities in the Nutanix platform. And, you know, at the end of the day, it is a platform, right? All of these capabilities, all in the same style of operating, heavily orchestrated, heavily workflowed. All the goodness that we talk about from Nutanix.
256
00:34:11.927 --> 00:34:15.089
Philip Sellers: Jirah, Harvey, thanks so much for spending some time with us.
257
00:34:15.860 --> 00:34:16.630
Jirah Cox: Likewise, y'all.
258
00:34:17.359 --> 00:34:27.069
Philip Sellers: And for everybody listening, hopefully you'll join us back again very soon for another episode. And in the meantime, we'll see you
259
00:34:27.655 --> 00:34:29.239
Philip Sellers: on that next episode.