A friendly guide to containerization for AI work

Hands on One of the biggest headaches associated with AI workloads is wrangling all of the drivers, runtimes, libraries, and other dependencies they need to run.

This is especially true for hardware-accelerated tasks, where if you've got the wrong version of CUDA, ROCm, or PyTorch, there's a good chance you'll be left scratching your head while staring at an error.

If that weren't bad enough, some AI projects and apps may have conflicting dependencies, while different operating systems may not support the packages you need. By containerizing these environments, however, we can avoid a lot of this mess by building images that have been configured specifically for a task and - perhaps more importantly - can be deployed in a consistent and repeatable manner each time.

And because containers are largely isolated from one another, you can usually run apps with conflicting software stacks side by side. For example, you can have two containers, one with CUDA 11 and the other with CUDA 12, running at the same time.

This is one of the reasons chipmakers often make containerized versions of their accelerated-computing software libraries available to users, since it offers a consistent starting point for development.

Exposing Intel and AMD GPUs to Docker

Unlike with virtual machines, you can pass your GPU through to as many containers as you like and, so long as you don't exceed the available vRAM, you shouldn't have an issue.

For those with Intel or AMD GPUs, the process couldn't be simpler: it just involves passing the right flags when spinning up your container.

For example, let's say we want to make your Intel GPU available to an Ubuntu 22.04 container. You'd append --device /dev/dri to the docker run command. Assuming you're on a bare metal system with an Intel GPU, you'd run something like the following (an illustrative command using the stock ubuntu:22.04 image):
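# pass the host's DRM render nodes (your Intel GPU) into a throwaway Ubuntu 22.04 container
sudo docker run -it --rm --device /dev/dri ubuntu:22.04 /bin/bash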

Meanwhile, for AMD GPUs you'd append --device /dev/kfd, typically alongside --device /dev/dri. For example:
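# expose AMD's kernel fusion driver and render nodes to the container
sudo docker run -it --rm --device /dev/kfd --device /dev/dri ubuntu:22.04 /bin/bash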

Note: Depending on your system you'll probably need to run this command with elevated privileges using sudo docker run or in some cases doas docker run.

Exposing Nvidia GPUs to Docker

If you happen to be running one of Team Green's cards, you'll need to install the Nvidia Container Toolkit before you can expose it to your Docker containers.

To get started, we'll add the software repository for the toolkit to our sources list and refresh Apt. (You can see Nvidia's docs for instructions on installing on RHEL and SUSE-based distros here.)
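On Debian and Ubuntu-based systems, that looks roughly like the following - these commands are adapted from Nvidia's install guide at the time of writing, so double-check the docs for the current repository details:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update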

Now we can install the container runtime and configure Docker to use it.
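On Apt-based distros, something like this should do the trick:

sudo apt install -y nvidia-container-toolkit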

With the container toolkit installed, we just need to tell Docker to use the Nvidia runtime by editing the /etc/docker/daemon.json file. To do this, we can simply execute the following:
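# writes the Nvidia runtime entry into /etc/docker/daemon.json for you
sudo nvidia-ctk runtime configure --runtime=docker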

The last step is to restart the docker daemon and test that everything is working by launching a container with the --gpus=all flag.
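For example, on a systemd-based distro and using the stock Ubuntu image:

sudo systemctl restart docker
sudo docker run -it --rm --gpus=all ubuntu:22.04 /bin/bash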

Note: If you have multiple GPUs, you can limit how many are exposed with --gpus=1, or pick specific ones with --gpus '"device=1,3,4"'.

Inside the container, you can then run nvidia-smi and you should see your GPU (or GPUs) listed in the output.

Using Docker containers as dev environments

One of the most useful applications of Docker containers when working with AI software libraries and models is as a development environment. This is because you can spin up as many containers as you need and tear them down when you're done without worrying about borking your system.

Now, you can just spin up a base image of your distro of choice, expose your GPU to it, and start installing CUDA, ROCm, PyTorch, or TensorFlow. For example, to create a basic GPU-accelerated Ubuntu container you'd run something like the following (remember to change the --gpus or --device flag appropriately) to create and then access the container.
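# create a detached Ubuntu 22.04 container named GPUtainer with the GPU exposed (Nvidia flags shown here)
sudo docker run -itd --gpus=all --name GPUtainer ubuntu:22.04
# then drop into a shell inside it
sudo docker exec -it GPUtainer /bin/bash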

This will create a new Ubuntu 22.04 container named GPUtainer with your GPU exposed to it, and then drop you into a shell inside the running container.

Using prebuilt images

While building up a container from scratch with CUDA, ROCm, or OpenVINO can be useful at times, it's also rather tedious and time-consuming, especially when there are prebuilt images out there that'll do most of the work for you.

For example, if we want to get a basic CUDA 12.5 environment up and running, we can use an nvidia/cuda image as a starting point. To test it, run something like the following (the exact tag is just one of many available on Docker Hub):
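sudo docker run -it --rm --gpus=all nvidia/cuda:12.5.0-devel-ubuntu22.04 /bin/bash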

Or, if you've got an AMD card, we can use one of the ROCm images, like this rocm/dev-ubuntu-22.04 one:
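# AMD cards take the kfd and dri devices rather than the --gpus flag
sudo docker run -it --rm --device /dev/kfd --device /dev/dri rocm/dev-ubuntu-22.04 /bin/bash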

Meanwhile, owners of Intel GPUs should be able to create a similar environment using this OpenVINO image.
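Something along these lines ought to work - note that the openvino/ubuntu22_dev image name below is an assumption, so check Docker Hub for the current OpenVINO tags:

sudo docker run -it --rm --device /dev/dri openvino/ubuntu22_dev:latest /bin/bash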

Converting your containers into images

By design, Docker containers are largely ephemeral in nature, which means that changes to them won't be preserved if, for example, you were to delete the container or update the image. However, we can save any changes by committing them to a new image.

To commit changes made to the CUDA dev environment we created in the last step, we'd run the following to create a new image called "cudaimage" (substitute your own container's name or ID):
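# CONTAINER_NAME_OR_ID is whatever docker ps -a shows for your dev container
sudo docker commit CONTAINER_NAME_OR_ID cudaimage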

We could then spin up a new container based on it by running:
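# launch a throwaway container from the freshly committed image
sudo docker run -it --rm --gpus=all cudaimage /bin/bash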

Building custom images

Converting existing containers into reproducible images can be helpful for creating checkpoints and testing out changes. But, if you plan to share your images, it's generally best practice to show your work in the form of a dockerfile.

This file is essentially just a list of instructions that tells Docker how to turn an existing image into a custom one. As with much of this tutorial, if you're at all familiar with Docker or the docker build command, most of this should be self-explanatory.

For those new to generating Docker images, we'll go through a simple example using this AI weather app we kludged together in Python. It uses Microsoft's Phi-3-instruct LLM to generate a human-readable report, in the tone of a TV weather personality, from stats gathered from OpenWeatherMap every 15 minutes.

import json
import time
from typing import Dict, Any

import requests
import torch
from transformers import pipeline, BitsAndBytesConfig

# Constants
ZIP_CODE = YOUR_ZIP_CODE
API_KEY = "YOUR_OPEN_WEATHER_MAP_API_KEY"  # Replace with your OpenWeatherMap API key
WEATHER_URL = f"http://api.openweathermap.org/data/2.5/weather?zip={ZIP_CODE}&appid={API_KEY}"
UPDATE_INTERVAL = 900  # seconds

# Initialize the text generation pipeline
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
pipe = pipeline("text-generation", "microsoft/Phi-3-mini-4k-instruct", device_map="auto", model_kwargs={"quantization_config": quantization_config})

def kelvin_to_fahrenheit(kelvin: float) -> float:
    """Convert Kelvin to Fahrenheit."""
    return (kelvin - 273.15) * 9/5 + 32

def get_weather_data() -> Dict[str, Any]:
    """Fetch weather data from OpenWeatherMap API."""
    response = requests.get(WEATHER_URL)
    response.raise_for_status()
    return response.json()

def format_weather_report(weather_data: Dict[str, Any]) -> str:
    """Format weather data into a report string."""
    main_weather = weather_data['main']
    location = weather_data['name']
    conditions = weather_data['weather'][0]['description']
    temperature = kelvin_to_fahrenheit(main_weather['temp'])
    humidity = main_weather['humidity']
    wind_speed = weather_data['wind']['speed']

    return (f"The time is: {time.strftime('%H:%M')}, "
            f"location: {location}, "
            f"Conditions: {conditions}, "
            f"Temperature: {temperature:.2f}°F, "
            f"Humidity: {humidity}%, "
            f"Wind Speed: {wind_speed} m/s")

def generate_weather_report(weather_report: str) -> str:
    """Generate a weather report using the text generation pipeline."""
    chat = [
        {"role": "assistant", "content": "You are a friendly weather reporter that takes weather data and turns it into short reports. Keep these short, to the point, and in the tone of a TV weather man or woman. Be sure to inject some humor into each report too. Only use units that are standard in the United States. Always begin every report with 'in (location) the time is'"},
        {"role": "user", "content": f"Today's weather data is {weather_report}"}
    ]
    response = pipe(chat, max_new_tokens=512)
    return response[0]['generated_text'][-1]['content']

def main():
    """Main function to run the weather reporting loop."""
    try:
        while True:
            try:
                weather_data = get_weather_data()
                weather_report = format_weather_report(weather_data)
                generated_report = generate_weather_report(weather_report)
                print(generated_report)
            except requests.RequestException as e:
                print(f"Error fetching weather data: {e}")
            except Exception as e:
                print(f"An unexpected error occurred: {e}")

            time.sleep(UPDATE_INTERVAL)
    except KeyboardInterrupt:
        print("\nWeather reporting stopped.")

if __name__ == "__main__":
    main()

Note: If you are following along, be sure to set your zip code and Open Weather Map API key appropriately.

If you're curious, the app works by passing the weather data and instructions to the LLM via the Transformers pipeline module, which you can learn more about here.

On its own, the app is already fairly portable with minimal dependencies. However, it still relies on the CUDA runtime being installed correctly, something we can make easier to manage by containerizing the app.

To start, create a new directory containing an empty dockerfile alongside the weather_app.py Python script above. Inside the dockerfile we'll define which base image we want to start with, as well as the working directory we'd like to use.
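For example - the CUDA base image tag and /app directory here are just reasonable choices, not the only ones:

FROM nvidia/cuda:12.5.0-devel-ubuntu22.04
WORKDIR /app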

Below this, we'll tell the Dockerfile to copy the weather_app.py script to the working directory.
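# copy the script into the image's working directory
COPY weather_app.py .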

From here, we simply need to tell it what commands it should RUN to set up the container and install any dependencies. In this case, we just need a few Python modules, as well as the latest release of PyTorch for our GPU.
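Here's one way to do it, assuming the app's handful of dependencies - requests, transformers, accelerate, and bitsandbytes - plus PyTorch itself:

RUN apt update && apt install -y python3 python3-pip
RUN pip3 install torch requests transformers accelerate bitsandbytes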

Finally, we'll set the CMD to the command or executable we want the container to run when it's first started. With that, our dockerfile is complete and should look like this:
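Putting it all together - again, the base image tag and package list are illustrative:

FROM nvidia/cuda:12.5.0-devel-ubuntu22.04
WORKDIR /app
COPY weather_app.py .
RUN apt update && apt install -y python3 python3-pip
RUN pip3 install torch requests transformers accelerate bitsandbytes
CMD ["python3", "weather_app.py"]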

Now all we have to do is convert the dockerfile into a new image by running the following, and then sit back and wait.
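# build an image from the dockerfile in the current directory; the aiweather tag is arbitrary
sudo docker build -t aiweather .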

After a few minutes, the image should be complete and we can use it to spin up our container in interactive mode. Note: Remove the --rm bit if you don't want the container to destroy itself when stopped.
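sudo docker run -it --rm --gpus=all aiweather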

After a few seconds the container will launch, download Phi-3 from Hugging Face, quantize it to 4-bit precision, and present our first weather report.

Naturally, this is an intentionally simple example, but hopefully it illustrates how containerization can be used to make running AI apps easier to build and deploy. We recommend taking a look at Docker's documentation here, if you need anything more intricate.

What about NIMs?

As with any other app, containerizing your AI projects has a number of advantages beyond just making them more reproducible and easier to deploy at scale: it also allows models to be shipped alongside configurations optimized for specific use cases or hardware.

This is the idea behind Nvidia Inference Microservices - NIMs for short - which we looked at back at GTC this spring. These NIMs are really just containers built by Nvidia with specific versions of software, such as CUDA, Triton Inference Server, or TensorRT-LLM, that have been tuned to achieve the best possible performance on its hardware.

And since they're built by Nvidia, every time the GPU giant releases an update to one of its services that unlocks new features or higher performance on new or existing hardware, users will be able to take advantage of these improvements simply by pulling down a new NIM image. Or that's the idea anyway.

Over the next couple of weeks, Nvidia is expected to make its NIMs available for free via its developer program for research and testing purposes. But before you get too excited, if you want to deploy them in production you're still going to need an AI Enterprise license, which will set you back $4,500/year per GPU or $1/hour per GPU in the cloud.

We plan to take a closer look at Nvidia's NIMs in the near future. But, if an AI enterprise license isn't in your budget, there's nothing stopping you from building your own optimized images, as we've shown in this tutorial. ®

Editor's Note: Nvidia provided The Register with an RTX 6000 Ada Generation graphics card to support this story and others like it. Nvidia had no input as to the contents of this article.
