DGX Workstation Setup Tips and Tricks
The NVIDIA DGX Workstation is a high-performance AI workstation that enables your Data Science team to get started quickly with the power of a data center in your office. With 4 NVIDIA V100 32GB GPUs and NVIDIA Docker preloaded, there is plenty of computing power to allow multiple users to run their most challenging AI training projects. Additionally, the DGX Workstation allows you to download preconfigured and optimized Docker containers from the NVIDIA GPU Cloud to further decrease the time it takes to get started. Our customers use their DGX Workstations for projects involving Natural Language Processing, classification with algorithms such as XGBoost, image analysis of x-rays and MRIs, and predictive analytics.
The setup for the DGX Workstation (DGX-WS) is a quick process, typically less than 30 minutes from power on until your first nvidia-docker run command. As you do your install, here are a few tips that can ensure the process is as smooth as possible. These tips come from our experience with installing and configuring DGX-WS.
1. The Setup Guide says the ethernet ports are configured for DHCP by default. We have found this is typically not the case; instead, they are set to “manual”. Be sure to edit the /etc/network/interfaces file (or configure the ports using the GUI) and set the ports to either DHCP or a static IP address. As with any server, the best practice is a static IP. Timesaving tip: Make sure you have a monitor, keyboard, and mouse handy when first booting so you can configure the interfaces. After that, everything can be done with ssh.
2. Sometimes the DGX-WS will ship with nvidia-docker v1. If so, it needs to be upgraded to nvidia-docker v2. You can check your nvidia-docker version using:
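To see how each port is currently configured before you edit anything, a small shell helper along these lines can list every interface stanza and its method. This is only a minimal sketch: the function name list_iface_methods is our own, and the interface name and addresses in the comment are placeholders, not values from an actual DGX-WS.

```shell
# List each interface stanza and its configured method (dhcp, static, manual, ...).
# Usage: list_iface_methods [path]   (defaults to /etc/network/interfaces)
list_iface_methods() {
    local file="${1:-/etc/network/interfaces}"
    # Stanza lines have the form: iface <name> <family> <method>
    awk '$1 == "iface" { print $2, $4 }' "$file"
}

# A static stanza might look like the following. The interface name
# (enp1s0f0) and the 192.0.2.x addresses are placeholders only:
#
#   auto enp1s0f0
#   iface enp1s0f0 inet static
#       address 192.0.2.10
#       netmask 255.255.255.0
#       gateway 192.0.2.1
```

Running list_iface_methods with no argument reads /etc/network/interfaces. Any port that shows up as “manual” still needs to be switched to dhcp or static.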
nvidia-docker version
If the command returns 2.0.x, then your system already contains the upgrade to the NVIDIA Container Runtime for Docker and no further action is needed. If it does not return 2.0.x, then perform the steps listed here.
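If you want to script this check, something like the following could work. It is a sketch that assumes the version output contains a line such as "NVIDIA Docker: 2.0.3"; that format is an assumption, so compare it with what your system actually prints. The function name is_nvidia_docker_v2 is ours.

```shell
# Succeed (exit status 0) when the given version text reports nvidia-docker 2.0.x.
# ASSUMPTION: the output contains a line like "NVIDIA Docker: 2.0.3".
is_nvidia_docker_v2() {
    printf '%s\n' "$1" | grep -Eq '^NVIDIA Docker: 2\.0\.'
}
```

For example, is_nvidia_docker_v2 "$(nvidia-docker version 2>/dev/null)" && echo "already on v2" || echo "upgrade needed" prints which case applies.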
3. One of the most “creative” errors can occur when pulling new Docker containers from NGC (the example below uses the PyTorch container, but any other container can trigger it):
user@DGX-WS:~$ docker pull nvcr.io/nvidia/pytorch:20.01-py3
Error response from daemon: Get https://nvcr.io/v2/: dial tcp: lookup nvcr.io on 127.0.0.53:53: server misbehaving
“server misbehaving”: someone had fun with that error. It is typically caused by a DNS issue: the lookup of nvcr.io failed at the local resolver (127.0.0.53 in the message above). Make sure the DNS servers are correctly configured, and the problem should go away. Setting the DNS server to 8.8.8.8 is the most common fix.
4. Finally, make sure you remove the piece of foam inside the DGX-WS that keeps the GPUs from moving during shipping. It’s easy to overlook this step!
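Before changing anything, it can help to confirm which DNS servers the system is using and whether nvcr.io resolves at all. The helpers below are a rough sketch; list_nameservers and can_resolve are hypothetical names of our own. Note that 127.0.0.53 is the local systemd-resolved stub, so on such systems the upstream servers are more likely to show up in the output of systemd-resolve --status than in /etc/resolv.conf.

```shell
# Print the nameserver addresses found in a resolv.conf-style file.
# Usage: list_nameservers [path]   (defaults to /etc/resolv.conf)
list_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    awk '$1 == "nameserver" { print $2 }' "$file"
}

# Succeed (exit status 0) when the host name resolves through the
# system resolver, fail otherwise.
can_resolve() {
    getent hosts "$1" > /dev/null 2>&1
}
```

After updating the DNS settings, can_resolve nvcr.io && echo "DNS OK" || echo "DNS still failing" gives a quick yes or no before you retry the docker pull.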