Building an ASL Vision AI Model at Supercomputing 2019
I attended the Supercomputing 2019 (SC19) conference in Denver. SC is a very large conference that highlights some of the most cutting-edge work in high-performance computing (HPC), AI, computational fluid dynamics, and other fields that require massive computation. Groups such as NASA, the national laboratories (including Sandia, Oak Ridge, and Lawrence Livermore), and major research universities all showcased their work. For someone who loves to see how far computing can be pushed, this is the place to be.
At the conference, I presented my software that uses AI to transcribe American Sign Language (ASL) in real time as a person signs in front of a camera. As letters of the alphabet are signed, they are written across the screen so they can be read. The ultimate goal of the project is to develop both phone and web apps that allow someone who does not know ASL to understand what an ASL signer is signing.
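The post stays high level, but for readers curious about the mechanics, the sketch below shows the general shape of a real-time recognition loop: grab a frame from the camera, classify the hand sign, and append the predicted letter to an on-screen transcript. It assumes OpenCV for capture, and classify_letter() is a hypothetical stand-in for the trained model, not the project's actual code.

```python
# Minimal sketch of a real-time ASL-alphabet demo loop (illustrative only).
# Assumes OpenCV for webcam capture; classify_letter() is a placeholder name,
# not the project's real model interface.
import cv2

def classify_letter(frame):
    """Placeholder for the trained model: given one BGR frame, return a
    predicted letter such as 'A', or None when no sign is detected."""
    return None  # swap in real inference here

def run_demo(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    transcript = ""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        letter = classify_letter(frame)
        if letter:
            transcript += letter
        # Draw the running transcript near the bottom of the frame.
        cv2.putText(frame, transcript[-40:], (10, frame.shape[0] - 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
        cv2.imshow("ASL demo", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    run_demo()
```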
Since the project started in August of this year, I have made significant progress. One of the main things I need, as in any computer vision project, is lots and lots of training data. At several other conferences this year I have asked people to let me take a video of them performing the ASL alphabet. I was able to get video of 31 new people performing the alphabet.
At SC19, I added a new twist to the video collection. I had a screen that played a 10-minute loop of various backgrounds, including random colors, a person walking through a castle and a forest, concert footage, and highlights from the 2019 World Series (sorry, Astros fans). People stood in front of the screen as they performed the alphabet. The reason for the screen is simple: when someone signs, they will rarely be standing in front of a solid, non-moving background. Instead, they will be in a classroom, a restaurant, outside, or somewhere else where the environment is not static. For the AI software to generalize, the training must be done with a wide variety of backgrounds. By cycling through 10 minutes of different backgrounds, I was able to ensure that each letter was signed against a different background. As I did at previous conferences, I made every attempt to get people with varying hand shapes, colors, sizes, fingernails (both painted and unpainted), and jewelry to sign the alphabet. This should also make the models generalize to the widest possible range of people and reduce bias as much as possible.
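The post doesn't describe the processing pipeline, but conceptually each recording has to be broken into labeled frames before training. Below is a minimal sketch of that step, assuming OpenCV and a hypothetical videos/<signer>/<letter>.mp4 layout; the directory structure, sampling rate, and file names are illustrative assumptions, not the project's actual workflow.

```python
# Minimal sketch of turning collected videos into labeled training frames.
# The videos/<signer>/<letter>.mp4 layout and the every_n sampling rate are
# assumptions made for illustration only.
import cv2
from pathlib import Path

def extract_frames(video_path, out_dir, every_n=5):
    """Save every Nth frame of one signer's video as a JPEG for training."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(str(out_dir / f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

if __name__ == "__main__":
    # Hypothetical layout: videos/<signer_id>/<letter>.mp4 -> frames/<letter>/
    for video in Path("videos").glob("*/[A-Z].mp4"):
        letter = video.stem
        extract_frames(video, Path("frames") / letter)
```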
As at other conferences, the response to the software was very good. Many, many people came by and tried the demo. In fact, a deaf person tried it for the first time. Honestly, I was quite nervous when she did. I have been careful to make it as useful as possible, but I couldn't know for sure how successful it was until a deaf person tried it. She was very impressed and said it would be very helpful for the deaf community. She even asked how she could help with the development. I am very much looking forward to working with her.
Finally, I gave two oral presentations on the challenges I face in developing the software. Several people asked questions, and we had good discussions about ways to overcome some of those challenges.
If you haven’t had a chance, look at my Twitter feed to see the posts I made during the conference and to stay up to date on my latest research.