Voice-activated communication device for preschoolers: a Sprint Sabbatical project
One of the great things about Bandwidth is how the company truly recognizes the importance of family. As a father of a four-year-old boy, I often think about today’s technology and how kids interact with it. Be it touch with an iPad or LeapFrog, or voice with Siri or Echo, it’s becoming increasingly simple for the little ones to engage with computers these days. So it made sense to me to combine this with another great Bandwidth benefit, a Sprint Sabbatical, to explore how to reduce the learning curve when it comes to children and smartphones. A Sprint Sabbatical is two weeks offered to any Bandwidth employee to develop an idea. If your idea is approved for a Sprint Sabbatical, you are allowed to step away from your day job and its responsibilities so that you can focus entirely on your sabbatical project.
The market is already full of tools that make it easier for kids to use smartphones or phone-like devices. Here is one example from an internet search I did. With my sabbatical project, I had a few goals in mind:
- Easy-to-use technology for communication between family members: specifically parents and children who are in the three-to-five-year age group.
- Eliminate the need for a child to remember telephone numbers or contacts or to navigate around a UI, along with the risk of accidentally hitting the wrong button.
- The interface should be voice-activated, using simple commands like “Mommy” or “Daddy”, words any child of that age would instinctively know.
Simple as that.
I started off building around the open source hardware prototyping project known as Tessel. Tessel, at its core, is a mini-computer, similar to others like the Raspberry Pi and the Arduino, to which you can add various modules on the main board. Each module extends the base Tessel with a capability such as GPS, an accelerometer, or ambient sensing, to name a few. The software that runs on the Tessel runtime is written in Node.js, which allows for quicker application development. One of Tessel’s key differentiators is on-board WiFi. I also chose to use the GPRS module for GSM cellular service and the audio module for voice command input. The prototype ended up looking similar to the picture shown below. The assembled board fit in a 3” x 6” x 1” cardboard box with holes cut out for the mic and micro USB inputs.
One of the more important requirements I was aiming for was to have the device work completely offline (i.e. without internet data connectivity). Integrating speech recognition proved to be the piece that kept that from becoming a reality. Sphinx, developed by Carnegie Mellon University, is widely accepted as a stable voice-recognition API and has been ported to numerous languages. However, every attempt to execute code on the Tessel runtime using one of three Node.js/JS ports of Sphinx (node pocketsphinx, pocketsphinx-js, pocketsphinx-web) failed.
I ended up settling for an online approach for the demo, using Tessel, a Node.js service, and Wit.ai, an online speech recognition service owned by Facebook. The Node.js service’s sole purpose was to convert the audio generated by the Tessel to a format Wit.ai could recognize. Wit.ai would then respond with a JSON string indicating a confidence level for how closely the audio matched a preconfigured set of words. Based on the confidence level, the Tessel would then dial the telephone number of the contact that matched. The whole sequence proved to be rather slow because of the extra network round trips, so an offline approach would likely have been faster.
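The matching step described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the sabbatical: the contact numbers, the 0.7 threshold, the `pickNumber` name, and the `{ word, confidence }` candidate shape are all assumptions made for the example, and the fields in a real Wit.ai response may look different.

```javascript
// Hypothetical contact list; the numbers are placeholders, not real ones.
const CONTACTS = {
  mommy: "+15555550101",
  daddy: "+15555550102",
};

// Assumed cut-off below which no call is placed; it would need tuning.
const MIN_CONFIDENCE = 0.7;

/**
 * Pick the number to dial from a list of recognition candidates.
 * Each candidate is assumed to look like { word: "mommy", confidence: 0.92 }.
 * Returns the matching phone number, or null if nothing is confident enough.
 */
function pickNumber(candidates, contacts = CONTACTS, minConfidence = MIN_CONFIDENCE) {
  let best = null;
  for (const candidate of candidates) {
    const word = String(candidate.word).toLowerCase();
    // Ignore words that are not configured contacts.
    if (!Object.prototype.hasOwnProperty.call(contacts, word)) continue;
    // Ignore matches the recognizer is not sure about.
    if (candidate.confidence < minConfidence) continue;
    // Keep the most confident remaining match.
    if (best === null || candidate.confidence > best.confidence) {
      best = { word, confidence: candidate.confidence };
    }
  }
  return best === null ? null : contacts[best.word];
}

// Example: a confident "mommy" and a weak "daddy".
const number = pickNumber([
  { word: "mommy", confidence: 0.92 },
  { word: "daddy", confidence: 0.31 },
]);
console.log(number); // prints "+15555550101"
```

Returning null for a weak or unknown match means the device does nothing at all, which seems a safer default for a preschooler’s device than dialing the wrong person.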
The sequence diagram above shows what I was able to demo at the end of my sabbatical and all the pieces involved. For demo purposes, I had recorded my son saying “I want mommy” or “I want daddy” and then played the recording back while the Tessel device captured the audio. I did successfully get the Tessel to place a call to my wife’s number after playback of the “I want mommy” phrase. I didn’t successfully get a call placed to my number. The voice recognition portion worked, so it’s possible that the Tessel GPRS module didn’t pick up a good enough cellular signal to make the call.
Overall, it was a great experiment, and my son got a kick out of shouting “Mommy” and “Daddy” into a cardboard box over the course of a few days of testing. Ideally, I would’ve liked to have my son test with something fancier than a cardboard box containing the Tessel board, something that he wouldn’t have had to carry with him. Perhaps the next iteration of this project would incorporate the board into a shirt, jacket, or hat so he would always have direct communication with either me or my wife without him even knowing it. The next version of Tessel will be released in August, so I’ll check it out to see if I can make more headway on offline voice recognition.