Speech to Text with Unity and Watson
After watching the Sprint Planning Meeting video and then reading through the Task Board article, I logged onto my Trello board, looked through my ‘Explore Ideas’ list and identified my next sprint goal – Speech and the IBM Watson Unity SDK. The goal is to explore how player and AI character speech can create a stronger sense of immersion within a VR space.
I started with a user story centered on a person who wants to interact within VR using speech, similar to Alexa or Google Assistant but a bit more robust. User W: “I want to talk with the characters I meet in the virtual world.”
The first step is to set up a Unity scene and get IBM Watson to convert speech to text and display the transcription in a UI text field. The ultimate goal is triggering in-game logic with a virtual character that could respond to users’ verbal actions. This could lead to a language learning environment where students learn vocabulary simply by interacting with an AI character.
Having moved the Trello card to the started board, I set up a free IBM Cloud account to access Watson. I plan to spend 8 hours over a two-week period on the initial setup. Speech recognition is one of the core features of the app and will give me a basic structure to build on. The success of the sprint hinges on communicating with the speech-to-text server and having it recognize verbal input sent through Unity.
I will know I have achieved my sprint goal when the Watson servers are able to transcribe speech as text and display the output in a text field within Unity. The goal is realistic because of the small-steps approach I have used in the past. I plan to spend two two-hour blocks during the week and a four-hour block on the weekend. I will update my Trello board, and I am using Unity Collaborate as my version tracking system. Keeping to my time schedule, updating my Trello board and staying focused on simply communicating with Watson will allow me to successfully complete my sprint goal.
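The first milestone can be sketched as a small Unity script: capture microphone audio and push the transcription into a UI text field. The microphone capture and UI update below are standard Unity APIs; `SendClipToWatson` is a hypothetical placeholder for the Watson Unity SDK request, since the SDK’s actual service classes and method signatures vary by version and I haven’t wired them up yet.

```csharp
using UnityEngine;
using UnityEngine.UI;

// Sketch: record a short clip from the default microphone and show the
// transcription in a UI Text field. SendClipToWatson is a stand-in for
// the Watson Unity SDK call, not the real API.
public class SpeechToTextDisplay : MonoBehaviour
{
    public Text transcriptField;   // assigned in the Inspector
    private AudioClip recording;

    void Start()
    {
        // Record up to 10 seconds of audio at 16 kHz from the default mic.
        recording = Microphone.Start(null, false, 10, 16000);
    }

    public void StopAndTranscribe()
    {
        Microphone.End(null);
        SendClipToWatson(recording);
    }

    // Placeholder: the real version would send the clip to the Watson
    // speech-to-text service and receive the transcript in a callback.
    private void SendClipToWatson(AudioClip clip)
    {
        OnTranscript("(transcript from Watson would appear here)");
    }

    private void OnTranscript(string text)
    {
        // Success criterion for the sprint: the text shows up in the UI.
        transcriptField.text = text;
    }
}
```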
After looking at the list of challenges, I decided on one and followed Geoff’s advice to “Let yourself off the leash!” I’ve been thinking a lot about “Presence” (spatial immersion for the technically minded) and the user’s POV, so this activity sounded really interesting. I put the timer on and got started. After twenty minutes I had a paper full of ideas that started off safe but moved into the “what if” realm of thought.
A few of the interesting ones were:
Spending a few minutes looking back on them, it would be really fun to be able to breathe life into these ideas. I liked the idea of being able to sort through your dreams, or even of creating a dream scrapbook you could share with people. One or two may be possible in a very limited way, but overall it was fun to think about and gave me an idea or two to play with. The challenge now would be to come up with a business plan to pitch one of these ideas and the research needed to support it.
Having said that, I have started researching the idea of a VR language learning app in which users could converse with an AI character. Having an AI character like Miquela Sousa that students could learn with would be a really intriguing experience. So, I started exploring just which speech-capable AIs are available and able to integrate into the Unity platform. Microsoft has an Azure client and Amazon has Sumerian, though that service may only run on AWS servers. There are also IBM Watson and SmartBody from the University of Southern California. At the moment, IBM Watson looks to be the most promising because of the recent partnership bringing Watson’s AI functionality to Unity’s game engine, with built-in VR/AR features. I have added this to my Trello board for now and will continue to follow it.
This week was spent looking at the importance of personas and why they are necessary before diving in and developing. Some things guiding the persona were:
I hadn’t realized how important personas were to the success of an app and how they helped the development team make design decisions based on user needs. Nor had I realized how important those stories were to the sprint process and to keeping people focused on the needs of the target audience. I can see how this could keep a project on track and help prevent feature creep that would waste time and money.
I didn’t have a personal project in mind when thinking about this assignment, but did have some ideas based on recent events at work dealing with students and SNS problems. Getting started, I put a real name to my persona and what they wanted to accomplish and what motivated them. I looked for a picture that went well with the persona I had outlined and that would provide an emotional pull for the team using it.
The persona is tied to the idea of SNSs, and the app I chose is YouTube Kids, focused on the idea of giving working parents ‘peace of mind’ when it comes to their kids’ net viewing habits. This is something a majority of parents worry about but feel unsure about when choosing safe apps for their children to use. A multitude of websites advise parents on responsible tech usage for children, and Google’s YouTube Kids helps parents with age-appropriate video. Google wants to “make it safer and simpler for kids to explore the world through online video” and at the same time give parents “a whole suite of parental controls, so they can tailor the experience to their family’s needs.”
My persona is Lykke Li, a working mother worried about giving her children the freedom to watch YouTube but in a safe and controlled environment.
YouTubeKids (persona PDF)
Reflecting on the VRTK project, I accomplished two of the three goals (teleportation and object interaction) within the timeframe I had set. The third goal, triggering sound effects, remains unfinished. I tied sound to actions but had an issue syncing the audio to the event action. Audio effects can be part of a future quick sprint focused on this one element of interaction. A smooth integration between visual selection (when an object is selected, the effect changes) and the audio response helps reinforce immersion in the environment. The issue I encountered was one of time rather than implementation; audio work in Unity is a bit more involved when matching animations to event triggers.
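For that future sprint, the pattern I want to try is hooking the sound directly to the interaction event rather than to the animation, so both responses fire together. A minimal sketch, assuming VRTK v3’s `VRTK_InteractableObject` events (event and type names may differ in other versions):

```csharp
using UnityEngine;
using VRTK;

// Sketch: play a one-shot sound the moment an object is grabbed, so the
// audio is driven by the same event as the visual response.
[RequireComponent(typeof(VRTK_InteractableObject))]
[RequireComponent(typeof(AudioSource))]
public class GrabSound : MonoBehaviour
{
    public AudioClip grabClip;     // assigned in the Inspector
    private AudioSource source;

    void OnEnable()
    {
        source = GetComponent<AudioSource>();
        // Subscribe to the grab event exposed by the interactable object.
        GetComponent<VRTK_InteractableObject>().InteractableObjectGrabbed += OnGrabbed;
    }

    private void OnGrabbed(object sender, InteractableObjectEventArgs e)
    {
        // Fires on the same frame as the grab, keeping audio and visuals in sync.
        source.PlayOneShot(grabClip);
    }
}
```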
Object effects and audio in Unity are two things I would like to explore in a future sprint.
I have been on track with my Unity studies using Trello. I am about to start a personal project to test what I’ve learned with VRTK. I have completed a few of the tutorial videos and have successfully installed the toolkit into an example project. The challenge now is to be able to repeat the process on a new project and then tie user interactions to my own assets rather than those used in the tutorial.
I will test myself against my one-week (2 hours a day) deadline for a simple working environment. The only issue I foresee at the moment is having to fine-tune the settings to get my assets to work correctly with VRTK. Being able to teleport around the environment, interact with objects and trigger sound effects will be the yardstick to measure the results by. Having practiced with the examples, this should be a good challenge and a signal that it’s time to push myself a bit harder with Unity. Being able to build basic interactive environments means being able to iterate more quickly and start working on projects with more confidence. I have some extra time this week to bank towards this project, giving myself some wiggle room for unforeseen problems. The VRTK developers also have a very active Discord, giving me additional help should I need it.
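Tying interactions to my own assets mostly comes down to repeating the component setup from the example scenes. As I understand the VRTK v3 workflow (component and field names may differ by version), making one of my own objects grabbable looks roughly like this:

```csharp
using UnityEngine;
using VRTK;
using VRTK.GrabAttachMechanics;

// Sketch: configure one of my own assets as a grabbable VRTK object at
// runtime, mirroring what the example scenes set up in the Inspector.
public class MakeGrabbable : MonoBehaviour
{
    void Start()
    {
        // Physics components VRTK expects on an interactable object.
        gameObject.AddComponent<BoxCollider>();
        var body = gameObject.AddComponent<Rigidbody>();
        body.isKinematic = false;

        // The interactable setup itself: mark it grabbable and give it a
        // grab-attach mechanic so it follows the controller when held.
        var interactable = gameObject.AddComponent<VRTK_InteractableObject>();
        interactable.isGrabbable = true;
        interactable.grabAttachMechanicScript =
            gameObject.AddComponent<VRTK_FixedJointGrabAttach>();
    }
}
```

In the example scenes the same setup is done by hand in the Inspector; doing it in code is just a way to make sure I actually understand which components matter.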
I will update this post with pictures at the end of the week.
I spent the weekend doing some research after getting Maya installed and finishing a few of the tutorials. It’s really amazing what you can do with this, and I’ve just scratched the surface. My goal is to get a basic understanding of building a character and rigging it. I also installed Mudbox and was curious about how you export files as .fbx, because Mudbox saves its files as .mud. There are some Mudbox tutorials on the site, but I think I will play with that another day. I am also exploring Substance Alchemist, which can extract textures from scans, which is pretty cool. I need to stay focused and come back to the other Maya workflows later when I have more time to dive in.
I am now working on building a simple character and it’s going pretty well, just taking a bit of time with the back and forth of moving through the tutorials.
Update: When I started exploring Maya, I thought I could somehow find the time to learn it, but after working with it for a week or so I realized it was a lot harder to use than I imagined. I wasn’t able to translate the image in my mind’s eye into the character I created with Maya. Something I hadn’t realized until my adviser mentioned it was that it would be better to contract out work like that, rather than spend time I didn’t have on something that wasn’t contributing to my core goal of making a virtual reality app. In hindsight, it was good to have gained a basic understanding of how Maya works and general insight into how character movement works. This working knowledge provides an opportunity for team communication, but more importantly gives me the knowledge needed to communicate what I am looking for when contracting Maya work out to a freelancer.