The Secret Behind Successful VR and AR: Data Capture and Labeling

By Seng Teung, Client Partner

The global market for augmented reality (AR) and virtual reality (VR) is expected to reach $766 billion by 2025. It is easy to see why: both AR and VR continue to make inroads beyond the realm of gaming, and businesses are adopting them for applications ranging from employee training to consumer shopping. For example:

IKEA's AR app, IKEA Place, allows users to visualize how furniture will look in their home by virtually placing 3D models of IKEA products in their living space through their smartphones or tablets.
Ford uses VR in its design process, allowing engineers and designers to collaborate and explore various design options in a virtual environment. This helps reduce prototyping costs and accelerate development.

When these business cases are discussed in news articles, the conversation focuses on the devices used and the user experience. This is understandable and desirable: businesses need to see applications that provide real value before they invest in these technologies, and the equipment required, notably VR headsets, has a huge impact on the user experience. Still, businesses considering an investment in AR and VR should appreciate the complex process of preparing data for these applications. The experiences enabled by AR and VR do not happen by magic. They require an enormous amount of data collection and labeling, and few resources exist to do all this.

Example of the Data Heavy Lifting

Consider some of the immersive scenarios that VR alone can create. VR can transport you to another world without your ever leaving home. VR applications can take you to lush beaches and majestic mountain peaks to help you visualize and plan a vacation. VR can simulate a Parisian kitchen for an immersive cooking class or take a user to the Louvre for an art appreciation class. The list goes on and on.

But how are those immersive environments actually created, especially to make them realistic? After all, visiting Paris through VR would not be the same if a VR app failed to include the Eiffel Tower, and an immersive cooking class would not work so well if a user could not find a virtual baking tray and spatula. Moreover, objects need to appear with proper depth and appropriate dimensions to seem real.

These environments need to be designed just as video game worlds are.

Let’s take the example of a VR application for meal preparation. Learning to cook in a real-life setting can be costly and even physically risky for a beginner. Not everyone has access to a brick-and-mortar location with all the necessary tools, and beginners might not be ready to handle sharp objects such as knives. Simulated meal preparation environments can be useful for these reasons and more.

But how do you create the right learning environment? The process is not just about rendering visually appealing food; it also includes simulating textures, handling interactions with various kitchen tools, and perhaps even integrating smell or taste sensations. Here's a closer look at the data capture, labeling, and tagging that might be involved:

Data Capture

Data capture involves 3D modeling of entire environments. When you walk into a virtual kitchen, it needs to look and function like a real kitchen. This means capturing all the data that makes up the building blocks of a kitchen: the entire layout, individual objects, motions, sounds, and even sensory data. For example:

Environmental capture consists of scanning or modeling the entire kitchen environment, including countertops, cabinets, sinks, and flooring to establish the setting. This also includes the proper spatial arrangements for ergonomic interactions.
3D scanning captures accurate models of kitchen utensils, appliances, and ingredients, and even of hands or utensils in motion. All this requires accurate texture mapping, that is, gathering detailed texture information to render food and kitchen tools realistically.
Motion capture covers both human movements (capturing the motions of chopping, stirring, flipping, etc., to animate the user’s virtual hands or utensils accurately) and cooking process simulation (observing and recording real-time cooking processes, like boiling or frying, to replicate them in VR).
Audio capture typically entails recording cooking sounds, such as chopping, sizzling, and boiling, for an immersive auditory experience.
In addition, some advanced VR setups might include ways to simulate smell or taste, requiring specialized data capture.

Sounds like a lot of detail needs to be captured, right? And remember, I’m just providing some examples. From there, all that data needs to be labeled and tagged.

Labeling and Tagging

I’m not going to get into all the types of data that need to be tagged, but just to give you a sense of the enormity of the challenge, here’s a brief sample:

Object labeling. This means labeling all the different utensils, appliances, ingredients, and their parts for accurate interaction, and tagging different materials to ensure that they behave realistically (e.g., metal pan, wooden spoon).
Interaction tagging. Here, one needs to tag how objects respond to various actions (e.g., cutting an onion, stirring a pot) and how they interact with each other. Oh, and one must label different stages of cooking like raw, fried, or boiled, and define how transitions occur based on user actions.
Navigation and ergonomics. How does the user move around the virtual kitchen and interact with objects? How will the experience address accessibility needs, providing alternative controls or guidance for those who may need them?
Cooking animations. This means defining animations for various cooking processes and linking them to the correct user actions or timed events.
Step-by-step instructions. If the scenario is instructional, the necessary steps must be tagged and sequenced, with prompts or guidance provided as needed.
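To make object labels and interaction tags more concrete, here is a minimal Python sketch of how one ingredient's labels and cooking-state transitions might be recorded. The schema (ObjectLabel, IngredientState, ONION_TRANSITIONS) and the states and actions in it are hypothetical illustrations, not a standard format or any particular engine's API.

```python
from dataclasses import dataclass, field


# Hypothetical object label for a virtual-kitchen scene (illustrative, not a standard schema).
@dataclass(frozen=True)
class ObjectLabel:
    object_id: str
    category: str                 # e.g. "utensil", "appliance", "ingredient"
    material: str                 # tagged so the object can behave realistically, e.g. "metal", "wood"
    parts: tuple[str, ...] = ()   # optional labeled sub-parts, e.g. ("handle", "blade")


# Hypothetical interaction tags: (current cooking state, user action) -> next state.
ONION_TRANSITIONS: dict[tuple[str, str], str] = {
    ("raw", "chop"): "chopped",
    ("chopped", "fry"): "fried",
    ("raw", "boil"): "boiled",
}


@dataclass
class IngredientState:
    label: ObjectLabel
    state: str = "raw"
    history: list[str] = field(default_factory=list)

    def apply(self, action: str, transitions: dict[tuple[str, str], str]) -> bool:
        """Apply a user action; return False if no transition is tagged for it."""
        next_state = transitions.get((self.state, action))
        if next_state is None:
            return False  # untagged interaction: the ingredient does not change
        self.history.append(f"{self.state} --{action}--> {next_state}")
        self.state = next_state
        return True


onion = IngredientState(ObjectLabel("onion_01", "ingredient", "food"))
onion.apply("stir", ONION_TRANSITIONS)  # no tagged transition for a raw onion, so nothing changes
onion.apply("chop", ONION_TRANSITIONS)
onion.apply("fry", ONION_TRANSITIONS)
print(onion.state)  # fried
```

Keying transitions on (state, action) pairs means an action only changes an object when someone has explicitly tagged that change, which is one way to keep untagged interactions from producing unrealistic results.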

Demand Is Growing

For these reasons, demand for VR and AR data capture and labeling is growing. But those skills alone won’t be enough. A business that wants to develop a VR/AR application needs to consider a number of important best practices to manage the myriad data collection and labeling tasks. They include:

Collaboration and expertise in data collection. This means collaborating with chefs or cooking experts to ensure that techniques, tools, and ingredients are accurately represented, and including specialists in 3D modeling, animation, user experience, sound design, and other relevant fields to cover all aspects of development.
Labeling and tagging through ongoing design and testing. This means establishing clear guidelines for labeling and tagging objects, interactions, and animations; designing user-centered interactions that feel natural and intuitive, considering real-world cooking movements and ergonomics; and verifying that the tagged interactions and state changes (like chopping or frying) function as intended through regular testing.
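Part of that regular testing can be automated. The sketch below assumes a hypothetical labeling guideline that fixes an allowed vocabulary of cooking states and actions, then checks a set of tagged transitions against it: every tag must use approved terms, and every approved state must be reachable from the raw state. The function name, vocabulary, and rules are illustrative assumptions.

```python
# Hypothetical labeling guideline: the approved vocabulary for one ingredient.
ALLOWED_STATES = {"raw", "chopped", "fried", "boiled"}
ALLOWED_ACTIONS = {"chop", "fry", "boil", "stir"}


def check_transitions(transitions: dict[tuple[str, str], str], initial: str = "raw") -> list[str]:
    """Return human-readable problems; an empty list means the tags pass these checks."""
    problems: list[str] = []

    # Guideline check: every tag must use approved state and action names.
    for (state, action), target in transitions.items():
        if state not in ALLOWED_STATES:
            problems.append(f"unknown source state {state!r}")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"unknown action {action!r}")
        if target not in ALLOWED_STATES:
            problems.append(f"unknown target state {target!r}")

    # Reachability check: walk the tagged transitions outward from the initial state.
    reached, frontier = {initial}, [initial]
    while frontier:
        current = frontier.pop()
        for (state, _action), target in transitions.items():
            if state == current and target not in reached:
                reached.add(target)
                frontier.append(target)

    # Any approved state the user can never reach points to a missing or mistyped tag.
    for state in sorted(ALLOWED_STATES - reached):
        problems.append(f"state {state!r} is never reached from {initial!r}")
    return problems


tags = {("raw", "chop"): "chopped", ("chopped", "fry"): "fried", ("raw", "boil"): "boiled"}
print(check_transitions(tags))  # []
print(check_transitions({("raw", "chop"): "choped"}))  # flags the typo and the unreachable states
```

Checks like these catch typos and dead-end tags before a build reaches testers; they complement, rather than replace, hands-on testing of whether an interaction feels natural.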

At Centific, we can help you build your next AR/VR application, from collection to annotation at scale, so you can bring your experience to market with the right performance and velocity. Contact us to learn more.