Thoughts on Apple Vision Pro and VisionOS

We discuss visionOS in depth, covering our first impressions of Apple's framework presentation from a technologist's point of view.

Slated for release early next year, the Vision Pro promises robust capabilities thanks to its M2 chip and a new R1 chip dedicated to real-time sensor processing. Apple describes the headset as lightweight and easy to use, with eye tracking, hand gestures, and voice as inputs. Apple has envisioned many use cases for the Vision Pro, including an enhanced Mac productivity experience, a new approach to FaceTime, immersive video viewing, gaming, and capturing and viewing photos and videos.

Apple Vision Pro is neither a VR nor an AR device; it is much more

Up close, the Apple Vision Pro resembles a pair of high-tech goggles, which is mostly meant as a compliment. Its three-dimensionally formed glass front looks sleek and polished, and the aluminum alloy frame seems robust. The light seal that sits between the headset and your face is soft and comes in multiple sizes to achieve a custom fit.

Apple has chosen the materials for the Vision Pro with wearability in mind. The custom aluminum alloy frame curves around the user’s face, and a modular design is meant to give every user a “perfect” fit.

The light seal, made from a soft textile, comes in various sizes and shapes for a precise fit. Flexible, adjustable straps fit snugly around the head and keep the Audio Pods close to the ears. The cushioned headband comes in multiple sizes and styles and can be swapped.

For those who wear prescription glasses, custom ZEISS Optical Inserts will be available to correct vision. They attach magnetically to the headset’s lenses but are sold separately at extra cost.

According to Apple, the Vision Pro will “deliver phenomenal compute performance.” It is powered by an M2 chip alongside a new R1 chip, which processes input from 12 cameras, five sensors, and six microphones to keep lag to a minimum. Apple says the R1 streams new images to the displays within 12 milliseconds, which it describes as eight times faster than the blink of an eye. The headset reportedly operates almost silently.

The device has two ultra-high-resolution micro-OLED displays with a combined 23 million pixels, which Apple says amounts to more pixels than a 4K TV for each eye. According to Apple, this allows a virtual screen that feels 100 feet wide, 4K video playback, and text that stays sharp from any angle.
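The pixel figures can be sanity-checked with simple arithmetic. This is a minimal Swift sketch that assumes the 23 million pixels are split evenly between the two displays, since the announcement quotes only the combined total. It also takes consumer 4K UHD (3840 × 2160) as the comparison point.

```swift
// Back-of-the-envelope check of the per-eye pixel claim.
// Assumption: the 23 million pixels are split evenly across the two displays.
let totalPixels = 23_000_000
let displayCount = 2
let pixelsPerEye = totalPixels / displayCount   // 11,500,000

// Consumer 4K UHD resolution, used as the comparison point.
let uhdPixels = 3_840 * 2_160                    // 8,294,400

// How many 4K frames' worth of pixels each eye gets, rounded to two decimals.
let ratio = Double(pixelsPerEye) / Double(uhdPixels)
let roundedRatio = (ratio * 100).rounded() / 100

print("Per eye: \(pixelsPerEye) px; 4K UHD: \(uhdPixels) px; ratio: \(roundedRatio)")
```

Under that even-split assumption, each eye gets roughly 1.39 times the pixel count of a 4K UHD frame. This is consistent with the "more than a 4K TV per eye" framing.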

Each Audio Pod contains two individually amplified drivers that deliver Personalized Spatial Audio tailored to the user’s head and ear geometry, so sound appears to come from the surrounding space. Vision Pro also uses audio ray tracing to match sound to the acoustics of the room. The headset also includes an advanced sensor array.

Within the lenses, a high-performance eye-tracking system uses cameras and a ring of LEDs that project invisible light patterns onto the user’s eyes. This lets the headset determine where the user is looking and use gaze to control the interface. No external accessories are required, although game controllers are supported for gaming.

A noteworthy feature called Optic ID lets the Vision Pro recognize the user’s iris to unlock the device, fill in passwords, and authorize purchases. As with biometric authentication on iPhone and Mac, this data is protected by a dedicated Secure Enclave processor.

visionOS, a new platform for spatial applications

Here we look at the three constructs that define the visionOS app architecture: Windows, Volumes, and Spaces.

Windows on visionOS are rectangular panes of content that can be placed around the user’s surroundings. In a given space, users can interact with multiple windows from different apps, or with several windows opened by a single app.

Volumes are bounded containers for interactive 3D objects or scenes that are not fully immersive, for example a 3D map or a mini-game.

Spaces are where this content lives. By default, apps launch into the Shared Space, a collective environment where windows and volumes from multiple apps sit side by side and each app has limited control over the surroundings. An app can instead open a Full Space, a dedicated environment where only that app’s content appears. A Full Space can house multiple Windows and Volumes from your application and supports three immersion styles: mixed, progressive, and full.
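Here is a minimal SwiftUI sketch of how the three constructs map onto scene types in a visionOS app. It assumes the visionOS SDK in Xcode. A `WindowGroup` produces a window, a `WindowGroup` with the `.volumetric` window style produces a volume, and an `ImmersiveSpace` produces a Full Space whose immersion style can be mixed, progressive, or full. The scene IDs, the `Globe` asset, and the type names `SpatialSketchApp` and `LauncherView` are hypothetical.

```swift
import SwiftUI
import RealityKit

// Hypothetical scene identifiers used throughout this sketch.
enum SceneID {
    static let main = "main"
    static let globe = "globe"
    static let immersive = "immersive"
}

// In a real project, mark this type with @main so it becomes the app's entry point.
struct SpatialSketchApp: App {
    // Which immersion style the Full Space currently uses.
    @State private var immersionStyle: ImmersionStyle = .mixed

    var body: some Scene {
        // Window: a flat SwiftUI surface the user can place in the Shared Space.
        WindowGroup(id: SceneID.main) {
            LauncherView()
        }

        // Volume: a bounded 3D region that sits alongside other apps' windows.
        WindowGroup(id: SceneID.globe) {
            // "Globe" is a hypothetical USDZ asset in the app bundle.
            Model3D(named: "Globe")
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)

        // Full Space: while it is open, other apps' content is hidden.
        ImmersiveSpace(id: SceneID.immersive) {
            RealityView { content in
                let sphere = ModelEntity(
                    mesh: .generateSphere(radius: 0.2),
                    materials: [SimpleMaterial(color: .cyan, isMetallic: false)]
                )
                // The space's origin is on the floor beneath the user, so this is
                // roughly eye height and one metre in front.
                sphere.position = [0, 1.5, -1]
                content.add(sphere)
            }
        }
        .immersionStyle(selection: $immersionStyle, in: .mixed, .progressive, .full)
    }
}

// A window button that asks the system to open the Full Space.
struct LauncherView: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace

    var body: some View {
        Button("Enter Full Space") {
            Task {
                // The result reports whether the space opened, failed, or was cancelled.
                _ = await openImmersiveSpace(id: SceneID.immersive)
            }
        }
        .padding()
    }
}
```

Keeping the Full Space behind an explicit button reflects the split described above. The app starts as ordinary windows in the Shared Space and only takes over the surroundings when the user asks.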

visionOS supports several input methods. Users can look at an element and pinch to select it, using hand gestures like those demonstrated in various videos, or reach out and touch a window directly, as if it were a physical, floating iPad. External input from Bluetooth trackpads and game controllers is also supported. Voice interaction is possible as well, although it is not enabled by default for iPad and iPhone apps running on Vision Pro.
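Eye tracking drives interaction, but apps do not receive the user's gaze position. Instead, the system draws the hover highlight itself and delivers a tap when the user pinches while looking at an element. This is a minimal SwiftUI sketch, assuming the visionOS SDK, of a custom control that works with both look-and-pinch and direct touch. `TileView` is a hypothetical name.

```swift
import SwiftUI

// Hypothetical custom control. Standard Buttons get a hover effect automatically;
// custom views opt in with .hoverEffect().
struct TileView: View {
    let title: String
    @State private var tapCount = 0

    var body: some View {
        VStack(spacing: 8) {
            Text(title)
                .font(.title2)
            Text("Tapped \(tapCount) times")
                .font(.caption)
        }
        .padding()
        .background(.regularMaterial, in: RoundedRectangle(cornerRadius: 20))
        // Shape the gaze highlight to match the rounded background.
        .contentShape(.hoverEffect, RoundedRectangle(cornerRadius: 20))
        // Let the system highlight the tile while the user looks at it.
        .hoverEffect()
        // Fires for a look-and-pinch as well as for a direct touch.
        .onTapGesture {
            tapCount += 1
        }
    }
}
```

Because gaze stays private to the system, the app's job is to give each interactive element a clear shape and hover effect, not to track where the user is looking.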

Because Vision Pro’s spatial audio system adapts sound to the size and material composition of the physical environment, UI sounds and audio cues need thoughtful design to deliver the best immersive experience.
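In RealityKit, an audio cue can be tied to a point in space by giving an entity a `SpatialAudioComponent` and playing a sound from it. This minimal sketch assumes the visionOS SDK and, as Apple describes, that the system handles room matching, so the app's main decision is where each sound originates. The helper names and the `confirm.wav` asset are hypothetical.

```swift
import RealityKit

// Hypothetical helper: an entity that acts as a positioned sound source.
@MainActor
func makeSoundEmitter(at position: SIMD3<Float>) -> Entity {
    let emitter = Entity()
    emitter.position = position
    // Directivity focus ranges from 0 (omnidirectional) to 1 (tightly beamed forward).
    emitter.components.set(SpatialAudioComponent(directivity: .beam(focus: 0.5)))
    return emitter
}

// Hypothetical helper: plays a bundled sound file from the emitter.
// The emitter should already be part of a scene (for example, added to a RealityView).
@MainActor
func playCue(named fileName: String, from emitter: Entity) throws -> AudioPlaybackController {
    let resource = try AudioFileResource.load(named: fileName)
    return emitter.playAudio(resource)
}
```

Inside a `RealityView`, you would add the emitter with `content.add(makeSoundEmitter(at: [0, 1.2, -0.8]))` and later call `try playCue(named: "confirm.wav", from: emitter)` when the UI event occurs. Placing the emitter next to the element that triggered it keeps the sound and the visual feedback in the same spot.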

Developers building for Vision Pro, iPad, and iPhone stay within the familiar Apple development ecosystem, with tools like Xcode and SwiftUI at the core of the workflow. However, developers who want a fully immersive VR experience that also runs on other headsets, such as Meta’s Quest or PlayStation VR, will likely find the Unity engine the more practical choice.

SwiftUI is the basis for your app's UI and content in the Apple ecosystem. RealityKit, Apple’s 3D rendering framework, renders 3D objects, materials, and lighting for realistic graphics. ARKit provides advanced scene understanding, such as plane detection, so virtual content can interact with real-world objects.
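This is a minimal sketch of SwiftUI and RealityKit working together, assuming the visionOS SDK. `RealityView` hosts RealityKit entities inside a SwiftUI view, and a SwiftUI gesture is routed to whichever entity the user taps. `SpinningBoxView` is a hypothetical name.

```swift
import SwiftUI
import RealityKit

// Hypothetical view: a RealityKit box that turns a little each time it is tapped.
struct SpinningBoxView: View {
    static let boxSize: Float = 0.2  // edge length in metres

    var body: some View {
        RealityView { content in
            let box = ModelEntity(
                mesh: .generateBox(size: Self.boxSize),
                materials: [SimpleMaterial(color: .orange, isMetallic: false)]
            )
            // An entity needs an input target and a collision shape to receive gestures.
            box.components.set(InputTargetComponent())
            box.generateCollisionShapes(recursive: false)
            content.add(box)
        }
        .gesture(
            TapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    // Rotate the tapped entity 45 degrees about the vertical (y) axis.
                    let turn = simd_quatf(angle: .pi / 4, axis: [0, 1, 0])
                    value.entity.transform.rotation = turn * value.entity.transform.rotation
                }
        )
    }
}
```

The division of labour is the point of the design. SwiftUI owns layout and gestures, and RealityKit owns the 3D content and rendering.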

In general, existing iPhone and iPad apps run on visionOS without modification, appearing as a window in the Shared Space. The ornament API can enhance the spatial experience by attaching floating islands of UI, such as toolbars, around your app’s window. However, apps that rely heavily on ARKit may need significant changes, because ARKit on visionOS has a substantially redesigned API.
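This is a minimal sketch of the ornament API, assuming one SwiftUI codebase compiled for both iPad and visionOS. On visionOS the controls float below the window as an ornament, and on other platforms they fall back to an ordinary bottom inset. `PlayerView` and its controls are hypothetical.

```swift
import SwiftUI

// Hypothetical player view shared between iPadOS and visionOS targets.
struct PlayerView: View {
    let trackName: String
    @State private var isPlaying = false

    var body: some View {
        Text(isPlaying ? "Playing \(trackName)" : "Paused")
            .font(.largeTitle)
            .padding(40)
        #if os(visionOS)
            // visionOS: float the controls just below the window as an ornament.
            .ornament(attachmentAnchor: .scene(.bottom)) {
                controls
                    .padding()
                    .glassBackgroundEffect()
            }
        #else
            // iPhone and iPad: pin the same controls to the bottom of the view.
            .safeAreaInset(edge: .bottom) {
                controls
            }
        #endif
    }

    private var controls: some View {
        HStack(spacing: 24) {
            Button(isPlaying ? "Pause" : "Play") { isPlaying.toggle() }
            Button("Stop") { isPlaying = false }
        }
    }
}
```

Conditional compilation keeps a single view definition while letting the visionOS build add spatial affordances that would not make sense on a flat screen.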

If Unity is your tool of choice, you can build bounded volumes for the Shared Space as well as immersive content and VR-like apps. The latter give you greater control over rendering but may lack support for ARKit’s advanced scene understanding capabilities, such as plane detection. Unity’s PolySpatial tool is required to convert materials, shaders, and other features so they are compatible with RealityKit’s rendering constraints.
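For comparison, a native visionOS app reaches ARKit's plane detection through an `ARKitSession` and data providers rather than the iOS `ARSession` configuration model. This is one reason ARKit-heavy iOS apps need rework. This minimal sketch assumes the visionOS SDK. `PlaneTracker` is a hypothetical helper, and plane data only arrives while the app has an open Full Space and the user has granted world-sensing permission (`NSWorldSensingUsageDescription` in Info.plist).

```swift
import ARKit

// Hypothetical helper (not an Apple type) that tracks which horizontal
// planes ARKit currently reports.
@MainActor
final class PlaneTracker {
    private let session = ARKitSession()
    private let provider = PlaneDetectionProvider(alignments: [.horizontal])
    private(set) var planeIDs: Set<UUID> = []

    func run() async {
        // Plane detection is not available everywhere, for example in the simulator.
        guard PlaneDetectionProvider.isSupported else { return }
        do {
            try await session.run([provider])
        } catch {
            print("Failed to start ARKit session: \(error)")
            return
        }
        // Each update reports a plane being added, refined, or removed.
        for await update in provider.anchorUpdates {
            switch update.event {
            case .added, .updated:
                planeIDs.insert(update.anchor.id)
            case .removed:
                planeIDs.remove(update.anchor.id)
            }
        }
    }
}
```

A real app would typically start `run()` from a `.task` modifier on the immersive view and use each anchor's transform and classification to place content, rather than only counting planes.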

Building a foundational understanding of spatial computing, especially for a platform like visionOS, is crucial for developers, designers, product managers, and other tech professionals. Spending hours in virtual reality (VR) experiences, navigating different applications, understanding how they work, and noting what works and what doesn’t can significantly aid this understanding. VR applications such as Hand Physics Lab and YouTube VR offer practical insights into designing for spatial computing.

Maintaining a diary to document the nuances of various VR applications can be beneficial. Evaluating apps on parameters such as comfort, user fatigue, intuitive navigation, value proposition, and repeat value can help develop a deeper understanding of successful VR design.

Adopting a methodology such as IDEO’s design thinking can offer significant advantages in the design process. This user-centric approach relies on rapid prototyping and iterative development, with an emphasis on addressing real human needs rather than imagined problems. The Design Kit website and its introductory video are good starting resources.

The user’s physical energy expenditure is an essential design consideration, so spatial computing experiences should minimize how much the user has to move. The ‘form follows function’ principle applies here: the app’s purpose should be reflected in its spatial arrangement and interaction patterns.

For the design phase, paper and cardboard prototyping are recommended. These help determine a design’s ergonomic feasibility before moving to digital tools such as Figma. Tools like ShapesXR let users sketch ideas in space, create storyboards, and collaborate in real time with remote users.

Although expensive, the Varjo XR-3 headset suits high-fidelity prototyping, offering high-quality passthrough, high-resolution displays, hand tracking, world mapping, and more. It can be a useful tool for larger organizations with ample budgets.

When designing for spatial computing, it is crucial to consider the user’s entire sensory system and how the brain integrates these senses. For instance, UI elements should be arranged around the user’s natural line of sight to minimize head and eye movement. Designers must be mindful of the ergonomic and cognitive impact of each design choice.

Where to go from here

The WWDC talk on spatial design is highly recommended for understanding the perceptual and cognitive design constraints unique to spatial computing. A well-designed app should prioritize reducing eye fatigue, discomfort, and motion sickness to provide a comfortable and immersive user experience.

Apple anticipates that the Vision Pro headset will be ready for the market “early next year,” implying a potential spring 2024 release. Initially, it will only be available in the U.S., with other regions scheduled for access “later next year.” A dedicated event is likely to launch the headset, reminiscent of the 2015 Apple Watch unveiling.

The starting price for the Vision Pro is confirmed at $3,499 in the U.S., but Apple has yet to confirm international pricing. Users who need prescription lenses will pay extra for the optical inserts, though the exact cost remains undisclosed. It also remains to be seen whether there will be additional configuration options.

In conclusion, understanding the nuances of visionOS and its spatial computing environment is key for developers to create compelling, immersive applications that maximize user engagement and deliver exceptional experiences.