Vision Pro does a Job(s to Be Done) Interview

This was nearly titled “XR Wars: A New Hope.” You’re welcome.

By now you’ve read an article or two hundred about Apple’s latest novel hardware announcement: Vision Pro. I’m here to talk about what I think is being overlooked elsewhere.

I have heard a few “thought leaders” dancing right up to their own version of then-Palm CEO Ed Colligan’s take on Apple entering the seemingly well-established smartphone market back in 2006:

We’ve learned and struggled for a few years here figuring out how to make a decent phone… PC guys are not going to just figure this out. They’re not going to just walk in. source

Apple, some of these folks are saying, “is just showing up late with a ridiculously expensive version of what everyone else already has, like they always do.”

Apple has entered enough “seemingly well-established markets” (cf. iPad, Apple TV, Apple TV+, AirPods, and perhaps most importantly microchip design) since the iPhone that people know the chances are good it will work out, but the instinct to doubt the Johnny(née Jony, err, Sir Jony, that is)-come-lately is strong. I find, though, that what differentiates Apple’s offerings from those that came before is their answer to the “jobs to be done” question, the question, “what are you hiring this thing to do?”

What made iPhone special?

I owned one of nearly every smartphone that was on offer before the iPhone came out, starting with a Handspring Treo with a cellular add-on (🦖). I was excited about the possibilities of hyper-personal mobile computing. Having followed the Palm/Pocket PC industry since its inception, I’ll admit that Colligan’s quote above sounded right to me, even though this was after the iPhone had been announced and demonstrated by Steve Jobs. I was an Apple fan, and I thought the demo was great, but I didn’t really get it. What would iPhone do that the keyboard-equipped Windows Phone device I was carrying couldn’t? I was thinking of the comparison in a simple set of checkboxes:

Feature           | iPhone | Windows Phones
Mobile browser    | ✔️     | ✔️
Email client      | ✔️     | ✔️
Physical keyboard |        | ✔️

This table obviously didn’t capture as much as it felt like it did at the time. Steve Jobs would later compare the use of iPhones versus Macs to the use of cars versus trucks. In a similar table comparison you might note that both use gasoline, both have wheels, etc. You might note that one has a truck bed and one doesn’t (usually). This analogy is getting less and less useful in the United States, as more people are buying pickup trucks and using them like cars, and more manufacturers are building pickup trucks that are essentially cars, but—at least historically—people hired pickup trucks to do a very different job from cars. If you want something with great fuel efficiency, safety ratings, creature comforts, and the like, you’re probably going to hire a car. If you’re looking for something with a lot of pulling power, a rubberized interior that can be cleaned easily, and, most of all, a truck bed, you’re probably going to hire a truck. The tabular comparison, often called the “speeds and feeds” perspective, misses the spirit of the thing.

The iPhone was different from the smartphones that came before it because it was designed from the ground up to do three jobs:

It was interviewing to replace the iPod. The Windows smartphones could all play MP3s, but none of them would be hired to replace your iPod. They certainly didn’t have enough storage, because they weren’t designed for that purpose, and even if you added enough external storage via a CompactFlash card, there wasn’t good software for browsing large libraries of music. Getting music onto your device, too, was an issue for years. iPods, by that point, had picked up the ability to play video content, too, and Apple was selling movies and TV shows through the iTunes Store. iPhone could play all of that on day one, on a much bigger screen than the iPod or any Windows smartphone had ever offered. The iPhone was a shoo-in to replace your iPod.

It was interviewing to replace your cell phone, too, which seems obvious, until you learn/remember how bad a lot of smartphones of the time were at this job. Many had terrible sound quality, terrible battery life, terrible reception, or all of the above. Few had speakerphone capabilities. No one had ever done anything like what iPhone did with voicemail (a visual inbox where you could browse, listen to, and delete messages).

It was interviewing to be your favorite internet terminal. The other smartphones had web browsers, but none of them held a candle to Safari on the iPhone. They often attempted to render web sites as though the phone were a desktop computer with a large monitor, and you had to pan and zoom around to try to see the site. Safari was the first mobile browser I recall that offered a good “native”-like experience for the web, wherein pages were rendered for the screen you were using, and it felt like it was on purpose. You could smoothly scroll and read many web sites quite easily. It was vastly superior to any mobile browsing experience I’d seen before it.

In table form, iPhone and Windows/BlackBerry/Palm both had cellular capabilities, web browsers, etc. Check, check, check. But that just wasn’t the whole story.

When you know why you’re including a feature—why you’re spending the money on R&D, or on higher-priced parts, or slower and more costly production steps—you know better what features matter and how good they really need to be. Mobile Safari wasn’t so much better than its competitors’ offerings just because Apple liked doing laps around people. It was better because it needed to be if it was going to do the job it was interviewing for.

Within a few generations it became clear that iPhone was getting hired for another job: as your new camera. Other devices had cameras, but most of them were the cheapest, smallest cameras that could be included. Because Apple designed iPhone to be hired instead of your point-and-shoot camera, they spent the time and money to figure out how to fit ever larger, ever more complex lenses into their devices.

These days it seems that iPhone is overall hired to be your computer in your pocket. iCloud was created just for that purpose, I think: to make the decision of “should I do this on my phone or my laptop?” as close to an arbitrary decision as possible (since all your passwords, notes, email, contacts, open browser tabs, files, photos, etc., and all the changes you make to them, are available across all your devices). And it turns out when your computer is in your pocket, it becomes even more personal and powerful. Since it’s designed to be used by just one person, you can surface information in notifications and widgets that you might not want displayed for all to see on a large monitor, and it can use biometrics to know when you are looking at it, to obscure sensitive information until your eyes and your eyes only are on it. It can then be hired to be your new wallet (holding all your credit cards, public transit cards, state license, etc.).

Vision Pro’s CV

So, with that framework in mind, what jobs does Apple think Vision Pro is interviewing for?

  • Privacy – I didn’t think of this one. I heard John Gruber and Matthew Panzarino on The Talk Show talking about this. If your job involves looking at confidential data, this is going to be one of the safest ways to work with it—far better than a “privacy screen” stuck to your monitor or laptop screen. But every one of us has received a text message notification with enough text in the preview that we wish it hadn’t arrived while we were showing our colleague something on our computer screen.
A laptop with a privacy hood attached that obscures the screen for anyone but the user
  • Security – I didn’t think of this one, either, but a friend of a friend who works in the medical field pointed out that his group has been waiting years for something that would provide a VR experience for their doctors that would also meet the security requirements of their IT department. The friend of mine in that chain, who teaches video game design, has had the same problem with his school system’s IT department. He’s been sitting on a number of Oculus Quest headsets that have gone (mostly—he’s crafty) unused because they don’t meet the requirements for security.
  • Focus – I think of the virtual Mt. Hood environment shown in the demos as an optical equivalent of noise-cancellation features. If you’re in a noisy environment (which could be your attempt at working from home, a busy coffee shop, or even your desk in a traditional office), combining AirPods Pro in active noise-canceling mode with the isolation mode in Vision Pro seems like a game-changer. And I imagine this is even more helpful for neurodivergent folks with ADHD, stimulus sensitivities, etc. It would also help those of us who are finding the return to office challenging.
Soundproof booths that are hired for the same job that Vision Pro might be—isolating you from a noisy environment
  • Mobility – What I see when I watch a demo of someone working at a virtual 4K monitor, using their gaze and subtle pinch gestures in place of a mouse or trackpad, is someone who doesn’t need to carry around a big, honkin’ 4K monitor, a mouse/trackpad, and all the wires and stands and such needed to make them work. I also see someone who can use their computing device in more environments (including cramped into a small seat on a plane or bus where having your laptop in your lap or on a tray table wouldn’t be very comfortable) and possibly with less stress on their body (the gaze/pinch input is a lot easier on your arms, at least, than holding your arm up for several hours using a mouse/trackpad). You can take your virtual desk—complete with its 4K monitor and scenic view—with you anywhere you go (for at least a couple hours before your battery dies).

So, if you’re hiring Vision Pro to replace your office, what kind of price tag would you put on that? Pricing out just the rather large 4K monitor you’re getting in your virtual office alone, you’re in territory beyond that of something like the Oculus Quest. If you’re hiring Vision Pro to do the job of one of those office sound booths? Those things start at around $6,000.

If you’re hiring the device to be a gaming console, the thing you’re replacing probably only cost a few hundred dollars. It’s going to be hard for you to justify spending an order of magnitude more. That roughly $300 price point, then, is going to influence decisions like what kind of screen you can afford to include in your device. There are virtuous and vicious cycles here that keep the Oculus Quest and its kind in relatively low-resolution screens and Vision Pro using screens that users describe as “nearly impossible to differentiate from being in the room.” When you’re interviewing for the job of “workspace,” you know that means potentially 8 hours of continuous use. You know you need to design something that doesn’t trigger nausea due to rendering latency. You build a device with numerous sensors and cameras and a second, very capable processor dedicated to the job of capturing and rendering the user’s environment with such accuracy that they don’t even feel like they’re wearing a headset. Your competitors used the resolutions of their inferior screens in their marketing copy. You don’t even mention real numbers, because that’s not the point. Vision Pro isn’t being hired to be a high-resolution monitor; it’s being hired to be invisible to the user. When professionals in their place of work are hiring your device to make their jobs easier, you design ways to ensure your technology only gets in the way when it should. You design ways for the user to remain present in the room—like the front display that shows the user’s eyes. You design ways for the user to remain aware of their environment even when trying to stay focused on their work—like the features in which people trying to make eye contact with you and/or speak to you will “punch through” your virtual experience in Vision Pro. And you can price it in such a way that you can provide that experience, because you knew what jobs you were interviewing for.

Missing from all of this? Gaming. I don’t doubt that there will be games available for Vision Pro, and Apple may even dedicate a corner of Apple Arcade to them. But it’s clear that Vision Pro is not interviewing for your PlayStation’s job. Part of that may be because this is Vision Pro. Apple doesn’t promote MacBook Pro models with gaming use-cases, either, but you will see them promoting gaming on iPhone, iPad, Apple TV, and even some Macs (never Pro lines, to my recollection). Perhaps a forthcoming Vision model that isn’t in the Pro line will see game developers on stage at WWDC talking about how easy it was to make with Metal, RealityKit, et al. But as with iPhone, iPad, etc., I’d be willing to bet Apple still won’t be aiming to replace your console, and therefore won’t be designing the device for gaming. The difference between Apple designing a device meant to be hired to replace your game console, and designing a device that can also do gaming? Look to joystick support. If Apple meant for any of their devices to be hired as a gaming console, they would have released their own joystick (or perhaps something hired to do the job of a joystick). Instead, we see Apple slowly adding support for other vendors’ joysticks. I’d expect the same for the Vision line.

Where it breaks down…

And then there’s the birthday cake scene. From my read of people I’ve spoken to and voices I’ve heard on podcasts and blogs, this part of the demo felt very different from the rest of it. It was weird. People could justify it afterward (many compared the headset-donning father to one carrying a giant camcorder in the 80s/90s), but it certainly didn’t feel right during the demo. I think it’s precisely because this is not a job that anyone intuitively wants to hire this device for. No one is saying, “I sure wish there was a helmet I could wear to my kid’s birthday party…” no matter what magical powers that helmet might convey. Will people do it? Sure! This was Apple doing what it has historically been so good at avoiding: searching for other problems that could be solved with this new bit of technology. Yes, when you have this array of sensors and cameras you can record an incredible first-person perspective video. And, yes, watching those recordings will probably be amazing at least some of the time. But this is not a problem that people have, yet, and therefore no one is hiring for this position, and so this part stuck out like a sore thumb.

a man watches two children playing while wearing a Vision Pro headset
