The Next Generation of Communications Tech: Q&A With Voxer’s Tom Katis

The dawn of push-to-talk systems

When I was invited to attend the amazing MaiTai event in Cabarete, Dominican Republic (serious #humblebrag going on right there), the organizers of the event established that Voxer would be the communication platform of choice for all the attendees. Very quickly the group started “voxing,” illuminating the potential of this app: people were leaving voice messages, texts and photos. 

Information about the event was delivered as needed and questions answered in close to real time. From kiting weather reports to random queries like “Has anyone seen my eye patch?”, people were able to communicate to the group and at the same time not be overwhelmed with the volume. Very cool.

Voxer could be a very useful tool for communicating with an ad operations team. Instead of having to interrupt everyone with conference calls, I can vox out my message (text or voice depending on what I need to say and how I want to say it) and get a reaction as soon as people are able to respond.

One of my fellow attendees at the MaiTai event was Voxer CEO Tom Katis, and I asked if I could interview him for the site… using Voxer. Conducting an interview proved a perfect application of the tool: Tom and I were able to leave messages as time permitted over the course of a few days – no email with standard questions, no frustrations with trying to book a call.

The time delay also let me think about Tom’s answers and prepare better follow-up questions. Tom was free to respond when he was available, as well as leave recorded messages rather than typing out his answers. Going forward, Voxer will be my go-to application for conducting interviews.

Could you tell us about Voxer and what you are hoping to accomplish?

The goal of Voxer is to roll out something that works a little differently than everything else, a new communication type. That’s hard. It’s not just a free or mobilized version of something that already exists.

Sure, it can replicate phone calls or text messaging. But Voxer is a next-gen push-to-talk system. It doesn’t work like any previous communication platform. Prior to Voxer, systems were either live-only, like Nextel or old-school walkie-talkies, which have to blurt out and interrupt you or they don’t work, or they were messaging systems that can never be live: for however long one person is speaking, the other person can’t listen. So they’re completely asynchronous and non-progressive.

Voxer is this hybrid that’s sort of a low-latency, progressively moving forward system. And so any time you’re rolling out something that’s new, with any new user behavior, it’s going to be harder than just rolling out a new version of something that existed before. But that’s also what makes it exciting, because of all the new use cases and bringing a new tool into the mix.

It’s not quite phone calls, it’s not quite old-school push to talk, it’s not quite IM or voice mail or any of these other things – it’s a new thing. We’re excited about it, and we can make organizations more productive and efficient. We know that businesses are willing to pay for a system that makes them better.

My understanding is the idea of Voxer came from your experiences in Afghanistan. What communication problem were you trying to solve?

The thing that drives me nuts about military communications is that you need something that’s highly responsive and something that quickly gets you the response that you’re looking for. You need something that’s live. You wouldn’t want to be getting shot at and leaving a voicemail or being on hold.

Live systems up until Voxer were only live. A key element of push-to-talk radio is you have to be interrupting someone; it has to blare out and they have to deal with it right then and there. But any time you’re doing anything, whether it’s military or business or otherwise, you’re never just dealing with one thing. And the more stressful things are, the more complex they are.

In a work environment you’re negotiating a deal and you’re dealing with your own internal team and you’re dealing with lawyers and you’re dealing with their counterparts and with all sorts of stuff. In the military you’re getting shot at or you’re getting a call for medevac or you’re getting a call for service support and you’ve got to talk to your commander, or you’ve got a quick reaction force inbound, you have other units in the area… So when you have a system that’s live-only that has to interrupt you, you basically can’t multitask, or you can’t multitask effectively.

You basically have to turn off everything except for one individual group that you’re talking to, because you only have one brain, and if five people are talking to you at the same time, you cannot comprehend it. And so you have to serialize everything, which means you miss stuff. If I’m trying to talk to somebody, I first of all have to know which channel they’re going to be on, and what happens if they flip to a different channel to talk to somebody else and we’re just endlessly looking for each other?

The same holds true if somebody’s trying to contact me – for example, my commander’s trying to get an update, but I’m busy talking to air support, and when somebody tries to contact me, I have no knowledge that they even tried. There’s no messaging – it just disappears. 

To be effective and productive, you need to use a messaging system. The messaging systems are elegant, they’re non-intrusive. They allow you to multitask and prioritize. You can get 100 emails and you can scan them and pick the one that’s most important to you and respond to that first. You can have five chat windows open in instant messaging and focus on the one that’s most important. Periodically, when you have a couple seconds or a minute, you can go through and see what you missed in the less important ones, deal with those and then get back to the important one. Our messaging systems allow you to prioritize and to multitask, and live systems really don’t. You have to only deal with what is urgent. 

But people hate to wait. In a military or a police environment or fire or the like, real time is critical because time can mean lives. But it’s more than that. In general, not only businesses, but even just normal consumers, hate to wait. Everybody wants the instant gratification. 

And if you force asynchronous action for as long as I’m speaking and you can’t start listening until I’m done speaking, the way voicemail works, then it’s not really a conversation. Then it’s a series of disjointed messages. And sometimes you want it to be fully asynchronous. For example, right now I just spent the last two hours in a meeting and I wasn’t able to respond to you, but if this were live-only push to talk, I wouldn’t be able to respond later. You would have just gotten an error message or you would have just gotten nothing from me.

The idea that nobody has to wait is also super critical, because that turns a messaging system into a conversational system and provides that immediacy and response time that’s so critical. 

The military is pushing IP-based communication out to the battlefield. It’s not ubiquitous yet, so rather than tackling the obstacles of trying to figure out how to roll a military system out to a world with an imperfect network and deal with the government bureaucracy and all that stuff, it makes so much more sense for us to focus on the consumer first – roll out a free app and gain adoption. 

The next step is enterprise and businesses, where everybody’s just using the same smartphones and the same networks as consumers, but they want to pay for the control that every business needs. If you were to work for a bank or a tobacco company, they’re not going to say “Oh, just use Gmail.” They want you to use their own email – even though their email is no better than free Gmail, they need that control.

After that, we want Voxer to be used by the police, fire departments and the military. At the same time it’s a very broad application that can be used in a lot of different environments, and it solves many different use cases. It fills a gap between live-only communication and asynchronous messaging-only communication, and that’s what we’re all about.

You’ve built a utility and advertising is going to get in the way, but I’m assuming that while you thought through revenue models, advertising was probably something you considered.

I think that advertising has been the default business model for the last 15-plus years since the dot-com boom simply because there weren’t a lot of other good business models.

The entire advertising market globally is $500 billion a year, maybe a little bit more than that, including TV, newspapers, online, everything, billboards, etc. The online portion of that is $100 billion a year, and Google has about half of that, and everybody else – between Facebook and whoever else – is all fighting over the other half. So, they’re fighting over about $50 billion.

The over-the-top communication market that we are going after along with all these other apps is about a trillion and a half dollars a year, and all of it’s online. All of it is subject to disruption. There’s a gigantic market. So advertising will be one type of revenue model within communication apps, but it will by no means be the only one, and I don’t believe it will be the largest.

We foresee more of a subscription model with our business customers. We’re going after the high end. There are others that are going after advertising, distribution, etc. There will certainly be a place for it.

We are considering at some point rolling advertising into our free consumer app and therefore providing motivation for someone to actually pay for the pro version just to remove the ads. That would be one way of monetizing the free version. Otherwise it’s just a loss leader for us.