In previous roles several years ago, I was involved with IBM's work on WebSphere and SIP. Things have advanced significantly since then; among other things, IBM has added WebRTC support to WebSphere. WebSphere is used by some of the largest enterprise communication companies, such as Avaya, and by large service providers such as AT&T Enterprise Business for managing and supporting eCommerce and Portal applications. More recently, it has been used in AT&T's telco network to deliver numerous SIP-based services. It will be interesting to see how the new WebRTC additions and the Rtcomm open source strategy IBM is pushing will be received in the market.
I wanted to delve more into IBM's work on WebRTC, so I contacted Brian Pulito, a Senior Technical Staff Member and WebSphere Real-Time Communications Architect.
What is IBM’s WebSphere Application Server all about and who uses it?
The WebSphere Application Server is one of the leading Java EE (Java Enterprise Edition) servers on the market. Companies all over the world, including a large percentage of Fortune 500 companies, build and deploy web and mobile business applications on it. WebSphere is used for everything from providing business logic around transactional Systems of Record to delivering the newer Systems of Engagement that have been the big growth engines for business over the last several years. More recently, the WebSphere Liberty platform, which is a lightweight, composable application server, is being used to enable cloud-based microservices.
SIP Servlets have been a core part of the WebSphere SIP server for many years. Companies like AT&T and Avaya deliver their flagship services and products on top of WebSphere's SIP Servlet container. In September of 2015, WebSphere released SIP Servlets and a new WebRTC feature for the WebSphere Liberty profile, bringing real-time communications to a platform that was designed from the ground up to be simple to develop on and easy to deploy in the cloud. Just as an example, a WebSphere Liberty server configured to support WebRTC, along with a SIP application, can be deployed in a file of less than 100 MB that contains the complete runtime. It also starts up in around 5 seconds, which is perfect for development.
So what is WebSphere middleware for real-time communications? First, it's a scalable and secure signaling server for building WebRTC and SIP services. It includes a WebRTC gateway, allowing it to be federated with any SIP service such as SIP trunking. It supports the SIP Servlet (JSR 289) programming model, integrates with media servers using the Media Server Control API (JSR 309), and comes with a number of out-of-the-box capabilities to help developers get started, like a built-in user registry, call queues and support for Third Party Call Control. Open source WebRTC SDKs for web and mobile development can be downloaded from GitHub using Bower and NPM. Best of all, WebSphere Liberty is free to download and develop on, and can even be deployed in production at no cost for single server deployments.
You added support for WebRTC. How was this done? Does it simply have a WebRTC interface (leg), or is there more to it?
When we started looking at WebRTC, we decided that we wanted to create a feature for WebSphere that would appeal to web and mobile developers who are not strong in telco protocols and services. Let's face it, IBM is not known for telco networking equipment and products. We are about business systems, and when you look at WebRTC, you realize there is a huge opportunity to finally combine real-time communications with context in a way that was just not possible prior to WebRTC. There are many vertical industries that could and probably would use WebRTC today if there were an easy way to integrate it into their business applications. In a nutshell, that is the focus of what we are doing with the new WebSphere Liberty Rtcomm feature, and that is why we are promoting the Rtcomm open source strategy, which includes not only WebSphere but Node.js and lots of other open source capabilities for the client and server.
So let's drill a little deeper on this point. Real-time communications is really hard for the average JavaScript developer. While it's easy to get a simple demo off the ground, extending it out to a production deployment is much more difficult. You have to deal with things like STUN/TURN, media servers, high availability, load balancing, federation, and the list goes on.
To make this easier we decided (like a lot of other companies) that we needed to provide SDKs on the client side that made it super easy for JavaScript developers to build solutions that were both open and could easily integrate with our backend solutions built around WebSphere. We also wanted our SDKs to look and feel like a modern JavaScript library. That means it's easy to install through Bower and NPM, it supports JavaScript frameworks like AngularJS, etc. We also defined an open signaling protocol that is extensible and built on JSON, which is well suited for JavaScript development. This signaling protocol is built on a very simple protocol called MQTT. In fact, even though WebSphere provides all kinds of additional features on the backend that you would likely need for any production deployment, you can do peer-to-peer WebRTC calling with just the open source client SDKs and any standard, open MQTT broker. We've recently been testing a complete open source solution based on the Node.js Mosca MQTT broker and Rtcomm. With those open source technologies you can literally have an end-to-end WebRTC system up and running in under an hour.
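As a rough illustration of that open source path, here is a minimal sketch of standing up a Mosca broker in Node.js that could carry this kind of signaling traffic. The ports and options are illustrative, not Rtcomm defaults.

```javascript
// Minimal sketch of an MQTT broker using the open source Mosca package.
// Ports and options are illustrative only.
var mosca = require('mosca');

var server = new mosca.Server({
  port: 1883,                          // plain MQTT for devices and servers
  http: { port: 3000, bundle: true }   // MQTT over WebSockets for browser clients
});

server.on('ready', function () {
  console.log('MQTT broker ready to carry WebRTC signaling');
});

server.on('published', function (packet, client) {
  // Offers, answers and ICE candidates all flow through broker topics as JSON.
  console.log('signaling message on topic', packet.topic);
});
```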
Early on we decided that SIP was not a great solution for browsers or even mobile applications. While I've worked with SIP for many years and absolutely think it's a necessity when it comes to backend federation with the rest of the world, any protocol trying to be all things to all people gets bloated after a while, and SIP is no exception. Because of this and other reasons, we decided to build our WebRTC solution on top of MQTT. MQTT is an extremely simple pub/sub protocol designed for low-powered devices. There are open source MQTT clients for every major programming language along with open source MQTT brokers, and IBM has products like MessageSight that can scale into millions of transactions per second and handle enormous numbers of client connections. The scalability of MQTT is off the charts.
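To show what signaling over a pub/sub protocol looks like from the browser, here is a hedged sketch using the MQTT.js client and the standard WebRTC API. The topic names and message format are hypothetical, not the actual Rtcomm signaling protocol.

```javascript
// Hypothetical browser-side WebRTC signaling over MQTT (not the Rtcomm protocol).
// Assumes the MQTT.js browser build is loaded as the global `mqtt`.
var client = mqtt.connect('ws://broker.example.com:3000');  // MQTT over WebSockets
var pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.example.com' }] });

client.subscribe('calls/alice');                            // this client's inbound topic

// Publish ICE candidates to the remote peer's topic as they are gathered.
pc.onicecandidate = function (event) {
  if (event.candidate) {
    client.publish('calls/bob', JSON.stringify({ type: 'candidate', candidate: event.candidate }));
  }
};

// Route inbound JSON signaling messages into the peer connection.
client.on('message', function (topic, payload) {
  var msg = JSON.parse(payload.toString());
  if (msg.type === 'answer') {
    pc.setRemoteDescription(new RTCSessionDescription({ type: 'answer', sdp: msg.sdp }));
  } else if (msg.type === 'candidate') {
    pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
  }
});

// Start a call: create an offer and publish it to the remote peer's topic.
pc.createOffer({ offerToReceiveAudio: true, offerToReceiveVideo: true })
  .then(function (offer) { return pc.setLocalDescription(offer); })
  .then(function () {
    client.publish('calls/bob', JSON.stringify({ type: 'offer', sdp: pc.localDescription.sdp }));
  });
```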
Since MQTT was also born out of the Internet of Things revolution, integration with IoT devices was something we were very interested in. For instance, there is a super cool open source project for Node.js called Node-RED that provides a UI for wiring together IoT devices with services. My team created a Third-Party Call Control node for Node-RED that makes it possible to trigger a WebRTC call from an MQTT event published from any device in the world. This allows WebRTC to move beyond peer-to-peer RTC browser solutions to include everything from home automation integration to surveillance systems that rely on cognitive services for crowd detection.
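The device side of that scenario can be as small as a single MQTT publish. The sketch below assumes a hypothetical topic and payload that a Node-RED flow ending in such a Third-Party Call Control node would be wired to listen on.

```javascript
// Hypothetical IoT trigger: a device publishes an event, and a Node-RED flow
// subscribed to this topic (ending in a Third-Party Call Control node)
// places the WebRTC call. Topic and payload shape are illustrative only.
var mqtt = require('mqtt');
var client = mqtt.connect('mqtt://broker.example.com:1883');

client.on('connect', function () {
  client.publish('home/frontdoor/motion', JSON.stringify({
    device: 'camera-01',
    event: 'motion-detected',
    callee: 'security-desk'   // the endpoint the 3PCC flow should connect
  }));
  client.end();
});
```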
As for which backend programming model we expose for developing services for WebRTC clients, we decided that it was important to expose API access to WebRTC signaling on both WebSphere Liberty and Node.js. On Liberty we provide a gateway to SIP servlets; it just didn't make sense to expose one programming model for SIP and another for WebRTC. That means we stay simple at the client, where things like forking, proxying and back-to-back user agents are not needed, but we provide those deep capabilities on the server side, where they are needed to create advanced signaling services. On Node.js we currently support an API for Third Party Call Control and an API for catching WebRTC related events. Since Node.js is so popular with WebRTC developers, it only makes sense for us to continue to expand that programming model to include a signaling API for service development.
How are you handling WebRTC on the mobile side?
As is the case with web applications, our primary goal with mobile is to enable customers to capitalize on the inherent context of mobile apps. Everyone can already chat or make audio or video calls with dedicated apps on their mobile devices, but things get much more interesting when these real-time capabilities start being embedded directly alongside that context.
For example, insurance companies can build audio/video into their app using WebRTC and provide context, user identity, and location information to a service rep on the backend. If there is an emergency or you need to file a claim, all the relevant information can be provided to the agent without them having to ask.
This can extend further into healthcare apps, where a user can make an emergency call, log vital signs and symptoms, and talk directly to a nurse who can pull up their medical history based on their identity. The nurse can visually assess many things in addition to talking with the customer.
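As a hedged sketch of what "providing context" can look like in code, the app bundles identity, case, and location data it already holds into the call setup message, so the agent sees it without asking. The payload shape, topic, and helper objects below are hypothetical, not Rtcomm APIs.

```javascript
// Hypothetical call setup message carrying application context alongside the
// WebRTC call. currentUser, activeClaim and signalingClient stand in for
// whatever the real app already has; they are assumptions, not Rtcomm APIs.
navigator.geolocation.getCurrentPosition(function (position) {
  var callContext = {
    type: 'call-setup',
    userId: currentUser.id,              // identity from the signed-in session (assumed)
    claimNumber: activeClaim.number,     // e.g. the open insurance claim (assumed)
    location: {
      lat: position.coords.latitude,
      lon: position.coords.longitude
    }
  };
  signalingClient.publish('agents/queue', JSON.stringify(callContext));
});
```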
The real issue is reducing the friction that is inherent in most business interactions. Embedding WebRTC in mobile applications can substantially improve customer loyalty.
Our approach is to provide ways for people developing apps to enrich them with WebRTC and to be innovative in their use cases, by giving them an enterprise-ready RTC infrastructure and a simple mobile SDK.
What approach did you take for running WebRTC on mobile and what alternatives did you look at before making this choice?
There are two important issues that must be addressed when thinking about adding WebRTC to a mobile application. First, how are you going to handle signaling? And second, do you want to maintain your own native WebRTC builds for iOS? This has some influence on the Hybrid vs. Native discussion, and for us the first step in the process was to build a Hybrid mobile solution.
We use WebSockets and MQTT for signaling, which consume minimal power on mobile devices. Recent changes in iOS have made WebSockets even more attractive. Going hybrid allowed us to reuse our JavaScript API to provide a simple interface to the signaling and WebRTC. In order to 'go Hybrid' we needed to decide between building our own plugin and using an open source one. We tried several options and settled on a Cordova plugin called iOSRTC. It is well maintained, current, and has quite a few users. It also implements the same API that browsers provide for WebRTC, which made plugging in our JavaScript library very easy. Other plugins we tried had fewer users and slightly different APIs that complicated our development. We open sourced a mobile sample application built on angular-rtcomm (an Angular.js module for WebRTC) and Ionic that you can take a look at here.
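For readers curious what that plugin choice buys, here is a minimal sketch: once cordova-plugin-iosrtc registers the browser-standard WebRTC globals on deviceready, the same JavaScript written for the web runs inside the hybrid app. The surrounding code is an assumption about app structure, not our sample's exact code.

```javascript
// Minimal sketch: register the standard WebRTC globals provided by
// cordova-plugin-iosrtc, then use ordinary WebRTC code as on the web.
document.addEventListener('deviceready', function () {
  if (window.cordova && cordova.plugins && cordova.plugins.iosrtc) {
    cordova.plugins.iosrtc.registerGlobals();   // getUserMedia, RTCPeerConnection, ...
  }

  navigator.mediaDevices.getUserMedia({ audio: true, video: true })
    .then(function (stream) {
      console.log('got local media with ' + stream.getTracks().length + ' track(s)');
      // Hand the stream to the same signaling / RTCPeerConnection code the
      // browser version of the app already uses.
    });
}, false);
```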
In our initial experimentation phases, we did do some native work with Android, but still elected to stick with a hybrid solution to keep things simple. For the same reason, we did not go down a native path for iOS either. We just didn't want to invest in keeping a version of WebRTC current for iOS. Going native would also require us to rebuild our signaling stack for each platform we chose to support. In the end, going 'native' is a trade-off: more developer resources in exchange for avoiding a UI that feels more like a web application than a native app. There have been so many great advances recently with HTML5 that you can do some extremely powerful things now with hybrid applications and reuse the investment in many different ways.
What do you view as the main value WebRTC brings to WebSphere and to your customers?
This one is simple. WebRTC allows you to contextualize real-time communications. This is no secret. The question is what is the best way to achieve that. For any business out there looking to enhance their Systems of Engagement with real-time communications, WebRTC can’t be ignored. This is especially true if you accept the fact that native plugins in browsers will soon be a thing of the past. Since WebSphere is all about delivering the business logic that drives web and mobile applications, it makes sense to use that same foundation to add real-time communications to those same business applications. The fact that WebSphere already has a long history of providing signaling services for tier one service providers like AT&T means signaling is part of our DNA and extending the reach of that to support WebRTC is a natural evolution of the platform.
Looking into the future, what are your next plans for WebSphere and for the WebRTC side of your implementation?
Well, IBM has a pretty strict policy when it comes to discussing future plans but I think our future direction is pretty clear if you look at where IBM as a company is heading with our recent public announcements.
First is cloud. There is very little IBM does these days that does not start with cloud. IBM has been working through a major transformational shift for many years now to move everything we do to the cloud and WebSphere Liberty is no exception. The WebSphere Liberty runtime is already available in IBM’s Platform as a Service offering called Bluemix. We run several WebRTC demos out of Bluemix today using this service. The challenges that an average web or mobile developer is faced with when it comes to WebRTC go way beyond programming. Deploying and managing the complex infrastructure to support RTC capabilities is something many companies just won’t be able to justify. It’s pretty easy to connect the dots on this one.
Second is cognitive. IBM recently announced the dawn of the cognitive business, and it's been investing heavily in this area for many years. Watson services are at the core of this effort, and if you want to see the kinds of cognitive services you can access today in the cloud, check out the catalog tab at www.bluemix.net. I view real-time media as big data, and WebRTC is a rich source for that data. If you assume that IBM will leave no stone unturned when it comes to cognitive, it's pretty easy to imagine cognitive solutions that encompass real-time communications for things like language translation, sentiment analysis, fraud detection and all sorts of real-time business insights.
Third is Node.js. As mentioned earlier we’ve already been integrating our WebRTC solution with Node.js through Node-RED and I believe this trend will continue. IBM is now a Node.js Foundation Member, which means we are all in when it comes to Node as a middleware platform. I love Node.js and I’m a firm believer that Node.js and WebSphere Liberty both have their place when it comes to middleware development. We will continue to play to the strengths of each of those middleware platforms as we move forward.
An important part of our future strategy is the proliferation of the Rtcomm ecosystem. While WebSphere is an important part of that, it is only one piece of the puzzle. Rtcomm is an open technology that should not be perceived as vendor specific. In this day and age you have to think beyond individual products and services to succeed. Developers looking to invest in a new technology want to make sure it's future-proof. That's why IBM is taking a 360-degree approach to WebRTC development that embraces not just WebSphere but lots of open source technologies like Node.js, Angular.js, Paho MQTT, Mosca and the list goes on. I see lots of developers on Stack Overflow really struggling with how to get started with WebRTC. The future of Rtcomm is to continue providing a simple path for the average JavaScript developer to quickly get moving with WebRTC, while for those developers that need to go deeper, Rtcomm will continue to expand its already rich set of capabilities to support even more advanced use cases.