At the recent IETF meeting in Prague, one of the hot topics was RTCWEB. The short explanation of RTCWEB (Real-Time Communication over the web) is that the browser should become a full audio/video client. To do that, the browser needs access to hardware and OS resources so that a full, high-quality video client can run live inside it. This is of course something that Google (Google Talk) and Skype are very interested in enabling.
More controversially, the second interface makes for more interesting discussions and possibly strategic consequences. The RTC client running inside the browser needs to communicate with a server or network to make and receive calls. It needs to register somewhere (“here I am, I’m ready to receive calls!”) and it needs to be able to make calls (“please connect me with the following address!”). The question is: should this interface be standardized, or should the RTC client just communicate with the central server using a proprietary protocol? The argument for the latter is that it will stimulate innovation, and that the above API and the way browser-to-browser media sessions are set up are the only elements that should be standardized. However, mechanisms like ICE are likely to be useful in setting up a session, and there will be media parameters etc. that must be communicated as part of that setup. So what has already been solved using SIP must be mapped into the protocol between the RTC client and the server that connects the calls, whether that protocol is proprietary or not. Although I’m sympathetic to the argument that using SIP between the RTC client and the server creates a lot of overhead if you only need a small part of it for your application, it seems a bit unnecessary that every RTC client implementer has to come up with a new way of setting up a session and exchanging the necessary parameters.
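To make the two operations concrete, here is a minimal sketch of what a light-weight, non-SIP exchange between an RTC client and the server could look like. Everything in it is an assumption made for illustration: the JSON message shapes, the `SignalingServer` class, the `register` and `call` helpers, the example addresses and the codec names are hypothetical and do not come from any RTCWEB draft. The server is an in-memory toy with no network transport and no ICE handling.

```python
import json


class SignalingServer:
    """Toy in-memory stand-in for the server that connects calls (hypothetical)."""

    def __init__(self):
        # address -> list of messages delivered to that registered client
        self.registered = {}

    def handle(self, raw):
        """Process one JSON message from a client and return a JSON reply."""
        msg = json.loads(raw)
        if msg["type"] == "register":
            # "Here I am, I'm ready to receive calls!"
            self.registered.setdefault(msg["address"], [])
            return json.dumps({"type": "ok"})
        if msg["type"] == "call":
            # "Please connect me with the following address!"
            inbox = self.registered.get(msg["to"])
            if inbox is None:
                return json.dumps({"type": "error", "reason": "not registered"})
            # Forward the caller's media parameters to the callee.
            inbox.append(
                {"type": "incoming", "from": msg["from"], "media": msg["media"]}
            )
            return json.dumps({"type": "ringing"})
        return json.dumps({"type": "error", "reason": "unknown message type"})


def register(address):
    """Build a (hypothetical) registration message."""
    return json.dumps({"type": "register", "address": address})


def call(caller, callee, media):
    """Build a (hypothetical) call request carrying the offered media parameters."""
    return json.dumps({"type": "call", "from": caller, "to": callee, "media": media})


if __name__ == "__main__":
    server = SignalingServer()
    print(server.handle(register("bob@example.com")))
    offer = {"audio": ["PCMU"], "video": ["H264"]}
    print(server.handle(call("alice@example.com", "bob@example.com", offer)))
    print(server.registered["bob@example.com"])
```

Even this toy has to invent its own addressing, error replies and media description, which is the part SIP has already worked out.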
Pondering this a bit, I get the feeling that leaving the RTC client-server interface unstandardized would spur innovation for small, lightweight applications, but that any real audio/video/IM client implementation would still require quite a bit of work to get right. In reality this favors larger vendors with lots of resources. Since you can probably choose to adopt a lightweight client/server protocol even if there is a full SIP stack in the browser (or at least, it should be possible to design it in a way that makes this possible), I don’t really see the downside (beyond standardization body cycles) of standardizing how SIP should work between a browser and a web server, probably over port 443/80.
So, what are the consequences for companies like Cisco? We would really welcome video capabilities within the browser. For example, our PrecisionUSB camera could then be used as a high-quality video source. Of course, having a large install base, we would like to ensure compatibility with existing SIP systems and avoid having to interoperate through a gateway. Gateways are always expensive (especially media gateways) and reduce the functionality that can flow across them. I would love to see the possibility of creating a browser-based client, but if the quality ends up as a lowest common denominator (i.e. it is not possible to add codecs to improve quality or to extend beyond the APIs/protocols to improve functionality), I feel we haven’t gained much. Indeed, enabling a basic real-time communication experience through free services would be beneficial to everybody, but if we do it in a way that allows high-school students as well as large corporations to innovate on top of it, I believe we have much more to gain.