Object-Oriented Television
nicholas.negroponte
wired 4.07

The Media Lab's Michael Bove believes that a television set should be more like a movie set. But movies require locations, actors, budgets, scripts, producers, and directors. What would it mean, Bove wonders, if your TV worked with sets instead of scan lines?

Sets and actors
For too long, TV has taken its lead from photography, which collapses the three-dimensional world onto a plane. Except for the image-sensing mechanism attached to the back, today's TV camera is very similar to a Renaissance camera obscura. This long-standing construct is perhaps the wrong way to think about television. Maybe there is a way to capture the scene as a collection of objects moving in three dimensions, rather than as a single viewpoint on it. Think of it as a computer graphics process, more like Toy Story than Seinfeld.

The networked virtual reality language VRML has such a model behind it. But it's difficult to author good virtual worlds from thin air, so there aren't any out there on the Web that are as funny as Seinfeld or as appealing to the public as college basketball. What we need is "real virtuality," the ability to point a computer or camera at something and later look at it from any point of view.

This is particularly important to Hollywood, because most of the cost of a movie is in front of the camera, not behind it. Object-oriented television should cost less both in front of the camera and behind it, and it shouldn't look cartoonlike. It will still involve cameras, but instead of giving the postproduction people (or the viewers of an interactive program) a switch that flips between cameras one and two, these cameras will contribute what they observe to a database from which any viewpoint can be constructed.
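As a rough sketch of what such a database might mean (the scene, camera model, and object names below are invented for illustration, not the Media Lab's actual system), consider a toy scene of point objects that any number of cameras could contribute to, rendered from an arbitrary viewpoint with a simple pinhole projection:

```python
import numpy as np

# A shared scene database of 3-D point "objects." Any camera can add to it;
# any viewpoint can be rendered from it. Positions are hypothetical (meters).
scene_db = {
    "anchor_desk": np.array([0.0, 1.2, 5.0]),
    "anchor":      np.array([0.3, 1.5, 5.2]),
    "weather_map": np.array([-1.0, 1.8, 6.0]),
}

def render(viewpoint, focal_length=1.0):
    """Project every object onto the image plane of a virtual camera at `viewpoint`."""
    frame = {}
    for name, position in scene_db.items():
        rel = position - viewpoint            # object position relative to the camera
        if rel[2] <= 0:                       # behind the camera: not visible
            continue
        x = focal_length * rel[0] / rel[2]    # perspective divide
        y = focal_length * rel[1] / rel[2]
        frame[name] = (x, y)
    return frame

# The same database yields camera one, camera two, or any viewpoint in between.
print(render(np.array([0.0, 1.5, 0.0])))
print(render(np.array([2.0, 1.5, 1.0])))
```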

Similarly, TV sound should be object-oriented. Instead of left and right channels, sound can be represented as individual sources in an acoustically modeled space, so that on playback we can resynthesize what comes out of the speakers to match the arrangement of things on the screen and the viewer's path through them.
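A minimal sketch of the idea, assuming each source is stored as a dry mono signal plus a position in the modeled space; playback then mixes a stereo image for wherever the listener happens to be, instead of shipping fixed left and right channels (the attenuation and panning rules here are the crudest possible stand-ins):

```python
import numpy as np

def mix_for_listener(sources, listener_pos, sample_count=48000):
    """Mix (signal, position) pairs into a stereo pair for this listener."""
    left = np.zeros(sample_count)
    right = np.zeros(sample_count)
    for signal, source_pos in sources:
        offset = source_pos - listener_pos
        distance = max(np.linalg.norm(offset), 0.1)
        gain = 1.0 / distance                        # simple inverse-distance attenuation
        pan = np.clip(offset[0] / distance, -1, 1)   # -1 = hard left, +1 = hard right
        left += signal[:sample_count] * gain * (1 - pan) / 2
        right += signal[:sample_count] * gain * (1 + pan) / 2
    return left, right

# Placeholder noise stands in for real recordings: an actor at stage left,
# a door slam at stage right.
actor = (np.random.randn(48000) * 0.1, np.array([-2.0, 0.0, 3.0]))
door  = (np.random.randn(48000) * 0.1, np.array([3.0, 0.0, 4.0]))
stereo = mix_for_listener([actor, door], listener_pos=np.array([0.0, 0.0, 0.0]))
```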

The bit budget
TV is a bandwidth pig. Ten years ago, a common assumption was that 45 million bits per second were needed to obtain studio-quality television. Today, that level of performance is possible at 4 million bps. That's quite an improvement, but compared with the 29,000 bps you get when connecting to the Internet (if you're lucky), we still have a long way to go.
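The back-of-the-envelope arithmetic behind those numbers:

```python
# Rough bandwidth comparison using the figures quoted above.
studio_then = 45_000_000   # bits per second assumed necessary a decade ago
studio_now  = 4_000_000    # bits per second that suffice today
modem       = 29_000       # bits per second on a good dial-up connection

print(f"improvement so far: {studio_then / studio_now:.1f}x")
print(f"gap still to close: {studio_now / modem:.0f}x")
print(f"one second of today's video takes {studio_now / modem:.0f} seconds over the modem")
```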

There is one fundamental reason for this profligate use of bandwidth. TV receivers are dumb - in particular, they are forgetful. On a per-cubic-inch basis, your microwave oven may be smarter. A TV set is spoon-fed pixels - line by line, frame by frame. Even if you compress them by taking out the enormous redundancy that occurs within and between frames and by taking advantage of the characteristics of human vision, video as we know it still uses many more bits than a computer graphics database capable of synthesizing the same images.
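A toy illustration of the inter-frame redundancy at work (the frame size and threshold are arbitrary): when little changes from one frame to the next, sending only the differences takes far fewer bits than resending every pixel.

```python
import numpy as np

def encode_difference(previous, current, threshold=4):
    """Return only the pixels that changed noticeably since the last frame."""
    delta = current.astype(int) - previous.astype(int)
    changed = np.abs(delta) > threshold
    return np.argwhere(changed), delta[changed]

previous = np.random.randint(0, 200, (480, 640)).astype(np.uint8)
current = previous.copy()
current[100:120, 200:260] += 10              # only a small region changes

positions, values = encode_difference(previous, current)
print(f"full frame: {previous.size} pixels; difference: {len(values)} entries")
```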

Inefficiency also results from a lack of memory. Your TV doesn't remember that the set of the local news changes only about once every three years, it doesn't remember the architecture of sports arenas, and it doesn't remember the Steve Forbes commercials seen six times each hour by those of us living in states holding early primaries.
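One way to picture that memory, assuming the receiver caches object descriptions it has already been sent and the broadcaster retransmits an object only when it changes (the class and object names below are invented):

```python
class ReceiverCache:
    """Set-top memory: objects already received don't need to be sent again."""

    def __init__(self):
        self.objects = {}                  # object id -> last received description

    def transmit_if_new(self, object_id, description):
        """Return what actually needs to be sent, reusing anything cached."""
        if self.objects.get(object_id) == description:
            return None                    # already in the set-top box: send nothing
        self.objects[object_id] = description
        return description

cache = ReceiverCache()
print(cache.transmit_if_new("news_set", "studio geometry, v1"))   # sent once...
print(cache.transmit_if_new("news_set", "studio geometry, v1"))   # ...then free until the set changes
```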

The digital TV sets about to hit the market are able to do a lot more multiplications per second than your microwave oven, but they still aren't "clever." They decode a closed-form standard, known as MPEG-2 (from the Moving Picture Experts Group). MPEG-2 may be among the last standards for which anyone bothers to develop a dedicated chip. Why? Because a single data standard for digital video, one that is always best, just does not exist.

We need a flexible decoder capable of interpreting whatever the originator (or an automatic process) decides is the best way to encode a given scene. For example, it would be more efficient (and legible!) to transmit the fine print during car-lease commercials as PostScript (a common standard for typography and printers) instead of MPEG. Your TV's decoding capabilities might be updated as often as your PC's Web browser is now. Perhaps TV viewers in the next decade will eagerly look forward to September as the month when the next season's algorithms get downloaded.
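A sketch of what such a flexible receiver might look like, assuming each piece of a program announces the decoder it needs and new decoders can be registered at any time; the codec names and payloads are invented, and a real set-top box would fetch and verify downloaded code rather than accept a bare function:

```python
# A registry of decoders the receiver currently knows how to run.
decoders = {}

def register_decoder(name, fn):
    """Install a new decoding algorithm (the September download)."""
    decoders[name] = fn

def decode(codec, payload):
    """Dispatch a chunk of the program to whatever decoder it asks for."""
    if codec not in decoders:
        raise LookupError(f"no decoder for {codec!r}; download and register one first")
    return decoders[codec](payload)

register_decoder("mpeg2-like", lambda payload: f"<video reconstructed from {len(payload)} bytes>")
register_decoder("typography", lambda payload: f"<crisp fine print: {payload.decode()}>")

print(decode("typography", b"Lease terms: see dealer for details."))
```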

Storytelling
Having actors and sets hang around in our TVs isn't going to do us a lot of good unless we can tell them to do something interesting. So, in addition to objects, we need a script that tells the receiver what to do with the objects in order to tell a story.
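One minimal way to picture such a script, assuming the story arrives as timed instructions about objects rather than as finished pixels (the object names, actions, and timings are made up):

```python
# A story as timed instructions to objects the receiver already holds.
script = [
    (0.0, "anchor",      "enter", {"from": "stage_left"}),
    (2.5, "anchor",      "say",   {"line": "Good evening."}),
    (4.0, "weather_map", "show",  {"region": "northeast"}),
]

def play(script):
    """Walk the script in time order and dispatch each instruction."""
    for timestamp, obj, action, args in sorted(script, key=lambda step: step[0]):
        print(f"t={timestamp:4.1f}s  {obj}.{action}({args})")

play(script)
```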

TV conceived as objects and scripts can be very responsive. Consider hyperlinked TV, in which touching an athlete produces relevant statistics, or touching an actor reveals that his necktie is on sale this week. We can embed bits that carry more information about pixels than their color - bits that tell them how to behave and where to look for further instruction.
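A sketch of what those embedded bits might carry, assuming each on-screen object comes with metadata saying what to do when it is touched; the fields and pseudo-URLs are placeholders, not a real broadcast format:

```python
# Each object on screen carries behavior metadata alongside its pixels.
objects_on_screen = {
    "point_guard": {"on_touch": "show_stats", "lookup": "stats://league/players/23"},
    "necktie":     {"on_touch": "show_offer", "lookup": "shop://example.com/necktie-sale"},
}

def handle_touch(object_id):
    """Follow the instructions embedded with whatever the viewer touched."""
    meta = objects_on_screen.get(object_id)
    if meta is None:
        return "plain pixels: nothing to do"
    return f"{meta['on_touch']} via {meta['lookup']}"

print(handle_touch("point_guard"))   # relevant statistics
print(handle_touch("necktie"))       # this week's sale
```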

These bits-about-the-bits will resolve a problem that has beleaguered Hollywood directors faced with one-version-fits-all screens and made them envious of graphic designers, who can lay out postage stamps, magazine ads, and highway billboards using different rules of visual organization. With the same freedom, television programs could react according to the originator's intention when viewed under different circumstances (for instance, more close-ups and cuts on a small screen).
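A sketch of such a rule traveling with the program, assuming the originator states an intention like "favor close-ups on small screens" and the receiver applies it (the thresholds and shot names are invented):

```python
def choose_shot(available_shots, screen_diagonal_inches):
    """Pick a framing that honors the originator's intention for this screen."""
    if screen_diagonal_inches < 15:
        preference = ["close_up", "medium", "wide"]    # small screen: tighter framing
    else:
        preference = ["wide", "medium", "close_up"]    # big screen: let the wide shot play
    for shot in preference:
        if shot in available_shots:
            return shot
    return available_shots[0]

print(choose_shot(["wide", "close_up"], screen_diagonal_inches=10))   # -> close_up
print(choose_shot(["wide", "close_up"], screen_diagonal_inches=50))   # -> wide
```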

You think Java is important - wait until we have a similar language for storytelling. TV is, after all, an entertainment medium. Its technology will be judged by the richness of the connection between creator and viewer. As Bran Ferren of Disney has said, "We need dialog lines, not scan lines."

This article was co-authored by V. Michael Bove (vmb@media.mit.edu), Alexander Dreyfoos Career Development Professor at MIT's Media Lab.