First app: Follow the speaker

As I said in the previous post, we’ll use this blog to show you how to get started building cool new video applications with the Mosami API.  To kick things off, I’m going to walk through a familiar example – mixing multiple video streams with a focus on the active speaker.  This demo uses our Python language bindings, and is based on “convdir_position.py” from our Example Gallery, available for download from our Launchpad site https://launchpad.net/mosami.

The application divides the screen into two regions, “top” and “bottom”.  When you speak (the presenter), you move to the top; when you’re silent (the audience), you move to the bottom.  Simple.  To do that, we make use of two Mosami API functions:  LayoutMix (to create the mixed output) and DetectSpeech (to trigger changes to the layout when someone speaks).  When finished, it looks something like this:

How does it work?

In the init method of the demo, we launch a LayoutMix pipeline (which extends the basic Mix functionality to handle defining regions, resizing on new videos, and more).  The Python binding jlaunch wraps the POST calls necessary to create new pipelines.

self.mix = self.mo.jlaunch('LayoutMix', dst=self.state['dst'], volume=1)

Then we define two regions (called DIVs) in the layout.  The Python binding msg wraps the PUT calls necessary to update an existing pipeline.

self.mix.msg('add_div', layout='intro', div='top', placement_algorithm='coords_1xn_grid', xpos=0, ypos=0, width=self.state['width'], height=self.state['height']/2, zorder=10)
self.mix.msg('add_div', layout='intro', div='bottom', placement_algorithm='coords_1xn_grid', xpos=0, ypos=self.state['height']/2, width=self.state['width'], height=self.state['height']/2, zorder=10)
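
To put these pieces in context, here is a minimal sketch of how the init method might look when the launch and add_div calls are combined.  The class name, constructor arguments, and the self.state/self.vads attributes are assumptions inferred from how they are used in the snippets above – check convdir_position.py in the Example Gallery for the actual setup.

class FollowTheSpeaker(object):  # hypothetical class name
    def __init__(self, mo, dst, width=640, height=480):
        # mo is an already-connected Mosami API client; dst names the mixed output stream
        self.mo = mo
        self.state = {'dst': dst, 'width': width, 'height': height}
        self.vads = {}   # one DetectSpeech pipeline per input stream

        # Launch the LayoutMix pipeline that produces the mixed output
        self.mix = self.mo.jlaunch('LayoutMix', dst=self.state['dst'], volume=1)

        # Split the layout into a "top" (speaker) and a "bottom" (audience) DIV
        self.mix.msg('add_div', layout='intro', div='top',
                     placement_algorithm='coords_1xn_grid',
                     xpos=0, ypos=0,
                     width=self.state['width'], height=self.state['height']/2,
                     zorder=10)
        self.mix.msg('add_div', layout='intro', div='bottom',
                     placement_algorithm='coords_1xn_grid',
                     xpos=0, ypos=self.state['height']/2,
                     width=self.state['width'], height=self.state['height']/2,
                     zorder=10)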

Each time a video stream is added, we need to do two things.

First, we attach it as an input to the LayoutMix pipeline and place it in the bottom DIV:

self.mix.msg('add', stream=stream, div='bottom')

Second, for each input, we launch a DetectSpeech pipeline (sometimes referred to as VAD, for Voice Activity Detection):

self.vads[stream] = self.mo.jlaunch('DetectSpeech', src=stream, interval=1000000000, run_length=2*1000000000, rx_msg_callback=self.vad_cb, user_params=stream)
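
Taken together, a handler that runs whenever a new participant’s stream arrives might look like the sketch below.  The method name add_stream is hypothetical; the real demo wires this up to however new streams are announced in your application.

    def add_stream(self, stream):
        # Step 1: attach the new input to the mix, starting in the bottom (audience) DIV
        self.mix.msg('add', stream=stream, div='bottom')

        # Step 2: launch a DetectSpeech (VAD) pipeline for this input, routing its
        # events to the shared callback; the stream name is passed through as
        # user_params so the callback knows which input the event belongs to
        # (interval and run_length values are copied from the original example)
        self.vads[stream] = self.mo.jlaunch(
            'DetectSpeech', src=stream,
            interval=1000000000, run_length=2*1000000000,
            rx_msg_callback=self.vad_cb, user_params=stream)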

There’s one step still missing – when someone starts or stops speaking, we need to change the layout appropriately.  This is handled by the callback registered with rx_msg_callback=self.vad_cb, which is invoked whenever the DetectSpeech pipeline emits an event.  Here, we use a single callback function for all pipelines and pass user_params=stream so the callback can tell them apart.

With that setup, the callback function itself is quite simple – filter on the correct message type, then send a message to the LayoutMix pipeline to move the given stream to the top or bottom DIV.

    def vad_cb(self, msg, user_params=None):
        # user_params carries the stream name we passed when launching DetectSpeech
        stream = user_params
        if 'moMessageType' in msg or 'type' in msg:
            msg_type = msg['type'] if 'type' in msg else msg['moMessageType']
            if msg_type == 'com.mosami.analyzer.vad.activity':
                # 'above' indicates the input is currently above the speech-detection threshold
                if msg['above']:
                    self.mix.msg('add', stream=stream, div='top')
                else:
                    self.mix.msg('add', stream=stream, div='bottom')

That’s all there is to it! This is a simple example, but it also illustrates how Mosami gives you direct access to the application behavior you want.
