Shoot 1.6.4

Running Shoot

To run Shoot, launch the shoot.exe program located in the installation folder. To make Shoot load a profile automatically at startup, specify the profile file on the command line (you may need to give the full or relative path to the file if it is not in the same folder as shoot.exe). For example:

shoot.exe myprofile.xml
shoot.exe profiles\falcon4sp3.xml
shoot.exe c:\shoot\profiles\falcon4sp3.xml

Shoot file format

A Shoot configuration file has a main section called shoot-config which contains a number of elements for configuring each aspect of the profile. The file must start with the line:

<?xml version="1.0" encoding="iso-8859-1"?>

The following diagram illustrates the overall structure. For complete examples, take a look at the profile library.

<?xml version="1.0" encoding="iso-8859-1"?>
<shoot-config>
   <command-list key-delay="..." key-pause="...">
     <command name="..." phrase="..." key-delay="..." key-pause="...">
       <key type="..." repeat="..." delay="..." pause="..."/>
       <key ... />
       <sequence keys="..." delay="..." pause="..." />
     </command>
     <command>
       ...
     </command>
     ...
   </command-list>

   <push-to-talk initial-state="...">
     <hold on-press="...">
       <key type="..." />
       <key ... />
       <button device="..." number="..." />
       <button ... />
     </hold>
     <toggle>
       <key type="..." />
       <key ... />
       <button device="..." number="..." />
       <button ... />
     </toggle>
   </push-to-talk>

   <sounds>
     <sound type="..." source="..." pan="..." volume="..." />
     <sound ... />
   </sounds>
</shoot-config>
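
As a rough illustration of how the pieces in the diagram fit together, the sketch below fills in the placeholders with concrete values. The command name, phrase and key are hypothetical and chosen only for the example; the sound file name is taken from the sounds example later in this document.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<shoot-config>
   <!-- Hypothetical minimal profile: one command, one PTT toggle key, one sound -->
   <command-list key-delay="100" key-pause="0">
     <command name="gear toggle" phrase="gear">
       <key type="G"/>
     </command>
   </command-list>

   <push-to-talk initial-state="on">
     <toggle>
       <key type="ScrollLock"/>
     </toggle>
   </push-to-talk>

   <sounds>
     <sound type="recognized" source="recognized.wav" volume="100" pan="0"/>
   </sounds>
</shoot-config>
```

With a profile like this, saying "gear" should send the G key to the active application, and ScrollLock should switch speech recognition on and off.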


Command list

<command-list> element

The command-list element encloses the list of commands that Shoot will understand. There's nothing special about this element, except that it has optional key-delay and key-pause attributes that set the default delay and pause for keys in all commands in the profile. The delay emulates the time span between the pressing and releasing of a key. Some programs are sensitive to such delays (e.g., the longer the delay, the stronger the effect). The delay is measured in milliseconds, and the attribute can take any value greater than or equal to 0. If no value is specified, Shoot assumes a default delay of 100 ms. The pause, on the other hand, specifies the time Shoot must wait between keystrokes. Like the delay, the pause is measured in milliseconds; its default value is 0 ms.

Note: the delay attribute describes the time between the pressing and releasing of a key, not the time between a successful recognition by the speech recognition engine and the actual emission of keys. Keys will be emitted as soon as a command is successfully recognized. Similarly, the pause is the time Shoot waits after sending each key. It is better to think of the delay and the pause as the rate at which keys are emitted (they're actually the inverse of the rate, but you get the point).

<command-list key-delay="milliseconds" key-pause="milliseconds"> 
   ...
</command-list>
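
For instance, a command list with non-default timing might look like the sketch below. The values, command name and phrase are illustrative only; the timing estimate in the comment assumes the pause is applied after every key, as the note above describes.

```xml
<!-- Illustrative values: each key is held down for 50 ms,
     then Shoot waits 20 ms before sending the next key. -->
<command-list key-delay="50" key-pause="20">
   <!-- Two keys at 50 ms + 20 ms each: roughly 140 ms in total -->
   <command name="flaps down" phrase="flaps down">
     <key type="F"/>
     <key type="2"/>
   </command>
</command-list>
```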

<command> element

<command name="the name" phrase="a phrase to recognize" key-delay="milliseconds" key-pause="milliseconds"> 
   ...
</command>

A command is characterized by its name and phrase. The name identifies the command, so it must be unique throughout the profile. The phrase specifies the sequence of utterances that will trigger the execution of the command. Phrases, too, must be unique across the profile so that the speech recognizer can match a recognized phrase to a command. A phrase is made up of a sequence of words separated by spaces. In the current version, numbers must be spelled out for the speech recognizer to understand them. Thus, two should be used instead of 2, and so on. I'm investigating how to get around this limitation.

Like its parent, command-list, the command element also has optional key-delay and key-pause attributes. When present, they override whatever values were set at the command-list level. If no delay or pause is specified, the command inherits the command-list's key delay and pause values.

Command sections contain sequences of keys, which will be sent to the active application when the command is triggered. The key element is described in the next section. The following example illustrates a command definition.

<command name="WINGMAN clear my six" phrase="clear my six">
   <key type="W"/>
   <key type="7"/>
</command>

<key> element

<key type="key combination" repeat="number" delay="milliseconds" pause="milliseconds"/>

Key elements represent individual key combinations that will be sent to the active application. A key element has one mandatory attribute, namely, the key type, and the optional attributes repeat, delay and pause.

Type should be any combination of the keywords SHIFT, CTRL and ALT and a key from the following table (or any other printable character on non-US keyboards). Note that key types represent physical keys, so, for example, to emulate the $ sign, the key type should be specified as SHIFT 4.

A ... Z
0 ... 9
F1 ... F12
Numpad0 ... Numpad9
Numpad+
Numpad-
Numpad*
Numpad/
Numpad.
NumLock, ScrollLock, CapsLock
Up, Down, Left, Right
PageUp, PageDown
End, Home
Space (or " ", without the quotes), BackSpace, Tab
Delete, Insert
Enter, NumpadEnter
Esc
Pause
PrintScreen
[ ]
\ /
. , ;
= -
` '
BrowserBack, BrowserForward, BrowserStop, BrowserRefresh
BrowserSearch, BrowserFavorites, BrowserHome
Mail
Mute, VolumeUp, VolumeDown
Media, MediaPlayPause, MediaStop, MediaPrevious, MediaNext
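
As a brief sketch of how key types combine modifiers with entries from the table, the hypothetical command below sends three key combinations. The command name and phrase are made up for the example, and the first key assumes a US keyboard layout.

```xml
<command name="key type demo" phrase="key type demo">
   <!-- "$" is SHIFT plus the physical 4 key on a US layout -->
   <key type="SHIFT 4"/>
   <!-- A numeric keypad key, written exactly as listed in the table -->
   <key type="Numpad+"/>
   <!-- Several modifiers combined with a function key -->
   <key type="CTRL ALT F5"/>
</command>
```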

Repeat specifies how many times the key should be sent (useful for sending long sequences of the same key). If this attribute is omitted, a value of 1 is assumed.

Delay specifies how much time, in milliseconds, Shoot should wait between pressing and releasing the key. This attribute will override whatever delay was specified at the command and command-list levels. If it is omitted, the value will be inherited from the enclosing command element.

Pause specifies how much time Shoot must wait between keystrokes. Like the delay attribute, pause is inherited from the enclosing command element if no value is specified.
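
The inheritance rules described above can be sketched as follows. The names, phrases and timing values are hypothetical; the comments show the delay and pause each key would end up with under those rules.

```xml
<command-list key-delay="100" key-pause="0">
   <!-- No overrides: keys inherit delay 100 and pause 0 from command-list -->
   <command name="eject" phrase="eject">
     <key type="CTRL E" repeat="3"/>
   </command>

   <!-- Command-level overrides apply to every key in this command -->
   <command name="zoom in" phrase="zoom in" key-delay="30" key-pause="10">
     <key type="Z"/>              <!-- delay 30, pause 10 -->
     <key type="Z" delay="500"/>  <!-- delay 500, pause 10 -->
   </command>
</command-list>
```

Setting the timing at the most general level that works, and overriding it only for the keys that need it, keeps profiles short.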

The following example demonstrates how to send the key SHIFT+CTRL+K three times.

<key type="SHIFT CTRL K" repeat="3" />

<press> and <release> elements

These elements are used to emulate separate key press or release events. Their syntax is similar to that of the key element:

<press type="key combination" pause="milliseconds"/>

<release type="key combination" pause="milliseconds"/>

Note: make sure that for each press a corresponding release will eventually occur (it doesn't need to be in the same command) to avoid leaving the keyboard in an inconsistent state.
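
A minimal sketch of a press paired with a release in a different command is shown below. The command names and phrases are hypothetical, and Tab is used only as an example of a key that some programs expect to be held down.

```xml
<command-list>
   <!-- Presses Tab and leaves it down -->
   <command name="show map" phrase="show map">
     <press type="Tab"/>
   </command>

   <!-- Releases the Tab key pressed by the command above -->
   <command name="hide map" phrase="hide map">
     <release type="Tab"/>
   </command>
</command-list>
```

If "hide map" is never spoken, Tab stays pressed, which is the inconsistent state the note above warns about.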

<sequence> element

The sequence element provides an easy way of specifying long sequences of characters to be sent by Shoot. The syntax is similar to that of the key element:

<sequence keys="key sequence" delay="milliseconds" pause="milliseconds"/>

Note that only printable characters can be sent using this method. To send extended keys the key element must be used. The following example demonstrates how to use this element:

<command name="say hello" phrase="say hello">
   <key type="Enter"/>
   <sequence keys="hello" delay="0" pause="100"/>
</command>


Push to talk

<push-to-talk> element (optional)

Push to talk (PTT) allows the user to selectively disable or enable speech recognition when a key combination or joystick button is pressed. There are two major PTT modes, hold and toggle, and three submodes for hold: hold to enable, hold to disable and hold to flip. Shoot can be configured to use either major mode or a combination of both. Toggle mode turns speech recognition on and off alternately each time a key combination or joystick button is pressed. Hold mode, on the other hand, switches the state of speech recognition momentarily, while a key combination or joystick button is held down. The submode indicates the state to which speech recognition will be set when push-to-talk is triggered. The following table summarizes each mode and submode.

Mode | Submode | Effect
toggle | (none) | Speech recognition is alternately enabled and disabled when PTT keys or buttons are pressed.
hold | enable | Speech recognition is enabled while PTT keys or buttons are depressed, regardless of whether speech recognition is currently enabled or disabled.
hold | disable | Speech recognition is disabled while PTT keys or buttons are depressed, regardless of whether speech recognition is currently enabled or disabled.
hold | flip | Speech recognition is momentarily flipped to the opposite of its current state. Thus, if speech is enabled, pressing a PTT key or button will momentarily disable it, and vice versa.

The ability to mix toggle and hold modes is a new addition to this version and mandated a change in the syntax of the push-to-talk section. Nonetheless, the older syntax is still supported for backward compatibility with previous versions. The reason for the change is that in some instances it may be useful to have a global on/off switch that turns off speech recognition when it won't be needed for extended periods of time, while still being able to use the hold to disable PTT mode. The overall structure of the push-to-talk section is outlined below.

<push-to-talk initial-state="off">
  <toggle>
    <key type="..."/>
    <button device="..." number="..."/>
    ...
  </toggle>
  <hold on-press="submode">
    <key type="..."/>
    <button device="..." number="..."/>
    ...
  </hold>
</push-to-talk>

The initial-state attribute of the push-to-talk element indicates the state to which speech recognition will be set when the profile is loaded. Its value can be either on or off.

The on-press attribute of the hold element describes the hold submode. Its value must be enable, disable or flip.

Important: take care when choosing PTT keys in hold to enable and hold to flip mode, as the keys sent by Shoot may interfere with the currently depressed keys and produce unintended results. E.g., if you choose SHIFT A as your PTT key and a command sends the T key, the program that receives the key may interpret it as a SHIFT T because the SHIFT key is currently depressed.

The following example demonstrates how to configure PTT to disable speech recognition while either ALT 1, ALT 2 or the 6th button on joystick 2 is depressed and to use the ScrollLock key as a global on/off switch:

<push-to-talk initial-state="off">
  <toggle>
    <key type="ScrollLock"/>
  </toggle>
  <hold on-press="disable">
    <key type="ALT 1"/>
    <key type="ALT 2"/>
    <button device="2" number="6"/>
  </hold>
</push-to-talk>
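
As a contrasting sketch, the configuration below uses hold to enable with a joystick button only. The device and button numbers are hypothetical. Because no keyboard key is held down while speaking, this setup should avoid the interference described in the note above.

```xml
<push-to-talk initial-state="off">
  <hold on-press="enable">
    <!-- Speech recognition is on only while button 3 of joystick 1 is held -->
    <button device="1" number="3"/>
  </hold>
</push-to-talk>
```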

<key> element

The key element within the push-to-talk hold or toggle sections is used to configure PTT with key combinations. The type should be any combination of the keywords SHIFT, CTRL and ALT and a key from the key table.

<key type="key combination"/>

<button> element

The button element within the push-to-talk hold or toggle sections is used to configure PTT with joystick buttons. The device attribute indicates which joystick the button belongs to. Joysticks are numbered in increasing order, starting at 1. The number attribute indicates which button in particular will be used, and can range between 1 and 32.

<button device="joystick number" number="button number"/>

Sounds

Shoot can be configured to play sounds when certain events take place. The sound configuration section is defined by the sounds element. Following is an example of the basic structure:

<sounds>
  <sound type="type" source="sound file" volume="volume" pan="pan"/>
  <sound ... />
</sounds>

<sound> element

Each entry in the sounds section describes one of the sound types supported by Shoot and its attributes. The table below describes the types currently supported and when the corresponding sounds are played:

Sound type | Description
recognized | The sound will be played when a command is successfully understood by Shoot.
not-recognized | The sound will be played whenever Shoot fails to recognize a command.
speech-enabled | The sound will be played whenever speech recognition is turned on (through push-to-talk).
speech-disabled | The sound will be played whenever speech recognition is turned off (through push-to-talk).

The source attribute specifies the location of the sound file relative to the sound directory in the Shoot installation folder. Absolute paths are allowed, but it is recommended that you copy your sound files into the sound directory and refer to them by their simple name.

The volume attribute indicates the volume at which the sound will be played. Its value can range between 0 and 100, with 0 being silence and 100 being full volume.

The pan attribute controls the balance of the sound's volume between the left and right channels. Its value can range between -100 and 100. At -100, the sound is played through the left channel only. At 0, the sound is played equally through both channels, and at 100, through the right channel only.

The following example demonstrates how Shoot can be configured to use the sounds provided in the installation package:

<sounds>
  <sound type="recognized" source="recognized.wav" volume="100" pan="0"/>
  <sound type="not-recognized" source="notrecognized.wav" volume="100" pan="0"/>
  <sound type="speech-enabled" source="speechenabled.wav" volume="100" pan="0"/>
  <sound type="speech-disabled" source="speechdisabled.wav" volume="100" pan="0"/>
</sounds>
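
The volume and pan attributes can also be used to tell events apart by ear. The sketch below is one possible arrangement, not a recommended setting: it plays the two recognition sounds at half volume, one in each channel.

```xml
<sounds>
  <!-- Half volume, left channel only -->
  <sound type="recognized" source="recognized.wav" volume="50" pan="-100"/>
  <!-- Half volume, right channel only -->
  <sound type="not-recognized" source="notrecognized.wav" volume="50" pan="100"/>
</sounds>
```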


Microphone sampling rate

By default, Shoot attempts to initialize the audio input device with a sampling rate of 16 kHz. If that fails, it tries alternate rates in succession until a suitable one is found. This is especially important on Win98/ME systems, in which the sampling rate for the microphone is constrained by the sampling rate currently being used for audio playback, and vice versa.

Due to limitations of SAPI, the algorithm may fail in some PC configurations. To get around this problem, users can manually specify what sampling rate to use via the microphone element. The element must be placed within the shoot-config section and its syntax is:

<microphone sampling-rate="rate"/>

The sampling rate can be any of the following values: 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000. A value lower than 16000 can affect speech recognition accuracy. Higher sampling rates can affect performance. Note that on Win98/ME systems, if Shoot is started before the game, the sampling rate of the audio output channel will be constrained by the chosen sampling rate. If the game was developed to adapt to available rates, both programs should be able to work together (although audio quality may be affected, depending on the chosen sampling rate).

However, if the chosen rate is not supported by the game, the programs may not work well when run simultaneously. If Shoot is started before the game, it is possible that the game will either crash or have no sound. Conversely, if the game is started first, Shoot's speech recognizer will probably fail to start.
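
As a sketch, a profile that forces a 22050 Hz sampling rate could look like the following. The rate is just one of the allowed values listed above, the command is hypothetical, and the position of the microphone element relative to the other sections is an assumption: the only stated requirement is that it sits within shoot-config.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<shoot-config>
   <!-- Skip the automatic search and request 22050 Hz directly -->
   <microphone sampling-rate="22050"/>

   <command-list>
     <command name="gear toggle" phrase="gear">
       <key type="G"/>
     </command>
   </command-list>
</shoot-config>
```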