Chapter 2: Technology and Sound Basics

This chapter presents a brief overview of some of the music technology concepts that are pertinent to video game music. You will learn about the types of sound generation used in video games and some of the basic technological constructs and terminology. This will provide you with a framework and context for understanding how and why video game music functions as it does – after all, the style and function of video game music are a result of available technology. In much the same way that the symphonic sound depends on the development of the classical orchestra and the instruments within it, video game music depends on technological forces. Composers have to work within the technological limitations of the console and create something effective with these means.

2.1 Characteristics of Sound

Before we jump directly into a discussion of sound production, it is important to understand the five basic characteristics of a sound (plus a sixth, extrasonic characteristic, discussed after the table). Understanding these will be important as we discuss the qualities of game music by different composers and for different consoles. These sound characteristics are also what you will use as a focal point if, for example, you wish to analyze a piece of video game music. Each characteristic has an objective and a subjective quality. The characteristics are described in the table below.

Impetus              | Objective  | Subjective            | Subjective Result
Wave speed           | Frequency  | Pitch                 | Perceived note
Wave height          | Amplitude  | Volume                | Perceived loudness
Wave makeup          | Spectrum   | Timbre                | Perceived “colour”
Length of wave       | Duration   | Rhythm                | Perceived timing
Sounding body        | Location   | Panning               | Perceived place
Extrasonic           |            |                       |
Mechanical causality | Embodiment | Agential displacement | Perceived connection between sound and sound source

All of these characteristics are directly related to the properties of a sound wave. You may already be familiar with the idea that sound propagates through space in waves, but in case you are not, a brief description of how this works follows. Sound is created when a sound source or agent enacts a motion on an object that creates a periodic displacement of air molecules, resulting in sound waves. The properties of this displacement (i.e., the properties of the wave itself) are what contribute to how we hear sound. An example of this would be the plucking of a guitar string. An agent plucks a guitar string, which vibrates. The vibrations cause periodic displacements of air molecules, and the result is a sound wave, which we perceive as a specific pitch (the plucked note) dependent on the frequency (speed) of the wave. This frequency is measured in Hertz (Hz), or complete wave cycles per second; a sound wave that completes 40 full cycles each second has a frequency of 40 Hz, for example. The amplitude represents the measurement of the air pressure displacement, which we perceive as volume; large amounts of displacement will result in a perceptually louder sound. The other property of sound waves that doesn’t have a subjective correspondent is phase, which refers to a specified point in the wave cycle.

It is not necessary to have a complete understanding of the precise physics behind this to access all of the information in this book, but it is helpful to have a context with which to frame the sound and technological aspects discussed. Below the five properties of sound in the table, I have added a sixth property, related to the mechanical causality and perceived relationship between sound and sound source. This property is also useful to the study of video game music, especially when discussing sounds that the character makes or sounds within the environment. If these topics interest you and you wish to have a fuller understanding, I encourage you to seek out Peter Manning’s Electronic and Computer Music (4th edition).
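
For readers who like to experiment, the relationship between frequency, amplitude, and phase can be made concrete with a few lines of code. The following is only a minimal sketch, assuming a standard Python environment; the 40 Hz frequency and one-second duration are simply the values from the example above, not anything specific to game hardware.

    import math

    SAMPLE_RATE = 44_100      # samples per second (the CD-quality rate discussed in 2.3)
    FREQUENCY = 40.0          # complete wave cycles per second (Hz) - the 40 Hz example above
    AMPLITUDE = 0.5           # relative displacement; larger values are perceived as louder
    PHASE = 0.0               # starting point in the wave cycle, in radians

    def sine_wave(freq, amp, phase, seconds, rate=SAMPLE_RATE):
        """Return one value per sample of a sine wave with the given properties."""
        samples = []
        for n in range(int(seconds * rate)):
            t = n / rate                                   # time in seconds
            samples.append(amp * math.sin(2 * math.pi * freq * t + phase))
        return samples

    # One second of a 40 Hz tone: 44,100 samples containing 40 complete cycles.
    tone = sine_wave(FREQUENCY, AMPLITUDE, PHASE, 1.0)
    print(len(tone), "samples")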

2.2 Sound Synthesis

It is important to understand the ways in which sound is generated on video game systems, especially as we discuss console-specific music and its sound characteristics. Sound synthesis refers to the creation of sound from scratch, in contrast to, for example, recording a sound and playing it back. Early video game composers all used sound synthesis as their primary sound generation method, since it was more technologically feasible than recording and playing back large amounts of pre-recorded sound. Several types of synthesis are discussed below, all of which have been implemented in video game music.

Additive synthesis involves the combination of various generated waveforms at different phases, amplitudes, and frequencies to create complex waveforms (complex sounds). When you hear a sound, the frequency you perceive as pitch is called the fundamental; in a sound created using additive synthesis, this is generally the lowest frequency present in the waveform. Frequencies above the fundamental that are present in the waveform are called partials. Usually, frequencies that are whole-number multiples of the fundamental are added to the fundamental, although this is not always the case (whole-number multiples are called harmonic; frequencies that are not whole-number multiples are called inharmonic). Different combinations will produce a different waveform makeup, which results in a different spectrum or timbre. A computer chip uses voltages to control these generated waveforms (sometimes referred to as oscillators), and the desired frequencies, amplitudes, and phases can all be set by the sound programmer. In the case of video game music, additive synthesis is primarily used to create very specific types of waveforms. I will provide a basic description of some of these:

Pulse waves: These contain odd-numbered harmonics only. If the fundamental is 100 Hz, the first few partials will be 300 Hz (fundamental x 3), 500 Hz (fundamental x 5), and 700 Hz (fundamental x 7). Each harmonic is present at an amplitude of 1/harmonic number relative to the fundamental (the third harmonic at 1/3, the fifth at 1/5, and so on). All of these harmonics are in phase.

Triangle waves: These also contain odd-numbered harmonics only. However, the amplitude of each harmonic is scaled to 1/harmonic number squared (the amplitude of the second partial, which is the third harmonic, would therefore be 1/9th of the amplitude of the fundamental). Additionally, every other harmonic is 180 degrees out of phase.

Noise: Most early consoles contained a channel that could generate noise. There are a few different kinds of noise, but in general, noise contains a wide band of frequencies. White noise consists of all frequencies at equal amplitudes. Pink noise also contains all frequencies, but its amplitude decreases as frequency rises, so that each octave carries equal energy. Noise is usually not created using additive synthesis; dedicated noise generators produce it instead.
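
To make additive synthesis more tangible, here is a small sketch that builds the waveforms just described by summing sine waves according to the rules above (odd harmonics at 1/harmonic-number amplitude for the pulse wave, 1/harmonic-number-squared with alternating phase for the triangle, a random generator for noise). It is an illustration in Python only, not how console hardware actually produced these shapes.

    import math
    import random

    SAMPLE_RATE = 44_100

    def partial(freq, amp, t, flip_phase=False):
        """One sine partial at time t; flipping the phase shifts it by 180 degrees."""
        phase = math.pi if flip_phase else 0.0
        return amp * math.sin(2 * math.pi * freq * t + phase)

    def pulse_sample(fundamental, t, num_harmonics=15):
        """Odd harmonics only, each at 1/harmonic-number of the fundamental's amplitude."""
        return sum(partial(fundamental * h, 1.0 / h, t)
                   for h in range(1, num_harmonics + 1, 2))

    def triangle_sample(fundamental, t, num_harmonics=15):
        """Odd harmonics at 1/harmonic-number-squared, every other one out of phase."""
        total = 0.0
        for i, h in enumerate(range(1, num_harmonics + 1, 2)):
            total += partial(fundamental * h, 1.0 / (h * h), t, flip_phase=(i % 2 == 1))
        return total

    def white_noise_sample():
        """Noise comes from a random generator rather than summed sine waves."""
        return random.uniform(-1.0, 1.0)

    # Build one second of each waveform at a 100 Hz fundamental.
    pulse = [pulse_sample(100, n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    triangle = [triangle_sample(100, n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    noise = [white_noise_sample() for _ in range(SAMPLE_RATE)]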

Subtractive synthesis involves the use of devices called filters to remove or reduce (attenuate) portions of the sound. Subtractive synthesis can be used on generated noise (which contains all frequencies), or on waveforms generated by additive synthesis. There are several types of filters, all of which have a different effect. The most common types of filters are high pass, low pass, band pass, and notch.

High pass filters: These types of filters allow frequencies above a specified frequency to pass unaltered. All lower frequencies are attenuated (reduced in volume).

Low pass filters: These types of filters allow frequencies below a specified frequency to pass unaltered. All higher frequencies are attenuated.

Band pass filters: A user must supply two frequencies to these types of filters – all of the frequencies outside this specified band are attenuated.

Notch filters: Like band pass filters, these also require the user to specify two frequencies. However, the frequencies within the band are attenuated, and the frequencies outside it are allowed to pass.

These represent the most widely implemented types of filters, but this list is not exhaustive. More complex filters are continually being designed and used in music software and hardware.
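
As a rough illustration of subtractive synthesis, the sketch below runs white noise (which contains all frequencies) through a simple one-pole low-pass filter and derives a crude high pass from it. This is only a sketch under the assumption of a Python environment; the gentle slope of this basic filter is much less steep than the filters found in real synthesizer hardware and software, and the 1 kHz cutoff is an arbitrary example value.

    import math
    import random

    SAMPLE_RATE = 44_100

    def low_pass(samples, cutoff_hz, rate=SAMPLE_RATE):
        """One-pole low-pass: frequencies above cutoff_hz are progressively attenuated."""
        rc = 1.0 / (2 * math.pi * cutoff_hz)
        dt = 1.0 / rate
        alpha = dt / (rc + dt)
        filtered = []
        previous = 0.0
        for x in samples:
            previous = previous + alpha * (x - previous)
            filtered.append(previous)
        return filtered

    def high_pass(samples, cutoff_hz, rate=SAMPLE_RATE):
        """Crude high pass: subtract the low-passed signal from the original."""
        low = low_pass(samples, cutoff_hz, rate)
        return [x - l for x, l in zip(samples, low)]

    # Start with one second of white noise, then remove the high or low end.
    noise = [random.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]
    darker_noise = low_pass(noise, cutoff_hz=1_000)     # high frequencies attenuated
    brighter_noise = high_pass(noise, cutoff_hz=1_000)  # low frequencies attenuated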

Frequency Modulation (FM) Synthesis was developed by John Chowning, who published a description of the technique in 1973, and it gained widespread use when it was implemented in the sound chips of many Yamaha synthesizers in the early 1980s.[1] FM synthesis has the benefit of using only two oscillators to create very rich and complex timbres. The concept underlying FM synthesis is derived from a stylistic technique used by musicians: vibrato. In instrumental and vocal technique, vibrato is the stylistic application of periodic fluctuations in tone (an example of this is in opera, where held notes contain a wavering sound – this is vibrato). When synthesizing sound, vibrato is created by using a second oscillator as a continuous control that changes the first oscillator. Very low frequency oscillators (those so low they are inaudible, around 6 Hz) are used to change the pitch of the first oscillator, but since these oscillators are below the hearing range, the result is perceived as very small but periodic changes in pitch rather than a timbral change. We refer to the first oscillator in FM synthesis as the carrier, and the second as the modifier (often called the modulator). When the frequency of the modifier is increased into the audible range (20 Hz and above), it begins to alter the timbre, rather than the pitch, of the carrier wave. In this way, FM synthesis allows for very complex and rich timbres with very limited material (only two simple sound waves); this is achieved because additional frequencies, called sidebands, are generated above and below the carrier frequency. FM synthesis is theoretically complicated, and it is not necessary to have a complete understanding of how it works to read this text. If you wish to learn more about FM synthesis, both Barry Truax[2] and John Chowning[3] have extensive literature on the topic.
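
The core of the idea can be expressed in a single line of arithmetic: the output of the modifier oscillator is used to sway the carrier oscillator. The sketch below is a minimal illustration in Python rather than a description of any particular chip (Yamaha hardware implemented the technique somewhat differently), and the 440 Hz carrier and modulation index of 2.0 are arbitrary example values. It shows both cases discussed above: a 6 Hz modifier produces vibrato, while a modifier in the audible range produces a new timbre.

    import math

    SAMPLE_RATE = 44_100

    def fm_sample(t, carrier_hz, modifier_hz, mod_index):
        """Two oscillators: the modifier's output continuously sways the carrier."""
        modifier = math.sin(2 * math.pi * modifier_hz * t)
        return math.sin(2 * math.pi * carrier_hz * t + mod_index * modifier)

    def render(carrier_hz, modifier_hz, mod_index, seconds=1.0):
        return [fm_sample(n / SAMPLE_RATE, carrier_hz, modifier_hz, mod_index)
                for n in range(int(seconds * SAMPLE_RATE))]

    # A 6 Hz modifier is below the hearing range: we hear vibrato on the 440 Hz carrier.
    vibrato = render(carrier_hz=440, modifier_hz=6, mod_index=2.0)

    # Raising the modifier into the audible range changes the carrier's timbre instead,
    # because sidebands appear above and below the carrier frequency.
    bright_timbre = render(carrier_hz=440, modifier_hz=220, mod_index=2.0)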

All of the types of synthesis described above were made possible by computer chips called Programmable Sound Generators (PSGs), which allowed composers (or sound programmers) to write machine language instructing the oscillators how to behave. These are essentially the synthesizers, also called sound cards, that produce the game’s sound; these PSGs can be located in the console, arcade cabinet, or home computer.

2.3 Streaming Audio and Digital Sampling

The use of streaming audio in video games became possible as consoles grew more efficient at processing and as game storage media became able to hold more data. Unlike synthesis, streaming audio allows a piece of music to be pre-composed and then stored on the disc for later playback (and looping) during gameplay. This type of technology became widespread when cartridge-based consoles were replaced with consoles that used optical discs (CDs and DVDs). Audio information is stored digitally on these devices, so in order to understand how streaming audio works, I will provide a basic introduction to digital audio.

Audio is converted into a digital format via a process called digital sampling, which converts an analog representation of sound to a digital representation by measuring the instantaneous amplitude of the sampled sound at equally spaced time intervals. What this means is that when a sound is sampled, digital snapshots of it are taken at a speed known as the sampling rate. The sampling rate for CD-quality sound is 44,100 Hz – meaning that 44,100 snapshots of the incoming sound are taken every second. The sampling rate therefore has a large impact on the quality of the digital sound: the more samples that are taken, the more precise the replica of the sound becomes. The second contributing factor to sound quality in digital audio is bit depth. The term “bit” may already be familiar, as it is a common computer storage term, short for binary digit. The higher the bit depth, the more information can be encoded per sample, which allows for finer resolution in waveform amplitude.

Therefore, regarding quality:

  • Higher sampling rate means more snapshots are taken = higher quality.
  • Higher bit depth means more potential values for amplitude in waveforms = higher quality.

However, we have to consider the size of the storage medium: the higher the bit depth and sampling rate, the more storage the sound files take up. The most common sampling rates used today are 44.1 kHz and 48 kHz; the most common bit depths are 16 and 24 bits.
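
These trade-offs are easy to quantify. The short sketch below is a back-of-the-envelope calculation in Python rather than a description of any particular game’s audio format; it shows how many amplitude values a given bit depth can represent and how much storage one minute of uncompressed stereo audio requires at the settings mentioned above.

    def amplitude_steps(bit_depth):
        """Number of distinct amplitude values a sample can take at this bit depth."""
        return 2 ** bit_depth

    def storage_bytes(sample_rate, bit_depth, channels, seconds):
        """Uncompressed size: samples per second x bytes per sample x channels x time."""
        return int(sample_rate * (bit_depth / 8) * channels * seconds)

    print(amplitude_steps(16))   # 65,536 possible amplitude values per sample
    print(amplitude_steps(24))   # 16,777,216 possible values: much finer resolution

    # One minute of CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels):
    print(storage_bytes(44_100, 16, 2, 60) / 1_000_000, "MB")   # roughly 10.6 MB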

2.4 Summary

You have been given a brief overview of the basic technology used in game sound. This chapter does not go into great detail regarding the physics of sound, and it does not survey topics such as the software used to program games or the specific environments and engines used in game sound design. These subjects are beyond the scope of this book, which aims to provide an introduction to video game music for scholars of all levels, including non-musicians. With this basic introduction behind you, you should have enough understanding to comprehend the material throughout the book.

[1] See U.S. Patent 4,018,121, April 19, 1977.

[2] Truax, Barry. “Tutorial for Frequency Modulation Synthesis,” http://www.sfu.ca/~truax/fmtut.html.

[3] Chowning, John M. “The Synthesis of Complex Audio Spectra by Means of Frequency Modulation.” Journal of the Audio Engineering Society 21.7 (1973): 526–534.