The purpose of this assignment is to give you practice with strings, lists, functions, and sound processing.
You'll use a sound media module for several parts of this assignment. Please place this sound_media.py file and this sample.py file into the same directory in which you are working on your assignment. You can then test that the sound_media module is working by typing import sound_media at the Python shell.
This is a long handout. I recommend reading it through once or twice before starting, so you get an overall picture of how everything fits together. I also recommend starting early. It is easier to do the assignment in small steps than all at once, because each function on its own is not large. You may also like to download the assignment as a single archive (there are a lot of files!).
Sounds are waves of air pressure. When a sound is generated, a sound wave consisting of compressions (increases in pressure) and rarefactions (decreases in pressure) moves through the air. This is similar to what happens if you throw a stone into a pond: the water rises and falls in a repeating wave.
When a microphone records sound, it takes a measure of the pressure in front of the microphone and returns it as a value. These values are called samples and can be positive or negative corresponding to increases or decreases in air pressure. Each time the air pressure is recorded, we are sampling the sound. Each sample records the sound at an instant in time; the faster we sample, the more accurate is our representation of the sound. The sampling rate refers to how many times per second we sample the sound. For example, CD-quality sound uses a sampling rate of 44100 samples per second; sampling someone's voice for use in a VOIP conversation uses far less than this. Sampling rates of 11025 (voice quality), 22050, and 44100 (CD quality) are common; we will create sounds with a sampling rate of 22050 in this assignment.
A sample is simply a positive or negative integer that represents the amount of compression in the air at the point the sample was taken. We will use 16 bits for each sample. Note that for stereo sounds, a sample is actually made up of two integer values: one for the left speaker and one for the right. We will be working with only mono sound files in this assignment.
The sound_media module contains functionality for working with sound files. You will use its features to complete the various subparts of this assignment, after which you will have created a Song Generator. The Song Generator will be able to take a string of note data and play the song it represents. In part 1 of this assignment, you'll write some functions that manipulate sounds. In part 2, you will write the Song Generator, some of whose features will rely on your functions from part 1. Some of the sound files in this assignment came from acoustica.com.
The sound_media module contains functions for loading sounds from existing wav files, and for creating new, empty sounds that contain as many samples as you request. The relevant functions are load_sound and create_sound. You should familiarize yourself with these functions (by using Python help) before continuing.
For example, here is a sample wav file that plays the notes C, D and E. You could verify this by loading the file into any media player, but you can also play the file using the sound_media module:
>>> import sound_media
>>> song = sound_media.load_sound('cde.wav')
>>> sound_media.play(song)
The sound_media module will help you test some of the functions you will write. You might also find it helpful to save a sound as a wav file; you can do that using the sound_media.save_as function.
You may be more familiar with MP3 files than wav files. The major difference between the two is that MP3 files are sound files that use a form of lossy compression to make them smaller than their wav counterparts. Wav files typically store sound in an uncompressed format, so they are usually far bigger (sometimes ten times bigger) than the same sounds stored as MP3 files.
All functions in this part should be stored together in a file called sound_functions.py. Done correctly, each function can fit comfortably in 15 lines (but certainly does not have to, as long as your code is clear), and they all follow similar looping strategies.
The first function we'll write is reverse(snd): it takes a sound and creates a new sound that is the reverse of the original sound. (The original sound is not modified.) For example, the sample wav file we gave above is reversed here. Notice that reversing a sound simply means "play it backwards". Reversing everyday noises can sound strange: here is a door slamming and a door slamming in reverse!
To reverse a sound, we want to reverse the order of its samples. If we conceive of a sound as a sequence of samples ordered from left to right, we reverse a sound by instead ordering its samples from right to left. For example, if a sound has sample values 2, 3, and 4, then the reversed sound will have sample values 4, 3, and 2. (Of course, sounds usually have hundreds or thousands of samples, not just three.) Investigate the function get_sample, which gives you access to single samples of sounds, and get_value and set_value, which allow you to retrieve and modify the value of a sample, respectively. In case your solution requires it, you can obtain the length of a sound in samples using the len function that we previously used on strings.
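The reordering described above can be sketched on a plain Python list of integers. This is only an illustration of the idea: the name reversed_samples is hypothetical, and a real solution would work on sound objects from sound_media rather than on lists.

```python
def reversed_samples(samples):
    """Return a new list holding the values of samples in reverse order.

    Assumption: samples is a plain list of ints standing in for a sound.
    """
    result = [0] * len(samples)  # a new "sound" of the same length
    for i in range(len(samples)):
        # The value at position i moves to the mirrored position.
        result[len(samples) - 1 - i] = samples[i]
    return result


original = [2, 3, 4]
flipped = reversed_samples(original)  # original is left untouched
```

Note that the result is built in a fresh list, mirroring the requirement that the original sound is not modified.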
Next, write a function mix(snds)
that takes a list of sound objects, and "mixes" them into a new sound that is returned. (None of the original sounds in the list is modified.) By "mixing", we mean that the original sounds are played at the same time, so that each sound that is mixed is heard at the same time with the other sounds. For example, if we mix this three-note sound and this sound of water bubbling, we get this combination of notes and water. And if we mix this door slamming sound and this welcome sound, we get this ominous door and welcome sound. Your function should work with an arbitrary list of sounds, not just two sounds. The length of the resultant sound should be the length of the longest sound in the input list, otherwise part of one or more sounds will be cut off! In our notes-and-water example, the three notes were longer than the bubbling water, so the length of the mixed sound was the same as the notes sound.
Mixing two or more sounds involves adding corresponding samples together. For example, if one sound has three samples: 2, 4, and 6; and another sound has four samples: 10, 11, 12, and 13; mixing them yields a sound of four samples: 12, 15, 18, and 13.
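Here is a minimal sketch of that addition, again using plain lists of integers in place of sound objects. The name mix_samples is made up for illustration; it is not part of sound_media.

```python
def mix_samples(sample_lists):
    """Add corresponding values from several lists of samples.

    Assumption: each "sound" is a plain list of ints. The result is as
    long as the longest input, so no input is cut off.
    """
    longest = 0
    for samples in sample_lists:
        if len(samples) > longest:
            longest = len(samples)

    mixed = [0] * longest  # start from silence
    for samples in sample_lists:
        for i in range(len(samples)):
            mixed[i] += samples[i]
    return mixed


mixed = mix_samples([[2, 4, 6], [10, 11, 12, 13]])
```

Starting from a list of zeros means shorter sounds simply contribute nothing past their own end.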
Note that if we try to mix too many sounds together -- or just a couple loud ones! -- the resultant, mixed sound will sound distorted. This phenomenon is called clipping. For example, if you mix together four or five copies of the above water sound, it sounds more like static than like water. The reason is that we use only 16 bits for each sample: once we try to store samples that fall outside the range representable by 16 bits, we cannot properly store them and hence get distortion.
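To see why loud mixes distort, it can help to count how many summed values no longer fit in 16 bits. The sketch below assumes samples are signed 16-bit integers (so the representable range is -32768 to 32767) and uses plain lists; the helper name out_of_range_count is hypothetical.

```python
# Assumption: samples are signed 16-bit integers.
MIN_SAMPLE = -2 ** 15      # -32768
MAX_SAMPLE = 2 ** 15 - 1   # 32767


def out_of_range_count(samples):
    """Count the values that cannot be stored in a signed 16-bit sample."""
    count = 0
    for value in samples:
        if value < MIN_SAMPLE or value > MAX_SAMPLE:
            count += 1
    return count


loud = [20000, -15000, 30000]
# Mixing four copies of the same sound adds each value to itself four times.
four_copies = [value * 4 for value in loud]
```

Every value in loud fits on its own, but none of the values in four_copies does.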
To increase or decrease the volume of a sound wave, we increase or decrease its amplitude, respectively. In terms of our digital representation of sounds, we will achieve this by multiplying or dividing each sample by a constant in order to increase or decrease the volume. If we multiply each sample by 2, for example, we double the volume; if we divide each sample by 2, we halve it.
Write a function change_volume(snd, factor) that returns a new sound resulting from multiplying each sample in snd by factor. (The original sound is not modified.) We will then be able to use this function to increase the volume (by providing a factor larger than 1) or decrease it (by providing a factor between 0 and 1). For example, if we use a factor of 0.5 on this crow cawing sound, we get this crow at half volume sound.
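The scaling step might look like the following on a plain list of integers. The name scaled_samples is hypothetical, and converting back to an integer with int() is an assumption made here because samples are integers; the handout does not say how fractional results should be handled.

```python
def scaled_samples(samples, factor):
    """Return a new list with every value multiplied by factor.

    Assumption: results are truncated to ints, since samples are integers.
    """
    result = []
    for value in samples:
        result.append(int(value * factor))
    return result


quieter = scaled_samples([100, -200, 300], 0.5)
louder = scaled_samples([100, -200, 300], 2)
```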
Taking the lessons on mixing and volume from the previous two exercises, write a function echo(snd, delay)
that takes a sound object and a delay, and returns a new sound that adds an echo to snd
; snd
itself is not modified. To add an echo to a sound, we will mix in another, lower-volume copy of that sound starting delay
samples from the beginning. The lower-volume copy of the sound should be at 25% of the original volume. The number of samples in your new sound will be the number of samples in the original sound plus delay samples. The new sound is longer than the original because otherwise the echoing copy of the sound would be cut off before it completes. Of course, you can write this function by directly manipulating sound samples, but it would be pretty neat if you relied on your volume and mixing functions...
Here is an example. This crow cawing sound has an echo added to it to create this echoing crow cawing sound. I used a delay of 5000 samples. As another example, this welcome sound has an echo added to it to create this welcome sound with echo. Here, I used a delay of 10000 samples.
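As a rough sketch of how the pieces fit together, an echo can be seen as mixing the original with a quieter copy that has been pushed back by delay samples of silence. The code below works on plain lists of integers, and echo_samples is a hypothetical name; truncating the quieter values with int() is also an assumption.

```python
def echo_samples(samples, delay):
    """Mix samples with a 25%-volume copy that starts delay positions later.

    Assumption: a "sound" is a plain list of ints.
    """
    quiet = [int(value * 0.25) for value in samples]  # lower-volume copy
    delayed = [0] * delay + quiet                     # silence, then the copy

    result = [0] * (len(samples) + delay)
    for track in (samples, delayed):
        for i in range(len(track)):
            result[i] += track[i]
    return result


echoed = echo_samples([100, 200, 400], 2)
```

The result has len(samples) + delay values, so the delayed copy is never cut off.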
In this section, you will write a function song_generate(notestring) that takes a notestring (to be described) and returns its representative sound object. The sound returned by your function can then be played like any sound loaded from a wav file. However, you will directly generate the returned sound; you are not to load any wav files at all for this part. Where appropriate, you should call and reuse your functions from part 1. The function for this part should be saved in the file song_generator.py.
Let's begin with the simplest notestrings, and incrementally describe all of the features you must support.
Consider the notestring "CDEFGAB"
. Passing this string to your song_generate
function should result in the sound object which, when played or saved to a wav file, results in this sound of seven notes. The sound is composed of the note C, followed by the note D, followed by the note E, and so on, until the note B. The simplest notestrings, then, are composed of the letters A, B, C, D, E, F, and G, corresponding to the seven notes of a scale. The sound_media
module has a create_note function that creates a note from one of these note names. Use it to create each note, then append the notes together to form a sound that represents the entire notestring. Notes created in this way are like the sounds we have been using all along (i.e. they support the same functions, such as play and save_as). A default note should last for 7350 samples, use the default volume of the note when it is created, and use the default octave. These parameters are influenced by other features of notestrings, described shortly.
In addition to the characters A, B, C, D, E, F and G, you must support the character P. A P means "pause": it indicates that you should add 7350 samples of silence to the song you are generating (rather than 7350 samples of a particular note). As an example, the string "CPCPPCPPPPPC" sounds like C notes with pauses. Four C's are played, and each C after the first is preceded by a longer pause than the one before it.
The second feature of notestrings is evident in a string such as "2CD2E". If a positive integer n directly precedes a note or pause, it means that the note or pause should last n times its normal length. The integer n may be multiple digits long; you should support these multi-digit integers.
Here's a sanity check. The string "CCCC" should sound like four distinct notes. The string "4C" should sound like one longer note. Listening carefully, the first of these sounds contains four notes (you can hear little pauses between the notes), whereas the second contains one really long note with no gaps.
Here is a string for you to parse once your function supports lengths. It is the first ten notes of Canada's national anthem: "4E3GG4C2P2D2E2F2G2A6D".
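One way to handle multi-digit lengths is to collect digits as you scan and attach them to the next note or pause. The sketch below is only an illustration under simplifying assumptions: it handles digits, note letters, and P, and nothing else, and parse_lengths is a hypothetical name.

```python
def parse_lengths(notestring):
    """Return a list of (letter, multiplier) pairs.

    Assumption: notestring contains only digits, the notes A-G, and P.
    """
    pairs = []
    digits = ''
    for ch in notestring:
        if ch.isdigit():
            digits += ch  # keep collecting a possibly multi-digit number
        else:
            multiplier = int(digits) if digits else 1
            pairs.append((ch, multiplier))
            digits = ''   # the number applies to one note or pause only
    return pairs


pairs = parse_lengths("2CD2E")
```

Each multiplier can then be turned into a length in samples by multiplying it by the length of a default note.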
The third feature of notestrings is the ability to change octaves. A >
symbol means "increase the octave by 1" and a <
symbol means "decrease the octave by 1". The new octave is active until changed by another greater-than or less-than sign. That is, all of the notes following an octave-changing sign will be in that new octave until the octave is changed again. When we increase the octave, all of the notes still "sound the same" except they have a higher pitch. Similarly, when we decrease the octave, notes have a lower pitch. (Interestingly, the sound frequency doubles each time we increase the octave by 1. That is, in terms of frequencies, corresponding notes in successive octaves become more and more distant as the octaves increase, even though it sounds like a linear increase in pitch to us! ... But you don't have to care about this for the assignment.)
As an example, the string "4E>4E>4E>4E<<<<4E>4E" sounds like E's in different octaves. The note E is played in four increasing octaves; then played one octave below the default octave; then played again at the default octave. Here's another example: "4C>4C<2BGA2B>2C"; do you know this famous song?
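The octave rule amounts to keeping a running offset while scanning the string. The following sketch records the offset from the default octave for each note and ignores everything else (lengths, pauses); octave_offsets is a hypothetical helper, not something provided by sound_media.

```python
def octave_offsets(notestring):
    """Return (note, offset) pairs, where offset is relative to the default octave.

    Assumption: characters other than >, <, and A-G are simply skipped.
    """
    offset = 0
    result = []
    for ch in notestring:
        if ch == '>':
            offset += 1
        elif ch == '<':
            offset -= 1
        elif ch in 'ABCDEFG':
            result.append((ch, offset))  # the current octave applies to this note
    return result


offsets = octave_offsets("4E>4E>4E>4E<<<<4E>4E")
```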
The fourth feature of notestrings is the ability to change volume. A +
symbol means "increase the volume by 1" and a -
symbol means "decrease the volume by 1". The new volume setting is active until changed by another +
or -
symbol later in the notestring. The default volume is 0. A volume of 1 causes notes to play at 120% of the default volume; a volume of 2 plays notes at 140% of the default volume; and so on. A volume of -1 causes notes to play at 80% of the default volume; a volume of -2 plays notes at 60% of the default volume; and so on. (Do not worry about the situation where the volume hits -6.)
As an example, the string "2C+2C+2C+2C++++2C-------2C"
sounds like C's at various volumes. The double-length C note is played at the default volume, then at each of three increasing volume levels, then at volume 7, before being played again at default volume.
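The volume rule can be checked with a little arithmetic: each step changes the volume by 20 percentage points from the default of 100%. The sketch below uses whole percentages to keep the numbers exact; both function names are hypothetical.

```python
def volume_percent(level):
    """Return the volume for a level as a percentage of the default volume."""
    return 100 + 20 * level


def volume_levels(notestring):
    """Return the volume level in effect at each note.

    Assumption: characters other than +, -, and A-G are simply skipped.
    """
    level = 0
    levels = []
    for ch in notestring:
        if ch == '+':
            level += 1
        elif ch == '-':
            level -= 1
        elif ch in 'ABCDEFG':
            levels.append(level)
    return levels


levels = volume_levels("2C+2C+2C+2C++++2C-------2C")
```

Dividing the percentage by 100.0 gives a factor that could be passed to a volume-changing function.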
The fifth feature of notestrings allows us to include multiple "channels" that are mixed together in the final sound. The |
symbol in a notestring indicates that the current channel has ended, and a new channel is beginning. For example, the string "8C|8F|8A|>4C"
sounds like a sound with multiple channels. It plays four channels at the same time: the first channel plays a C, the second an F, the third an A, and the fourth a C in the next octave. This final note is shorter than the others, so it ends first; the remaining three notes keep playing the harmonious chord. Any octave or volume changes must be restricted to the channel in which they occur; in particular, octave and volume commands have no effect on any channel descriptions that follow them in the string.
Another way to think of what the |
does is to think in terms of "hands" playing a piano. The stuff before the first |
is what your left hand is playing, the stuff after the first |
and before the second |
(or until the end of the string if there are only two channels) is what your right hand is playing. At this point, we run out of hands for our metaphor, but your supported strings should not be restricted to just two channels.
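Splitting a notestring into channels can be done with the string method split. The sketch below also totals the length of each channel in beats, to show that channels may differ in length; channel_beats is a hypothetical helper that ignores octave and volume symbols.

```python
def channel_beats(channel):
    """Return the total length of one channel, measured in beats.

    Assumption: only digits, notes A-G, and P affect the length.
    """
    total = 0
    digits = ''
    for ch in channel:
        if ch.isdigit():
            digits += ch
        elif ch in 'ABCDEFGP':
            total += int(digits) if digits else 1
            digits = ''
    return total


channels = "8C|8F|8A|>4C".split('|')
beats = [channel_beats(channel) for channel in channels]
```

Because each channel is processed separately, any octave or volume state naturally starts fresh for every channel.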
The following string is a larger example that collects most of the functionality discussed so far. It contains notes, notes with numbers preceding them (for increasing their length), octave changes, volume changes, and two channels played simultaneously:
">+CCGGAA2GFFEEDD2CGGFFEE2DGGFFEE2DCCGGAA2GFFEEDD2C|+CGEGCA2EBFCGBF2CEGFAEG<2B>EGFAEG<2B>CGEGCA2EBFCGBF2C"
The result is this nice little Twinkle Twinkle tune. The first channel contains the melody of the song, and the second channel contains the accompanying harmony. The harmony plays along with the melody to give the song a fuller sound.
The final feature of our notestrings is that they may begin with the substring [x]
(including the square brackets), where x
is an integer indicating that the song should play at x
beats per minute (BPM).
The BPM tells us how long a note, such as "C", lasts. It also indirectly tells us how long "2C" and "3C" last; "2C" lasts twice as long as "C", and "3C" lasts three times as long as "C".
Given a BPM value x, we can take x / 60 to calculate the beats per second y. If we then take 22050 / y, we arrive at the number of samples constituting one beat (i.e. one note like "C").
If a BPM is not explicitly specified, you should default to 180 BPM. (In fact, this is what we've implicitly done all along, because we've used notes and pauses of 7350 samples. Be sure you understand how 7350 samples and 180 BPM correspond.)
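The calculation can be written out directly. This sketch assumes the sampling rate of 22050 used throughout the assignment and converts the result to an integer, since a sound must contain a whole number of samples; samples_per_beat is a hypothetical name.

```python
SAMPLING_RATE = 22050  # samples per second, as used in this assignment


def samples_per_beat(bpm):
    """Return the number of samples in one beat at the given BPM."""
    beats_per_second = bpm / 60.0
    return int(SAMPLING_RATE / beats_per_second)


default_length = samples_per_beat(180)
slow_length = samples_per_beat(60)
```

At 180 BPM there are 3 beats per second, so one beat is a third of 22050 samples.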
The only place the BPM specifier can appear is at the very beginning of the entire notestring. In particular, it is not permitted to occur at the beginning of any channel's description besides the first. If the first character of a notestring is not the [ symbol, the BPM remains at the default of 180 for the song.
Here's an example. The notestrings "CEG" and "[180]CEG" are exactly the same C major chord at 180 BPM. The notestring "[60]CEG" is the same C major chord, but this time at 60 BPM. In other words, the 60 BPM version is three times slower than the 180 BPM version.
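Separating the optional BPM specifier from the rest of the notestring might look like this. The sketch assumes the specifier, when present, is well formed (an integer between square brackets); split_bpm is a hypothetical name.

```python
DEFAULT_BPM = 180


def split_bpm(notestring):
    """Return (bpm, rest), where rest is the notestring without its BPM specifier.

    Assumption: a leading [ is always followed by an integer and a ].
    """
    if notestring.startswith('['):
        close = notestring.index(']')
        return int(notestring[1:close]), notestring[close + 1:]
    return DEFAULT_BPM, notestring


with_bpm = split_bpm("[60]CEG")
without_bpm = split_bpm("CEG")
```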
You should not produce any output to the screen (with print) or acquire any input from the keyboard (with raw_input) in any of your code files. You may include an if __name__ == ... section, but do not include any top-level code that runs when we import your .py files. We will call your functions to test them, so such calling code is unnecessary.
For your amusement (or for further testing!), here is a Notestring Library.
These are the aspects of your work that we will focus on in the marking:
Hand in the following files:
sound_functions.py
song_generator.py