In the first half of this article I covered how to invite a guest for an interview, prepare and record the interview, and manage the interview to keep it on track and interesting. In this part I'll show you how to edit the interview audio on your computer, and share some tips from the pros that you can use in your own interviewing.

Audio Treatments

There's only so much you can do in the editing room with a bad signal. The best way to get high-quality audio is to record it correctly in the first place. That means preparing the space, yourself, your guest, and the equipment.

First, check out the room. Can you turn off the air conditioning or furnace blower? If so, you should. Are there periodic noises, like computer fans or buzzing fluorescent lights? How about intermittent noise, like telephones or door slams? Try to reduce these as much as you can. Turn on your recorder and monitor the room. You may be surprised at what it picks up.

After removing environmental noise, take a look at yourself and the interviewees. Do they have noisy clothes, like jackets, that can be removed? How about jewelry, which has the bad habit of clinking?

Next, check your microphone setup. If possible, it should be on a stand to reduce handling noise. It should also have a pop screen to stop the popping that comes from the rush of air on hard consonants like P and B (see Figures 1 and 2). Commercial pop screens are easy to set up, but you can also make your own.

Figure 1: The MXL V63MBP mic, part of the company's Computer Desktop Recording Kit, comes with a convenient desktop stand.

Figure 2: A pop screen, like this one from Popless, helps prevent explosive P and B sounds from ruining your recordings. Note the elastic shock mount on the mic as well. It minimizes vibration pickup.

Next, check your levels to make sure you're getting the most volume you can without clipping. And let your interviewees know that they should stay in the same relative position to the microphone, unless you're using a handheld mic and controlling the position yourself.

Last, be sure to get a sample of the ambient noise in the room for use in editing later. Five to ten seconds should suffice.

Throw the Ums Out

Once you have the audio recorded, you will need to edit it into a produced form for your podcast. The extent of the editing is up to you, but you need to make sure that your guest knows beforehand what type of editing will be done. In the examples below, I'll use Audacity, the free audio editor for Windows, Mac OS, and Linux.

When you've got your guests thinking, they will often pause and say "um" or "hmm." Everyone does it, some much more than others, and it's annoying to listen to. It's ethical to speed up the interview by removing the ums. Figure 3 shows the audio waveform in Audacity with the um selected. Figure 4 shows the waveform with the um removed.

Figure 3. The um is selected.

Figure 4. The um is deleted.

Until you get the hang of removing sounds cleanly, try removing more or less audio around a single um to hear what sounds the most natural. People often trail an um into the beginning of their next word, and finding where to cut in that case can take some experimentation. It helps to zoom way in on the waveform and cut on zero-crossings, the points where the waveform crosses the center line of the display, meaning that the level is zero. Many audio editing programs can snap the cursor to zero-crossings automatically, and some will automatically smooth the audio surrounding the edit point with a brief crossfade.
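Editors like Audacity handle the zero-crossing snap and the crossfade for you. The idea is still simple enough to sketch, and seeing it in code can make it clearer what the editor is doing. The Python below is a minimal illustration, not any editor's actual code. It assumes mono audio stored as a list of floats between -1.0 and 1.0. The function names and the 64-sample default fade are hypothetical choices.

```python
def nearest_zero_crossing(samples, index):
    """Return the zero-crossing position closest to `index`.

    Position i counts as a crossing when samples[i] is exactly zero or has
    the opposite sign to samples[i - 1]. If the audio never crosses zero,
    `index` is returned unchanged.
    """
    best = None
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if cur == 0.0 or (prev < 0.0) != (cur < 0.0):
            if best is None or abs(i - index) < abs(best - index):
                best = i
    return index if best is None else best


def cut_with_crossfade(samples, start, end, fade_len=64):
    """Delete samples[start:end] (an "um"), snapping both edges to
    zero-crossings and blending across the join with a linear crossfade.

    The crossfade overlaps the audio just before the cut with the audio
    just after it, so the result is fade_len samples shorter than a plain
    delete would be.
    """
    start = nearest_zero_crossing(samples, start)
    end = nearest_zero_crossing(samples, end)
    start, end = min(start, end), max(start, end)
    if start == end:
        return list(samples)  # both edges snapped to the same point
    # Shrink the fade if there isn't enough audio on either side of the cut.
    fade_len = max(0, min(fade_len, start, len(samples) - end))
    blend = []
    for k in range(fade_len):
        t = (k + 1) / (fade_len + 1)  # rises from near 0 to near 1
        outgoing = samples[start - fade_len + k]
        incoming = samples[end + k]
        blend.append((1.0 - t) * outgoing + t * incoming)
    return samples[:start - fade_len] + blend + samples[end + fade_len:]
```

A fade of a few milliseconds is usually enough to hide the join; at 44.1 kHz, 64 samples is about 1.5 ms.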

Matching the wave shape on either side of the cut can help smooth the transition as well. For example, suppose the waveform is falling toward the zero-crossing at a 60-degree angle just before the beginning of your cut. Try to put the end of the cut at a point where the waveform is also falling through the zero-crossing at 60 degrees. Compare both points at the same zoom level, because the on-screen angle changes as you zoom. That way the waveform continues across the join in the same direction and at the same rate, instead of kinking. (For more editing tips, see the sidebar "Extreme Vocal Editing.")
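The angle you see on screen depends on the zoom level, so in code it is easier to compare slopes: the change in level from one sample to the next. The hypothetical helper below again assumes mono float samples and that `start` is at least 1. It searches for zero-crossings near a rough end point and picks the one whose slope best matches the slope at the start of the cut. When two candidates match equally well, it prefers the one closest to the rough end point. Treat it as a sketch of the idea, not a feature of any particular editor.

```python
def slope(samples, i):
    """Change in level from sample i-1 to sample i; a zoom-independent
    stand-in for the on-screen angle of the waveform."""
    return samples[i] - samples[i - 1]


def matched_cut_end(samples, start, rough_end, search=200):
    """Pick a cut end near rough_end whose zero-crossing slope best matches
    the slope at `start`, so the waveform keeps its direction and steepness
    across the join. Only positions after `start` are considered; if there
    are no zero-crossings in range, rough_end is returned unchanged."""
    target = slope(samples, start)
    lo = max(rough_end - search, start + 1)
    hi = min(rough_end + search, len(samples))
    candidates = [i for i in range(lo, hi)
                  if (samples[i - 1] < 0.0) != (samples[i] < 0.0)]
    if not candidates:
        return rough_end
    # Best slope match first; among equal matches, stay closest to rough_end.
    return min(candidates,
               key=lambda i: (abs(slope(samples, i) - target), abs(i - rough_end)))
```

Matching the sign of the slope keeps a falling wave falling. Matching its size avoids a sudden change in steepness at the join.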

Isolated ums are fairly easy to remove, but sounds that overlap the speech, like microphone noise, pops, or tongue clacks from dry mouth, aren't as easy to fix. While you're recording the interview, keep in mind how you'll edit it later. If those problems occur, let your interviewees know and give them an opportunity to stop, fix the problem, get a glass of water, and start again on the same question so that it's easier to edit later.

Removing Failed Responses

If you've established up front that you'll be editing for content, your guest should feel free to stop a response they don't like and restart from the beginning. In that case, simply delete the entire section from the end of your question to the start of the final response, but leave enough space between the question and the answer to keep the pacing of the conversation natural. The top part of Figure 5 shows the audio waveform with the question and both of the responses. The bottom part shows the waveform after the first response has been deleted.

Figure 5. At top, the failed response is selected. At bottom, it's deleted, closing the gap.

Note how I left some space to create the appropriate pacing. To check your work you should listen to the question and answer just before the one you're editing to make sure that the pacing matches.

It's also important to match the breath sounds. People naturally breathe in before starting a sentence. To create a seamless edit, cut from the start of the breath before the first response to the start of the breath before the second response, so that the second breath is preserved. The top part of Figure 6 shows just the word selected. If you cut there, you'll get an extra-long pause and possibly an additional breath sound. The bottom window shows the proper selection for removing the word, which includes the breath that precedes it.

Figure 6. At top, only the word is selected. At bottom, both the word and its preceding breath are selected.

Always encourage your guest to leave an ample one- or two-second pause before the restated response so that you can easily delete the failed response.
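Those one- or two-second pauses are easy to spot by eye in Audacity. In a long recording, though, you could also flag them automatically. The sketch below does that and assumes mono float samples. The -45 dBFS threshold and 20 ms analysis window are hypothetical starting values that you would tune against your own room noise. The ambient sample you recorded earlier is a good reference for that.

```python
import math


def find_pauses(samples, rate, threshold_db=-45.0, min_pause=1.0, window=0.02):
    """Return (start_sec, end_sec) spans where the short-term RMS level stays
    below threshold_db (dBFS) for at least min_pause seconds.

    samples: mono audio as floats in [-1.0, 1.0]; rate: samples per second.
    """
    win = max(1, round(rate * window))
    threshold = 10 ** (threshold_db / 20)
    pauses = []
    run_start = None  # sample index where the current quiet run began
    for pos in range(0, len(samples), win):
        chunk = samples[pos:pos + win]
        level = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if level < threshold:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and (pos - run_start) / rate >= min_pause:
                pauses.append((run_start / rate, pos / rate))
            run_start = None
    # A quiet run may continue to the very end of the recording.
    if run_start is not None and (len(samples) - run_start) / rate >= min_pause:
        pauses.append((run_start / rate, len(samples) / rate))
    return pauses
```

A pause that shows up between your question and the final answer is a likely restart point. It is still worth listening before you cut.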

Removing Entire Questions

To shorten an interview, it's often necessary to remove entire questions or groups of questions. You may also want to do that when you rephrase a question to get a better answer. The top part of Figure 7 shows the interview with the unwanted question highlighted. The bottom part shows the result.

Figure 7. At the top, I've highlighted a question I want to delete. At the bottom, it's gone.

This type of large edit raises the ethical issues involved in editing interviews. If you remove a question and its answer, you can change the meaning of the answers that follow. It's important to retain the integrity of the interview and to present the answers given by your guests in the way they intended. Cutting within a question or a response is almost never ethical without the approval of the guest.

Moving Questions Around

A well-produced interview presents a smooth arc of information. The answers follow logically within the context of the story being presented. However, that's not always the way it works out as you're recording. Often the interviewer will remember a question, or the interviewee will offer more information about a previous question. Both will necessitate re-ordering the questions. Figure 8 shows the interview before and after the question is moved.

Figure 8. The highlighted audio in the top window is a question I want to move to earlier in the interview. The bottom window shows the audio after the question is moved.

Again, you need to tread a fine ethical line with these types of edits. In this case I had asked the interviewee to talk about his product. He started with a customer-centric pitch. I then asked about deployment. Then I asked about a customer use case. Then I asked another question about deployment. To fix the disconnect, I moved the third question after the first question so that both of the customer questions came first, then both of the questions about deployment. That way the interview flowed better.

Creating Smooth Transitions

Earlier I spoke about recording some ambient room noise before getting into your interview. Here is why. Often you will want to have a short studio segment that introduces the interview. That segment is going to have significantly less noise and better quality than the interview segment. Without any editing there will be a jarring transition between the clean studio sound and the relatively noisy interview.

The top track in Figure 9 shows a multitracked interview segment that starts with a studio sound and ends with a noisy bit of interview. The bottom track, which plays simultaneously, has a small segment of background noise. I used a volume envelope in Audacity to lower the level of the noise when the noisy interview segment begins and then increase it at the end, smoothing the transition.

Figure 9. Using a volume envelope to mix background noise (bottom) with a too-clean voiceover (top).

The effect is subtle, but it eases the listener's mind. This technique works best when you're using studio-produced content in conjunction with audio from the field.
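Under the hood, this move is just a gain curve applied to a looped room-tone clip, which is then summed with the voice track. The sketch below assumes both tracks are mono float lists at the same sample rate. Its breakpoint list is a hypothetical stand-in for the envelope control points you would drag in the editor. A very short room-tone clip can make the loop audible, which is one reason to capture several seconds of it.

```python
def envelope_gain(points, t):
    """Piecewise-linear gain at time t (seconds) from sorted (time, gain)
    breakpoints, holding the first/last value outside their range. This
    mimics dragging control points on a volume envelope."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if t <= t1:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def mix_room_tone(voice, room_tone, rate, points):
    """Mix a looped room-tone bed under `voice`, scaled by the envelope."""
    return [v + envelope_gain(points, n / rate) * room_tone[n % len(room_tone)]
            for n, v in enumerate(voice)]
```

As an example, the points `[(0.0, 1.0), (12.0, 1.0), (13.0, 0.0)]` would keep the bed at full level under a studio intro. It would then fade the bed out over one second as the field audio begins at a hypothetical 12-second mark.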

Working Interview Content into a Story

In some podcasts the interview will act as supporting material rather than the central theme. To work segments of an interview into a podcast you will need to multitrack your recording.

Figure 10a shows the container segment. Figure 10b shows the sound bite I want to insert; note that it's about 11 seconds longer than the silent gap in Figure 10a. In Figure 10c, I have opened up the gap in the original file wide enough to hold the audio I want to insert. For clarity, I pasted the sound bite into a separate track, but I could have pasted it directly into the gap.

Figure 10a. The recorded story segment, with a gap for an interview sound bite.

Figure 10b. The interview fragment I want to insert.

Figure 10c. The final multitrack mix.
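In sample terms, opening up the gap just means splicing the bite into the story track and letting everything after it slide later. At an assumed 44.1 kHz sample rate, the 11-second difference in Figure 10 would be 11 × 44,100 = 485,100 extra samples. This sketch takes the single-track route mentioned above, pasting directly into the gap, rather than the two-track layout. It assumes mono float lists, and the function name is hypothetical.

```python
def insert_sound_bite(story, gap_start, gap_end, bite):
    """Drop `bite` into the silent gap story[gap_start:gap_end].

    If the bite is longer than the gap, the gap is widened and everything
    after it shifts later; if it is shorter, the leftover gap is kept as
    silence after the bite. Returns (new_track, samples_added).
    """
    gap_len = gap_end - gap_start
    padding = [0.0] * max(0, gap_len - len(bite))
    new_track = story[:gap_start] + bite + padding + story[gap_end:]
    return new_track, len(new_track) - len(story)
```

The returned count tells you how far every later cue in your script has moved. That is handy if you are tracking fragment times in a notepad.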

To keep track of who is saying what, I renamed the tracks "story" and "interview." You can use whatever names help you remember which track holds which audio.

If the sound quality of the story segment varies greatly from the quality of the interview, I suggest you use the "Creating Smooth Transitions" trick to even out the audio.

If you intend to use several fragments from a fairly long interview, I recommend using a notepad to jot down the start and end times of each fragment you need to extract. Then put notations for where each fragment should go in your story script.

Removing Pops

If your guests aren't familiar with microphone technique, they will often create pops in the signal by speaking directly into the face of the microphone. Ideally you can prevent this by demonstrating some proper microphone technique beforehand, and by using a pop screen or windscreen.

If you find the pop during the editing phase (it will probably look like a spike or flat-top in the waveform), there are several ways to remove or reduce it. You can

  1. Use nondestructive volume envelopes
  2. Select the pop and change the level destructively (using the editor's Change Gain function)
  3. Select the pop and use a low-cut EQ to reduce the thump
  4. Snip a few milliseconds out of the pop sound

We'll demonstrate these techniques in a future article.
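In the meantime, here is a rough idea of what options 2 and 3 do to the samples. This is a minimal Python sketch, not any particular editor's implementation. It assumes mono float samples, and the 100 Hz cutoff is a hypothetical starting point. The filter is the textbook first-order (RC-style) high-pass. A real EQ plug-in would typically offer steeper slopes.

```python
import math


def change_gain(samples, start, end, db):
    """Option 2: destructively scale samples[start:end] by `db` decibels
    (negative values make the pop quieter)."""
    factor = 10 ** (db / 20)
    return samples[:start] + [x * factor for x in samples[start:end]] + samples[end:]


def low_cut(samples, start, end, rate, cutoff_hz=100.0):
    """Option 3: run a first-order high-pass (low-cut) filter over
    samples[start:end] only. A pop is mostly low-frequency thump, so
    attenuating content below cutoff_hz reduces it. Filtering only a
    selection can leave a small step at its edges, so start and end are
    best placed on zero-crossings."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / rate)
    out = list(samples)
    prev_in = samples[start - 1] if start > 0 else 0.0
    prev_out = 0.0
    for i in range(start, end):
        y = alpha * (prev_out + samples[i] - prev_in)
        out[i] = y
        prev_in, prev_out = samples[i], y
    return out
```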

Avoiding the Up-cut

Speech styles vary from individual to individual, but you'll find certain patterns fairly often. One of these is a slight elevation in volume (or pitch) going into a short pause, the kind of pause that is represented in print by a comma. This is a speaker's way of letting listeners know that while they are taking a quick break, they still have more to say.

If you cut the audio at that pause, the fragment you keep will end on a rise. Speakers almost never end a thought that way, so it sounds very unnatural. It's a dead giveaway that you have made some harsh audio edits. Always work to avoid the up-cut.
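You'll usually hear an up-cut before you see it, but a crude loudness check can flag edits worth a second listen. This is a sketch under stated assumptions. The audio is mono float samples, and `cut` is the sample index where the kept fragment ends. The 100 ms window and 3 dB threshold are arbitrary illustrative values. The check only looks at volume. Catching a rise in pitch would require pitch tracking, which this sketch does not attempt.

```python
import math


def rms(chunk):
    """Root-mean-square level of a list of samples (0.0 if empty)."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk)) if chunk else 0.0


def ends_on_rise(samples, cut, rate, window=0.1, rise_db=3.0):
    """Heuristic up-cut check for the fragment that ends at index `cut`.

    Returns True when the last `window` seconds before the cut are at least
    rise_db louder than the `window` seconds before that. It looks at
    loudness only, not pitch.
    """
    win = max(1, round(rate * window))
    if cut < 2 * win:
        return False  # not enough audio to compare
    last = rms(samples[cut - win:cut])
    before = rms(samples[cut - 2 * win:cut - win])
    if last == 0.0:
        return False
    if before == 0.0:
        return True
    return 20 * math.log10(last / before) >= rise_db
```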

Removing Periodic Noise

When it's impossible to turn off the air conditioning or the computer fan, you will have a complex periodic noise in your signal. Audacity has a periodic noise filter built in. But you may want to look at commercial products like BIAS Sound Soap if you have serious noise issues in your signal. (See Seven Steps to Noise-Free Digital Audio.)
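Sometimes the noise is mains hum: a steady tone at 50 or 60 Hz plus its harmonics, depending on where you live. For that case, a set of narrow notch filters is a simple technique worth knowing. It is a different approach from the dedicated noise-reduction tools mentioned above, and it won't help with broadband hiss or fan rush. The sketch uses the biquad notch formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The Q of 30 and the three harmonics are assumptions to adjust by ear.

```python
import math


def notch(samples, rate, freq, q=30.0):
    """Biquad notch filter (Audio EQ Cookbook form) that removes a narrow
    band around `freq` Hz; a higher q gives a narrower notch."""
    w0 = 2 * math.pi * freq / rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b0, b1, b2 = 1.0, -2 * cos_w0, 1.0
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def remove_hum(samples, rate, mains_hz=60.0, harmonics=3):
    """Notch out the mains frequency and its first few harmonics."""
    for k in range(1, harmonics + 1):
        samples = notch(samples, rate, mains_hz * k)
    return samples
```

Narrow notches leave nearby speech frequencies largely intact. That is why this approach suits a steady tone and not changing noise.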

Adding the Final Polish

To finish off the interview, you will want to even out the volume by using the editor's Normalize function, by applying a volume envelope manually, or by using dynamic compression. Enveloping is particularly important and very easy to do in Audacity. The top part of Figure 11 shows a signal that starts soft and gets louder because the recordist boosted the input level during recording. Obviously it would have been best to have consistent volume, but the signal is what it is. The bottom part shows the same signal after it's been enveloped to even out the sound from start to end.

Figure 11. The original signal (top) gradually increased in volume as the recording engineer raised the input level. Applying a decreasing volume envelope (bottom) evened out the levels.

Because the overall volume of the signal was reduced, I also boosted the channel gain by 6 dB (roughly doubling the amplitude) to bring it back up into an acceptable range.
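For readers who want to see the arithmetic behind these steps, here is a minimal Python sketch. It assumes mono float samples, and the function names are hypothetical. A decibel change maps to an amplitude factor of 10^(dB/20), so a 6 dB boost multiplies every sample by about 1.995. Peak normalization is the simplest form of a Normalize function. The -1 dBFS default here is an arbitrary choice, not any editor's documented default.

```python
def db_to_gain(db):
    """Convert a decibel change to an amplitude factor (+6 dB is about 2x)."""
    return 10 ** (db / 20)


def normalize(samples, target_dbfs=-1.0):
    """Peak-normalize so the loudest sample sits at target_dbfs."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    factor = db_to_gain(target_dbfs) / peak
    return [x * factor for x in samples]


def apply_ramp(samples, start_gain, end_gain):
    """Two-point volume envelope: gain moves linearly from start_gain on the
    first sample to end_gain on the last."""
    n = len(samples)
    if n < 2:
        return [x * start_gain for x in samples]
    return [x * (start_gain + (end_gain - start_gain) * i / (n - 1))
            for i, x in enumerate(samples)]
```

A single straight-line envelope only levels the ends exactly. Consider a clip that rises steadily from 0.2 to 0.8 with a 1.0-to-0.25 ramp applied. Both ends come out at 0.2, but the middle comes out around 0.31. That's why you'd normally add more control points, just as you would by hand in Audacity.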

You may also want to add a small amount of reverb to add some depth to your voice or your guest's voice. All of these final polish steps should be applied only after the initial micro-edits are made to remove pops, ums, and other small problems.

One piece of editing advice that I live by is to always listen to the original and the new version of the audio after compression and reverb are applied, to make sure that you are making real improvements. Sometimes a natural sound is best.
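Dynamic compression, mentioned earlier as another way to even out levels, can also be sketched in a few lines. This is a generic feed-forward compressor of the kind described in DSP texts, not the algorithm of any particular editor. The threshold, ratio, attack, and release defaults are illustrative assumptions. Real compressors usually add make-up gain and a soft knee.

```python
import math


def compress(samples, rate, threshold_db=-20.0, ratio=4.0,
             attack=0.005, release=0.1):
    """Feed-forward compressor: above threshold_db, every `ratio` dB of input
    level becomes 1 dB of output level. attack/release are the time constants
    (seconds) of the level detector."""
    att = math.exp(-1.0 / (attack * rate))
    rel = math.exp(-1.0 / (release * rate))
    env = 0.0  # smoothed level of the input
    out = []
    for x in samples:
        level = abs(x)
        # React quickly when the level rises, recover slowly when it falls.
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        env_db = 20 * math.log10(env) if env > 0.0 else -120.0
        over = env_db - threshold_db
        gain_db = -over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0
        out.append(x * 10 ** (gain_db / 20))
    return out
```

Comparing the compressed version against the original by ear, as suggested above, is the real test. Too low a threshold or too high a ratio quickly flattens a voice.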

Extreme Vocal Editing

"Fuzzy Math," by the Bots, chops political speech into astounding new shapes.

Ed. Note: Although this article focuses on ways to get a natural sound through audio editing, Brian "BC" Coburn has taken vocal-slicing to extremes. At his George W. Bush Public Domain Audio Archive, Coburn painstakingly assembled and tagged some ten gigabytes' worth of phrases from political speeches. He then deconstructed that source material to produce the amazing presidential parody songs at TheBots.net.

In "Fuzzy Math," George W. Bush appears to sing, "Now is the time to give corporations the people's money!" In the raunchy "Rock the House," Bill Clinton confesses, "I need sex with Ms. Lewinsky."

In many places the vocals flow so smoothly that you wouldn't know they were fake if there weren't techno music pumping along. I asked Coburn how he's able to chop up and recombine phrases so seamlessly, particularly with speakers like Bush who slur their words together. He sent back these tips. —David Battino

There are some simple techniques for producing good results with the Bush archive. Find a strong positive or negative statement. Look around in that same folder and see if you can find the perfect word to substitute for the subject of the original statement, so that it reverses the meaning in humorous fashion. For example, you find the phrase "I love freedom," and in that same folder, you find the statement "they are evil." Open the files in your audio editor. Cut the last word off the first phrase and paste in the last word from the second phrase. Save your new file ILoveEvil.wav and you will have a new and unique addition to your personal version of the database.
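The cut-and-paste step Coburn describes can also be scripted. This minimal sketch uses Python's standard `wave` module. It assumes both phrases are uncompressed WAV files that share the same channel count, sample width, and sample rate. The function name, input file names, and frame positions are all hypothetical. In practice you'd read the positions off your editor's selection display, ideally at zero-crossings to avoid clicks.

```python
import wave


def splice_phrases(first_path, keep_frames, second_path, start_frame, out_path):
    """Keep the first `keep_frames` frames of first_path (dropping its final
    word), append second_path from `start_frame` onward (its final word),
    and write the result to out_path."""
    with wave.open(first_path, "rb") as a, wave.open(second_path, "rb") as b:
        pa, pb = a.getparams(), b.getparams()
        if (pa.nchannels, pa.sampwidth, pa.framerate) != (
                pb.nchannels, pb.sampwidth, pb.framerate):
            raise ValueError("phrases must share channels, sample width and rate")
        head = a.readframes(keep_frames)
        b.setpos(start_frame)
        tail = b.readframes(pb.nframes - start_frame)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(pa.nchannels)
        out.setsampwidth(pa.sampwidth)
        out.setframerate(pa.framerate)
        out.writeframes(head + tail)
```

For example, `splice_phrases("ILoveFreedom.wav", 52000, "TheyAreEvil.wav", 30500, "ILoveEvil.wav")` would produce the file from the example above. The frame numbers here are made up.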

When editing and making your own phrases, you will have much better results if you match words from the same speech, rather than between speeches. In other words, find another phrase from the same folder before you start searching for keywords in the entire database. Because each speech was given in a different room, and under different conditions, words will more easily pair with other words that are from the same speech.

When people speak, words tend to run together. You will find many places where you can't cleanly edit that word you need from one phrase into another because the words are all running together. The pitch profile and duration may also change depending on where a word appears in a sentence. In the "I Love Evil" example above, notice that the edited words both appear at the end of the original phrases. Neither "freedom" nor "evil" was running into another word, because they're both at the end of the phrase. In general, words at the beginnings and ends of phrases are easier to work with.

If you have to use run-together words from the middle of phrases, try to find ones that are coming out of or going into the same phonetic sound as your target.

In your audio sequencer, create an envelope for your reverb send amount. You can smooth out differences in room tone between your samples by varying the amount of reverb send.

Learning from the Best

As with almost any skill, the easiest way to learn something new is to imitate someone who does it well. For emotional interviews, where you are telling a person's story, there are two great interviewers I'd recommend listening to, and you can hear them for free on the Web. The first is Tony Kahn of the Morning Stories podcast. Tony is a seasoned professional broadcaster who has rekindled his passion for people's unique stories using his podcast. He interviews an average person for an hour to get enough material for a five- to ten-minute podcast story.

The second master of the emotional interview is Ira Glass of This American Life. For an example of his work, listen to the "Squirrel Cop" segment that is archived on the site. He also conducts long interviews in person, walking around with his subject to get inside their head as he tells their story.

For informational interviews I like to listen to Brooke Gladstone of On the Media. She is both an interviewer and an editor who aggressively edits interviews into a tight package that gets right to the point. On the flip side of that is Terry Gross of Fresh Air, where the banter is more jovial and the editing is a bit looser with longer gaps between questions and answers.

On the documentary side, you should check out the work of Errol Morris in films like The Fog of War. Morris uses super-long interviews (often ten hours) to get into as much detail as he can, then condenses it into a documentary narrative. Even though his medium is video, there is a lot to be learned from his use of contemplative pauses and expressions. All too often in interviews the guests are allowed to settle into a scripted answer. It's important for the listener to understand when you have asked a question that has gotten your guests off script and made them think.

You can also learn from interviews that you didn't like. When you turn off an interview, try to figure out what it was that made you switch off. Was it the guest? Was it the interview style? Was the interviewer too pushy, or phrasing questions in a way that limited the conversation? What questions would you have asked instead?

Good Night and Good Luck

All killer interviews have one thing in common—killer content. It's not the style of questioning, the voice of the interviewer, or the audio quality that matters. What matters is having an interesting topic plus questions and answers that inform and entertain. That is where you should concentrate your time and preparation. Know the topics that you are asking questions about and ask the insightful questions that illuminate those topics in compelling ways. If you do that, then people will listen through tin cans connected by shoestrings if need be.

Jack Herrington is an engineer, author, and presenter who lives and works in the Bay Area. His mission is to expose his fellow engineers to new technologies. That covers a broad spectrum: from demonstrating programs that write other programs in the book Code Generation in Action, to providing techniques for building customer-centered web sites in PHP Hacks, all the way to writing a how-to on audio blogging called Podcasting Hacks.

