Even better if you could get it working automatically. I wonder if there's any existing work on using signal processing to turn audio of spoken words into a whisper or a shout?
Theoretically, you'd need to separate the voice into its tonal (voiced) part and its noise part. The tonal part is shifted up or down in pitch, say by an octave, and made louder or quieter; a pure whisper is essentially only the noise part. The separation routine would look at the spectrum: the tonal part should show up as narrow bands (the harmonics), while everything else is noise. In reality the tonal transformation is probably more complex than just multiplying frequency and amplitude by constants, but that simplest version might work OK too.
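Here's a rough sketch of the "simplest transformation" idea, under some loudly stated assumptions. It handles a single frame only (no windowing or overlap-add), uses a naive DFT so it needs nothing beyond the standard library, and treats a frequency bin as "tonal" if its magnitude exceeds `ratio` times the median of its neighbors. That peak-vs-local-median rule, and the names `split_tonal_noise`, `mix`, `width` and `ratio`, are my own illustrative choices, not an established method. It only does the amplitude half (gain on the tonal part); the octave shift is left out.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform; fine for one short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); keeps the real part since the signals here are real."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def split_tonal_noise(frame, width=4, ratio=4.0):
    """Split one frame into (tonal, noise) time-domain parts.

    Assumed rule: a bin is tonal if its magnitude is more than `ratio` times
    the median magnitude of the bins within +/- `width` of it (circularly, so
    the mask stays symmetric for real input), i.e. it is a narrow peak.
    """
    X = dft(frame)
    n = len(X)
    mags = [abs(c) for c in X]
    tonal_X = [0j] * n
    noise_X = [0j] * n
    for k in range(n):
        neighborhood = sorted(mags[(k + d) % n] for d in range(-width, width + 1))
        median = neighborhood[len(neighborhood) // 2]
        if mags[k] > ratio * median:
            tonal_X[k] = X[k]   # narrow peak: part of the voiced/tonal sound
        else:
            noise_X[k] = X[k]   # everything else: breath/noise
    return idft(tonal_X), idft(noise_X)


def mix(tonal, noise, tonal_gain, noise_gain=1.0):
    """Recombine the parts: tonal_gain=0 gives a whisper-like frame, >1 a louder 'shout'."""
    return [tonal_gain * a + noise_gain * b for a, b in zip(tonal, noise)]


# Synthetic "voiced" frame: harmonics at bins 8, 16, 24 plus a little broadband noise.
random.seed(0)
n = 256
frame = [sum(math.sin(2 * math.pi * h * 8 * t / n) / h for h in (1, 2, 3))
         + 0.05 * random.gauss(0, 1) for t in range(n)]

tonal, noise = split_tonal_noise(frame)
whisper = mix(tonal, noise, tonal_gain=0.0)
shout = mix(tonal, noise, tonal_gain=2.0)
```

Since the two masks are disjoint, `tonal + noise` reconstructs the original frame exactly, so a gain of 1 is a no-op. A real attempt would run this per frame of an STFT and use an FFT instead of the naive DFT.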
This is similar to velocity in digital music (e.g. the velocity value of a MIDI note), which encodes how hard the note is struck. It can change the quality of the note, not just its loudness, and it can be set separately from the volume.