I am recording my voice to a WAV file with pyaudio, using the following code:
import pyaudio
import wave

def voice_recorder():
    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 22050
    CHUNK = 1024
    RECORD_SECONDS = 4
    WAVE_OUTPUT_FILENAME = "first.wav"

    audio = pyaudio.PyAudio()

    # start recording
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)
    print("konusun...")

    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    # print("finished recording")

    # stop recording
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # write the captured frames to a WAV file
    waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()
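To see exactly what ends up in the file, the header can be read back with the standard wave module; a small sketch (the file name matches the recorder above):

import wave

# Read back the header of the recorded file to confirm its parameters.
wf = wave.open("first.wav", "rb")
print("channels:     %d" % wf.getnchannels())
print("sample width: %d bytes" % wf.getsampwidth())
print("frame rate:   %d Hz" % wf.getframerate())
print("frames:       %d" % wf.getnframes())
wf.close()

With the recorder above this should report 2 channels at 22050 Hz, which is useful to compare against whatever format the transcription sample expects.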
I am using the following Google Speech API sample, which transcribes the speech in a WAV file to text: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/api-client/transcribe.py
When I feed the WAV file produced by pyaudio into Google's code, I get the following error:
googleapiclient.errors.HttpError:
My current workaround: I convert the WAV file to MP3 with ffmpeg, and then convert the MP3 back to WAV with sox:
import os
import subprocess

def wav_to_mp3():
    # convert the recording to a 16 kHz mono MP3, discarding ffmpeg's output
    FNULL = open(os.devnull, 'w')
    subprocess.call(['ffmpeg', '-i', 'first.wav', '-ac', '1', '-ab', '6400',
                     '-ar', '16000', 'second.mp3', '-y'],
                    stdout=FNULL, stderr=subprocess.STDOUT)

def mp3_to_wav():
    # convert the MP3 back to a 16 kHz WAV
    subprocess.call(['sox', 'second.mp3', '-r', '16000', 'son.wav'])
Google's API accepts the WAV produced this way, but the results are poor because the lossy MP3 round-trip degrades the audio too much.
So how can I make pyaudio produce a WAV file in the first step that the API accepts directly?
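One way to sidestep the lossy MP3 round-trip would be to record in the format the transcription sample expects in the first place; a minimal sketch, assuming the sample's request is configured for 16 kHz, mono, 16-bit LINEAR16 audio (only the rate and channel count differ from voice_recorder above):

import pyaudio
import wave

def voice_recorder_16k_mono():
    # Same structure as voice_recorder, but 16 kHz mono to match the
    # assumed LINEAR16 configuration of the transcription sample.
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    CHUNK = 1024
    RECORD_SECONDS = 4

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                        input=True, frames_per_buffer=CHUNK)
    frames = [stream.read(CHUNK)
              for _ in range(int(RATE / CHUNK * RECORD_SECONDS))]
    stream.stop_stream()
    stream.close()
    audio.terminate()

    waveFile = wave.open("first.wav", 'wb')
    waveFile.setnchannels(CHANNELS)
    waveFile.setsampwidth(audio.get_sample_size(FORMAT))
    waveFile.setframerate(RATE)
    waveFile.writeframes(b''.join(frames))
    waveFile.close()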
Converting the WAV file to FLAC with avconv and sending that to the Google Speech API solved the problem:
subprocess.call(['avconv', '-i', 'first.wav', '-y', '-ar', '48000', '-ac', '1', 'last.flac'])
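For symmetry with the earlier helpers, the same call can be wrapped in a function that also checks the exit code; a sketch (the function name is illustrative):

import subprocess

def wav_to_flac():
    # Resample to 48 kHz mono FLAC; -y overwrites an existing output file.
    ret = subprocess.call(['avconv', '-i', 'first.wav', '-y',
                           '-ar', '48000', '-ac', '1', 'last.flac'])
    if ret != 0:
        raise RuntimeError('avconv exited with code %d' % ret)

The resulting last.flac is then passed to the transcription script instead of first.wav.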