AI ninja project [day 7] Speech-to-Text

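Meetings can end without anyone keeping minutes.
When a meeting makes a wrong decision and something breaks as a result,
it becomes hard to hold anyone accountable, and everyone may even act as if nothing happened.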

So here we use GCP's Speech-to-Text feature.

After enabling the API, we can try it out locally:

Install

pip install google-cloud-speech

Local usage

import os

credential_path = "cred.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    from google.cloud import speech
    import io

    client = speech.SpeechClient()

    with io.open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u"Transcript: {}".format(result.alternatives[0].transcript))


transcribe_file("speach.wav")

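The last line is where we convert a local WAV recording into text.
For language_code in the config, the supported languages are listed at
https://cloud.google.com/speech-to-text/docs/languages
(for example, zh-TW for Traditional Chinese).
As for sample_rate_hertz, if the value does not match the file, the first run reports the recording's actual sample rate,
so you may need to adjust it before the program runs correctly.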
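Instead of waiting for the first run to report the sample rate, we can read it from the WAV header beforehand. The following is a minimal sketch using Python's standard wave module. The helper name describe_wav is hypothetical (it is not part of google-cloud-speech), and it assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Read basic audio parameters from a PCM WAV file header.

    Hypothetical helper for illustration, not part of google-cloud-speech.
    """
    with wave.open(path, "rb") as wav_file:
        return {
            # Value to pass as sample_rate_hertz in RecognitionConfig.
            "sample_rate_hertz": wav_file.getframerate(),
            # Number of audio channels (1 = mono, 2 = stereo).
            "channels": wav_file.getnchannels(),
            # Bytes per sample; LINEAR16 expects 2 (16-bit samples).
            "sample_width_bytes": wav_file.getsampwidth(),
        }
```

For example, describe_wav("speach.wav")["sample_rate_hertz"] gives the number to put into the config above.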

If the audio is stored in Google Cloud Storage (google-cloud-storage) instead, the official site also provides an example:

# Imports the Google Cloud client library
from google.cloud import speech
import os

credential_path = "cred.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

audio = speech.RecognitionAudio(uri=gcs_uri)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# For Traditional Chinese audio, set language_code to "zh-TW"

# Detects speech in the audio file
response = client.recognize(config=config, audio=audio)

for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

As we can see, the main difference is the argument passed to audio = speech.RecognitionAudio(),
which changes from content to uri.
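Since the two examples differ only in that one argument, a small helper can pick it for us. This is just a sketch: recognition_audio_kwargs is a hypothetical name, and it assumes that any source starting with gs:// is a Cloud Storage URI and that anything else is a local file path.

```python
def recognition_audio_kwargs(source):
    """Build the keyword arguments for speech.RecognitionAudio.

    Hypothetical helper: "gs://" sources are passed as uri,
    anything else is treated as a local file and passed as content.
    """
    if source.startswith("gs://"):
        # Cloud Storage object: the API reads the audio itself.
        return {"uri": source}
    # Local file: read the raw bytes and send them inline.
    with open(source, "rb") as audio_file:
        return {"content": audio_file.read()}
```

It would then be used as audio = speech.RecognitionAudio(**recognition_audio_kwargs("speach.wav")), with the rest of the code unchanged.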

As for pricing, the first hour each month is free, and after that transcribing one hour of audio costs roughly 45 TWD.
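To get a feel for the monthly bill, here is a rough estimate based only on the two numbers above (one free hour per month, about 45 TWD per hour afterwards). Both constants are taken from this post, not from the current price list, and the function name is hypothetical. Actual billing rounds usage in its own increments, so treat the result as an approximation.

```python
FREE_MINUTES_PER_MONTH = 60  # assumption from this post: first hour is free
TWD_PER_HOUR = 45            # assumption from this post: approximate rate


def estimate_monthly_cost_twd(total_minutes):
    """Roughly estimate the monthly cost in TWD for the given audio minutes."""
    # Only the minutes beyond the free allowance are billed.
    billable_minutes = max(0, total_minutes - FREE_MINUTES_PER_MONTH)
    return billable_minutes / 60 * TWD_PER_HOUR
```

For instance, three one-hour meetings in a month leave two billable hours, so the estimate comes out to about 90 TWD.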
