Created
August 13, 2019 12:59
-
-
Save dtinth/955be399d4b8442344db02c17a64ddca to your computer and use it in GitHub Desktop.
Revisions
-
dtinth created this gist
Aug 13, 2019 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,61 @@ How to transcribe Thai speech in videos into text. ## Requirements - Google Cloud or Firebase project **with** billing enabled. - [`gcloud` command line tool installed](https://cloud.google.com/sdk/gcloud/). - `ffmpeg` or Docker. - `youtube-dl` to download YouTube videos. - 30 Baht per 1 hour of input. ## Step 1: Grab the audio track Example, from YouTube, using `youtube-dl`: ``` youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........' ``` ## Step 2: Convert We need to convert a audio into a format that is supported by Google Cloud APIs. We will use OGG Opus. ``` docker run -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg" ``` To cut a portion of audio, put `-ss <START TIME> -t <DURATION>` before `-i`. For example, `-ss 01:38:23 -t 00:30:00`. ## Step 3: Recognize 1. Upload the ogg file to Google/Firebase Cloud Storage. After uploading, you will get a `<STORAGE LOCATION>` such as `gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg`. 2. Start the transcription: ```sh gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async ``` It will print out: ```json { "name": "5766027198115285298" } ``` This is your `<OPERATION ID>`. 3. Wait for the operation to finish and write the results to the file. ```json gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json" ``` View the JSON file. 