@dtinth
Created August 13, 2019 12:59

How to transcribe Thai speech in videos into text.

## Requirements

- A Google Cloud or Firebase project **with** billing enabled.

- The [`gcloud` command line tool](https://cloud.google.com/sdk/gcloud/), installed.

- `ffmpeg`, or Docker to run it from a container.

- `youtube-dl`, to download YouTube videos.

- A budget of about 30 Baht per hour of input audio.
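To estimate the cost before starting, a small helper can turn a clip's duration into Baht at the rate above. This is a minimal sketch: `estimate_cost_baht` is a hypothetical name, the 30 Baht/hour figure is the rough rate from this list rather than an official price, and it ignores how Google rounds billed time.

```sh
# Hypothetical helper: rough cost of transcribing a clip at ~30 Baht per hour of audio.
# Usage: estimate_cost_baht HH:MM:SS
estimate_cost_baht() {
  echo "$1" | awk -F: '{
    seconds = $1 * 3600 + $2 * 60 + $3     # HH:MM:SS -> total seconds
    printf "%.2f\n", seconds / 3600 * 30   # approximate rate: 30 Baht per hour
  }'
}
```

For example, `estimate_cost_baht 01:38:23` prints `49.19`.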

## Step 1: Grab the audio track

For example, to download the audio track of a YouTube video with `youtube-dl`:

```
youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........'
```
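Step 2 uses an `.m4a` input name, but `bestaudio` may pick another container (on YouTube it is often WebM). One way to get a predictable file name is a format filter with a fallback plus an output template. This is a sketch: `download_audio` and the `YOUTUBE_DL` override variable are hypothetical, added only so that an option-compatible downloader can be swapped in.

```sh
# Hypothetical helper: download only the audio track of a YouTube video.
#   -f 'bestaudio[ext=m4a]/bestaudio'  best m4a audio if available, else best audio in any container
#   -o '%(id)s.%(ext)s'                name the file after the video ID, e.g. <VIDEO ID>.m4a
# YOUTUBE_DL (hypothetical) selects a different downloader binary; defaults to youtube-dl.
download_audio() {
  "${YOUTUBE_DL:-youtube-dl}" -f 'bestaudio[ext=m4a]/bestaudio' -o '%(id)s.%(ext)s' "$1"
}
```

The downloaded file name (video ID plus extension) is then the `<FILENAME>` to use in Step 2.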

## Step 2: Convert

We need to convert the audio into a format that the Google Cloud Speech-to-Text API supports.
We will use Ogg Opus, resampled to 16 kHz mono (`-ar 16000 -ac 1`). The sample rate must match `--sample-rate` in Step 3.

```
docker run --rm -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg"
```

To transcribe only part of the audio, put `-ss <START TIME> -t <DURATION>` before `-i`. For example, `-ss 01:38:23 -t 00:30:00` keeps the 30 minutes starting at 1:38:23.
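When converting several files or cuts, the command can be wrapped in a small function. A sketch, assuming Docker and the `jrottenberg/ffmpeg` image as above; `convert_to_opus` is a hypothetical name, and it writes the result next to the input with an `.ogg` extension.

```sh
# Hypothetical helper: convert audio to 16 kHz mono Ogg Opus, optionally cutting a segment.
# Usage: convert_to_opus INPUT [START_TIME DURATION]
# INPUT is relative to the current directory, which is mounted into the container as /data.
convert_to_opus() {
  input="$1"
  output="${input%.*}.ogg"   # talk.m4a -> talk.ogg
  if [ "$#" -ge 3 ]; then
    # -ss/-t before -i: start at START_TIME and read only DURATION of the input
    docker run --rm -v "$PWD:/data" jrottenberg/ffmpeg \
      -ss "$2" -t "$3" -i "/data/$input" \
      -c:a libopus -ar 16000 -ac 1 "/data/$output"
  else
    docker run --rm -v "$PWD:/data" jrottenberg/ffmpeg \
      -i "/data/$input" -c:a libopus -ar 16000 -ac 1 "/data/$output"
  fi
}
```

For example, `convert_to_opus "<FILENAME>.m4a" 01:38:23 00:30:00` produces `<FILENAME>.ogg` holding that 30-minute segment.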

## Step 3: Recognize

1. Upload the `.ogg` file to Cloud Storage (Google Cloud or Firebase). The uploaded file's `<STORAGE LOCATION>` is a `gs://` URI such as `gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg`.

2. Start the transcription:

   ```sh
   gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async
   ```

   It prints something like:

   ```json
   {
     "name": "5766027198115285298"
   }
   ```

   The `name` value is your `<OPERATION ID>`.

3. Wait for the operation to finish and write the results to a file:

   ```sh
   gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json"
   ```

   Then view the JSON file.

   ![](https://i.imgur.com/wMu4DUp.png)
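The three steps above can be chained in one script. This is a sketch, not the author's workflow: `transcribe` and `extract_transcripts` are hypothetical helper names; it assumes `gsutil` (shipped with the Cloud SDK) for the upload and that `--async` prints the JSON shown in step 2; and `extract_transcripts` assumes the result is pretty-printed with one `"transcript"` key per line. A real JSON tool such as `jq` is more robust, and it also decodes any `\u` escapes in the Thai text.

```sh
# Hypothetical end-to-end helper for Step 3: upload, start recognition, wait.
# Usage: transcribe <FILENAME>.ogg gs://<BUCKET>/<FOLDER>
# Writes <FILENAME>.json next to the input file.
transcribe() {
  ogg_file="$1"
  storage_location="$2/$(basename "$ogg_file")"
  result_file="${ogg_file%.ogg}.json"

  # 1. Upload the audio to Cloud Storage
  gsutil cp "$ogg_file" "$storage_location" || return 1

  # 2. Start recognition and pull the operation name out of the printed JSON
  operation_id=$(gcloud ml speech recognize-long-running "$storage_location" \
      --language-code=th --encoding=ogg-opus --include-word-time-offsets \
      --sample-rate=16000 --async |
    sed -n 's/.*"name": *"\([^"]*\)".*/\1/p')
  if [ -z "$operation_id" ]; then
    echo "Could not read the operation ID" >&2
    return 1
  fi
  echo "Started operation $operation_id" >&2

  # 3. Block until the operation finishes, saving the response
  gcloud ml speech operations wait "$operation_id" > "$result_file"
}

# Print only the transcript text, one recognized segment per line.
# Assumes pretty-printed JSON with each "transcript" key on its own line.
extract_transcripts() {
  sed -n 's/^[[:space:]]*"transcript": "\(.*\)",\{0,1\}$/\1/p' "$1"
}
```

For example, run `transcribe "<FILENAME>.ogg" "gs://<PROJECT>.appspot.com/transcribe"` and then `extract_transcripts "<FILENAME>.json"`.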