@erikvullings
Last active May 4, 2026 19:25
Revisions

  1. erikvullings revised this gist May 4, 2026. 1 changed file with 10 additions and 2 deletions.
    12 changes: 10 additions & 2 deletions README.md
    Original file line number Diff line number Diff line change
    @@ -74,7 +74,11 @@ mogrify -path rotated/ \
    cd rotated

    # Create regular PDF
    magick *.png input.pdf
    magick *.png \
    -units PixelsPerInch \
    -density 300 \
    -resize 1158x1851! \
    input.pdf

    # Add searchable text using ocrmypdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
    @@ -93,7 +97,11 @@ mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
    cd cropped

    # Create regular PDF
    magick *.png input.pdf
    magick *.png \
    -units PixelsPerInch \
    -density 300 \
    -resize 1158x1851! \
    input.pdf

    # Add searchable text using ocrmypdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
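The `-density 300` and `-resize 1158x1851!` options together pin down the physical page size of the resulting PDF: pixels divided by the density give inches, and one PDF point is 1/72 inch. The following is a small worked example, assuming ImageMagick derives the page size from pixel dimensions and density in this way; `page_size_pt` is a hypothetical helper and not part of the gist.

```bash
# Hypothetical helper: page size in PDF points (1 pt = 1/72 inch)
# for an image of WIDTH x HEIGHT pixels at DENSITY pixels per inch.
page_size_pt() {
  awk -v w="$1" -v h="$2" -v d="$3" \
    'BEGIN { printf "%.2f %.2f\n", w * 72 / d, h * 72 / d }'
}

page_size_pt 1158 1851 300   # prints "277.92 444.24"
```

That corresponds to roughly 3.86 x 6.17 inches, a typical paperback-sized page.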
  2. erikvullings revised this gist May 4, 2026. 1 changed file with 9 additions and 0 deletions.
    9 changes: 9 additions & 0 deletions README.md
    @@ -111,3 +111,12 @@ ebook-convert output.pdf book.epub \
    --enable-heuristics \
    --no-default-epub-cover
    ```

    Alternatively, use `docling` to convert your PDF to Markdown first:

    ```bash
    PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8 \
    PYTORCH_MPS_LOW_WATERMARK_RATIO=0.5 \
    docling --from pdf --to md --image-export-mode embedded ocred.pdf
    ```

  3. erikvullings revised this gist May 4, 2026. 1 changed file with 13 additions and 3 deletions.
    16 changes: 13 additions & 3 deletions README.md
    @@ -73,16 +73,26 @@ mogrify -path rotated/ \

    cd rotated

    ## Create regular PDF
    # Create regular PDF
    magick *.png input.pdf

    # Add searchable text using ocrmypdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

    # Crop
    # Compress
    gs -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.4 \
    -dPDFSETTINGS=/ebook \
    -dNOPAUSE -dQUIET -dBATCH \
    -sOutputFile=output.pdf \
    ocred.pdf

    # Do the same for the cropped version
    mogrify -path cropped -crop 1158x1851+95+115 +repage *.png

    cd cropped

    # Concatenate
    # Create regular PDF
    magick *.png input.pdf

    # Add searchable text using ocrmypdf
  4. erikvullings revised this gist May 4, 2026. 1 changed file with 31 additions and 2 deletions.
    33 changes: 31 additions & 2 deletions README.md
    @@ -9,12 +9,41 @@ A simple approach and, according to ChatGPT legal approach, is the following:

    ## Taking screenshots

    Taking screenshots of 300+ pages is no fun, even on a Mac. I needed to press:
    Taking screenshots of 300+ pages is no fun, even on a Mac. For each page, I needed to press:
    - CMD + SHIFT + 5 to open the screenshot tool.
    The first time, select the page area and specify the output folder <OUTPUT_FOLDER>. After that, the same area is preselected.
    - "ENTER" to save the page.
    - "Page Down" to move to the next page.


    AppleScript to the rescue:

    Press CMD+SPACE, type "Script Editor", and open it. Then:
    1. Create a new script.
    2. Paste the code below.
    3. Press RUN.
    4. Quickly click the browser window, before the initial 3-second delay ends, so that it receives keyboard focus for Page Down.

    ```applescript
    delay 3
    tell application "System Events"
    repeat 322 times

    -- Open screenshot tool
    key code 23 using {command down, shift down} -- Cmd+Shift+5
    delay 1

    -- Press Enter (take screenshot)
    key code 36
    delay 1

    -- Go to next page (Page Down)
    key code 121
    delay 1

    end repeat
    end tell
    ```
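Once the run finishes, a quick count of the saved files shows whether any page was skipped, for example because a keystroke went to the wrong window. The following is a minimal sketch, assuming the screenshots are saved as PNG files directly in one folder; `count_pngs` is a hypothetical helper, and 322 is the repeat count used in the script above.

```bash
# Hypothetical helper: count the PNG files directly inside a folder
# (subfolders are ignored).
count_pngs() {
  find "$1" -maxdepth 1 -type f -name '*.png' | wc -l | tr -d ' '
}

# Usage, with <OUTPUT_FOLDER> replaced by the real path:
#   [ "$(count_pngs <OUTPUT_FOLDER>)" -eq 322 ] || echo "page count mismatch"
```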

    **TIP:** If the online reader lets you display the page in landscape format, e.g. by rotating it 90 degrees counter-clockwise, the page fills more of the screen and the screenshots have a higher resolution.

    ## Tools you need (on Mac)
  5. erikvullings revised this gist May 4, 2026. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -30,7 +30,7 @@ brew install --cask calibre
    cd <OUTPUT_FOLDER>

    # Rename to page_000.png
    i=0; for f in $(ls -1v *.png); do mv "$f" "$(printf "page_%03d.png" $i)"; ((i++)); done
    i=0; for f in *.png(.n); do mv "$f" "$(printf "page_%03d.png" "$i")"; ((i++)); done

    # Prepare outputs
    mkdir -p rotated/cropped
  6. erikvullings revised this gist May 4, 2026. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -68,7 +68,7 @@ gs -sDEVICE=pdfwrite \
    ocred.pdf

    # To epub using calibre
    ebook-convert ocred.pdf book.epub \
    ebook-convert output.pdf book.epub \
    --enable-heuristics \
    --no-default-epub-cover
    ```
  7. erikvullings revised this gist May 4, 2026. 1 changed file with 10 additions and 2 deletions.
    12 changes: 10 additions & 2 deletions README.md
    @@ -20,7 +20,7 @@ Taking screenshots of 300+ pages is no fun, even on a Mac. I needed to press:
    ## Tools you need (on Mac)

    ```bash
    brew install imagemagick ocrmypdf
    brew install imagemagick ocrmypdf ghostscript
    brew install --cask calibre
    ```

    @@ -54,11 +54,19 @@ mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
    cd cropped

    # Concatenate
    magick cropped/*.png input.pdf
    magick *.png input.pdf

    # Add searchable text using ocrmypdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

    # Compress
    gs -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.4 \
    -dPDFSETTINGS=/ebook \
    -dNOPAUSE -dQUIET -dBATCH \
    -sOutputFile=output.pdf \
    ocred.pdf

    # To epub using calibre
    ebook-convert ocred.pdf book.epub \
    --enable-heuristics \
  8. erikvullings created this gist May 4, 2026.
    66 changes: 66 additions & 0 deletions README.md
    @@ -0,0 +1,66 @@
    # Converting a DRM-protected ebook to PDF and EPUB for offline reading

    I recently bought a DRM-protected ebook. Because it is protected, it cannot be downloaded, so I always have to read it in a browser. Since I prefer an e-reader, I looked for legal ways to convert my book to a PDF or EPUB.

    A simple approach, and according to ChatGPT a legal one, is the following:

    1. Create screenshots of the book pages in the online reader
    2. Bundle the screenshots into an EPUB or PDF

    ## Taking screenshots

    Taking screenshots of 300+ pages is no fun, even on a Mac. I needed to press:
    - CMD + SHIFT + 5 to open the screenshot tool.
    The first time, select the page area and specify the output folder <OUTPUT_FOLDER>. After that, the same area is preselected.
    - "ENTER" to save the page.
    - "Page Down" to move to the next page.

    **TIP:** If the online reader lets you display the page in landscape format, e.g. by rotating it 90 degrees counter-clockwise, the page fills more of the screen and the screenshots have a higher resolution.

    ## Tools you need (on Mac)

    ```bash
    brew install imagemagick ocrmypdf
    brew install --cask calibre
    ```
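Before starting the post-processing, it can help to confirm that the command-line tools are actually on the `PATH`. The following is a minimal sketch and not part of the original gist: `missing_tools` is a hypothetical helper, and the binary names are the commands used in the post-processing steps. Whether the Calibre cask exposes `ebook-convert` on the `PATH` may depend on your setup.

```bash
# Hypothetical helper (not from the original gist): print every
# command name that cannot be found on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: list whichever of the tools used in this guide are not installed.
missing_tools magick mogrify ocrmypdf ebook-convert
```

An empty result means all four commands were found.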

    ## Post-processing

    ```bash
    cd <OUTPUT_FOLDER>

    # Rename to page_000.png
    i=0; for f in $(ls -1v *.png); do mv "$f" "$(printf "page_%03d.png" $i)"; ((i++)); done

    # Prepare outputs
    mkdir -p rotated/cropped

    # Rotate
    mogrify -path rotated/ \
    -rotate 90 \
    -density 300 \
    -format png \
    *.png

    cd rotated

    ## Create regular PDF
    magick *.png input.pdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

    # Crop
    mogrify -path cropped -crop 1158x1851+95+115 +repage *.png

    cd cropped

    # Concatenate
    magick cropped/*.png input.pdf

    # Add searchable text using ocrmypdf
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

    # To epub using calibre
    ebook-convert ocred.pdf book.epub \
    --enable-heuristics \
    --no-default-epub-cover
    ```
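The rename loop with `$(ls -1v *.png)` splits file names on whitespace, which is a problem for screenshot names that contain spaces. The following is a minimal alternative sketch in plain bash, assuming a `sort` that supports `-V` (version sort) and file names without newlines; `rename_pages` is a hypothetical helper name, not something from the gist.

```bash
# Hypothetical helper: rename every PNG in a directory to page_NNN.png
# in natural (version) sort order. Assumes `sort -V` is available and
# that no file name contains a newline.
rename_pages() {
  local dir="$1" i=0 f
  (
    cd "$dir" || exit 1
    printf '%s\n' *.png | sort -V | while IFS= read -r f; do
      [ -e "$f" ] || continue          # skip the unexpanded glob if no PNGs exist
      mv -- "$f" "$(printf 'page_%03d.png' "$i")"
      i=$((i + 1))
    done
  )
}

# Usage, with <OUTPUT_FOLDER> replaced by the real path:
#   rename_pages <OUTPUT_FOLDER>
```

Reading the names line by line keeps each file name intact, and version sort places a name ending in `2` before one ending in `10`. The sketch does not guard against a source file that is already called `page_NNN.png`, so it is best run on a folder of freshly captured screenshots.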