Last active May 4, 2026 19:25
Gist: erikvullings/eedb9e4364f52429dbbaf17a17ff5a85
Revisions
erikvullings revised this gist
May 4, 2026. 1 changed file with 10 additions and 2 deletions.
```
@@ -74,7 +74,11 @@ mogrify -path rotated/ \
cd rotated

# Create regular PDF
magick *.png \
  -units PixelsPerInch \
  -density 300 \
  -resize 1158x1851! \
  input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
```

```
@@ -93,7 +97,11 @@ mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
cd cropped

# Create regular PDF
magick *.png \
  -units PixelsPerInch \
  -density 300 \
  -resize 1158x1851! \
  input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
```
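With `-units PixelsPerInch -density 300`, each page is tagged as 300 pixels per inch, so a 1158×1851 px image should end up as a 1158/300 × 1851/300 inch PDF page, and a PDF point is 1/72 inch. A minimal sketch of that arithmetic; `pdf_points` is a hypothetical helper, not part of the gist:

```bash
# Hypothetical helper: convert a length in pixels at a given density
# (pixels per inch) into PDF points (1 inch = 72 points).
# Usage: pdf_points PIXELS DPI
pdf_points() {
  awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px * 72 / dpi }'
}

# Page size of the 1158x1851 pages at 300 DPI
echo "width:  $(pdf_points 1158 300) pt"
echo "height: $(pdf_points 1851 300) pt"
```

For these values the page comes out at 277.92 × 444.24 pt, i.e. 3.86 × 6.17 inches.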
erikvullings revised this gist
May 4, 2026. 1 changed file with 9 additions and 0 deletions.
````
@@ -111,3 +111,12 @@ ebook-convert output.pdf book.epub \
  --enable-heuristics \
  --no-default-epub-cover
```

Alternatively, use `docling` to convert your pdf to markdown first:

```bash
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8 \
PYTORCH_MPS_LOW_WATERMARK_RATIO=0.5 \
docling --from pdf --to md --image-export-mode embedded ocred.pdf
```
````
erikvullings revised this gist
May 4, 2026. 1 changed file with 13 additions and 3 deletions.
```
@@ -73,16 +73,26 @@ mogrify -path rotated/ \
cd rotated

# Create regular PDF
magick *.png input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

# Compress
gs -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dQUIET -dBATCH \
  -sOutputFile=output.pdf \
  ocred.pdf

# Do the same for the cropped version
mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
cd cropped

# Create regular PDF
magick *.png input.pdf

# Add searchable text using ocrmypdf
```
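The same magick, ocrmypdf and gs sequence runs twice here, once in `rotated/` and once in `rotated/cropped/`. A sketch that factors it into one function; `build_pdf` is a hypothetical name, the `magick` options are the `-units PixelsPerInch -density 300 -resize 1158x1851!` ones from the later revision, and ImageMagick 7, ocrmypdf and Ghostscript are assumed to be installed:

```bash
# Hypothetical helper (not part of the gist): run the
# PNG -> PDF -> OCR -> compressed PDF steps inside one folder.
# Usage: build_pdf DIR   (reads DIR/*.png, writes DIR/output.pdf)
build_pdf() (
  # The ( ) body runs in a subshell, so the cd does not change the
  # caller's directory; && stops at the first step that fails.
  cd "$1" &&
    magick *.png \
      -units PixelsPerInch \
      -density 300 \
      -resize 1158x1851! \
      input.pdf &&
    ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf &&
    gs -sDEVICE=pdfwrite \
      -dCompatibilityLevel=1.4 \
      -dPDFSETTINGS=/ebook \
      -dNOPAUSE -dQUIET -dBATCH \
      -sOutputFile=output.pdf \
      ocred.pdf
)
```

Run from `<OUTPUT_FOLDER>` after the rotate and crop `mogrify` steps, e.g. `build_pdf rotated && build_pdf rotated/cropped`. Chaining with `&&` rather than `set -e` keeps the stop-on-failure behaviour even when the function is itself called inside an `if` or `||`.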
erikvullings revised this gist
May 4, 2026. 1 changed file with 31 additions and 2 deletions.
````
@@ -9,12 +9,41 @@ A simple approach and, according to ChatGPT legal approach, is the following:

## Taking screenshots

Taking screenshots of 300+ pages is no fun, even on a Mac. I needed to press for each page:

- CMD + SHIFT + 5: It opens the screenshot tool. The first time, select the page area and specify the output folder <OUTPUT_FOLDER>. On subsequent times, the same area is selected.
- Press "ENTER" to save the page
- Press "Page Down" to move to the next page.

AppleScript to the rescue: Press CMD+SPACE and type "Script Editor".

1. Create a new script
2. Paste below code
3. Press RUN
4. Quickly click the browser (so it receives keyboard focus for Page Down)

```bash
delay 3
tell application "System Events"
  repeat 322 times
    -- Open screenshot tool
    key code 23 using {command down, shift down} -- Cmd+Shift+5
    delay 1
    -- Press Enter (take screenshot)
    key code 36
    delay 1
    -- Go to next page (Page Down)
    key code 121
    delay 1
  end repeat
end tell
```

**TIP:** If you have the option to display the page in landscape format, e.g. rotate page 90 degrees counter-clockwise, the image resolution is better.

## Tools you need (on Mac)
````
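The AppleScript sends keystrokes blindly with fixed one-second delays, so a slow page turn or lost keyboard focus can produce two screenshots of the same page. A sketch for spotting such repeats once the files are renamed to `page_NNN.png`; `find_repeats` is a hypothetical helper, and the pixel hash is assumed to come from ImageMagick's `%#` format escape via `magick identify`:

```bash
# Hypothetical helper: read "SIGNATURE FILE" lines in page order and
# print every FILE whose signature equals that of the previous page,
# i.e. a page turn that probably did not happen.
find_repeats() {
  awk 'NR > 1 && $1 == prev { print $2 } { prev = $1 }'
}

# Example, inside <OUTPUT_FOLDER> after renaming to page_NNN.png:
#   for f in page_*.png; do
#     printf '%s %s\n' "$(magick identify -format '%#' "$f")" "$f"
#   done | find_repeats
```

Hashing pixels rather than file bytes avoids false negatives if two PNG files differ only in embedded metadata.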
erikvullings revised this gist
May 4, 2026. 1 changed file with 1 addition and 1 deletion.
```
@@ -30,7 +30,7 @@ brew install --cask calibre
cd <OUTPUT_FOLDER>

# Rename to page_000.png
i=0; for f in *.png(.n); do mv "$f" "$(printf "page_%03d.png" "$i")"; ((i++)); done

# Prepare outputs
mkdir -p rotated/cropped
```
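`*.png(.n)` is a zsh glob qualifier (plain files, sorted numerically by name), so this loop needs zsh, the macOS default shell; bash rejects the parentheses. It also relies on the screenshot names sorting in capture order. If either assumption does not hold, one alternative is to order by modification time instead. A sketch; `rename_by_mtime` is a hypothetical helper, and it assumes the files' modification times reflect capture order and that no name contains a newline:

```bash
# Hypothetical helper: rename DIR/*.png to page_000.png, page_001.png, ...
# ordered by modification time, oldest first (i.e. capture order).
# POSIX sh syntax, so it also runs in bash and zsh.
# Run it once, on the raw screenshots only.
rename_by_mtime() {
  dir=$1
  i=0
  # ls -1tr prints one name per line, oldest first; the glob is
  # expanded inside the subshell, after the cd.
  (cd "$dir" && ls -1tr -- *.png) | while IFS= read -r f; do
    mv -- "$dir/$f" "$dir/$(printf 'page_%03d.png' "$i")"
    i=$((i + 1))
  done
}
```

Usage: `rename_by_mtime <OUTPUT_FOLDER>`. Reading names line by line with `IFS= read -r` keeps the spaces in screenshot names intact, which a `for f in $(ls ...)` loop would split apart.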
erikvullings revised this gist
May 4, 2026. 1 changed file with 1 addition and 1 deletion.
````
@@ -68,7 +68,7 @@ gs -sDEVICE=pdfwrite \
  ocred.pdf

# To epub using calibre
ebook-convert output.pdf book.epub \
  --enable-heuristics \
  --no-default-epub-cover
```
````
erikvullings revised this gist
May 4, 2026. 1 changed file with 10 additions and 2 deletions.
````
@@ -20,7 +20,7 @@ Taking screenshots of 300+ pages is no fun, even on a Mac. I needed to press:

## Tools you need (on Mac)

```bash
brew install imagemagick ocrmypdf ghostscript
brew install --cask calibre
```
````

```
@@ -54,11 +54,19 @@ mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
cd cropped

# Concatenate
magick *.png input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

# Compress
gs -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dQUIET -dBATCH \
  -sOutputFile=output.pdf \
  ocred.pdf

# To epub using calibre
ebook-convert ocred.pdf book.epub \
  --enable-heuristics \
```
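Whether the Ghostscript step is worth it depends on how much it actually shrinks `ocred.pdf`; if the result is still too large or looks too blurry, the other `-dPDFSETTINGS` presets (`/screen` for smaller files, `/printer` or `/prepress` for higher quality) are the usual knobs. A small sketch for comparing sizes; `size_report` is a hypothetical helper using only `wc` and `awk`:

```bash
# Hypothetical helper: print the sizes of two files and how much smaller
# the second is, e.g. size_report ocred.pdf output.pdf
# (a negative percentage means the output grew).
size_report() {
  before=$(wc -c < "$1") || return
  after=$(wc -c < "$2") || return
  awk -v b="$before" -v a="$after" -v in_f="$1" -v out_f="$2" 'BEGIN {
    if (b + 0 == 0) { print in_f ": empty file"; exit 1 }
    printf "%s: %d bytes -> %s: %d bytes (%.1f%% smaller)\n",
      in_f, b, out_f, a, (b - a) * 100 / b
  }'
}
```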
erikvullings created this gist
May 4, 2026.
# Converting a DRM-protected digital ebook to PDF and epub for offline reading

I recently bought a DRM-protected digital ebook. Because it is protected, I am not allowed to download it, so I always have to read it in a browser. Since I prefer an e-reader, I looked for legal ways to convert my book to PDF or epub.

A simple and, according to ChatGPT, legal approach is the following:

1. Create screenshots of the book pages in the online reader
2. Bundle the screenshots into an epub or PDF

## Taking screenshots

Taking screenshots of 300+ pages is no fun, even on a Mac. For each page, I needed to:

- Press CMD + SHIFT + 5 to open the screenshot tool. The first time, select the page area and specify the output folder `<OUTPUT_FOLDER>`; on subsequent times, the same area is preselected.
- Press "ENTER" to save the page.
- Press "Page Down" to move to the next page.

**TIP:** If the online reader can display the page in landscape format, e.g. rotated 90 degrees counter-clockwise, the screenshots have a higher resolution.

## Tools you need (on Mac)

```bash
brew install imagemagick ocrmypdf
brew install --cask calibre
```

## Post-processing

```bash
cd <OUTPUT_FOLDER>

# Rename to page_000.png, page_001.png, ... (zsh syntax: *.png(.n) selects
# plain files sorted numerically by name; quoting keeps names with spaces intact)
i=0; for f in *.png(.n); do mv "$f" "$(printf "page_%03d.png" "$i")"; ((i++)); done

# Prepare outputs
mkdir -p rotated/cropped

# Rotate 90 degrees clockwise (undoes the counter-clockwise landscape display)
mogrify -path rotated/ \
  -rotate 90 \
  -density 300 \
  -format png \
  *.png
cd rotated

# Create regular PDF
magick *.png input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

# Crop to the page area and reset the canvas offset with +repage
mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
cd cropped

# Concatenate (the working directory is now rotated/cropped)
magick *.png input.pdf

# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf

# To epub using calibre
ebook-convert ocred.pdf book.epub \
  --enable-heuristics \
  --no-default-epub-cover
```
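The crop geometry `1158x1851+95+115` uses ImageMagick's WIDTHxHEIGHT+X+Y notation: a 1158×1851 px box whose top-left corner sits 95 px from the left and 115 px from the top of each rotated screenshot. These numbers depend on your screen and reader layout, so a quick check that the box fits inside the images may help before cropping 300+ pages. A sketch; `crop_fits` is a hypothetical helper, it only handles non-negative `+X+Y` offsets, and the image size is assumed to come from `magick identify -format '%w %h'`:

```bash
# Hypothetical helper: check that a crop geometry WxH+X+Y lies inside
# an image of IMG_W x IMG_H pixels. Prints "ok" or "out of bounds".
# Usage: crop_fits IMG_W IMG_H GEOMETRY
crop_fits() {
  # Split "1158x1851+95+115" on 'x' and '+' into W, H, X, Y.
  printf '%s\n' "$3" | awk -F'[x+]' -v iw="$1" -v ih="$2" '{
    w = $1; h = $2; x = $3; y = $4
    if (x + w <= iw && y + h <= ih) print "ok"; else print "out of bounds"
  }'
}

# Example, inside rotated/ (uses the size of the first page):
#   crop_fits $(magick identify -format '%w %h' page_000.png) 1158x1851+95+115
```

For this geometry the rotated screenshots need to be at least 1253 × 1966 px (1158 + 95 by 1851 + 115).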