I recently bought a DRM-protected digital ebook. Because of the DRM, I am not allowed to download it, so I can only read it in a browser. Since I prefer reading on an e-reader, I looked for legal ways to convert the book to PDF or EPUB.
A simple approach, and according to ChatGPT a legal one, is the following:
- Take screenshots of the book pages in the online reader
- Bundle the screenshots into a PDF or EPUB
Taking screenshots of 300+ pages is no fun, even on a Mac. For every page, I had to:
- Press CMD + SHIFT + 5 to open the screenshot tool. The first time, select the page area and set the output folder to <OUTPUT_FOLDER>; on later captures, the same area stays selected.
- Press "ENTER" to save the page
- Press "Page Down" to move to the next page.
AppleScript to the rescue:
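- Press CMD + SPACE, type "Script Editor" and open it
- Create a new script
- Paste the code below
- Press Run (if macOS asks, allow Script Editor to control your computer in the Accessibility section of the Privacy & Security settings)
- Within 3 seconds, click into the browser window so that it has the keyboard focus for Page Down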
delay 3 -- time to click into the browser window
tell application "System Events"
    repeat 322 times -- set this to the number of pages in your book
        -- Open the screenshot tool: Cmd+Shift+5 (key code 23 is the "5" key)
        key code 23 using {command down, shift down}
        delay 1
        -- Press Enter to take the screenshot
        key code 36
        delay 1
        -- Go to the next page (Page Down)
        key code 121
        delay 1
    end repeat
end tell
TIP: If the online reader can display the page in landscape orientation, e.g. rotated 90 degrees counter-clockwise, the page uses more of the screen and the screenshots get a higher resolution.
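With hundreds of pages and one-second delays, a page can occasionally fail to turn in time, which leaves two identical screenshots and a missing page. A minimal sketch to check for this before post-processing, assuming it runs in zsh or bash inside <OUTPUT_FOLDER>, that the folder holds only the raw screenshots, and that two captures of an unchanged page are byte-identical. If your screenshots embed per-file metadata, `cksum` will not catch them; a pixel hash such as ImageMagick's `magick identify -format '%#'` would be the alternative. The function name `check_screenshots` is made up for this example:

```shell
# Hypothetical helper: list the screenshots oldest-first (capture order) and
# report every file whose content equals the previous one, which usually
# means the page did not turn in time. Assumes no newlines in file names.
check_screenshots() {
    local expected="$1" count=0 prev_sum="" sum f
    ls -1tr -- *.png | {
        while IFS= read -r f; do
            count=$((count + 1))
            # cksum prints "<CRC> <byte count>" when reading from stdin
            sum=$(cksum < "$f")
            if [ "$sum" = "$prev_sum" ]; then
                echo "duplicate: $f"
            fi
            prev_sum=$sum
        done
        echo "found $count of $expected screenshots"
    }
}
```

For the book above, `check_screenshots 322` would list any suspicious files and the total count; a reported duplicate marks the spot where a page has to be re-captured.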
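Next, install the tools and process the screenshots in a terminal. The commands assume zsh, the default shell on macOS:
brew install imagemagick ocrmypdf ghostscript
brew install --cask calibre
cd <OUTPUT_FOLDER>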
# Rename to page_000.png, page_001.png, ... ((.n) is a zsh glob qualifier: plain files, sorted numerically)
i=0; for f in *.png(.n); do mv "$f" "$(printf "page_%03d.png" "$i")"; ((i++)); done
# Prepare output folders (creates rotated/ and rotated/cropped/)
mkdir -p rotated/cropped
# Rotate the pages back to portrait (-rotate 90 turns them clockwise)
mogrify -path rotated/ \
-rotate 90 \
-density 300 \
-format png \
*.png
cd rotated
# Create a regular PDF (the ! in -resize forces the exact size, ignoring the aspect ratio)
magick *.png \
-units PixelsPerInch \
-density 300 \
-resize 1158x1851! \
input.pdf
# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
# Compress
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
ocred.pdf
# Do the same for a cropped version
# (crop geometry is WIDTHxHEIGHT+X+Y in pixels; adjust it to your screenshots)
mogrify -path cropped -crop 1158x1851+95+115 +repage *.png
cd cropped
# Create a regular PDF
magick *.png \
-units PixelsPerInch \
-density 300 \
-resize 1158x1851! \
input.pdf
# Add searchable text using ocrmypdf
ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf
# Compress
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
ocred.pdf
# Convert to EPUB using calibre
ebook-convert output.pdf book.epub \
--enable-heuristics \
--no-default-epub-cover
Alternatively, use docling (installable with pip install docling) to convert the PDF to Markdown first:
# Cap PyTorch's memory use on the Apple Silicon GPU (MPS backend)
PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8 \
PYTORCH_MPS_LOW_WATERMARK_RATIO=0.5 \
docling --from pdf --to md --image-export-mode embedded ocred.pdf
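The rotated and the cropped variants in the pipeline above run exactly the same three commands: bundle, OCR, compress. As a sketch, they could be wrapped in one shell function. `png_dir_to_pdf` is a hypothetical name, the flags are the ones used above, and it assumes the pages are named page_*.png, as produced by the rename step:

```shell
# Hypothetical helper: run the bundle -> OCR -> compress steps from above
# on all page_*.png files in one folder, producing <folder>/output.pdf.
# Usage: png_dir_to_pdf <folder> <resize geometry, e.g. '1158x1851!'>
png_dir_to_pdf() {
    local dir="$1" geometry="$2"
    (
        # Subshell: the cd does not affect the caller's working directory
        cd "$dir" || exit 1
        # Bundle the pages; zero-padded names make the glob expand in page order
        magick page_*.png -units PixelsPerInch -density 300 \
            -resize "$geometry" input.pdf || exit 1
        # Add a searchable text layer
        ocrmypdf --optimize 3 --rotate-pages input.pdf ocred.pdf || exit 1
        # Compress with Ghostscript's /ebook preset
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
            -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf ocred.pdf
    )
}
```

For example, `png_dir_to_pdf rotated '1158x1851!'` would build rotated/output.pdf, and after the `mogrify -crop` step, `png_dir_to_pdf rotated/cropped '1158x1851!'` would build the cropped one. Stopping at the first failing step avoids compressing a stale ocred.pdf from an earlier run.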