- Create a text file listing the links to be downloaded, one per line. Name it something like list.txt.
- Now download the entire site data using the command
wget -E -H -k -p -i list.txt
- Now all the data should be available offline.
- Go to the folder containing the HTML files and issue the command
pandoc filename-*.html -o filename.epub
- Alternatively, create an 'index.html' file and include all the links in the body of the file, like
<a href="https://venmurasu.in/mazhaippadal/chapter-5">1</a>
- Now use calibre to convert index.html to EPUB with
ebook-convert index.html book.epub --title Title --authors Author --cover cover.png
Sample index.html file
<!doctype html>
<html>
<head>
<title>This is the title of the webpage!</title>
</head>
<body>
<a href="chapter-1.html">1</a>
<a href="chapter-2.html">2</a>
<a href="chapter-3.html">3</a>
<a href="chapter-4.html">4</a>
</body>
</html>
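Writing the index by hand gets tedious for long books. The sketch below generates an index.html of the same shape from the downloaded files. It assumes the files are named chapter-N.html as in the sample; `make_index` is a hypothetical helper name and "Title" is a placeholder.

```shell
# make_index TITLE: write index.html linking to every chapter-N.html
# in the current directory, in numeric chapter order.
make_index() {
  {
    printf '<!doctype html>\n<html>\n<head>\n<title>%s</title>\n</head>\n<body>\n' "$1"
    for f in chapter-*.html; do
      # Skip the unexpanded pattern when no chapter files exist
      if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done | sort -t- -k2,2n | while read -r f; do
      # Strip the prefix and extension to get the chapter number
      n=${f#chapter-}
      n=${n%.html}
      printf '<a href="%s">%s</a>\n' "$f" "$n"
    done
    printf '</body>\n</html>\n'
  } > index.html
}

make_index "Title"
```

The numeric sort keeps chapter-10.html after chapter-2.html, which plain alphabetical ordering would not.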
Note on pandoc link list file
- If the links end with a number in ascending order, copy the first link into LibreOffice Calc and drag the fill handle down to the last number to generate all the links.
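When the links differ only in a trailing number, the list can also be generated from the shell instead of a spreadsheet. This is a minimal sketch that assumes the chapter URL pattern from the example link above and a hypothetical last chapter number of 5. Adjust both to the actual site.

```shell
# Assumed URL prefix (taken from the example link) and a hypothetical chapter count
base="https://venmurasu.in/mazhaippadal/chapter-"
last=5

# Write one link per line, numbered 1..last, into list.txt
i=1
while [ "$i" -le "$last" ]; do
  printf '%s%s\n' "$base" "$i"
  i=$((i + 1))
done > list.txt
```

The resulting list.txt can be passed to wget with -i as described above.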
To Delete Unwanted Strings in the Scraped Text
Enter the strings to remove in a file (e.g. delete.txt), one string per line, as below:
:::
venmurasu.jpg
.items-center
.p-1
.text
.feather
.ml-3
data:image
# [[]{.icon
.items
.border
.transition
Home
.icon
.mb-4
Now use the grep command to filter the strings. It removes every line that contains any of them.
grep -vf delete.txt filename.md > tmpfile && mv tmpfile newname.md
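The effect of the filter is easiest to see on a small sample. The sketch below uses hypothetical file names and contents (sample.md, sample-delete.txt). It also adds -F, which is not part of the command above: with -F, grep treats each line of the pattern file as a literal string, so characters such as `.` and `[` are not interpreted as regular expression syntax.

```shell
# Hypothetical scraped text: two lines of content and two lines of markup debris
printf '%s\n' 'Chapter text' '::: {.items-center}' 'data:image/png;base64,AAAA' 'More text' > sample.md

# Hypothetical pattern file, one literal string per line
printf '%s\n' ':::' 'data:image' > sample-delete.txt

# -v inverts the match, -F reads patterns as fixed strings, -f takes them from a file
grep -vFf sample-delete.txt sample.md > sample-clean.md
cat sample-clean.md
```

Only the two content lines remain in sample-clean.md. Because whole lines are dropped, a short pattern such as Home also removes body text that happens to contain that word.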
- Install the EpubPress extension in the browser (tested on Chrome).
- Open the webpages (to be converted into EPUB) in different tabs.
- Click the extension, select the pages, and process them.
Limitations of EpubPress
- Books are limited to containing 50 articles.
- Books must be 10 MB or less for email delivery to work.
- Images in an article must be 1 MB or less. Images that exceed this limit will be removed.
- No more than 30 images will be downloaded.