prvn-pa/web2epub.md

WEB PAGE TO EPUB CONVERSION

1. USING WGET AND PANDOC

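Create a text file listing the links to be downloaded, one URL per line, and name it something like list.txt.
Download the listed pages, together with the images and stylesheets they need, using the command `wget -E -H -k -p -i list.txt`. Here `-p` fetches page requisites, `-k` rewrites links for offline viewing, `-E` adds the .html extension, and `-H` allows requisites hosted on other domains.
All the pages should now be available offline.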
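Go to the folder containing the downloaded HTML files and run `pandoc filename-*.html -o filename.epub`. Pandoc takes input files as plain arguments. Its `-i` flag means `--incremental` (for slide shows), not input.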
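One caveat with the pandoc step is that the shell expands `filename-*.html` in alphabetical order. As a result, `filename-10.html` comes before `filename-2.html` and the chapters end up out of order. Below is a minimal sketch of a workaround. It assumes a `sort` that supports `-V` (version sort, as in GNU coreutils) and file names without whitespace. `list_chapters` is a hypothetical helper name.

```shell
# Hypothetical helper: print the files PREFIX*.html one per line in natural
# numeric order (1, 2, ..., 10) instead of the shell's alphabetical order.
# Assumes sort supports -V (GNU coreutils) and names contain no whitespace.
list_chapters() {
  printf '%s\n' "$1"*.html | sort -V
}

# Usage (unquoted on purpose so each name becomes a separate argument):
# pandoc $(list_chapters filename-) -o filename.epub
```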
Alternatively, create an index.html file that contains links to all the chapters in its body, for example `<a href="https://venmurasu.in/mazhaippadal/chapter-5">1</a>`.
Then use calibre's `ebook-convert` command to convert index.html to EPUB: `ebook-convert index.html book.epub --title Title --authors Author --cover cover.png`

Sample index.html file:

```html
<!doctype html>
<html>
  <head>
    <title>This is the title of the webpage!</title>
  </head>
  <body>
    <a href="chapter-1.html">1</a>
    <a href="chapter-2.html">2</a>
    <a href="chapter-3.html">3</a>
    <a href="chapter-4.html">4</a>
  </body>
</html>
```
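Writing the links by hand gets tedious for long books. Below is a minimal sketch that writes the same kind of index.html from a list of local chapter files, numbering the links in the order given. `make_index` is a hypothetical helper. It assumes file names that need no HTML escaping (no `"`, `<` or `&`).

```shell
# Hypothetical helper: write index.html with TITLE and one numbered link per
# FILE argument, in the order the files are given.
# Assumes the file names contain no characters that need HTML escaping.
make_index() {
  local title n f
  title=$1
  n=1
  shift
  {
    printf '<!doctype html>\n<html>\n  <head>\n'
    printf '    <title>%s</title>\n' "$title"
    printf '  </head>\n  <body>\n'
    for f in "$@"; do
      printf '    <a href="%s">%d</a>\n' "$f" "$n"
      n=$((n + 1))
    done
    printf '  </body>\n</html>\n'
  } > index.html
}
```

For example, `make_index "Title" $(printf '%s\n' chapter-*.html | sort -V)`. Then run the `ebook-convert` command on the resulting index.html.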

Note on the link list file

If the links end with numbers in ascending order, paste the first link into LibreOffice Calc and drag it down to the last number to generate all the links.
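The same list can also be generated on the command line. Below is a minimal sketch. It assumes `seq` is available (it is on most Linux and macOS systems) and that only the trailing number changes between links. `make_links` and the range 1 to 40 in the usage line are hypothetical.

```shell
# Hypothetical helper: print BASE immediately followed by each number from
# FIRST to LAST, one URL per line (BASE ".../chapter-" gives chapter-1, ...).
make_links() {
  seq "$2" "$3" | while read -r n; do
    printf '%s%s\n' "$1" "$n"
  done
}

# Usage (hypothetical range): make_links https://venmurasu.in/mazhaippadal/chapter- 1 40 > list.txt
```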

To Delete Unwanted Strings in the Scraped Text

Enter the strings to be removed in a file (e.g. delete.txt), one string per line, as below:

```
:::
venmurasu.jpg
.items-center
.p-1
.text
.feather
.ml-3
data:image
# [[]{.icon
.items
.border
.transition
Home
.icon
.mb-4
```

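Then use grep to drop every line that contains any of these strings: `grep -vFf delete.txt filename.md > tmpfile && mv tmpfile newname.md`. The `-F` option makes grep treat each line of delete.txt as a literal string. Without it, the lines are read as regular expressions in which `.` matches any character, so `.text` would also delete any line containing a word such as "context".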
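Below is a small wrapper around the same grep call. It is sketched under the assumption that delete.txt may contain stray blank lines. An empty pattern matches every line, so `grep -v` would then delete the entire file. `strip_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: copy INPUT to OUTPUT, dropping every line that contains
# any string listed in PATTERNS (one literal string per line).
strip_lines() {
  local pats
  pats=$(mktemp) || return 1
  # Skip blank lines: an empty pattern matches everything and would empty OUTPUT.
  grep -v '^$' "$1" > "$pats"
  # -F: treat patterns as literal strings, so "." and "[" are not regex syntax.
  grep -vFf "$pats" "$2" > "$3"
  rm -f "$pats"
}

# Usage: strip_lines delete.txt filename.md newname.md
```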

2. USING A CHROME EXTENSION

Install the EpubPress extension in the browser (tested on Chrome).
Open the web pages to be converted into EPUB in separate tabs.
Click the extension icon, select the pages to include, and start the conversion.

Limitations of EpubPress

Books are limited to 50 articles.
Books must be 10 MB or less for email delivery to work.
Images in an article must be 1 MB or less. Images that exceed this limit are removed.
No more than 30 images will be downloaded.