The following gist is an extract from the article "Building a simple crawler". It crawls outward from a starting URL for a given number of bounces.
from crawler import Crawler
crawler = Crawler()
crawler.crawl('http://techcrunch.com/')
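The `crawler` module itself is not shown in this extract, so the following is only a minimal sketch of how such a crawl could work, assuming a "bounce" means one link hop away from the starting page. The names `SketchCrawler`, `LinkParser`, `fetch_html`, and the `depth` and `fetch` parameters are hypothetical and are not taken from the article; only the Python standard library is used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class SketchCrawler:
    """Hypothetical stand-in; not the Crawler class from the article."""

    def __init__(self, fetch=fetch_html):
        # The fetcher is injectable so the crawl logic can run offline.
        self.fetch = fetch

    def crawl(self, start_url, depth=1):
        """Breadth-first crawl; returns {url: [outgoing links]}."""
        seen = {start_url}
        queue = deque([(start_url, 0)])
        graph = {}
        while queue:
            url, hops = queue.popleft()
            try:
                html = self.fetch(url)
            except Exception:
                continue  # skip pages that fail to load
            parser = LinkParser()
            parser.feed(html)
            links = []
            for href in parser.links:
                # Resolve relative links and drop any #fragment.
                absolute = urldefrag(urljoin(url, href)).url
                if absolute.startswith(("http://", "https://")):
                    links.append(absolute)
            graph[url] = links
            # Only follow links while the hop budget is not used up.
            if hops < depth:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        queue.append((link, hops + 1))
        return graph
```

With this sketch, `SketchCrawler().crawl('http://techcrunch.com/', depth=1)` would fetch the start page plus every page it links to directly, and return a dictionary mapping each fetched URL to the links found on it. The `seen` set keeps the crawl from fetching the same URL twice when pages link back to each other.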