@cigzigwon
Created September 16, 2022 15:45
Revisions

  1. cigzigwon created this gist Sep 16, 2022.
    24 changes: 24 additions & 0 deletions config.exs
    @@ -0,0 +1,24 @@
    config :revo,
           :user_agent,
           "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"

    config :revo, :wait_intervals, [30_000, 45_000, 60_000, 65_000, 76_000]

    config :crawly,
      concurrent_requests_per_domain: 1,
      closespider_timeout: 1,
      manager_operations_timeout: 5 * 60_000,
      middlewares: [
        Crawly.Middlewares.DomainFilter,
        Crawly.Middlewares.UniqueRequest,
        {Crawly.Middlewares.UserAgent,
         user_agents: [
           # "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
           "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
         ]},
        {Crawly.Middlewares.RequestOptions, [timeout: 16_000]}
      ],
      pipelines: [
        Crawly.Pipelines.JSONEncoder,
        {Crawly.Pipelines.WriteToFile, folder: "/app/priv/data", extension: "jl"}
      ]
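
The gist does not show how the `:revo` application consumes `:wait_intervals`. One plausible reading is that the values are millisecond delays from which the crawler picks one at random between requests. The sketch below illustrates that reading only; the module `Revo.Wait`, the function `next_interval/0`, and the fallback default are hypothetical and do not appear in the gist.

```elixir
defmodule Revo.Wait do
  # Hypothetical helper, not part of the gist: assumes :wait_intervals
  # is a list of delays in milliseconds.

  # Assumed fallback used when the key is not configured.
  @default_intervals [30_000]

  # Returns one of the configured intervals, chosen at random.
  def next_interval do
    :revo
    |> Application.get_env(:wait_intervals, @default_intervals)
    |> Enum.random()
  end
end
```

A caller could then pause with `Process.sleep(Revo.Wait.next_interval())` before issuing the next request. Reading the value at call time, rather than into a module attribute, means a change to the application environment takes effect without recompiling.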