@vucong2409
Created May 10, 2022 04:34
CNBC Crawler
import requests

batchSizeParam = 100
endIndexParam = 47142
index_number = 0
# "{API_PATH}" is a placeholder; substitute the real endpoint URL before running.
apiBasePathParam = "{API_PATH}&endIndex="
batchSizeParamStr = "&batchsize="

# Write a pipe-delimited file with one row per news item.
with open("result.csv", "w", encoding="utf-8") as output_file:
    output_file.write("No|Title|Summary|Time\n")
    # Walk backwards from the last index, one batch per request.
    while endIndexParam > 0:
        endIndexParam = endIndexParam - batchSizeParam
        res = requests.get(apiBasePathParam + str(endIndexParam) + batchSizeParamStr + str(batchSizeParam))
        res.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        list_news = res.json()["results"]
        for new in list_news:
            index_number = index_number + 1
            output_file.write(str(index_number) + "|" + new["cn:title"] + "|" + new["summary"] + "|" + new["dateModified"] + "\n")
        print(index_number)  # progress: rows written so far
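
The script above joins fields with "|" by hand, so a title or summary that itself contains "|" or a newline would break the columns. A minimal sketch of one way around that, assuming the standard-library `csv` module is acceptable here: `write_rows` is a hypothetical helper name, the sample item is invented for illustration, and the `.get()` defaults are an assumption about how missing fields should be handled.

```python
import csv
import io


def write_rows(items, out, start=0):
    """Write news items as pipe-delimited rows and return the last row number."""
    # csv.writer quotes any field that contains the delimiter, a quote, or a newline.
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    number = start
    for item in items:
        number += 1
        # .get() with a default avoids a KeyError when a field is missing (assumption).
        writer.writerow([
            number,
            item.get("cn:title", ""),
            item.get("summary", ""),
            item.get("dateModified", ""),
        ])
    return number


# Hypothetical sample item, invented for illustration only.
sample = [{"cn:title": "Stocks | Bonds", "summary": "A summary", "dateModified": "2022-05-10"}]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

In the crawler, `out` would be the open `result.csv` file object and `start` the running `index_number`, so numbering continues across batches.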