Inco1122 (vucong2409), Vietnam
boto3==1.34.90
botocore==1.34.90
click==8.1.7
jmespath==1.0.1
python-dateutil==2.9.0.post0
s3transfer==0.10.1
six==1.16.0
urllib3==2.2.1
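The pinned list above can be checked against the active environment. This is a minimal sketch. It assumes the pins are saved as a plain name==version file with one pin per line, and it uses only the standard-library importlib.metadata module. The helper names parse_pins and find_mismatches are hypothetical and are not part of the gist.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Parse 'name==version' lines into a {name: version} dict.

    Lines without an exact '==' pin (blank lines, comments, version ranges)
    are skipped; trailing '# ...' comments are stripped first.
    """
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every pin that differs.

    Versions are compared as exact strings (no PEP 440 normalization);
    a package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

To use it, read the file into a string and pass parse_pins(text) to find_mismatches. The result is an empty dict when every pinned package is installed at exactly the pinned version.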
#!/usr/bin/env python3
import click
import boto3
from click import ClickException
KEY_VALUE_LINE_PATTERN = "{}={}\n"
@click.command()
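The script imports click and boto3 and defines KEY_VALUE_LINE_PATTERN. However, the preview stops at the @click.command() decorator, so it does not show what the command fetches from AWS. The sketch below only illustrates how a pattern like this renders a mapping as KEY=VALUE lines, which is the layout of a .env-style file. The function names are hypothetical, and the boto3 call that would produce the mapping is deliberately left out.

```python
KEY_VALUE_LINE_PATTERN = "{}={}\n"


def format_key_value_lines(pairs):
    """Render a mapping as KEY=VALUE lines, one per entry, in insertion order."""
    return "".join(
        KEY_VALUE_LINE_PATTERN.format(key, value) for key, value in pairs.items()
    )


def write_env_file(pairs, path):
    """Write the rendered lines to path (hypothetical helper), overwriting it."""
    with open(path, "w", encoding="utf-8") as env_file:
        env_file.write(format_key_value_lines(pairs))
```

The pattern does no quoting or escaping. A value that contains a newline would therefore spill onto the next line, so such values would need to be quoted or rejected first.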
vucong2409 / crawler.py
Created May 10, 2022 04:34
CNBC Crawler
import requests
import json
batchSizeParam = 100
# endIndexParam = 47142
endIndexParam = 47142
index_number = 0
apiBasePathParam = "{API_PATH}&endIndex="
batchSizeParamStr = "&batchsize="
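The crawler's constants suggest that it pages through an API in fixed-size batches. It appends an endIndex offset and a batchsize to a base URL, whose real value is hidden behind the {API_PATH} placeholder. The sketch below shows that URL construction. It assumes endIndex is an offset that starts at 0 and advances by the batch size until it reaches endIndexParam. The name build_batch_urls is hypothetical.

```python
def build_batch_urls(api_base_path, batch_size_param, end_index, batch_size):
    """Yield one request URL per batch, stepping the offset by batch_size.

    Assumption: endIndex acts as a 0-based pagination offset and the API
    accepts a batchsize query parameter, as the constant names suggest.
    """
    for index in range(0, end_index, batch_size):
        yield f"{api_base_path}{index}{batch_size_param}{batch_size}"
```

Each URL could then be fetched with requests.get(url) and decoded with .json(). With the gist's values (47142 and 100), this yields 472 request URLs.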
vucong2409 / spark_word_count_2.py
Last active April 2, 2022 20:27
Spark WordCount2 with pattern file
from operator import add
from pyspark import SparkConf, SparkContext
APP_NAME = "WordCount2"
SPARK_MASTER_ADDRESS = ""
HADOOP_ADDRESS = ""
need_to_delete_char = []
with open("pattern.txt", "r", encoding='utf-8') as pattern_file:
    need_to_delete_char = pattern_file.read() \
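The WordCount2 job reads a list of characters to delete from pattern.txt before counting words. The preview ends mid-expression, so it does not show exactly how the pattern file is parsed. The plain-Python sketch below mirrors what a Spark word count of this shape typically does, so it can be tried without a cluster:

1. Strip the unwanted characters.
2. Split on whitespace.
3. Combine (word, 1) pairs with operator.add, as reduceByKey would.

It assumes chars_to_delete is already a list of strings. The name word_count is hypothetical, and counting is case-sensitive.

```python
from operator import add


def word_count(lines, chars_to_delete):
    """Count whitespace-separated words after removing unwanted substrings.

    Local stand-in for a Spark pipeline such as:
        sc.textFile(path).map(clean).flatMap(str.split) \
          .map(lambda w: (w, 1)).reduceByKey(add)
    """
    counts = {}
    for line in lines:
        # Remove every occurrence of each unwanted character/token.
        for token in chars_to_delete:
            line = line.replace(token, "")
        for word in line.split():
            # reduceByKey(add): fold the new 1 into the running count.
            counts[word] = add(counts.get(word, 0), 1)
    return counts
```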
vucong2409 / my_little_spider.py
Last active March 23, 2022 08:14
reviewedu_crawler
import scrapy
URL_TEMPLATE = 'https://reviewedu.net/school/page/'
URL_FIRST_PAGE = 'https://reviewedu.net/school/'
INFO_BOX_XPATH = '//div[contains(@class,"box-text-products")]'
UNI_NAME_XPATH = 'div[contains(@class,"title-wrapper")]/p/a/text()'
RATING_XPATH = 'div[contains(@class,"rating-group")]/div[contains(@class,"rating-score__number")]/text()'
NUMBER_OF_STUDENT_XPATH = 'div[contains(@class,"rating-score__text-6")]/text()'
COURSE_LENGTH_XPATH = 'div[contains(@class,"hide-sm-less")]/div[contains(@class,"rating-score__text-5")]/text()'
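The spider's constants imply two things. First, listing pages start at URL_FIRST_PAGE for page 1 and continue with URL_TEMPLATE plus a page number. Second, the field XPaths have no leading slash, so they are meant to be evaluated relative to each INFO_BOX_XPATH match. The sketch below covers only the pagination. It assumes that numbering scheme (a common "/page/<n>" pattern, not confirmed by the gist) and a known last page. The helper page_urls is hypothetical.

```python
URL_TEMPLATE = 'https://reviewedu.net/school/page/'
URL_FIRST_PAGE = 'https://reviewedu.net/school/'


def page_urls(last_page):
    """Return listing URLs for pages 1..last_page.

    Assumption: page 1 is URL_FIRST_PAGE and page n >= 2 is URL_TEMPLATE + n.
    """
    if last_page < 1:
        return []
    urls = [URL_FIRST_PAGE]
    urls.extend(f"{URL_TEMPLATE}{page}" for page in range(2, last_page + 1))
    return urls
```

Inside a Scrapy Spider, such a list could serve as start_urls. The parse() method would then loop over response.xpath(INFO_BOX_XPATH) and call .xpath(UNI_NAME_XPATH).get(), and similar, on each box.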