

import pandas as pd
import nltk
from tqdm.notebook import trange as tnrange  # tqdm.tnrange is deprecated in favour of tqdm.notebook.trange
import re
import gdelt
# Version 2 queries
gd2 = gdelt.gdelt(version=2)
# days
!pip install gdelt  # make sure gdelt is installed (Jupyter/IPython shell syntax)
import pandas as pd, numpy as np, matplotlib.pyplot as plt, gdelt, os, datetime, warnings  # imports
gd = gdelt.gdelt(version=1)  # instantiate an object to pull GDELT 1.0 files
os.makedirs("data", exist_ok=True)  # create the data folder if it doesn't already exist
cur_date = datetime.datetime(2019, 10, 7) - datetime.timedelta(days=60)  # start pulling 60 days before 10/7
while cur_date < datetime.datetime(2019, 10, 7):  # pull until 10/7
    if not os.path.exists("data/%s-%s-%s.pkl" % (cur_date.year, cur_date.month, cur_date.day)):  # only pull days not already saved
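The preview above is cut off inside the loop, so the download step itself isn't shown. Whatever that body does, the loop has to advance `cur_date` on every pass or it never terminates. The sketch below covers only the date iteration and the skip-if-already-cached check. It assumes a hypothetical helper, `missing_days`, which is not part of the gdelt package, and it reuses the snippet's un-padded `data/YYYY-M-D.pkl` naming.

```python
import datetime
import os
from typing import List


def missing_days(start: datetime.datetime, end: datetime.datetime,
                 data_dir: str = "data") -> List[datetime.datetime]:
    """Hypothetical helper: days in [start, end) with no cached pickle in data_dir."""
    missing = []
    cur = start
    while cur < end:
        # Same un-padded naming as the snippet, e.g. data/2019-8-8.pkl
        name = "%s-%s-%s.pkl" % (cur.year, cur.month, cur.day)
        if not os.path.exists(os.path.join(data_dir, name)):
            missing.append(cur)  # this day still needs to be pulled
        cur += datetime.timedelta(days=1)  # advance one day, or the loop never ends
    return missing


end = datetime.datetime(2019, 10, 7)
start = end - datetime.timedelta(days=60)  # 2019-08-08
todo = missing_days(start, end)
print(len(todo))
```

With no cached files, `todo` holds all 60 days from 2019-08-08 through 2019-10-06. The actual GDELT pull for each of those days would replace the "still needs to be pulled" step.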
@mneedham
mneedham / 00_install.sh
Last active April 1, 2025 16:55
Getting Neo4j Tweets
pip install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
pip install "confluent-kafka[avro]"  # quoted so shells like zsh don't treat [avro] as a glob
@ahalterman
ahalterman / spacy_events.py
Created March 13, 2018 20:21
Event Data in 30 Lines of Python
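import json
import spacy

nlp = spacy.load("en_core_web_lg")  # large English model; download it first with: python -m spacy download en_core_web_lg
with open("scraped.json", "r") as f:
    news = json.load(f)  # a list of article records, each with a "body" field
news = [i['body'] for i in news]  # keep only the article text
processed_docs = list(nlp.pipe(news))  # parse all articles in one batched pass
verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]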
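The preview stops after the two word lists, so it doesn't show how the gist matches them. One plausible approach is sketched here as an assumption, not as the author's code. It looks for a direct object (dependency label `dobj`) whose lemma is in `dobj_list` and whose syntactic head has a lemma in `verb_list`. The sketch uses a hypothetical `Tok` tuple in place of spaCy tokens so it runs without downloading `en_core_web_lg`. With a real `Doc` you would read `token.dep_`, `token.lemma_` and `token.head` instead.

```python
from typing import List, NamedTuple, Tuple


class Tok(NamedTuple):
    """Hypothetical stand-in for a spaCy token."""
    text: str
    lemma: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of this token's syntactic head in the sentence


verb_list = ["launch", "begin", "initiate", "start"]
dobj_list = ["attack", "offensive", "operation", "assault"]


def find_events(tokens: List[Tok]) -> List[Tuple[str, str]]:
    """Return (verb, object) pairs where an event noun is the direct object of an event verb."""
    events = []
    for tok in tokens:
        if tok.dep == "dobj" and tok.lemma in dobj_list:
            head = tokens[tok.head]
            if head.lemma in verb_list:
                events.append((head.text, tok.text))
    return events


# "Rebels launched an offensive", with a hand-written parse.
sentence = [
    Tok("Rebels", "rebel", "nsubj", 1),
    Tok("launched", "launch", "ROOT", 1),
    Tok("an", "an", "det", 3),
    Tok("offensive", "offensive", "dobj", 1),
]
print(find_events(sentence))  # [('launched', 'offensive')]
```

Matching on lemmas rather than surface forms lets "launched", "launches" and "launching" all hit the single entry "launch".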
@linwoodc3
linwoodc3 / utilities.py
Last active February 14, 2025 01:49
A Python script to scrape text from websites. This works surprisingly well on most news websites when you have the URL to the story. Use GDELT URLs for the best results.
# Author: Linwood Creekmore
# Email: valinvescap@gmail.com
# Description: Python script to pull content from a website (works on news stories).
#Licensed under GNU GPLv3; see https://choosealicense.com/licenses/lgpl-3.0/ for details
# Notes
"""
23 Oct 2017: updated to include readability based on PyCon talk: https://github.com/DistrictDataLabs/PyCon2016/blob/master/notebooks/tutorial/Working%20with%20Text%20Corpora.ipynb
18 Jul 2018: added keywords and summary
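This preview shows only the script's header and notes, and the notes say it builds on readability. As a rough, dependency-free illustration of the core idea (pulling story text out of a page's HTML), the sketch below uses only the standard library's `html.parser`. It keeps the text of `<p>` elements and skips `<script>` and `<style>` content. This is a swapped-in technique, not the gist's method, and `extract_paragraphs` is a hypothetical name. Real news pages usually need a readability-style extractor to drop navigation, captions and ads.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements, ignoring <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self.paragraphs: List[str] = []
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            # Collapse runs of whitespace and keep non-empty paragraphs only.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def extract_paragraphs(html: str) -> List[str]:
    """Hypothetical helper: return the paragraph text of an HTML page."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


page = "<p>Hello <b>world</b></p><script>var x = 1;</script><p> Bye\n now </p>"
print(extract_paragraphs(page))  # ['Hello world', 'Bye now']
```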
@pmgreen
pmgreen / openrefine_regexp.md
Last active November 21, 2022 21:49
Quick primer on using regular expressions in OpenRefine.

Using regular expressions in OpenRefine

A regular expression is a string that describes a text pattern occurring in other strings, m'kay.

Basic concepts

With these alone, one can go quite far:

* metacharacters
* character escapes (\)
* anchors (\A and \Z, or ^ and $)
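The three concepts listed above can be tried out quickly with Python's `re` module. OpenRefine's GREL uses Java's regex flavour, which agrees with Python on these basics but differs in some details. For example, Java's `\Z` also matches just before a final line terminator, while Python's `\Z` matches only at the very end of the string.

```python
import re

text = "alpha 1.5\nbeta 2.0"

# Metacharacter: "." matches any single character except a newline.
any_char = re.findall(r"a.c", "abc a-c ac")
print(any_char)  # ['abc', 'a-c']

# Character escape: "\." matches only a literal period.
numbers = re.findall(r"\d\.\d", text)
print(numbers)  # ['1.5', '2.0']

# Anchors: with re.MULTILINE, "^" matches at the start of every line...
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(line_starts)  # ['alpha', 'beta']

# ...while "\A" matches only at the very start of the whole string.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
print(string_start)  # ['alpha']
```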
@reorx
reorx / python_tutorials.rst
Last active May 10, 2022 03:39
Python tutorials in Chinese and English, plus other advanced resources