Skip to content

Instantly share code, notes, and snippets.

@larjudge
larjudge / README.openai-structured-output-demo.md
Created October 14, 2024 22:55 — forked from dannguyen/README.openai-structured-output-demo.md
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data — and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

@larjudge
larjudge / delete_tweets.sh
Created April 2, 2020 02:33 — forked from nerdcha/delete_tweets.sh
Bash oneliner to delete tweets
#!/bin/bash
# Context: https://jamiehall.cc/2020/03/10/delete-all-your-tweets-with-one-line-of-bash/
# https://news.ycombinator.com/item?id=22689746
twurl "/1.1/statuses/user_timeline.json?screen_name=YOUR_TWITTER_HANDLE&count=200&max_id=$(twurl '/1.1/statuses/user_timeline.json?screen_name=YOUR_TWITTER_HANDLE&count=200&include_rts=1' | jq -r '.[9]|.id_str')&include_rts=1" | jq -r '.[]|.id_str' | parallel -j 10 -a - twurl -X POST /1.1/statuses/destroy/{1}.json > /dev/null