Glowdan glowdan

  • Hello Group.inc
  • Beijing, China
@glowdan
glowdan / tokenization.cpp
Created April 15, 2025 10:09 — forked from luistung/tokenization.cpp
C++ version of the BERT tokenizer
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <boost/algorithm/string.hpp>
#include <utf8proc.h>
//https://unicode.org/reports/tr15/#Norm_Forms
//https://ssl.icu-project.org/apiref/icu4c/uchar_8h.html
@glowdan
glowdan / subs.md
Created April 30, 2024 10:36 — forked from tatsumoto-ren/subs.md
Japanese Subtitles



kitsunekko.net jp subtitles

A large repository of Japanese subtitles that is updated reasonably often and has a clean design. | The most popular one; you can upload your own subs. | Subtitles often have to be retimed.