@XiaoxiaLei
XiaoxiaLei / TextTeaser源码阅读笔记
Created July 12, 2018 06:29 — forked from rsarxiv/TextTeaser源码阅读笔记
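TextTeaser source code reading notes
There are three classes in total: TextTeaser, Parser, and Summarizer.
1. TextTeaser, the program entry class. Given the text to summarize and its title, it outputs a summary, which by default is the 5 most important sentences from the original text.
2. Summarizer, the class that generates the summary. It computes a score for each sentence, sorts the sentences by score, and then outputs the 5 highest-scoring sentences in the order they appear in the original text.
The key is how the sentence score is computed. The scoring model has four parts:
1) Sentence length. A length of 20 is treated as ideal, and a sentence is scored by how far its length is from that value.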
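The length score and the "top 5 in original order" selection described in these notes can be sketched roughly as follows. This is a minimal illustration, not TextTeaser's actual code: the names `length_score` and `summarize` are hypothetical, measuring length in words is an assumption (the notes give no unit), and the linear falloff clamped at zero is just one plausible way to score by distance from the ideal.

```python
IDEAL_LENGTH = 20  # ideal sentence length from the notes; counting words is an assumption


def length_score(words, ideal=IDEAL_LENGTH):
    """Return a score in [0, 1] that is highest when len(words) == ideal.

    Assumed formula: linear falloff with distance from the ideal length,
    clamped at 0 for very long sentences.
    """
    return max(0.0, 1.0 - abs(ideal - len(words)) / ideal)


def summarize(sentences, score_fn, top_n=5):
    """Pick the top_n highest-scoring sentences, then emit them in source order."""
    # Rank sentence indices by score, best first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_fn(sentences[i]),
                    reverse=True)
    # Restore the original ordering of the selected sentences.
    keep = sorted(ranked[:top_n])
    return [sentences[i] for i in keep]
```

Here only the length component is used as `score_fn`, for example `summarize(sentences, lambda s: length_score(s.split()))`. The notes say the full model combines four components, so a real scorer would presumably pass a combined function instead.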
@alexhanna
alexhanna / social-science-programming.md
Last active May 15, 2025 00:45
Notes on social science programming principles
  1. Code and Data for the Social Sciences: A Practitioner’s Guide, Gentzkow and Shapiro.
  2. Good enough practices in scientific computing, Wilson et al.
  3. Best Practices for Scientific Computing, Wilson et al.
  4. Principled Data Processing, Patrick Ball.
  5. The Plain Person’s Guide to Plain Text Social Science, Healy.
  6. Avoiding technical debt in social science research, Toor.
@rsarxiv
rsarxiv / TextTeaser源码阅读笔记
Created March 30, 2016 12:48
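TextTeaser source code reading notes
There are three classes in total: TextTeaser, Parser, and Summarizer.
1. TextTeaser, the program entry class. Given the text to summarize and its title, it outputs a summary, which by default is the 5 most important sentences from the original text.
2. Summarizer, the class that generates the summary. It computes a score for each sentence, sorts the sentences by score, and then outputs the 5 highest-scoring sentences in the order they appear in the original text.
The key is how the sentence score is computed. The scoring model has four parts:
1) Sentence length. A length of 20 is treated as ideal, and a sentence is scored by how far its length is from that value.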
@entaroadun
entaroadun / gist:1653794
Created January 21, 2012 20:10
Recommendation and Ratings Public Data Sets For Machine Learning

Movies Recommendation:

Music Recommendation: