Yu Fu yfu
export const gpt_functions_param_jobs_fetching = [
{
name: "process_job_data",
description: "Process job data and extract core fields for a scraper.",
parameters: {
type: "object",
additionalProperties: false,
properties: {
// Company
company_name: {
@yfu
yfu / maker_genome_annotation.md
Created March 24, 2019 19:05 — forked from darencard/maker_genome_annotation.md
In-depth description of running MAKER for genome annotation.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are quite simplified examples, and it took a bit of effort to wrap my head completely around everything. Here I describe a de novo genome annotation for Boa constrictor in detail, both as a record and as a guide for annotating any genome.

Software & Data

Software prerequisites:

  1. RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (version used was 20150807).
  2. MAKER MPI version 2.31.8 (though any other version 2 release should be okay).
  3. [Augustus](http://bio
#!/usr/bin/ruby
# Create display override file to force Mac OS X to use RGB mode for Display
# see http://embdev.net/topic/284710
require 'base64'
data=`ioreg -l -d0 -w 0 -r -c AppleDisplay`
edids=data.scan(/IODisplayEDID.*?<([a-z0-9]+)>/i).flatten
vendorids=data.scan(/DisplayVendorID.*?([0-9]+)/i).flatten
@yfu
yfu / tmux.cheat
Created October 2, 2016 05:33 — forked from afair/tmux.cheat
Tmux Quick Reference & Cheat sheet - 2 column format for less scrolling!
==========================================   ==========================================
TMUX COMMAND                                 WINDOW (TAB)
==========================================   ==========================================
List     tmux ls                             List     ^b w
New      -s <session>                        Create   ^b c
Attach   att -t <session>                    Rename   ^b , <name>
Rename   rename-session -t <old> <new>       Last     ^b l (lower-L)
Kill     kill-session -t <session>           Close    ^b &
@yfu
yfu / blat_in_parallel.sh
Created March 18, 2015 22:05
blat_in_parallel.sh
awk '{ if(NR%2==1) { header=$0; fn=substr(header, 2, length(header)-1); fn=fn ".fa"} else { seq=$1; print header "\n" seq > fn } }' ../piRNA.cluster.pachytene.transcripts.fa
# blat -dots=1 ~/data/shared/mm10/mm10.2bit ../piRNA.cluster.pachytene.transcripts.fa -ooc=$HOME/data/shared/mm10/mm10.11.ooc piRNA.cluster.pachytene.transcripts.blat.psl &> piRNA.cluster.pachytene.transcripts.blat.psl.log
ls -1 chr*.fa | parallel -j20 --progress blat ~/data/shared/mm10/mm10.2bit {} -ooc=/data/fuy2/shared/mm10/mm10.11.ooc {.}.blat
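The splitting step in the awk one-liner above (one output file per sequence, so that blat can run on each file in parallel) can be sketched in Python as follows. This is a minimal sketch, not the gist's own code: the function name `split_two_line_fasta` and the `out_dir` parameter are hypothetical, and it assumes a strict two-line FASTA (header line, then one sequence line) whose header text is safe to use as a file name. Unlike the awk version, it skips blank lines and overwrites the file if a header repeats.

```python
from pathlib import Path


def split_two_line_fasta(lines, out_dir):
    """Write each record of a two-line FASTA to its own <name>.fa file.

    Lines alternate header / sequence, as the awk one-liner assumes.
    Returns the paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # assumption: blank lines carry no data
        if header is None:
            if not line.startswith(">"):
                raise ValueError(f"expected a FASTA header, got: {line!r}")
            header = line
        else:
            # File name is the header without the leading '>' plus '.fa'.
            path = out_dir / (header[1:] + ".fa")
            path.write_text(f"{header}\n{line}\n")
            written.append(path)
            header = None
    return written
```

Calling `split_two_line_fasta(open("transcripts.fa"), "split")` (both names are placeholders) would produce one `.fa` file per record under `split/`, which can then be passed to `parallel` as in the command above.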
@yfu
yfu / clean_up_gencode_gtf.awk
Last active August 29, 2015 14:14
Clean up the GENCODE GTF file for Cufflinks
# Usage: cat /home/fuy2/data/shared/mm10/GENCODE_release_M4/gencode.vM4.annotation.gtf | awk 'BEGIN{FS=OFS="\t"} { if($3=="CDS" || $3=="exon" || $3=="start_codon" || $3=="stop_codon") {print} }' | gawk 'BEGIN{FS=OFS="\t"} { info=$9; match(info, /gene_id ("[^"]+")/, gi); match(info, /gene_name ("[^"]+")/, gn) ; match(info, /transcript_id ("[^"]+")/, ti); print $1, $2, $3, $4, $5, $6, $7, $8, "gene_id " gi[1] "; gene_name " gn[1] "; transcript_id " ti[1] ";" ; }' > gencode.vM4.annotation.clean.gtf
BEGIN{FS=OFS="\t"} { if($3=="CDS" || $3=="exon" || $3=="start_codon" || $3=="stop_codon") {print} }' | gawk 'BEGIN{FS=OFS="\t"} { info=$9; match(info, /gene_id ("[^"]+")/, gi); match(info, /gene_name ("[^"]+")/, gn) ; match(info, /transcript_id ("[^"]+")/, ti); print $1, $2, $3, $4, $5, $6, $7, $8, "gene_id " gi[1] "; gene_name " gn[1] "; transcript_id " ti[1] ";" ; }
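The two-stage awk/gawk filter above keeps only CDS, exon, start_codon, and stop_codon rows and rewrites column 9 to carry just gene_id, gene_name, and transcript_id. A rough Python equivalent, offered as a sketch rather than a replacement for the gist, might look like this. The names `clean_gtf_line`, `KEEP_FEATURES`, and `KEEP_ATTRS` are hypothetical; as in the gawk version, an attribute that is missing from a row is emitted with an empty value.

```python
import re

# Feature types (column 3) that the first awk stage lets through.
KEEP_FEATURES = {"CDS", "exon", "start_codon", "stop_codon"}
# Attributes kept in column 9, in the order the gawk stage prints them.
KEEP_ATTRS = ("gene_id", "gene_name", "transcript_id")


def clean_gtf_line(line):
    """Return the simplified GTF line, or None if the row is filtered out."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] not in KEEP_FEATURES:
        return None  # comment lines, short rows, and other feature types
    attrs = []
    for key in KEEP_ATTRS:
        # Capture the quoted value, quotes included, like the gawk regex.
        match = re.search(rf'{key} ("[^"]+")', fields[8])
        value = match.group(1) if match else ""
        attrs.append(f"{key} {value}")
    return "\t".join(fields[:8] + ["; ".join(attrs) + ";"])
```

Applied line by line to a GTF file, writing out every non-None result gives the same kind of reduced annotation that the usage line redirects to `gencode.vM4.annotation.clean.gtf`.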
@yfu
yfu / run_multiz.sh
Last active August 29, 2015 14:09
Run Multiz for SOAPdenovo and Velvet assemblies
cat /data/fuy2/cl/results/2014-11-06/Hi5_soapdenovo_69mer.fa | tr ' ' '_' | gawk '{ if($1 ~ "^>") { match($1, "^>(.+)", array); printf "\n" array[1] "\t"} else { printf $1 } }' | grep -E -v '^$' | awk '{ print ">sd69:" $1 ":1:+:" length($2); print $2 }' > sd69
cat /data/fuy2/cl/results/2014-11-06/Hi5_velvet_69mer.fa | gawk '{ if($1 ~ "^>") { match($1, ">NODE_([0-9]+).+", array); printf "\n" array[1] "\t" } else{printf $1} }' | grep -v -E '^$' | awk '{ print ">vv:" $1 ":1:+:"length($2); print $2 }' > vv69
all_bz - '(sd69 vv69)' > all_bz.log
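The two pipelines above join each multi-line sequence into one string and rewrite its header into the `>name:contig:1:+:length` form used before running `all_bz`. The same transformation can be sketched in Python. The function `to_multiz_records` and its parameters are hypothetical names chosen for illustration. One deliberate difference: this sketch raises an error when a header does not match the pattern, whereas the gawk version would silently emit an empty contig name.

```python
import re


def to_multiz_records(lines, prefix, name_regex=r"^>(.+)"):
    """Yield (header, sequence) pairs with headers like '>prefix:name:1:+:length'.

    name_regex must contain one capturing group that extracts the contig name
    from the original header (spaces are replaced by underscores first).
    """
    pattern = re.compile(name_regex)
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seq = "".join(chunks)
                yield f">{prefix}:{name}:1:+:{len(seq)}", seq
            match = pattern.match(line.replace(" ", "_"))
            if match is None:
                raise ValueError(f"header does not match {name_regex!r}: {line!r}")
            name, chunks = match.group(1), []
        else:
            chunks.append(line)
    if name is not None:
        # Flush the final record.
        seq = "".join(chunks)
        yield f">{prefix}:{name}:1:+:{len(seq)}", seq
```

For the Velvet assembly, passing `name_regex=r"^>NODE_([0-9]+).+"` keeps only the node number, mirroring the second pipeline.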
@yfu
yfu / 炒蛋
Created May 12, 2014 01:01
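Repost: Scrambled Eggs
Scrambled Eggs
  On the bed, an old man. Beside the bed, a young man stands solemnly with his head bowed. Next to the bed is a low table; on the table is a plate, a plate of scrambled eggs, and the old man is eating the eggs.
  "Do we have any eggs left?" the old man said.
  "No, these are the last two," the young man answered respectfully.
  "The eggs are gone; it seems my time has come." A trace of sorrow flashed in the old man's eyes. "Ah Dan, set free those old hens that no longer lay, and you should leave too. Go to Beijing, and never come back."
  "But I can't do anything except scramble eggs."
  "That is enough. Anyone who gets to eat your scrambled eggs has earned that blessing over eight lifetimes."
  Beijing. Ah Dan is on the streets of Beijing.
  Ah Dan stands under a utility pole, staring blankly. There is a slip of paper on it: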
@yfu
yfu / good_old_plain_text
Created May 11, 2014 17:31
My first gist
Hello world
#!/bin/bash
# Mavericks has a nasty issue regarding ARPs in corporate networks.
# It appears that they have tried to reduce bandwidth utilization by caching
# the results of ARPs. Unfortunately, this causes big problems with corporate
# networks with Virtual IPs or other corporate network redundancy measures.
# There exist other patches for this issue
# (see https://github.com/MacMiniVault/Mac-Scripts/blob/master/unicastarp/unicastarp-README.md)
# but they write a value that will be pulled every time the machine reboots. Because