Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save glebkuznetsov/c2feedccf416468d5420 to your computer and use it in GitHub Desktop.

Select an option

Save glebkuznetsov/c2feedccf416468d5420 to your computer and use it in GitHub Desktop.
Automated MascPCR Primer Design Prototype
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Designing Primers for Experiment 3"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import re\n",
"\n",
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import sequence data, used to generate primers."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from Bio import SeqIO\n",
"with open('../genomes/mg1655.fa') as fh:\n",
" mg1655_record = SeqIO.read(fh, 'fasta')\n",
"with open('../genomes/c321d.fa') as fh:\n",
" c321d_record = SeqIO.read(fh, 'fasta')\n",
"mg1655_seq = str(mg1655_record.seq)\n",
"c321_seq = str(c321d_record.seq)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start with a file that includes mg1655 positions, refs, alts (generated in the analysis ipython notebook) and target sizes (specified manually)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"experiment_set_pos_ref_alt_df = pd.read_csv('experiment_3_oligos/experiment_3_primer_targets_1_with_amplicon_sizes.csv')\n",
"REQUIRED_KEYS = [\n",
" 'TARGET_AMPLICON_SIZE',\n",
"]\n",
"missing_keys = set(REQUIRED_KEYS) - set(experiment_set_pos_ref_alt_df.keys())\n",
"assert not missing_keys, \"Missing keys %s\" % missing_keys"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to avoid inadvertently introducing other changes, and to check for changes that would fall across a structural variant in C321D relative to MG1655, we'll design our primers in the frame of the C321.deltaA genome. The positions in the imported file are in the frame of MG1655, so we need to map them over to C321.deltaA.\n",
"\n",
"I ran USCS liftover from mg1655 to c321D using `genome-refactor/src/liftOver/liftOver_runner.sh` to generate a chain file, and use `pyliftover` to handle the logic of using the chain file to map positions in mg1655 to their corresponding positions in c321D."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pyliftover\n",
"liftover_obj = pyliftover.LiftOver('../liftover/mg1655.to.c321d.liftOver')\n",
"SOURCE_CHROM = 'gi|49175990|ref|NC_000913.2|' # Liftover parsed this weird. Oh well.\n",
"\n",
"# Mappings I had to resolve manually because the liftover is not robust to deletions apparently.\n",
"MANUAL_MAPPINGS = [\n",
"]\n",
"\n",
"\n",
"def get_c321d_pos_fuzzy(mg1655_pos):\n",
" \"\"\"Fuzzy liftover handler. Tries again on next position if fails.\n",
" \"\"\"\n",
" liftover_pos_list = liftover_obj.convert_coordinate(SOURCE_CHROM, mg1655_pos)\n",
" if liftover_pos_list:\n",
" return liftover_pos_list[0][1]\n",
" \n",
" # Otherwise try next position.\n",
" return get_c321d_pos_fuzzy(mg1655_pos + 1)\n",
" \n",
"\n",
"# Store dictionary containing mg1655 and c321D position, e.g.:\n",
"# {'POSITION': 1, 'C321D_POSITION': 1}]\n",
"mg1655_to_c321d_mapping_list = []\n",
"for idx, row in experiment_set_pos_ref_alt_df.iterrows():\n",
" mg1655_pos = int(row['POSITION'])\n",
" c321_pos = get_c321d_pos_fuzzy(mg1655_pos)\n",
" mg1655_to_c321d_mapping_list.append({\n",
" 'POSITION': mg1655_pos,\n",
" 'C321D_POSITION': c321_pos\n",
" })\n",
" \n",
"experiment_set_pos_ref_alt_mapping_df = pd.merge(\n",
" experiment_set_pos_ref_alt_df,\n",
" pd.DataFrame(mg1655_to_c321d_mapping_list),\n",
" on='POSITION',\n",
" how='inner')\n",
"assert 0 == len(experiment_set_pos_ref_alt_mapping_df[\n",
" experiment_set_pos_ref_alt_mapping_df.C321D_POSITION.apply(pd.isnull)])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import sys\n",
"sys.path.append('/home/glebk/Projects/churchlab/primer_design')\n",
"import primer_design"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reload(primer_design)\n",
"\n",
"def get_primers(row):\n",
"# print row['POSITION'], row['REF'], row['ALT']\n",
" c321d_pos_pythonic = row['C321D_POSITION'] - 1\n",
" alt_len = len(row['ALT'])\n",
" \n",
" # We refer to REF as wild-type (MG1655).\n",
" wt_max_seq = c321_seq[c321d_pos_pythonic - 40 : c321d_pos_pythonic] + row['REF']\n",
"# print wt_max_seq\n",
" assert mg1655_seq.find(wt_max_seq)\n",
" fwd_primer_wt_result = primer_design.find_primer(wt_max_seq, start_at_left=False)\n",
" fwd_primer_wt = fwd_primer_wt_result[0]\n",
" fwd_primer_wt_tm = fwd_primer_wt_result[1]\n",
" fwd_primer_wt_delta_g = fwd_primer_wt_result[2]\n",
" \n",
" # Calc mut primer (C321D).\n",
" mut_max_seq = c321_seq[c321d_pos_pythonic - 30 : c321d_pos_pythonic] + row['ALT']\n",
" fwd_primer_mut_result = primer_design.find_primer(mut_max_seq, start_at_left=False)\n",
" fwd_primer_mut = fwd_primer_mut_result[0]\n",
" fwd_primer_mut_tm = fwd_primer_mut_result[1]\n",
" fwd_primer_mut_delta_g = fwd_primer_mut_result[2]\n",
" \n",
" return pd.Series({\n",
" 'fwd_primer_wt': fwd_primer_wt,\n",
" 'fwd_primer_wt_tm': fwd_primer_wt_tm,\n",
" 'fwd_primer_wt_delta_g': fwd_primer_wt_delta_g,\n",
" 'fwd_primer_mut': fwd_primer_mut,\n",
" 'fwd_primer_mut_tm': fwd_primer_mut_tm,\n",
" 'fwd_primer_mut_delta_g': fwd_primer_mut_delta_g\n",
" })\n",
" \n",
"variants_with_fwd_primers_df = experiment_set_pos_ref_alt_mapping_df.join(\n",
" experiment_set_pos_ref_alt_mapping_df.apply(get_primers, axis=1))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reload(primer_design)\n",
"from Bio.Seq import reverse_complement\n",
"\n",
"def design_reverse_primer(row):\n",
" fwd_primer = row['fwd_primer_mut']\n",
" primer_minus_allele = row['fwd_primer_mut'][:-1 * len(row['ALT'])]\n",
" \n",
" # Get primer start position in sequence.\n",
" match_pos_list = [m.start() for m in re.finditer(primer_minus_allele, c321_seq)]\n",
" assert len(match_pos_list) == 1, \"%d matches for %s\" % (len(match_pos_list), primer_minus_allele)\n",
" amplicon_start_pos = match_pos_list[0]\n",
" \n",
" # Get end position based on target size.\n",
" amplicon_end_pos = amplicon_start_pos + row['TARGET_AMPLICON_SIZE']\n",
" \n",
" # Find the optimal primer.\n",
" max_reverse_candidate = reverse_complement(c321_seq[amplicon_end_pos - 30:amplicon_end_pos])\n",
" rev_primer_result = primer_design.find_primer(max_reverse_candidate, start_at_left=True)\n",
" rev_primer = rev_primer_result[0]\n",
" rev_primer_tm = rev_primer_result[1]\n",
" rev_primer_delta_g = rev_primer_result[2]\n",
" return pd.Series({\n",
" 'rev_primer': rev_primer,\n",
" 'rev_primer_tm': rev_primer_tm,\n",
" 'rev_primer_delta_g': rev_primer_delta_g,\n",
" })\n",
" \n",
"variants_with_all_primers = variants_with_fwd_primers_df.join(\n",
" variants_with_fwd_primers_df.apply(design_reverse_primer, axis=1))\n",
"\n",
"variants_with_all_primers.to_csv(\n",
" 'experiment_3_oligos/experiment_3_primer_targets_2_with_fwd_and_rev_primers.csv', index=False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Verify the designs. We do this iteratively, manually updating a file externally, so we re-import that file after every fix."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def find_subseq_start_and_assert_unique(subseq, source):\n",
" \"\"\"Helper.\n",
" \"\"\"\n",
" results = [m.start() for m in re.finditer(subseq, source)]\n",
" assert len(results) == 1, results\n",
" return results[0]\n",
"\n",
"def verify_primer_design(row):\n",
" # Make sure the two forward primers actually discriminate the result.\n",
" if row['fwd_primer_mut'].find(row['fwd_primer_wt'][-12:]) > -1:\n",
" return pd.Series({\n",
" 'actual_amplicon_size': -1,\n",
" 'flagged': True,\n",
" 'flagged_reason': \"fwd primers don't discriminate correctly - wt in mut\"\n",
" })\n",
" \n",
" if row['fwd_primer_wt'].find(row['fwd_primer_mut'][-12:]) > -1:\n",
" return pd.Series({\n",
" 'actual_amplicon_size': -1,\n",
" 'flagged': True,\n",
" 'flagged_reason': \"fwd primers don't discriminate correctly - mut in wt\"\n",
" })\n",
"\n",
" # This is specific to Experiment 3 design, where category 2 oligos are new variants\n",
" # relative to C321D, so we use the wt primer seq to search.\n",
" fwd_primer_key = 'fwd_primer_mut'\n",
" if row['OLIGO_CATEGORY'] == 2:\n",
" fwd_primer_key = 'fwd_primer_wt'\n",
" \n",
" # If reversed, we want to search the opposite direction.\n",
" if 'is_reverse' in row and row['is_reverse'] == 1:\n",
" fwd_primer = row['rev_primer']\n",
" rev_primer = row[fwd_primer_key]\n",
" else:\n",
" fwd_primer = row[fwd_primer_key]\n",
" rev_primer = row['rev_primer']\n",
" \n",
" try:\n",
" start_pos = find_subseq_start_and_assert_unique(fwd_primer, c321_seq)\n",
" rev_primer_start = find_subseq_start_and_assert_unique(reverse_complement(rev_primer), c321_seq)\n",
" end_pos = rev_primer_start + len(rev_primer)\n",
" actual_amplicon_size = end_pos - start_pos\n",
" except AssertionError:\n",
" return pd.Series({\n",
" 'actual_amplicon_size': -1,\n",
" 'flagged': True,\n",
" 'flagged_reason': 'fwd primer not found'\n",
" })\n",
" \n",
" return pd.Series({\n",
" 'actual_amplicon_size': actual_amplicon_size,\n",
" 'flagged': False,\n",
" 'flagged_reason': ''\n",
" })"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 51
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"variants_with_all_primers_manual_fixes = pd.read_csv(\n",
" 'experiment_3_oligos/experiment_3_primer_targets_4_manual_fixes.csv')\n",
"\n",
"variants_with_all_primers_manual_fixes_analyzed = variants_with_all_primers_manual_fixes.join(\n",
" variants_with_all_primers_manual_fixes.apply(verify_primer_design, axis=1))\n",
"variants_with_all_primers_manual_fixes_analyzed.to_csv(\n",
" 'experiment_3_oligos/experiment_3_primer_targets_5_manual_fixes_analyzed.csv', index=False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 52
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate ids for IDT order for copy-paste convenience."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f_wt = []\n",
"f_c321 = []\n",
"r = []\n",
"for idx, row in variants_with_all_primers_manual_fixes_analyzed.iterrows():\n",
" pos = row['POSITION']\n",
" amplicon_size = row['TARGET_AMPLICON_SIZE']\n",
" f_wt.append({'id': 'p_wt_f_' + str(pos), 'amplicon_size': amplicon_size})\n",
" f_c321.append({'id': 'p_c321_f_' + str(pos), 'amplicon_size': amplicon_size})\n",
" r.append({'id': 'p_r_' + str(pos), 'amplicon_size': amplicon_size})\n",
" \n",
"pd.DataFrame(f_wt + f_c321 + r).to_csv('experiment_3_oligos/experiment_3_primer_ids.csv')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 40
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'AGGACTAAATGGTATTCAGAATAGCTAGCT'[-5:]\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 45,
"text": [
"'TAGCT'"
]
}
],
"prompt_number": 45
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment