FIGURE 1 - Nucleotide sequence of pCL4.1 cDNA and deduced amino acid sequence of C. elegans phenylalanine hydroxylase protein.
Nucleotide numbers are indicated above the sequence and amino acid numbers below it. A consensus translation start site and a polyadenylation signal (underlined) are present in the sequence. A conserved CaM kinase II phosphorylation site at Ser-273 is marked with an asterisk. The locations of introns, as determined by sequencing of genomic clones (and confirmed by GSC sequencing), are indicated by carets just below the nucleotide sequence. The placement of the caret indicates the phase of the intron: introns inserted in Frame 0 are marked with carets between codons, those in Frame I with carets pointing to the first letter of a codon, and those in Frame II with carets pointing to the third letter of a codon (see also Fig. 3). The GenBank accession number for this sequence is AF119388.
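The caret convention described in the legend can be written down as a small lookup, which may help when transcribing intron positions from the figure. This is only an illustrative sketch: the helper name `frame_from_caret` and the encoding of caret placement (`None` for a caret between codons, otherwise the 1-based letter of the codon the caret points to) are assumptions made here, not part of the original figure.

```python
from typing import Optional

# Caret placement -> frame label, following the wording of the legend.
# Key: None when the caret sits between two codons, otherwise the
# 1-based letter of the codon that the caret points to (assumed encoding).
CARET_TO_FRAME = {None: "Frame 0", 1: "Frame I", 3: "Frame II"}


def frame_from_caret(codon_letter: Optional[int]) -> str:
    """Return the legend's frame label for a caret placement (hypothetical helper)."""
    if codon_letter not in CARET_TO_FRAME:
        # The legend defines no caret pointing to the second letter of a codon.
        raise ValueError(f"caret placement not covered by the legend: {codon_letter!r}")
    return CARET_TO_FRAME[codon_letter]


if __name__ == "__main__":
    for placement in (None, 1, 3):
        print(placement, "->", frame_from_caret(placement))
```

Placements that the legend does not define raise an error, so a misread caret is reported instead of being assigned a frame silently.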
gcaacgtacggctcacgttttgtctctgtttaattcttatttcccggtgactgtctattcctctgaaaaccaaatcttgttctctgaaa
90
atg cca cca gct gga caa gat gat ctt gac ttc ttg aag tac gcc atg gaa tcg tac gtt gct gac gtc aac gcc
M P P A G Q D D L D F ^ L K Y A M E S Y V A D V N A
1
165
gac att ggc aag act act atc gta ttc act ctt cgt gaa aag gca gga gct ctc gct gaa aca ttg aag ctg ttc
D I G K T T I V F T L R E K A G A L A E T L K L F
26
240
cag gca cat gat gtg aat ctg tct cac att gaa tca aga cca tca aga ctc atg aag gat gct atg agg tgc tcg
Q ^ A H D V N L S H I E S R P S R L M K D A M R C S
51
315
ttg aat ttg ctg aag ctg aag acc atc gta aga ttg aag gag tta ttg agc att tcc aac aaa aag ctg aaa aga
L N L L K L K T I V R L K E L L S I S N K K L K R
76
390
agg ttc ttg ttc aag act gga aca cca aaa aca aaa caa aac aag gat tct gtt cca tgg ttc cca cag aag atc
R F L F K T G T P K T K Q N K ^D S V P W F P Q K I
101
465
aac gac atc gat caa ttt gcc aac aga att ctt tct tat gga gct gag ctt gat gcc gat cac cct gga ttt aag
N D I D Q F A N R I L S Y G A E L D A D H P G F K
126
540
gac atg acc tac cgc gag cgc aga aag ttt ttc gcc gat att gca ttc aac ttc aaa cac gga gac aag atc cct
D M T Y R E R R K F F A D I A F N F K H^ G D K I P
151
615
act atc acc tac act gat gaa gaa att gcc acg tgg cgt aca gtc tac aac gag ctg aca gtt atg tac ccg aaa
T I T Y T D E E I A T W R T V Y N E L T V M Y P K
176
690
aac gct tgc caa gag ttc aac tac ata ttc cca ctc ctc cag cag aat tgt ggt ttt gga cct gac cgc att cca
N A C Q E F N Y I F P L L Q Q N C G F G P D R I P
201
765
caa ttg cag gat gtt tca gat ttt ttg aag gat tgt acc ggg tac acg att cga cca gtc gct ggt ctc ctt tct
Q L Q D V S D F L K ^D C T G Y T I R P V A G L L S
226
840
cct cgt gat ttc ttg gct ggt tgg gcc ttc cgt gtt ttt cat tcc aca caa tac att cgc cat cat tcc gct cca
P R D F L A G W A F R V F H S T Q Y I R H H S A P
251 *
915
aaa tac aca cct gaa cca gat atc tgc cac gag ctt ctg gga cat gtt cca cta ttt gct gat gtt gaa ttt gca
K Y T P E P^ D I C H E L L G H V P L F A D V E F A
276
990
caa ttc tca cag gaa atc ggt ctt gct tct ctt gga gct cca gat gat gtt att gaa aaa ctt gcc aca ctc tac
Q F S Q E I G L A S L G A P D D V I E K L A T ^ L Y
301
1065
tgg ttc aca atc gaa ttt gga atc tgt caa caa gat ggg gag aaa aaa gct tac gga gcc gga ctt ttg agt tca
W F T I E F G I C Q Q D G E K K A Y G A G L L S S
326
1140
ttt gga gag ctt caa tat gcg ttg agt gat aag ccg gaa gtt gta gat ttt gat cca gct gta tgt tgt gtc acc
F G E L Q Y A L S D K P E V V D F D P A V C C V T
351
1215
aaa tat cca atc aca gaa tat cag cca aag tat ttc tta gct gaa tca ttt gca agt gct aag aac aaa ctt aaa
K Y P I T E Y Q P K Y F L A E S F A S A K N K L K^
376
1290
tca tgg gca gct acc atc aat cgt cca ttc caa att cgt tat aat gct tac act caa cga gtt gaa att ctc gac
S W A A T I N R P F Q I R Y N A Y T Q R V E I L D
401
1365
aag gta gca gca ctt caa cgt ctc gca aga gac atc aga agt gat att tct act ttg gaa gaa gct ctt gga aaa
K V A A L Q R L A R D I R S D I S T L E E A L G K
426
1440
gtg aac aat ctc aag atg aag tga ctttttgattaaattgttaatttgcaagaataaatgtacattgttca 21
V N N L K M K @ ------
451
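The deduced amino acid sequence shown under each row of codons can be re-derived from the nucleotide rows as a consistency check. The following is a minimal sketch, assuming the standard genetic code and space-separated lowercase codons as printed in the figure. The function name `translate` is hypothetical, and the stop codon is rendered here as `*`, whereas the figure prints `@`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order
# (third base varying fastest); '*' marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(codon_row: str) -> str:
    """Translate a row of whitespace-separated codons into one-letter residues."""
    return "".join(CODON_TABLE[codon] for codon in codon_row.lower().split())


if __name__ == "__main__":
    first_row = ("atg cca cca gct gga caa gat gat ctt gac ttc ttg aag "
                 "tac gcc atg gaa tcg tac gtt gct gac gtc aac gcc")
    last_row = "gtg aac aat ctc aag atg aag tga"
    print(translate(first_row))  # MPPAGQDDLDFLKYAMESYVADVNA
    print(translate(last_row))   # VNNLKMK*
```

Only the coding rows should be passed in. The 5' and 3' untranslated stretches are printed without codon spacing and are not meant to be translated.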