Sequence Analysis - Biology 382

Biology 382 - Molecular Biology

Computer-based Sequence Analysis / Bioinformatics

Links and Instructions for Week 1


Exercise 1 - Manual translation of cDNA sequence in 3 frames

Using the genetic code handout, translate the DNA sequence on your handout in each of the three forward reading frames, and determine which reading frame contains an ORF (open reading frame). You will turn in your manual translation with the rest of the week 1 materials when they are due.
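If you want to double-check your manual work, the same three-frame translation can be sketched in a few lines of Python. This is an illustration only, not part of the assignment. It assumes the standard genetic code, and the function name `translate_frame` and the short example sequence are made up. Stop codons appear as `*`, so the frame that runs longest without a `*` after an ATG (M) is the likely ORF.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_frame(dna: str, frame: int) -> str:
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    # Only complete codons are read; a trailing 1-2 leftover bases are ignored.
    codons = (dna[pos:pos + 3] for pos in range(frame, len(dna) - 2, 3))
    # 'X' stands in for any codon containing a non-ACGT character (e.g. N).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    example = "ATGGCCTAA"  # hypothetical example, not the handout sequence
    for frame in range(3):
        print(f"Frame {frame + 1}: {translate_frame(example, frame)}")
```

For the made-up sequence `ATGGCCTAA`, frame 1 gives `MA*`, frame 2 gives `WP`, and frame 3 gives `GL`.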


Exercise 2 - Sequence Databases, Internet retrieval of DNA sequence from GenBank

  • Entrez from NCBI, access to integrated biomedical/molecular databases, including GenBank.

  • Visit two other major sequence databases:
    The EMBL Nucleotide Sequence Database at the European Bioinformatics Institute (EBI),
    also associated with ExPASy (Expert Protein Analysis System) based in Switzerland.

    DNA Data Bank of Japan

  • At the Entrez page, click on the word "nucleotide." Enter your accession number (see individual assignments) in the text box and click Go. The default format for the database entry is the "GenBank Report"; you will see this indicated next to Format:. Examine the GenBank report to see all the useful features and information it contains. The nucleotide sequence itself is at the bottom of the page.

    Now, change the selection next to Format: from GenBank to FASTA by clicking on the link (red arrow & box in illustration below). This will retrieve the sequence in FASTA format (description), a minimal, simplified format recognized by most or all sequence analysis programs. This format is used for both nucleotide and amino acid sequences. Save this file.
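The retrieval and the FASTA format itself can also be illustrated in code. The Python sketch below assumes NCBI's E-utilities `efetch` service, a separate NCBI interface rather than the Entrez web page described above. It shows the structure of a FASTA record: one `>` description line followed by sequence lines. The function names are hypothetical. If you automate many downloads, check NCBI's usage guidelines first.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an E-utilities efetch URL that returns one record as FASTA text."""
    query = urlencode({"db": db, "id": accession, "rettype": "fasta", "retmode": "text"})
    return f"{EFETCH}?{query}"


def fetch_fasta(accession: str, db: str = "nuccore") -> str:
    """Download the FASTA record (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are allowed
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Start a new record; any stray lines before the first '>' are discarded here.
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(fetch_fasta("NM_001082971.1"))` would download and split the human DDC record that appears later on this page. This step needs network access.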

    Figure illustrating function of aromatic L-amino acid decarboxylase (AADC) aka dopa decarboxylase (DDC) proteins in animals.


    Exercise 3 - Computer translation of DNA sequence to amino acid sequence

  • ExPASy DNA to Protein translation tool - allows 3 different output formats for your translation: "Verbose," "Compact," or including the nucleotide sequence.

    1. Paste your nucleotide sequence into the window.
    2. Select an output format, and hit the TRANSLATE button.
    3. Examine each of the 3 different output forms. Determine which reading frame is correct, and save that translation in the "with nucleotide seq" format (e.g., copy & paste it into a word document).

    4. Check your earlier manual translation (Exercise 1) by translating that DNA sequence here and viewing it in the "with nucleotide seq" format.
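Choosing the correct frame amounts to finding the longest open reading frame. The sketch below assumes each frame has been copied as one-letter amino acid codes, with `*` marking stops. If your output uses a different stop symbol, replace it with `*` first. The function names and the example frames are hypothetical.

```python
import re


def longest_orf(protein: str) -> str:
    """Longest run that starts at M and extends up to (not including) the next stop '*'."""
    # Each match begins at the first M after a stop (or the start of the frame) and runs
    # to the next '*' or the end; a later M inside that run could only give a shorter ORF.
    candidates = re.findall(r"M[^*]*", protein)
    return max(candidates, key=len, default="")


def best_frame(frames: dict[str, str]) -> tuple[str, str]:
    """Return (frame label, ORF) for the frame whose longest ORF is longest."""
    label = max(frames, key=lambda name: len(longest_orf(frames[name])))
    return label, longest_orf(frames[label])


if __name__ == "__main__":
    # Hypothetical one-letter translations; real frames would be pasted from the tool's output.
    frames = {"Frame 1": "LR*GMA*", "Frame 2": "SMKVLAWQ*", "Frame 3": "*PW*"}
    print(best_frame(frames))
```

With these made-up frames, `best_frame` picks `Frame 2` with the ORF `MKVLAWQ`.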


    Exercise 4 - Computer alignment of two amino acid sequences

    Use translations of genes retrieved in exercise 2. [Note that you are provided with accession numbers for the corresponding amino acid sequences.]

  • SIM - Alignment Tool for protein sequences at ExPASy

    1. Select the button: User-entered sequence for each sequence window.
    2. Give each sequence a short name.
    3. Paste each of your two protein sequences into the two windows.
    4. Change the number of alignments to be computed to 1, and leave the other default settings unchanged.
    5. Click SUBMIT button.
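SIM reports local alignments, meaning the best-matching stretches of two sequences rather than an end-to-end match. As a rough illustration of how such an alignment is computed, the Python sketch below uses the Smith-Waterman recurrence with toy scores: +2 for a match, -1 for a mismatch, and -2 per gap position. These scores are assumptions for the example. SIM itself scores with an amino acid substitution matrix and its own gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]; 0 means "start fresh".
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + pair(i, j), H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to 0.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("MKVLA", "MKLA")` returns `(6, 'MKVLA', 'MK-LA')`: two matches, one gap, then two more matches.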


    Exercise 5 - Multiple sequence alignment of protein sequences

    Making alignments of related protein sequences is an important technique for a variety of purposes. Alignments show which portions of a sequence are shared among two or more molecules. Alignments are the starting point for determining the relatedness of molecules and, by extension, of the organisms from which they come.

    Parts of a given protein that are highly conserved (that is, unchanged over evolutionary time) in many distantly related organisms are likely to be important functional regions. In other words, that portion of the protein cannot be altered by mutations in the gene encoding it without destroying or reducing the protein's function. Such mutations are selected against and tend to be eliminated from populations. Regions of the protein that can change without disrupting its function accumulate changes over time, and so differ among distantly related organisms.
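Conserved positions can be spotted directly in an alignment. The sketch below marks columns in which every sequence has the same residue and no gap. This is similar in spirit to the `*` line ClustalW prints under its alignments, although ClustalW also marks partially conserved columns, which this sketch ignores. The sequences are assumed to be already aligned and of equal length, and the example is made up.

```python
def conservation_line(aligned: list[str]) -> str:
    """Return '*' under each fully conserved, gap-free column and ' ' elsewhere."""
    if not aligned:
        return ""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)
```

For example, `conservation_line(["MKV-LA", "MKIGLA", "MRVGLA"])` returns `"*   **"`. Printing it under the aligned sequences lines each `*` up with its column.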

    For a molecular biologist, those highly conserved regions of a gene/protein can also be a key to isolating a gene from an organism for which we have little or no DNA sequence information, as we will discuss later when we examine a technique known as 'degenerate PCR.'
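The arithmetic behind choosing conserved regions for degenerate primers is simple. Most amino acids are encoded by more than one codon, so the number of DNA sequences that could encode a peptide is the product of the codon counts for its residues. The sketch below uses the standard genetic code, and the function name is hypothetical. It shows why stretches rich in M and W, which each have a single codon, are attractive targets.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """How many different DNA sequences could encode this peptide (stop codon not included)."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())
```

For example, `primer_degeneracy("MWC")` is 2, whereas `primer_degeneracy("LSR")` is 6 * 6 * 6 = 216.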

    Some 'proteins of interest'

    1. Download sequences to align

    Using your newly acquired sequence retrieval skills, find and save the sequences of at least 5 related proteins in FASTA format. Save each sequence as a separate file in a new folder, naming each file with its accession number.

    For very highly conserved proteins, you can select a wide range of organisms, including animals, plants, fungi, and perhaps even bacteria or archaea. For other conserved proteins, you may wish to restrict yourself to animals, plants, or fungi. For most proteins, choosing too closely related a group of sequences will produce an alignment with few differences.

    2. Paste all sequences in FASTA format into a single file

    In the method used here, we must first create a plain text file containing all the sequences we wish to align, in FASTA format. Open a word processor window on your computer (and stagger the two windows on the screen so you can easily go back and forth between them). Then paste each sequence into this single text document, leaving a blank line between sequences.
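If you saved one file per accession number, the pasting step can also be done with a short script. The Python sketch below assumes each saved file holds one FASTA record in plain text. The function names, and the file and folder names in the usage note, are hypothetical.

```python
from collections.abc import Iterable
from pathlib import Path


def join_records(texts: Iterable[str]) -> str:
    """Join FASTA texts into one, separated by a blank line and ending with a newline."""
    return "\n\n".join(text.strip() for text in texts) + "\n"


def combine_fasta(paths: Iterable[Path]) -> str:
    """Read each saved FASTA file (plain text) and join them in the order given."""
    return join_records(Path(p).read_text() for p in paths)
```

For example, `Path("combined.fasta").write_text(combine_fasta(sorted(Path("my_sequences").glob("*.fasta"))))` would combine every `.fasta` file in a folder named `my_sequences`. Files are joined in the order given, so list them in the order you want them aligned.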

    Example FASTA files

    3. Rename the sequences

    For convenience (given the program we are using), rename each sequence in your collection file with the species name initials, an underscore, and a short gene name. Make sure there are no spaces in the name.

    Example: rename the full FASTA description
    >gi|132814447|ref|NM_001082971.1| Homo sapiens dopa decarboxylase (aromatic L-amino acid decarboxylase) (DDC), transcript variant 1, mRNA

    As:
    >Hsa_DDC
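Renaming by hand works well for five sequences, but a small script can help with more. The sketch below assumes the three-letter abbreviation shown in the example: the genus initial plus the first two letters of the species name, as in Hsa for Homo sapiens. It also assumes you write a mapping from each accession number to its new name. The function names are hypothetical.

```python
import re


def short_label(species: str, gene: str) -> str:
    """Build a name like 'Hsa_DDC' from 'Homo sapiens' and 'DDC'."""
    genus, epithet = species.split()[:2]
    label = f"{genus[0].upper()}{epithet[:2].lower()}_{gene}"
    if re.search(r"\s", label):  # names must not contain spaces (see the instructions above)
        raise ValueError(f"name contains whitespace: {label!r}")
    return label


def relabel_fasta(text: str, labels: dict[str, str]) -> str:
    """Replace each '>' line that mentions a known accession number with '>' + its short name."""
    lines = []
    for line in text.splitlines():
        if line.startswith(">"):
            for accession, label in labels.items():
                if accession in line:
                    line = ">" + label
                    break
        lines.append(line)
    return "\n".join(lines) + "\n"
```

For example, `relabel_fasta(text, {"NM_001082971.1": short_label("Homo sapiens", "DDC")})` turns the long human DDC description line into `>Hsa_DDC`.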

    4. Run the alignment with ClustalW2.

    The program we will use for alignment is ClustalW2, found at the EBI website. Other multiple sequence alignment programs are available, including some on the NCBI website.

    Either paste the collected sequences into the large window, or upload the saved file (note that for uploading, the file MUST be saved in plain text (text only) format).
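Before uploading, it can help to check that the file really is plain-text FASTA. The sketch below flags a few common problems: non-ASCII characters (such as curly quotes a word processor may insert), a missing `>` line at the start, names containing spaces, and unexpected characters in sequence lines. These checks are assumptions about what commonly goes wrong, not ClustalW2's actual input rules. The function name is hypothetical.

```python
import re

HEADER_OK = re.compile(r"^>\S+$")            # a name with no spaces, as in >Hsa_DDC
SEQUENCE_OK = re.compile(r"^[A-Za-z*\-]+$")  # one-letter codes, plus * (stop) and - (gap)


def fasta_problems(text: str) -> list[str]:
    """List reasons a FASTA text may not be accepted as plain-text input (empty list = looks OK)."""
    problems = []
    if not text.isascii():
        problems.append("non-ASCII characters (e.g. curly quotes from a word processor)")
    if not text.lstrip().startswith(">"):
        problems.append("file does not start with a '>' description line")
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines between records are fine
        if line.startswith(">"):
            if not HEADER_OK.match(line):
                problems.append(f"line {number}: name is empty or contains spaces")
        elif not SEQUENCE_OK.match(line):
            problems.append(f"line {number}: unexpected characters in sequence")
    return problems
```

An empty list means none of these particular problems was found.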

    We will use the program default settings, although we may wish to alter two output parameters (in the lower left):

    Output format - try a couple of different formats to see the possibilities

    and

    Output order (you may prefer to retain the order of your input file; if so, change this selection from the default 'aligned' to 'input')

    Note also that the order in which the sequences are input (that is, the order you have them in your FASTA file) can affect the alignment.

    Files you can use for demonstration purposes: Example FASTA files


    Exercise 6 - Access the Protein Structure Database, View Structures

    1. Go to ENTREZ - Structure
    2. Search with one of the following accession numbers:

    Proteins of interest related to serotonin, dopamine & biopterin synthesis

    Interesting DNA binding proteins complexed to DNA

    Or, try the protein of your choice, such as the protein you chose for the multiple alignment earlier. Note: some of these proteins may not have crystal structures.

    3. Click on the entry, which will take you to an MMDB Structure Summary.
    4. View structures using FirstGlance in Jmol. Examine the structure of at least two different proteins.
    5. Enter a PDB file number and wait for the image to load. Then you can view it in a variety of ways.

    Extras - look at conservation of structural features of proteins. Open two windows side by side and compare structures of:
    6. Human and E. coli GTP cyclohydrolases
    7. Rat and C. elegans Pyruvoyl Tetrahydropterin Synthases
    8. Mouse and Chlorobium (bacteria) sepiapterin reductases
    9. Various aromatic amino acid hydroxylases
    10. Ubx homeodomain and Lambda Cro helix-turn-helix proteins bound to DNA
    11. A bHLH protein and a bZIP protein (e.g., GCN4, or the Fos-Jun heterodimer) bound to DNA
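The PDB file behind each structure is itself plain text, so a chain's amino acid sequence can be read straight from it, for example to compare a structure with your alignment. The sketch below assumes the fixed-column PDB format and reads only the alpha-carbon (`CA`) lines of `ATOM` records. Residues stored as `HETATM` records, such as modified amino acids, are skipped. The function name is hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C", "GLN": "Q", "GLU": "E",
    "GLY": "G", "HIS": "H", "ILE": "I", "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F",
    "PRO": "P", "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text: str) -> dict[str, str]:
    """Map each chain ID to the one-letter sequence of its CA atoms, in file order."""
    sequences: dict[str, list[str]] = {}
    for line in pdb_text.splitlines():
        # PDB ATOM columns as 0-based slices: atom name [12:16], altLoc [16],
        # residue name [17:20], chain ID [21].
        if len(line) < 22 or not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location of a disordered residue
        residue = THREE_TO_ONE.get(line[17:20], "X")  # X = unrecognized residue name
        sequences.setdefault(line[21], []).append(residue)
    return {chain: "".join(residues) for chain, residues in sequences.items()}
```

The result can be saved as FASTA and aligned with the sequence you retrieved earlier, to see which parts of the protein were resolved in the structure.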

    © 2013 Curtis Loer, Dept of Biology, USD. All rights reserved. Simple, hand-coded pages.