Sequence Analysis, Biology 482, Spring 2007

Biology 482 - Molecular Biology

Computer-based Genomic Sequence Analysis

Directory of C. remanei genomic sequences - Cr_seq - find your assigned genomic sequences.

Links for Week 2

Basic Cloning Strategies illustrations

Reporter Fusion Cloning Strategies illustrations

Directory of C. remanei genomic sequences - Cr_seq - find your assigned genomic sequences.

FGENESH - gene prediction program, in case you still need it.

WWWtacg - the web-based version of the restriction analysis program 'tacg' by Harry Mangalam (thanks, Harry!)
Use a site from the top row if available: WWWtacg v3 - Univ of Victoria, BC; WWWtacg v3 - Univ Mass Medical School
Use a bottom-row site if the others aren't working: WWWtacg v4.1 - UC Irvine; WWWtacg v4.1 - Cal State Fullerton

Restriction Site File for Analyses - ~RE2128MCS_Comp - Download to your local computer



Examples of Cloning Strategy Reports

These are both acceptable forms; essentially the same information is conveyed:

cb089c8.1_gene5 Strategy

F32G8.6 Strategy

Location of blunt sites in F32G8.6 genomic sequence

Extra credit "pretty" versions of genes:

cb089c8.1_gene5

F32G8.6


Links and Instructions for Week 1

Required reading:

Exercise 1 - Manual translation of cDNA sequence in 3 frames

Using the genetic code printed on the back of your sequence analysis lab handout, translate the DNA sequence on your handout, and determine the reading frame which contains an ORF (open reading frame). You will turn in your manual translation with the rest of the week 1 materials when they are due.
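As a way to double-check the manual work, the three-frame translation can be sketched in Python. This is only a minimal illustration: it assumes the standard genetic code, the function names are made up for this example, and it picks the frame with the longest stretch free of stop codons as a stand-in for "the frame that contains an ORF". The demo sequence is invented, not one of the handout sequences.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1, or 2).

    Unknown codons (for example, ones containing N) are shown as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein):
    """Length of the longest run of residues not interrupted by a stop."""
    return max(len(segment) for segment in protein.split("*"))


def best_frame(dna):
    """Return the offset (0, 1, or 2) whose translation has the longest open stretch."""
    frames = [translate(dna, f) for f in range(3)]
    return max(range(3), key=lambda f: longest_open_stretch(frames[f]))


if __name__ == "__main__":
    demo = "GATGCTAACTGATAGC"  # invented sequence for illustration only
    for f in range(3):
        print(f"frame {f + 1}: {translate(demo, f)}")
    print("frame with the longest open stretch:", best_frame(demo) + 1)
```

Note that a full ORF search would also look for a start codon and consider the reverse strand; the sketch skips both because the exercise asks only for three forward frames.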


Exercise 2 - Internet retrieval of DNA sequence from Genbank

  • Entrez from NCBI, access to integrated biomedical/molecular databases, including Genbank.

  • Once you reach the Entrez page, click on the word "nucleotide." Enter your accession number (see individual assignments) in the text box and click GO. At the next window, change the selection next to Display from Summary to FASTA, then click Display. This will retrieve the sequence in FASTA format.
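The same retrieval can also be scripted. The sketch below is an assumption-laden alternative to the web form, not part of the assignment: it uses NCBI's E-utilities `efetch` endpoint (not mentioned on this page) with the `nuccore` database, and the helper names and the placeholder accession are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records (an assumption of this sketch).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore"):
    """Build the URL that asks for one nucleotide record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    sequence_lines = []
    for line in lines[1:]:
        if line.startswith(">"):  # a second record begins here; stop
            break
        sequence_lines.append(line)
    return lines[0][1:], "".join(sequence_lines)


def fetch_fasta(accession):
    """Download a record and parse it (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "YOUR_ACCESSION" is a placeholder; use the number from your assignment.
    print(efetch_url("YOUR_ACCESSION"))
```

The first FASTA line (after `>`) is the description; everything after it, with line breaks removed, is the sequence you paste into the later exercises.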

Exercise 3 - Computer translation of DNA sequence to amino acid sequence

  • EXPASY DNA to Protein translation tool - Allows 3 different output formats for your translation: "Verbose," "Compact," or including the nucleotide sequence.

    1. Paste your nucleotide sequence into the window.
    2. Select an output format, and hit the TRANSLATE button.
    3. Examine each of the 3 different output forms, then print the FIRST PAGE ONLY from the final ("with nucleotide seq") format.

    4. Check your earlier manual translation using your DNA sequence.
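For the check in step 4, comparing two long strings of one-letter codes by eye is error-prone. A small helper like the one below (a hypothetical sketch; the function name and the example strings are invented) lists the positions where a manual translation and a computer translation disagree.

```python
def translation_mismatches(manual, computed):
    """Compare two one-letter protein strings.

    Returns (differences, length_difference), where differences is a list of
    (position, manual_residue, computed_residue) with 1-based positions, and
    length_difference is len(manual) - len(computed) after whitespace removal.
    """
    manual = "".join(manual.split()).upper()
    computed = "".join(computed.split()).upper()
    differences = [
        (index + 1, m, c)
        for index, (m, c) in enumerate(zip(manual, computed))
        if m != c
    ]
    return differences, len(manual) - len(computed)


if __name__ == "__main__":
    # Invented strings for illustration only.
    diffs, extra = translation_mismatches("MAKV*", "MAKL")
    print(diffs, extra)
```

A run of mismatches that starts at one position and continues to the end usually points to a skipped or repeated base in the manual work rather than a single misread codon.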


    Exercise 4 - Computer alignment of two amino acid sequences

    Use translations of genes retrieved in exercise 2. [Note that you are provided with accession numbers for the corresponding amino acid sequences.]

  • SIM - Alignment Tool for protein sequences at EXPASY

    1. Select the button: User-entered sequence for each sequence window.
    2. Give each sequence a short name.
    3. Paste each of your two protein sequences into the two windows.
    4. Change the number of alignments to be computed to 1; leave the other default settings unchanged.
    5. Click the SUBMIT button.
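To get a feel for what an alignment tool computes, here is a minimal Smith-Waterman local alignment in Python. It is a teaching sketch only and is not SIM's implementation: the scoring values (match +2, mismatch -1, gap -2, with a linear gap penalty) are assumptions chosen for simplicity, whereas real protein alignment tools use a substitution matrix and separate gap-open and gap-extend penalties.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one best-scoring local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Invented peptide fragments for illustration only.
    score, top, bottom = local_align("GGMKTAWWW", "MKTA")
    print(score)
    print(top)
    print(bottom)
```

Asking for a single alignment, as in step 4, corresponds to reporting only the best-scoring region rather than several alternative regions.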


    Exercise 5 - Access the Protein Structure Database, View Structures

    1. Go to ENTREZ - Structure
    2. Search with one of the following numbers:

    3. Click on the entry, which will take you to a MMDB Structure Summary
    4. View structures using FirstGlance in Jmol
    5. Enter a PDB file number and wait for the image to load. Then you can view it in a variety of ways.

    Extras
    6. Compare structures of Lambda Cro and Lambda Repressor proteins bound to DNA
    7. Compare structures of Ubx homeodomain and Lambda Cro helix-turn-helix proteins bound to DNA
    8. Compare structures of a bHLH protein (e.g., GCN) and a bZIP protein (e.g., Fos-Jun heterodimer) bound to DNA

    PDB files of noteworthy molecules


    Exercises 6 and 7 - Searching for genes among raw genomic DNA sequence using a 'genefinder' and BLAST.

    Directory of C. remanei genomic sequences - find your assigned genomic sequences.

    Exercise 6

    FGENESH - a gene prediction program

    1. Paste your assigned genomic sequence into the sequence window (or browse the appropriate file if you have downloaded it).

    2. Select "C. elegans" for Organism.

    3. Select the following "advanced options" from the list below:
    - print mRNA sequences for predicted genes.
    - print exon sequences for predicted genes.

    4. Click the "Search" button.

    5. Save the resulting file to use in next week's computer lab as well. Besides the basic information on the predicted genes found in your genomic sequence, you will use the predicted mRNA sequences in exercise 7A below, and the predicted protein sequences in exercise 7B below, for each of your predicted genes.


    Exercise 7 - Analyze each of your predicted genes from the FGENESH results

    A. Determine whether each of your predicted genes has an associated cDNA or EST. To qualify, the sequence should have 100% identity (or nearly) over an extended region and a very low E value.

    1. Go to the NCBI BLAST Server

    2. Under Basic BLAST, click on nucleotide blast.

    3. Paste your nucleotide sequence into the large window under "Enter Query Sequence"

    4. Alter the settings under "Choose Search Set" in the following manner:
    a. For "Database", click "Others"; then on the pull-down menu select "Expressed sequence tags (est)"
    b. For "Organism", type "Caenorhabditis" (no quotes) into the text window.

    5. Select the "Show results in a new window" checkbox next to the BLAST button.

    6. Click "Algorithm parameters".
    For "Max target sequences", select "10".

    7. Click the BLAST button.
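The qualifying rule in part A (near-100% identity over an extended region, with a very low E value) can be written down as an explicit filter. The sketch below is illustrative only: the `Hit` record, the function names, the example hits, and especially the cutoffs (98% identity, 100 aligned positions, E value of 1e-20) are assumptions made for this example, not thresholds given in the assignment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit, reduced to the fields needed for the decision."""
    subject_id: str
    identities: int  # number of identical positions in the alignment
    align_len: int   # total alignment length
    evalue: float


def percent_identity(hit):
    """Identity as a percentage of the alignment length."""
    return 100.0 * hit.identities / hit.align_len


def supports_est(hit, min_identity=98.0, min_length=100, max_evalue=1e-20):
    """True if the hit looks like a matching cDNA/EST under the assumed cutoffs."""
    return (
        hit.align_len >= min_length
        and percent_identity(hit) >= min_identity
        and hit.evalue <= max_evalue
    )


if __name__ == "__main__":
    # Invented hits for illustration only.
    hits = [
        Hit("est_A", 298, 300, 1e-150),  # long and nearly identical
        Hit("est_B", 40, 42, 2e-10),     # high identity but too short
        Hit("est_C", 250, 300, 1e-60),   # long but only about 83% identical
    ]
    for hit in hits:
        print(hit.subject_id, round(percent_identity(hit), 1), supports_est(hit))
```

Hits like `est_C` are worth noting even though they fail the filter: a long, moderately similar match to an EST from another Caenorhabditis species suggests a homolog rather than a transcript of your predicted gene.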

    B. Determine whether your predicted proteins have matches in the protein database.

    1. Go to the NCBI BLAST Server

    2. Under Basic BLAST, click on protein blast (blastp).

    3. Paste each predicted amino acid sequence from your FGENESH analysis into the large window under "Enter Query Sequence"

    4. All the other parameters can be left at their default settings.

    5. Select the "Show results in a new window" checkbox, then click the BLAST button.

    Note also whether any Conserved Domains are found in your search. (This information appears in the 'Format' window immediately after you start the search.)

    Miscellaneous resources for getting information on your predicted genes:

  • WormBase - C. elegans whole organismal database

  • Kyoto Encyclopedia of Genes and Genomes (KEGG) - extensive database of metabolic pathways and gene function information.

    Example of further analysis of Cr0062a.seq predicted genes
    Gene    EST?  C. elegans homolog   Conserved Domains  Type of protein encoded
    Gene 1  no    cat-1 gene/AAG00026  KOG6734            Vesicular monoamine transporter
    Gene 2  no    none                 none               no significant matches
    Gene 3  no    none                 KOG0293/WD40       WD40 repeat protein
    Gene 4  no    ZK550.3              KOG2089            Oligopeptidase
    Gene 5  yes   msh-1/H26D21.2       KOG0219            Mismatch repair protein MutS family