USD - Biology 482 - Sequence Analysis

Sequence Analysis, Biology 482, Spring 2006

Biology 482 - Molecular Biology

Computer-based Genomic Sequence Analysis

Links for Week 2

Basic Cloning Strategies illustrations

Reporter Fusion Cloning Strategies illustrations

Directory of C. remanei genomic sequences - Cr_seq - find your assigned genomic sequences.

FGENESH - gene prediction program, in case you still need it.

WWWtacg - the web-based version of the restriction analysis program 'tacg' by Harry Mangalam (thanks, Harry!)

Use a site from the top row if available: WWWtacg v3 - Univ of Victoria, BC WWWtacg v3 - Univ Mass Medical School

Use a bottom-row site if the others aren't working: WWWtacg v4.1 - UC Irvine WWWtacg v4.1 - Cal State Fullerton

Restriction Site File for Analyses - ~RE2128MCS_Comp - Download to your local computer
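A restriction-mapping program like tacg reports where each enzyme's recognition site falls and whether the cut leaves blunt or sticky ends. The core of that can be sketched in a few lines of Python. This is a minimal illustration, not tacg itself: the small enzyme table is a hand-picked assumption (tacg reads its enzymes from a file such as ~RE2128MCS_Comp, whose format is not reproduced here), and every site in it is palindromic, so scanning one strand is enough.

```python
# Hypothetical enzyme table (illustrative subset, not tacg's enzyme file).
# Each entry: (recognition site 5'->3', top-strand cut offset from site start).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC  -> 5' overhang (sticky)
    "BamHI": ("GGATCC", 1),    # G^GATCC  -> 5' overhang (sticky)
    "HindIII": ("AAGCTT", 1),  # A^AGCTT  -> 5' overhang (sticky)
    "SmaI": ("CCCGGG", 3),     # CCC^GGG  -> blunt
    "EcoRV": ("GATATC", 3),    # GAT^ATC  -> blunt
}


def find_sites(seq, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of its recognition site]}.

    Every site in the table is palindromic, so scanning the top strand also
    catches each site on the bottom strand.
    """
    seq = seq.upper()
    hits = {}
    for name, (site, _cut) in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)          # convert to 1-based numbering
            start = seq.find(site, start + 1)    # keep scanning past this hit
        hits[name] = positions
    return hits


def is_blunt(site, cut):
    """For a palindromic site the bottom strand is cut at len(site) - cut,
    so the ends are blunt only when the two cuts line up."""
    return cut == len(site) - cut


if __name__ == "__main__":
    demo = "TTGAATTCAACCCGGGTT"   # made-up test sequence
    for name, positions in find_sites(demo).items():
        if positions:
            kind = "blunt" if is_blunt(*ENZYMES[name]) else "sticky"
            print(name, positions, kind)
```

On the made-up sequence above, this reports EcoRI at position 3 (sticky) and SmaI at position 11 (blunt), which is the kind of information you need when looking for blunt sites in your genomic sequence.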

Examples of Cloning Strategy Reports

These are both acceptable forms; essentially the same information is conveyed:

cb089c8.1_gene5 Strategy

F32G8.6 Strategy

Location of blunt sites in F32G8.6 genomic sequence

Extra credit "pretty" versions of genes:

cb089c8.1_gene5

F32G8.6

Links and Instructions for Week 1

Required reading:

Similarity Search page - Introduction to the concepts and vocabulary of sequence similarity searching
BLAST Query tutorial from NCBI.

Exercise 1 - Manual translation of cDNA sequence in 3 frames

Using the genetic code printed on the back of your sequence analysis lab handout, translate the DNA sequence on your handout, and determine the reading frame which contains an ORF (open reading frame). You will turn in your manual translation with the rest of the week 1 materials when they are due.

Exercise 2 - Internet retrieval of DNA sequence from Genbank

Entrez from NCBI, access to integrated biomedical/molecular databases, including Genbank.

Once you reach the Entrez page, click on the word "nucleotide." Enter your accession number (see individual assignments) in the text box and click GO. At the next window, change the selection next to Display from Summary to FASTA, then click Display. This will retrieve the sequence in FASTA format.
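A FASTA record is a single header line beginning with ">" followed by one or more lines of sequence. If you save the retrieved text to a file, a minimal Python sketch like the one below splits it into header and sequence pairs. The function name parse_fasta is hypothetical and not part of any NCBI tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    A record starts with a '>' header line; the lines that follow, up to the
    next '>', are joined (with whitespace stripped) to form the sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                                  # ignore blank lines
        if line.startswith(">"):
            if header is not None:                    # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)                       # sequence line
    if header is not None:                            # close the last record
        records.append((header, "".join(chunks)))
    return records
```

For example, parse_fasta(open("my_sequence.fasta").read()) (the file name is only an example) returns one tuple per sequence in the file.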

Exercise 3 - Computer translation of DNA sequence to amino acid sequence.

EXPASY DNA to Protein translation tool - Allows 3 different output formats for your translation: "Verbose," "Compact," or including the nucleotide sequence.

1. Paste your nucleotide sequence into the window.
2. Select an output format, and hit the TRANSLATE button.
3. Examine each of the 3 different output forms - which reading frame has the ORF?
4. Confirm your earlier manual translation: enter the DNA sequence from your handout into the "DNA to Protein translation tool" and check the result against your hand translation.
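What the translation tool does can be sketched in Python: read the sequence codon by codon starting at the first, second, or third nucleotide, then look for the longest stretch that begins with Met and runs to a stop codon. This is a simplified sketch that assumes the standard genetic code and only the three forward frames used in this exercise; offsets 0, 1 and 2 correspond to frames 1, 2 and 3, and the function names are illustrative.

```python
# Standard genetic code, codons ordered with bases T, C, A, G
# (first base slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna, frame=0):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2).

    Codons with unknown bases (e.g. N) become 'X'; stop codons become '*'.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(dna):
    """Return (frame offset, protein) for the longest Met-initiated stretch.

    Each stretch runs from the first M after a stop to the next stop codon
    (or to the end of the sequence if no stop follows).
    """
    best_frame, best = None, ""
    for frame in range(3):
        for segment in translate(dna, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best_frame, best = frame, segment[start:]
    return best_frame, best


if __name__ == "__main__":
    seq = "CCATGGCTTGGTAAGG"   # made-up example sequence
    for frame in range(3):
        print("Frame", frame + 1, translate(seq, frame))
    print(longest_orf(seq))
```

On this made-up sequence, frames 1 and 2 contain no Met, while frame 3 reads ATG GCT TGG TAA, so longest_orf returns (2, "MAW").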

Exercise 4. Computer alignment of two amino acid sequences.

Use translations of genes retrieved in exercise 2. [Note that you are provided with accession numbers for the corresponding amino acid sequences.]

SIM - Alignment Tool for protein sequences at EXPASY

1. Select the button: User-entered sequence for each sequence window.
2. Give each sequence a short name
3. Paste each of your two protein sequences into the two windows.
4. Change the number of alignments to be computed to 1; leave the other default settings unchanged.
5. Click SUBMIT button.
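SIM finds local alignments, meaning the best-matching segments of the two proteins rather than an end-to-end alignment. The underlying idea can be illustrated with the Smith-Waterman dynamic-programming recurrence, sketched below with deliberately simple scores (+2 for identical residues, -1 for a mismatch, -2 per gap position). These scores are assumptions for illustration only. Real protein aligners usually score with an amino acid substitution matrix (such as BLOSUM62) and their own gap penalties, so SIM's output will differ.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of sequences a and b (linear gap penalty).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0: a local alignment can start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:          # aligned residue pair
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:          # gap in b
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:                                       # gap in a
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, smith_waterman("MKVL", "MKAVL") returns (6, "MK-VL", "MKAVL"): four identities (+8) minus one gap (-2).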

Exercise 5 - Access the Protein Structure Database, View Structures

1. Go to ENTREZ - Structure
2. Search with one of the following numbers:

6CRO - Lambda Cro protein monomer bound to Operator DNA
1LMB - Lambda Repressor protein dimer bound to Operator DNA
1B8I - Ubx homeodomain protein bound to DNA
1MYD - MyoD basic helix-loop-helix (bHLH) protein bound to DNA
1AXC - GCN4 basic leucine zipper (bZIP) protein bound to DNA
1R40 - Glucocorticoid receptor protein bound to DNA
1AAY - ZIF268 Zinc finger protein bound to DNA

3. Click on the entry, which will take you to an MMDB Structure Summary.

New Method using Cn3D

The program Cn3D should be found on your desktop.

4. Download a few MMDB files for use with Cn3D from those I have already stored locally at MMDB files.
5. After downloading a file to the desktop, open one by double-clicking or dragging and dropping on the Cn3D icon.
6. Rotate the molecule by clicking and dragging in the window. Under the "Styles" menu, you can alter the look using the "Rendering Shortcuts" and/or "Coloring shortcuts".
7. Examine the structures of some of the molecules we have discussed in class.

Old method using RasMol (or similar)
4. Next to "Reference:" click on the link "PDB:"
5. Under "Summary Information" column at left, click "Download/Display File"
6. Under "Download the Structure File," click on "X" under Compression - None, and file format - PDB. Save the file to the desktop.
7. Under the Netscape file menu, select "Open file" and select the .pdb file on the desktop.
8. Once the structure is loaded (with default 'wireframe' representation), click and hold in the window, and select under "Display" - "Ribbons."
9. Have fun looking at the molecule. Click and drag in the window to rotate the molecule. Hold down the Shift key while clicking and dragging to zoom in or out on the molecule.
10. To see more features of the protein, try some of the different coloring schemes (under "Color"):

Structure - shows alpha helices and beta sheets in different colors
Chain - will mark individual subunits with different colors (not exciting for single polypeptide)
Group - gives different colors to different regions of the protein
Amino acid - creates a multicolored polypeptide chain with a different color representing each amino acid
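The "Chain" coloring distinguishes the separate chains recorded in the .pdb file you saved in step 6. As a rough sketch of where that information lives, the Python function below (a hypothetical helper, not part of RasMol or Cn3D) reads the fixed-column ATOM records and rebuilds a one-letter sequence for each protein chain from its alpha-carbon (CA) atoms. DNA strands have no CA atoms, so they drop out; only the first model of a multi-model file is read, and alternate atom locations other than "A" are skipped.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} from the CA atoms of ATOM records.

    Uses the PDB fixed columns (0-based slices): atom name [12:16],
    alternate location [16], residue name [17:20], chain ID [21].
    """
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                                   # keep only the first model
        if not line.startswith("ATOM") or len(line) < 22:
            continue                                # skips HETATM (water, ligands) too
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue                                # one alpha carbon per residue
        residue = line[17:20].strip()
        chains.setdefault(line[21], []).append(THREE_TO_ONE.get(residue, "X"))
    return {chain: "".join(residues) for chain, residues in chains.items()}
```

Passing it the text of a saved structure file gives one entry per protein chain, which should match the number of differently colored polypeptides you see with the "Chain" scheme.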

Extras
11. Compare structures of Lambda Cro and Lambda Repressor proteins bound to DNA
12. Compare structures of Ubx homeodomain and Lambda Cro helix-turn-helix proteins bound to DNA
13. Compare structures of a bHLH protein (e.g., MyoD) and a bZIP protein (e.g., GCN4 or the Fos-Jun heterodimer) bound to DNA

MMDB files to use with Cn3D

Exercises 6 and 7 - Searching for genes among raw genomic DNA sequence using a 'genefinder' and BLAST.

Directory of C. remanei genomic sequences - find your assigned genomic sequences.

Exercise 6

FGENESH - a gene prediction program

1. Paste your assigned genomic sequence (from the Cr_seq directory) into the sequence window.
2. Select "C. elegans" for Organism.
3. Enter the following into the "advanced options" box:
-pmrna -pexons
These options also show the predicted mRNA and each individual exon for each predicted gene in your sequence.
4. Click the Search button. Save the results file to use in next week's computer lab: you will use the complete predicted mRNA sequences in exercise 7A below, and the predicted amino acid sequences in exercise 7B below.

Exercise 7 - Analyze each of your predicted genes from the FGENESH results

A. Determine whether each of your predicted genes has an associated cDNA or EST. To qualify, the match should have 100% (or nearly 100%) identity over an extended region and a very low E value.

1. Go to BLASTN at NCBI.

2. Paste in the nucleotide sequence from one of your predicted genes (found in the second FGENESH file you created).

3. Alter the settings in the following manner:
a. Under "Choose Database", select "est"
b. Under "Options" in the "Limit by entrez query" text window, type "C. remanei" (no quotes)
c. Under "Format", for both "Descriptions" and "Alignments" select "10".

Click the Search button.
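The criterion in part A (identity at or near 100% over an extended region, with a very low E value) can be written as a simple filter. This sketch assumes the hits have been saved in BLAST's standard 12-column tab-separated layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), as produced for example by the BLAST+ option -outfmt 6. The cutoffs of 98% identity, 100 aligned bases and an E value of at most 1e-20 are illustrative assumptions, not course requirements, so judge borderline hits by eye.

```python
# Standard 12-column BLAST tabular layout (e.g. BLAST+ -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_hits(text):
    """Parse tab-separated BLAST hits into dicts, skipping blank and '#' lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])     # handles "0.0" and "3e-10" alike
        hits.append(row)
    return hits


def likely_est_matches(hits, min_identity=98.0, min_length=100, max_evalue=1e-20):
    """Keep hits that are (nearly) identical over an extended region with a tiny E value.

    The default cutoffs are illustrative assumptions, not fixed rules.
    """
    return [h for h in hits
            if h["pident"] >= min_identity
            and h["length"] >= min_length
            and h["evalue"] <= max_evalue]
```

A hit at 100% identity over 450 bases with E = 0.0 passes this filter; one at 85% identity over 120 bases with E = 3e-10 does not.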

B. Determine whether your predicted proteins have matches in the protein database

1. Go to the NCBI BLAST Server

2. Under Protein, click on Protein-Protein BLAST [blastp].
3. Paste each of the amino acid sequences from your predicted genes from your FGENESH analysis, and do the BLAST search.

Note also if there are any Conserved Domains found in your search.

See your handout for additional instructions.

Miscellaneous resources for getting information on your predicted genes:

WormBase - C. elegans whole organismal database

Kyoto Encyclopedia of Genes and Genomes (KEGG) - extensive database of metabolic pathways and gene function information.

PubMed - Biomedical literature search utility at NCBI

GoogleScholar - literature search through Google

Google

Example of further analysis of Bm3859 predicted genes

Gene	EST?	C. elegans homolog	Conserved Domains	Type of protein encoded
Gene 1	no	cat-1 gene/AAG00026	KOG6734	Vesicular monoamine transporter
Gene 2	no	none	none	no significant matches
Gene 3	no	none	KOG0293/WD40	WD40 repeat protein
Gene 4	no	ZK550.3	KOG2089	Oligopeptidase
Gene 5	yes	msh-1/H26D21.2	KOG0219	Mismatch repair protein MutS family
