Sequence Analysis - Biology 382

Biology 382 - Molecular Biology

Computer-based Sequence Analysis / Bioinformatics

Required BLAST readings are on the links page.


Links & Instructions for Week 2 Bioinformatics


Exercises 7 and 8 - Searching for genes among raw genomic DNA sequence using a 'genefinder' and BLAST.

Directory of C. remanei genomic sequences - find your assigned genomic sequences.

Exercise 7

FGENESH - a gene prediction program

1. Paste your assigned genomic sequence into the sequence window (or browse the appropriate file if you have downloaded it).

2. Select "C. elegans" for Organism.

3. Select the following "advanced options" from the list below:
- print mRNA sequences for predicted genes.
4. Click the "Search" button.

5. Save the resulting file; you will also use it in a later computer analysis lab. Besides the basic information on the predicted genes found in your genomic sequence, you will use the predicted mRNA sequences in exercise 8A below, and the predicted protein sequences in exercise 8C below, for each of your predicted genes.
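If you would rather pull the predicted sequences out of the saved file with a script than copy them by hand, a minimal Python sketch follows. It assumes the predicted mRNA and protein sequences appear in the saved file as FASTA-style records (a header line starting with ">" followed by sequence lines); the header names in the example are hypothetical, so check them against your own FGENESH output.

```python
def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its sequence.

    Lines that appear before the first '>' header (for example, a table of
    predicted exons) are ignored.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical example; real FGENESH headers will look different.
example = """some table text before the sequences
>predicted_mRNA_1
ATGGCT
TTAA
>predicted_protein_1
MA
"""
sequences = parse_fasta(example)
```

Each value in `sequences` can then be pasted into the BLAST query window as a single sequence.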

Guide to interpreting the FGENESH results

More FGENESH help/info


Exercise 8 - Analyze each of your predicted genes from the FGENESH results

A. Determine whether each of your predicted genes has an associated cDNA or EST. To qualify, the sequence should have 100% identity (or nearly) over an extended region and a very low E value.
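The qualifying rule above can be written down as a simple filter. This Python sketch is only an illustration: the `Hit` record and the cutoff values (98% identity, 100 aligned bases, E value of 1e-20) are assumptions chosen for the example, not thresholds set by the course.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit, with the values read off the results page."""
    accession: str
    percent_identity: float
    alignment_length: int
    e_value: float


def supports_prediction(hit, min_identity=98.0, min_length=100, max_e_value=1e-20):
    """True if the hit looks like a cDNA/EST from the predicted gene itself.

    All three cutoffs are illustrative assumptions; adjust them as needed.
    """
    return (
        hit.percent_identity >= min_identity
        and hit.alignment_length >= min_length
        and hit.e_value <= max_e_value
    )


# Hypothetical hits, for illustration only.
good = Hit("EST_A", 99.8, 450, 1e-150)
short = Hit("EST_B", 100.0, 25, 0.5)
diverged = Hit("EST_C", 82.0, 400, 1e-40)
```

Here only `good` passes: `short` is identical but over too small a region, and `diverged` is long but looks like a related gene rather than the same one.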

1. Go to the NCBI BLAST Server

2. Under Basic BLAST, click on nucleotide blast.

3. Paste your nucleotide sequence into the large window under "Enter Query Sequence"

4. Alter the settings under "Choose Search Set" in the following manner:
a. For "Database", click "Others"; then on the pull-down menu select "Expressed sequence tags (est)"
b. For "Organism" type "Caenorhabditis remanei" (no quotes) into the text window.

5. Select the "Show results in a new window" checkbox next to the BLAST button.

6. Click "Algorithm parameters"
For "Max target sequences" select "10"

7. Click the BLAST button.
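For reference, the settings chosen in steps 2 through 6 correspond roughly to the following request parameters. This Python sketch only builds the form data and does not contact NCBI. The parameter names follow the NCBI BLAST URL API as I understand it, and the database name "est" is an assumption that may not match what the server currently offers, so treat it as a record of your settings rather than a tested submission script.

```python
from urllib.parse import urlencode


def build_est_search(query_sequence):
    """Return URL-encoded form data mirroring the web-form settings above."""
    params = {
        "CMD": "Put",                      # submit a new search
        "PROGRAM": "blastn",               # step 2: nucleotide blast
        "DATABASE": "est",                 # step 4a: assumed database name
        "ENTREZ_QUERY": "Caenorhabditis remanei[Organism]",  # step 4b
        "HITLIST_SIZE": 10,                # step 6: max target sequences
        "QUERY": query_sequence,           # step 3: your predicted mRNA
    }
    return urlencode(params)


# Hypothetical short query, for illustration only.
form_data = build_est_search("ATGGCTTTAA")
```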

B. Search for possible genes NOT predicted by your genefinder (FGENESH) using BLASTX and/or TBLASTX

A first look might use blastx

1. From the NCBI BLAST Server (step A.1 above), under Basic BLAST, click on blastx

2. Paste your assigned genomic sequence into the sequence window, and click the BLAST button (we will use the default parameters).

See the printed handout for additional information about interpreting your results from B. & C. and doing further analyses.

A more computation-intensive approach, which can find matches with previously unidentified (never predicted) proteins, would use tblastx
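Both programs work from conceptual translations of your genomic sequence in all six reading frames; tblastx also translates the nucleotide database the same way, which is a large part of why it takes longer. The Python sketch below shows what a six-frame translation is, using the standard genetic code. It illustrates the idea only and is not how BLAST is implemented.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; '*' is a stop, 'X' an unrecognized codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCTTAA")
```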

Note the locations of homologies detected relative to proteins predicted by FGENESH - how well do they match up? Did FGENESH miss predictions of some regions detected by BLASTX or TBLASTX? Are there any regions detected with TBLASTX that were not detected with BLASTX? Make a note of these regions, if found.
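One way to make this comparison systematic is to write down the coordinates of each BLAST match and each predicted gene, then check which matches overlap no prediction. In this Python sketch, the function names and the coordinates in the example are hypothetical; positions are assumed to be 1-based (start, end) pairs on the genomic sequence with start less than or equal to end.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def unpredicted_regions(blast_regions, predicted_genes):
    """Return the BLAST-detected regions that overlap no predicted gene."""
    return [
        region
        for region in blast_regions
        if not any(overlaps(region, gene) for gene in predicted_genes)
    ]


# Hypothetical coordinates, for illustration only.
fgenesh_genes = [(1200, 3400), (5100, 7800)]
blastx_regions = [(1500, 2000), (4000, 4300), (7700, 8100)]
missed = unpredicted_regions(blastx_regions, fgenesh_genes)
```

In this example only the match at 4000-4300 falls outside every predicted gene. The same function can compare the two searches with each other, for instance by passing the TBLASTX regions first and the BLASTX regions second.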

This method compares everything at once, and can be a bit overwhelming initially. To focus on individual gene/protein predictions, proceed to the next step.

C. Determine whether proteins predicted by FGENESH have matches in the protein database.

1. Go to the NCBI BLAST Server

2. Under Basic BLAST, click on protein blast (blastp).

3. Paste each predicted amino acid sequence from your FGENESH analysis into the large window under "Enter Query Sequence"
4. All the other parameters can be left at their default settings.

5. Select the "Show results in a new window" checkbox, then click the BLAST button.

Note also if there are any Conserved Domains found in your search. (This information will come up immediately after initiating the search in the 'Format' window.)


Miscellaneous resources for getting information on your predicted genes:
  • WormBase - C. elegans whole organismal database

  • Kyoto Encyclopedia of Genes and Genomes (KEGG) - extensive database of metabolic pathways and gene function information.

    Example of further analysis of Cr0062a.seq predicted genes
    Gene    EST?  C. elegans homolog   Conserved Domains  Type of protein encoded
    Gene 1  no    cat-1 gene/AAG00026  KOG6734            Vesicular monoamine transporter
    Gene 2  no    none                 none               no significant matches
    Gene 3  no    none                 KOG0293/WD40       WD40 repeat protein
    Gene 4  no    ZK550.3              KOG2089            Oligopeptidase
    Gene 5  yes   msh-1/H26D21.2       KOG0219            Mismatch repair protein MutS family


    © 2013 Curtis Loer, Dept of Biology, USD. All rights reserved. Simple, hand-coded pages.