Directory of C. remanei genomic sequences - find your assigned genomic sequences.
Exercise 7
FGENESH - a gene prediction program
1. Paste your assigned genomic sequence into the sequence window (or browse the appropriate file if you have downloaded it).
2. Select "C. elegans" for Organism.
3. Select the following "advanced options" from the list below:
- print mRNA sequences for predicted genes.
4. Click the "Search" button.
5. Save the resulting file; you will also use it in a later computer analysis lab. Besides the basic information on the predicted genes found in your genomic sequence, you will use the predicted mRNA sequences in exercise 7A below, and the predicted protein sequences in exercise 7C below, for each of your predicted genes.
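If you would like to pull the individual predicted sequences out of your saved file before pasting them into BLAST, the following is a minimal Python sketch. It assumes the sequences in your saved file appear in FASTA style (a header line starting with ">" followed by sequence lines) and that nothing but sequence follows each header; check your own file first. The function name and the example headers are made up for illustration and are not part of FGENESH.

```python
def read_fasta_records(text):
    """Split FASTA-style text into (header, sequence) pairs.

    Lines that appear before the first '>' header (for example, a table of
    predicted exons) are skipped.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example input; the headers in your file will look different.
example = """>gene1 mRNA
ATGGCC
TTTTAA
>gene2 mRNA
ATGAAATAG
"""

for name, seq in read_fasta_records(example):
    print(name, len(seq))
```

Each (header, sequence) pair can then be pasted into BLAST one at a time, which is what exercises 7A and 7C ask for.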
Guide to interpreting the FGENESH results
A. Determine whether each of your predicted genes has an associated cDNA or EST. To qualify, the matching sequence should have 100% (or nearly 100%) identity over an extended region and a very low E value.
1. Go to the NCBI BLAST Server
2. Under Basic BLAST, click on nucleotide blast.
3. Paste your nucleotide sequence into the large window under "Enter Query Sequence"
4. Alter the settings under "Choose Search Set" in the following manner:
a. For "Database", click "Others"; then on the pull-down menu select "Expressed sequence tags (est)"
b. For "Organism" type "Caenorhabditis remanei" (no quotes) into the text window.
5. Select the "Show results in a new window" checkbox next to the BLAST button.
6. Click "Algorithm parameters"
For "Max target sequences" select "10"
7. Click the BLAST button.
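If you download your BLAST hits as a tab-separated hit table, the criteria in A. (near-100% identity over an extended region, very low E value) can be checked with a short Python sketch like the one below. It assumes the standard 12-column tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E value, bit score). The cutoffs (98% identity, 100 bases, E value of 1e-20) are illustrative assumptions only, not values set by this exercise, and the function names and EST names are hypothetical.

```python
def parse_hits(table_text):
    """Read tab-separated BLAST hits, skipping blank and '#' comment lines."""
    hits = []
    for line in table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "subject": fields[1],
            "identity": float(fields[2]),   # percent identity
            "length": int(fields[3]),       # alignment length
            "evalue": float(fields[10]),
        })
    return hits


def est_supported(hits, min_identity=98.0, min_length=100, max_evalue=1e-20):
    """Keep hits that look like a real EST match (thresholds are assumptions)."""
    return [
        h for h in hits
        if h["identity"] >= min_identity
        and h["length"] >= min_length
        and h["evalue"] <= max_evalue
    ]


# Hypothetical hit table: one convincing EST match and one weak, short match.
rows = [
    ["gene1", "EST_A", "99.5", "420", "2", "0", "1", "420", "15", "434", "1e-150", "750"],
    ["gene1", "EST_B", "84.0", "60", "9", "1", "300", "359", "5", "64", "0.003", "40.1"],
]
example_table = "\n".join("\t".join(row) for row in rows)

for hit in est_supported(parse_hits(example_table)):
    print(hit["subject"], hit["identity"], hit["evalue"])
```

A predicted gene with at least one hit passing the filter would get "yes" in the EST? column; you should still look at the alignment itself before deciding.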
B. Search for possible genes NOT predicted by your genefinder (FGENESH) using BLASTX and/or TBLASTX
A first look might use blastx
1. From the NCBI BLAST Server (see step 1 of A. above), under Basic BLAST, click on blastx.
2. Paste your assigned genomic sequence into the sequence window, and click the BLAST button (we will use default parameters).
See the printed handout for additional information about interpreting your results from B. & C. and doing further analyses.
A more computation-intensive approach, which can find matches with previously unidentified (never predicted) proteins, would use tblastx.
Note the locations of homologies detected relative to proteins predicted by FGENESH - how well do they match up? Did FGENESH miss predictions of some regions detected by BLASTX or TBLASTX? Are there any regions detected with TBLASTX that were not detected with BLASTX? Make a note of these regions, if found.
This method compares everything at once, and can be a bit overwhelming initially. To focus on individual gene/protein predictions, proceed to the next step.
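To compare hit locations with the FGENESH predictions more systematically, you can treat each predicted gene and each BLASTX/TBLASTX hit as a start-end interval on your genomic sequence and look for hits that overlap no predicted gene. The Python sketch below shows the idea. All coordinates are made up for illustration, the function names are hypothetical, and you would enter your own gene spans and hit spans by hand from your results.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one position.

    Spans are sorted first, so a hit on the reverse strand (start > end)
    is handled the same way as a forward-strand hit.
    """
    a_start, a_end = sorted(a)
    b_start, b_end = sorted(b)
    return a_start <= b_end and b_start <= a_end


def unpredicted_hits(hit_spans, gene_spans):
    """Return hit spans that do not overlap any predicted gene span."""
    return [
        hit for hit in hit_spans
        if not any(overlaps(hit, gene) for gene in gene_spans)
    ]


# Hypothetical coordinates on the genomic sequence.
gene_spans = [(201, 2455), (3100, 4800)]            # predicted genes
hit_spans = [(250, 900), (5200, 5650), (4750, 4600)]  # BLASTX hit regions

for span in unpredicted_hits(hit_spans, gene_spans):
    print("Hit with no predicted gene:", span)
```

Any span this reports is a candidate region that the gene predictor may have missed and is worth noting.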
C. Determine whether proteins predicted by FGENESH have matches in the protein database.
1. Go to the NCBI BLAST Server
2. Under Basic BLAST, click on protein blast (blastp).
3. Paste each predicted amino acid sequence from your FGENESH analysis into the large window under "Enter Query Sequence"
4. All the other parameters can be left at their default settings.
5. Select the "Show results in a new window" checkbox, then click the BLAST button.
Note also if there are any Conserved Domains found in your search. (This information will come up immediately after initiating the search in the 'Format' window.)
Example of further analysis of Cr0062a.seq predicted genes
Gene | EST? | C. elegans homolog | Conserved Domains | Type of protein encoded |
Gene 1 | no | cat-1 gene/AAG00026 | KOG6734 | Vesicular monoamine transporter |
Gene 2 | no | none | none | no significant matches |
Gene 3 | no | none | KOG0293/WD40 | WD40 repeat protein |
Gene 4 | no | ZK550.3 | KOG2089 | Oligopeptidase |
Gene 5 | yes | msh-1/H26D21.2 | KOG0219 | Mismatch repair protein MutS family |
© 2013 Curtis Loer, Dept of Biology, USD. All rights reserved. Simple, hand-coded pages.