- Plain english: "I want all the metalloproteins that have R-values less than 0.15, and would like to see the pdb id, deposition date, R-value, resolution and how it was acquired"
SQL: SELECT source_id,dep_date,r_value,resolution FROM protein WHERE r_value
< 0.15
- Plain english: "How about the site id, number of ligands and number of those
ligands that are aminoacids for Ni
containing proteins? and I will like it to be ordered by the number of ligands"
SQL: SELECT site_id,num_ligands,nlig_protein FROM site WHERE metal='ni' ORDER BY
num_ligands
- Plain english: "Bet you cannot tell me the pdb ids, resolution and deposition
dates for all the Mn containing proteins that are not NMR structures, and do not forget
to order them by deposition date and resolution"
SQL: SELECT DISTINCT
site.source_id,protein.resolution,protein.dep_date
FROM protein,site where site.metal='mn' AND protein.expdata NOT LIKE '%nmr%' AND protein.source_id=site.source_id
ORDER BY protein.dep_date,protein.resolution
- Plain english: "I would like to see the descriptions of Zn containing
proteins with more than one ligand of a metal type (e.g. trinuclear or higher
sites)"
SQL: The wrong command will be:
SELECT protein.description FROM protein,site WHERE site.metal='zn' AND
site.nlig_metal > 1
and the right way will be to use:
SELECT DISTINCT protein.description FROM protein,site WHERE site.metal='zn' AND
site.nlig_metal > 1 AND protein.source_id=site.source_id
Explanation: The first SQL will ask the database engine to join the two tables
(protein and site) in all the possible ways, generating a humongous
intermediate joined table, and then selecting all the matching rows, duplicates
an all. By using "DISTINCT" in the second SQL we get rid of duplicate joined
rows, and by using the join condition "protein.source_id=site.source_id", we
limit the size of the intermediate joined table. What is more, by putting the
joining condition last, and the "filtering" conditions first the number of
rows from the "site" table to be joined is diminished, resulting in a faster
search. In SQL (and w/o optimizing engines), order matters.
Challenge
Figure out what the following monstrosity does:
SELECT DISTINCT
protein.source_id,site.metal,ligand.lig_symbol,ligand.lig_type,ligand.lig_atom,ligand.lig_seq,ligand.metal_lig_dist,protein.description,ligand.lig_chain
FROM protein,site,ligand WHERE site.metal='zn' AND site.num_ligands > 3 AND
site.nlig_metal > 0 AND ligand.lig_type='m'
AND protein.source_id=site.source_id AND site.site_id=ligand.site_id
ORDER BY protein.source_id,ligand.lig_chain
|