lumiHumanIDMapping_nuID {lumiHumanIDMapping.db}    R Documentation
We mapped nuIDs of Illumina Human chips by BLASTing each probe sequence (converted from nuID) against the most recent Homo sapiens RefSeq release. The mapping also includes mapping quality information, such as mapping strength, uniqueness, and number of hits.
lumiHumanIDMapping_nuID()
The nuID mapping information is kept in the nuID_MappingInfo table in the ID Mapping library. The nuID mapping table includes the following fields (columns):
1. nuID: nuID for the probe sequence
2. Strength1: Strength of the best hit. This is measured as the length of the longest contiguous match between the probe and the hit sequence plus the number of bases of identity between the two sequences, divided by the total probe length and normalized so that a perfectly identical match scores 100.
3. Strength2: Strength of the second-best gene hit. We map to Entrez Gene IDs because multiple RefSeq accessions may correspond to the same Entrez Gene ID, reflecting differing splice sites, conflicting gene model evidence, or unresolved curation. Strength2 is measured in the same way as Strength1.
4. Uniqueness: (Strength1-Strength2)/Strength1*100.
5. Total hits: Total number of gene models (Entrez Gene records) hit by the probe with a contiguous match of at least 17 nucleotides
6. Accession: RefSeq gene model Accession number
7. EntrezID: The Entrez Gene ID corresponding to the RefSeq accession number shown in the "Accession" field
8. Accession2: RefSeq gene model accession number of the best hit to the second-best gene model (Entrez Gene record)
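As a worked illustration of the Strength and Uniqueness definitions above, the following base R sketch computes both scores from made-up alignment numbers. The helper names hit_strength and uniqueness and the 50-nucleotide probe are assumptions for this example and are not part of the package; the scaling factor of 50 follows from requiring a perfect match to score 100.

```r
# Hypothetical helpers for illustration; not part of lumiHumanIDMapping.
# Strength: (longest contiguous match + identical bases) / probe length,
# scaled so that a perfect match scores 100.
hit_strength <- function(longest_contig, identities, probe_length) {
  (longest_contig + identities) / probe_length * 50
}

# Uniqueness: relative gap between the best and second-best gene hits.
uniqueness <- function(strength1, strength2) {
  (strength1 - strength2) / strength1 * 100
}

# Made-up 50-mer probe: perfect best hit, partial second-best hit.
s1 <- hit_strength(longest_contig = 50, identities = 50, probe_length = 50)
s2 <- hit_strength(longest_contig = 20, identities = 30, probe_length = 50)
u  <- uniqueness(s1, s2)
print(c(Strength1 = s1, Strength2 = s2, Uniqueness = u))
```

With these invented numbers, Strength1 is 100, Strength2 is 50, and Uniqueness is 50.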
Procedure for nuID mapping:
Briefly, we BLASTed each probe sequence (converted from nuID) against the corresponding RefSeq genome. We then processed the resulting BLAST output files and identified all hits to a probe sequence that contain a contiguous match of at least 17 nucleotides (17 is generally accepted as the minimum number of contiguous bases required to get a hybridization signal with oligo arrays). Because many RefSeq models map to the same Entrez gene, we treat those as a single hit and keep the best hit, as defined by expectation value, between the probe and that gene. We then summarize the total number of Entrez genes hit by a probe and list the accession number of the best RefSeq model (if any) and of the second-best RefSeq model (if any) that belongs to a second Entrez gene. The best hit is scored with a strength: the length of the matched sequence plus the length of the longest contiguous match in the hit, divided by the total length of the probe sequence and multiplied by 50, giving a score that runs from 0 to 100. The same score is computed for the second-best gene model hit. A uniqueness score is then calculated as the strength of the best hit against the first gene model minus the strength of the best hit against the second gene model (in most cases there is no second model, so the second strength is zero), divided by the strength of the first hit and multiplied by 100, again giving a number from 0 to 100. We anticipate that most groups will be interested in using only probes for which the strength of the best model and the uniqueness score are both 95 or above. For more details, see https://prod.bioinformatics.northwestern.edu/nuID/
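The suggested cutoff of 95 can be applied with ordinary data frame subsetting. The sketch below uses a small made-up table: the probe labels are placeholders rather than real nuIDs, the values are invented, and the column names only mirror the documented fields, so the actual names in nuID_MappingInfo should be checked with dbListFields() before adapting it.

```r
# Toy table with invented values; column names mirror the documented fields
# and are an assumption about the real nuID_MappingInfo table.
mapping <- data.frame(
  nuID      = c("probeA", "probeB", "probeC"),  # placeholders, not real nuIDs
  Strength1 = c(100, 100, 90),
  Strength2 = c(0, 60, 0),                      # 0 means no second gene hit
  stringsAsFactors = FALSE
)

# Uniqueness as defined in the field list.
mapping$Uniqueness <- (mapping$Strength1 - mapping$Strength2) /
  mapping$Strength1 * 100

# Keep probes whose best-hit strength and uniqueness are both at least 95.
cutoff <- 95
reliable <- mapping[mapping$Strength1 >= cutoff &
                      mapping$Uniqueness >= cutoff, ]
print(reliable$nuID)
```

Here only probeA passes: probeB has a strong second gene hit (Uniqueness 40), and probeC has a best-hit strength below the cutoff.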
lumiHumanIDMapping_nuID returns a nuID mapping summary of Illumina Human chips.
1. https://prod.bioinformatics.northwestern.edu/nuID/
2. Du, P., Kibbe, W.A. and Lin, S.M., "nuID: A universal naming schema of oligonucleotides for Illumina, Affymetrix, and other microarrays", Biology Direct 2007, 2:16 (31 May 2007).
## List the fields in the nuID_MappingInfo table
conn <- lumiHumanIDMapping_dbconn()
dbListFields(conn, 'nuID_MappingInfo')

## Summary of nuID mapping
lumiHumanIDMapping_nuID()