(Note: Click on the first result in each of the search results pages linked to throughout the post to see this feature in action.)
A scanner is a wonderful tool. Every day, people all over the world post scanned documents online -- everything from official government reports to obscure academic papers. These files usually contain images of text, rather than the text itself. But all of these documents have one thing in common: someone somewhere thought they were valuable enough to share with the world.
In the past, scanned documents were rarely included in search results as we couldn't be sure of their content. We had occasional clues from references to the document -- so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format. OCR technology lets us convert a picture (of a thousand words) into a thousand words -- words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world's information accessible and useful.
While we've indexed documents saved as PDFs for some time now, scanned documents are a lot more difficult for a computer to read. Scanning is the reverse of printing. Printing turns digital words into text on paper, while scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer. The scanned picture of the text is not quite the same as the original digital words, however -- it is a picture of the printed words. Often you can see telltale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.
To people reading these documents, the distinction between words and pictures of words makes little difference, but for a computer the picture is almost unintelligible. Consider a circle. Should it be read as a zero, the letter 'O', just a circle, or the ring from my coffee cup? People learn to answer this kind of question very quickly, but for the computer it is a painstaking and error-prone process.
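To make the circle problem concrete, here is a toy sketch in Python. It is not how our system works; it only illustrates one simple idea, which is to use the neighboring characters as context. The placeholder character and the function name `resolve_circle` are assumptions made up for this example.

```python
# Hypothetical placeholder for a circle that a recognizer could not classify.
AMBIGUOUS = "\u25cb"


def resolve_circle(text: str) -> str:
    """Guess each ambiguous circle from its immediate left and right neighbors.

    Illustrative heuristic only: more digit neighbors means zero, more letter
    neighbors means the letter 'O', and a tie leaves the circle unresolved.
    """
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch != AMBIGUOUS:
            continue
        # Collect the characters directly before and after, skipping other circles.
        neighbors = [c for c in (text[i - 1:i] + text[i + 1:i + 2]) if c != AMBIGUOUS]
        digits = sum(c.isdigit() for c in neighbors)
        letters = sum(c.isalpha() for c in neighbors)
        if digits > letters:
            chars[i] = "0"
        elif letters > digits:
            chars[i] = "O"
        # On a tie (or with no neighbors) the circle stays as it is.
    return "".join(chars)


if __name__ == "__main__":
    print(resolve_circle("2\u25cb09"))    # circle sits between digits
    print(resolve_circle("C\u25cbFFEE"))  # circle sits between letters
```

Even this tiny rule gives up when a circle sits between a letter and a digit, or stands alone like a coffee ring, which hints at why the real problem is so error-prone.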
To see our new system at work, click on these search queries. Note the document excerpt in the search results, along with the full text presented after the 'View as HTML' link:
[repairing aluminum wiring]
[spin lock performance]
[Mumps and Severe Neutropenia]
[Steady success in a volatile world]