Winter 2006
Prerequisite
George Washington Wrote Here
Computer scientists are unlocking the past in handwritten texts
—Patricia Sullivan
Project contributor Toni Rath earned his Ph.D. in computer science and now works for Google in California. The project’s other main researcher, Victor Lavrenko, is a post-doctoral candidate at the Center for Intelligent Information Retrieval. Partial funding came from the National Science Foundation and the National Endowment for the Humanities. (photo by Ben Barnhart)
CONSIDER FOR A MOMENT THE vast trove of handwritten material in the world’s libraries, archives, museums, and attics. These countless millions of pages of historical documents, diaries, letters, ships’ logs, and scientific notes hold incredible potential for discovery.
Unfortunately, most of this rich material is so electronically inaccessible that “you could say it doesn’t exist,” according to R. Manmatha, research assistant professor of computer science at UMass Amherst. “Currently, you can search only handwritten documents that have been transcribed word by word or indexed page by page. This is expensive and takes a lot of work.”
Last year, Manmatha and his colleagues at the UMass Center for Intelligent Information Retrieval proved that it is indeed possible to solve the very difficult problem of searching handwritten historical documents. In 10 years, he believes, a manuscript retrieval system could be commercially deployed; you might search Isaac Newton’s notes or hunt through scrawled genealogical records from your home computer.
The new tool functions somewhat like a language translation program. For example, to “teach” a computer how to translate from French into English, programmers use documents published in both French and English. The translated documents work like a rich Rosetta Stone that can then help search other documents.
“The breakthrough was when we took this analogy and applied it to handwritten material,” explains Manmatha.
Researchers used scanned images of George Washington’s personal papers to develop the system. Undergrads converted a portion of his letters, orders, and documents into computer text. Researchers then applied this learning model to 1,000 handwritten pages. Washington’s papers posed particular challenges because he had at least three secretaries, whose handwriting varied significantly. And military orders do not exactly make scintillating reading.
“It’s a lot of ‘we need blankets,’ or ‘could you send money?’” says researcher Toni Rath ’05. It’s also a tedious process; each word was typed in and its location on the page noted as well. “After you’ve done a couple of pages you need to take a break.”
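As a rough illustration of the idea, here is a minimal sketch in Python. It is not the researchers’ actual system. It assumes each scanned word has already been reduced to a few descriptive shape labels; those labels, the page locations, and the function names are all hypothetical. The sketch learns from transcribed examples which words tend to go with which shapes, then ranks untranscribed word images against a typed query.

```python
from collections import Counter, defaultdict

def train(annotated):
    """Count how often each shape label appears with each transcribed word."""
    cooc = defaultdict(Counter)  # shape label -> Counter of words seen with it
    for shapes, word in annotated:
        for shape in shapes:
            cooc[shape][word] += 1
    return cooc

def score(cooc, shapes, query):
    """Average, over an image's shape labels, of the estimated P(query | shape)."""
    if not shapes:
        return 0.0
    total = 0.0
    for shape in shapes:
        counts = cooc.get(shape)
        if counts:
            total += counts[query] / sum(counts.values())
    return total / len(shapes)

def search(cooc, unlabeled, query, top_k=3):
    """Return locations of the best-matching untranscribed word images."""
    scored = [(score(cooc, shapes, query), location) for location, shapes in unlabeled]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [location for _, location in scored[:top_k]]

# Hypothetical training data: words typed in by hand, paired with toy shape labels.
annotated = [
    (("tall", "loop", "dot"), "blankets"),
    (("tall", "loop"), "blankets"),
    (("round", "tail"), "money"),
    (("round", "tail", "dot"), "money"),
]

# Hypothetical untranscribed word images, each with a location on a page.
unlabeled = [
    ("page 7, line 3", ("tall", "loop")),
    ("page 9, line 1", ("round", "tail")),
    ("page 12, line 8", ("tall", "dot")),
]

model = train(annotated)
results = search(model, unlabeled, "blankets")
```

Here `results` lists the page locations most likely to contain the word “blankets,” best match first. A real system would work from measurements of the scanned images rather than hand-picked labels.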
Right now, Manmatha’s team is focusing on making the system quicker and more accurate. Search engine leader Google (where Manmatha worked last summer) and eager scholars are interested in the project.
One is curator Christopher Conroy of the Museum of Vertebrate Zoology at the University of California, Berkeley. Conroy says handwriting retrieval has great potential to sift through reams of field notes that date from when the museum was founded, in 1908. Although the museum has scanned many of these papers, scientists still have to decipher every handwritten page to find relevant information. Searching capability would allow them to simply type “salamander” and locate every reference to that creature in the scanned notes.
Raymond S. Bradley, geosciences professor and director of the UMass Climate System Research Center, says that handwriting recognition could further the study of weather and natural systems. “If you could automate a search of New England farmers’ diaries, for example, it opens up many possibilities. You could much more easily discover when the corn ripened in various places and when peepers were first heard in the spring.”
You can check out the demonstration software (and read about General Washington’s woes) at http://ciir.cs.umass.edu/research/wordspotting