UMASS MAG ONLINE

Winter 2006

Prerequisite

George Washington Wrote Here
Computer scientists are unlocking the past in handwritten texts

—Patricia Sullivan

Rath and Lavrenko
Project contributor Toni Rath earned his Ph.D. in computer science and now works for Google in California. The project’s other main researcher, Victor Lavrenko, is a post-doctoral candidate at the Center for Intelligent Information Retrieval. Partial funding came from the National Science Foundation and the National Endowment for the Humanities. (photo by Ben Barnhart)
Consider for a moment the vast trove of handwritten material in the world’s libraries, archives, museums, and attics. These countless millions of pages of historical documents, diaries, letters, ships’ logs, and scientific notes hold incredible potential for discovery.

Unfortunately, most of this rich material is so electronically inaccessible that “you could say it doesn’t exist,” according to R. Manmatha, research assistant professor of computer science at UMass Amherst. “Currently, you can search only handwritten documents that have been transcribed word by word or indexed page by page. This is expensive and takes a lot of work.”

Last year, Manmatha and his colleagues at the UMass Center for Intelligent Information Retrieval proved that it is indeed possible to solve the very difficult problem of searching handwritten historical documents. In 10 years, he believes, a manuscript retrieval system could be commercially deployed; you might search Isaac Newton’s notes or hunt through scrawled genealogical records from your home computer.

The new tool functions somewhat like a language translation program. For example, to “teach” a computer how to translate from French into English, programmers use documents published in both French and English. The translated documents work like a rich Rosetta Stone that can then help search other documents.

“The breakthrough was when we took this analogy and applied it to handwritten material,” explains Manmatha.

Researchers used scanned images of George Washington’s personal papers to develop the system. Undergrads converted a portion of his letters, orders, and documents into computer text. Researchers then applied the model trained on these transcriptions to 1,000 handwritten pages. Washington’s papers posed particular challenges because he had at least three secretaries, whose handwriting varied significantly. And military orders do not exactly make scintillating reading.

“It’s a lot of ‘we need blankets,’ or ‘could you send money?’” says researcher Toni Rath ’05. The transcription was also tedious: each word was typed in and its location on the page noted. “After you’ve done a couple of pages you need to take a break.”
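
For readers who want a feel for the translation analogy, here is a toy sketch in Python. It is not the researchers' actual system: the shape labels (`tall`, `loop`, and so on), the page locations, and the function names are all invented for illustration, and the scoring is a plain smoothed co-occurrence count. It assumes each word image has already been reduced to a handful of shape labels, learns which labels tend to appear with which transcribed words, and then ranks untranscribed word images for a typed query.

```python
from collections import Counter, defaultdict

def train(annotated):
    """Count how often each shape label co-occurs with each transcribed word."""
    cooc = defaultdict(Counter)  # cooc[word][label] = count
    for labels, word in annotated:
        cooc[word].update(labels)
    return cooc

def score(cooc, word, labels, n_labels, alpha=1.0):
    """Smoothed likelihood that `word` would produce these shape labels."""
    counts = cooc.get(word, Counter())
    total = sum(counts.values())
    p = 1.0
    for label in labels:
        p *= (counts[label] + alpha) / (total + alpha * n_labels)
    return p

def search(cooc, query, untranscribed, n_labels):
    """Return page locations of word images, best match for `query` first."""
    ranked = sorted(
        untranscribed,
        key=lambda item: score(cooc, query, item[1], n_labels),
        reverse=True,
    )
    return [location for location, _ in ranked]

# Hypothetical training data: (shape labels, typed word) pairs from transcribed pages.
annotated = [
    (["tall", "loop", "long"], "blankets"),
    (["tall", "loop", "long"], "blankets"),
    (["round", "short"], "money"),
    (["round", "short", "dot"], "money"),
]

# Hypothetical untranscribed word images: ((page, x, y), shape labels).
untranscribed = [
    (("page 7", 120, 340), ["round", "short"]),
    (("page 9", 60, 88), ["tall", "loop", "long"]),
]

N_LABELS = 6  # tall, loop, long, round, short, dot
cooc = train(annotated)
results = search(cooc, "blankets", untranscribed, N_LABELS)
print(results[0])
```

Recording each word's location during transcription is what lets a search like this point back to a spot on a scanned page rather than just to a document.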

Right now, Manmatha’s team is focusing on making the system quicker and more accurate. Search engine leader Google (where Manmatha worked last summer) and eager scholars are interested in the project.

One such scholar is curator Christopher Conroy of the Museum of Vertebrate Zoology at the University of California, Berkeley. Conroy says handwriting retrieval has great potential for sifting through reams of field notes that date from the museum’s founding in 1908. Although the museum has scanned many of these papers, scientists still have to decipher every handwritten page to find relevant information. A search capability would allow them to simply type “salamander” and locate every reference to that creature across the scanned notes.

Raymond S. Bradley, geosciences professor and director of the UMass Climate System Research Center, says that handwriting recognition could further the study of weather and natural systems. “If you could automate a search of New England farmers’ diaries, for example, it opens up many possibilities. You could much more easily discover when the corn ripened in various places and when peepers were first heard in the spring.”

You can check out the demonstration software (and read about General Washington’s woes) at http://ciir.cs.umass.edu/research/wordspotting


© 2004 University of Massachusetts Amherst.