Abstract:
This paper presents an effective approach for the offline recognition of
unconstrained handwritten Chinese texts. Under the general integrated
segmentation-and-recognition framework with character oversegmentation,
we investigate three important issues: candidate path evaluation, path search,
and parameter estimation. For path evaluation, we combine multiple contexts
(character recognition scores, geometric and linguistic contexts) from the
Bayesian decision view, and convert the classifier outputs to posterior
probabilities via confidence transformation. In path search, we use a refined
beam search algorithm to improve the search efficiency and, meanwhile, use a
candidate character augmentation strategy to improve the recognition
accuracy. The combining weights of the path evaluation function are optimized
by supervised learning using a Maximum Character Accuracy criterion. We
evaluated the recognition performance on the Chinese handwriting database
CASIA-HWDB, which contains nearly four million character samples of 7,356
classes and 5,091 pages of unconstrained handwritten texts. The
experimental results show that confidence transformation and combining
multiple contexts improve the text line recognition performance significantly.
On a test set of 1,015 handwritten pages, the proposed approach achieved
a character-level accurate rate of 90.75 percent and a correct rate of 91.39
percent, both far better than the best results reported in the literature.
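
To make the path evaluation and path search steps more concrete, here is a minimal Python sketch. It is not the paper's implementation: it uses a plain beam search rather than the refined variant, the weights are fixed by hand rather than learned with the Maximum Character Accuracy criterion, and the posteriors are assumed to be already confidence-transformed. All names (Candidate, beam_search, toy_bigram, the weight keys) and all numbers are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    """A candidate character spanning primitive segments [start, end)."""
    start: int
    end: int
    label: str
    posterior: float  # assumed: classifier output after confidence transformation


def beam_search(candidates, n_segments, language_logprob, geometry_logprob,
                weights, beam_width=3):
    """Find a high-scoring segmentation-recognition path through the lattice.

    Each path is scored by a weighted sum of log character posteriors,
    log linguistic context scores, and log geometric context scores.
    """
    by_start = defaultdict(list)
    for cand in candidates:
        by_start[cand.start].append(cand)

    # beams[k] holds partial paths (score, labels) ending at segment boundary k.
    beams = {0: [(0.0, ())]}
    for k in range(n_segments):
        # All edges point forward, so every path into node k is already known.
        # Keep only the best beam_width partial paths before expanding.
        kept = sorted(beams.get(k, []), reverse=True)[:beam_width]
        for score, labels in kept:
            prev = labels[-1] if labels else None
            for cand in by_start[k]:
                step = (weights["char"] * math.log(cand.posterior)
                        + weights["lang"] * language_logprob(prev, cand.label)
                        + weights["geo"] * geometry_logprob(cand))
                beams.setdefault(cand.end, []).append(
                    (score + step, labels + (cand.label,)))

    complete = beams.get(n_segments, [])
    return max(complete) if complete else None


# Toy lattice over three primitive segments (all values are made up).
toy_candidates = [
    Candidate(0, 1, "木", 0.6),
    Candidate(1, 2, "目", 0.6),
    Candidate(0, 2, "相", 0.9),   # segments 0 and 1 merged into one character
    Candidate(2, 3, "信", 0.8),
    Candidate(2, 3, "言", 0.3),
]

toy_bigram = {("相", "信"): 0.5}


def toy_language_logprob(prev, label):
    # Hypothetical character bigram model with a small default probability.
    return math.log(toy_bigram.get((prev, label), 0.01))


def toy_geometry_logprob(cand):
    # Placeholder: treats every candidate as geometrically equally plausible.
    return 0.0


toy_weights = {"char": 1.0, "lang": 1.0, "geo": 1.0}
best_score, best_labels = beam_search(
    toy_candidates, 3, toy_language_logprob, toy_geometry_logprob, toy_weights)
print("".join(best_labels))
```

In this toy lattice the merged candidate wins because its higher posterior and the bigram context together outweigh the two-character alternative. In the approach described above, the combining weights would instead be estimated from training data.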
System diagram of handwritten Chinese text line recognition
A page of handwritten Chinese text