AI Case Study
Researchers at the National Technical University of Ukraine classify ancient carved characters with 99% accuracy using a convolutional neural network
Researchers at the National Technical University of Ukraine and Huizhou University’s School of Information Science and Technology have studied ancient carvings found on the walls of a Kiev cathedral. Because resources on ancient letters of the Glagolitic and Cyrillic alphabets are scarce, the team combined a public database of fonts and glyphs with a collection of more than 4,000 images covering 34 letter types. With this data, they trained a convolutional neural network to identify the carved letters. The system identified characters with 99% accuracy.
"Researchers at the National Technical University of Ukraine and Huizhou University’s School of Information Science and Technology describe a machine learning model that detects, isolates, and classifies ancient letters carved on the stone walls of a Kiev cathedral.
The researchers focused the bulk of their efforts on Glagolitic and Cyrillic, two alphabets commonly used in Eastern Slavic visual texts. Archeologists found glyphs from both — some dating back to the 11th century — on the St. Sophia Cathedral in Ukraine. To date, about 7,000 have been detected and studied.
It goes without saying that historical letter datasets aren’t as common as, say, those for the Arabic alphabet, so the team assembled and preprocessed a collection of more than 4,000 images for 34 letter types. They used notMNIST, a second database containing publicly available fonts and glyphs for the letters A-J, to compare the two outputs.
They next embarked on training a convolutional neural network, a type of machine learning algorithm that’s commonly used in computer vision, to recognize the graffiti by feeding it data from notMNIST and their novel dataset, taking care to horizontally and vertically flip some of the original images so as to prevent overfitting.
In future, the researchers hope to improve the model by “teaching” it to consider factors like date, language, authorship, authenticity, and meaning. Furthermore, they propose the creation of larger datasets shared “around the world” in the spirit of “open science, volunteer data collection, processing, and computing,” which they say will lead to further advancements."
"The neural net was 99 percent accurate in isolating characters from the team’s dataset and notMNIST, respectively."
"A convolutional neural network to recognize the graffiti by feeding it data from notMNIST and their novel dataset, taking care to horizontally and vertically flip some of the original images so as to prevent overfitting."
"“[C]arved handwriting has usually much worse quality and shabby state to provide the similar values of accuracy … Usually, the preprocessing requires a priori knowledge about the entire glyph, but [certain] datasets are not available at the moment as open source databases …” the team wrote. “The main aim of this paper is to apply some machine learning techniques for automatic recognition of … historical graffiti … and estimate their efficiency in the view of the complex geometry, barely discernible shape, and low statistical representativeness”."
More than 4,000 images for 34 letter types, plus the notMNIST database of publicly available fonts and glyphs for the letters A-J.
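The flipping step described in the excerpt, where some of the original images are mirrored horizontally and vertically to reduce overfitting, can be illustrated with a minimal Python sketch. This is not the researchers' code: the function names, the `fraction` and `seed` parameters, and the representation of an image as a list of pixel rows are all assumptions made for illustration. Flipped copies keep their original label here, which is a simplification.

```python
import random


def flip_horizontal(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror a 2D image (a list of pixel rows) top to bottom."""
    return [list(row) for row in image[::-1]]


def augment_with_flips(images, labels, fraction=0.5, seed=0):
    """Return the original samples plus flipped copies of a random subset.

    `fraction` (chance of adding each kind of flip) and `seed` are
    hypothetical parameters, not values reported in the study.
    """
    rng = random.Random(seed)
    out_images = [[list(row) for row in img] for img in images]
    out_labels = list(labels)
    for img, label in zip(images, labels):
        if rng.random() < fraction:
            out_images.append(flip_horizontal(img))
            out_labels.append(label)  # assumption: a flip keeps the label
        if rng.random() < fraction:
            out_images.append(flip_vertical(img))
            out_labels.append(label)
    return out_images, out_labels


# Tiny 2x3 "glyph" used only to show the effect of each flip.
glyph = [[1, 2, 3],
         [4, 5, 6]]
print(flip_horizontal(glyph))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(glyph))    # [[4, 5, 6], [1, 2, 3]]
```

Calling `augment_with_flips(images, labels)` on a training set returns an enlarged list of images with matching labels, so the network sees mirrored variants of some carvings in addition to the originals.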