Abstract:
Multimedia technologies are developing rapidly, and multimedia data are increasingly used
to authenticate document images. Authentication requires that a digital image be robust
against security breaches, and it also supports the authorization and verification of users
for different types of services worldwide [1–4]. With the continuous improvement of modern
steganalysis techniques, the security of the information stored in digital images is
potentially at risk [5, 6].
This report addresses problems related to Writer Identification (WI) [7, 8].
The goal of the research is to design authentication systems that predict the writer of a
given sample image, and to answer questions about handwriting style, representation and
model performance, while reducing the number of parameters involved and minimizing
human intervention in the process. The approaches used in this report raise questions in
computer vision: for example, whether the handwriting style of an individual can be
characterized algorithmically, which features should be used to represent it, and how
those features can be combined in the models. The proposed models were evaluated on
different datasets, with the algorithms unaware of the text written in the sample images.
These methods have the potential to be used in practical applications such as forensic
science, banking systems and human identification.
WI refers to identifying the writer of a handwritten image. Most languages employ
various types of symbols and signs for communication purposes. Writer identification
research is generally conducted on handwritten documents in different languages such as
English, Arabic, Devanagari, etc. [9]. Handwriting-based WI is an active research area in
pattern recognition and machine learning. Identifying writers from handwritten documents
involves several intermediate steps, which are as follows:
Design and development of distinct writer recognition methods and algorithms.
Different feature extraction techniques and methods.
Identification of appropriate characteristics from the handwriting feature set.
Evaluation of handwritten image-based writer identification performance.
Feature extraction and image segmentation play a significant role in many pattern
recognition applications such as handwriting recognition and writer identification. Some
of the feature extraction techniques used for WI are Global Wavelet-based features
[10], Pattern-based features [11], Contour-based Orientation and Curvature-based features
[12], Edge Structure Code (ESC) Distribution feature [13], Grid Micro-structure features [14],
Curvature-free features [15], etc. Feature extraction also reduces the dimensionality of
the data: if the input is too large to process, it is reduced to a smaller set of
features, called the feature vector, in order to improve the model’s efficiency. The
extracted and selected features retain the relevant information of the input data. Further,
in image processing, segmentation is the method of splitting a digital image into several
sections (sets of pixels) [16]. The fundamental objective of segmentation is to simplify or
alter the image as required by the user. It is used as a pre-processing technique in many
pattern recognition applications. Over the last few decades, researchers have developed
several approaches for authenticating document images [1, 3]. WI, Signature
Verification [17, 18] and detection of seals in documents are some of the approaches used
to authenticate document images [19, 20].
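As an illustration of segmentation used as a pre-processing step, the following minimal sketch splits a binarized page into text-line bands with a horizontal projection profile. This is a generic, widely used technique and not necessarily the method of the cited works; the function name `segment_text_lines`, the `min_ink` threshold and the toy `page` array are assumptions made for this example.

```python
def segment_text_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for each text-line band.

    `image` is a binary image given as a list of rows, where 1 marks an ink
    pixel and 0 marks background. A row belongs to a text line when it
    contains at least `min_ink` ink pixels (hypothetical threshold).
    """
    bands = []
    start = None
    for row_index, row in enumerate(image):
        has_ink = sum(row) >= min_ink  # horizontal projection of this row
        if has_ink and start is None:
            start = row_index  # a new text line begins
        elif not has_ink and start is not None:
            bands.append((start, row_index))  # the current text line ends
            start = None
    if start is not None:
        bands.append((start, len(image)))  # line touching the bottom edge
    return bands


# Toy 6x4 "page": two ink bands separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
bands = segment_text_lines(page)
print(bands)
```

Each returned band can then be cropped and passed to a feature extractor. Real handwriting usually needs a smoothed profile or a higher `min_ink` value, because ascenders and descenders of neighbouring lines may touch.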
In the era of smart computation, artificial intelligence and machine learning play an
important role in simplifying human lives through smart devices and systems. However,
intrusion and falsification cannot be avoided. To ensure identification, several recognition
systems have already been developed and commercialized and are in operation. Writer
identification is one of the methods to identify the writer of a document. Statistics show
that machine learning methods predict the writer better than humans do.
The advent of deep learning has substantially improved the performance of such
systems. Although deep learning is computationally expensive, it outperforms
traditional methods. In the present research work, existing features have been used and
new feature extraction techniques have been developed, and these features are used to
train machine learning and deep learning models that classify the document writer. The
proposed research scheme is illustrated in Figure 1.
Figure 1: Designed and Developed Proposed Research Scheme.
  Models: HWT Model, DCWI Model, SEG-WI Model
  Datasets: IAM, Kannada, Devanagari, CVL, IFN/ENIT, Arabic
  Performance (three groups, in figure order):
    IAM = 99.48 %, CVL = 82.55 %, Kannada = 98.25 %, Arabic = 45.95 %, Devanagari = 99.87 %
    IAM = 97.80 %, IFN/ENIT = 97.50 %, Kannada = 99.80 %, Devanagari = 99.90 %
    IAM = 97.27 %, CVL = 99.35 %, IFN/ENIT = 98.24 %, Kannada = 100 %, Devanagari = 87.24 %
The contributions of the proposed work are as follows:
i. A new model, Histogram Weight Transformation (HWT), is proposed for Writer
Identification and Verification (WIV) that provides an authentication of handwritten
document images. The model targets the drawbacks of traditional data analysis and
Histogram Symbolic Representation (HSR) approach. The majority vote technique is
adapted to identify the writer of a document having multiple text-lines.
ii. Next, a novel feature extraction approach based on the Distribution Descriptive Curve
(DDC) and Cellular Automata (CA) is presented and used in a newly developed
model to obtain higher accuracy than recent techniques. The resulting
model, DCWI, performs writer identification based on DDC and CA.
iii. Furthermore, a Segmentation-free Writer Identification (SEG-WI) model based on a CNN
is proposed to identify the writer. A region selection mechanism is also developed
to improve the overall performance of the model. Lexical similarity between
documents of different writers makes training difficult; therefore, a new training
strategy is suggested to train the model.
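The majority vote over text-lines mentioned in contribution (i) can be illustrated with a minimal sketch. It assumes that a classifier has already produced one writer label per text-line; the function name `majority_vote`, the writer labels and the tie-breaking rule are hypothetical choices for this example and are not taken from the HWT model itself.

```python
from collections import Counter


def majority_vote(line_predictions):
    """Return the writer label predicted for the largest number of text-lines.

    `line_predictions` holds one predicted writer label per text-line of a
    single document. Ties go to the label that appears first in the list,
    which is an assumption of this sketch.
    """
    if not line_predictions:
        raise ValueError("need at least one text-line prediction")
    counts = Counter(line_predictions)
    # most_common(1) yields [(label, count)]; equal counts keep first-seen order.
    return counts.most_common(1)[0][0]


# Hypothetical per-line predictions for one five-line document.
document_writer = majority_vote(["w07", "w12", "w07", "w07", "w03"])
print(document_writer)
```

Voting at the document level lets a few misclassified lines be outvoted by the correctly classified majority, so document accuracy can exceed line accuracy.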