AUTHENTICATION OF DOCUMENT IMAGES

Kumar, Parveen

URI: http://localhost:8081/xmlui/handle/123456789/15780

Date: 2019-11

Abstract:

Multimedia technologies are developing rapidly, and multimedia data are increasingly used to authenticate document images. Authentication implies that a digital image is robust against security breaches; it also supports the authorization and verification of users for distinct types of services worldwide [1–4]. With the continuous improvement of modern steganalysis techniques, the security of information stored in digital images is potentially at risk [5, 6]. This report addresses problems related to Writer Identification (WI) [7, 8]. The goal of the research is to design authentication systems that predict the writer of a given sample image and to answer questions about handwriting style, representation, and model performance, while reducing the number of parameters involved and minimizing human intervention in the process. The approaches used in this report raise questions in computer vision, for example whether an individual's handwriting style can be characterized algorithmically, which features should be used to represent it, and how those features can be combined in a model. The proposed models were evaluated on several datasets, with the algorithms given no knowledge of the text written in the sample images. These methods have potential for practical applications such as forensic science, banking systems, and human identification. The term WI refers to identifying a writer from handwritten images. Most languages employ various types of symbols and signs for communication. Writer identification research is generally conducted on handwritten documents in different languages and scripts, such as English, Arabic, and Devanagari [9].
Handwriting-based WI is an active research area in pattern recognition and machine learning. Identifying writers from handwritten documents involves several intermediate steps: designing and developing writer recognition methods and algorithms; applying feature extraction techniques and methods; selecting appropriate characteristics from the handwriting feature set; and evaluating the performance of handwritten image-based writer identification. Feature extraction and image segmentation play a significant role in many pattern recognition applications, such as handwriting recognition and writer identification. Feature extraction techniques used for WI include Global Wavelet-based features [10], Pattern-based features [11], Contour-based Orientation and Curvature-based features [12], the Edge Structure Code (ESC) Distribution feature [13], Grid Micro-structure features [14], and Curvature-free features [15]. Feature extraction reduces the dimensionality of the data: if the input is too large to process, it is reduced to a compact set of features, called the feature vector, in order to improve the model’s efficiency. The extracted and selected features retain the relevant information of the input data. Segmentation, in image processing, is the method of splitting a digital image into several segments (sets of pixels) [16]. The fundamental objective of segmentation is to simplify or alter the image as required by the user. It is used as a pre-processing technique in many pattern recognition applications. Over the last few decades, researchers have developed several approaches for authenticating document images [1, 3]. WI, Signature Verification [17, 18], and detection of the presence of a seal in the document are some of these approaches [19, 20].
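As an illustration of segmentation as a pre-processing step, the following minimal sketch splits a grayscale image into connected regions of ink pixels. It is not the method used in this work: the fixed threshold of 128, the 4-connectivity rule, and the function name label_components are assumptions chosen only for the example.

```python
from collections import deque


def label_components(image, threshold=128):
    """Label 4-connected regions of dark (ink) pixels.

    `image` is a list of rows of integer intensities in 0..255.
    Pixels darker than `threshold` (an assumed value) count as ink.
    Returns (labels, count): `labels` has the same shape as `image`,
    with 0 for background and 1..count for each connected region.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            # Skip background pixels and pixels that already have a label.
            if image[r][c] >= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighbourhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] < threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Tiny hypothetical image: 0 is ink, 255 is paper.
sample = [
    [0, 255, 0],
    [0, 255, 255],
    [255, 255, 0],
]
labels, count = label_components(sample)
print(count)
print(labels)
```

Each labelled region could then be passed to a feature extractor; real handwriting images would normally need an adaptive threshold rather than a fixed one.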
In the era of smart computation, artificial intelligence and machine learning play an important role in simplifying human lives through smart devices and systems. However, intrusion and falsification cannot be avoided. To ensure reliable identification, several recognition systems have already been developed, commercialized, and deployed. Writer identification is one method of identifying the writer of a document. Statistics show that machine learning methods predict the writer better than humans do. The advent of deep learning has markedly improved the performance of such systems; although deep learning is computationally expensive, it outperforms traditional methods. In the present research work, existing features are used alongside newly developed feature extraction techniques, and these features are used to train machine learning and deep learning models that classify the document writer. The proposed research scheme is illustrated in Figure 1.
[Figure 1: Designed and Developed Proposed Research Scheme. The figure lists the writer identification models (HWT Model, DCWI Model, SEG-WI Model), the datasets (IAM, Kannada, Devanagari, CVL, IFN/ENIT, Arabic), and three groups of performance figures: (IAM = 99.48 %, CVL = 82.55 %, Kannada = 98.25 %, Arabic = 45.95 %, Devanagari = 99.87 %); (IAM = 97.80 %, IFN/ENIT = 97.50 %, Kannada = 99.80 %, Devanagari = 99.90 %); (IAM = 97.27 %, CVL = 99.35 %, IFN/ENIT = 98.24 %, Kannada = 100 %, Devanagari = 87.24 %).]
The contributions of the proposed work are as follows:
i. A new model, Histogram Weight Transformation (HWT), is proposed for Writer Identification and Verification (WIV) that provides authentication of handwritten document images. The model targets the drawbacks of traditional data analysis and the Histogram Symbolic Representation (HSR) approach. The majority vote technique is adapted to identify the writer of a document having multiple text-lines.
ii. A novel feature extraction approach based on the Distribution Descriptive Curve (DDC) and Cellular Automata (CA) is presented and used in a newly developed model to obtain higher accuracy than recent techniques. The result is DCWI, an efficient model for writer identification based on DDC and CA.
iii. A Segmentation-free Writer Identification (SEG-WI) model based on CNN is proposed to identify the writer. A region selection mechanism is also developed to improve the overall performance of the model. The lexical similarity between two documents of different writers makes training difficult; therefore, a new training strategy is suggested to train the model
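The majority vote over text-lines mentioned in contribution (i) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes a per-line classifier has already produced one writer label per text-line, and the function name majority_vote and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def majority_vote(line_predictions):
    """Return (writer, share) for the most frequent per-line prediction.

    `line_predictions` holds one predicted writer label per text-line
    (assumed to come from some per-line classifier). `share` is the
    fraction of lines that voted for the winning writer. Ties go to the
    label that appears first in the list (an assumed rule).
    """
    if not line_predictions:
        raise ValueError("at least one text-line prediction is required")
    counts = Counter(line_predictions)
    # most_common keeps first-seen order among equal counts (Python 3.7+).
    writer, votes = counts.most_common(1)[0]
    return writer, votes / len(line_predictions)


# Hypothetical per-line predictions for one four-line document.
writer, share = majority_vote(["writer_07", "writer_12", "writer_07", "writer_07"])
print(writer, share)
```

Voting at the document level lets a few misclassified lines be outvoted by the rest, which is why it suits documents with multiple text-lines.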
