

      Empowering the Visually Impaired: Translating Handwritten Digits into Spoken Language with HRNN-GOA and Haralick Features


            Abstract

            Visual impairment poses significant challenges to individuals in their daily lives, limiting their access to information encoded in the visual domain. This paper presents a novel approach to empower the visually impaired by developing a system capable of translating handwritten digits into spoken language. The proposed system leverages a combination of advanced deep learning (DL) architecture, Hopfield Recurrent Neural Network-Grasshopper Optimization Algorithm (HRNN-GOA), and traditional image-processing techniques such as Haralick features. The system employs HRNN-GOA as the core model for handwritten digit recognition. HRNN-GOA exhibits superior sequential learning capabilities, capturing intricate patterns in the handwritten digits. Additionally, Haralick features are extracted from the input images, providing complementary texture-based information. The fusion of DL and traditional features aims to enhance the robustness and accuracy of the recognition process. The experimental results demonstrate the effectiveness of the proposed approach in accurately recognising handwritten digits. The HRNN-GOA model achieves state-of-the-art performance in digit classification tasks, while the incorporation of Haralick features further refines the recognition process, especially in cases with complex textures or variations in writing styles. The simulation results are compared against state-of-the-art strategies in terms of many metrics, including accuracy, precision, recall, specificity, area under the curve, F1-score, and false-positive rate. The proposed system has the potential to significantly improve the independence and quality of life for individuals with visual impairments by providing seamless access to numerical information in a spoken format. Future endeavours could explore the extension of this framework to recognise and translate more complex handwritten symbols or characters. Additionally, user experience studies and real-world deployment assessments will be crucial for refining the system and ensuring its practical utility in diverse scenarios.


            INTRODUCTION

            Access to information that is primarily visual is severely hampered by vision impairment (Muthureka et al., 2023), a problem that affects millions of people all over the globe. Individuals with visual impairments are substantially hampered in their independence and everyday functioning when they are unable to read handwritten text, especially numerical information. This study aims to aid the visually handicapped by creating a device that can convert handwritten numerals into spoken words (He et al., 2023).

Utilising a state-of-the-art deep learning (DL) architecture (Mekapothula et al., 2023), specifically a Hopfield Recurrent Neural Network (Chandra et al., n.d.) tuned with the Grasshopper Optimisation Algorithm, the suggested approach achieves impressive results. This model excels in sequential learning, which allows it to recognise the complex patterns seen in handwritten numerals. Traditional image-processing approaches, in particular Haralick features, complement this DL approach by providing extra texture-based information, thus increasing the identification process’s resilience.

People who are visually impaired face several obstacles in their daily lives that hinder their pursuit of independence and the effortless acquisition of knowledge. The difficulty of deciphering handwritten text, especially when it contains numbers (Le et al., 2023), stands out as a significant obstacle. Because of the complexity of handwritten numbers, it is often difficult for those with visual impairments to obtain this information independently.

            This study aspires to address this challenge by suggesting a novel approach that makes use of the interplay between cutting-edge tools for visual recognition (Valencia and Alimohammad, 2023) and voice synthesis (Supakar et al., 2023). Our major goal is to develop technology that not only accurately deciphers handwritten numerical input but also effortlessly converts the visual data into audible language. As a result, we want to provide a game-changing resource for people with visual impairments to have easier access to numerical information by bridging the gap between visual content and aural cognition.

From postal code identification to the digitisation of historical documents, handwritten digit recognition (Mondal et al., 2023) has been an enduring, fundamental challenge in computer vision. In recent years, the introduction of DL has transformed the landscape of pattern recognition tasks, enabling unparalleled accuracy and efficiency. In this introduction, we examine DL’s (Larochelle et al., 2009) application to handwritten digit identification, focusing on the revolutionary changes it has brought to the reliability and adaptability of recognition tools.

In the past, feature engineering and rule-based algorithms were used for handwritten digit recognition, but these methods struggled to generalise across different writing styles and variants. DL, and specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has proven effective at automatically learning hierarchical representations of data, making these models particularly well suited to the complexity of handwritten characters.

            In particular, CNNs excel in recognising patterns and features at various sizes within pictures by capturing spatial hierarchies. CNNs, when applied to the problem of handwritten digit identification (Yang et al., 2023), can distinguish subtle nuances and stylistic variances in writing to achieve exceptional accuracy in the presence of noise or distortions.

Not only CNNs (Kiani and Xia, 2023; Zivasatienraj and Doolittle, 2023) and RNNs (Bhattacharjee et al., 2023; Bordoni et al., 2023) but also more modern designs, such as the Hopfield Recurrent Neural Network tuned with the Grasshopper Optimisation Algorithm, have shown improved sequential learning capabilities. Capturing the temporal interdependence of handwritten characters, such as the sequence and structure of a digit’s strokes, is important for recognition.

            Automatically learning suitable features from raw data is one of the primary benefits of DL (Bhattacharjee et al., 2023; Bordoni et al., 2023; Carrasquilla et al., 2023) for handwritten digit identification since it eliminates the requirement for feature engineering. Because of this malleability, DL models may generalise well to different datasets and accommodate different writing styles.

            This investigation will help us better understand the structures, approaches, and accomplishments that have driven the area of DL for handwritten digit identification as we dive further into its mechanics. To demonstrate the revolutionary potential of DL in improving the precision and efficiency of handwritten digit identification systems, the following sections will describe certain models, methodology (Tausani, n.d.), and their implementations.

            The project hopes to help those who are blind or have low vision by translating written text into speech using state-of-the-art technology like optical character recognition and machine learning. Text-to-speech (TTS) technology is used to verbalise the detected digits, making numerical material easy to understand and engage with even for the visually impaired.

            As an important step toward equality, this study aims to equip the visually challenged population by reducing the difficulties of interpreting handwritten digits. In addition to improving digit identification accuracy, the combination of HRNN-GOA and Haralick features (Sánchez-Sánchez et al., 2022) shows promise in promoting more freedom and improving the quality of life for those with visual impairments. With an eye toward a more accessible and inclusive future, the following sections explore the technique, outcomes, and consequences of this novel approach.

            As our culture transitions from the industrial to the postindustrial age, information management and storage become more vital. As a result, pattern recognition is now at the forefront of cutting-edge engineering and scientific study. Pattern recognition is an essential feature of most machine intelligence systems (Sánchez-Sánchez et al., 2022) developed for decision making. Using a camera to take pictures, a machine vision system then analyses those pictures to describe what it sees, making pattern recognition an important part of the field of machine vision.

            Machine vision systems are often used in the industrial sector, particularly for visual inspection and assembly line automation. For instance, the field of machine vision focuses on the development of integrated mechanical-optical electronic-software systems (Ali et al., 2023) for inspecting raw materials, finished products, and production procedures for flaws and for enhancing quality, operational efficiency, and safety.

The standard three-step procedure used by Mehta et al. (2019) comprised preprocessing, feature extraction, and classification. As initial steps, they performed skeletonisation, thresholding, line segmentation, character segmentation, and size normalisation. The character image was divided into n × n zones, and the pixel density in each zone determined which features were extracted. Excluding Sanskrit loanwords, a total of 106 classes were classified using a support vector machine (SVM) (Boymatova, n.d.) classifier. They trained the model on their own 35,441-character dataset, while their 6048-character test set included only 34 unique Tamil characters, which they recognised with an accuracy of 82.04%.

Moya-Albor et al. (2023) extracted character glyphs’ height, breadth, number of horizontal and vertical lines, number of circles, number of horizontally and vertically oriented arcs, image centroid, location, and pixels in different areas. Prior to feature extraction, preprocessing approaches such as skew detection, smoothing, thresholding, and skeletonisation were employed. The complete printed page was scanned in its entirety as input; the image was then divided into its component characters, and the characteristics were retrieved from those characters. SVM, self-organising map (Pugliese et al., n.d.), fuzzy network, RCS algorithm, and radial basis function classifiers were used to categorise the retrieved characteristics. However, the training and testing dataset sizes were not specified.

In Weyori et al. (2023), a wavelet transform was employed for feature extraction and a backpropagation neural network for classification; together, these methods achieved an 89% recognition accuracy. As far as we are aware, only Ponce et al. (2023) have reported work using CNNs for this task. Using the HPLabs dataset, a CNN was built for 35 different classes. Dropout, tanh activations, probabilistic weighting, tanh normalisation, and stochastic pooling were all tested, with a reported accuracy of 94.4%.

            In order to recognise handwritten Tamil characters without an internet connection, Chethan et al. (2023) relied on structural and statistical factors. Quad tree construction relied on calculating pixel densities from the segmented character images. The SVM classifier was then used to categorise these characteristics. For offline Tamil handwritten letter identification, Kiruthika and Manivannan (2023) created the closest interest point classifier using a neural network and speeded up robust features (SURF) descriptors. They applied their methods to the HPLabs dataset and achieved 90.2% accuracy across 156 classifications.

Before the success of DL algorithms in computer vision applications, traditional approaches retrieved manually engineered features from images as their feature representations. SIFT, SURF, histogram of oriented gradients, and bag-of-words were often used as feature representations in computer vision. CNNs became the preferred alternative following the ILSVRC 2012 (Li et al., 2023) challenge, when they virtually halved the error rate; because of this, CNNs now underpin all cutting-edge computer vision techniques. Even without powerful graphics processing units or laborious preprocessing techniques, handwritten digit identification was accomplished via a backpropagation network (Wan et al., 2023). That study used ZIP code digits supplied by the US Postal Service and achieved an error rate of 1%.

For handwritten digit recognition on the Modified National Institute of Standards and Technology (MNIST) dataset, Wan et al. (2023) used a gradient-based learning approach. It was the first model to use a CNN and was called LeNet. Different parameterised versions of the LeNet model were created, but the most successful and widely used version is LeNet-5.

            Collectively, these studies show a shift from traditional Hopfield learning methodologies, with an emphasis on optimising parameters to improve classification precision. This research is motivated by the idea that radiomics properties may be retrieved without resorting to data augmentation or direct input.

The remainder of this paper is organised as follows: the materials and methods are described in the next section.

            The Dataset Section presents a high-level summary of our model.

In the Results and Discussion Section, we evaluate HRNN-GOA against the state-of-the-art algorithms on the MNIST and IAM datasets, using a number of different metrics.

            The Conclusion Section concludes by reviewing the work’s scientific achievements and outlining future research directions.

            MATERIALS AND METHODS

The use of AI and Hopfield learning to translate handwritten numbers into spoken language is helpful for the visually impaired. An intriguing strategy with the potential to increase precision is the incorporation of radiomics characteristics into a model built on the HRNN-GOA (Hopfield Recurrent Neural Network with Grasshopper Optimisation Algorithm). The following is a high-level summary of the proposed strategy:

            1. Data collection and preprocessing:

  • Gather a large collection of handwritten numbers and label the dataset with the appropriate numeric categories.

              • The photos should be preprocessed in order to remove unwanted elements before further analysis.

  • Prepare the images for radiomics analysis by extracting relevant characteristics. To characterise handwritten digits, “radiomics”, a technique that typically involves extracting a large number of quantitative descriptors from medical images, can be adapted.

            2. Model architecture:

              • As the starting point, implement an HRNN (Hopfield Recurrent Neural Network). RNNs excel at capturing sequential dependencies, making them ideal for sequence data like handwritten numbers.

  • Extend the design with the Grasshopper Optimisation Algorithm (GOA) to tune the network’s parameters, and couple the recognition output to a speech synthesis stage. GOA has the potential to improve upon a conventionally trained output layer.

            3. Training:

              • Create a training set and a validation set from your data.

  • Train your HRNN-GOA model using the annotated handwritten digit pictures. To train the speech synthesis stage, you will require a dataset of recorded audio samples corresponding to each digit class.

              • If you have access to transfer learning or a big dataset of handwritten digits (like MNIST), pretrain on that dataset before honing your model on the radiomics dataset.

            4. Evaluation:

              • Test your model using data that wasn’t used during training, ideally data that contain handwritten numbers.

              • Accuracy, precision, recall, F1-score, and even user ratings of the produced voice’s quality are all possible measures of success.

            5. Integration for visually challenged users:

              • Create a user-friendly interface that accepts handwritten numbers from people who are visually impaired using touchscreen devices or digital pens.

              • Your trained HRNN-GOA model should then be fed with the preprocessed input pictures.

              • The results of the model should be made audible to the user using a TTS system.

            6. Accessibility and user testing:

  • Make sure that the system works as intended by conducting user tests and soliciting input from people with visual impairments.

              • Implement accessibility features such as voice commands and screen readers to enhance usability.

            7. Deployment and maintenance:

              • Install it on a platform that will be readily available to your intended audience.

              • Maintain and update the system on a regular basis, taking user input into account to enhance the model’s accuracy and performance.

This project uses image analysis, Hopfield learning, and adaptive technology to aid the visually handicapped. To create a truly effective and user-friendly solution, consult accessibility professionals and seek input from persons with visual impairments at every stage of development. In this part, we break down the overall process flowchart depicted in Figure 1.

            Figure 1:

            Projected flow diagram.

            Dataset
            MNIST dataset

The MNIST database is often used for testing and benchmarking image classification techniques in machine learning and computer vision. It contains 60,000 grayscale 28 × 28 images of handwritten digits (0-9) as the training set and 10,000 images as the test set.

            Machine learning models are sought to properly categorise images of numbers from the MNIST dataset based on the values of their pixels. Specifically for handwritten digit identification tasks, the dataset has been instrumental in the creation and assessment of several image recognition systems. It offers a common metric for comparing the efficacy of various image categorisation methods and algorithms.
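For readers who wish to reproduce the setup, the following is a minimal sketch of loading MNIST, assuming the Keras API of TensorFlow (the dataset also ships with most other DL frameworks):

# Minimal sketch: loading the MNIST digits with Keras (assumes TensorFlow is installed).
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)

# Pixel values are 0-255 grayscale; scale to [0, 1] before feeding a network.
x_train = x_train / 255.0
x_test = x_test / 255.0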

            IAM handwriting dataset

Written English text forms may be found in the IAM Handwriting Database, making it a useful tool for training and testing handwritten text recognisers and for authorship verification and identification research.

This database, first presented in Muthureka et al. (2023) at ICDAR 1999, has been essential in driving forward developments in handwriting recognition systems. Notably, it helped advance a hidden Markov model-based handwritten sentence recognition system, as described in He et al. (2023) at ICPR 2000. The database’s second iteration used a segmentation strategy published in Mekapothula et al. (2023) and presented at ICPR 2002. As of October 2002, a detailed description of the IAM database may be found in Chandra et al. (n.d.).

            Our studies rely heavily on the IAM Handwriting Database, and further information about its use may be found in our written works. Unrestricted handwritten text forms are scanned at 300 dpi and stored as 256-level PNG pictures in the database. The picture below graphically depicts examples of a whole form, a text line, and some extracted words. Several images from the DIDA dataset (Ponce et al., 2023) are shown for visual comparison in Figure 2b.

            Figure 2:

            (a) Random MNIST handwritten numbers plotted. (b) Handwritten numbers from the IAM dataset. Abbreviation: MNIST, Modified National Institute of Standards and Technology.

            Noise removal and image conversion

When preparing images for uses like digit recognition, preprocessing handwritten digits is an important first step. The process starts with obtaining images, often taken from publicly available datasets like MNIST. The first step is to convert the images to grayscale so that everything is uniform and easy to work with. Next, thresholding methods are used to separate the foreground (the digits) from the background. Noise, a common difficulty in image data, is handled methodically using a succession of approaches: the image is smoothed with a Gaussian blur, and salt-and-pepper noise is reduced using a median filter. The image may be further refined using sophisticated denoising methods like non-local means or total variation denoising. Binarisation converts the image to a black-and-white format, which is more suitable for digit identification. Image imperfections are smoothed out by morphological processes like erosion and dilation. Connected-component analysis is useful for identifying and discarding irrelevant information. Consistent size and alignment are achieved by normalisation, while attention is directed to the region of interest (the handwritten numerals) through resizing or cropping. Contrast adjustment or edge sharpening may be applied if needed. Handwritten digit images benefit greatly from this iterative and custom preprocessing pipeline, which boosts their quality and clarity to set the stage for successful identification later on.
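A minimal sketch of this pipeline, assuming OpenCV; kernel sizes and the use of Otsu’s threshold are illustrative choices rather than values prescribed by this work:

# A minimal sketch of the preprocessing pipeline described above, using OpenCV.
# Kernel sizes and threshold settings are illustrative assumptions, not values from the paper.
import cv2
import numpy as np

def preprocess_digit(image: np.ndarray) -> np.ndarray:
    # 1. Grayscale conversion for uniformity.
    if image.ndim == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # 2. Gaussian blur to smooth the image.
    smoothed = cv2.GaussianBlur(image, (5, 5), 0)
    # 3. Median filter to suppress salt-and-pepper noise.
    denoised = cv2.medianBlur(smoothed, 3)
    # 4. Binarisation (Otsu picks the threshold automatically).
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 5. Morphological opening (erosion then dilation) to clean up artefacts.
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # 6. Normalise size so all digits share the same resolution.
    return cv2.resize(opened, (28, 28), interpolation=cv2.INTER_AREA)

# Stand-in image; in practice, pass a scanned digit image here.
digit = preprocess_digit((np.random.rand(100, 100, 3) * 255).astype(np.uint8))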

(1) $g(x,y;\lambda,\theta,\psi,\sigma,\gamma)=\exp\left(-\frac{x'^2+\gamma^2 y'^2}{2\sigma^2}\right)\exp\left(i\left(2\pi\frac{x'}{\lambda}+\psi\right)\right)$

(2) $g(x,y;\lambda,\theta,\psi,\sigma,\gamma)=\exp\left(-\frac{x'^2+\gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda}+\psi\right)$ (real part)

(3) $g(x,y;\lambda,\theta,\psi,\sigma,\gamma)=\exp\left(-\frac{x'^2+\gamma^2 y'^2}{2\sigma^2}\right)\sin\left(2\pi\frac{x'}{\lambda}+\psi\right)$ (imaginary part),

where

• $(x', y')$ are the rotated image coordinates.

• $\lambda$ is the wavelength of the sinusoidal carrier.

• $\theta$ is the orientation of the filter.

• $\psi$ is the phase offset.

• $\sigma$ is the standard deviation of the Gaussian envelope.

• $\gamma$ is the spatial aspect ratio.

Equations (1)-(3) define the Gabor filter, a linear filter used for texture analysis and edge detection in image processing. The rotated coordinates used in these equations are given by

(4) $x' = x\cos\theta + y\sin\theta$

(5) $y' = -x\sin\theta + y\cos\theta.$

These expressions rotate the image coordinates $(x, y)$ by the orientation angle $\theta$, so that the filter responds most strongly to structures aligned with that orientation. Such filters are commonly used in image-processing tasks such as edge detection, texture analysis, and feature extraction.

In a nonlinear Gaussian filter, the response at a location $p$ is a weighted average of neighbouring values $f(q)$, with weights of the form $g_{\sigma_z}(\lVert q-p\rVert)\,g_{\sigma_r}(f(q)-f(p))$ normalised by a factor $N_p$. Both the initial data $f$ and the current location $p$ affect the renormalisation factor $N_p$. Operators of this kind are often used by robust statistical estimators.

Note that nonlinear Gaussian filters prevent outliers from skewing the average, even within highly structured signals.

An HRNN (Li et al., 2023) using a nonlinear Gaussian filter may be computationally intensive because of the equation’s dependence on the Gaussian kernel.

To further aid in noise reduction, a median filter with window length T is added to the nonlinear Gaussian filter:

(6) $Y(t)=\operatorname{median}\big(x(t-T/2),\,x(t-T/2+1),\,\ldots,\,x(t),\,\ldots,\,x(t+T/2)\big),$

where $Y(t)$ is the median-filtered output over a sliding window of length $T$.

The image’s edges are softened using morphological smoothing, which takes a statistical approach to eliminating noise (Wan et al., 2023). Erosion removes pixels at object boundaries, turning binary 1 values to 0, whereas dilation adds pixels at the boundaries.

            Dilation and erosion of a grayscale image f(x) by another image g are the most general translation-invariant morphological transformations:

(7) $(f\oplus g)(x)=\sup_{y\in D}\{f(x-y)+g(y)\}$

(8) $(f\ominus g)(x)=\inf_{y\in D}\{f(x+y)-g(y)\}.$

Erosion followed by dilation is called an opening, and it smooths edges by reducing the prominence of sharp distinctions; small gaps are filled and small dark features are suppressed. The median filter and morphological smoothing together produce the preprocessed image.
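The opening operation of Equations (7) and (8) can be sketched with SciPy’s grayscale morphology; the flat 3 × 3 structuring element below is an illustrative assumption:

# Grayscale dilation and erosion as in Equations (7) and (8), using SciPy.
# The flat 3x3 structuring element is an illustrative choice.
import numpy as np
from scipy import ndimage

f = np.random.randint(0, 256, (64, 64)).astype(float)  # stand-in grayscale image
g = np.zeros((3, 3))  # flat structuring element: g(y) = 0 on its support

dilated = ndimage.grey_dilation(f, structure=g)   # (f (+) g)(x) = sup_y f(x - y) + g(y)
eroded = ndimage.grey_erosion(f, structure=g)     # (f (-) g)(x) = inf_y f(x + y) - g(y)

# Opening = erosion followed by dilation; smooths edges and removes small bright specks.
opened = ndimage.grey_dilation(ndimage.grey_erosion(f, structure=g), structure=g)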

            Feature extraction

Haralick feature extraction is a cornerstone process for describing texture patterns in images for use in several image analysis and computer vision tasks. The first step is the generation of cooccurrence matrices, which involves counting pixel-intensity pairings according to predetermined spatial relationships (e.g., along a horizontal, vertical, or diagonal axis). When these matrices are normalised, the resulting probabilities serve as building blocks for computing individual Haralick characteristics. Characteristics like the angular second moment, entropy, contrast, correlation, and homogeneity shed light on the texture’s consistency, randomness, regional variation, linear dependencies, and proximity to the diagonal. Spatial averaging, in which the value of each Haralick trait is averaged over all spatial relationships, is often used to improve resilience. When these texture descriptors are combined into a feature vector, they can serve as input to machine learning algorithms for tasks like image categorisation and segmentation. The subtleties of texture patterns greatly affect image interpretation and analysis, making Haralick characteristics especially useful in fields like medical imaging and remote sensing.
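A sketch of this feature-extraction step, assuming scikit-image (version 0.19 or later for the graycomatrix/graycoprops names); the distances and angles are illustrative choices:

# A sketch of Haralick-style texture features from a grey-level cooccurrence matrix
# using scikit-image. Distances and angles are illustrative choices.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in image

# Cooccurrence matrices for 4 orientations at distance 1, normalised to probabilities.
glcm = graycomatrix(image, distances=[1],
                    angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                    levels=256, symmetric=True, normed=True)

# Standard Haralick-style properties, spatially averaged over the four orientations.
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("ASM", "contrast", "correlation", "homogeneity")}

# Entropy is not provided by graycoprops, so compute it from the matrix directly.
p = glcm.astype(float)
features["entropy"] = float(-np.sum(p * np.log2(p, where=p > 0, out=np.zeros_like(p))))

feature_vector = np.array(list(features.values()))  # input for a downstream classifier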

(9) $G_E(X)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{X^2}{2\sigma^2}}\cos(2\pi w_0 X)$

(10) $G_O(X)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{X^2}{2\sigma^2}}\sin(2\pi w_0 X),$

where

• $w_0$ defines the centre frequency (the frequency at which the filter yields the greatest response).

• $\sigma$ is the spread of the Gaussian window.

• Gabor filters are tunable bandpass filters, akin to a windowed Fourier transform, and attain the lower bound on joint time-frequency resolution. A Gabor filter offers orientation selectivity, spectral bandwidth selectivity, and spatial resolution selectivity. A 2D Gabor filter is a Gaussian kernel function modulated by a sinusoid in the spatial domain.

            • A 2D Gabor filter over the image domain (x, y) is given by

(11) $G(x,y)=\exp\left(-\frac{(x-x_0)^2}{2\sigma_x^2}-\frac{(y-y_0)^2}{2\sigma_y^2}\right)\times\exp\big(2\pi i\,(u_0(x-x_0)+v_0(y-y_0))\big),$

            where

• $(x_0, y_0)$ is the centre of the filter in the image.

• $(u_0, v_0)$ specifies the modulation, which has frequency $w_0=\sqrt{u_0^2+v_0^2}$ and orientation $\theta_0=\arctan(v_0/u_0)$.

• $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope.
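To make Equation (11) concrete, the following is a direct, minimal implementation of the complex 2D Gabor kernel; all parameter values are illustrative assumptions:

# A direct implementation of the complex 2D Gabor kernel in Equation (11).
# All parameter values below are illustrative assumptions.
import numpy as np

def gabor_kernel(size=31, x0=0.0, y0=0.0, sigma_x=4.0, sigma_y=4.0, u0=0.1, v0=0.1):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-((x - x0) ** 2) / (2 * sigma_x ** 2)
                      - ((y - y0) ** 2) / (2 * sigma_y ** 2))
    carrier = np.exp(2j * np.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier

kernel = gabor_kernel()
w0 = np.hypot(0.1, 0.1)           # centre frequency, w0 = sqrt(u0^2 + v0^2)
theta0 = np.arctan2(0.1, 0.1)     # orientation, theta0 = arctan(v0/u0)
# Convolving an image with kernel.real and kernel.imag gives the even/odd responses.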

            Images have textural attributes if they include distinguishing features that characterise the surface texture of objects or areas in the picture, such as patterns, variations, or structures. These characteristics are essential in fields like computer vision, pattern recognition, and picture analysis. In three-dimensional image analysis, the phrase “26-connected voxels” implies a complete connectivity scheme that considers each voxel in a three-dimensional space related to its 26 immediate neighbours. This connection forms a three-dimensional lattice of linked voxels, with six neighbouring faces, twelve neighbouring edges, and eight neighbouring vertices. The voxel and its face-adjacent neighbours constitute a cubic structure, with the face-adjacent connections spanning the positive and negative directions along the x, y, and z axes, and the edge-adjacent and vertex-adjacent connections extending along the edges and vertices. In three-dimensional image-processing tasks such as segmentation and object analysis, a thorough grasp of voxel interactions is crucial, and this 26-connected neighbourhood is at the centre of it all. The 26-connected scheme, which uses a more comprehensive definition of connectedness, captures a wider variety of spatial interactions and thus offers a more complete picture of the three-dimensional arrangement of voxels inside volumetric data.
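The 26-connected neighbourhood is easy to enumerate programmatically; the short check below confirms the 6/12/8 split of face, edge, and vertex neighbours:

# The 26-connected neighbourhood of a voxel: every offset in {-1, 0, 1}^3 except the origin.
import itertools

offsets = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
faces    = [d for d in offsets if sum(abs(c) for c in d) == 1]  # 6 face neighbours
edges    = [d for d in offsets if sum(abs(c) for c in d) == 2]  # 12 edge neighbours
vertices = [d for d in offsets if sum(abs(c) for c in d) == 3]  # 8 vertex neighbours
assert len(offsets) == 26 and (len(faces), len(edges), len(vertices)) == (6, 12, 8)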

            Analysis of text using HRNN-GOA

An advanced method of natural language processing (NLP) is the use of a Hopfield Recurrent Neural Network tuned with the Grasshopper Optimisation Algorithm for text analysis. By exploiting the strength of recurrent structures, HRNN-GOA can model sequential relationships in textual data, and the optimisation stage substantially improves the model’s ability to capture textual context and long-range relationships. By tuning the network to weight only the most important aspects of the input, the approach selectively emphasises relevant information across distinct segments of the input sequence. In tasks like sentiment analysis, document summarisation, and language generation, where it is crucial to recognise the links between far-apart words or phrases, this selective weighting becomes very useful. The HRNN-GOA architecture not only excels at capturing complex patterns within textual input but also offers a degree of interpretability, revealing which portions of the text contribute most substantially to the model’s predictions. For superior text interpretation and processing, HRNN-GOA integrates the strengths of recurrent structures with swarm-based parameter optimisation.

The Walsh-Hadamard transform (WHT) of a signal $x$ of length $N=2^m$ is given by

(12) $X_W=\frac{1}{N}\,H_m\,x,$

where

• $x$ is recovered by the corresponding inverse transform, given as

(13) $x=H_m\,X_W$

• $H_m$, the square and symmetric Hadamard transform matrix of order $m$, is recursively defined as

(14) $H_m=\begin{pmatrix}H_{m/2} & H_{m/2}\\ H_{m/2} & -H_{m/2}\end{pmatrix}$

for $m > 1$ and $m = 2^k$, with $H_1=[1]$. Since $H_m$ contains only +1 or −1 entries, the transformation requires only real additions and subtractions.
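A minimal NumPy sketch of Equations (12)-(14); the recursive construction of the Hadamard matrix mirrors Equation (14):

# A sketch of the Walsh-Hadamard transform from Equations (12)-(14).
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Hadamard matrix of order n (n must be a power of 2), built per Equation (14)."""
    if n == 1:
        return np.array([[1]])
    h = hadamard(n // 2)
    return np.block([[h, h], [h, -h]])

def wht(x: np.ndarray) -> np.ndarray:
    n = len(x)
    return hadamard(n) @ x / n          # Equation (12): forward transform scales by 1/N

def inverse_wht(X: np.ndarray) -> np.ndarray:
    return hadamard(len(X)) @ X         # Equation (13): inverse transform

x = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
assert np.allclose(inverse_wht(wht(x)), x)  # only +1/-1 entries: additions/subtractions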

            Neural network optimisation

            Training neural networks requires optimisation to improve performance, efficiency, and generalisation on a specific job. Neural networks are optimised using a variety of methods.

The network’s capability of modelling complicated interactions is affected by the choice of proper activation functions. Common possibilities include the Rectified Linear Unit, sigmoid, and tanh, each suited to distinct contexts.

The network weights must be initialised correctly. Training problems, such as vanishing or exploding gradients, may be avoided with methods like Xavier/Glorot initialisation and He initialisation.

            Convergence may be improved by adjusting the learning rate. Methods like learning rate annealing, learning rate scheduling, and adaptive strategies (like Adam and RMSprop) are often used.
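As an illustration only, the following PyTorch fragment combines He initialisation, the Adam optimiser, and a step learning rate schedule; the layer sizes and hyperparameters are arbitrary:

# Illustrative PyTorch setup combining He initialisation, Adam, and a learning rate schedule.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")  # He initialisation
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive learning rate
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # anneal every 10 epochs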

(15) $\dfrac{\partial c(s,\sigma)}{\partial s}=\dfrac{2s}{2+\frac{s^2}{\sigma^2}}=\psi(s,\sigma)$

(16) $\displaystyle\int\frac{\partial c(s,\sigma)}{\partial s}\,ds=\sigma^2\log\left(1+\frac{1}{2}\frac{s^2}{\sigma^2}\right)=\rho(s,\sigma)$

            An RNN layer takes in data at each time step and uses that data to modify its memory, or its hidden state. Input history is stored in this hidden state and used to determine how new data are handled. RNNs are adaptable for applications like NLP, voice recognition, and time-series prediction because of their design, which enables them to handle input sequences of varied lengths.

Despite their efficiency, classic RNNs have limitations, such as difficulty in capturing long-term dependencies and vulnerability to vanishing or exploding gradients during training. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers are two examples of the more sophisticated variations designed to deal with these problems. These variations improve sequential pattern learning by including methods for selective memory storage and recall.

(17) $\{x_i\}_{i=1}^{t}=\{x_1,\ldots,x_t\}$

In conclusion, RNN layers are crucial parts of neural network designs meant for sequential data processing. They are particularly effective at tasks that call for a comprehension of context and temporal relationships, and variations such as LSTM and GRU have significantly boosted their capacity to grasp long-term dependencies in sequences.
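A minimal vanilla RNN step in NumPy illustrates the hidden-state mechanics described above; the dimensions and the tanh nonlinearity are conventional choices, not specifics of this work:

# A minimal vanilla RNN cell in NumPy: the hidden state carries the input history.
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 8, 16, 5
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for t in range(T):                      # handles sequences of any length
    x_t = rng.normal(size=input_dim)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)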

            Mechanism of hidden layers

$T_n$ is the result or forecast at time $n$ and reflects what the network has learnt thus far. One component reflects the cumulative effect of the hidden states $h(x_{n-1})$, multiplied by their respective coefficients $c_i$; this represents the network’s recurrent nature, where previous data shape the present forecast.

The other component involves a sum over features or units $j$, each contributing to the prediction based on the current input, with $h^*(x_{n-|v_{nj}|+1})$ representing the hidden-layer activation, possibly with an additional parameter $v_{nj}$. In summary, the equation represents an RNN’s prediction at time $n$, where the current prediction is the sum of contributions from past hidden states and from features of the current input. The parameters $c_i$, $a_j$, and $v_{nj}$ are learnt during training to optimise the network for a specific task. Without additional context or specific definitions for $h(x_{n-1})$ and $h^*(x_{n-|v_{nj}|+1})$, a more detailed interpretation depends on the specific architecture and purpose of the RNN. The prediction is:

(18) $T_n=\sum_{i=1}^{n}c_i\,h(x_{n-1})+\sum_{j=1}^{k}a_j\,h^*(x_{n-|v_{nj}|+1})$

The initial summation on the right side of Equation (18) is considered to be zero on the suggested scale.

The remaining function $h^*(\cdot)$ is derived as follows.

(19) $h^*(x_{[np_j]})=\begin{cases}\operatorname{Med}\{x_i<[np_j]\}, & \text{when } 1\le i\le [np_j]-1\\ \operatorname{Med}\{x_i\ge [np_j]\}, & \text{when } [np_j]\le i<n\end{cases}$

(20) $T_n([np_j])=\frac{1}{2}\big(\operatorname{Med}\{x_i<[np_j]\}+\operatorname{Med}\{x_i\ge[np_j]\}\big)=\frac{1}{2}\big(L_{|s,j|}+H_{(s_p,1)}\big)$

            Now, an objective function is developed for optimisation as a means of determining the heterogeneity scale parameter.

(21) $t=\min_{\tau}\sum_{|p_j|}\big|T_s([np_j])-\tau\big|$

(22) $t=\operatorname{Med}\Big\{\tfrac{1}{2}\big(L_{|mp_j j|}+H_{|ap_j|}\big)\Big\}.$

This estimator has the benefit of being applicable in any setting, since the threshold is calculated at each iteration as the average of the medians above and below it. As a result, $T_n$ does not depend on a single physical location.

Let $l_1,\ldots,l_m$ denote the $m$ possible grey levels of an image, and let $x_1,\ldots,x_n$ denote the intensity values of an example picture $U(x)$, where $x\in\mathbb{R}^2$ indexes the image plane.

Compute the average

(23) $\tau_j=\frac{\nu_j+p_j}{2}\quad\text{for }j=1,\ldots,l$

and take

(24) $t=\operatorname{Med}[T_j],$

where $\operatorname{Med}[\cdot]$ denotes the median.

            Grasshopper optimisation method

Grasshoppers are insects that impair soil fertility as well as farming, and they are often considered nuisances. Despite their solitary nature, grasshoppers are capable of forming one of the biggest swarms in the animal kingdom. When these swarms reach the size of a continent, they become a veritable time bomb for farmers. Swarming behaviour is shown by both nymphs and adults, which is a notable aspect of grasshoppers. Billions of nymph grasshoppers jump and move like rolling cylinders, consuming almost all the vegetation in their path. When these nymphs mature into adults, they form airborne swarms.

During the larval stage, the swarm’s distinguishing characteristic is the gradual and steady advance of the grasshoppers; during the adult phase, the swarm’s defining trait is long-range, unexpected movement. Grasshopper swarms also share another fundamental characteristic: the pursuit of food and water. Nature-inspired methods conceptually separate the search into exploration and exploitation phases: search agents make short, local moves during exploitation but travel widely during exploration. Grasshoppers in nature perform these two tasks in addition to seeking targets.

            The Grasshopper Optimisation Algorithm (GOA) offers an unusual strategy for handwritten feature categorisation that is grounded in nature-inspired optimisation. Although GOA is often used for tackling complicated optimisation issues, its applicability in the context of handwritten character recognition extends to feature selection and parameter optimisation. GOA tries to repeatedly refine feature subsets for increased accuracy by modelling them as a population of grasshoppers, each of which is assessed based on a predetermined objective function linked to classification performance.

            The algorithm’s ability to both locally and globally explore data is a good fit for the subtlety of handwritten data, where the selection of relevant attributes is crucial. GOA provides a dynamic approach to improving the performance of classification models, whether it is used for feature subset selection or parameter adjustment. By combining GOA with machine learning algorithms, a hybrid strategy may be implemented, one that makes use of the algorithm’s innate swarm intelligence to get superior classification results. Optimal feature subsets and parameter settings for accurate and efficient character recognition may be uncovered using GOA, making it a promising method for improving the performance of handwritten feature classification systems.

            Classification phase

            In this section, we detail the recommended classification approach (GOA+HRNN), the end result of which is to identify the optimal values for the HRNN’s input parameters.

            GOA structure initialisation

The numerical simulation employed to model the grasshoppers’ swarming behaviour follows Equation (25):

(25) $P_k=Q_k+L_k+B_k$

Here, $P_k$ indicates the $k$th grasshopper’s position, $Q_k$ the social interaction, $L_k$ the force of gravity, and $B_k$ the wind advection.

To introduce random behaviour, Equation (25) can be reformulated as $P_k=rn_1\,Q_k+rn_2\,L_k+rn_3\,B_k$, where $rn_1$, $rn_2$, and $rn_3$ are random numbers in [0, 1]. The social interaction is

(26) $Q_k=\sum_{l=1,\,l\neq k}^{M}q(d_{kl})\,\hat{d}_{kl}$

Here, $d_{kl}=|x_l-x_k|$ indicates the distance between the $k$th and $l$th grasshoppers, and $\hat{d}_{kl}=(x_l-x_k)/d_{kl}$ is the corresponding unit vector.

            Training using the proposed work

            Having a precise model of the issue at hand is crucial when training HRNNs using metaheuristics. Previous research has focused extensively on the weights and biases used in the training of HRNNs. To improve classification precision, trainers must first tweak weights and biases. To get the most out of the GOA algorithm, your inputs should be crafted in accordance with the following guidelines:

The $q$ function, which defines the social forces, is evaluated per Equation (27):

(27) $q(rn)=R\,e^{-rn/t}-e^{-rn}$

In this equation, $R$ indicates the intensity of attraction and $t$ the attractive length scale. The gravity component $L_k$ is evaluated in accordance with Equation (28):

(28) $L_k=-f\,\hat{e}_f,$

where $f$ indicates the gravitational constant and $\hat{e}_f$ a unit vector towards the Earth’s centre. Equation (29) evaluates the wind-advection component $B_k$:

(29) $B_k=v\,\hat{e}_\mu,$

where $v$ indicates a constant drift and $\hat{e}_\mu$ a unit vector in the wind direction. Since nymph grasshoppers have no wings, their movements are correlated with the wind direction. Substituting $Q_k$, $L_k$, and $B_k$ into Equation (25) yields Equation (30):

(30) $P_k=\sum_{l=1,\,l\neq k}^{M}q(|x_l-x_k|)\,\frac{x_l-x_k}{d_{kl}}-f\,\hat{e}_f+v\,\hat{e}_\mu$

Here, $q(rn)=R\,e^{-rn/t}-e^{-rn}$, and $M$ is the number of grasshoppers.

For the resolution of optimisation problems, a stochastic algorithm should efficiently execute exploration as well as exploitation to determine an accurate approximation of the global optimum. To balance exploration and exploitation across the various stages of optimisation, the proposed mathematical model has dedicated parameters. This model can be represented by Equation (31):

(31) $X_i^d=c\left(\sum_{j=1,\,j\neq i}^{N}c\,\frac{ub_d-lb_d}{2}\,s\big(|x_j^d-x_i^d|\big)\,\frac{x_j-x_i}{d_{ij}}\right)+\hat{T}_d.$

Here, $ub_d$ indicates the upper bound in the $d$th dimension and $lb_d$ the lower bound in the $d$th dimension,

(32) $s(r)=f\,e^{-r/t}-e^{-r},$

$\hat{T}_d$ indicates the value of the $d$th dimension in the target (the best solution found so far), and $c$ indicates a decreasing coefficient that shrinks the comfort, repulsion, and attraction areas.

This equation assumes that the wind always blows towards the target $\hat{T}_d$ (the $B$ component) and omits gravity (there is no $L$ component). The inner $c$ reduces the mutual repulsion and attraction between grasshoppers as the iteration counter increases, while the outer $c$ narrows the search area around the target as iterations proceed.

            Classification stage

The term “HRNN-GOA” describes the union of a Hopfield Recurrent Neural Network (HRNN) with the Grasshopper Optimisation Algorithm (GOA). While GOA is an optimisation technique inspired by the swarming behaviour of grasshoppers, HRNN is a neural network design that uses recurrent connections to model sequential dependencies.

Design and train the HRNN for your particular classification task. The network’s architecture must be defined, including the number of layers, the type of neurons, and the ordering of incoming data. Use GOA to fine-tune the HRNN’s parameters.

With GOA, the search is less likely to be stuck at a local optimum, and it converges well to optimal solutions. However, it cannot always execute a global search properly, so there are situations in which GOA cannot discover the best answer. The search strategy of standard GOA is based mostly on decreasing coefficients and random walks.
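A compact sketch of the core GOA loop, following Equations (27) and (31); the bounds, population size, and the toy sphere objective are illustrative stand-ins (in this work the objective would be the HRNN’s validation loss):

# A compact sketch of the Grasshopper Optimisation Algorithm core loop,
# following Equations (27) and (31). Bounds, population size, and the
# sphere objective are illustrative stand-ins for HRNN weight/bias tuning.
import numpy as np

def s(r, R=0.5, t=1.5):
    """Social force, Equation (27): attraction/repulsion as a function of distance."""
    return R * np.exp(-r / t) - np.exp(-r)

def goa(objective, dim=5, n=30, lb=-5.0, ub=5.0, iters=200, c_max=1.0, c_min=1e-4):
    rng = np.random.default_rng(0)
    X = rng.uniform(lb, ub, (n, dim))                   # grasshopper positions
    target = X[np.argmin([objective(x) for x in X])].copy()
    for k in range(iters):
        c = c_max - k * (c_max - c_min) / iters         # shrinking coefficient
        X_new = np.empty_like(X)
        for i in range(n):
            social = np.zeros(dim)
            for j in range(n):
                if i == j:
                    continue
                d = np.linalg.norm(X[j] - X[i]) + 1e-12
                social += c * (ub - lb) / 2 * s(d) * (X[j] - X[i]) / d
            X_new[i] = np.clip(c * social + target, lb, ub)  # Equation (31)
        X = X_new
        best = X[np.argmin([objective(x) for x in X])]
        if objective(best) < objective(target):
            target = best.copy()
    return target

best = goa(lambda x: np.sum(x ** 2))  # toy objective; HRNN validation loss in practice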

            Algorithm

            HRNN-GOA

Input:
- Data D with n samples and m features
- Number of hidden layers Hno
- Number of neurons per hidden layer Nh
- Learning rate α
- Number of training iterations max_iterations
- Gaussian kernel bandwidth σ

Preprocessing (Gaussian):
for each feature column j in D:
    for each data point x in D[j]:
        Calculate the Gaussian kernel density estimate K(x) using σ and the statistics of feature j:
            K(x) = (1 / (σ * √(2π))) * exp(−(x − μ)² / (2 * σ²))
        Replace x in D[j] with K(x)

Training the Grasshopper Neural Network:
Initialise neural network weights and biases randomly
for each iteration from 1 to max_iterations:
    for each data point x in D:
        Perform forward pass:
            Compute the output of the neural network for x
            Apply the activation function for the hidden layers and the output layer
        Compute the discrepancy between the forecast and the actual target
        Use backpropagation to calculate the loss gradients with respect to the weights and biases
        Update the weights and biases by gradient descent with learning rate α

Output:
Trained Grasshopper Neural Network
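A runnable NumPy rendering of the algorithm’s two stages (Gaussian-kernel preprocessing, then gradient-descent training of a small network); the data, layer sizes, and hyperparameters are illustrative assumptions:

# A runnable NumPy sketch of the algorithm above: Gaussian-kernel preprocessing of each
# feature, then a small feed-forward network trained by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(200, 4))                 # stand-in data: n=200 samples, m=4 features
y = (D.sum(axis=1) > 0).astype(float).reshape(-1, 1)
sigma, alpha, max_iterations = 1.0, 0.1, 500

# Preprocessing: replace each value with its Gaussian kernel density estimate
# K(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2)).
mu = D.mean(axis=0)
D = np.exp(-(D - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# One hidden layer; weights and biases initialised randomly.
W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(max_iterations):
    h = np.tanh(D @ W1 + b1)                  # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)                # output layer
    err = out - y                             # discrepancy between forecast and target
    # Backpropagation: gradients of the cross-entropy loss w.r.t. weights and biases.
    dW2 = h.T @ err / len(D); db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h ** 2)            # tanh derivative
    dW1 = D.T @ dh / len(D); db1 = dh.mean(axis=0)
    W1 -= alpha * dW1; b1 -= alpha * db1      # gradient descent update
    W2 -= alpha * dW2; b2 -= alpha * db2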
            TTS conversion

            Recent advancements in TTS technology, primarily propelled by DL methodologies, have significantly transformed the landscape of speech synthesis. Neural network architectures such as WaveNet, Tacotron, and Transformer-based models, including those based on the Generative Pretrained Transformer framework, have revolutionised the generation of highly natural and expressive speech from textual input. These models undergo extensive training on vast datasets, enabling them to capture the intricacies of language, intonation, and phonetic nuances. Consequently, synthesised speech has achieved remarkable improvements in human-likeness, prosody, and reduced artificiality, allowing for a more authentic and engaging auditory experience. Ongoing research endeavours in DL for TTS aim to further refine these systems, focusing on enhancing expressiveness, minimising data requirements, and optimising real-time performance, thus widening the scope and applications of synthesised speech across various domains.

Pseudocode 2

            Conversion of Text to Speech

text_message = get_text_message()
N = length_of(text_message)
I = 1
myphoneme = []
while I <= N:
    phoneme = ""
    while I <= N and phoneme_in_database(phoneme + get_character(text_message, I)):
        phoneme += get_character(text_message, I)
        I += 1
    if phoneme == "":
        I += 1        # no phoneme matches this character; skip it to avoid an infinite loop
    else:
        myphoneme.append(phoneme)
for phoneme in myphoneme:
    append_wav(phoneme, "SpeakIt.wav")   # append this phoneme's waveform to the output file
play_audio_file("SpeakIt.wav")

This pseudocode outlines a simplified TTS conversion process. It begins by obtaining a text message and determining its length. Using iterative loops, it extracts phonemes from the text message based on matches found in a phoneme database, storing these phonemes in an array (characters with no matching phoneme are skipped). Once all phonemes are extracted, it appends each phoneme’s waveform to an audio file named “SpeakIt.wav”. Finally, it plays the audio file, producing the synthesised speech of the input text message.
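In practice, an off-the-shelf TTS engine can stand in for the phoneme-concatenation sketch; the following uses the pyttsx3 library as one such assumed choice:

# A practical alternative to the phoneme-concatenation sketch: the pyttsx3 TTS engine
# (an assumed choice; any TTS engine with a file-output API would do).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)            # speaking rate, words per minute
engine.say("The recognised digit is 7.")   # speak the classifier's output directly
engine.save_to_file("The recognised digit is 7.", "SpeakIt.wav")  # or save it to disk
engine.runAndWait()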

            RESULTS AND DISCUSSION

Using simulated analysis on two separate datasets, we experimentally validate the performance of the proposed model. There are a total of 70,000 images of handwritten digits in the default MNIST collection; only 10,000 of these are used to evaluate the proposed algorithm. We refer to the MNIST dataset as Dataset 1.

The IAM dataset, an archive of handwritten data including 250,000 pictures of single-digit numbers and 100,000 pictures of multi-digit numbers, is also used in the proposed model. There are 1000 examples from each class in the evaluation subset, for a total of 10,000 samples. This dataset is referred to as Dataset 2 in the following sections.

The suggested model is assessed using several performance indicators, including accuracy, precision, recall, F1-score, specificity, and area under the curve (AUC). These indicators are computed from the following counts, with the formulas given after the list:

            • True Positive: The number of correct positive predictions.

            • False Negative: The number of actual positive cases incorrectly predicted as negative.

            • False Positive: The number of actual negative cases incorrectly predicted as positive.

            • True Negative: The number of correct negative predictions.
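From these counts, the metrics follow in the standard way:

$\text{Accuracy}=\dfrac{TP+TN}{TP+TN+FP+FN},\qquad \text{Precision}=\dfrac{TP}{TP+FP},\qquad \text{Recall (sensitivity)}=\dfrac{TP}{TP+FN},$

$\text{Specificity}=\dfrac{TN}{TN+FP},\qquad \text{F1-score}=\dfrac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}},\qquad \text{False-positive rate}=\dfrac{FP}{FP+TN}.$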

            Experimental results for dataset 1

Figure 3 shows the training and validation of images, Figure 4 the flowchart of the GOA method, Figure 5 the confusion matrix of the proposed work, and Figure 6 the training-set statistics of the proposed work.

            Figure 3:

            Training of images and validation stage.

            Figure 4:

            Flowchart of the GOA method.

            Figure 5:

            Confusion matrix.

            Figure 6:

            Trained set statistics.

The receiver operating characteristic (ROC) curve is a graphical representation of the accuracy of a binary classification model, depicting the trade-off between sensitivity (the true-positive rate) and 1 − specificity (the false-positive rate). Figures 7 and 8 show the proposed model’s training and testing ROC curves, respectively, and Figures 9 and 10 the corresponding precision-recall (PR) curves. The suggested model has an AUC larger than 0.95 because it produces largely correct positive findings with a low false-positive rate in both training and testing.

            Figure 7:

            Training ROC curve. Abbreviation: ROC, receiver operating characteristic.

            Figure 8:

            Testing ROC curve. Abbreviation: ROC, receiver operating characteristic.

            Figure 9:

            Training PR curve.

            Figure 10:

            Testing PR curve.

            Experimental results for IAM dataset

Figures 11 and 12 show the Dataset 2 training and testing confusion matrices, respectively. The dataset consists of 500 unique writers who together produced 9862 lines of text. Each of the four subsets of this dataset was deliberately created to serve a distinct role in machine learning model training and testing. The training set contains 6161 text lines representing a broad range of handwriting styles from 283 authors. Validation 1 has 900 text lines and Validation 2 contains 940 text lines; both may be used to tune the model during training. 46 and 43 authors, respectively, contributed to these validation sets, ensuring their variety. Finally, the 1861 text lines from 128 authors in the test set serve as an external benchmark for evaluating the model’s ability to generalise to unseen data. Handwriting recognition and associated applications benefit from this deliberate dataset division, since it allows comprehensive model training, validation, and testing, yielding trustworthy results across a broad variety of handwriting styles.

            Figure 11:

            Overall matrix for training.

            Figure 12:

            Overall matrix for testing.

Figures 13 and 14 show, respectively, the accuracy and loss curves during training and testing.

            Figure 13:

            Actual accuracy rate.

            Figure 14:

            Loss rate.

Table 1 compares the efficacy of various models that address a single goal, in the context of computer vision and image processing. The metrics used to assess each approach include accuracy, precision, recall, specificity, F1 score, and AUC.

            Table 1:

            Validation analysis for training values.

Techniques | Accuracy | Precision | Recall | Specificity | F1 score | AUC score
SVM (Carrasquilla et al., 2023) | 0.7407 | 0.7667 | 0.7377 | 0.7431 | 0.7525 | 0.9190
CNN (Tausani, n.d.) | 0.7673 | 0.9073 | 0.7226 | 0.9216 | 0.7717 | 0.9219
R-CNN (Bappy et al., 2022) | 0.7574 | 0.7966 | 0.7377 | 0.7724 | 0.7667 | 0.9330
DCSOM (Sánchez-Sánchez et al., 2022) | 0.7750 | 0.9016 | 0.7771 | 0.7724 | 0.7943 | 0.9374
GAN (Sánchez-Sánchez et al., 2022) | 0.7574 | 0.7594 | 0.7771 | 0.7235 | 0.7730 | 0.9272
Proposed (MNIST) | 0.9616 | 0.9293 | 0.9062 | 0.9699 | 0.9061 | 0.9953
Proposed (IAM) | 0.9959 | 0.9696 | 0.9696 | 0.9966 | 0.9696 | 0.9686

            Abbreviations: AUC, area under the curve; CNN, convolutional neural network; DCSOM, Deep Convolutional Self-Organising Map; GAN, generative adversarial networks; MNIST, Modified National Institute of Standards and Technology; R-CNN, regions with convolutional neural networks; SVM, support vector machines.

Among the baselines, SVM reaches an accuracy of 74.07% with balanced recall and specificity. CNN improves on this at 76.73%, with notably high precision and specificity. The regions with convolutional neural networks (R-CNN) model attains 75.74% and prioritises precision. The Deep Convolutional Self-Organising Map (DCSOM) achieves the best baseline accuracy (77.50%) and strong recall, while the GAN-based model strikes a balance between precision and recall.

            The proposed technique outperforms the state-of-the-art on two independent datasets (Datasets 1 and 2). With outstanding precision, recall, and specificity, the proposed method achieves a state-of-the-art 96.16% accuracy on Dataset 1. The reliability of the suggested technique is further shown by Dataset 2, which achieves a remarkable 99.59% accuracy with recall, precision, and specificity all >96%. These impressive outcomes point to the suggested method’s superior performance over a wide variety of assessment measures, suggesting that it is more suited to the job at hand than the other investigated methodologies.

            Figures 15 through 20 give graphical representations of the various metrics to provide a visual comparison between testing and training validations.

            Figure 15:

            Comparison with precision rate. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

            Figure 16:

            Comparison with the existing recall rate. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

            Figure 17:

            Performance metrics of specificity. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

            Figure 18:

            Performance metrics of the F1 rate. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

            Figure 19:

            Accuracy comparison. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

            Figure 20:

            Overall accuracy rate. Abbreviations: CNN, convolutional neural network; RNN, recurrent neural network; SVM, support vector machines.

The table above gives a comprehensive overview of the most important metrics used to compare cutting-edge methods of computer vision and image processing. Each approach is evaluated on accuracy, precision, recall, specificity, F1 score, and AUC to give an in-depth picture of its strengths and weaknesses. When precision and specificity are prioritised, CNN and R-CNN show comparable performance; DCSOM performs well in terms of both accuracy and recall, and the GAN baseline holds up across a range of metrics. Figure 21 shows the training loss of the proposed work, and Figure 22 shows its testing accuracy. Dataset 2 is where the proposed method really shines, with accuracy and precision of over 99%, showing that the proposed solution is robust and effective for the task at hand. The AUC scores further confirm the discriminating power of the proposed method. Though promising, these findings should be interpreted with caution given the specifics of the datasets and the potential challenges of deploying these techniques in the real world.

            Figure 21:

            Training loss. Abbreviation: NN, neural network.

            Figure 22:

            Testing accuracy.

            CONCLUSION

The goal of this study is to use DL methods for the identification of handwritten digits. The procedure starts with preprocessing the handwritten digit images to improve their quality; techniques such as convolution and Gaussian filters refine the input data, as sketched below. The subsequent use of the Hopfield Recurrent Neural Network for image categorisation further demonstrates the importance of sophisticated neural network topologies in dealing with sequential data.
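As an illustration of this preprocessing stage, the sketch below applies a Gaussian filter to a greyscale digit image using SciPy. The preprocess_digit helper and the sigma value are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_digit(img, sigma=0.8):
    """Denoise and normalise a greyscale digit image before recognition.

    img: 2-D uint8 array (e.g. a 28x28 MNIST-style digit).
    sigma: illustrative Gaussian width; larger values smooth more.
    """
    img = img.astype(np.float32) / 255.0          # scale to [0, 1]
    smoothed = gaussian_filter(img, sigma=sigma)  # suppress stroke noise
    # Contrast-stretch so the darkest/brightest pixels span the full range.
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo + 1e-8)

# Example on a random stand-in image.
digit = (np.random.default_rng(0).random((28, 28)) * 255).astype(np.uint8)
clean = preprocess_digit(digit)
```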

A significant feature of this study is the inclusion of the HRNN-GOA algorithm, which combines the Hopfield Recurrent Neural Network with the Grasshopper Optimisation Algorithm (GOA) to tune the model's parameters. Coupling DL with such optimisation techniques is essential to refining and accelerating the recognition of handwritten digits.
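The paper does not list its GOA settings, but the following minimal sketch shows the standard GOA update — a comfort-zone coefficient c that shrinks linearly over iterations, and the social-interaction force s(r) = f·e^(−r/l) − e^(−r) pulling each agent toward the best solution found so far — applied to a stand-in validation-loss surface. The goa_minimize function, its defaults, and the toy loss are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def goa_minimize(fitness, dim, lb, ub, n_agents=20, n_iter=100,
                 c_max=1.0, c_min=1e-4, f=0.5, l=1.5, seed=0):
    """Minimal Grasshopper Optimisation Algorithm (GOA) sketch.

    fitness: callable mapping a (dim,) vector to a scalar to minimise.
    lb, ub: scalar bounds applied to every dimension.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))     # grasshopper positions
    costs = np.apply_along_axis(fitness, 1, X)
    best = X[np.argmin(costs)].copy()                 # current target
    best_cost = costs.min()

    s = lambda r: f * np.exp(-r / l) - np.exp(-r)     # social-interaction force

    for t in range(n_iter):
        # Comfort-zone coefficient shrinks linearly over iterations.
        c = c_max - t * (c_max - c_min) / n_iter
        X_new = np.empty_like(X)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if i == j:
                    continue
                d = np.linalg.norm(X[j] - X[i]) + 1e-12
                social += c * (ub - lb) / 2 * s(d) * (X[j] - X[i]) / d
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        costs = np.apply_along_axis(fitness, 1, X)
        if costs.min() < best_cost:
            best_cost = costs.min()
            best = X[np.argmin(costs)].copy()
    return best, best_cost

# Example: tune two hypothetical HRNN hyperparameters against a
# stand-in validation-loss surface (minimum at 0.3, 0.7).
loss = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2
params, val = goa_minimize(loss, dim=2, lb=0.0, ub=1.0)
```

In the HRNN-GOA setting, fitness would be the validation error of an HRNN trained with the candidate parameter vector; the quadratic toy loss above merely keeps the sketch self-contained.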

Accuracy, precision, recall, specificity, AUC, F1-score, and false-positive rate are among the measures used to assess the model’s efficacy. Taken together, these measures give a holistic evaluation of the model’s performance on the well-known MNIST datasets; the focus on multiple performance indicators ensures a complete picture of the model’s accuracy, precision, sensitivity, specificity, and overall discriminatory power.

            When applied to handwritten digit recognition, the combination of DL, preprocessing methods, and the novel HRNN-GOA algorithm provides a comprehensive method for improving performance. The assessment measures used provide a solid basis upon which to build future improvements in the fields of image recognition and pattern classification by quantifying and validating the model’s performance.

However, the HRNN model’s efficiency may drop if the GOA becomes trapped in a local optimum. Future studies should focus on developing more effective optimisation strategies for this problem so that the model’s performance can be further enhanced.
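One hedged illustration of such a strategy, building on the goa_minimize sketch above: detect when the swarm has collapsed onto the current best solution and re-scatter most agents so the search can escape the local optimum. The reseed_if_converged helper, its threshold, and the elite count are hypothetical.

```python
import numpy as np

def reseed_if_converged(X, best, lb, ub, rng, tol=1e-3, keep=5):
    """Hypothetical diversity guard for GOA: if the swarm has collapsed
    onto the current best solution, keep the `keep` closest agents and
    re-scatter the rest uniformly within the bounds."""
    spread = np.linalg.norm(X - best, axis=1)
    if np.median(spread) < tol:                        # swarm has collapsed
        n_reset = max(0, len(X) - keep)
        X[np.argsort(spread)[keep:]] = rng.uniform(
            lb, ub, size=(n_reset, X.shape[1]))
    return X
```

Called once per iteration inside the GOA loop, such a guard trades a little convergence speed for a better chance of sampling new basins of attraction.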

            CONFLICTS OF INTEREST

            The authors declare no conflicts of interest in association with the present study.

            DATA AVAILABILITY

            REFERENCES

1. Ali S, Sahiba S, Azeem M, Shaukat Z, Mahmood T, Sakhawat Z, et al. 2023. A recognition model for handwritten Persian/Arabic numbers based on optimized deep convolutional neural network. Multimed Tools Appl. Vol. 82(10):14557–14580.

2. Bappy MH, Haq MS, Talukder KH. 2022. Bangla handwritten numeral recognition using deep convolutional neural network. Khulna University Studies. p. 863–877.

3. Bhattacharjee S, Sifat MBU, Kibria JB, Pathan NS, Mohammad N. 2023. Recognition of Bengali handwritten digits using spiking neural network architecture. 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE); IEEE. Singapore. p. 1–5.

4. Bordoni S, Stanev D, Santantonio T, Giagu S. 2023. Long-lived particles anomaly detection with parametrized quantum circuits. Particles. Vol. 6(1):297–311.

5. Boymatova M. n.d. Handwritten text image recognition algorithms. The XXII International Scientific and Practical Conference “Modern scientific space and learning in special conditions”; Toronto, Canada. 05-07 June. p. 311.

6. Carrasquilla J, Hibat-Allah M, Inack E, Makhzani A, Neklyudov K, Taylor GW, et al. 2023. Quantum hypernetworks: training binary neural networks in quantum superposition. arXiv preprint arXiv:2301.08292.

7. Chandra PR, Karthik J, Tharun G, Bhargav A, Rakesh A, Mahesh G. n.d. High accuracy handwritten digit recognition using deep convolutional neural network architecture. In: International Conference on Innovative Computing and Communication; September 2023; New Delhi.

8. Chethan M, Anirudh R, Rani MK, Dey SR. 2023. A novel segmentation-free approach for handwritten sentence recognition. Computational Intelligence: Select Proceedings of InCITe 2022. Springer Nature Singapore. Singapore. p. 641–648.

9. He C, Zhao D, Fan F, Zhou H, Li X, Li Y, et al. 2023. Pluggable multitask diffractive neural networks based on cascaded metasurfaces. Opto-Electron Adv. Vol. 7:230005.

10. Kiani F, Xia Q. 2023. Bipolar flash for bifunctional computing operations. Nat. Nanotechnol. Vol. 18(5):444–445.

11. Kiruthika R, Manivannan A. 2023. A delay dependent stability condition for Hopfield neural networks via Wirtinger-based inequality. AIP Conference Proceedings; Vol. 2852(1). AIP Publishing. Chennai, India.

12. Larochelle H, Bengio Y, Louradour J, Lamblin P. 2009. Exploring strategies for training deep neural networks. J. Mach. Learn. Res. Vol. 10(1):1–40.

13. Le HH, Baig MA, Hong WC, Tsai CH, Yeh CJ, Liang FX, et al. 2023. CIMulator: a comprehensive simulation platform for computing-in-memory circuit macros with low bit-width and real memory materials. arXiv preprint arXiv:2306.14649.

14. Li X, He Q, Yu T, Cai Z, Xu G. 2023. Coexistence behavior of asymmetric attractors in hyperbolic-type memristive Hopfield neural network and its application in image encryption. Chinese Physics B. Vol. 10:256–265.

15. Mehta A, Chaturvedi A, Rathod D, Patel M. 2019. Handwritten digit recognition from digital image. IJITEE. Vol. 8(12).

16. Mekapothula MSS, Pullagura P, Potharlanka JL. 2023. Hybrid approach for handwritten digit recognition using deep learning and ESRGAN-based image super-resolution. 2023 2nd International Conference on Edge Computing and Applications (ICECAA); IEEE. p. 741–746.

17. Mondal D, Kumar N, Kaur R. 2023. Dignet: a deep learning-based efficient digit recognition system. Soft Computing: Theories and Applications: Proceedings of SoCTA 2022. Springer Nature Singapore. Singapore. p. 219–230.

18. Moya-Albor E, Brieva J, Ponce H. 2023. Towards the distributed wound treatment optimization method for training CNN models: analysis on the MNIST dataset. REPOSITORIO SCRIPTA.

19. Muthureka K, Srinivasulu Reddy U, Janet B. 2023. An improved customized CNN model for adaptive recognition of cerebral palsy people’s handwritten digits in assessment. Int. J. Multimed. Inf. Retr. Vol. 12(2):23.

20. Ponce H, Moya-Albor E, Brieva J. 2023. Towards the distributed wound treatment optimization method for training CNN models: analysis on the MNIST dataset. 2023 IEEE 15th International Symposium on Autonomous Decentralized System (ISADS); IEEE. p. 1–6.

21. Pugliese G, Song W, Zhang J, Academies BC. n.d. Applications of algebraic topology to the detection of ventricular tachycardia. http://giacomopugliese.com/Applications_of_Topology_to_the_Detection_of_Ventricular_Tachycardia.pdf

22. Sánchez-Sánchez PM, Huertas Celdrán A, Tomás Martínez Beltrán E, Demeter D, Bovet G, Martínez Pérez G, et al. 2022. Analyzing the robustness of decentralized horizontal and vertical federated learning architectures in a non-IID scenario. arXiv preprint arXiv:2210.11061.

23. Supakar SK, Ali M, Das S, Singh E. 2023. Handwritten digit recognition. Adv. Image Process. Pattern Recognit. Vol. 6(3):6–10.

24. Tausani L. n.d. Investigating the dynamics of spontaneous activity in energy-based neural networks. https://hdl.handle.net/20.500.12608/42071

25. Valencia D, Alimohammad A. 2023. A generalized hardware architecture for real-time spiking neural networks. Neural Comput. Appl. Vol. 35:17821–17835.

26. Wan Q, Chen S, Yang Q, Liu J, Sun K. 2023. Grid multi-scroll attractors in memristive Hopfield neural network under pulse current stimulation and multi-piecewise memristor. Nonlin. Dyn. Vol. 111:1–17.

27. Weyori BA, Afriyie Y, Opoku AA. 2023. Analyzing the performances of squash functions in capsnets on complex images. Cogent Eng. Vol. 10(1):2203890.

28. Yang Y, Voyles RM, Zhang HH, Nawrocki R. 2023. Fractional-order spike timing dependent gradient descent for deep spiking neural networks. SSRN preprint 4412806.

29. Zivasatienraj B, Doolittle WA. 2023. Dynamical memristive neural networks and associative self-learning architectures using biomimetic devices. Front. Neurosci. Vol. 17:1153183.

            Author and article information

            Journal
            jdr
            Journal of Disability Research
            King Salman Centre for Disability Research (Riyadh, Saudi Arabia )
            18 January 2024
Volume: 3
Issue: 1
Article ID: e20230051
            Affiliations
            [1 ]Department of Information Technology, College of Computer and Information Sciences, Majmaah University, Majmaah 11952, Saudi Arabia
            [2 ]Department of Information System, College of Computer and Information Sciences, Majmaah University, Majmaah 11952, Saudi Arabia
            [3 ]Atal Bihari Vajpayee School of Management and Entrepreneurship, Jawaharlal Nehru University, New Delhi 110067, India
            [4 ]School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
            Author notes
Correspondence to: Mohammed Alshehri*, e-mail: ma.alshehri@mu.edu.sa, Telephone: +966164046725, Fax: 0164046700, Sunil Kumar Sharma*, e-mail: s.sharma@mu.edu.sa, Priya Gupta*, e-mail: priyagupta@jnu.ac.in, Sapna Ratan Shah*, e-mail: sapnarshah@mail.jnu.ac.in
            Author information
            https://orcid.org/0000-0003-1035-311X
            https://orcid.org/0000-0002-1732-2677
            https://orcid.org/0000-0002-4666-4203
            https://orcid.org/0000-0002-3155-6357
            Article
            10.57197/JDR-2023-0051
            Copyright © 2024 The Authors.

            This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY) 4.0, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

            History
Received: 11 September 2023
Revised: 11 November 2023
Accepted: 14 November 2023
            Page count
            Figures: 22, Tables: 1, References: 29, Pages: 21
            Funding
            Funded by: King Salman Centre for Disability Research
            Award ID: KSRG-2023-520
            The authors extend their appreciation to the King Salman Centre for Disability Research (Funder id: http://dx.doi.org/10.13039/501100019345) for funding this work through Research Group no KSRG-2023-520.

Social policy & Welfare, Political science, Education & Public policy, Special education, Civil law, Social & Behavioral Sciences
visual impairments, handwritten digit recognition, grasshopper optimisation algorithm (GOA), Gaussian filtering, Hopfield recurrent neural network (HRNN)
