Gears are commonly soft machined using manufacturing processes with a geometrically defined cutting edge, with the aim of balancing workpiece quality and manufacturing costs by controlling tool wear. Early detection of critical tool wear is therefore a key factor. In this article, an algorithm for detecting and quantifying tool wear on gear cutting tools using computer vision methods is presented. For this, the suitability of traditional and deep learning-based computer vision methods for tool wear detection is compared. Traditional methods used include binary thresholding, edge detection, and contour detection. The suitability of different convolutional neural network architectures for the application of deep learning-based computer vision methods is compared. Using U Net with EfficientNet as the backbone, an Intersection over Union score of IoU = 0.65 and a loss of l = 0.05 is achieved on the training data and an IoU score of IoU = 0.64 and a loss of l = 0.06 is achieved on the validation data after 60 epochs. The convolutional neural network for identifying the worn area on gear cutting tools is then integrated into a wear quantification algorithm. To quantify the tool wear, the mean and maximum wear width VBm and VBmax, and the worn area VBarea are calculated. To ensure the performance of the algorithm, a manually measured wear curve and an algorithm-generated wear curve from bevel gear cutting and gear hobbing wear trials are compared. Overall, good agreement is achieved between the algorithm-generated and manually measured wear curves with an average absolute difference of ∆VBmax = 7.99 µm for bevel cutting and ∆VBmax = 19.95 µm for gear hobbing.
1 Introduction and motivation
Soft machining of gears is usually performed by manufacturing processes with a geometrically defined cutting edge, such as gear hobbing for cylindrical gears or face hobbing for bevel gears. Regardless of the manufacturing process, the objective of productive manufacturing is to maintain the required workpiece quality while reducing costs. Reducing costs includes controlling tool wear that affects workpiece quality and leads to costs associated with tool replacement, machining downtime or scrap parts. Early detection of critical tool wear is therefore a key factor in productive manufacturing. Tool wear can be assessed either directly, using optical sensors, or indirectly by analyzing acceleration or acoustic signals or the workpiece quality. While optical measurements offer higher accuracy, there is also uncertainty due to human interpretation of the tool wear images. As a result, approaches to use computer vision to detect and quantify tool wear have been pursued in recent years. [1,2,3,4,5]
2 Fundamentals of computer vision
Computer Vision (CV) is a branch of artificial intelligence that deals with the automated processing and analysis of visual data. Algorithms and techniques are used to extract information from visual data, recognize patterns and draw conclusions. CV is used in various areas, such as medicine to support pathology in analyzing tissue samples, in the automotive industry for autonomous driving, or in mobile devices for facial recognition. [6]
CV methods are generally divided into traditional and deep learning (DL)-based methods. Both approaches differ in the way features are extracted from images and patterns are recognized. Traditional methods are based on extracted features such as edges, corners, or texture features to obtain relevant information for processing. To do this, traditional methods use techniques such as filtering, edge detection, feature extraction, and classic machine learning algorithms such as support vector machines or random forests. These methods are often efficient at processing small datasets and can be useful in applications with limited resources. However, traditional methods often require manual feature extraction and are usually less robust to unexpected variations in visual data. In contrast, DL-based methods use neural networks to learn to extract features from the data itself.
Convolutional neural networks (CNNs) are a type of artificial neural network that have been specially developed for processing images and visual data. Special features of CNNs are convolutional or pooling layers. Convolutional layers perform filtering operations on an image to recognize different features such as edges, textures or shapes. Pooling layers reduce the spatial dimensions of the features and generalize the extracted features. By combining convolutional and pooling layers, CNNs can learn complex hierarchies and features, from simple edges and textures to complex structures and concepts. DL-based methods are suitable for extracting complex features from large amounts of data. However, DL-based methods require a large amount of training data and more computational resources compared to traditional methods. The choice between traditional and DL-based methods depends on the complexity of the task, the availability of data, and resources and the specific requirements of an application. [7]
Research into the use of CV methods to detect and quantify tool wear has focused on abrasive wear on the flank of milling and turning tools. Jeon and Kim developed a system for measuring wear on cutting tools. The measuring system included a camera and lighting system for recording the tool wear images and an algorithm for identifying and quantifying the worn area. The algorithm used a combination of traditional CV methods to identify the worn area, such as thresholding, histogram projection, and contour detection. The comparison between the manually measured and algorithmically quantified maximum wear width resulted in an average deviation of ∆VBmax = 0.1 mm. [1]
Thakre et al. developed a measurement system consisting of a camera and an image analysis system to identify and quantify wear on carbide inserts. The algorithm presented combined traditional CV methods such as thresholding, median filter, dilation, and canny edge detection. For quantification, the wear images were normalized, and each pixel was assigned a value in microns. The average and maximum wear width VBm and VBmax as well as the worn area VBarea were calculated. The average deviation between the manually measured and algorithmically quantified mean wear width was ∆VBm = 3%. [2]
Due to the high computational power of modern computers and the ability to consider more diverse environments with DL-based methods, those methods are increasingly being used for modeling. Wu et al. used a CNN to categorize different types of wear on end mills, such as abrasion or adhesion, achieving a recognition accuracy of 96.2%. Depending on the type of wear, further data pre-processing steps were carried out. Traditional CV methods such as thresholding, median filter, and contour detection were combined to identify the worn area, regardless of the type of wear. To calculate the maximum wear width VBmax, the number of pixels was converted into micrometers. The average deviation between the algorithmically quantified and the manually measured maximum wear width was ∆VBmax = 4.76%. [3]
Bergs et al. used DL-based methods to differentiate between ball nose end mills, end mills, drills, and indexable inserts. A recognition accuracy of 95.6% was achieved with a CNN. A fully convolutional neural network (FCNN) was then used to identify the worn area on the tools. While an average Intersection over Union (IoU) score of IoU = 0.73 was achieved for the training data, the average IoU value for the entire data set was IoU = 0.37. When applied to the entire dataset, edges and scratches were partially incorrectly recognized as wear. [4]
Friedrich et al. developed a system for semi-automatic tool wear monitoring and wear classification of milling tools. Their experimental setup, consisting of a camera system, illumination, and a tool holder with an automatic tool positioning system, was used to acquire images of milling tools in different wear conditions. The images were processed with a combination of traditional CV methods (e.g. Gaussian filtering, Canny edge detection, and k-means clustering) and DL-based methods (e.g. pre-trained CNNs such as VGG16). Friedrich et al. developed a model to classify the image dataset into the categories new, light wear, medium wear, and heavy wear. In the end, the accuracy of classifying the images into the correct categories was 96%. [5]
3 Objective and approach
The aim of productive manufacturing is to ensure workpiece quality while minimizing manufacturing costs, whereby tool wear influences both workpiece quality and manufacturing costs [8]. In the previous section, the potential of using CV methods to identify and quantify tool wear on cutting tools was demonstrated. The wear detection algorithms for turning and milling tools presented in the previous section are not readily applicable to gear cutting tools. On the one hand, the width of wear marks on gear cutting tools can vary greatly in different local areas, including the flank faces and the tip cutting edge. On the other hand, different cutting materials and coatings are used, which significantly influence the appearance of the worn surface. Consequently, the wear patterns of gear cutting tools differ from those of other cutting processes. Therefore, the aim of this report is to develop and implement an algorithm for identifying and quantifying wear on gear cutting tools based on a comparison of traditional and DL-based CV methods. The approach used is shown in Figure 1.

The first step is data acquisition and processing. During data acquisition, images of gear hobbing and bevel gear cutting tools are taken in different wear conditions from different perspectives and at different magnifications. Various transformations are used to increase the number of tool wear images. This is followed by data processing, which involves standardizing and scaling the wear images.
In the next step, the suitability of traditional and DL-based CV methods for identifying tool wear in the previously prepared wear images is compared. On the one hand, traditional methods such as binary thresholding, edge detection, and contour detection are combined. On the other hand, different CNN architectures such as DeeplabV3+, ResNet, and U Net are implemented.
Once a suitable method for identifying tool wear has been found, the next step is to implement an algorithm for quantifying tool wear. First, a neural network is trained to detect tool wear in images. Subsequently, an algorithm is implemented that extracts and quantifies the wear from the tool wear images using a combination of traditional and DL-based CV methods. To quantify the tool wear, the mean and maximum wear width VBm and VBmax as well as the area of the worn surface VBarea are determined.
Finally, the performance of the implemented algorithm for quantifying tool wear is tested. For this purpose, sample wear curves for a gear hobbing and a bevel cutting trial are generated using the algorithm. The algorithmically generated maximum wear widths VBmax are then compared to the manually measured maximum wear widths VBmax under different tool wear conditions.

4 Description of the data set and data preparation
High-quality and representative data is crucial for the performance of CV methods. Especially for training models from DL-based methods, a large amount of data is required for training so the model can learn patterns and features. Typically, datasets of at least 10,000 images are used, as datasets that are too small can lead to overfitting of the model. To ensure the generalizability of the model, the data should also cover a large number of states to be recognized. Models that have been trained with different variations in the training data (illumination, perspective, background) are more robust to such variations in new data. Overall, data preparation ensures the data is of high quality, diverse, and representative, which influences the performance, robustness, and generalizability of models [7].
An overview of the data used to develop the algorithm presented in this report is depicted in Figure 2. In general, images of worn tools from bevel gear and hobbing wear trials were used. For bevel gear cutting, the influence of tool and process design on tool wear for face milling and face hobbing plunging was analyzed in single blade group cutting trials. Relevant data on the gears is summarized in Table 1. In the bevel gear cutting trials, K30 carbide inserts with an AlCrN and an AlTiN coating were used. During the trials, the wear on the main cutting edge, tip, and clearance side of the stick blades was documented at regular intervals.

The wear images used to develop the algorithm were taken with a Dinolite digital microscope mounted on a fixture in the machine, and at the end of tool life with a Keyence microscope.
For gear hobbing, the images were taken from fly-cutting trials without cooling lubricant. The influence of process parameters on tool wear depending on the combination of workpiece and cutting material was investigated in fly-cutting trials. As a result, the dataset contains images of fly-cutters made of PM-HSS S390, FeCoMo (MC90) and K30 carbide, all with an AlCrN coating. During the trials and at the end of tool life, the wear on the leading and trailing flank and on the tip of the fly-cutter was documented using a Keyence microscope.
The resulting dataset consists of 646 images of worn stick blades from bevel gear cutting trials and 594 images of worn fly-cutters from gear hobbing trials. The datasets contain images of different cutting edges of the tools in various states of wear up to the end of the tool life. All images were taken at 50×, 70×, or 150× magnification. The background was white, gray, black, or reddish.
The type of data processing determines the quality, variety, and representativeness of the data, which affects the performance, robustness, and generalizability of models resulting from DL-based methods [7]. Since the number of tool wear images is not sufficient to train a model, transformations are used to increase the number of images. The transformations used in this report include horizontal flipping, resizing, translation, rotation, brightness changes, contrast changes, and color variations. A sample of the transformations used for an example of the tip of a stick blade is shown in Figure 2. The characteristics of the transformations reflect the natural variations that can occur during the measurement process.
Each transformation has been assigned a probability of occurrence. This means the transformations are applied to the tool wear images both individually and in combination at random to avoid creating a systematic structure in the dataset. In total, the transformations generate 10 individual images from each original image, so the final dataset consists of a total of 6,460 images of stick blades and 5,940 images of fly-cutters.
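The probability-driven application of transformations can be sketched as follows. This is a minimal numpy illustration; the function name `augment` and the concrete probabilities and parameter ranges are assumptions for illustration, not the exact values used in the trials:

```python
import numpy as np

def augment(image, rng):
    """Apply each transformation independently with an assigned probability."""
    out = image.copy()
    if rng.random() < 0.5:                                # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:                                # brightness change
        shift = int(rng.integers(-30, 31))
        out = np.clip(out.astype(int) + shift, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:                                # translation along x
        out = np.roll(out, int(rng.integers(-10, 11)), axis=1)
    return out

# generate 10 augmented variants per original image, as in the final dataset
rng = np.random.default_rng(42)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
variants = [augment(original, rng) for _ in range(10)]
```

Because each transformation fires independently, a given variant may carry none, one, or several transformations, which avoids a systematic structure in the augmented dataset.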
Finally, the pixel values of all tool wear images are normalized to a uniform scale and the images are converted to grayscale. In addition, the tool wear images are standardized to a uniform resolution of 1,200 × 1,600 pixels and then scaled down to 320 × 320 pixels. This ensures the uniformity of the input data for further use in traditional and DL-based CV methods.
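The pre-processing chain can be sketched as follows. This is a simplified numpy version under stated assumptions: a channel mean stands in for a proper luminance conversion, and a nearest-neighbor resize stands in for the interpolation a library such as OpenCV would provide:

```python
import numpy as np

def preprocess(image_rgb, size=320):
    """Grayscale conversion, intensity normalization, nearest-neighbor resize."""
    gray = image_rgb.mean(axis=2)                      # simple luminance proxy
    lo, hi = gray.min(), gray.max()
    norm = (gray - lo) / max(hi - lo, 1e-9) * 255.0    # stretch to the full 0-255 scale
    h, w = norm.shape
    rows = np.arange(size) * h // size                 # nearest-neighbor row indices
    cols = np.arange(size) * w // size
    return norm[np.ix_(rows, cols)].astype(np.uint8)

image = np.random.default_rng(0).integers(0, 256, size=(1200, 1600, 3), dtype=np.uint8)
small = preprocess(image)   # uniform 320 x 320 grayscale input for the CV methods
```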
5 Comparison of the suitability of computer vision methods for wear detection on gear cutting tools
CV uses both traditional and DL-based methods to analyze and understand visual data. The choice between traditional and DL-based methods depends on the complexity of the task, the availability of data and resources, and the specific requirements of the application. In the following, both traditional methods such as binary thresholding and DL-based methods in the form of different CNN architectures are used to identify the worn surface in tool wear images. The implementation is done with Python 3.11.4 in the Anaconda 23.7.4 environment. Based on this comparison, a suitable CV method for identifying the wear in tool wear images is selected.

5.1 Traditional methods
Traditional CV methods rely on extracted features and classical machine learning techniques to analyze and process visual information. Due to their efficiency in terms of computational resources and data size, traditional methods are often used for object classification, detection, and segmentation. To exploit the advantages of the individual techniques, traditional methods are used in combination [6]. In Figure 3, binary thresholding, edge detection, and contour detection were applied to identify the wear in an image of the leading flank of a fly-cutter. In the top left of Figure 3, the reference image taken with a Keyence microscope at 50× magnification is depicted.
First, binary thresholding was applied to the reference image. The resulting image is depicted in the upper right of Figure 3. Binary thresholding is used to convert a grayscale image into a binary image to highlight features. Each intensity value of a pixel is compared to a threshold value. Pixels above the threshold value are displayed in white, and pixels below the threshold value are displayed in black [7]. With the eight-bit color depth used, the pixel intensity is measured on a scale from zero to 255, with zero representing the lowest and 255 the highest pixel intensity. For this application, a lower threshold of 160 and an upper threshold of 235 were determined iteratively. However, binary thresholding highlights not only the wear surface along the cutting edge of the fly-cutter; scratches and irregularities on the flank face are also recognized and displayed in white. In addition, the method reacts sensitively to changing light conditions and light reflections on the tool surface.
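With a lower and an upper threshold, the operation amounts to band thresholding, which reduces to a few lines of numpy (a sketch with the iteratively determined thresholds from the text; the sample pixel values are illustrative):

```python
import numpy as np

def band_threshold(gray, lower=160, upper=235):
    """Pixels with intensity inside [lower, upper] become white (255), all others black (0)."""
    mask = (gray >= lower) & (gray <= upper)
    return (mask * 255).astype(np.uint8)

gray = np.array([[ 10, 170, 240],
                 [160, 235,  90]], dtype=np.uint8)
binary = band_threshold(gray)
# only 170, 160, and 235 lie inside the band and are set to white
```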
To highlight the wear more visibly in the image, edge detection is applied next. The resulting tool wear image is shown in the lower left of Figure 3. Edge detection aims to identify edges or transitions between objects in an image [7]. Mathematical operators are used to detect the change in intensity or color information in the visual data, and the edges are identified using, for example, hysteresis thresholding [6]. Hysteresis thresholding is a method for linking and tracking edge points. There is a high and a low threshold value for pixel intensity. Pixels that lie above the high threshold value are considered strong edge points, while pixels between the low and high threshold values remain undecided. Hysteresis thresholding is used to connect strong edge points across undecided pixels and thus decide whether undecided pixels belong to the edge [6]. Analogous to binary thresholding, the threshold values for edge detection also represent the pixel intensity and are determined iteratively. For the present application, this results in a lower threshold of 160 and an upper threshold value of 235. However, edge detection is sensitive to noise in the tool wear images, which means unwanted edges are detected. In the tool wear image, for example, light reflections and scratches on the flank face of the tool are recognized as edges. In addition, the thickness of the edges is not always detected exactly and varies as a result.
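The hysteresis linking step described above can be sketched as a breadth-first search that grows strong edge points through undecided pixels (a pure-numpy illustration; Canny implementations such as OpenCV's perform this internally):

```python
from collections import deque

import numpy as np

def hysteresis(magnitude, low=160, high=235):
    """Keep strong pixels (>= high) plus undecided pixels (>= low) connected to them."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    h, w = magnitude.shape
    while queue:                                   # grow edges through weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep

magnitude = np.array([[240, 200, 180,   0],
                      [  0,   0,   0,   0],
                      [  0,   0,   0, 200]])
edges = hysteresis(magnitude)
# the 200 and 180 adjacent to the strong 240 survive; the isolated 200 is discarded
```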
Finally, contour detection is applied to the tool wear image to suppress the noise and highlight the wear. Contour detection identifies the outer outlines or contours of objects in visual data by searching for contiguous groups of edges or lines, which defines the shapes of the objects [7]. The line width represents the width of the identified contour lines in an image and is specified in pixels. The line width should be selected so the contours are sufficiently emphasized without overemphasizing or distorting details [6]. For contour detection, gradient-based methods such as the Sobel operator are used. The Sobel operator uses two separate convolution kernels in the horizontal and vertical directions. The gradient is calculated for each pixel in the horizontal and vertical directions, and the resulting gradient magnitude is then determined using the Euclidean norm. As a result, pixels with a high intensity variation, i.e. edges, are displayed brighter than pixels with a low intensity variation [6]. As with edge detection, the result of contour detection is affected by image quality, noise, and blurring. In the resulting tool wear image, which is shown in the lower right of Figure 3, a continuous edge separating the tool from the background can be seen. However, the wear is not enclosed by a complete contour line.
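The Sobel step can be sketched as follows (a didactic numpy implementation of the two kernels and the Euclidean-norm combination; practical code would use an optimized library routine rather than explicit loops):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(gray):
    """Apply both Sobel kernels and combine the responses with the Euclidean norm."""
    g = gray.astype(float)
    h, w = g.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = g[y:y + 3, x:x + 3]
            out[y, x] = np.hypot((patch * SOBEL_X).sum(), (patch * SOBEL_Y).sum())
    return out

step = np.zeros((5, 6))      # vertical step edge: dark left half, bright right half
step[:, 3:] = 255.0
magnitude = sobel_magnitude(step)
# the response peaks on the columns straddling the intensity step and is zero elsewhere
```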
In summary, a combination of binary thresholding, edge detection, and contour detection was used to identify the wear in the tool wear images. However, visual similarities between worn and unworn surfaces, noise due to light reflections, and scratches on the tool’s flank face pose a challenge. Despite the combination of different traditional CV methods, no closed contour of the wear could be identified in the tool wear images. Therefore, in the next step, DL-based CV methods are used to identify the wear.
5.2 Deep learning based methods
In the previous section, it was shown that traditional CV methods alone are prone to errors in the detection of wear surfaces in images of worn gear cutting tools. Therefore, in the following, DL-based CV methods are used and the suitability and performance of different CNNs are compared.

In addition to the data preparation described in Section 4, labeling of the data is required for the application of DL-based CV methods. Based on the reference in the upper left of Figure 4, the tool wear image is manually divided into three classes using the LabelMe software: background, tool (red), and wear (green) [9]. Each category is represented by a different color. Other elements are automatically classified as background by the software and displayed in black. The labeled reference image is shown in the upper right of Figure 4. Careful labeling is important so a CNN can learn and generalize the patterns to accurately identify tool wear.
Pre-trained CNN architectures are used to identify tool wear because these CNNs have already been trained on larger datasets to recognize complex features in visual data. This results in several advantages. Since the pre-trained model already has an understanding of basic features, less training data is needed to refine the CNN on specific data. By training on large datasets, pre-trained CNNs have learned to promote generalization and reduce overfitting. In addition, pre-trained CNNs have been trained on powerful hardware and have complex structures that are often more efficient than self-built CNNs. The selection of a suitable CNN depends on the specific requirements of the application, the available resources, and the type of problem to be solved [10]. For the present application, the suitability of DeeplabV3+, ResNet, and U Net with EfficientNet as its backbone was compared for identifying tool wear. For this purpose, all CNNs were trained over 10 epochs with a batch size of eight and a learning rate of 0.01. For training, the dataset without transformations presented in Section 4 was divided into 80% training data, 10% test data, and 10% validation data and thus consisted of 992 training images, 124 test images, and 124 validation images.
DeeplabV3+ specializes in semantic segmentation, i.e. the pixel-precise assignment of object classes in images. For example, DeeplabV3+ is used for environmental analysis in robotics or for mapping and recognizing objects in satellite images. DeeplabV3+ is designed to learn a large number of parameters, which increases the complexity of the model. On the one hand, the higher complexity improves the ability to recognize complex patterns in the visual data. On the other hand, it increases the computational complexity [10]. With the available dataset and computational resources, DeeplabV3+ could not be fully trained.
ResNet uses residual blocks to skip layers in order to avoid the problem of vanishing gradients and thus train deep CNNs efficiently. During training, gradients are used to adjust the weights of the network and to optimize the CNN. For mathematical reasons, e.g. due to activation functions, the gradients can assume very small values close to zero, especially in deep layers, so the training is slowed down or stagnates. ResNet is used, for example, in medical diagnostics to classify diseases on the basis of medical images [11]. The wear identified with ResNet is shown in the lower left of Figure 4. The tool, the wear, and the background cannot be clearly distinguished from each other, as no contours are recognizable. As a result, ResNet is unsuitable for wear detection on gear cutting tools.
U Net is a specialized architecture for semantic segmentation, while EfficientNet has a scalable and efficient architecture. The combination uses the training efficiency of EfficientNet for feature extraction and the U Net architecture for precise segmentation. The CNN is used, for example, in the biomedical environment for the segmentation of cells or structures in microscopic images or in the analysis of satellite images for environmental monitoring [12]. The tool image segmented with the combination of U Net and EfficientNet is shown in the lower right of Figure 4. The three classes tool, wear, and background are distinguishable from each other, and the comparison between the segmented tool image and the reference with and without labels shows a good agreement. Consequently, the combination of U Net and EfficientNet is used for the implementation of an algorithm to identify the wear in tool wear images of gear cutting tools.
6 Implementation of an algorithm for quantifying tool wear on gear cutting tools
The contour accuracy of the wear identified in the previous section is not sufficient for the quantification of tool wear, since a closed contour with sharp edges of the wear is required. Therefore, an algorithm combining traditional and DL-based CV methods is implemented to quantify the tool wear. To identify the wear in the tool wear images, the CNN from U Net and EfficientNet is first trained. The trained CNN is then integrated into the algorithm for quantifying the tool wear. The implementation is done in Python 3.11.4 in the Anaconda 23.7.4 environment.
6.1 Training of the neural network for tool wear detection
To quantify tool wear, the wear in the tool wear images must first be identified. For this purpose, a suitable CNN architecture consisting of the combination of U Net and EfficientNet was determined in the previous section. To increase the performance and robustness of the CNN, it is trained on the transformed data presented in Section 4. For this purpose, 80% of the dataset (9,920 images) is used for training and 10% (1,240 images) for validation. Due to the amount of data and the CNN architecture, the training is performed on the high-performance computing (HPC) cluster at RWTH Aachen University. To ensure effective training of the CNN, the hyperparameters (learning rate, number of epochs, and batch size) were determined iteratively. For the final training, a learning rate of 0.001, 60 epochs, and a batch size of eight were used.

To monitor and evaluate the learning process of the CNN, the Intersection over Union (IoU) score and the loss are considered. The development of the IoU score and the loss over the epochs is depicted in Figure 5. The training data is shown in dark gray and the validation data in light gray.
The IoU score, on the left side of Figure 5, is a metric for the accuracy of the segmentation and is the quotient of the intersection and the union of the predicted and actual worn areas. Therefore, the IoU score assumes values between zero and one, with a higher value corresponding to a more accurate segmentation [7]. The IoU score of both the training and validation data increases degressively with the number of epochs. The IoU score of the training data is higher than that of the validation data. After 60 epochs, the IoU score of the training data is 0.65 and that of the validation data is 0.64.
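For binary wear masks, the metric reduces to a few lines (a numpy sketch with illustrative masks):

```python
import numpy as np

def iou_score(pred, target):
    """IoU = |prediction ∩ ground truth| / |prediction ∪ ground truth|."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return float(np.logical_and(pred, target).sum() / union) if union else 1.0

predicted = np.array([[1, 1, 0],
                      [0, 1, 0]])
actual    = np.array([[1, 0, 0],
                      [0, 1, 1]])
score = iou_score(predicted, actual)   # intersection 2 px, union 4 px -> 0.5
```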
The loss, on the right side of Figure 5, is calculated iteratively during training and is an indicator of the CNN’s adaptation to the training data. A decreasing course of the loss indicates that the CNN is improving its segmentation and can better map the training data [6]. For the present classification task, a categorical cross-entropy loss function is used. The loss of the training and validation data decreases significantly in the first 10 epochs and then declines degressively. After 60 epochs, the loss of the training data is 0.05 and the loss of the validation data is 0.06.
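The categorical cross-entropy used here averages, over all pixels, the negative log-probability assigned to the true class. A minimal numpy sketch with illustrative predictions:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over pixels of -sum_c y_true[c] * log(y_pred[c])."""
    y_pred = np.clip(y_pred, eps, 1.0)     # guard against log(0)
    return float(-(y_true * np.log(y_pred)).sum(axis=-1).mean())

# two pixels, three classes (background, tool, wear), one-hot ground truth
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.2, 0.6]])
loss = categorical_cross_entropy(y_true, y_pred)
# loss = (-ln 0.8 - ln 0.6) / 2, roughly 0.367
```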
At the bottom of Figure 5 two examples of identified wear that were generated using the trained CNN are depicted. On the left side, the abrasive wear on the cutting edge of a stick blade was predicted and on the right side, the abrasive wear on the leading flank of a fly-cutter was predicted. Both predicted worn areas, which are depicted in white, show good agreement with the actual wear on the microscope images.
In summary, the robust performance and effective learning of the CNN is illustrated by the convergence of the loss and IoU values of training and validation data. The ability of the CNN to generalize beyond the training dataset is confirmed by the comparable performance of the training and validation metrics. The small deviation between predicted and actual wear is also reflected in a loss of 0.06 for the validation dataset. The IoU value of 0.64 for the validation data set indicates a high segmentation accuracy of the CNN.
6.2 Algorithm for tool wear quantification
In addition to the identification of worn areas in tool wear images, the determination of wear parameters according to DIN ISO 3685 is relevant for the assessment of the tool wear condition [13]. Only the specification of wear parameters enables a statement to be made about the remaining service life of a gear cutting tool. For this reason, an algorithm for quantifying tool wear is implemented using the CNN trained in Section 6.1. The sequence of the algorithm for identifying and quantifying tool wear is depicted in Figure 6.

First, the tool wear images are imported and pre-processed. Pre-processing includes scaling, normalization, color space conversion, and padding. The pixel intensity values of all tool wear images are normalized to a uniform scale from zero to 255. In addition, the tool wear images are scaled to a uniform format of 320 × 320 pixels. The tool wear images are then converted into grayscale images to reduce the amount of information and noise contained in the image. Padding is used to add additional pixels with a value of zero around the edge of the tool wear image. This ensures the spatial information at the edge of the image is not lost during further processing. The result is a uniform and standardized dataset.
After pre-processing the dataset, the tool wear is detected using the CNN of U Net with EfficientNet as a backbone trained in Section 6.1. The wear is identified in the tool wear images and the resulting image data is stored. An example of identified wear on the tip of a stick blade from bevel gear cutting is depicted in the top left of Figure 7. In addition to the wear on the tip of the stick blade, the contours of the main cutting edge and the clearance side are also depicted in white.

A closed contour around the wear is required to quantify the tool wear. Hence, the image with the detected wear must be post-processed. After rescaling each image file to the original format of 1,200 × 1,600 pixels, post-processing is carried out in several steps. First, k-means clustering is applied with one class to increase the accuracy of the identified wear surface. K-means clustering is a method of grouping pixels into k predefined classes and aims to group similar pixels into the same class based on their characteristics or properties [7]. The resulting identified wear is depicted in the lower left of Figure 7. Compared to the wear identified by the CNN, more accurate contours are recognized, but the contours of the main cutting edge and the clearance side are still assigned to the wear.
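The grouping step can be illustrated on one-dimensional pixel intensities (a minimal k-means sketch with deterministic initialization; k = 2 is used here so the separation into clusters is visible, whereas the algorithm above runs with one class):

```python
import numpy as np

def kmeans_1d(values, k=2, iters=10):
    """Assign each intensity to the nearest center, then update centers; repeat."""
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = values[labels == j].mean()
    return labels, centers

pixels = np.array([10.0, 12.0, 11.0, 200.0, 210.0, 205.0])
labels, centers = kmeans_1d(pixels)
# the dark and the bright pixels end up in two separate clusters
```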
To distinguish the wear from the cutting edges and to suppress the background noise, binary thresholding is applied in the next step, using a lower threshold value of 100 and an upper threshold value of 200. The resulting detected wear is depicted in the top center of Figure 7. After binary thresholding, only the wear surface relevant for quantification remains highlighted.
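One reading of the lower and upper threshold values is a band filter on intensity, as sketched below. This interpretation is an assumption (the article may instead use a threshold function with a maximum output value), and the function name is illustrative.

```python
# Binary thresholding as a band filter: pixels whose intensity lies in
# [lo, hi] are kept as foreground (255), all others are suppressed (0).
# The band-filter interpretation of the two thresholds is an assumption.
import numpy as np

def band_threshold(img: np.ndarray, lo: int = 100, hi: int = 200) -> np.ndarray:
    """Return a binary mask keeping intensities within [lo, hi]."""
    mask = (img >= lo) & (img <= hi)
    return np.where(mask, 255, 0).astype(np.uint8)
```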
Dilation is used to obtain a closed contour around the wear surface. During dilation, a kernel, defined as a square or rectangular matrix, is moved across the image. At each position in the image, a check is made to determine whether an object pixel is present under the kernel. If at least one object pixel is present, the central point of the kernel is marked as part of the object in the output image [6]. To obtain a closed contour of the wear, dilation is performed with a 5 × 5 matrix and one iteration. The largest contour in the image is then identified and outlined using the contour function of OpenCV, as shown in the lower center of Figure 7. [14]
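The dilation step can be sketched in plain numpy as follows (in practice one would use OpenCV's dilate; contour extraction with OpenCV's contour function is omitted here). The function name and the loop-based implementation are illustrative.

```python
# Binary dilation with a size x size square kernel: a pixel becomes
# foreground if at least one object pixel lies under the kernel.
# Numpy-only sketch of what cv2.dilate does with a square kernel.
import numpy as np

def dilate(mask: np.ndarray, size: int = 5, iterations: int = 1) -> np.ndarray:
    """Dilate a binary mask with a square kernel of the given size."""
    m = mask.astype(bool)
    r = size // 2
    for _ in range(iterations):
        padded = np.pad(m, r, mode="constant")
        out = np.zeros_like(m)
        # OR together all shifted copies covered by the kernel window.
        for dy in range(size):
            for dx in range(size):
                out |= padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
        m = out
    return m
```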
To quantify the wear, a coordinate system is defined in which the x axis runs along the image width and the y axis along the image height. An exemplary contour of a worn area in this coordinate system and the equations for calculating the wear parameters are shown in Figure 7 on the right. The x and y values are integers defined in pixels. For each x value, the corresponding y values of the contour are stored. The difference between the maximum and minimum y value corresponds to the contour width ∆y(x) at the x position. The contour width ∆y(x), specified in pixels, is converted to the wear width VB(x), defined in micrometers, using the factor U, which depends on the image resolution and magnification used. In the present application, the tool wear image has a width of 1,600 pixels, which corresponds to a measured image width of 2,345 µm, resulting in a conversion factor of U = 1.4656 µm/pixel. The average and maximum wear widths VBm and VBmax are then determined considering all x values. The area of the worn surface VBarea is calculated by approximating the contour area by rectangles, using the square shape of the pixels with an edge length of one. For each x value, the area is the associated contour width multiplied by the pixel edge length of one. The contour area, and thus the area of the worn surface VBarea, is therefore the sum of the contour widths ∆y(x), or the wear widths VB(x), over all x values.
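The quantification described above can be summarized in a short sketch. The function name is illustrative, and the conversion of the summed pixel widths to an area in µm² via U² is my reading of the text (the article states the area as the sum of the contour widths).

```python
# Compute the wear parameters VBm, VBmax and VBarea from a closed contour
# given as integer pixel coordinates (x, y). Illustrative sketch.
import numpy as np

# Conversion factor: an image width of 1,600 px corresponds to 2,345 µm.
U = 2345 / 1600  # ~1.4656 µm per pixel

def quantify(contour: np.ndarray):
    """Return (VBm, VBmax, VBarea) for an (N, 2) array of contour points."""
    xs = contour[:, 0]
    widths = []
    for x in np.unique(xs):
        ys = contour[xs == x][:, 1]
        widths.append(ys.max() - ys.min())   # contour width Δy(x) in pixels
    vb = U * np.asarray(widths, dtype=float)  # wear widths VB(x) in µm
    vb_m = vb.mean()                          # mean wear width VBm
    vb_max = vb.max()                         # maximum wear width VBmax
    # Approximate the area by rectangles of pixel edge length 1:
    # sum of Δy(x) in px², converted to µm² with U² (assumed conversion).
    vb_area = (U ** 2) * float(np.sum(widths))
    return vb_m, vb_max, vb_area
```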
7 Comparison of manually measured and algorithm generated wear curves
To ensure the performance and accuracy of the implemented algorithm, algorithmically generated and manually measured wear curves are compared. In Figure 8, the algorithmically generated wear curves are compared with the manually measured wear curves for one bevel gear cutting application and one gear hobbing application. In each case, the maximum wear width VBmax is compared, as this determines the tool life N or L achieved. By comparing the wear curves, different wear conditions of the tools are taken into account, ranging from light to severe wear.

For bevel gear cutting, wear images were captured using a Dinolite digital microscope during the trials and using a Keyence optical microscope at the end of the tool's life, since the stick blades could not be removed from the cutter head during the trials [15]. For gear hobbing, all wear images were captured using a Keyence optical microscope [16]. Tool wear was measured using the integrated software of each microscope. In both cases, straight lines were placed manually along the cutting edge of the tool by selecting two points along the cutting edge. The operator then selects the highest point orthogonal to the line to measure the maximum wear width VBmax. As the images were taken during previous wear trials, the original measurements were performed by different operators. To assess the measurement uncertainty, a single operator repeated the measurements for the validation dataset. The deviation between the original and repeated maximum wear width was |∆VBmax| ≤ 5 µm. Hence, the average of the original and repeated measurements was used to assess the performance of the algorithm.
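Geometrically, the manual measurement amounts to a perpendicular point-to-line distance: a line through two points on the cutting edge, and the orthogonal distance of the highest wear point to that line. A minimal sketch (function name illustrative):

```python
# Perpendicular distance from a point to the line through two points,
# as used when measuring VBmax orthogonally to the cutting edge.
import math

def orthogonal_distance(p, a, b):
    """Distance from point p to the line through a and b (all (x, y) pairs)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    # |cross product of (b - a) and (p - a)| / |b - a| gives the
    # perpendicular distance to the infinite line through a and b.
    return abs(dx * (py - ay) - dy * (px - ax)) / math.hypot(dx, dy)
```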
On the left side of Figure 8, both the algorithmically generated and manually measured wear of a sample application from bevel gear cutting are depicted. The wear investigation was carried out in a single blade group trial as an analogy trial to face hobbing plunging. The K30 carbide blades used are coated with AlTiN. The maximum wear width VBmax at the tip of the outside blade is plotted against the number of workpieces manufactured N. The manually measured wear is depicted in dark gray and the algorithmically generated wear in light gray.
The manually measured wear increases degressively up to N = 25 workpieces. The manually measured wear curve then increases linearly until the end of the tool life after N = 80 workpieces. The algorithmically generated wear curve also shows a degressive increase up to N = 25 workpieces and then an almost linear increase until the end of the tool life. In general, the manually measured and the algorithmically generated wear curves show a qualitatively similar progression.
The largest differences occur between N = 35 workpieces and N = 65 workpieces, where the mean absolute difference between the algorithmically generated maximum wear width VBmax and the manually measured maximum wear width is ∆VBmax = 14.61 µm. At the beginning and end of the wear curve, the algorithmically generated wear is higher than the manually measured wear and the mean absolute difference is ∆VBmax = 2.84 µm. Over the entire course of the wear curve, the difference between the algorithmically generated and the manually measured maximum wear width is on average ∆VBmax = 7.99 µm. A microscope image of the worn tip of the outside blade after N = 45 workpieces and the corresponding algorithmically identified wear are shown below the diagram. Overall, the manually measured and algorithmically generated maximum wear widths VBmax are in very good agreement.
On the right side of Figure 8, manually measured and algorithmically generated wear curves of a sample gear hobbing application are depicted. In the example shown, fly-cutting trials were carried out without cooling lubricant, using fly-cutters made of PM-HSS S390 with an AlCrN coating. For the comparison, the maximum wear width VBmax in area B of the leading flank as a function of the cut length L is used. The manually measured wear widths are depicted in dark gray and the algorithmically generated wear widths in light gray.
The manually measured wear increases degressively up to the end of the cut length of L = 17.79 m. In comparison, the algorithmically detected wear is higher than the manually measured wear at the beginning and end of the wear curve. Between cut lengths of L = 3.81 m and L = 13.98 m, the algorithmically generated wear curve is lower than the manually measured wear. The largest difference in wear occurs at a cut length of L = 12.21 m with ∆VBmax = 30.9 µm. A microscope image of the worn fly-cutter and the corresponding algorithmically identified wear at a cut length of L = 12.21 m are shown below the graphs. The microscope image of the fly-cutter shows two breakouts that were recognized and quantified as wear by the algorithm. In the algorithmically identified wear, however, a black spot can be seen in the area of the breakout that is closer to the radius of the cutting edge. This black area was not identified as wear by the algorithm. The largest contour width of the algorithmically detected wear is therefore at the breakout further away from the cutting-edge radius, resulting in an algorithmically detected maximum wear width VBmax smaller than the manually measured value. Overall, the mean absolute difference between the algorithmically generated and manually measured maximum wear widths is ∆VBmax = 19.95 µm. Hence, the algorithmically detected and manually measured wear are in good agreement.
To evaluate the algorithm’s performance further, a sample dataset was created using 126 images from bevel gear cutting and 101 images from gear hobbing from the validation data described in Section 5.2. Hence, none of these images was used to train the CNN. In Figure 9, the comparison between the algorithmically detected maximum wear width VBmax,algorithm on the y axis and the manually measured maximum wear width VBmax,measured on the x axis is depicted for bevel gear cutting. If the algorithm-generated and manually measured maximum wear widths aligned perfectly, they would coincide with the dashed black line. With a standard deviation of σ = 9.866 µm and a level of significance of α = 0.05, the resulting confidence interval is CI = 8.923 µm. This means that, with a probability of 95%, the deviation between the algorithm-generated and manually measured maximum wear width is below ∆VBmax = 8.923 µm.

On the right side of Figure 9, the comparison between the algorithmically detected maximum wear width VBmax,algorithm and the manually measured maximum wear width VBmax,measured for gear hobbing is depicted. For gear hobbing, the standard deviation of σ = 18.199 µm is higher than for bevel gear cutting. Hence, at a significance level of α = 0.05, the confidence interval is CI = 16.701 µm.
Compared to manual measurements, the algorithm both over- and underestimates the maximum wear width VBmax for bevel gear cutting and gear hobbing. Overall, the wear curves and wear measurements generated by the algorithm and those measured manually are in good agreement for both processes. Nevertheless, better agreement is achieved for bevel gear cutting than for gear hobbing.
The difference between the algorithmically detected and manually measured maximum wear widths VBmax may be due to measurement inaccuracies during manual wear measurement on the microscope by the human operator. On the other hand, differences may be the result of an inaccurately identified wear by the algorithm due to a lack of diversity in the training data or the quality of the wear images.
In terms of speed, the algorithm generates a wear curve faster than a human operator can measure it. Using a computer with an Intel® Core™ i9-10900K processor and 64 GB of RAM, it takes about two minutes to generate a wear curve once all images are available. However, most of this time is spent loading the CNN. Once the CNN is loaded, quantifying tool wear in a single image takes approximately two seconds. For production line applications, the CNN could be loaded while the operator sets up the machine. Therefore, quantification would be almost real-time afterwards. However, the process of positioning and cleaning the tool to take images inside the machine with a suitable measuring setup is still being researched. This process could significantly impact production time if it cannot be done while the workpiece is being changed automatically.
8 Summary and outlook
Gears are commonly soft-machined using manufacturing processes with a geometrically defined cutting edge, with the aim of balancing workpiece quality and manufacturing costs by controlling tool wear. In this report, an algorithm has been developed to detect and quantify tool wear on gear cutting tools using computer vision methods. To achieve this, the first step was to create a dataset of images of worn tools from bevel gear cutting and gear hobbing trials. To obtain a representative dataset, typical variations during the measurement process such as different backgrounds, lighting, and magnifications were taken into account. This data was then used to compare the suitability of traditional and DL-based CV methods for identifying the worn area in images of worn gear cutting tools. A combination of the traditional CV methods binary thresholding, edge detection, and contour detection failed to identify a closed contour of the worn area. Therefore, the suitability of different CNN architectures for identifying the worn area was compared. The most accurate wear detection was achieved with the pre-trained CNN U Net and EfficientNet as the backbone.
The CNN was then trained on images of worn gear cutting tools to obtain a reliable and robust wear identification. After 60 epochs, an IoU score of 0.65 and a loss of 0.05 were achieved on the training data, and an IoU score of 0.64 and a loss of 0.06 on the validation data. The CNN for identifying the worn area on gear cutting tools was then integrated into a wear quantification algorithm. The algorithm post-processes the wear predictions from the CNN using traditional methods such as k-means clustering, convex hull, binary thresholding, and dilation to obtain a closed contour of the worn area. A resolution-dependent conversion factor is then used to calculate the mean and maximum wear widths VBm and VBmax and the surface area VBarea of the worn area.
Finally, to ensure the performance of the algorithm, a manually measured wear curve and an algorithm-generated wear curve from bevel gear cutting and gear hobbing wear trials were compared. In general, good agreement is achieved between the algorithm-generated and manually measured wear curves, with a confidence interval of CI = 8.923 µm for bevel gear cutting and of CI = 16.701 µm for gear hobbing, both at a level of significance of α = 0.05. On the one hand, the existing differences may be due to measurement inaccuracies during wear measurement by the human operator. On the other hand, the differences may be the result of an inaccurately identified worn area by the algorithm due to a lack of diversity in the training data or the quality of the wear images.
In the future, the CNN will be trained on a wider variety of data to provide more accurate and robust wear identification. Wear images from other gear cutting processes with geometrically defined cutting edges, such as gear skiving, can also be used to extend the application of the algorithm. In addition, other outputs can be added to the algorithm to automatically generate wear curves. Finally, it is possible to couple the algorithm to a machine-integrated measurement system to monitor tool wear at regular intervals during the manufacturing process.
References
- Jeon J, Kim S (1988) Optical flank wear monitoring of cutting tools by image processing. Wear 127:207–217.
- Thakre A, Lad A, Mala K (2019) Measurements of Tool Wear Parameters Using Machine Vision System. Model Simul Eng 2019:1–9. https://doi.org/10.1155/2019/1876489.
- Wu X, Liu Y, Zhou X, Mou A (2019) Automatic Identification of Tool Wear Based on Convolutional Neural Network in Face Milling Process. Sensors. https://doi.org/10.3390/s19183817.
- Bergs T, Holst C, Gupta P, Augspurger T (2020) Digital image processing with deep learning for automated cutting tool wear detection. Procedia Manuf 48:947–958. https://doi.org/10.1016/j.promfg.2020.05.134.
- Friedrich M, Gerber T, Dumler J, Döpper F (2023) A system for automated tool wear monitoring and classification using computer vision. Procedia Cirp 118:425–430. https://doi.org/10.1016/j.procir.2023.06.073.
- Priese L (2015) Computer Vision, Springer Berlin Heidelberg.
- Szeliski R (2022) Computer Vision, Springer.
- Klocke F, Brecher C (2023) Zahnrad- und Getriebetechnik: Auslegung – Herstellung – Untersuchung – Simulation, 2nd edn. Hanser, München.
- Wada K (2021) Labelme: Image Polygonal Annotation with Python. https://github.com/labelmeai/labelme. Accessed 9 Mar. 2025.
- Alzubaidi L, Zhang J, Humaidi A (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8:53. https://doi.org/10.1186/s40537-021-00444-8.
- He K, Zhang X, Ren S, Sun J (2015) Deep Residual Learning for Image Recognition. http://arxiv.org/pdf/1512.03385.pdf. Accessed 9 Mar. 2025.
- Konovalenko I, Maruschak P, Brezinová J, Prentkovskis O, Brezina J (2022) Research of U Net-Based CNN Architectures for Metal Surface Defect Detection. Machines 10:327. https://doi.org/10.3390/machines10050327.
- DIN ISO 3685: Tool-life testing with single-point turning tools (November 1993), Beuth, Berlin.
- Open Source Computer Vision (2024) OpenCV Documentation—Contours. https://docs.opencv.org/3.4/index.html. Accessed 9 Mar. 2025.
- Kamratowski M, Alexopoulos C, Brimmers J, Bergs T (2023) Model for tool wear prediction in face hobbing plunging of bevel gears. Wear 524–525:204787. https://doi.org/10.1016/j.wear.2023.204787.
- Troß N, Brimmers J, Bergs T (2021) Tool wear in dry gear hobbing of 20MnCr5 case-hardening steel, 42CrMo4 tempered steel and EN-GJS-700 2 cast iron. Wear 476:203737. https://doi.org/10.1016/j.wear.2021.203737.