
Automatic identification of tuberculosis mycobacterium

Abstract

Introduction

According to the Global TB control report of 2013, “Tuberculosis (TB) remains a major global health problem. In 2012, an estimated 8.6 million people developed TB and 1.3 million died from the disease.” Two main sputum smear microscopy techniques are used for TB diagnosis: fluorescence microscopy and conventional microscopy. Fluorescence microscopy is a more expensive diagnostic method because of the high costs of the microscopy unit and its maintenance. Therefore, conventional microscopy is more appropriate for use in developing countries.

Methods

This paper presents a new method for detecting tuberculosis bacilli in conventional sputum smear microscopy. The method consists of two main steps: bacillus segmentation and post-processing. In the first step, the scalar selection technique was used to select input variables for the segmentation classifiers from four color spaces. Thirty features were used, including the subtractions of the color components of different color spaces. In the post-processing step, three filters were used to separate bacilli from artifacts: a size filter, a geometric filter and a rule-based filter that uses the components of the RGB color space.

Results

In bacillus identification, an overall sensitivity of 96.80% and an error rate of 3.38% were obtained. An image database with 120 sputum smear microscopy images from 12 patients, with objects marked as bacillus, agglomerated bacillus and artifact, was generated and is now available online.

Conclusions

The best results were obtained with a support vector machine in bacillus segmentation associated with the application of the three post-processing filters.

Keywords: Tuberculosis; Automatic bacillus identification; Neural network; Support vector machine


Introduction

The World Health Organization publishes an annual report on the global control of tuberculosis (TB) with the purpose of providing a comprehensive and up-to-date assessment of the TB epidemic. According to the Global TB control report of 2013 (World Health Organization, 2013), “Tuberculosis (TB) remains a major global health problem. In 2012, an estimated 8.6 million people developed TB and 1.3 million died from the disease (including 320 000 deaths among HIV-positive people). The number of TB deaths is unacceptably large given that most are preventable.”

The Millennium Development Goals (MDGs) were proposed by the United Nations Development Programme (United Nations, 2010) and adopted by world leaders in 2000. They provide concrete, numerical benchmarks for extreme poverty and its many dimensions and are to be achieved by 2015. The program identifies 8 millennium development goals with 21 targets that are measured by 60 indicators. TB falls under the 6th goal, related to fighting disease epidemics, which aims to “Combat HIV/AIDS, Malaria and other diseases”. Within this goal, the following target refers to TB: “Halt and begin to reverse the incidence of malaria and other major diseases”. Related to this target, the following indicators refer to TB: “halt and begin to reverse TB incidence by 2015; reduce prevalence and deaths of TB by 50% compared to the 1990 baseline”.

To achieve these goals, the WHO adopted a Partnership Global Plan to Stop TB (World Health Organization, 2010), launched in January 2006, which includes sputum smear microscopy as the main diagnostic tool. One of the targets of this plan is “A treatment success rate among sputum smear positive cases of 90%”. Sputum smear microscopy is the main non-invasive technique employed for TB diagnosis. Other non-invasive techniques include culture and chest radiography.

There are two main reasons why sputum smear microscopy is appropriate for TB diagnosis. Special dyes allow for differentiating the bacillus from the background, and there is a positive correlation between the number of bacilli in the smear and the probability of them being identified by microscopy (David, 1976, as cited in Toman, 2004a).

Two techniques are used for TB diagnosis with sputum smear microscopy: fluorescence microscopy and conventional microscopy. Fluorescence microscopy uses an acid-fast fluorochrome dye (e.g., auramine O or auramine-rhodamine) and an intense light source, such as a halogen or high-pressure mercury-vapor lamp. Conventional microscopy uses the carbolfuchsin Ziehl-Neelsen (ZN) or Kinyoun acid-fast stains and a conventional artificial light source.

Fluorescence microscopy has several advantages over conventional microscopy. Fluorescence microscopy uses a lower-power objective lens (typically 25x), whereas conventional microscopy uses a higher-power objective lens (typically 100x). As a result, fluorescence microscopy allows the same area of a smear to be scanned in a much shorter time than conventional microscopy (Bennedsen and Larsen, 1966). Fluorescence microscopy is also approximately 10% more sensitive than conventional microscopy (Steingart et al., 2006).

The main shortcomings of fluorescence microscopy are the high costs of the microscopy unit and its maintenance and the advanced technical skills required for handling and maintenance of the optical equipment (Toman, 2004b).

The sensitivity of tuberculosis diagnosis through sputum smear analysis reported in the literature varies greatly. Reported sensitivities of conventional microscopy range from 0.32 to 0.94, and reported sensitivities of fluorescence microscopy range from 0.52 to 0.97. The specificity of fluorescence microscopy is similar to that of conventional microscopy and ranges from 0.94 to 1.0 (Steingart et al., 2006).

In addition to the large variability in sensitivity, manual screening for bacillus identification is a labor-intensive and time-consuming task that takes between 40 minutes and 3 hours, depending on the patient’s level of infection. Approximately 40-100 images must be analyzed (Sotaquirá et al., 2009).

Automatic methods for bacillus screening were first developed for fluorescence microscopy images (Veropoulos et al., 1998; Forero et al., 2003). The first methods for automatic bacillus screening in conventional microscopy were published in 2008 (Costa et al., 2008; Sadaphal et al., 2008; Raof et al., 2008). Recently, other methods for automatic bacillus screening were published (Forero et al., 2004, 2006; Khutlang et al., 2010; Lenseigne et al., 2007; Makkapati et al., 2009; Osman et al., 2012; Sotaquirá et al., 2009).

Some authors (Forero et al., 2006; Khutlang et al., 2010; Sotaquirá et al., 2009) claim that the advantages of automatic bacillus screening over manual screening include more reproducible values for sensitivity and specificity and a faster screening process. Table 1 reports the values for sensitivity, specificity and time required for one image analysis using automatic methods.

Table 1
Sensitivity, specificity and time for one image analysis.

The sensitivity and specificity values previously cited for manual screening methods refer to tuberculosis diagnosis. The sensitivity and specificity values for automatic methods shown in Table 1 refer to object classification as bacillus or not bacillus. A rigorous comparison of sensitivities and specificities between manual and automatic screening methods is not available. A rigorous performance comparison between automatic methods is not possible because different image databases are used in each report.

As shown in Table 1, only one report (Sotaquirá et al., 2009) cited the time required for image analysis. To compute the time consumed by an automatic TB diagnosis, it is necessary to consider the number of images required to achieve a correct diagnosis: between 20 and 100 fields of one slide must be analyzed. With an automatic procedure, it is also necessary to account for the time spent on focusing computations, image acquisition and microscope stage displacement. According to Santos et al. (1997), focusing computations take 1.8 s per field and acquisition takes 0.7 s, including 0.5 s for slide movement. Assuming that no parallel processes occur, and considering the maximum of 100 images, we calculate the time spent for an automatic diagnosis according to Equation 1:

$$T_{ad} = 100 \times (1.87 + 1.8 + 0.7)\ \text{s} \approx 7\ \text{min} \tag{1}$$

This value is several times less than the 40 minutes required for a manual TB diagnosis with sputum smear microscopy.
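The arithmetic behind Equation 1 can be checked with a short sketch. The constant names are hypothetical, and the 1.87 s term is assumed here to be the per-image analysis time (it appears in Equation 1 without further explanation in this section); the 1.8 s and 0.7 s terms are the focusing and acquisition times quoted above.

```python
# Per-field times in seconds. The 1.87 s term is taken from Equation 1 and is
# ASSUMED here to be the per-image analysis time; 1.8 s and 0.7 s are the
# focusing and acquisition times quoted from Santos et al. (1997).
ANALYSIS_S = 1.87
FOCUS_S = 1.8
ACQUISITION_S = 0.7


def automatic_diagnosis_time_s(n_fields):
    """Total time in seconds when n_fields are processed strictly in sequence."""
    return n_fields * (ANALYSIS_S + FOCUS_S + ACQUISITION_S)


if __name__ == "__main__":
    # 20 and 100 fields are the lower and upper bounds mentioned in the text.
    for n_fields in (20, 100):
        seconds = automatic_diagnosis_time_s(n_fields)
        print(f"{n_fields} fields: {seconds:.1f} s = {seconds / 60:.2f} min")
```

For 100 fields the total is 437 s, or roughly 7 minutes, which matches the value in Equation 1.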

In fluorescence microscopy images, the bacilli are easily separated from the background with a threshold operation. The segmentation is performed using edge detection operators, such as a Canny edge detector (Veropoulos et al., 1998; Forero et al., 2004). Intermediate steps for edge linking and boundary tracing are also employed.

In conventional microscopy images, the bacilli are not easily separated from the background by a pixel intensity threshold operation. Histogram-based techniques, Bayesian pixel classifiers and KNN pixel classifiers are the main approaches for bacillus segmentation in the literature. These approaches use color space components as input variables. Khutlang et al. (2010) used the RGB color space but did not justify their choice. Sotaquirá et al. (2009) analyzed the following color spaces: RGB, YCbCr, Lab, YIQ and HSV. From this analysis, the authors concluded that RGB, HSV and YIQ are not adequate because they generate a high number of false positives after the segmentation stage. YCbCr and Lab yielded better results. No author has combined components of different color spaces in the segmentation step. In this paper, we propose combining components of different color spaces.

To separate bacilli from artifacts in a post-processing step, all authors use geometric characteristics of the bacillus. We improve this step by adding a rule-based filter that uses the components of the RGB color space. This filter uses a new parameter, the color ratio (CR), which combines color information from pixels belonging to the bacillus and to its neighborhood.

This paper proposes a new method for bacillus identification in sputum smear microscopy with the following novel features:

  • The input variables for the segmentation were selected combining components of different color spaces: RGB, HSI, YCbCr and Lab.

  • In bacillus segmentation, two classifiers were compared with each set of input variables: neural networks and support vector machines (SVM).

  • In the post-processing step, a new filter based on rules is used to separate the bacilli from other artifacts in addition to geometric characteristics. This filter uses a new parameter proposed in this paper called the color ratio.

As demonstrated in this study, the sputum smear images can be divided into two groups according to the density of the background: high-density background (HDB) images and low-density background (LDB) images. The HDB group is characterized by a strong presence of methylene blue counterstain in the background, and the LDB group is characterized by a weak presence of this same counterstain. In this study, we compare the behavior of the proposed bacillus identification method when applied to these two different image groups.

Methods

The methodology for bacillus identification is composed of the following steps: image acquisition, segmentation and post-processing. In the segmentation step, two techniques were investigated: SVM and neural network classifiers. The input variables of these classifiers are combinations of pixel color characteristics selected from 4 color spaces. The best characteristics were selected by a scalar feature selection technique. The outputs of the segmentation step are objects that could be bacilli or artifacts. The goal of the post-processing step is to eliminate the objects considered artifacts. This task was accomplished by a sequence of three filtering processes.

Image acquisition

A total of 120 sputum smear images were acquired. The samples, from 12 patients, were prepared in the Laboratory of the Instituto Nacional de Pesquisas da Amazonia (INPA), Manaus, Brazil, using the Kinyoun acid-fast stain and counterstained with methylene blue solution. The images were captured using a 10-megapixel Canon PowerShot A640 digital camera. The microscope used was a Zeiss Axioskop 40 with a magnification of 100x and a numerical aperture of 1.25. The PC attached to the microscope had a 2.0 GHz Core 2 Duo processor and 3 GB of RAM. The spatial resolution of the images is 2816x2112 pixels. The image focus was established in a previous study (Kimura et al., 2010).

Image groups

In previous studies (Costa Filho et al., 2012; Kimura et al., 2010), we verified through a quantitative analysis that the density of background content influences the focus of the image. For images with high-density background (HDB) content, the best focus measure was the variance. For images with low-density background (LDB) content, the best focus measure was the entropy.

The HDB group is characterized by a strong presence of counter stain with methylene blue solution in the background. The LDB group is characterized by a weak presence of this same counter stain. Figure 1 shows image examples extracted from the two groups. There is a prevalent blue color in the background of the HDB images and a prevalent white color in the background of the LDB images.

Figure 1
(a) Image with high density background content (HDB image); (b) Image with low density background content (LDB image) (c) Bar graph of the 120 acquired images, in which the vertical axis corresponds to the number of image pixels (%) whose H component is in the range of 0.5 to 0.7.

The evaluation of image background density was performed using the Hue component of the HSI space. For each image, the percentage of pixels with a Hue component in the blue color range (0.5-0.7) was obtained. To illustrate this evaluation, Figure 1(c) shows a bar graph of the 120 acquired images, in which the vertical axis corresponds to the percentage of image pixels whose H component is in the range 0.5 to 0.7. The graph shows that an experimental threshold value can be obtained to separate the images into two groups. This threshold value, 13.56, is shown as a horizontal line in Figure 1(c). When the bar value was less than this threshold value, the image was assigned to the LDB group. When the bar value was higher than this threshold value, the image was assigned to the HDB group.
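The grouping rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the hue is computed with the common arccos-based HSI formula and normalized to [0, 1) (the text does not state which formula was used), pixels are assumed to be RGB triples scaled to [0, 1], and an image exactly at the threshold is assigned to LDB by assumption.

```python
import math


def hsi_hue(r, g, b):
    """Hue of the HSI model normalized to [0, 1); r, g, b are in [0, 1]."""
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0  # gray pixel: hue is undefined, 0 is used by convention
    cos_theta = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / den))
    theta = math.degrees(math.acos(cos_theta))
    return (theta if b <= g else 360.0 - theta) / 360.0


def blue_pixel_percentage(pixels, low=0.5, high=0.7):
    """Percentage of pixels whose hue lies in the blue range [low, high]."""
    n_blue = sum(1 for r, g, b in pixels if low <= hsi_hue(r, g, b) <= high)
    return 100.0 * n_blue / len(pixels)


def background_group(pixels, threshold=13.56):
    """Assign an image to the HDB or LDB group (ties go to LDB by assumption)."""
    return "HDB" if blue_pixel_percentage(pixels) > threshold else "LDB"


if __name__ == "__main__":
    blue, white = (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
    print(background_group([blue] * 2 + [white] * 8))  # 20% blue pixels
    print(background_group([blue] * 1 + [white] * 9))  # 10% blue pixels
```

In practice the pixel list would come from a full 2816x2112 image; the tiny synthetic lists here only exercise the rule.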

A visual inspection of the images shown in Figure 1 revealed that the strong presence of counter stain with methylene blue solution in the background of the HDB images produces more artifacts than the LDB images. In this study, we compare the behavior of the proposed bacillus identification methods when applied to these two different image groups.

In the 120 images, the identified objects were enclosed within a geometric shape by two researchers guided by a pathologist. A true bacillus was enclosed in a circular or oval shape. An agglomerated bacillus was enclosed by a rectangle, and a doubtful bacillus (one for which the image focus or the geometry did not permit a clear identification of the object) was enclosed by a polygon. These marked objects were the standards used to calculate the accuracy, sensitivity and specificity of bacillus recognition. The doubtful bacilli and the agglomerated bacilli (it is not possible to know how many bacilli there are in one agglomeration) were not taken into account for these calculations. Figure 2 depicts example images in which objects were marked as previously described.

Figure 2
Examples of sputum smear images in which the objects were identified as: true bacillus - circular or oval shape; doubtful bacillus - polygon; agglomerated bacilli - rectangle. (a) LDB image; (b) HDB image.

Characteristic selection for segmentation

The features used for pixel classification in the segmentation step were the components and the subtraction of components of the following color spaces: RGB, HSI, YCbCr and Lab. A set, F, of 30 features was used: F = {R, G, B, R-B, R-G, G-B, ~R, ~G, ~B, H, S, I, H-S, H-I, S-I, R-I, G-I, B-I, Y, Cb, Cr, Y-Cb, Y-Cr, Cb-Cr, L, a, b, L-a, L-b, a-b}. The scalar feature selection technique was used to select the best features.

This “ad hoc” technique combines correlation information with criteria adapted to scalar characteristics. Scalar feature selection was chosen over vectorial feature selection because of the computational complexity of the latter. As described by Theodoridis and Koutroumbas (2009), scalar feature selection is divided into three parts:

  1. Select the first characteristic using a class separation measurement. In this study, Fisher's Discriminant Ratio (FDR) was used. FDR is described in Equation 2:

    $$FDR_k = \frac{(\mu_{k1} - \mu_{k2})^2}{\sigma_{k1}^2 + \sigma_{k2}^2} \tag{2}$$

where

$\mu_{k1}$, $\sigma_{k1}$: mean value and standard deviation of characteristic $x_k$ in class $w_1$.

$\mu_{k2}$, $\sigma_{k2}$: mean value and standard deviation of characteristic $x_k$ in class $w_2$.

Classes $w_1$ and $w_2$ represent pixels belonging to the background and pixels belonging to bacilli. The value of $FDR_k$ is calculated for each characteristic $x_k$, $k = 1, \dots, m$. The characteristic $x_k$ with the highest $FDR_k$ is selected. This is the $x_{s_1}$ characteristic.

  2. To select the second characteristic, $x_{s_2}$, the cross-correlation coefficient between two characteristics, $x_i$ and $x_j$, defined in Equation 3, is used.

    $$\rho_{ij} = \frac{\sum_{n=1}^{N} x_{ni}\, x_{nj}}{\sqrt{\sum_{n=1}^{N} x_{ni}^2 \sum_{n=1}^{N} x_{nj}^2}} \tag{3}$$

where $N$ = total number of patterns belonging to classes $w_1$ and $w_2$.

$x_{ni}$ and $x_{nj}$: values of the $i$th and $j$th characteristics of pattern $n$; $i, j = 1, \dots, m$.

The second characteristic is the characteristic $x_{s_2}$ that maximizes Equation 4:

$$\alpha_1 FDR_{s_2} - \alpha_2 \left|\rho_{s_1 s_2}\right|, \quad \text{for all } s_2 \neq s_1 \tag{4}$$

$\alpha_1$ and $\alpha_2$ express the importance of the first and second terms in selecting the second-best characteristic. In this work, $\alpha_1 = \alpha_2 = 0.5$.

  3. The other selected characteristics, $x_{s_k}$, $k = 3, \dots, m$, are those that maximize Equation 5:

    $$\alpha_1 FDR_{s_k} - \frac{\alpha_2}{k-1} \sum_{r=1}^{k-1} \left|\rho_{s_r s_k}\right| \tag{5}$$

From each of the 120 images, 20 pixels belonging to bacilli and 20 pixels belonging to the background were extracted for the application of this technique. Sets with 4, 5, 6, 7 and 8 features were produced. The set with four selected features is {G-I, L-a, Y-Cr, a}. The set with five selected features is {G-I, L-a, Y-Cr, a, R-G}. The set with six selected features is {G-I, L-a, Y-Cr, a, R-G, H-I}. The set with seven selected features is {G-I, L-a, Y-Cr, a, R-G, H-I, a-b}. The set with eight selected features is {G-I, L-a, Y-Cr, a, R-G, H-I, a-b, H}.
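The three-part selection procedure can be sketched in a few lines. This is an illustrative reading of Equations 2 to 5, not the authors' code: the function names and the toy data are hypothetical, population variances are used for the class variances, the absolute value of the cross-correlation is used in the penalty term (as in Theodoridis and Koutroumbas), and features with zero variance in both classes are assumed not to occur.

```python
from math import sqrt
from statistics import mean, pvariance


def fdr(a, b):
    """Fisher's Discriminant Ratio of one feature for two classes (Equation 2)."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


def cross_corr(u, v):
    """Cross-correlation coefficient between two features (Equation 3)."""
    num = sum(p * q for p, q in zip(u, v))
    return num / sqrt(sum(p * p for p in u) * sum(q * q for q in v))


def scalar_feature_selection(class1, class2, n_select, a1=0.5, a2=0.5):
    """Return the indices of n_select features in order of selection.

    class1 and class2 are lists of patterns; each pattern is a list of m values.
    """
    m = len(class1[0])
    column = lambda data, k: [row[k] for row in data]
    scores = [fdr(column(class1, k), column(class2, k)) for k in range(m)]
    all_cols = [column(class1 + class2, k) for k in range(m)]

    # Part 1: the feature with the highest FDR is selected first.
    selected = [max(range(m), key=lambda k: scores[k])]

    # Parts 2 and 3: trade FDR against mean |correlation| with selected features.
    while len(selected) < n_select:
        def criterion(k):
            penalty = sum(abs(cross_corr(all_cols[r], all_cols[k]))
                          for r in selected) / len(selected)
            return a1 * scores[k] - a2 * penalty

        candidates = [k for k in range(m) if k not in selected]
        selected.append(max(candidates, key=criterion))
    return selected


if __name__ == "__main__":
    # Toy data: feature 1 is a near copy of feature 0, feature 2 is weakly related.
    background = [[-2, -2, 1], [-1, -1, -1]]
    bacilli = [[1, 1, -1], [2, 2.2, 1.5]]
    print(scalar_feature_selection(background, bacilli, 2))
    print(scalar_feature_selection(background, bacilli, 2, a1=0.05, a2=0.95))
```

With equal weights the unnormalized FDR term dominates on this toy data, so the near-duplicate feature is still picked second; shifting the weights toward the correlation penalty selects the less redundant feature instead.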

Bacillus segmentation

In the segmentation step, the pixels are classified as belonging to bacilli or background. Two classification methods were employed: support vector machines (SVM) and feedforward neural networks. A total of 1,200 pixels belonging to bacilli and 1,200 pixels belonging to the background were used in the training set. These pixels were extracted from all 120 images.

An SVM separates patterns belonging to two classes by defining a hyperplane that maximizes the separating margin between these two classes (Haykin, 1999). According to Theodoridis and Koutroumbas (2009), the hyperplane parameters that maximize the separating margin are the weight vector w and the bias w0 that minimize Equation 6 subject to the constraints in Equation 7:

$$J(w, w_0) = \frac{1}{2}\lVert w \rVert^2 \tag{6}$$

$$y_i\left(w^T x_i + w_0\right) \geq 1, \quad i = 1, 2, \dots, N \tag{7}$$

where N = number of pixels to be classified.

For non-separable classes, the same parameters can be determined by minimizing Equation 8, in which new variables $\xi_i$, known as slack variables, are introduced. The optimization task becomes more complex. The goal now is to make the margin as large as possible while simultaneously keeping the number of points with ξ > 0 as small as possible.

$$J(w, w_0, \xi) = \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \tag{8}$$

The C parameter in Equation 8 is a positive constant that controls the relative influence of the two competing terms. The C parameter values used in this work were 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 25.6, 51.2 and 102.4.
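To make the role of the slack variables and of C concrete, the sketch below evaluates the soft-margin objective for a given hyperplane. It does not train an SVM; the function names and the toy samples are hypothetical, and each slack is taken as the smallest value that satisfies the relaxed margin constraint, that is, max(0, 1 - y(w·x + w0)).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def soft_margin_objective(w, w0, samples, C):
    """Evaluate the soft-margin cost for a fixed hyperplane (w, w0).

    samples is a list of (x, y) pairs with labels y in {-1, +1}.
    Returns the cost and the list of slack values.
    """
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + w0)) for x, y in samples]
    cost = 0.5 * dot(w, w) + C * sum(slacks)
    return cost, slacks


if __name__ == "__main__":
    samples = [([2.0, 0.0], +1),   # outside the margin: slack 0
               ([0.5, 0.0], +1),   # inside the margin: slack 0.5
               ([-1.0, 0.0], -1)]  # exactly on the margin: slack 0
    for C in (0.2, 1.6, 102.4):
        cost, slacks = soft_margin_objective([1.0, 0.0], 0.0, samples, C)
        print(C, cost, slacks)
```

As C grows, the same margin violation contributes more to the cost, so the optimizer is pushed toward hyperplanes with fewer violations at the expense of a narrower margin.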

SVMs use kernels to map characteristic vectors into a higher-dimensional space in which the classes can be separated by hyperplanes. The following kernels were used in this work in association with the SVM classifier: linear, polynomial, radial, quadratic and multilayer perceptron.

Combining the C parameter values and kernels, 250 simulations were used to obtain the best SVM classifier.

The second classifier was a feedforward neural network, more specifically, a three-layer neural network with an n-m-1 architecture. To find the best architecture, a total of 180 simulations, combining different values for n and m in the set {3, 6, 9, 12, 15, 18}, were performed. The training algorithm was backpropagation associated with the Levenberg-Marquardt acceleration method. The convergence criterion was a quadratic error less than $10^{-4}$.

A total of 2,456 bacilli were identified by the pathologist in the set of 120 images (the gold standard). A true positive case occurs when the classifier identifies an object as a bacillus and this classification agrees with that of the specialist. Otherwise, a false positive case occurs (the object is classified as a bacillus but, in fact, is not).

Post-processing

The outputs of the previous step are objects that could be bacilli or artifacts. The goal of the post-processing step is to eliminate the objects considered artifacts. This task was accomplished by applying the following filters: filter 1, a size filter that removes objects with large areas (agglomerated bacilli) and small areas (artifacts); filter 2, a geometric filter that eliminates objects based on their eccentricity; and filter 3, a rule-based filter that uses components of the RGB color space.

Filter 1: size filter

This filtering process removes objects larger than 150 pixels (agglomerated bacilli) and smaller than 20 pixels (small artifacts).
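A minimal sketch of the size filter is shown below. It assumes the segmentation output is a binary mask given as a list of rows, labels objects with 8-connectivity (the text does not state which connectivity was used), and keeps objects whose area lies between 20 and 150 pixels inclusive. The function names are hypothetical.

```python
from collections import deque


def label_objects(mask):
    """Return the connected objects of a binary mask as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill over the 8 neighbors
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append(pixels)
    return objects


def size_filter(objects, min_area=20, max_area=150):
    """Drop objects smaller than min_area or larger than max_area pixels."""
    return [obj for obj in objects if min_area <= len(obj) <= max_area]


def demo_mask():
    """Synthetic 20x20 mask with objects of 30, 4 and 169 pixels."""
    mask = [[0] * 20 for _ in range(20)]
    for r in range(1, 4):
        for c in range(1, 11):
            mask[r][c] = 1          # 3x10 block: plausible single bacillus
    for r in range(10, 12):
        for c in range(1, 3):
            mask[r][c] = 1          # 2x2 block: small artifact
    for r in range(6, 19):
        for c in range(6, 19):
            mask[r][c] = 1          # 13x13 block: agglomerate-sized object
    return mask


if __name__ == "__main__":
    kept = size_filter(label_objects(demo_mask()))
    print([len(obj) for obj in kept])
```

Only the 30-pixel object survives in the synthetic mask; the 4-pixel and 169-pixel objects fall outside the area limits.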

Filter 2: geometric filter

The following geometric characteristics were investigated to choose the best characteristic for the geometric filter: area, perimeter, compactness, eccentricity and Hu moments of the first and second order: μ10, μ02, μ20, μ11, μ12, μ21.

The contours of 500 bacilli were extracted, and the following parameters were calculated for each one of these geometric characteristics: mean value (η), standard deviation (σ) and variation coefficient (v). The variation coefficient is defined by Equation 9:

$$v = \frac{\sigma}{\eta} \cdot 100 \qquad (9)$$
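Equation 9 can be evaluated for one geometric characteristic as in the sketch below. The use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def variation_coefficient(values):
    """Variation coefficient in percent: 100 * sigma / mean (Equation 9)."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population form is an assumption
    return 100.0 * sigma / mean
```

A lower value means that the characteristic varies less, relative to its mean, across the measured bacilli.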

Table 2 lists the computed parameter values for all geometric characteristics considered. The best geometric characteristic was the one with the lowest variation coefficient. The geometric characteristic used by filter 2 was the eccentricity. A threshold value for eccentricity (0.77) that minimizes the false positive cases was obtained experimentally. Objects with eccentricity higher than 0.77 were considered bacilli, and objects with eccentricity lower than 0.77 were considered artifacts.
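The thresholding step can be sketched as follows. The text does not state how eccentricity was computed, so this example assumes the common definition based on the ellipse that has the same second-order central moments as the object; the function names are hypothetical.

```python
import math

ECCENTRICITY_THRESHOLD = 0.77  # experimentally obtained value from the text


def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second moments as the object.

    pixels: non-empty list of (x, y) coordinates belonging to one object.
    """
    n = len(pixels)
    xc = sum(x for x, _ in pixels) / n
    yc = sum(y for _, y in pixels) / n
    mu20 = sum((x - xc) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - yc) ** 2 for _, y in pixels) / n
    mu11 = sum((x - xc) * (y - yc) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[mu20, mu11], [mu11, mu02]]
    half_sum = (mu20 + mu02) / 2.0
    half_diff = math.hypot((mu20 - mu02) / 2.0, mu11)
    lam_max = half_sum + half_diff
    lam_min = half_sum - half_diff
    if lam_max <= 0:
        return 0.0  # single pixel: no elongation
    return math.sqrt(max(0.0, 1.0 - lam_min / lam_max))


def passes_geometric_filter(pixels):
    """True when the object is elongated enough to be kept as a bacillus."""
    return eccentricity(pixels) > ECCENTRICITY_THRESHOLD
```

Under this definition a thin rod has eccentricity close to 1 and a round blob close to 0, which is why elongated bacilli pass the filter while compact artifacts do not.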

Table 2
Mean value, Standard Deviation and Variation Coefficient used to design the geometric filter in the post-processing step.

For the LDB image group, the size filter combined with the geometric filter was enough to obtain high bacillus identification rates. When the rule-based filter was added, high bacillus identification rates were obtained for all images.

Filter 3: rule-based filter

The rule-based filter uses the Color Ratio (CR) parameter, which is defined with the aid of Figure 3. In this figure, two points, Cp and Bp, are initially determined. The first corresponds to the centroid, or geometric center, of the bacillus (xCp, yCp). The location of point Bp (xBp, yBp) is obtained as follows: 1. Determine whether the bacillus major axis is horizontal or vertical; 2. If the bacillus major axis is vertical, Bp corresponds to a background pixel that is a 4-neighbor of a bacillus pixel and is located on the same row as the geometric center of the bacillus, xCp, to its left or right; 3. Otherwise, Bp corresponds to a background pixel that is a 4-neighbor of a bacillus pixel and is located on the same column as the geometric center of the bacillus, yCp, above or below it. Figure 3(a) illustrates a bacillus whose major axis is vertical, with Bp chosen on the left (or right) side of the centroid.
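The three steps above can be sketched on a binary mask as follows. Several choices here are assumptions, because the text does not specify them: the orientation test compares the bounding-box height and width, the left (or upper) side is tried before the right (or lower) side, and the mask is indexed as mask[row][col].

```python
def centroid(mask):
    """Rounded (row, col) centroid of the non-zero pixels of a non-empty mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return round(sum(r for r, _ in pts) / n), round(sum(c for _, c in pts) / n)


def major_axis_is_vertical(mask):
    """Assumed test: the object is at least as tall as it is wide."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return (max(rows) - min(rows)) >= (max(cols) - min(cols))


def find_cp_bp(mask):
    """Return (Cp, Bp) as (row, col) pairs; Bp is None if no candidate exists."""
    cp_row, cp_col = centroid(mask)
    vertical = major_axis_is_vertical(mask)
    # Scan the centroid row for a vertical object, else the centroid column
    line = mask[cp_row] if vertical else [row[cp_col] for row in mask]
    object_idx = [i for i, v in enumerate(line) if v]
    if not object_idx:
        return (cp_row, cp_col), None
    # Background pixel just outside the object along the scanned line
    for candidate in (min(object_idx) - 1, max(object_idx) + 1):
        if 0 <= candidate < len(line):
            bp = (cp_row, candidate) if vertical else (candidate, cp_col)
            return (cp_row, cp_col), bp
    return (cp_row, cp_col), None
```

Because the candidate lies immediately next to the outermost object pixel on the scanned line, it is a background pixel and a 4-neighbor of a bacillus pixel, as the procedure requires.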

Figure 3
Illustration of the post-processing step: (a) example of a segmented bacillus image with the centroid point and a border point; (b) an original sputum smear image; (c) output of post-processing after applying the size filter; (d) output of post-processing after applying the size filter + geometric filter; (e) output of post-processing after applying the size filter + geometric filter + rule-based filter.

The proposed CR parameter is defined by Expression 10. CR is the ratio between the intensity differences of the red and green components measured between points Cp and Bp, as shown in Expressions 11 and 12.

$$CR = \frac{difR}{difG} \qquad (10)$$
$$difR = R_{Cp} - R_{Bp} \qquad (11)$$
$$difG = G_{Cp} - G_{Bp} \qquad (12)$$

where RCp is the value of the red component at point Cp, RBp is the value of the red component at point Bp, GCp is the value of the green component at point Cp, and GBp is the value of the green component at point Bp.

The following rules are used to determine whether an object is a bacillus or an artifact:

  • if RCp > GCp and RCp > BCp
      • object is a bacillus
  • elseif RCp > GCp and RCp < BCp
      • if difR > 0 and difG > 0 and CR > 2
          • object is a bacillus
      • elseif difR > 0 and difG > 0
          • object is a bacillus
      • elseif difR < 0 and difG < 0 and CR < 0.5
          • object is a bacillus
      • else
          • object is an artifact
  • elseif RCp < GCp and RCp < BCp
      • object is an artifact

Because of the Kinyoun acid-fast stain, when a bacillus is over a white background (with a weak presence of the methylene blue counterstain), its color appears as light fuchsia. When the bacillus is over a blue background (with a strong presence of the methylene blue counterstain), its color appears as dark purple. In the first case, the red component predominates over the green and blue components. In the second case, the blue component predominates over the other two components. In both cases, the red component predominates over the green component. These observations are summarized in the rule-based filter.

The following figures demonstrate the application of the post-processing step: Figure 3(b) is an original image; Figure 3(c) depicts the output of the post-processing step after applying the size filter; Figure 3(d) depicts the output after applying the size filter + the geometric filter; Figure 3(e) depicts the output after applying the size filter + the geometric filter + the rule-based filter. Figure 3(c) shows five marked objects: O1 and O2 are not bacilli and are eliminated by the geometric filter; O3 and O4 are not bacilli and are eliminated only by the rule-based filter; O5 is a bacillus and is not eliminated by any of the filters. Figure 4(a), Figure 4(b) and Figure 4(c) show the intensity profiles of the RGB components of objects O3, O4 and O5. In each graph, the coordinate x=0 corresponds to point Bp and the rightmost coordinate of the graph corresponds to Cp. The values of difR, difG and CR for object O3 were –65, –85 and 0.76, respectively. The values of difR, difG and CR for object O4 were –62, –86 and 0.72, respectively.
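As a quick check, the CR values reported for O3 and O4 follow directly from Expression 10:

$$CR_{O3} = \frac{-65}{-85} \approx 0.76, \qquad CR_{O4} = \frac{-62}{-86} \approx 0.72$$

For both objects difR and difG are negative and CR is not below 0.5, so the rule that accepts objects with difR < 0, difG < 0 and CR < 0.5 does not apply to them. The color values at Cp for these objects are not reported in the text, so which outer rule they fall under is not checked here.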

Figure 4
Objects with their corresponding RGB profiles: (a) object O3; (b) object O4; (c) object O5.

Results

Two result sets are reported. The first set demonstrates the segmentation step. The second result set demonstrates bacillus identification after the post-processing step.

The segmentation classifiers are used to separate the pixels into two classes, bacillus or background. Table 3 reports the accuracy, sensitivity and specificity of the two segmentation classifiers used in pixel classification, the neural network and the SVM. The best values were obtained with the SVM classifier (Table 3). The best neural network performance was obtained with architecture 18-3-1 and five features as input variables. The best performance of the SVM was obtained with a quadratic kernel, a C parameter equal to 1.6, and seven features as input variables.

Table 3
Results of pixel classification in the segmentation step.

The results of bacillus detection after applying the post-processing step are shown in Table 4. Six different types of results are shown, depending on the segmentation classifier and post-processing filtering process used: SVM classifier + size filter; Neural network classifier + size filter; SVM classifier + (size filter + geometric filter); Neural network classifier + (size filter + geometric filter); SVM classifier + (size filter + geometric filter + Rule-based filter); Neural network classifier + (size filter + geometric filter + Rule-based filter).

Table 4
Results of bacillus identification after post-processing.

Discussion

This work presents a new method for bacillus identification. The following points summarize the differences between this method and those previously presented in literature:

  • Features used as inputs of the segmentation classifiers were selected from four color spaces: RGB, HSI, YCbCr and Lab. A total of 30 features were used. Differences between components of different color spaces, such as G-I, and of the same color space (e.g., L-a, Y-Cr, R-G, H-I, a-b) were examined.

  • Only geometric characteristics are used to separate bacilli from artifacts in the bacillus identification methods reported in the literature (Sotaquirá et al., 2009; Makkapati et al., 2009; Khutlang et al., 2010). This paper proposes using a new filtering process, the rule-based filter.

The proposed method characterizes the sputum smear images by analyzing the H component of the HSI color space of the image pixels. This analysis identifies two groups of images: high-density background (HDB) and low-density background (LDB).

As shown in Table 4, the error rates obtained in bacillus detection are much lower for the LDB images. The sensitivity values obtained for the HDB images were higher than those obtained for the LDB images. The hit rate obtained with the LDB images was higher than that obtained with the HDB images.

The results obtained with the combination of the three filters were better than those obtained with the size filter alone and with the combination of the size filter and the geometric filter. When using the three filters, the error rate decreases to 0% for the LDB images and to less than 4% for all images.

The best sensitivity, 96.80%, was obtained using the SVM classifier in the segmentation step and the three filters in the post-processing step, with an error rate of 3.38%. Khutlang et al. (2010) reported a sensitivity of 97.77%. Sotaquirá et al. (2009) reported a false positive rate of 9.78%.

The field of automatic tuberculosis diagnosis lacks a publicly available image database of sputum smear microscopy slides. A rigorous comparison of sensitivities and error rates between different methods was not possible because each author used a proprietary image database, with a different specialist identifying objects as bacillus or artifact.

We generated an image database with 120 sputum smear images from 12 patients, with objects marked as bacillus, agglomerated bacillus or artifact. This database is now available at http://www.tbimages.ufam.edu.br. It could be used by other authors to compare different bacillus recognition methods. Future work includes an improved image database funded by FAPEAM.

Acknowledgements

We would like to thank FAPEAM and CNPq (process 470972/2011-4) for the financial support. We also thank Academic English Solutions.com for revising the text.

References

  • Bennedsen J, Larsen SO. Examination for tubercle bacilli by fluorescence microscopy. Scandinavian Journal of Respiratory Diseases 1966; 47(2):114-20. PMid:4161476.
  • Costa MGF, Costa Filho CFF, Sena JF, Salem J, Lima MO. Automatic identification of mycobacterium tuberculosis with conventional light microscopy. In: Proceedings of 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2008. p. 382-5.
  • Costa Filho CFF, Costa MGF, Kimura Junior A. Autofocus functions for tuberculosis diagnosis with conventional sputum smear microscopy. In: Méndez-Vilas, A. editor. Current Microscopy Contributions to Advances in Science and Technology. Badajoz: Formatex; 2012. p. 13-20.
  • Forero MG, Cristóbal G, Alvarez-Borrego J. Automatic identification techniques of tuberculosis bacteria. Proceedings of the Society for Photo-Instrumentation Engineers 2003; 5203:71-81. http://dx.doi.org/10.1117/12.506800.
  • Forero MG, Sroubek F, Cristóbal G. Identification of tuberculosis bacteria based on geometric and color. Real Time Imaging 2004; 10(4):251-62. http://dx.doi.org/10.1016/j.rti.2004.05.007.
  • Forero MG, Cristóbal G, Desco M. Automatic identification of Mycobacterium tuberculosis by Gaussian mixture models. Journal of Microscopy 2006; 223(Pt 2):120-32. http://dx.doi.org/10.1111/j.1365-2818.2006.01610.x. PMid:16911072
  • Haykin S. Neural networks and learning machines. 3rd. ed. New Jersey: Pearson Prentice Hall; 1999.
  • Khutlang R, Krishnan S, Dendere R, Whitelaw A, Veropoulos K, Learmonth G, Douglas TS. Classification of Mycobacterium tuberculosis in images of ZN-stained sputum smears. IEEE Transactions on Information Technology in Biomedicine 2010; 14(4):949-57. http://dx.doi.org/10.1109/TITB.2009.2028339. PMid:19726269
  • Kimura A, Costa MGF, Costa Filho CFF, Fujimoto LBM, Salem J. Evaluation of autofocus functions of conventional sputum smear microscopy for tuberculosis. In: Proceedings of 32th Annual International IEEE EMBS Conference; 2010. p. 3041-4. http://dx.doi.org/10.1109/IEMBS.2010.5626143.
  • Lenseigne B, Brodin P, Jeon H, Christophe T, Genovesio A. Support vector machines for automatic detection of tuberculosis bacteria in confocal microscopy images. In: Proceedings of 4th IEEE International Symposium on Biomedical Imaging; 2007. p. 85-8. http://dx.doi.org/10.1109/ISBI.2007.356794.
  • Makkapati V, Agrawal R, Acharya R. Segmentation and classification of tuberculosis bacilli from ZN-stained sputum smear images. In: Proceedings of 5th Annual IEEE Conference on Automation Science and Engineering; 2009. p. 217-20. http://dx.doi.org/10.1109/COASE.2009.5234173.
  • Osman MK, Mashor MY, Jaafar H. Performance comparison and thresholding algorithms for tuberculosis bacilli segmentation. In: Proceedings of International Conference of Computer, Information and Telecommunication Systems (CITS); 2012. p. 1-5. http://dx.doi.org/10.1109/CITS.2012.6220378.
  • Raof RAA, Salleh Z, Sahidan SI, Mashor MY, Noor SS, Idris FM, Hasan H. Color thresholding method for image segmentation algorithm of Ziehl-Neelsen sputum slide images. In: Proceedings of 5th International Conference on Electrical Engineering, Computing Science and Automatic Control; 2008. p. 212-7. http://dx.doi.org/10.1109/ICEEE.2008.4723398.
  • Sadaphal P, Rao J, Comstock GW, Beg MF. Image processing techniques for identifying Mycobacterium tuberculosis in Ziehl-Neelsen stains. The International Journal of Tuberculosis and Lung Disease 2008; 12(5):579-82. PMid:18419897.
  • Santos A, Ortiz de Solórzano C, Vaquero JJ, Peña JM, Malpica N, del Pozo F. Evaluation of autofocus functions in molecular cytogenetic analysis. Journal of Microscopy 1997; 188(Pt 3):264-72. http://dx.doi.org/10.1046/j.1365-2818.1997.2630819.x. PMid:9450330
  • Sotaquirá M, Rueda L, Narvaez R. Detection and quantification of bacilli and clusters present in sputum smear samples: a novel algorithm for pulmonary tuberculosis diagnosis. In: Proceedings of International Conference on Digital Image Processing; 2009. p. 117-21. http://dx.doi.org/10.1109/ICDIP.2009.59.
  • Steingart KR, Henry M, Ng V, Hopewell PC, Ramsay A, Cunningham J, Urbanczik R, Perkins M, Aziz MA, Pai M. Fluorescence versus conventional sputum smear microscopy for tuberculosis: a systematic review. The Lancet Infectious Diseases 2006; 6(9):570-81. http://dx.doi.org/10.1016/S1473-3099(06)70578-3. PMid:16931408
  • Theodoridis S, Koutroumbas K. Pattern recognition. 4th ed. Amsterdam: Academic Press; 2009.
  • Toman K. How reliable is smear microscopy? In: Frieden T, editor. Toman’s Tuberculosis: case detection, treatment, and monitoring. Questions and answers. WHO; 2004a. p. 14-23.
  • Toman K. What are the advantages and disadvantages of fluorescence microscopy? In: Frieden T, editor. Toman’s Tuberculosis: case detection, treatment, and monitoring. Questions and answers. WHO; 2004b. p. 31-5.
  • United Nations - UN. The millennium development goals report [internet]. 2010 [cited 2013 Dec 18]. Available from: http://www.un.org/millenniumgoals/reports.shtml.
  • Veropoulos K, Campbell C, Learmonth G, Knight B, Simpson J. The automated identification of tubercle bacilli using image processing and neural computing techniques. In: Proceedings of 8th International Conference on Artificial Neural Networks; 1998. p. 797-802. http://dx.doi.org/10.1007/978-1-4471-1599-1_123.
  • World Health Organization - WHO. Global TB control report [internet]. 2010 [cited 2013 Dec 18]. Available from: http://www.who.int/tb/publications/global_report/2010/en/index.html.
  • World Health Organization - WHO. Global TB control report [internet]. 2013 [cited 2014 Jan 24]. Available from: http://www.who.int/tb/publications/global_report/en.

Publication Dates

  • Publication in this collection
    Jan-Mar 2015

History

  • Received
    13 Feb 2014
  • Accepted
    21 Jan 2015
Sociedade Brasileira de Engenharia Biomédica Centro de Tecnologia, bloco H, sala 327 - Cidade Universitária, 21941-914 Rio de Janeiro RJ Brasil, Tel./Fax: (55 21)2562-8591 - Rio de Janeiro - RJ - Brazil
E-mail: rbe@rbejournal.org