
Machine learning classification based on k-Nearest Neighbors for PolSAR data

Abstract

In this work, we evaluate well-known machine learning image classification techniques for PolSAR (Polarimetric Synthetic Aperture Radar) imagery, including k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), a randomized decision tree, and a method based on the Kullback-Leibler stochastic distance. Our experiments on actual PolSAR data show that suitably adapted standard machine learning methods offer an excellent trade-off between performance and computational complexity. The k-NN and SVM classifiers perform comparatively poorly on these data, likely because they fail to account for the inherent speckle and the properties of the studied reliefs. Overall, our findings highlight the potential of the Kullback-Leibler stochastic distance method for PolSAR image classification.

Key words
speckle; classification; PolSAR; machine learning; Kullback-Leibler

Introduction

In the area of remote sensing, Polarimetric Synthetic Aperture Radar (PolSAR) systems have been indicated as important and efficient tools for monitoring the terrestrial surface: they operate independently of the weather, provide high spatial resolution, and their analysis yields varied information about a terrestrial scene (Cintra et al. 2013).

A scene obtained from PolSAR data can be understood as a map in which each entry (pixel) is associated with a positive definite Hermitian matrix containing data from various polarizations in the illuminated area (horizontal polarization, vertical polarization, and/or a combination of both). In this paper, fully PolSAR data are used. This means that the returned radar signal contains rich information due to the participation of the horizontally emitted and received signal (HH), the vertically emitted and received signal (VV), and the HV (horizontally emitted and vertically received) and VH (vertically emitted and horizontally received) signals. Thus, these data types require multivariate processing methods for the extraction of information.

Unlike monopolarized radar data (SAR, Synthetic Aperture Radar), PolSAR images naturally resemble pre-classified images when displayed in the usual Pauli representation (Lee & Pottier 2009), as can be seen in Figure 1. With the Pauli representation, the original image looks more like a natural (optical) image.

Figure 1
Pauli representation for a subset of San Francisco PolSAR data (left) and single polarization (HH SAR) image for same data (right).

In addition, PolSAR data are strongly affected by multiplicative speckle noise, resulting from coherent illumination (Frery et al. 1997). This effect causes PolSAR intensity data to deviate from Gaussian behavior and, as a consequence, classification and segmentation techniques must be adapted. The inherent speckle noise hampers the classification process (Cintra et al. 2013) and makes it a challenging problem. Thus, classifiers less sensitive to the speckle effect are sought for the processing of PolSAR images.

Most remote sensing analyses require image classification, and land cover classification is one of the main applications of PolSAR data: for instance, measuring urban pressure (Kotru et al. 2021), monitoring climatic changes (Guo et al. 2015) or landslides (Zhong et al. 2020), among other problems of great interest.

PolSAR data classification has traditionally been done through two well-established unsupervised (academic) methods: the \(H-\alpha\) technique (Cloude & Pottier 1997) and the methods that decompose the returned radar signal into several scattering components (Yamaguchi et al. 2005). The former uses the entropy and the type of scattering mechanism to map the \(H-\alpha\) plane into eight zones and, from that information, data are classified. The latter accounts for different signal bounce mechanisms (single-bounce, double-bounce, and volume scattering), thus providing a clear physical meaning to the image pixels.

Although these techniques provide excellent results (especially when classifying large areas such as ocean, urban, or forest), the speckle content in PolSAR data strongly reduces their performance. In Figure 1, the speckle is easily visible as a granular pattern covering the whole image, especially in the HH image (right). However, such content is also visible in the left image and in all PolSAR and SAR imagery.

This post-processing task, image classification, has been largely researched by means of several approaches, from simple machine learning and pattern recognition methods (Liu et al. 2016) to complex deep learning techniques (Liu et al. 2018, Mullissa et al. 2019). A review of PolSAR data classification, covering the academic methods (polarimetry) up to emerging deep learning approaches, can be found in (Wang et al. 2019).

The latter, deep learning methods, are widely researched at present and provide excellent results, but their long training times (several days or even more), the need for huge amounts of labeled data, and the difficulty of replicating experiments and results still make them a challenging option for the practical classification of PolSAR data. Other nontrivial issues related to deep learning methods concern the computational resources needed to improve the quality of results, as well as the environmental cost of pursuing such improvements.

Among non-parametric classification methodologies, the Nearest Neighbors (NN) approach is one of the most powerful. NN classifiers have been explored in various areas of artificial intelligence, such as pattern recognition, data mining, estimation of posterior probabilities, similarity-based querying, and others (Zuo et al. 2008). These classifiers were proposed by Fix and Hodges (Fix 1951, Fix & Hodges 1989) and can be considered among the oldest and simplest pattern classification methods (Jaafar et al. 2016).

NN classifiers do not require modeling or training steps. In this sense, the classification mechanism is based on the training data set itself: the predicted class for a point in the test data set is obtained from its neighbors in the training data set (Pathak 2014).

Several papers have proposed classifiers for PolSAR data with the objective of extracting ground cover information. (Tao et al. 2015) combined feature extraction methods with the k-NN (k-Nearest Neighbors) and SVM (Support Vector Machine) classifiers, showing that it is possible to improve classification rates in PolSAR image processing by combining methods.

Parametric techniques used in image processing usually impose the Gaussian assumption. However, PolSAR data are represented by Hermitian matrices whose main diagonals are not described by the multivariate normal distribution (see next Section). The objective of this study is to evaluate the accuracy of classifiers that do not rely on the Gaussian assumption in discriminating regions of PolSAR images, using the intensity attributes as input data. The classifiers used were the k-NN, kernel k-NN, fuzzy k-NN, kernel fuzzy k-NN, SVM, Naive Bayes, XGBoost, and one based on the Kullback-Leibler distance.

Most classification methods for PolSAR (and SAR) images are applied after filtering (despeckling) the data. Although a plethora of efficient despeckling filters is available (see for instance (Lee et al. 2017)), in this work the PolSAR data are not filtered. This is done to evaluate, in a direct way, the performance of the classifiers, thus eliminating any bias produced by the filtering methods. Additionally, it can be expected that conclusions reached for a particular classifier (in terms of performing better than the others) would only improve if the data were previously filtered.

From the above, it remains interesting to explore the capabilities of simple and efficient machine learning techniques able to provide acceptable results for PolSAR image classification without resorting to computationally demanding deep learning techniques.

This is the purpose of this work, which is organized as follows. We begin with a brief introduction to the Wishart law for PolSAR data; the subsequent content covers various general classifiers, including nearest neighbors, support vector machines, and decision-tree-based methods. The Kullback-Leibler stochastic distance is then introduced, and a classifier based on this measure, well suited to PolSAR data, is explored. The experimental setup and details are then covered, followed by the presentation of results and conclusions.

Scene heterogeneity representation

In general, a PolSAR system represents each resolution cell by p polarization elements comprising a complex random vector \(\boldsymbol{y}=[S_1\; S_2\;\cdots\; S_p]^\top\) related to the dielectric properties of the scene. Each component of \(\boldsymbol{y}\) carries the amplitude and phase of a polarization combination (Frery et al. 2007). An \(L\)-looks covariance matrix is the average of \(L\) backscatter measurements in a neighborhood, used to improve the signal-to-noise ratio:

\[\boldsymbol{Z}=\frac{1}{L}\sum_{i=1}^{L}\boldsymbol{y}_i\,\boldsymbol{y}_i^{\star\top},\]
where \(\boldsymbol{y}_i\), \(i=1,2,\ldots,L\), are realizations of \(\boldsymbol{y}\), \(\star\) represents the conjugate operator, and \(\top\) the transposition. The diagonal elements of \(\boldsymbol{Z}\) are nonnegative numbers that represent the intensity of the signal measured on a specific polarization. Under the assumption that \(\boldsymbol{y}\) follows a zero-mean Complex Gaussian distribution (Goodman 1963a), it is possible to show that \(\boldsymbol{Z}\) follows a scaled Complex Multivariate Wishart law (denoted by \(\boldsymbol{Z}\sim \mathcal{W}(\boldsymbol{\Sigma},L)\)), which is characterized by the following probability density function:

\[f_{\boldsymbol{Z}}(\boldsymbol{Z};\boldsymbol{\Sigma},L)=\frac{L^{pL}|\boldsymbol{Z}|^{L-p}}{|\boldsymbol{\Sigma}|^{L}\,\Gamma_p(L)}\exp\left[-L\operatorname{tr}\left(\boldsymbol{\Sigma}^{-1}\boldsymbol{Z}\right)\right],\](1)
where \(\Gamma_p(L)=\pi^{p(p-1)/2}\prod_{i=0}^{p-1}\Gamma(L-i)\), \(L\geq p\), \(\Gamma(\cdot)\) is the gamma function, and \(\operatorname{tr}(\cdot)\) is the trace operator. This distribution satisfies \(\operatorname{\mathbb{E}}\{\boldsymbol{Z}\}=\boldsymbol{\Sigma}\), which is a Hermitian positive definite matrix.

The scaled Complex Multivariate Wishart model is valid in textureless areas. Target variability can be included through one or more additional parameters; the reader is referred to (Deng et al. 2017) for a comprehensive survey of models for PolSAR data. In practical situations, the parameters \(L\) and \(\boldsymbol{\Sigma}\), the number of looks and the target mean covariance matrix, must be estimated. The maximum likelihood estimator of \(\boldsymbol{\Sigma}\), based on \(N\) independent samples, is the sample mean \(\widehat{\boldsymbol{\Sigma}}={N}^{-1}\sum_{i=1}^N {\boldsymbol{Z}_i}\), and \(L\) can be estimated by any of the techniques discussed in (Anfinsen et al. 2009).
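To make the model concrete, the following minimal R sketch (R being the language of our implementation) computes the maximum likelihood estimate of \(\boldsymbol{\Sigma}\) and evaluates the log-density of equation (1); the variable Z_list, a list of \(p\times p\) complex Hermitian matrices, is a hypothetical name used only for illustration.

# Minimal sketch: ML estimate of Sigma as the sample mean of N covariance
# matrices (Z_list is a hypothetical list of p x p complex Hermitian matrices).
estimate_sigma <- function(Z_list) Reduce(`+`, Z_list) / length(Z_list)

# Log-determinant of a Hermitian positive definite matrix via its eigenvalues
# (which are real and positive).
logdet_hpd <- function(M) sum(log(Re(eigen(M, only.values = TRUE)$values)))

# Log-density of the scaled complex Wishart law, equation (1).
log_dwishart <- function(Z, Sigma, L) {
  p <- nrow(Z)
  log_gamma_p <- p * (p - 1) / 2 * log(pi) + sum(lgamma(L - 0:(p - 1)))  # log Gamma_p(L)
  p * L * log(L) + (L - p) * logdet_hpd(Z) - L * logdet_hpd(Sigma) -
    log_gamma_p - L * Re(sum(diag(solve(Sigma, Z))))
}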

Machine Learning Classifiers

Machine learning (ML) is a broad interdisciplinary research area focused on solving problems by extracting knowledge from data. Among its numerous applications, image classification is one of the most significant. ML techniques have been widely applied to classify PolSAR data in the last decades (Lee & Grunes 1992, Xie et al. 2018). Not only have the standard (that is, shallow) classifiers been used; efficient combinations of classical methods with new concepts can also be found in the literature. For instance, in (Gomez et al. 2017), a complex Wishart model is first estimated for each class using training data, and then the model is embedded into a new classification procedure based on a diffusion-reaction equation; the method simultaneously filters and classifies pixels within the image. In (Luo et al. 2020), an unsupervised fuzzy active contour model for multiregion segmentation of PolSAR images is discussed. The method combines statistical information from the data (scattering mechanisms obtained by the Freeman decomposition method) with edge information and texture information through a local homogeneity operator.

Below, some well-known machine learning methods for image classification are discussed. First, we focus on methods that extract local information to perform the classification. Then, other widely used machine learning methods are introduced (support vector machines, a boosting algorithm, and the naive Bayes classifier). Special attention is devoted to the classifier based on stochastic distances, well suited to PolSAR data, which is discussed last in this Section.

Classifiers based on Nearest Neighbors

k-NN classifier

The k-Nearest Neighbors algorithm is an extension of the NN algorithm and is considered a non-parametric classifier (Cover & Hart 1967). Given a test sample whose classes are unknown, the method finds, for each observation of the test sample, the k nearest points in the training set according to a distance, and assigns a class to the observation by majority vote among the classes of these neighbors. Consider two \(n\)-dimensional points \({\boldsymbol x}_1, {\boldsymbol x}_2 \in \mathbb{R}^n\), with \({\boldsymbol x}_j = [x_{1j}, \ldots, x_{nj}]^{\top}\), where \(^{\top}\) represents the transposition operator; the similarity between \({\boldsymbol x}_1\) and \({\boldsymbol x}_2\) is measured by the Euclidean distance:

\begin{aligned} \label{disteuclidi} d({\boldsymbol x}_1,{\boldsymbol x}_2) = \parallel {\boldsymbol x}_1 - {\boldsymbol x}_2 \parallel = \sqrt{\sum\limits_{i=1}^{n}(x_{i1} - x_{i2})^2}. \end{aligned}(2)

A decision on the adequate number of nearest neighbors (the value of k) must be made. (Loftsgaarden et al. 1965) suggest using a value of k close to \(\sqrt{n}\), where \(n\) is the size of the training sample. In practice, however, several values of k can be tried, and the one with the lowest classification error is chosen; this procedure is called hyperparameter tuning and is performed through cross-validation.

The k-NN algorithm estimates the probability that an observation belongs to each class based on the information of its k nearest neighbors, defined as follows (Cover & Hart 1967). Let \(\mathcal{C}\) be a finite non-empty set of all possible class labels \(c\). The training data are partitioned into a collection \(\{C_1,C_2, \ldots,C_c \}\) of pairwise disjoint subsets, where \(C_t \subset \mathbb{R}^p\) has size \(|C_t|=n_t\), for \(t = 1, \ldots,c\), and \(n = \sum_{t=1}^{c} n_t\).

Let \({\boldsymbol x}_r \in \mathbb{R}^p\) be an observation to be classified, with \(r \notin \{1, \ldots, n\}\), and let \(\{{\boldsymbol x}_{i_1}, {\boldsymbol x}_{i_2}, \ldots, {\boldsymbol x}_{i_k}\}\) be the set of its k nearest neighbors, with \(k \leq n\) and \(\{i_1, i_2, \ldots, i_k\} \subset \{1, 2, \ldots, n\}\).

Thus, for \(t = 1, \ldots, c\), the classification rule of k-NN is defined as:

\[\widehat{c}_{_{kNN}} = \underset{c}{\mathrm{arg max}} \ \widehat{P}_k(c|{\boldsymbol x}_r),\](3)
where \(\widehat{P}_k(c|{\boldsymbol x}_r)\) is the estimate of the conditional probability of \(c\) given \({\boldsymbol x}_r\), computed as \(\widehat{P}_k(c|{\boldsymbol x}_r) = k^{-1}\sum_{\nu=1}^{k}\mathbb{I}_c({\boldsymbol x}_{i_\nu}),\) with
\begin{aligned} \mathbb{I}_c({\boldsymbol x}_{i_\nu}) = \left\{ \begin{array}{rl} 1,&\text{if } {\boldsymbol x}_{i_\nu} \text{ belongs to class } c,\\ 0,& \text{otherwise.} \end{array} \right. \nonumber \end{aligned}
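For concreteness, a minimal R sketch of this rule follows, under the assumption that the training features are stored in an \(n\times p\) matrix X_train with a label vector y_train (hypothetical names used throughout the sketches in this Section):

# Minimal k-NN sketch: Euclidean distances (equation 2) and majority vote
# (equation 3). X_train is an n x p matrix; y_train holds the n class labels.
knn_predict <- function(X_train, y_train, x_new, k = 15) {
  d  <- sqrt(colSums((t(X_train) - x_new)^2))  # distances to all training points
  nb <- order(d)[seq_len(k)]                   # indices of the k nearest neighbors
  names(which.max(table(y_train[nb])))         # majority vote among the neighbors
}

In practice, the equivalent routines of standard packages can be used, with k tuned by cross-validation as discussed above.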

Kernel k-NN classifier

The kernel k-NN classifier is a natural extension of the k-NN algorithm in which the training samples are transformed into a feature space through a nonlinear transformation (Cao et al. 2012). Thus, consider a transformation from the \(p\)-dimensional feature space to an \(s\)-dimensional feature space (usually with \(p \leq s\)) in which, given an arbitrary vector \({\boldsymbol x}\in E_1\) (the original \(p\)-dimensional feature space), \(\psi({\boldsymbol x})\) is the corresponding vector in \(E_2\) (the new \(s\)-dimensional feature space):

\[{\boldsymbol x}=(x_1,x_2,\ldots,x_p),\;{\boldsymbol x}\in E_1 \;\xrightarrow{\text{feature transformation}}\; \psi({\boldsymbol x})=(\phi_1({\boldsymbol x}),\phi_2({\boldsymbol x}),\ldots,\phi_s({\boldsymbol x})),\;\psi({\boldsymbol x})\in E_2.\](4)

Here \({\boldsymbol x}\in E_1 \subseteq \mathbb{R}^p\) is the feature vector of the original data and \(\psi\) is a nonlinear, continuous, symmetric, and positive semi-definite transformation from \(E_1\) into a possibly high-dimensional space \(E_2\) (a Hilbert space). The image of \({\boldsymbol x}\) under \(\psi\) represents the feature space (Yu et al. 2002).

To use k-NN in a high-dimensional feature space, the classification process must be executed in that space. Thus, a distance metric that evaluates proximity in the feature space \(E_2\) is required; such a metric can be computed through a kernel and applied to the k-NN algorithm.

A kernel can be defined as a function \(K\) such that

\[K({\boldsymbol x}_1,{\boldsymbol x}_2) = \langle \psi({\boldsymbol x}_1),\psi({\boldsymbol x}_2) \rangle,\](5)

where \(\langle \psi({\boldsymbol x}_1),\psi({\boldsymbol x}_2) \rangle\) is the inner product of \(\psi({\boldsymbol x}_1)\) and \(\psi({\boldsymbol x}_2)\).

According to Hilbert-Schmidt theory, \(K({\boldsymbol x}_1,{\boldsymbol x}_2)\) can be an arbitrary symmetric function that satisfies Mercer's condition (Mercer 1909).

In this paper we consider the radial basis (Gaussian) kernel, the most popular in the literature:

\[K({\boldsymbol x}_1,{\boldsymbol x}_2)=\exp\left\{-\frac{\parallel {\boldsymbol x}_1-{\boldsymbol x}_2 \parallel^2}{2\sigma^2}\right\},\](6)
where \(\sigma\) is an adjustable parameter that controls overfitting (Kuhn & Johnson 2013).

The kernel approach is based on the fact that the transformation to the feature space induces inner products, and in this way distances can be calculated in the Hilbert space; note that inner products are only available in the Hilbert space (Yu et al. 2002). In the k-NN algorithm, the Euclidean distance (equation (2)) is usually employed as a similarity measure. Thus, using the squared norm, it is possible to obtain a distance between the transformed data, i.e.:

\begin{aligned} d^2(\psi({\boldsymbol x}_1),\psi({\boldsymbol x}_2)) = \parallel \psi({\boldsymbol x}_1) - \psi({\boldsymbol x}_2) \parallel^2 = K({\boldsymbol x}_1,{\boldsymbol x}_1) -2K({\boldsymbol x}_1,{\boldsymbol x}_2) + K({\boldsymbol x}_2,{\boldsymbol x}_2). \label{distnovoespcarac} \end{aligned}(7)

Thus, the distance between neighbors of the original data in the new feature space is calculated using a kernel function (Girolami 2002). The classification rule is the same as in k-NN.
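A sketch of the kernel-induced distance follows; note that with the radial basis kernel of equation (6), \(K({\boldsymbol x},{\boldsymbol x})=1\), so equation (7) simplifies to \(2-2K({\boldsymbol x}_1,{\boldsymbol x}_2)\).

# Radial basis (Gaussian) kernel, equation (6).
rbf <- function(x1, x2, sigma = 1) exp(-sum((x1 - x2)^2) / (2 * sigma^2))

# Kernel-induced squared distance in the feature space E2, equation (7).
kernel_dist2 <- function(x1, x2, sigma = 1) {
  rbf(x1, x1, sigma) - 2 * rbf(x1, x2, sigma) + rbf(x2, x2, sigma)
}

Replacing the Euclidean distance in the k-NN sketch above by kernel_dist2 yields the kernel k-NN classifier.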

Fuzzy k-NN classifier

The fuzzy k-NN algorithm is based on the classification rules of the k-NN algorithm combined with fuzzy set theory (Nikdel et al. 2018). The algorithm calculates the membership degrees of the training data through a membership function; the membership of \(x\) in a class is given as a function of its nearest neighbors in the training sample.

The membership degree \(u_{h}\) is defined as (Nikdel et al. 2018):

\begin{aligned} u_h(x) = \left\lbrace \begin{array}{rc} \alpha + (k_h/k)(1-\alpha), & \text{if } h=t \\ (k_h/k)(1-\alpha), & \text{if } h \neq t, \\ \end{array} \right. \label{calculopertinenclass} \end{aligned}(8)
where \(t\) is the class to which \(x\) belongs, \(k_h\) is the number of neighbors belonging to the \(h\)-th class, and k is the total number of neighbors. In general, \(\alpha=0.51\) is used (Keller et al. 1985). Given \(c\) classes, and defining \(u_{hi}=u_h(x_i)\) for class \(h\), with \(h = 1,2,\ldots,c\) and \(i=1,2,\ldots,n\), the algorithm must satisfy two conditions:
\[\sum_{h=1}^{c}u_{hi}=1,\ \forall i,\](9)
\[u_{hi}\in[0,1],\qquad 0<\sum_{i=1}^{n}u_{hi}<n.\](10)

The fuzzy k-NN classifier assigns memberships obtained through the distances to the k nearest neighbors, whose memberships indicate the possible classes. Let \({\boldsymbol x}_r \in \mathbb{R}^p\) be an observation to be classified, with \(r \notin \{1,\ldots, n\}\), and \(\{\boldsymbol{x}_{i_1},\boldsymbol{x}_{i_2},\ldots,\boldsymbol{x}_{i_k} \}\) the set of its k nearest neighbors, such that \(k\leq n\) and \(\{{i_1},{i_2},\ldots,{i_k} \} \subset \{1,2,\ldots,n\}\). The algorithm works similarly to k-NN, and the membership obtained for the test sample is as follows:

\[u_h({\boldsymbol x}_r)=\frac{\sum_{i=1}^{k}u_{hi}\,(\mathrm{sim}\{{\boldsymbol x}_r,{\boldsymbol x}_i\})}{\sum_{i=1}^{k}(\mathrm{sim}\{{\boldsymbol x}_r,{\boldsymbol x}_i\})},\](11)
where \(sim\lbrace {\boldsymbol x}_r, {\boldsymbol x}_i \rbrace\) is a similarity function of \({\boldsymbol x}_r\) and \({\boldsymbol x}_i\), such as,
\[sim\lbrace {\boldsymbol x}_r, {\boldsymbol x}_i \rbrace = \frac{1}{\parallel {\boldsymbol x}_r-{\boldsymbol x}_i \parallel ^{\frac{2}{(m-1)}}}.\]

According to equation (11), the membership attributed to \({\boldsymbol x}_r\) is influenced by the inverse of the distances to the nearest neighbors and by their relative membership degrees \(u_{hi}\). The parameter \(m\) determines the intensity (influence, weight) with which each neighbor contributes to the membership degree, yielding a weighted distance. As \(m\) increases, the relative distances of the neighbors have less effect (Wu & Zhou 2005).
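A minimal R sketch of the decision step of equation (11) follows; the membership matrix U (built with equation (8) and \(\alpha=0.51\)) and the names X_train and x_new are hypothetical, as in the previous sketches.

# Fuzzy k-NN sketch, equation (11). U is an n x c matrix whose row i holds the
# training memberships u_hi of x_i (equation 8); m weights the distances.
fuzzy_knn_predict <- function(X_train, U, x_new, k = 14, m = 5) {
  d  <- sqrt(colSums((t(X_train) - x_new)^2))
  nb <- order(d)[seq_len(k)]
  w  <- 1 / d[nb]^(2 / (m - 1))                      # sim{x_r, x_i}
  u  <- colSums(U[nb, , drop = FALSE] * w) / sum(w)  # memberships of x_new
  which.max(u)                                       # class of highest membership
}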

Kernel Fuzzy k-NN classifier

The previous subsections presented the kernel k-NN and the fuzzy k-NN algorithms. We now combine the ideas of both to generate the kernel fuzzy k-NN algorithm, joining the kernel transformation with the fuzzy membership degrees (Wu & Zhou 2005). In this study, a symmetric kernel is considered, such as the radial basis kernel of equation (6). Assuming \(m=2\) and considering the distance

\[d^2(\psi({\boldsymbol x}_1),\psi({\boldsymbol x}_2))=2-2K({\boldsymbol x}_1,{\boldsymbol x}_2),\](12)
then, by combining equation (11) and equation (12), we have:
\[u_h(\psi({\boldsymbol x}_r))=\frac{\sum_{i=1}^{k}u_{hi}\left(\frac{1}{d^2(\psi({\boldsymbol x}_r),\psi({\boldsymbol x}_i))}\right)}{\sum_{i=1}^{k}\left(\frac{1}{d^2(\psi({\boldsymbol x}_r),\psi({\boldsymbol x}_i))}\right)},\](13)
where \(u_{hi}\) are the membership degrees calculated in equation (8).
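Combining the two previous sketches gives a possible implementation of equations (12) and (13); with \(m=2\) the weights reduce to \(1/d^2\). The names rbf, X_train, and U are those of the earlier (hypothetical) sketches.

# Kernel fuzzy k-NN sketch, equations (12)-(13).
kernel_fuzzy_knn_predict <- function(X_train, U, x_new, k = 18, sigma = 1) {
  Kx <- apply(X_train, 1, rbf, x2 = x_new, sigma = sigma)
  d2 <- 2 - 2 * Kx                                   # equation (12)
  nb <- order(d2)[seq_len(k)]
  w  <- 1 / d2[nb]
  u  <- colSums(U[nb, , drop = FALSE] * w) / sum(w)  # equation (13)
  which.max(u)
}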

Support Vector Machine (SVM)

The SVM has been proposed as an effective machine learning method for classification and regression, often with predictive performance superior to classical neural networks (Campbell & Ying 2011). The main idea consists in mapping the input space into a high-dimensional feature space (generally a suitable Hilbert space) through a nonlinear transformation, to produce Optimal Separating Hyperplanes (OSH) that separate cases of different class labels. For simplicity, assume that the input data points are \({\boldsymbol x}_i\) in \(\mathbb{R}^n\) and \(y_i\in \{-1,1\}\) are the output labels, with \(i = 1,\ldots, n.\) Intuitively, based on the training set, the SVM estimates a hyperplane that serves as a decision boundary, that is, \(\langle {\boldsymbol w } , {\boldsymbol x}\rangle+b= w_1x_1+\cdots+w_nx_n+b=0,\) where \({\boldsymbol w } \in \mathbb{R}^n\) is the weight vector and \(b\in \mathbb{R}\) is the bias. To predict the label of a new input vector \({\boldsymbol x}^*\), the SVM uses a decision rule based on the sign of \(\langle {\boldsymbol w } , {\boldsymbol x}^* \rangle + b\): if it is positive, the label is 1; otherwise it is \(-1\).

For SVMs, it is well known that the original data \({\boldsymbol x}\) are mapped via the function \(\Phi : \mathbb{R}^n \rightarrow \mathcal{F}.\) The function \(\Phi\) is implicitly defined by a kernel function \(K ( {\boldsymbol x}, {\boldsymbol x}' ) = \left< \Phi( {\boldsymbol x}), \Phi({\boldsymbol x}') \right>\), which computes the inner product in \(\mathcal{F}\). The SVM finds the hyperplane that best separates the data by solving:

\[\min_{{\boldsymbol w}\in\mathbb{R}^n,\; b\in\mathbb{R},\; \boldsymbol{\xi}\geq \boldsymbol{0}}\ f_{\text{sv}}=\frac{1}{2}\|{\boldsymbol w}\|_2^2+C\,{\boldsymbol 1}^{\top}\boldsymbol{\xi},\](14)
subject to the margin constraints \(y_i ( {\boldsymbol w } ^{\!\top}\Phi({\boldsymbol x}_i) + b ) \geq 1 - \xi_i\), \(\forall i\). Here \({\boldsymbol 1}\) is a vector of all 1’s. The Lagrange dual can be easily derived:
\[\max_{\boldsymbol{\alpha}}\ {\boldsymbol 1}^{\top}\boldsymbol{\alpha}-\frac{1}{2}\boldsymbol{\alpha}^{\top}\left({\boldsymbol K}\circ {\boldsymbol y}{\boldsymbol y}^{\top}\right)\boldsymbol{\alpha},\ \text{ s.t. }\ \boldsymbol{0}\leq\boldsymbol{\alpha}\leq C,\ {\boldsymbol y}^{\top}\boldsymbol{\alpha}=0.\](15)

Here \(C\) is the trade-off parameter; \({\boldsymbol K }\) is the kernel matrix with \({\boldsymbol K } _{lh} = K( {\boldsymbol x}_l, {\boldsymbol x}_h)\); and \(\circ\) denotes element-wise matrix multiplication (Hadamard product).

The kernel function \(K(\boldsymbol{x}_1, \boldsymbol{x}_2)\) used in this work is the radial basis kernel of equation (6). Originally, the SVM was developed for two-class problems, but it can be easily extended to multiclass problems (Hsu & Lin 2002).
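As an illustration, a minimal sketch with the e1071 package in R follows; the cost and gamma values are illustrative only (echoing Table I), and X_train, y_train, and X_test are the hypothetical data objects of the previous sketches.

# Minimal SVM sketch with the radial basis kernel (e1071 package). The cost and
# gamma values are illustrative; in the experiments they were tuned by
# cross-validation.
library(e1071)
svm_fit  <- svm(x = X_train, y = factor(y_train),
                kernel = "radial", gamma = 3, cost = 10)
svm_pred <- predict(svm_fit, X_test)

The package handles the multiclass extension internally through a one-against-one scheme.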

Extreme Gradient Boosting (XGBoost)

The XGBoost classifier is a decision-tree-based ensemble method that combines the decision tree concept with the Gradient Boosting framework (Breiman 1996). The method is based on the boosting technique, which consists of re-sampling the data, with replacement, several times; the re-sampled data are constructed in such a way that each round learns from the classification performed in the previous one (Friedman 2001). To obtain the final result, after all the re-sampling rounds, a combination weighted by the classification performance of each model is used (Jafarzadeh et al. 2021).

Consider a data set \(\mathcal{D} = \{({\boldsymbol x}_i, y_i), i=1,\ldots,n\},\) with features \({\boldsymbol x}_i\in \mathbb{R}^p\) and target variable \(y_i \in \mathbb{R}.\) A predictive model based on a tree ensemble can be described by

\[\widehat{y}_i = \phi({\boldsymbol x}_i) = \sum^K_{k=1} f_k({\boldsymbol x}_i), \ \ f_k\in \mathcal{F},\](16)
where \(K\) is the total number of trees, \(f_k\) is the \(k\)th tree, a function in the functional space \(\mathcal{F}\), and \(\mathcal{F}\) is the set of all possible regression trees (also known as CART). During training, each newly trained CART tries to fit the residual accumulated so far. The objective function optimized when adding the \((t+1)\)th CART is:
\[\label{eq:loss} \mathcal{L}(\phi) = \sum_{i}^n l( \widehat{y}_i^{(t)}, y_i ) + \sum_{k}\Omega( f_{k} ),\](17)
where \(l\) is a differentiable convex training loss function that measures the difference between the prediction \(\widehat{y}_i^{(t)}\) and the target \(y_i\) (ground truth) at step \(t.\) Here, the term
\[\Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2,\](18)
is the regularization function that smooths the final learned weights to avoid over-fitting; \(T\) is the number of leaves and \(w\) is the vector of leaf scores. When (17) is optimized, a Taylor expansion is used so that gradient descent can be applied to different loss functions. Furthermore, no explicit feature selection is needed with XGBoost: during training, informative features are chosen as tree nodes, which means that unused features are discarded.
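A minimal sketch with the xgboost package in R follows; the hyperparameter values echo Table I, and the data objects are the hypothetical ones of the previous sketches (labels coded 0, 1, 2).

# Minimal XGBoost sketch for the three-class problem (xgboost package).
library(xgboost)
dtrain  <- xgb.DMatrix(data = X_train, label = y_train)  # labels coded 0, 1, 2
xgb_fit <- xgboost(data = dtrain, nrounds = 14,          # ntree in Table I
                   params = list(objective = "multi:softmax", num_class = 3,
                                 eta = 0.3, lambda = 0.6),
                   verbose = 0)
xgb_pred <- predict(xgb_fit, xgb.DMatrix(data = X_test))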

Naive Bayes

The Naive Bayes (NB) classifier (Domingos & Pazzani 1997) is a simple probabilistic classifier based on an application of Bayes' theorem. The main characteristic of the algorithm is that it completely disregards the correlation between variables; therefore, each feature is treated independently. Thus, the method deals with conditional probability, that is, the probability of event \(A\) occurring given event \(B\). The classifier estimates a conditional probability density function of the feature vectors from learning samples. Formally, assume a data set \(\mathcal{D} = \{({\boldsymbol x}_i, y_i), i=1,\ldots,n\},\) with \({\boldsymbol x}_i\in \mathbb{R}^p\) and target variable \(y_i = c({\boldsymbol x}_i) \in \mathcal{C},\) where an instance \({\boldsymbol x}_i\) is represented by a \(p\)-dimensional attribute value vector and \(\mathcal{C}\) is the set of all possible class labels \(c.\) NB predicts the class label of the instance \({\boldsymbol x}_i\) as follows:

\[\label{key1} c( {\boldsymbol x}_i )=\operatorname*{argmax\,}_{c \in \mathcal{C}} P(c) \prod_{j=1}^p P(x_{ij}|c),\](19)
where \(x_{ij}\) represents the value of the \(j\)th attribute of the \(i\)th instance, \(P(c)\) is the prior probability of class \(c\), and \(P(x_{ij} | c)\) is the conditional probability of the attribute value \(x_{ij}\) given the class \(c\), which can be evaluated by Equations (20) and (21), respectively.
\[\label{key2} P(c)=\frac{\sum_{i=1}^n {\mathbb{I}(c_i,c)+1}} {n+\mathcal{V}(C)}\](20)
\[\label{key3} P(x_{ij}|c)=\frac{\sum_{i=1}^n {\mathbb{I}(c_i,c)\mathbb{I}(x_{ij},A_{j})+1}}{\sum_{i=1}^n {\mathbb{I}(c_i,c)}+\mathcal{V}(A_{j})},\](21)
where \(A_j\) represents all the values of the \(j\)th attribute in the training instances, \(\mathcal{V}(\cdot)\) is a function that counts the number of unique elements in \(\mathcal{C}\) or \(A_j,\) \(c_i\) denotes the correct class label for the \(i\)th instance, and \(\mathbb{I}(a,b)\) is an indicator function, which takes the value 1 if \(a\) and \(b\) are identical and 0 otherwise.
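A minimal sketch with the e1071 package follows; note that for continuous intensity attributes e1071 models each \(P(x_{ij}|c)\) with a Gaussian density, while equations (20) and (21) describe the categorical-attribute case with add-one (Laplace) smoothing. The data objects remain the hypothetical ones of the previous sketches.

# Minimal Naive Bayes sketch (e1071 package); laplace = 1 mirrors the add-one
# smoothing of equations (20)-(21) for categorical attributes.
library(e1071)
nb_fit  <- naiveBayes(x = as.data.frame(X_train), y = factor(y_train), laplace = 1)
nb_pred <- predict(nb_fit, as.data.frame(X_test))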

The Kullback-Leibler classifier

The Kullback-Leibler divergence \(D_{\text{KL}}\) is a mechanism for comparing probability distributions based on Information Theory (Eguchi & Copas 2006); it has a direct relationship with the Neyman-Pearson lemma for testing distributional behaviour, and its symmetrized version is a correction of the Akaike information criterion (Seghouane & Amari 2007) used to assess goodness-of-fit from a statistical modeling viewpoint. The \(D_{\text{KL}}\) has been used in several image processing fields, including segmentation (Sarkar et al. 2021), classification (Kersten et al. 2005), boundary detection (Nascimento et al. 2021), and change detection (Bouhlel & Meric 2020), among others. Recently, (Ferreira et al. 2021) compared PolSAR divergence-based classifiers, and the results indicate better performance over classic LDA, QDA, and k-NN classifiers.

Assume that \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) are random matrices defined on the common support \(\mathcal{X}\) whose distributions are characterized by the densities \(f_{\boldsymbol{X}}(\boldsymbol{Z}';\boldsymbol{\theta}_1)\) and \(f_{\boldsymbol{Y}}(\boldsymbol{Z}';\boldsymbol{\theta}_2)\), respectively, where \(\boldsymbol{\theta}_1\) and \(\boldsymbol{\theta}_2\) are parameters. The Kullback-Leibler distance is defined by

\begin{aligned} d_{\text{KL}}(\boldsymbol{X},\boldsymbol{Y})&=\frac 12 [D_{\text{KL}}(\boldsymbol{X},\boldsymbol{Y})+D_{\text{KL}}(\boldsymbol{Y},\boldsymbol{X})]\\ &= \frac 12 \biggl[ \int_{\mathcal{X}} f_{\boldsymbol{X}}\log{\frac{f_{\boldsymbol{X}}}{f_{\boldsymbol{Y}}}} \mathrm{d}\boldsymbol{Z}' + \int_{\mathcal{X}} f_{\boldsymbol{Y}}\log{\frac{f_{\boldsymbol{Y}}}{f_{\boldsymbol{X}}}} \mathrm{d}\boldsymbol{Z}' \biggr] =\frac{1}{2}\int_{\mathcal{X}}(f_{\boldsymbol{X}}-f_{\boldsymbol{Y}})\log{\frac{f_{\boldsymbol{X}}}{f_{\boldsymbol{Y}}}}\mathrm{d}\boldsymbol{Z}', \end{aligned}
where the differential element \(\mathrm{d}\boldsymbol{Z}'\) is given by
\[\mathrm{d}\boldsymbol{Z}'=\prod_{i=1}^p\mathrm{d}Z_{ii}\prod_{\substack{i,j=1\\ i<j}}^p\mathrm{d}\Re\{Z_{ij}\}\, \mathrm{d}\Im\{Z_{ij}\}.\]
Here, \(Z_{ij}\) is the \((i,j)\)-th entry of matrix \(\boldsymbol{Z}'\), and \(\Re\) and \(\Im\) denote the real and imaginary parts, respectively (Goodman 1963a). When considering the distance between particular cases of the same distribution, only the parameters are relevant. In this case, the parameters \(\boldsymbol{\theta_1}\) and \(\boldsymbol{\theta_2}\) replace the random variables \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). This notation is in agreement with that of (Salicrú et al. 1994).

By assuming that the random matrices \(\boldsymbol{X}\) and \(\boldsymbol{Y}\) are distributed according to the Wishart distribution, with \(\boldsymbol{\theta}_1=(L_1,\boldsymbol{\Sigma}_1)\) and \(\boldsymbol{\theta}_2=(L_2,\boldsymbol{\Sigma}_2)\), (Frery et al. 2011) presented a closed expression for this distance:

\begin{aligned} d_\text{KL}(\boldsymbol{\theta}_1,\boldsymbol{\theta}_2)=&\frac{L_1-L_2}{2}\bigg\{\log\frac{|\boldsymbol{\Sigma}_1|}{|\boldsymbol{\Sigma}_2|}-p\log\frac{L_1}{L_2}+\psi_p^{(0)}(L_1)-\psi_p^{(0)}(L_2)\bigg\}\\ &-\frac{p(L_1+L_2)}{2} + \frac{\operatorname{tr}(L_2\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\Sigma}_1+L_1\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\Sigma}_2)}{2}, \label{expreKL} \end{aligned}(22)
where \(\psi_p^{(0)}\) denotes the multivariate digamma function.
In practical situations, \(L_1=L_2=L\) (generally \(L=4\)) and expression (22) reduces to
\[d_{\text{KL}}(\boldsymbol{X},\boldsymbol{Y}) = L\bigg\{ \frac{\operatorname{tr}(\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\Sigma}_2)}{2} - p\bigg\}.\](23)

In this way, the classification strategy uses a supervised scheme based on the stochastic distance between pairs of complex Wishart distributions that model segments of PolSAR data and the training samples that represent the classes. The assignment of a class is given by the minimization of the stochastic distance \(d_{\text{KL}}\) over a segment, i.e.,

\[\widehat{c}_{_{KL}} = \underset{c \in \mathcal{C}}{\mathrm{arg min}} \ d_{\text{KL}}(\boldsymbol{X},\boldsymbol{Y}).\](24)
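A minimal R sketch of this classifier follows: equation (23) is evaluated between the covariance matrix estimated on the segment under test and the per-class matrices estimated from the training samples (the list Sigma_classes is a hypothetical name), and equation (24) picks the closest class.

# KL distance between two scaled complex Wishart models with common L,
# equation (23).
kl_wishart <- function(Sigma1, Sigma2, L = 4) {
  p <- nrow(Sigma1)
  L * (Re(sum(diag(solve(Sigma2, Sigma1) + solve(Sigma1, Sigma2)))) / 2 - p)
}

# Minimum-distance rule, equation (24). Sigma_classes is a hypothetical list of
# per-class mean covariance matrices estimated on the training samples.
classify_kl <- function(Sigma_test, Sigma_classes, L = 4) {
  which.min(sapply(Sigma_classes, kl_wishart, Sigma2 = Sigma_test, L = L))
}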

The Kullback-Leibler distance has also been used for PolSAR classification, with outstanding results, in deep neural networks as the main component of the loss function (Wang & Wang 2019).

Experimental Setup

In this section, we describe the experiments carried out to validate our analysis of the selected classification methods. We present results for one actual PolSAR dataset. The metrics used to validate the results are also detailed in this section.

Unlike most similar research, we do not test the algorithms on synthetic data, nor do we carry out a Monte Carlo analysis. The rationale is that the direct translation of results obtained on synthetic PolSAR data to actual PolSAR data is not as useful as one might first think. In this sense, more realistic simulated data are needed, beyond the usual approach of generating random samples following the Wishart law and corrupting a bitmap image used as ground truth.

In most works the data are filtered to remove the speckle content, for instance with the Lee filter (see (Cheng et al. 2021, Wang & Wang 2019), where PolSAR data are filtered and then classified using deep learning methods). However, as mentioned above, in this work the PolSAR data are not filtered: this is done to evaluate, in a direct way, the performance of the classifiers. In doing so, any possible bias due to the filtering operation is completely avoided.

Actual PolSAR data

The fully polarimetric image used to demonstrate the efficiency of the selected classification methods covers the San Francisco Bay region. The San Francisco data (see Fig. 2) are spaceborne multilook intensity fully PolSAR data (ENL = 4) of size \(1600 \times 1200\) pixels, acquired by the C-band RADARSAT-2 PolSAR system over the San Francisco area. The spatial resolution is 12 m in the range direction and 8 m in the azimuth direction. The data are represented as an RGB composite image on the Pauli basis. Although no true ground truth image is available for comparison (it is not a synthetic image), a commonly used PolSAR reference image (Uhlmann & Kiranyaz 2013) is generally adopted. The reference image is also included in the figure. As can be seen, five regions (classes) can be easily identified: high-density urban area, low-density urban area, developed area, forest area, and the large ocean region. The black area is not a well-defined region, consisting of beaches, soil, and other components; it is not included in our classification.

Although the processing is carried out on a pixel-by-pixel basis, we consider this a classification task, because we utilize known labels to classify new data into these classes.

Figure 2
Pauli-basis image of the RADARSAT-2 San Francisco data (left) and ground truth (right).

Evaluation criteria

To robustly assess the classification performance of the methods, we employ the well-known overall accuracy (Story & Congalton 1986) and the kappa coefficient (Landis & Koch 1977) as evaluation indicators. These criteria are derived from the confusion matrix associated with a classifier. Its structure for \(c\) classes is given by:

\[M = \begin{bmatrix} n_{11} & \dots & n_{1c}\\ \vdots & \ddots & \vdots\\ n_{c1} & \dots & n_{cc} \end{bmatrix},\]
where each row of \(M\) represents a real class \(i\) and the columns are the predicted classes \(j\), with \(i,j= 1,\dots,c\). The row total \(n_{i.}=\sum_{j=1}^{c}n_{ij}\) corresponds to the number of patterns in class \(i\), and the column total \(n_{.j}=\sum_{i=1}^{c}n_{ij}\) is the number of patterns assigned to class \(j\) by the classifier under investigation. The total number of patterns is given by \(N=\sum_{i=1}^{c}n_{i.}\). From the quantities defined above, the accuracy (AC) and the kappa coefficient (\(\kappa\)) are expressed as:
\[AC={\sum^{c}_{i=1}n_{ii}}/N,\](25)
\[\kappa={(AC-P_{c})}/{(1-P_{c})},\](26)
where \(P_c= \sum_{i=1}^{c} (n_{i.}\,n_{.i})/N^2\). The ideal values for both indices are 1 (\(100\%\) in percentages).
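Both indices are straightforward to compute from the confusion matrix, as the following R sketch shows:

# Overall accuracy (equation 25) and kappa coefficient (equation 26) from a
# confusion matrix M of counts (rows: real classes, columns: predicted classes).
eval_metrics <- function(M) {
  N   <- sum(M)
  acc <- sum(diag(M)) / N                       # equation (25)
  Pc  <- sum(rowSums(M) * colSums(M)) / N^2     # chance agreement
  c(accuracy = acc, kappa = (acc - Pc) / (1 - Pc))
}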

Selection of samples for training the classifiers

Figure 3 shows the selected subset (150 \(\times\) 150 pixels) of the San Francisco image. The places highlighted in the image represent the areas selected for the training samples (in red) and the test samples (in blue) (Figure 3a).

Figure 3
San Francisco image. (a) Training samples (red) and test samples (blue). (b) Box-plot of San Francisco – channels. (c) Box-plot of San Francisco – regions – HH channel. (d) Principal component analysis. (e) Principal component analysis in 3D.

Three regions were selected at each stage (training and testing). The regions were chosen so that each one represented a homogeneous region with a specific type of texture: ocean (less textured, or more homogeneous), forest (intermediate texture), and urban (extremely textured). The data from region 1, the least textured, represent the ocean and have dimensions of 26\(\times\)16 pixels for the training sample and 41\(\times\)21 pixels for the test sample. The forest is represented by region 2, in which the sample size is 23\(\times\)16 pixels for the training sample and 41\(\times\)21 pixels for the test sample. Region 3 represents the most heterogeneous region, the urban region, in which the data have dimensions of 40\(\times\)15 pixels in the training sample and 41\(\times\)21 pixels in the test sample.

From our experience, the size of the training and testing samples worked well for all the cases.

Although all the classification methods discussed in this work use the fully PolSAR data, we show the box-plots for the channels HH, HV, and VV, both pooling all classes for each channel and for each separate class (see Figure 3, (b) and (c)).

A preliminary statistical analysis of the samples was performed with the purpose of identifying existing structure. Note that the selected areas can be considered homogeneous and, therefore, spatially correlated. PCA (Principal Component Analysis) is a widely used technique to reduce data dimensionality, from which existing patterns within the data can be easily extracted.

The result of the PCA for the San Francisco data is also shown in Figure 3 (see subfigures d and e). A clear trend can be seen: the three selected areas (classes) are naturally clustered and well separated. In addition, the Ocean area is more homogeneous than the Forest and Urban areas, and the Urban area does not intersect the Ocean area.

Results

Once the training has been done, all the methods are applied to the original San Francisco image. In that set, each pixel is a positive definite Hermitian matrix whose main diagonal holds the intensity values.

Table I. Summary of best results for the San Francisco image.

Classifier | Parameters | Accuracy (%) | Kappa coef. (%) | Stand. dev. of kappa
k-NN | \(k=15\) | 80.85 | 71.28 | \(1.117\times 10^{-4}\)
kernel k-NN (radial) | \(k=23\), \(\sigma=1\) | 85.36 | 78.04 | \(9.704\times 10^{-5}\)
fuzzy k-NN | \(k=14\), \(m=5\) | 85.92 | 78.88 | \(9.407\times 10^{-5}\)
kernel fuzzy k-NN (radial) | \(k=18\), \(\sigma=1\) | 84.87 | 77.32 | \(9.971\times 10^{-5}\)
Naive Bayes | – | 84.84 | 64.77 | \(1.323\times 10^{-4}\)
SVM | \(b=10\), \(\gamma=3\)* | 85.32 | 77.99 | \(9.738\times 10^{-5}\)
XGBoost | \(\eta=0.3\), \(\lambda=0.6\), ntree = 14 | 85.47 | 78.20 | \(9.676\times 10^{-5}\)
KL distance | – | 92.40 | 88.61 | \(5.464\times 10^{-5}\)

* \(\gamma=1/2\sigma^2\).
The accuracy of the overall classification of the San Francisco image by each classifier can be seen in Table I. All classifiers obtained an overall accuracy above 80% and below 93%. It is relevant to recall that no despeckling filters were applied. Therefore, roughly speaking, the results in this table show that the shallow machine learning classifiers discussed herein perform quite well. This seems especially true if one takes into account their low computational cost, low complexity, and low training requirements (when compared to more sophisticated techniques).

The KL distance was the classifier with the best percentage of correct classification: an accuracy of 92.40%, with a kappa coefficient of 88.61% and a kappa standard deviation of \(5.464\times10^{-5}\).

The second best classifier for the San Francisco image achieved an accuracy of 85.92%, a kappa coefficient of 78.88%, and a kappa standard deviation of 9.407\(\times 10^{-5}\); this result was obtained by the fuzzy k-NN algorithm with \(k=14\) and \(m=5.\) However, XGBoost and the kernel k-NN follow closely in performance.

The worst classification among the evaluated algorithms was obtained by the kernel fuzzy k-NN using the radial basis kernel, with parameter \(\sigma = 1\) and number of neighbors equal to 18. This classifier obtained an accuracy of 84.87%, a kappa coefficient of 77.32%, and a kappa standard deviation of 9.971\(\times 10^{-5}\).

Table II presents the confusion matrices for each evaluated algorithm; through these matrices, it is possible to analyze the accuracy of each algorithm according to the region to be discriminated. The rows contain the real classes and the columns the classes estimated by the classifiers. It can be noticed that the KL distance was the best algorithm for classifying Forest and Urban areas, with accuracies of 94.59% and 84.60%, respectively. For the Ocean area, all algorithms achieved 100% accuracy.

The parameter settings for the classification methods analyzed in this work (k-NN, kernel k-NN, fuzzy k-NN, kernel fuzzy k-NN, SVM, Naive Bayes, and XGBoost) were estimated via cross-validation (for details, see (Bishop 2006)).

Conclusions

In this study we used eight classification algorithms: four based on k-NN, one based on naive Bayes, one using the Support Vector Machine (SVM), one produced by a randomized decision tree, and finally one based on a stochastic distance. The latter exploits in a natural way the stochastic nature of PolSAR data, that is, it is well suited to this kind of remote sensing data.

The algorithms were used to classify the PolSAR intensity data of the regions of a well-known PolSAR dataset: the San Francisco Bay region.

For the eight classifiers, cross-validation was used to determine the best parameters according to the percentage of correct classification. Although all methods are supervised, minimal information is required: the number of classes and some representatives of each class. Therefore, training is also minimal and, consequently, any of the discussed methods can be easily applied to new datasets.

All classifiers for the San Francisco image achieved an overall accuracy above 80% and below 93%. It is clear that nowadays deep learning techniques show classification results with kappa \(\approx 1\) (Cheng et al. 2021, Wang & Wang 2019). However, as mentioned above, deep learning techniques require extensive training, and their results are difficult to reproduce.

The sole purpose of this work is to show that machine learning methods, such as the ones discussed herein, remain valid for the classification of PolSAR data. Such simple methods yield single-processor algorithms that can be competitive with state-of-the-art techniques. Additionally, it has been shown that there is room to improve their present capabilities (\(\approx 90\%\) accuracy) to keep them competitive with the new paradigms offered by deep learning methods. The methodology for tuning the different methods discussed in this work provided acceptable results, and it is clear that better results could be obtained with better parameter settings.

Code availability

A computer with an Intel Dual-Core 2.20GHz processor, 3GB RAM, and an Arch Linux system was used in the experiments. The algorithms were implemented in the R programming language (R Core Team 2023). The source codes are available at: https://github.com/Raydonal/k-Nearest-PolSAR

ACKNOWLEDGMENTS

This research is partly supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) through grants No. 03192/2022-4 and 402519/2023-0 (R.O.) and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) in Brazil under financing code 001. The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

REFERENCES

    • ANFINSEN SN, DOULGERIS AP & ELTOFT T. 2009. Estimation of the equivalent number of looks in polarimetric synthetic aperture radar imagery. IEEE Trans Geosci Remote Sens 47(11): 3795-3809.
    • BINTI JAAFAR H, BINTI MUKAHAR N & BINTI RAMLI DA. 2016. A methodology of nearest neighbor: Design and comparison of biometric image database. In: Research and Development (SCOReD), 2016 IEEE Student Conference on. p. 1-6. IEEE.
    • BISHOP CM. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). 1st ed. Springer-Verlag, 738 p.
    • BOUHLEL N & MERIC S. 2020. Multilook Polarimetric SAR Change Detection Using Stochastic Distances Between Matrix-Variate Gd0 Distributions. IEEE Trans Geosci Remote Sens 58(10): 6823-6843.
    • BREIMAN L. 1996. Bagging predictors. Mach Learn 24(2): 123-140.
    • CAMPBELL C & YING Y. 2011. Learning with support vector machines. Synth Lect Artif Intell Mach Learn 5(1): 1-95.
    • CAO DS, HUANG JH, YAN J, ZHANG LX, HU QN, XU QS & LIANG YZ. 2012. Kernel k-nearest neighbor algorithm as a flexible SAR modeling tool. Chemom Intell Lab Syst 114: 19-23.
    • CHENG J, ZHANG F, XIANG D, YIN Q, ZHOU Y & WANG W. 2021. PolSAR Image Land Cover Classification Based on Hierarchical Capsule Network. Remote Sens 13(16): 3132.
    • CINTRA RJ, FRERY AC & NASCIMENTO AD. 2013. Parametric and nonparametric tests for speckled imagery. Pattern Anal Appl 16(2): 141-161.
    • CLOUDE S & POTTIER E. 1997. An entropy based classification scheme for land applications of polarimetric SAR. IEEE Trans Geosci Remote Sens 35(1): 68-78. doi:10.1109/36.551935.
    • COVER T & HART P. 1967. Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1): 21-27.
    • DENG X, LÓPEZ-MARTÍNEZ C, CHEN J & HAN P. 2017. Statistical modeling of polarimetric SAR data: A survey and challenges. Remote Sens 9(4): 348.
    • DOMINGOS P & PAZZANI M. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2): 103-130.
    • EGUCHI S & COPAS J. 2006. Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma. J Multivar Anal 97: 2034-2040.
    • FERREIRA JA, COÊLHO H & NASCIMENTO AD. 2021. A family of divergence-based classifiers for Polarimetric Synthetic Aperture Radar (PolSAR) imagery vector and matrix features. Int J Remote Sens 42(4): 1201-1229.
    • FIX E. 1951. Discriminatory analysis: nonparametric discrimination: consistency properties. Report No 4, USAF School of Aviation Medicine, Randolph Field, Texas.
    • FIX E & HODGES JL. 1989. Discriminatory analysis. Nonparametric discrimination: consistency properties. Int Stat Rev 57(3): 238-247.
    • FRERY AC, MULLER H, YANASSE CDCF & SANT’ANNA SJS. 1997. A model for extremely heterogeneous clutter. IEEE Trans Geosci Remote Sens 35(3): 648-659.
    • FRERY AC, CORREIA AH & FREITAS CDC. 2007. Classifying multifrequency fully polarimetric imagery with multiple sources of statistical evidence and contextual information. IEEE Trans Geosci Remote Sens 45(10): 3098-3109.
    • FRERY AC, NASCIMENTO ADC & CINTRA RJ. 2011. Information Theory and Image Understanding: An Application to Polarimetric SAR Imagery. Chil J Stat 2(2): 81-100.
    • FRIEDMAN JH. 2001. Greedy function approximation: a gradient boosting machine. Ann Stat, p. 1189-1232.
    • GIROLAMI M. 2002. Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw Learn Syst 13(3): 780-784.
    • GOMEZ L, ALVAREZ L, MAZORRA L & FRERY AC. 2017. Fully PolSAR image classification using machine learning techniques and reaction-diffusion systems. Neurocomputing 255: 52-60.
    • GOODMAN NR. 1963a. Statistical Analysis Based on a Certain Complex Gaussian Distribution (an Introduction). Ann Math Stat 34: 152-177.
    • GOODMAN NR. 1963b. Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). Ann Math Stat 34(1): 152-177.
    • GUO HD, ZHANG L & ZHU LW. 2015. Earth observation big data for climate change research. Adv Clim Change Res 6(2): 108-117.
    • HSU CW & LIN CJ. 2002. A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw Learn Syst 13(2): 415-425.
    • JAFARZADEH H, MAHDIANPARI M, GILL E, MOHAMMADIMANESH F & HOMAYOUNI S. 2021. Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral and PolSAR Data: A Comparative Evaluation. Remote Sens 13(21): 4405.
    • KELLER J, GRAY M & GIVENS JR J. 1985. A fuzzy K-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15(4): 580-585.
    • KERSTEN PR, LEE JS & AINSWORTH TL. 2005. Unsupervised Classification of Polarimetric Synthetic Aperture Radar Images Using Fuzzy Clustering and EM Clustering. IEEE Trans Geosci Remote Sens 43(3): 519-527.
    • KOTRU R, SHAIKH M, TURKAR V, SIMU S, BANERJEE S & SINGH G. 2021. Semantic Segmentation of PolSAR Images for Various Land Cover Features. In: Int Geosci Remote Sens Symp (IGARSS). p. 351-354. IEEE.
    • KUHN M & JOHNSON K. 2013. Over-fitting and model tuning. In: Appl Predic Mod, p. 61-92. Springer.
    • LANDIS JR & KOCH GG. 1977. The measurement of observer agreement for categorical data. Biometrics p. 159-174.
    • LEE J & GRUNES M. 1992. Classification of multi-look polarimetric SAR data based on complex Wishart distribution. In: [Proceedings] NTC-92: National Telesystems Conference, p. 7-21. IEEE.
    • LEE JS & POTTIER E. 2009. Polarimetric Radar Imaging From Basics to Applications. CRC Press.
    • LEE JS, AINSWORTH TL & WANG Y. 2017. A review of polarimetric SAR speckle filtering. In: Int Geosci Remote Sens Symp (IGARSS), p. 5303-5306. doi:10.1109/IGARSS.2017.8128201.
    • LIU C, LIAO W, LI HC, FU K & PHILIPS W. 2018. Unsupervised classification of multilook polarimetric SAR data using spatially variant wishart mixture model with double constraints. IEEE Trans Geosci Remote Sens 56(10): 5600-5613.
    • LIU F, SHI J, JIAO L, LIU H, YANG S, WU J, HAO H & YUAN J. 2016. Hierarchical semantic model and scattering mechanism based PolSAR image classification. Pattern Recognit 59: 325-342.
    • LOFTSGAARDEN DO & QUESENBERRY CP. 1965. A nonparametric estimate of a multivariate density function. Ann Math Stat 36(3): 1049-1051.
    • LUO S, SARABANDI K, TONG L & GUO S. 2020. Unsupervised Multiregion Partitioning of Fully Polarimetric SAR Images With Advanced Fuzzy Active Contours. IEEE Trans Geosci Remote Sens 58(2): 1475-1486. doi:10.1109/TGRS.2019.2947376.
    • MERCER JB. 1909. XVI. Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc A 209(441-458): 415-446.
    • MULLISSA AG, PERSELLO C & STEIN A. 2019. PolSARNet: A deep fully convolutional network for polarimetric SAR image classification. IEEE J Sel Top Appl Earth Obs Remote Sens 12(12): 5300-5309.
    • NASCIMENTO AD, SILVA KF & FRERY AC. 2021. Distance-based edge detection on synthetic aperture radar imagery. Chil J Stat 12(1).
    • NIKDEL H, FORGHANI Y & MOHAMMAD HOSEIN MOATTAR S. 2018. Increasing the speed of fuzzy k-nearest neighbours algorithm. Expert Syst 35(3): e12254.
    • PATHAK MA. 2014. Beginning data science with R. Springer.
    • R CORE TEAM. 2023. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. URL http://www.R-project.org/
    • SALICRÚ M, MENÉNDEZ ML, PARDO L & MORALES D. 1994. On the Applications of Divergence Type Measures in Testing Statistical Hypothesis. J Multivar Anal 51: 372-391.
    • SARKAR S, HALDER T, PODDAR V, GAYEN RK, RAY AM & CHAKRAVARTY D. 2021. A Novel Approach for Urban Unsupervised Segmentation Classification in SAR Polarimetry. In: 2021 2nd International Conference on Range Technology (ICORT). p. 1-5. IEEE.
    • SEGHOUANE AK & AMARI SI. 2007. The AIC Criterion and Symmetrizing the Kullback-Leibler Divergence. IEEE Trans Neural Netw Learn Syst 18(1): 97-106.
    • STORY M & CONGALTON RG. 1986. Accuracy assessment: a user’s perspective. Photogramm Eng Remote Sens 52(3): 397-399.
    • TAO M, ZHOU F, LIU Y & ZHANG Z. 2015. Tensorial independent component analysis-based feature extraction for polarimetric SAR data classification. IEEE Trans Geosci Remote Sens 53(5): 2481-2495.
    • UHLMANN S & KIRANYAZ S. 2013. Integrating color features in polarimetric SAR image classification. IEEE Trans Geosci Remote Sens 52(4): 2197-2216.
    • WANG H, XU F & JIN YQ. 2019. A Review of Polsar Image Classification: from Polarimetry to Deep Learning. In: Int Geosci Remote Sens Symp (IGARSS), p. 3189-3192. doi:10.1109/IGARSS.2019.8899902.
    • WANG R & WANG Y. 2019. Classification of PolSAR Image Using Neural Nonlocal Stacked Sparse Autoencoders with Virtual Adversarial Regularization. Remote Sens 11(1038): 1-20.
    • WU X & ZHOU J. 2005. Kernel-based Fuzzy K-nearest-neighbor Algorithm. In: Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, International Conference on, Vol. 2, p. 159-162.
    • XIE W, XIE Z, ZHAO F & REN B. 2018. POLSAR Image Classification via Clustering-WAE Classification Model. IEEE Access 6: 40041-40049. doi:10.1109/ACCESS.2018.2852768.
    • YAMAGUCHI Y, MORIYAMA T, ISHIDO M & YAMADA H. 2005. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans Geosci Remote Sens 43(8): 1699-1706. doi:10.1109/TGRS.2005.852084.
    • YU K, JI L & ZHANG X. 2002. Kernel nearest-neighbor algorithm. Neural Process Lett 15(2): 147-156.
    • ZHONG C, LIU Y, GAO P, CHEN W, LI H, HOU Y, NUREMANGULI T & MA H. 2020. Landslide mapping with remote sensing: challenges and opportunities. Int J Remote Sens 41(4): 1555-1581.
    • ZUO W, ZHANG D & WANG K. 2008. On kernel difference-weighted k-nearest neighbor classification. Pattern Anal Appl 11(3-4): 247-257.

    Publication Dates

    • Publication in this collection
      22 Apr 2024
    • Date of issue
      2024

    History

    • Received
      25 Jan 2023
    • Accepted
      18 May 2023