Classification of Multimission SAR Images Based on Probabilistic Graphical Models and Convolutional Neural Networks

The semantic segmentation of multimodal images poses the challenge of jointly exploiting information from images possibly acquired at different spatial resolutions, frequencies, and bands. This paper addresses this task for multimission synthetic aperture radar (SAR) images through a combination of fully convolutional networks (FCNs), hierarchical probabilistic graphical models (PGMs), and decision tree ensembles. The objective is to model the spatial and multiresolution information contained in multimodal remote sensing images collected by distinct space missions with SAR payloads. The experimental validation is conducted with COSMO-SkyMed and SAOCOM images over Northern Italy. The results show that the proposed methodology is capable of producing accurate classification maps from input multimission SAR imagery.


INTRODUCTION
Given the current advances in space missions for Earth observation, it is possible to access multimodal satellite imagery characterized by different synthetic aperture radar (SAR) modalities, and therefore by different frequencies, bands, polarizations, and spatial resolutions [1]. The combination of data obtained from different sensors allows trade-offs between different acquisitions, which possibly contain complementary information. Methods for the analysis of multimodal data are becoming increasingly popular as a way to fully exploit all the available information in remote sensing applications [2].
In the framework of semantic segmentation, deep learning (DL) models are currently among the state-of-the-art techniques. Several models have been developed for remote sensing image classification [3]. Among them, fully convolutional networks (FCNs) have proven capable of obtaining accurate classification results over complex remote sensing scenes [4,5]. Various DL-based methods have been proposed to deal with multisource data [2,6,7,8]. However, as a drawback, in order to reach high performance, DL methods generally require large training datasets, which are difficult to retrieve [9] and rarely available in remote sensing applications. Sometimes they may also poorly model spatial consistency [10].
University of Genoa and Université Côte d'Azur (UCA) are part of the Ulysseus Alliance (European University). https://ulysseus.eu/
At the same time, stochastic models such as probabilistic graphical models (PGMs) are powerful tools for image processing tasks, as they can be employed for structured prediction problems within Bayesian inference schemes. In particular, for 2D image analysis, random fields such as Markov random fields (MRFs) [11] are capable of modeling spatial and multiresolution information [11,12].
The approach in [13], in particular, adopts a quadtree topology by defining, in each of the associated lattices, a Markov chain-based model for the spatial dependencies between pixels. The result is a set of classification maps jointly generated at all spatial resolutions of the input multimission dataset. A hierarchical PGM is defined over the quadtree in order to exploit the intrinsic multiscale behavior of FCNs [14], whose hidden layers allow multiresolution information to be incorporated. This paper extends the methodology previously developed in [13] for the classification of optical aerial imagery to the classification of data obtained from different SAR missions on the same date, exploiting the statistical, multifrequency, and multiresolution information provided by multiple sensors and their intrinsically complementary nature.

METHODOLOGY
The model (shown in Fig. 1) comprises three main methodological components: an FCN, a hierarchical PGM, and a decision tree ensemble (e.g., random forest) [15]. The latter aims to link the feature representation of the FCN and the Bayesian inference of the PGM through a suitable set of pixelwise posterior probabilities.
Starting with the DL component, any FCN can be employed in the proposed framework: given its encoder-decoder architecture, an FCN is capable of yielding classification outputs with the same size as the input, making it a convenient tool for semantic segmentation tasks.
The multimodal information is further modeled through the hierarchical PGM, which consists of a combination of a hierarchical and a planar MRF to access both multiscale and spatial-contextual information between neighboring pixels. Hierarchical MRFs are known to be causal [11], while this feature is not guaranteed for planar MRFs. Hence, a neighborhood relation $\precsim$ is introduced in the pixel grid $S$ to ensure the causality of the whole PGM: $r \precsim s$ indicates that pixel $r$ is a causal neighbor of pixel $s$ ($r, s \in S$, $r < s$). Given $X = \{x_s\}_{s \in S}$, the random field of the discrete class labels $x_s$ of all pixels $s \in S$, and a set of $l = 1, \ldots, L$ pixel grids at different resolutions, the hierarchical PGM is defined through a factorization whose first part derives from the hierarchical Markovian property, $P(X^l | X^{l-1}, \ldots, X^0) = P(X^l | X^{l-1})$, and in which $x_{s^-}$ represents the class label of the parent site $s^- \in S^{l-1}$ of pixel $s \in S^l$ ($l = 1, \ldots, L$; $L$ indicating the grid at the finest spatial resolution). The hierarchical Markov model is defined over a quadtree. More details on the methodology can be found in [13,16].
This approach was originally developed for the semantic segmentation of aerial optical imagery of urban areas. Here, it is extended to the challenging case of semantic segmentation from input very high resolution (VHR) satellite SAR multimission imagery. Ideally, a collection of SAR images taken on the same date is used. In practice, a time series of SAR images, collected by distinct missions on different dates separated by rather short time intervals, is used, under the assumption that no relevant changes have occurred in the land cover of the monitored scene during the overall acquisition timeframe.
In the proposed approach, the resulting multimission SAR data are incorporated into the aforementioned quadtree topology, whose levels represent different resolutions and therefore contain both the feature maps of the hidden layers of the decoder of the FCN and the original bands of the multimodal input SAR images, inserted according to the original relationship between their resolutions. Furthermore, the network included in the proposed method is trained with a multimission dataset of SAR images, which are input to the encoder according to the relationship between their native resolutions in order to take advantage of the multiscale topology of FCNs.
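The parent-child relation underlying the quadtree can be illustrated with a minimal Python sketch. It assumes square grids whose side doubles at each finer level, so that the parent site of a pixel at row i and column j sits at (i // 2, j // 2) one level coarser. The function names (`parent`, `children`, `grid_side`) are hypothetical and are not taken from [13] or [16].

```python
# Hypothetical helpers for a quadtree over square pixel grids.
# Assumption: a lower level index means a coarser grid, and each site
# has exactly four children on the next finer level.

def parent(i, j):
    """Return the (row, col) of the parent site, one level coarser."""
    return i // 2, j // 2


def children(i, j):
    """Return the four child sites, one level finer."""
    return [(2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)]


def grid_side(finest_side, finest_level, level):
    """Side length of the grid at `level`, halving at each coarser level."""
    return finest_side >> (finest_level - level)


if __name__ == "__main__":
    print(parent(5, 8))        # parent of site (5, 8)
    print(children(2, 4))      # the four sites sharing that parent
    print([grid_side(1024, 3, lvl) for lvl in range(4)])
```

Under these assumptions, a 1024-pixel-wide finest grid with three coarser levels above it corresponds to grids of side 128, 256, 512, and 1024.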
In view of the causality of the overall hierarchical PGM, an efficient non-iterative inference algorithm can be used for the final inference. The marginal posterior mode (MPM) [12,17] criterion is employed, as it is especially advantageous for applications involving multiscale information: it penalizes errors according to their scale, and it can be defined on quadtrees with three efficient recursive steps [16]. The proposed method is summarized in Algorithm 1.
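Algorithm 1 FCN, MPM on the hierarchical PGM, and RF
1: Training of the FCN with the input multimission SAR dataset at different convolutional blocks
2: Creation of the $L$-level quadtree containing, in the random field $Y = \{y_s\}_{s \in S}$ of the observations, the network feature maps and the original channels of the image to classify at the corresponding resolution
3: Beginning of the MPM:
4: First top-down pass: estimation of the priors $P(x_s)$
5: RF classifier: estimation of the posteriors $P(x_s | y_s)$ through RF
6: Bottom-up pass: estimation of $P(x_s | y_s^d)$ and $P(x_s | x_s^c, y_s^d)$, where $y_s^d$ collects the observations of all descendants of $s$ in the tree (including $s$), and $x_s^c$ collects the labels of all sites connected to $s$ ($x_{s^-}$ and $\{x_r\}_{r \precsim s}$)
7: Second top-down pass: estimation of $P(x_s | Y)$ at each level of the quadtree
8: Output: maximization of $P(x_s | Y)$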
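As an illustration of the first top-down pass in Algorithm 1, the sketch below propagates class priors from the coarsest level to the finer ones using the recursion commonly used for MPM on quadtrees, $P(x_s) = \sum_{x_{s^-}} P(x_s | x_{s^-}) P(x_{s^-})$. The form of the transition model is an assumption made only for this example: a child keeps its parent's label with probability `theta` and otherwise picks one of the other classes uniformly. The function names are hypothetical, and the exact parameterization used in [13,16] may differ.

```python
# Minimal sketch of a top-down prior pass on a quadtree (pure Python).
# Assumption: symmetric parent-to-child transition model driven by theta.

def transition_matrix(num_classes, theta):
    """trans[p][c] = P(child label = c | parent label = p)."""
    off = (1.0 - theta) / (num_classes - 1)
    return [[theta if c == p else off for c in range(num_classes)]
            for p in range(num_classes)]


def child_prior(parent_prior, trans):
    """P(x_s = c) = sum over p of P(x_s = c | x_parent = p) * P(x_parent = p)."""
    n = len(parent_prior)
    return [sum(trans[p][c] * parent_prior[p] for p in range(n))
            for c in range(n)]


def top_down_priors(root_prior, trans, num_finer_levels):
    """Return one prior vector per level, from the root level downwards.

    With a spatially uniform root prior, all sites of a level share the
    same prior, so a single vector per level is sufficient here.
    """
    priors = [list(root_prior)]
    for _ in range(num_finer_levels):
        priors.append(child_prior(priors[-1], trans))
    return priors


if __name__ == "__main__":
    trans = transition_matrix(num_classes=4, theta=0.8)
    for level, prior in enumerate(top_down_priors([0.7, 0.1, 0.1, 0.1], trans, 3)):
        print(level, [round(p, 4) for p in prior])
```

In this toy setting, a value of `theta` close to 1 keeps the priors at finer levels close to the root prior, whereas smaller values pull them toward the uniform distribution.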

EXPERIMENTAL VALIDATION
The method was applied to a dataset of multimission SAR images acquired by COSMO-SkyMed and SAOCOM in 2021 over Lombardy, Northern Italy. It consists of Stripmap geocoded terrain-corrected (GTC) images with pixel spacing of 2.5 m and around 9 m, respectively. The polarization of the SAOCOM images is VH and VV, while the COSMO-SkyMed acquisitions are HH-polarized. The DUSAF ("Land use and land cover of the Lombardy Region") data archive, containing land cover information for the year 2018, was used as a guideline to define a ground truth (GT). Starting from this archive, four semantic macro-classes of interest were selected for the experiments: urban areas, vegetation, clutter, and water. "Clutter" is relatively meaningless as a land cover class, since it comprises all the surfaces that do not explicitly belong to the other semantic classes (in this dataset: beaches, quarries, dumps, degraded areas, detrital accumulations); hence, it is highly mixed. Moreover, it represents a negligible percentage of pixels in the images.
The resulting dataset consists of tiles of size 1024 × 1024 pixels, with the COSMO-SkyMed images multilooked to a resolution of 5 m to reduce the impact of speckle, and the dual-pol SAOCOM images resampled to 10 m to respect the power-of-two relation typical of a quadtree. These images were acquired by the two satellites during the summer of 2021 and therefore do not present any land cover discrepancy due to seasonal changes (e.g., volume of water bodies, seasonal vegetation). These images, properly split into training and test sets, were used to assess the performance of the proposed architecture. Given the mismatch in time between the GT and the radar acquisitions, and the possible changes in the extent of the water bodies over the years, the labels for the class "water" were manually corrected to match their actual extent in the SAR images.
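The step from 2.5 m to 5 m pixel spacing can be illustrated with a minimal sketch of spatial multilooking, here assumed to be a plain average of intensity values over non-overlapping 2 × 2 blocks. The processing chain actually applied to the COSMO-SkyMed products is not detailed in the text, so the function `multilook` and its block-averaging scheme are illustrative assumptions only.

```python
# Minimal sketch of spatial multilooking by block averaging (pure Python).
# Assumption: `intensity` is a 2D list of non-negative intensity values
# and each output pixel is the mean of a factor-by-factor block.

def multilook(intensity, factor=2):
    """Average non-overlapping factor x factor blocks of an intensity image."""
    rows = len(intensity) // factor
    cols = len(intensity[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [intensity[r * factor + dr][c * factor + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    toy = [[1, 3, 5, 7],
           [1, 3, 5, 7],
           [2, 2, 8, 8],
           [2, 2, 8, 8]]
    print(multilook(toy))  # 4 x 4 input, 2 x 2 output
```

With a factor of 2, each output pixel covers twice the ground distance of an input pixel along each axis, which matches the factor-of-two relation between adjacent quadtree levels.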
The proposed technique depends on three main hyperparameters: $L$, the number of levels in the quadtree (set to $L = 4$ for the experiments), and two real-valued parameters ϑ and ψ, representing the transition probabilities in the hierarchical PGM; ϑ and ψ were optimized empirically by trial and error.
The proposed method for the classification of multimission images was compared to HRNet [19], another technique dealing with multiscale information through multiresolution subnetworks connected in parallel.
The quantitative results are reported in Table 1 and suggest that incorporating multiresolution and spatial-contextual information through the proposed approach leads to a better discrimination of the land cover classes, with gains in terms of producer accuracy (PA), or recall, user accuracy (UA), or precision, and overall accuracy (OA). From the presented results, it can be noticed that lower values of the transition probability across the scales, ϑ, correspond to higher OA and UA but slightly lower PA, suggesting that, in this case, information at coarser resolution does not contribute significantly to the final prediction. Experiments with lower values of ψ, not reported for brevity, were characterized by lower values of both PA and UA.
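For reference, the three scores can be computed from a confusion matrix as in the following minimal sketch. It assumes that rows index the reference (GT) classes and columns index the predicted classes; the function name `accuracy_metrics` is hypothetical, and the counts in the example are toy values, not results from Table 1.

```python
# Minimal sketch: OA, PA (recall), and UA (precision) from a confusion matrix.
# Assumption: confusion[i][j] counts test pixels of reference class i
# that were assigned to class j.

def accuracy_metrics(confusion):
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    row_sums = [sum(confusion[k]) for k in range(n)]
    col_sums = [sum(confusion[i][k] for i in range(n)) for k in range(n)]
    # Producer accuracy: correct pixels over reference pixels of the class.
    pa = [confusion[k][k] / row_sums[k] if row_sums[k] else float("nan")
          for k in range(n)]
    # User accuracy: correct pixels over pixels predicted as the class.
    ua = [confusion[k][k] / col_sums[k] if col_sums[k] else float("nan")
          for k in range(n)]
    oa = correct / total
    return oa, pa, ua


if __name__ == "__main__":
    toy = [[8, 2],
           [1, 9]]
    oa, pa, ua = accuracy_metrics(toy)
    print(oa, pa, ua)
```

A class that covers very few pixels, such as "clutter" in this dataset, has little influence on OA, which is one reason for reporting per-class PA and UA alongside it.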
The classification maps obtained by the proposed method and the techniques used for comparison are shown in Fig. 2 and suggest the effectiveness of the proposed method in combining SAR images coming from different sensors, at different frequencies, spatial resolutions, and polarizations, to perform land cover mapping from input multimission SAR data. In particular, the map generated by the proposed approach visually discriminates well among the land covers in the considered scene. This experimental validation, conducted with images from the COSMO-SkyMed and SAOCOM satellite missions, points out the effectiveness of the integration of FCN and PGM methodological components for the semantic segmentation not only of aerial optical imagery but also of multimission radar images.

DISCUSSION AND CONCLUSION
This paper presents a method for the joint classification of multimission SAR images through FCNs, hierarchical PGMs, and decision tree ensembles. The idea is to make use of the multiresolution modeling structure of the aforementioned techniques to fully exploit multimodal, possibly complementary, information.
The experimental validation, conducted with images from COSMO-SkyMed and SAOCOM, shows the potential, in the case of multimission radar images, of the approach previously developed for the semantic segmentation of both aerial optical imagery and single-sensor radar data.
The results suggest the effectiveness of the proposed method, which achieves higher performance than standard FCNs and other multiresolution techniques in terms of OA, precision, and recall. The classification maps also appear smooth and confirm the capability of the proposed method to discriminate the land cover classes considered.

ACKNOWLEDGMENT
Project carried out using COSMO-SkyMed Products, © of the Italian Space Agency (ASI), and SAOCOM Products, © of the Argentinian Space Agency (CONAE), delivered under a license to use by ASI and CONAE. The activity of the first three authors was partially supported by ASI in the framework of the project MultiBigSARData - ASI no. 2021-7-U.0; the support is gratefully acknowledged.

Algorithm 1 (steps 1 to 3)
1: Training of the FCN with the input multimission SAR dataset at different convolutional blocks
2: Creation of the $L$-level quadtree containing, in the random field $Y = \{y_s\}_{s \in S}$ of the observations, the network feature maps and the original channels of the image to classify at the corresponding resolution
3: Beginning of the MPM

Table 1. Test-set results. Per-class scores are recalls. OA, PA, and UA stand for overall accuracy, producer accuracy, and user accuracy, respectively.