MULTIVARIATE MORPHOMETRIC ANALYSIS OF MANGOSTEEN (GARCINIA MANGOSTANA VAR. MANGOSTANA, CLUSIACEAE) AND ITS WILD RELATIVES

Mangosteen (Garcinia mangostana var. mangostana) is a dioecious and agamospermous cultivated fruit tree. It has two recognised hypothetical wild ancestors, Garcinia mangostana var. malaccensis and G. mangostana var. borneensis, distributed in the lowland dipterocarp-dominated forests of Sumatra, the Malay Peninsula, and Borneo. The highly similar morphological characters between the cultivated and wild varieties have posed challenges in identification. Additionally, Garcinia penangiana is often mistaken for G. mangostana var. malaccensis, and G. venulosa is regarded as morphologically similar to G. mangostana var. borneensis. In the present study, we conducted morphometric analyses of Garcinia mangostana var. mangostana, G. mangostana var. borneensis, G. mangostana var. malaccensis, G. penangiana and G. venulosa. We assessed the efficacy of morphological characters in combination (vegetative–male flowers–female flowers) in distinguishing the taxa as recognised in the current taxonomy. In our morphometric analyses, we found that Garcinia penangiana and G. venulosa are well delimited and congruent with their current taxonomic designations. A combination of vegetative and male-flower characters provided the most definitive delimitation. We recovered the specific coherence of Garcinia mangostana, but the infraspecific delineations of G. mangostana var. mangostana, G. mangostana var. borneensis and G. mangostana var. malaccensis are not supported.


Introduction
Clusiaceae is a pantropical family consisting of shrubs and trees represented in 27 genera and slightly more than 1000 species (Stevens, 2007). The genus Garcinia L. is pantropical and comprises close to 250 species (Stevens, 2001-). It is one of the most diverse tree genera in Asian tropical forests (Davies, 2005), and is taxonomically difficult (Sosef & Dauby, 2012). Garcinia is usually dioecious (Stevens, 2001-). Gynodioecy (Pangsuban et al., 2007) and trioecy (Joseph & Murthy, 2015) are known in paleotropical species, and androdioecy [...] herbarium collections. Therefore, there is a need to evaluate the morphological characters used in varietal delimitation of Garcinia mangostana.
There is clearly some confusion regarding species delimitation in the genus. Garcinia venulosa (Blanco) Choisy is recognised as morphologically similar to G. mangostana var. borneensis (Nazre et al., 2018), and the fruit of G. penangiana Pierre is often confused with that of G. mangostana var. malaccensis in herbaria. Garcinia malaccensis, now considered G. mangostana var. malaccensis, was entirely mistaken for G. penangiana by Kochummen (1997), and partly so by Whitmore (1973), as inferred from information on determination slips by the authors on herbarium specimens.
Wild varieties of Garcinia mangostana and G. penangiana are sympatric in Sumatra, the Malay Peninsula and Borneo, whereas G. venulosa is confined to the Philippines (Figure 1). According to the Köppen-Geiger climate classification (Beck et al., 2018), an equatorial climate prevails throughout Sumatra, the Malay Peninsula and Borneo, and a monsoon climate in the northwest of the Philippines, specifically northwest Luzon (part of the geographical distribution of Garcinia venulosa).
The dioecious nature of these taxa adds another layer of difficulty to their interpretation and delineation. Garcinia mangostana var. mangostana, a Linnaean name most likely based on female material only (Linnaeus, 1753), contrasts with G. penangiana (Pierre, 1883), which was described based on male flower material only. By contrast, the protologues of Garcinia mangostana var. malaccensis (Anderson, 1874), G. mangostana var. borneensis (Nazre et al., 2018) and G. venulosa (Blanco, 1837) include descriptions of both male and female flowers. Despite the acknowledged challenge in delimiting Garcinia taxa (Kochummen & Whitmore, 1973; Sosef & Dauby, 2012), no previous attempt has been made to address this problem using morphometric analyses.
Molecular phylogenetic approaches offer insights into relations among Garcinia mangostana varieties and between G. mangostana and other morphologically similar wild relatives, particularly G. penangiana (Yapwattanaphun et al., 2004; Nazre, 2014). The results of a phylogenetic analysis based on ITS sequence data showed that Garcinia mangostana var. malaccensis forms a paraphyletic group with G. mangostana var. mangostana (Nazre, 2014). In the same study, Garcinia mangostana var. borneensis Nazre, represented by a single accession, was shown to be sister to the mangostana-malaccensis clade, and G. penangiana emerged as the sister group to the clade encompassing all G. mangostana varieties. Garcinia venulosa has not been included in any phylogenetic or morphological studies to date.
The aims of this morphometric study were twofold: (i) to assess the efficacy of morphological characters used to delineate the morphologically highly similar taxa in Garcinia, and (ii) to determine the character combinations that best represent the resulting groupings and that align with the latest taxonomic delimitations. For this purpose, our sampling focused on the taxa mentioned in the above paragraph, known to be closely related to Garcinia mangostana var. mangostana, both genetically and morphologically. The specimens examined and analysed included holdings in herbaria BO, K, KEP, KLU, L, MDI, MPU, P, SAN, SAR, SING, U, US and WAN (herbarium codes follow Thiers, continuously updated). To facilitate taxon identification, we used the identification key and description of Nazre et al. (2018), the identification lists in Nazre (2006), Nazre et al. (2018) and Nazre's specimen annotations, if available. Notably, these specimens collectively represent the entire geographical distribution of wild taxa (see Figure 1).

Specimens and taxa included
Character scoring was conducted using a specimen-based approach. Gatherings of the same collection were treated as duplicates. The full listing of the specimens is provided in Supplementary file 1. Assessments and measurements were made across duplicates, whenever they complemented missing or incomplete organs.
A collection of Garcinia mangostana var. malaccensis, Maingay 149, at K consisted of male flowers (accession no. 1643A, barcode K000380446), and at L, of female flowers and young fruit (QR code L.2416659). At K, Maingay's field numbers were partially replaced by herbarium accession numbers (Steenis-Kruseman, 1950). The replacement occurred before duplicates were distributed to other herbaria and although we were not able to confirm the source(s) of the multiple gatherings, we consider them to have originated from at least two different individuals. This distinction is based on our knowledge that the species is reported as dioecious, and we consistently observe male or female flowers on separate specimens. Consequently, these specimens are treated as two distinct collections.
Another notable collection is Daud & Tachun SFN36093 (KEP [barcode KEP239972], L [QR code L.2416581]), identified as a male-flowered representative of Garcinia mangostana var. mangostana. The label states "40′ (feet) tall, flrs. yellow, fruit red"; however, no fruit material was found, only male flowers. Consequently, we have treated these specimens as duplicates of a single collection.
For female material, we chose fruit rather than female flowers for analysis for two pragmatic reasons. First, petals of the examined taxa are caducous, and specimens with complete petals are rare. Second, the stigma plate, which persists in fruit, provides many key characters (Nazre et al., 2018), is more developed, and can be better examined and coded in fruiting materials. Male material is represented by specimens bearing male flowers. In all the herbaria visited, only three male specimens of Garcinia mangostana var. mangostana were found. Details regarding the number of specimens analysed, along with information on taxa and specimen subsets, are summarised in Table 1.

Character and character state selection
Because the examined taxa are genetically closely related and highly similar morphologically, selection of additional characters supplementary to key characters is challenging. A total of 38 characters are included in the present study (Table 2). The morphometric datasets assembled included qualitative and quantitative data and were organised in DELTA Editor (Dallwitz, 1980; Dallwitz et al., 1999-). Beentje (2010) was followed for characters and character state descriptive terms.
For qualitative data, 'character states' were predominantly treated as 'conventional multistate factorial data', and 'unconventional coding methods', as defined by Hawkins (2000), included (i) composite coding, (ii) logically related coding, (iii) positional coding, and (iv) mixed coding. Because 13 of the 38 characters had three or more character states, multistate coding was preferred over absent-or-present coding, to avoid unnecessary expansion of the dataset. Quantitative data, namely measurements and counts, were treated as numerical and integer data types, respectively. ImageJ (Schneider et al., 2012) was used to measure secondary vein angles and to count the secondary veins forming loops. Each mean value in the datasets represents the mean calculated from three measurements.
All 18 characters used in the identification keys of Nazre et al. (2018) for the selected taxa were included (here termed 'key characters'). Additionally, 20 'additional characters' were specifically identified and included in the present study.
The selection of appropriate diagnostic characters for use in delineating taxa is fundamental to taxonomy (Borkent, 2021), and we approached this task with an open perspective. Additional characters incorporated in this study were not previously employed as diagnostic characters. The rationale behind selecting these additional characters was twofold. First, it was observed that some of these characters were potentially informative in morphometric analyses during the early stages of specimen examination and assessment. The nine additional characters included on this basis were characters 5, 12, 15, 16, 22, 23 and 35-37 (see Table 2). Second, an additional 11 characters (characters 3, 4, 6-11, 24, 27, 28; see Table 2) were included to supplement the key characters and to assess whether they enhance taxa delineation. There was no a priori expectation regarding how these characters would affect the results.
Most characters and their character states are self-explanatory; however, some require further explanation. Characters 8, 9, 16 and 28 (see Table 2) represent ratios of two mean values instead of direct measurements. For example, the 'density of secondary vein pairs forming loops at intramarginal vein' was calculated by dividing the 'mean count of secondary vein pairs that form loops at intramarginal vein' by 'mean lamina length'. This approach was adopted to mitigate bias caused by plasticity in leaf sizes (Chitwood et al., 2021). Measurements of secondary vein angle and glandular line angle were taken between the midrib and the secondary vein and between the midrib and the glandular line, respectively. Although three measurements were taken for the secondary veins of each specimen, measurements of the glandular line angle were coded as 'glandular line orientation' (character 21) and treated as factorial data. Because specimens with fruits at various maturity stages were included to obtain a larger sample size for statistical analyses, characters directly linked to maturity, such as fruit length and width, were deliberately avoided.
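The ratio characters described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (measurements were taken in ImageJ and organised in DELTA Editor); the function name and the three hypothetical leaves are assumptions made for illustration only, and no measurement unit is implied.

```python
from statistics import mean


def loop_density(loop_counts, lamina_lengths):
    """Ratio of the mean count of loop-forming secondary vein pairs
    to the mean lamina length (hypothetical helper, illustration only)."""
    if not loop_counts or not lamina_lengths:
        raise ValueError("need at least one count and one length")
    return mean(loop_counts) / mean(lamina_lengths)


# Hypothetical specimen: three leaves scored, as in the three-measurement means
density = loop_density([18, 20, 22], [9.0, 10.0, 11.0])
print(density)
```

Dividing by mean lamina length makes specimens with large and small leaves comparable, which is the stated purpose of using ratios rather than raw counts.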
In our analyses, the dataset was organised into six subsets (see Table 2) based on criteria pertaining to specimens (samples) and characters (variables). All 124 specimens were included in the 'vegetative' (VG) specimen subset, for which only vegetative characters were considered.
Owing to the dioecious nature of the taxa examined, specimens were divided into specimen subsets based on the reproductive organs. The female specimen subset, termed 'vegetative and fruit' (FR), comprised 73 specimens, and the male specimen subset, 'vegetative and male flower' (MF), comprised 38 specimens. Both the FR and the MF subsets include vegetative characters, plus fruit materials and male flower materials, respectively. Each of the three specimen subsets was assigned two character subsets: 'key characters' (key) only and 'additional characters' (add). Thus, the six subsets are denoted as follows: 'VG-key', 'VG-add', 'FR-key', 'FR-add', 'MF-key' and 'MF-add'. 'Key characters' character subsets consisted of key characters only, whereas 'additional characters' included both key and additional characters.
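The subset design can be sketched in Python as the product of three specimen subsets and two character subsets. This is an illustration under stated assumptions: the organ codes follow the footnote of Table 2, treating leaf and twig characters as 'vegetative' is our reading of the text, and the four-character table is hypothetical rather than taken from Table 2.

```python
from itertools import product

# Organs contributing characters to each specimen subset (assumed mapping)
ORGANS = {
    "VG": {"LF", "TW"},
    "FR": {"LF", "TW", "FR"},
    "MF": {"LF", "TW", "MF"},
}
CHARACTER_SUBSETS = ("key", "add")


def subset_labels():
    """Return the six subset labels, e.g. 'VG-key'."""
    return [f"{s}-{c}" for s, c in product(ORGANS, CHARACTER_SUBSETS)]


def select_characters(characters, specimen_subset, character_subset):
    """'key' keeps key characters only; 'add' keeps key plus additional ones."""
    return [
        ch["id"]
        for ch in characters
        if ch["organ"] in ORGANS[specimen_subset]
        and (character_subset == "add" or ch["is_key"])
    ]


# Hypothetical character table (ids and key status are illustrative only)
toy = [
    {"id": "c1", "organ": "TW", "is_key": True},
    {"id": "c2", "organ": "LF", "is_key": False},
    {"id": "c3", "organ": "MF", "is_key": False},
    {"id": "c4", "organ": "FR", "is_key": True},
]
print(subset_labels())
print(select_characters(toy, "MF", "add"))
```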

Algorithms in principal coordinates analysis, ascendent hierarchical classification and CH index
Principal coordinates analysis (PCoA) and ascendent hierarchical classification (AHC) were employed to evaluate the morphological characters used in species delimitation; these methodologies have demonstrated success in prior studies (Pierre et al., 2014; Morel et al., 2021). Analyses were performed using R version 4.1.1 (R Core Team, 2021) in RStudio (Posit team, 2022). Three R packages were utilised, namely 'ade4' (Dray & Dufour, 2007), 'vegan' (Oksanen et al., 2022), and 'cluster' (Maechler et al., 2022). R functions were used to enhance the dendrogram plot, namely 'fColorLeaf.R', 'fLabelNoeud.R' and 'EnvelopingEllipse.R' (Le Moguédec, 2020). The morphometric datasets were analysed using morphological characters per se, without considering the specimen's taxonomic identification. Thus, no a priori assumptions were made regarding the formation of clusters, thereby ensuring an unbiased approach. The algorithms employed in the analyses were modelled after Morel et al. (2021) and Pierre et al. (2014). Given that the dataset comprised both quantitative and qualitative variables, Gower's (1971) coefficient of dissimilarity was applied due to its ability to analyse heterogeneous variables simultaneously. Subsequently, the dissimilarity matrix was converted into a Euclidean distance matrix through square-root transformation.
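As an illustration of the dissimilarity step, the following Python sketch computes a Gower-type dissimilarity for mixed data and then applies the square-root transformation. It is not the authors' R code: the function names, the equal weighting of variables, the skipping of missing values and the three toy specimens are all assumptions of this sketch.

```python
from math import sqrt


def gower_matrix(rows, kinds):
    """Gower-type dissimilarity for mixed data.
    rows: one list of values per specimen (None = missing).
    kinds: 'num' or 'cat' for each variable."""
    n, p = len(rows), len(kinds)
    ranges = []
    for k in range(p):
        if kinds[k] == "num":
            vals = [r[k] for r in rows if r[k] is not None]
            ranges.append(max(vals) - min(vals) if vals else 0.0)
        else:
            ranges.append(None)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total, weight = 0.0, 0
            for k in range(p):
                a, b = rows[i][k], rows[j][k]
                if a is None or b is None:
                    continue  # missing values do not contribute
                if kinds[k] == "num":
                    # range-scaled absolute difference; constant variables give 0
                    diff = abs(a - b) / ranges[k] if ranges[k] else 0.0
                else:
                    diff = 0.0 if a == b else 1.0  # simple mismatch
                total += diff
                weight += 1
            d[i][j] = d[j][i] = total / weight if weight else float("nan")
    return d


def sqrt_transform(d):
    """Element-wise square root of a dissimilarity matrix."""
    return [[sqrt(v) for v in row] for row in d]


# Hypothetical specimens: one measurement and one qualitative character
toy_rows = [[10.0, "smooth"], [20.0, "rugose"], [15.0, "smooth"]]
gower = gower_matrix(toy_rows, ["num", "cat"])
euclid = sqrt_transform(gower)
print(gower[0][2], euclid[0][2])
```

Because each per-variable contribution lies between 0 and 1, qualitative and quantitative characters enter the dissimilarity on a common scale.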
A PCoA was conducted to obtain an overview of the grouping. In the PCoA, axes are ranked in descending order based on their total inertia, calculated as eigenvalues. The cumulative total of the six axes with highest eigenvalues was then calculated. The results of PCoA are presented in scatter plots, using the two axes with the highest eigenvalues.
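The cumulative percentage of inertia carried by the leading axes can be sketched as follows in Python. The helper name and the eigenvalues are hypothetical; the sketch assumes that only positive eigenvalues contribute to total inertia, which is the expected situation once the dissimilarity matrix has been made Euclidean.

```python
def cumulative_inertia(eigenvalues, n_axes=6):
    """Percentage of total inertia carried by the n_axes largest eigenvalues."""
    positive = sorted((ev for ev in eigenvalues if ev > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        raise ValueError("no positive eigenvalues")
    return 100.0 * sum(positive[:n_axes]) / total


# Hypothetical eigenvalues, given in arbitrary order
share_two_axes = cumulative_inertia([1.0, 4.0, 0.5, 3.0, 0.5, 1.0], n_axes=2)
print(share_two_axes)
```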
An AHC was then applied for clustering analysis. The same transformed dissimilarity matrix for PCoA was employed in AHC. To construct the dendrogram, the aggregation criterion Ward distance (Ward, 1963) was utilised for clustering. The best partition on the datasets was assessed using the Caliński-Harabasz (CH) index (Caliński & Harabasz, 1974), based on the same distance matrix used to construct AHC dendrograms a posteriori. Between 2 and 10 partitions were tested for each data subset. The CH index indicates the number of partitions that represents the optimal clusters for the examined dataset (Legendre & Legendre, 1998).
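The CH index can be computed directly from a Euclidean distance matrix, because a sum of squared deviations from a centroid equals the sum of squared pairwise distances divided by the group size. The Python sketch below uses that identity; the function name and the four one-dimensional toy points are assumptions for illustration and do not reproduce the authors' R implementation.

```python
def ch_index(dist, labels):
    """Caliński-Harabasz index from a Euclidean distance matrix:
    CH = (B / (k - 1)) / (W / (n - k)),
    with B the between-cluster and W the within-cluster sum of squares."""
    n = len(labels)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need between 2 and n - 1 clusters")

    def pair_ss(members):
        # sum of squared pairwise distances divided by group size
        return sum(
            dist[a][b] ** 2
            for pos, a in enumerate(members)
            for b in members[pos + 1:]
        ) / len(members)

    total = pair_ss(list(range(n)))
    within = sum(pair_ss(m) for m in clusters.values())
    between = total - within
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical one-dimensional points forming two obvious groups
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
ch_two = ch_index(toy_dist, [0, 0, 1, 1])
ch_three = ch_index(toy_dist, [0, 0, 1, 2])
print(ch_two, ch_three)
```

In this toy case the two-cluster partition scores higher than the three-cluster one, so it would be retained as the best partition; the index is undefined when the within-cluster sum of squares is zero.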

Results
Our early observations during examination of the specimens for character assessments and measurements were that typical Garcinia penangiana has chartaceous lamina texture (character 5), a relatively wide angle between secondary veins and midrib (character 15), and a relatively high density of secondary vein pairs forming loops at the intramarginal vein (character 16). Typical Garcinia mangostana var. mangostana displays a prominent lamina apex (character 12) that bends downwards. In male flowers, only Garcinia mangostana var. malaccensis and G. penangiana have more than 5 flowers in a simple cyme (character 22), and the pedicels of G. penangiana are slender (character 23).
It is crucial to emphasise that the representations of clustering in each of the six data subsets correspond in their PCoA and AHC plots. This means that the identity of the datapoints and their taxon in PCoA plots can be cross-referenced to the specimens in AHC dendrograms. In the plots, three groups, denoted as clusters in PCoA and assemblages in AHC, were determined and labelled as A, B1 and B2. These labels are used consistently across Figures 2, 3, 4 and 5 as well as Supplementary files 2, 3 and 4, and the datapoints encircled in the dashed-line ellipses (Figure 2) correspond to the specimens at the ends of the nodes (Figures 3, 4 and 5; Supplementary files 2, 3 and 4) with the same labels; these ellipses and nodes are colour-coded. Individual samples are colour-coded according to their current taxon name. In CH tests, four out of six data subsets (67%) were best represented by partitioning into three clusters, whereas one subset each was best partitioned into 2 and into 10 clusters (Table 3).

Principal coordinates analysis
Eigenvalues presented as a percentage of variance show that the distance matrix can be effectively summarised using the first six axes, accounting for 86.98-95.16% of the total inertia (Table 4). To illustrate the results of the PCoA, we plotted the first two axes (see Figure 2). Cluster A, which uniquely represents Garcinia penangiana, is clearly delineated from other clusters across all character subsets. Ellipses of clusters B1 and B2 overlap in Figure 2A and B. These two clusters include specimens of Garcinia mangostana varieties and G. venulosa. Three distinct clusters were observed in FR-add (see Figure 2D). However, cluster B2 in FR-add (see Figure 2D) comprises Garcinia venulosa and all G. mangostana varieties. By contrast, the clustering results of MF-add (see Figure 2F) best represent the current taxonomic delimitation of the five taxa included. It shows unique clusters for Garcinia penangiana (A) and G. venulosa (B1), whereas all three G. mangostana varieties uniquely grouped within B2.

Discussion
We assessed the efficacy of morphological characters for use in delineating the varieties of Garcinia mangostana and the closely related taxa G. penangiana and G. venulosa. Additionally, we investigated the character combinations that best represent the current taxonomic delimitation and explored the effect of including additional characters in these combinations. Table 5 summarises the combined effect of additional characters in Garcinia, based on the results of PCoA, AHC and CH analyses.

Efficacy of morphological characters in defining the closely related taxa
Generally, vegetative characters alone are not effective for distinguishing the taxa investigated; the exception is Garcinia penangiana. Across all six data subsets analysed, Garcinia penangiana consistently formed a cluster clearly delineated from the other taxa.
Our findings highlight that this species is a well-defined taxon, based on the key characters used in Nazre et al. (2018). On close scrutiny, we identified specific characters that delineate Garcinia penangiana from the other taxa. These diagnostic characters and character states include: (i) the presence of single intramarginal veins, as observed on the lower surface of the lamina; (ii) a dark grey or black glandular line; and (iii) a glandular line form consisting of a mix of long wavy lines and short lines. The misapplication of Garcinia malaccensis (a synonym of G. mangostana var. malaccensis) to G. penangiana by Kochummen (1997) and Whitmore (1973) indicates that their taxonomic species concept of G. malaccensis was too broad. Garcinia venulosa is indistinguishable from G. mangostana var. borneensis, using solely vegetative characters. However, characters used in the male flower additional data subset can uniquely delineate Garcinia venulosa. We also observed a trend that when fruit characters datasets (both key and additional data subsets) are used, the taxon is distinguishable from Garcinia mangostana varieties. On examination of which individual characters delineate Garcinia venulosa from other taxa, we identified one character: glandular line orientation, which is almost parallel (180°) to the midrib and margin. By contrast, in other species, glandular line orientation ranges between 10 and 55°, running from the midrib towards the margin. In short, our results suggest that morphological characters could be used to satisfactorily delineate both Garcinia penangiana and G. venulosa from G. mangostana.
Individuals of Garcinia mangostana generally formed an inseparable cluster, with the exception of G. mangostana var. borneensis, which formed a unique taxon assemblage based on the male flower additional characters data subset. Delimitation between Garcinia mangostana var. mangostana and G. mangostana var. malaccensis is not readily observed.
The key character of the surface of the persistent stigma plate on fruit apex, 'smooth vs rugose', was used to delimit the two varieties (Nazre et al., 2018). However, we observed a continuum in this character state, leading to challenges in taxa delineation if using only vegetative and fruit characters. The stigma plate surface is especially variable in the mangosteen cultigens from Java. This observation is not novel; based on their examination of numerous samples since early 1996, Hambali & Natawijaya (2016) found smooth and corrugated stigma surfaces in both Garcinia mangostana var. mangostana and G. mangostana var. malaccensis. These character states of the stigma plate surface are also observed for fresh fruits.
The morphometric analysis did not identify any character that would allow delineation of Garcinia mangostana varieties. We confirmed that the pistillode is always present in male flowers of Garcinia mangostana var. mangostana and always absent in G. mangostana var. borneensis, but both character states are applicable to G. mangostana var. malaccensis (Nazre et al., 2018). Additionally, we did not find disjunctive measurements in stamen bundle length among the three varieties (Nazre et al., 2018).
Our findings support the inclusion of Garcinia mangostana var. borneensis and G. mangostana var. malaccensis within G. mangostana. However, the recognition of varietal rank is not supported, and construction of an identification key based on morphological characters is not achievable. Harlan & de Wet (1971) emphasised the difficulties in the circumscribing and naming of cultivated plants. They proposed the use of subspecies rank for cultivated and close wild relatives that form a 'primary gene pool'. This idea may be applicable to well-studied crops and their wild relatives whose population genetics have been clarified, a situation not yet realised in mangosteen. In modern taxonomy, the rank of subspecies is more commonly used to delineate taxa with geographically disjunctive populations (Pipoly, 1987). Garcinia mangostana var. borneensis is confined to eastern Borneo whereas G. mangostana var. malaccensis is confined to Sumatra, the Malay Peninsula and western Borneo. However, there is no consensus among taxonomists on the differentiation between subspecies and varieties (Hamilton & Reichard, 1992). We have refrained from proposing taxonomic changes until deeper knowledge of the population genetics of both the cultivated and the wild compartments is available.

Assessment based on character-combination data subsets
In the vegetative characters datasets, the improvement of clustering with the use of additional characters is exemplified by the results for the CH indices. The use of additional characters resulted in recognition of two clusters with the highest score, which could better explain the clustering of all Garcinia penangiana specimens in one assemblage and other taxa in another assemblage. This contrasts with recognising 10 clusters, or even more considering the increasing trend, if only key characters are used.
The male flower characters dataset best reflects the current taxonomic circumscription among taxa, and the use of additional characters improves the clustering topology. Garcinia venulosa is distinguished from G. mangostana varieties, and this is supported in the results for PCoA and CH index score. Additionally, we observed a unique taxonomic subassemblage formed by Garcinia mangostana var. borneensis within assemblage B2. The formation of a unique assemblage by a taxon among the Garcinia mangostana varieties is observed only in the male flower additional characters data subset. However, this subassemblage is not supported in the PCoA clustering or CH index score results.
The inclusion of additional characters in the fruit characters dataset has a negative effect on distinguishing Garcinia venulosa from G. mangostana varieties (Supplementary file 3), whereas key characters of fruit per se can be used to distinguish G. penangiana and G. venulosa from G. mangostana varieties. Clearly, there were mixed effects of the inclusion of additional characters. The additional characters recognised in this study, especially those with positive effect in delineating the taxa, should be considered in future taxonomic studies of Garcinia.

Conclusions
The results of morphometric analyses showed that Garcinia penangiana is essentially delineated by vegetative characters, whereas G. venulosa is delineated from G. mangostana varieties by a combination of vegetative and male flower characters. Generally, all Garcinia mangostana varieties formed a single mixed assemblage. Our findings confirmed the coherence of Garcinia mangostana as a taxonomic species. However, our findings do not support the designation of infraspecific taxa, because we did not find apomorphies. This conclusion is consistent with Corner's (1997) opinion that varieties mangostana and malaccensis cannot be distinguished, although whether a varietal rank would be appropriate for var. malaccensis was not discussed. Hambali & Natawijaya (2016) viewed var. malaccensis as a diploid form of the tetraploid var. mangostana, and they favoured the recognition of var. malaccensis at specific rank, a view that is not supported in our morphometric analyses. A deeper investigation of the delimitation of Garcinia mangostana varieties, using nuclear markers in a population genetics framework, would help clarify the taxonomic delimitation of these varieties.
We acknowledge the limitations of using morphometric analysis in studies of dioecious plants. The species concept in dioecious plants typically involves delineation utilising both female and male characters. However, using a specimen-based approach, we could not assess all the species-delineating characters in a single integrated dataset without the results being adversely affected by excessive missing data, as discussed by Pierre et al. (2014).

Figure 1. The geographical distribution of the Garcinia specimens examined and analysed.

Figure 2. Scatter plots showing the results of principal coordinates analysis (PCoA) for six data subsets: A, vegetative, key characters; B, vegetative, additional characters; C, vegetative and fruit, key characters; D, vegetative and fruit, additional characters; E, vegetative and male flower, key characters; F, vegetative and male flower, additional characters. Each empty circle represents a single specimen. Data points in clusters A, B1 and B2 correspond to the specimens in the clades of the same name in Figures 3, 4 and 5 and Supplementary files 2, 3 and 4. Overlapping points in the PCoA plots are due to specimens sharing identical positions.

Figure 3. Ascendent hierarchical classification dendrograms for the vegetative key characters specimen subset.

Figure 5. Ascendent hierarchical classification dendrograms for the vegetative and male flower additional characters specimen subset.

Table 1. Number of Garcinia specimens analysed, with details of taxa and specimen subsets.
a Specimen subset: VG, vegetative; FR, vegetative and fruit; MF, vegetative and male flower.

Table 2. Characters and character states examined and analysed.
a Characters in bold are 'key characters' used by Nazre et al. (2018); characters not in bold are additional characters used in the present study.
b Organ: FR, fruit; LF, leaf; MF, male flower; TW, twig.
c All qualitative characters are treated as factorial (F) data; quantitative characters include numerical (N) and integer (I) data.
d Character subsets: VG-key, vegetative key characters; VG-add, vegetative additional characters; MF-key, vegetative and male flower characters; MF-add, vegetative and male flower additional characters; FR-key, vegetative and fruit characters; FR-add, vegetative and fruit additional characters.

Table 3. Caliński-Harabasz indices of six data subsets tested with 2-10 partitions.
a The highest index value for each subset is in bold font.
b Character subsets: VG-key, vegetative key characters; VG-add, vegetative additional characters; FR-key, vegetative and fruit characters; FR-add, vegetative and fruit additional characters; MF-key, vegetative and male flower characters; MF-add, vegetative and male flower additional characters.

Table 4. Summary of eigenvalues in percentage of variance calculated in principal coordinates analysis.
a Character subsets: VG-key, vegetative key characters; VG-add, vegetative additional characters; FR-key, vegetative and fruit characters; FR-add, vegetative and fruit additional characters; MF-key, vegetative and male flower characters; MF-add, vegetative and male flower additional characters.

Table 5. The combined effect of additional characters in the genus Garcinia, based on the results of principal coordinates analysis, ascendent hierarchical classification, and Caliński-Harabasz analyses.