STUDENT PROJECT
What lives within and on a plant: our understanding from genome NGS data
DOI: https://doi.org/10.24823/Sibbaldia.2026.2130
Keywords: Contaminants, Genome assembly, Pathogens, Pests, Symbionts, Biosecurity, Gesneriaceae
Abstract
Next-generation sequencing (NGS) can generate gigabytes of genome data. Unlike Sanger sequencing, NGS generates each ‘read’ from a single DNA molecule, so the reads directly reflect the starting DNA, including that of non-target organisms such as symbionts and pathogens. Sequences from non-target organisms are usually discarded as contaminants during genome assembly, yet they are potentially a rich source of information about the microbiome surrounding the plant. The present study explores the bioinformatic identification of non-target organisms in genome NGS datasets of two cultivated Gesneriaceae species. The datasets were generated using different NGS technologies: one from Streptocarpus rexii (Bowie ex Hook.) Lindl., sequenced using Oxford Nanopore Technologies long-read sequencing, and the other from Aeschynanthus angustifolius (Blume) Steud., sequenced using Illumina short-read sequencing. The reads were first assembled and then analysed using BlobTools to identify the contaminants. In S. rexii, Actinomycetota and Basidiomycota made up the largest proportions of the genome contaminants, followed by Arthropoda, Ascomycota and Acidobacteriota. In A. angustifolius, the most abundant contaminant group was Pseudomonadota, then Actinomycetota, followed by Basidiomycota and Chordata. The Arthropoda included mealybugs, which were also observed in the glasshouse. The differences in contaminant composition between S. rexii and A. angustifolius may be linked to the relatively short-lived leaves of the former and the long-lived leaves of the latter. This pilot study demonstrates that, in principle, the method is suitable for detecting and identifying plant-associated organisms, and the pipelines designed here greatly facilitated this process. The approach might be useful in a horticultural setting for assessing plant material in quarantine or biosecure conditions, and may be able to detect pathogens before plants show symptoms. It also has potentially wider applications for studying plant–microbiome interactions.
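As an illustration of how contaminant groups such as those named in the abstract could be ranked, the following is a minimal sketch only, not the pipeline used in the study. It assumes a hypothetical, simplified per-contig table of (contig, length, phylum) rather than the actual BlobTools output format, and it assumes that host contigs are labelled "Streptophyta" and unassigned contigs "no-hit". The function name and example values are invented for illustration.

```python
from collections import defaultdict


def rank_contaminants(rows, host_phylum="Streptophyta", unassigned="no-hit"):
    """Rank non-host phyla by their share of the total non-host assembly span.

    rows: iterable of (contig_id, length_bp, phylum) tuples. This is a
    hypothetical, simplified table, not the real BlobTools output format.
    host_phylum and unassigned are assumed labels.
    """
    span = defaultdict(int)
    for _contig_id, length_bp, phylum in rows:
        # Skip host contigs and contigs without a taxonomic assignment.
        if phylum in (host_phylum, unassigned):
            continue
        span[phylum] += length_bp
    total = sum(span.values())
    if total == 0:
        return []
    # Sort by descending span, then by name, so ties are deterministic.
    ranked = sorted(span.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(phylum, bp / total) for phylum, bp in ranked]


# Made-up example values, for illustration only.
example = [
    ("ctg1", 50000, "Streptophyta"),
    ("ctg2", 3000, "Actinomycetota"),
    ("ctg3", 2000, "Basidiomycota"),
    ("ctg4", 1000, "Actinomycetota"),
    ("ctg5", 4000, "no-hit"),
]
for phylum, share in rank_contaminants(example):
    print(f"{phylum}\t{share:.2f}")
```

Ranking by summed contig length is one possible choice; ranking by contig count or by read coverage could give a different order.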
License
Copyright (c) 2026 Yalan Li, Kanae Nishii, Nathan Kelso, Sadie Barber, Louise Galloway, Michael Möller, Joanne E. Taylor

This work is licensed under a Creative Commons Attribution 4.0 International License.
