AUTHORS: Houriiyah Tegally, Eduan Wilkinson, Joseph L.-H. Tsui, Monika Moir, Darren Martin, Anderson Fernandes Brito, Marta Giovanetti, Kamran Khan, Carmen Huber, Isaac I. Bogoch, James Emmanuel San, Jenicca Poongavanan, Joicymara S. Xavier, Darlan da S. Candido, Filipe Romero, Cheryl Baxter, Oliver G. Pybus, Richard J. Lessells, Nuno R. Faria, Moritz U.G. Kraemer, Tulio de Oliveira
YEAR OF PUBLICATION: 2023
Abstract
The Alpha, Beta, and Gamma SARS-CoV-2 variants of concern (VOCs) co-circulated globally during 2020 and 2021, fueling waves of infections. They were displaced by Delta during a third wave worldwide in 2021, which, in turn, was displaced by Omicron in late 2021. In this study, we use phylogenetic and phylogeographic methods to reconstruct the dispersal patterns of VOCs worldwide. We find that source-sink dynamics varied substantially by VOC and identify countries that acted as global and regional hubs of dissemination. We demonstrate the declining role of presumed origin countries of VOCs in their global dispersal, estimating that India contributed 80 countries had received introductions of Omicron within 100 days of its emergence, associated with accelerated passenger air travel and higher transmissibility. Our study highlights the rapid dispersal of highly transmissible variants, with implications for genomic surveillance along the hierarchical airline network.
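Source-sink dynamics and the "countries reached within N days" figure can be illustrated with a minimal sketch. It assumes a phylogeographic analysis has already produced a list of inferred introduction events as (source, sink, date) tuples. The tuple layout, the country codes, the use of net exports as a source-sink indicator, and the function names are all hypothetical. This is not the authors' pipeline.

```python
from collections import Counter
from datetime import date

# Hypothetical event format: (source country, sink country, inferred introduction date).
Event = tuple[str, str, date]


def source_sink_summary(events: list[Event]) -> dict[str, tuple[int, int, int]]:
    """Return {country: (exports, imports, net_exports)}; positive net marks a net source."""
    exports = Counter(src for src, _, _ in events)
    imports = Counter(dst for _, dst, _ in events)
    countries = set(exports) | set(imports)
    # Counter returns 0 for missing keys, so pure sinks or pure sources are handled.
    return {c: (exports[c], imports[c], exports[c] - imports[c]) for c in countries}


def countries_reached_within(events: list[Event], emergence: date, days: int) -> set[str]:
    """Sink countries with at least one introduction within `days` of emergence."""
    return {dst for _, dst, when in events if 0 <= (when - emergence).days <= days}


if __name__ == "__main__":
    demo = [
        ("ZA", "GB", date(2021, 11, 20)),
        ("GB", "US", date(2021, 12, 1)),
        ("ZA", "BW", date(2021, 11, 15)),
        ("US", "BR", date(2022, 4, 1)),
    ]
    print(source_sink_summary(demo))
    print(countries_reached_within(demo, date(2021, 11, 1), 100))
```

A country can be a net source for one variant and a net sink for another. Running the summary separately per VOC is one way to compare those roles.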
AUTHORS: Francisco NM, van Wyk S, Moir M, San JE, Sebastião CS, Tegally H, Xavier J, Maharaj A, Neto Z, Afonso P, Jandondo D, Paixão J, Miranda J, David K, Inglês L, Pereira A, Paulo A, Carralero RR, Freitas HR, Mufinda F, Lutucuta S, Ghafari M, Giovanetti M, Giandhari J, Pillay S, Naidoo Y, Singh L, Tshiabuila D, Martin DP, Chabuka L, Choga W, Wanjohi D, Mwangi S, Pillay Y, Kebede Y, Shumba E, Ondoa P, Baxter C, Wilkinson E, Tessema SK, Katzourakis A, Lessells R, de Oliveira T, Morais J.
YEAR OF PUBLICATION: 2023
Abstract
In Angola, COVID-19 cases have been reported in all provinces, resulting in >105,000 cases and >1900 deaths. However, no detailed genomic surveillance into the introduction and spread of the SARS-CoV-2 virus has been conducted in Angola. We aimed to investigate the emergence and epidemic progression during the peak of the COVID-19 pandemic in Angola. We generated 1210 whole-genome SARS-CoV-2 sequences, which were phylogenetically compared against global strains, contributing West African data to the global context. Virus movement events were inferred using ancestral state reconstruction. The epidemic in Angola was marked by four distinct waves of infection, dominated by 12 virus lineages, including VOCs, VOIs, and the VUM C.16, which was unique to South-Western Africa and circulated for an extended period within the region. Virus exchanges occurred between Angola and its neighboring countries, and strong links with Brazil and Portugal reflected the historical and cultural ties shared between these countries. The first case likely originated from southern Africa. A lack of a robust genome surveillance network and a strong dependence on out-of-country sequencing limit the real-time data generation needed for timely outbreak responses, which remain of the utmost importance for mitigating future disease outbreaks in Angola.
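Ancestral state reconstruction of location can be illustrated with Fitch parsimony on a small binary tree. Parsimony is swapped in here as the simplest such method; the abstract does not say which reconstruction model the study used. Each location change along the tree counts as one inferred movement event. The sketch makes two simplifications: leaves are written directly as location labels rather than sequence IDs, and the nested-tuple tree format and function name are hypothetical.

```python
from typing import Union

# A tree is either a leaf (its sampling location) or a pair of subtrees.
Tree = Union[str, tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> tuple[set[str], int]:
    """Fitch bottom-up pass.

    Returns the set of most-parsimonious locations at this node and the
    minimum number of location changes (movement events) in the subtree.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    common = left_set & right_set
    if common:
        # Children agree on at least one location: no extra change needed.
        return common, left_cost + right_cost
    # No shared location: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    example = ((("AO", "AO"), "ZA"), (("AO", "BR"), "PT"))
    print(fitch(example))  # root location set and minimum number of movements
```

Parsimony gives only a minimum count of movements and ignores branch lengths. Likelihood or Bayesian discrete-trait models are commonly preferred for this reason.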
AUTHORS: Tsui JL, Pena RE, Moir M, Inward RP, Wilkinson E, San JE, Poongavanan J, Bajaj S, Gutierrez B, Dasgupta A, de Oliveira T, Kraemer M, Tegally H, Sambaturu P.
YEAR OF PUBLICATION: 2024
Abstract
Health consequences arising from climate change are threatening to offset advances made to reduce the damage of infectious diseases, which vary by region and the resilience of the local health system. Here, we discuss how climate change-related migrations and infectious disease burden are linked through various processes, such as the expansion of pathogens into non-endemic areas, overcrowding in new informal settlements, and the closer proximity of disease vectors and susceptible human populations. Countries that are predicted to have the highest burden are those that have made the least contribution to climate change. Further studies are needed to generate robust evidence on the potential consequences of climate change-related human movements and migration, as well as identify effective and bespoke short- and long-term interventions.
AUTHORS: Poongavanan J, Lourenço J, Tsui JL, Colizza V, Ramphal Y, Baxter C, Kraemer MUG, Dunaiski M, de Oliveira T, Tegally H.
YEAR OF PUBLICATION: 2024
Abstract
Dengue is a significant global public health concern that poses a threat in Africa. Particularly, African countries are at risk of viral introductions through air travel connectivity with areas of South America and Asia in which explosive dengue outbreaks frequently occur. Limited reporting and diagnostic capacity hinder a comprehensive assessment of continent-wide transmission dynamics and deployment of surveillance strategies in Africa. In this study, we aimed to identify African airports at high risk of receiving passengers with dengue from Asia, Latin America, and other African countries with high dengue incidence. For this modelling study, air travel flow data were obtained from the International Air Transport Association database for 2019. Data comprised monthly passenger volumes from 14 high-incidence countries outside of Africa and 18 countries within the African continent that reported dengue outbreaks in the past 10 years to 54 African countries, encompassing all 197 commercial airports in both the source and destination regions. The risk of dengue introduction into Africa from countries of high incidence in Asia, Latin America, and within Africa was estimated based on origin-destination air travel flows and epidemic activity at origin. We produced a novel proxy for local dengue epidemic activity using a composite index of theoretical climate-driven transmission suitability and population density, which we used, in addition to travel information in a risk flow model, to estimate importation risk. Countries in eastern Africa had a high estimated risk of dengue importation from Asia and other east African countries, whereas for west African countries, the risk of importation was higher from within the region than from countries outside of Africa. 
Some countries with high risk of importation had low local transmission suitability, which is likely to reduce the chance that dengue importations lead to local transmission and establishment of a dengue outbreak. Mauritius, Uganda, Côte d’Ivoire, Senegal, and Kenya were identified as countries susceptible to dengue introductions during periods of persistent transmission suitability. Our study improves the data-driven allocation of surveillance resources in regions of Africa that are at high risk of dengue introduction and establishment, including from regional circulation. Improvements in resource allocation will be crucial in detecting and managing imported cases and could improve local responses to dengue outbreaks.
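The risk flow model can be sketched as follows. Importation risk at each destination is taken as the sum, over origins, of passenger volume multiplied by an epidemic-activity proxy at the origin. The composite proxy is assumed here to be transmission suitability times population density scaled to [0, 1]. The abstract does not give the exact functional form, so that product, the data layout, and the function names are hypothetical.

```python
def activity_proxy(suitability: float, density: float, max_density: float) -> float:
    """Hypothetical composite index: suitability (0-1) times density scaled to 0-1."""
    return suitability * (density / max_density)


def importation_risk(flows: dict[tuple[str, str], int],
                     activity: dict[str, float]) -> dict[str, float]:
    """Risk at each destination = sum over origins of passengers x origin activity.

    flows maps (origin, destination) to passenger volume; origins missing
    from `activity` are treated as having no epidemic activity.
    """
    risk: dict[str, float] = {}
    for (origin, dest), passengers in flows.items():
        risk[dest] = risk.get(dest, 0.0) + passengers * activity.get(origin, 0.0)
    return risk


if __name__ == "__main__":
    flows = {("BR", "SN"): 1000, ("IN", "KE"): 2000, ("TH", "KE"): 500}
    activity = {"BR": 0.5, "IN": 0.4, "TH": 0.8}
    print(importation_risk(flows, activity))
```

Computing the proxy per month, rather than once per year, would let the risk track seasonal changes in suitability, consistent with the monthly passenger volumes described above.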
AUTHORS: Jenicca Poongavanan, José Lourenço, Joseph L.-H. Tsui, Vittoria Colizza, Yajna Ramphal, Cheryl Baxter, Moritz U.G. Kraemer, Marcel Dunaiski, Tulio de Oliveira, Houriiyah Tegally
YEAR OF PUBLICATION: 2024
Abstract
Dengue is a significant global public health concern that poses a threat to Africa. Particularly, African countries are at risk of viral introductions through air travel connectivity with areas of South America and Asia that experience frequent explosive outbreaks. Limited reporting and diagnostic capacity hinder a comprehensive assessment of continent-wide transmission dynamics and deployment of surveillance strategies in Africa. The risk of dengue introduction into Africa from countries of high incidence was estimated based on origin-destination air travel flows and epidemic activity at origin. We produced a novel proxy for local dengue epidemic activity using a composite index of theoretical climate-driven transmission potential and population density, which we used, along with travel information in a risk flow model, to estimate the importation risks. We find that countries in east Africa face a higher risk of importation from Asia, whereas for west African countries, a larger risk of importation is estimated from South America. Some countries with high risk of importation experience low local transmission potential, which likely reduces the chance that importations lead to local establishment and transmission. Conversely, Mauritius, Uganda, Ivory Coast, Senegal, and Kenya are identified as countries susceptible to dengue introductions during periods of persistent transmission suitability. This work improves the data-driven allocation of surveillance resources in regions of Africa that are at high risk of dengue introductions and establishment. This will be critical in detecting and managing imported cases and can improve local response to dengue outbreaks.
AUTHORS: Tegally H, Dellicour S, Poongavanan J, Mavian C, Dor G, Fonseca V, Tagliamonte MS, Dunaiski M, Moir M, Wilkinson E, de Albuquerque CFC, Frutuoso LCV; CLIMADE Consortium; Holmes EC, Baxter C, Lessells R, Kraemer MUG, Lourenço J, Alcantara LCJ, de Oliveira T, Giovanetti M.
YEAR OF PUBLICATION: 2023
Abstract
In March 2024, the Pan American Health Organization (PAHO) issued an alert in response to a rapid increase in Oropouche fever cases across South America. Brazil has been particularly affected, reporting a novel reassortant lineage of the Oropouche virus (OROV) and expansion to previously non-endemic areas beyond the Amazon Basin. Utilising phylogeographic approaches, we reveal a multi-scale expansion process with both short and long-distance dispersal events, and diffusion velocities in line with human-mediated jumps. We identify forest cover, banana and cocoa cultivation, temperature, and human population density as key environmental factors associated with OROV range expansion. Using ecological niche modelling, we show that OROV circulated in areas of enhanced ecological suitability immediately preceding its explosive epidemic expansion in the Amazon. This likely resulted from the virus being introduced into simultaneously densely populated and environmentally favourable regions in the Amazon, such as Manaus, leading to an amplified epidemic and spread beyond the Amazon. Our study provides valuable insights into the dispersal and ecological dynamics of OROV, highlighting the role of human mobility in colonisation of new areas, and raising concern over high viral suitability along the Brazilian coast.
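Diffusion velocities in continuous phylogeography are often summarised as a weighted lineage dispersal velocity. This is the total great-circle distance covered by tree branches divided by their total duration. The sketch below assumes branch start and end coordinates and durations have already been extracted from posterior trees. The input layout and function names are hypothetical, and the statistic may not be exactly the one reported in the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_dispersal_velocity(branches) -> float:
    """branches: iterable of (start_lat, start_lon, end_lat, end_lon, duration_years).

    Returns the summed branch distance divided by summed branch duration (km/year).
    """
    total_km = 0.0
    total_years = 0.0
    for lat1, lon1, lat2, lon2, years in branches:
        total_km += haversine_km(lat1, lon1, lat2, lon2)
        total_years += years
    return total_km / total_years


if __name__ == "__main__":
    demo = [(0.0, 0.0, 0.0, 1.0, 1.0), (0.0, 1.0, 0.0, 1.0, 1.0)]
    print(round(weighted_dispersal_velocity(demo), 2))
```

Long branches that cover large distances in little time raise this statistic sharply. Such branches are the kind of signal that points to human-mediated jumps rather than gradual vector-borne spread.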
AUTHORS: Natalia Blanco, Olanrewaju Lawal, Jibreel Jumare, Christina Riley, James Onyemata, Thomas Kono, Anna Winters, Chenfeng Xiong, Alash’le Abimiku, Manhattan Charurat, Kristen A. Stafford.
YEAR OF PUBLICATION: 2025
Abstract
Social vulnerability has been shown to be a strong predictor of disparities in health outcomes. A common approach to estimating social vulnerability is using a composite index, such as the social vulnerability index (SVI), which combines multiple factors corresponding to key social determinants of health. Lawal and Osayomi created an SVI to explore key social determinants of health-related COVID-19 infection among the Nigerian population. This study explored the association of COVID-19 SVI with COVID-19 seroprevalence using a large household survey in Nigeria. Weighted COVID-19 seroprevalence estimates were computed at the Local Government Area (LGA) level and merged with the Lawal and Osayomi SVI, also at the LGA level. Linear regression models were constructed to evaluate the relationship between the SVI and COVID-19 seroprevalence. The effect of SVI was evaluated both as a continuous variable and categorized into quintiles to evaluate dose–response effects. Our results confirmed a positive relationship between social vulnerability and COVID-19 infection in four states and the Federal Capital Territory in Nigeria. Compared to class 1 (the least vulnerable group), COVID-19 seroprevalence was, on average, 9.21% and 6.42% higher in class 4 and class 5 LGAs, respectively, after adjustment for survey phase. The effect was particularly strong farther into the pandemic (June 2021), when COVID-19 mitigation measures were relaxed. In conclusion, SVI can potentially be a useful tool to effectively prioritize communities for resource allocation as part of emergency response and preparedness in Africa.
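The quintile analysis can be illustrated with a minimal sketch. For a linear regression with only dummy-coded SVI classes and no covariates, each class coefficient equals that class's mean seroprevalence minus the class 1 (reference) mean. The sketch uses rank-based quintiles and unweighted means. It omits the survey weights and the adjustment for survey phase used in the study, and the function names are hypothetical.

```python
from statistics import mean


def quintile_classes(values: list[float]) -> list[int]:
    """Assign class 1 (least vulnerable) to 5 (most vulnerable) by rank.

    Ties are broken by input order; assumes at least five values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * 5 // n + 1
    return classes


def class_effects(svi: list[float], seroprev: list[float]) -> dict[int, float]:
    """Dummy-coded OLS without covariates: coefficient = class mean - class-1 mean."""
    classes = quintile_classes(svi)
    means = {
        c: mean(s for s, k in zip(seroprev, classes) if k == c)
        for c in sorted(set(classes))
    }
    return {c: means[c] - means[1] for c in means if c != 1}


if __name__ == "__main__":
    svi = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    seroprev = [10, 12, 14, 15, 20, 22, 25, 24, 30, 28]
    print(class_effects(svi, seroprev))
```

Reading the coefficients for classes 2 through 5 in order is what "dose–response" refers to. A monotone increase suggests a graded effect of vulnerability; a non-monotone pattern, such as class 5 below class 4, is also possible.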
AUTHORS: Lele Zhang, Xin (Bruce) Wu, Kailun Liu, Md Abdullah Al Mehedi, Jiashu Zhou, Virginia Smith, Chenfeng Xiong
YEAR OF PUBLICATION: 2026
Abstract
Establishing causal relationships between urban flooding and behavioral responses is challenging in tropical coastal cities experiencing seasonal flooding. In these cities, widespread exposure often leaves few distinct control areas, rainy seasons with repeated inundation episodes complicate discrete treatment timing, and satellite temporal resolution constrains flood tracking. We develop a framework that facilitates causal inference by shifting the unit of analysis from geographic locations to facility types. The framework uses two screening metrics to identify donor categories: Maximal Information Coefficient (MIC) for identifying facility types whose visitation patterns exhibit minimal sensitivity to precipitation variability, thereby screening for weather-resilient categories rather than direct flood impacts, and Coefficient of Variation (CV) for assessing temporal stability across flood phases. The framework then integrates these selected donors into a hybrid Synthetic Control-Difference-in-Differences estimator. Applied to the June–July 2020 rainy season in Lagos, Nigeria, the framework integrates Location-Based Services data, ERA5 precipitation reanalysis, Sentinel-1 SAR imagery, and OpenStreetMap infrastructure. Analysis reveals heterogeneous responses across facility types: healthcare visitation increased 40 % during flooding and remained elevated at 51 % above baseline through recovery; transportation declined 22 % with no recovery; retail exhibited post-flood rebounds of 35 %. Effect directions remained consistent across three control specifications (Religious-only, Residential-only, and optimized synthetic control), with the synthetic approach achieving 42–64 % reductions in standard errors relative to fixed controls. The framework provides a systematic approach for impact assessment in data-constrained disaster contexts where spatial controls are limited and discrete event isolation is constrained by monitoring infrastructure. 
By using precipitation as a temporally resolved proxy for flood exposure, the framework estimates compound flood-season effects using data increasingly accessible in tropical urban settings.
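Parts of the donor-screening and estimation logic can be sketched under simplifying assumptions. Daily visit counts are assumed to be already aggregated per facility category. The sketch implements only the CV screen (MIC screening is omitted) and a plain difference-in-differences contrast between the treated category and the averaged donors. It is not the paper's hybrid Synthetic Control-Difference-in-Differences estimator, and the category names, threshold, and function names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(series: list[float]) -> float:
    """Population standard deviation divided by the mean (assumes a non-zero mean)."""
    return pstdev(series) / mean(series)


def select_donors(categories: dict[str, list[float]], max_cv: float) -> list[str]:
    """Keep facility categories whose visitation is temporally stable (low CV)."""
    return [name for name, s in categories.items() if coefficient_of_variation(s) <= max_cv]


def did_estimate(treated: list[float], donors: list[list[float]], t0: int) -> float:
    """Classic DiD: change in the treated series minus change in the averaged donors.

    Periods [0, t0) are pre-flood and [t0, end) are flood/post-flood.
    """
    control = [mean(day) for day in zip(*donors)]  # equal-weight donor average per day
    treated_change = mean(treated[t0:]) - mean(treated[:t0])
    control_change = mean(control[t0:]) - mean(control[:t0])
    return treated_change - control_change


if __name__ == "__main__":
    visits = {
        "religious": [100, 102, 98, 100],
        "residential": [50, 51, 49, 50],
        "retail": [100, 40, 160, 100],
    }
    donors = select_donors(visits, max_cv=0.05)
    print(donors)
    print(did_estimate([10, 10, 14, 14], [[100, 100, 101, 101], [50, 50, 51, 51]], t0=2))
```

A synthetic control would replace the equal-weight donor average with weights fitted to the pre-flood treated series. That fitting step is where the study's reported reduction in standard errors would come from.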
AUTHORS: Tsui JL, McCrone JT, Lambert B, Bajaj S, Inward RPD, Bosetti P, Pena RE, Tegally H, Hill V, Zarebski AE, Peacock TP, Liu L, Wu N, Davis M, Bogoch II, Khan K, Kall M, Abdul Aziz NIB, Colquhoun R, O’Toole Á, Jackson B, Dasgupta A, Wilkinson E, de Oliveira T; COVID-19 Genomics UK (COG-UK) consortium; Connor TR, Loman NJ, Colizza V, Fraser C, Volz E, Ji X, Gutierrez B, Chand M, Dellicour S, Cauchemez S, Raghwani J, Suchard MA, Lemey P, Rambaut A, Pybus OG, Kraemer MUG
YEAR OF PUBLICATION: 2023
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) now arise in the context of heterogeneous human connectivity and population immunity. Through a large-scale phylodynamic analysis of 115,622 Omicron BA.1 genomes, we identified >6,000 introductions of the antigenically distinct VOC into England and analyzed their local transmission and dispersal history. We find that six of the eight largest English Omicron lineages were already transmitting when Omicron was first reported in southern Africa (22 November 2021). Multiple datasets show that importation of Omicron continued despite subsequent restrictions on travel from southern Africa as a result of export from well-connected secondary locations. Initiation and dispersal of Omicron transmission lineages in England was a two-stage process that can be explained by models of the country’s human geography and hierarchical travel network. Our results enable a comparison of the processes that drive the invasion of Omicron and other VOCs across multiple spatial scales.
AUTHORS: Moir M, Sitharam N, Hofstra M, Dor G, Mwanyika G, Ramphal Y, Reichmuth ML, San JE, Gifford R, Wilkinson E, Tshiabuila D, Preiser W, Konou AA, Bitew M, Bernard Onoja A, Paganotti BM, Abera A, Maror JA, Kayiwa J, Abuelmaali S, Lusamaki EK, CLIMADE Consortium, Venter M, Burt F, Baxter C, Lessells R, de Oliveira T, Tegally H.
YEAR OF PUBLICATION: 2025
Abstract
West Nile virus (WNV) is a priority pathogen that poses a high risk for public health emergencies of global concern. Although WNV is endemic to Africa, only a few (n=63) whole-genome sequences are available from the continent. In this Review, we examined the status of the molecular testing and genomic sequencing of WNV across Africa and mapped its global spatiotemporal spread. WNV has been detected in 39 African countries, the Canary Islands, and Réunion Island. Although publications, including those with molecular data, originated from 24 of these countries, genomic sequences were available from only 16 countries. Our analysis identified regions with detected viral circulation but without molecular surveillance. The current literature has substantial knowledge gaps in terms of the disease burden, molecular epidemiology, and distribution of WNV in Africa. Addressing these gaps requires an integrated One Health surveillance approach, which is challenging to establish. We propose three key surveillance needs that could improve the current understanding of the WNV disease burden in Africa, to strengthen the global public health response to this vector-borne disease.
AUTHORS: Choga WT, Gustani-Buss E, Tegally H, Maruapula D, Yu X, Moir M, Zuze BJL, James SE, Ndlovu NS, Seru K, Motshosi P, Blenkinsop A, Gobe I, Baxter C, Manasa J, Lockman S, Shapiro R, Makhema J, Wilkinson E, Blackard JT, Lemey P, Lessells RJ, Martin DP, de Oliveira T, Gaseitsiwe S, Moyo S.
YEAR OF PUBLICATION: 2024
Abstract
Botswana, like the rest of the world, has been significantly impacted by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In December 2022, we detected a monophyletic cluster of genomes comprising a sublineage of the Omicron variant of concern (VOC) designated as B.1.1.529.5.3.1.1.1.1.1.1.74.1 (alias FN.1, clade 22E). These genomes were sourced from both epidemiologically linked and unlinked samples collected in three close locations within the district of Greater Gaborone. In this study, we assessed the worldwide prevalence of the FN.1 lineage, evaluated its mutational profile, and conducted a phylogeographic analysis to reveal its global dispersal dynamics. Among approximately 16 million publicly available SARS-CoV-2 sequences generated by 30 September 2023, only 87 were of the FN.1 lineage, including 22 from Botswana, 6 from South Africa, and 59 from the UK. The estimated time to the most recent common ancestor of the 87 FN.1 sequences was 22 October 2022 [95% highest posterior density: 2 September 2022 to 24 November 2022], with the earliest of the 22 Botswana sequences having been sampled on 7 December 2022. Discrete trait reconstruction of FN.1 identified Botswana as the most probable place of origin. The FN.1 lineage is derived from the BQ.1.1 lineage and carries two missense variants in the spike protein, S:K182E in the NTD and S:T478R in the RBD. Among the over 90 SARS-CoV-2 lineages circulating in Botswana between September 2020 and July 2023, FN.1 was most closely related to BQ.1.1.74 based on maximum likelihood phylogenetic inference, differing only by the S:K182E mutation found in FN.1. Given the early detection of numerous novel variants from Botswana and its neighbouring countries, our study underscores the necessity of continuous surveillance to monitor the emergence of potential VOCs, integrating molecular and spatial data to identify dissemination patterns and enhance preparedness efforts.
AUTHORS: van Wyk S, Moir M, Banerjee A, Bazykin GA, Biswas NK, Sitharam N, Das S, Ma W, Maitra A, Mazumder A, Abdool Karim W, Lamarca AP, Li M, Nabieva E, Tegally H, San JE, Vasconcelos ATR, Xavier JS, Wilkinson E, de Oliveira T.
YEAR OF PUBLICATION: 2024
Abstract
Brazil, Russia, India, China, and South Africa (BRICS) are a group of developing countries with shared economic, healthcare, and scientific interests. These countries navigate multiple syndemics, and the COVID-19 pandemic placed severe strain on already burdened BRICS’ healthcare systems, hampering effective pandemic interventions. Genomic surveillance and molecular epidemiology remain indispensable tools for facilitating informed pandemic intervention. To evaluate how the pandemic unfolded across BRICS countries, we reviewed the BRICS pandemic epidemiological and genomic milestones, which included the first reported cases and deaths, and pharmaceutical and non-pharmaceutical interventions implemented in these countries. To assess the development of genomic surveillance capacity and efficiency over the pandemic, we analyzed the turnaround time from sample collection to data availability and the technologies used for genomic analysis. These data provided information on the laboratory capacities that enable the detection of emerging SARS-CoV-2 variants and highlighted their potential for monitoring other pathogens in ongoing public health efforts. Our analyses indicated that BRICS suffered >105.6M COVID-19 infections, resulting in >1.7M deaths. BRICS countries detected intricate genetic combinations of SARS-CoV-2 variants that fueled country-specific pandemic waves. BRICS’ genomic surveillance programs enabled the identification and characterization of the majority of globally circulating Variants of Concern (VOCs) and their descending lineages. Pandemic intervention strategies first implemented by BRICS countries included non-pharmaceutical interventions during the onset of the pandemic, such as nationwide lockdowns, quarantine procedures, the establishment of fever clinics, and mask mandates, which were emulated internationally. Vaccination rollout strategies complemented these measures, some representing the first of their kind. 
Improvements in BRICS sequencing and data generation turnaround time facilitated quicker detection of circulating and emerging variants, supported by investments in sequencing and bioinformatic infrastructure. Intra-BRICS cooperation contributed to the ongoing intervention in COVID-19 and other pandemics, enhancing collective capabilities in addressing these health challenges. The data generated continues to inform BRICS-centric pandemic intervention strategies and influences global health matters. The increased laboratory and bioinformatic capacity post-COVID-19 will support the detection of emerging pathogens.
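The turnaround-time analysis can be sketched as a short computation. For each genome, turnaround is the number of days from sample collection to public data availability, summarised here as a median per collection year. The record layout, the grouping by collection year, and the function names are hypothetical. The abstract does not state how turnaround was aggregated.

```python
from datetime import date
from statistics import median


def turnaround_days(records: list[tuple[date, date]]) -> list[int]:
    """Days from sample collection to public data availability for each genome."""
    return [(available - collected).days for collected, available in records]


def median_turnaround_by_year(records: list[tuple[date, date]]) -> dict[int, float]:
    """Median turnaround (days) grouped by the year the sample was collected."""
    by_year: dict[int, list[int]] = {}
    for collected, available in records:
        by_year.setdefault(collected.year, []).append((available - collected).days)
    return {year: median(days) for year, days in sorted(by_year.items())}


if __name__ == "__main__":
    demo = [
        (date(2020, 5, 1), date(2020, 6, 30)),
        (date(2020, 7, 1), date(2020, 8, 10)),
        (date(2021, 3, 1), date(2021, 3, 21)),
        (date(2021, 4, 1), date(2021, 4, 15)),
        (date(2021, 5, 1), date(2021, 5, 11)),
    ]
    print(median_turnaround_by_year(demo))
```

The median is used because turnaround distributions tend to have long right tails from delayed batch submissions.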
AUTHORS: Poongavanan J, Xavier J, Dunaiski M, Tegally H, Oladejo SO, Ayorinde O, Wilkinson E, Baxter C, de Oliveira T
YEAR OF PUBLICATION: 2023
Abstract
The Data Management and Analysis Core and Next Generation Sequencing Core will facilitate and support effective data management and analysis across the INFORM Africa Consortium. We will capture, and provide analysis support for, relevant, timely, accurate, and coherent data that can be interpreted and accessed by collaborators in multiple African countries, all collaborators in this Hub, and future research hub collaborators through pilot projects. Ultimately, our focus is to enable increased access to high-quality data and reproducible data analysis that can be used as tools to engage policy makers, with a view to better preparing for future pandemics.
Authors: Songhua Hu and Chenfeng Xiong
Year of Publication: 2022
Abstract
Location-based service (LBS) data are emerging data sources in the transportation domain that contain large-scale, fine-grained, near real-time information on population flow. However, few studies have built forecasting models based on population flow time series extracted from LBS data. This study introduces a deep learning framework, the Interpretable Hierarchical Transformer (IHTF), for high-dimensional multi-horizon population flow time series forecasting and interpretation. A variety of cutting-edge deep learning technologies are fused, including the gated residual network to control nonlinearity and to bypass irrelevant information, the variable selection network to assign canonical variable-wise weights, the recurrent positional encoding to learn temporal locality, and the transformer architecture to capture temporal seasonality and trend. Various exogenous variables are included, making the framework sensitive to socioeconomics, demographics, land development, weather conditions, and holidays. Different internal parameters, such as variable selection weights and temporal attention weights, are extracted to explain underlying patterns learned by the framework. Numerical experiments based on one-year nationwide county-level population flow time series show that: 1) IHTF outperforms extensive baseline models in model accuracy, yielding a symmetric mean absolute percentage error (SMAPE) ranging from 8.420% (1-day-ahead) to 11.178% (21-day-ahead). 2) Model performance varies substantially across counties. Large counties broadly show better performance in relative metrics but worse performance in absolute metrics. 3) The relative feature importance generated by IHTF is similar to that of tree-based models but more evenly distributed; point-of-interest (POI) count, county location, median household income, and percentage of accommodation and food services are the most important static variables. 
4) Attention weights demonstrate that IHTF can automatically learn trend and seasonality from raw time series. The framework can serve as a dynamic travel demand forecasting module in the transportation planning process. Outcomes can be fed into dynamic traffic assignment to obtain time-dependent link-level traffic conditions in future scenarios.
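SMAPE, the accuracy metric reported above, is shown below using its common definition: 100/n multiplied by the sum of |F − A| / ((|A| + |F|) / 2). SMAPE has several variants in the literature, so whether the study used exactly this form is an assumption. Pairs where both actual and forecast are zero contribute zero here by convention. The function name is hypothetical.

```python
def smape(actual: list[float], forecast: list[float]) -> float:
    """Symmetric mean absolute percentage error, in percent.

    Uses 100/n * sum(|F - A| / ((|A| + |F|) / 2)); a pair where both
    values are zero contributes 0 instead of dividing by zero.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom:
            total += abs(f - a) / denom
    return 100 * total / len(actual)


if __name__ == "__main__":
    # Two days of observed vs forecast county-level flows (illustrative numbers).
    print(round(smape([100, 200], [110, 180]), 3))
```

Computing this separately for each horizon (1-day-ahead, 2-day-ahead, and so on) gives horizon-specific errors of the kind quoted above. Because the denominator scales with flow volume, SMAPE is a relative metric, which fits the observation that large counties score better on relative metrics.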
Authors: Fati Murtala-Ibrahim; Jibreel Jumare; Manhattan Charurat; Chenfeng Xiong; Vivek Naranbhai; Patrick Dakum; Shirley Collie; Waasila Jassat; Gambo Aliyu; Ifedayo Adetifa; Alash’le Abimiku
Year of Publication: 2023
Abstract
Data science explores the use of big data to gain deeper insights and generate new knowledge and innovations, which can lead to economic growth and sustainable development. However, setting up data science research comes with challenges. How we engage stakeholders is a major factor that determines success. This Commentary highlights important considerations for stakeholder engagement based on the experiences of investigators in a data science for health discovery project underway in Nigeria and South Africa. The perspectives presented will guide implementation in this relatively new but rapidly growing research domain.
Authors: Weiyu Luo, Chenfeng Xiong, Jiajun Wan, Ziteng Feng, Olawole Ayorinde, Natalia Blanco, Man Charurat, Vivek Naranbhai, Christina Riley, Anna Winters, Fati Murtala-Ibrahim and Alash’le Abimiku
Year of Publication: 2023
Abstract
We employed emerging smartphone-based location data and produced daily human mobility measurements using Nigeria as an application site. A data-driven analytical framework was developed for rigorously producing such measures using proven location intelligence and data-mining algorithms. Our study demonstrates the framework at the beginning of the SARS-CoV-2 pandemic and successfully quantifies human mobility patterns and trends in response to the unprecedented public health event. Another highlight of the paper is the assessment of the effectiveness of mobility-restricting policies as key lessons learned from the pandemic. We found that travel bans and federal lockdown policies failed to restrict trip-making behaviour, but had a significant impact on distance travelled. This paper contributes a first attempt to quantify daily human travel behaviour, such as trip-making behaviour and travelling distances, and how mobility-restricting policies took effect in sub-Saharan Africa during the pandemic. This study has the potential to enable a wide spectrum of quantitative studies on human mobility and health in sub-Saharan Africa using well-controlled, publicly available large data sets.
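The daily trip-making and distance measures can be sketched under simplifying assumptions. Stay points (where a device paused) are assumed to have already been detected from raw location pings. Each move between consecutive stay points on the same day counts as one trip, with the great-circle distance between them as trip length. The stay-point tuple layout and the function names are hypothetical, and the published framework's algorithms are likely more involved.

```python
import math
from collections import defaultdict
from datetime import date, datetime


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def daily_mobility(stays: list[tuple[str, datetime, float, float]]) -> dict[tuple[str, date], tuple[int, float]]:
    """stays: (device_id, arrival_time, lat, lon) stay points.

    Returns {(device_id, day): (trip_count, distance_km)}.
    """
    by_key: dict[tuple[str, date], list[tuple[datetime, float, float]]] = defaultdict(list)
    for device, arrived, lat, lon in stays:
        by_key[(device, arrived.date())].append((arrived, lat, lon))
    result = {}
    for key, points in by_key.items():
        points.sort()  # chronological order within the device-day
        trips, km = 0, 0.0
        for (_, la1, lo1), (_, la2, lo2) in zip(points, points[1:]):
            trips += 1
            km += haversine_km(la1, lo1, la2, lo2)
        result[key] = (trips, km)
    return result


if __name__ == "__main__":
    demo = [
        ("a", datetime(2020, 3, 1, 8), 6.5, 3.4),
        ("a", datetime(2020, 3, 1, 12), 6.5, 3.5),
        ("a", datetime(2020, 3, 1, 18), 6.5, 3.4),
    ]
    print(daily_mobility(demo))
```

Trip counts and distance are kept as separate measures. This separation is what allows a policy to leave trip-making unchanged while shortening distance travelled, as reported above.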
Authors: Daniel J. van Zyl, Marcel Dunaiski, Houriiyah Tegally, Cheryl Baxter, Tulio de Oliveira, Joicymara S. Xavier & The INFORM Africa research study group
Year of Publication: 2024
Abstract
The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-based methods, such as BLAST, are increasingly overwhelmed by the scale of contemporary datasets due to their high computational demands for classification. This study evaluates alignment-free (AF) methods as scalable and rapid alternatives for viral sequence classification, focusing on identifying techniques that maintain high accuracy and efficiency when applied to extremely large datasets. We employed six established AF techniques to extract feature vectors from viral genomes, which were subsequently used to train Random Forest classifiers. Our primary dataset comprises 297,186 SARS-CoV-2 nucleotide sequences, categorized into 3502 distinct lineages. Furthermore, we validated our models using dengue and HIV sequences to demonstrate robustness across different viral datasets. Our AF classifiers achieved 97.8% accuracy on the SARS-CoV-2 test set, and 99.8% and 89.1% accuracy on the dengue and HIV test sets, respectively. Despite the large number of classes, we show that word-based AF methods effectively represent viral sequences. Our study highlights the practical advantages of AF techniques, including significantly faster processing compared to alignment-based methods and the ability to classify sequences using modest computational resources.
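A word-based alignment-free representation can be illustrated with k-mer frequency vectors. Each sequence becomes a fixed-length vector of normalised counts over all 4^k words, so no alignment is needed. To keep the sketch dependency-free, a nearest-centroid classifier with cosine similarity is swapped in for the Random Forest used in the study. The choice of k, the toy labels, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Normalised frequencies of all 4**k ACGT k-mers; k-mers with other symbols are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero
    return [counts[km] / total for km in kmers]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(labelled: list[tuple[str, str]], k: int = 3) -> dict[str, list[float]]:
    """Average k-mer vector per label (stand-in for training a Random Forest)."""
    groups: dict[str, list[list[float]]] = defaultdict(list)
    for seq, label in labelled:
        groups[label].append(kmer_vector(seq.upper(), k))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)] for label, vecs in groups.items()}


def classify(seq: str, centroids: dict[str, list[float]], k: int = 3) -> str:
    vec = kmer_vector(seq.upper(), k)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    model = train_centroids([("ATATATATATAT", "lineage_A"), ("GCGCGCGCGCGC", "lineage_B")])
    print(classify("ATATATTATA", model))
```

The feature length depends only on k, not on genome length. This is why such methods scale to hundreds of thousands of sequences; a larger k captures more specific signal at the cost of sparser vectors.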
Authors: van Zyl DJ, Dunaiski M, Tegally H, Baxter C; INFORM Africa research study group; de Oliveira T, Xavier JS.
Year of Publication: 2025
Abstract
The dengue virus poses a major global health threat, with nearly 390 million infections annually. A recently proposed hierarchical dengue nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. While current subtyping tools (Genome Detective, GLUE, and Nextclade) rely on computationally intensive sequence alignment and phylogenetic inference, machine learning presents a promising alternative for achieving accurate and rapid classification. We present Craft (Chaos Random Forest), a machine learning framework for dengue subtyping. We demonstrate that Craft achieves faster classification while matching or surpassing the accuracy of existing tools. Craft achieves 99.5% accuracy on a hold-out test set and processes over 140,000 sequences per minute. Notably, Craft maintains high accuracy even when classifying sequence segments as short as 700 nucleotides.
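The name "Chaos Random Forest" suggests chaos game representation (CGR) features. The abstract does not describe Craft's featurization, however, so what follows is an assumption and not Craft's code. The sketch computes a generic frequency CGR (FCGR): a 2^k by 2^k grid of k-mer counts that a Random Forest could take as a flattened feature vector. The corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) is one common convention, and the function name is hypothetical.

```python
# Corner of the unit square assigned to each nucleotide (a common CGR convention).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int = 4) -> list[list[int]]:
    """Frequency chaos game representation: a 2**k x 2**k grid of k-mer counts.

    Each CGR step halves the distance to a base's corner, so after playing a
    k-mer the point lies in a cell whose x/y bits are the corner bits of the
    bases, with the last base as the most significant bit.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers with ambiguous bases such as N
        x = y = 0
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x |= cx << j
            y |= cy << j
        grid[y][x] += 1  # row = y cell index, column = x cell index
    return grid


if __name__ == "__main__":
    features = [count for row in fcgr("ACGTACGTNNACGT", k=2) for count in row]
    print(features)  # flattened vector suitable as classifier input
```

Because each k-mer lands in a fixed cell regardless of where it occurs, a short fragment still yields a usable, if sparser, grid. This is consistent with classification of segments as short as 700 nucleotides.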
Authors: Danilo Silva, Monika Moir, Marcel Dunaiski, Natalia Blanco, Fati Murtala-Ibrahim, Cheryl Baxter, Tulio de Oliveira, Joicymara S Xavier, The INFORM Africa research study group
Year of Publication: 2025
Abstract
In a world where data drive effective decision-making, bioinformatics and health science researchers often encounter difficulties managing data efficiently. In these fields, data are typically diverse in format and subject. Consequently, challenges in storing, tracking, and responsibly sharing valuable data have become increasingly evident over the past decades. To address these complexities, some approaches have leveraged standard strategies, such as using non-relational databases and data warehouses. However, these approaches often fall short in providing the flexibility and scalability required for complex projects. While the data lake paradigm has emerged to offer flexibility and handle large volumes of diverse data, it lacks robust data governance and organization. The data lakehouse is a new paradigm that combines the flexibility of a data lake with the governance of a data warehouse, offering a promising solution for managing heterogeneous data in bioinformatics. However, the lakehouse model remains largely unexplored in bioinformatics, with limited discussion in the current literature. In this study, we review strategies and tools for developing a data lakehouse infrastructure tailored to bioinformatics research. We summarize key concepts and assess available open-source and commercial solutions for managing data in bioinformatics.
Authors: Thomas J Y Kono, Ezenwa J Onyemata, Natalia Blanco, Chika K Onwuamah, Nnaemeka Ndodo, Paul Oluniyi, Olanrewaju Lawal, Christina Riley, Sophia Osawe, Cheryl Baxter, Anna Winters, Chenfeng Xiong, Christian T Happi, Babatunde L Salako, Ifedayo Adetifa, Alash’le Abimiku, Manhattan Charurat, Kristen A Stafford; INFORM Africa Research Study Group
Year of Publication: 2025
Abstract
As Nigeria has the sixth-largest population in the world and a significant volume of inbound and outbound travel, characterizing SARS-CoV-2 genomic diversity across the country is critical for understanding novel pandemic dynamics. We describe the genomic diversity of SARS-CoV-2 in Nigeria throughout the COVID-19 pandemic and examine the coverage of Nigeria’s genomic surveillance system. Genome sequences and sample metadata were downloaded from the GISAID repository. A beta regression was used to test for a relationship between time and the proportion of fully resolved nucleotides, used as a proxy for data quality. Sample and sequencing sources were compared to assess geographic coverage. A total of 7759 SARS-CoV-2 genome sequences collected from February 2020 to March 2023 were included. Most were collected in 2021 (76.6%) and in the South West (43%). Eleven states (30%) reported 10 or fewer SARS-CoV-2 genomes across the entire period. The genome sequences submitted to GISAID from Nigeria were of high quality, with very few unresolved nucleotides. Waves 4 and 5, dominated by Omicron lineages, showed higher diversity around genome position 23 kb than the other waves. Overall, the Nigeria Centre for Disease Control (NCDC) and state-run hospitals were the largest contributors to sample collection during the study period. However, collection efforts shifted over time from NCDC in waves 1–3 to regional hospitals and other healthcare facilities in waves 4–5, although this pattern varied by geopolitical zone (GPZ). Sequencing efforts also shifted from research laboratories during the first waves to NCDC during waves 4 and 5. The findings suggest the need for a coordinated sequencing strategy and standardized protocols to improve genomic surveillance during future outbreaks of existing and novel pathogens.
A network of sequencing laboratories that includes at least one in each GPZ, linked to and coordinated by the national reference laboratory at NCDC, might provide more balanced coverage for future pandemics and pathogen surveillance.
Authors: Luo W, Wu X, Li R, Fitzpatrick M, Charurat M, Blanco N, Stafford KA, Naranbhai V, Abimiku A, Winters A; INFORM Africa Research Study Group for D-SI Africa Consortium; Xiong C.
Year of Publication: 2025
Abstract
We evaluated the dynamic impacts of three types of human mobility (provincial inflows, cross-district flows, and within-district flows) on daily reported COVID-19 cases in 2020. Using a structural equation modeling approach, we conducted regressions on dynamic panel datasets. Our findings indicate that these three types of mobility influenced daily new COVID-19 case numbers in distinct and sometimes overlapping ways during the early stages of the epidemic. Within-district flows played a particularly significant role in increasing cases during the spreading stage. During the epidemic stage, we observed a sustained but gradually declining impact of within-district mobility on daily new cases, potentially reflecting the effectiveness of non-pharmaceutical interventions (NPIs). In addition, signs of social distancing fatigue were evident. Our model further shows that the first and most stringent lockdown significantly curtailed human mobility, whereas the second, less restrictive lockdown had a negligible impact on human mobility.
Authors: Houriiyah Tegally, James E. San, Matthew Cotton, Monika Moir, Bryan Tegomoh, Gerald Mboowa, Darren P. Martin, Cheryl Baxter, Arnold W. Lambisia et al.
Year of Publication: 2022
Abstract
In many regions of the world, the Alpha, Beta and Gamma SARS-CoV-2 Variants of Concern (VOCs) co-circulated during 2020–21 and fueled waves of infections. During 2021, these variants were almost completely displaced by the Delta variant, causing a third wave of infections worldwide. This phenomenon of global viral lineage displacement was observed again in late 2021, when the Omicron variant disseminated globally. In this study, we use phylogenetic and phylogeographic methods to reconstruct the dispersal patterns of SARS-CoV-2 VOCs worldwide. We find that the source-sink dynamics of SARS-CoV-2 varied substantially by VOC, and we identify countries that acted as global hubs of variant dissemination, while other countries became regional contributors to the export of specific variants. We demonstrate a declining role of the presumed countries of origin of VOCs in their global dispersal: we estimate that India contributed <15% of all global exports of Delta to other countries and South Africa <1–2% of all global Omicron exports. We further estimate that >80 countries had received introductions of Omicron BA.1 within 100 days of its inferred date of emergence, compared with just over 25 countries for the Alpha variant. This increased speed of global dissemination was associated with a rebound in air travel volume prior to Omicron's emergence, in addition to the higher transmissibility of Omicron relative to Alpha. Our study highlights the importance of global and regional hubs in VOC dispersal and the speed at which highly transmissible variants disseminate through these hubs, even before their detection and characterization through genomic surveillance.
Authors: Houriiyah Tegally, James E. San, Matthew Cotton, Monika Moir, Bryan Tegomoh, Gerald Mboowa, Darren P. Martin, Cheryl Baxter, Arnold W. Lambisia et al.
Year of Publication: 2022
Abstract
Three lineages (BA.1, BA.2 and BA.3) of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron variant of concern predominantly drove South Africa’s fourth Coronavirus Disease 2019 (COVID-19) wave. We have now identified two new lineages, BA.4 and BA.5, responsible for a fifth wave of infections. The spike proteins of BA.4 and BA.5 are identical, and similar to BA.2 except for the addition of the 69–70 deletion (present in the Alpha variant and the BA.1 lineage), L452R (present in the Delta variant), F486V and the wild-type amino acid at Q493. The two lineages differ only outside of the spike region. The 69–70 deletion in spike allows these lineages to be identified by the proxy marker of S-gene target failure, against a background of variants lacking this feature. BA.4 and BA.5 rapidly replaced BA.2, reaching more than 50% of sequenced cases in South Africa by the first week of April 2022. Using a multinomial logistic regression model, we estimated growth advantages for BA.4 and BA.5 of 0.08 (95% confidence interval (CI): 0.08–0.09) and 0.10 (95% CI: 0.09–0.11) per day, respectively, over BA.2 in South Africa. The continued discovery of genetically diverse Omicron lineages points to the hypothesis that a discrete reservoir, such as human chronic infections and/or animal hosts, is potentially contributing to further evolution and dispersal of the virus.
Authors: Prof Keymanthri Moodley, MBChB, Nezerith Cengiz, MSc Med, Aneeka Domingo, MBChB, Gonasagrie Nair, MBChB, Adetayo Emmanuel Obasa, PhD, Richard John Lessells, MBChB, et al.
Year of Publication: 2022
Abstract
Data sharing in research is fraught with controversy. Academic success is premised on competitive advantage, with research teams protecting their findings until publication. Research funders, by contrast, often require data sharing. Beyond traditional research and funding requirements, surveillance data have become contentious. Public health emergencies involving pathogens require intense genomic surveillance efforts and call for the rapid sharing of data in the public interest. Under these circumstances, timely sharing of data becomes a matter of scientific integrity. During the COVID-19 pandemic, the transformative potential of genomic pathogen data sharing became obvious and advanced the debate on data sharing. However, when the genomic sequencing data of the omicron (B.1.1.529) variant were shared and announced by scientists in southern Africa, various challenges arose, including travel bans. The scientific, economic, and moral impact was catastrophic. Yet travel restrictions failed to mitigate the spread of the variant, which was already present in countries outside Africa. Public perceptions of the negative effects of data sharing are detrimental to the willingness of research participants to consent to sharing data in postpandemic research and future pandemics. Global health governance organisations have an important role in developing guidance on responsible sharing of genomic pathogen data in public health emergencies.
Authors: Joicymara S. Xavier, Monika Moir, Houriiyah Tegally, Nikita Sitharam, Wasim Abdool Karim, James E. San, Joana Linhares, Eduan Wilkinson, David B. Ascher, Cheryl Baxter, Douglas E. V. Pires & Tulio de Oliveira
Year of Publication: 2023
Abstract
The SARS-CoV-2 Africa dashboard is an interactive tool that enables visualization of SARS-CoV-2 genomic information in African countries. The customizable app allows users to visualize the number of sequences deposited in each country, and the variants circulating over time. Our dashboard enables near real-time exploration of public data that can inform policymakers, healthcare professionals and the public about the ongoing pandemic.
Authors: Houriiyah Tegally, MSc, Kamran Khan, MD, Carmen Huber, MSA, Tulio de Oliveira, PhD, Moritz U G Kraemer, DPhil
Year of Publication: 2023
Abstract
Human mobility changed in unprecedented ways during the SARS-CoV-2 pandemic. In March and April 2020, when lockdowns and large travel restrictions began in most countries, global air travel almost entirely halted (a 92% decrease in commercial global air travel between February and April 2020). Global air travel began to recover around July 2020, and its volume subsequently nearly tripled between May and July 2021. Here, we aim to establish a preliminary link between global mobility patterns and the synchrony of SARS-CoV-2 epidemic waves across the world.
We compare epidemic peaks and human global mobility in two time periods: November 2020 to February 2021 (when just over 70 million passengers travelled) and November 2021 to February 2022 (when more than 200 million passengers travelled). We calculate the time interval during which continental epidemic peaks occurred for both of these time periods, and we calculate the pairwise correlations of epidemic waves between all pairs of countries for the same time periods.
We find that as air travel increased at the end of 2021, epidemic peaks around the world became more synchronous with one another, both globally and regionally. Continental epidemic peaks occurred globally within a 20-day interval at the end of 2021, compared with 73 days at the end of 2020, and epidemic waves globally were more correlated with one another at the end of 2021.
This suggests that the rebound in human mobility dictates the synchrony of global and regional epidemic waves. In line with theoretical work, we show that in a more connected world, epidemic dynamics are more synchronized.
Authors: Houriiyah Tegally, Eduan Wilkinson, Darren Martin, Monika Moir, Anderson Brito, Marta Giovanetti, Kamran Khan, Carmen Huber, Isaac I. Bogoch, James Emmanuel San, Joseph L.-H. Tsui, Jenicca Poongavanan, Joicymara S. Xavier, Darlan da S. Candido, Filipe Romero, Cheryl Baxter, Oliver G. Pybus, Richard Lessells, Nuno R. Faria, Moritz U.G. Kraemer, Tulio de Oliveira
Year of Publication: 2022
Abstract
In many regions of the world, the Alpha, Beta and Gamma SARS-CoV-2 Variants of Concern (VOCs) co-circulated during 2020–21 and fueled waves of infections. During 2021, these variants were almost completely displaced by the Delta variant, causing a third wave of infections worldwide. This phenomenon of global viral lineage displacement was observed again in late 2021, when the Omicron variant disseminated globally. In this study, we use phylogenetic and phylogeographic methods to reconstruct the dispersal patterns of SARS-CoV-2 VOCs worldwide. We find that the source-sink dynamics of SARS-CoV-2 varied substantially by VOC, and we identify countries that acted as global hubs of variant dissemination, while other countries became regional contributors to the export of specific variants. We demonstrate a declining role of the presumed countries of origin of VOCs in their global dispersal: we estimate that India contributed <15% of all global exports of Delta to other countries and South Africa <1–2% of all global Omicron exports. We further estimate that >80 countries had received introductions of Omicron BA.1 within 100 days of its inferred date of emergence, compared with just over 25 countries for the Alpha variant.
©2026. INFORM Africa. All Rights Reserved
