Harvesting Social Media + LLMs to Understand the Challenges of Women in Livestock Farming in Sub-Saharan Africa

Our latest research, published on VeriXiv, is a feasibility study that explores whether social media listening and large language models (LLMs) can be used to understand the challenges faced by women in livestock farming across Sub-Saharan Africa.
The short answer is yes: the approach works and provides valuable insight, but with clear caveats around representation and bias.
What We Set Out to Do
Women in livestock farming across Sub-Saharan Africa (SSA) face structural challenges that are often under-documented. Traditional surveys and qualitative research capture a portion of this experience, but they miss the evolving, daily discourse that is happening online. To fill this gap, we asked: Can social media listening, combined with large language models, surface real-world, unfiltered insight into the challenges faced by women in livestock production?
How We Did It
- We collected around 84,000 social media posts from platforms across SSA. After cleaning and relevancy filtering, we retained 8,048 posts for analysis.
- We anonymised identifiers (usernames, images) to preserve privacy and comply with ethical standards.
- We then used GPT-4 in a zero-shot classification approach (without task-specific training) to assign posts to themes, supported by manual validation of sample posts.
- We complemented automated coding with thematic analysis (TA) to distil key themes, tensions, and narratives.
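To make the steps above concrete, here is a minimal Python sketch of how anonymisation, zero-shot prompting, and manual validation could fit together. It is an illustration under stated assumptions, not the study's actual pipeline: the `THEMES` label set, the `[USER]` and `[LINK]` placeholders, the prompt wording, and the `ask_model` callable (a stand-in for whatever client sends a prompt to GPT-4 and returns its reply) are all hypothetical.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical label set, loosely based on the themes reported in this post.
THEMES = [
    "marginalisation of women's role",
    "structural and resource barriers",
    "disease management and veterinary services",
    "digital divide and representation",
]

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def anonymise(text: str) -> str:
    """Replace links and @-mentions with neutral placeholders."""
    text = URL.sub("[LINK]", text)
    return MENTION.sub("[USER]", text)


def build_prompt(post: str) -> str:
    """Zero-shot prompt: only the label list and the post, no worked examples."""
    options = "\n".join(f"{i}. {theme}" for i, theme in enumerate(THEMES, 1))
    return (
        "Assign the post to exactly one theme. Reply with the number only.\n"
        f"Themes:\n{options}\n\nPost: {anonymise(post)}"
    )


def parse_label(reply: str) -> Optional[str]:
    """Map a reply such as '2' or '2.' to a theme name; None if unusable."""
    match = re.search(r"\d+", reply)
    if match and 1 <= int(match.group()) <= len(THEMES):
        return THEMES[int(match.group()) - 1]
    return None


def classify(post: str, ask_model: Callable[[str], str]) -> Optional[str]:
    """ask_model is an assumed callable that sends a prompt to the LLM."""
    return parse_label(ask_model(build_prompt(post)))


def agreement(model_labels: Sequence, manual_labels: Sequence) -> float:
    """Share of manually checked posts where model and human labels match."""
    pairs = list(zip(model_labels, manual_labels))
    return sum(m == h for m, h in pairs) / len(pairs) if pairs else 0.0
```

Passing the model call in as a function keeps the sketch independent of any particular API client and makes it easy to test with a stub. Simple agreement is only one way to summarise a manual validation sample; a chance-corrected measure such as Cohen's kappa would be a reasonable alternative.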
Key Findings
Four major thematic areas emerged:
- Marginalisation of women’s role in livestock farming
  Women’s contributions are often downplayed or displaced by male counterparts in public forums and decision-making.
- Structural and resource barriers
  Limited access to finance, land, credit systems, and veterinary services surfaced repeatedly.
- Disease management and veterinary service challenges
  Posts revealed gaps in animal health support, delayed interventions, reliance on informal remedies, and difficulties in accessing veterinary extension services.
- Digital divide and voice representation
  The bulk of discourse came from English-speaking regions and better-connected areas, underrepresenting francophone or conflict-affected zones.
We also identified grassroots adaptation strategies including peer learning, community networks, and local knowledge exchanges that are often invisible to formal studies.
Why It Matters
- Our approach offers a scalable, cost-effective method to access lived experiences in contexts where traditional surveys are expensive or logistically difficult.
- It reveals emergent, real-time challenges (such as disease outbreaks and market pressures) as they are discussed by people themselves rather than mediated through formal instruments.
- It highlights biases in digital representation. What we see online is shaped by connectivity, language, and platform access.
This methodology does not replace traditional research but augments it, especially in under-resourced, dynamic settings.
Our Role and What’s Next
At SurreyDataHub, we are committed to pushing the boundaries of data-driven insight in human and animal health. This project builds on our prior work in social media listening and extends it into the intersection of gender, agriculture, and livestock health.
Next steps:
- Expand language coverage (e.g. French, Portuguese) to reduce regional bias.
- Integrate metadata (e.g. geolocation) to map patterns of challenge over time and place.