Semantic Search Is Changing How We Find Development Data

Try this experiment. Go to any major development data portal – IATI Datastore, the World Bank’s project database, a UN agency document library – and search for « community resilience programs in the Sahel. »

You’ll get results. Some will be relevant. Many won’t. And you’ll almost certainly miss documents that use different terminology for the same concept: « renforcement de la résilience communautaire, » « community-based adaptation, » « pastoral risk management, » or « social protection in fragile contexts. »

Same concept. Different words. Traditional keyword search treats them as completely unrelated queries.

How keyword search fails the development sector

Most development databases still rely on keyword matching – the same technology that powered search engines in the early 2000s. Type a word, get documents that contain that exact word (or a close variant). It works fine when you know exactly what you’re looking for and you know what it’s called.

In international development, both of those conditions fail regularly.

First, the multilingual problem. Development documents exist in English, French, Spanish, Arabic, Portuguese, and dozens of other languages. A keyword search in English will generally miss French-language documents, even when they describe identical programs. For consultants working in the Sahel – where French and English coexist as working languages – this means a large share of the relevant knowledge base is invisible.

Second, the vocabulary problem. Every donor, agency, and evaluation framework uses slightly different terminology. What the World Bank calls « social safety nets, » the EU calls « social protection floors, » and a Senegalese government document might call « filets sociaux. » These describe largely the same thing. Keyword search doesn’t know that.

Third, the concept problem. Sometimes you’re not looking for a specific term at all. You want to find projects that addressed a particular type of challenge, or evaluations that measured a particular kind of outcome. You’re searching for meaning, not words.
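The three problems above can be reproduced with a toy keyword matcher. This is a minimal sketch, not how any particular portal is implemented: the `keyword_search` function and the document titles are invented for illustration.

```python
import re

def keyword_search(query, documents):
    """Return documents that contain every query word (case-insensitive, whole words)."""
    terms = re.findall(r"\w+", query.lower())
    hits = []
    for doc in documents:
        words = set(re.findall(r"\w+", doc.lower()))
        if all(term in words for term in terms):
            hits.append(doc)
    return hits

# Hypothetical document titles, invented for this example.
documents = [
    "Community resilience programs in the Sahel: mid-term review",
    "Renforcement de la résilience communautaire au Sahel",
    "Community-based adaptation for pastoral households",
]

results = keyword_search("community resilience", documents)
print(results)
```

Only the first title is returned. The French title and the « community-based adaptation » title describe closely related work, but neither contains both query words, so the matcher never sees them.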

What semantic search actually does

Semantic search works differently. Instead of matching character strings, it converts text into numerical vectors – called « embeddings » – that capture meaning. Two sentences that mean the same thing but use completely different words will tend to have embeddings that sit close together. Two sentences that share words but mean different things will tend to have embeddings that sit far apart.
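A common way to measure how close two embeddings are is cosine similarity. The sketch below assumes that measure and uses three-dimensional vectors invented by hand. Real embedding models produce vectors with hundreds or thousands of dimensions, and the numbers here come from no actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only (not output from any model).
embeddings = {
    "social safety nets": [0.9, 0.1, 0.2],
    "filets sociaux": [0.85, 0.15, 0.25],
    "safety nets for fishing vessels": [0.1, 0.9, 0.3],
}

same_meaning = cosine_similarity(
    embeddings["social safety nets"], embeddings["filets sociaux"]
)
shared_words = cosine_similarity(
    embeddings["social safety nets"], embeddings["safety nets for fishing vessels"]
)
print(same_meaning > shared_words)
```

In this toy setup the English and French phrases score much closer to each other than the two English phrases that merely share the words « safety nets. »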

In practical terms, this means you can search a database of development documents in English and find relevant results in French. You can search for « drought response » and find documents about « early warning systems for food insecurity » – because the system understands they’re related concepts.

The technology behind this has matured rapidly. Models like Voyage AI’s multilingual embeddings can represent text in over 100 languages in the same mathematical space. Combined with vector databases (like pgvector, running on PostgreSQL), this creates search systems that are both powerful and practical to deploy.
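At its core, the vector-database step is a nearest-neighbour lookup: rank stored vectors by their distance to the query vector and keep the top few. The sketch below does this in plain Python over an in-memory list. The document IDs and vectors are hypothetical, and the SQL in the comment uses table and column names assumed for illustration.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity: smaller values mean more similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query_vec, rows, k=2):
    """Return the ids of the k rows closest to query_vec.

    rows is a list of (doc_id, vector) pairs. With pgvector, a comparable
    query would look roughly like (hypothetical table and column names):
        SELECT id FROM documents ORDER BY embedding <=> %s LIMIT 2;
    where <=> is pgvector's cosine-distance operator.
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical stored documents with toy vectors.
rows = [
    ("wb-eval-en", [0.8, 0.2, 0.1]),
    ("rapport-fr", [0.75, 0.3, 0.1]),
    ("procurement-notice", [0.1, 0.1, 0.95]),
]

top = nearest([0.9, 0.2, 0.1], rows, k=2)
print(top)
```

A linear scan like this is fine for a few thousand documents. A vector database adds indexing so that the same lookup stays fast over much larger collections.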

RAG: search meets intelligence

Semantic search becomes even more powerful when combined with Retrieval-Augmented Generation, or RAG. In a RAG system, a user’s question first triggers a semantic search to find the most relevant documents. Those documents are then fed to a language model, which synthesizes an answer grounded in the actual source material.

Instead of getting a list of 200 documents to sort through, you get a direct answer – with citations pointing back to the original sources you can verify.
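The hand-off from retrieval to generation can be sketched as a prompt-building step. This is a generic illustration, not any product’s actual prompt: the `build_rag_prompt` function, the instruction wording, and the example sources are all assumptions, and the call to a language model is deliberately left out.

```python
def build_rag_prompt(question, retrieved):
    """Assemble a grounded prompt from retrieved passages.

    retrieved is a list of dicts with 'source' and 'text' keys, assumed to be
    already ranked by a semantic search step.
    """
    context_lines = []
    for number, doc in enumerate(retrieved, start=1):
        # Number each passage so the model can cite it as [1], [2], ...
        context_lines.append(f"[{number}] {doc['source']}: {doc['text']}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number, and say so if they do not contain the answer.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved passages, invented for this example.
retrieved = [
    {"source": "Evaluation A", "text": "Local management committees improved upkeep."},
    {"source": "Rapport B", "text": "Les comités locaux ont renforcé le suivi."},
]

prompt = build_rag_prompt("What worked in community-based management?", retrieved)
print(prompt)
```

Numbering the sources in the prompt is what makes the citations checkable: each bracketed number in the model’s answer maps back to one retrieved document.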

For a development consultant preparing a project proposal, this changes the research phase dramatically. Instead of spending two days reading through World Bank project documents, you can ask: « What were the key lessons learned from community-based natural resource management projects in Mauritania between 2018 and 2024? » and get a synthesized answer in seconds, with links to the source evaluations.

How ICOpedia uses this technology

ICOpedia’s document intelligence layer is built on exactly this stack: multilingual embeddings (Voyage AI), vector storage (pgvector on Supabase), and RAG-powered synthesis (Claude API). The system ingests documents from IATI, donor portals, and uploaded institutional reports, converts them into searchable embeddings, and makes them queryable through a natural-language interface.
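Before long reports can be converted into embeddings, they are usually split into smaller passages. The article does not say how ICOpedia does this, so the sketch below shows one common generic approach, fixed-size word windows with overlap. The function name and the default sizes are assumptions chosen for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks of at most chunk_size words.

    Overlap keeps a sentence that straddles a boundary visible in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Tiny example with small sizes so the overlap is easy to see.
sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
```

Each chunk would then be embedded and stored alongside a pointer to its source document, so that a search hit can be traced back to the original report.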

The result: a development professional in Nouakchott can search in French and find English-language World Bank evaluations. A consultant in Dakar can ask a conceptual question and get answers drawn from across the entire document corpus – not just the documents that happened to use the right keywords.

This isn’t a marginal improvement. It’s the difference between reaching only the fraction of the sector’s accumulated knowledge that happens to match your keywords and reaching the rest of it as well.

What comes next

Semantic search in the development sector is still in its early days. Most major platforms haven’t adopted it yet. The organizations and tools that move first will have a significant advantage – not just in search quality, but in the depth of insight they can extract from existing data.

The knowledge already exists. The technology to unlock it is here. The only question is how fast the sector catches up.
