
Wikimedia Announced A New AI-Accessible Data System
Wikipedia’s operator, the Wikimedia Foundation, announced on Wednesday a new AI-friendly database called the Wikidata Embedding Project, designed to make its information easier for AI systems to process.
In a rush? Here are the quick facts:
- The Wikimedia Foundation announced a new AI-friendly database called the Wikidata Embedding Project.
- The vector-based system has been designed to help smaller AI organizations and open-source projects create AI-driven applications with accurate information.
- The project also supports the Model Context Protocol (MCP), so AI agents can query the database autonomously.
According to The Verge, the nonprofit launched this initiative as an extension of its Wikipedia Embedding Project, originally developed by its German branch to enhance search functionality by integrating vector-based semantic search.
Wikimedia Deutschland—which manages the collaborative database Wikidata—spent a year developing the system with the help of a large language model (LLM). The effort transformed 19 million data entries into vectors that capture context and meaning within Wikidata.
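To give a rough sense of what vector-based semantic search means in practice, here is a minimal Python sketch. It is not the project's code: the three-dimensional vectors, the entry labels, and the `search` helper are all made up for illustration, whereas a real system relies on a learned embedding model that produces vectors with hundreds of dimensions. The idea is only that entries with similar meaning end up close together, so a query can be matched by similarity rather than by exact keywords.

```python
import math

# Toy 3-dimensional "embeddings". These numbers are invented for illustration;
# a real embedding model would produce much longer vectors.
ENTRIES = {
    "Douglas Adams (English writer)": [0.9, 0.1, 0.2],
    "Berlin (capital of Germany)": [0.1, 0.9, 0.3],
    "The Hitchhiker's Guide to the Galaxy (novel)": [0.8, 0.2, 0.4],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, entries, top_k=2):
    """Rank entries by similarity to the query vector and return the top labels."""
    scored = [
        (cosine_similarity(query_vector, vector), label)
        for label, vector in entries.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:top_k]]


# Hypothetical query vector, standing in for an embedded question
# such as "science fiction author".
query = [0.9, 0.1, 0.25]
results = search(query, ENTRIES)
print(results)
```

In this toy data, the two entries related to Douglas Adams rank above the unrelated Berlin entry, even though the query shares no keywords with either label.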
Lydia Pintscher, Wikidata portfolio lead, told The Verge that the vectorized format structures information like a graph of interconnected lines and dots, making it easier for AI developers to access. While large AI companies such as Anthropic and OpenAI have the resources to vectorize Wikidata themselves, this new project is aimed at smaller organizations.
“Really, for me, it’s about giving them that edge up and to at least give them a chance, right?” said Pintscher. She added that she hopes more projects, such as Govdirectory—a fact-checked, crowdsourced directory of official government services and social media accounts—will use Wikidata for the public good.
“Beyond improving search for Wikidata, this project also encourages the open-source AI/ML community to build innovative solutions on top of a structured and publicly accessible knowledge graph,” states the Wikidata page. “By making the tools and data open-source, the project empowers developers to create new AI-driven applications that leverage Wikidata’s inclusive and accessible knowledge.”
The team expects the project to help niche topics gain better representation across the internet. The project also supports the Model Context Protocol (MCP), an initiative launched by Anthropic and joined by multiple companies and organizations, so that AI agents can query the database autonomously. To build the system, the team used an AI model from Jina AI, and the resulting vector database is stored for free with IBM’s DataStax.
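For readers curious about what an MCP query looks like on the wire, the sketch below builds the kind of message an AI agent would send. MCP is based on JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `search_items` and its `query` and `limit` arguments are hypothetical placeholders chosen for this example; the article does not describe the tools the Wikidata server actually exposes.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_items", "query" and "limit" are assumed names for illustration only.
request = build_tool_call(
    1,
    "search_items",
    {"query": "official government social media accounts", "limit": 5},
)
print(json.dumps(request, indent=2))
```

An agent would send this message over an established MCP connection and receive the tool's result in a JSON-RPC response carrying the same `id`.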
Wikipedia has previously raised concerns about AI-generated errors and hallucinations. By giving AI systems easier access to Wikidata’s structured information, the new project could help reduce hallucinations and preserve knowledge.