COUNTRY FOCUS: GERMANY
“Although only available in English for now, DataStax’s solution provided a valuable initial experiment – 10x faster than our previous, on-premise GPU solution. This near-real-time speed will permit us to experiment at scale and speed by testing the integration of large subsets in a vector database aligned with the frequent updates of Wikidata.”
Developer efficiency is also key to Wikimedia Deutschland, as Wikidata is one of the world’s largest open-source knowledge graphs. With the DataStax AI Platform on AWS, it was possible to ingest, process and vector-embed over 10 million entries in under three days. The vectorised data remains available under the free CC0 licence.
Vectorising such an extensive dataset is highly complex, as each document requires resource-intensive embedding processes to support real-time search and accessibility. Traditional linear read/write operations cannot keep pace with the scale and speed Wikimedia Deutschland needs to make hundreds of thousands of daily updates by the global community instantly accessible to millions of users.
As the world’s foremost open-source knowledge graph, Wikidata demands high-quality, real-time results for hundreds of updates each minute. With Astra DB’s serverless Vectorize offering, hosted on AWS, and NVIDIA NeMo, the DataStax AI Platform provides the near-zero latency and scalability needed to ensure Wikidata’s vector database is always up-to-date, maintaining the reliability essential for serving Wikimedia’s global audience.
48 INTELLIGENTCIO EUROPE www.intelligentcio.com