WIT by Google AI

A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning
Product Information
Release date: 22 June, 2023
Platform: Desktop

WIT by Google AI Features

The WIT (Wikipedia-based Image Text) dataset is a large multimodal, multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Motivation

Multimodal visio-linguistic models rely on rich datasets to learn to model the relationship between images and texts. Large image-text datasets can significantly improve performance, as recent work has shown. Furthermore, the lack of language coverage in existing datasets, which are mostly English-only, impedes research in the multilingual multimodal space. Google AI considers this a lost opportunity, given the potential of images, as a language-agnostic medium, to help improve multilingual textual understanding.

To address these challenges and advance research on multilingual, multimodal learning, Google AI created the Wikipedia-based Image Text (WIT) dataset. WIT is built by extracting the multiple different texts associated with an image from Wikipedia articles and Wikimedia image links, followed by rigorous filtering to retain only high-quality image-text sets. The resulting dataset contains over 37.6 million image-text sets, making WIT the largest publicly available multimodal dataset at the time of writing, with unparalleled multilingual coverage: 12K+ examples in each of 108 languages, and 100K+ image-text pairs in 53 of those languages.
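As a rough illustration of how the released data can be inspected, the following is a minimal Python sketch assuming one of the compressed TSV shards from the google-research-datasets/wit GitHub release has been downloaded locally. The shard file name is a placeholder, and the column names used below (language, image_url, page_title, caption_reference_description) should be checked against the field list published in that repository.

# Minimal sketch: inspect a locally downloaded WIT TSV shard.
# The file name is a placeholder; column names are assumptions
# based on the published WIT field descriptions.
import pandas as pd

SHARD_PATH = "wit_shard.tsv.gz"  # placeholder for a downloaded shard

# Each row pairs an image URL with one or more Wikipedia-derived texts.
df = pd.read_csv(SHARD_PATH, sep="\t", compression="gzip")

print(df.shape)
print(df["language"].value_counts().head(10))

# Example: rows with a reference caption in a given language.
subset = df[(df["language"] == "en") & df["caption_reference_description"].notna()]
print(subset[["image_url", "page_title", "caption_reference_description"]].head())

Because each image can be associated with several text fields drawn from different parts of a Wikipedia page, filtering on the specific text field of interest, as in the subset above, is usually the first step before building image-text training pairs.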
