The Wikimedia Foundation, which runs Wikipedia, has published a dataset designed specifically for training artificial intelligence models. The move is an attempt to deal with the AI bots that constantly scrape the platform's content.
In collaboration with Kaggle, a Google-owned platform that hosts machine learning datasets, Wikipedia has released a beta version of a dataset containing structured Wikipedia content in English and French.
Wikipedia's dataset aims to help artificial intelligence developers

According to Wikipedia, the dataset was designed with developers' needs in mind. It provides machine-readable article data that makes training, fine-tuning, benchmarking, alignment, and analysis of AI models easier.
The data is released under a free license and includes article abstracts, short descriptions, image links, infobox data, and article sections. It does not include references or non-text files such as audio files.
In a statement, the Wikimedia Foundation says that this data, provided as JSON files, could be a better alternative to scraping and parsing raw article text. Scraping by bots currently puts heavy pressure on Wikipedia's servers, since these AI bots consume a large share of its bandwidth.
Wikipedia has previously signed content-sharing agreements with companies such as Google and the Internet Archive, but the partnership with Kaggle could make Wikipedia data more accessible to smaller companies and independent researchers.
Brenda Flaherty, Kaggle's partnerships lead, said of the collaboration:
"We are very excited to host the Wikimedia Foundation's data. Kaggle is proud to play a role in keeping this data accessible, available, and useful."



