News of April 01, 2018

KIT releases monumental dataset of more than 15 trillion triples

KIT is proud today to release an extension to an existing dataset, which will increase the size of the dataset by a factor of more than 1000. The widely cited Linked Open Numbers dataset (more than 30 citations) has been updated. Every single triple was regenerated, and even though the size has been dramatically expanded, we remain confident in the quality of every single triple.

It has been eight years to the day since the original publication of the Linked Open Numbers dataset. Today, we are proud to announce an increase in the size, and thus the utility, of the dataset by three orders of magnitude.

The page has received a thorough overhaul, which not only refreshes its look and improves its display on mobile devices, but also introduces a number of new features:

  • the previous limit to the first billion natural numbers has been lifted, since the page has in the meantime moved to a 64-bit architecture. We expanded the supported numbers to the first trillion natural numbers, thereby creating 999 billion new entities.
  • all links to Wikipedia and DBpedia have been refreshed. In the eight years since the original release, Wikipedia and DBpedia have, in an effort to catch up with Linked Open Numbers, created new entities for numerous numbers. We have updated the links to all of those.
  • links to the Wikidata entities representing these numbers have also been created and added, extending the linkage between Linked Open Numbers and the LOD cloud by thousands and thousands of new entities.
  • the whole dataset is now published under the terms of the CC-0 license, ending long years of discussion that resulted in fear, uncertainty, and doubt. The Linked Open Numbers dataset now stands on solid ground, joining other major datasets in choosing the perfect license for data.
  • we expanded the ontology and the dataset to also provide the digit sum of the numbers, allowing new applications on top of that.
  • we refreshed the links to Linked Data browsers. None of the original six browsers is still available for browsing the Linked Open Numbers dataset. These links were therefore all removed and replaced with links to two current browsers.
  • we also support the URI4URI project and provide data about the Linked Open Numbers URIs in the URI4URI scheme.
  • the page has been updated to support Unicode's UTF-8 encoding, thus showing the number names in their new full glory.
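
The digit sum and the entity count mentioned in the list above can be illustrated with a short sketch. It assumes the usual base-10 digit sum and the short-scale meaning of "billion" (10^9) and "trillion" (10^12); the names `digit_sum`, `OLD_LIMIT`, and `NEW_LIMIT` are hypothetical and are not taken from the Linked Open Numbers ontology or code base.

```python
def digit_sum(n: int) -> int:
    """Return the sum of the decimal digits of a natural number (assumed base 10)."""
    if n < 0:
        raise ValueError("expected a natural number")
    total = 0
    while n > 0:
        # Split off the last decimal digit and add it to the running total.
        n, digit = divmod(n, 10)
        total += digit
    return total


# Assumed short-scale limits: first billion before, first trillion now.
OLD_LIMIT = 10**9
NEW_LIMIT = 10**12

# Entities added by lifting the limit.
new_entities = NEW_LIMIT - OLD_LIMIT

if __name__ == "__main__":
    print(digit_sum(2018))
    print(new_entities)
```

Python integers have arbitrary precision, so this sketch does not depend on a 64-bit architecture; a fixed-width implementation would need at least 40 bits to hold one trillion.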

Eight years (2922 days) after the original publication, Linked Open Numbers still gets tens of thousands of hits per month. We are happy to have updated the resource and extended its lifetime considerably.
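
The day count can be checked with a small sketch. The original release date used here, April 1, 2010, is an assumption inferred from this news item's date and the stated eight-year interval; the variable names are hypothetical.

```python
from datetime import date

# Assumed: the original release was exactly eight years before this news item.
original_release = date(2010, 4, 1)
update_release = date(2018, 4, 1)

# Eight years of 365 days plus the leap days of 2012 and 2016.
days_between = (update_release - original_release).days

if __name__ == "__main__":
    print(days_between)
```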

The community is invited and challenged to provide a SPARQL endpoint for the dataset. We think that the size of the dataset makes this an interesting challenge.

An open source release of the code base is being planned.

The update was created jointly by Denny Vrandecic, Steffen Thoma, Andreas Thalhammer, Andreas Harth, and York Sure-Vetter.

From the research group Web Science