Infochimps and the 9+1 properties of usable datasets

Infochimps is an online marketplace for datasets. In a conversation, its founder, Flip Kromer, proposed a set of dataset properties for evaluating how readily datasets can be used, reused, and remixed. This table summarizes these properties and also proposes a format for developing and defining practicable metrics. This approach builds on the work of Tom and Kai Gilb. The aim here is to get feedback on a simple set of properties, and on ways of measuring them, that can help develop useful data descriptions in the many flourishing open data efforts.

There are three primary levels of usability properties: properties that enable discovery and access; properties that enable database instantiation; and properties that enable data analysis and data mashups. Dataset usability at each level requires a description of any restrictions on use; hence "Use restrictions" is the "+1" property. For example, minimal usability of a dataset requires the ability to find it and access it. Once accessed, a dataset's fitness for use can be determined. The presence of a README file with at least simple contact information is proposed as a basic usability requirement.
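As a rough illustration of how the README requirement might be checked automatically, here is a minimal Python sketch. It is not part of the spreadsheet or of anything Infochimps provides. The function name `readme_has_contact`, the rule that any file whose name starts with "readme" counts, and the choice to treat an email address as "simple contact information" are all assumptions made for this example.

```python
import re
from pathlib import Path

# Assumption: an email address is the simplest form of contact information.
# This pattern is deliberately loose and is not a full email validator.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def readme_has_contact(dataset_dir):
    """Return True if a README* file in dataset_dir contains an email address.

    Hypothetical check: only the top level of the directory is inspected,
    and file names are matched case-insensitively against the prefix "readme".
    """
    for path in Path(dataset_dir).iterdir():
        if path.is_file() and path.name.lower().startswith("readme"):
            text = path.read_text(encoding="utf-8", errors="replace")
            if EMAIL_PATTERN.search(text):
                return True
    return False
```

Calling `readme_has_contact("path/to/dataset")` gives a yes/no answer that could serve as one simple measurement for a discovery property. A real check would probably also accept other forms of contact information, such as a URL or a postal address.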

The "Scale" and "Meter" columns provide descriptions of appropriate and practicable metrics. Some simple metrics for the Discovery Properties are suggested. The metrics for Database and Data Analysis properties are areas of investigation. Please let us know what you think.


UPDATE1: the full Dataset Usability Properties spreadsheet is available as an Infochimps dataset.

UPDATE2: hat tip to Tom and Kai Gilb.

UPDATE3: the latest version of the spreadsheet. Image updated.


