The second paragraph of the Introduction to the recent book "A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming" by Paul N. Edwards reads:

' ... without models there are no data. I'm not talking about the difference between "raw" and "cooked" data. I mean this literally. Today, no collection of signals or observations -- even from satellites, which can "see" the whole planet -- becomes global in time and space without passing through a series of models.'

A few pages on Edwards summarizes: "... everything we know about the world's climate -- past, present, and future -- we know through models."

This book distinguishes three types of computer models.

1. Simulation models that are built on physical theory. These use atmospheric physics in numerical models that calculate large-scale atmospheric motion and predict the weather.

2. Reanalysis models that come from weather forecasting. These models also simulate the weather, but they check their results against actual weather observations.

3. Data analysis models that are a collection of mathematical techniques, algorithms, and empirically derived adjustments to instrument readings.

In addition to these models, the socio-technical systems that have been developed to observe weather and climate make up a climate knowledge infrastructure. One further important observation is that this infrastructure and the models used to provide and analyze the data we have about weather and climate are constantly changing.

I have just started reading Edwards' book, but I am already taking one lesson from it: I need to understand how data and models inform what I accept as information and knowledge. As my friend Dennis Hamilton reminded me, "there are no uninterpreted data". Knowing how models and data yield information I can evaluate and use is one of those 21st Century required skills.


Infochimps is an online marketplace for datasets. In a conversation with Flip Kromer, the founder, he proposed a set of dataset properties for evaluating how readily datasets can be used, reused, and re-mixed. This table summarizes these properties and also proposes a format for developing and defining practicable metrics. This approach builds on the work of Tom and Kai Gilb. The aim here is to get feedback on a simple set of properties and their measurement that can help develop useful data descriptions in the many flourishing open data efforts.

There are three primary levels of usability properties: properties that enable discovery and access; properties that enable database instantiation; and properties that enable data analysis and data mashups. Dataset usability at each level requires a description of any restrictions on use; hence "Use restrictions" is the "+1" property. For example, minimal usability of a dataset requires the ability to find it and access it. Once accessed, a dataset's fitness for use can be determined. The presence of a README file with at least simple contact information is proposed as a basic usability requirement.

The "Scale" and "Meter" columns provide descriptions of appropriate and practicable metrics. Some simple metrics for the Discovery Properties are suggested. The metrics for Database and Data Analysis properties are areas of investigation. Please let us know what you think.


UPDATE1: the full Dataset Usability Properties spreadsheet is available as an Infochimps dataset.

UPDATE2: hat tip to Tom and Kai Gilb.

UPDATE3: the latest version of the spreadsheet. Image updated.



This appeared in a tweet on 18 Jan 2010:
 'Just heard the phrase "open transparency" to mean "open access to data." #gov20 // open & transparent mean the same thing'

Let's see how well this works.

If (1) "open transparency" means "open access to data"; and (2) "open" and "transparent" mean the same thing; then "open access to data" is the same as "transparent access to data".

Oops. That's not quite right.

But I do like the phrase "open transparency" because it makes me wonder if we mean "transparent openness". It seems that there are activities that are closed and transparent. Take subscription-based access to research output, for example. How research results end up in subscription-based publications is partly (slightly?) transparent. But access to those results is closed, unless you have a subscription. We need clear and agreed-upon definitions for these terms we keep using to argue for change. Maybe we need more than one word for "open".

Jacques Barzun, in Simple & Direct, argues that if you want to make your meaning clear, then choosing words carefully is paramount important. Barzun makes the following observation about oral conversation: "Few people organize their thoughts and words in fully intelligible remarks. It seems easier to use a sort of oral shorthand and rely on the listener to jump to the right conclusion. He often fails. You correct him or he asks you questions to settle his uncertainty." He continues, "[w]ith a written text there is no opportunity to ask questions. All the reader has is words and punctuation marks. It follows that these must be set down right -- right for the purpose and right for the reader."

One of the tweets from an open access session at the recent ScienceOnline2010 conference reads "takeaway from OA discussion: OA is incredibly complicated. #scio10".

Being sloppy with how we use words isn't going to make these efforts simpler.

[UPDATE: 2010-04-07: re-reading "Simple & Direct" revealed my own incorrect use of the word "paramount" (it's an adjective, not a noun).]

On 25 June the{app}gap broadcast a webinar on Facebook in business. The slide deck, a link to the audio, and the questions and comments submitted by attendees are here. Many thanks to the{app}gap for their support and facilitation of the webinar. Join us at their site for ongoing Q&A.

As a panelist I had the opportunity to listen to everyone else and I have a few reflections to share.

First, some background. The Facebook Groups in Business Investigation (FGIBI) was a short empirical study of how a small number of businesses used Facebook Groups as part of their business strategies. It was a completely volunteer effort, convened by Jenny Ambrozek and Victoria Axelrod from 21st Century Organizations, and Bill Anderson from Praxis101. Jenny started with one known Facebook connection, and within a few weeks we'd gathered 10 participant groups from 6 countries on 4 continents, established a set of research advisors, and set up a process for ongoing data collection over 12 consecutive weeks. Speaking for myself, I was prepared for the effort needed to keep it going, but pleased that sharing the effort among the three conveners, and with support from the advisors, made it tractable. This was a collaboration, and couldn't have been done any other way.

Three of the study participants were able to join the webinar as panelists and shared thoughts on their own business use of Facebook groups. In addition to the material in the slide deck, the following ideas stood out to me.

Eric Edelstein's Facebook Group eSquared Fashion focused strictly on marketing and sales. Eric's very honest assessment of Facebook groups in business? It's more work than you think. Facebook Fanclubs might be a better internet marketing venue. The best outcome is when the fans pitch in on marketing and promotion. Eric's experience is that internet marketing gets more exciting the more you put into it. So, get out there!

Kimberly Samaha ran five different groups focused on energy and sustainability (The Bordeaux Energy Colloquium was one group). She experimented with several innovative activities and interventions, as well as learning to weave networks from existing groups. The slide deck has the details of the activities, outcomes, and conclusions. Kimberly mentioned that social networking behaviors are often not viewed as business and professional behaviors. Is this changing? What are the contexts? These kinds of distinctions need more research and discussion.

Francois Goisseaux's Marketing 2.0 Facebook Group was the largest and, because of its size, quickly ran into Facebook group communication limits.* Francois attributes the growth of his group to providing a weekly event, what he called a "heartbeat" (a word with many associations). He also speculated that the name "Marketing 2.0" attracted people who wanted their Facebook profile to be associated with that moniker. Names matter, and the impact of identity politics on business uses of social networking platforms is another area that deserves further research.

Jenny Ambrozek summarized the shared investigation experiences noting that social networking and group work is never free. And even though I often expect activity to just happen, someone (or some ones) need to "architect the magic". By themselves, new tools, concepts, and work practices can't make anything happen. We need to "architect the magic". Now that sounds like fun.

Let us know your thoughts.

* In an e-mail yesterday from the American Association for the Advancement of Science Facebook Group I learned that the Facebook group messaging limit has been increased to 5000. (The networks have ears.)


What value does Facebook have for businesses? In particular, what might be the value for your business, your non-profit, or non-governmental organization? And how might you go about creating and sharing that value?

From December 2007 until March 2008, Jenny Ambrozek, Victoria Axelrod, and Bill Anderson, convened a study on the use of Facebook Groups in Business. Now the study conveners and participants will be reflecting on their experiences and learnings in a Webinar on Wed., June 25 at 3 p.m. EST over at The App Gap titled "Should Your Business Be Friends with Facebook?"

Whether you're a Facebook fan, skeptic, or bystander, consider joining us -- I know we will learn something together.


The 2nd TCDL: Day 2

Day two of the Texas Conference on Digital Libraries showed just how far TDL has come since last year. The presentations again featured stories of day-to-day practices in gathering materials, building collections, and sharing access. The more we share about daily practice the better we'll build practices and systems we can share.

Here's the summary of the second day presentations (day one summaries).

The University of North Texas is reaching out to the community to help build the metadata for their Rescuing Texas History project. With about one hour of training and ongoing guidance, volunteers are creating the metadata for the collection. The interaction among the volunteers when they work together is similar to a knitting circle. This is a good thing. It's not about control. Engaging and guiding volunteer efforts is how libraries can build collections.

At Texas A&M University, the preservation of the Texas Agricultural Experiment Station Bulletin reveals just how much work is required to turn paper into a useful digital resource. The work of scanning, reviewing the quality of the results, managing the names of individual image files, creating the metadata, etc., is tedious. It's good to be reminded just how tedious it can be.

Texas Tech University has built a digital assembly line to step up to their scanning and digitization work load. They are diligent about documenting their practices, processes, and policies, and their experience provides a benchmark for how to build a high-quality, high-volume scanning center. One tip on improving worker ability is to use quality control as a learning tool. (As I fix my own errors I learn not to make them again.)

John Leggett, TDL co-director, showed a live demo of "Vireo", a Manakin front-end to TDL that supports the deposit of, and access to, electronic theses and dissertations (ETDs). ETDs are good candidates for an institutional repository (IR). They need to be published and curated. They are born digital. For the TDL to support all member institutions the repository must accommodate a variety of procedures, policies and practices. The Vireo ETD service does all this and more. It serves as a learning object itself. The software engineering tradeoffs required to put Vireo into production provide valuable lessons for future IR applications.

The development and growth in the Shibboleth Federation since last year is impressive; currently there are 13 member institutions. This year's presentation highlighted the work done in building the federation, establishing the social bonds of trust, and the requisite technical operations. Furthermore, the commitment of the TDL team to implement and support the Lone Star Education and Research Network (LEARN) demonstrates how the TDL is establishing itself. A technical "Shibboleth Install-fest" will be held 14-15 July at UTexas Austin. Contact TDL for more information.

The conference ended with a call for the TDL to consider being open to all Texans (for a start), not just those with academic affiliations. This proposal grows from Yochai Benkler's The Wealth of Networks argument about what can be called "long tail peer production". The Shibboleth work shows what it takes to provide secure, easy, and legal access to a federated library. I think it's work we should take on. And where are the resources for this?

The need for explicit resources to support the continuing development of the TDL was one theme of this year's conference. Now that projects and working groups are established and making progress, it is crucial to ensure that they don't become "after work hours" activities. The visibility of the TDL can help support its funding. That's my hope.

And finally, the plan is to hold the third TCDL next year in conjunction with the 2009 Joint Conference on Digital Libraries, which will be held in Austin. More opportunities for social networking; more visibility for TDL. In this case more is better.

UPDATE: It was pointed out that I did not include a summary of the Vireo demonstration. Mea culpa. It was in my conference notes. Another lesson for me: haste makes waste.


Last June the University of Texas, Texas A&M University, and the Texas Digital Library (TDL) convened a Texas Conference on Digital Libraries (TCDL) at the University of Texas at Austin. It was a small meeting, with generative presentations and discussion, and it showed that the TDL is a project with a vision. Just one year later, to judge from the attendance and energy at the second TCDL, it's also a project that has "legs". After day one my free association about the TDL is "This dog can hunt."

The second TCDL started with a keynote by Mark Leggott from the University of Prince Edward Island on "Virtual Research Environments". At PEI Mark and his colleagues have developed a Virtual Research Environment that supports administration, learning, and research. They work in an agile and iterative way, building and learning as they go. Mark's vision for the repository and its supporting infrastructure is to be an invisible foundation for all areas of academic activity. It's an ambitious goal, but they seem to be achieving it. Although much of the content requires a login for access, it's still worth checking out.

Mark Leggott made several points. First, research and teaching are not orthogonal, but rather on a continuum. There's no reason why fresh research results can't be accessed the next day by students for class assignments. Second, library and information management expertise is needed to help manage newly created research data, not just the published results. And third, with the tools and practices they have established at PEI it's possible to populate an institutional repository with 60% of faculty output with no faculty input. Libraries do not need to wait for faculty participation to get started. A tipping point may not be that hard to create. Finally, the PEI VRE is integrating Fedora, Drupal, and Moodle to build their services. Much can be learned from this project.

Here's a summary of the rest of the first day's presentations:

Texas Tech University is focusing on library outreach. They are prototyping Meebo as a way to implement an "Ask a Librarian" function. They are making changes ... stay tuned.

At Texas A&M University the Energy Systems Lab has constructed an online digital collection that is a model of both well documented content and development of skill and expertise.

The Texas A&M TDL Bridge Group reported on their first year of operation and laid out the real work of educating librarians and faculty on both institutional repositories and the TDL. Short story? It's real work.

At UT Dallas the library is working with researchers and faculty to provide access to datasets in management and social sciences. The library is accommodating the demands for data, but as you might imagine, it's not the easiest journey for any of the participants (researchers, librarians, and data providers).

At UT Austin tools are being developed to bring the library to users where they are. LibX, OpenSearch, a Facebook app, and an iGoogle gadget, have been developed to provide user access to the UT libraries. This is the "build it and bring it to them" model -- it looks good!

The Texas State Library and Archives Commission is working on making metadata that can support interoperability among systems. Sometimes choosing what is simplest to implement is the best choice, even when cool features are left out.

The University of North Texas Health Science Center is training librarians and faculty about the ramifications of the recent NIH mandate for deposit of research output into PubMed Central. This is a big deal. Compliance and consequences? These are empirical questions.

At UT Austin's School of Information, tools are being developed to provide rich markup and access to video materials. Quinn Stewart from the iSchool has been developing courses to teach digitization of video materials. The tools enable marking contents, indices, and specific objects in videos. It's hard to overestimate how useful this might be.

Baylor University reported on a Black Gospel Music Restoration Project. The content collection and description work is excellent. The 30 second gospel music sample just blew me away. The one downside of this project is that all the music is locked up because the copyright issues are not clear. That's a loss.

The University of North Texas described their work with Archival Resource Keys. ARKs are a standard way of providing a persistent naming scheme for digital information objects. An ARK provides a link to an object, its metadata record, and its available service agreements. This is an interesting project. Being able to query an object about what services it can provide is particularly interesting. But it's also creating (yet) another unique and persistent web identifier we all need to know and care about. I'd like to know the thinking that decided that DOIs and PURLs aren't good enough. I know, we're still in the early days of computers, inter-networks, etc.
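The "query an object about its services" idea can be sketched concretely. As I understand the ARK convention, the bare identifier resolves to the object, a trailing "?" requests its metadata record, and "??" requests its persistence and service commitments. The resolver host, NAAN (12345), and object name below are all made up:

```python
def ark_urls(resolver, naan, name):
    """Build the three classic request forms for one ARK-identified object.
    The '?' and '??' suffixes are ARK "inflections": the same identifier,
    asked three different questions."""
    base = f"{resolver}/ark:/{naan}/{name}"
    return {
        "object": base,              # the object itself
        "metadata": base + "?",      # its metadata record
        "commitments": base + "??",  # its service/persistence statement
    }

# Example with invented values:
urls = ark_urls("https://example.org", "12345", "x6t4k9")
```

So one name yields the object, its description, and its service agreement, which is exactly the three-part promise described above.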

So that's the end of day one. Day two promises more operational stories and an open forum. I know that a proper blog post should have URLs for all the sites and resources mentioned. I'll provide an online link to the conference presentations as soon as I have one.


Here's a frequent user experience of mine with Twitter:


I'm not alone in this experience or, curiously, in feeling irked by it. I'm not sure why I like Twitter so much. That in itself is of interest.

Today Tim Bray suggested "Twitterbucks", a subscription business model for Twitter. I realized that I would pay a subscription fee to use Twitter. I didn't even hesitate. So I do think it's valuable.

Do you? Would you pay? Take Tim's simple survey.


Dennis Hamilton's "Trust but demonstrate" post is a nice formulation of our human/systems interdependencies and interactions. It reminded me of a comment in a paper about proving programs correct. Paraphrasing: we don't prove people correct; we expect people to be responsible.

It's the same with a computer system. I expect (or want to expect) the system to take responsibility for its part of the interaction. After all, I'm relying on the computer when I'm paying bills online, or buying a book from Amazon, or even posting to my blog and pinging Technorati. And if I'm ever going to feel comfortable using the computer, then the builders of these systems and applications need to step up and demonstrate trustworthiness. Let's write software that takes responsibility for its own actions. Follow up when things don't go as expected. And when it's appropriate, let me know that all went well. Seriously, I need software that can help me take care of my part of the interactions. Systems are getting more complex and complicated. I think the whole computer-human-computer interaction would be better if the systems took some responsibility for their actions.
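As a toy illustration of "follow up when things don't go as expected, and say so when all went well", here is a sketch with entirely invented names: an operation that retries once on a transient failure and always returns a plain account of what happened, instead of failing silently:

```python
def pay_bill(amount, submit):
    """Attempt a payment via the caller-supplied submit() function.
    Retries once on a transient connection failure, and always reports
    the outcome in plain language -- success, or what went wrong."""
    for attempt in (1, 2):
        try:
            receipt = submit(amount)
            # Confirm success explicitly rather than staying silent.
            return f"Paid {amount}; confirmation {receipt}."
        except ConnectionError:
            if attempt == 1:
                continue  # follow up: try again before giving up
    # Take responsibility for the failure and say what the user can rely on.
    return f"Could not pay {amount}; nothing was charged. Please retry later."
```

It's a trivial pattern, but it captures the request: the software accounts for its own actions either way.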

I know this isn't easy. But with all the recent news about software that can tell if I'm lying, or when I can be interrupted, or ..., it doesn't sound that hard. So this is a request to all systems architects and developers and service providers. Build trustworthy applications -- demonstrate that we all need to be responsible participants in this system we call the Internet.

