Sections I-III
PRIVILEGED DOCUMENT -- DO NOT QUOTE, CITE, OR DISSEMINATE
17 September DRAFT PAD&PFU
Creating Global Information Commons for Science
An International Initiative of the Committee on Data for Science and Technology
I. SUMMARY
The Global Information Commons for Science Initiative is a multi-stakeholder project arising from the second phase of the World Summit on the Information Society in Tunis in November 2005. Its overall goal is to accelerate the development and scaling up of open scientific data and information resources on a global basis, with particular focus on "common use" licensing approaches. The specific objectives are to:
1. Improve understanding and increase awareness of the societal and economic benefits of easy access to and use of scientific data and information online, particularly those resulting from publicly funded research activities.
2. Identify and promote the broad adoption of successful institutional and legal models for providing open availability on a sustainable basis and facilitating reuse of scientific data and information.
3. Encourage and help to coordinate the efforts of the many stakeholders in the world's diverse scientific community who are engaged in devising and implementing effective approaches to attaining these objectives, with particular attention to the circumstances of the developing as well as the developed countries.
4. Promote all of the objectives of the Initiative through the development of an online "open access knowledge environment".
Other international and national organizations representing a broad range of scientific and informatics stakeholders will collaborate with the Initiative on an affiliated basis.
II. BACKGROUND
A. Rationale of this Initiative
The rapid advances in digital technologies and networks over the past two decades have significantly altered and improved the ways that data and information can be produced, disseminated, managed, and used, both in science and in many other spheres of human endeavor. This progress in the emerging cyber-infrastructure has enabled scientists to perform quantitatively and qualitatively new functions to: collect and create ever-increasing amounts and types of raw data about all natural objects and phenomena; collapse the space and time in which data and information can be made available; facilitate entirely new forms of distributed research collaboration and information production; and integrate and transform the data resources into unlimited configurations of information, knowledge, and discovery. Perhaps the most significant and obvious manifestation of these developments has been the internet's effects in reducing the time and costs of producing and transmitting additional copies of data and information on a global basis to negligible levels.
Researchers in public science and engineering historically have been at the forefront of many of the basic technological advances underlying new paradigms of digitally networked information creation and dissemination activities. From their pressing needs to fashion more powerful information processing communications tools have sprung a wide array of the key elements of the "Information Society". Such advances initially included mainframe computers and packet-switched data networks, the TCP/IP protocols of the internet, online search engines, and the World Wide Web. More recent innovations have supported Grid computing and Web-based middleware platforms, as well as computer-mediated tools and techniques for scientific knowledge discovery, that facilitate the globally distributed conduct of collaborative work.
For essentially the same reasons, scientific research communities throughout the world also have begun to develop many kinds of openly available digital resources. These include open-source software, public-domain digital data archives and federated open data networks, open institutional repositories for scientific pre-prints, journal articles and educational materials, and open access electronic journals.
Examples of these initiatives are presented in Box 1 (below)[1]. Box 1 distinguishes between those initiatives that are concerned primarily with the provision of open availability conditions directly supporting collaborative research activities, and others that have focused upon open dissemination of research results. That distinction is, of course, somewhat artificial. Providing ready access to public scientific data and information is a condition for distributed research collaborations, while the archival scientific journals, like the repositories of electronic pre-prints, can properly be viewed as part of the informational infrastructure for globally distributed research, as well as an essential input into the training that equips scientists for international collaboration.
| There are many new kinds of distributed, open collaborative research and information production and dissemination on digital networks. Examples of open data and information production activities include: |
| * Open-source software movement (e.g., Linux and thousands of other programs worldwide, originating in academia, government, and industry); |
| * Distributed Grid computing (e.g., SETI@Home, LHC@home); |
| * Community-based open peer review (e.g., Journal of Atmospheric Chemistry and Physics); |
| * Virtual laboratories and collaboratories (e.g., virtual labs at Howard Hughes Medical Institute); |
| * Virtual observatories (e.g., the International Virtual Observatory for astronomy, Digital Earth); and |
| * Collaborative research Web sites, blogs, and portals (e.g., NASA Clickworkers and Wikipedia). |
| * Open data centers and archives (e.g., GenBank, the Protein Data Bank, space science data centers); |
| * Federated open data networks (e.g., World Data Centers, Global Biodiversity Information Facility; NASA Distributed Active Archive Centers); |
| * Open access (OA) journals (e.g., BioMed Central, Public Library of Science, plus more than 2,500 scholarly journals); |
| * Hybrid OA journals (part OA, part subscription, e.g., PNAS, Springer Open Choice); |
| * Open institutional repositories for that institution’s scholarly works (e.g., the Indian Institute for Science, plus many hundreds globally); |
| * Open institutional repositories for publications in a specific subject area (e.g., PubMedCentral, the physics arXiv); |
| * Free university curricula online (e.g., the MIT OpenCourseWare); and |
| * Emerging discipline-based commons (e.g., the Conservation Commons). |
New possibilities have thereby been opened for the improvement of human welfare through more efficient utilization of data and information, especially those arising from public investments in the conduct of scientific research. The digital network infrastructure, networked applications, and the myriad organizations and activities can create unprecedented opportunities for accelerating the progress of science and innovation. Taken together, they are a part of the emerging broader movement in support of formal and informal "peer production" and dissemination of information by mobilizing the cooperation of globally distributed professionals and volunteers in open networked environments.[2] Such activities are based on principles that reflect the cooperative ethos that traditionally has imbued much of academic and government research; their norms and governance mechanisms may be said to characterize the public scientific communities' sharing norms in a "scientific information commons" that resembles a public domain, rather than a market system based upon private property ownership rights and control over the use of information.[3]
The benefits derived from the availability of publicly funded scientific data and information, and hence society's returns on the investments, depend upon their being used[4]. The open availability of digital resources from publicly-funded research at minimal transaction costs offers many advantages not only over secrecy, but in comparison with a closed, proprietary system that places high barriers to both access and subsequent re-use. Broad access to these publicly-funded information resources yields many benefits for society: it reinforces open scientific inquiry; encourages diversity of analysis and opinion; promotes new types of research; allows more rapid verification (or correction) of announced results; makes possible the testing of new or alternative hypotheses and methods of analysis; supports studies on data collection methods and measurement; facilitates the education of new researchers; enables the exploration of topics not envisioned by the initial investigators; permits the creation of new data sets when data from multiple sources are combined; helps transfer factual information to and promote capacity building in developing countries; promotes interdisciplinary, inter-sectoral, inter-institutional, and international research; and generally helps to maximize the research potential of new digital technologies and networks, thereby providing greater returns from the public investment in research[5].
At the same time, it is important to recognize that public policies in the developed and developing countries alike are shaped by legitimate considerations and interests that do not leave all publicly funded scientific data and information in the public domain or under immediate open access conditions. Instead, they impose limitations upon openness and cooperation in the conduct of public research and the utilization of its findings, in varying degrees and for a variety of purposes. Competing values and policy considerations include, among others: protection of national security (from full classification of information as "secret," to grey areas such as "dual use" or "sensitive but unclassified" documents); the protection of personal privacy in research that involves human subjects; the proprietary interests of private-sector parties for the protection and commercial exploitation of their intellectual property (through patents and copyrights); and the practice of allowing publicly funded researchers limited periods of exclusive use of their data prior to the publication of their research findings.
Within the scientific enterprise, copyright historically has served to create market conditions that are more encouraging to publication of scientific research and training materials than might otherwise exist, and patent protection has stimulated investment in the development and production of scientific innovations. When appropriately utilized, these and other legally protected restrictions on access to and exploitation of new information may convey important benefits to society generally, and also more specifically to the progress of science and the advancement of learning. Similarly, to permit researchers who have undertaken the effort of generating or collecting new data some period of exclusive control over those resources (an interval sufficient for them to carry out the analysis, interpretation, and presentation of their findings) is not simply a matter of moral rights or legal property rights, but should be seen as an integral part of the structure of incentives for the conduct of empirical scientific studies. In the "open science system" the rewards that come to researchers, whether reputational or directly material in nature, require the publication of "results." The disclosure of research findings is critical, of course, because that provides the basis of claims to priority in specific new discoveries and inventions. The validation and acknowledgment of those disclosures by disciplinary peers is the principal basis upon which reputations are built and rewards are allocated in the research system.
Consequently, there is a need for public policies and institutional arrangements to seek a judicious balance between positive and negative effects upon the conduct of publicly funded research that are likely to ensue from the granting and enforcing of private ownership rights in scientific and technical data and information. Yet, in recent decades the policy balance in this regard has been disrupted in ways that many science policy analysts perceive as threatening the long-term vitality of fundamental scientific research.[6]
The legal protections afforded to private property owners under the copyright regime today extend far beyond the arena of printed texts and images, potentially covering all forms of digital data and information. Moreover, novel statutes in many national jurisdictions have awarded sui generis intellectual property rights protection to producers and investors in databases containing all forms of information, whether or not the contents are copyrightable[7]. The patentability of inventions and discoveries now is very broadly construed, encompassing claims to information that formerly would have been deemed "facts of nature" and hence ineligible for protection. Patents are issued on an unprecedented scale to inventors of research tools and techniques in many fields of science, notably in the biomedical and computer sciences. Technical information that in former times would most likely have been left in the public domain now tends to be appropriated swiftly as intellectual property - not necessarily in anticipation of significant streams of revenues, but instead because it has potential value as a defensive or aggressive instrument in future negotiations or litigation arising from patent infringement suits. And restrictive licensing practices and increasingly effective digital rights management technologies provide additional layers of enclosure, even beyond those conferred by statutes[8].
The creation of these new legal rights and enforcement mechanisms, while frequently rationalized in the name of scientific and technological progress, is increasingly promoted by special interests for other business objectives. These rights and mechanisms also tend to impinge upon the conduct of governmental and government-funded research, despite the public-interest objectives of those activities conducted at public expense. The past quarter-century thus has seen the emergence of a pronounced world-wide trend toward the commoditization of publicly funded research outputs, including the underlying data and information resources. This tendency has gained impetus from the intensification of global economic competition and the continuing fiscal pressures on governments, with a concomitant commercialization and privatization of functions previously conducted by public agencies, including research and the dissemination of government data and information.
The "public goods" properties of data and information, however, permit their concurrent use and reuse at negligible incremental costs by an unlimited number of users whose access to and use of the content leaves it un-depleted. Given the expansible nature of information, it is unreasonable to ignore the efficiency losses imposed on the functioning of the research system by the enforcement of intellectual property rights in digital scientific and technical data and information. The negative impacts of the barriers to information sharing and collaborations include the opportunity costs of lost research, as well as the time and costs involved in securing rights to use essential data and information owned by private parties. They also ramify through the system, adversely affecting both private and public rates of returns from investments in "downstream" applications-oriented R&D, curtailing the extent of the benefits derived from wider diffusion of innovations, and widening the gap between levels of scientific capabilities and innovation capacities in developing countries and those that are economically advanced.
The problematic consequences of restricting access to "research inputs" do not stem exclusively from policies and practices pursued by the rich nations. Public information regimes for scientific data produced in developing countries today are among the least open in the world. There are economic and organizational limitations of the governmental organizations for gathering and distributing such data and political restrictions placed upon disclosure of information regarding social and economic conditions. Access to scientific data and information also has been restricted because researchers and their institutions in the developing countries suspect that free and open information exchanges, like free trade, will turn out not to be "fair" trade. The marked asymmetries between rich-country and poor-country partners in the division of intellectual property rights from new discoveries and inventions, and the efforts of transnational corporations to exploit commercially various types of "indigenous knowledge" certainly have contributed to undermining an ethos of international scientific cooperation in some important areas of research, most notably in the life sciences.
Many complex public policy issues are posed by the changing balance between the benefits and drawbacks of privatization and commercialization of data and information as these affect public-sector science, along with similar trade-offs involved in the granting of intellectual property rights and the placing of government restrictions upon certain kinds of research. These policy quandaries will resist quick and simple solutions. Nevertheless, notice of the broad trends reviewed here has been sufficient to prompt increasingly frequent expressions of concern about their potentially adverse effects on the balance between exploratory science and commercially-oriented applied research, and on the sustainability of the norms of open scientific cooperation that have in the past characterized much of academic research. Some commentators have voiced the more explicit worry that continued pressures for the commoditization of information and the privatizing of scientific and technical data could significantly disrupt established scientific research practices. This could threaten the loss of those new and exciting research opportunities - indeed the very opportunities noted at the outset that are made possible by the ongoing advances in digital networks and related technologies[9].
B. Creating Global Information Commons for Science[10]
Based on the foregoing discussion, it is becoming increasingly apparent that there is another and rather different approach whose practical aspects merit wide attention and support for its further development. The proposed approach consists of the voluntary use of the rights held by intellectual property owners, which allow them to construct by means of licensing contracts conditions of "common-use" that emulate the key features of the public domain that are most beneficial for collaborative research in all its forms. The intention is to form legal coalitions for the cooperative use of scientific data, information, materials and research tools that actually are not in the public domain, and whose licensed use is therefore legally protected by an intellectual property regime. Such an undertaking may be properly described as creating a network of "global information commons for science", inasmuch as each "common" constitutes a collectively held and managed bundle of resources to which access by cooperating parties is rendered open (though perhaps limited in its extent or use) under minimal transactions cost conditions.
The economic logic and practical feasibility of the "contractually constructed commons" approach to counteracting the deleterious effects of encroachments made upon the public domain by the granting of intellectual property rights rest on three sets of propositions. First, as noted, data and information have special, "public goods" properties that make them very different from physical resources like land. Hence the economic case for private ownership of intellectual property rights cannot be based on analogous reasoning from the case of land and other exhaustible resources that are subject to being degraded or destroyed by "over-use." Second, even tangible resources such as land, when they are not privately owned, may be and have been managed well under systems of common-use rights. Because common-use can be regulated by non-market mechanisms constructed as systems of customary rights and restraints, historically it was deliberate acts of private enclosure rather than some tragedy of over-grazing that often spelled the end of the agrarian commons. Third, the legal system today makes it possible for the owners of a tangible resource held in common to protect their collective use-rights, and manage their contractually constructed common-pool so as to sustain and augment the benefits that it yields. Consequently, because information cannot be depleted by overuse, individuals having private ownership rights in intellectual property may voluntarily use contracts to construct a common use-rights area that is all inclusive, in granting access to those wishing to use the contents. Because the common in this case is owned and not part of the public domain, the benefits that all users can enjoy from such an arrangement may be preserved and enhanced. This can be accomplished by reserving the legal right to exclude certain usage practices that might otherwise undermine the willingness of others to similarly pool the information that they have created.[11]
The respective rights of the participants in the public research system can be mediated most effectively through the use of contracts at the individual researcher, institutional, and governmental levels. Common-use licensing approaches that promote broad access and reuse rather than restrict it, such as those being developed by the new Science Commons under the Creative Commons (see http://science.creativecommons.org), can preserve essential ownership rights while maximizing the social benefits and returns on the public investments in research. They can help to achieve a productive balance between the domains of proprietary R&D and publicly funded open science, particularly in an increasingly protectionist intellectual property environment.
Indeed, several intergovernmental and international scientific bodies[12], national science policy and funding organizations[13], and major research institutions[14] have begun to look into the issues and cooperative mechanisms raised here. They are starting to develop new policies and procedures for improving access to digital scientific resources. However, much more remains to be done to help the publicly funded research community to rationalize and improve the efficiency and effectiveness of this system.
The rationalization of policies and practices across nations, institutions, and disciplines may be expected to result in much greater social and economic impact from the investment in public research overall by enabling greater access to and use of scientific data and information resources, and by facilitating interdisciplinary and international cooperation in public science and education. Because of the international scope of digital networks and research collaborations, strategic international approaches for building information commons are both necessary and desirable. In short, the adoption in recent years of the many innovative and promising open initiatives from the bottom up, as indicated in Box 1, coupled with the introduction of some new top-down legislative and policy proposals in several countries and inter-governmental organizations, makes this an appropriate time to launch the Global Information Commons for Science Initiative. Such an Initiative can help devise and promote new social and legal structures that will be especially well-suited for the future conduct of collaborative research in many domains of science.
III. PRELIMINARY WORK PLAN
A. Summary of Activities under the Initiative
The Global Information Commons for Science Initiative will concentrate on research and analysis, promotion of successful policies and practices, and coordination of activities among the participating organizations and stakeholders, with particular emphasis on "common use" licensing approaches. Because the primary purview of CODATA is scientific and technical data, the initial focus will be on interdisciplinary and international data resources, primarily in the public sector. The addition of information resources for developing integrated information commons will be accomplished over time through its collaboration with other organizations, as discussed below. The Initiative will serve as an interface between the national and international organizations that guide, manage, and fund public research activities from the top down, on the one hand, and the organizations that have been developing innovative legal and institutional mechanisms for improving the social and economic benefits of such research, on the other. Toward these ends, the Initiative will promote the four principal goals described below, together with a spectrum of related activities.
1. Improve understanding and increase awareness of the societal and economic benefits of easy access to and reuse of scientific data and information online, particularly those resulting from publicly funded research activities.
The research and analysis activities under the Initiative will be focused on several broad areas of inquiry that are necessary to better understand and promote easy access to and use of publicly-funded digital data and information, as noted below. Primary attention will be given to improving the understanding of institutional models that can provide open availability online on an economically sustainable basis and on the legal mechanisms and contractual templates that can be adopted and used more broadly. Substantial effort will be devoted to explaining the meaning and importance of the concepts associated with this Initiative. Some of these, like the common-use licenses, are relatively new and complex, while others, such as the boundaries between "public" and "private", are difficult to define precisely and shift over time and place. The factors involved in developing information "commons" in different scientific contexts, and the related benefits and costs, will be examined in detail as well, using a combination of quantitative and qualitative assessment methodologies. The following areas have been identified at the outset:
- Institutional and management aspects (e.g., development and implementation of different commons models, best practices, public/private interfaces, processes for knowledge transfer and diffusion);
- Legal and policy aspects (e.g., international agreements, enabling statutes and regulations, permissive licensing approaches, intellectual property and information policies at the inter-governmental, inter-institutional, and scientific peer-to-peer levels);
- Economic considerations (e.g., strategies for long-term sustainability of commons models, cost/benefit analyses);
- Technical infrastructure requirements related to providing open availability (e.g., technical and semantic interoperability, legal metadata standards); and
- Development issues (e.g., analysis of the different conditions of access to and use of scientific data and information by developing countries, capacity building goals and requirements, and related cultural and sociological considerations).
It should be understood that not all of this research can be performed directly or concurrently within the Initiative. Subject to the availability of funds and the interest of sponsors, the managers of the Initiative will commission selected experts to produce white papers, convene workshops and symposia with policy analysts and practitioners, and in particular leverage the resources of the partner organizations and other stakeholder groups to conduct a coordinated suite of studies, based upon an agreed research agenda. The results of these and other related studies will be made available through the Initiative's "open access knowledge environment."
2. Identify and promote the wide adoption of successful institutional and legal models for providing open availability on a sustainable basis and facilitating reuse of publicly-funded scientific data and information.
Another major objective of the Initiative will be a comprehensive cataloguing and characterization of different institutional and legal models of data and information access, with a view to facilitating the broad promotion of successful examples of such activities in analogous discipline and institutional contexts within the scientific community. It is important to emphasize that openness can be achieved in many ways, and with different costs and benefits, as will be described and analyzed under objective #1 above. The Initiative's open access knowledge environment will serve as a clearinghouse for these many examples, either linking to other such compilations of information that already exist (e.g., the Lund University registry of open access journals or the Southampton University registry of open access institutional repositories) or developing new ones (e.g., a registry of open access data centers and networks).
3. Encourage and help to coordinate the efforts of the many stakeholders in the world's diverse scientific community who are engaged in devising and implementing effective approaches to attaining these objectives, with particular attention to the circumstances of the developing as well as the developed countries.
The managers of the Initiative will work closely with the affiliated organizations and their large constituencies in the global public research community to establish effective channels of communication with them, and to help coordinate their efforts in pursuing common(s) objectives. There are already many research programs and institutions working in this area, as indicated in Box 1 above. The Initiative will avoid duplication of effort with its collaborating organizations, as well as with the many other groups working on these issues. The primary stakeholders in this Initiative include all researchers worldwide and those associated with the public research system, including governmental science policy and funding agencies, governmental research organizations, universities and not-for-profit research institutes, science and engineering academies, learned and professional societies, publishers and other information disseminators, research libraries and archives, data centers, and individual researchers and information specialists. The international scientific organizations initially collaborating on this Initiative are broadly representative of many of the primary stakeholder constituencies, as outlined below, while many others will be added over time to improve coordination and communication. The Initiative will work directly with the various stakeholder groups in pursuit of its objectives, as well as indirectly, by leveraging the resources of the partner and affiliated organizations.
4. The Open Access Knowledge Environment
An online open access knowledge environment will be developed to provide vigorous support for the Initiative's objectives. This knowledge environment will use established and emerging open Web and grid services and applications with enhanced functionalities, allowing users to move among and easily integrate different information resources in pursuing the established objectives. The knowledge environment thus will be the conduit for providing the information output and outreach from the Initiative, as well as the mechanism through which external participants can interact with the Initiative and with each other. More specifically, this online activity would establish and maintain an Internet portal and knowledge environment with (at least) the following capabilities.
- Up-to-date descriptive and contact information for the Initiative and its participating organizations.
- Descriptions of past, ongoing, and proposed projects and activities.
- Links to publications produced under the Initiative and links to annotated bibliographies of publications (papers, presentations, reports, reviews, etc.).
- Annotated links to relevant external resources (institutional repositories, similar initiatives, investigations of new services, etc.).
- Frequently updated feed of GICSI and related news items (project and event announcements, publication releases, and reports of, with links to, related developments in information access policy, business, and technology), including a calendar of relevant events.
- Support for a research "knowledge environment", including individual weblogs to support international discussion and evaluation of GICSI projects, proposals, events, etc.; wiki work spaces focused on key research issues; access to topically related data and information resources to enable research; and other related functions.
- Support for a social network of interested participants, including individual profiles and e-mail listservs (consistent with all applicable privacy protection laws and policies).
B. Structure of the Work Plan
The GICSI work plan may be divided between those activities carried out under "Core and Continuing Programs" and "Special Projects".
1. Core and Continuing Programs
The suite of core and continuing programs consists of those programmatic functions that are essential to the success of the Initiative and that will operate continuously for its duration. These program elements comprise all strategic international functions, including:
- The core and continuing tasks identified above under the four principal objectives of the Initiative, including the online open access knowledge environment and its supporting infrastructure;
- An annual three-day GICSI Stakeholders' Conference;
- An annual two-day meeting of the Advisory Board, which will immediately follow the GICSI Stakeholders' Conference; and
- The Secretariat staff functions in support of the core and continuing programs.
2. Special Projects
Special projects are those undertaken in addition to core and continuing programs. Special projects are established by the CODATA Executive Board in consultation with the Secretariat and expert Advisory Board, and in response to sponsor interests. These may include commissioned papers by experts or the organization of various workshops, working groups, or seminars in well-defined topical areas. Possible special projects in the near term include:
- Development of a registry of open access data centers and networks;
- Operational implementation of Creative Commons copyright licenses for scientific information produced by the collaborating organizations of the Initiative and their constituencies, as well as through other affiliated stakeholder groups;
- Development and implementation of Creative Commons licenses for databases and data sets;
- Development of metrics and indicators of the benefits and costs of open access policies;
- The assessment and promotion of best practices for archiving data linked to publications;
- The development of a description language for scientific tools (so that search engines such as Google can be deployed by scientists looking for tools inside the commons);
- Commissioned white papers in support of issue areas identified under GICSI objective #1; and
- Support of specific regional (and perhaps national) initiatives.

