Health Data Palooza (#hdpalooza) — Hashtags of the Week (HOTW): (Week of June 2, 2014)

Health Data Palooza

“Health Datapalooza is a national conference focused on liberating health data, and bringing together the companies, startups, academics, government agencies, and individuals with the newest and most innovative and effective uses of health data to improve patient outcomes. … The hallmark of the event is a national competition that searches for the best and most innovative uses of health data in apps and products. The competition culminates in live demonstrations of the winning applications to Health Datapalooza attendees.”


The need for a smart approach to big health care data

From the Health Affairs Blog:

Today, academic medicine and health policy research resemble the automobile industry of the early 20th century — a large number of small shops developing unique products at high cost with no one achieving significant economies of scale or scope. Academics, medical centers, and innovators often work independently or in small groups, with unconnected health datasets that provide incomplete pictures of the health statuses and health care practices of Americans.

Health care data needs a “Henry Ford” moment to move from a realm of unconnected and unwieldy data to a world of connected and matched data with a common support for licensing, legal, and computing infrastructure. Physicians, researchers, and policymakers should be able to access linked databases of medical records, claims, vital statistics, surveys, and other demographic data. To do this, the health care community must bring disparate health data together, maintaining the highest standards of security to protect confidential and sensitive data, and deal with the myriad legal issues associated with data acquisition, licensing, record matching, and the Health Insurance Portability and Accountability Act of 1996 (HIPAA).

Just as the Model-T revolutionized car production and, by extension, transit, the creation of smart health data enclaves will revolutionize care delivery, health policy, and health care research. We propose to facilitate these enclaves through a governance structure known as a digital rights manager (DRM). The concept of a DRM is common in the entertainment (The American Society of Composers, Authors and Publishers or ASCAP would be an example) and legal industries. If successful, DRMs would be a vital component of a data-enhanced health care industry.

Read the complete blog post here.

ICPSR 2nd Annual Data Seal of Approval Conference

Second Annual Data Seal of Approval Conference
October 8, 2013, in Ann Arbor, MI
In cooperation with the Inter-university Consortium for Political and Social Research (ICPSR) Biennial Meeting of Official Representatives conference “Beyond Access: Curating Data for Discovery, Re-Use, and Impact,” October 9-11

Please join us for this year’s Data Seal of Approval conference, which is open to all:

Theme:      Data Seal of Approval Conference 2013

Date:         Tuesday, October 8, 2013

Location:    Ann Arbor, MI, on the campus of the University of Michigan

The Data Seal of Approval is an initiative to provide basic certification to data repositories. Receiving the DSA signifies that data are being safeguarded in compliance with community standards and will remain accessible into the future. The DSA and its quality guidelines are relevant to researchers, organizations that archive data, and users of the data.

We have an interesting program planned for the conference. Topics will include:

  • Information on the Data Seal of Approval, including how to apply for the DSA
  • The World Data System and how it compares with the DSA
  • Case studies
  • The Research Data Alliance (RDA) work on certification

Speakers will include experts from the field of digital preservation. The full DSA conference program can be found here:

Attendance at the DSA Conference 2013 is free of charge. Please register at

All DSA participants are also invited to attend the ICPSR Biennial Meeting of Official Representatives taking place October 9-11. You can register for that meeting at the above link as well. The meeting program is available at:

Pharma & Data Transparency

"self portrait" by shawnchin  (2005) CC BY-SA 2.0


I absolutely adore those instances when my job and my degree dovetail, so reading about the pharmaceutical industry’s latest proposal for more transparent data was quite interesting. The interplay between data transparency to validate the effectiveness of clinical trials and protecting the privacy of the patients involved in those trials is certainly proving to be a delicate balance to strike.

The New York Times reports:

Representatives of the world’s biggest pharmaceutical companies pledged … to release detailed data about their drugs to outside researchers, a move that was applauded by some but also seen as an effort to head off more extensive disclosure requirements that are under review in Europe.

Yet just a few months ago, the Pharmaceutical Research and Manufacturers of America (PhRMA) released a statement in response to Dr. Ben Goldacre’s Bad Pharma. The statement criticized the recommendations for transparent clinical trial data as “encourag[ing] second-guessing of the regulatory approval process, which would be disastrous for patients,” and warned that they could potentially “jeopardize patient privacy and could serve as a deterrent to individuals considering participation in trials.”

The New York Times goes on to point out that “proponents say doctors and patients need independent information — not just that provided by manufacturers — about the risks and benefits of drugs.” Meanwhile, there is already skepticism brewing in the wings. These developments should prove quite fascinating to follow in the months leading up to the proposed January 2014 adoption date.

MEPS data users’ workshop September 23-24, 2013

AHRQ will conduct a free two-day hands-on MEPS Data Users’ Workshop in Rockville, MD, on September 23-24, 2013.

Day 1 of this workshop will consist of lectures designed to provide a general overview of the Medical Expenditure Panel Survey (MEPS) including information about survey design, file content, and the construction of analytic files. Particular emphasis will be on health care utilization, expenditures, and medical conditions.

Day 2 of the workshop is intended to give hands-on experience to participants. A laptop computer will be provided to each participant. The participants will apply the knowledge gained from the previous day’s lectures and work with programmers and analysts on MEPS data. They will learn how to identify and pull together variables to build a data file to answer their research questions. SAS example exercises will be demonstrated. There will be time allotted for open discussion and for answering specific research questions from participants. To fully benefit from the second day, participants should have some prior knowledge of MEPS. A basic knowledge of SAS is desirable, but not essential.

More information and registration will be available by the end of July on the MEPS Web site:

Structured Data comes to Wikipedia

There’s been a lot of excitement recently about unlocking data’s potential. It seems I can’t turn around without seeing another article about Big Data (the buzzword of 2012/2013?). We’ve written about some of the data-related changes in the pipeline, for librarians particularly.

So when I received the latest press release from Wikipedia (via Wikimedia Announcements) that the “Wikidata revolution is here”, I wasn’t particularly surprised.

Although this is a decidedly different data beast than much of the research data (and its management) that we’ve written about before, I think the Wikidata project may serve as an interesting, illustrative example:

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”
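
To make the “entered once” idea concrete, here is a minimal Python sketch of a shared, language-independent record that several language editions read from. It is only an illustration of the principle described in the release, not Wikidata’s actual data model or API: the `Item` class, the `TEMPLATES` table, the `render` function, and the example country and names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A language-independent record (hypothetical, loosely inspired by a Wikidata item)."""
    item_id: str                      # made-up identifier, not a real Wikidata ID
    labels: dict[str, str]            # language code -> display name
    statements: dict[str, str] = field(default_factory=dict)  # property -> value


# Hypothetical per-language sentence templates; real article infoboxes are far richer.
TEMPLATES = {
    "en": "{label}: head of government is {value}.",
    "de": "{label}: Regierungschef ist {value}.",
    "fr": "{label} : le chef du gouvernement est {value}.",
}


def render(item: Item, lang: str, prop: str) -> str:
    """Build one language edition's sentence from the shared record."""
    return TEMPLATES[lang].format(label=item.labels[lang], value=item.statements[prop])


country = Item(
    item_id="Q-example",
    labels={"en": "Exampleland", "de": "Beispielland", "fr": "Exemplie"},
    statements={"head_of_government": "A. Person"},
)

# One edit to the shared record...
country.statements["head_of_government"] = "B. Successor"

# ...shows up in every language edition that reads from it.
pages = {lang: render(country, lang, "head_of_government") for lang in TEMPLATES}
for lang, text in pages.items():
    print(lang, "->", text)
```

The point of the sketch is the single update in the middle: no language edition stores its own copy of the fact, so none of them can fall out of date.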

It is easier to understand by looking at some actual examples (or, the “Further Information” section of the release):

From what I gather, this will facilitate data mining within Wikipedia immensely. Check out a couple of the incredibly early versions of what may come from this method of structuring data:

Are you an avid Wikipedian? Get involved!


The Legal and Policy Landscape for Research Data

The Inter-university Consortium for Political and Social Research (ICPSR) presents a webinar on April 25th entitled “The Legal and Policy Landscape for Research Data.”

Description: Intellectual property and associated policy issues surrounding data sharing, use and reuse can be tricky to understand and apply in academic research settings. In this Webinar, MacKenzie Smith will discuss the legal and policy landscape for research data, including clarifying laws governing data and explaining challenges and opportunities to improving data governance from a non-lawyer’s perspective.

About the presenter: The Webinar will be presented by MacKenzie Smith, University Librarian at UC Davis and research fellow for the Creative Commons organization, where she has worked extensively on data governance and intellectual property policy for data, and particularly scientific research data.

This webinar is free and open to the public.
Title: The Legal and Policy Landscape for Research Data
Date: Thursday, April 25, 2013
Time: 3:00 PM – 4:00 PM EDT

Space is limited.
Reserve your Webinar seat now at:

After registering you will receive a confirmation email containing information about joining the Webinar.
System Requirements
PC-based attendees
Required: Windows® 7, Vista, XP or 2003 Server
Mac®-based attendees
Required: Mac OS® X 10.6 or newer
Mobile attendees
Required: iPhone®, iPad®, Android™ phone or Android tablet

Curating & Managing Research Data

Effectively managing research data is no small feat, yet it is becoming an ever more important component in the research process. Last summer we wrote about the topics to watch in librarianship as reflected in the 2012 Medical Library Association conference, and data management was one of the star features.

Fortunately for us at the University of Michigan, we have a preeminent organization on campus dedicated to data management and (the coolest part, I think) re-use: the Inter-university Consortium for Political and Social Research, or ICPSR. ICPSR is offering a summer workshop on data curation and management:

Location: ICPSR — Ann Arbor, MI

Date(s): July 29 – August 2, 2013

Time: 9:00 AM – 5:00 PM

Instructor(s) include:

From ICPSR’s course announcement:

This workshop is for individuals interested or actively engaged in the curation and management of research data, particularly data librarians, data archivists, and data stewards with responsibilities for data management. The course will assist individuals working with data to apply efficient curation practices to ensure the usability and safekeeping of data resources. Participants will learn about best practices for managing research data, how to apply them to daily operations, and the types of tools that can assist in curation efforts.

This five-day workshop will explore and apply the concepts and benefits of life cycle principles for data curation, from preparing data for archiving to optimizing data for reuse. An ICPSR social science dataset will serve as a case study and participants will track the dataset as it makes its way through the ICPSR data pipeline. Examples from other scientific domains will also be integrated. Participants will learn about data review and preparation, confidential data management, effective documentation practices, how to create, comply with, and evaluate required data management plans, and repository requirements and assessment. Emphasis will be placed on hands-on exercises demonstrating curation practices and on small group discussions for sharing local experiences and learning from others.

An earlier version of this course, titled ‘Applied Data Science: Managing Research Data for Re-Use’, was offered in 2012.

Fee: Members = $1500; Non-members = $3000

NIH to recruit Associate Director for Data Science

Reposted from the NIH News:

For Immediate Release
Thursday, January 10, 2013
NIH Office of Communications

NIH to recruit Associate Director for Data Science
National Institutes of Health Director Francis S. Collins, M.D., Ph.D., today announced plans to recruit a new senior scientific position, the Associate Director for Data Science. The associate director will lead a series of NIH-wide strategic initiatives that collectively aim to capitalize on the exponential growth of biomedical research data, such as from genomics, imaging, and electronic health records. Dr. Collins recently charged a working group of the Advisory Committee to the NIH Director (ACD) to examine the growing data and informatics challenges associated with biomedical research. One of the major recommendations made by that working group in June 2012 is the creation of a new NIH leadership position focused on data science.

“There is an urgent need and increased opportunities for advanced collaboration and coordination of access to, and analysis of, the rapidly expanding collections of biomedical data,” Dr. Collins said. “NIH aims to play a catalytic lead role in addressing these complex issues — not only internally, but also with stakeholders in the research community, other government agencies, and private organizations involved in scientific data generation, management, and analysis.”

Dr. Collins has asked Eric Green, M.D., Ph.D., to serve as the Acting Associate Director for Data Science. Dr. Green was appointed as the third director of the National Human Genome Research Institute (NHGRI) in 2009. Dr. Green has been at the forefront of efforts to map, sequence, and understand eukaryotic genomes. He played a leadership role in the Human Genome Project and subsequently pioneered work in comparative genomics that provided important insights about genome structure, function, and evolution. Among his many honors, Dr. Green was inducted into the Association of American Physicians in 2007, and received the Cotlove Award from the Academy of Clinical Laboratory Physicians and Scientists in 2011 and the Wallace H. Coulter Lectureship Award from the American Association for Clinical Chemistry in 2012. He will continue to serve in his current role at NHGRI while serving in this acting leadership position…

Read the full news item here.

ICPSR Life of a Dataset Webinars

The Inter-university Consortium for Political and Social Research (ICPSR) announced its webinar series, “Life of a Dataset,” for January 2013. Read the full announcement below:

Join ICPSR for a webinar series titled “Life of a Dataset.” This 3-part series will describe ICPSR’s data management approach starting with the identification and deposit of data, to processing and cleaning, and finally to building metadata and tools to ready the data for discovery and dissemination.

Whether you are collecting data, planning on sharing (depositing) data to meet funding agency requirements, or interested in repository and archive management, you will find lots of informative content and guidance.  Q&A will be available during each session.

Interested in all three sessions?  Please note you must register for each session individually.

Life of a Dataset Part I: The Deposit – Wednesday, January 16, 2013, at 1 pm EST

Register here:

This stage in a dataset’s life will discuss and describe the opening steps it takes on the path to dissemination through the ICPSR Web site. We’ll cover the method of identifying data for acquisition, depositing of study materials (ingest), and the technical support that the Acquisitions unit provides to researchers. Tools for the researcher will be reviewed, and the tools that are used by this unit will also be shown.

Life of a Dataset Part II: The Process of Processing – Wednesday, January 23, 2013, at 1 pm EST

Register here:

This stage in a dataset’s life will describe the process of how the data are reviewed, cleaned, archived, and prepared for dissemination. The working relationship between the processor and the processing supervisor to ensure the confidentiality of respondents will be examined. The tools used by the processor to ensure data security and accuracy in processing will be discussed.

Life of a Dataset Part III: Discovery & Dissemination – Wednesday, January 30, 2013, at 1 pm EST

Register here:

This stage in a dataset’s life will relate how the study’s description is built and the metadata are enriched to aid discovery. We’ll consider how the codebook and other materials are created, and what final steps are taken to preserve the digital data in advance of their release. Finally, we will talk about how the study is released, announced, and disseminated through the ICPSR Web site.

Registration information regarding this webinar series is also summarized on ICPSR’s Announcements page: