Will Big Data DNA analysis herald new era in medicine?

Computer graphic of a DNA autoradiogram and a human head.

Often, you are barely aware of it, but hop on a train, spend some time in the shops, watch a movie or a match or visit your GP, and the chances are you will have contributed to half a dozen collections of mass data.

Governments and companies now collect, store and analyse as much information as they can about the way we interact with them.

Their goal is the pursuit of efficiency, and to find ways to save, or make, money. There is even a phrase for it - "big data".

The idea is not just to collect this data, but to analyse it.

Take healthcare. In December 2012, the government announced a big data plan for perhaps the most intimate of our data, the DNA read-out of 100,000 people with rare diseases and cancer.


It is a colossal sequencing effort. Not only does each patient have a unique DNA code, but so do their cancer tumours. And some patients will respond to certain drugs better than others, depending on the genetic variants they carry.

The claim is that a mass DNA database could herald a new era in medicine, and make the nation richer too. Aside from highlighting British innovation and attracting investment, the initial focus is to help people who are already sick.

For the rest of us, the argument goes, if enough people are on the database, trends will become clear.

So we could be more confident that our personal DNA read-out can be checked against those trends. It might warn us that we are more at risk of certain diseases, so that we can do something about it, like changing our lifestyle or getting screened.

We might also be able to avoid drugs known to be toxic in people who carry a similar genetic make-up to our own.

Prof Sir John Bell is one of the government-appointed "champions" for the Life Sciences industry, and chair of the government's Human Genomics Strategy Group. He sees genomics in the NHS as a vital tool and said it is quite a "dramatic change in the way that medicine is likely to evolve".

A graphic of DNA
The structure of DNA was discovered in 1953

The big data at the heart of this is the DNA double-helix.

It is made of four chemicals - essentially a code with four letters. The string of letters that spells out a human being is huge - it took about eight years, and cost billions of dollars, to unravel the first human genome.

But now, the computer technology that made that possible is far more powerful, and cheaper.

These days, it takes a little over a day to unravel the DNA sequence of a single individual. And though it is not yet possible, there is talk of a £60 price tag.

Aside from cheaper, more powerful technology, it is also scale that brings the real power.

If the plan takes off, then the sheer numbers of patients involved will allow researchers, both public and private, to ask all sorts of questions of the dataset.

The NHS already has big data projects in place, notably a system that enables scientists to carry out research on our clinical information once it has been anonymised, and smaller-scale genetics research databases such as UK Biobank. What is new is the idea of bringing all of this together.

Genetic testing
A national gene database might aid epidemiology

"The great thing about the UK, and particularly the English NHS, is 50 million people and it's at that scale that you're probably going to have the power to detect all kinds of things that are very powerful in terms of the management of disease, and have quite a profound impact," said Prof Bell.

He said he does not stand to make any money from the project himself, though he told us he sits on the board of Roche and Genentech, pharmaceutical companies which may benefit from genomics being applied more widely in healthcare in the UK.

There is some agreement that having genetic information from somebody who is already sick can help to find the best treatment for them.

What is less clear is how much the entire genetic read-out of a healthy person can tell you about the illnesses they might get in the future.

Just because someone carries a particular change in their genes, in most cases, it is far from definite that they will go on to become ill.

We are into the realm of probability and risk, which are notoriously difficult to assess and convey.

Identification by data

There is also the issue of privacy.

Ross Anderson, professor of security engineering at the University of Cambridge, has been asked by the Nuffield Council on Bioethics to join a team that will examine the pros and cons of big data and genomics in the NHS.

The government says the information in the new database will be anonymised. But if it is linked to medical records, that will bring new risks, according to Prof Anderson.

He said medical data is especially hard to protect because it is so rich in information. His primary concern is that individuals, and their data, could be identified by a process of triangulation: "If you look at the typical person's medical record they may have some things that are known to their friends and family, such as that you broke your leg on the 17 January 1991, and some things that you don't want all your friends and family to know, such as that you had a treatment for depression.

"The problem is that if you make de-identified medical records available, then everyone from whom the subject wants privacy knows part of the record - namely the leg break, which is enough to identify that record out of many records - and they can therefore get access to the sensitive information, namely the treatment for depression."

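The triangulation Prof Anderson describes can be illustrated with a small sketch. Everything in it is hypothetical: the records, the identifiers and the function names are invented for illustration and do not reflect how any NHS dataset is actually stored or protected. It assumes only that a de-identified record keeps its dated events and that someone who knows the patient knows one of those events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    date: str          # ISO date of the clinical event
    description: str   # what happened


# Hypothetical de-identified records: names are gone, events remain.
RECORDS = {
    "record-001": [
        Event("1991-01-17", "broken leg"),
        Event("2003-05-02", "treatment for depression"),
    ],
    "record-002": [
        Event("1991-01-17", "appendectomy"),
        Event("1998-09-30", "asthma review"),
    ],
    "record-003": [
        Event("1987-11-04", "broken leg"),
        Event("2010-02-11", "hip replacement"),
    ],
}


def matching_records(records, known_event):
    """Return the ids of all records that contain the known event."""
    return [rid for rid, events in records.items() if known_event in events]


def reidentify(records, known_event):
    """If the known event singles out exactly one record, return the
    other events in that record; otherwise return None."""
    matches = matching_records(records, known_event)
    if len(matches) != 1:
        return None  # zero or several candidates: no unique match
    return [e for e in records[matches[0]] if e != known_event]


# A friend knows only the date and nature of the leg break.
known = Event("1991-01-17", "broken leg")
exposed = reidentify(RECORDS, known)
```

In this toy dataset the leg break matches a single record, so `exposed` contains the depression treatment even though no name was ever stored. Knowing only a broken leg with the wrong date would match nothing, and a fact shared by several records would not single anyone out, which is why the richness of medical data matters to the argument.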
Prof Bell said there are already robust methods in place to protect people's privacy in medical research, which rely in part on limiting access to the data to trusted research partners.

"You probably can't get around the issue that no data in any setting is absolutely anonymised and secure," he said.

"But I think the constraints in the system that have already been thought about for other types of clinical data are probably pretty secure."

That is not enough for Prof Anderson, who wants the government to make details public.

"What we actually need is for anonymisation mechanisms to be open to the public, so that we can work out for ourselves whether the protection is adequate.

"I want to be able to test them. I want to be able to kick the tyres, and if the government's lying, I want to expose them, and embarrass them for it."

Bioethicist Stuart Hogarth, of King's College London, said he is not sure people are ready: "The 'grand bargain' that the government is offering us is that if we give them our DNA then they are going to revolutionise healthcare.

"Well it's not clear in fact that we need so much genomic data to understand the genetic basis of health and disease.

"It's not clear that the government has the capacity to put in place the large-scale IT project of the sort that would be necessary to do this, and it's not clear that the British public is willing to accept that bargain."