Dear Google, am I pregnant? (antipope.org)
131 points by cstross on March 3, 2014 | 56 comments


Side note: In the USA, doing this would be a federal offence under Title II of HIPAA.

It would probably depend on whether the consultancy got a BAA signed by Google. That would obligate Big Daddy G to a host of security and privacy requirements, but would extinguish the consultancy's HIPAA liability for the data while it was in Google's care. You can think of it sort of like cert-chaining: the government wants to see a chain of BAAs linking every subcontractor the data reaches back to the original covered entity (e.g. a hospital).

Source: I have signed a few and gotten a few signed. Talk to a lawyer if you need implementation advice on this, though.


There is another loophole, though admittedly an unlikely one (the post doesn't go into detail on what the actual records contain). If these records were somehow scrubbed of HIPAA identifiers, then this would in fact not be a HIPAA violation in the US. For example: a dataset of randomly assigned IDs tied to diagnosis codes. You could uniquely ID an individual within the dataset but not know who they were in the real world.
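
(For concreteness, a minimal sketch of what that kind of scrubbing might look like; the field names and record layout here are my invention, not anything from the actual dataset:)

    import uuid

    # Hypothetical raw layout; the real extract surely looked different.
    raw_records = [
        {"nhs_number": "9434765919", "postcode": "EH1 2NG",
         "dob": "1951-07-04", "diagnosis_code": "E11.9"},
    ]

    study_ids = {}   # real identifier -> random study ID
    scrubbed = []
    for rec in raw_records:
        sid = study_ids.setdefault(rec["nhs_number"], uuid.uuid4().hex)
        scrubbed.append({"study_id": sid,
                         "diagnosis_code": rec["diagnosis_code"]})
    # 'scrubbed' still links one patient's rows together, but carries
    # none of the direct identifiers from the original records.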

I hear all the privacy folks lighting their torches and sharpening the pitchforks. So, for the record, yes, there are all sorts of methods and studies that show you can potentially re-identify people from all sorts of data that seems at first blush to be not that identifiable [1] and isn't part of the list of HIPAA identifiers. However, in the US, in actual practice, when you talk to compliance people, they often take a very narrow view of what "identifiable" is. The standard is often that it has to be more or less trivial to do. For example, matching on easily accessible public records.
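
(A toy version of the linkage trick the studies in [1] describe, with entirely invented data; postcode plus date of birth plus gender is very often unique:)

    # "De-identified" medical rows joined to a public register (e.g. the
    # electoral roll) on quasi-identifiers. All values are made up.
    medical = [{"postcode": "EH1 2NG", "dob": "1951-07-04",
                "gender": "F", "diagnosis": "E11.9"}]
    register = [{"name": "Jane Doe", "postcode": "EH1 2NG",
                 "dob": "1951-07-04", "gender": "F"}]

    key = lambda r: (r["postcode"], r["dob"], r["gender"])
    names = {key(r): r["name"] for r in register}
    for row in medical:
        if key(row) in names:
            print(names[key(row)], "->", row["diagnosis"])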

I encounter this all the time in my capacity as a biomedical researcher and have discovered that my "geek intuition" about what is identifiable does me no good in this space. The craziest one is your DNA sequence. I'm having trouble finding the original document now, but Health and Human Services went out of their way to not make this a formal HIPAA identifier (except in very narrow cases relating to insurance companies) when they had the opportunity to do so during some recent rule-making. You would think it clearly would be one, since HIPAA allows for "other biometric identifiers", and what could be a better biometric than your DNA? But I digress...

One of the problems with HIPAA is that it leaves a lot to the eye of the beholder, and many beholders have wildly differing vision. This, as you state, is why you need a lawyer who can make sure your vision doesn't lead to decisions with a high probability of business-ending bankruptcy or jail time.

[1] http://arstechnica.com/tech-policy/2009/09/your-secrets-live...


From the article:

"And what they uploaded was the entire shooting match—full personal medical records indexed by NHS patient number—with enough additional data (post code, address, date of birth, gender) to make de-anonymizing the records trivial."

So I'm guessing that they are not properly scrubbed.


Thanks, I missed that on my initial read somehow.


> when you talk to compliance people, they often take a very narrow view of what "identifiable" is. The standard is often that it has to be more or less trivial to do.

Frankly, I would expect those compliance people to be disciplined for this obvious neglect of their duties.

(or whoever is responsible for the custom that 'identifiable' means 'identifiable to a 2 year old')

This reminds me strongly of ethically obviously-wrong tax avoidance schemes. "Yes, it's OK to pretend you're a used car salesman for tax purposes. There's nothing illegal about it." Let's get real.


This stuff isn't done in a vacuum by compliance offices. It's done with guidance from HHS. HIPAA has a lot of stuff that is not clearly defined, so it's important to keep within the spirit of the rule or HHS will come after you. The analogous healthcare loophole scenario you describe would not hold water with HHS.

Again, my perspective is from the biomedical research world, for which the HIPAA privacy rule gives certain limited affordances for communicating de-identified patient data to other institutions. Without that safety valve of de-identification being fairly reasonable, there are tons of research studies that could not go forward. There is a point where the very tiny risk of re-identification is vastly outweighed by the good of a research study going forward. This is what HHS and institutional review boards struggle with all the time.


> (or whoever is responsible for the custom that 'identifiable' means 'identifiable to a 2 year old')

You mean the American people? If you don't like that definition, propose a bill and get signatures.


While I agree with the overall thrust of the article, I have to take issue with the author's ostensibly-BigData-access-enabled disaster scenarios.

Because let's be honest... From which source would the data be more likely to be stolen to enable the author's scenarios? Google's infrastructure (post-hardening against intrusion by---among other entities---GCHQ)? Or the NHS's system (which has been proven to---among other things---hand data on every NHS patient in England and Wales out to contractors on 27 unauditable, non-remotely-deletable DVDs)?


While I agree with your overall point that Google's security is pretty strong, I have no reason to think they did anything other than copy the data to Google; presumably the original data is still hosted in their own infrastructure as well.


See the update to the post: apparently the data was made publicly accessible. It's not about Google per se; it's about what this says about the consultancy's laissez-faire attitude to handling this data.


The addendum to the post appears to be a totally unrelated NHS security breach, which actually strengthens the point made by the poster-above-but-one. Further down the page still, in the comments section, I saw this:

To quote from the report where PA admitted this is what they had done:

" ...upload it to the cloud using tools such as Google Storage and use BigQuery to extract data from it. As PA has an existing relationship with Google, we pursued this route (with appropriate approval). This showed that it is possible to get even sensitive data in the cloud and apply proper safeguards."

So they are claiming that (a) they had permission to do this and (b) steps were taken to secure it. It may also be that PA, via their partnership with Google, have access to a more secure version of BigQuery.

Don't get me wrong, I've sold proprietary data to big-name management consultants before and know they don't always respect licensing restrictions, but at the moment the suggestion that PA have actually done anything wrong here is pure conjecture.


I think Google has a direct line to the NSA, so yes, Google.


The article wasn't positing scenarios enabled by a nation's intelligence arm having access to the health records; it was positing burglary gangs, religious organizations, and insurance companies having (and abusing) access to the data. I suppose one could posit a scenario where an intelligence agency abuses access to the health records of citizens of another nation, but the scenarios highlighted in the article are not enabled by PA uploading the data to Google's servers. I'd even go so far as to argue that the specific scenarios highlighted in the article are LESS likely if PA uploaded the data to Google's service and then destroyed the originals; "27 DVDs" is a format far more likely to be stolen by a burglary gang.

Assuming the author's scenarios are enabled by NSA access assumes a much, much larger global conspiracy against the people of England and Wales than I'd be willing to assume.

(This thought process even grants the assumption that the NSA has some kind of clear-text access to the data in Google's services these days, which is also not an assumption I'd make.)


> I think Google has direct line to NSA,

Source?


The title is ironic, since Google certainly knows if you are pregnant based on your search history. Target knows if you are pregnant based on purchases of prenatal vitamins, and does special marketing:

http://www.forbes.com/sites/kashmirhill/2012/02/16/how-targe...


Ugh, this specific example has been used time and time again and it irritates the crap out of me. This is not some accomplishment of big data; figuring it out is trivial.

Example Google searches:

    > 1 month before conception

    best days to conceive
    conception calculator free
    best iphone conception reminder
    best pregnancy tester
    what to do when planning to concieve
    planning for baby

    > 1 month later

    signs that youre pregnant
    how many days after missed period should i check if i'm pregnant
    best pregnancy test 2014
    reliable pregnancy tests 2014

    > 2 days later

    best prenatal vitamins 2014
    obgyn in <city> <state>
    what to eat when pregnant
    can i eat eggs while pregnant
    medicine safe while pregnant
    when to tell family pregnant
    when to tell friends pregnant

I mean, come on! That's not an accomplishment of statistical analysis or big data.
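
(To underline the point, here's a hypothetical snippet that already flags the history above; no model, no statistics, nothing resembling big data:)

    PREGNANCY_TERMS = ("pregnant", "pregnancy", "conceive", "conception",
                       "prenatal", "obgyn")

    def looks_pregnant(history):
        # Crude substring matching over raw queries. It fires on every
        # example search in the list above.
        hits = [q for q in history
                if any(t in q.lower() for t in PREGNANCY_TERMS)]
        return len(hits) >= 3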

For Target, it's even easier considering that making a purchase is considered the holy grail of all of the Internet's efforts. I mean, when else are you going to be buying pregnancy tests and then prenatal vitamins within months of each other? Not to mention the other things like baby name books, What To Expect..., etc. that a future parent would buy at Target.

How about a real challenge like determining paternity via search histories?


Perhaps my sister is pregnant and she uses our family computer with a shared account?


Does "big data" mean anything? I feel like we are long past this phrase losing all meaning. How is 125GB "big"? This is still well within "keep it all in memory" sizes of data. Surely the indexes for this data could be kept in memory by commodity hardware with something as simple as MySQL as the storage engine.


Nowhere in the article did I see "big data", or "big" for that matter.


125Gb "after compression". Not sure what format the data was in, but it could be much bigger in raw form.


"considerably stronger than the implicit privacy rights acknowledged in the US, which are not enumerated directly by the US constitution"

The Constitution was meant to enumerate the powers of the government, not the people. Sadly, our courts have let us down and now we must rely on the enumeration even if the 10th amendment says "The powers not delegated to the United States by the Constitution, nor prohibited by it to the States, are reserved to the States respectively, or to the people."

Privacy was historically understood and thought to need no additional enumerations. Check the writings of the founding fathers and British common law at the time.


Their Data Protection Register submission confirms that "When this is needed information may be transferred to countries or territories around the world". For those not familiar with the UK Information Commissioner's Office (ICO) process: when submitting, you can choose between transferring data just within the UK, within the EEA, or worldwide.

http://ico.org.uk/ESDWebPages/DoSearch?reg=440927

http://ico.org.uk/ESDWebPages/DoSearch?reg=251239


Should be - "Dear US Govt. am I pregnant?"

It's entirely likely some three letter agency intercepted that upload and has all the data for itself.


On a vaguely related note, I recently discovered that "am i pregnant" seems to be a popular search term on Google: https://twitter.com/wundercounter/status/430175442432053248


Definitely bad behavior but I wonder how much of it is attributable to the cloud story that companies have been selling - that all storage and analysis belong in the cloud.


Funny thing is, Target or another retailer might know because of your purchase history. Would that be a violation under some laws currently on the books?


Charlie Stross being male, and the Target story having been massively publicised, I'm assuming that was something he initially wanted to point at.

Purchase history is by far the worst offender in that respect: you can predict diabetes, risky behaviour, alcoholism, pregnancy, emotional state (and marital stability), intention to move (via DIY equipment), even religion and ethnicity, which carry a similar moral and legal burden in Europe.

As far as I can tell, all the legal documents cover asking for, collecting, and processing explicit data. None seem to cover the case of a high correlation between "has bought unscented skin cream" and "recommend cribs, nappies and milk bottles".
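
(That "high correlation" is just lift from garden-variety market-basket analysis; a toy computation with invented numbers:)

    # Lift = P(pregnant | bought cream) / P(pregnant). Counts invented.
    shoppers = 1_000_000
    pregnant = 20_000               # base rate: 2%
    cream_buyers = 50_000
    pregnant_cream_buyers = 8_000   # 16% of cream buyers

    lift = (pregnant_cream_buyers / cream_buyers) / (pregnant / shoppers)
    print(lift)   # 8.0: cream buyers are 8x likelier to be pregnant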

What was interesting in the NYT piece about Target was how, even with that knowledge, being too explicit was considered horrible and invasive. Based on anecdotal reactions to Amazon's "people who bought that" suggestions from more than a decade ago (and the generally sucky obviousness of most of their suggestions), I'm assuming that a lot of the crunchy cases (say: blue books, box wine, dark chocolate and dildos) are censored to avoid public scandal and the regulation that would follow.

Based on experts' mobility and the strength of statistical trends in general, I guess the Target data scientist who was fired after the NYT piece was not guilty of spilling the beans, or of revealing to competitors like Walmart and Costco that Hadoop was the right choice (duh), but of letting the public know.



I'm fairly sure the Data Protection Act says it's OK to move data outside the UK, as long as the country you're moving it to has similar data protection laws.


That's only one of the data protection principles (principle 8). The others still apply, such as principle 7:

> Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.

... which I would say has pretty clearly not been followed in this case.

It's probably a breach of many of the other principles, too. For example, principle 3:

> Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.

... it's hard to see how the data used was not excessive for the purpose in question.

http://ico.org.uk/for_organisations/data_protection/the_guid...

http://ico.org.uk/for_organisations/data_protection/the_guid...


The document where PA Consulting announce their use of Google tools[1] ends that section with "For more information please email healthcare@paconsulting.com"

Just sayin'

1: https://www.google.co.uk/url?sa=t&source=web&rct=j&ei=oOwTU9...


I have never understood the reason medical records are extremely private. I'd be much more worried about my email/search history/SMS/etc. being published. My medical history I don't really care much about.

I understand that we don't want companies using our medical history to make decisions, but is HIPAA just a hangover of HIV/AIDS?

My instinct is that HIPAA has cost many lives by locking up data that could be used to help doctors be better doctors. If my kids were allergic to bee stings, I wish this information would show up as the first result of a Google search.


TFA pointed to some reasons:

"Random scenario: a burglary gang gains access to the database and can thereby identify patients aged over 80 living alone in up-market neighbourhoods who have recently been admitted to hospital with conditions suggesting that they will be vulnerable but not supported by full-time carers. A religious organization targets men of a certain age who are HIV positive. Or women below a certain age who are single and pregnant. Or an insurance company notes that a patient made a mistake in their declaration of a pre-existing condition, and thereby invalidates their claim. An identity thief uses the postcode and date of birth, in conjunction with a copy of the public electoral register, to pick victims. The possibilities are endless"

I can add a few more. A wife beater looks up his wife's records to see if she has gone into hospital. A rape victim doesn't want her work colleagues to idly google her medical records and find out. A hopeful job candidate doesn't want her interviewer to know about her problems with self-harm 15 years ago.

Perhaps none of those situations are relevant to you. But they apply to some people and that's why we keep all records private.
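
(And note how little "analysis" any of these need. Against a fully identified table, the first scenario is a single filter; this sketch uses invented field names:)

    from datetime import date

    # Hypothetical row shape for a fully identified record.
    patients = [{"name": "A. Example", "dob": date(1938, 5, 2),
                 "postcode": "EH1 2NG", "lives_alone": True,
                 "recently_admitted": True}]

    targets = [p for p in patients
               if date.today().year - p["dob"].year > 80
               and p["lives_alone"] and p["recently_admitted"]]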


I agree with those hypothetical possibilities. The other type of failure is also possible.

My dad recently had a detached retina. He spent 5 hours getting his own medical records. In the medical records, it said he was given a dose of Cipro. "Which is funny because I am deathly allergic to Cipro, and since I am alive I certainly wasn't given Cipro."

Had he not spent 5 hours getting medical records and writing letters, some future doctor could easily kill him based on that erroneous medical record. Other than retired people, who has the time to penetrate the HIPAA mess? It can be lifesaving.


I think it is easier to make private information public, than it is to make public information private.

We can mitigate that sort of failure with things like medical bracelets (http://www.mediband.co.uk/). Paramedics check for medical bracelets, necklaces and wallet information cards. One of the first questions any medical staff will ask is about allergies.

I recognize that that is hardly ideal and indeed some lives are undoubtedly lost because of it. But I think going public with medical records is too far.


Fully public is clearly too far. I just wish I could access my medical records like I access my own email.

As far as I understand, HIPAA regulations make this illegal to even think about. That is what I mean by "extreme".


> I just wish I could access my medical records like I access my own email.

Why do you think you can't? They can't actually be sent to you over unsecured email, but they certainly can be provided to you securely, and the access methods can be very similar to what you would do with email. I don't think anyone offers a data dump rather than a UI into a hosted product, but I don't see any legal reason under HIPAA why they couldn't.


> I understand that we don't want companies using our medical history to make decisions, but is HIPAA just a hangover of HIV/AIDS?

If by "HIPAA" you mean the privacy/security positions, no. First of all, HIV/AIDS (or even STDs more generally) is hardly the only medical concern people want kept private. The privacy and security provisions were designed to negate foreseen resistance to the administrative simplification provisions (incentives to adopt electronic transactions, national standards for form and content of electronic transactions, etc.) and thereby improve information sharing. (And just about every subsequent enhancement of the privacy and security provisions in the law has been tied to a new set of administrative simplification provision that extends new incentives to electronic transactions and records, extends the scope of mandatory standards for electronic transactions, etc.)

> My instinct is that HIPAA has cost many lives by locking up data that could be used to help doctors be better doctors. If my kids were allergic to bee stings, I wish this information would show up as the first result of a Google search.

One of the big motivations for HIPAA and its subsequent enhancements in terms of standardized health records and transactions is, to the extent this is relevant to doctors being doctors, making this the case.

Except not via Google or other systems that are publicly accessible for non-medical purposes and lack detailed accountability to you for who has accessed your data and for what purpose.


It is interesting that HIPAA is the opposite of what I thought it was about. It is motivated by sharing with privacy protections, rather than being motivated by privacy.

I can imagine that if I ever got leukemia, I would wish I could see thousands of patient records from people with a similar disease, rather than having to rely on a doctor who spills tons of bullshit based on anecdotal experience.


Medical information can be the most private information about people. There are plenty of situations with obvious stigmas - STDs, abortion, mental health as well as a near infinite list of things which could cause embarrassment or that people would want to keep private, from acne to miscarriage.

The last thing we want is for millions of people to stop going to the doctor because they're scared their classmates, coworkers and neighbors are going to know about their [rash, lump, depression, diarrhea, anxiety, psoriasis, pregnancy, etc].


You make a good point about people not seeking medical attention due to privacy concerns. I certainly don't think medical records should be public. It is the extreme privacy that I take issue with.

I mean, why the hell can't my doctor email me my blood work even if I ask?

How do we know we have struck the right balance?


Currently the privacy pendulum is all the way in the free-for-all territory.

- Consider that it takes only one breach for every other measure you've ever taken to protect the data to be wasted.

- Data can be used in the future in unforeseen ways; you don't know by whom or what for.

- Most people don't understand how databases work and think in terms of individual data ("who would care about X") and are therefore careless with their data.

It's the term "extreme privacy" that does not make sense in a time where you can't really have "basic privacy" any more.

Got any more insight into that topic? Or maybe do you underestimate the consequences of the ongoing collection and aggregation of data?


Email is not private and not secure. I'm not sure if the law makes exceptions for encrypted emails, but that's a moot point considering most doctors can barely use a computer.


But it's my privacy and security that's at issue. I don't think it's obvious why I shouldn't be able to waive those concerns.


> But it's my privacy and security that's at issue. I don't think it's obvious why I shouldn't be able to waive those concerns.

You could, in theory, consent to a disclosure of your PHI in insecure form to the doctor's email provider (or maybe to the whole chain between the doctor and you, but I don't think that would be necessary, since once it's been disclosed with your consent to someone not covered by HIPAA they can do anything they want with the data, including delivering it to you). But doctors would probably prefer a method that doesn't require that overhead, rather than manage the kind of consent tracking that insecure email would require (and would rather not deal with the PR fallout when someone who didn't really understand the implications got bitten by it).

The fact that you could, in theory, under the law consent to certain insecure disclosures to third parties doesn't mean that doctors are mandated to maintain the infrastructure to deal with every conceivable way you might want to do that. They are required to provide information to you under certain circumstances, and not to provide it to others except as allowed under the law.


Fair enough.


Most people cannot understand the impact of such a decision and must therefore be protected from making it.


That's one argument; I don't think it's an outstanding one. A sort of "traffic analysis" argument seems somewhat reasonable, too - if the only things kept secret are embarrassing or compromising, then learning of the existence of a secret communication tells you there's something embarrassing or compromising going on.


Protecting people from themselves has plenty of precedents - food industry, handrails...

Regarding your second point: Many things that are not embarrassing or compromising (in a negative sense) are rightfully kept secret.

Anyways, great topic, not so great subthread, I'm out.


"Many things that are not embarrassing or compromising (in a negative sense) are rightfully kept secret."

I don't intend any negative sense, just things that you have a particular reason to keep hidden. Allowing additional inference around that by being selective about channels seems like poor privacy practice.


True. The doctor can still be held liable for damages, so the law is actually there to protect both patient and doctor in this case.


Your medical records are a proxy to your current health. Thanks to big data, they are a proxy to your future health and that of your relatives too (and by extension, theirs to yours). Health factors can impact every relationship you have with anything or anyone, as well as your ability to perform any activity, and many of them are not reversible in the way that debts can be repaid or ideology can change. Therefore, if you believe in any need for privacy at all, then health is the most important privacy of all.


Because doctor-patient trust is vital. If people are afraid to tell their health care providers anything important, then both individual health care outcomes and public health will suffer.

That you don't care if people know your medical history is great, but you shouldn't project that to other people.

Examples of things where people have entirely legitimate concerns about how others might react: sexually transmitted diseases, addictions, chronic diseases, contagious diseases, terminal conditions, mental health issues, pregnancy, failed pregnancy, abortion, and anything predictive of future health issues. And examples of people that they might not want finding things out: parents, children, pastors, church members, neighbors, insurance companies, current employers, tabloid journalists, future employers.


You lack imagination. There are many people whose medical records contain much more sensitive information than "allergic to bee stings".


As an example US immigration are now sometimes turning away tourists with mental health problems. So in the absence of privacy if you have such problems you can either turn down treatment and suffer or risk being unable to visit or even transit via the US. http://www.telegraph.co.uk/news/worldnews/northamerica/canad...


I remember a similar issue when the Italian government decided that doctors should report treatment of illegal aliens. It just meant that a lot of illegal immigrants avoided seeking treatment, which posed a severe public-health risk with regard to contagious diseases.



