Month: November 2014

Collection of data is not the only problem

What the NSA has taught us is that mass surveillance is not as hard as people used to think. Other governments, and most commercial companies, do it too. With the advent of smartphones we’ve learned to ignore most of it for the sake of convenience, and most of the time that’s fine.

It’s true that bulk government surveillance can generate enough false positives to worry people, that Google and Facebook use your personal details to make a bucketload of money, and that others sell those details on, sometimes without you even realising it.

When you think of everything a government can do with your data, or all the money that big corporations are making from your personal information, it’s not surprising to ask: “where’s my share in this?”. Some people have even tried to estimate how much you would get for selling different types of personal information to corporations. But is that really the question we should be asking?

Should we be concerned with what data we leak and try to minimise it, or should we be thinking about what can actually be done with that information? Any answer will, of course, be a mix of both (since not every party collecting data is well intentioned or law abiding), but limiting the powers of governments and corporations can go a long way towards keeping the data useful without making it harmful.

Privacy

I have said this before and I still maintain my position that no one has ever had privacy. Parents have eavesdropped on their kids’ behaviour since the dawn of humanity as a way to raise them into responsible adults. The concept of “being responsible” has changed over the millennia, but parents have not.

Law-making and law-enforcing bodies have eavesdropping as their primary way of acquiring information. Since people normally only do bad things when no one is looking, expecting the police to rely only on highly visible methods of enquiry (such as asking in person or patrolling an area) becomes impossibly expensive very quickly. It is true that random checkpoints, fake speed cameras and signs do raise awareness, but that’s not optimal from a monetary point of view either.

Privacy also goes against common sense in the outside world. If you take a bus, everyone on that bus knows you’re there, even if they don’t know who you are. If there is a picture of you on the bus saying “wanted, dead or alive”, they will see you and report you. There’s little you can do, besides hiding and never showing your face again. Famous people (actors, etc.) have the same problem, and their solution is pretty much to hide.

Data

The data you “leak” is also the data that defines you. Where you have been, what you like, where you work and live, what food you eat and what you do on Saturdays. Collecting that data and providing a service on top of it is actually extremely beneficial to you. The problem is who has access to that information.

Tesco knows what I need to buy better than I do. They send me discount vouchers for fresh mozzarella, fresh basil and fresh tomatoes on the vine. They know I love Caprese salad, and I actually like Tesco knowing that, because I get a slightly cheaper Caprese salad once in a while.

Google Maps knows where I live and work, so when I’m going home I can just say “Ok Google, go home” and it does the rest. If I didn’t share that kind of information with Google, it would never be able to do what I want it to. Examples like that are everywhere, and each company must have access to a wide range of your data (location, shopping habits, browsing habits) to provide them. It’s an unavoidable fact of information theory: you need enough data (enough entropy) before any meaningful pattern can be found.

Legality

The real problem here is what companies end up doing with your data, and how well they protect it from malicious outsiders. Even if a company is benign, once it gets hacked, your bundle of personal data, enough to infer pretty accurate patterns about your personal life, is out there. Who knows what the attackers will do with it?

Another problem is blanket approval to bypass the legal system and arrest, judge and execute individuals based solely on bulk-surveillance patterns that are known to generate an immense number of false positives, not only because the algorithms are inexact, but because the people filtering the data and creating the rules don’t possess enough knowledge to know what they’re looking for in the first place.

Finally, what happens if the benign company that provides you an invaluable service is suddenly acquired by an unscrupulous one? Can the reach of the service widen based on the parent company’s privacy policy? Or is the data protected in the way source code is when it’s released as open source under, for example, the GNU GPL?

Solutions

So, a pragmatic view on surveillance should attack the problem of the legality of actions taken on data, not just the legality of acquiring data in the first place. The legal system can already cope with that: for instance, evidence obtained by illegal means (an unapproved wiretap or microphone) cannot be used against the accused. The “Patriot Act” changed all that in the US, and in other countries, and that’s the first thing that has to be changed back to a sane standard. Governments should never have the ability to bypass the judicial and executive systems based on *any* collected data, especially if it was collected in bulk and matched against irrelevant patterns.

Another topic that needs addressing is licences on data, especially data collected for the purpose of providing personal services. There are licences that cover open data, such as Creative Commons, but these cannot be applied to private data that a company has access to for the sole purpose of providing a service. Each company has a different privacy policy, and the EFF has great tools to monitor them all, but all of that depends solely on the company’s ethics.

A change of the board, or the managing directors, or even an acquisition, is enough to pervert the privacy policy and turn the data they already hold on you (which you can never delete) to their benefit. What we need is a data licence that is not open (since it’s private data), but that is protected in the same way against future changes.

There may be a case for more or less stringent licences (like GNU vs. BSD) for different uses, but once they’re standard licences, we wouldn’t need to read every company’s privacy policy every time it changes some minor wording: we’d know what kind of freedoms and guarantees we’re getting, and companies wouldn’t have the right to change them subversively.

Finally, there should be a guarantee in the licence that the company is required to store such data in a protected way, following a set of standard cryptographic techniques, and there should be a clause on how they would destroy the data at the first sign of intrusion. To avoid a total loss of service for all users, they should store such data in multiple locations, using different techniques and keys for each.
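
To make that last point concrete, here is a minimal sketch, assuming Python and the third-party `cryptography` package, of what “different keys per location” could look like: the shard names and the `destroy_on_intrusion` helper are purely illustrative, not a prescription for how any real company should implement it.

```python
# Minimal sketch: split a user's record into shards, encrypt each shard with
# its own key, and make "destroy the data on intrusion" as simple as
# discarding the affected key. Requires: pip install cryptography.
from cryptography.fernet import Fernet

# Hypothetical user record split by category, as if each part lived in a
# different storage location.
shards = {
    "location-history": b"home: London; work: Cambridge",
    "shopping-habits": b"mozzarella, basil, tomatoes on the vine",
    "browsing-habits": b"news, maps, recipes",
}

# One independent key per shard: compromising one store (or one key)
# exposes only that slice of the data.
keys = {name: Fernet.generate_key() for name in shards}
encrypted = {name: Fernet(keys[name]).encrypt(data) for name, data in shards.items()}

def read_shard(name: str) -> bytes:
    """Decrypt a single shard; only possible while its key still exists."""
    return Fernet(keys[name]).decrypt(encrypted[name])

def destroy_on_intrusion(compromised: str) -> None:
    """Crude 'destroy on intrusion' clause: drop the key for the affected
    store, leaving only unreadable ciphertext behind."""
    keys.pop(compromised, None)

print(read_shard("shopping-habits"))    # works while the key is held
destroy_on_intrusion("shopping-habits")
# read_shard("shopping-habits") would now fail: only ciphertext remains,
# while the other shards keep working, so the service degrades rather
# than disappearing entirely.
```

The point of the sketch is the separation: because each shard has its own key, the “destroy” clause can be enforced per location without taking the whole service down.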

It may seem daunting for small companies providing small services, but so did cheap, scalable storage and hosting until Amazon created AWS and all the others followed suit. If there is a demand, someone will create the solution. That has been the human response to everything since we came down from the trees to conquer the planet, and we won’t stop here.

Conclusion

It’s not the data that matters; it’s what governments and corporations can do with the data, and how we protect it from malicious parties.