charlesarthur : data   270

Excavating AI
Kate Crawford and Trevor Paglen:
<p>Images do not describe themselves. This is a feature that artists have explored for centuries. Agnes Martin creates a grid-like painting and dubs it “White Flower,” Magritte paints a picture of an apple with the words “This is not an apple.” We see those images differently when we see how they’re labeled. The circuit between image, label, and referent is flexible and can be reconstructed in any number of ways to do different kinds of work. What’s more, those circuits can change over time as the cultural context of an image shifts, and can mean different things depending on who looks, and where they are located. Images are open to interpretation and reinterpretation.

This is part of the reason why the tasks of object recognition and classification are more complex than Minsky—and many of those who have come since—initially imagined.

Despite the common mythos that AI and the data it draws on are objectively and scientifically classifying the world, everywhere there is politics, ideology, prejudices, and all of the subjective stuff of history. When we survey the most widely used training sets, we find that this is the rule rather than the exception.</p>


Great essay.
ai  data  machinelearning 
31 minutes ago by charlesarthur
HP printers try to send data back to HP about your devices and what you print • Robert Heaton
He thought he was just helping his in-laws set up their new printer:
<p>In summary, HP wants its printer to collect all kinds of data that a reasonable person would never expect it to. This includes metadata about your devices, as well as information about all the documents that you print, including timestamps, number of pages, and the application doing the printing (HP state that they do stop short of looking at the contents of your documents). From the HP privacy policy, linked to from the setup program:
<p>Product Usage Data – We collect product usage data such as pages printed, print mode, media used, ink or toner brand, file type printed (.pdf, .jpg, etc.), application used for printing (Word, Excel, Adobe Photoshop, etc.), file size, time stamp, and usage and status of other printer supplies. We do not scan or collect the content of any file or information that might be displayed by an application.

Device Data – We collect information about your computer, printer and/or device such as operating system, firmware, amount of memory, region, language, time zone, model number, first start date, age of device, device manufacture date, browser version, device manufacturer, connection port, warranty status, unique device identifiers, advertising identifiers and additional technical information that varies by product.</p>


HP wants to use the data they collect for a wide range of purposes, the most eyebrow-raising of which is for serving advertising. Note the last column in this “Privacy Matrix”, which states that “Product Usage Data” and “Device Data” (amongst many other types of data) are collected and shared with “service providers” for purposes of advertising.

HP delicately balances short-term profits with reasonable-man-ethics by only half-obscuring the checkboxes and language in this part of the setup.

At this point everything has become clear - the job of this setup app is not only to sell expensive ink subscriptions; it’s also to collect what apparently passes for informed consent in a court of law. I clicked the boxes to indicate “Jesus Christ no, obviously not, why would anyone ever knowingly consent to that”, and then spent 5 minutes Googling how to make sure that this setting was disabled.</p>


Thanks to dark patterns, it can be really hard to be certain that you have disabled these things. You're often navigating a chicane of tickboxes - just ticking all yes or all no won't sort it.
hp  data  printer 
6 days ago by charlesarthur
Fears of no-deal chaos as ministers forced to publish secret Brexit papers • The Guardian
Heather Stewart:
<p>A no-deal Brexit could result in rising food and fuel prices, disruption to medicine supplies and public disorder on Britain’s streets, according to secret documents the government was forced by MPs to publish on Wednesday.

A <a href="https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/831199/20190802_Latest_Yellowhammer_Planning_assumptions_CDL.pdf">five-page document spelling out the government’s “planning assumptions” under Operation Yellowhammer</a> – the government’s no-deal plan – was disclosed in response to a “humble address” motion.

The content of the document was strikingly similar to the plan leaked to the Sunday Times in August, which the government dismissed at the time as out of date.

That document was described as a “base case”; but the new document claims to be a “worst-case scenario”…

…The document, which says it outlines “reasonable worst case planning assumptions” for no deal Brexit, highlights the risk of border delays, given an estimate that up to 85% of lorries crossing the Channel might not be ready for a new French customs regime.

“The lack of trader readiness combined with limited space in French ports to hold ‘unready’ HGVs could reduce the flow rate to 40%-60% of current levels within one day as unready HGVs will fill the ports and block flow,” it warns.

This situation could last for up to three months, and disruption might last “significantly longer”, it adds, with lorries facing waits of between 1.5 days and 2.5 days to cross the border.</p>

Three months would be well into January, having extended through Christmas. Note that the government purposely put this out as a non-machine-readable PDF, scanned at an angle to make OCR harder. Pure pettiness. There's also a redacted part of the scenario - which journalist Rosamund Urwin got hold of weeks ago (the government then said it was "old") and which, <a href="https://twitter.com/RosamundUrwin/status/1171873763295682560">she says, relates to a forecast of thousands of job losses</a> due to fuel refinery closures, because the UK won't be able to export fuel to the EU.
brexit  plans  pdf  data 
12 days ago by charlesarthur
Ring has given ‘active camera’ maps of its customers to police • VICE
Caroline Haskins:
<p>Ring, Amazon’s home surveillance company, has consistently told Motherboard and other reporters that it does not share maps showing the exact locations of camera-owners with police.

However, a <a href="https://www.theguardian.com/technology/2019/aug/29/ring-amazon-police-partnership-social-media-neighbor">map published by The Guardian</a> last week reveals that Ring gave Georgia's Gwinnett County Police Department, located just northeast of Atlanta, an “active camera” map that shows hundreds of dots representing the locations of Ring owners in the region.

Now, emails and documents obtained from the police department by Motherboard provide additional context. The emails reveal that the image was one of two maps showing active Ring cameras in Gwinnett County. (One of the maps is slightly more zoomed-in than the other.)

The maps were provided several months before Ring donated 80 video doorbells to the county worth a total of $15,920, according to documents reviewed by Motherboard. The emails reviewed by Motherboard show the maps were shared with Gwinnett County in order to show that a Ring partnership would give them possible access to a large amount of data.

“Gwinnett County has an incredible amount of Ring devices and neighbors using the Ring app,” a Ring representative told Gwinnett County police. “At no cost, the portal can be an incredible asset to your agency Please let me know what you think.”</p>


I think it's the consumer-surveillance complex.
ring  police  map  data 
14 days ago by charlesarthur
Netflix’s biggest bingers get hit with higher internet costs • Los Angeles Times
Gerry Smith:
<p>James Wright had never worried about staying under his data cap.

Then he bought a 4K TV set and started binge-watching Netflix in ultra-high definition. The picture quality was impressive, but it gobbled up so much bandwidth that his internet service provider, Comcast Corp., warned that he had exceeded his monthly data limit and would need to pay more.

“The first month I blew through the cap like it was nothing,” said Wright, 50, who lives with his wife in Memphis, Tenn. With a 4K TV, he said, “It’s not as hard to go through as you’d think.”

All that bingeing and ultra-HD video can carry a high price tag. As online viewing grows, more subscribers are having to pay up for faster speeds. Even then, they can run into data limits and overage fees. Some opt for an unlimited plan that can double the average $52-a-month internet bill.

Wright is what the cable industry calls a power user — someone who chews through 1 terabyte of data or more each month. Though still rare, the number of power users has doubled in the past year as more families stream TV shows, movies and video games online. They should continue to grow as new video services from Walt Disney Co., AT&T, Apple and NBCUniversal arrive in coming months.

In the first quarter of this year, about 4% of internet subscribers consumed at least 1 terabyte of data — the limit imposed by companies such as Comcast, AT&T and Cox Communications Inc. That’s up from 2% a year ago, according to OpenVault, which tracks internet data usage among cable subscribers in the US and Europe.</p>


What's amazing is that the cable executives are even surprised by this. But of course they're going to gouge people for it.
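
For a sense of scale, assume Netflix's oft-quoted figure of roughly 7GB per hour for Ultra HD streaming (actual bitrates vary by title and connection) and do the arithmetic:

```python
# Back-of-envelope: how much 4K viewing fits under a 1 TB cap.
# The ~7 GB/hour Ultra HD figure is an assumption taken from Netflix's own
# guidance; real usage varies.
GB_PER_HOUR_4K = 7       # assumed Ultra HD streaming rate, GB per hour
CAP_GB = 1024            # a Comcast-style 1 TB monthly cap, in GB

hours_to_cap = CAP_GB / GB_PER_HOUR_4K
print(f"Hours of 4K viewing to hit the cap: {hours_to_cap:.0f}")
print(f"Over a 30-day month that is {hours_to_cap / 30:.1f} hours per day")
```

Call it roughly 146 hours: a household watching under five hours of 4K an evening blows through the cap, which is exactly Wright's experience.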
netflix  data  cable 
5 weeks ago by charlesarthur
Study: many of the “oldest” people in the world may not be as old as we think • Vox
Kelsey Piper:
<p>We’ve long been obsessed with the super-elderly. How do some people make it to 100 or even 110 years old? Why do some regions — say, Sardinia, Italy, or Okinawa, Japan — produce dozens of these “supercentenarians” while other regions produce none? Is it genetics? Diet? Environmental factors? Long walks at dawn?

A new working paper released on bioRxiv, the open access site for prepublication biology papers, appears to have cleared up the mystery once and for all: It’s <a href="https://www.biorxiv.org/content/10.1101/704080v1">none of the above</a>.

Instead, it looks like the majority of the supercentenarians (people who’ve reached the age of 110) in the United States are engaged in — intentional or unintentional — exaggeration.

The paper, by Saul Justin Newman of the Biological Data Science Institute at Australian National University, looked at something we often don’t give a second thought to: the state of official record-keeping.</p>


As the article (and paper) also shows, all the other places - Italy, Japan - with "supercentenarians" tend to have lousy records too.
age  centenary  data 
6 weeks ago by charlesarthur
Startup HYP3R saved Instagram users' stories and tracked locations • Business Insider
Rob Price:
<p>A combination of configuration errors and lax oversight by Instagram allowed one of the social network's vetted advertising partners to misappropriate vast amounts of public user data and create detailed records of users' physical whereabouts, personal bios, and photos that were intended to vanish after 24 hours.

The profiles, which were scraped and stitched together by the San Francisco-based marketing firm HYP3R, were a clear violation of Instagram's rules. But it all occurred under Instagram's nose for the past year by a firm that Instagram had blessed as one of its "preferred marketing partners."

On Wednesday, Instagram sent HYP3R a cease-and-desist letter after being presented with Business Insider's findings and confirmed that the startup broke its rules.

"HYP3R's actions were not sanctioned and violate our policies. As a result, we've removed them from our platform. We've also made a product change that should help prevent other companies from scraping public location pages in this way," a spokesperson said in a statement.

The existence of the profiles is a stark indication that more than a year after revelations that Facebook users' data was exploited by Cambridge Analytica to fuel divisive political ad campaigns, Facebook's struggles in locking down users' personal information not only persist but also extend beyond the core Facebook app…

…The total volume of Instagram data HYP3R has obtained is not clear, though the firm has publicly claimed to have "a unique dataset of hundreds of millions of the highest value consumers in the world," and sources say more than 90% of its data came from Instagram. It ingests in excess of 1 million Instagram posts a month, sources say.</p>


Will the US get sensible gun laws before it gets sensible data laws, or vice-versa?
data  instagram 
6 weeks ago by charlesarthur
Google ordered to halt human review of voice AI recordings over privacy risks • TechCrunch
Natasha Lomas:
<p>A German privacy watchdog has ordered Google to cease manual reviews of audio snippets generated by its voice AI. 

This follows a leak last month of scores of audio snippets from the Google Assistant service. A contractor working as a Dutch language reviewer handed more than 1,000 recordings to the Belgian news site VRT which was then able to identify some of the people in the clips. It reported being able to hear people’s addresses, discussion of medical conditions, and recordings of a woman in distress.

The Hamburg data protection authority told Google last month of its intention to use its Article 66 powers under the General Data Protection Regulation (GDPR) to begin an “urgency procedure”.</p>


Surprise: Google complied. It told Ars Technica that "Shortly after we learned about the leaking of confidential Dutch audio data, we paused language reviews of the Assistant to investigate. This paused reviews globally." No date for resumption.
google  voice  audio  data 
7 weeks ago by charlesarthur
Apple halts practice of contractors listening in to users on Siri • The Guardian
Alex Hern:
<p>Contractors working for Apple in Ireland said they were not told about the decision when they arrived for work on Friday morning, but were sent home for the weekend after being told the system they used for the grading “was not working” globally. Only managers were asked to stay on site, the contractors said, adding that they had not been told what the suspension means for their future employment.

The suspension was prompted by a report in the Guardian last week that revealed the company’s contractors “regularly” hear confidential and private information while carrying out the grading process, including in-progress drug deals, medical details and people having sex.

The bulk of that confidential information was recorded through accidental triggers of the Siri digital assistant, a whistleblower told the Guardian. The Apple Watch was particularly susceptible to such accidental triggers, they said. “The regularity of accidental triggers on the watch is incredibly high … The watch can record some snippets that will be 30 seconds – not that long, but you can gather a good idea of what’s going on.”</p>

One week from the original report to this change. That's impressive - more so given that Bloomberg had a weaker form of this report much earlier this year but didn't get anything like the detail. The power of newsprint: it makes a difference having something you can put on a chief executive's desk (even if you have to fly it out there).

Apple has indicated that it's eventually going to restart this, but on an opt-in basis.
apple  privacy  data  siri 
7 weeks ago by charlesarthur
We tested Europe’s new digital lie detector. It failed • The Intercept
Ryan Gallagher and Ludovica Jona:
<p>Prior to your arrival at the airport, using your own computer, you log on to a website, upload an image of your passport, and are greeted by an avatar of a brown-haired man wearing a navy blue uniform.

“What is your surname?” he asks. “What is your citizenship and the purpose of your trip?” You provide your answers verbally to those and other questions, and the virtual policeman uses your webcam to scan your face and eye movements for signs of lying.

At the end of the interview, the system provides you with a QR code that you have to show to a guard when you arrive at the border. The guard scans the code using a handheld tablet device, takes your fingerprints, and reviews the facial image captured by the avatar to check if it corresponds with your passport. The guard’s tablet displays a score out of 100, telling him whether the machine has judged you to be truthful or not.

A person judged to have tried to deceive the system is categorized as “high risk” or “medium risk,” dependent on the number of questions they are found to have falsely answered. Our reporter — the first journalist to test the system before crossing the Serbian-Hungarian border earlier this year — provided honest responses to all questions but was deemed to be a liar by the machine, with four false answers out of 16 and a score of 48. The Hungarian policeman who assessed our reporter’s lie detector results said the system suggested that she should be subject to further checks, though these were not carried out…

…The results of the test are not usually disclosed to the traveler; The Intercept obtained a copy of our reporter’s test only after filing a data access request under European privacy laws.</p>


Developed in the UK, and claims to pick up on "micro gestures" in facial expressions, etc. As if a virtual border agent viewing you through a webcam (which you probably won't look at) weren't weird enough already.
data  border  ai 
8 weeks ago by charlesarthur
You’re very easy to track down, even when your data has been anonymized • MIT Technology Review
Charlotte Jee:
<p>Researchers from Imperial College London and the University of Louvain have created a machine-learning model that <a href="https://nature.com/articles/s41467-019-10933-3">estimates exactly how easy individuals are to reidentify</a> from an anonymized data set. You can <a href="https://cpg.doc.ic.ac.uk/individual-risk/">check your own score</a> by entering your zip code, gender, and date of birth.

On average, in the US, using those three records, you could be correctly located in an “anonymized” database 81% of the time. Given 15 demographic attributes of someone living in Massachusetts, there’s a 99.98% chance you could find that person in any anonymized database.

“As the information piles up, the chances it isn’t you decrease very quickly,” says Yves-Alexandre de Montjoye, a researcher at Imperial College London and one of the study’s authors.

The tool was created by assembling a database of 210 different data sets from five sources, including the US Census. The researchers fed this data into a machine-learning model, which learned which combinations are more nearly unique and which are less so, and then assigns the probability of correct identification.

This isn’t the first study to show how easy it is to track down individuals from anonymized databases. A paper back in 2007 showed that just a few movie ratings on Netflix can identify a person as easily as a Social Security number, for example. However, it shows just how far current anonymization practices have fallen behind our ability to break them.</p>
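
The underlying intuition is easy to reproduce. This is not the authors' model (which estimates uniqueness statistically rather than counting it), just a toy pandas sketch showing how fast the share of exactly unique people climbs as attributes are added:

```python
# Toy illustration of reidentification risk: the more attributes you combine,
# the larger the share of people who are unique in the table.
# The data here is invented; the paper's model estimates this statistically.
import pandas as pd

df = pd.DataFrame({
    "zip":    ["02139", "02139", "02139", "10001", "10001", "94103"],
    "gender": ["F", "F", "M", "F", "M", "M"],
    "dob":    ["1980-01-02", "1975-06-30", "1980-01-02",
               "1990-12-12", "1990-12-12", "1964-03-03"],
})

for attrs in (["zip"], ["zip", "gender"], ["zip", "gender", "dob"]):
    group_sizes = df.groupby(attrs).size()          # people per attribute combination
    unique_share = (group_sizes == 1).sum() / len(df)
    print(attrs, f"-> {unique_share:.0%} of rows uniquely identified")
```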
data  anonymity 
8 weeks ago by charlesarthur
FaceApp reveals huge holes in today's privacy laws • The Atlantic
Tiffany C. Li:
<p>Regardless of origin, tech companies need to do better to protect the privacy of their consumers. Part of this is simply making users more aware of how data are being used. This is the rationale behind privacy policies. However, many users don’t read those policies. Developers need to go further and build actual privacy protections into their apps. These can include notifications on how data (or photos) are being used, clear internal policies on data retention and deletion, and easy workflows for users to request data correction and deletion. Additionally, app providers and platforms such as Apple, Microsoft, and Facebook should build in more safeguards for third-party apps.

But asking tech companies to make a few fixes will not be enough to solve the larger systemic problem, which is simply that our society hasn’t figured out how to deal with privacy in a way that actually protects individuals. The way we conceptualize privacy—by focusing, for instance, on the point at which a user decides to enter personal data into a website—is inadequate for the realities of today’s technology. Data are being collected all the time, often in ways that are all but impossible for consumers to know about. You cannot expect every traffic camera to include a privacy policy. Meanwhile, data sets are often sold, bought, aggregated, and transformed by third-party data brokers in ways unimaginable to consumers.</p>
faceapp  data 
9 weeks ago by charlesarthur
The plan to mine the world’s research papers • Nature
Priyanka Pulla:
<p>Over the past year, [American technologist Carl] Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and <a href="https://www.nature.com/articles/d41586-019-01978-x">generate useful scientific hypotheses</a>. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. </p>
science  data  offshore 
9 weeks ago by charlesarthur
The US, China, and case 311/18 on Standard Contractual Clauses • European Law Blog
Peter Swire:
<p>In the aftermath of the 2015 case [on Facebook transferring data to the US, which found against Facebook and invalidated those transfers], most companies that transfer data from the EU were left to rely on contract standards promulgated by the European Commission, called Standard Contractual Clauses (SCC).  The SCCs set strict requirements for handling personal data by the company that transfers the data.

The legality of SCCs is now before the CJEU, with a similar challenge to Privacy Shield awaiting the outcome of the first case.

A CJEU decision that invalidates SCCs would result in the prohibition of most transfers of personal data from the EU to the US. The case primarily concerns the quality of legal safeguards in the United States for government surveillance, especially by the NSA. (Note – I was selected to provide independent expert testimony on US law by Facebook; under Irish law, I was prohibited from contact with Facebook while serving as an expert, and I have played no further role in the litigation.)

A decision invalidating SCCs, however, would pose a terrible dilemma to EU courts and decisionmakers.

At a minimum, the CJEU might “merely” prohibit data flows to the US due to a finding of lack of sufficient safeguards, notably an insufficient remedy for an EU data subject who makes a subject access request to the NSA. The EU on this approach would continue to authorize the transfer of personal data to countries not directly covered by the Court decision, such as, for example, China.  This approach would be completely unjustified: it would prohibit transfers of data to the US, which has numerous legal safeguards characteristic of a state under the rule of law, while allowing such transfers toward China, where the protection of personal data vis-à-vis the government is essentially non-existent.</p>
data  privacy  europe  china 
10 weeks ago by charlesarthur
German privacy watchdog: Microsoft’s Office 365 cannot be used in public schools • WinBuzzer
Luke Jones:
<p>A data authority in the German State of Hesse has warned Microsoft’s Office 365 cannot be used in schools. Michael Ronellenfitsch, Hesse’s data protection commissioner, says the standard Office 365 configuration creates privacy issues.

He warned this week that data stored in the cloud by the productivity suite could be accessed in the United States. Specifically, personal information from teachers and students would be in the cloud. Ronellenfitsch says even if the data was held in centers in Europe, it is still “exposed to possible access by US authorities”.

The commissioner says public institutions in Hesse and across Germany “have a special responsibility with regard to the permissibility and traceability of the processing of personal data."…

…It is worth noting that Ronellenfitsch previously endorsed the use of Office 365 in schools. Back in 2017, he said schools can use the suite under certain conditions that match Germany’s data protection compliance laws. At the time, Microsoft was partnered with Deutsche Telekom and offering the “Germany Cloud” initiative that is now deprecated.</p>


This isn't an opportunity for Google or Apple: they don't meet the authority's criteria on privacy and data either.
privacy  data  microsoft 
10 weeks ago by charlesarthur
Florida DMV sells your personal information to private companies, marketing firms • ABC Action News
Adam Walser:
<p>In Idaho, [Tonia] Batson lived in a group home where someone else handled her finances, daily living and healthcare arrangements. She had no digital footprint because she can’t read or write.

That’s why [Batson’s sister and legal guardian Sonia] Arvin wanted to know how marketers got Batson’s personal information.

“The only one that had it was the DMV,” said Arvin. “Even if it’s a public record in Florida – if we tell them we want it private, it should be kept private.”

The state opened an investigation into Batson’s case after ABC alerted FHSMV officials.

That’s because Florida Department of Highway Safety and Motor Vehicles (FHSMV) said companies buying data on Floridians are not allowed to use that information for marketing.

But not every company plays by the rules.

The state told ABC it has banned data sales to three companies since 2017 for misusing driver and ID cardholder information.

The Florida Department of Highway Safety and Motor Vehicles told ABC that under the law, it must provide driver information but said federal privacy laws and its own rules limit how outside companies can access Floridians’ personal information.

One of the data brokers accessing Florida DMV information is Arkansas-based marketing firm Acxiom, which has an agreement with the state to buy driver and ID cardholder data for a penny a record.

On its website, Acxiom claims it has collected information from almost every adult in the United States.</p>

A penny per record. The incentive to flout the rules is worth far more than that, and the fines - if any are handed out (none are mentioned in the story) - are probably much lower.

US data privacy? It would be a nice idea. But if even the government is selling your data, people like Facebook could legitimately claim, Catch-22 style, that “everyone’s doing it, so I’d be a fool not to”.
Data  Florida  dmv  America 
10 weeks ago by charlesarthur
Brain, set and match! How Novak Djokovic and Co invest in intelligence to get edge over Wimbledon rivals • London Evening Standard
Matt Majendie:
<p>In some ways, [Craig] O’Shannessy [head of analysis company Golden Set Analytics] is like David up against Goliath. Golden Set Analytics, which came into being in 2012, is made up of economists, statisticians and mathematicians hailing from Harvard, Yale and Stanford. They are notoriously secretive, with company policy being “not to provide information about current clients or our services to them”.  In contrast, O’Shannessy, also the architect for Wimbledon quarter-finalist Alison Riske’s dismantling of his fellow Australian and world No1 Ashleigh Barty yesterday, said: “I failed maths in high school!”

But he understands percentages and has been a pioneer in research on rally length and the fact that 70% of points are won in rallies of up to four shots, 20% in five to eight and just 10% in nine shots or above. “The implications for the practice court are massive,” he said. “Why grind it out spending 90 per cent of your time on something that only happens 10 per cent of the match? That’s ludicrous. Analytics debunk the old theories of coaching. It’s like players never used to have a fitness coach, right now you don’t see that many players sitting around computers analysing their game and that of opponents. You’re in the job of winning matches and the Grand Slam prize money is massive so why wouldn’t you want to know an opponent’s strengths and weaknesses?

“And for me, I won’t always watch live. In the movie Moneyball, the manager doesn’t watch a lot live. I’ll watch in granular detail after and anyway, when the match is on I’m already looking at the opponent.”</p>


Hmm. When I was spending a lot of time reporting on tennis - which is about 30 years ago - analytics were already growing: forehand winners, backhand winners, and so on. But a single statistic will almost always predict the winner of a match: how many second serve points they win (whether serving or receiving). But how do you train to do that, exactly?

O'Shannessy's description sounds too simplistic; there's got to be a lot more to it than that. (A "golden set", by the way, is one you win without losing a point - 24 straight.) <a href="https://gamesetmap.com">This company</a>, which GSA bought, is clearly <a href="https://gamesetmap.com/?p=1261">doing interesting stuff</a>.
tennis  statistics  data 
10 weeks ago by charlesarthur
Over 1,300 Android apps scrape personal data regardless of permissions • TechRadar
David Lumb:
<p>Researchers at the International Computer Science Institute (ICSI) created a controlled environment to test 88,000 apps downloaded from the US Google Play Store. They peeked at what data the apps were sending back, compared it to what users were permitting and - surprise - <a href="https://www.ftc.gov/system/files/documents/public_events/1415032/privacycon2019_serge_egelman.pdf">1,325 apps were forking over specific user data they shouldn’t have</a>.

Among the test pool were “popular apps from all categories,” according to ICSI’s report. 

The researchers disclosed their findings to both the US Federal Trade Commission and Google (receiving a bug bounty for their efforts), though the latter stated a fix would only be coming in the full release of Android Q, according to CNET.

Before you get annoyed at yet another unforeseen loophole, those 1,325 apps didn’t exploit a lone security vulnerability - they used a variety of angles to circumvent permissions and get access to user data, including geolocation, emails, phone numbers, and device-identifying IMEI numbers.

One way apps determined user locations was to get the MAC addresses of connected WiFi base stations from the ARP cache, while another used picture metadata to discover specific location info even if a user didn’t grant the app location permissions. The latter is what the ICSI researchers described as a “side channel” - using a circuitous method to get data.

They also noticed apps using “covert channels” to snag info: third-party code libraries developed by a pair of Chinese companies secretly used the SD card as a storage point for the user’s IMEI number. If a user allowed a single app using either of those libraries access to the IMEI, it was automatically shared with other apps.</p>


Android Q isn't going to be universally adopted by any means. Data leaks are going to go on.
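
The picture-metadata route is worth spelling out, because it is so mundane: the location comes straight out of a photo's EXIF GPS tags, no location permission needed. The apps in the study did this in on-device code; this is just the same idea sketched in Python with Pillow (the filename is hypothetical):

```python
# Read GPS coordinates from a photo's EXIF metadata rather than asking for
# any location permission: the "side channel" the ICSI researchers describe.
from PIL import Image, ExifTags

def exif_gps(path):
    exif = Image.open(path)._getexif() or {}
    gps_raw = exif.get(34853, {})        # tag 34853 is GPSInfo
    return {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_raw.items()}

print(exif_gps("holiday_photo.jpg"))     # hypothetical geotagged photo
```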
android  data  privacy  security 
10 weeks ago by charlesarthur
Google still keeps a list of everything you ever bought using Gmail, even if you delete all your emails • CNBC
Todd Haselton:
<p>In May, I wrote up something weird I spotted on Google’s account management page. I noticed that Google uses Gmail to store a list of everything you’ve purchased, if you used Gmail or your Gmail address in any part of the transaction.

If you have a confirmation for a prescription you picked up at a pharmacy that went into your Gmail account, Google logs it. If you have a receipt from Macy’s, Google keeps it. If you bought food for delivery and the receipt went to your Gmail, Google stores that, too.

You get the idea, and you can see your own purchase history by going to Google’s Purchases page.

Google says it does this so you can use Google Assistant to track packages or reorder things, even if that’s not an option for some purchases that aren’t mailed or wouldn’t be reordered, like something you bought at a store.

At the time of my original story, Google said users can delete everything by tapping into a purchase and removing the Gmail. It seemed to work if you did this for each purchase, one by one. This isn’t easy — for years’ worth of purchases, this would take hours or even days of time.

So, since Google doesn’t let you bulk-delete this purchases list, I decided to delete everything in my Gmail inbox. That meant removing every last message I’ve sent or received since I opened my Gmail account more than a decade ago.

Despite Google’s assurances, it didn’t work.</p>
google  gmail  purchases  data  retention  surveillance 
11 weeks ago by charlesarthur
Why not to use two axes, and what to use instead • Chartable
Lisa Charlotte Rost:
<p>We believe that charts with two different y-axes make it hard for most people to intuitively make right statements about two data series. We recommend two alternatives strongly: using two charts instead of one and using indexed charts.

From time to time we get an email asking if it’s possible in Datawrapper to create charts with two different y-axes (also called double Y charts, dual axis charts, dual-scale data charts or superimposed charts). It is not – and we won’t add it any time soon. We’re sorry if that makes our user’s life harder, but we agree with the many chart experts[1] who make cases against dual axis charts. We hope you’ll hear us out.

We will first look at situations when people want to use dual axis charts, then we explain their problems, and afterward we’ll look at four alternatives</p>


This blogpost is referenced in the slightly wordier, but no less good (just harder to excerpt) <a href="https://digitalblog.ons.gov.uk/2019/07/03/dueling-with-axis-the-problems-with-dual-axis-charts/">blogpost from the Office for National Statistics</a> on the same topic. When the ONS comes out against dual axes, you know it's bad.
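
The "indexed chart" alternative that the Datawrapper post recommends is easy to build yourself: rebase each series to 100 at the first data point so both sit on one axis and their growth rates compare directly. A sketch with invented numbers:

```python
# Indexed chart: rebase each series to 100 at the start so one y-axis serves both.
# The figures are made up purely to illustrate the technique.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "year": range(2010, 2020),
    "revenue_m": [120, 130, 150, 160, 175, 190, 220, 240, 260, 300],
    "users_k":   [3.0, 3.2, 3.8, 4.5, 5.0, 6.2, 7.0, 8.5, 9.3, 11.0],
}).set_index("year")

indexed = df / df.iloc[0] * 100      # both series now start at 100

indexed.plot()
plt.ylabel("Index (first year = 100)")
plt.title("One axis, two comparable series")
plt.show()
```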
charts  data  visualisation 
11 weeks ago by charlesarthur
5G in Australia: supersonic speeds raise data consumption questions • CNET
Daniel Van Boom:
<p>That brings us to a more practical issue. As noted, Randwick was my first testing location. About 25 minutes in, after several speed tests, downloading PUBG and two movies from Netflix, I got an SMS. "You've used 50% of your 20GB data allowance," Telstra warned me. Uh oh.

The SIM card I was using was loaned to me by Telstra for testing, but 20GB isn't an unusually small amount. Telstra's fattest data plan offers 150GB for $70 (AU$100) a month, but the average Australian has a 10GB data limit, according to a 2018 Finder study. Most plans in Australia give you between 10 and 50GB of data. In the US, "unlimited" data plans tend to include up to about 75GB, or 100GB for Sprint's priciest plan, before internet speeds are throttled.

It will be impossible to burn through 50GB, let alone 150GB, just by using social media, answering emails and streaming YouTube on 4G. But with 5G speed comes incentive to, y'know, use 5G. When 5G speeds outpace home broadband by a significant margin, data will have to become cheaper for those blazing speeds to be convenient and truly useful. </p>


In the UK, the mobile company EE (owned by the landline monopoly BT) is the first with 5G. In my experience, it's also the stingiest with data allowances - or the priciest, which works out to the same thing. 5G is fast - though even these testers saw speeds fall during their tests.
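
To see why caps and 5G collide, assume (purely for illustration) a sustained 600Mbps connection and work out how long each plan lasts at full tilt:

```python
# Minutes to exhaust a mobile data plan at an assumed sustained 5G throughput.
# 600 Mbps is an illustrative figure, not Telstra's or EE's spec.
SPEED_MBPS = 600
PLANS_GB = [10, 20, 50, 150]

for cap_gb in PLANS_GB:
    seconds = cap_gb * 8000 / SPEED_MBPS   # GB -> megabits, then divide by Mbps
    print(f"{cap_gb:>4} GB plan gone in about {seconds / 60:.0f} minutes")
```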
5g  data  price 
12 weeks ago by charlesarthur
Personal details of 23m drivers given out by DVLA • The Times
Graeme Paton:
<p>The information watchdog is to hold an inquiry after the Driver and Vehicle Licensing Agency released the personal details of a record 23 million vehicle owners last year.

The Times has learnt that an unprecedented 63,600 records a day were handed to third parties including bailiffs and private investigators, often allowing motorists to be aggressively pursued for parking and toll road fines.

The DVLA charged organisations to obtain almost 7.8 million records, suggesting that it made £19.4m from the release of the data of almost two thirds of all vehicle owners in the UK.

Motoring groups called for an independent inquiry amid questions over how a data release on this scale could be properly policed, particularly in light of the rigorous new General Data Protection Regulation (GDPR) introduced across Europe last year.

There are fears that not all organisations that obtained the vehicle records did so legitimately, nor put them to a proper use.</p>
dvla  data  security  gdpr 
june 2019 by charlesarthur
Apple launches 'Sign in with Apple' button for apps, ‘no tracking’ login • 9to5 Mac
Benjamin Mayo:
<p>Apple announced a new Sign in with Apple button as part of its iOS 13 announcements. The button offers Apple ID single-sign on functionality similar to sign-in buttons from Twitter, Facebook or Google.

Apple is marketing this as a privacy-secure sign-in option. Apple will mask user email addresses and other personal information, whilst still allowing the apps to contact users indirectly.

Users select what information to share with the destination app. You can share your real email address with the third-party app, or use the ‘hide my email’ option to forward email onwards. In the latter case, the app would only see a random anonymous email address.

Of course, apps must update to integrate the ‘Sign in with Apple’ button. A lot of apps may not want to add the Apple ID login because they cannot access customer data they want.</p>

The logical expectation is that Apple will push it on its devices, so apps and sites may feel they need to support it. But with the tech landscape as it is, some will be reluctant to give up the data they can slurp up via a Google or Facebook login. Those sites and apps aren't on your side. They're on their own side.
Apple  data  privacy  signon 
june 2019 by charlesarthur
iPhone privacy is broken…and apps are to blame • WSJ
Joanna Stern:
<p>Congratulations! You’ve bought an iPhone! You made one of the best privacy-conscious decisions... until you download an app from Apple’s App Store. Most are littered with secret trackers, slurping up your personal data and sending it to more places than you can count.

Over the last few weeks, my colleague Mark Secada and I tested 80 apps, most of which are promoted in Apple’s App Store as “Apps We Love.” All but one used third-party trackers for marketing, ads or analytics. The apps averaged four trackers apiece.

Some apps send personal data without ever informing users in their privacy policies, others just use industry-accepted—though sometimes shady—ad-tracking methods. As my colleague Sam Schechner reported a few months ago (also with Mark’s assistance), many apps send info to Facebook, even if you’re not logged into its social networks. In our new testing, we found that many also send info to other companies, including Google and mobile marketers, for reasons that are not apparent to the end user.

We focused on the iPhone in our testing—largely because of Apple’s aggressive marketing of personal privacy. However, apps in Google’s Play Store for Android use the same techniques. In some cases, when it comes to providing on-device information to developers and trackers, Android is worse. Google recently updated its app permissions and says it is taking a deeper look at how apps access personal user information.</p>


Stern must be furious that her former colleague Geoff Fowler, now at the Washington Post, got ahead of her with the story - his appeared a day or two before hers - but it shows that we've become complacent about apps, and especially the third-party trackers they tend to incorporate.
apple  apps  data  privacy 
may 2019 by charlesarthur
Would you recognise yourself from your data? • BBC News
Carl Miller had the clever idea of getting all the data held about him, to see what it revealed - and whether it was accurate:
<p>About 1,500 of those pages were this kind of educated guesswork, all of it from companies I had never heard of before.

It's easy to find data on this scale a little alarming, but most of it I found more silly than sinister:<br />• The age of my boiler had been predicted<br />• My likelihood to be interested in gardening was 23.3%<br />• My interest in prize draws and competitions was 11%<br />• My "animal/nature awareness level" was low<br />• My consumer technology audience segmentation was described as (among other things) "young and struggling".<br />• My household was found to have no "regular interest in book reading" (I have written a book)<br />• At one moment I was a go-getter, an idea-seeker.<br />• Then I was a love aspirer, a disengaged worker, part of a group called budgeted stability or, simply, downhearted.<br />• Something I did triggered a "Netmums - women trying to conceive" event.

If this was a reflection of myself, I didn't recognise it.</p>


Not a very accurate picture, in other words. This is the world of "targeted" advertising?

And of course when he did try to get the data, in many cases he was directed to broken systems or told to send his request by snail mail. Though there's an argument that you want to make it a little harder to access that data than just downloading it, because otherwise it might be open to hackers.
data  personality  tracking  surveillance 
may 2019 by charlesarthur
For a longer, healthier life, share your data • The New York Times
Luke Miner is "a data scientist":
<p>There are a number of overlapping reasons it is difficult to build large health data sets that are representative of our population. One is that the data is spread out across thousands of doctors’ offices and hospitals, many of which use different electronic health record systems. It’s hard to extract records from these systems, and that’s not an accident: The companies don’t want to make it easy for their customers to move their data to a competing provider.

But there is also a fundamental problem with our health care privacy protections, primarily the Health Insurance Portability and Accountability Act, known as Hipaa.

Hipaa was passed in 1996, when artificial intelligence was largely the realm of science fiction movies and computer science dreams. It was intended to safeguard the privacy and confidentiality of patient records (as well as to improve the portability of health coverage when patients switched jobs).

But today one of the main effects of the law is to make it much harder for doctors and hospitals to share data with researchers. The fees they would have to pay for legal experts, statisticians and the other consultants needed to ensure compliance with the law are just too steep to bother.

Julia Adler-Milstein, the director of the Center for Clinical Informatics and Improvement Research at the University of California, San Francisco, told me that “the costs associated with sharing data for research purposes in a Hipaa-compliant way are beyond what many hospitals can justify.” She added, “The fines associated with a potential data breach are also a deterrent.”

These fines are a blunt instrument that don’t correspond to varying levels of harm, creating a climate of fear that discourages sharing. </p>


Obviously, the temptation is to say "you first, Luke." Show us how harmless having your health data shared with the world is, because this is a one-way valve: once the data goes in, it doesn't come out.
health  data 
may 2019 by charlesarthur
Your car knows when you gain weight - and much, much more • NY Times
Bill Hanvey:
<p>Today’s cars are equipped with telematics, in the form of an always-on wireless transmitter that constantly sends vehicle performance and maintenance data to the manufacturer. Modern cars collect as much as 25 gigabytes of data per hour, the consulting firm McKinsey estimates, and it’s about much more than performance and maintenance.

Cars not only know how much we weigh but also track how much weight we gain. They know how fast we drive, where we live, how many children we have — even financial information. Connect a phone to a car, and it knows who we call and who we text.

But who owns and, ultimately, controls that data? And what are carmakers doing with it?

The issue of ownership is murky. Drivers usually sign away their rights to data in a small-print clause buried in the ownership or lease agreement. It’s not unlike buying a smartphone. The difference is that most consumers have no idea vehicles collect data.

We know our smartphones, Nests and Alexas collect data, and we’ve come to accept an implicit contract: We trade personal information for convenience. With cars, we have no such expectation.

What carmakers are doing with the collected data isn’t clear. We know they use it to improve car performance and safety. And we know they have the ability to sell it to third parties they might choose. Indeed, Ford’s chief executive, Jim Hackett, has spoken in detail about the company’s plans to monetize car data.

Debates around privacy often focus on companies like Facebook. But today’s connected cars — and tomorrow’s autonomous vehicles — show how the commercial opportunities in collecting personal data are limitless.</p>

The commercial *desire* to collect personal data is limitless, especially in the US, where everyone and everything is viewed just as more grist for the ever-advancing maw.
Car  data 
may 2019 by charlesarthur
The terrifying potential of the 5G network • The New Yorker
Sue Halpern:
<p>A totally connected world will also be especially susceptible to cyberattacks. Even before the introduction of 5G networks, hackers have breached the control center of a municipal dam system, stopped an Internet-connected car as it travelled down an interstate, and sabotaged home appliances. Ransomware, malware, crypto-jacking, identity theft, and data breaches have become so common that more Americans are afraid of cybercrime than they are of becoming a victim of violent crime. Adding more devices to the online universe is destined to create more opportunities for disruption. “5G is not just for refrigerators,” Spalding said. “It’s farm implements, it’s airplanes, it’s all kinds of different things that can actually kill people or that allow someone to reach into the network and direct those things to do what they want them to do. It’s a completely different threat that we’ve never experienced before.”

Spalding’s solution, he told me, was to build the 5G network from scratch, incorporating cyber defenses into its design. Because this would be a massive undertaking, he initially suggested that one option would be for the federal government to pay for it and, essentially, rent it out to the telecom companies. But he had scrapped that idea. A later draft, he said, proposed that the major telecom companies—Verizon, AT&T, Sprint, and T-Mobile—form a separate company to build the network together and share it. “It was meant to be a nationwide network,” Spalding told me, not a nationalized one. “They could build this network and then sell bandwidth to their retail customers. That was one idea, but it was never that the government would own the network. It was always about, How do we get industry to actually secure the system?”</p>
mobile  privacy  data  5g 
april 2019 by charlesarthur
Health apps pose 'unprecedented' privacy risks • BBC News
<p>Using popular health apps could mean private information about medical conditions is not kept confidential, researchers warn.

Of 24 health apps in the BMJ study, 19 shared user data with companies, including Facebook, Google and Amazon.

It warns this could then be passed on to other organisations such as credit agencies or used to target advertising.

And data was shared despite developers often claiming they did not collect personally identifiable information.

Users could be easily identified by piecing together data such as their Android phone's unique address, the study says.

"The semi-persistent Android ID will uniquely identify a user within the Google universe, which has considerable scope and ability to aggregate highly diverse information about the user," wrote co-author Dr Quinn Grundy of the Lawrence S. Bloomberg Faculty of Nursing at the University of Toronto.
"These apps claim to offer tailored and cost-effective health promotion - but they pose unprecedented risk to consumers' privacy given their ability to collect user data, including sensitive information."

The authors conclude:
• doctors need to warn patients about the threat to their privacy from using such apps<br />• regulators should consider that loss of privacy is not a fair cost for the use of digital health services.</p>
Health  apps  surveillance  data  sharing 
april 2019 by charlesarthur
Boeing 737 MAX crash and the rejection of ridiculous data • Philip Greenspun’s Weblog
<p><a href="https://www.bbc.com/news/world-africa-47553174">“Boeing 737 Max: What went wrong?”</a> (BBC) contains a plot showing the angle of attack data being fed to Boeing’s MCAS software. Less than one minute into the flight, the left sensor spikes to an absurd roughly 70-degree angle of attack. Given the weight of an airliner, the abruptness of the change was impossible due to inertia. But to have avoided killing everyone on board, the software would not have needed a “how fast is this changing?” capability. It would simply have needed a few extra characters in an IF statement.

Had the systems engineers and programmers checked Wikipedia, for example, (or maybe even their own web site) they would have learned that “The critical or stalling angle of attack is typically around 15° – 20° for many airfoils.” Beyond 25 degrees, therefore, it is either sensor error or the plane is stalling/spinning and something more than a slow trim is going to be required.

So, even without checking the left and right AOA sensors against each other (what previous and conventional stick pusher designs have done), all of the problems could potentially have been avoided…</p>


As Greenspun points out, it would just be a little extra logic - if a reading is wildly impossible, then reality hasn't shifted; the sensor is wrong. The logic presently tends to assume the sensor is right.
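
In code, the check Greenspun has in mind really is just a bounds test. A sketch built only from the figures quoted above (stall at roughly 15-20 degrees, anything past about 25 degrees meaning a faulty sensor) - an illustration, not Boeing's actual MCAS logic:

```python
# Plausibility check on an angle-of-attack (AOA) reading: an impossible value
# means the sensor is wrong, so the automation should stand down rather than
# trim the nose. Thresholds are the ones quoted above, chosen for illustration.
STALL_ONSET_DEG = 15.0        # typical stall angle for many airfoils
MAX_PLAUSIBLE_AOA_DEG = 25.0  # beyond this, treat the reading as a sensor fault

def should_trim_nose_down(aoa_deg: float) -> bool:
    if abs(aoa_deg) > MAX_PLAUSIBLE_AOA_DEG:
        return False          # impossible reading: distrust the sensor, do nothing
    return aoa_deg > STALL_ONSET_DEG

print(should_trim_nose_down(70.0))   # False: the absurd spike is rejected
print(should_trim_nose_down(17.0))   # True: a credible high reading still acts
```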
boeing  data  737 
april 2019 by charlesarthur
CFIUS forces PatientsLikeMe into fire sale, booting Chinese investor • CNBC
Christina Farr and Ari Levy:
<p>[US startup] PatientsLikeMe is being forced to find a buyer after the U.S. government has ordered its majority owner, a Chinese firm, to divest its stake.

PatientsLikeMe provides an online service that helps patients find people with similar health conditions. In 2017, the start-up raised $100m and sold a majority stake to Shenzhen-based iCarbonX, which was started by genomic scientist Jun Wang and is backed by Chinese giant Tencent.

That deal has recently drawn the attention of the Committee on Foreign Investment in the United States (CFIUS), which is aggressively cracking down on Chinese investments in American companies, particularly when national security and trade secrets are at risk.

CFIUS is now forcing a divestiture by iCarbonX, meaning PatientsLikeMe has to find a buyer, according to several people with knowledge of the matter. PatientsLikeMe started receiving notifications from CFIUS late last year, said the people, who asked not to be named because the details are confidential.

The move could have dire implications for the start-up community, as Chinese investors are scared away or forbidden from participating in deals that can help emerging businesses.</p>


Also means CFIUS thinks that personal data is worth treating as a valuable national asset. That has big, big implications.
cfius  china  data 
april 2019 by charlesarthur
Mistakes, we’ve drawn a few • The Economist
Sarah Leo:
<p>At The Economist, we take data visualisation seriously. Every week we publish around 40 charts across print, the website and our apps. With every single one, we try our best to visualise the numbers accurately and in a way that best supports the story. But sometimes we get it wrong. We can do better in future if we learn from our mistakes — and other people may be able to learn from them, too.

After a deep dive into our archive, I found several instructive examples. I grouped our crimes against data visualisation into three categories: charts that are (1) misleading, (2) confusing and (3) failing to make a point. For each, I suggest an improved version that requires a similar amount of space — an important consideration when drawing charts to be published in print.</p>


This is good to see being done. I like this one best: <img src="https://cdn-images-1.medium.com/max/1600/1*9GzHVtm4y_LeVmFCjqV3Ww.png" width="100%" />

And its advice: "aim for leaving at least 33% of the plot area free under a line chart that doesn’t start at zero."
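
That rule is easy to encode if you draw your own charts: pick the y-axis minimum so the lowest value sits at least a third of the way up the plot. A matplotlib sketch with invented data:

```python
# Apply the "leave at least 33% of the plot free under the line" rule to a
# line chart whose axis doesn't start at zero. The series is invented.
import matplotlib.pyplot as plt

values = [61.2, 61.9, 62.4, 63.0, 63.8, 64.1, 64.9]

def ylim_with_free_space(values, free_fraction=0.33, top_pad_fraction=0.05):
    lo, hi = min(values), max(values)
    ymax = hi + (hi - lo) * top_pad_fraction
    # Solve (lo - ymin) / (ymax - ymin) >= free_fraction for ymin.
    ymin = (lo - free_fraction * ymax) / (1 - free_fraction)
    return ymin, ymax

plt.plot(range(2013, 2020), values)
plt.ylim(*ylim_with_free_space(values))
plt.title("Truncated axis, with a third of the plot free below the line")
plt.show()
```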
data  journalism  mistakes  visualisation 
march 2019 by charlesarthur
Guardian Mobile Firewall aims to block the apps that grab your data • Fast Company
Glenn Fleishman:
<p>A New York Times report in December focused on location data being shared with third-party organizations and tied to specific users; in February, a Wall Street Journal investigation reported that app makers were sharing events as intimate as ovulation cycles and weight with Facebook. But no matter how alarmed you are by such scenarios, there hasn’t been much you could do. Mobile operating systems don’t let you monitor your network connection and block specific bits of data from leaving your phone.

That led Strafach and his colleagues at Sudo Security Group to take practical action. “We are aware of almost every active tracker that is in the App Store,” he says. Building on years of research, Sudo is putting the finishing touches on an iPhone app called Guardian Mobile Firewall, a product that combines a virtual private network (VPN) connection with a sophisticated custom firewall managed by Sudo.

It looks like Guardian will be the first commercial entry into a fresh category of apps and services that look not just for malicious behavior, but also for what analysis shows could be data about you leaving your phone without your explicit permission. It will identify and variably block all kinds of leakage, based on Sudo’s unique analysis of App Store apps.

Sudo is <a href="https://itunes.apple.com/us/app/guardian-firewall/id1363796315?mt=8">taking preorders for the app in the App Store</a> and plans a full launch no later than June. It will debut on iOS, and required some lengthy conversations with Apple’s app reviewers as Sudo laid out precisely what part of its filtering happens in the app (none of it) and what happens at its cloud-based firewall (everything). The price will be in the range of a high-end, unlimited VPN—about $8 or $9 a month. Sudo plans an expanded beta program in April, followed by a production release that will be automatically delivered to preorder customers.


You'd need to be pretty worried about data grabs to pay that amount, wouldn't you? That's nearly a music subscription. Is your data *that* valuable? Wouldn't an adblocker be a lot cheaper?
sudo  data  privacy 
march 2019 by charlesarthur
How artificial intelligence is changing science • Quanta Magazine
Rachel Suggs:
<p>In a <a href="https://arxiv.org/pdf/1812.01114.pdf">paper</a> published in December in Astronomy & Astrophysics, Schawinski and his ETH Zurich colleagues Dennis Turp and Ce Zhang used generative modeling to investigate the physical changes that galaxies undergo as they evolve. (The software they used treats the latent space somewhat differently from the way a generative adversarial network [GAN] treats it, so it is not technically a GAN, though similar.) Their model created artificial data sets as a way of testing hypotheses about physical processes. They asked, for instance, how the “quenching” of star formation — a sharp reduction in formation rates — is related to the increasing density of a galaxy’s environment.

For [Galaxy Zoo creator Kevin] Schawinski, the key question is how much information about stellar and galactic processes could be teased out of the data alone. “Let’s erase everything we know about astrophysics,” he said. “To what degree could we rediscover that knowledge, just using the data itself?”

First, the galaxy images were reduced to their latent space; then, Schawinski could tweak one element of that space in a way that corresponded to a particular change in the galaxy’s environment — the density of its surroundings, for example. Then he could re-generate the galaxy and see what differences turned up. “So now I have a hypothesis-generation machine,” he explained. “I can take a whole bunch of galaxies that are originally in a low-density environment and make them look like they’re in a high-density environment, by this process.”  Schawinski, Turp and Zhang saw that, as galaxies go from low- to high-density environments, they become redder in color, and their stars become more centrally concentrated. This matches existing observations about galaxies, Schawinski said. The question is why this is so.

The next step, Schawinski says, has not yet been automated: “I have to come in as a human, and say, ‘OK, what kind of physics could explain this effect?’”</p>
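For a feel of that "hypothesis-generation machine" idea, here's a toy sketch of the encode-tweak-decode loop described above, using PCA on synthetic data as a stand-in for the paper's generative model; it is emphatically not the researchers' own pipeline.

```python
# Toy version of the latent-space trick: encode, nudge one latent dimension,
# decode, compare. PCA on random data stands in for the generative model used
# in the paper - an illustration only, not the authors' code.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
galaxies = rng.normal(size=(500, 64))      # pretend each row is a galaxy image

pca = PCA(n_components=8).fit(galaxies)    # a stand-in "latent space"
latent = pca.transform(galaxies)

tweaked = latent.copy()
tweaked[:, 0] += 2.0                       # imagine dimension 0 tracks environment density

original = pca.inverse_transform(latent)
modified = pca.inverse_transform(tweaked)
print("mean change per pixel:", np.abs(modified - original).mean())
```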


If you'd forgotten Galaxy Zoo, it was a <a href="https://www.theguardian.com/technology/2009/jan/15/internet-astronomy">crowdsourcing method of cataloguing galaxies</a>, launched 12 years ago. Now, the article says, you'd get it done by an AI system in an afternoon.
ai  data  science 
march 2019 by charlesarthur
Use and fair use: statement on shared images in facial recognition AI • Creative Commons
Ryan Merkley on questions about the legality of IBM's use of a ton of Creative Commons-licensed photos to train facial recognition systems:
<p>While we do not have all the facts regarding the IBM dataset, we are aware that fair use allows all types of content to be used freely, and that all types of content are collected and used every day to train and develop AI. CC licenses were designed to address a specific constraint, which they do very well: unlocking restrictive copyright. But copyright is not a good tool to protect individual privacy, to address research ethics in AI development, or to regulate the use of surveillance tools employed online. Those issues rightly belong in the public policy space, and good solutions will consider both the law and the community norms of CC licenses and content shared online in general.

I hope we will use this moment to build on the important principles and values of sharing, and engage in discussion with those using our content in objectionable ways, and to speak out on and help shape positive outcomes on the important issues of privacy, surveillance, and AI that impact the sharing of works on the web.</p>

There's also a new FAQ, which includes this: "If someone uses a CC-licensed work with any new or developing technology, and if copyright permission is required, then the CC license allows that use without the need to seek permission from the copyright owner so long as the license conditions are respected. This is one of the enduring qualities of our licenses — they have been carefully designed to work with all new technologies where copyright comes into play. No special or explicit permission regarding new technologies from a copyright perspective is required."

In other words: you licensed it. You can't unilaterally revoke the licence. (Perhaps there'll be a new CC variant - "no AI".)
Data  training  licensing  copyright 
march 2019 by charlesarthur
Eero is now officially part of Amazon, pledges to keep network data private • The Verge
Nilay Patel:
<p>concerns that Amazon would somehow make expanded use of Eero network data have been growing ever since the deal was announced — obviously, your Wi-Fi router can see all your network traffic, and Eero’s system in particular relies on a cloud service for network optimization and other features. But Eero is committed to keeping that data private, said [Eero CEO Nick] Weaver, who also <a href="https://blog.eero.com/as-we-join-the-amazon-family-were-accelerating-our-mission-to-deliver-perfect-connectivity-in-every-home/">published a blog post</a> this morning that explicitly promises Eero will never read any actual network traffic.

“If anything, we’re just going to strengthen our commitment to both privacy and security,” Weaver told us. “We’ve got some pretty clear privacy principles that we’ve used for developing all of our products, that are the really the underpinnings of everything. Those aren’t going to change.”

Those three principles, as laid out in the blog post, are that customers have a “right to privacy” that includes transparency around what data is being collected and control over that data; that network diagnostic information will only be collected to improve performance, security, and reliability; and that Eero will “actively minimize” the amount of data it can access, while treating the data it does collect with “the utmost security.”</p>


Never is a long time; there was a time when Nest was never going to be integrated into Google. A more proximate worry for a smaller group of people is whether it's going to keep advertising on podcasts.
amazon  eero  data  privacy 
march 2019 by charlesarthur
No, data is not the new oil • WIRED
Antonio García Martínez:
<p>[Imagine that] Amazon sends a delivery van to my home filled with hard drives containing all its sales and user browsing data for the past year. What do I do with it?

Keep in mind, this trove is worth billions. Accounting rules don’t call (yet) for tech companies to specify their data as a separate asset on the balance sheet, but by any reasonable valuation, Amazon’s purchase data is worth an immense fortune … to Amazon.

That’s because Amazon has built an expansive ecommerce presence, a ruthlessly efficient recommendation and advertising engine, and a mind-bogglingly complex warehouse and fulfillment operation around the data on those hard drives. Ditto Google, Uber, Airbnb, and every other company you’d identify as an “oil field” in this tired metaphor.

Sure, you could maybe sell some of that data—there are companies that would love to know Amazon’s sales data or Google’s search queries or Uber’s routing and pricing history. But here’s the key thing: Those interested outside parties are competitors, and the owners of the data would never in a million years sell it. [Unlike oil, sold between multiple sellers and buyers,] Uber isn’t selling data to Lyft, Amazon isn’t selling data to Walmart, and Airbnb sure isn’t selling user lists to Hotels.com…

…The annual revenue per user for Facebook globally is about $25. In the US and Canada, it’s about $130. Don’t spend it all in one place.

That’s even assuming it’s owed to you by Facebook. Many of the high-value ad placements, such as that creepy ad for the product you browsed but didn’t buy on the web somewhere, are driven by data that Facebook doesn’t own. That outside party, be it Zappos or Walgreens, engages in some data-joining acrobatics to tell Facebook whom to show ads to, but the data itself isn’t shared with the social network; advertisers don’t trust Facebook either.</p>
data  oil  definition 
february 2019 by charlesarthur
Huawei frightens Europe's data protectors; America does, too • Bloomberg
Helen Fouquet and Marie Mawad:
<p>The Cloud Act (or the “Clarifying Lawful Overseas Use of Data Act”) addresses an issue that came up when Microsoft in 2013 refused to provide the FBI access to a server in Ireland in a drug-trafficking investigation, saying it couldn’t be compelled to produce data stored outside the US.

The act’s extraterritoriality spooks the European Union - an issue that’s become more acute as trans-Atlantic relations fray and the bloc sees the US under Trump as an increasingly unreliable ally.

Europe may seek to mitigate the impact of the law by drawing on a provision in the act that allows the US to reach “executive agreements” with countries allowing a mutual exchange of information and data. The European Commission wants the EU to enter into talks with the US, and negotiations may start this spring.

France and other EU countries like The Netherlands and Belgium are pushing for the bloc to present a common front as they struggle to come up with regulations to protect privacy, avert cyber attacks and secure critical networks in the increasingly amorphous world of information in the cloud.

A Dutch lawmaker at the European Parliament, Sophie in ’t Veld, recently expressed frustration at what she called the EU’s “enormous weakness” in the face of the US’s “unlimited data hunger.”

“Because of the Cloud Act, the long arm of the American authorities reaches European citizens, contradicting all EU law,” she said. “Would the Americans accept it if the EU would grant itself extraterritorial jurisdiction on US soil? And would the Commission also propose negotiations with Russia or China, if they would adopt their own Russian or Chinese Cloud Act?"</p>

Got to love the tortuous de-acronymisation of American legislation (can anyone recall what the "Patriot" bit in the Patriot Act stands for?). The US has acted extraterritorially in the UK for as long as I've been writing about computing, which is a very long time. What's changed is the EU's willingness to block it, legally.

(Minor stylistic niggle: Bloomberg writes "U.S." for United States but "EU" for European Union. Both are abbreviations. Why only dots for one? Extraterritorial punctuation? Anyhow, I remove them.)
America  data  legislation  eu 
february 2019 by charlesarthur
The deadly truth about a world built for men – from stab vests to car crashes • The Guardian
Caroline Criado-Perez:
<p>The average smartphone size is now 5.5 inches. While the average man can fairly comfortably use his device one-handed, the average woman’s hand is not much bigger than the handset itself. This is obviously annoying – and foolish for a company like Apple, given that research shows women are more likely to own an iPhone than men.

The tech journalist and author James Ball has a theory for why the big-screen fixation persists: because the received wisdom is that men drive high-end smartphone purchases. But if women aren’t driving high-end smartphone purchases – at least for non-Apple products – is it because women aren’t interested in smartphones? Or could it be because smartphones are designed without women in mind? On the bright side, Ball reassured me that screens probably wouldn’t be getting any bigger because “they’ve hit the limit of men’s hand size”.

Good news for men, then. But tough breaks for women like my friend Liz who owns a third-generation Motorola Moto G. In response to one of my regular rants about handset sizes she replied that she’d just been “complaining to a friend about how difficult it was to zoom on my phone camera. He said it was easy on his. Turns out we have the same phone. I wondered if it was a hand-size thing.”

When Zeynep Tufekci, a researcher at the University of North Carolina, was trying to document tear gas use in the Gezi Park protests in Turkey in 2013, the size of her Google Nexus got in the way. It was the evening of 9 June. Gezi Park was crowded. Parents were there with their children. And then the canisters were fired. Because officials “often claimed that tear gas was used only on vandals and violent protesters”, Tufekci wanted to document what was happening. So she pulled out her phone. “And as my lungs, eyes and nose burned with the pain of the lachrymatory agent released from multiple capsules that had fallen around me, I started cursing.” Her phone was too big. She could not take a picture one-handed – “something I had seen countless men with larger hands do all the time”. All Tufekci’s photos from the event were unusable, she wrote, and “for one simple reason: good smartphones are designed for male hands”.</p>


This is the topic of Criado-Perez's new book; the whole article is a fascinating tour through biases you probably didn't know exist (if you're male). It's certainly puzzling why Apple, and others, don't persist with the SE-sized phone: women I know love them.
women  data  workplace  statistics  bias 
february 2019 by charlesarthur
You give apps sensitive personal information. Then they tell Facebook • WSJ
Sam Schechner and Mark Secada:
<p>Apple Inc. and Alphabet Inc.’s Google, which operate the two dominant app stores, don’t require apps to disclose all the partners with whom data is shared. Users can decide not to grant permission for an app to access certain types of information, such as their contacts or locations. But these permissions generally don’t apply to the information users supply directly to apps, which is sometimes the most personal.

In the Journal’s testing, Instant Heart Rate: HR Monitor, the most popular heart-rate app on Apple’s iOS, made by California-based Azumio, sent a user’s heart rate to Facebook immediately after it was recorded.

Flo Health Inc.’s Flo Period & Ovulation Tracker, which claims 25 million active users, told Facebook when a user was having her period or informed the app of an intention to get pregnant, the tests showed.

Real-estate app Realtor.com, owned by Move Inc., a subsidiary of Wall Street Journal parent News Corp, sent the social network the location and price of listings that a user viewed, noting which ones were marked as favorites, the tests showed.

None of those apps provided users any apparent way to stop that information from being sent to Facebook.

Facebook said some of the data sharing uncovered by the Journal’s testing appeared to violate its business terms, which instruct app developers not to send it “health, financial information or other categories of sensitive information.” Facebook said it is telling apps flagged by the Journal to stop sending information its users might regard as sensitive. The company said it may take additional action if the apps don’t comply…

…Flo Health’s privacy policy says it won’t send “information regarding your marked cycles, pregnancy, symptoms, notes and other information that is entered by you and that you do not elect to share” to third-party vendors.

Flo initially said in a written statement that it doesn’t send “critical user data” and that the data it does send Facebook is “depersonalized” to keep it private and secure.

The Journal’s testing, however, showed sensitive information was sent with a unique advertising identifier that can be matched to a device or profile. A Flo spokeswoman subsequently said the company will “substantially limit” its use of external analytics systems while it conducts a privacy audit.</p>


Just astonishing. Facebook can't help itself; the companies can't help themselves. They're all in thrall to the promise, whether real or not, that gathering more personal data will lead to riches through targeted ads. When in reality the ads just creep us out. And this may be illegal under the GDPR, in Europe at least.
facebook  privacy  apps  data  gdpr 
february 2019 by charlesarthur
Differential privacy: an easy case • Substack
Mark Hansen:
<p>By law, the Census Bureau is required to keep our responses to its questionnaires confidential. And so, over decades, it has applied several “disclosure avoidance” techniques when it publishes data — these have been meticulously catalogued by Laura McKenna, going back to the 1970 census.

But for 2020, the bureau will instead release its data tables using a “formal privacy” framework known as “differential privacy.”

A unique feature of this new approach is that it explicitly quantifies privacy loss and provides mathematically provable privacy guarantees for those whose data are published as part of the bureau’s tables. 

Differential privacy is simply a mathematical definition of privacy. While there are legal and ethical standards for protecting our personal data, differential privacy is specifically designed to address the risks we face in a world of “big data” and “big computation.”

Given its mathematical origins, discussions of differential privacy can become technical very quickly.</p>


Apple and Google use this to make it harder to de-anonymise personal data. This is quite a long post, but it explains it while sticking to quite simple maths.
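If you want the flavour without the full post, the classic warm-up example is randomised response: each person effectively flips a coin before answering a sensitive question, so no single answer can be trusted, yet the aggregate can still be estimated. A minimal sketch of that textbook example (not the Census Bureau's actual 2020 mechanism):

```python
# Randomised response: the textbook warm-up for differential privacy.
# Individual answers are deniable; the aggregate is still recoverable.
# Classroom illustration only - not the Census Bureau's machinery.
import random

def randomized_response(truth: bool) -> bool:
    if random.random() < 0.5:
        return truth                      # answer honestly half the time
    return random.random() < 0.5          # otherwise answer at random

population = [random.random() < 0.3 for _ in range(100_000)]   # 30% true "yes"
reported = [randomized_response(t) for t in population]

# P(report yes) = 0.5 * true_rate + 0.25, so invert to estimate the true rate.
reported_rate = sum(reported) / len(reported)
print("estimated true rate:", round((reported_rate - 0.25) / 0.5, 3))
```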
privacy  data  bigdata  census 
february 2019 by charlesarthur
Tracking sanctions-busting ships on the high seas • BBC News
Chris Baraniuk:
<p>For a long time, being out at sea meant being out of sight and out of reach. And all kinds of shenanigans went on as a result - countries secretly selling oil and other goods to countries they're not supposed to under international sanctions rules, for example, not to mention piracy and kidnapping.

The problem is that captains can easily switch off the current way of tracking ships, called the Automatic Identification System (AIS), hiding their location.

But now thousands of surveillance satellites have been launched into space, and artificial intelligence (AI) is being applied to the images they take. There's no longer anywhere for such ships to hide.

Samir Madani, co-founder of TankerTrackers.com, says his firm's satellite imagery analysis has identified Iranian tankers moving in and out of port, despite US sanctions restricting much of the country's oil exports. He's watched North Korea - which is limited by international rules to 500,000 barrels of refined oil every year - taking delivery of fuel via ship-to-ship transfers on the open ocean.

Turning off the AIS transponders that broadcast a ship's position, course and speed, is no longer a guarantee of anonymity.

His firm can even ascertain what cargo a ship is carrying - and how much - just by looking at its shadow on the water, says Mr Madani.</p>


<a href="https://tankertrackers.com">Tankertrackers</a> is pretty cheap if you were into analysis of oil supply lines - $299 per year.
ai  data  oil 
february 2019 by charlesarthur
Your smart light can tell Amazon and Google when you go to bed • Bloomberg
Matt Day:
<p>For several years, Amazon and Google have collected data every time someone used a smart speaker to turn on a light or lock a door. Now they’re asking smart-home gadget makers such as Logitech and Hunter Fan Co. to send a continuous stream of information.

In other words, after you connect a light fixture to Alexa, Amazon wants to know every time the light is turned on or off, regardless of whether you asked Alexa to toggle the switch. Televisions must report the channel they’re set to. Smart locks must keep the company apprised whether or not the front door bolt is engaged.

This information may seem mundane compared with smartphone geolocation software that follows you around or the trove of personal data Facebook Inc. vacuums up based on your activity. But even gadgets as simple as light bulbs could enable tech companies to fill in blanks about their customers and use the data for marketing purposes. Having already amassed a digital record of activity in public spaces, critics say, tech companies are now bent on establishing a beachhead in the home.

“You can learn the behaviors of a household based on their patterns,” says Brad Russell, who tracks smart home products for researcher Parks Associates Inc. “One of the most foundational things is occupancy. There’s a lot they could do with that.”

Some device makers are pushing back, saying automatic device updates don’t give users enough control over what data they share, or how it can be used. Public guidelines published by Amazon and Google don’t appear to set limits on what the companies can do with the information they glean about how people use appliances.

Amazon and Google say they collect the data to make it easier for people to manage their home electronics…

…When smart speakers first hit the market, using them to command another device worked like this. After receiving the command “Alexa, turn on the light,” the software would ask the light bulb maker’s servers for the current status of the bulb. After a reply came back confirming the switch was off, Alexa would instruct the light to turn on.

Now, in a push that accelerated last year, Amazon and Google are recommending—and, in some cases, requiring—that smart home makers tweak their code to reverse that relationship. Instead, the light bulb must report in to the hub with its status at all times.</p>

That could quickly get messy if your home has lots of devices, and it feels eminently hackable.
google  amazon  smarthome  data 
february 2019 by charlesarthur
I blocked Amazon, Facebook, Google, Microsoft, and Apple • Gizmodo
Kashmir Hill:
<p>I am using a Linux laptop made by a company named Purism and a Nokia feature phone on which I am relearning the lost art of T9 texting…

…in preparation for the week, I export all my contacts from Google, which amounts to a shocking 8,000 people. I have also whittled down the over 1,500 contacts in my iPhone to 143 people for my Nokia, or the number of people I actually talk to on a regular basis, which is incredibly close to Dunbar’s number.

I wind up placing a lot of phone calls this week, because texting is so annoying on the Nokia’s numbers-based keyboard. I find people often pick up on the first ring out of concern; they’re not used to getting calls from me.

I don’t think I could have done this cold turkey.
On the first day of the block, I drive to work in silence because my rented Ford Fusion’s “SYNC” entertainment system is powered by Microsoft. Background noise in general disappears this week because YouTube, Apple Music, and our Echo are all banned—as are Netflix, Spotify, and Hulu, because they rely on AWS and the Google Cloud to get their content to users.

The silence causes my mind to wander more than usual. Sometimes this leads to ideas for my half-finished zombie novel or inspires a new question for investigation. But more often than not, I dwell on things I need to do.

Many of these things are a lot more challenging as a result of the experiment, such as when I record an interview with Alex Goldman of the podcast Reply All about Facebook and its privacy problems.

I live in California, and Alex is in New York; we would normally use Skype, but that’s owned by Microsoft, so instead we talk by phone and I record my end with a handheld Zoom recorder. That works fine, but when it comes time to send the 386 MB audio file to Alex, I realize I have no idea how to send a huge file over the internet.</p>


So essentially like living in 1995. Take it from a survivor: we managed. (OK, there weren't Linux laptops. But Windows and MacOS at the time were pretty much the same as Linux is now.)
internet  privacy  data  tech 
february 2019 by charlesarthur
Germany blocks Facebook from pooling user data without consent • Financial Times
Olaf Storbeck, Madhumita Murgia and Rochelle Toplensky:
<p>Germany’s antitrust watchdog on Thursday blocked Facebook from pooling data collected from Instagram, its other subsidiaries and third-party websites without user consent in a landmark decision on internet privacy rights and competition.

The Federal Cartel Office said it was tackling what it described as the Silicon Valley company’s “practically unrestricted collection and assigning of non-Facebook data” to user accounts.

In a press conference in Bonn, the German authorities said that Facebook needed the “voluntary consent” of users to pool data from other services with its own Facebook user data.

The FCO also said that Facebook needed consent to collect data from third-party websites outside its own ecosystem. “If consent is not given . . . Facebook will have to substantially restrict its collection and combination of data,” the cartel office said.</p>


Note that it's the antitrust office, not the privacy commissioner doing this. Though one suspects that Facebook will get round it with a dialog box.
facebook  data 
february 2019 by charlesarthur
Data without a cause — the hype and hope around wearables • Medium
Annastiina Salminen:
<p>none of the users I talked to had presented any of their sleep or activity data to their doctors or other health professionals. Despite their intrigue, the weekly heart rate variance or the share of REM of last night’s sleep are still arbitrary numbers with little actionability from a scientific perspective. What does a readiness level of 73 actually mean and how does it differ from 52? Are these just vanity metrics or is there a way for the doctor to somehow contextualize them?

The most avid proponents of quantified self think that the clinical system is broken. In times where scientific versus experiential experience is a continuous topic of discussion, the information asymmetry argued to have benefited the clinicians instead of the patients is now perceived to have been turned upside down thanks to the rise of wearables and the democratized access to data. But it’s important to note that all data doesn’t carry the same value. The information asymmetry argument holds true when looking only at the sheer volume, but the main challenge of identifying the clinical benefit of wearables data and integrating it to the clinical work is that despite being mile wide, it’s still only an inch deep…

…Health data is the last frontier that lacks democratization, and the push for wearables is a result of that impatience. The data and the consumer-grade devices presenting it might be far from reliable, but they are the first wave towards a more open health data ecosystem and needs to be taken seriously. The responsibility to interpret the readiness levels and sleep data doesn’t lie with the individual, but with the doctors and every other actor in the ecosystem. </p>
data  wearables 
january 2019 by charlesarthur
Data broker that sold phone locations used by bounty hunters lobbied FCC to scrap user consent • Motherboard
Joseph Cox and Jason Koebler:
<p>Earlier this month Motherboard showed how T-Mobile, AT&T, and Sprint were selling cell phone users’ location data that ultimately ended up in the hands of bounty hunters and people unauthorized to handle it. That data trickled down from the telecommunications giants through a complex network of middlemen and data brokers. One of those third parties was Zumigo, a company that gets location data access directly from the telcos and then sells it for a profit.

Motherboard has now unearthed a presentation that Zumigo gave to the Federal Communications Commission (FCC) in late 2017 in which it asked the agency to place even fewer restrictions on how some of the data it sells can be used, and specifically asked for the agency to loosen user consent requirements for data sharing.

“As breaches become more prevalent and as consumers rely more on mobile phones, there is a tipping point where financial and personal protections begin to equal, or outweigh, privacy concerns,” one of the slides reads.

Another slide titled “solutions” suggests that the FCC loosen current consent requirements that are included in cell phone providers’ terms of service, allowing carriers to use vaguer, “more flexible” language.</p>


Wouldn't it be great if the US had some sort of laws around this stuff?
fcc  data 
january 2019 by charlesarthur
773M password ‘megabreach’ is years old • Krebs on Security
Brian Krebs on the "megahack" that was big news last week:
<p>Collection #1 offered by this seller is indeed 87GB in size. He also advertises a Telegram username where he can be reached — “Sanixer.” So, naturally, KrebsOnSecurity contacted Sanixer via Telegram to find out more about the origins of Collection #1, which he is presently selling for the bargain price of just $45.

Sanixer said Collection#1 consists of data pulled from a huge number of hacked sites, and was not exactly his “freshest” offering. Rather, he sort of steered me away from that archive, suggesting that — unlike most of his other wares — Collection #1 was at least 2-3 years old. His other password packages, which he said are not all pictured in the above screen shot and total more than 4 terabytes in size, are less than a year old, Sanixer explained.

By way of explaining the provenance of Collection #1, Sanixer said it was a mix of “dumps and leaked bases,” and then he offered an interesting screen shot of his additional collections. Click on the image below and notice the open Web browser tab behind his purloined password trove (which is apparently stored at Mega.nz): Troy Hunt’s published research on this 773 million Collection #1.

<img src="https://krebsonsecurity.com/wp-content/uploads/2019/01/sanixer.jpg" width="100%" /><br /><em>Sanixer says Collection #1 was from a mix of sources. A description of those sources can be seen in the directory tree on the left side of this screenshot.</em>

[CTO of Hold Security, Alex] Holden said the habit of collecting large amounts of credentials and posting it online is not new at all, and that the data is far more useful for things like phishing, blackmail and other indirect attacks — as opposed to plundering inboxes. Holden added that his company had already derived 99 percent of the data in Collection #1 from other sources.</p>


So it's basically like the fluff-covered sad-looking pick'n'mix sweet trays in Woolworths of old.
data  breach  password  hacking 
january 2019 by charlesarthur
Los Angeles accuses Weather Channel app of covertly mining user data • The New York Times
Jennifer Valentino-DeVries and Natasha Singer:
<p>One of the most popular online weather services in the United States, the Weather Channel app has been downloaded more than 100 million times and has 45 million active users monthly.

The government said the Weather Company, the business behind the app, unfairly manipulated users into turning on location tracking by implying that the information would be used only to localize weather reports. Yet the company, which is owned by IBM, also used the data for unrelated commercial purposes, like targeted marketing and <a href="https://web.archive.org/web/20180731211011/https://business.weather.com/writable/documents/Financial-Markets/InvestorInsights_SolutionSheet.pdf">analysis for hedge funds</a>, according to <a href="https://int.nyt.com/data/documenthelper/554-l-a-weather-app-location/8980fd9af72915412e31/optimized/full.pdf">the lawsuit</a>.

The lawsuit accuses the Weather Channel of manipulating users by implying that tracking data would be used only to localize weather reports.

The city’s lawsuit cited an article last month in The New York Times that detailed a sprawling industry of companies that profit from continuously snooping on users’ precise whereabouts. The companies collect location data from smartphone apps to cater to advertisers, stores and investors seeking insights into consumer behavior.</p>


Covertly mining user data. Is this better or worse than using your computer to covertly mine cryptocurrency? Discuss.
weatherchannel  app  data 
january 2019 by charlesarthur
Non-disclosure Apple • DIGITS to DOLLARS
Jonathan Goldberg looks back to the 2000s, when everyone used to disclose their handset sales figures:
<p>The industry research shops (e.g. Gartner) sold [product sales forecast] models for other product segments, but those were fragile and prone to breaking under heavy scrutiny. For handsets, everyone involved could make sound judgments, while the other segments were prone to problems stemming from a general lack of data.

All of this started to break down after the launch of the iPhone. Many companies got themselves backed into reporting corners as their data increasingly painted the wrong picture. Apple did not pursue market share, as <a href="https://www.wsj.com/articles/SB124805149501664033">we first argued back in May of 2009</a> (email us if you would like a copy of the original note). Apple was pursuing profit share. It took several years for the other handset companies to realize that their record shipment data was useless for explaining why their profits were plummeting. Then with the early waves of Android, the former leaders’ market shares also started plummeting. And so one by one all the others stopped reporting unit figures.

We remember one example of why this data was important for the companies that were slowly stopping to report it. Around 2009, the India analyst for one of the third party research shops reported market share data that showed Nokia had lost a huge chunk of market share there. Nokia actually issued an official statement denying this. The analysis company’s other analysts all chimed in as well, siding with Nokia and not their colleague. We believe the analyst was actually fired, and certainly faced reprimand when his own employers sided with one of their largest customers over their own analyst. But it turns out he was right, he had the correct data, Nokia had very rapidly gone from market share leader to number two player, and they were losing share to a swarm of China-based handset companies. By denying the reality, Nokia turned a blind eye to its growing problem, and ultimately the company was pushed from the handset market entirely.</p>


So why is Apple now going to stop reporting those numbers? Optics, he thinks:
<p> Investors, in particular, tend to analyze data to death. They have to make big decisions (with other people’s money) based on whatever data they can gather. Then they build models to make predictions which can have a huge impact on their valuation decisions. In Apple’s case, this means they will take any declines in unit shipments and extrapolate those numbers out to the heat death of the universe.</p>
iphone  apple  data 
december 2018 by charlesarthur
Uncovering what your phone knows • The New York Times
Jennifer Valentino-DeVries on how they got the data for that "<a href="https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html?module=inline">your phones are tracking you, and the data is being sold</a>" story from last week:
<p>I wrote <a href="https://www.nytimes.com/2018/05/10/technology/cellphone-tracking-law-enforcement.html?module=inline">an article</a> in May about a company that bought access to data from the major US cellphone carriers. My reporting showed that the company, Securus Technologies, allowed law enforcement to get this data, and officers were using the information to track people’s locations without a warrant. After that article ran, I started getting tips that the use of location data from cellphones was more widespread than I had initially reported. One person highlighted <a href="https://news.ycombinator.com/item?id=17081684">a thread on Hacker News</a>, an online forum popular with technologists. On the site, people were anonymously discussing their work for companies that used people’s precise location data.

I called sources who knew about mapping and location data. Many had worked in that field for more than a decade. I also partnered with other Times reporters, Natasha Singer and Adam Satariano, who were looking into something similar. These conversations were the start of an investigation into how smartphone apps were tracking people’s locations, and the revelation that the tipsters were right — selling location data was common and lucrative.

On a big investigation like this one, hours and even days of work can go into a single paragraph or even a sentence. This is especially true in technology investigations because the subject matter is so detailed; combing through data and conducting technical tests is time consuming.</p>
data  phone  journalism 
december 2018 by charlesarthur
What comes next in that contested election in North Carolina • FiveThirtyEight
Nathaniel Rakich on an election to the US House of Representatives, apparently won by 905 votes by the GOP candidate, which is now in doubt over postal ballots:
<p>As we highlighted two weeks ago, Bladen County and neighboring Robeson County had unusually high levels of absentee ballots requested or cast. Harris also received an incredibly high proportion of the mail-in absentee votes in Bladen considering how few registered Republicans voted by mail there. <a href="http://www.oldnorthstatepolitics.com/2018/11/ncs-closest-congressional-contest-gets.html">Only 19%</a> of Bladen County’s accepted mail-in absentee ballots were cast by registered Republicans, yet mail-in absentee ballots leaned heavily Republican; in every other county in the 9th District, mail-in ballots favored the Democrat.

But new information digs down past the county level to find anomalies in certain types of neighborhoods. In an analysis of absentee-by-mail ballots in the 9th District, Kevin Morris and Myrna Pérez at the Brennan Center for Justice <a href="https://www.brennancenter.org/blog/north-carolinas-election-fiasco-about-voter-suppression-not-voter-fraud">found that mail-in absentee ballots from low-income Census tracts were more likely to have been spoiled</a> (that is, declared invalid) than those from high-income areas in the 9th and those from low-income areas outside the 9th. Low-income neighborhoods also had a higher rate of unreturned mail-in ballots. If someone was in fact running a large-scale election-tampering operation, the increase in unreturned ballots could mean that someone was discarding some legitimate ballots before they could be returned, or that voters themselves were discarding ballots fraudulently requested in their names by someone hoping to intercept them and fill them out. According to Morris and Pérez, this discrepancy in the returned ballot rate could be an indication that lower-income voters were specifically targeted for election fraud.

The Raleigh News & Observer <a href="https://www.newsobserver.com/news/politics-government/article222436915.html">calculated</a> that in Robeson County, 69% of mail-in absentee ballots requested by Native American voters and 75% of those requested by African-American voters were not returned, well out of line with the rest of the district. The Brennan Center also found that <a href="https://www.brennancenter.org/blog/north-carolinas-election-fiasco-about-voter-suppression-not-voter-fraud">nonwhite voters’ ballots were more likely to be spoiled</a>.</p>


The key part of this story is that the apparent fraud was picked up by statistics: the numbers from different areas didn't tally with those in Bladen, which was a wild outlier. Data exposes lies as well as truths.
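(For what it's worth, the kind of check that surfaces such an outlier is simple enough to sketch — the figures below are invented for illustration, not the real North Carolina returns.)

```python
# Crude sketch of flagging a wildly out-of-line county with a leave-one-out
# z-score. All numbers are invented - not the real NC-09 data.
import statistics

gop_share_of_mail_ballots = {
    "County A": 0.41, "County B": 0.38, "County C": 0.44,
    "County D": 0.40, "Bladen-like outlier": 0.96,
}

for county, share in gop_share_of_mail_ballots.items():
    others = [v for k, v in gop_share_of_mail_ballots.items() if k != county]
    z = (share - statistics.mean(others)) / statistics.stdev(others)
    if abs(z) > 3:
        print(f"{county}: {share:.0%} is {z:.1f} standard deviations from its peers")
```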
data  election  northcarolina 
december 2018 by charlesarthur
Facebook considered charging for access to user data • WSJ
Deepa Seetharaman and Kirsten Grind:
<p>Internal emails show Facebook Inc. considered charging companies for continued access to user data several years ago, a step that would have marked a dramatic shift away from the social-media giant’s policy of not selling that information, according to an unredacted court document viewed by The Wall Street Journal.

The emails in the document also indicate that Facebook employees discussed pushing some advertisers to spend more in return for increased access to user information.

Taken together, the internal emails show the company discussing how to monetize its user data in ways that are employed by some other tech firms but that Facebook has said it doesn’t do.

At a congressional hearing in April, Facebook Chief Executive Mark Zuckerberg said, “I can’t be clearer on this topic: We don’t sell data.”

The emails—most from about 2012 to 2014—are far from conclusive, lacking context and in some cases truncated. But they provide a window into mostly sealed court filings—which a British lawmaker has pledged to make public next week—from a lawsuit against Facebook filed by a company called Six4Three LLC…

…The Wall Street Journal viewed three pages of unredacted material from one 18-page document that showed portions of some internal emails. In other court filings, Facebook said these excerpts were subsequently redacted because they contained “sensitive discussion of Facebook’s internal strategic analysis of third-party applications, the release of which could damage Facebook’s relationships” with those apps.</p>


Or you can go and <a href="https://arstechnica.com/tech-policy/2018/11/facebook-pondered-for-a-time-selling-access-to-user-data/">read them unredacted at Ars Technica</a>, where Cyrus Farivar discovered you could just <a href="https://www.documentcloud.org/documents/5317193-7b6c2e96-396f-4cf6-ba18-65c8f848383d.html">put the document into a text editor</a> and, aha. (Choose the text option.)

The discussion it reveals, via internal Facebook emails, is pretty shocking.
facebook  data 
november 2018 by charlesarthur
C M Taylor on ‘keystroke logging project’ with British Library • English and Drama blog
<p>Re-entering the academic world after starting work as an Associate Lecturer on the Publishing degree at Oxford Brookes University, I began speculating about writers' archives. Did previous scholars have access to more hand-written and typed drafts of works in progress - actual objects showing the shaping of works of art - but with the normalisation of computerized authorship, were these discrete drafts abolished in the rolling palimpsest of write and digital rewrite?

Plus, I was considering a new novel myself, but as I have written elsewhere, emotionally I was daunted by the long-haul loneliness of novel writing, a process I considered in my most despairing moments as like wallpapering a dungeon.

I spoke to my friend Mark about these two things - the lost drafts and the loneliness - and in a flash he had the answer: ‘Put a piece of malware on it.’

He meant that if I put some malware, or spyware, on my computer to note everything I did, it would record all changes made to an evolving manuscript, plus it might offer a weird kind of company for me in my wallpapered dungeon.

It was worth a shot.</p>


Generated 222GB of data across 108,318 files.
keystroke  data  novel 
november 2018 by charlesarthur
Parliament seizes cache of Facebook internal papers • The Guardian
Carole Cadwalladr:
<p>[Damian Collins, chair of the Select Committee, said:] “We have followed this court case in America and we believed these documents contained answers to some of the questions we have been seeking about the use of data, especially by external developers.”

The documents seized were obtained during a legal discovery process by Six4Three. It took action against the social media giant after investing $250,000 in an app. Six4Three alleges the cache shows Facebook was not only aware of the implications of its privacy policy, but actively exploited them, intentionally creating and effectively flagging up the loophole that Cambridge Analytica used to collect data. That raised the interest of Collins and his committee.

A Facebook spokesperson said that Six4Three’s “claims have no merit, and we will continue to defend ourselves vigorously”.

The files are subject to an order of a Californian superior court, so cannot be shared or made public, at risk of being found in contempt of court. Because the MPs’ summons was issued in London where parliament has jurisdiction, it is understood the company founder, although a US citizen, had no choice but to comply. It is understood that Six4Three have informed both the court in California and Facebook’s lawyers.

Facebook said: “The materials obtained by the DCMS committee are subject to a protective order of the San Mateo Superior Court restricting their disclosure. We have asked the DCMS committee to refrain from reviewing them and to return them to counsel or to Facebook. We have no further comment.”

It is unclear what, if any, legal moves Facebook can make to prevent publication. UK, Canada, Ireland, Argentina, Brazil, Singapore and Latvia will all have representatives joining what looks set to be a high-stakes encounter between Facebook and politicians.</p>


An amazing story, using a Parliamentary power that hasn't been used in hundreds of years. Facebook <a href="https://twitter.com/carolecadwalla/status/1066732715737837569">responded</a>; <a href="https://twitter.com/DamianCollins/status/1066773746491498498">Damian Collins answered back</a>, robustly.
facebook  cambridgeanalytica  data 
november 2018 by charlesarthur
Are Pop Lyrics Getting More Repetitive?
<p>In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: "the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced").

I'm going to try to test this hypothesis with data. I'll be analyzing the repetitiveness of a dataset of 15,000 songs that charted on the Billboard Hot 100 between 1958 and 2017…</p>


But how?
<p>You may not have heard of the Lempel-Ziv algorithm, but you probably use it every day. It's a lossless compression algorithm that powers gifs, pngs, and most archive formats (zip, gzip, rar...).

What does this have to do with pop music? The Lempel-Ziv algorithm works by exploiting repeated sequences. How efficiently LZ can compress a text is directly related to the number and length of the repeated sections in that text.</p>


This is wonderful: the graphics are brilliantly done, and the discoveries (top 10 songs are always more repetitive than most) unexpected.
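You can get a crude version of the measure at home by using a general-purpose LZ-based compressor as a proxy — this isn't the author's code, just a sketch of the principle:

```python
# Rough proxy for the article's measure: the more a lyric shrinks under an
# LZ-based compressor (zlib/DEFLATE), the more repetitive it is. Not the
# author's code - just an illustration of the principle.
import zlib

def repetitiveness(text: str) -> float:
    raw = text.encode("utf-8")
    return 1 - len(zlib.compress(raw, 9)) / len(raw)

chorus = "we will we will rock you " * 8
prose = ("a perfectly ordinary sentence with very little internal repetition "
         "compresses far less well than a chant that repeats itself over and over")
print("chorus:", round(repetitiveness(chorus), 2))
print("prose: ", round(repetitiveness(prose), 2))
```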
music  analysis  data  compression 
november 2018 by charlesarthur
Tech shoppers ditch desktop PCs and DVD players • Ofcom
<p>Ownership of digital devices such as smart TVs, smart watches and smartphones has grown significantly in recent years, as more people need a constant connection to the internet - internet users say they spend an average of 24 hours a week online.

By contrast, MP3 players, DVD players and desktop computers seem to be falling out of favour as smartphone use continues to grow, particularly for browsing and streaming.

Meanwhile, the popularity of tablets and e-readers seems to have peaked. Ownership of both is significantly higher than it was seven years ago, but has levelled out in the last few years.

<img src="https://www.ofcom.org.uk/__data/assets/image/0024/127509/Cyber-week-story-Nov-2018-graph.png" width="100%" />

Ofcom now measures ownership of smart speakers (owned by 13% of households) and virtual reality (VR) headsets (5%). The first VR headset went on sale in the UK in 2015 – a year earlier than smart speakers, which have been quicker to capture the imagination of tech shoppers.

Other emerging trends include wearable tech, such as smart watches and fitness trackers. One in five households now owns these devices, and ownership has been doubling every year since 2016.

Ian Macrae, Ofcom’s Director of Market Intelligence, said: “As technology evolves and transforms how we live our lives, the devices we rely on are constantly changing.

“The growth in popularity of streaming services has created tremendous demand for connected TVs, which for many people are replacing DVD players, and the smartphone is replacing several other devices at once.

“The range of connected devices is expanding rapidly. Smart speakers really took off last year and along with other smart home devices will again be ones to watch this year.”</p>
Ofcom  data  media 
november 2018 by charlesarthur
Your smartphone’s location data is worth big money to Wall Street • WSJ
Ryan Dezember:
<p>Thasos gets data from about 1,000 apps, many of which need to know a phone’s location to be effective, like those providing weather forecasts, driving directions or the whereabouts of the nearest ATM. Smartphone users, wittingly or not, share their location when they use such apps.

Before Thasos gets the data, suppliers scrub it of personally identifiable information, Mr. Skibiski said. It is just time-stamped strings of longitude and latitude. But with more than 100 million phones providing such coordinates, Thasos says it can paint detailed pictures of the ebb and flow of people, and thus their money.

Alex “Sandy” Pentland, a Massachusetts Institute of Technology computer scientist who helped launch Thasos, likens it to a circulatory system: “You can look at this blood flow of people moving around.”

…Thasos won’t name its clients, but Mr. Skibiski says it sells data to dozens of hedge funds, some of which pay more than $1m a year. Thasos’s largest investor is Ken Nickerson, who helped build PDT Partners into a quantitative-investing mint inside Morgan Stanley.

This month, Thasos is set to start offering data through Bloomberg terminals. A measure of mall foot traffic will be widely available; detailed daily feeds about malls owned or operated by 30 large real-estate investments trusts cost extra.</p>
smartphone  location  data 
november 2018 by charlesarthur
Geospatial Commission earmarks first investments • UK Authority
<p>The Geospatial Commission has announced its first investments with plans to pump £5m into unlocking data held by the British Geological Survey, Coal Authority, HM Land Registry, Ordnance Survey, UK Hydrographic Office and the Valuation Office Agency.

The recently created organisation indicated it will provide £80m over the next two years to support the development of new products that can propel “British companies onto a global market”. 

The six to receive the first round of investments are the partner bodies of the commission, set up by the chancellor a year ago to exploit location information, or geospatial data.

Using this publicly held data more productively could be worth up to £11bn to the economy every year, the Government believes.

The data has been produced from delivering public services and enforcing laws – such as navigating public transport or tracking supply chains – but will now be analysed by private firms for new services.

David Lidington, the Cabinet Office minister, said: “This Government is committed to providing more opportunities for tech businesses - including small firms - to thrive, as well as access public procurement opportunities."</p>

That's good - considering it took four years of lobbying, starting back in 2006, to get the government even to countenance making OS and UKHO data open, this is a continuation down a long road.
Freeourdata  data  freedata  uk 
october 2018 by charlesarthur
Police use Fitbit data to charge 90-year-old man in stepdaughter’s killing • The New York Times
Christine Hauser:
<p>On Sept. 13, a co-worker of Ms. Navarra’s went to the house to check on her because she had not showed up for her job at a pharmacy, the report said. The front door was unlocked, and she discovered Ms. Navarra dead, slouched in a chair at her dining room table.

She had lacerations on her head and neck, and a large kitchen knife was in her right hand, the report said. Blood was spattered and uneaten pizza was strewn in the kitchen. The coroner ruled the death a homicide.

Detectives then questioned Ms. Navarra’s only known next-of-kin, her 92-year-old mother, Adele Aiello, and [stepfather] Mr. Aiello. Mr. Aiello told the authorities he had dropped off the food for his stepdaughter and left her house within 15 minutes, but he said he saw Ms. Navarra drive by his home with a passenger in the car later that afternoon.

Investigators obtained a search warrant and retrieved the Fitbit data [from Ms Navarra's AltaHR worn on her wrist, which measured her heartbeat] with the help of the company’s director of brand protection, Jeff Bonham, the police report said…

When Ms. Navarra’s Fitbit data was compared with video surveillance from her home, the police report said, the police discovered that the car Mr. Aiello had driven was still there when her heart rate stopped being recorded by her Fitbit.

Bloodstained clothes were later found in Mr. Aiello’s home, the document said. He was arrested on Sept. 25.</p>


When I was younger, some sci-fi stories had the idea of monitors which rich people wore to monitor their heartbeat, so that if they were killed, the killer wouldn't get away. Turns out they're available in your local store.
fitbit  data  murder 
october 2018 by charlesarthur
The interesting ideas in Datasette • Simon Willison
The aforesaid Willison, who has built a database tool called Datasette which uses SQLite databases (caution: can only store up to 140TB - yes, terabytes). This will interest you if you're into data tools; Willison built the tools that the Guardian used to analyse MPs' expenses:
<p>Since the data in a Datasette instance never changes, why not cache calls to it forever?

Datasette sends a far future HTTP cache expiry header with every API response. This means that browsers will only ever fetch data the first time a specific URL is accessed, and if you host Datasette behind a CDN such as Fastly or Cloudflare each unique API call will hit Datasette just once and then be cached essentially forever by the CDN.

This means it’s safe to deploy a JavaScript app using an inexpensively hosted Datasette-backed API to the front page of even a high traffic site—the CDN will easily take the load.

Zeit added Cloudflare to every deployment (even their free tier) back in July, so if you are hosted there you get this CDN benefit for free.

What if you re-publish an updated copy of your data? Datasette has that covered too. You may have noticed that every Datasette database gets a hashed suffix automatically when it is deployed:

<a href="https://fivethirtyeight.datasettes.com/fivethirtyeight-c9e67c4">https://fivethirtyeight.datasettes.com/fivethirtyeight-c9e67c4</a>

This suffix is based on the SHA256 hash of the entire database file contents—so any change to the data will result in new URLs. If you query a previous suffix Datasette will notice and redirect you to the new one.

If you know you’ll be changing your data, you can build your application against the non-suffixed URL. This will not be cached and will always 302 redirect to the correct version (and these redirects are extremely fast).

<a href="https://fivethirtyeight.datasettes.com/fivethirtyeight/alcohol-consumption%2Fdrinks.json">https://fivethirtyeight.datasettes.com/fivethirtyeight/alcohol-consumption%2Fdrinks.json</a>

The redirect sends an HTTP/2 push header such that if you are running behind a CDN that understands push (such as Cloudflare) your browser won’t have to make two requests to follow the redirect.</p>
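The content-hash trick at the heart of all this is easy to reproduce. Here's a rough sketch of deriving that seven-character suffix from a file's contents — not Datasette's actual implementation, and the base URL is a placeholder:

```python
# Sketch of the content-hashed URL idea: the suffix is derived from the file's
# bytes, so any change to the data yields a new URL and old responses can be
# cached forever. Not Datasette's code; the base URL is a placeholder.
import hashlib
from pathlib import Path

def db_url(db_path: Path, base="https://example-host.example") -> str:
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()
    return f"{base}/{db_path.stem}-{digest[:7]}"   # e.g. .../fivethirtyeight-c9e67c4

# Demo with a throwaway file; with a real SQLite database the idea is identical.
demo = Path("demo.db")
demo.write_bytes(b"pretend this is a SQLite database")
print(db_url(demo))
```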
data  database  sqlite  datasette 
october 2018 by charlesarthur
Apple CEO Tim Cook says giving up your data for better services is ‘a bunch of bunk’ • The Washington Post
Hamza Shaban:
<p>Apple chief executive Tim Cook urged consumers not to believe the dominant tech industry narrative that the data collected about them will lead to better services.

In an interview with “Vice News Tonight” that aired Tuesday, Cook highlighted his company’s commitment to user privacy, positioning Apple’s business as one that stands apart from tech giants that compile massive amounts of personal data and sell the ability to target users through advertising.

“The narrative that some companies will try to get you to believe is: I’ve got to take all of our data to make my service better,” he said. “Well, don’t believe them. Whoever’s telling you that, it’s a bunch of bunk.”

Cook’s remarks come at a pivotal time for Silicon Valley. In the past year, technology companies and their executives have come under unprecedented scrutiny from elected officials and regulators stemming from a variety of issues, including a barrage of data privacy scandals, accusations of toxic corporate culture, the negative impact of tech platforms on political debate, and concerns over tech overuse and addiction. In recent months, growing calls from Capitol Hill have boosted the prospects of new legislation aimed at big tech companies…

…Cook said in the interview that he is “exceedingly optimistic” that the topic of data privacy has reached an elevated level of public debate. “When the free market doesn’t produce a result that’s great for society you have to ask yourself what do we need to do. And I think some level of government regulation is important to come out on that.”</p>
apple  data 
october 2018 by charlesarthur
Watch out, algorithms: Julia Angwin and Jeff Larson unveil The Markup, their plan for investigating tech’s societal impacts • Nieman Journalism Lab
Christine Schmidt interviewed Julia Angwin, who left ProPublica to set up the new site:
<p>ANGWIN: …We have an idea about how journalism should be. It’s much more tech-focused than any newsroom, even though ProPublica is the most tech-infused newsroom out there. We want to take it to another level.

SCHMIDT: What is that next level? What are the nuts and bolts of how this organization will operate differently?

ANGWIN: We describe ourselves as doing journalism that is based on the scientific method. The idea is that objectivity has been a false god for journalism for a long time. It started out as a decent idea, but it’s led to a lot of what people call false equivalents. I think journalism needs a new guiding light, a new philosophical approach, and I think that approach should be the scientific method. What that really means is we develop a hypothesis. Maybe the hypothesis is: ‘Brett Kavanaugh. Did he actually harass a woman or not?’ Then you collect evidence — how much evidence is there for and against this. Then you describe the limitations of your evidence: ‘So far the evidence is only one/two people.’
It doesn’t have to be ‘he said, she said.’ It’s more about: this is the amount of evidence to support this hypothesis, and then here are the limitations of this. There are always limitations to our findings. Even though climate change is well accepted scientifically, there are limitations for those findings as well. That’s our goal, to try to frame our journalism around that.

What that means in practice is having people with technical and statistical skills involved in an investigation from the outset. So much of what happens in traditional newsrooms, in every newsroom I’ve ever worked in, is that there’s a data desk. A reporter goes over to the desk and basically orders data like it’s a hamburger. Usually by then, the reporter has already done the reporting and has a hypothesis based on the anecdotes. Then, if the data doesn’t support it, there’s a fight between them and the data desk. Or, more often, there’s not even data available.</p>


This sounds fantastic. (Not for everyone, of course.) Data journalism - where the story comes from the data - is enormously satisfying when it comes right. Some of my best stories have come from interpreting public documents: the story's in there, you just have to listen to what the numbers are saying.
data  journalism 
september 2018 by charlesarthur
Apple gives you a TRUST rating – and it's based on your phone call and email habits • The Sun
Sean Keach:
<p>Apple builds a score based on the number of calls and emails you send and receive – to help spot fraudulent transactions made using your device.

"To help identify and prevent fraud, information about how you use your device, including the approximate number of phone calls or emails you send and receive, will be used to compute a device trust score when you attempt a purchase," Apple explained. "The submissions are designed so Apple cannot learn the real values on your device. The scores are stored for a fixed time on our servers."

So how does it actually work? Apple has a bunch of different anti-fraud systems in place to work out whether payments you make are legitimate.

One of these, added in the new iOS 12 update, is a numeric trust score that's associated with your device. This score is sent directly to Apple when you make a purchase.

The data used to create the score – including the number of phone calls you've made – is only ever stored on your device.

Importantly, when Apple sees the score, it doesn't see the contents of your communications. It's not reading your emails, for instance. These scores are also encrypted in transit, which means anyone who managed to intercept them would only see gibberish. Apple says it holds onto the scores for a limited period of time, although it's not clear how long that is.</p>

Clever. It all goes into a single number.
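
Apple hasn't published the mechanism, but "designed so Apple cannot learn the real values" reads like local differential privacy. A toy sketch of that idea - purely my illustration, not Apple's method - is to add calibrated noise to each count on the device before anything is sent:

<pre><code>
# Toy illustration, not Apple's published method. Laplace noise with scale
# 1/epsilon gives epsilon-differential privacy for a count (sensitivity 1),
# so the server only ever sees the noisy figure.
import numpy as np

def noisy_count(true_count: int, epsilon: float = 1.0) -> float:
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# A device with 37 calls might report 35.8 one day and 39.2 another;
# aggregate fraud signals stay useful while individual reports stay fuzzy.
print(noisy_count(37))
</code></pre>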
apple  iphone  trust  data 
september 2018 by charlesarthur
Amazon investigates employees leaking data for bribes • WSJ
Jon Emont, Laura Stevens and Robert McMillan:
<p>Employees of Amazon, primarily with the aid of intermediaries, are offering internal data and other confidential information that can give an edge to independent merchants selling their products on the site, according to sellers who have been offered and purchased the data, brokers who provide it and people familiar with internal investigations.

The practice, which violates company policy, is particularly pronounced in China, according to some of these people, because the number of sellers there is skyrocketing. As well, Amazon employees in China have relatively small salaries, which may embolden them to take risks.

In exchange for payments ranging from roughly $80 to more than $2,000, brokers for Amazon employees in Shenzhen are offering internal sales metrics and reviewers’ email addresses, as well as a service to delete negative reviews and restore banned Amazon accounts, the people said.</p>
amazon  data  privacy 
september 2018 by charlesarthur
How game apps that captivate kids have been collecting their data • NY Times
Jennifer Valentino-DeVries, Natasha Singer, Aaron Krolik and Michael Keller:
<p>Before Kim Slingerland downloaded the Fun Kid Racing app for her then-5-year-old son, Shane, she checked to make sure it was in the family section of the Google Play store and rated as age-appropriate. The game, which lets children race cartoon cars with animal drivers, has been downloaded millions of times.

Until last month, the app also shared users’ data, sometimes including the precise location of devices, with more than a half-dozen advertising and online tracking companies. On Tuesday evening, New Mexico’s attorney general filed a lawsuit claiming that the maker of Fun Kid Racing had violated a federal children’s privacy law through dozens of Android apps that shared children’s data.

“I don’t think it’s right,” said Ms. Slingerland, a mother of three in Alberta, Canada. “I don’t think that’s any of their business, location or anything like that.”

The suit accuses the app maker, Tiny Lab Productions, along with online ad businesses run by Google, Twitter and three other companies, of flouting a law intended to prevent the personal data of children under 13 from falling into the hands of predators, hackers and manipulative marketers. The suit also contends that Google misled consumers by including the apps in the family section of its store.

An analysis by The New York Times found that children’s apps by other developers were also collecting data. The review of 20 children’s apps — 10 each on Google Android and Apple iOS — found examples on both platforms that sent data to tracking companies, potentially violating children’s privacy law; the iOS apps sent less data over all.

These findings are consistent with those published this spring by academic researchers who analyzed nearly 6,000 free children’s Android apps. They reported that more than half of the apps, including those by Tiny Lab, shared details with outside companies in ways that may have violated the law.</p>
apps  google  data  privacy  android 
september 2018 by charlesarthur
Cressida Dick calls for fast legal access to social media accounts • The Guardian
Ben Quinn:
<p>The head of Scotland Yard has called for police to be able to quickly access material from social media companies after the suspect in the murder of 13-year-old Lucy McHugh was jailed for withholding his Facebook password.

The Metropolitan police commissioner, Cressida Dick, was speaking after Stephen Nicholson pleaded guilty last week to a charge under the Regulation of Investigatory Powers Act and was sentenced to 14 months’ imprisonment.

Asked if Hampshire police should have been denied the data they had requested, Dick said it was not the first time a police service had approached a social media firm looking for evidence “and had to go through either a very protracted procedure, or has found that it’s impossible to do so”.

She said, during an interview on LBC Radio: “I absolutely think that in certain instances – and it sounds like this is one – law enforcement in the UK ought to have vital evidence which might bring someone to justice. There are complex and practical things for them, and legal things, which I do respect. It’s not as straightforward as it sounds, but I think that’s where we should be.”

Nicholson twice refused to give detectives his Facebook password while being questioned on suspicion of murder and sexual activity with a child. Police were facing difficulties in trying to obtain the messages from Facebook, Southampton crown court was told by prosecutors.</p>


The UK law will change and make it easier for the police to get this sort of detail next year. It's not quite part of the end-to-end encryption row, but you can see the waters getting higher, ever so subtly.
police  facebook  data  privacy 
september 2018 by charlesarthur
How Facebook has flattened human communication • Medium
David Auerbach is a writer and software engineer:
<p>The conclusions and impact of data analyses more often flow from the classifications under which the data has been gathered than from the data itself. When Facebook groups people together in some category like “beer drinkers” or “fashion enthusiasts,” there isn’t some essential trait to what unifies the people in that group. Like Google’s secret recipe, Facebook’s classification has no actual secret to it. It is just an amalgam of all the individual factors that, when summed, happened to trip the category detector. Whatever it was that caused Facebook to decide I had an African-American “ethnic affinity” (was it my Sun Ra records?), it’s not anything that would clearly cause a human to decide that I have such an affinity.

What’s important, instead, is that such a category exists, because it dictates how I will be treated in the future. The name of the category — whether “African American,” “ethnic minority,” “African descent,” or “black” — is more important than the criteria for the category. Facebook’s learned criteria for these categories would significantly overlap, yet the ultimate classification possesses a distinctly different meaning in each case. But the distinction between criteria is obscured. We never see the criteria, and very frequently this criteria is arbitrary or flat-out wrong. The choice of classification is more important than how the classification is performed.

Here, Facebook and other computational classifiers exacerbate the existing problems of provisional taxonomies. The categories of the DSM dictated more about how a patient population was seen than the underlying characteristics of each individual, because it was the category tallies that made it into the data syntheses. One’s picture of the economy depends more on how unemployment is defined (whether it includes people who’ve stopped looking for a job, part-time workers, temporary workers, etc.) than it does on the raw experiences and opinions of citizens.</p>


His "laws of internet data", set out in this piece (which is an extract from his forthcoming book BITWISE), are terrific too.
facebook  data  socialwarming 
august 2018 by charlesarthur
Consumer genomics will change your life, whether you get tested or not • Genome Biology
Razib Khan and David Mittelman:
<p><img src="https://media.springernature.com/lw785/springer-static/image/art%3A10.1186%2Fs13059-018-1506-1/MediaObjects/13059_2018_1506_Fig1_HTML.png" width="100%" />

These enormous numbers of genotyped consumers will generate massive returns on scale, allowing for greater innovation and insight. If hundreds of millions of consumers contribute to genetic databases, then the power of genealogical algorithms to infer matches will increase, until the likelihood of matching a relative, if you have close relatives (at least in the United States), will converge upon total certainty. Public databases such as GEDMatch now include data from one million samples, sufficient to predict a 90% chance of finding at least one third-cousin relative. Even with this ‘small’ database, consumers will almost certainly find relatives, and many of them. Genealogy has proved itself to be a sector with an affluent and passionate consumer base, as evidenced by the multibillion dollar valuation of the Ancestry online database thanks to millions of discretionary subscriptions.

The huge numbers of genotypes provided by consumers are valuable for genealogy, but as the numbers of genotypes increase into the millions, the data become even more valuable for trait prediction and medical applications. The large sample sizes allow for greater statistical power to detect genome-wide associations, which may be useful in linking genomic markers to functional traits and clinical phenotypes. 23andMe, for example, has amassed a database with sample numbers in the millions with which they are now working to obtain genotype–phenotype associations. The analysis of rare variations becomes immensely powerful when sample sizes approach a hundred million genotypes, and medicine could be truly personalized when such massive information reservoirs are available. We simply do not know what we might be able to do until we hit those sample sizes, as that is still unexplored territory.</p>


It's the medical applications that are the most interesting, along with rapid DNA testing - even for a few genes which could affect your response to particular drugs, for example. The cross-matching that's possible once you get a large enough population could, as they say, open up whole new territories.
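
The match-rate arithmetic is worth a back-of-envelope check (mine, not the authors'): if each database member independently has a small probability p of being a detectable third cousin of yours, the chance of at least one hit among n members is 1 - (1 - p)^n. The quoted 90% at one million samples implies p of roughly 2.3 in a million, which climbs to near-certainty at a hundred million samples:

<pre><code>
# Back-of-envelope only; assumes independence, which the paper does not claim.
def p_at_least_one_match(p_single: float, n: int) -> float:
    return 1 - (1 - p_single) ** n

n = 1_000_000
implied_p = 1 - (1 - 0.90) ** (1 / n)     # ~2.3e-6 per database member
print(p_at_least_one_match(implied_p, n))             # ~0.90
print(p_at_least_one_match(implied_p, 100_000_000))   # effectively 1.0
</code></pre>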
genomics  data 
august 2018 by charlesarthur
Fitbit heart data reveals its secrets • Yahoo Finance
David Pogue:
<p>Before you freak out: Fitbit’s data is anonymized. Your name is stripped off, and your data is thrown into a huge pool with everybody else’s. (Note, too, that this data comes only from people who own Fitbits — who are affluent enough, and health-conscious enough, to make that purchase. It’s not the whole world.)

Most of what you’re about to read involves resting heart rate. That’s your heart rate when you’re still and calm. It’s an incredibly important measurement. It’s like a letter grade for your overall health. “The cool thing about resting heart rate is that it’s a really informative metric in terms of lifestyle, health, and fitness as a whole,” says Scott McLean, Fitbit’s principal R&D scientist.

For one thing — sorry, but we have to go here — the data suggests that a high resting heart rate (RHR) is a strong predictor of early death. According to the Copenhagen Heart Study, for example, you’re twice as likely to die from heart problems if your RHR is 80, compared with someone whose RHR is below 50. And three times as likely to die if your RHR is over 90.

Studies have found a link between RHR and diabetes, too. “In China, 100,000 individuals were followed for four years,” says Hulya Emir-Farinas, Fitbit’s director of data science. “For every 10 beats per minute increase in resting heart rate, the risk of developing diabetes later in life was 23 percent higher.”

So what’s a good RHR? “The lower the better. It really is that simple,” she says. Your RHR is probably between 60 and 100 beats a minute. If it’s outside of that range, you should see a doctor. There could be something wrong.

…Fitbit’s data confirms a lot of what cardiologists already know. But because the Fitbit data set is ridiculously huge, it unearthed some surprises, too.

“I was a researcher in my past life,” says McLean. “You would conduct an experiment for 20 minutes, then you’d make these huge hypotheses and conclusions about what this means for the general population. We don’t have to do that. We have a large enough data set where we can confidently make some really insightful conclusions.”</p>


Some of it really is counterintuitive - such as these charts of resting heart rate by age, and against BMI.
<img src="https://s.yimg.com/ny/api/res/1.2/_II0_Yl2yUrn.QUZMKi8Lw--~A/YXBwaWQ9aGlnaGxhbmRlcjtzbT0xO3c9ODAwO2g9OTAw/http://media.zenfs.com/en/homerun/feed_manager_auto_publish_494/442dccd6135f34bdff510bdfc1c01a6d" width="49% /><img src="https://s.yimg.com/ny/api/res/1.2/4Uja4sjJi1XZyMLANlAdVQ--~A/YXBwaWQ9aGlnaGxhbmRlcjtzbT0xO3c9ODAwO2g9ODUw/http://media.zenfs.com/en/homerun/feed_manager_auto_publish_494/00ece9c24523548d939650fd7e3d25c6" width="49%" />

It would be great to be able to analyse this data in more detail - but Fitbit's not making it public.
fitbit  health  data 
august 2018 by charlesarthur
Firefox Test Pilot • Advance
<p>The Advance Test Pilot experiment is a collaboration between Laserlike and Mozilla.

In addition to the data collected by all Test Pilot experiments, here are the key things you should know about what is happening when you use Advance:

<strong>Sensitive Data:</strong> After installation, Laserlike will receive your web browsing history. No data is sent if you are in private browsing or pause mode, the experiment expires, or you disable it. Laserlike also receives your IP addresses, dates/timestamps, and time spent on webpages. This data is used to index URLs publicly visible on the web.

<strong>Controls:</strong> The settings allow you to request what data Laserlike receives about you from this experiment. You can also delete cookies, web browsing history, and related Laserlike account information.

<strong>Technical and Interaction Data:</strong> Both Mozilla and Laserlike will receive clickthrough rates and time spent on recommended content; data on how you interact with the sidebar and experiment; and technical data about your OS, browser, locale.</p>


It's going to send your web browsing history to a third party?!
firefox  data 
august 2018 by charlesarthur
What data scientists really do, according to 35 data scientists • HBR
Hugo Bowne-Anderson spoke to 35 of them:
<p>Great strides are being made in industries other than tech. I spoke with Ben Skrainka, a data scientist at Convoy, about how that company is leveraging data science to revolutionize the North American trucking industry. Sandy Griffith of Flatiron Health told us about the impact data science has begun to have on cancer research. Drew Conway and I discussed his company Alluvium, which “uses machine learning and artificial intelligence to turn massive data streams produced by industrial operations into insights.” Mike Tamir, now head of self-driving at Uber, discussed working with Takt to facilitate Fortune 500 companies’ leveraging data science, including his work on Starbucks’ recommendation systems. This non-exhaustive list illustrates data-science revolutions across a multitude of verticals.

It isn’t all just the promise of self-driving cars and artificial general intelligence. Many of my guests are skeptical not only of the fetishization of artificial general intelligence by the mainstream media (including headlines such as VentureBeat’s “An AI god will emerge by 2042 and write its own bible. Will you worship it?”), but also of the buzz around machine learning and deep learning. Sure, machine learning and deep learning are powerful techniques with important applications, but, as with all buzz terms, a healthy skepticism is in order. </p>
Datascience  data 
august 2018 by charlesarthur
Facebook fueled anti-refugee attacks in Germany, new research suggests • New York Times
Amanda Taub and Max Fisher:
<p>[The attack on a refugee family in the German town of] Altena exemplifies a phenomenon long suspected by researchers who study Facebook: that the platform makes communities more prone to racial violence. And, now, the town is one of 3,000-plus data points in <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3082972">a landmark study</a> that claims to prove it.

Karsten Müller and Carlo Schwarz, researchers at the University of Warwick, scrutinized every anti-refugee attack in Germany, 3,335 in all, over a two-year span. In each, they analyzed the local community by any variable that seemed relevant. Wealth. Demographics. Support for far-right politics. Newspaper sales. Number of refugees. History of hate crime. Number of protests.

One thing stuck out. Towns where Facebook use was higher than average, like Altena, reliably experienced more attacks on refugees. That held true in virtually any sort of community — big city or small town; affluent or struggling; liberal haven or far-right stronghold — suggesting that the link applies universally.

Their reams of data converged on a breathtaking statistic: Wherever per-person Facebook use rose to one standard deviation above the national average, attacks on refugees increased by about 50%.

Nationwide, the researchers estimated in an interview, this effect drove one-tenth of all anti-refugee violence.

The uptick in violence did not correlate with general web use or other related factors; this was not about the internet as an open platform for mobilization or communication. It was particular to Facebook.

Other experts, asked to review the findings, called them credible, rigorous — and disturbing. The study bolstered a growing body of research, they said, finding that social media scrambles users’ perceptions of outsiders, of reality, even of right and wrong.</p>

Fisher, one of the reporters, said of that standard-deviation finding: "I can't recall any statistic that has stopped me in my tracks quite like this one".

This - now, this is serious.
Facebook  socialwarming  data  research 
august 2018 by charlesarthur
Registry of Open Data on Amazon Web Services
<p>This registry exists to help people discover and share datasets that are available via AWS resources. Learn more about sharing data on AWS.

See <a href="https://registry.opendata.aws/usage-examples">all usage examples for datasets listed in this registry</a>.</p>


This is pretty amazing. Landsat pictures (zoom in on your house!), IRS 990 filings, satellite data labelled for machine learning, 5bn web pages from web crawling, a global database (from broadcast, print and online news) from every country identifying key events, OpenStreetMap, bourse data from the German stock market, Hubble telescope data... it's such a colossal reservoir of data waiting to be made use of. Sure, many others have got there first, but what could come from cross-matching Landsat data with OSM with bourse data with key events? And then applying machine learning to that?
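
Getting at any of it is pleasingly low-friction: most registry entries are public S3 buckets, so anonymous access works. A minimal sketch - the bucket name and prefix below are made up, each dataset's registry page lists the real ones:

<pre><code>
# Sketch of anonymous access to a public dataset bucket (names are placeholders).
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects under a hypothetical prefix.
resp = s3.list_objects_v2(Bucket="example-open-dataset", Prefix="2018/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Then pull one down to work with locally:
# s3.download_file("example-open-dataset", "2018/some-file.csv", "some-file.csv")
</code></pre>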
data  aws  bigdata  amazon  opendata  freeourdata 
july 2018 by charlesarthur
Tech’s ‘dirty secret’: the app developers sifting through your Gmail • WSJ
Douglas MacMillan:
<p>One of those companies is Return Path Inc., which collects data for marketers by scanning the inboxes of more than two million people who have signed up for one of the free apps in Return Path’s partner network using a Gmail, Microsoft Corp. or Yahoo email address. Computers normally do the scanning, analyzing about 100 million emails a day. At one point about two years ago, Return Path employees read about 8,000 unredacted emails to help train the company’s software, people familiar with the episode say.

In another case, employees of Edison Software, another Gmail developer that makes a mobile app for reading and organizing email, personally reviewed the emails of hundreds of users to build a new feature, says Mikael Berner, the company’s CEO.

Letting employees read user emails has become “common practice” for companies that collect this type of data, says Thede Loder, the former chief technology officer at eDataSource Inc., a rival to Return Path. He says engineers at eDataSource occasionally reviewed emails when building and improving software algorithms.

“Some people might consider that to be a dirty secret,” says Mr. Loder. “It’s kind of reality.”

Neither Return Path nor Edison asked users specifically whether it could read their emails. Both companies say the practice is covered by their user agreements, and that they used strict protocols for the employees who read emails. eDataSource says it previously allowed employees to read some email data but recently ended that practice to better protect user privacy.</p>

People do see value in having these companies scan their email (though Return Path is not really directly useful to you or me). But the lack of control is as bad as some Twitter API accesses.
Google  gmail  data  customer 
july 2018 by charlesarthur
Why nobody ever wins that car giveaway at the mall • The Hustle
Zachary Crockett dug into what's really going on:
<p>The car is a loaner from local dealer, Acura of Fremont — and despite what the sweepstakes’ marketing may suggest, it’s not up for grabs. (We called the dealer and they confirmed that the vehicle on display isn’t part of the giveaway at all.)

What you’re really signing up for is the opportunity to win an opportunity to possibly win a small amount of taxable cash.

Here’s what actually happens: 1) You enter the sweepstakes; 2) You have to attend a 90-minute timeshare presentation; 3) You get a scratch-off lotto ticket; 4) If you’re a “grand prize winner,” you get to play a game for a chance to win $100k.

The “game” is that the finalist gets to open 4 “mystery envelopes” with random amounts of cash. Last year’s two “big winners” walked away with checks for $575 and $700 — about enough to buy one side view mirror for your Acura.

That’s the absolute best-case scenario of entering one of these contests. Others aren’t so lucky.

Days after entering to win the car, Maggie Nicholson received a call informing her that her name was drawn. After sitting through a 2-hour timeshare presentation with Boiler Room-like sales tactics, she was told there was no car — but she was eligible for a vacation package.</p>


And then there's the way all your details get sold on, and sold on, and sold on... and you give up your do not call rights.
fraud  privacy  cars  data 
june 2018 by charlesarthur
California has 48 hours to pass this privacy bill or else • Gizmodo
Kashmir Hill:
<p>Recent headlines have suggested that California lawmakers are considering a bill that would give Californians “unprecedented control over their data.” This is true but that is not the whole story.

What’s really happening is that California lawmakers have 48 hours to pass such a bill or the policy shit is going to hit the direct democracy fan. Because if lawmakers in the California Senate and House don’t pass this bill Thursday morning, and if California governor Jerry Brown doesn’t sign this bill into law Thursday afternoon, a stronger version of it will be on the state ballot in November. Then the 17 million or so people who actually vote in California would decide for themselves whether they should have the right to force companies to stop selling their data out the back door. Polls predict they would vote yes, despite the claims of tech companies that passage of the law would lead to businesses fleeing California. And laws passed via the ballot initiative process, rather than the legislative process, are almost impossible to change, so California would likely have this one on its books for a very long time.

This, more than, say, an urgent need to address the data scandals that have dominated the tech industry so far this year, is why lawmakers are scrambling to get a bill passed.</p>
california  data  privacy 
june 2018 by charlesarthur
Here Are 18 things you might not have realized Facebook tracks about you • Buzzfeed
Nicole Nguyen:
<p>1. information from "computers, phones, connected TVs, and other web-connected devices," as well as your "internet service provider or mobile operator"
2. "mouse movements" on your computer
3. "app and file names" (and the types of files) on your devices
4. whether the browser window with Facebook open is "foregrounded or backgrounded," and time, frequency, and duration of activities
5. information about "nearby Wi-Fi access points, beacons, and cell towers" and "signal strength" to triangulate your location ("Connection information like your IP address or Wi-Fi connection and specific location information like your device's GPS signal help us understand where you are," said a Facebook spokesperson.)
6. information "about other devices that are nearby or on their network"
7. "battery level"
8. "available storage space"
9. installed "plugins"
10. "connection speed"
11. "purchases [users] make" on off-Facebook websites
12. contact information "such as an address book" and, for Android users, "call log or SMS log history" if synced, for finding "people they may know" (Here's how to turn off contact uploading or delete contacts you've uploaded.)
13. information "about how users use features like our camera" (The Facebook spokesperson explained, "In order to provide features like camera effects, we receive what you see through camera, send to our server, and generate a mask/filter.")
14. "location of a photo or the date a file was created" through the file's metadata
15. information through your device's settings, such as "GPS location, camera, or photos"
16. information about your "online and offline actions" and purchases from third-party data providers
17. "device IDs, and other identifiers, such as from games, apps or accounts users use"
18. "when others share or comment on a photo of them, send a message to them, or upload, sync or import their contact information"</p>


And that's apart from all the demographic and other intensely personal data they hold. This list was released to the US Congress on Tuesday.
facebook  data 
june 2018 by charlesarthur
Facebook confirms data-sharing deals with Chinese tech firms • WSJ
Deepa Seetharaman:
<p>The social-media company said it plans to wind down its data-sharing partnership with Huawei by the end of the week. It isn’t clear when Facebook will end partnerships with the three other companies: Lenovo Group Ltd., the world’s largest personal-computer maker; Oppo Electronics Corp., a smartphone maker; and Chinese electronics conglomerate TCL .

Facebook officials defended the decision to work with Huawei and said that no data belonging to Facebook users was saved on Huawei servers. Facebook had a manager and an engineer review the apps before they were deployed to ensure the data wasn’t saved on company servers, the Facebook spokeswoman said.

“Huawei is the third-largest mobile manufacturer globally and its devices are used by people all around the world, including in the United States,” Francisco Varela, vice president of mobile partnerships, said in a statement. “Facebook along with many other US tech companies have worked with them and other Chinese manufacturers to integrate their services onto these phones.”

The New York Times earlier reported on Facebook’s device partnerships with companies like Apple Inc., Amazon.com Inc. and Microsoft Corp. After the Times article, several lawmakers said they felt they had been misled by chief executive Mark Zuckerberg, who testified in April that Facebook restricted data access to outsiders in 2015.

“Facebook’s integrations with Huawei, Lenovo, OPPO and TCL were controlled from the get go—and we approved the Facebook experiences these companies built,” Mr. Varela said. “Given the interest from Congress, we wanted to make clear that all the information from these integrations with Huawei was stored on the device, not on Huawei’s servers.”</p>
facebook  china  data 
june 2018 by charlesarthur
Reproducibility in machine learning: why it matters and how to achieve it • Determined.ai
Jennifer Villa and Yoav Zimmerman:
<p>You’ve been handed your first project at your new job. The inference time on the existing ML model is too slow, so the team wants you to analyze the performance tradeoffs of a few different architectures. Can you shrink the network and still maintain acceptable accuracy?

The engineer who developed the original model is on leave for a few months, but not to worry, you’ve got the model source code and a pointer to the dataset. You’ve been told the model currently reports 30.3% error on the validation set and that the company isn’t willing to let that number creep above 33.0%.

You start by training a model from the existing architecture so you’ll have a baseline to compare against. After reading through the source, you launch your coworker’s training script and head home for the day, leaving it to run overnight.

The next day you return to a bizarre surprise: the model is reporting 52.8% validation error after 10,000 batches of training. Looking at the plot of your model’s validation error alongside that of your teammate leaves you scratching your head. How did the error rate increase before you even made any changes?

<img src="http://determined.ai/assets/reproducibility-img/base_figure.png" width="100%" /></p>


Via Pete Warden, who is one of Google's people working on AI. A topic that one would imagine is close to his heart.
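
The usual first suspect in a mystery like that one is unseeded randomness - weight initialisation, data shuffling, dropout. A generic sketch of pinning the common sources (my illustration, not the article's diagnosis):

<pre><code>
# Pin the usual sources of randomness before training.
import os
import random
import numpy as np

def set_global_seed(seed: int = 42) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)  # only effective if set before the interpreter starts
    random.seed(seed)                         # Python's own RNG
    np.random.seed(seed)                      # NumPy, and libraries built on it
    # ML frameworks keep separate RNG state and need seeding too,
    # e.g. torch.manual_seed(seed) or tf.random.set_seed(seed).

set_global_seed(42)
</code></pre>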
machinelearning  science  data 
may 2018 by charlesarthur