Archive for September, 2007

Semantics count: anonymity and privacy are two different things

Thursday, September 27th, 2007

Remember the eBay guy? He sold 90 days’ worth of his non-identifying personal data for $355 back in June, leading me to ask the question:

I can’t help but reflect on deeper questions as I read these stories: mainly, what constitutes an identity?

If I know your habits, your activities, your purchases, your poker games and porn preferences, does it follow that I know you? Surely you are more than just your name—at what point does your identity become non-identifying?

This topic has just resurfaced in a Wired article called Lesson From Tor Hack: Anonymity and Privacy Aren’t the Same. In it, author Bruce Schneier discusses people’s confusion about the difference between anonymous and inaccessible:

As the name implies, Alcoholics Anonymous meetings are anonymous. You don’t have to sign anything, show ID or even reveal your real name. But the meetings are not private. Anyone is free to attend. And anyone is free to recognize you: by your face, by your voice, by the stories you tell. Anonymity is not the same as privacy.

The topic of the article is Tor:

Tor is a free tool that allows people to use the internet anonymously. Basically, by joining Tor you join a network of computers around the world that pass internet traffic randomly amongst each other before sending it out to wherever it is going. Imagine a tight huddle of people passing letters around. Once in a while a letter leaves the huddle, sent off to some destination. If you can’t see what’s going on inside the huddle, you can’t tell who sent what letter based on watching letters leave the huddle.

This system sounds clever; however, if I understood the article correctly (I found it utterly intriguing, but somewhat confusing), Tor has a few problems, and the huddle analogy is a good way to describe them. Although you can’t tell who sent a letter if you can’t see inside the huddle, anyone can join—which means it would be quite easy for someone to gain access to the info you’re trying to hide.

In fact, in a way you’re making it more likely that someone will read your private data; by definition, you have to pass it around a group before sending it to its destination, as evidenced by this visual description of how the thing works:

How Tor Works: Step 1

How Tor Works: Step 2

How Tor Works: Step 3

Schneier points out, though, that taking an anonymizing route isn’t the same as encrypting. While the information you’re sending is in the Tor huddle, only people in the huddle can see it, but once it leaves, anyone can. It’s like the difference between taking back roads to avoid the cops and wearing a disguise (not that I’ve ever had to do either, thank goodness). From Tor’s FAQs:

Can exit nodes eavesdrop on communications? Isn’t that bad?
Yes, the guy running the exit node can read the bytes that come in and out there. Tor anonymizes the origin of your traffic, and it makes sure to encrypt everything inside the Tor network, but it does not magically encrypt all traffic throughout the Internet.

…So I’m totally anonymous if I use Tor?

‘No.’

If your application runs in a virtual machine, it can access local information because it runs locally. Java, Javascript, Macromedia Flash and Shockwave, QuickTime, RealAudio, ActiveX controls, and VBScript are all known to be able to access local information about your operating system and local network. These technologies will work over proxies and can tunnel the information back to their source. They can also open new connections outside of the proxy to communicate data.

Disabling these technologies in your browser can improve the situation.

With an extensible browser like Firefox, make sure you are not using an extension with a similar behavior as described above.

Generally, you should also worry about entering revealing information into a form, or falling victim to spyware, or similar problems.

Also, there are still some technical attacks that probably work against Tor.

Hmmm… comforting.
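A toy sketch can make the exit-node point concrete. This is not how Tor is actually implemented: the `keystream_xor` cipher, the three relay keys and the function names below are all invented for illustration, and the XOR scheme is not secure. It only shows the shape of what the FAQ describes: relays inside the network hold scrambled bytes, while the exit relay ends up holding the original message.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR data with a SHA-256-derived keystream."""
    stream = bytearray()
    block_index = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + block_index.to_bytes(4, "big")).digest())
        block_index += 1
    # zip stops at len(data), so surplus keystream bytes are ignored.
    return bytes(d ^ s for d, s in zip(data, stream))


def wrap_layers(message: bytes, relay_keys: list) -> bytes:
    """Sender side: add one layer per relay, with the exit relay's layer innermost."""
    wrapped = message
    for key in reversed(relay_keys):
        wrapped = keystream_xor(key, wrapped)
    return wrapped


def relay_views(wrapped: bytes, relay_keys: list) -> list:
    """Return the bytes each relay holds after removing its own layer."""
    views = []
    current = wrapped
    for key in relay_keys:
        current = keystream_xor(key, current)  # XOR is its own inverse
        views.append(current)
    return views


if __name__ == "__main__":
    # Hypothetical keys and message, purely for demonstration.
    keys = [b"entry-key", b"middle-key", b"exit-key"]
    letter = b"Dear bank, here is my account number"
    names = ["entry", "middle", "exit"]
    for name, view in zip(names, relay_views(wrap_layers(letter, keys), keys)):
        print(name, view)
```

In this sketch the entry and middle relays print unreadable bytes and only the exit relay prints the letter itself. That is why the contents need their own end-to-end encryption (HTTPS, for example) if the exit node shouldn't be able to read them.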

The other point the article makes reinforces the semantic argument: just because people can’t see that it was you specifically that hit the ‘Send’ button doesn’t mean that they won’t be able to glean confidential information from reading your email, or even identify you by it. Schneier touches on Dark Web, a scary project funded by the National Science Foundation:

One of the tools developed by Dark Web is a technique called Writeprint, which automatically extracts thousands of multilingual, structural, and semantic features to determine who is creating “anonymous” content online. Writeprint can look at a posting on an online bulletin board, for example, and compare it with writings found elsewhere on the Internet. By analyzing these certain features, it can determine with more than 95 percent accuracy if the author has produced other content in the past.

Obviously, as Schneier points out, if your identity is connected to any of that content, you can be found.
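To give a rough feel for how writing can be matched by style alone, here is a deliberately tiny sketch. It is not Writeprint's method, which the quote above describes as using thousands of features. This version assumes just one invented feature set (character trigram counts) and compares two texts by cosine similarity; the function names are hypothetical.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping three-character sequences in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Return a score from 0.0 (no shared trigrams) to 1.0 (identical proportions)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample posts, for demonstration only.
    known_post = "Anonymity is not the same as privacy, and I keep saying so."
    anonymous_post = "I keep saying that privacy is not the same as anonymity."
    score = cosine_similarity(trigram_profile(known_post), trigram_profile(anonymous_post))
    print(f"similarity: {score:.2f}")
```

A real system would need far richer features and a lot of text per author before a score like this meant anything, but the principle is the same: the more you write, the more your habits give you away.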

I bring these issues up because of my ongoing fascination with clarity, and I put these questions to you in the hopes that together we can explore the concept further:

  • What is your biggest privacy concern? Why?
  • What would be the downside to having all of your activity exposed for the world to see?
  • At what point does exposure become invasive?

If we want to improve online privacy, we need to truly understand where the problems are. Thanks in advance for your comments.

Second Google privacy video

Tuesday, September 25th, 2007

Maile Ohye from Google is back, with a discussion on web history and personalization:

It’s the second in their video-series-of-indeterminate-length. You can see the first one here.

In this video, Maile shows you how you can see the history that’s associated with your Google account, pause recording of your Google activity, and delete specific records. This last is a useful trick if you’re shopping for a surprise birthday gift on a shared computer, although I would raise an eyebrow if my partner were suspiciously trolling through my search history.

Towards the end of this video, Maile points out that if you clear your history, the only data Google will be left with is what she described in the first video: your search query, IP address, and cookie.

Search log from Google
From the first video: the info that gets retained in the Google logs

I may be really dense here, but isn’t the only difference between a cleared history and an uncleared one that the uncleared one has your Google account, while the cleared one has your cookie? Here’s what Maile says about search history:

Your email and password don’t tell us personal stuff about you, like your name, age and occupation, so why do we need them? Well, in addition to helping us verify that you’re really you and not someone else who’s using your computer, your email and password allow us to maintain a record of your web history: the things you search for and the sites you visit.

Aha! Perhaps when you’re signed in they also track which sites you click through to! Maile doesn’t say whether they do, but she does say that you can check your history anytime. Just sign into Google and click ‘History’. I tried. Can’t do it. Maybe it doesn’t work in New Zealand. Here’s the full list of options available to me:

All the Google services you care to eat

Did I miss it? Or is it not there? Have you been able to log into your history on Google? If so, what do you see? Also, what do you think of this new video?

More on trust

Monday, September 24th, 2007

Ohmygoshthisissoinsanelyimportant…

The Semantic Web Stack from Tim Berners-Lee:

Semantic Web Stack

What’s at the top of the stack?

Aaron Wall on Search Engine Land talks about credibility as a crucial element of SEO:

Authority is not something you take, but something that is granted. Gaining authority makes it easier to gain more authority, and eventually it becomes a self fulfilling prophecy.

And then ruins it by saying:

In many instances appearing as though you are credible is more important than actually knowing what you are talking about, especially on a network that has no respect for copyright and where just about everything is freely available.

A whole industry is sprouting up to maintain people’s online image (so that their clients and constituents can continue to trust them).

Francis Fukuyama thinks it’s pretty important, if you go by his book ‘Trust: The Social Virtues and the Creation of Prosperity’. From the Library Journal’s review:

He argues that the most pervasive cultural characteristic influencing a nation’s prosperity and ability to compete is the level of trust or cooperative behavior based upon shared norms. In comparison with low-trust societies (China, France, Italy, Korea), which need to negotiate and often litigate rules and regulations, high-trust societies like those in Germany and Japan are able to develop innovative organizations and hold down the cost of doing business. Fukuyama argues that the United States, like Japan and Germany, has been a high-trust society historically but that this status has eroded in recent years.

Just so you don’t mistrust my words, you should know that I haven’t read that book; I found it via a piece Jakob Nielsen wrote waaaayyyyy back in 1999 (the web equivalent of the Pleistocene) on trustworthiness in web design. It does look like a good read, though.

‘Don’t be evil’ is a fantastic motto, as long as we trust Google.

What are your thoughts on trust? How does your level of trust differ online from offline, if at all? Should trust, not relevance, be the holy grail for websites?

Big oops for my beloved Lingo

Friday, September 21st, 2007

CLARIFICATION: Lingo is a US company, not a New Zealand one.

My Lingo VoIP phone is the reason I can live in New Zealand and yet not feel absolutely, utterly isolated from the rest of the world. I adore it. I am a huge fan of the company. But somebody over there just hit the ‘send’ button too early and made a major privacy mistake.

I just got an email with what appears to be every single Lingo client’s email address in the ‘To’ field.

Perhaps it wasn’t every Lingo client. I certainly hope they have more than 14,343 clients. But that’s how many email addresses were there.

The first lot of addresses…

A small fraction of the addresses in this email
After hitting Page Down a few times—look how much further you can scroll down!

I now have the billing email addresses of 14,343 VoIP customers. If only I had something to sell them! And I can only hope none of them have anything to sell me.

For those of you who aren’t familiar with it, a Lingo phone acts in all ways like a regular phone, except it plugs into a router instead of into the phone jack. I have a U.S. phone number that will ring through to New Zealand at no additional charge to either the caller or me, and I can make unlimited calls to all of North America and most of Europe. This has several advantages over a PC-based VoIP system:

  1. I don’t have to have the computer turned on, or be sitting at it.
  2. My mom can call me without having to learn anything new.
  3. It is a huge convenience and savings for both me and the person calling me.

I am a tremendously loyal customer of this organization. But I can’t help but think it’s a little… icky, for lack of a better word, that more than 14,000 people just got my email address. How would you feel about it?

I’m sure it was an honest mistake. But what if this weren’t the phone company? What if it were the bank? Or the hospital? And what do you think would be an appropriate reaction from Lingo now?

Have you ever hit the ‘Send’ button by mistake?

If you like piña coladas

Thursday, September 20th, 2007

La dada DA dada DA da…

Rupert Holmes’ ‘Piña Colada Song’ was prophetic, if you go by a story in The Daily Telegraph, via Marc Andreessen:

A married couple who didn’t realise they were chatting each other up on the Internet are divorcing.

Sana Klaric and husband Adnan, who used the names “Sweetie” and “Prince of Joy” in an online chatroom, spent hours telling each other about their marriage troubles…

The truth emerged when the two turned up for a date. Now the pair, from Zenica in central Bosnia, are divorcing after accusing each other of being unfaithful.

“I was suddenly in love. It was amazing. We seemed to be stuck in the same kind of miserable marriage. How right that turned out to be,” Sana, 27, said.

Adnan, 32, said: “I still find it hard to believe that Sweetie, who wrote such wonderful things, is actually the same woman I married and who has not said a nice word to me for years”.

For those of you who missed the lyrics of the original song:

I was tired of my lady
We’d been together too long
Like a worn-out recording
Of a favorite song
So while she lay there sleeping
I read the paper in bed
And in the personal columns
There was this letter I read

“If you like Pina Coladas
And getting caught in the rain
If you’re not into yoga
If you have half a brain
If you’d like making love at midnight
In the dunes on the Cape
Then I’m the love that you’ve looked for
Write to me and escape.”

I didn’t think about my lady
I know that sounds kind of mean
But me and my old lady
Have fallen into the same old dull routine
So I wrote to the paper
Took out a personal ad
And though I’m nobody’s poet
I thought it wasn’t half bad

“Yes I like Pina Coladas
And getting caught in the rain
I’m not much into health food
I am into champagne
I’ve got to meet you by tomorrow noon
And cut through all this red-tape
At a bar called O’Malley’s
Where we’ll plan our escape.”

So I waited with high hopes
And she walked in the place
I knew her smile in an instant
I knew the curve of her face
It was my own lovely lady
And she said, “Oh it’s you.”
Then we laughed for a moment
And I said, “I never knew.”

That you like Pina Coladas
Getting caught in the rain
And the feel of the ocean
And the taste of champagne
If you’d like making love at midnight
In the dunes of the Cape
You’re the lady I’ve looked for
Come with me and escape

MySpace’s Personalized Ads: 80%=$5 billion

Thursday, September 20th, 2007

The New York Times yesterday published a piece by Brad Stone entitled MySpace to Discuss Effort to Customize Ads. In it, Brad unpicks the vast potential of using profile information to personalize ad service.

…MySpace, the Web’s largest social network and one of the most trafficked sites on the Internet, says that after experimenting with technology over the last six months it can tailor ads to the personal information that its 110 million active users leave on their profile pages.

Executives at Fox Interactive Media, the News Corporation unit that owns MySpace, will begin speaking about the results of that program this week. They say the tailoring technology has improved the likelihood that members will click on an ad by 80 percent on average.

“We are blessed with a phenomenal amount of information about the likes, dislikes and life’s passions of our users,” said Peter Levinsohn, president of Fox Interactive Media, who will talk about the program at an address to investors and analysts at a Merrill Lynch conference in Los Angeles on Tuesday. “We have an opportunity to provide advertisers with a completely new paradigm.”

That’s rather a long quote, so I’m going to repeat the bit that jumped out at me:

…the tailoring technology has improved the likelihood that members will click on an ad by 80 percent on average.

80 percent!

Remember how much everybody freaked out when Panama was shown to improve click-throughs on Yahoo! ads by 10%?

Back then, Jonathan Thaw was saying a 10% increase in click rate could translate into a 5% increase in revenue growth. So, ummm, if 10% equals 5%, then (let me just check my math with Miss Teen South Carolina here), such as, 80% could equal 40%? Which, such as, works out to more than $2 billion at Yahoo! and nearly $5 billion at Google.

I don’t need Don Dodge to tell me that a) this is an overly simplistic translation, b) Google and Yahoo would have to have access to the same extensive bank of personal information that MySpace does for each searcher in order to make it work, and c) I should call him for my math questions and just appreciate Miss Teen South Carolina for her beautiful heart and shiny white smile. No matter which way you look at it, these are big numbers we’re dealing with here.
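For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch of that straight-line extrapolation. The only inputs taken from the post are the 10% click lift, the 5% revenue lift and the 80% figure. The linear relationship itself is an assumption (and almost certainly too generous), the function names are invented, and the $5 billion input is the rounded "nearly $5 billion" quoted above, not a reported revenue number.

```python
def extrapolate_revenue_lift(click_lift, ref_click_lift=0.10, ref_revenue_lift=0.05):
    """Scale revenue lift linearly from a single reference point (a big assumption)."""
    return click_lift * (ref_revenue_lift / ref_click_lift)


def implied_base_revenue(dollar_gain, revenue_lift):
    """Back out the revenue base that a quoted dollar gain would imply."""
    return dollar_gain / revenue_lift


if __name__ == "__main__":
    lift = extrapolate_revenue_lift(0.80)
    print(f"Extrapolated revenue lift: {lift:.0%}")
    # 5e9 is the rounded dollar figure quoted in the post, used here only to
    # show what revenue base the estimate implicitly assumes.
    base = implied_base_revenue(5e9, lift)
    print(f"Implied revenue base: ${base / 1e9:.1f} billion")
```

Swapping in a different reference point, or a curve that flattens out as click rates rise, changes the answer a lot, which is rather the point of caveat (a).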

Brad gets into the privacy issues on page 2:

MySpace also plans to give its advertisers information about what kind of people its ads have attracted. “We want them to leave knowing more about their audience than when they came into the door,” said Arnie Gullov-Singh, vice president in the advertising technology group at Fox Interactive.

That is precisely the goal that worries some privacy advocates. They argue that users of social networks like MySpace and Facebook are not aware they are being monitored and that current ad-targeting is only the first step in what has become a huge arms race to collect revealing data on Internet users.

“People should be able to congregate online with their friends without thinking that big brother, whether it is Rupert Murdoch or Mark Zuckerberg, are stealthily peering in,” said Jeff Chester, executive director at the Center for Digital Democracy in Washington.

His organization will ask the Federal Trade Commission, during a planned hearing on Internet privacy in November, to investigate social networks for unfair and deceptive practices, he said.

This is definitely sensitive territory that MySpace is playing in, and they need to be careful. As I wrote yesterday, trust is worth more than gold on the Internet, and an 80% increase in clickthrough will mean nothing if there’s nobody there to see the personalized ads.

The reason this is so tricky is that MySpace members gave up the information in a context that had nothing to do with advertising.

In the NYT article, MySpace representatives were dismissive of the issue:

MySpace and Facebook executives argue that they are harming no one. They say that they are using information their members make publicly available, and contrast their ad targeting with efforts by Yahoo, America Online and Microsoft, whose advertising technologies follow people around the Web and try to deduce what they are interested in based on what sites they are looking at.

I think that’s a dangerous attitude to take, though. Given the potential reward, MySpace would be foolish to back off of a personalization program, but they need to get clear that retaining their audience trumps an increase in clickthroughs. A later comment indicates that perhaps they realize this:

Fox executives also say they are planning on letting users opt-out of the ad-targeting program on MySpace, though it means those members will see fewer relevant ads.

Smart move, but I think they’re doing themselves an injustice if they limit it to a simple ‘On-Off’ equation. We love to buy, but we don’t like to be sold to, and the primary difference is in how much control we have over the process. The more control MySpace gives its users over their ad targeting, the happier the users will be. Imagine a dashboard that offers me the ability to allow ads served based on the groups I belong to but not based on my individual conversations. Or that lets me indicate if I’m in ‘shopping mode’. (My boyfriend will tell you that I am permanently in shopping mode, but that’s just not true.)

VortexDNA’s aim is to facilitate highly relevant personalization with complete control and total privacy. That is the triad—those three things are all equally important. It’s clear from the above article that MySpace understands the importance of personalization. I hope they manage to balance the triangle as well.

What do you think about their personalization efforts? Are you a MySpace user? Would you like to see targeted ads or would you feel they were intrusive? I’m eager to get your thoughts.