Thursday, August 16, 2007

Skeletons in the Closet (Steven McSwan, 40549813)


The internet is a social mecca with millions of people of varying backgrounds exposing to the world their emotions, thoughts, interests and experiences. Unfortunately, few realise the seriousness of the dangers of exposing so much information on the internet.

This article will explore the privacy issues concerning the exposure of personal information through the use of social tools such as Blogger [9], Flickr [10] and Del.icio.us [11]. More specifically, it will discuss the issues of information availability, retention and intersection.

These issues can potentially expose private and/or damaging information about anyone who uses one or more social tools.

As a private person I like to keep my activities, thoughts and opinions to myself. In undertaking “Social and Mobile Computing”, a subject based on social interaction, it was necessary for me to depart from this norm and “expose” myself to the world. It is this experience of “exposing” myself, and the risks associated with it, that will be the focus of this article.


Privacy has always been a concern; however, with the inception of the internet and social tools, new issues have emerged. As a result, a number of academic articles have been published that explore the potential risks and how we can combat them. Several of these articles are used to support my arguments over the next few paragraphs and are described below.

Lenhart [1] and Nardi [2] discuss privacy from the perspective of the social tools’ users and how they themselves manage it. They explain why users adopt the tools and the amount of information they typically release.

Conti [3] goes on to explain in his article how, using one of the most popular search engines, Google [14], this information is readily available and presents an inherent danger to its publishers’ privacy.

Finally, Frankowski [4] and Ahern [5] explore in their articles how these collected segments of information, no matter how mundane, can reveal a publisher’s complete profile and how this can be combated.



Figure 1. Google's initial screen.

Google:
Google is the world’s most popular and powerful search engine with approximately 380 million visitors a month [4]. As explained in Conti’s article [3], Google is the tool that has the power to store and expose potentially damaging personal information through data intersection, which Frankowski describes in his article [4].



Figure 2. Blogger's initial screen.

Blogger:
Blogger is a blogging tool that, as explained in Nardi’s article [2], people can use to expose their daily activities, whereabouts, opinions, thoughts and emotions.



Figure 3. A user's gallery on Flickr.

Flickr:
Flickr is a photo sharing tool that enables users to post photos that, as explained in Ahern’s article [5], can expose their relationships, frequented locations, behaviour patterns and the objects they own.



Figure 4. My Del.icio.us website showing all my links.

Del.icio.us:
Del.icio.us is a tool that users may use to share their frequented sites. This in turn exposes their current interests, projects and browsing patterns.



At first glance the social tools seem quite different. For example, Blogger involves the sharing of thoughts, Flickr the sharing of photos, and Del.icio.us the sharing of bookmarks. However, upon closer examination they all achieve very similar goals. As discussed by Ahern [5], Nardi [2] and Lenhart [1], they provide people with the ability to share personal information, whether it concerns their thoughts, emotions, interests or the activities in which they are involved.

This sharing, particularly with blogs, fulfils social needs as described in Abraham Maslow’s theory of human motivation [6]. He states that we all have a desire or need for reputation, prestige, recognition, attention, importance and appreciation. Having accomplished these, we fulfil our esteem and belongingness needs, which allows us to move on to fulfilling our other needs as outlined in Maslow’s triangle [6].



Figure 5. Maslow's Triangle showing all the needs we as humans strive to meet.

These desires/needs would normally have been fulfilled in real life through the prestige of our jobs, the money we earn and our lifestyles. However, with the introduction of the internet, new methods have emerged. We are now able to fulfil these needs by exposing ourselves to a far wider audience, where we have a far greater chance of advancing our social status.

We may fulfil our social needs by exposing ourselves online. Unfortunately, this comes at a serious price, as the information we disclose can be collected and read by anyone, including advertising agencies, potential employers, government agencies, criminal organisations and even potential partners.

Tabulated in the figure below is the information that would typically be provided through normal usage of Blogger, Flickr and Del.icio.us. This provides a clear picture of exactly how much information undesirable parties can gain access to.



Figure 6. Information that could potentially be exposed.

Lenhart [1] and Nardi [2] found in their research that the majority of users are unconcerned if outsiders know about their interests, social network or what they own. Their main concerns were others’ feelings and the exposure of their full name or home address.

This is sound in the context of the real world, where the information you put forth is only received by a few individuals and will usually be forgotten. The internet, however, unlike the real world, has a few key attributes that could make the most mundane information damaging. These attributes are data availability, retention and intersection.

Data Availability
Websites such as Blogger, Flickr and Del.icio.us have a lot of visitors and are often bookmarked due to their nature as social tools. Because of this, the information people provide on such sites is much more readily available than if it were on their own individually hosted website [7]. Search engines such as Google are highly likely to have these sites in the top search results given the right search terms.

Figure 7. The search result returned when searching for my name, Steven McSwan.

One such example is my own name, Steven McSwan. If you were to search for my name in Google, you would see that the best match is my “One Million Masterpiece” site. This is a site that is regularly visited and, due to its nature, would have been visited many times. I’ve created many websites of my own over time and would certainly have mentioned my full name on them; however, these are nowhere to be found.

Data Retention
Conti discusses in his article the fact that high capacity storage devices now cost so little that companies can store information about their customers indefinitely [3]. This, coupled with the might that is Google, is one of the biggest dangers facing users of the internet. Google has the capacity to store and cache most of the internet’s websites [3]. This means the information we post, even if deleted, will still be available well into the future.

An example of this is a forum post of mine in which I asked for the address of the LEGO [12] website, to which someone kindly replied www.lego.com, to my embarrassment. That was eleven years ago and I recently came across it again. The company that ran the forum no longer exists, nor does its website, yet my post could still be found because it existed in Google’s cache.

This is a harmless example, but in the context of social tools it is dangerous. We release a lot of information that we may regret releasing later on. Nardi [2] lists a specific example in her article where a blogger regretted posting his emotion-fuelled opinion of a fellow classmate. He later deleted the post; however, with Google’s cache that classmate would still have access to it.

Data Intersection
More than likely you have maintained your anonymity in any inappropriate internet postings by not releasing personal details such as your name, location, sex or birth date. Surely by doing this you should be alright? Unfortunately not; this is where data intersection comes into play. Data intersection, as Frankowski [4] describes, is where two or more sparse relation spaces (i.e. information from your blog, your bookmarks and your anonymous offensive postings) overlap.



Figure 8. A diagram showing the different data segments intersecting.

For example let’s say you have the following sets of information:

Blogger: You have made your first name and country available. You have posted articles stating how much you like reading Harry Potter [13] books.


Del.icio.us: You have links to Harry Potter fan sites and articles about snails.

Anonymous Posting: You have a fake name, location, sex and birth date for a forum about satanic rituals. In some of your posts you talk about your love of Harry Potter and rare snails (which happen to only be found at your location).

People are able to identify your anonymous postings because they can link your love of Harry Potter books, your interest in snails and your location. Realistically it would take many more links to personally identify you in this situation, as there are many, many Harry Potter fans; however, given the amount of information users expose, the potential links are plentiful.
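The linking step in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used by Frankowski [4]: the profile names, the topic sets and the `rank_candidates` helper are all assumptions made up for this sketch, and a simple count of shared topics stands in for whatever matching a real attacker would use.

```python
# Hypothetical data: the topics each public account mentions.
# All names and topics are illustrative assumptions based on the example above.
public_profiles = {
    "you": {"harry potter", "rare snails"},  # Blogger + Del.icio.us combined
    "another_blogger": {"harry potter", "football"},
    "a_third_blogger": {"cooking", "gardening"},
}

# Topics mentioned under the fake name on the anonymous forum.
anonymous_topics = {"harry potter", "rare snails", "rituals"}


def rank_candidates(profiles, anon_topics):
    """Rank public profiles by how many topics they share with the anonymous posts."""
    scored = [(name, len(topics & anon_topics)) for name, topics in profiles.items()]
    # Highest overlap first; profiles with equal scores keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, shared in rank_candidates(public_profiles, anonymous_topics):
        print(f"{name}: {shared} shared topic(s)")
```

With only two shared topics the match is weak, but every extra topic that appears in both places pushes one profile further ahead of the rest.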

Given the popularity and usefulness of social tools, it cannot be expected that you or others will simply stop using them altogether. Measures should, however, be taken to lessen the potential damage your exposure can cause now or in the future.

Data Availability
You should be aware of the popularity of any website you publish information to. The more popular it is, the more careful you should be about the information you post, as it will likely be the first thing people see when they search for anything related to you.

An effective technique I use is to simply assume that anything I post will be read by my family, friends and potential employers. This is why you will not find anything offensive or revealing in any of my accounts for Flickr, Blogger, Del.icio.us and so forth.

Data Retention
You should be aware before posting any information to any website that it may be available for anyone to read even if it is later deleted or has been inactive for a number of years. If you are unfortunate and accidentally release or publish something that Google has cached, there are some steps you can take to rectify the situation.

You first need to edit (not delete) the item and change it so that there are no search terms that will relate it to the original version. After several weeks or months Google will refresh its cache, so that there should no longer be any offending items in either the original or the cached version. I used this technique to remove the cached copy of the post I mentioned earlier, where I asked for the address of the LEGO website.

Data Intersection
It is unlikely you will be able to combat this issue; however, a few basic steps will make it more difficult for undesirables to identify you. Frankowski [4] identified those steps as follows:

  1. Never publish personally identifying information such as your real name, country of origin, birthday, email address, your school, friends, etc. This will drastically reduce the ability to narrow dataset intersection results to you personally.

  2. Mention a multitude of popular items in your articles. This will make it a lot more difficult to obtain enough links between datasets to identify you.
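Why the second step helps can be sketched with a rarity-weighted score, in which an item that many people mention says little about who you are while a rare item says a lot. The weighting below (the logarithm of the population divided by the number of people mentioning the item) is an assumption chosen for illustration and is not necessarily the measure used by Frankowski [4]; the counts and the function names are likewise hypothetical.

```python
import math


def item_weight(mention_count, population):
    """Rarer items weigh more; an item that everyone mentions weighs 0."""
    return math.log(population / mention_count)


def link_score(shared_items, mention_counts, population):
    """Sum the weights of the items two datasets have in common."""
    return sum(item_weight(mention_counts[item], population) for item in shared_items)


# Hypothetical numbers: how many of 1,000,000 users mention each item.
population = 1_000_000
mention_counts = {
    "harry potter": 200_000,
    "popular film": 500_000,
    "rare snails": 50,
}

if __name__ == "__main__":
    popular_only = link_score({"harry potter", "popular film"}, mention_counts, population)
    one_rare = link_score({"rare snails"}, mention_counts, population)
    print(f"two popular items: {popular_only:.2f}")
    print(f"one rare item:     {one_rare:.2f}")
```

Under these assumptions a single rare item links two datasets far more strongly than two popular items combined, which is why filling your posts with popular items makes you harder to single out.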

In today’s busy world it can often be difficult to fulfil our desire or need for reputation, prestige, recognition, attention, importance and appreciation. With the introduction of the internet and social tools such as Blogger, Flickr and Del.icio.us we now have new, easier methods to fulfil these needs; however, like all good things, they do not come risk free. Search engines like Google, though useful, make any information we share readily available to the undesirables of the world. Worse still, the disjointed bits of information we release can together expose details of our lives we would much rather keep private.

Do not release any personal information, and most certainly don’t publish information you would rather keep private. Failing this, the skeletons in your closet will surely be revealed.

[1]. Lenhart, A., & Madden, M. (2007). Teens, Privacy & Online Social Networks: How teens manage their online identities and personal information in the age of MySpace. PEW Internet & American Life Project.

[2]. Nardi, B. A., Schiano, D. J., & Gumbrecht, M. (2004). Blogging as Social Activity, or, Would You Let 900 Million People Read Your Diary? CSCW 2004.

[3]. Conti, G. (2006). Googling Considered Harmful. New Security Paradigms Workshop 2006.

[4]. Frankowski, D., Cosley, D., Sen, S., Terveen, L., & Riedl, J. (2006). You Are What You Say: Privacy Risks of Public Mentions. SIGIR 2006.

[5]. Ahern, S., Eckles, D., Good, N., King, S., Naaman, M., & Nair, R. (2007). Over Exposed? Privacy Patterns and Considerations in Online and Mobile Photo Sharing. CHI 2007.

[6]. (2007, June 27). Abraham Maslow's Hierarchy of Needs. Retrieved August 10, 2007, from Business Balls Web site: http://www.businessballs.com/maslow.htm

[7]. Wolfram, E. Score Higher in Google Search Engine. Retrieved August 10, 2007, from Wolfram Web site: http://wolfram.org/writing/howto/3.html

[8]. Walters, G. J. (2001). Privacy and Security: An Ethical Analysis. Human Rights in an Information Age: A Philosophical Analysis.

[9]. Blogger. Retrieved August 10, 2007, Web site: https://www.blogger.com/start

[10]. Flickr. Retrieved August 10, 2007, Web site: http://www.flickr.com/

[11]. Del.icio.us. Retrieved August 10, 2007, Web site: http://del.icio.us/

[12]. LEGO. Retrieved August 10, 2007, Web site: http://www.lego.com/en-US/default.aspx

[13]. Harry Potter. Retrieved August 10, 2007, Web site: http://www.jkrowling.com/

[14]. Google. Retrieved August 10, 2007, Web site: http://www.google.com.au/

12 comments:

Anonymous said...

I know the images are a little too large and are going over the side but if I make them any smaller nobody will be able to read them.

Nevertheless enjoy everyone ;-).

Sandra said...

excellent and informative article, I like all the examples you have given.
Well done. :)

Tom Ireland said...

Nice essay, lots of good references and a very interesting and relevant topic, well presented and backs your argument (liked the real world example of googleing(S?) your name)

Nice read, nice going.

Petra said...

I am a little like you, I found it extremely intrusive initially to put myself into a community site and the fear of rejection and fear of exposing too much of myself in a public space. This was all ok, but you would have to wonder that once you were using these spaces esp Facebook would that familiarity lead to giving additional information that later became an issue for yourself.

Unknown said...

Really enjoyed this article. I have asked myself "why" when using these tools as well, thinking that I had so much more control over my personal information in the "real world". Since I started working for a bank though I have realized that before computers fraud, identity theft, and devious criminal behavior all still existed. True it was a bit more effort for the average criminal but it was still there. The same rules apply now, know who you are giving info to and what it will be used for. Truth be told criminals or others who wish you some sort of harm will get what they need if they want it bad enough but we don't have to make it easy for them. Sometimes I think that some social tools are making it too easy for them. I guess we all just have to try and be discerning with our details, thoughts and opinions.

Ray said...

I had never thought of Maslow’s Triangle in terms of the internet and how that affects the way in which we live. In terms of privacy, although I am aware of the risks that are involved by putting personal information on the web I had not considered how this information can be tracked to the various accounts that I use. Overall I really enjoyed this article and found it insightful.

Eletar said...

Very insightful, and well written except for two things. When talking about your forum post in regards to LEGO, it sounds as though you're saying LEGO and their site don't exist, but you hopefully mean the website with the forum. And your sentence under data intersection is a little lacking in phrasing. But overall, very easy to read.

Mt Crosby Digital Stories group said...

really well written article... make some interesting points about privacy and appropriate postings of users...

DarrenE said...

This is a very interesting and compelling article with points well backed up.

I think this article shows how important it is for people to understand what can happen to information they put on the internet.

Tim said...

Swanny,

I remember you saying at the start of semester that you were a little annoyed that we were forced to sign up to all these social sites because you like to live a private life. I remember thinking at the time - what is he talking about? Surely it isn't all that bad... Although I was aware I was giving out information I was really of the opinion that "she's all good". That opinion sounds rather naive now that I have read through your article. I found it rather revealing that just by observing what I do online on a daily basis (reflecting on the Blogger / Flickr / Del.icio.us table) how much information can actually be gained from seemingly innocent acts. Thanks for the really great read - I also liked how you went to the effort of researching and referencing, good work.

Ellena Linden said...

Your article was very informative, and written very clearly with research and statements to support your ideas. I, too, like to keep to myself, and in relation to using the social tools, found it a little unbearable to keep adding more and more personal information about myself in the textboxes. But like your article states, most people are "unconcerned if outsiders know about their interests". People aren't aware how limited their privacy is. Those people should definitely read your article. :)

Unknown said...

Insightful article. Excellent integration of sources to support your argument. Background description of academic articles would've benefited from pulling out key findings/opinions rather than generally describing the article.

Reflection is hidden, tends to get lost in overall discussion of privacy issues. You stray into general discussion of privacy issues, but don't relate it back to the social software applications and the implications for using those.