The Problem with Next Generation Virtual Assistants


It may not seem like it, but there is quite an arms race going on when it comes to interactive AI and virtual assistants. Every tech company wants its offering to be more intuitive… more human. Yet although they’re improving, voice-activated assistants like Alexa and Siri are still pretty clunky, and often underwhelming in their interactions.

This obviously isn’t great for developers who want to see these assistants enter the workplace in a way that supercharges sales.

Want Artificial Intelligence that cares about people? Ethical thinking needs to start with the researchers

We’re delighted to feature a guest post from Grainne Faller and Louise Holden of the Magna Carta For Data initiative.

The project was established in 2014 by the Insight Centre for Data Analytics – one of the largest data research centres in Europe – as a statement of its commitment to ethical data research within its labs, and to the broader global movement to embed ethics in data science research and development.


A self-driving car is hurtling towards a group of people in the middle of a narrow bridge. Should it drive on, and hit the group? Or should it drive off the bridge, avoiding the group of people but almost certainly killing its passenger? Now, what if there are three people on the bridge but five people in the car? Can you – should you – design algorithms that will change the way the car reacts depending on these situations?

This is just one of millions of ethical issues faced by researchers of artificial intelligence and big data every hour of every day around the world.

6 Tech Terms Every Adult Should Learn About To Avoid Being Left Behind


Not for the first time, Apple CEO Tim Cook has spoken out this week about how important it is for children to learn computer code. He’s not alone in believing that this “language of the future” will be critical for kids growing up right now. In a sea of unknowns, one thing appears to be certain: technical understanding is a very valuable asset indeed.

It’s interesting, then, that in spite of remarkable efforts to equip the adults of tomorrow with such skills, very little is being done to familiarize young adults, middle-aged parents, or retirees (with impressively long life expectancies!) with the signature terms of the “AI Age”. This seems like an oversight.

Bots may be determining all our futures


We’ve all seen the stories and allegations of Russian bots manipulating the Trump-Clinton US election and, most recently, the FCC debate on net neutrality. Yet far from such high stakes arenas, there’s good reason to believe these automated pests are also contaminating data used by firms and governments to understand who we (the humans) are, as well as what we like and need with regard to a broad range of things…

10 real-world ethical concerns for virtual reality


There are lots of emerging ideas about how virtual reality (VR) can be used for the betterment of society – whether it be inspiring social change, or training surgeons for delicate medical procedures.

Nevertheless, as with all new technologies, we should also be alive to any potential ethical concerns that could emerge as social problems further down the line. Here I list just a few issues that should undoubtedly be considered before we brazenly forge ahead in optimism.

1.   Vulnerability

When we think of virtual reality, we automatically conjure images of clunky headsets covering the eyes – and often the ears – of users in order to create a fully immersive experience. There are also VR gloves, and a growing range of other accessories and attachments. Though the resultant feel might be hyper-realistic, we should also be concerned for people using this equipment in the home – especially alone. Having limited access to sense data leaves users vulnerable to accidents, home invasions, and any other misfortunes that can come of being totally distracted.

2.   Social isolation

There’s a lot of debate around whether VR is socially isolating. On the one hand, the whole experience takes place within a single user’s field of vision, which obviously excludes others from physically participating alongside them. On the other hand, developers like Facebook have been busy inventing communal meeting places like Spaces, which help VR users meet and interact in a virtual social environment. Though – as argued – the latter could be helpfully utilized by the introverted and lonely (e.g. seniors), there’s also a danger that it could become the lazy and dismissive way of dealing with these issues. At the other end of the spectrum, forums like Spaces may also end up “detaching” users by leading them to neglect their real-world social connections. Whatever the case, studies show that real face-to-face interactions are a very important factor in maintaining good mental health. Substituting them with VR would be ill-advised.

3.   Desensitization

It is a well-acknowledged danger that being thoroughly and regularly immersed in a virtual reality environment may lead some users to become desensitized in the real world – particularly if the VR is one in which the user experiences or perpetrates extreme levels of violence. Desensitization means that the user may be unaffected (or less affected) by acts of violence, and could fail to show empathy as a result. Some say that this symptom is already reported amongst gamers who choose to play first-person shooters or role-playing games with a high degree of immersion.

4.   Overestimation of abilities

Akin to desensitization is the problem of users overestimating their ability to perform virtual feats just as well in the real world. This is especially applicable to children and young people who could take it that their expertise in tightrope walking, parkour, or car driving will transfer seamlessly to non-virtual environments…

5.   Psychiatric risks

There could also be more profound and dangerous psychological effects on some users (although clearly there are currently a lot of unknowns). Experts in neuroscience and the human mind have spoken of “depersonalization”, which can result in a user believing their own body is an avatar. There is also a pertinent worry that VR might be swift to expose psychiatric vulnerabilities in some users, and spark psychotic episodes. Needless to say, we must identify the psychological risks and symptoms ahead of market saturation, if that is an inevitability.

6.   Unpalatable fantasies

If there’s any industry getting excited about virtual reality, it’s the porn industry (predicted to be the third largest VR sector by 2025, after gaming and NFL-related content). The website Pornhub is already reporting that views of VR content are up 225% since it debuted in 2016. This obviously isn’t an ethical problem in and of itself, but it does become problematic if/when “unpalatable” fantasies become immersive. We have to ask: should there be limitations on uber-realistic representations of aggressive, borderline-pedophilic, or other more perverse types of VR erotica? Or, outside the realm of porn, to what extent is it okay to make a game out of the events of 9/11, as is the case with the 08.46 simulator?

7.   Torture/virtual criminality

There’s been some suggestion that VR headsets could be employed by the military as a kind of “ethical” alternative to regular interrogatory torture. Whether this is truth or rumor, it nevertheless establishes a critical need to understand the status of pain, damage, violence, and trauma inflicted by other users in a virtual environment – be it physical or psychological. At what point does virtual behavior constitute a real-world criminal act?

8.   Manipulation

Attempts at corporate manipulation via flashy advertising tricks are not new, but up until now they’ve been 2-dimensional. As such, they’ve had to work hard to compete with our distracted focus: phones ringing, babies crying, traffic, conversations, music, noisy neighbors, interesting reads, and all the rest. With VR, commercial advertisers essentially have access to our entire surrounding environment (which some hold has the power to control our behavior). This ramps up revenue for developers, who now have (literally) whole new worlds of blank space upon which they can sell advertising. Commentators are already warning that this could lead to new and clever tactics involving product placement, brand integration and subliminal advertising.

9.   Appropriate roaming and recreation

One of the most exciting selling points of VR is that it can let us roam the earth from the comfort of our own homes. This is obviously a laudable, liberating experience for those who are unable to travel. As with augmented reality, however, we probably need to have conversations about where it is appropriate to roam and/or recreate as a virtual experience. Is it fine for me to wander through a recreation of my favorite celebrity’s apartment (I can imagine many fans would adore the idea!)? Or peep through windows of homes and businesses in any given city street? The answers to some of these questions may seem obvious to us, but we cannot assume that the ethical parameters of this capability are clear to all who may use or develop it.

10.   Privacy and data

Last, but not least, the more we “merge” into a virtual world, the more of ourselves we are likely to give away. This might mean more and greater privacy worries. German researchers have raised the concern that if our online avatars mirror our real-world movements and gestures, these “motor intentions” and the “kinetic fingerprints” of our unique movement signatures can be tracked, read, and exploited by predatory entities. Again, it’s clear that there needs to be an open and consultative dialogue with regard to what is collectable, and what should be off-limits in terms of our virtual activities.
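To see how little it might take, here is a toy sketch of that worry – invented motion data and a plain correlation measure, not the researchers’ actual method – in which an “anonymous” VR session is re-identified from its movement trace alone:

```python
# Toy "kinetic fingerprint" demo: re-identifying a user from motion traces
# (invented random-walk data, not the cited research).
import numpy as np

rng = np.random.default_rng(7)
# Each enrolled user has a characteristic head-motion trace (a time series)
signatures = {f"user{i}": rng.normal(size=200).cumsum() for i in range(5)}

# A new "anonymous" session: user3's signature plus sensor noise
session = signatures["user3"] + rng.normal(scale=0.5, size=200)

def similarity(a, b):
    """Pearson correlation between two motion traces."""
    return np.corrcoef(a, b)[0, 1]

# Pick the enrolled user whose trace best matches the session
best = max(signatures, key=lambda u: similarity(signatures[u], session))
print(best)   # almost certainly 'user3' - re-identified
```

Five users and plain correlation is obviously a caricature, but the principle stands: movement is a biometric, and biometrics de-anonymize.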

This list is not exhaustive, and some of these concerns will be proven groundless in good time. Regardless, as non-technicians and future users, we are right to demand full and clear explanations as to how these tripwires will be averted or mitigated by VR companies.

Five concerns about government biometric databases and facial recognition


Last Thursday, the Australian government announced its existing “Face Verification Service” would be expanded to include personal images from every Australian driver’s license and photo ID, as well as from every passport and visa. This database will then be used to train facial recognition technology so that law enforcers can identify people within seconds, wherever they may be – on the street, in shopping malls, car parks, train stations, airports, schools, and just about anywhere that surveillance cameras pop up…

Deep learning techniques will allow the algorithm to adapt to new information, meaning that it will have the ability to identify a face obscured by bad lighting or bad angles…and even one that has aged over several years.
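For a rough sense of the mechanics: systems like this typically reduce each face image to a numerical “embedding” and then compare distances, so that lighting, angle, and age change the vector less than identity does. The sketch below is purely illustrative – random vectors stand in for a real network’s output, and the 0.6 threshold is an arbitrary assumption:

```python
# Minimal sketch of embedding-based face matching (illustrative only).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_person(probe: np.ndarray, enrolled: np.ndarray,
                   threshold: float = 0.6) -> bool:
    """Declare a match if the embeddings are close enough; the threshold
    trades false matches against missed matches."""
    return cosine_similarity(probe, enrolled) >= threshold

rng = np.random.default_rng(0)
enrolled = rng.normal(size=128)                      # vector from an ID photo
probe = enrolled + rng.normal(scale=0.3, size=128)   # same face, worse conditions
stranger = rng.normal(size=128)                      # a different face entirely

print(is_same_person(probe, enrolled))     # True: the perturbation survives
print(is_same_person(stranger, enrolled))  # False: unrelated vectors diverge
```

Note that the threshold is a policy decision as much as a technical one: loosen it and more obscured faces are caught, but more innocent lookalikes are flagged along with them.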

This level of penetrative surveillance is obviously unprecedented, and is being heavily criticized by the country’s civil rights activists and law professors who say that Australia’s “patchwork” privacy laws have allowed successive governments to erode citizens’ rights. Nevertheless, politicians argue that personal information abounds on the internet regardless, and that it is more important that measures are taken to deter and ensnare potential terrorists.

However worthy the objective, it is obviously important to challenge such measures by trying to understand their immediate and long-term implications. Here are five glaring concerns that governments mounting similar initiatives should undoubtedly address:

  1. Hacking and security breaches

The more comprehensive a database of information is, the more attractive it becomes to hackers. No doubt the Australian government will hire top security experts as part of this project, but the methods of those intent on breaching security parameters are forever evolving, and it is no joke trying to mount a defense. Back in 2014, a breach at the US Office of Personnel Management (OPM) – attributed to Chinese hackers, and one of the biggest in history – compromised the personal information of 22 million current and former employees. Then-FBI Director James Comey said that the information included “every place I’ve ever lived since I was 18, every foreign travel I’ve ever taken, all of my family, their addresses.”

  2. Ineffective unless coverage is total

Using surveillance, citizen data and/or national ID cards to track and monitor people in the hopes of preventing terrorist attacks (the stated intention of the Aussie government) really requires total coverage, i.e. monitoring everyone all of the time. We know this because many states with mass (but not total) surveillance programs – like the US – have still suffered attacks, like the Boston Marathon bombing. Security experts are clear that targeted rather than broad surveillance is generally the best way to find those planning an attack, as most subjects are already on the radar of intelligence services. Perhaps Australia’s new approach aspires to some ideal notion of total coverage, but if it isn’t successful at achieving this, there’s a chance that malicious parties could evade detection by a scheme that focuses its attentions on registered citizens.

  3. Chilling effect

Following that last thought through: in the eyes of some, there is a substantial harm inflicted by this biometrically-based surveillance project – it treats all citizens and visitors as potential suspects. This may seem like a rather intangible consequence, but that isn’t necessarily the case. Implementing a facial recognition scheme could, in fact, have a substantial chilling effect. This means that law-abiding citizens may be discouraged from participating in legitimate public acts – for example, protesting the current government administration – for fear of legal repercussions down the line. Indeed, there are countless things we may hesitate to do if we have new concerns about instant identifiability…

  4. Mission creep

Though current governments may give their reassurances about the respectful and considered use of this data, who is to say what future administrations may wish to use it for? Might their mission creep beyond national security, and deteriorate to the point at which law enforcement uses facial recognition at will to detain and prosecute individuals for very minor offenses? Might our “personal file” be updated with our known movements so that intelligence services have a comprehensive history of where we’ve been and when? Additionally, might the images used to train and update algorithms start to come from non-official sources like personal social media accounts and other platforms? Undoubtedly, it is already easy to build up a comprehensive file on an individual using publicly available data, but many would argue that governments should require a rationale – or even permission – for doing so.

  5. False positives

As all data scientists know, algorithms working with massive datasets are likely to produce false positives, i.e. such a system as proposed may implicate perfectly innocent people for crimes they didn’t commit. This has also been identified as a problem with DNA databases. The sheer number of comparisons that have to be run when, for instance, a new threat is identified, dramatically raises the possibility that some of the identifications will be in error. These odds increase if, in the cases of both DNA and facial recognition, two individuals are related. As rights campaigners point out, not only is this potentially harrowing for the individuals concerned, it also presents a harmful distraction for law enforcement and security services who might prioritize seemingly “infallible” technological insight over other useful but contradictory leads.
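The arithmetic is sobering even for a very accurate matcher. All the figures below are hypothetical – a database of 20 million images and a 0.1% false match rate – but they show how quickly one-to-many searches pile up errors:

```python
# Back-of-the-envelope false positives (all figures hypothetical)
database_size = 20_000_000   # enrolled face images
false_match_rate = 0.001     # chance any single comparison wrongly matches

# One probe image searched against the whole database:
expected_false_matches = database_size * false_match_rate
print(f"Expected false matches per search: {expected_false_matches:,.0f}")
# -> 20,000 innocent candidates returned for every single search
```

Even if the true match is in there somewhere, it is buried under thousands of lookalikes – which is precisely the distraction problem described above.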

Though apparently most Australians “don’t care” about the launch of this new scheme, it is morally dangerous for governments to take general apathy as a green light for action. Not caring can be a “stand-in” for all sorts of things, and of course most people are busy leading their lives. Where individual citizens may not be concerned to thrash out the real implications of an initiative, politicians and their advisors have an absolute responsibility to do so – even where the reasoning they offer is of little-to-no interest to the general population.

Online dating’s hints of Stoicism


Yesterday, I examined why some believe that data and the internet are conspiring to limit both our attention, and the fields of our knowledge/interest. Today I’m presenting something entirely different, namely the results of a forthcoming report which demonstrate how the phenomenon of online dating is actively altering the fabric of society by expanding our worlds.

An overview of the paper is available here, but in a nutshell, researchers from the University of Essex and the University of Vienna have been studying the social connections between us all, and have revealed how so many of us meeting (and mating with!) complete strangers through online dating is having the effect of broadening out our whole society.

Economists Josue Ortega and Philipp Hergovich argue that, whereas just a couple of decades ago most new people arriving in our social circle (e.g. a new partner) were just a couple of connections away from us to begin with (someone you meet through existing friends, or who lives in your local community), our digital “matchings” with random folk from the internet now mean that, for many of us, our social reach extends much further than it ever would have done – into completely separate communities.

Looking at the bigger picture, this means that our little clusters of friends/family/neighbors no longer exist in relative isolation because, “as far as networks go, this [dating strangers] is like building new highways between towns…just a few random new paths between different node villages can completely change how the network functions.” This bridging between communities is perhaps most vivid when considering the growing numbers of interracial couples. Indeed, the report’s authors claim that their model predicts almost complete racial integration after the emergence of online dating.
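The “new highways” effect is easy to reproduce in a toy simulation. The sketch below is my own illustration using the networkx library, not Ortega and Hergovich’s actual model: thirty tight friend-groups joined in a ring, to which we add a handful of random long-range ties, as if strangers had met online.

```python
# Toy small-world demo: a few random long-range ties between communities
# sharply shorten paths across the whole network.
import random
import networkx as nx

random.seed(1)
# 30 tight communities of 10 people, joined in a ring (a friends-of-friends world)
G = nx.connected_caveman_graph(30, 10)
print(f"Before: average path length = {nx.average_shortest_path_length(G):.2f}")

# Add just 15 random long-range ties, standing in for online-dating matches
nodes = list(G.nodes)
for _ in range(15):
    u, v = random.sample(nodes, 2)
    G.add_edge(u, v)
print(f"After:  average path length = {nx.average_shortest_path_length(G):.2f}")
```

Running it, the average number of hops between any two people drops dramatically – the whole network gets “smaller” even though almost nobody’s local circle has changed.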

This put me in mind of the concentric circles of Stoic philosophy (further popularized by the modern philosopher Professor Martha Nussbaum). This simple image has existed for centuries and has been described by Nussbaum as a “reminder of the interdependence of all human beings and communities.” It is supposed to encapsulate some of the ancient ideas of belonging and cosmopolitanism, and is similar to the expanding circles of moral concern explained by the philosopher Peter Singer:

[Diagram: Hierocles’ concentric circles]

As its inventor, Hierocles, imagined it, the outermost circles should be pulled in as strangers are treated as friends, and friends as relatives. This happens as we increase our own efforts to recognize the habits, cultures, aims and aspirations of others and consider them akin to – and even constitutive of – our own.

In many respects, the evolution of the internet (as well as other media) has built upon the foundations of global travel to help us realize Hierocles’ rudimentary diagram. Though we still have strong ideas about personal, familial and community identity, the broadening out of our non-virtual social network – as exemplified by this work on online dating – means that our connections and concerns are not limited to the smaller, inner circles any longer. We increasingly draw those from the furthermost circles inward. As Singer argues, this must also mean that our ethical/moral concern emanates outward beyond our immediate vicinities.

Yet, not only can the internet (and in this case, data matching) bring those outer circles in, but in some ways it also seems to enable the distribution of “the self” and – more pertinently – a community…

I remember back in 2012, when I was working in PR and public affairs, there was a lot of talk about current “trends”. One that has stuck with me was nicknamed something like “patchwork people”. It referred, I think rather observantly, to the notion that so many of us feel better defined by the virtual/global communities we inhabit (perhaps communities based around hobbies or research or careers or fandom) than by our immediate physical communities, within which we might rarely interact.

Whether the internet is allowing us to draw others into our understanding of the world, or whether we feel that our understanding of the world is mainly constituted by connections to others outside of the “natural” inner circles, there seems to be no doubt that the natural order of priority is evolving, and it will be fascinating to see how and if it continues to progress.

The pros and cons of “big data” lending decisions


Just as borrowing options are no longer limited to the traditional bank, new types of lenders are increasingly diverging from the trusted credit score system in order to flesh out their customer profiles and assess risk in new ways. This means going beyond credit- and payment-relevant data and looking at additional factors that could include educational merits and certifications, employment history, which websites you visit, your location, messaging habits, and even when you go to sleep.

Undoubtedly, this is the sort of thing that strikes panic into the hearts of many of us. How much is a creditworthy amount of sleep? Which websites should I avoid? Will they hold the fact I flunked a math class against me? Nevertheless, proponents of “big data” (it’s really just data…) risk assessment claim that this approach works in favor of those who might be suffering from the effects of a low credit score.

Let’s take a look…

Pros

The fact is, credit scores don’t work for everyone and they can be difficult to improve depending upon your position. Some folks, through no fault of their own, end up getting the raw end of the deal (perhaps they’re young, a migrant, or they’ve just had a few knockbacks in life). Now, given that these newer models can take extra factors into account – including how long you spend reading contracts, considering application questions, and looking at pricing options – this additional information can add a further dimension to an application, which in turn may prompt a positive lending decision.

A recent article looked at the approach of Avant, a Chicago-based start-up lender, which uses data analytics and machine learning to “streamline borrowing for applicants whose credit scores fall below the acceptable threshold of traditional banks”. They do this by crunching an enormous 10,000+ data points to evaluate applicants. There isn’t much detail in terms of what these data points are, but doubtless they will draw upon the reams of publicly available information generated by our online and offline “emissions” – webpages browsed, where we shop, our various providers, social media profiles, friend groups, the cars we drive, our zip codes, and so on. This allows the lender to spot patterns not “visible” to older systems – for example, where a potential customer has similar habits to those with high credit scores, but has a FICO score of 650 or below.
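To make the idea concrete, here is a heavily simplified sketch of what this style of model looks like. To be clear, this is not Avant’s system: the features, figures and synthetic data below are invented stand-ins for the kinds of behavioral signals described above.

```python
# Illustrative only: a toy "alternative data" risk model on invented features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 5_000
X = np.column_stack([
    rng.normal(300, 120, n),   # seconds spent reading the loan contract
    rng.integers(1, 20, n),    # pricing pages viewed before applying
    rng.normal(650, 80, n),    # FICO score (possibly below bank thresholds)
])
# Synthetic ground truth: careful, deliberate applicants default less,
# regardless of their FICO score
p_default = 1 / (1 + np.exp(0.01 * (X[:, 0] - 300) + 0.15 * (X[:, 1] - 10)))
y = rng.random(n) < p_default

model = GradientBoostingClassifier().fit(X, y)
applicant = [[600, 15, 640]]   # a slow, careful reader with a sub-650 FICO
print(f"Predicted default risk: {model.predict_proba(applicant)[0, 1]:.1%}")
```

The point of the toy is that a model like this can rate the careful sub-650 applicant as a good risk – exactly the pattern a hard FICO cut-off would miss.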

The outcome – if all goes well – is that people are judged on factors beyond their credit habits, and for some individuals this will open up lending opportunities where they had previously received flat “nos”. Great news!

This technology is being made available to banks, or anyone who wants to lend. It may even eventually outmode credit scores, which were an attempt to model creditworthiness in a way that avoided discrimination and the unreliability of a bank manager’s intuition…

So, what are the downsides?

Cons

There are a number of valid concerns about this approach. The first regards what data these lenders are taking, and what they are taking it to mean. No algorithm, however fancy, can use data points to understand all the complexities of the world. Nor can it know exactly who each applicant is as an individual. Where I went to school, where I worked, whether I’ve done time, how many children I have, what zip code I live in – these are all being used as mere proxies for certain behaviors I may or may not exhibit. In this case they are being used as proxies for whether or not I am a credit risk.

Why is this an issue? Well, critics of this kind of e-scoring, like Cathy O’Neil, author of Weapons of Math Destruction, argue that this marks a regression back to the days of the high street bank manager. In other words, instead of being evaluated as an individual (as with a FICO score, which predominantly looks at your personal debt and bill-paying records), you are being lumped into a bucket with “people like you”, before it is decided whether such people can be trusted to pay money back.

As O’Neil eloquently points out, the question becomes less about how you have behaved in the past, and more about how people like you have behaved in the past. Though proxies can be very reliable (after all, those who live in rich areas are likely to be less of a credit risk than those who live in poor neighborhoods), the trouble with this system is that when someone is unfairly rejected based on a series of extraneous factors, there is no feedback loop to help the model self-correct. Unlike FICO, you can’t redeem yourself and improve your score. So long as the model performs to its specification and helps the lender turn a profit, it doesn’t come to know or care about the individuals who are mistakenly rejected along the way.

There is an important secondary problem with leveraging various data sources to make predictions about the future. There is no way of knowing in every case how this data was collected. By this I mean to say, there is no way of knowing whether the data itself is already infused with bias, which consequently biases the predictions of the model. Much has been made of this issue within the domain of predictive policing, whereby a neighborhood which has been overzealously policed in the past is likely to have a high number of arrest records, which tells an unthinking algorithm to over-police it in the future, and so the cycle repeats… If poor data is being used to make lending decisions, this could have the after-effect of entrenching poverty, propagating discrimination, and actively working against certain populations.
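A trivial simulation makes the point. Every number below is invented: two neighborhoods with identical true crime rates, one of which starts out over-policed. Because next year’s patrols are allocated according to last year’s arrests, the initial disparity simply reproduces itself forever:

```python
# Toy feedback loop: equal underlying crime, unequal historical policing.
true_crime_rate = [0.05, 0.05]   # identical in neighborhoods A and B
patrols = [80, 20]               # but neighborhood A starts over-policed

for year in range(1, 6):
    # Arrests scale with patrol presence, not with the (equal) crime rate
    arrests = [p * r for p, r in zip(patrols, true_crime_rate)]
    # Next year's 100 patrols are allocated in proportion to past arrests
    total = sum(arrests)
    patrols = [round(100 * a / total) for a in arrests]
    print(f"Year {year}: patrols = {patrols}")   # stays [80, 20] every year
```

The model never discovers its mistake, because it only collects data where it is already looking. The lending analogue is the rejected applicant who never gets the chance to prove the model wrong.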

Lastly (and I’m not pretending these lists of pros and cons are exhaustive), there is a problem when it comes to the so-called “chilling effect”. If I do not know how I am being surveilled and graded, this might lead me to behave in unusual and overcautious ways. You can interrogate your FICO report if you want to, but these newer scoring systems use a multitude of other unknown sources to understand you. If you continue to get rejected, this might result in you changing certain aspects of your lifestyle to win favor. Might this culminate in people moving to different zip codes? Avoiding certain – perfectly benign – websites? Etcetera, etcetera. This could lead to the unhealthy manipulation of people desperate for funds…

So, is this new way of calculating lending risk a step forward or a relapse into the bad practices of the past? Well, having worked for the banking sector in years gone by, one thing still sticks in my mind when discussions turn to lending obstructions: lenders want to lend. It’s a fairly important part of their business model when it comes to making a profit (!). At face value, these newer disrupters are trying to use big data analytics to do exactly that. In a market dominated by the banks, they’re using new and dynamic ways to seek out fresh prospects who have been overlooked by the traditional model. It makes sense for everyone.

However, there is clearly the need for a cautionary note. Although this method is undoubtedly praiseworthy (and canny!), we should also remember that such tactics can breed discrimination regardless of intentions. This means that there needs to be some kind of built-in corrective feedback loop which detects mistakes and poorly reasoned rejections. Otherwise, we still have a system that continually lends to the “same type of people”, even if it broadens out who those people might be. The bank manager returns.

Having a fair and corrigible process also means that lenders need to be more open about the data metrics they are using. The world – and particularly this sector – has been on a steady track towards more transparency, not less. This is difficult for multiple reasons (which warrant another discussion entirely!) but as important as it is to protect commercial sensitivity and prevent tactics like system gaming, it is also critical that applicants have some idea of what reasonable steps they can take to improve their creditworthiness if there are factors at play beyond their credit activity.

What if Twitter could help predict a death?

I want to use this blog to look at how data and emerging technologies affect us – or more precisely YOU. As a tech ethics researcher, I’m perpetually reading articles and reports that detail the multitude of ways in which data can be used to anticipate bad societal outcomes: criminality, abuse, corruption, disease, mental illness, and so on. Some of these get oxygen, some of them don’t. Some of them have integrity, some don’t. Often these tests, analyses, and studies identify problems that gesture toward ethically “interesting” solutions.

Just today this article caught my attention. It details a Canadian study that tries to get to grips with an endemic problem: suicide in young people. Just north of the border, suicide accounts for no less than 24% of deaths amongst those aged between 15 and 24 (Canadian Mental Health Association). Clearly, this is not a trivial issue.

In response, a group of researchers have tried to determine the signs of self-harm and suicide by studying the social media posts of those in the most vulnerable age bracket. The team – from SAS Canada – have even speculated that “these new sources could provide early indication of possible trends to guide more formal surveillance activities.” So, with the prospect of officialdom being dangled before us, it’s important to ask how this social media analysis works. In short, might any one of us end up being surveilled as a suicide risk if we happen to make a trigger comment or two on Twitter?

Well, the answer seems to be “possibly”. This work harvested 2.3 million tweets, of which 1.1 million were identified as “likely to have been authored by 13 to 17-year-olds in Canada”. This determination was made by a machine learning model that has been trained to predict age by relying on the way young people use language. So, if the algorithm thinks you tweet like a teenager, you’re potentially on the hook. From there, the team looked for where these tweets related to depression and suicide, and “picked some specific buzzwords and created topics around them, and our software mined those tweets to collect the people.”
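We aren’t told how the SAS model works internally, but a bare-bones version of language-based age prediction might look something like the sketch below – a toy classifier trained on a handful of invented example tweets, purely for illustration:

```python
# Toy age-bracket classifier from tweet language (illustrative only;
# the SAS Canada model and its training data are not public).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set standing in for millions of labeled tweets
tweets = [
    "omg math test tmrw im so dead lol", "no cap that homework was brutal",
    "school assembly was lowkey fun", "cant wait for prom this year",
    "quarterly earnings call ran long again", "my mortgage rate just went up",
    "the commute this morning was awful", "team offsite planning all week",
]
labels = ["13-17"] * 4 + ["adult"] * 4

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["ugh pop quiz tmrw lol"]))   # likely ['13-17']
```

Everything downstream – the buzzword matching, the “collection” of people – inherits whatever mistakes this first age-guessing step makes.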

Putting aside the undoubtedly harrowing idea of people collection, it’s important to highlight the usefulness of this survey. The data scientists involved insist that the data they’ve collected can help them narrow down the Canadian regions which have a problem (although one might contest that the suicide statistics themselves should reveal this), and/or identify a particular school or a time of year in which the tell-tale signs are more widespread or stronger. This in turn can help better target campaigns and resources, which – of course – is laudable, particularly if it is an improvement on existing suicide statistics. It only starts to get ethically icky once we consider what further steps might be taken.

The technicians on the project speculate as to how this data might be used in the future. Remember, we are not dealing with anonymized surveys here, but real teen voices “out in the wild”: “He (data expert Jos Polfliet) envisions the solution being used to find not only at-risk teens, but others too, like first responders and veterans who may be considering suicide.”

Eh? Find them? Does that mean it might be used to actually locate real people based on what they’ve tweeted on their personal time? As with many well-meaning data projects, everything suddenly begins to feel a little Minority Report at this point. Although this study is quite obviously well-intentioned, we are fooling ourselves if we don’t acknowledge the levels of imprecision we’re dealing with here.

Firstly, without revealing the actual identities of every account holder picked out by the machine learning, we have no way of knowing the levels of accuracy these researchers have hit upon when it comes to monitoring 13 to 17-year-olds. Although the use of certain language and terminologies might be a good proxy for the age of the user, it certainly isn’t an infallible one in the wacky world of the internet.

Secondly, the same is true of suicide and depression-related buzzwords. Using a word or phrase typically associated with teen suicide is not a sufficient condition for a propensity towards suicide (indeed, it is unlikely to even be a necessary condition). As Seth Stephens-Davidowitz discussed in his new book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, research found that in 2014 there were 6,000 Google searches for the exact phrase “how to kill your girlfriend”, and yet there were “only” 400 murders of girlfriends. In other words, not everyone who vents on the internet is in earnest, and many who are earnest in their intentions may not surface on the internet at all. So, in short, we don’t know exactly what we’ve got when we look at these tweets.
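Put as arithmetic, the point is stark. Taking those two figures at face value:

```python
# The base-rate arithmetic behind Stephens-Davidowitz's example
searches = 6_000   # "how to kill your girlfriend" searches in 2014
murders = 400      # actual murders of girlfriends that year

# Even if every murderer had searched first (they hadn't), the best possible
# precision of "they searched it, so they'll do it" is:
print(f"Upper bound on precision: {murders / searches:.1%}")   # 6.7%
```

On these face-value figures, a flag based on such language would be wrong at least 93% of the time – and the true error rate for tweets is surely far higher.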

Lastly, without having read the full methodology, it appears that these suicide buzzwords were hand-picked by the team. In other words, they were selected by human beings, presumably based on what sorts of things they deemed suicidal teens might tweet. Fair enough, but not particularly scientific. In fact, this sort of process can be riddled with guesswork and human bias. How could you possibly know with any certainty, even if instructed by a physician or psychiatrist, exactly which kinds of words or phrases denote true intention and which denote teenage angst?

Hang on a second – you might protest – couldn’t these buzzwords have been chosen by a very clever, objective algorithm? Yet, even if a clever algorithm could somehow ascertain the difference between an “I hate my life” tweeted by a genuinely suicidal teen and an “I hate my life” tweeted by a tired and hormonal teenager (perhaps based on whatever language it was couched in), to make this call it would have to have been trained on the tweets of teens who had either (a) taken their own lives or (b) been diagnosed with or treated for depression. To harvest such tweets, the data would have to rely upon more than Twitter alone… all information would have to be cross-referenced with other databases (like medical records) in ways that would undoubtedly de-anonymize.

So, with no guarantees of accuracy, the prospect of physical intervention by social services or similar feels like a scary one – as is the idea of ending up on a watchlist because of a bad day at school. Particularly when we don’t know how this data would be propagated forward…

Critically, I am not trying to say that the project isn’t useful, and SAS Canada are forthcoming in their acknowledgment that ethical conversations need to take place. Nevertheless, this feels like the usual ethical caveat which acts as a disclaimer on work that has already taken place and – one might reasonably assume – is already informing actions, policies, and future projects.

Some of the correlations this work has unveiled clearly have some value, for example, there is a 39% overlap between conversations about suicide and conversations about bullying. This is a broad trend and a helpful addition to an important narrative. Where it becomes unhelpful, however, is when it enables and/or legitimizes the external surveillance of all bullying-related conversations on social media and – to carry that thought forward – some kind of ominous, state sanctioned “follow-up” for selected individuals…