Mapping the University’s Social Media Footprint – A Practice-Led Project

Between January and April 2016 I undertook a study of the use of Twitter for public engagement among University of Nottingham staff.  The project was run under the auspices of CaSMa (Citizen-Centric Approaches to Social Media Analysis), a research team at the Horizon Digital Economy Research Institute that explores methods of social media research that respect users' rights to privacy and to ownership of their personal data.  Consenting study participants provided data from their Twitter feeds by exporting it from a web tool designed primarily to let users monitor and manage their Twitter activity.  Using the graph visualisation software Gephi, I created an image of the network of interactions formed by the tweeting, retweeting, quoting, mentioning, favouriting and following events in the data, and analysed hashtag propagation to look for signs of successful public engagement.


It was a challenging project.  Designing the data collection in line with CaSMa’s citizen-centric ethos required meeting each participant in person (and consequently much to-ing and fro-ing between the University of Nottingham’s various campuses and partner organisations), talking them through the web tool and the data collection process, and obtaining their written consent to analyse their data.  Once the collection procedure was in place, I had to work out what to do with the data: it was delivered to me as JSON-format text files, and rendering complete data sets required a good deal of sorting and parsing of the data structures.  The text files presented a series of events: tweets, retweets, favouriting, following, etc.  I needed to find, for example, where in the hierarchical text structure the ID of the initiator of a particular event was stored: the person who had written a tweet, or liked a tweet, or retweeted a tweet.  In the latter two cases I also wanted to know the ID of the person whose tweet had been liked or retweeted.  This information was not nested in exactly the same place in every event type, and a considerable amount of time was spent establishing the necessary paths within each event type.  Once these were established, I used Python to retrieve the data and compile it into uniform data sets.  I had not previously done any programming, and getting to grips with the language was a real learning experience.
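To give a flavour of the path-finding problem, here is a minimal sketch in Python.  The field names and the per-event paths below are hypothetical, invented for the example rather than taken from the actual export files; the point is only that each event type gets its own lookup path, and that every event comes out as a record with the same keys.

```python
import json

# Hypothetical paths to the initiator ("primary") and, where relevant, the
# affected user ("secondary") for each event type. These are assumptions for
# illustration; the real export nested the IDs differently.
PATHS = {
    "tweet":    {"primary": ["user", "id_str"], "secondary": None},
    "retweet":  {"primary": ["user", "id_str"],
                 "secondary": ["retweeted_status", "user", "id_str"]},
    "favorite": {"primary": ["source", "id_str"],
                 "secondary": ["target_object", "user", "id_str"]},
    "follow":   {"primary": ["source", "id_str"],
                 "secondary": ["target", "id_str"]},
}


def dig(record, path):
    """Follow a list of keys into nested dicts; return None if a key is missing."""
    for key in path:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def normalise(event):
    """Turn one raw event into a flat record with the same keys for every type."""
    kind = event.get("event")
    paths = PATHS.get(kind)
    if paths is None:
        return None  # unknown event type: skip rather than guess
    secondary_path = paths["secondary"]
    return {
        "type": kind,
        "primary": dig(event, paths["primary"]),
        "secondary": dig(event, secondary_path) if secondary_path else None,
    }


# Invented sample data in the hypothetical layout described above.
raw = """[
  {"event": "tweet", "user": {"id_str": "1"}},
  {"event": "retweet", "user": {"id_str": "2"},
   "retweeted_status": {"user": {"id_str": "1"}}},
  {"event": "favorite", "source": {"id_str": "3"},
   "target_object": {"user": {"id_str": "1"}}}
]"""

records = [r for r in (normalise(e) for e in json.loads(raw)) if r is not None]
for record in records:
    print(record)
```

Keeping the paths in one table means that a newly discovered event type, or a corrected path, is a one-line change rather than another branch of code.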

The code compiled primary user, secondary user, and mentioned user data, and with this I created a network graph visualisation.  The final product looked like this:

[Image: final network graph]

The layout is determined by a mathematical algorithm, and the colours are the result of a modularity analysis carried out by the software to identify discrete communities based on interactions.  Unsurprisingly, most of the communities in the image above are centred on my participants, although the blue, purple and black communities subsume more than one individual participant, and not, in all cases, by the conscious design of the users themselves.  Outlying coloured dots that seem to have ‘escaped’ their neighbours represent individuals who bridge two communities (and are consequently located equidistant between the two).
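For anyone curious about the step between the compiled user data and the graph, the sketch below shows one way it could be done.  It is an illustration under assumptions, not the code I actually ran: the record layout and the user names are invented, and it relies only on the fact that Gephi can import a CSV edge list with Source and Target columns (plus an optional Weight).

```python
import csv
import io
from collections import Counter

# Invented uniform records: who acted, on whom, and who was mentioned.
records = [
    {"primary": "alice", "secondary": "bob", "mentions": ["carol"]},
    {"primary": "alice", "secondary": "bob", "mentions": []},
    {"primary": "dave", "secondary": None, "mentions": ["alice", "bob"]},
]


def build_edges(records):
    """Count directed interactions from each primary user to each other user."""
    edges = Counter()
    for rec in records:
        targets = list(rec["mentions"])
        if rec["secondary"] is not None:
            targets.append(rec["secondary"])
        for target in targets:
            edges[(rec["primary"], target)] += 1
    return edges


def write_edge_csv(edges, handle):
    """Write the edges as a Source,Target,Weight table."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


edges = build_edges(records)
buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

Repeated interactions between the same pair of users are collapsed into a single weighted edge, so that frequent contact shows up as a stronger connection rather than as a tangle of parallel lines.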

Combining this approach with an analysis of hashtags suggested that successful uptake of a hashtag-denoted topic or event can be aided by recruiting partners to help spread the message.  However, detecting true public engagement proved challenging.  Due to the data collection method, full profile data were only collected on users tweeting or retweeting, and not from users favouriting or following, resulting in profile data for only 60% of users.  Consequently, it was not possible to perform a robust analysis of users as ‘inside’ or ‘outside’ the academic community, and to what extent the message was reaching a general ‘public’, or circulating around a more specialised audience.  In fact, this consideration raised questions of who constitutes the ‘public’ in public engagement, and whether the concept of a demarcated ‘academia’ is a valid proposition (apologies for all the air-quotes).

Further research could look at finding computational methods to process profile descriptions and produce judgements of the likely affiliation of an individual.  However, this would again raise ethical questions, which are going to become more and more salient in future social media research.

Pathways to STEM


On the 16th of March 2016 I participated in the Pathways to STEM outreach event at the central library in Mansfield.  Around 300 year 10 pupils from schools in the Mansfield area who had shown interest in STEM subjects at school (Science, Technology, Engineering, Mathematics) were invited to come to the library to meet postgraduate students from the University of Nottingham, who brought along examples of, and activities based on, their research.

Two days of training from the UoN Graduate School a few weeks earlier had resulted in around fifty interested postgrads forming small groups based on shared(ish) research interests, with the aim of creating a short activity for the school pupils to engage with.  At this stage of the process I felt a little discouraged, as my very tenuous links to STEM gave me little common ground with the chemists, physicists, biologists and engineers around me.  I found myself in a group with an agricultural scientist specialising in efficiency in dairy farming, and an engineer developing new, non-invasive ways to accurately measure the heart rate of newborn babies.  With such disparate disciplines we opted to share a stand at the event, but to develop small activities individually, under the umbrella theme of “What Technology Can Do For Us”.

I found it difficult at first to design a short, fun activity based on my research.  I thought that the most salient application of technology within the areas of Applied Linguistics with which I was most familiar was the use of computerised language corpora to study patterns of language in use.  My initial ideas were to involve the students in the process of corpus creation, perhaps to create a corpus from samples of their own classwork, and then perform some basic analyses of their language use.  However, while this is an intriguing idea, it was too involved for the format of the day.

In the end I settled on the idea of exploring the linguistic phenomenon of collocation.  This is the tendency of certain words to ‘attract’ certain other words, and so to co-occur frequently.  To put it another way, it is the tendency of language users to have ‘go to’ combinations of words which they can pull out and use with minimal mental effort.  The strength of attraction between words varies, but at the high end of the scale are semi-fixed combinations such as ‘torrential rain’ and ‘excruciating pain’.  The development of computerised corpora over the past thirty years or so has facilitated the study of this phenomenon, and the strength of attraction can be quantified using various statistical tests that generate scores; the stronger the connection, the higher the score.
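As a rough illustration of what such a score looks like, the sketch below computes pointwise mutual information (PMI), one commonly used association measure, for adjacent word pairs.  PMI is my choice for the example rather than the specific test behind any figures in this post, and the toy text is invented; a real corpus would run to millions of words.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Score each adjacent word pair by pointwise mutual information (log base 2).

    PMI compares how often a pair actually occurs with how often it would be
    expected to occur if the two words were independent of each other.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (first, second), count in pair_counts.items():
        p_pair = count / n_pairs
        p_first = word_counts[first] / n_words
        p_second = word_counts[second] / n_words
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Toy text invented for the example.
text = ("torrential rain fell and the rain stopped then torrential rain "
        "returned and the heavy rain and the heavy traffic stopped")
scores = pmi_scores(text.split())

for pair in [("torrential", "rain"), ("heavy", "rain"), ("the", "rain")]:
    print(pair, round(scores[pair], 2))
```

In this toy text ‘torrential’ only ever appears before ‘rain’, so that pair scores higher than ‘heavy rain’ or ‘the rain’.  One known weakness of PMI is that it tends to overrate very rare pairs, which is one reason corpus tools usually offer several measures.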

In addition to being a good example of the application of technology in the study of language, I chose to focus on collocation because it is something that all native speakers of a language intuitively understand.  Show any native English speaker a sentence in which the word following ‘torrential’ has been removed, and there is a very good chance indeed that they will supply ‘rain’, or possibly ‘downpour’, to fill the gap.  Running with this idea, I thought that I could sell this awareness as a form of mind reading, and the idea for ‘I Can Read Your Mind’ was born.


Using my newly-acquired powers of telepathy to get the students’ attention, I then wanted to explain a little about the use of computerised corpora to study collocation, and let them try it out for themselves.  I decided to give the students an adjective chosen at random, and ask them to guess the five words that collocate most strongly with it.  Using an online corpus analysis tool I generated a list of these words from the British National Corpus, a 100-million-word collection of text assembled in the 1990s with the intention of creating a representative sample of modern British English.  I wrote a short computer program that compared the students’ five guesses with the top twenty words from the corpus, and scored each guess according to its rank in the top twenty.  Their total scores would then be recorded on a leaderboard, to add a competitive element to the activity.
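A program of this kind can be very short.  The version below is a reconstruction of the idea rather than the original code, and it makes two assumptions: that the top-ranked word is worth twenty points, the second nineteen, and so on down to one, and that a repeated guess only counts once.  The word list is a placeholder in a made-up order, not real output from the British National Corpus.

```python
def score_guesses(guesses, ranked_collocates, top_n=20):
    """Total the points for a team's guesses against a ranked collocate list.

    Assumed scheme: rank 1 earns top_n points, rank 2 earns top_n - 1, and so
    on; a guess outside the top_n earns nothing. Repeated guesses count once.
    """
    top = [word.lower() for word in ranked_collocates[:top_n]]
    seen = set()
    total = 0
    for guess in guesses:
        word = guess.strip().lower()
        if word in seen:
            continue
        seen.add(word)
        if word in top:
            total += top_n - top.index(word)
    return total


def leaderboard(team_scores):
    """Return (team, score) pairs sorted from highest score to lowest."""
    return sorted(team_scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder ranking, invented for the example (not real corpus output).
collocates = ["food", "diet", "lifestyle", "eating", "living"]

team_scores = {
    "Team A": score_guesses(["food", "diet", "apple", "Food", "skin"], collocates),
    "Team B": score_guesses(["lifestyle", "dog", "cat", "tree", "car"], collocates),
}

for team, score in leaderboard(team_scores):
    print(team, score)
```

Lower-casing and trimming the guesses before comparing them avoids penalising a team for a stray capital letter or space.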

My teammates prepared great activities.  Shiemaa brought along her cardiac monitor, which allowed the students to see their heart rate in real time simply by holding two light-emitting contacts in their fingertips, and to go home with a printout of the signal, while Emma prepared a fantastic Monopoly-style game in which students took over the running of a dairy farm for a five-year period, applying various technological methods and seeing their effects on feed price and milk yield!  By our final preparatory meeting the week before the event, we were confident we had a good stand.

The day went smoothly and enjoyably.  In two ninety-minute sessions, the students circulated around the large events room at the library, stopping at the different stands and experiencing a range of scientific and technological works-in-progress, from 3D printers, to disease prevention, to optimised growing conditions for plants, to DNA Jenga.  My team’s activities went well, and the students were suitably wowed by my mind-reading powers.  I learned several interesting things, notably that my activity was a little unfair, as some words had very obvious collocating nouns (the girls who got the word ‘healthy’, for example, scored very highly with ‘food’, ‘diet’, ‘lifestyle’ etc.), whilst others had a tendency to take rather more obscure and difficult words.  Furthermore, it was very hard to predict, without looking at the wordlist, which adjectives would be challenging, so even when I became aware of the problem and started pre-selecting adjectives (previously I was using an online random word generator), I still couldn’t guarantee non-zero scores.  Still, most teams seemed to enjoy the activity, and there were several real eye-opening moments.  The team of boys who got the adjective ‘nice’ thought they were being jokers when they wrote down the word ‘guy’… but it turned out to be the top word.


The event as a whole was a great success, and the organisers passed on the following feedback from students and teachers:

Teacher emails:

I just want to thank you firstly for a fantastic event today. I saw every single student engage in an activity and interact with the people who were leading the activities. It was really good to see them enjoy themselves and for some stretch themselves a little.

Thank you for yesterday I had a thoroughly enjoyable and informative afternoon as did my colleagues and more importantly my students.

Student comments:

I learnt lots about how science effects our world

I learnt lots and all the events were cool

I could see science displayed in different ways to how it is done in school

It was a great opportunity for me to apply my research interests and really gave me a fresh perspective on my work.