You are browsing the archive for Sam Leon.

Digital Methods Initiative Winter School, University of Amsterdam

- January 27, 2015 in Events, Blog, International

Exploding book in the pulpit of the Algemene Doopsgezinde Sociëteit in Spui, Amsterdam where some of the sprint took place

Last week I attended the 7th annual Winter School at the University of Amsterdam. Run by the Digital Methods Initiative, it took the form of a data sprint in which students joined professional developers and designers to answer research questions using social media data.

The DMI group at Amsterdam have developed and collated a suite of easy-to-use tools specifically for this kind of research. They cover techniques ranging from web scraping to list triangulation, are well worth checking out for anyone interested in this field, and can be found online here.
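List triangulation, as we understand it, means finding the items that several lists have in common, for example hashtags or URLs that recur across different sources. Below is a minimal Python sketch of the idea; it is our own illustration rather than the DMI tool’s code, and the function name `triangulate` and the sample hashtag lists are hypothetical.

```python
from collections import Counter


def triangulate(lists, min_lists=None):
    """Return items appearing in at least `min_lists` of the input lists.

    By default an item must appear in every list. Duplicates inside a
    single list are counted only once.
    """
    if min_lists is None:
        min_lists = len(lists)
    counts = Counter()
    for items in lists:
        counts.update(set(items))  # count each item once per list
    return sorted(item for item, n in counts.items() if n >= min_lists)


# Hypothetical hashtag lists gathered from three sources.
a = ["#occupycentral", "#hongkong", "#umbrellarevolution"]
b = ["#hongkong", "#umbrellarevolution", "#occupycentral", "#hk"]
c = ["#umbrellarevolution", "#hongkong", "#occupyhk"]

print(triangulate([a, b, c]))               # ['#hongkong', '#umbrellarevolution']
print(triangulate([a, b, c], min_lists=2))  # adds '#occupycentral'
```

Lowering `min_lists` turns a strict intersection into a looser "appears in most sources" filter, which is often more useful with noisy social media lists.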

I joined a group looking at bias across three APIs through which you can acquire Twitter data: the Search API, the Streaming API and the proprietary Firehose endpoint, which is generally regarded as the most complete source of Twitter data. We had three datasets, one captured from each API, covering the critical period of 7 to 15 October 2014 when the Hong Kong protests were taking place.
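One simple way to frame such a comparison is with sets of tweet IDs: treat one capture as the baseline, then ask what share of it each other capture contains and what each holds that the baseline lacks. The Python sketch below assumes each dataset has already been reduced to a list of tweet IDs; the function name `coverage_report` and the sample IDs are hypothetical, not the group’s actual method or data.

```python
def coverage_report(datasets, reference):
    """Compare sets of tweet IDs captured through different APIs.

    `datasets` maps a source name to an iterable of tweet IDs and
    `reference` names the source used as the baseline. For every other
    source, report the share of baseline IDs it captured and how many of
    its IDs are absent from the baseline.
    """
    base = set(datasets[reference])
    report = {}
    for name, ids in datasets.items():
        if name == reference:
            continue
        ids = set(ids)
        shared = ids & base
        report[name] = {
            "share_of_reference": len(shared) / len(base) if base else 0.0,
            "not_in_reference": len(ids - base),
        }
    return report


# Hypothetical tweet IDs, not the sprint's data.
datasets = {
    "firehose": [1, 2, 3, 4, 5, 6, 7, 8],
    "search": [1, 2, 3, 4],
    "stream": [1, 2, 3, 4, 5, 6, 9],
}
print(coverage_report(datasets, reference="firehose"))
```

Using the Firehose as the baseline reflects its reputation for completeness, but IDs that turn up only in another capture deserve a closer look; they could, for instance, be tweets deleted before the Firehose set was retrieved.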

Other groups took on a range of tasks, from mapping the open data revolution to tracking the global climate change debate. All projects deployed a range of data wrangling techniques to investigate complex social, political and cultural phenomena.

A few things I learned:

  • Anyone wanting to use social media data to answer research questions about society and culture needs more than just spreadsheet skills. These datasets are generally larger than what Excel can comfortably handle, so basic database skills are a massive help.
  • Off-the-shelf tools for data analysis are brilliant, but you often need to tailor your lines of enquiry to your specific research question. Some knowledge of programming lets you take a much more flexible approach than relying on GUI tools alone.
  • Working in such a collaborative, fast-paced environment meant that reproducibility (i.e. the ability of one part of the team to re-use scripts and code developed by another) was essential, as was documenting our work on the fly. We found IPython notebooks especially useful for this, whereas analytical steps taken in Excel were harder to reproduce.
  • Free Twitter data, like that which can be acquired from the Search and Streaming APIs, is still good, and sometimes better than what you get through the proprietary APIs. When investigating online reactions to contentious and controversial events such as the Hong Kong protests, tweets will inevitably be removed both by users and by Twitter. If you want the full story, it’s far better to capture data as it comes in through the Streaming API.
  • We’ve written about it before on this blog, but the pandas library for Python is brilliant for data wrangling and analysis, and well worth getting to know if you plan on working with big datasets. It’s quick, flexible and powerful.
  • Nothing beats hands-on learning when it comes to technical skills. Having a motivating research question and some real-life data is the best way to learn how to use the multitude of tools now at any budding data wrangler’s disposal. I learnt more in a week than I could have in months of reading about tools and languages in the abstract!
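To illustrate the point about basic database skills, Python’s built-in `sqlite3` module can load tweets into a table and aggregate them with SQL, and the script itself doubles as a reproducible record of the analysis. A minimal sketch; the table layout and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (tweet_id, day as an ISO date string, source API).
rows = [
    (1, "2014-10-07", "stream"),
    (2, "2014-10-07", "search"),
    (3, "2014-10-08", "stream"),
    (4, "2014-10-08", "stream"),
    (5, "2014-10-09", "search"),
]

conn = sqlite3.connect(":memory:")  # use a file path instead to keep the data
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, day TEXT, source TEXT)"
)
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)

# Tweets per day and per source: a pivot that gets slow in Excel at scale.
query = """
    SELECT day, source, COUNT(*) AS n
    FROM tweets
    GROUP BY day, source
    ORDER BY day, source
"""
daily_counts = conn.execute(query).fetchall()
for day, source, n in daily_counts:
    print(day, source, n)
conn.close()
```

The `?` placeholders let `sqlite3` handle quoting, so the same loading code works for millions of rows read from a CSV export.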

For those interested in attending a DMI school in the future – take a look at the summer school coming later in 2015.

Uncovering Asia and Data Journalism in the Philippines

- November 28, 2014 in Events

Last weekend I went to Manila to attend Asia’s first international conference for investigative journalists, Uncovering Asia. Run by the Global Investigative Journalism Network and the Philippine Center for Investigative Journalism (PCIJ), the event brought together over 200 journalists from across Asia and around the world.

Sheila Coronel giving her keynote at the conference. A [transcript of her talk](http://www.rappler.com/thought-leaders/76100-truth-power-asian-value) has been published on Rappler.

The opening keynote was given by Sheila Coronel, Professor of Investigative Journalism at Columbia Journalism School and co-founder of PCIJ. Coronel rose to prominence in the Philippines as a journalist for the magazine Panorama, where she reported widely on the human rights abuses of the Marcos dictatorship in its final years. Her talk, entitled “9 billion eyes: Holding power to account in the world’s largest continent”, painted a picture of an Asia in which conditions were generally improving for investigative journalists, but in which there was more need than ever for a vibrant fourth estate to combat corruption and the abuse of power.

Coronel’s opening talk covered a range of corruption scandals and the ways they affect people’s lives in Asia. She referred to the egregious environmental damage wrought by logging companies in Malaysia, as well as the poorly constructed buildings that crumbled in the 2008 Sichuan earthquake, killing thousands of people. In the Philippines, she mentioned the botched public road projects that hamper farmers’ ability to move their goods around the country, as well as a shortage of classroom textbooks that prevents many Filipino children from getting the education to which they are entitled.

Coronel also spoke of the factors that continue to inhibit the development of a vibrant fourth estate in many Asian countries. These included government gagging laws, like those passed by the Japanese government in the aftermath of the Fukushima disaster, but also the social backlash faced by journalists who publish on sensitive topics, such as Islam in Indonesia. The physical and legal risks to which investigative journalists in the Philippines expose themselves were underlined by the fact that the first day of the conference fell on the fifth anniversary of the Ampatuan Massacre, the single largest killing of journalists ever recorded.

Vigil for the victims of the Ampatuan Massacre on the first evening of the conference in Quezon City, Philippines.

Despite the seriousness of the challenges still faced within many Asian newsrooms, the mood of Coronel’s opening keynote, and of the conference more generally, was optimistic. One of the major threads of the conference was the opportunity for investigative journalism that exploits technology and public data. In the wake of international initiatives like the Open Government Partnership, most Asian journalists now have access to an ever-expanding quantity of public-interest open data. The Philippines, for instance, unveiled an open data portal earlier this year that contains a wide range of data on government spending, procurement and reconstruction: topics that have long been the subject of corruption investigations by the likes of PCIJ, but about which information has traditionally been patchy and scarce.

Credit goes to the organisers, who developed a brilliant data track that gave participants an opportunity to learn how to find stories in public data. As we at Open Knowledge and School of Data believe, data literacy skills are critical if initiatives to release open data are to drive accountability. I ran a session with Nils Mulvad on data cleansing using OpenRefine. Other workshops included an introduction to mapping with Google Fusion Tables, building collaborative databases, and using OCCRP’s Investigative Dashboard. There were also two very well-attended sessions on digital security, run by Bobby Soriano of Tactical Tech and Smari McCarthy, that gave participants hands-on experience with free and affordable tools for protecting sensitive data from intrusion. The BBC’s Paul Myers ran a brilliant workshop on web forensics, with a rapid-fire demonstration of how to find hidden information on the web. This included running effective domain registry searches, checking website servers for hidden files, using the powerful Facebook Graph Search and analysing image metadata. Many of the learnings from the data track were captured in the tip sheets published on the Uncovering Asia website; go check them out here!
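A typical data cleansing task is spotting spelling variants of the same name. OpenRefine’s clustering feature includes a "fingerprint" key-collision method, which its documentation describes roughly as: trim, lowercase, strip punctuation and accents, split into tokens, then sort and de-duplicate the tokens. The Python sketch below approximates that recipe for illustration; it is not OpenRefine’s own code, and the sample names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a key so that trivially different spellings collide."""
    value = value.strip().lower()
    # Strip accents: decompose characters, then drop the combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation (anything that is not a word character or space).
    value = re.sub(r"[^\w\s]|_", "", value)
    # Sort unique tokens so word order no longer matters.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


names = ["Global Witness", "global witness ", "Witness, Global", "Open Knowledge"]
print(cluster(names))  # [['Global Witness', 'Witness, Global', 'global witness ']]
```

Key collision is deliberately conservative: it only merges values that differ in case, spacing, punctuation, accents or word order, so each suggested cluster should still be checked by eye before merging.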

Global Witness and Open Knowledge – Working together to investigate and campaign against corruption related to the extractives industries

- November 17, 2014 in Data Journalism

Sam Leon, one of Open Knowledge’s data experts, talks about his experiences working as a School of Data Embedded Fellow at Global Witness.

Global Witness are a Nobel Peace Prize-nominated not-for-profit organisation devoted to investigating and campaigning against corruption related to the extractives industries. Earlier this year they received the TED Prize, which came with $1 million to help fight corporate secrecy, and on the back of it they launched their End Anonymous Companies campaign.


In February 2014 I began a six-month ‘Embedded Fellowship’ at Global Witness, one of the world’s leading anti-corruption NGOs. Global Witness are no strangers to data. They’ve been publishing pioneering investigative research for over two decades now, piecing together the complex webs of financial transactions, shell companies and middlemen that so often lie at the heart of corruption in the extractives industries.

Like many campaigning organisations, Global Witness are seeking new and compelling ways to visualise their research, and to make more effective use of the large amounts of public data that have become available in the last few years.

“Sam Leon has unleashed a wave of innovation at Global Witness”

-Gavin Hayman, Executive Director of Global Witness

As part of my work, I’ve delivered data training at all levels of the organisation, from senior management to front-line staff. I’ve also worked with a variety of staff to turn data collected by Global Witness into compelling infographics. It’s amazing how powerful these can be in drawing attention to stories and thereby supporting Global Witness’s advocacy work.

The first interactive we published, on the sharp rise in deaths of environmental defenders, demonstrated this. Packing some of the core insights of a much more detailed report into a series of images that people could dig into proved a hit on social media and let the story travel further.

See here for the full infographic on Global Witness’s website.

But powerful visualisation isn’t just about shareability. It’s also about making a point that would otherwise be hard to grasp without visual aids. Global Witness regularly publish mind-boggling statistics on the scale of corruption in the oil and gas sector.

“The interactive infographics we worked on with Open Knowledge made a big difference to the report’s online impact. The product allowed us to bring out the key themes of the report in a simple, compelling way. This allowed more people to absorb and share the key messages without having to read the full report, but also drew more people into reading it.”
-Oliver Courtney, Senior Campaigner at Global Witness

Take, for instance, the $1.1 billion that the Nigerian people were deprived of as a result of the corruption surrounding the sale of Africa’s largest oil block, OPL 245.

$1.1 billion doesn’t mean much to me; it’s too big a number. What we sought to do visually was to represent the loss to Nigerian citizens in terms of things we could grasp, such as basic health care provision and education.
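The arithmetic behind this kind of translation is a single division: the total loss divided by the cost of one unit of a service. A Python sketch; the function name is ours and the unit cost below is a made-up placeholder, not a figure used in the Global Witness infographic.

```python
def express_in_units(total_usd, unit_cost_usd):
    """How many whole units costing `unit_cost_usd` a sum could pay for."""
    if unit_cost_usd <= 0:
        raise ValueError("unit cost must be positive")
    return total_usd // unit_cost_usd


LOSS_USD = 1_100_000_000  # the $1.1 billion figure from the text

# HYPOTHETICAL unit cost, chosen only to show the arithmetic; real figures
# must come from sourced budget or programme data.
HYPOTHETICAL_COST_PER_UNIT_USD = 50_000

print(express_in_units(LOSS_USD, HYPOTHETICAL_COST_PER_UNIT_USD))  # 22000
```

Working in whole dollars with integer division avoids floating-point rounding and gives a count of complete units, which is what a "this could have paid for N..." graphic needs.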

See here for the full infographic on Shell, ENI and Nigeria’s Missing Millions.

In October 2014, to accompany Global Witness’s campaign against anonymous company ownership, we worked with developers from data journalism startup J++ on The Great Rip Off map.

The aim was to bring together and visualise the vast number of corruption case studies involving shell companies that Global Witness and its partners have unearthed in recent years.

The Great Rip Off!

It was a challenging project that required input from designers, campaigners, developers, journalists and researchers, but we’re proud of what we produced.

Open data principles were followed throughout, as Global Witness were committed to creating a resource that their partners could draw on in their advocacy efforts. The underlying data was made available in bulk under a Creative Commons Attribution-ShareAlike licence, and open source libraries like Leaflet.js were used. Other parties were also invited to submit case studies to the database.
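Leaflet can display GeoJSON directly through its GeoJSON layer (`L.geoJSON` in Leaflet 1.x), so a GeoJSON file is one convenient format that serves both as a bulk download and as a map layer. A Python sketch of converting flat case-study records into a GeoJSON FeatureCollection; the record fields and coordinates are hypothetical, and we are not claiming this is the format The Great Rip Off map actually used.

```python
import json

# HYPOTHETICAL case-study records: names, places and coordinates are
# placeholders, not data from The Great Rip Off map.
cases = [
    {"title": "Case A", "country": "Country X", "lat": 9.08, "lon": 8.68},
    {"title": "Case B", "country": "Country Y", "lat": 51.51, "lon": -0.13},
]


def to_geojson(records):
    """Convert flat records with lat/lon fields into a FeatureCollection."""
    features = []
    for record in records:
        # Everything except the coordinates becomes a feature property.
        properties = {k: v for k, v in record.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


geojson = to_geojson(cases)
print(json.dumps(geojson, indent=2))  # save as a .geojson file for bulk download
```

The longitude-first order is the most common mistake when hand-building GeoJSON; swapped coordinates put points in the wrong hemisphere without raising any error.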

“It’s transformed the way we work, it’s made us think differently how we communicate information: how we make it more accessible, visual and exciting. It’s really changed the way we do things.”
-Brendan O’Donnell, Campaign Leader at Global Witness

For more information on the School of Data Embedded Fellowship Scheme, and to see further details on the work we produced with Global Witness, including interactive infographics, please see the full report here.

4 Network Visualisation Tools

- August 20, 2014 in Fusion Table, Google Fusion Table

Network visualisation has become an important tool in the data wrangler’s armoury. An increasing volume of research and journalism uses network analysis and visualisation to gain insight into the real-world social, political and cultural networks that influence our lives. Take, for instance, GfK’s analysis of the European political Twittersphere or Gilad Lotan’s piece on personalising propaganda in the Israel-Gaza war.

Instagram co-tag graph produced using Gephi, highlighting three distinct topical communities: 1) pro-Israeli (Orange), 2) pro-Palestinian (Yellow), and 3) Muslim (Pink). Source: http://globalvoicesonline.org/2014/08/04/israel-gaza-war-data-the-art-of-personalizing-propaganda/

Below I’ve listed some of the top free tools for sketching and analysing the networks you produce in the course of your investigations. The first two tools are primarily for those who want to visualise networks based on desk research, where there is a need to include many different types of entity. The latter two, Google Fusion Tables and Gephi, are better suited to larger datasets. Gephi in particular lets you perform in-depth statistical analysis of networks, which can be especially useful for analysing social networks.
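Most of these tools start from the same data structure: an edge list of pairs of connected entities. As an illustration of one of the simplest statistics a tool like Gephi reports, degree (the number of connections each node has), here is a standard-library Python sketch; the names are hypothetical, it assumes an undirected network without duplicate edges, and it is not Gephi’s code.

```python
from collections import Counter

# Hypothetical undirected edge list: who is connected to whom.
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dan"),
]


def degrees(edge_list):
    """Count the edges touching each node (degree in an undirected graph)."""
    deg = Counter()
    for source, target in edge_list:
        deg[source] += 1
        deg[target] += 1
    return deg


def degree_centrality(edge_list):
    """Normalise each degree by the maximum possible, n - 1."""
    deg = degrees(edge_list)
    n = len(deg)
    if n < 2:
        return {}
    return {node: d / (n - 1) for node, d in deg.items()}


print(degrees(edges).most_common(1))  # [('Carol', 3)]
print(degree_centrality(edges))
```

Degree is a blunt measure of influence, but it is a sensible first pass before reaching for the richer centrality metrics that Gephi provides.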

## VIS: Visual Investigative Scenarios

http://vis.occrp.org/

A tool for producing simple but stylish network maps using a stock of icons for entities that often come up in investigations, e.g. people, companies and cases. You can share and embed your networks online, or export them for print. It’s in beta at the moment, so play nice and be sure to report any bugs!

Network diagram mapping the assets of Azeri officials in the Czech Republic, taken from the VIS public gallery: http://vis.occrp.org/

## Text2MindMap

https://www.text2mindmap.com/

An online tool that turns lists into network structures, so you don’t have to fiddle around with positioning when you add entities to your network. It is limited in terms of design options, but its simplicity means that you can produce your network sketches pretty quickly.
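The idea behind turning a list into a network, where indentation decides which entity hangs off which, can be sketched in a few lines of Python. This is our own illustration of the concept, not Text2MindMap’s implementation, and the entity names are hypothetical.

```python
def outline_to_edges(text):
    """Turn an indented outline into (parent, child) edges.

    Each line is linked to the closest preceding line with less
    indentation. Blank lines are ignored; a tab counts as one character
    of indentation.
    """
    edges = []
    stack = []  # (indent, name) pairs for the current chain of ancestors
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        name = line.strip()
        # Pop siblings and deeper entries until we reach a shallower line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            edges.append((stack[-1][1], name))
        stack.append((indent, name))
    return edges


# Hypothetical outline of a corporate structure.
outline = """Company A
  Subsidiary B
    Director C
  Subsidiary D
"""
print(outline_to_edges(outline))
```

Comparing indentation levels rather than assuming a fixed indent width keeps the sketch working even when the outline is indented inconsistently.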

## Google Fusion Tables

https://sites.google.com/site/fusiontablestalks/

Google Fusion Tables now offers a basic network mapping tool. It has some useful filter functionality and although it lacks the deep customisation options and analysis functionality of Gephi (see below) it can produce insightful visualisations.
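As far as we know, the Fusion Tables network view draws its nodes from two columns of a table, so filtering the table filters the network. The two most common filters, dropping weak links and focusing on the links around a single entity, can be sketched in Python on a weighted edge list. The rows are hypothetical and this does not reproduce Fusion Tables’ own filter interface.

```python
# Hypothetical weighted edges: (source, target, weight).
edges = [
    ("Parent plc", "Sub A", 5),
    ("Parent plc", "Sub B", 1),
    ("Sub A", "Sub C", 3),
    ("Other Ltd", "Sub B", 2),
]


def filter_by_weight(edge_list, min_weight):
    """Keep only edges whose weight is at least `min_weight`."""
    return [e for e in edge_list if e[2] >= min_weight]


def edges_touching(edge_list, node):
    """Keep only edges that have `node` at either end."""
    return [e for e in edge_list if node in (e[0], e[1])]


print(filter_by_weight(edges, 3))
print(edges_touching(edges, "Sub B"))
```

Filtering before visualising is often what turns an unreadable "hairball" into a chart that makes a point.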

OpenOil’s attempt to map BP and its subsidiaries using Google Fusion Tables. More information [here](http://openoil.net/corporate-networks/bp-corporate-network/)

## Gephi

https://gephi.github.io/

A desktop tool for performing powerful network analysis and creating slick network visualisations. For those interested in experimenting with Gephi, I would recommend trying to visualise your own Facebook network; find more details on how to do this here: https://www.youtube.com/watch?v=kbLFMObmLNQ. School of Data has also published tutorials on mapping company networks and on social network analysis.
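If your network lives in your own data rather than an export, Gephi can import an edges table from a spreadsheet file. As far as we know it recognises columns named Source and Target, with optional Type and Weight columns; check this against your Gephi version. A Python sketch that writes such a table, merging repeated pairs into a weight; the friendship pairs are hypothetical.

```python
import csv
import io

# Hypothetical friendship pairs; note the repeated Ana-Ben pair.
friendships = [("Ana", "Ben"), ("Ana", "Cleo"), ("Ben", "Cleo"), ("Ben", "Ana")]


def to_gephi_edges_csv(pairs):
    """Write undirected pairs as a CSV edges table with Source/Target columns.

    Pairs are treated as undirected, so (a, b) and (b, a) are merged and
    counted in the Weight column.
    """
    weights = {}
    for a, b in pairs:
        key = tuple(sorted((a, b)))
        weights[key] = weights.get(key, 0) + 1
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, "Undirected", w])
    return buf.getvalue()


csv_text = to_gephi_edges_csv(friendships)
print(csv_text)
```

Save the printed text to a `.csv` file and load it through Gephi’s spreadsheet import as an edges table; Gephi then creates the nodes from the Source and Target values.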

GfK and the University of Vienna’s research on the key influencers in the EU Twittersphere: http://www.gfk.com/documents/whitepaper/eurotwittersphere_final.pdf
