workshop – School of Data – Macedonia

The Data Journalism Bootcamp at AUB Lebanon

Ali Rebaie — Thu, 29 Jan 2015 10:20:18 +0000

Data love is spreading like never before. Unlike previous workshops we did in the MENA region, on the 18th of January 2015 we gave an intensive data journalism workshop at the American University of Beirut that ran for four consecutive days, in collaboration with Dr. Jad Melki, Director of the media studies program at AUB. The team at Data Aurora was really happy to share this experience with students from different academic backgrounds, including media studies, engineering and business.

The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim of the workshop was to introduce the students to the world of open data, and data journalism in particular, through tutorials on the open source tools and methods used in this field. We also wanted to put the students on the right track when it comes to using data.

On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the different phases of any data visualization project: find, get, verify, clean, analyze and present. After that, the students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.

Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first shown different types of maps and the design styles that serve the purpose of each one. We also went through the best mapping techniques and visualizations and explained what each is good for. By the end, participants could tell dot maps from choropleth maps, as well as many others. Then they used Twitter data that contained geolocations to contrast different tweeting zones by placing the tweets at their points of origin on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun and students were very happy to create their own maps without a single line of code.

On the third day, Bahia gave a lecture on network analysis, covering some important mathematical notions needed for working with graphs as well as possible uses and case studies from this field. Meanwhile, Ali introduced different open data portals to give the students more resources and data sets. After these topics were covered, we gave a technical demonstration of a network analysis tool, applied to two topics. Students first analyzed climate change; later, the AUB media group on Facebook was analyzed and its graph was drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists: the Facebook data was collected and the visualizations were drawn in a network visualization tool.
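For readers who want to try this at home, here is a minimal sketch of how "top influencers" can be ranked in a network. The post does not say which measure or tool was used in the workshop, so degree centrality (the share of other people in the network that a person is directly connected to) is an assumption, and the names in the edge list are made up for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: share of the other nodes it is directly connected to}."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def top_influencers(edges, k=3):
    """Return the k best-connected nodes, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical friendship edges, not data from the workshop.
edges = [("Rima", "Omar"), ("Rima", "Lina"), ("Rima", "Sami"), ("Omar", "Lina")]
print([name for name, _ in top_influencers(edges, k=2)])  # ['Rima', 'Lina']
```

Degree centrality only counts direct connections; other measures, such as betweenness, can rank the same network differently.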

After completing the interactive types of visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed for an interesting topic and transform it into a visual piece. Bahia walked around the room, teaching the students how to refine the data so that it becomes simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well-made infographic, some of which are posted below.

After the workshop, Zayna chatted with the students to get their feedback. Here are some of the opinions she collected:

“It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of the media studies program at AUB. “It was great overall.”

“It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student.

“It was great, we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, a civil engineering student. “The workshop was a motivation for me to work more on this,” she added. “It would work as a one semester long course.”

Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it. I expected it to be a bit harder, I would prefer more advanced stuff in scraping,” she said.

Why should we care about comparability in corruption data?

Tin Geber — Thu, 29 May 2014 20:52:50 +0000

Does comparing diverse data-driven campaigns empower advocacy? How can comparing data on corruption across countries, regions and contexts contribute to efforts on the ground? Can the global fight against corruption benefit from comparable datasets? The engine room tried to find some answers through two back-to-back workshops in London last April, in collaboration with our friends from School of Data and CIVICUS.

The first day was dedicated to a data expedition, where participants explored a collection of specific corruption-related datasets. This included a wide range of data, from international perception-based datasets such as Transparency International’s Global Corruption Barometer, through national Corruption Youth Surveys (Hungary), to citizen-generated bribe reports like I Paid A Bribe Kenya.

Hard at work organizing corruption datatypes.

The second day built on lessons learned in the data expedition. Technologists, data literates and harmonization experts convened for a day of brainstorming and toolbuilding. The group developed strategies and imagined heuristics through an analysis of existing cases, best practices and personal experience.

Here is what we learned:

Data comparability is hard

Perhaps the most important lesson from the data expedition was that one single day of wrangling can’t even begin to grasp the immensely diverse mix of corruption data out there. When looking at scope, there was no straightforward way to find links between locally sourced data and the large-scale corruption indices. Linguistic and semantic challenges to comparing perceptions across countries were an area of concern. Since datasets were so diverse, groups spent a considerable amount of time familiarizing themselves with the available data, as well as hunting for additional datasets.

Lack of specific incident-reporting datasets was also noticeable. In the available datasets, corruption data usually meant corruption perception data: data coming from surveys gauging people’s feelings about the state of corruption in their community. Datasets containing actual incidents of corruption (bribes, preferred sellers, etc.) were less readily available. Perception data is crucial for taking society’s pulse, but it is difficult to compare meaningfully across different contexts, especially considering the fluidity of perception in response to cultural and social customs, and very complex to cross-correlate with incident reporting.

Pattern-finding expedition

An important discussion also came to life regarding the lack of technical capacity among grassroots organizations that collect data, and how that negatively impacts data quality. For organizations on the ground it’s a question of priorities and capacity. Organizations that operate in dangerous areas, responding to urgent needs with limited resources, don’t necessarily consider data collection proficiency a top-shelf item. In addition, common methods and standards in data collection empower global campaigns for remote actors (cross-national statistics, high-level policy projects, etc.) but don’t necessarily benefit the organizations on the ground collecting the data. These high-level projects may or may not have trickle-down benefits. Grassroots organizations don’t have a reason to adopt standardized data collection practices unless doing so helps them in their day-to-day work, for example by giving them tools that are easier to use or the ability to share information with partner organizations.

Data comparability is possible

While the previous section might paint a bleak picture, the reality is more positive, and the previous paragraph tells us where to look (or, how to look). The amorphous blob of all corruption-related data is too generically daunting to make sense of, until we flip the process on its head. As in the best detective novels, starting small and investigating specific local stories of corruption lets investigators find a thread and follow it along, slowly unraveling the complex yarn of corruption towards the bigger picture. So, for example, a small village in Azerbaijan complaining about the “Ingilis” that contaminate their water can unravel a story of corruption leading all the way to the presidential family. This excellent example, and many more, come from Paul Radu’s investigative experience, described in the Exposing the Invisible project produced by the Tactical Technology Collective.

Screengrab from “Our Currency is Information” by Tactical Technology Collective

There are also excellent resources that collect and share data in comparable, standardized and functional ways. Open Corporates, for example, collects information on more than 60 million corporations, and provides beautiful, machine-readable, API-pluggable information, ready to be perused by humans and computers, and easily comparable and mashable. If your project involves digging through corporation ownership, Open Corporates will most surely be able to help you out. Another project of note is the Investigative Dashboard that collects scraped business records from numerous countries, as well as hundreds of reference databases.

What happens when datasets just aren’t compatible, and there is no easy way to convince the data producers to make them more user-friendly? Many participants voiced their trust in civic hackers and the power of scraping — even if datasets aren’t provided in machine-readable formats, or standardized and comparable, there are many tools (as well as many helpful people) that can come to the rescue. The best source for finding both? Well, the School of Data, of course. Apart from providing a host of useful tutorials and links, it acts as a hub for engaged civic hackers, data wranglers and storytellers all over the world.
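To give a flavour of what scraping means in practice, the sketch below pulls the rows out of an HTML table using only Python's standard library. The HTML snippet, the agency names and amounts, and the `extract_rows` helper are all hypothetical illustrations and are not taken from any dataset discussed at the workshops; real pages are usually messier and need more cleaning afterwards.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment, invented for this example.
sample = """
<table>
  <tr><th>Agency</th><th>Contract value</th></tr>
  <tr><td>Roads Authority</td><td> 120000 </td></tr>
  <tr><td>Water Board</td><td>45000</td></tr>
</table>
"""
for row in extract_rows(sample):
    print(row)
```

Once the rows are in a list like this, they can be written out as CSV and cleaned or compared with other datasets.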

Citizen engagement is key

During a brainstorm where participants compared real-life models of data mashups (surveys, incident reporting, budget data), it became clear that many corruption investigation projects involve crowdsourced verification. While crowdsourcing is a vague concept in itself, it can be very powerful when focused within a specific use case. It is especially important for anti-corruption projects that revolve around leaked data (such as the Yanukovych leaks), or around FOIA requests that yield information in difficult-to-parse formats that aren’t machine readable (badly scanned documents, or even boxes of paper prints). In cases like these, citizen engagement is possible because there are clear incentives for individuals to get involved. Localized segmentation (where citizens look only at data directly involving them or their communities) is a boon for disentangling large lumps of data, as long as the information interests enough people to engage a groundswell of activity. Verification of official information can also help, for example when investigating whether state-financed infrastructure is actually being built, or if there is just a very expensive empty lot where a school is supposed to be.

It makes perfect sense, then, to look at standardization and comparability as an enabling force for citizen engagement. The ability to mash and compare different datasets brings perspective, and enables the citizens themselves to have a clearer picture, and act upon that information to hold their institutions accountable. However, translating, parsing and digesting spaghetti-data can be so time-consuming and cumbersome that organisations might just decide it’s not worth the effort. At the same time, data-collecting organizations on the ground, presented with unwieldy, overly complex standards, will simply avoid using them and compound the comparability problem. The complexity in the landscape of corruption data represents a challenge that needs to be overcome, so that data being collected can truly inspire citizen action for change.

Looking back: The Data Bootcamp in Ghana

Michael Bauer — Wed, 31 Oct 2012 14:58:51 +0000

After a successful Data Bootcamp in Dar es Salaam, Tanzania, we moved to Accra, Ghana, to rinse and repeat. Like Tanzania, Ghana has started an Open Government Data Initiative. The government committed to it in 2010 and commissioned the National Information Technology Agency to drive the program. Currently an Open Government Data platform is in the making at data.gov.gh. So far, however, there is little to see except a mockup.

As in Tanzania, we found an excited and engaged mix of journalists, civil society representatives and technologists. While most of the participants were from Ghana, we had one participant coming in from Benin and a Ph.D. student from Berkeley researching the Ghanaian diaspora, who happened to be in town. We guided around 60 participants through the intensive 3-day program, kicking them off with basic spreadsheet skills and taking them all the way through Google Refine to creating visualisations with Fusion Tables. This was a long stretch for the journalists and civil society organisers. During the workshop participants formed a total of 7 groups to work on specific stories and applications ranging from traffic accidents to public procurement.

On the third day the participants started teaching one another: one person created a tutorial on how to import point-of-interest data into Excel, while others took the stage to show how to create simple websites and embed Fusion Tables graphs and maps.

After three days and two nights of intensive work, the Bootcamp ended with short presentations of the projects. The session started with great excitement and provided valuable and critical feedback for all projects. Choosing winners was not an easy task, so the African Media Initiative and the World Bank Institute, who funded the prize, decided to ramp up the prize money and distribute it to more projects. The clear winning team focused on extractive industries and whether the revenue generated helps the communities where the extraction takes place. Two runners-up worked with public procurement data and on a platform to track government manifestos: “It is a contract the government makes with us – the people”. Two more project teams, Hospital Coverage and Road Accidents, received awards for finding the most interesting stories, to help them take these further. Both discovered interesting stories in their data and started researching them.

The three intensive days left everyone excited and exhausted. Most people came into the room knowing one or two other participants and left having connected with like-minded people with different skills. As a result, the HacksHackers chapter in Accra grew from 15 participants to over 90 by the end of the Bootcamp. We’ll keep an eye on further developments in Ghana.

The Data Bootcamp in Tanzania

Michael Bauer — Tue, 23 Oct 2012 11:01:19 +0000

I am on the road in Tanzania and Ghana to spread the datalove. Last week Tanzania’s first data journalism event took place: the Data Bootcamp, organized by the World Bank Institute and the African Media Initiative, brought together international experts, journalists, civil society organizations and technologists to work on data-related projects.

In 2010 Tanzania committed to release open government data as part of the Open Government Partnership. Nevertheless, the Tanzanian government has only released two datasets so far. One goal of the data bootcamp was to spur demand by implementing small data projects. The format was tested before in South Africa, Kenya and Moldova and helped to raise awareness of Open Data. In preparation for and during the workshop, 4 more datasets were scraped and liberated. Further data was collected by the participants to work on their specific projects.

Of the 40 participants only 7 were able to code; the majority were journalists and activists who had never handled data before. Over the three days they received intensive training in how to use spreadsheets and tools like Google Refine or Fusion Tables to tell stories with data.

The data bootcamps are not only an intense hands-on learning experience; they are also a small competition, in which $2,000 is awarded to the winner. Since the Tanzania bootcamp did not result in a clear winning project, three finalists were chosen. Each of them received a small starting sum to produce a working prototype within three weeks, after which the winning project will be chosen. The final projects were: (1) a platform tracking promises made by politicians and whether they were fulfilled; (2) tracking and monitoring of foreign direct investments; and (3) a project illustrating the problem of land grabbing and land ownership in Tanzania.