On February 21 (International Open Data Day), Bantay.ph, a platform that uses technology in mobilizing citizens to demand good governance, will host the very first citizen-initiated “Data Jam.”
Happy: As a fellow, I spent a lot of time learning! The Fellowship really helped me to be brave and dive into the data… other than the events you have to do, a lot of it is a learning experience. It really never stops!
Yuandra: Usually, I meet with a lot of people – working with data is very new in my country, Indonesia, so there was lots of interest. I spent lots of time going from organisation to organisation, raising awareness of what they can do with data. Then, planning training – the materials, preparing them, thinking about how to package the materials in a way that people will understand.
Milena: We’re looking for a diversity of skills among the fellows, we’re hoping each fellow will have a strong skill that they’ll be able to teach others, as well as be able to identify gaps in their own knowledge. We only have 7 spaces this year, which is fewer than last year, so it will (hopefully!) be a competitive process.
Codrina: It’s important to have some connections in your region, because the Fellowship (and School of Data) is not just about learning things for yourself, but then to take what you have learned and what you know, and spread it in your own geographical context. Or if you don’t already – be prepared to go around and meet lots of new organisations and build the community around you!
Yuandra: Community building is really important, you’ll be working with other organisations around you who definitely have the need for data. So is communication: my background is very technical, but this Fellowship taught me how to put my technical jargon aside, and explain issues in a simple way for newcomers to the topic.
Yuandra: I worked with Publish What You Pay (who work on extractive industries transparency), who previously only used data in Excel, and for reports. When I went there, one of my main points was to show them how they can use data in other ways, for example in visualisations and infographics. They’re still in an early stage of working with data, but they’ve come a long way!
Codrina: I’m a mapping person, so much of the work I did involved either building maps or teaching people how to use them, and how to stay away from usual map problems. I went to Bosnia & Herzegovina, and worked on election maps. If you’re ever curious about the most horrible election system in the world – take a look! We spent a week trying to work out how it works, we ended up asking people to explain the system in a 3 minute video, which worked really well.
Happy: I found that it’s hard to ‘sell’ open data to different CSOs just by explaining – so, I wanted to use my own organisation as a model, to demonstrate what exactly people can do with open data. It was a really good way actually for us to engage with government – you build trust, and partnerships with them, by teaching them what they can do with data. Now, the government are opening up datasets that they’ve never opened before – so this is really exciting for me.
Nisha: We did a data journalism workshop for people who are really not very technologically savvy – it was really rewarding because after a while of working with people who want to know more advanced stuff, you can forget there’s lots of people who still want to know the basics, so you get to open this whole new world to them. We also did a data expedition with an organisation that’s working in the urban space in Hyderabad, with data that they’d collected.
On Monday 16th February, our 2014 Fellows Codrina, Happy and Yuandra, from Romania, the Philippines, and Indonesia respectively, joined Milena and me to talk through their experiences in last year’s fellowship.
Here’s the video online (just under an hour long):
And on Tuesday 17th February, Olu and Nisha, from Nigeria and India respectively, joined us to discuss their fellowship. Here’s their video, which is just over 30 minutes long:
We have funding for 7 School of Data fellows to take part in our 2015 Fellowship Programme, and from previous experience, we’ve found that the fellowships work best when there is an established local host.
School of Data is promoting data literacy by working with local partners to create impactful data-driven projects. We’re looking for organisations that need support in using data more effectively and that are willing to work closely with one of our School of Data fellows over a 9-month period.
If you are selected, you’ll welcome a School of Data fellow in your office on a regular basis, to work on concrete projects and provide you with custom trainings and support, depending on what you need most. You’ll open up your data to the fellow and let them see how you work with data now, so that they can help guide your organisation towards being more data-savvy and using data to strengthen your work, be that in the field of advocacy, campaigning, journalism, or elsewhere within the civil society space. You’ll support the growth of the data-literate community, by inviting those within your network to attend trainings, and organising your own data expeditions, supported closely by the School of Data fellow.
This programme involves a great deal of resources and commitment from us, and we expect an equal amount of resources and commitment from our partners.
The ideal partner would be able to commit:
If you are accepted as our local partner, we’ll ask for your assistance in selecting the best applicant to be the School of Data fellow who will work with you. The fellow will support you by:
Here is just an example of what our 2014 fellow Hannah Williams worked on together with local partners from South Africa: http://capetownbudgetproject.org.za/
Deadline: March 10th
You are also welcome to contact us on [email protected] while you are preparing your application; we’d be happy to answer your questions and help you put together a good application.
Following our successful 2014 School of Data Fellowships, today we’re opening our Call for Applications for the 2015 Fellowship programme. As with last year’s programme, we’re looking to find new data trainers to spread data skills around the world.
As a School of Data fellow, you will receive data and leadership training, as well as coaching to organise events and build your community in your country or region. You will also be part of a growing global network of School of Data practitioners, benefiting from the network effects of sharing resources and knowledge and contributing to our understanding about how best to localise our training efforts.
As a fellow, you’ll be part of a nine-month training programme where you’ll work with us for an average of ten working days a month, including attending online and offline trainings, organising events, and being an active member of the thriving School of Data community.
Our 2015 fellowship programme will run from April to December 2015. We’re asking for 10 days a month of your time – consider it to be a part-time role, and your time will be remunerated. To apply, you need to be living in a country in the low income, lower-middle income or upper-middle income categories, as classified here.
People who fit the following profile:
To give you an idea of who we’re looking for, check out the profiles of our 2014 fellows – we welcome people from a diverse range of backgrounds, too, so people with new skillsets and ranges of experience are encouraged to apply.
This year, we’d love to work with people with a particular topical focus, especially those interested in working with extractive industries data, financial data, or aid data.
There are 7 fellowship positions open for the April to December 2015 School of Data training programme.
We’re looking for people based in low-, lower-middle, and upper-middle income countries as classified by the World Bank, and we have funding for Fellows in the following geographic regions:
As a School of Data fellow, you’ll be part of our 9-month programme, which includes the following activities:
Check out the Testimonials page to see what the 2014 Fellows said about the programme, or watch our Summer Camp video to meet some of the community.
This year’s fellowships will be supported by the Partnership for Open Development (POD) OD4D, Hivos, and the Foreign and Commonwealth Office in Macedonia. We welcome more donors to contribute to this year’s fellowship programme! If you are a donor and are interested in this, please email us at [email protected].
Got questions? See more about the Fellowship Programme here and have a look at this Frequently Asked Questions (FAQ) page – or watch the Ask Us Anything Hangouts that we held in mid-February to take your questions and chat more about the fellowship.
Not sure if you fit the profile? Have a look at our 2013 and 2014 fellows’ profiles. Women and other minorities are encouraged to apply.
Convinced? Apply now to become a School of Data fellow. The application will be open until March 10th and the programme will start in April 2015.
This article is part tutorial, part demonstration of the process I go through to complete a data expedition alone, or as a participant during a School of Data event. Each of the following steps will be detailed: Find, Get, Verify, Clean, Explore, Analyze, Visualize, Publish.
Depending on your data, your source or your tools, the order in which you will be going through these steps might be different. But the process is globally the same.
FIND
A data expedition can start from a question (e.g. how polluted are European cities?) or a data set that you want to explore. In this case, I had a question: Has the dynamic of the physical video game magazine market been declining in the past few years? I have been studying the video game industry for the past few weeks and this is one of the many questions that I set myself to answer. Obviously, I thought about many more questions, but it’s generally better to start focused and expand your scope at a later stage of the data expedition.
A search returned Wikipedia as the most comprehensive resource about video game magazines. They even have some contextual info, which will be useful later (context is essential in data analysis).
https://en.wikipedia.org/wiki/List_of_video_game_magazines
GET
The Wikipedia data is formatted as a table. Great! Scraping it is as simple as using the importHTML function in a Google spreadsheet. I could copy/paste the table, but that would be cumbersome with a big table and the result would have some minor formatting issues. LibreOffice and Excel have similar (but less seamless) web import features.
importHTML asks for 3 variables: the link to the page, the formatting of the data (table or list), and the rank of the table (or the list) in the page. If no rank is indicated, as seen below, it will grab the first one.
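For readers working outside a spreadsheet, the same idea (pick the n-th table on a page) can be sketched in Python with only the standard library. This is a minimal illustration, not what importHTML does internally: the `TableGrabber` class, the `import_html_table` function and the sample HTML are all hypothetical, the rank is assumed to be 1-based as in the spreadsheet function, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of every <table> on a page, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index=1):
    """Return the rows of the index-th table (1-based) in an HTML string."""
    parser = TableGrabber()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up page content, standing in for the downloaded Wikipedia HTML.
SAMPLE = """
<table><tr><th>Name</th><th>Founded</th><th>Closed</th></tr>
<tr><td>Example Gamer</td><td>1991</td><td>2004</td></tr></table>
<table><tr><td>another table</td></tr></table>
"""

rows = import_html_table(SAMPLE)
for row in rows:
    print(row)
```

In practice the HTML string would come from downloading the page first; that step is left out here so the sketch runs offline.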
Once I have the table, I do two things to help me work quicker:
VERIFY
So, will this data really answer my question completely? I do have the basic data (name, founding date, closure date), but is it comprehensive? A double check with the French Wikipedia page about video game magazines reveals that many French magazines are missing from the English list. Most of the magazines represented are from the US and the UK, and probably only the most famous. I will have to take this into account going forward.
CLEAN
Editing your raw data directly is never a good idea. A good practice is to work on a copy or in a nondestructive way – that way, if you make a mistake and you’re not sure where, or want to go back and compare to the original later, it’s much easier. Because I want to keep only the US and UK magazines, I’m going to:
Tip: to avoid moving your column headers when ordering the data, go to Display→Freeze lines→Freeze 1 line.
Some other minor adjustments have to be made, but they’re light enough that I don’t need to use a specialized cleaning tool like Open Refine. Those include:
EXPLORE
I call “explore” the phase where I start thinking about all the different ways my cleaned data could answer my initial question[1]. Your data story will become much more interesting if you attack the question from several angles.
There are several things that you could look for in your data:
So what can I do? I can:
For the purpose of this tutorial, I will focus on the second one, looking at the number of magazines created per year. Another tutorial will be dedicated to the first, because it requires a more complex approach due to the formatting of our data.
At this point, I have a lot of other ideas: Can I determine which year produced the most enduring magazines (surprising interactions)? Will there be anything to see if I bring in video game website data for comparison (revealing comparisons)? Which magazines have lasted the longest (interesting factoid)? This is outside of the scope of this tutorial, but those are definitely questions worth exploring. It’s still important to stay focused, but writing them down for later analysis is a good idea.
ANALYSE
Analysing is about applying statistical techniques to the data and questioning the (usually visual) results.
The quickest way to answer our question “How many magazines have been created each year?” is by using a pivot table.
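Outside a spreadsheet, the same count can be sketched in a few lines of Python. The rows below are invented placeholders, not values from the Wikipedia table, and `founded_per_year` is a hypothetical helper name; the point is only that a pivot table on the founding year amounts to counting rows per year.

```python
from collections import Counter

# Placeholder rows standing in for the cleaned sheet: (magazine, founding year).
magazines = [
    ("Magazine A", 1981),
    ("Magazine B", 1981),
    ("Magazine C", 1983),
    ("Magazine D", 1994),
]


def founded_per_year(rows):
    """Count how many magazines were founded in each year, oldest year first."""
    counts = Counter(year for _name, year in rows)
    return sorted(counts.items())


per_year = founded_per_year(magazines)
for year, count in per_year:
    print(year, count)
```

Years with no new magazine simply do not appear in the result, which matches what a pivot table shows.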
This data can then be visualized with a bar graph.
The trendline seems to show a decline in the dynamic of the market, but it’s not clear enough. Let’s group the years by half-decade and see what happens:
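As a rough sketch of that grouping in Python, each founding year can be mapped to a five-year bucket before counting. The bucket boundaries are an assumption chosen to match ranges such as 1986-1990, and `half_decade` and `founded_per_half_decade` are hypothetical names.

```python
from collections import Counter


def half_decade(year):
    """Map a year to its five-year bucket, e.g. 1988 -> '1986-1990'."""
    start = (year - 1) // 5 * 5 + 1
    return f"{start}-{start + 4}"


def founded_per_half_decade(years):
    """Count founding years per five-year bucket, oldest bucket first."""
    counts = Counter(half_decade(year) for year in years)
    # Sorting the labels as text works because every year has four digits.
    return sorted(counts.items())


# Invented founding years, used only to show the shape of the output.
buckets = founded_per_half_decade([1981, 1984, 1986, 1990, 1991])
for label, count in buckets:
    print(label, count)
```

Wider buckets smooth out year-to-year noise, at the cost of hiding what happens inside each five-year window.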
The resulting bar chart is much clearer:
The number of magazines created every half-decade decreases a lot in the lead up to the 2000s. The slump of the 1986-1990 years is perhaps due to a lagging effect of the North American video game crash of 1982-1984.
Unlike what we could have assumed, the market is still dynamic, with one magazine founded every year for the last 5 years. That makes for an interesting, nuanced story.
VISUALISE
In this tutorial the initial graphs created during the analysis are enough to tell my story. But if the results of my investigations required a more complex, unusual or interactive visualisation to be clear for my public, or if I wanted to tell the whole story, context included, with one big infographic, it would fall into the “visualise” phase.
PUBLISH
Where to publish is an important question that you have to answer at least once. Maybe the question is already answered for you because you’re part of an organisation. But if you’re not, and you don’t already have a website, the answer can be more complex. Medium, a trendy publishing platform, only allows images at this point. WordPress might be too much for your needs. It’s possible to customize the JavaScript of Tumblr posts, so it’s a solution. Using a combination of GitHub Pages and Jekyll, for the more technically inclined, is another. If a light database is needed, take a look at tabletop.js, which allows you to use a Google spreadsheet as a quasi-database.
Any data expedition, of any size or complexity, can be approached with this process. Following it helps you avoid getting lost in the data. More often than not, there will be a need to get and analyze more data to make sense of the initial data, but it’s just a matter of looping the process.
[1] I formalized the “explore” part of my process after reading the excellent blog from MIT alumnus Rahul Bhargava http://datatherapy.wordpress.com
The workshop was mainly led by Ali Rebaie, a Senior School of Data fellow, and Bahia Halawi, a data scientist at Data Aurora, along with the data community team assistants: Zayna Ayyad, Noor Latif and Hsein Kassab. The aim of the workshop was to give the students an introduction to the world of open data and data journalism, in particular through tutorials on open source tools and methods used in this field. Moreover, we wanted to put students on track regarding the use of data.
On the first day, the students were introduced to data journalism from a theoretical angle, in particular the data pipeline, which outlines the different phases of any data visualization project: find, get, verify, clean, analyze and present. After that, students got hands-on with scraping and cleaning data using tools such as OpenRefine and Tabula.
Day two was all about mapping, from mapping best practices to mapping formats and shapes. Students were first shown different types of maps and the design styles suited to the purpose of each. Best practices for mapping and visualization were then covered, along with the purpose each technique serves. By the end, participants could tell dot maps from choropleth maps, as well as many others. Then they used Twitter data that contained geolocations to contrast varying tweeting zones by placing these tweets at their origins on CartoDB. Similarly, they created other maps using QGIS and TileMill. The mapping exercises were really fun and students were very happy to create their own maps without a single line of code.
On the third day, Bahia gave a lecture on network analysis, covering some important mathematical notions needed for working with graphs as well as possible uses and case studies related to this field. Meanwhile, Ali presented different open data portals to provide the students with more resources and data sets. After these topics were covered, there was a technical demonstration of using a network analysis tool on two topics. Students first analyzed climate change; later, the AUB media group on Facebook was analyzed and its graph drawn. It was very cool to find out that one of the top influencers in that network was among the students taking the training. Students were also taught to do the same analysis for their own friends’ lists: Facebook data was collected and the visualizations were drawn in a network visualization tool.
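The post does not say which measure the workshop used to rank influencers. One common and simple choice is degree centrality, the share of other members a person is directly connected to. The sketch below assumes that measure and uses invented names and connections; `degree_centrality` and `top_influencers` are hypothetical helpers.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Share of the other nodes each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def top_influencers(edges, k=3):
    """Names of the k best-connected nodes, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Invented undirected connections between four made-up group members.
edges = [("Ana", "Ben"), ("Ana", "Cem"), ("Ana", "Dia"), ("Ben", "Cem")]
scores = degree_centrality(edges)
print(top_influencers(edges, k=2))
```

Other measures, such as betweenness or eigenvector centrality, can rank the same network differently, so the choice of measure is part of the analysis.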
After completing the interactive types of visualizations, the fourth day was about static ones, mainly infographics. Each student had the chance to extract the information needed for an interesting topic and transform it into a visual piece. Bahia worked with the students, teaching them how to refine the data so that it becomes simple and short, and thus usable for building the infographic design. Later, Yousif, a senior creative designer at Data Aurora, trained the students on the use of Photoshop and Illustrator, two of the tools commonly used by infographic designers. At the end of the session, each student submitted a well-made infographic, some of which are posted below.
After the workshop, Zayna had short conversations with the students to get their feedback. Here are some of their opinions:
“It should be a full course, the performance and content was good but at some point, some data journalism tools need to be more mature and user-friendly to reduce the time needed to create a story,” said Jad Melki, Director of media studies program at AUB, “it was great overall.”
“It’s really good but the technical parts need a lot of time. We learned about new apps. Mapping, definitely I will try to learn more about it,” said Carla Sertin, a media student.
“It was great we got introduced to new stuff. Mapping, I loved it and found it very useful for me,” said Ellen Francis, civil engineering student. “The workshop was a motivation for me to work more on this,” she added, “it would work as a one semester long course.”
Azza El Masri, a media student, is interested in doing an MA in data journalism. “I like it. I expected it to be a bit harder; I would prefer more advanced stuff in scraping,” she said.
Last November, the Open Government Partnership (OGP) Summit took place in Latin America. CSO participants from 18 countries got together to share and exchange in an “unconference” where many topics were discussed. It was really interesting to learn about the ways data is handled in different countries, and to pinpoint the similarities and differences between our contexts.
After a few words from the President of Costa Rica and other government representatives, a series of talks and roundtables began… And then, in parallel, Antonio (School of Data fellow in Peru) and I started a datathon.
In this datathon, our task was to give training and support to the five teams asking questions of the dataset on the commitments of the OGP countries, which can be found here → Action Plan Commitments and IRM Data, http://goo.gl/yZmcKC, http://goo.gl/vLgYWj
The first step was to approach the data and structure it. After this, it was time to pose the questions we wanted to answer through the analysis of this data, and a lot of great questions (and interesting purposes) arose – many more than time allowed us to develop further. Teams picked the topics that seemed most relevant to them.
Teams were already working on their analysis at 9 sharp the following morning, while OGP San Jose sessions were taking place. The datathon participants looked for more data, did cross-comparisons, scraping, etc. By noon, they had found results and answers – it was time to start working to present them in visualizations, infographics, maps, articles, etc. At 3PM, the teams impressed us with their presentations, and showed us the following outcomes: http://ogpcr.hackdash.org
At the end of the day, the jury chose teams InfografiaFeliz and Accesa as winners (which earned them a prize in cash).
This was the first data expedition in Costa Rica, and you can find more in the following links: https://www.facebook.com/ogpsanjose, https://twitter.com/OGPSanJose, https://www.flickr.com/photos/ogpsanjose , http://grupoincocr.com/open-data/miembros-de-grupo-inco-ganan-la-primera-expedicion-de-datos-en-costa-rica
What I take away from my experience in this expedition is that people are always willing to learn and create, but not everyone is aware of what open data is, or how it can be useful for them. Initiatives of this sort are achieving their mission, but are insufficient – and that’s why we need to keep in touch with the participants and encourage them to share their experiences, and, why not: to replicate these initiatives.
Here are some tips for people with an interest in running data expeditions:
Last week I attended the 7th annual Winter School at Amsterdam University. Run by the Digital Methods Initiative, it took the form of a data sprint in which students joined professional developers and designers to answer research questions using social media data.
The DMI group at Amsterdam have developed and collated a suite of easy-to-use tools specifically for this kind of research. They are well worth checking out for anyone interested in this field and they cover a range of techniques from web scraping to list triangulation, and can be found online here.
I joined a group looking at bias across three APIs through which you can acquire Twitter data: the Search API, the Stream API and the proprietary Firehose endpoint – generally regarded as the most complete source of Twitter data. We had three sets captured from the three separate APIs for a critical period between 7th and 15th October 2014 when the Hong Kong protests were taking place.
Other groups took on a range of tasks from mapping the open data revolution to tracking the global climate change debate. All projects deployed a range of data wrangling techniques to investigate these complex social, political and cultural phenomena.
A few things I learned:
For those interested in attending a DMI school in the future – take a look at the summer school coming later in 2015.
I had the pleasure of running a couple of School of Data related sessions, too – one short skillshare running through the ‘data pipeline’, and a longer session building out a data pipeline for ‘follow the money’ work, focused mainly on gathering various data sources on topics in this field. The pipeline, in its rough format, is online here, and I’ll publish it in a more accessible format on the School of Data site soon too.
These sessions made me think about how data literacy skills could be developed within this community, and what is really needed to support and further the work of Follow the Money initiatives. Pragmatically speaking, for technology and data to be engaged and used successfully to further people’s work, not everyone in that room needs to be a superstar data wrangler or developer. What they do need, though, is to know where the people with technical expertise are, and to be able to ask them for assistance.
In the ‘thanks’ at the end of the workshop, lots of us mentioned the value of being in a space where, as our facilitator Allen Gunn said, ‘asking a question is considered to be a heroic act of leadership’ rather than a signal of a lack of knowledge. It was obvious that we valued most the patience and understanding of those around us who have higher levels of knowledge in a certain field, be that topical expertise, or technical; and that for many, the opportunity to ask these technical questions comes far too rarely.
This made me think about the value of the School of Data community – in my follow up emails from the workshop, I’ve been connecting people from various countries and contexts to former fellows who are based near them, or people running local groups in neighbouring countries, who can help them in person as well as online with their data-related queries. From past experience of seeing how well our data trainers and community members work with civil society groups with lower levels of data literacy, I’m optimistic that this will work out well – whether it be simply exchanging a few emails, or working with the community members or us at School of Data central to commission actual in person trainings.
As I mentioned, these connections provide a somewhat pragmatic solution to a need for better use of data among the community. Ideally, however, we would have people based within these organisations for long term support, who have both topical expertise and data wrangling skills.
And from what I heard, the need for this skillset will become extremely pronounced in the coming years; various directives and new laws regarding data availability and transparency sitting at different points of the money trail will be coming into force over the next couple of years, and they will bring with them a deluge of data: for example, data on extractives following Section 1504 of the Dodd-Frank Act, and company data following the EU Accounting and Transparency directives. What stories lie within that data, and how can we uncover them?
Many of the people and organisations represented at the Follow the Money workshop have been instrumental in campaigning for those transparency directives; but how many of those organisations possess the in-house ability to actually process and use that data? Effectively, the next round of campaigning should be based on stories that come out of that hard-fought-for data – but for that to happen, we need to start preparing now, by building data and technical skills among our communities.
So, how can we start doing this? It could be through providing support for current employees of organisations to attend data expeditions or data skills courses on an ongoing basis; not just one off workshops, but people learning skills that are clearly relevant to their work, and having regular refresher courses to keep it relevant and in their minds. Or, (apologies for the blatant self-promotion here!) – it could be through supporting topical School of Data fellows to be based within the community and provide ongoing support, focusing on a specific topic – like extractives, or corporate money flows, for example.
Our experiences from the 2014 fellowships have led us to believe that the fellowship scheme is a sustainable and successful method of building up capacity both in terms of finding and supporting data storytellers and trainers (the Fellows), and equipping them with the skills they need to provide ongoing support to organisations based in their area, with whom they share their skills. Last year, the fellows carried out activities ranging from regular workshops with local organisations, to data clinics and expeditions for newcomers to get hands on with data, to simply being present within organisations as in-house support.
From what I saw last week, a lot of organisations within the Follow the Money network could do with this support. The earlier we start developing this capacity, the better equipped we will be as a community to start delving into the avalanche of data that is soon to come our way.
If you want to find out more about the Fellowship scheme, see the section ‘Fellowship Programme’ in our 2014 Annual Report, and if you’d like to talk about supporting a fellow through our upcoming 2015 scheme, get in touch with me on zara.rahman [at] okfn.org