Thursday 9 a.m.–12:30 p.m.

Introduction to data processing with Python

Ivan Zimine

Audience level:
All
Room:
Smart Lecture Theatre L5-W2

Description

This tutorial is for people eager to get started with data analysis and visualisation using Python. It will cover NumPy, pandas, matplotlib and the IPython Notebook.

Abstract

Data processing skills are highly valuable and in great demand across many industries. Python has steadily gained attention as a language of choice for scientific, engineering, financial and, more generally, data-intensive applications. This is due to a powerful combination: the clarity and readability of Python as a programming language, very efficient numerical and visualisation libraries (NumPy/SciPy, pandas, matplotlib and the libraries built on top of them) and a versatile interactive environment called IPython.

The tutorial is organised as follows:

  • IPython: shell and notebook
  • NumPy, SciPy, pylab: who is what?
  • NumPy arrays: shapes, dimensions, slicing, "array math"
  • Basic plotting with matplotlib
  • Array-oriented programming: look ma, no loops
  • Pandas: data IO and the magic of data labelling
  • Overview of SciPy, Statsmodels, scikit-image
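As a rough taste of the NumPy topics in the outline (shapes, dimensions, slicing, array math and array-oriented programming), here is a minimal sketch. It is not taken from the tutorial material, and the function names loop_sum_of_squares and vectorised_sum_of_squares are hypothetical. The sketch contrasts an explicit Python loop with the equivalent vectorised expression.

```python
import numpy as np

def loop_sum_of_squares(values):
    # Plain-Python version: one interpreted iteration per element
    total = 0.0
    for v in values:
        total += v * v
    return total

def vectorised_sum_of_squares(arr):
    # Array-oriented version: element-wise multiply, then a single reduction
    return float(np.sum(arr * arr))

a = np.arange(12, dtype=float).reshape(3, 4)  # 12 values laid out as 3 rows x 4 columns
print(a.shape, a.ndim)       # (3, 4) 2
print(a[1, :])               # second row: [4. 5. 6. 7.]
print(a[:, 0])               # first column: [0. 4. 8.]
col_means = a.mean(axis=0)   # mean of each column, no explicit loop
print(col_means)             # [4. 5. 6. 7.]
print(loop_sum_of_squares(a.ravel()) == vectorised_sum_of_squares(a))  # True
```

Both functions return 506.0 for this array. The vectorised version moves the loop into NumPy's compiled code, which is the idea behind array-oriented programming.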

Attendees are expected to have basic knowledge of Python. Ideally, NumPy, pandas, matplotlib and IPython should be installed in advance; the easiest way these days is the Anaconda distribution [1]. If you're on OS X and get a "locale" error (something like ValueError: unknown locale: UTF-8), see [2].
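One way to confirm the installation before the session is a small script like the one below. It is a minimal sketch and not part of the official setup instructions: check_packages is a hypothetical helper, and it assumes that each package either imports cleanly or raises ImportError. It reports each package's __version__ when the package exposes one.

```python
import importlib

def check_packages(names):
    """Return {package name: version string, or None if it cannot be imported}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed, or not importable in this environment
            continue
        # Most scientific packages expose __version__; fall back if one does not
        report[name] = getattr(module, "__version__", "unknown")
    return report

if __name__ == "__main__":
    for name, version in check_packages(["numpy", "pandas", "matplotlib", "IPython"]).items():
        print(f"{name:12s} {version or 'MISSING'}")
```

Any line reading MISSING points to a package that still needs to be installed, for example through Anaconda.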

[1] http://continuum.io/downloads.html
[2] http://stackoverflow.com/questions/15526996/ipython-notebook-locale-error