Main Objectives of the workshop:
1) What is data analysis? 2) How to approach a data analysis problem? 3) What are the various questions one can ask given data? 4) How to perform data analysis and gain insights?
By the end of the workshop, attendees should be able to take a dataset and confidently analyse it in a scientific manner.
The workshop follows these steps to show how data analysis/data analytics is done:
Making the case.
Acquire - "Data is the new oil"
Gathered manually and recorded
Refine - "Data is messy"
Missing, e.g. check for missing or incomplete data
Summary, e.g. show summary statistics such as the mean
Explore - "I don't know what I don't know"
Why do visual exploration?
Explore multi-variable graphs
Model - "All models are wrong, some of them are useful"
The power and limits of models
Classification model
Insight - "The goal is to turn data into insight"
Why do we need to communicate insight?
Types of Data Analysis/Analytics Questions:
In this workshop, we take a dataset and go through many of the steps listed above.
Instead of introducing libraries, our approach is to introduce the problem and provide the approach on how to solve it. As we go about solving the problem(s), we will introduce a number of libraries commonly used in the Python data stack (numpy, pandas, matplotlib, seaborn, scipy, etc.).
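To give a flavour of the Refine step in the outline (checking for missing data and showing summary statistics), here is a minimal pandas sketch. The small table, its column names and its values are invented purely for illustration; they are not the workshop dataset, and the workshop itself may approach this differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are made up for
# illustration and are not taken from the workshop dataset.
df = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "price": [10.0, np.nan, 14.0, 12.0],
    "quantity": [3, 5, np.nan, 4],
})

# Refine: count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Refine: summary statistics for the numeric columns.
print(df.describe())

# The mean skips missing values by default, so it is computed
# from the three prices that are present.
mean_price = df["price"].mean()
print("mean price:", mean_price)
```

Counting missing values before computing summaries is a deliberate ordering: it shows how many observations each statistic is actually based on.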
The repository for the talk is https://github.com/amitkaps/weed.
The repository has instructions on which packages to install and the data we will use in the workshop.
From our prior experience, attendees have been able to install the requirements on Windows, Linux and Mac without any issue.
We strongly advise attendees to install the requirements prior to the workshop. If faced with a bug or challenge, please submit an issue on the repo.