Data Analysis and Machine Learning with Python
  • Video Length: 18h17m36s
  • Number of Tasks: 113
  • Students Enrolled: 111
  • Level: Medium
  • Curriculum
  • 1. Introduction
    • The tasks to do in this course (11m26s)
    • Install Development Environment (11m26s)
  • 2. Interactive Computing with IPython
    • IPython Basics (11m26s)
    • The commands in IPython (11m26s)
    • Interacting with the OS (11m26s)
    • Debug with pdb (11m26s)
    • Advanced IPython Features (11m26s)
  • 3. Arrays and Vectorized Computation
    • Introduction to NumPy (11m26s)
    • Multidimensional Array Object (11m26s)
    • Fast Element-wise Array Functions (11m26s)
    • Data Processing Using Arrays (11m26s)
    • File Input and Output with Arrays (11m26s)
    • Linear Algebra (11m26s)
    • Random Number Generation (11m26s)
  • 4. Data Analysis with pandas
    • Introduction to pandas Data Structures (11m26s)
    • Essential Functionality (11m26s)
    • Summarizing and Computing Descriptive Statistics (11m26s)
    • Handling Missing Data (11m26s)
    • Hierarchical Indexing (11m26s)
    • Advanced pandas (11m26s)
  • 5. Data Loading, Storage, and File Formats
    • Reading and Writing Data in Text Format (11m26s)
    • Binary Data Formats (11m26s)
    • Interacting with HTML and Web APIs (11m26s)
    • Interacting with Databases (11m26s)
  • 6. Data Wrangling
    • Combining and Merging Data Sets (11m26s)
    • Reshaping and Pivoting (11m26s)
    • Data Transformation (11m26s)
    • String Manipulation (11m26s)
  • 7. Plotting and Visualization
    • Matplotlib APIs (11m26s)
    • Plotting Functions in pandas (11m26s)
    • Example: Visualizing Earthquake Crisis Data (11m26s)
    • Visualization Tool Ecosystem (11m26s)
  • 8. Data Aggregation and Group Operations
    • GroupBy Mechanics (11m26s)
    • Data Aggregation (11m26s)
    • Group-wise Operations and Transformations (11m26s)
    • Pivot Tables and Cross-Tabulation (11m26s)
  • 9. Time Series
    • Date and Time Data Types (11m26s)
    • Time Series Basics (11m26s)
    • Date Ranges, Frequencies, and Shifting (11m26s)
    • Time Zone Handling (11m26s)
    • Periods and Period Arithmetic (11m26s)
    • Resampling and Frequency Conversion (11m26s)
    • Time Series Plotting (11m26s)
    • Moving Window Functions (11m26s)
    • Performance and Memory Usage (11m26s)
  • 10. Financial and Economic Data
    • Time Series and Cross-Section Alignment (11m26s)
    • Operations with Time Series of Different Frequencies (11m26s)
    • Time of Day and Data Selection (11m26s)
    • Splicing Together Data Sources (11m26s)
    • Return Indexes and Cumulative Returns (11m26s)
    • Group Transforms and Analysis (11m26s)
  • 11. Advanced NumPy
    • ndarray Object Internals (11m26s)
    • Advanced Array Manipulation (11m26s)
    • Broadcasting (11m26s)
    • Structured and Record Arrays (11m26s)
    • Sorting (11m26s)
    • NumPy Matrix Class (11m26s)
    • Advanced Array I/O (11m26s)
  • 12. Big Data with Python
    • Introducing big data (11m26s)
    • Hadoop for big data (11m26s)
    • Apache Hadoop (11m26s)
    • Example in Hadoop (11m26s)
    • Hadoop for finance (11m26s)
    • Introducing NoSQL (11m26s)
    • MongoDB and PyMongo (11m26s)
  • 13. Getting Started with Python Machine Learning
    • Machine learning and Python (11m26s)
    • A simple machine learning example (11m26s)
    • Linear regression algorithm (11m26s)
    • Training a linear regression model (11m26s)
    • Recursive polynomial algorithm (11m26s)
    • Training a recursive polynomial model (11m26s)
    • Support Vector Machine Regression (11m26s)
    • The Decision Tree Algorithm (11m26s)
    • Random forest algorithm (11m26s)
  • 14. Classification in Machine Learning
    • Logistic Regression (11m26s)
    • K-Nearest Neighbor Classifier (11m26s)
    • Support Vector Machine (11m26s)
    • Kernel Support Vector Machine (11m26s)
    • Naive Bayes Classifier (11m26s)
    • Tree-Based Algorithms (11m26s)
    • Random Forest Classifier (11m26s)
    • K-means clustering (11m26s)
    • Hierarchical Clustering in Python (11m26s)
  • 15. Artificial Neural Networks
    • Introduction to ANN (11m26s)
    • Mathematical basis of ANN (11m26s)
    • Perceptron neural network (11m26s)
    • The Backpropagation Algorithm (11m26s)
    • Building an ANN (11m26s)
    • Training an ANN (11m26s)
  • 16. TensorFlow Framework
    • Introduction to TensorFlow (11m26s)
    • TensorFlow APIs (11m26s)
    • Building an ANN with TensorFlow (11m26s)
    • Training an ANN with TensorFlow (11m26s)
  • 17. Practical Projects
    • Handwriting Recognition with Python (11m26s)
    • Image Recognition with Python (11m26s)
    • Natural Language Processing with Python (11m26s)
Authors

Kevin Gautama is a systems design and programming engineer with 16 years of experience in electrical engineering, electronics, and information technology.

He taught at the Hanoi University of Industry from 2003 to 2011 and holds a vocational training certificate issued by the Ministry of Industry and Commerce and the Hanoi University of Industry.

Drawing on extensive design experience from numerous engineering projects, the author founded the Enziin Academy.

The Enziin Academy is an education startup whose core goal is to train design engineers in technology-related fields.

The Enziin Academy is headquartered in Stockholm, Sweden, and operates as a multilingual, global organization.

The author's skills in IT:

  • Implementing application infrastructure on Amazon's cloud computing platform
  • Linux server system administration (sysadmin)
  • Designing load balancing and content distribution systems
  • MySQL database administration
  • C/C++/C# programming
  • Ruby and Ruby on Rails programming
  • Python and Django programming
  • WPF/C# programming on the .NET Framework
  • PHP/Java programming
  • Machine learning and expert systems
  • Internet of Things

The author's skills in electrical and electronic engineering:

  • Design of popular CPU/MCU systems
  • FPGA/CPLD system design (Xilinx, Altera)
  • Design and programming of DSP systems (Texas Instruments)
  • Embedded ARM system design
  • RTOS programming
  • Design and programming of power electronic systems
  • Industrial PLCs, inverters, sensors, and electrical control cabinets
  • Distributed control systems connected to servers




Python for Data Analysis

For many people, the Python language is easy to fall in love with. Since its first appearance in 1991, Python has become one of the most popular dynamic programming languages, along with Perl, Ruby, and others.

Python and Ruby have become especially popular in recent years for building websites using their numerous web frameworks, such as Rails (Ruby) and Django (Python). Such languages are often called scripting languages, as they can be used to write quick-and-dirty small programs, or scripts.

I don’t like the term “scripting language”, as it carries a connotation that such languages cannot be used for building mission-critical software. Among interpreted languages, Python is distinguished by its large and active scientific computing community.

Adoption of Python for scientific computing in both industry applications and academic research has increased significantly since the early 2000s.

For data analysis and interactive, exploratory computing and data visualization, Python will inevitably draw comparisons with the many other domain-specific open source and commercial programming languages and tools in wide use, such as R, MATLAB, SAS, Stata, and others. In recent years, Python’s improved library support (primarily pandas) has made it a strong alternative for data manipulation tasks.

This library support, combined with Python’s strength in general-purpose programming, makes Python an excellent choice as a single language for building data-centric applications.
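As a brief illustration of the kind of data manipulation credited to pandas above, the sketch below derives a new column, filters rows, and aggregates by group in a few lines. It is a minimal example, not course material: the `sales` table, its columns, and its values are made-up sample data chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales records; any tabular data set would work the same way
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "product": ["A", "A", "B", "B", "B", "A"],
    "units": [10, 4, 7, 3, 8, 5],
    "price": [2.5, 2.5, 4.0, 4.0, 4.0, 2.5],
})

# Derive a new column with vectorized, element-wise arithmetic
sales["revenue"] = sales["units"] * sales["price"]

# Keep only orders of at least 5 units, then total the revenue per region
big_orders = sales[sales["units"] >= 5]
revenue_by_region = big_orders.groupby("region")["revenue"].sum()

print(revenue_by_region)
```

By default, `groupby` sorts the group keys, so the result is indexed East, North, South, with totals of 12.5, 53.0, and 32.0.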

Python for Machine Learning

Machine learning (ML) teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details, and that is most likely the reason you are interested in this course. Maybe you have too much data and too little insight, and you hope that machine learning algorithms will help you solve this challenge.

So you started to dig into random algorithms. But after some time you were puzzled: which of the myriad of algorithms should you actually choose? Or maybe you are broadly interested in machine learning and have been reading a few blogs and articles about it for some time.

The goal of machine learning is to teach machines to carry out tasks by providing them with examples of how to do, or not do, a task. Let us assume that each morning when you turn on your computer, you perform the same task of moving e-mails around so that only e-mails belonging to a particular topic end up in the same folder.

After some time, you get bored and think about automating this chore. One way would be to start analyzing your brain and writing down all the rules your brain processes while you are shuffling your e-mails. However, this will be quite cumbersome and always imperfect.

You will miss some rules and over-specify others. A better and more future-proof way would be to automate this process by collecting pairs that combine each e-mail's meta information and body with the name of the folder it belongs in, and letting an algorithm come up with the best rule set.

The pairs would be your training data, and the resulting rule set (also called a model) could then be applied to future e-mails that you have not yet seen. This is machine learning in its simplest form. Of course, machine learning (often also referred to as data mining or predictive analysis) is not a brand new field in itself.

Quite the contrary: its success over recent years can be attributed to the pragmatic use of rock-solid techniques and insights from other successful fields, such as statistics. There, the purpose is for us humans to gain insight into the data by learning more about the underlying patterns and relationships.
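To make the e-mail sorting scenario concrete, here is a toy sketch of learning a "rule set" from (e-mail text, folder name) training pairs and applying it to e-mails it has not seen. The learner is a from-scratch multinomial naive Bayes classifier, chosen only because it fits in a few lines; it is not presented as the method used in the course. The functions `tokenize`, `train`, and `predict` and the sample e-mails are hypothetical illustrations, and the text-only features are a simplification of the meta information plus body described above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(pairs):
    """Learn word statistics from (e-mail text, folder name) training pairs.

    The returned model holds per-folder e-mail counts, per-folder word
    counts, and the vocabulary shared by all folders.
    """
    folder_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, folder in pairs:
        folder_docs[folder] += 1
        for word in tokenize(text):
            word_counts[folder][word] += 1
            vocabulary.add(word)
    return folder_docs, word_counts, vocabulary


def predict(model, text):
    """Return the folder with the highest naive Bayes log-probability for text."""
    folder_docs, word_counts, vocabulary = model
    total_docs = sum(folder_docs.values())
    best_folder, best_score = None, -math.inf
    for folder, n_docs in folder_docs.items():
        # Log prior: how often this folder was used in the training data
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[folder].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score
            count = word_counts[folder][word]
            score += math.log((count + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_folder, best_score = folder, score
    return best_folder


# Hypothetical training pairs: (e-mail text, folder it was filed under)
training_pairs = [
    ("quarterly invoice attached please pay", "finance"),
    ("payment reminder for your invoice", "finance"),
    ("team lunch on friday", "social"),
    ("birthday party this friday evening", "social"),
]

model = train(training_pairs)
print(predict(model, "invoice payment overdue"))  # finance
print(predict(model, "party on friday"))          # social
```

The "model" here is nothing more than word counts per folder, but it shows the shape of the workflow: training pairs go in, and a reusable rule for unseen e-mails comes out.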

As you read more and more about successful applications of machine learning (you have checked out kaggle.com already, haven't you?), you will see that applied statistics is a common field among machine learning experts. As you will see later, the process of coming up with a decent ML approach is never a waterfall-like process.

Instead, you will find yourself going back and forth in your analysis, trying out different versions of your input data on diverse sets of ML algorithms. It is this explorative nature that lends itself perfectly to Python. As an interpreted high-level programming language, Python may seem to have been designed specifically for the process of trying out different things.

What is more, it does this very fast. Sure enough, it is slower than C or similar statically typed programming languages; nevertheless, with a myriad of easy-to-use libraries that are often written in C, you don't have to sacrifice speed for agility.
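The point about C-backed libraries can be checked with a small sketch. It computes the same sum of squares twice: once with a pure-Python loop, where every operation goes through the interpreter, and once with NumPy, whose element-wise arithmetic runs in compiled code. The array size is an arbitrary choice, and timings vary by machine, so no specific numbers are claimed here.

```python
import time

import numpy as np

N = 1_000_000  # arbitrary problem size for the comparison

values = list(range(N))
array = np.arange(N, dtype=np.int64)

# Pure-Python loop: each multiplication and addition is interpreted
start = time.perf_counter()
total_loop = 0
for v in values:
    total_loop += v * v
loop_seconds = time.perf_counter() - start

# NumPy: the element-wise square and the sum run inside compiled code
start = time.perf_counter()
total_vectorized = int((array * array).sum())
numpy_seconds = time.perf_counter() - start

print(total_loop == total_vectorized)  # True
print(f"loop: {loop_seconds:.4f}s, NumPy: {numpy_seconds:.4f}s")
```

Both versions produce the same exact integer (the total, about 3.3e17, still fits in `int64`), so the only difference is how much work stays in the Python interpreter.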
