Applied Mathematics

Topic: Python as a Data Analysis Tool

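Python basics: Python, IPython, Jupyter, NumPy
Reading and writing data: text files, web documents, Excel, databases
Data cleaning: handling missing entries and incorrectly formatted values
Data analysis: grouping, per-group computation
Data visualization: bar charts, line charts, box plots, bubble plots, pie charts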

Textbook

Python for Data Analysis, 2nd Edition, by William Wesley McKinney. Publisher: O’Reilly Media, Inc. Pub. Date: October 10, 2017 (available as an online book through the Korea University Library)

Contents

Preliminaries
- 1.1 What Is This Book About?
- 1.2 Why Python for Data Analysis?
- 1.3 Essential Python Libraries
  - NumPy
  - pandas
  - matplotlib
  - IPython and Jupyter
  - SciPy
  - scikit-learn
  - statsmodels
- 1.4 Installation and Setup
- 1.5 Community and Conferences
- 1.6 Navigating This Book
Python Language Basics, IPython, and Jupyter Notebooks
- 2.1 The Python Interpreter
- 2.2 IPython Basics
  - Running the IPython Shell
  - Running the Jupyter Notebook
  - Tab Completion
  - Introspection
  - The %run Command
  - Executing Code from the Clipboard
  - Terminal Keyboard Shortcuts
  - About Magic Commands
  - Matplotlib Integration
- 2.3 Python Language Basics
  - Language Semantics
  - Scalar Types
  - Control Flow
Built-in Data Structures, Functions, and Files
- 3.1 Data Structures and Sequences
  - Tuple
  - List
  - Built-in Sequence Functions
  - dict
  - set
  - List, Set, and Dict Comprehensions
- 3.2 Functions
  - Namespaces, Scope, and Local Functions
  - Returning Multiple Values
  - Functions Are Objects
  - Anonymous (Lambda) Functions
  - Currying: Partial Argument Application
  - Generators
  - Errors and Exception Handling
- 3.3 Files and the Operating System
- 3.4 Conclusion
NumPy Basics: Arrays and Vectorized Computation
- 4.1 The NumPy ndarray: A Multidimensional Array Object
  - Creating ndarrays
  - Data Types for ndarrays
  - Arithmetic with NumPy Arrays
  - Basic Indexing and Slicing
  - Boolean Indexing
  - Fancy Indexing
  - Transposing Arrays and Swapping Axes
- 4.2 Universal Functions: Fast Element-Wise Array Functions
- 4.3 Array-Oriented Programming with Arrays
  - Expressing Conditional Logic as Array Operations
  - Mathematical and Statistical Methods
  - Methods for Boolean Arrays
  - Sorting
  - Unique and Other Set Logic
- 4.4 File Input and Output with Arrays
- 4.5 Linear Algebra
- 4.6 Pseudorandom Number Generation
- 4.7 Example: Random Walks
- 4.8 Conclusion
Getting Started with pandas
- 5.1 Introduction to pandas Data Structures
  - Series
  - DataFrame
  - Index Objects
- 5.2 Essential Functionality
  - Reindexing
  - Dropping Entries from an Axis
  - Indexing, Selection, and Filtering
  - Integer Indexes
  - Arithmetic and Data Alignment
  - Function Application and Mapping
  - Sorting and Ranking
  - Axis Indexes with Duplicate Labels
- 5.3 Summarizing and Computing Descriptive Statistics
- 5.4 Conclusion
Data Loading, Storage, and File Formats
- 6.1 Reading and Writing Data in Text Format
  - Reading Text Files in Pieces
  - Writing Data to Text Format
  - Working with Delimited Formats
  - JSON Data
  - XML and HTML: Web Scraping
- 6.2 Binary Data Formats
- 6.3 Interacting with Web APIs
- 6.4 Interacting with Databases
- 6.5 Conclusion
Data Cleaning and Preparation
- 7.1 Handling Missing Data
- 7.2 Data Transformation
  - Removing Duplicates
  - Transforming Data Using a Function or Mapping
  - Replacing Values
  - Renaming Axis Indexes
  - Discretization and Binning
  - Detecting and Filtering Outliers
  - Permutation and Random Sampling
  - Computing Indicator/Dummy Variables
- 7.3 String Manipulation
  - String Object Methods
  - Regular Expressions
  - Vectorized String Functions in pandas
- 7.4 Conclusion
Data Wrangling: Join, Combine, and Reshape
- 8.1 Hierarchical Indexing
- 8.2 Combining and Merging Datasets
  - Database-Style DataFrame Joins
  - Merging on Index
  - Concatenating Along an Axis
  - Combining Data with Overlap
- 8.3 Reshaping and Pivoting
  - Reshaping with Hierarchical Indexing
  - Pivoting “Long” to “Wide” Format
  - Pivoting “Wide” to “Long” Format
- 8.4 Conclusion
Plotting and Visualization
- 9.1 A Brief matplotlib API Primer
  - Figures and Subplots
  - Colors, Markers, and Line Styles
  - Ticks, Labels, and Legends
  - Annotations and Drawing on a Subplot
  - Saving Plots to File
  - matplotlib Configuration
- 9.2 Plotting with pandas and seaborn
  - Line Plots
  - Bar Plots
  - Histograms and Density Plots
  - Scatter or Point Plots
  - Facet Grids and Categorical Data
- 9.3 Other Python Visualization Tools
- 9.4 Conclusion
Data Aggregation and Group Operations
- 10.1 GroupBy Mechanics
- 10.2 Data Aggregation
- 10.3 Apply: General split-apply-combine
  - Suppressing the Group Keys
  - Quantile and Bucket Analysis
  - Example: Filling Missing Values with Group-Specific Values
  - Example: Random Sampling and Permutation
  - Example: Group Weighted Average and Correlation
  - Example: Group-Wise Linear Regression
- 10.4 Pivot Tables and Cross-Tabulation
- 10.5 Conclusion
Time Series
- 11.1 Date and Time Data Types and Tools
- 11.2 Time Series Basics
- 11.3 Date Ranges, Frequencies, and Shifting
  - Generating Date Ranges
  - Frequencies and Date Offsets
  - Shifting (Leading and Lagging) Data
- 11.4 Time Zone Handling
- 11.5 Periods and Period Arithmetic
  - Period Frequency Conversion
  - Quarterly Period Frequencies
  - Converting Timestamps to Periods (and Back)
  - Creating a PeriodIndex from Arrays
- 11.6 Resampling and Frequency Conversion
  - Downsampling
  - Upsampling and Interpolation
  - Resampling with Periods
- 11.7 Moving Window Functions
- 11.8 Conclusion
Advanced pandas
- 12.1 Categorical Data
  - Background and Motivation
  - Categorical Type in pandas
  - Computations with Categoricals
  - Categorical Methods
- 12.2 Advanced GroupBy Use
- 12.3 Techniques for Method Chaining
- 12.4 Conclusion
Introduction to Modeling Libraries in Python
- 13.1 Interfacing Between pandas and Model Code
- 13.2 Creating Model Descriptions with Patsy
- 13.3 Introduction to statsmodels
- 13.4 Introduction to scikit-learn
- 13.5 Continuing Your Education
Data Analysis Examples
- 14.1 1.USA.gov Data from Bitly
  - Counting Time Zones in Pure Python
  - Counting Time Zones with pandas
- 14.2 MovieLens 1M Dataset
- 14.3 US Baby Names 1880–2010
  - Analyzing Naming Trends
- 14.4 USDA Food Database
- 14.5 2012 Federal Election Commission Database
  - Donation Statistics by Occupation and Employer
  - Bucketing Donation Amounts
  - Donation Statistics by State
- 14.6 Conclusion
Advanced NumPy
- A.1 ndarray Object Internals
- A.2 Advanced Array Manipulation
  - Reshaping Arrays
  - C Versus Fortran Order
  - Concatenating and Splitting Arrays
  - Repeating Elements: tile and repeat
  - Fancy Indexing Equivalents: take and put
- A.3 Broadcasting
- A.4 Advanced ufunc Usage
- A.5 Structured and Record Arrays
- A.6 More About Sorting
- A.7 Writing Fast NumPy Functions with Numba
- A.8 Advanced Array Input and Output
- A.9 Performance Tips
More on the IPython System
- B.1 Using the Command History
- B.2 Interacting with the Operating System
- B.3 Software Development Tools
  - Interactive Debugger
  - Timing Code: %time and %timeit
  - Basic Profiling: %prun and %run -p
  - Profiling a Function Line by Line
- B.4 Tips for Productive Code Development Using IPython
- B.5 Advanced IPython Features
- B.6 Conclusion

Reference Books

Official Python Tutorial
Python Cookbook, Third Edition, by David Beazley and Brian K. Jones (O’Reilly)
Fluent Python by Luciano Ramalho (O’Reilly)
Effective Python by Brett Slatkin (Pearson)
Pandas for Everyone: Python Data Analysis, First Edition, by Daniel Y. Chen
Object-Oriented Programming in Python (http://python-textbok.readthedocs.io/en/1.0/index.html)