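Applied Mathematics
Topic: Python as a Data Analysis Tool
- Python basics: Python, IPython, Jupyter, NumPy
- Loading and writing data: text documents, web documents, Excel, databases
- Data wrangling: handling missing entries and malformed formats
- Data analysis: grouping, group-wise computation
- Data visualization: bar charts, line charts, box plots, bubble plots, pie charts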
Textbook
- Python for Data Analysis, 2nd Edition, by William Wesley McKinney,
Publisher: O’Reilly Media, Inc. Pub. Date: October 10, 2017 (Korea University
Library online book)
Contents
- Preliminaries
- 1.1 What Is This Book About?
- 1.2 Why Python for Data Analysis?
- 1.3 Essential Python Libraries
- NumPy
- pandas
- matplotlib
- IPython and Jupyter
- SciPy
- scikit-learn
- statsmodels
- 1.4 Installation and Setup
- 1.5 Community and Conferences
- 1.6 Navigating This Book
- Python Language Basics, IPython, and Jupyter Notebooks
- 2.1 The Python Interpreter
- 2.2 IPython Basics
- Running the IPython Shell
- Running the Jupyter Notebook
- Tab Completion
- Introspection
- The %run Command
- Executing Code from the Clipboard
- Terminal Keyboard Shortcuts
- About Magic Commands
- Matplotlib Integration
- 2.3 Python Language Basics
- Language Semantics
- Scalar Types
- Control Flow
- Built-in Data Structures, Functions, and Files
- 3.1 Data Structures and Sequences
- Tuple
- List
- Built-in Sequence Functions
- dict
- set
- List, Set, and Dict Comprehensions
- 3.2 Functions
- Namespaces, Scope, and Local Functions
- Returning Multiple Values
- Functions Are Objects
- Anonymous (Lambda) Functions
- Currying: Partial Argument Application
- Generators
- Errors and Exception Handling
- 3.3 Files and the Operating System
- 3.4 Conclusion
- NumPy Basics: Arrays and Vectorized Computation
- 4.1 The NumPy ndarray: A Multidimensional Array Object
- Creating ndarrays
- Data Types for ndarrays
- Arithmetic with NumPy Arrays
- Basic Indexing and Slicing
- Boolean Indexing
- Fancy Indexing
- Transposing Arrays and Swapping Axes
- 4.2 Universal Functions: Fast Element-Wise Array Functions
- 4.3 Array-Oriented Programming with Arrays
- Expressing Conditional Logic as Array Operations
- Mathematical and Statistical Methods
- Methods for Boolean Arrays
- Sorting
- Unique and Other Set Logic
- 4.4 File Input and Output with Arrays
- 4.5 Linear Algebra
- 4.6 Pseudorandom Number Generation
- 4.7 Example: Random Walks
- 4.8 Conclusion
- Getting Started with pandas
- 5.1 Introduction to pandas Data Structures
- Series
- DataFrame
- Index Objects
- 5.2 Essential Functionality
- Reindexing
- Dropping Entries from an Axis
- Indexing, Selection, and Filtering
- Integer Indexes
- Arithmetic and Data Alignment
- Function Application and Mapping
- Sorting and Ranking
- Axis Indexes with Duplicate Labels
- 5.3 Summarizing and Computing Descriptive Statistics
- 5.4 Conclusion
- Data Loading, Storage, and File Formats
- 6.1 Reading and Writing Data in Text Format
- Reading Text Files in Pieces
- Writing Data to Text Format
- Working with Delimited Formats
- JSON Data
- XML and HTML: Web Scraping
- 6.2 Binary Data Formats
- 6.3 Interacting with Web APIs
- 6.4 Interacting with Databases
- 6.5 Conclusion
- Data Cleaning and Preparation
- 7.1 Handling Missing Data
- 7.2 Data Transformation
- Removing Duplicates
- Transforming Data Using a Function or Mapping
- Replacing Values
- Renaming Axis Indexes
- Discretization and Binning
- Detecting and Filtering Outliers
- Permutation and Random Sampling
- Computing Indicator/Dummy Variables
- 7.3 String Manipulation
- String Object Methods
- Regular Expressions
- Vectorized String Functions in pandas
- 7.4 Conclusion
- Data Wrangling: Join, Combine, and Reshape
- 8.1 Hierarchical Indexing
- 8.2 Combining and Merging Datasets
- Database-Style DataFrame Joins
- Merging on Index
- Concatenating Along an Axis
- Combining Data with Overlap
- 8.3 Reshaping and Pivoting
- Reshaping with Hierarchical Indexing
- Pivoting “Long” to “Wide” Format
- Pivoting “Wide” to “Long” Format
- 8.4 Conclusion
- Plotting and Visualization
- 9.1 A Brief matplotlib API Primer
- Figures and Subplots
- Colors, Markers, and Line Styles
- Ticks, Labels, and Legends
- Annotations and Drawing on a Subplot
- Saving Plots to File
- matplotlib Configuration
- 9.2 Plotting with pandas and seaborn
- Line Plots
- Bar Plots
- Histograms and Density Plots
- Scatter or Point Plots
- Facet Grids and Categorical Data
- 9.3 Other Python Visualization Tools
- 9.4 Conclusion
- Data Aggregation and Group Operations
- 10.1 GroupBy Mechanics
- 10.2 Data Aggregation
- 10.3 Apply: General split-apply-combine
- Suppressing the Group Keys
- Quantile and Bucket Analysis
- Example: Filling Missing Values with Group-Specific Values
- Example: Random Sampling and Permutation
- Example: Group Weighted Average and Correlation
- Example: Group-Wise Linear Regression
- 10.4 Pivot Tables and Cross-Tabulation
- 10.5 Conclusion
- Time Series
- 11.1 Date and Time Data Types and Tools
- 11.2 Time Series Basics
- 11.3 Date Ranges, Frequencies, and Shifting
- Generating Date Ranges
- Frequencies and Date Offsets
- Shifting (Leading and Lagging) Data
- 11.4 Time Zone Handling
- 11.5 Periods and Period Arithmetic
- Period Frequency Conversion
- Quarterly Period Frequencies
- Converting Timestamps to Periods (and Back)
- Creating a PeriodIndex from Arrays
- 11.6 Resampling and Frequency Conversion
- Downsampling
- Upsampling and Interpolation
- Resampling with Periods
- 11.7 Moving Window Functions
- 11.8 Conclusion
- Advanced pandas
- 12.1 Categorical Data
- Background and Motivation
- Categorical Type in pandas
- Computations with Categoricals
- Categorical Methods
- 12.2 Advanced GroupBy Use
- 12.3 Techniques for Method Chaining
- 12.4 Conclusion
- Introduction to Modeling Libraries in Python
- 13.1 Interfacing Between pandas and Model Code
- 13.2 Creating Model Descriptions with Patsy
- 13.3 Introduction to statsmodels
- 13.4 Introduction to scikit-learn
- 13.5 Continuing Your Education
- Data Analysis Examples
- 14.1 1.USA.gov Data from Bitly
- Counting Time Zones in Pure Python
- Counting Time Zones with pandas
- 14.2 MovieLens 1M Dataset
- 14.3 US Baby Names 1880–2010
- 14.4 USDA Food Database
- 14.5 2012 Federal Election Commission Database
- Donation Statistics by Occupation and Employer
- Bucketing Donation Amounts
- Donation Statistics by State
- 14.6 Conclusion
- Advanced NumPy
- A.1 ndarray Object Internals
- A.2 Advanced Array Manipulation
- Reshaping Arrays
- C Versus Fortran Order
- Concatenating and Splitting Arrays
- Repeating Elements: tile and repeat
- Fancy Indexing Equivalents: take and put
- A.3 Broadcasting
- A.4 Advanced ufunc Usage
- A.5 Structured and Record Arrays
- A.6 More About Sorting
- A.7 Writing Fast NumPy Functions with Numba
- A.8 Advanced Array Input and Output
- A.9 Performance Tips
- More on the IPython System
- B.1 Using the Command History
- B.2 Interacting with the Operating System
- B.3 Software Development Tools
- Interactive Debugger
- Timing Code: %time and %timeit
- Basic Profiling: %prun and %run -p
- Profiling a Function Line by Line
- B.4 Tips for Productive Code Development Using IPython
- B.5 Advanced IPython Features
- B.6 Conclusion
Reference Books
- Official Python Tutorial
- Python Cookbook, Third Edition, by David Beazley and Brian K. Jones
(O’Reilly)
- Fluent Python by Luciano Ramalho (O’Reilly)
- Effective Python by Brett Slatkin (Pearson)
- Pandas for Everyone: Python Data Analysis, First Edition, by Daniel
Y. Chen
- `Object-Oriented Programming in
Python <http://python-textbok.readthedocs.io/en/1.0/index.html>`__