Applied Mathematics

Topic: Python as a Data Analysis Tool

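  • Python basics: Python, IPython, Jupyter, NumPy
  • Reading and writing data: text files, web documents, Excel, databases
  • Data cleaning: handling missing entries and malformed values
  • Data analysis: grouping and per-group computation
  • Data visualization: bar charts, line charts, box plots, bubble plots, pie charts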

Textbook

  • Python for Data Analysis, 2nd Edition, by William Wesley McKinney. Publisher: O’Reilly Media, Inc. Pub. Date: October 10, 2017 (available as an online book through the Korea University Library)

Contents

  • Preliminaries
    • 1.1 What Is This Book About?
    • 1.2 Why Python for Data Analysis?
    • 1.3 Essential Python Libraries
      • NumPy
      • pandas
      • matplotlib
      • IPython and Jupyter
      • SciPy
      • scikit-learn
      • statsmodels
    • 1.4 Installation and Setup
    • 1.5 Community and Conferences
    • 1.6 Navigating This Book
  • Python Language Basics, IPython, and Jupyter Notebooks
    • 2.1 The Python Interpreter
    • 2.2 IPython Basics
      • Running the IPython Shell
      • Running the Jupyter Notebook
      • Tab Completion
      • Introspection
      • The %run Command
      • Executing Code from the Clipboard
      • Terminal Keyboard Shortcuts
      • About Magic Commands
      • Matplotlib Integration
    • 2.3 Python Language Basics
      • Language Semantics
      • Scalar Types
      • Control Flow
  • Built-in Data Structures, Functions, and Files
    • 3.1 Data Structures and Sequences
      • Tuple
      • List
      • Built-in Sequence Functions
      • dict
      • set
      • List, Set, and Dict Comprehensions
    • 3.2 Functions
      • Namespaces, Scope, and Local Functions
      • Returning Multiple Values
      • Functions Are Objects
      • Anonymous (Lambda) Functions
      • Currying: Partial Argument Application
      • Generators
      • Errors and Exception Handling
    • 3.3 Files and the Operating System
    • 3.4 Conclusion
  • NumPy Basics: Arrays and Vectorized Computation
    • 4.1 The NumPy ndarray: A Multidimensional Array Object
      • Creating ndarrays
      • Data Types for ndarrays
      • Arithmetic with NumPy Arrays
      • Basic Indexing and Slicing
      • Boolean Indexing
      • Fancy Indexing
      • Transposing Arrays and Swapping Axes
    • 4.2 Universal Functions: Fast Element-Wise Array Functions
    • 4.3 Array-Oriented Programming with Arrays
      • Expressing Conditional Logic as Array Operations
      • Mathematical and Statistical Methods
      • Methods for Boolean Arrays
      • Sorting
      • Unique and Other Set Logic
    • 4.4 File Input and Output with Arrays
    • 4.5 Linear Algebra
    • 4.6 Pseudorandom Number Generation
    • 4.7 Example: Random Walks
    • 4.8 Conclusion
  • Getting Started with pandas
    • 5.1 Introduction to pandas Data Structures
      • Series
      • DataFrame
      • Index Objects
    • 5.2 Essential Functionality
      • Reindexing
      • Dropping Entries from an Axis
      • Indexing, Selection, and Filtering
      • Integer Indexes
      • Arithmetic and Data Alignment
      • Function Application and Mapping
      • Sorting and Ranking
      • Axis Indexes with Duplicate Labels
    • 5.3 Summarizing and Computing Descriptive Statistics
    • 5.4 Conclusion
  • Data Loading, Storage, and File Formats
    • 6.1 Reading and Writing Data in Text Format
      • Reading Text Files in Pieces
      • Writing Data to Text Format
      • Working with Delimited Formats
      • JSON Data
      • XML and HTML: Web Scraping
    • 6.2 Binary Data Formats
    • 6.3 Interacting with Web APIs
    • 6.4 Interacting with Databases
    • 6.5 Conclusion
  • Data Cleaning and Preparation
    • 7.1 Handling Missing Data
    • 7.2 Data Transformation
      • Removing Duplicates
      • Transforming Data Using a Function or Mapping
      • Replacing Values
      • Renaming Axis Indexes
      • Discretization and Binning
      • Detecting and Filtering Outliers
      • Permutation and Random Sampling
      • Computing Indicator/Dummy Variables
    • 7.3 String Manipulation
      • String Object Methods
      • Regular Expressions
      • Vectorized String Functions in pandas
    • 7.4 Conclusion
  • Data Wrangling: Join, Combine, and Reshape
    • 8.1 Hierarchical Indexing
    • 8.2 Combining and Merging Datasets
      • Database-Style DataFrame Joins
      • Merging on Index
      • Concatenating Along an Axis
      • Combining Data with Overlap
    • 8.3 Reshaping and Pivoting
      • Reshaping with Hierarchical Indexing
      • Pivoting “Long” to “Wide” Format
      • Pivoting “Wide” to “Long” Format
    • 8.4 Conclusion
  • Plotting and Visualization
    • 9.1 A Brief matplotlib API Primer
      • Figures and Subplots
      • Colors, Markers, and Line Styles
      • Ticks, Labels, and Legends
      • Annotations and Drawing on a Subplot
      • Saving Plots to File
      • matplotlib Configuration
    • 9.2 Plotting with pandas and seaborn
      • Line Plots
      • Bar Plots
      • Histograms and Density Plots
      • Scatter or Point Plots
      • Facet Grids and Categorical Data
    • 9.3 Other Python Visualization Tools
    • 9.4 Conclusion
  • Data Aggregation and Group Operations
    • 10.1 GroupBy Mechanics
    • 10.2 Data Aggregation
    • 10.3 Apply: General split-apply-combine
      • Suppressing the Group Keys
      • Quantile and Bucket Analysis
      • Example: Filling Missing Values with Group-Specific Values
      • Example: Random Sampling and Permutation
      • Example: Group Weighted Average and Correlation
      • Example: Group-Wise Linear Regression
    • 10.4 Pivot Tables and Cross-Tabulation
    • 10.5 Conclusion
  • Time Series
    • 11.1 Date and Time Data Types and Tools
    • 11.2 Time Series Basics
    • 11.3 Date Ranges, Frequencies, and Shifting
      • Generating Date Ranges
      • Frequencies and Date Offsets
      • Shifting (Leading and Lagging) Data
    • 11.4 Time Zone Handling
    • 11.5 Periods and Period Arithmetic
      • Period Frequency Conversion
      • Quarterly Period Frequencies
      • Converting Timestamps to Periods (and Back)
      • Creating a PeriodIndex from Arrays
    • 11.6 Resampling and Frequency Conversion
      • Downsampling
      • Upsampling and Interpolation
      • Resampling with Periods
    • 11.7 Moving Window Functions
    • 11.8 Conclusion
  • Advanced pandas
    • 12.1 Categorical Data
      • Background and Motivation
      • Categorical Type in pandas
      • Computations with Categoricals
      • Categorical Methods
    • 12.2 Advanced GroupBy Use
    • 12.3 Techniques for Method Chaining
    • 12.4 Conclusion
  • Introduction to Modeling Libraries in Python
    • 13.1 Interfacing Between pandas and Model Code
    • 13.2 Creating Model Descriptions with Patsy
    • 13.3 Introduction to statsmodels
    • 13.4 Introduction to scikit-learn
    • 13.5 Continuing Your Education
  • Data Analysis Examples
    • 14.1 1.USA.gov Data from Bitly
      • Counting Time Zones in Pure Python
      • Counting Time Zones with pandas
    • 14.2 MovieLens 1M Dataset
    • 14.3 US Baby Names 1880–2010
      • Analyzing Naming Trends
    • 14.4 USDA Food Database
    • 14.5 2012 Federal Election Commission Database
      • Donation Statistics by Occupation and Employer
      • Bucketing Donation Amounts
      • Donation Statistics by State
    • 14.6 Conclusion
  • Advanced NumPy
    • A.1 ndarray Object Internals
    • A.2 Advanced Array Manipulation
      • Reshaping Arrays
      • C Versus Fortran Order
      • Concatenating and Splitting Arrays
      • Repeating Elements: tile and repeat
      • Fancy Indexing Equivalents: take and put
    • A.3 Broadcasting
    • A.4 Advanced ufunc Usage
    • A.5 Structured and Record Arrays
    • A.6 More About Sorting
    • A.7 Writing Fast NumPy Functions with Numba
    • A.8 Advanced Array Input and Output
    • A.9 Performance Tips
  • More on the IPython System
    • B.1 Using the Command History
    • B.2 Interacting with the Operating System
    • B.3 Software Development Tools
      • Interactive Debugger
      • Timing Code: %time and %timeit
      • Basic Profiling: %prun and %run -p
      • Profiling a Function Line by Line
    • B.4 Tips for Productive Code Development Using IPython
    • B.5 Advanced IPython Features
    • B.6 Conclusion

Reference Books

  • Official Python Tutorial
  • Python Cookbook, Third Edition, by David Beazley and Brian K. Jones (O’Reilly)
  • Fluent Python by Luciano Ramalho (O’Reilly)
  • Effective Python by Brett Slatkin (Pearson)
  • Pandas for Everyone: Python Data Analysis, First Edition, by Daniel Y. Chen
  • `Object-Oriented Programming in Python <http://python-textbok.readthedocs.io/en/1.0/index.html>`__