DataFrame

2D data structure

Rows and columns are labeled

Columns can have different data types
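
A minimal sketch of these three properties, assuming pandas is installed. The column names, row labels, and values are made up for illustration and are not from the slides.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different data type.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],     # text
        "count": [1, 2, 3],          # integers
        "ratio": [0.5, 1.5, 2.5],    # floats
    },
    index=["r1", "r2", "r3"],        # row labels
)

print(df.shape)    # (3, 3): two dimensions, rows by columns
print(df.dtypes)   # one dtype per column
```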

Create from

Accessing Elements

loc, iloc

at and iat

at uses index labels

iat uses integer position
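
The four accessors can be compared side by side. This is a sketch on a small hypothetical frame: loc and at take labels, iloc and iat take integer positions, and at/iat return a single cell.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.0, 2.0, 3.0]},
    index=["a", "b", "c"],
)

row_by_label = df.loc["b"]        # row selected by its label
row_by_pos = df.iloc[1]           # the same row selected by position
cell_by_label = df.at["b", "x"]   # single cell by row and column label
cell_by_pos = df.iat[1, 0]        # single cell by row and column position

# Label slices with loc include both endpoints.
subset = df.loc["a":"b", ["x"]]
```

at and iat only ever return one value, which is why they are usually preferred over loc and iloc for scalar lookups.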

Changing Row and Column Index
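
A few common ways to change the labels, sketched on a hypothetical two-column frame. The slides may have used a different method; rename, direct assignment, and set_index are shown here as assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# rename returns a new frame; unlisted labels are left alone.
renamed = df.rename(index={0: "first", 1: "second"}, columns={"x": "X"})

# Assigning to .index / .columns replaces all labels at once (in place).
relabeled = df.copy()
relabeled.index = ["a", "b"]
relabeled.columns = ["col1", "col2"]

# set_index turns an existing column into the row index.
by_x = df.set_index("x")
```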

Creating DataFrame

Calculation

Reading Data

Formats

Contents of local file states.csv

state,area,pop,density
California,423967,38332521,90.41392608386974
Texas,695662,26448193,38.01874042279153
New York,141297,19651127,139.07674614464568
Florida,170312,19552860,114.80612053173
Illinois,149995,12882135,85.88376279209307
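
A sketch of reading this file with pandas. To keep the example self-contained the file contents are held in a string and wrapped in io.StringIO; with the file on disk, pd.read_csv("states.csv") would take its place. Using state as the index is an assumption, not something the slides specify.

```python
import io
import pandas as pd

# Contents of states.csv, copied from above.
csv_text = """state,area,pop,density
California,423967,38332521,90.41392608386974
Texas,695662,26448193,38.01874042279153
New York,141297,19651127,139.07674614464568
Florida,170312,19552860,114.80612053173
Illinois,149995,12882135,85.88376279209307
"""

# read_csv accepts a path or any file-like object.
states = pd.read_csv(io.StringIO(csv_text), index_col="state")

# Column arithmetic: density appears to be pop divided by area.
computed = states["pop"] / states["area"]
print((computed - states["density"]).abs().max())
```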

R Datasets

Collection of ~1000 datasets

Used to illustrate and test algorithms in textbooks and assignments

List

Iris Dataset

Iris flower

Sepal and Petal


Iris Dataset

Given measurements of a sepal and a petal, can we determine which type of iris it is?

Dataset contains 150 rows of measurements

Three types of iris

Assign

Returns a copy of the DataFrame with the new columns added; the original is unchanged

Using Lambda
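
A sketch of assign with and without a lambda, on hypothetical width/height columns. A lambda receives the frame being built, so a later keyword can refer to a column created by an earlier keyword in the same call.

```python
import pandas as pd

df = pd.DataFrame({"width": [2.0, 3.0], "height": [4.0, 5.0]})

# Plain expression: computed from df before assign is called.
with_area = df.assign(area=df["width"] * df["height"])

# Lambdas: each one is called with the intermediate frame.
with_both = df.assign(
    area=lambda d: d["width"] * d["height"],
    half_area=lambda d: d["area"] / 2,   # uses the column added just above
)

print("area" in df.columns)   # False: df itself was not modified
```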

Problems with dots in names

Some syntax, such as attribute access (df.name) and keyword arguments to assign, does not support dots in column names
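
A sketch of the problem and one possible workaround. The column names follow the R iris naming style (Sepal.Length, Sepal.Width); the values and the choice to replace dots with underscores are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Sepal.Length": [5.1, 4.9], "Sepal.Width": [3.5, 3.0]})

# Bracket access always works, dots or not.
lengths = df["Sepal.Length"]

# These do not work:
#   df.Sepal.Length              -> AttributeError (looks for a column "Sepal")
#   df.assign(Sepal.Ratio=...)   -> SyntaxError (not a valid keyword name)

# One workaround: rename the columns so they are valid identifiers.
clean = df.rename(columns=lambda name: name.replace(".", "_"))
clean = clean.assign(Sepal_Ratio=lambda d: d.Sepal_Length / d.Sepal_Width)
```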

Operations on DataFrames

Done element by element
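
A small sketch of element-by-element arithmetic on two hypothetical frames with matching labels. Values are paired by row and column label, not by position.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [10, 20], "y": [30, 40]})

total = a + b    # each cell of a is added to the matching cell of b
scaled = a * 2   # a scalar is applied to every cell
```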

DataFrame & Series Operations

Default is row-wise: the Series index is matched against the column labels and the operation is repeated for every row

Column-wise Operations with Series
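
A sketch contrasting the two directions on a hypothetical frame. The plain operator aligns the Series with the columns; the method form (here sub) with axis=0 aligns it with the row index instead.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Row-wise (default): the Series index matches the column labels,
# so the same offsets are subtracted from every row.
row_offsets = pd.Series({"x": 1, "y": 10})
row_wise = df - row_offsets

# Column-wise: the Series index matches the row labels,
# so each row gets its own offset, applied to every column.
col_offsets = pd.Series([1, 2, 3])
col_wise = df.sub(col_offsets, axis=0)
```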

Aggregation
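
A sketch of a few aggregations on a hypothetical frame. Which aggregation functions the slides covered is not stated; sum, mean, and agg are shown here as common examples.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 30.0]})

col_sums = df.sum()               # one value per column
row_means = df.mean(axis=1)       # one value per row
summary = df.agg(["min", "max"])  # several aggregations at once
```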

Group by: split, apply, combine

Splitting the data into groups based on some criteria

Applying a function to each group independently

Combining the results into a data structure
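
The three steps above can be sketched on a hypothetical frame. The first version lets groupby do all three steps; the second spells them out by hand to show what happens at each stage.

```python
import pandas as pd

# Hypothetical measurements for two groups.
df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b"],
        "length": [1.0, 3.0, 10.0, 20.0],
    }
)

# All three steps in one expression.
means = df.groupby("species")["length"].mean()

# The same thing step by step.
pieces = {}
for key, group in df.groupby("species"):   # split: one sub-frame per key
    pieces[key] = group["length"].mean()   # apply: a function per group
manual = pd.Series(pieces)                 # combine: results into a Series
```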