Use simplified data set? #2

Open
kbroman opened this issue Oct 3, 2017 · 2 comments
Comments

@kbroman
kbroman commented Oct 3, 2017

When introducing R, we tend to get bogged down in dealing with missing values (NAs) and factors, and then in handling both in read.csv() via the stringsAsFactors and na.strings arguments.

In the "portal" data set, both hit you hard in the sex column, which has values "M", "F", or "" (blank).
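
To illustrate the problem, here is a minimal sketch in R. The four records are made up in the style of the portal data (they are not taken from the real file), and the column names are assumptions chosen for the example. It shows how a blank sex value is read as a factor level "" under one set of read.csv() arguments and as NA under another.

```r
# A tiny made-up sample in the style of the portal data (not the real file).
csv_text <- "record_id,species_id,sex,weight
1,NL,M,
2,DM,F,40
3,DM,,48
4,PF,M,7"

# Strings become factors and blanks are kept as ordinary strings:
# the blank sex value turns into a factor level "" instead of a missing value.
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(raw$sex)

# Strings stay character vectors and blanks are treated as missing.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")
is.na(clean$sex)
```

Learners have to understand both arguments before the sex column behaves as expected, which is the overhead described above.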

Christina and I recently taught a one-day Data Carpentry workshop, and I taught dplyr and ggplot2 using the gapminder data. Those data were considerably easier because there are no missing values at all, and while there are factors, you don't need to do anything special with them.

We might consider using a reduced portal data set that has only the complete records, deferring discussion of NAs and factors to later or just skipping it entirely.
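
One possible way to build such a reduced data set is sketched below. It again uses a small made-up sample rather than the real portal file, and the output file name surveys_complete.csv is hypothetical. The idea is to read blanks as NA once, keep only the rows with no missing values, and hand learners the result.

```r
# Made-up sample in the style of the portal data (not the real file).
csv_text <- "record_id,species_id,sex,weight
1,NL,M,
2,DM,F,40
3,DM,,48
4,PF,M,7"

# Read blanks as NA so that incomplete records can be detected.
surveys <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

# Keep only the records with no missing values in any column.
surveys_complete <- surveys[complete.cases(surveys), ]
nrow(surveys_complete)

# Hypothetical output file for the workshop (uncomment to write it):
# write.csv(surveys_complete, "surveys_complete.csv", row.names = FALSE)
```

With a file prepared this way, a plain read.csv() call would be enough in the lesson, and na.strings could be introduced later or not at all.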

@byandell
byandell commented Oct 5, 2017

I like this idea of a simplified portal data set to get into R quickly. I also see value in the gapminder data, as there is an audience that largely uses categorical data. I'm not sure we are ready for two versions, but I like what I see at https://github.com/kbroman/Workshop_DataCarpNSBE.

@sstevens2
Member

Notes from discussion:

  • Clean up the data set so it doesn't have missing data
  • Add info about what the dataset is and its history
  • Put all the files into one zipped file for them to download
    • Include the file structure in the zipped file, but have them explore it, and add an exercise beforehand where they put together a hypothetical project file

Solution: adding a lesson with this - could take about 30 min.
