Course Project – 1 person team
Housing Data File
This data set contains 25 variables that are described in the data dictionary tab and include
several categorical variables, some binary variables, and numerical data. You can approach this
data from a variety of perspectives using the techniques you learned in class to answer
questions. Use the tool of your choice but make sure you know how to use it correctly. There
are 2930 observations, more than enough to provide valid and reliable statistical analysis.
You have been hired by the local real estate broker to analyze activity in the local housing
market. You must conduct one ANOVA and one regression analysis to answer two
questions you believe will help the broker provide the best guidance to both buyers and sellers.
Categorical variables include building type, neighborhood, and house style. If you think it is
important to understand the age of the house when it sold, you will need to create a new
variable (year built and year sold are two variables provided). As with any project, you will start
with EDA to get a sense of your data. For categorical variables, using a pivot table to get counts
and proportions is an excellent way to get a better understanding of those variables.
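
If you work in code rather than a spreadsheet, the two EDA steps above (deriving the age of the house at sale, and getting pivot-table-style counts and proportions for a categorical variable) might look roughly like the sketch below. The field names (year_built, year_sold, building_type) and the four sample rows are assumptions made up for illustration; check the data dictionary tab for the real column names and load the actual file instead.

```python
from collections import Counter

# Hypothetical rows for illustration only; the field names are assumptions,
# not the names used in the data dictionary.
rows = [
    {"building_type": "1Fam", "year_built": 1960, "year_sold": 2010},
    {"building_type": "1Fam", "year_built": 1995, "year_sold": 2008},
    {"building_type": "Duplex", "year_built": 1978, "year_sold": 2009},
    {"building_type": "TwnhsE", "year_built": 2004, "year_sold": 2007},
]


def add_age_at_sale(records):
    """Return copies of the records with a derived age_at_sale field."""
    return [{**r, "age_at_sale": r["year_sold"] - r["year_built"]} for r in records]


def counts_and_proportions(records, column):
    """Pivot-table-style summary: category -> (count, proportion of all rows)."""
    counts = Counter(r[column] for r in records)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


housing = add_age_at_sale(rows)
summary = counts_and_proportions(housing, "building_type")

for category, (n, share) in summary.items():
    print(f"{category:8s} {n:3d} {share:.1%}")
```

The same summary can be produced with a pivot table in Excel; the point is simply to see how many observations fall in each group before comparing groups.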
Perform EDA and include information about the data in the report that helps the reader get an
understanding of the data set (useful graphics should be included). Develop two research
questions (ideas include a regression model to predict price, an ANOVA to compare a numerical
variable across the groups of a categorical variable, etc.). Perform the analysis and write a
detailed description of the results and what they mean (how you would use them).
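
As a rough illustration of what the two analyses compute, the sketch below works out a one-way ANOVA F statistic and a simple least-squares regression line by hand. The group labels, prices (in thousands), and living areas are invented for the example and are not taken from the housing file; the function names are hypothetical as well. It stops at the F statistic and the fitted line: the p-value, R-squared, and diagnostics should come from whichever statistical tool you choose.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Invented sale prices (thousands) by neighborhood, for illustration only.
price_by_neighborhood = {
    "North": [200, 210, 220],
    "South": [210, 220, 230],
    "East": [220, 230, 240],
}
f_stat, df_b, df_w = one_way_anova_f(price_by_neighborhood)

# Invented living area (square feet) and price (thousands), for illustration only.
living_area = [1000, 1500, 2000, 2500]
price = [150, 200, 250, 300]
slope, intercept = simple_linear_regression(living_area, price)

print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print(f"price = {intercept:.1f} + {slope:.3f} * living_area")
```

When you write up the results, translate the numbers into guidance the broker can use, for example how much the expected price changes for each additional square foot, or which groups differ in average price.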
Create a 4-5 slide presentation that would support a very brief presentation (3 minutes) of your
analysis. Please provide the paper in the same format as the sample paper, and also provide separate Excel files for both analyses.