| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | factor.ordered | factor.n_unique | factor.top_counts | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | Dating_App | 0 | 1 | 2 | 6 | 0 | 3 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| factor | Gender | 0 | 1 | NA | NA | NA | NA | NA | FALSE | 3 | M: 4, F: 3, O: 2 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | Height | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 165.66667 | 15.97655 | 133 | 156 | 166 | 178 | 183 | ▂▁▃▃▇ |
| numeric | Weight | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 70.11111 | 21.24526 | 45 | 55 | 70 | 80 | 110 | ▇▂▃▂▂ |
| numeric | Age | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 30.77778 | 16.93943 | 16 | 25 | 25 | 28 | 74 | ▇▂▁▁▁ |
Manuscript/Report Template for a Data Analysis Project
Austin Thrash contributed to this exercise.
The structure below is one possible setup for a data analysis project (including the course project). For a manuscript, adjust as needed. You don’t need to have exactly these sections, but the content covering those sections should be addressed.
This uses MS Word as output format. See here for more information. You can switch to other formats, like html or pdf. See the Quarto documentation for other formats.
1 Summary/Abstract
This study will examine the correlation between NFL teams’ win-loss records and the amount each team spends on salary.
2 Introduction
2.1 General Background Information
Many of us are familiar with the movie Moneyball. The movie, based on the book of the same title, follows a team with one of the lowest payrolls in Major League Baseball as it catapults to the playoffs to compete for the championship against teams with far greater resources. The team, the Oakland Athletics, was able to do this by using a more advanced statistical approach to evaluating players. This allowed them to find hidden value in the player market that other teams were not aware of. This has always fascinated me, and it led me to wonder how this might work in other leagues. I decided to take a closer look at the National Football League.
While I will not be studying advanced football statistics to measure a player’s “hidden” value, I will be looking at where players rank in terms of salary and how much of an impact they have on their team’s winning percentage.
2.2 Description of data and data source
The data contain a recap of the 2023 NFL season, including variables such as wins, losses, points scored, total offense, and total defense, as well as the amount of money each team is spending on salary. The data will come from Sportrac.com, overthecap.com, profootballreference.com, and espn.com.
2.2.1 Assignment 2: Data Analysis
The variables ‘Age’ and ‘Dating App’ were added to the worksheet. Age will be a numerical value in years. Dating App will be a categorical value; Bumble, Hinge, Tinder and NA will be accepted values.
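The accepted values described above can be checked with a few lines of R. This is a minimal sketch, not the code used for the assignment: the data frame `dat` and its rows are hypothetical, and treating NA as the literal text "NA" rather than a missing value is an assumption.

```r
# Hypothetical example rows; these are not the actual worksheet values.
dat <- data.frame(
  Age = c(25, 31, 47),
  Dating_App = c("Bumble", "Tinder", "NA"),
  stringsAsFactors = FALSE
)

# Accepted categories as listed in the text ("NA" assumed to be literal text).
allowed_apps <- c("Bumble", "Hinge", "Tinder", "NA")

# Age should be numeric and positive (years).
age_ok <- is.numeric(dat$Age) && all(dat$Age > 0)

# Every Dating_App entry should be one of the accepted categories.
app_ok <- all(dat$Dating_App %in% allowed_apps)
```

If either check is `FALSE`, the offending rows can be inspected before any further analysis.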
2.3 Questions/Hypotheses to be addressed
The most obvious question is, do teams that spend the most on salary tend to have higher winning percentages? But the question that I will focus on is, where is the value? When you look at a winning team, there is generally a player, or a group of players, that are playing beyond their salary, i.e., these players are playing above expectations and are therefore considered a good or great value for their respective teams. Is there a common value among certain positions? Or is the value to be found in those players strictly on rookie contracts?
To cite other work (important everywhere, but likely to happen first in the introduction), make sure your references are in the bibtex file specified in the YAML header above (here dataanalysis_template_references.bib) and have the right bibtex key. Then you can cite them like this:
Examples of reproducible research projects can for instance be found in (McKay, Ebell, Billings, et al., 2020; McKay, Ebell, Dale, Shen, & Handel, 2020).
3 Methods
I will perform an initial exploratory data analysis to visualize the data and get an understanding of the patterns and relationships between variables.
3.1 Data acquisition
As applicable, explain where and how you got the data. If you directly import the data from an online source, you can combine this section with the next.
3.2 Data import and cleaning
Write code that reads in the file and cleans it so it’s ready for analysis. Since this will be fairly long code for most datasets, it might be a good idea to have it in one or several R scripts. If that is the case, explain here briefly what kind of cleaning/processing you do, and provide more details and well-documented code somewhere (e.g. as a supplement in a paper). All materials, including files that contain code, should be commented well so everyone can follow along.
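As an illustration of what such a cleaning script might contain, here is a minimal base R sketch. The raw values, the column names, and the plausibility cutoff for Weight are all hypothetical. The inline text stands in for a file on disk so that the example is self-contained.

```r
# Hypothetical raw data standing in for a file on disk; in a real project
# this would be read from a file path instead of inline text.
raw_text <- "Height,Weight,Gender
180,80,M
sixty,70,F
165,,F
155,7000,O"
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# Height is read as character because of the entry "sixty". Coerce it to
# numeric; non-numeric entries become NA (the coercion warning is expected).
raw$Height <- suppressWarnings(as.numeric(raw$Height))

# Drop rows with any missing value, then rows with an implausible weight
# (the cutoff of 1000 is an arbitrary choice for this illustration).
clean <- raw[complete.cases(raw), ]
clean <- clean[clean$Weight < 1000, ]

# Store the categorical variable as a factor.
clean$Gender <- factor(clean$Gender)
```

Each step is commented with the reason for it, which is the level of documentation the paragraph above asks for.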
3.3 Statistical analysis
Explain anything related to your statistical analyses.
4 Results
4.1 Exploratory/Descriptive analysis
Use a combination of text/tables/figures to explore and describe your data. Show the most important descriptive results here. Additional ones should go in the supplement. Even more can be in the R and Quarto files that are part of your project.
Table 1 shows a summary of the data.
Note that the data are loaded using a relative path with the ../../ notation. (Two dots mean one folder up.) You never want to specify an absolute path like C:\ahandel\myproject\results\ because if you share this with someone, it won’t work for them since they don’t have that path. You can also use the here R package to create paths. See examples of that below. I recommend the here package, but I’m showing the other approach here just in case you encounter it.
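The two approaches to building a path can be compared side by side. In this sketch the folder and file names are taken from the figure paths used later in this document. The assumption that the document sits two folders below the results folder is hypothetical, and the here call only runs if the package is installed.

```r
# Relative path: each ".." moves one folder up from the current document.
# file.path() joins the pieces with "/" on every platform.
rel_path <- file.path("..", "..", "results", "figures", "Age-Height.png")

# The here package builds the path from the project root instead, so it does
# not depend on where the current document is located.
if (requireNamespace("here", quietly = TRUE)) {
  here_path <- here::here("starter-analysis-exercise", "results",
                          "figures", "Age-Height.png")
}
```

Neither path contains a user-specific drive or folder, so both keep working when the project is shared.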
4.2 Basic statistical analysis
To get some further insight into your data, if reasonable you could compute simple statistics (e.g. simple models with 1 predictor) to look for associations between your outcome(s) and each individual predictor variable. Though note that unless you pre-specified the outcome and main exposure, any “p<0.05 means statistical significance” interpretation is not valid.
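The one-predictor models mentioned above could be fit in a short loop. This is only a sketch: the data values are made up for illustration, and using Height as the outcome is an assumption.

```r
# Hypothetical data; the values are made up for illustration only.
dat <- data.frame(
  Height = c(150, 160, 165, 170, 180, 185),
  Weight = c(50, 58, 63, 70, 82, 90),
  Age = c(22, 35, 28, 41, 30, 26)
)

predictors <- c("Weight", "Age")

# Fit Height ~ predictor separately for each predictor and keep the
# coefficient row (estimate, standard error, t value, p-value) of the slope.
fits <- lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "Height"), data = dat)
  coef(summary(fit))[p, ]
})
names(fits) <- predictors

# One row per predictor.
results <- do.call(rbind, fits)
```

The p-values in `results` are descriptive only, for the reason given above.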
Figure 1 shows a scatterplot produced by one of the R scripts.

4.3 Full analysis
Use one or several suitable statistical/machine learning methods to analyze your data and to produce meaningful figures, tables, etc. This might again be code that is best placed in one or several separate R scripts that need to be well documented. You want the code to produce figures and data ready for display as tables, and save those. Then you load them here.
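The save-then-load workflow described above might look like the following sketch. The built-in `cars` dataset stands in for the project data, and the file name and temporary folder are placeholders for a results folder in a real project.

```r
# --- In the analysis script ---
# Fit a model (cars is a dataset built into R, used here as a stand-in).
fit <- lm(dist ~ speed, data = cars)

# Turn the coefficient summary into a data frame that is ready for display.
coef_table <- as.data.frame(coef(summary(fit)))

# Save it; a temporary folder is used here instead of a results folder.
out_file <- file.path(tempdir(), "resulttable.rds")
saveRDS(coef_table, out_file)

# --- In the manuscript file ---
# Load the saved table; it can then be passed to a table-printing function.
loaded <- readRDS(out_file)
```

Keeping the model fitting in the script and only the loading in the manuscript means the document renders quickly and the results are produced in exactly one place.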
Example Table 2 shows a summary of a linear model fit.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 151.8799863 | 16.5462815 | 9.1791008 | 0.0002574 |
| Age | 0.1925006 | 0.4340181 | 0.4435312 | 0.6759162 |
| Dating_AppNA | 13.9574880 | 20.3378116 | 0.6862827 | 0.5230575 |
| Dating_AppTinder | 8.5684978 | 15.3005997 | 0.5600106 | 0.5996324 |
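The term names in Table 2 follow from how R codes a categorical predictor. The sketch below uses hypothetical data and assumes the model was of the form Height ~ Age + Dating_App, which the table itself does not state. It shows why Dating_AppNA and Dating_AppTinder appear while the first category does not.

```r
# Hypothetical data; the values are made up and "NA" is literal text.
dat <- data.frame(
  Height = c(150, 160, 165, 170, 180, 185, 175, 158, 168),
  Age = c(22, 35, 28, 41, 30, 26, 33, 45, 29),
  Dating_App = c("Bumble", "Tinder", "NA", "Bumble", "Tinder", "NA",
                 "Bumble", "Tinder", "NA")
)

# lm() converts the character column to a factor. The first level in sorted
# order (Bumble) becomes the reference category and gets no term of its own.
fit <- lm(Height ~ Age + Dating_App, data = dat)
terms_in_fit <- names(coef(fit))
```

Each Dating_App coefficient is therefore the estimated difference from the reference category, holding Age fixed.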
knitr::include_graphics(here("starter-analysis-exercise","results","figures","Age-Height.png"))
knitr::include_graphics(here("starter-analysis-exercise","results","figures","Dating_App-weight.png"))
5 Discussion
5.1 Summary and Interpretation
Summarize what you did, what you found and what it means.
5.2 Strengths and Limitations
Discuss what you perceive as strengths and limitations of your analysis.
5.3 Conclusions
What are the main take-home messages?
Include citations in your Rmd file using bibtex. The list of references will automatically be placed at the end.
This paper (Leek & Peng, 2015) discusses types of analyses.
These papers (McKay, Ebell, Billings, et al., 2020; McKay, Ebell, Dale, et al., 2020) are good examples of papers published using a fully reproducible setup similar to the one shown in this template.
Note that the cited references will show up at the end of the document. The reference formatting is determined by the CSL file specified in the YAML header. Many more style files for almost any journal are available. You also specify the location of your bibtex reference file in the YAML. You can call your reference file anything you like. I just used the generic name references.bib, but giving it a more descriptive name is probably better.