Story points | 5 |
Tags | data-wrangling jupyter-notebooks |
Hard Prerequisites | |
IMPORTANT: Please review these prerequisites, they include important information that will help you with this content. | |
Please make use of Jupyter notebooks while doing this project.
This data contains personality scores for learners, plus the department they applied for.
You should be able to write basic functions and for loops for this assignment. You should also be familiar with merging, filtering and creating new columns in pandas.
Optional: As far as possible, use functional programming techniques (map, reduce, apply) instead of loops when writing the functions below.
For example, to modify every column in a data frame (to get a percentage in this case), instead of writing:

    for column in df:
        df[column] = df[column] / 10 * 100  # get percentage

use:

    def get_percentage(score):
        return score / 10 * 100

    df = df.apply(get_percentage, axis=0)  # axis=0 applies the function to each column

Directory structure:

    ├── data
    │   ├── departments.csv
    │   └── personality_scores.csv
    ├── notebook
    │   └── data_wrangling.ipynb
    ├── README.md
    ├── requirements.txt
    └── .gitignore

Clean up the column names so that, for example, `Section 5 of 6 [I am always prepared.]` becomes `I am always prepared`. Store this updated version of the DataFrame in a variable called `personality_score_df`.
Write a function (or functions) to calculate the total score for each subscale as defined in scoring. The new columns should be named `Conscientiousness`, `Emotional stability`, `Openness to experience`, `Agreeableness` and `Extraversion`. In other words, for the `Conscientiousness` total score, all items marked as belonging to that subscale should be summed.
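
As a starting point for the column clean-up step, here is a minimal sketch that keeps only the text inside the square brackets and drops the trailing full stop. The helper name `extract_item_text`, the `Learner ID` column and the sample rows are assumptions made for illustration; the real columns come from `personality_scores.csv`.

```python
import re

import pandas as pd


def extract_item_text(column_name: str) -> str:
    """Return the text inside square brackets, without a trailing full stop.

    Column names that contain no bracketed text are returned unchanged.
    """
    match = re.search(r"\[(.*)\]", column_name)
    if match is None:
        return column_name
    return match.group(1).strip().rstrip(".")


# Hypothetical two-row sample, not the real data.
raw_df = pd.DataFrame(
    {
        "Learner ID": [1, 2],  # assumed identifier column
        "Section 5 of 6 [I am always prepared.]": [3, 4],
        "Section 5 of 6 [I am easily disturbed.]": [2, 5],
    }
)

# rename accepts a function and applies it to every column label.
personality_score_df = raw_df.rename(columns=extract_item_text)
print(list(personality_score_df.columns))
```

Passing a function to `rename` avoids an explicit loop over the column names, in line with the functional style suggested earlier.
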
The new data frame should be named `personality_score_totals_df` and will look something like this:
I am always prepared | I am easily disturbed | I am exacting (demanding) in my work | … | Conscientiousness | Neuroticism |
---|---|---|---|---|---|
(3, 5) | (4, 5) | (3, 5) | … | 10 | 5 |
(3, 5) | (4, 1) | (3, 1) | … | 6 | 1 |
(3, 5) | (4, 3) | (3, 3) | … | 8 | 3 |
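
One possible way to compute the subscale totals is sketched below. The `SUBSCALE_ITEMS` mapping and the sample scores are hypothetical; the real assignment of items to subscales has to come from the scoring definition, which is not reproduced here.

```python
import pandas as pd

# Hypothetical mapping from subscale name to the item columns it contains.
SUBSCALE_ITEMS = {
    "Conscientiousness": [
        "I am always prepared",
        "I am exacting (demanding) in my work",
    ],
    "Emotional stability": ["I am easily disturbed"],
}


def add_subscale_totals(scores_df: pd.DataFrame, subscale_items: dict) -> pd.DataFrame:
    """Return a copy of scores_df with one summed column per subscale."""
    totals = {
        subscale: scores_df[items].sum(axis=1)  # row-wise sum over the items
        for subscale, items in subscale_items.items()
    }
    return scores_df.assign(**totals)


# Hypothetical sample scores, not the real data.
personality_score_df = pd.DataFrame(
    {
        "I am always prepared": [5, 3],
        "I am easily disturbed": [4, 1],
        "I am exacting (demanding) in my work": [5, 1],
    }
)

personality_score_totals_df = add_subscale_totals(personality_score_df, SUBSCALE_ITEMS)
print(personality_score_totals_df)
```

Because `assign` returns a new DataFrame, `personality_score_df` is left unchanged.
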
Merge the department data with `personality_score_totals_df`. The merged DataFrame should be named `merged_personality_department_df`.
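
A merge of the scores with the department data might look like the sketch below. The key column `Learner ID`, the `Department` column and the sample rows are assumptions; check the actual CSV files for the column the two tables share.

```python
import pandas as pd

# Hypothetical samples; "Learner ID" is an assumed shared key column.
personality_score_totals_df = pd.DataFrame(
    {"Learner ID": [1, 2, 3], "Conscientiousness": [28, 35, 22]}
)
departments_df = pd.DataFrame(
    {"Learner ID": [1, 2, 3], "Department": ["Data", "Design", "Data"]}
)

merged_personality_department_df = personality_score_totals_df.merge(
    departments_df,
    on="Learner ID",
    how="left",  # keep every learner that has scores
    validate="one_to_one",  # raise an error if a learner appears more than once
)
print(merged_personality_department_df)
```

The `validate` argument is optional, but it catches duplicated keys before they silently multiply rows.
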
Filter the merged DataFrame so that you only see the applicants who scored less than 30 on emotional stability, conscientiousness AND agreeableness. Next, assign these applicants the tag “High risk” in a new column called `Risk Status`. All other applicants get the tag “Low risk”. The DataFrame here should be named `risk_status_df`.
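
The tagging step can be sketched with a boolean mask and `numpy.where`, as below. The sample values are made up, and the subscale column names are assumed to match the ones created earlier.

```python
import numpy as np
import pandas as pd

RISK_COLUMNS = ["Emotional stability", "Conscientiousness", "Agreeableness"]
THRESHOLD = 30

# Hypothetical sample, not the real data.
merged_personality_department_df = pd.DataFrame(
    {
        "Department": ["Data", "Design", "Data"],
        "Emotional stability": [25, 29, 40],
        "Conscientiousness": [20, 35, 45],
        "Agreeableness": [29, 10, 50],
    }
)

# True only where all three subscale scores are below the threshold.
is_high_risk = (merged_personality_department_df[RISK_COLUMNS] < THRESHOLD).all(axis=1)

# Viewing only the high-risk applicants is a plain boolean filter.
high_risk_only_df = merged_personality_department_df[is_high_risk]

risk_status_df = merged_personality_department_df.assign(
    **{"Risk Status": np.where(is_high_risk, "High risk", "Low risk")}
)
print(risk_status_df)
```

Only the first sample row is below 30 on all three subscales, so it is the only one tagged “High risk”.
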
Wrangle a new DataFrame with a count of the number of low and high risk applicants within each department. Let each department be a separate column. This new DataFrame should be named `risk_status_summary_df`. If there are no learners in one of the categories, it should be represented by zero and not a null entry. The DataFrame should look something like this:
Risk Status | Copywriting | Data | Design | Strategy | Web Dev |
---|---|---|---|---|---|
Low risk | 150 | 123 | 0 | 4 | 6 |
High risk | 40 | 0 | 22 | 67 | 9 |
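
One way to build a summary of this shape is `pandas.crosstab`, which counts combinations and fills empty combinations with zero. The sketch below uses made-up rows and assumes the department column is called `Department`.

```python
import pandas as pd

# Hypothetical sample, not the real data.
risk_status_df = pd.DataFrame(
    {
        "Department": ["Data", "Data", "Design", "Strategy"],
        "Risk Status": ["Low risk", "Low risk", "High risk", "Low risk"],
    }
)

risk_status_summary_df = (
    pd.crosstab(risk_status_df["Risk Status"], risk_status_df["Department"])
    # Fix the row order and keep a row of zeros even if one status never occurs.
    .reindex(["Low risk", "High risk"], fill_value=0)
)
print(risk_status_summary_df)
```

Here `Risk Status` is the index rather than a regular column; calling `.reset_index()` on the result would turn it into a column if that layout is preferred.
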