African Coding Network Syllabus > Projects > Assertive programming helpers for Pandas

project: Assertive programming helpers for Pandas

Hard Prerequisites

IMPORTANT: Please review these prerequisites, they include important information that will help you with this content.

TOPICS: Introduction to assertive programming

Data-science is a bit of a funny thing because even though data scientists write a lot of code, they typically aren’t expected to follow all the software engineering best practices other kinds of developers would need to follow.

This can be a really good thing because a data-scientist’s job is usually to answer hard questions based on data, and make those answers clear to other people. A lot of the time, writing beautiful production-ready code is actually a bit of a waste of time.

But there is a problem - because data-scientists typically don’t follow “software development best practices” then they don’t get the benefits of those best practices. For example, it’s pretty rare for a data-scientist to write unit tests for a Jupyter notebook.

Data-scientists are human (usually) and so they are prone to human-error (like everyone else). So it’s useful for data-scientists to have some way of at least doing some sanity-checking while they are writing code.

That’s where assertive programming comes in.

If you write good assertions in your code then:

You will have confidence that nothing insane is happening
Your code will be clearer to read since the assertions would make your thought-process more clear to anyone reading your code

Instructions

Please create a single file named assertions.py. Inside this file you will write a number of functions that will make use of assert statements to raise errors with useful error messages if things are not as expected.

Here is an example of a pretty trivial assertion function:

def assert_equal(a,b):
    assert a == b , f"{a} {type(a)} != {b} {type(b)}"

Take note of:

the use of the assert keyword
how much useful information is included in the error message
what happens if the assertion passes

Please implement the following functions:

def assert_df_dimensions_equal(df, height, width):

def assert_df_columns_equal(df,column_names):

def assert_no_duplicate_rows(df):

def assert_no_null_values(df):

def assert_all_values_positive(df):

Think about

What other assertions are you likely to use in future? Maybe you would want to check if a specific column or row meets a specific condition.