project: Write an ETL script to populate the database

Hard Prerequisites
IMPORTANT: Please review these prerequisites, they include important information that will help you with this content.
  • PROJECTS: CloudBnb - intro
  • PROJECTS: CloudBnb - Basic Property model and admin panel

  • Let’s imagine that you have contacted a person who owns a bunch of properties and they want to add all their stuff to CloudBnb in bulk. You could get someone do do manual data capturing to get all the data into your db, but that would be pretty soul-destroying work. And manual data capture is prone to human error.

    Now most businesses are run on spreadsheets, so it is actually a very common task to take a spreadsheet full of data, clean it up, and stick it into your database.

    Instructions

    Write a management command that:

    1. Grabs data from this Google spreadsheet: https://docs.google.com/spreadsheets/d/1LU7FDyrCi-S1CYmEoIWetMhf2lUK28gz71EVcE6RJjY/
    2. Saves each line into your database

    Things to keep in mind:

    1. The script should be idempotent. That’s a funny word but basically it means that if you run the script twice, it shouldn’t screw up and make duplicate properties or anything. Running the script twice should have the same effect as running the script once
    2. Your script will need to be able to access Google sheets!

    Why do we care about making things idempotent?

    Often an etl script will be run by some other program, for example it might be run by Airflow. Airflow is cool because it can do things like retry tasks if something seems to break. And in production, all sorts of things have downtime.

    Eg: the database might be down because updates are being installed, or migrations are being run.

    It should always be safe to retry your scripts.

    On top of that, certain pipelines get run multiple times. For example, what if the property owner adds a whole lot of new stuff to the spreadsheet and asks you to load up the new stuff?


    RAW CONTENT URL