Hard Prerequisites
IMPORTANT: Please review these prerequisites; they include important information that will help you with this content.
Let’s imagine that you have been contacted by a person who owns a bunch of properties and wants to add all their stuff to CloudBnb in bulk. You could get someone to do manual data capture to get all the data into your database, but that would be pretty soul-destroying work. And manual data capture is prone to human error.
Now, most businesses run on spreadsheets, so it is actually a very common task to take a spreadsheet full of data, clean it up, and load it into your database.
Write a management command that:
Things to keep in mind:
Often an ETL script will be run by some other program, for example it might be run by Airflow. Airflow is cool because it can do things like retry tasks if something seems to break. And in production, all sorts of things have downtime.
E.g. the database might be down because updates are being installed, or migrations are being run.
It should always be safe to retry your scripts.
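To make the retry idea concrete, here is a minimal sketch of the kind of retry loop an orchestrator like Airflow performs around your script. Everything here (the function names, the fake flaky task) is hypothetical and just illustrates the pattern: if the database is briefly down, a later attempt succeeds, so your script must tolerate being run again.

```python
import time


def run_with_retries(task, attempts=3, delay=0.0):
    # Retry a zero-argument callable a few times, the way an
    # orchestrator such as Airflow would retry a failed task.
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: give up and surface the error
            time.sleep(delay)  # wait a bit before trying again


calls = {"n": 0}


def flaky_load():
    # Hypothetical ETL task: fails the first two times,
    # simulating a database that is briefly down.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("db unavailable")
    return "loaded"


result = run_with_retries(flaky_load)
print(result)  # → loaded
```

The point is not the retry loop itself (Airflow handles that for you) but what it implies: your script may start, fail partway, and run again, so it must not leave the database in a broken or half-duplicated state.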
On top of that, certain pipelines get run multiple times. For example, what if the property owner adds a whole lot of new stuff to the spreadsheet and asks you to load up the new stuff?
get_or_create is hella useful if you want to avoid duplicating data: https://stackoverflow.com/questions/1941212/correct-way-to-use-get-or-create
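To show why get_or_create makes re-runs safe, here is a small sketch of the same look-up-then-insert idea using plain sqlite3 instead of the Django ORM (so it runs standalone). The table and function names are made up for illustration; in your actual command you would use Model.objects.get_or_create.

```python
import sqlite3


def get_or_create_listing(conn, name):
    # A minimal stand-in for Django's
    # Property.objects.get_or_create(name=name):
    # look the row up first, and insert only if it does not exist.
    row = conn.execute(
        "SELECT id FROM listing WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0], False  # existing row, not created
    cur = conn.execute("INSERT INTO listing (name) VALUES (?)", (name,))
    return cur.lastrowid, True  # new row, created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# Running the "import" twice does not duplicate rows: the second pass
# finds every listing already there and creates nothing.
for _ in range(2):
    for name in ["Seaside cottage", "City loft"]:
        get_or_create_listing(conn, name)

count = conn.execute("SELECT COUNT(*) FROM listing").fetchone()[0]
print(count)  # → 2
```

Because each row is keyed on something unique (here, the name), re-running the whole pipeline, or running it on a spreadsheet that has grown new rows, only ever adds what is missing.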