topic: Python and Spark

Hard Prerequisites

IMPORTANT: Please review these prerequisites. They include important information that will help you with this content.

As a data engineer, you will often need to process large data sets. Apache Spark is an open-source, general-purpose distributed processing system for big data.

Apache Spark is written in Scala, but you can control it from Python using a package called PySpark.
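To make this concrete, here is a minimal sketch of what driving Spark from Python can look like. It is an illustration, not part of the linked resources: the sample data, the column names (`city`, `temp_c`) and the function names are all made up for this example, and it assumes PySpark is installed and run in local mode.

```python
# Hypothetical sample data: (city, temperature in Celsius) pairs.
SAMPLE_ROWS = [
    ("Cape Town", 18.0),
    ("Cape Town", 22.0),
    ("Nairobi", 24.0),
    ("Nairobi", 26.0),
]


def average_by_city(spark, rows):
    """Return {city: average temperature}, with the averaging done by Spark."""
    # Imported here so this file can still be loaded where PySpark is absent.
    from pyspark.sql import functions as F

    # Turn the plain Python rows into a Spark DataFrame with named columns.
    df = spark.createDataFrame(rows, schema=["city", "temp_c"])

    # Describe the computation; Spark plans and executes it when we collect.
    result = df.groupBy("city").agg(F.avg("temp_c").alias("avg_temp_c"))

    # collect() brings the (small) result back into the Python process.
    return {row["city"]: row["avg_temp_c"] for row in result.collect()}


def main():
    from pyspark.sql import SparkSession

    # "local[*]" runs Spark on this machine using all cores; no cluster needed.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("pyspark-sketch")
        .getOrCreate()
    )
    try:
        print(average_by_city(spark, SAMPLE_ROWS))
    finally:
        # Always release the Spark resources, even if the job fails.
        spark.stop()
```

Calling `main()` runs the job on your own machine. Besides the `pyspark` package, Spark needs a Java runtime to be installed. The Python code only describes the work; the Spark engine itself does the processing.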

Resources

Real Python has a great introduction.

This is a good tutorial to get you started with PySpark. It’ll take you from zero to hero.