Hard Prerequisites
IMPORTANT: Please review these prerequisites. They include important information that will help you with this content.

As a data engineer, you will often need to process large data sets. Apache Spark is an open-source, general-purpose distributed engine for processing big data.
Spark itself is written in Scala, but you can drive it from Python through the PySpark package.
Real Python has a great introduction.
This is a good tutorial to get you started with PySpark. It’ll take you from zero to hero.