Mastering Spark [PART 01]: A Brief Overview of PySpark Logging


This article gives a brief overview of how to write log messages in PySpark using log4j.

Log Properties Configuration

I. Go to the conf folder located in PySpark directory.

$ cd spark-2.4.0-bin-hadoop2.7/conf

II. Modify the log4j.properties.template file by appending these lines:

# Define the root logger with Appender file
log4j.rootLogger=WARN, console

# Define the file appender
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender

# Name of the log file
log4j.appender.FILE.File=/tmp/spark.log

# Set immediate flush to true
log4j.appender.FILE.ImmediateFlush=true

# Set the threshold to DEBUG mode
log4j.appender.FILE.Threshold=debug

# Set File append to true
log4j.appender.FILE.Append=true

# Set the Default Date pattern
log4j.appender.FILE.DatePattern='.' yyyy-MM-dd

# Default layout for the appender
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=%m%n
Then, save the configuration file (log4j.properties.template) and give it a new name:

$ mv log4j.properties.template log4j.properties

You can adjust the configuration file in accordance with your specific needs; please visit the log4j documentation for more information. However, the above configuration is quite sufficient for a simple logging task.

Use Logging in Your PySpark Code

Execute this simple code:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("pyspark-logging")
sc = SparkContext(conf=conf)
log4jLogger = sc._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger(__name__)
log.trace("Trace Message!")
log.debug("Debug Message!")"Info Message!")
log.warn("Warn Message!")
log.error("Error Message!")
log.fatal("Fatal Message!")
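Because the root logger is set to WARN, only the warn, error, and fatal calls above will actually be emitted; trace, debug, and info fall below the threshold. As a quick, Spark-free sketch of the same threshold behavior, here is equivalent filtering with Python's built-in logging module (the logger name and format here are illustrative, not part of the PySpark setup):

```python
import io
import logging

# Capture log output in a buffer so we can inspect it
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("threshold-demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # mirrors log4j.rootLogger=WARN

logger.debug("Debug Message!")    # below the threshold: dropped
logger.info("Info Message!")      # below the threshold: dropped
logger.warning("Warn Message!")   # at the threshold: emitted
logger.error("Error Message!")    # above the threshold: emitted

print(buf.getvalue())
```

Running this prints only the WARNING and ERROR lines; the debug and info calls are silently dropped, just as log4j drops them under a WARN root level.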

If you want to write the log messages into a file, you can modify these property config lines:

log4j.rootLogger=WARN, FILE

FYI, you can use any name for the appender. In this case, we’re using ‘FILE’ as the appender’s name.
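Putting it together, a minimal file-only log4j.properties might look like this (the appender class and log-file path are example choices, not requirements):

```properties
# Route everything the root logger receives to the FILE appender
log4j.rootLogger=WARN, FILE

# Daily rolling file appender; the path is an example, change it as needed
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.File=/tmp/spark.log
log4j.appender.FILE.DatePattern='.' yyyy-MM-dd
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.conversionPattern=%m%n
```

With this in place, rerunning the script writes the warn, error, and fatal messages to the log file instead of the console.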