Matplotlib Tutorial (Part 1): Creating and Customizing Our First Plots

Corey Schafer, 4-minute read

Matplotlib is a Python library for data visualization. This tutorial covers its installation, basic plotting techniques, and the use of sample data from the 2019 Stack Overflow Developer Survey to graph median salaries by age. Future tutorials will explore more complex plotting methods and data handling. The video closes by inviting viewers to subscribe and support the channel.

Insights

  • Matplotlib is a powerful Python library essential for data visualization, enabling users to create various types of plots, such as line graphs, with straightforward commands like `plt.plot()` and customizable features like colors, styles, and markers, making it a vital tool for data science projects.
  • The tutorial uses real-world data from the 2019 Stack Overflow Developer Survey, specifically median salaries by age, to demonstrate plotting techniques. The resulting plot shows a significant salary gap in favor of Python developers between ages 25 and 35. Viewers are also encouraged to explore additional plot types and to engage with the content through likes and subscriptions.

Recent questions

  • What is Matplotlib used for?

    Matplotlib is a Python library designed for creating visualizations, making it essential in data science for effectively graphing data. It provides a wide range of plotting techniques that allow users to represent data visually, which is crucial for analysis and interpretation. By utilizing Matplotlib, data scientists can create various types of plots, such as line graphs, bar charts, and scatter plots, to convey insights and trends in their data. This capability enhances the understanding of complex datasets, making it easier to communicate findings to others.

  • How do I install Matplotlib?

    To install Matplotlib, you can use the command `pip install matplotlib` in your terminal. It is advisable to create a virtual environment for your new projects to keep dependencies organized and avoid conflicts with other packages. While setting up a virtual environment is not mandatory, it is a best practice that helps maintain a clean workspace. Once installed, you can start using Matplotlib in your Python scripts or interactive environments, allowing you to create visualizations for your data analysis tasks.

  • What is the purpose of `plt.show()`?

    The `plt.show()` function in Matplotlib is used to display the plot that has been created. After you have defined your data and plotted it using commands like `plt.plot()`, calling `plt.show()` will render the visualization in a window, allowing you to see the graphical representation of your data. This function is essential for visualizing the results of your plotting commands, as it triggers the graphical user interface to present the plot. Without this command, the plot may not appear, especially when running scripts outside of interactive environments like Jupyter Notebooks.

  • How can I customize plot colors?

    You can customize plot colors in Matplotlib by specifying color options in the plotting functions. For instance, you can set the color of lines using the `color` argument, such as `color='blue'` for a blue line. Additionally, you can use hex color values for more precise color choices, like `#5A7D9E` for a specific shade. Matplotlib also allows you to change line styles and thickness, enhancing the visual appeal of your plots. By adjusting these parameters, you can create more informative and visually engaging representations of your data.

  • What are built-in styles in Matplotlib?

    Built-in styles in Matplotlib are predefined visual themes that allow users to quickly change the appearance of their plots. You can list them with `plt.style.available`, which includes names such as 'fivethirtyeight' (the FiveThirtyEight look), 'ggplot', and 'seaborn' (named 'seaborn-v0_8' in recent Matplotlib releases). To apply a style, call `plt.style.use('style_name')`, replacing 'style_name' with one of the listed names. These styles improve the look of your visualizations and save the time it would take to format plots from scratch.
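The style lookup described in the last answer can be sketched as follows. This is a minimal example, not code from the video: the helper `pick_style` and its `preferred` parameter are hypothetical names introduced here, and the fallback to `"default"` is an assumption about what you would want when none of the preferred styles is installed.

```python
from matplotlib import pyplot as plt


def pick_style(preferred=("fivethirtyeight", "ggplot")):
    """Return the first preferred style this Matplotlib install provides.

    `pick_style` is a hypothetical helper for illustration only.
    """
    for name in preferred:
        if name in plt.style.available:
            return name
    # "default" restores Matplotlib's stock appearance.
    return "default"


# plt.style.available is a plain list of style-name strings.
print(sorted(plt.style.available))

# Apply the chosen style before drawing any plots.
plt.style.use(pick_style())
```

Checking `plt.style.available` first avoids an error when a style name differs between Matplotlib versions.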

Summary

00:00

Visualizing Data with Matplotlib in Python

  • Matplotlib is a Python library used for creating visualizations, essential in data science for graphing data effectively. The series will cover various plotting techniques using this library.
  • To install Matplotlib, use the command `pip install matplotlib` in the terminal. It's recommended to create a virtual environment for new projects, though it's not mandatory.
  • The tutorial will utilize Sublime Text for coding, but Jupyter Notebooks can also be used for interactive plotting. A separate video on Jupyter usage is available for interested viewers.
  • Begin by importing the `pyplot` module from Matplotlib using `from matplotlib import pyplot as plt`, a common convention to simplify code.
  • Sample data from the 2019 Stack Overflow Developer Survey will be used, specifically median salaries by age. A link to the data files will be provided for viewers to follow along.
  • Create a list for the x-axis (`dev_X`) holding the ages 25 through 35, and a corresponding list for the y-axis (`dev_Y`) holding the median salary at each age. These two lists are what gets plotted.
  • Use `plt.plot(dev_X, dev_Y)` to create a basic line plot. To display the plot, include `plt.show()` after the plotting command.
  • Add a title and axis labels to the plot using `plt.title("Median Salary by Age in USD")`, `plt.xlabel("Ages")`, and `plt.ylabel("Median Salary in USD")` for clarity.
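  • To plot additional data, such as median salaries for Python developers, create new lists (`pydev_X`, `pydev_Y`). Because the ages are identical, the existing x-axis list can be reused instead of duplicating it.
  • Add a legend to tell the plotted lines apart, either with `plt.legend(["All Devs", "Python"])` or by passing a `label` argument to each `plt.plot()` call and then calling `plt.legend()` with no arguments. The `label` approach is less error-prone because the legend no longer depends on the order in which the lines were plotted.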
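The steps in this section can be put together roughly as below. This is a sketch under stated assumptions: the salary numbers are placeholders made up for illustration, not the survey figures, and the function name `draw_salary_plot` is hypothetical. The list names follow the summary above.

```python
from matplotlib import pyplot as plt

# Ages 25 through 35 on the x-axis, shared by both lines.
dev_X = list(range(25, 36))

# PLACEHOLDER salaries (not the real survey data), one value per age.
dev_Y = [38000, 42000, 46000, 49000, 53000, 56000,
         62000, 64000, 67000, 68000, 73000]
pydev_Y = [45000, 48000, 53000, 57000, 63000, 65000,
           70000, 70000, 71000, 75000, 83000]


def draw_salary_plot():
    """Plot both salary lines with a title, axis labels and a legend."""
    # label= ties each legend entry to its own line.
    plt.plot(dev_X, dev_Y, label="All Devs")
    plt.plot(dev_X, pydev_Y, label="Python")

    plt.title("Median Salary by Age in USD")
    plt.xlabel("Ages")
    plt.ylabel("Median Salary in USD")

    plt.legend()  # no arguments needed: it reads the labels set above
    return plt.gcf()


if __name__ == "__main__":
    draw_salary_plot()
    plt.show()  # opens the plot window when run as a script
```

Wrapping the plotting calls in a function is a choice made for this sketch so the figure can be built without immediately opening a window. In a short script the same calls can sit at the top level.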

13:37

Enhancing Plot Clarity and Style in Python

  • Passing a label to each plot call makes the code easier to read and self-documenting, since every line states what it plots.
  • To change line colors and styles, the all-developers line is set to gray and the Python line to blue, echoing Python's branding.
  • A format string can be used to specify line color and style in one argument, such as `'k--'`, where 'k' means black and '--' means a dashed line.
  • For better readability, the developer prefers passing color and line style as keyword arguments instead of using format strings, e.g., `color='k'` and `linestyle='--'`.
  • Markers can be added to lines by specifying `marker` in the plot method, with options like dots or circles available from the formatting page.
  • Hex color values provide more color options; for example, a gray line can use the hex value `#444444`, while Python's line can use `#5A7D9E`.
  • Additional data for JavaScript developer salaries is incorporated, with a yellowish color represented by the hex value `#ADAD3B`.
  • Line thickness can be adjusted using `linewidth=3` to emphasize the lines for specific languages, while the all-developers line keeps the default width.
  • To improve plot appearance, `plt.tight_layout()` is used to adjust padding, ensuring no data is cut off, especially on smaller screens.
  • Built-in styles in Matplotlib can be listed with `plt.style.available`, allowing users to experiment with styles like 'fivethirtyeight', 'seaborn' (named 'seaborn-v0_8' in recent Matplotlib releases), and 'ggplot' for visual enhancement.
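The customization options in this section might be combined as in the sketch below. The salary values are placeholders made up for illustration, not the survey figures, and `jsdev_Y` and `draw_styled_plot` are hypothetical names. The hex colors are the ones quoted above.

```python
from matplotlib import pyplot as plt

dev_X = list(range(25, 36))  # ages 25 through 35

# PLACEHOLDER salaries (not the real survey data), one value per age.
dev_Y = [38000, 42000, 46000, 49000, 53000, 56000,
         62000, 64000, 67000, 68000, 73000]
pydev_Y = [45000, 48000, 53000, 57000, 63000, 65000,
           70000, 70000, 71000, 75000, 83000]
jsdev_Y = [37000, 43000, 46000, 49000, 53000, 56000,
           62000, 66000, 68000, 68000, 74000]


def draw_styled_plot():
    """Draw three salary lines with explicit colors, styles and widths."""
    # Keyword arguments; the format-string shorthand would be
    # plt.plot(dev_X, dev_Y, "k--") for a black dashed line.
    plt.plot(dev_X, dev_Y, color="#444444", linestyle="--", label="All Devs")

    # Thicker lines and a circle marker emphasize the language-specific data.
    plt.plot(dev_X, pydev_Y, color="#5A7D9E", linewidth=3, marker="o",
             label="Python")
    plt.plot(dev_X, jsdev_Y, color="#ADAD3B", linewidth=3, label="JavaScript")

    plt.legend()
    plt.tight_layout()  # adjust padding so labels are not cut off
    return plt.gcf()


if __name__ == "__main__":
    draw_styled_plot()
    plt.show()
```

Note that the keyword names are single words (`linestyle`, `linewidth`). Matplotlib also accepts the short aliases `ls` and `lw`.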

27:12

Matplotlib Plotting Techniques and Insights

  • To change the plot style in Matplotlib, use `plt.style.use('style_name')`, replacing 'style_name' with options like 'fivethirtyeight' or 'ggplot' for different visual appearances.
  • The `plt.xkcd()` method mimics the style of xkcd comics, creating hand-drawn-like plots with squiggly lines, ideal for informal presentations or blogs.
  • Save plots programmatically using `plt.savefig('filename.png')`, which writes the current figure to a PNG file. A bare filename is saved relative to the current working directory, and a full path can be given to save elsewhere.
  • Median salary data for ages 18 to 55 was plotted, revealing a significant salary gap for Python developers primarily between ages 25 and 35, with other languages catching up later.
  • Future tutorials will cover creating various plot types, including bar charts, pie charts, and scatter plots, as well as loading data from CSV files for more complex visualizations.
  • Viewers are encouraged to engage by liking, sharing, and subscribing for future content, with options to support the channel through Patreon for those interested.
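Applying a style and saving the figure, as described in this section, could look like the sketch below. The data are placeholders made up for illustration, and `save_salary_plot` with its `path` and `style` parameters is a hypothetical helper, not code from the video.

```python
import os
import tempfile

from matplotlib import pyplot as plt

dev_X = list(range(25, 36))  # ages 25 through 35

# PLACEHOLDER salaries (not the real survey data), one value per age.
dev_Y = [38000, 42000, 46000, 49000, 53000, 56000,
         62000, 64000, 67000, 68000, 73000]


def save_salary_plot(path, style="ggplot"):
    """Draw a small line plot in the given style and save it to `path`."""
    plt.style.use(style)  # set the style before anything is drawn
    plt.figure()          # start from a fresh figure
    plt.plot(dev_X, dev_Y, label="All Devs")
    plt.title("Median Salary by Age in USD")
    plt.legend()
    plt.tight_layout()
    plt.savefig(path)     # file format is inferred from the extension
    plt.close()           # release the figure once it is on disk
    return path


if __name__ == "__main__":
    # Writing to the temp directory is an arbitrary choice for this sketch.
    out = os.path.join(tempfile.gettempdir(), "plot.png")
    print(save_salary_plot(out))
```

For the hand-drawn look, `plt.xkcd()` can be called in place of `plt.style.use(style)`. It can also be used as a context manager (`with plt.xkcd():`) so the effect stays limited to one plot.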