
A Summary of Commonly Used Python Plotting Code: 20 Plotting Snippets, Ready to Use

posted on 2023-06-03 19:55


Table of contents

foreword

1. Scatter plot

2. Bubble chart with border

3. Scatter plot with line of best fit for linear regression

4. Jitter diagram

5. Count chart

6. Marginal histogram

7. Marginal box plot

8. Correlation diagram

9. Pairwise plot

10. Divergent bar chart

11. Divergent text

12. Diverging dot plot

13. Marked divergent lollipop chart

14. Area chart

15. Ordered bar chart

16. Lollipop chart

17. Dot plot

18. Slope chart

19. Dumbbell diagram

20. Histogram of continuous variables


foreword

A summary of commonly used Python plotting code, ready to use!

Hello everyone, today I would like to share a summary of 20 Matplotlib graphs that are very useful in data analysis and visualization. Bookmark them and practice at your own pace.


1. Scatter plot

A scatterplot is a classic and fundamental plot for studying the relationship between two variables. If your data contains multiple groups, you may want to visualize each group in a different color. In Matplotlib, you can conveniently do this by calling plt.scatter() once per group.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Import dataset
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")
    # Prepare data: create as many colors as there are unique midwest['category'] values
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i / float(len(categories) - 1)) for i in range(len(categories))]
    # Draw one scatter call per category so each group gets its own color and legend entry
    plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')
    for i, category in enumerate(categories):
        plt.scatter('area', 'poptotal',
                    data=midwest.loc[midwest.category == category, :],
                    s=20, color=colors[i], label=str(category))
    # Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')
    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
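If you prefer a single scatter call instead of one call per category, matplotlib's c= argument also accepts integer category codes together with a colormap (the marginal histogram and boxplot examples below color points by manufacturer this way). The sketch shows only the data-preparation step, on a hypothetical toy Series standing in for midwest['category'].

```python
import pandas as pd

# Hypothetical toy labels standing in for midwest['category'] (not real data)
category = pd.Series(["AAR", "LHR", "AAR", "HAU", "LHR"])

# astype("category") collects the sorted unique labels; .cat.codes maps each row
# to the position of its label (AAR -> 0, HAU -> 1, LHR -> 2)
as_cat = category.astype("category")
codes = as_cat.cat.codes
labels = list(as_cat.cat.categories)

# With matplotlib, the codes would then be passed as, e.g.:
#   plt.scatter(x, y, c=codes, cmap="tab10")
print(codes.tolist())   # [0, 2, 0, 1, 2]
```

Looping per category (as above) gives you a legend entry per group for free; the code-based approach is shorter but needs the legend built separately.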


2. Bubble chart with border

Sometimes you want to draw a boundary around a group of points to emphasize their importance. In this example, you select the records from the dataframe that should be encircled and pass them to the encircle() function defined in the code below.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns
    from scipy.spatial import ConvexHull
    import warnings; warnings.simplefilter('ignore')
    sns.set_style("white")
    # Step 1: Prepare data
    midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")
    # As many colors as there are unique midwest['category'] values
    categories = np.unique(midwest['category'])
    colors = [plt.cm.tab10(i / float(len(categories) - 1)) for i in range(len(categories))]
    # Step 2: Draw a scatterplot with a unique color for each category;
    # the marker area is read from the dataset's 'dot_size' column
    fig = plt.figure(figsize=(16, 10), dpi=80, facecolor='w', edgecolor='k')
    for i, category in enumerate(categories):
        plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category == category, :],
                    s='dot_size', color=colors[i], label=str(category),
                    edgecolors='black', linewidths=.5)
    # Step 3: Encircling
    # https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
    def encircle(x, y, ax=None, **kw):
        """Draw the convex hull of the points (x, y) as a polygon patch."""
        if ax is None:
            ax = plt.gca()
        p = np.c_[x, y]
        hull = ConvexHull(p)
        poly = plt.Polygon(p[hull.vertices, :], **kw)
        ax.add_patch(poly)
    # Select the data to be encircled
    midwest_encircle_data = midwest.loc[midwest.state == 'IN', :]
    # Draw a filled polygon, then an outline, around the selected points
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
    encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)
    # Step 4: Decorations
    plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
                  xlabel='Area', ylabel='Population')
    plt.xticks(fontsize=12); plt.yticks(fontsize=12)
    plt.title("Bubble Plot with Encircling", fontsize=22)
    plt.legend(fontsize=12)
    plt.show()
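encircle() relies on scipy.spatial.ConvexHull, whose hull.vertices are the indices of the points forming the smallest convex polygon around the data. To show what that polygon is, here is a small pure-Python sketch using Andrew's monotone chain algorithm. This is a swapped-in technique for illustration only; SciPy itself wraps the Qhull library. The function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # Pop points that would make a clockwise (or straight) turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other chain
    return lower[:-1] + upper[:-1]

# A square with one interior point: the interior point is not on the hull
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
print(hull)   # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Passing these vertices, in order, to plt.Polygon is what turns the hull into the shaded region around the selected points.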


3. Scatter plot with line of best fit for linear regression

If you want to understand how two variables change with respect to each other, a line of best fit is the way to go. The following plot shows how the line of best fit differs between groups in the data. To disable grouping and draw a single line of best fit for the entire dataset, remove the hue="cyl" argument from the sns.lmplot() call below.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Import data and keep only 4- and 8-cylinder cars
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4, 8]), :]
    # Plot: hue="cyl" fits and draws a separate regression line per cylinder group
    sns.set_style("white")
    gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select,
                         height=7, aspect=1.6, robust=True, palette='tab10',
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))
    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
    plt.show()

 

Each regression line is in its own column

Alternatively, you can display the line of best fit for each group in its own column by setting the col="cyl" parameter in sns.lmplot().

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Import data and keep only 4- and 8-cylinder cars
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_select = df.loc[df.cyl.isin([4, 8]), :]
    # Each line in its own column: col="cyl" creates one subplot per cylinder group
    sns.set_style("white")
    gridobj = sns.lmplot(x="displ", y="hwy",
                         data=df_select,
                         height=7,
                         robust=True,
                         palette='Set1',
                         col="cyl",
                         scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))
    # Decorations
    gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
    plt.show()
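Each regression line in these plots is a line of best fit computed separately for one cyl group. The sketch below reproduces that idea with ordinary least squares via np.polyfit. Note that the calls above pass robust=True, which makes seaborn fit a robust regression (this requires statsmodels) that down-weights outliers, so its lines can differ from plain least squares. The toy DataFrame and the helper name fit_per_group are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_select (columns displ, hwy, cyl)
toy = pd.DataFrame({
    "displ": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "hwy":   [30.0, 28.0, 26.0, 20.0, 17.0, 14.0],
    "cyl":   [4, 4, 4, 8, 8, 8],
})

def fit_per_group(df, x, y, group):
    """Ordinary least-squares slope and intercept of y ~ x for each value of `group`."""
    fits = {}
    for key, sub in df.groupby(group):
        slope, intercept = np.polyfit(sub[x], sub[y], deg=1)
        fits[key] = (float(slope), float(intercept))
    return fits

fits = fit_per_group(toy, "displ", "hwy", "cyl")
print(fits)
```

Dropping the grouping and calling np.polyfit once on the whole frame corresponds to removing hue="cyl" from the first plot.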

4. Jitter diagram

Often, multiple data points have exactly the same X and Y values. As a result, those points are drawn on top of each other and hidden. To avoid this, jitter the points slightly so you can see each of them.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Import data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    # Draw a stripplot; recent seaborn versions require x= and y= as keyword arguments
    fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
    sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)
    # Decorations
    plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
    plt.show()
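Jittering simply adds a small random offset to each point so identical values no longer sit exactly on top of each other. A minimal NumPy sketch, assuming uniform noise of half-width width along one axis (the function name jitter and the sample values are hypothetical; sns.stripplot applies its own jitter, along the categorical axis only):

```python
import numpy as np

def jitter(values, width=0.25, seed=0):
    """Return values shifted by uniform random noise in [-width, width)."""
    rng = np.random.default_rng(seed)   # fixed seed so the result is reproducible
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width, width, size=values.shape)

cty = [18, 18, 18, 21, 21]   # hypothetical repeated x values
jittered = jitter(cty)
print(jittered.round(2))
```

Keep width well below half the spacing between distinct values, otherwise jittered points from neighboring values start to mix.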

5. Count chart

Another way to avoid the overlapping-points problem is to increase the size of each point according to how many points overlap at that location. The larger the point, the more data points are concentrated there.

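    import matplotlib.pyplot as plt
    import pandas as pd

    # Import data and count how many rows share each (hwy, cty) pair
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')
    # Draw a scatterplot whose marker area grows with the number of overlapping points
    # (sns.stripplot only accepts a single scalar marker size)
    fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
    ax.scatter(df_counts.cty, df_counts.hwy, s=df_counts.counts * 40, alpha=0.7)
    ax.set(xlabel='cty', ylabel='hwy')
    # Decorations
    plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
    plt.show()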
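The counting step is the heart of this chart: groupby([...]).size() returns one row per distinct (hwy, cty) pair together with the number of rows that share it. A quick check on a hypothetical six-row DataFrame:

```python
import pandas as pd

# Hypothetical toy data: three cars share (hwy=29, cty=18), two share (29, 21)
toy = pd.DataFrame({"cty": [18, 18, 18, 21, 21, 16],
                    "hwy": [29, 29, 29, 29, 29, 26]})

# One row per distinct (hwy, cty) pair, sorted by the group keys, plus its row count
df_counts = toy.groupby(["hwy", "cty"]).size().reset_index(name="counts")
print(df_counts)
```

These counts are what the chart maps to marker size.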

 

6. Marginal histogram

A marginal histogram adds histograms of the X and Y variables along the edges of a scatterplot. It shows the relationship between X and Y together with the univariate distribution of each. This plot is often used in exploratory data analysis (EDA).

    import matplotlib.pyplot as plt
    import pandas as pd

    # Import data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    # Create figure and a 4x4 gridspec shared by the main plot and the two margins
    fig = plt.figure(figsize=(16, 10), dpi=80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)
    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])
    # Scatterplot on the main axes: color by manufacturer code, size by cty
    ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)
    # Histogram at the bottom (distribution of X), flipped so it hangs below the main plot
    ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
    ax_bottom.invert_yaxis()
    # Histogram on the right (distribution of Y)
    ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')
    # Decorations
    ax_main.set(title='Scatterplot with Histograms\ndispl vs hwy', xlabel='displ', ylabel='hwy')
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)
    plt.show()

7. Marginal box plot

Marginal boxplots serve a similar purpose to marginal histograms. However, boxplots help pinpoint the medians and the 25th and 75th percentiles of X and Y.

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Import data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
    # Create figure and gridspec
    fig = plt.figure(figsize=(16, 10), dpi=80)
    grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)
    # Define the axes
    ax_main = fig.add_subplot(grid[:-1, :-1])
    ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
    ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])
    # Scatterplot on the main axes
    ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)
    # Add a boxplot in each margin: hwy (Y) on the right, displ (X) at the bottom
    sns.boxplot(y=df.hwy, ax=ax_right)
    sns.boxplot(x=df.displ, ax=ax_bottom)
    # Decorations ------------------
    # Remove the axis names of the boxplots
    ax_bottom.set(xlabel='')
    ax_right.set(ylabel='')
    # Main title, xlabel and ylabel
    ax_main.set(title='Scatterplot with Boxplots\ndispl vs hwy', xlabel='displ', ylabel='hwy')
    # Set font size of different components
    ax_main.title.set_fontsize(20)
    for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
        item.set_fontsize(14)
    plt.show()
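To see exactly what the marginal boxes encode, the sketch below computes the same summary statistics with NumPy: the quartiles, plus whiskers using the conventional rule of 1.5 times the interquartile range (IQR), which is the default whis=1.5 in matplotlib's and seaborn's boxplots. The function name box_stats and the sample values are hypothetical.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers as drawn by a standard boxplot."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Anything beyond the fences is drawn as an individual outlier point
        "outliers": v[(v < low_fence) | (v > high_fence)].tolist(),
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```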

 

8. Correlation diagram

Correlogram is used to visualize the correlation measures between all possible pairs of numerical variables in a given data frame (or 2D array).

 

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Import dataset
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    # mtcars has a text column ('cars'), so correlate the numeric columns only
    corr = df.corr(numeric_only=True)
    # Plot
    plt.figure(figsize=(12, 10), dpi=80)
    sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)
    # Decorations
    plt.title('Correlogram of mtcars', fontsize=22)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.show()
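DataFrame.corr() computes pairwise Pearson correlations by default. Because mtcars includes the text column cars, pandas 2.0 and later raise an error unless non-numeric columns are excluded; the numeric_only=True argument (available since pandas 1.5) does that. A check on a hypothetical three-column frame:

```python
import pandas as pd

# Hypothetical mini version of mtcars: one text column plus two numeric columns
toy = pd.DataFrame({
    "cars": ["car_a", "car_b", "car_c", "car_d"],
    "mpg": [21.0, 22.8, 18.7, 30.4],
    "wt": [2.62, 2.32, 3.44, 1.615],
})

# Only numeric columns enter the correlation matrix; the 'cars' column is skipped
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

The result is symmetric with ones on the diagonal, which is why the heatmap mirrors itself across the diagonal.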

 

9. Pairwise plot

Pairplots are a favorite in exploratory analysis to understand the relationship between all possible pairs of numeric variables. It is an essential tool for bivariate analysis.

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load dataset
    df = sns.load_dataset('iris')
    # Plot (pairplot creates its own figure, so no plt.figure() call is needed)
    sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
    plt.show()

 

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load dataset
    df = sns.load_dataset('iris')
    # Plot: kind="reg" adds a regression line to every off-diagonal panel
    sns.pairplot(df, kind="reg", hue="species")
    plt.show()

 

10. Divergent bar chart

Diverging bars are a great tool when you want to see how items vary based on a single metric and visualize the order and magnitude of that difference. Here, each car's mileage is converted to a z-score (how many standard deviations it lies from the mean), so bars extend left for below-average cars and right for above-average ones. This makes performance differences between groups quick to spot and easy to communicate.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: standardize mpg into z-scores, color by sign, sort
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df['mpg']
    df['mpg_z'] = (x - x.mean()) / x.std()
    df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)
    # Draw plot: one horizontal bar from 0 to each car's z-score
    plt.figure(figsize=(14, 10), dpi=80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)
    # Decorations
    plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Bars of Car Mileage', fontdict={'size': 20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()
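The bar lengths are z-scores: each value minus the mean, divided by the standard deviation. Note that pandas' Series.std() returns the sample standard deviation (ddof=1) by default. A tiny check with hypothetical values:

```python
import pandas as pd

mpg = pd.Series([10.0, 20.0, 30.0])       # hypothetical mileages
mpg_z = (mpg - mpg.mean()) / mpg.std()    # mean 20, sample std 10
colors = ["red" if z < 0 else "green" for z in mpg_z]
print(mpg_z.tolist(), colors)             # [-1.0, 0.0, 1.0] ['red', 'green', 'green']
```

A value exactly at the mean gets z = 0 and is colored green, so it is drawn as a zero-length bar on the "above average" side.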

 

11. Divergent text

Diverging text is similar to diverging bars. It is preferred when you want to show the value of each item in the chart in a clean, presentable way.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: standardize mpg into z-scores and sort
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df['mpg']
    df['mpg_z'] = (x - x.mean()) / x.std()
    df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)
    # Draw plot: thin bars, with the rounded z-score printed at each bar's tip
    plt.figure(figsize=(14, 14), dpi=80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left',
                     verticalalignment='center', fontdict={'color': 'red' if x < 0 else 'green', 'size': 14})
    # Decorations
    plt.yticks(df.index, df.cars, fontsize=12)
    plt.title('Diverging Text Bars of Car Mileage', fontdict={'size': 20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()

12. Diverging dot plot

A diverging dot plot is also similar to diverging bars. However, without the bars, the contrast and differences between groups are less pronounced than with diverging bars.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: standardize mpg into z-scores, color by sign, sort
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df['mpg']
    df['mpg_z'] = (x - x.mean()) / x.std()
    df['colors'] = ['red' if z < 0 else 'darkgreen' for z in df['mpg_z']]
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)
    # Draw plot: large dots with the rounded z-score written inside
    plt.figure(figsize=(14, 16), dpi=80)
    plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
    for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):
        t = plt.text(x, y, round(tex, 1), horizontalalignment='center',
                     verticalalignment='center', fontdict={'color': 'white'})
    # Decorations
    # Lighten borders
    for side in ["top", "bottom", "right", "left"]:
        plt.gca().spines[side].set_alpha(.3)
    plt.yticks(df.index, df.cars)
    plt.title('Diverging Dotplot of Car Mileage', fontdict={'size': 20})
    plt.xlabel('$Mileage$')
    plt.grid(linestyle='--', alpha=0.5)
    plt.xlim(-2.5, 2.5)
    plt.show()

 

13. Marked divergent lollipop chart

Lollipop charts with markers provide a flexible way to visualize divergence: you can highlight any important data points you want to draw attention to and explain them with annotations directly in the graph.

    import matplotlib.patches as patches
    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: standardize mpg into z-scores
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
    x = df['mpg']
    df['mpg_z'] = (x - x.mean()) / x.std()
    df['colors'] = 'black'
    # Color the Fiat differently
    df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
    df.sort_values('mpg_z', inplace=True)
    df.reset_index(inplace=True)
    # Draw plot: thin stems plus markers; the highlighted car gets a bigger marker
    plt.figure(figsize=(14, 16), dpi=80)
    plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
    plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
    plt.yticks(df.index, df.cars)
    plt.xticks(fontsize=12)
    # Annotate
    plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data',
                 fontsize=15, ha='center', va='center',
                 bbox=dict(boxstyle='square', fc='firebrick'),
                 arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')
    # Add patches to shade regions of interest
    p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
    p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
    plt.gca().add_patch(p1)
    plt.gca().add_patch(p2)
    # Decorate
    plt.title('Diverging Lollipop Chart of Car Mileage', fontdict={'size': 20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()

 

14. Area chart

By coloring the area between the axes and lines, area charts emphasize not only peaks and troughs, but also the duration of highs and lows. The longer the high lasts, the larger the area below the line.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Prepare data: first 100 months of the economics dataset
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
    x = np.arange(df.shape[0])
    # Month-over-month % change of the personal savings rate (first value set to 0)
    y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100
    # Plot: fill green above zero and red below zero
    plt.figure(figsize=(16, 10), dpi=80)
    plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
    plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)
    # Annotate
    plt.annotate('Peak\n1975', xy=(94.0, 21.0), xytext=(88.0, 28),
                 bbox=dict(boxstyle='square', fc='firebrick'),
                 arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')
    # Decorations: label every 6th month as e.g. "JAN-1968"
    xtickvals = [str(m)[:3].upper()+"-"+str(y) for y, m in zip(df.date.dt.year, df.date.dt.month_name())]
    plt.gca().set_xticks(x[::6])
    plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'})
    plt.ylim(-35, 35)
    plt.xlim(1, 100)
    plt.title("Month Economics Return %", fontsize=22)
    plt.ylabel('Monthly returns %')
    plt.grid(alpha=0.5)
    plt.show()
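The y_returns line computes the month-over-month percentage change of psavert, with the first month set to 0. Assuming psavert has no missing values, pandas' built-in pct_change() gives the same numbers; the sketch below checks this on a hypothetical four-month series.

```python
import numpy as np
import pandas as pd

psavert = pd.Series([10.0, 12.0, 9.0, 9.0])   # hypothetical savings rates

# The expression used in the area chart above
original = (psavert.diff().fillna(0) / psavert.shift(1)).fillna(0) * 100
# Built-in percentage change; the first element has no predecessor, so fill it with 0
simpler = psavert.pct_change().fillna(0) * 100

print(original.tolist())   # [0.0, 20.0, -25.0, 0.0]
```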

 

 

15. Ordered bar chart

An ordered bar chart effectively communicates the rank order of items. Adding the value of the metric above each bar also lets the reader get precise numbers from the chart itself.

    import matplotlib.patches as patches
    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: mean city mileage (cty) per manufacturer, sorted ascending
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
    df.sort_values('cty', inplace=True)
    df.reset_index(drop=True, inplace=True)
    # Draw plot: thick vertical lines act as bars
    fig, ax = plt.subplots(figsize=(16, 10), facecolor='white', dpi=80)
    ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)
    # Annotate each bar with its value
    for i, cty in enumerate(df.cty):
        ax.text(i, cty + 0.5, round(cty, 1), horizontalalignment='center')
    # Title, label, ticks and ylim
    ax.set_title('Bar Chart for City Mileage', fontdict={'size': 22})
    ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))
    plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)
    # Add patches (in figure coordinates) to shade the X axis label area
    p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)
    p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)
    fig.add_artist(p1)
    fig.add_artist(p2)
    plt.show()
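The data preparation behind the ordered bar, lollipop and dot charts boils down to "group, average, sort". A minimal pandas sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data (not the real mpg_ggplot2.csv values)
toy = pd.DataFrame({
    "manufacturer": ["audi", "audi", "honda", "honda", "ford"],
    "cty": [18, 20, 28, 24, 14],
})

# Mean cty per manufacturer, then sort ascending and renumber rows 0..n-1
ordered = (toy.groupby("manufacturer", as_index=False)["cty"].mean()
              .sort_values("cty")
              .reset_index(drop=True))
print(ordered)
```

After reset_index(drop=True), the row number doubles as the x position of each bar, which is why the plotting code can use df.index directly.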

 

16. Lollipop chart

Lollipop charts serve a similar purpose to ordered bar charts in a visually pleasing manner.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: mean city mileage (cty) per manufacturer, sorted ascending
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
    df.sort_values('cty', inplace=True)
    df.reset_index(drop=True, inplace=True)
    # Draw plot: a thin stem plus a dot on top for each manufacturer
    fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
    ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)
    ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)
    # Title, label, ticks and ylim
    ax.set_title('Lollipop Chart for City Mileage', fontdict={'size': 22})
    ax.set_ylabel('Miles Per Gallon')
    ax.set_xticks(df.index)
    ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size': 12})
    ax.set_ylim(0, 30)
    # Annotate each lollipop with its value
    for row in df.itertuples():
        ax.text(row.Index, row.cty + .5, s=round(row.cty, 2), horizontalalignment='center', verticalalignment='bottom', fontsize=14)
    plt.show()

17. Dot plot

The dot chart conveys the rank order of the items. Since it's aligned along the horizontal axis, you can more easily see how far the points are from each other.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Prepare data: mean city mileage (cty) per manufacturer, sorted ascending
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    df = df_raw.groupby('manufacturer', as_index=False)['cty'].mean()
    df.sort_values('cty', inplace=True)
    df.reset_index(drop=True, inplace=True)
    # Draw plot: dashed guide lines across the chart, one dot per manufacturer
    fig, ax = plt.subplots(figsize=(16, 10), dpi=80)
    ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')
    ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)
    # Title, label, ticks and xlim
    ax.set_title('Dot Plot for City Mileage', fontdict={'size': 22})
    ax.set_xlabel('Miles Per Gallon')
    ax.set_yticks(df.index)
    ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})
    ax.set_xlim(10, 27)
    plt.show()

 

18. Slope chart

Slope charts are best for comparing the "before" and "after" positions of a given person/item.

    import matplotlib.lines as mlines
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Import data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")
    # Draw a line segment between two points: red if the value fell, green otherwise
    # https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941
    def newline(p1, p2):
        ax = plt.gca()
        l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color='red' if p1[1] - p2[1] > 0 else 'green', marker='o', markersize=6)
        ax.add_line(l)
        return l
    fig, ax = plt.subplots(1, 1, figsize=(14, 14), dpi=80)
    # Vertical lines marking the "before" (x=1) and "after" (x=3) positions
    ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
    ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
    # Points
    ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
    ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)
    # Line segments and annotations
    for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):
        newline([1, p1], [3, p2])
        ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size': 14})
        ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size': 14})
    # 'Before' and 'After' annotations
    ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size': 18, 'weight': 700})
    ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size': 18, 'weight': 700})
    # Decoration
    ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size': 22})
    ax.set(xlim=(0, 4), ylim=(0, 14000), ylabel='Mean GDP Per Capita')
    ax.set_xticks([1, 3])
    ax.set_xticklabels(["1952", "1957"])
    plt.yticks(np.arange(500, 13000, 2000), fontsize=12)
    # Hide the borders
    for side in ["top", "bottom", "right", "left"]:
        plt.gca().spines[side].set_alpha(.0)
    plt.show()
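newline() picks each segment's color from the direction of change. The same rule can be applied to a whole column at once with np.where, for example to build a summary table or a custom legend. The three-row DataFrame below is hypothetical, not the real gdppercap.csv values.

```python
import numpy as np
import pandas as pd

gdp = pd.DataFrame({
    "continent": ["A", "B", "C"],
    "1952": [1000.0, 5000.0, 3000.0],
    "1957": [1200.0, 4500.0, 3000.0],
})

# Same rule as newline(): red when the value fell, green otherwise (including no change)
gdp["color"] = np.where(gdp["1952"] - gdp["1957"] > 0, "red", "green")
print(gdp["color"].tolist())   # ['green', 'red', 'green']
```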

 

19. Dumbbell diagram

The dumbbell chart communicates the "before" and "after" positions of various items, as well as their rank order. It is useful if you want to visualize the effect of a particular project or plan on different objects.
 

    import matplotlib.lines as mlines
    import matplotlib.pyplot as plt
    import pandas as pd

    # Import data sorted by the 2014 value; after reset_index the row number is the sorted rank
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
    df.sort_values('pct_2014', inplace=True)
    df.reset_index(drop=True, inplace=True)
    # Function to draw a line segment between two points
    def newline(p1, p2):
        ax = plt.gca()
        l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color='skyblue')
        ax.add_line(l)
        return l
    # Figure and axes
    fig, ax = plt.subplots(1, 1, figsize=(14, 14), facecolor='#f7f7f7', dpi=80)
    # Vertical reference lines at 5%, 10%, 15% and 20%
    for xv in [.05, .10, .15, .20]:
        ax.vlines(x=xv, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
    # Points: dark blue = 2013, light blue = 2014
    ax.scatter(y=df.index, x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
    ax.scatter(y=df.index, x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)
    # Line segments joining each item's 2013 and 2014 values
    for i, p1, p2 in zip(df.index, df['pct_2013'], df['pct_2014']):
        newline([p1, i], [p2, i])
    # Decoration
    ax.set_facecolor('#f7f7f7')
    ax.set_title("Dumbbell Chart: Pct Change - 2013 vs 2014", fontdict={'size': 22})
    ax.set(xlim=(0, .25), ylim=(-1, 27))
    ax.set_xticks([.05, .1, .15, .20])
    ax.set_xticklabels(['5%', '10%', '15%', '20%'])
    plt.show()

 

20. Histogram of continuous variables

A histogram shows the frequency distribution of a given variable. The version below stacks the frequency bars by a categorical variable, so you can see how each category contributes to the distribution of the continuous variable.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Import data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    # Prepare data: one list of x values per group
    x_var = 'displ'
    groupby_var = 'class'
    df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
    group_names = [name for name, _ in df_agg]
    vals = [grp[x_var].values.tolist() for _, grp in df_agg]
    # Draw: stacked=True piles each group's bars on top of the previous groups
    plt.figure(figsize=(16, 9), dpi=80)
    colors = [plt.cm.Spectral(i / float(len(vals) - 1)) for i in range(len(vals))]
    n, bins, _ = plt.hist(vals, 30, stacked=True, density=False, color=colors, label=group_names)
    # Decoration
    plt.legend()
    plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
    plt.xlabel(x_var)
    plt.ylabel("Frequency")
    plt.ylim(0, 25)
    plt.xticks(ticks=bins[::3], labels=[round(b, 1) for b in bins[::3]])
    plt.show()
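With stacked=True, plt.hist bins every group on a common set of bin edges and draws each group's bars on top of the previous ones. The arithmetic can be sketched with np.histogram and a cumulative sum; the function name stacked_counts, the sample groups and the hand-picked edges are hypothetical.

```python
import numpy as np

def stacked_counts(groups, bins):
    """Per-group counts on shared bin edges and the top edge of each stacked layer."""
    counts = np.array([np.histogram(g, bins=bins)[0] for g in groups])
    tops = np.cumsum(counts, axis=0)   # row k = height reached after stacking groups 0..k
    return counts, tops

groups = [[1.0, 1.5, 3.5], [1.2, 3.1, 3.9]]
counts, tops = stacked_counts(groups, bins=[1, 2, 3, 4])
print(counts.tolist())   # [[2, 0, 1], [1, 0, 2]]
print(tops.tolist())     # [[2, 0, 1], [3, 0, 3]]
```

The last row of tops is the overall histogram of all groups combined, which is the outline you see at the top of the stacked chart.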

 




Author:python98k

link:http://www.pythonblackhole.com/blog/article/78488/4e69a8588b420c95a2aa/

source:python black hole net
