posted on 2024-11-02 14:01
This article explains the relationship between artificial intelligence, data analysis, and deep learning, and provides an introductory guide to the Pandas library used in data analysis. Happy reading!
This is the second article in the column "Data Analysis Encyclopedia", which has now been renamed "Data Analysis". While writing it, I suddenly realized that I cannot explain every aspect of data analysis clearly; I can only share what I know and pass on the knowledge as I understand it. The word "encyclopedia" promised more than I can deliver, so the column is now simply called "Data Analysis".
This article covers the Python Pandas knowledge used in data analysis and aims to help you get started with Python Pandas and master its basic usage and ideas.
The previous issue of "Data Analysis Encyclopedia", NumPy Basics, may have focused too much on code and neglected explanation. A beginner may not have understood what it was about or why each topic was covered.
A good article needs to be both practical and approachable. The previous article leaned too heavily on practicality: it was short, but apart from readers who already work in the field, few could follow it. This article learns from that lesson. I revised the wording carefully before publishing and added transitions and examples between paragraphs to make it easier for novices to follow.
Let me first make up for a gap left in the previous issue and talk about the differences and connections between data analysis and deep learning.
Data analysis and deep learning are both relatively new technologies, and new technologies emerge to solve real-world problems. We can tentatively divide real-world problems into simple problems and complex problems. Simple problems need only simple analysis, so we use data analysis. Complex problems need complex analysis, so we use machine learning.
- So what are simple problems and what are complex problems?
Simple problems include selecting this year's college scholarship recipients or reviewing a company's performance today. The amount of data is not very large, so we use data analysis.
Shopping apps that we use every day, such as Taobao and JD.com, recommend products you may be interested in based on your shopping history, which contains a huge amount of data. How is that done? For complex problems like this, these apps rely on machine learning and the corresponding recommendation algorithms.
Artificial intelligence covers a wide range. In a broad sense, artificial intelligence refers to using computers (machines) to reproduce human thinking so that machines can make decisions the way humans do.
Machine learning is one technology for realizing artificial intelligence. Machine learning contains many methods (algorithms), and different methods solve different problems. Deep learning is a branch of machine learning.
To summarize, the relationship between artificial intelligence, machine learning, and deep learning is this: artificial intelligence includes machine learning, and machine learning includes deep learning (as one of its methods). That is, Artificial Intelligence > Machine Learning > Deep Learning.
Deep learning has achieved very good results in the classification and recognition of rich media such as images and speech, so major research institutions and companies have invested a lot of manpower in related research and development.
For example, in 2016, AlphaGo, developed by Google's DeepMind, defeated one of the top human Go players. Deep learning is at the core of how AlphaGo works.
Ahem, I have strayed off topic. I haven’t talked about Pandas in this article yet.
Before learning anything, we should answer two questions: What does it do? What can I do with it?
I believe that some people, like me when I first started learning data structures, have many questions about this library called "Pandas" - What is Pandas? Where did the word Pandas come from? What does Pandas do? ... Let's solve these confusions together.
First of all, what is Pandas? Is it a panda, as in the animal?
That sounds cool... but obviously we can't ask pandas to help us with data analysis. In fact, Pandas is an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools.
So, where did the name Pandas come from?
The name Pandas is derived from the terms "panel data" and "Python data analysis". In general, Pandas is a powerful tool set for analyzing structured data, built on NumPy (which provides high-performance matrix operations).
That sounds a bit clearer. Now let's look at the next question: what is Pandas used for?
Pandas can import data from various sources such as CSV files, JSON files, SQL databases, and Microsoft Excel workbooks.
Pandas can perform various operations on data, such as merging, reshaping, selection, data cleaning, and data processing.
Pandas is widely used in data analysis across fields such as academia, finance, and statistics.
Let's summarize: Pandas is a Python library built on NumPy that provides easy-to-use data structures and data analysis tools for Python. Just remember this sentence and you can continue with the rest of our study!
In Python, we can use the following statement to import the Pandas library:
>>> import pandas as pd
First, let's look at the Series. A Pandas Series is like a single column in a table: a one-dimensional labeled array that can store any data type. A Series consists of an index and a column of values. The constructor is as follows:
pandas.Series(data, index, dtype, name, copy)
Let's briefly explain the parameters:
data: The data to store (for example a list, an ndarray, or a dict).
index: The index labels. If not specified, they start from 0 by default.
dtype: The data type. By default it is inferred from the data.
name: The name of the Series.
copy: Whether to copy the input data. The default is False.
Think about it: if you want a one-dimensional array that can store any type of data and label each value, how should it be built? Here is the code:
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
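To make the constructor parameters more concrete, here is a minimal sketch that builds the same Series and inspects its parts. The name 'demo' and the second Series t are only illustrative and do not come from the article.

```python
import pandas as pd

# Build a Series with explicit labels; name and dtype are optional.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'],
              dtype='int64', name='demo')

print(s.index.tolist())   # the labels: ['a', 'b', 'c', 'd']
print(s.tolist())         # the values: [3, -5, 7, 4]
print(s.name, s.dtype)    # demo int64

# Without an index argument, the labels default to 0, 1, 2, ...
t = pd.Series([3, -5, 7, 4])
print(t.index.tolist())   # [0, 1, 2, 3]
```

The index is what distinguishes a Series from a plain list: each value can be looked up by its label as well as by its position.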
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (such as numeric, string, or Boolean). A DataFrame has both a row index and a column index. It can be viewed as a dictionary of Series that share a common index.
To build such a two-dimensional table holding different types of data, we can do it like this:
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
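The "dictionary of Series" view can be checked directly. The sketch below rebuilds the same DataFrame and pulls one column out of it; the variable name capital is only illustrative.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Selecting one column by name returns a Series.
capital = df['Capital']
print(type(capital).__name__)           # Series

# Every column shares the row index of the DataFrame.
print(capital.index.equals(df.index))   # True

# Each column keeps its own value type.
print(df.dtypes)
print(df.shape)                         # (3, 3)
```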
Before reading and writing CSV files, let's first understand what CSV is:
CSV (Comma-Separated Values, sometimes also called character-separated values because the separator can be a character other than a comma) files store tabular data (numbers and text) as plain text.
CSV is a common and relatively simple file format, widely used in consumer, business, and scientific applications.
Pandas can easily read and write CSV files:
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')
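As a rough illustration of what these two calls do, the sketch below round-trips a small DataFrame through CSV text. It uses an in-memory io.StringIO buffer in place of a real file, which is an assumption made only so that the example runs without touching the disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Population': [11190846, 1303171035]})

# With no path, to_csv returns the CSV text; index=False omits the row labels.
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, so a buffer can stand in for a file.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored['Population'].tolist())   # [11190846, 1303171035]

# header=None treats the first line as data; nrows limits how many rows are read.
raw = pd.read_csv(io.StringIO(csv_text), header=None, nrows=2)
print(raw.iloc[0, 0])                    # Country
```

Note that with header=None the column names become 0, 1, ... and the original header line shows up as the first data row.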
Real problems often involve reading data from Excel or writing data to it. The code below shows both, followed by how to read from an Excel file that contains multiple sheets:
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
# Read an Excel file that contains multiple sheets
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
The code for reading SQL queries and database tables and for writing a DataFrame to a database is as follows (the read calls assume a table named my_table already exists in the database):
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().
>>> df.to_sql('myDf', engine)
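For a version that runs end to end, here is a minimal sketch that writes a DataFrame to an in-memory SQLite database and queries it back. It swaps the SQLAlchemy engine for a connection from Python's built-in sqlite3 module, which pandas also accepts, so this is a stand-in for the article's setup and not the same code. The table name my_table and the query are illustrative.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(':memory:')

# Write the DataFrame as a table; index=False skips the row labels.
df.to_sql('my_table', conn, index=False)

# Read back only the rows that match a SQL condition.
result = pd.read_sql_query(
    "SELECT Country FROM my_table WHERE Population > 1000000000;", conn)
print(result['Country'].tolist())   # ['India']

conn.close()
```

Writing the table first is what makes the read succeed: a fresh in-memory database contains no tables.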
Of course, we run into many strange problems during development. Besides posting a question on a technical forum or asking more experienced classmates for help, we also need to learn to read the help documentation ourselves. The code to call up the help is as follows:
>>> help(pd.Series.loc)
When getting values, we can take a single element from a Series or a subset of a DataFrame. The code for both is shown here for reference:
# Get one element of a Series
>>> s['b']
-5
# Get a subset of a DataFrame
>>> df[1:]
   Country    Capital  Population
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
When we select specific data, we often need to pick a single value by its row and column position. The code is given below:
# Select a single value by row and column position
>>> df.iloc[0, 0]
'Belgium'
>>> df.iat[0, 0]
'Belgium'
The code to select a single value by row and column label is as follows:
# Select a single value by row and column label
>>> df.loc[0, 'Country']
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
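We can also select a whole row or a whole column, using .iloc for positions and .loc for labels:
# Select a row
>>> df.iloc[2]
Country          Brazil
Capital        Brasília
Population    207847528
# Select a column
>>> df.loc[:, 'Capital']
0     Brussels
1    New Delhi
2     Brasília
# Select one value from a row and a column
>>> df.loc[1, 'Capital']
'New Delhi'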
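The difference between position-based and label-based selection is easier to see when the row labels are not 0, 1, 2. The sketch below uses the hypothetical row labels 'be', 'in', and 'br', which are not part of the article's data.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília']},
                  index=['be', 'in', 'br'])

# iloc and iat take integer positions.
print(df.iloc[0, 0])             # Belgium
print(df.iat[2, 1])              # Brasília

# loc and at take row and column labels.
print(df.loc['in', 'Capital'])   # New Delhi
print(df.at['br', 'Country'])    # Brazil

# Passing lists keeps the table shape: the result is a 1x1 DataFrame.
sub = df.iloc[[0], [0]]
print(type(sub).__name__, sub.shape)   # DataFrame (1, 1)
```

Scalars inside the brackets return a single value, while lists return a smaller DataFrame even when only one cell is selected.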
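Besides selection by position, Pandas supports selection by logical condition (Boolean indexing). Here are a few examples:
>>> s[~(s > 1)]                        # Values in Series s that are not greater than 1
>>> s[(s < -1) | (s > 2)]              # Values in Series s that are less than -1 or greater than 2
>>> df[df['Population'] > 1200000000]  # Rows of df whose Population is greater than 1.2 billion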
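Boolean indexing works in two steps: the comparison produces a Series of True/False values, and that mask then picks the matching entries. A minimal sketch, where mask is an illustrative variable name:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# Step 1: the comparison is evaluated element by element.
mask = s > 1
print(mask.tolist())                    # [True, False, True, True]

# Step 2: the mask keeps only the entries marked True.
print(s[mask].tolist())                 # [3, 7, 4]

# ~ negates a mask; | means "or" and & means "and".
print(s[~mask].tolist())                # [-5]
print(s[(s > 0) & (s < 5)].tolist())    # [3, 4]
```

The parentheses around each condition are required, because | and & bind more tightly than the comparison operators.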
You can also set the value stored under an index label:
>>> s['a'] = 6    # Set the value at index label a in Series s to 6
Delete values from a Series, or columns from a DataFrame, by label:
>>> s.drop(['a', 'c'])            # Drop values from the Series by index label (axis=0)
>>> df.drop('Country', axis=1)    # Drop the Country column from the DataFrame (axis=1)
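One detail that is easy to miss is that drop returns a new object and leaves the original untouched. The short sketch below shows this; the names trimmed and without_country are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Population': [11190846, 1303171035]})

# drop returns a new Series; s itself keeps all four labels.
trimmed = s.drop(['a', 'c'])
print(trimmed.index.tolist())    # ['b', 'd']
print(s.index.tolist())          # ['a', 'b', 'c', 'd']

# axis=1 drops a column; columns='Country' is an equivalent spelling.
without_country = df.drop('Country', axis=1)
print(without_country.columns.tolist())   # ['Population']
print(df.columns.tolist())                # ['Country', 'Population']
```

To keep the result, assign it to a name, as done here.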
Now that we have covered the basic operations of adding, deleting, querying, and modifying, let's move on to sorting. The code below sorts by index, sorts by the values of a column, and ranks the values:
>>> df.sort_index()                 # Sort by index label
>>> df.sort_values(by='Country')    # Sort by the values of a column
>>> df.rank()                       # Assign a rank to each value within its column
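A small worked example may help show how sorting and ranking differ: sorting reorders the rows, while ranking keeps the order and reports each value's position. This sketch reuses the article's population figures; by_pop and ranks are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

# Sort rows from largest to smallest population.
by_pop = df.sort_values(by='Population', ascending=False)
print(by_pop['Country'].tolist())            # ['India', 'Brazil', 'Belgium']

# sort_index puts the rows back in index order.
print(by_pop.sort_index().index.tolist())    # [0, 1, 2]

# rank leaves the row order alone; 1.0 marks the smallest value.
ranks = df['Population'].rank()
print(ranks.tolist())                        # [1.0, 3.0, 2.0]
```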
With sorting covered, let's talk about inspecting a DataFrame. Here is how to get its shape, its row and column indexes, and its basic information:
>>> df.shape      # (rows, columns)
>>> df.index      # Get the row index
>>> df.columns    # Get the column labels
>>> df.info()     # Print basic information about the DataFrame
>>> df.count()    # Count the non-NA values in each column
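To see why count can differ from the number of rows, here is a minimal sketch on a hypothetical two-column DataFrame with one missing value. The column names x and y are made up for the example.

```python
import pandas as pd

# The None in column x is stored as a missing value (NaN).
df = pd.DataFrame({'x': [1.0, None, 3.0],
                   'y': ['a', 'b', 'c']})

print(df.shape)              # (3, 2): three rows, two columns
print(list(df.index))        # [0, 1, 2]
print(list(df.columns))      # ['x', 'y']

# count skips missing values, so x has 2 entries and y has 3.
print(df.count().tolist())   # [2, 3]
```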
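Common summary functions are listed below:
>>> df.sum()                        # Sum of each column
>>> df.cumsum()                     # Cumulative sum of each column
>>> df.min()                        # Minimum of each column
>>> df.max()                        # Maximum of each column
>>> df['Population'].idxmin()       # Index label of the minimum value
>>> df['Population'].idxmax()       # Index label of the maximum value
>>> df.describe()                   # Basic summary statistics
>>> df.mean(numeric_only=True)      # Mean of the numeric columns
>>> df.median(numeric_only=True)    # Median of the numeric columns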
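As a quick check of what these functions return, the sketch below applies a few of them to the article's population figures, stored as a Series labeled by country. Labeling by country is a choice made for this example so that idxmin and idxmax return readable names.

```python
import pandas as pd

pop = pd.Series([11190846, 1303171035, 207847528],
                index=['Belgium', 'India', 'Brazil'])

print(pop.sum())               # 1522209409
print(pop.cumsum().tolist())   # [11190846, 1314361881, 1522209409]
print(pop.min(), pop.max())    # 11190846 1303171035

# idxmin and idxmax return the index label, not the value.
print(pop.idxmin(), pop.idxmax())   # Belgium India

print(pop.median())            # 207847528.0
```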
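Here is how to apply a function to a DataFrame:
>>> f = lambda x: x*2    # Define an anonymous function with lambda
>>> df.apply(f)          # Apply the function to each column
>>> df.applymap(f)       # Apply the function to each element (named DataFrame.map in pandas 2.1+)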
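The sketch below illustrates how apply passes whole columns (or whole rows, with axis=1) to the function. It uses a small numeric DataFrame with hypothetical columns a and b, and the function names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

def double(x):
    return x * 2

# By default apply calls the function once per column.
doubled = df.apply(double)
print(doubled['b'].tolist())     # [20, 40, 60]

# A function that reduces each column yields one value per column.
spread = df.apply(lambda col: col.max() - col.min())
print(spread.tolist())           # [2, 20]

# axis=1 calls the function once per row instead.
row_sum = df.apply(lambda row: row['a'] + row['b'], axis=1)
print(row_sum.tolist())          # [11, 22, 33]
```

For simple arithmetic such as doubling, writing df * 2 directly gives the same result and is usually faster; apply is most useful when the logic does not fit a built-in operation.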
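When two Series are combined and their indexes do not match, NaN is used for the labels that appear in only one of them. The outputs here assume s still holds its original values 3, -5, 7, 4:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0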
You can also use the arithmetic methods with fill_value, which substitutes a default for a label missing from one Series before the operation is performed:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)
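To see the alignment step by step, here is a minimal sketch with the same two Series. The variable names total, filled, and diff are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Values are matched by label, not by position; 'b' has no partner in s3.
total = s + s3
print(pd.isna(total['b']))    # True

# fill_value=0 treats the missing s3['b'] as 0 before adding.
filled = s.add(s3, fill_value=0)
print(filled.tolist())        # [10.0, -5.0, 5.0, 7.0]

# With sub, the missing s3['b'] is replaced by 2, so b becomes -5 - 2.
diff = s.sub(s3, fill_value=2)
print(diff.tolist())          # [-4.0, -7.0, 9.0, 1.0]
```

The results are floating-point because the intermediate aligned Series has to be able to hold NaN.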
This issue introduced the relationship between artificial intelligence, data analysis, and deep learning, with the main focus on a quick introduction to Pandas. It would be great if it helps you in your research projects, engineering work, and daily study! Next time, we will cover more advanced Pandas topics. (Because this article was too long, I split it into two parts.)
Thank you very much for reading, and your suggestions are welcome! See you next week!
Author:Believesinkinto
link:http://www.pythonblackhole.com/blog/article/245756/1f01c7bc5ff000d8099d/
source:python black hole net
Please indicate the source for any form of reprinting. If any infringement is discovered, it will be held legally responsible.