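{DAY 16} Pandas Study Notes, Part 2

Preface

Pandas is a powerful tool for data science and analysis. Building on the NumPy features covered over the past few days, it provides data structures that are easy to read and use for working with relational and labeled data.

The previous post introduced the concept of a Series.
Today's post builds on it
in two parts:

Retrieving specific values
Integration with NumPy

The course I am following is Introduction to Data Science in Python on Coursera.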


For practice, I used what the videos taught and what I learned earlier at school
to reorganize my notes on Series.

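Retrieving specific values (data)

In Pandas, you can retrieve a specific value either by its label or by its integer position.

To look up by label, use .loc[ ]

To look up by integer position, use .iloc[ ]

(Remember that positions start at 0.)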

import pandas as pd

music_chart = {"Ed Sheeran":"Bad habits", "Justin Biber": "STAY","Cardie B":"Rumors","Anne-Maire":"Kiss My"}
m_c = pd.Series(music_chart)
m_c
'''
Ed Sheeran      Bad habits
Justin Biber          STAY
Cardie B            Rumors
Anne-Maire         Kiss My
dtype: object
'''

Suppose we want to use the integer position to find the song "Bad habits":

m_c.iloc[0]
'''
'Bad habits'
'''

Series also supports slicing: .iloc[start:stop] (the stop position itself is not included)

m_c.iloc[1:]
'''
Justin Biber       STAY
Cardie B         Rumors
Anne-Maire      Kiss My
dtype: object
'''
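
One detail worth keeping in mind when slicing: .iloc excludes the stop position, while a label slice with .loc includes both endpoints. A minimal sketch, using a small made-up Series (the name `s` and its values are only for illustration and are not from the course):

```python
import pandas as pd

# Hypothetical example data
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Position-based slice: positions 1 and 2; the stop position 3 is excluded
by_position = s.iloc[1:3]

# Label-based slice: "b" through "d", with the end label included
by_label = s.loc["b":"d"]

print(by_position.tolist())  # [20, 30]
print(by_label.tolist())     # [20, 30, 40]
```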

Suppose we want to use the label to find the song sung by Justin Biber:

m_c.loc["Justin Biber"]
'''
'STAY'
'''

To pull out several non-contiguous entries, pass a list of integer positions or a list of labels.
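
As a rough illustration of selecting non-contiguous entries, here is a sketch on a made-up Series (`s` and its values are assumptions for the example). Note the double brackets: the outer pair belongs to .iloc / .loc, the inner pair is the Python list.

```python
import pandas as pd

# Hypothetical example data
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# A list of integer positions goes inside .iloc[ ]
picked_by_position = s.iloc[[0, 2]]

# A list of labels goes inside .loc[ ]
picked_by_label = s.loc[["a", "d"]]

print(picked_by_position.tolist())  # [10, 30]
print(picked_by_label.tolist())     # [10, 40]
```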

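.loc[ ] can also be used to replace the existing values at the given index labels. To assign to several labels at once, pass the labels as a list:

m_c.loc[["Ed Sheeran", "Justin Biber"]] = ["Visiting Hours", "Ghost"]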
m_c
'''
Ed Sheeran      Visiting Hours
Justin Biber             Ghost
Cardie B                Rumors
Anne-Maire             Kiss My
dtype: object
'''

You can see that Bad habits was replaced by Visiting Hours, and STAY was replaced by Ghost.

To delete entries from a Series, use .drop(labels=...)

m_c.drop(labels=["Cardie B", "Justin Biber"])
'''
Ed Sheeran    Visiting Hours
Anne-Maire           Kiss My
dtype: object
'''
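
Note that .drop( ) returns a new Series and, by default, leaves the original untouched, which is why m_c still has all four entries later in this post. A small sketch on made-up data (`s` is a hypothetical name):

```python
import pandas as pd

# Hypothetical example data
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# drop() builds a new Series without the label "b"
dropped = s.drop(labels=["b"])

print(list(dropped.index))  # ['a', 'c']
print(list(s.index))        # ['a', 'b', 'c'] (the original is unchanged)

# Assign the result back if the removal should stick
s = s.drop(labels=["b"])
```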

Integration with NumPy

import numpy as np

# Generate even numbers from 2 to 10
even_num = np.arange(2,11,2)
print("even:",even_num)

# Generate an array of 10 random integers from 0 to 999
rand_array = np.random.randint(0,1000,10)
print("random:",rand_array)

'''
even: [ 2  4  6  8 10]
random: [571 217 128 954 472 651 606 122 903 407]
'''

# Convert the array into a Series
pd.Series(rand_array)
'''
0    571
1    217
2    128
3    954
4    472
5    651
6    606
7    122
8    903
9    407
dtype: int64
'''
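
The integration also works in the other direction. As a hedged sketch (the values and the names `arr`, `s`, `roots`, `total`, `back` are made up for illustration): NumPy functions accept a Series directly, element-wise functions keep the index, and .to_numpy( ) gives back a plain array.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
arr = np.array([4, 9, 16])
s = pd.Series(arr, index=["x", "y", "z"])

# Element-wise NumPy functions return a Series with the same index
roots = np.sqrt(s)

# Reductions such as np.sum return a single number
total = np.sum(s)

# .to_numpy() converts the Series back into a NumPy ndarray
back = s.to_numpy()

print(roots.tolist())        # [2.0, 3.0, 4.0]
print(list(roots.index))     # ['x', 'y', 'z']
print(total)                 # 29
print(type(back).__name__)   # ndarray
```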

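Use .head( ) to look at the first five entries.

You can pass the number of entries you want in the parentheses; for example, .head(8) returns the first 8.
Use .tail( ) to look at the last five entries.

Use .take( ) to select entries by integer position.

For example, to get the entries at positions 2 and 5, use .take([2, 5])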

# Generate 10000 random integers from 0 to 999
numbers = pd.Series(np.random.randint(0,1000,10000))
print("First five entries:", numbers.head())
print("First three entries:", numbers.head(3))
print("Last five entries:", numbers.tail())
print("Entries at positions 2 and 5:", numbers.take([2, 5]))
'''
First five entries: 0    571
1    224
2    408
3    984
4    617
dtype: int64
First three entries: 0    571
1    224
2    408
dtype: int64
Last five entries: 9995     47
9996    424
9997     63
9998    190
9999    258
dtype: int64
Entries at positions 2 and 5: 2    408
5    815
dtype: int64
'''

To check, for each value in a Series, whether it appears in a given list of values, use .isin( )

m_c.isin(["Visiting Hours", "Ghost"])
'''
Ed Sheeran       True
Justin Biber     True
Cardie B        False
Anne-Maire      False
dtype: bool
'''
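
Since .isin( ) returns a boolean Series, it can be passed to [ ] to keep only the matching entries (boolean indexing is covered in the next section). A minimal sketch on made-up data; `fruits`, `mask`, `selected`, and `others` are hypothetical names:

```python
import pandas as pd

# Hypothetical example data
fruits = pd.Series(["apple", "kiwi", "pear"], index=["a", "b", "c"])

# Boolean Series: True where the value is in the given list
mask = fruits.isin(["apple", "pear"])

selected = fruits[mask]    # entries where mask is True
others = fruits[~mask]     # ~ inverts the mask

print(selected.tolist())  # ['apple', 'pear']
print(others.tolist())    # ['kiwi']
```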

Element-wise operations on a Series

First, make up scores for five students.

scores = pd.Series([85,43,65,79,30], index = ["a","b","c","d","e"])
scores
'''
a    85
b    43
c    65
d    79
e    30
dtype: int64
'''

When the teacher wants to see who passed (scored above 60):

scores>60
'''
a     True
b    False
c     True
d     True
e    False
dtype: bool
'''

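You can compare directly with an operator; the result shows that b and e did not pass.

To see only the students who passed, put the comparison inside [ ]

This produces a new Series that contains only the passing scores.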

scores[scores>60]
'''
a    85
c    65
d    79
dtype: int64
'''
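
Several conditions can be combined in the same way. The sketch below reuses the same five scores; the thresholds 40 and 80 and the names `between` and `extremes` are made up for illustration. Use & for "and", | for "or", and wrap each condition in parentheses.

```python
import pandas as pd

scores = pd.Series([85, 43, 65, 79, 30], index=["a", "b", "c", "d", "e"])

# Scores above 60 AND below 80
between = scores[(scores > 60) & (scores < 80)]

# Scores below 40 OR above 80
extremes = scores[(scores < 40) | (scores > 80)]

print(between.index.tolist())   # ['c', 'd']
print(extremes.index.tolist())  # ['a', 'e']
```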

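Suppose the teacher wants to divide everyone's score by 10 and then add 85.

This can be done directly with Series arithmetic, which applies the operation to every element.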

scores = (scores/10) + 85
scores
'''
a    93.5
b    89.3
c    91.5
d    92.9
e    88.0
dtype: float64
'''

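Summary

Tomorrow's post moves on to the DataFrame, which is structured much like a SQL table.

With it we can work with large datasets in more advanced ways

and do some initial data cleaning.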
