My Numpy Guide

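While reading Python code recently, I found many parts hard to follow, especially the ones that use NumPy to process data. These are my brief study notes on the official tutorial in the NumPy User Guide. This post borrows heavily from the referenced blog post. Respect.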

1. Basics

NumPy's array class (ndarray): a homogeneous multidimensional array (homogeneous means that all elements have the same type).

import numpy as np

a = np.arange(15).reshape(3, 5)
a  # an np.ndarray

Output: >>

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Important ndarray attributes

print("type of the ndarray is {}".format(type(a)))
print("ndim (number of axes) is {}".format(a.ndim))
print("shape is {}".format(a.shape))
print("size (total number of elements) is {}".format(a.size))
print("dtype (element type) is {}".format(a.dtype))
print("dtype.name is {}".format(a.dtype.name))
print("itemsize (bytes per element) is {}".format(a.itemsize))
print("data (buffer holding the elements) is {}".format(a.data))

Output: >>

type of the ndarray is <class 'numpy.ndarray'>
ndim (number of axes) is 2
shape is (3, 5)
size (total number of elements) is 15
dtype (element type) is int32
dtype.name is int32
itemsize (bytes per element) is 4
data (buffer holding the elements) is <memory at 0x000002CFFE04BE48>

1.1 Creating arrays

Method 1: create an array from a Python list or tuple

# Pass a single list or tuple as the argument
a = np.array([1, 2, 3, 4])
# A typical mistake is to pass several numbers instead of one sequence:
# a = np.array(1, 2, 3, 4)  # raises TypeError

# A list or tuple made up of lists or tuples follows this rule:
# array transforms sequences of sequences into two-dimensional arrays,
# sequences of sequences of sequences into three-dimensional arrays,
# and so on.
b = np.array([(1.5, 2, 3), (4, 5, 6)])

# The element type can be specified explicitly
c = np.array([[1, 2], [3, 4]], dtype=complex)

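Method 2: create an array of known size with placeholder content. The default dtype is float64.

This approach is mainly useful when the values are not known yet but the size of the array is. Filling in a preallocated array later avoids growing the array, which is an expensive operation.

d = np.zeros((3, 4))
e = np.ones((2, 3, 4), dtype=np.int16)  # the element type can also be specified
f = np.empty((2, 3))  # uninitialized: the contents are whatever happens to be in memory

# arange is similar to Python's built-in range
g = np.arange(10, 30, 5)
h = np.arange(0, 2, 0.3)  # it accepts float arguments

# With float arguments, finite floating-point precision makes it hard to
# predict how many elements arange returns.
# linspace avoids the problem by taking the number of elements as an argument:
i = np.linspace(0, 2, 9)  # 9 numbers from 0 to 2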

Functions for creating arrays: array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.Generator.random, numpy.random.Generator.normal, fromfunction, fromfile

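1.2 Printing arrays

Call np.set_printoptions (for example with threshold=sys.maxsize) to make NumPy print an ndarray in full instead of eliding its middle.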
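
A minimal sketch of the printing behaviour described above. It uses np.printoptions, the context-manager counterpart of np.set_printoptions, so the setting is undone afterwards; the array `big` and its size are arbitrary choices for illustration.

```python
import sys
import numpy as np

big = np.arange(2000)

# With the default settings, arrays of more than 1000 elements are
# summarized: only the corners are shown, separated by "...".
summarized = str(big)

# Inside the context manager the threshold is raised, so nothing is elided.
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print("..." in summarized)  # True
print("..." in full)        # False
```

Calling np.set_printoptions(threshold=sys.maxsize) has the same effect but stays active for the rest of the session.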

1.3 Basic arithmetic

Basic arithmetic operators apply elementwise.

a = np.array([20, 30, 40, 50])
b = np.arange(4)
print("a-b is {}".format(a-b))
print("a**2 is {}".format(a**2))
print("10*np.sin(a) is {}".format(10*np.sin(a)))
print("a<35 is {}".format(a<35))
print("a*b is {}".format(a*b))

# Matrix product
A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
print("A*B is {}".format(A*B))  # elementwise product
print("A@B is {}".format(A@B))  # matrix product
print("A.dot(B) is {}".format(A.dot(B)))  # matrix product

# Operators such as += and *= modify the array in place
# (the result must be castable to the array's dtype, otherwise an error is raised)

Output: >>

a-b is [20 29 38 47]
a**2 is [ 400  900 1600 2500]
10*np.sin(a) is [ 9.12945251 -9.88031624  7.4511316  -2.62374854]
a<35 is [ True  True False False]
a*b is [  0  30  80 150]
A*B is [[2 0]
 [0 4]]
A@B is [[5 4]
 [3 4]]
A.dot(B) is [[5 4]
 [3 4]]

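When arrays of different dtypes are combined, the result is upcast to the more general or more precise of the two types. In-place operators cannot upcast their left operand, so an operation such as adding a float array into an integer array in place raises an error.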
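
A short sketch of upcasting, under the assumption that the operands are an int32 array and a float64 array (the names `ints` and `floats` are illustrative only). It shows both the automatic conversion and the in-place failure.

```python
import numpy as np

ints = np.ones(3, dtype=np.int32)
floats = np.linspace(0, 1, 3)          # float64

# int32 combined with float64 is upcast to float64.
mixed = ints + floats
print(mixed.dtype.name)                # float64

# float64 combined with a complex scalar is upcast to complex128.
print((floats * 1j).dtype.name)        # complex128

# In place, the float64 result cannot be stored back into the int32 array.
try:
    ints += floats
    failed = False
except TypeError as err:
    failed = True
    print("in-place add failed:", err)
```

The original array is left untouched when the in-place operation fails.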

# Many unary operations are implemented as methods of ndarray
print("a is {}".format(a))
print("a.sum() is {}".format(a.sum()))
print("a.min() is {}".format(a.min()))
print("a.max() is {}".format(a.max()))
print()

b = np.arange(12).reshape(3, 4)
print("b is {}".format(b))
print("b.sum(axis=0) is {}".format(b.sum(axis=0)))
# I never fully worked out how axis is counted.
# In 2-D it is intuitive: axis=0 runs vertically (down the rows) and axis=1
# runs horizontally (across the columns). Beyond two dimensions I lose track.
# My own way of thinking about it is that the axis gets "collapsed":
# after a reduction such as a sum along an axis, the result no longer has that axis.
print("b.min(axis=1) is {}".format(b.min(axis=1)))
print("b.cumsum(axis=1) is {}".format(b.cumsum(axis=1)))  # cumulative sum

Output: >>

a is [20 30 40 50]
a.sum() is 140
a.min() is 20
a.max() is 50

b is [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
b.sum(axis=0) is [12 15 18 21]
b.min(axis=1) is [0 4 8]
b.cumsum(axis=1) is [[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]
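
The "collapsed axis" view of the axis argument carries over to more than two dimensions. The sketch below uses an arbitrary 3-D array `x` to show that a reduction along axis k removes entry k from the shape.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the result's shape.
print(x.sum(axis=0).shape)  # (3, 4)
print(x.sum(axis=1).shape)  # (2, 4)
print(x.sum(axis=2).shape)  # (2, 3)

# keepdims=True keeps the reduced axis with length 1 instead of dropping it.
print(x.sum(axis=1, keepdims=True).shape)  # (2, 1, 4)

# Each entry of x.sum(axis=1) adds up the values that differ only in index 1.
print(x.sum(axis=1)[0, 0] == x[0, 0, 0] + x[0, 1, 0] + x[0, 2, 0])  # True
```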

1.4 Universal Functions (elementwise)

B = np.arange(3)
print("B is {}".format(B))
print("np.exp(B) is {}".format(np.exp(B)))
print("np.sqrt(B) is {}".format(np.sqrt(B)))
C = np.array([2., -1., 4.])
print("np.add(B, C) is {}".format(np.add(B, C)))

Output: >>

B is [0 1 2]
np.exp(B) is [1.         2.71828183 7.3890561 ]
np.sqrt(B) is [0.         1.         1.41421356]
np.add(B, C) is [2. 0. 6.]

See also (many of these are ordinary functions or methods rather than ufuncs): all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

1.5 Indexing, Slicing and Iterating

Slicing is a little tricky, so run a few tests of your own before relying on it.

# One-dimensional arrays
a = np.arange(10)**3
print("a is {}".format(a))
print("a[2] is {}".format(a[2]))
# Slices with ':' have the form start_index (included):end_index (excluded):stride
# Omitting start_index means 0, omitting end_index means the end of the array,
# omitting stride means 1
print("a[2:5] is {}".format(a[2:5]))
a[:6:2] = 1000  # same as a[0:6:2] = 1000
print("a after a[:6:2] = 1000 is {}".format(a))
print("a[ : :-1] is {}".format(a[ : :-1]))  # reversed ndarray
for i in a:
    pass  # an ndarray is iterable; a 1-D array yields its elements one by one
print()

# Multidimensional arrays
# Axes left out of an index are treated as complete slices
# (':' for a single axis, '...' for as many axes as needed)
def f(x, y):
    return 10*x + y
b = np.fromfunction(f, (5, 4), dtype=int)
print("b is {}".format(b))
print("b[2,3] is {}".format(b[2,3]))
print("b[2][3] is {}".format(b[2][3]))
print("b[0:5, 1] is {}".format(b[0:5, 1]))
print("b[ : ,1] is {}".format(b[ : ,1]))
print("b[1:3, : ] is {}".format(b[1:3, : ]))
print("b[-1] is {}".format(b[-1]))
print()

c = np.array([[[  0,   1,   2],
               [ 10,  12,  13]],
              [[100, 101, 102],
               [110, 112, 113]]])
# a 3D array (two stacked 2D arrays)
print("c is {}".format(c))
print("c[1,...] is {}".format(c[1,...]))
print("c[...,2] is {}".format(c[...,2]))

for element in b.flat:  # iterator over every element of the array
    pass

Output: >>

a is [  0   1   8  27  64 125 216 343 512 729]
a[2] is 8
a[2:5] is [ 8 27 64]
a after a[:6:2] = 1000 is [1000    1 1000   27 1000  125  216  343  512  729]
a[ : :-1] is [ 729  512  343  216  125 1000   27 1000    1 1000]

b is [[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]
b[2,3] is 23
b[2][3] is 23
b[0:5, 1] is [ 1 11 21 31 41]
b[ : ,1] is [ 1 11 21 31 41]
b[1:3, : ] is [[10 11 12 13]
 [20 21 22 23]]
b[-1] is [40 41 42 43]

c is [[[  0   1   2]
  [ 10  12  13]]

 [[100 101 102]
  [110 112 113]]]
c[1,...] is [[100 101 102]
 [110 112 113]]
c[...,2] is [[  2  13]
 [102 113]]

See also: newaxis, ndenumerate, indices

2. Shape Manipulation

a = np.array([[3., 7., 3., 4.],
              [1., 4., 2., 2.],
              [7., 2., 4., 9.]])
print("a is {}".format(a))
print("a.shape is {}".format(a.shape))
print()

Output: >>

a is [[3. 7. 3. 4.]
 [1. 4. 2. 2.]
 [7. 2. 4. 9.]]
a.shape is (3, 4)

2.1 Changing the shape of an array

# These return a modified array and leave the original ndarray a unchanged:
print("a.ravel() is {}".format(a.ravel()))  # flattened to 1-D
print("a.reshape(6,2) is {}".format(a.reshape(6,2)))
print("a.T is {}".format(a.T))  # a itself keeps its shape (3, 4)
# The elements are laid out in C-style order, in which the rightmost index
# changes fastest (an optional argument selects FORTRAN-style order instead)

Output: >>

a.ravel() is [3. 7. 3. 4. 1. 4. 2. 2. 7. 2. 4. 9.]
a.reshape(6,2) is [[3. 7.]
 [3. 4.]
 [1. 4.]
 [2. 2.]
 [7. 2.]
 [4. 9.]]
a.T is [[3. 1. 7.]
 [7. 4. 2.]
 [3. 2. 4.]
 [4. 2. 9.]]

# This one modifies the original ndarray a in place:
a.resize((2, 6))
print(a)

Output: >>

[[3. 7. 3. 4. 1. 4.]
 [2. 2. 7. 2. 4. 9.]]

2.2 Stacking together different arrays

a = np.array([[9., 7.], [5., 2.]])
b = np.array([[1., 9.], [5., 1.]])

print('np.vstack((a,b)) is {}'.format(np.vstack((a,b))))
# same as row_stack

print('np.hstack((a,b)) is {}'.format(np.hstack((a,b))))
# similar to column_stack but not identical (explained below)

Output: >>

np.vstack((a,b)) is [[9. 7.]
 [5. 2.]
 [1. 9.]
 [5. 1.]]
np.hstack((a,b)) is [[9. 7. 1. 9.]
 [5. 2. 5. 1.]]
from numpy import newaxis
# column_stack stacks 1-D arrays as columns into a 2-D array;
# for 2-D arrays it gives the same result as hstack
a = np.array([4., 2.])
b = np.array([3., 8.])
print('np.column_stack((a,b)) is {}'.format(np.column_stack((a,b))))
print()

print('a[:,newaxis] is {}'.format(a[:,newaxis]))
# a viewed as a 2-D column vector

# 2-D arrays: the results are the same
print('np.column_stack((a[:,newaxis],b[:,newaxis])) is {}'.format(np.column_stack((a[:,newaxis],b[:,newaxis]))))

print('np.hstack((a[:,newaxis],b[:,newaxis])) is {}'.format(np.hstack((a[:,newaxis],b[:,newaxis]))))
print()

# 1-D arrays: the results differ
print('np.column_stack((a,b)) is {}'.format(np.column_stack((a,b))))
print('np.hstack((a,b)) is {}'.format(np.hstack((a,b))))

Output: >>

np.column_stack((a,b)) is [[4. 3.]
 [2. 8.]]

a[:,newaxis] is [[4.]
 [2.]]
np.column_stack((a[:,newaxis],b[:,newaxis])) is [[4. 3.]
 [2. 8.]]
np.hstack((a[:,newaxis],b[:,newaxis])) is [[4. 3.]
 [2. 8.]]

np.column_stack((a,b)) is [[4. 3.]
 [2. 8.]]
np.hstack((a,b)) is [4. 2. 3. 8.]

In general, for arrays with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and concatenate allows for an optional argument giving the number of the axis along which the concatenation should happen.
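
The rule for higher-dimensional arrays is easiest to see from the shapes. A small sketch with two arbitrary 3-D arrays `x` and `y`:

```python
import numpy as np

x = np.zeros((2, 3, 4))
y = np.ones((2, 3, 4))

# vstack joins along the first axis, hstack along the second.
print(np.vstack((x, y)).shape)               # (4, 3, 4)
print(np.hstack((x, y)).shape)               # (2, 6, 4)

# concatenate takes the axis explicitly, so any axis can be chosen.
print(np.concatenate((x, y), axis=0).shape)  # (4, 3, 4)
print(np.concatenate((x, y), axis=2).shape)  # (2, 3, 8)
```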

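# r_ and c_ build an ndarray by stacking numbers along one axis
np.r_[1:4, 0, 4]
# With ndarrays as arguments they behave much like vstack and hstack,
# but the axis to concatenate along can be specified

Output: >>

array([1, 2, 3, 0, 4])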

2.3 Splitting one array into several smaller ones

# hsplit splits along the horizontal axis
a = np.array([[6., 7., 6., 9., 0., 5., 4., 0., 6., 8., 5., 2.],
              [8., 5., 5., 7., 1., 8., 6., 7., 1., 8., 1., 0.]])

print(np.hsplit(a, 3))  # split into three equal parts
print()
print(np.hsplit(a, (3, 4)))  # split after the third and the fourth column

# vsplit splits along the vertical axis
# array_split can split along any axis

Output: >>

[array([[6., 7., 6., 9.],
       [8., 5., 5., 7.]]), array([[0., 5., 4., 0.],
       [1., 8., 6., 7.]]), array([[6., 8., 5., 2.],
       [1., 8., 1., 0.]])]

[array([[6., 7., 6.],
       [8., 5., 5.]]), array([[9.],
       [7.]]), array([[0., 5., 4., 0., 6., 8., 5., 2.],
       [1., 8., 6., 7., 1., 8., 1., 0.]])]

3. Copies and Views

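3.1 Direct reference (no copy at all)

Plain assignment makes no copy, and neither does passing an array to a function: Python passes mutable objects by reference, so the function receives the same object. In general, using a NumPy array directly does not create a copy.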
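
A minimal sketch of the no-copy case. The helper `same_object` is hypothetical and exists only to show that a function receives the very same array object.

```python
import numpy as np

a = np.arange(12)
b = a              # no new object is created: b is another name for a
print(b is a)      # True

def same_object(x):
    # Hypothetical helper: x refers to the caller's array, not to a copy.
    return x is a

print(same_object(a))  # True

b[0] = 99          # a change made through b is visible through a
print(a[0])        # 99
```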

3.2 View or Shallow Copy

# a is still the 2x12 array from section 2.3
c = a.view()

print('c is a? {}'.format(c is a))

print('c.base is a? {}'.format(c.base is a))
# c is a view of the data owned by a

print('c.flags.owndata? {}'.format(c.flags.owndata))
# Reshaping c does not change the shape of a, but changing c's data changes a's data

# Slicing an array returns a view of it
s = a[ : , 1:3]
s[:] = 10  # s[:] is a view of s.
print(a)

Output: >>

c is a? False
c.base is a? True
c.flags.owndata? False

[[ 6. 10. 10.  9.  0.  5.  4.  0.  6.  8.  5.  2.]
 [ 8. 10. 10.  7.  1.  8.  6.  7.  1.  8.  1.  0.]]

3.3 Deep Copy

d = a.copy()  # a completely new ndarray with newly allocated data
print('d is a? {}'.format(d is a))
print('d.base is a? {}'.format(d.base is a))
d[0,0] = 9999
print(a)

Output: >>

d is a? False
d.base is a? False
[[ 6. 10. 10.  9.  0.  5.  4.  0.  6.  8.  5.  2.]
 [ 8. 10. 10.  7.  1.  8.  6.  7.  1.  8.  1.  0.]]

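If only a small part of a large intermediate array is still needed, slice it first and then call copy on the slice, so that the large array can be released.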
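
A sketch of the slice-then-copy advice, with an arbitrary array `big` standing in for a large intermediate result. The base attribute shows whether an array still depends on another array's buffer.

```python
import numpy as np

big = np.arange(1_000_000)     # stands in for a large intermediate result
small = big[:100].copy()       # copy() gives small its own data buffer
print(small.base is None)      # True: small does not depend on big

del big                        # the large buffer can now be released
print(small[-1])               # 99, small is still valid

# A plain slice, by contrast, is a view that keeps the original buffer alive.
source = np.arange(10)
view = source[:3]
print(view.base is source)     # True
```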

3.4 Functions and Methods Overview

List of functions and methods, grouped by category

Array Creation

arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like

Conversions

ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat

Manipulations

array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack

Questions

all, any, nonzero, where

Ordering

argmax, argmin, argsort, max, min, ptp, searchsorted, sort

Operations

choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum

Basic Statistics

cov, mean, std, var

Basic Linear Algebra

cross, dot, outer, linalg.svd, vdot

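4. Less Basic: Broadcasting rules

When the inputs to a universal function have different shapes, two rules apply. If the arrays have different numbers of dimensions, the shape of the smaller array is padded with leading 1s until the numbers match. An axis of size 1 then behaves as if it had the size of the largest array along that axis, with its single value repeated. For more details on the broadcasting rules, see: Broadcasting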
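
The two broadcasting rules can be checked on small arrays. This is only a sketch with arbitrary shapes; `row` and `col` are illustrative names.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)            # shape (3, 4)
row = np.array([10, 20, 30, 40])           # shape (4,), treated as (1, 4)
col = np.array([[100], [200], [300]])      # shape (3, 1)

# Rule 1: row is padded to (1, 4); rule 2: its single row is repeated 3 times.
print((a + row).shape)      # (3, 4)
print((a + row)[1, 2])      # 36, i.e. a[1, 2] + row[2]

# Both operands are stretched here: (1, 4) and (3, 1) give (3, 4).
print((row + col).shape)    # (3, 4)
print((row + col)[2, 3])    # 340, i.e. row[3] + col[2, 0]

# Sizes that differ on an axis where neither is 1 cannot be broadcast.
try:
    a + np.arange(3)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```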

5. Advanced indexing and index tricks (indexing with arrays of integers and arrays of booleans)

5.1 Indexing with arrays of integers

a = np.arange(12)**2
i = np.array([1, 1, 3, 8, 5])
print('a[i] is {}'.format(a[i]))
print()

j = np.array([[3, 4], [9, 7]])
print('a[j] is {}'.format(a[j]))

Output: >>

a[i] is [ 1  1  9 64 25]

a[j] is [[ 9 16]
 [81 49]]

Indexing a multidimensional array: a single array of indices refers to the first dimension of the indexed array.

# An image example
palette = np.array([[0, 0, 0],         # black
                    [255, 0, 0],       # red
                    [0, 255, 0],       # green
                    [0, 0, 255],       # blue
                    [255, 255, 255]])  # white
image = np.array([[0, 1, 2, 0],  # each value corresponds to a color in the palette
                  [0, 3, 4, 0]])
palette[image]  # the (2, 4, 3) color image

Output: >>

array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])
# Indexing a multidimensional array with several index arrays:
# the index arrays for the different dimensions must have the same shape
a = np.arange(12).reshape(3, 4)
print('a is {}\n'.format(a))
i = np.array([[0, 1],  # indices for the first dim of a
              [1, 2]])
j = np.array([[2, 1],  # indices for the second dim
              [3, 3]])

print('a[i, j] is {}\n'.format(a[i, j]))  # i and j must have equal shape
print('a[i, 2] is {}\n'.format(a[i, 2]))
print('a[:, j] is {}'.format(a[:, j]))  # i.e., a[ : , j]
# The last result has shape (3, 2, 2): for each row r of a it holds the
# 2x2 array a[r, j]. The orientation puzzled me at first, but it is easy
# to picture as three stacked 2x2 blocks.

# arr[i, j] is exactly the same as arr[(i, j)]

# i and j cannot be stacked into one array and used as the index: s = np.array([i, j])
# but tuple(s) does work as an index and is equivalent to [i, j]

Output: >>

a is [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

a[i, j] is [[ 2  5]
 [ 7 11]]

a[i, 2] is [[ 2  6]
 [ 6 10]]

a[:, j] is [[[ 2  1]
  [ 3  3]]

 [[ 6  5]
  [ 7  7]]

 [[10  9]
  [11 11]]]
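
The layout of a[:, j] and the tuple(s) remark can both be checked directly. The sketch below reuses the arrays from the example above; nothing else is assumed.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])

# The slice keeps axis 0 (length 3); j contributes its own shape (2, 2).
picked = a[:, j]
print(picked.shape)                      # (3, 2, 2)

# Block r of the result is row r of a indexed by j.
blocks_match = all(np.array_equal(picked[r], a[r][j]) for r in range(3))
print(blocks_match)                      # True

# Stacking i and j gives one array of shape (2, 2, 2). Used directly it
# indexes only the first axis of a, and the value 3 is out of range there.
s = np.array([i, j])
try:
    a[s]
    stacked_failed = False
except IndexError as err:
    stacked_failed = True
    print("a[s] failed:", err)

# Converting s to a tuple turns it back into one index array per axis.
print(np.array_equal(a[tuple(s)], a[i, j]))  # True
```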

Finding the index of the maximum

time = np.linspace(20, 145, 5)              # time scale
data = np.sin(np.arange(20)).reshape(5, 4)  # 4 time-dependent series
print(time)
print(data)

Output: >>

[ 20.    51.25  82.5  113.75 145.  ]
[[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]

# index of the maxima for each series
ind = data.argmax(axis=0)
print(ind)

Output: >>

[2 0 3 1]

time_max = time[ind]
print(time_max)

Output: >>

[ 82.5   20.   113.75  51.25]

data_max = data[ind, range(data.shape[1])]
# => data[ind[0],0], data[ind[1],1]...

print(data_max)

Output: >>

[0.98935825 0.84147098 0.99060736 0.6569866 ]

print(np.all(data_max == data.max(axis=0)))

Output: >>

True

Assigning with an integer-array index

a = np.arange(5)
print(a)
a[[1, 3, 4]] = 0
print(a)

# When the list of indices contains repetitions, the assignment is done
# several times, and only the last value is kept
a = np.arange(5)
a[[0, 0, 2]] = [1, 2, 3]
print(a)

# With += the result may be surprising:
a = np.arange(5)
a[[0, 0, 2]] += 1
print(a)
# The explanation given in the tutorial:
# Even though 0 occurs twice in the list of indices,
# the 0th element is only incremented once.
# This is because Python requires "a+=1" to be equivalent to "a = a + 1".
# In other words, a[[0,0,2]] + 1 is computed once from the original values,
# and that result is then assigned back, writing the same value to index 0 twice.

Output: >>

[0 1 2 3 4]
[0 0 2 0 0]
[2 1 3 3 4]
[1 1 3 3 4]
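
When repeated indices should accumulate, NumPy offers the unbuffered ufunc method np.add.at. It is not part of the notes above; the sketch only contrasts it with the += behaviour shown in the output.

```python
import numpy as np

# Buffered: the right-hand side is computed once, so index 0 gains only 1.
a = np.arange(5)
a[[0, 0, 2]] += 1
print(a)   # [1 1 3 3 4]

# Unbuffered: np.add.at applies the addition once per listed index,
# so index 0 is incremented twice.
b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)
print(b)   # [2 1 3 3 4]
```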

5.2 Indexing with boolean arrays

Case 1: the index array has the same shape as the original array

a = np.arange(12).reshape(3, 4)
print(a)
b = a > 4
print(b)
print(a[b])
a[b] = 0
print(a)  # a boolean index can be used directly in assignments

Output: >>

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[False False False False]
 [False  True  True  True]
 [ True  True  True  True]]
[ 5  6  7  8  9 10 11]
[[0 1 2 3]
 [4 0 0 0]
 [0 0 0 0]]
# Plotting the Mandelbrot set
# (I did not fully follow the function, but it is another case of a boolean
# index that has the same shape as the array being indexed)
import numpy as np
import matplotlib.pyplot as plt
def mandelbrot(h, w, maxit=20):
    """Returns an image of the Mandelbrot fractal of size (h,w)."""
    y, x = np.ogrid[-1.4:1.4:h*1j, -2:0.8:w*1j]
    c = x + y*1j
    z = c
    divtime = maxit + np.zeros(z.shape, dtype=int)

    for i in range(maxit):
        z = z**2 + c
        diverge = z*np.conj(z) > 2**2           # who is diverging
        div_now = diverge & (divtime == maxit)  # who is diverging now
        divtime[div_now] = i                    # note when
        z[diverge] = 2                          # avoid diverging too much

    return divtime

plt.imshow(mandelbrot(400, 400))

Output: >>

<matplotlib.image.AxesImage at 0x2cf82a71088>

Case 2: similar to integer-array indexing, with a 1-D boolean array for each dimension of the array being indexed

Each index array must have the same length as the corresponding dimension of the array being indexed.

a = np.arange(12).reshape(3, 4)
print(a)
print()
b1 = np.array([False, True, True])         # first dim selection
b2 = np.array([True, False, True, False])  # second dim selection
print(a[b1])
print()
print(a[b1, :])
print()
print(a[:, b2])
print()
print(a[b1, b2])
# Why this result? b1 and b2 are turned into the integer indices [1, 2] and
# [0, 2], which are then paired up elementwise: a[1, 0] and a[2, 2]

Output: >>

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

[[ 4  5  6  7]
 [ 8  9 10 11]]

[[ 4  5  6  7]
 [ 8  9 10 11]]

[[ 0  2]
 [ 4  6]
 [ 8 10]]

[ 4 10]
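
The last result, a[b1, b2], follows from converting each boolean array to the positions of its True entries. The sketch below makes the conversion explicit with np.nonzero and, as an aside, uses np.ix_ to select the full block of rows and columns instead.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

rows = np.nonzero(b1)[0]    # positions of True in b1: [1 2]
cols = np.nonzero(b2)[0]    # positions of True in b2: [0 2]

# The integer indices are paired up: (1, 0) and (2, 2).
print(a[rows, cols])        # [ 4 10]
print(np.array_equal(a[b1, b2], a[rows, cols]))  # True

# np.ix_ builds an open mesh, selecting every chosen row with every chosen column.
block = a[np.ix_(b1, b2)]
print(block)                # rows 1 and 2, columns 0 and 2
```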

5.3 ix_(): combining several vectors to get a result for every n-tuple of their elements

# For example, compute a+b*c for every triplet taken from the vectors a, b and c
a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])
ax, bx, cx = np.ix_(a, b, c)
print(ax)
print(bx)
print(cx)
print(ax.shape)
print(bx.shape)
print(cx.shape)

result = ax + bx*cx
print(result)

print(result[3, 2, 4])
print(a[3] + b[2]*c[4])
# I still do not know exactly how it is implemented,
# but I understand what it does and how to use it

Output: >>

[[[2]]

 [[3]]

 [[4]]

 [[5]]]
[[[8]
  [5]
  [4]]]
[[[5 4 6 8 3]]]
(4, 1, 1)
(1, 3, 1)
(1, 1, 5)
[[[42 34 50 66 26]
  [27 22 32 42 17]
  [22 18 26 34 14]]

 [[43 35 51 67 27]
  [28 23 33 43 18]
  [23 19 27 35 15]]

 [[44 36 52 68 28]
  [29 24 34 44 19]
  [24 20 28 36 16]]

 [[45 37 53 69 29]
  [30 25 35 45 20]
  [25 21 29 37 17]]]
17
17
def ufunc_reduce(ufct, *vectors):
    vs = np.ix_(*vectors)
    r = ufct.identity
    for v in vs:
        r = ufct(r, v)
    return r

ufunc_reduce(np.add, a, b, c)

# ufunc.reduce applies a ufunc repeatedly along one axis of a single array
# (np.add.reduce, for example, is a sum).
# The advantage of this version of reduce over the normal ufunc.reduce is that
# it relies on broadcasting, so it never has to build an argument array of
# size output * the number of vectors.

Output: >>

array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])
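
For comparison, here is a sketch of what the normal ufunc.reduce route looks like for the same three vectors. The stacking step via np.broadcast_arrays and np.stack is my own illustration of the large argument array mentioned above, not something taken from the tutorial.

```python
import numpy as np

# ufunc.reduce folds a ufunc along one axis of a single array.
print(np.add.reduce([1, 2, 3, 4]))        # 10, same as a sum
print(np.multiply.reduce([1, 2, 3, 4]))   # 24, same as a product

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# To use reduce on several vectors, each one is first expanded to the full
# output shape and stacked: 3 copies of a (4, 3, 5) array.
stacked = np.stack(np.broadcast_arrays(*np.ix_(a, b, c)))
print(stacked.shape)                      # (3, 4, 3, 5)

total = np.add.reduce(stacked, axis=0)
print(total.shape)                        # (4, 3, 5)
print(total[3, 2, 4] == a[3] + b[2] + c[4])  # True
```

The broadcasting version avoids materializing `stacked`, which is the saving the comment in the code above refers to.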

5.4 Indexing with strings

See the NumPy documentation on structured arrays.
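
A minimal sketch of string indexing on a structured array. The field names and values (`pets`, "name", "age", "weight") are made up for illustration.

```python
import numpy as np

# Each element is a record with three named fields.
pets = np.array(
    [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

# Indexing with a field name returns all values of that field.
print(pets["name"])   # ['Rex' 'Fido']
print(pets["age"])    # [9 3]

# The field is a view, so assigning through it changes the records.
pets["age"][0] = 10
print(pets[0])
```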

6. Linear Algebra (this part of the official documentation is still a work in progress)

6.1 Simple Array Operations

See linalg.py in the numpy folder for details (I read it and did not understand it).

# I did not fully understand the example either, so I am simply copying it
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a)
print(a.transpose())
print(np.linalg.inv(a))

u = np.eye(2)  # the 2x2 identity matrix I
print(u)

j = np.array([[0.0, -1.0], [1.0, 0.0]])

print(j @ j)
print(np.trace(u))

y = np.array([[5.], [7.]])
print(np.linalg.solve(a, y))

print(np.linalg.eig(j))
# The first element of the return value holds the eigenvalues,
# the second the corresponding (normalized) eigenvectors

Output: >>

[[1. 2.]
 [3. 4.]]
[[1. 3.]
 [2. 4.]]
[[-2.   1. ]
 [ 1.5 -0.5]]
[[1. 0.]
 [0. 1.]]
[[-1.  0.]
 [ 0. -1.]]
2.0
[[-3.]
 [ 4.]]
(array([0.+1.j, 0.-1.j]), array([[0.70710678+0.j        , 0.70710678-0.j        ],
       [0.        -0.70710678j, 0.        +0.70710678j]]))
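
One way to make sense of the example is to check each result against its definition. The sketch below reuses the same matrices and only adds np.allclose checks.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

# inv(a) is the matrix that multiplies with a to give the identity.
print(np.allclose(a @ np.linalg.inv(a), np.eye(2)))   # True

# solve(a, y) returns x such that a @ x equals y.
x = np.linalg.solve(a, y)
print(np.allclose(a @ x, y))                          # True

# eig returns eigenvalues w and eigenvectors as the columns of v,
# so j @ v[:, k] equals w[k] * v[:, k] for every k.
j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)
eig_ok = all(np.allclose(j @ v[:, k], w[k] * v[:, k]) for k in range(len(w)))
print(eig_ok)                                         # True
```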

7. Tricks and Tips

7.1 Automatic reshaping

When reshaping, pass -1 for one dimension and NumPy computes its size automatically.
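
A short sketch of the -1 placeholder, using an arbitrary array of 30 elements.

```python
import numpy as np

a = np.arange(30)

# -1 means "whatever is needed": 30 / (2 * 3) = 5.
b = a.reshape((2, -1, 3))
print(b.shape)   # (2, 5, 3)

# The remaining size must divide evenly, otherwise reshape raises ValueError.
try:
    a.reshape((4, -1))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```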

7.2 Vector stacking

Use hstack and vstack to stack vectors into a larger array.
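
A minimal sketch with two equally sized row vectors `x` and `y`, chosen arbitrarily.

```python
import numpy as np

x = np.arange(0, 10, 2)   # [0 2 4 6 8]
y = np.arange(5)          # [0 1 2 3 4]

# vstack puts the vectors on top of each other as rows of a 2-D array.
m = np.vstack([x, y])
print(m.shape)            # (2, 5)

# hstack joins them end to end into one longer 1-D array.
xy = np.hstack([x, y])
print(xy)                 # [0 2 4 6 8 0 1 2 3 4]
```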

7.3 Histograms

import numpy as np
rg = np.random.default_rng(1)
import matplotlib.pyplot as plt
# Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
mu, sigma = 2, 0.5
v = rg.normal(mu, sigma, 10000)
# Plot a normalized histogram with 50 bins
plt.hist(v, bins=50, density=True)  # matplotlib version (plots the histogram)
# Compute the histogram with numpy and then plot it
(n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
plt.plot(.5*(bins[1:]+bins[:-1]), n)  # line plot (a neat way to get the bin centers)

# matplotlib's hist draws the histogram directly (the blue bars)
# numpy's histogram returns two arrays: the histogram values and
# bin_edges (the boundaries of the bars)

Output: >>

[<matplotlib.lines.Line2D at 0x2cf82cab548>]
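
What np.histogram returns can be inspected without plotting anything. The sketch reuses the same sample; the checks on lengths and on the total area are my own additions.

```python
import numpy as np

rg = np.random.default_rng(1)
v = rg.normal(2, 0.5, 10000)

n, bins = np.histogram(v, bins=50, density=True)

# 50 bins have 51 edges, so bins is one element longer than n.
print(n.shape, bins.shape)        # (50,) (51,)

# Averaging neighbouring edges gives one center per bin.
centers = 0.5 * (bins[1:] + bins[:-1])
print(centers.shape)              # (50,)

# With density=True the bar areas (height * width) add up to 1.
area = np.sum(n * np.diff(bins))
print(np.isclose(area, 1.0))      # True
```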

8. Miscellaneous

from numpy import pi
print(pi)

Output: >>

3.141592653589793

# random number generator
rg = np.random.default_rng(1)
Output: >>

‘int32’

9. Further Reading

Author: Jhuoer Yen

Published: 2024-03-20

Updated: 2024-03-20
