Python Functions, File I/O, and Regular Expressions

This article introduces basic, beginner-level Python syntax.

1. Functions

Defining a function
Syntax:

def function_name(parameter_list):
	function_body
	return return_value

A simple function
>>> def myAdd(x,y):
	return x+y
>>> myAdd(10,20)
30

Default parameter values
>>> def mySub(x=100,y=10):
	return x-y
>>> mySub()
90

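Default parameters and keyword arguments
>>> def mySub(x=100,y=10):
	return x-y
>>> mySub(y=20)						# x keeps its default; y is 20
80
>>> def fun(x,y=10):				# allowed: x has no default, y has one
	return x-y
>>> def fun(x=10,y):				# not allowed: a parameter without a default cannot follow one with a default
	return x-y
SyntaxError: non-default argument follows default argument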
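
The rules above can be collected into one small script. This is a minimal sketch written for Python 3 (where print is a function); my_sub mirrors mySub above, and scale is an illustrative name that is not part of the original.

```python
def my_sub(x=100, y=10):
    """Return x - y; both parameters have default values."""
    return x - y


def scale(value, factor=2):
    """Parameters without defaults must come before those with defaults."""
    return value * factor


print(my_sub())            # 90: both defaults are used (100 - 10)
print(my_sub(y=20))        # 80: the keyword argument overrides only y
print(my_sub(5, 3))        # 2: positional arguments fill x, then y
print(scale(7))            # 14: factor falls back to its default
print(scale(7, factor=3))  # 21: keyword argument for the optional parameter
```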

lambda functions (anonymous functions)

>>> def f(x,y):
	return x*y
>>> f(2,3)
6
>>> g = lambda x,y:x*y
>>> g(2,3)
6
>>> type(g)
<type 'function'>
>>> print g
<function <lambda> at 0x02CE4B70>
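
A lambda is most useful as a short throwaway argument to another function. The sketch below is written for Python 3 and assumes a hypothetical list of (name, score) pairs; it sorts them by score with a lambda as the key function.

```python
scores = [("alice", 82), ("bob", 95), ("carol", 78)]

# Sort by the second element of each tuple (the score), highest first.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('bob', 95), ('alice', 82), ('carol', 78)]

# A lambda produces an ordinary function object, equivalent to a one-line def.
mul = lambda x, y: x * y
print(mul(2, 3))  # 6
```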

Generator functions

>>> def fun(n):
	for i in range(n):
		yield i	
>>> for i in fun(3):
	print i
0
1
2
>>> r = fun(3)
>>> print r.next()
0
>>> print r.next()
1
>>> print r.next()
2
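
The same generator in Python 3, as a minimal sketch: the method call r.next() becomes the built-in next(r), and an exhausted generator raises StopIteration, which a for loop handles automatically. count_up is an illustrative name for the fun(n) generator above.

```python
def count_up(n):
    """Yield 0, 1, ..., n-1 lazily, one value per next() call."""
    for i in range(n):
        yield i


gen = count_up(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2

# A fourth next() has nothing left to yield, so it raises StopIteration.
try:
    next(gen)
except StopIteration:
    print("generator exhausted")

print(list(count_up(5)))  # [0, 1, 2, 3, 4]
```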

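Modules and packages
A module's name is its file name without the .py extension.
Ways to import a module: import xxx, import xxx as yyy, from xxx import yyy
Calling a function in a module: xxx.yyy  # xxx is the module name, yyy is the function name

A package groups several modules in one directory. The directory must contain a file named __init__.py, which may be empty (required in Python 2; optional since Python 3.3).
Using a package: import xxx.yyy  # xxx is the package name, yyy is the module name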
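
To see all three import forms working against a real package, the Python 3 sketch below builds a tiny package in a temporary directory and then imports it. The names mypkg, mymath, and add are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")      # hypothetical package name
os.mkdir(pkg_dir)

# An empty __init__.py marks the directory as a package.
open(os.path.join(pkg_dir, "__init__.py"), "w").close()

# A module inside the package; its module name is its file name minus ".py".
with open(os.path.join(pkg_dir, "mymath.py"), "w") as f:
    f.write("def add(x, y):\n    return x + y\n")

sys.path.insert(0, root)          # make the package importable
importlib.invalidate_caches()     # recommended after creating modules at runtime

import mypkg.mymath               # import package.module
import mypkg.mymath as mm         # import ... as alias
from mypkg.mymath import add      # from package.module import name

print(mypkg.mymath.add(1, 2))  # 3
print(mm.add(4, 5))            # 9
print(add(10, 20))             # 30
```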

2. File operations

Python provides the os, os.path, and shutil modules, among others, for working with files.

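Opening or creating a file
In Python 2, a file is opened or created with the built-in file() function; open() accepts the same arguments. Python 3 removed file(), so open() is the only choice there.
file(name[, mode[, buffering]])
Parameters:
name: the file name
mode: the mode in which the file is opened
buffering: the buffering policy. 0 means unbuffered; 1 means line buffered; a value greater than 1 sets the buffer size in bytes.
File open modes: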

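| Mode | Description |
| ---- | ----------- |
| r | Open for reading only. |
| w | Open for writing. If the file exists, it is truncated (overwritten); otherwise it is created. |
| a | Open for writing. If the file exists, new data is appended; otherwise it is created. |
| r+ | Open for reading and writing. The file must already exist. |
| w+ | Open for reading and writing. If the file exists, it is truncated; otherwise it is created. |
| a+ | Open for reading and writing. If the file exists, writes are appended; otherwise it is created. |
| b | Open in binary mode. Can be combined with r, w, a, and +. |
| U | Universal newline mode: '\r', '\n', and '\r\n' are all treated as line endings. |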

Note: binary files such as images and videos must be read and written in b mode.
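
A minimal Python 3 sketch of why b mode matters: it writes the eight-byte PNG file signature (chosen here only because it contains a '\r\n' pair) with "wb" and reads it back with "rb". In text mode, newline translation on Windows and, in Python 3, text decoding could alter those bytes; binary mode returns them unchanged. The file name sample.bin is an arbitrary choice.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.bin")
data = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])  # PNG signature

with open(path, "wb") as f:    # write raw bytes, no newline translation
    f.write(data)

with open(path, "rb") as f:    # read exactly the same bytes back
    restored = f.read()

print(restored == data)  # True
```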

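Attributes and methods of file objects:

| Attribute or method | Description |
| ------------------- | ----------- |
| closed | True if the file has been closed |
| encoding | The file's encoding |
| mode | The mode the file was opened with |
| name | The file's name |
| newlines | The newline types encountered while reading (in universal newline mode) |
| file() | Constructor: file(name[, mode[, buffering]]) |
| flush() | Writes the buffer contents to disk |
| close() | Closes the file |
| read([size]) | Reads from the file and returns the data read |
| readline() | Reads one line |
| readlines() | Reads all remaining lines and returns them as a list |
| seek(offset[, whence]) | Moves the file position to offset, relative to whence (0 = start, 1 = current position, 2 = end) |
| tell() | Returns the current file position |
| next() | Returns the next line (used when iterating over the file) |
| truncate([size]) | Truncates the file to size bytes (default: the current position) |
| write(str) | Writes a string |
| writelines(sequence) | Writes a sequence of strings (no newlines are added) |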
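
A minimal Python 3 sketch of seek(offset, whence) and tell() on a ten-byte file. It uses binary mode because Python 3 text files reject nonzero seeks relative to the current position or the end; the file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)            # whence=0 (default): offset from the start
    pos = f.tell()       # 3
    first = f.read(2)    # b'34'; the position is now 5
    f.seek(2, 1)         # whence=1: relative to the current position (5 -> 7)
    middle = f.read(1)   # b'7'
    f.seek(-2, 2)        # whence=2: relative to the end of the file
    last = f.read()      # b'89'

print(pos, first, middle, last)  # 3 b'34' b'7' b'89'
```
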

Reading from a file
(1) Line by line: readline()
(2) All lines at once, as a list: readlines()
(3) The whole file as one string: read()
Writing to a file
Use the write() and writelines() methods.
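
The three reading styles and two writing methods side by side, as a minimal Python 3 sketch. It uses open() inside with blocks, which close the file automatically; the file name lines.txt is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")

with open(path, "w") as f:
    f.write("first line\n")                          # write() takes one string
    f.writelines(["second line\n", "third line\n"])  # writelines() adds no newlines

with open(path) as f:
    one = f.readline()    # 'first line\n'
    rest = f.readlines()  # ['second line\n', 'third line\n']

with open(path) as f:
    whole = f.read()      # the entire file as one string

print(repr(one), rest, len(whole))
```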

File read/write example

Writing
>>> f = file('tmp.txt','w+')
>>> f.write('This is temp file\n')
>>> f.flush()
>>> f.close()

Reading
>>> f = file('tmp.txt','r+')
>>> f.readline()
'This is temp file\n'
>>> f.seek(0,0)
>>> f.read()
'This is temp file\n'
>>> f.close()

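Deleting files
Deleting a file uses the os module (os.remove) together with the os.path module (for example, to check that the file exists first).
Common file-handling functions in the os module: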

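| Function | Description |
| -------- | ----------- |
| access(path, mode) | Checks whether path can be accessed with the permission given by mode (os.F_OK, os.R_OK, os.W_OK, os.X_OK) |
| chmod(path, mode) | Changes the file's access permissions; mode uses Unix permission bits |
| open(file, flags[, mode]) | Opens a file at the OS level with the given flags (e.g. os.O_RDONLY) and, when creating it, the permission mode; returns a file descriptor |
| remove(path) | Deletes the file given by path |
| rename(old, new) | Renames a file or directory |
| stat(path) | Returns all attributes of the file given by path |
| fstat(fd) | Returns all attributes of an open file, given its file descriptor |
| lseek(fd, pos, how) | Sets the current position of file descriptor fd and returns the new position in bytes |
| startfile(path[, operation]) | (Windows only) Opens the file with its associated program; for example, 1.html opens in the browser |
| tmpfile() | (Python 2 only) Returns a new temporary file object opened in w+b mode |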

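Note: os.open() is used differently from the built-in file() and open() functions: it returns an integer file descriptor, not a file object.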
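
A minimal Python 3 sketch of deleting a file safely, followed by the low-level os.open() path for contrast. The permission value 0o644 is an assumption (Unix permission bits, largely ignored on Windows), and the file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "to_delete.txt")
with open(path, "w") as f:
    f.write("temporary\n")

print(os.path.exists(path))   # True

if os.path.isfile(path):      # only call remove() on a regular file
    os.remove(path)

print(os.path.exists(path))   # False

# os.open() works at the OS level: it returns an int file descriptor, not a file object.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"raw bytes")    # os.write() expects bytes
os.close(fd)
print(os.path.getsize(path))  # 9
os.remove(path)
```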

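Common file-handling functions in the os.path module

| Function | Description |
| -------- | ----------- |
| abspath(path) | Returns the absolute path of path |
| dirname(p) | Returns the directory part of the path |
| exists(path) | Checks whether the path exists |
| getatime(filename) | Returns the file's last access time |
| getctime(filename) | Returns the file's creation time on Windows (the time of the last metadata change on Unix) |
| getmtime(filename) | Returns the file's last modification time |
| getsize(filename) | Returns the file's size in bytes |
| isabs(s) | Tests whether the path is absolute |
| isdir(path) | Checks whether path is a directory |
| isfile(path) | Checks whether path is a regular file |
| split(p) | Splits the path into a (head, tail) tuple |
| splitext(p) | Splits off the file extension, returning (root, ext) |
| splitdrive(p) | Splits off the drive name, returning (drive, rest) |
| walk(top, func, arg) | (Python 2 only) Traverses the directory tree, calling func(arg, dirname, names) for each directory |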
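
A few of these functions applied to one path, as a minimal Python 3 sketch. The path data/images/photo.jpg is hypothetical and does not need to exist, because split(), splitext(), dirname(), and isabs() only manipulate strings.

```python
import os.path

p = os.path.join("data", "images", "photo.jpg")  # joined with the OS separator

head, tail = os.path.split(p)     # a (head, tail) tuple, not a list
root, ext = os.path.splitext(p)   # separates the extension

print(head, tail)                 # data/images photo.jpg  (on POSIX)
print(ext)                        # .jpg
print(os.path.dirname(p) == head) # True: dirname() is the head from split()
print(os.path.isabs(p), os.path.isabs(os.path.abspath(p)))  # False True
```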

Basic directory operations
Common directory functions in the os module:

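| Function | Description |
| -------- | ----------- |
| mkdir(path[, mode]) | Creates the directory given by path |
| makedirs(name[, mode]) | Creates a multi-level directory, e.g. name = 'path1/path2' |
| rmdir(path) | Removes the (empty) directory given by path |
| removedirs(path) | Removes a multi-level directory: the leaf first, then each parent that becomes empty |
| listdir(path) | Returns the names of all entries in the directory given by path |
| getcwd() | Returns the current working directory |
| chdir(path) | Changes the current working directory to path |
| walk(top, topdown=True, onerror=None) | Traverses the directory tree, yielding (dirpath, dirnames, filenames) for each directory |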
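
A minimal Python 3 sketch that exercises these directory functions inside a temporary directory: makedirs() builds path1/path2, listdir() and walk() inspect the tree, and removedirs() cleans it up. removedirs() is called with a relative path after chdir() so that it stops at path1 instead of trying to remove the temporary directory's parents.

```python
import os
import tempfile

base = tempfile.mkdtemp()
nested = os.path.join(base, "path1", "path2")
os.makedirs(nested)                      # creates path1 and path1/path2 in one call

# Two empty files so there is something to list and walk.
open(os.path.join(nested, "a.txt"), "w").close()
open(os.path.join(base, "path1", "b.txt"), "w").close()

entries = sorted(os.listdir(os.path.join(base, "path1")))  # files and subdirectories
print(entries)                           # ['b.txt', 'path2']

found = []
for dirpath, dirnames, filenames in os.walk(base):  # one tuple per directory
    for name in filenames:
        found.append(os.path.relpath(os.path.join(dirpath, name), base))
print(sorted(found))

# rmdir() and removedirs() only delete empty directories, so remove the files first.
os.remove(os.path.join(nested, "a.txt"))
os.remove(os.path.join(base, "path1", "b.txt"))

old_cwd = os.getcwd()
os.chdir(base)
try:
    # With a relative path, removedirs() deletes path2, then path1, and stops.
    os.removedirs(os.path.join("path1", "path2"))
finally:
    os.chdir(old_cwd)                    # always restore the working directory

print(os.listdir(base))                  # []
```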

File streams
The sys module provides three standard stream objects.
(1) stdin

>>> import sys
>>> sys.stdin = open('tmp.txt','r')
>>> for line in sys.stdin.readlines():
	print line,				# trailing comma: each line already ends with '\n'

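(2) stdout

>>> import sys
>>> saved = sys.stdout						# keep a reference to the console stream
>>> sys.stdout = open(r'./tmp.txt','a')
>>> print 'goodboy'							# written to tmp.txt
>>> sys.stdout.close()
>>> sys.stdout = saved						# restore the console, or later prints will fail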

(3) stderr

>>> import sys, time
>>> sys.stderr = open('tmp.txt','a')
>>> f = open(r'./hello.txt','r')
>>> t = time.strftime('%Y-%m-%d %X',time.localtime())
>>> context = f.read()
>>> f.close()
>>> if context:
	sys.stderr.write(t+' '+context)
else:
	raise Exception('error msg')
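
The same redirection pattern in Python 3, as a minimal sketch: stdout is pointed at a file only inside a try/finally block, so the console stream is restored even if an error occurs, and diagnostics are sent to stderr with print(..., file=sys.stderr) instead of reassigning sys.stderr. The log file location is an arbitrary temporary path.

```python
import os
import sys
import tempfile
import time

log_path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

saved_stdout = sys.stdout
with open(log_path, "a") as log:
    sys.stdout = log
    try:
        print("goodboy")                                   # written to tmp.txt
        stamp = time.strftime("%Y-%m-%d %X", time.localtime())
        print(stamp, "done")                               # also written to tmp.txt
    finally:
        sys.stdout = saved_stdout                          # always restore the console

# Diagnostics can go to stderr without replacing sys.stderr at all.
print("finished writing", log_path, file=sys.stderr)

with open(log_path) as f:
    lines = f.read().splitlines()
print(lines[0])  # goodboy
```

The standard library's contextlib.redirect_stdout offers the same save-and-restore behavior as a context manager.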

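3. The re module

Common functions in the re module:
re.compile, re.match, re.search, re.sub, re.subn, re.split, re.findall, re.finditer, re.escape, re.purge.
Examples:
(1) A first regular expression
Python provides two different primitive operations: match and search. match only matches at the beginning of the string, while search (Perl's default behavior) looks for a match anywhere in the string.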

>>> import re								# import the re module
>>> pa = re.compile(r'hello')				# compile the pattern into a RegexObject, whose match() and search() methods can then be called; the r prefix makes it a raw string
>>> result = pa.match('hello world')
>>> result.group()
'hello'
>>> result.span()
(0, 5)

You can also call the module-level function directly, without compile():

>>> import re
>>> result = re.match(r'hello','hello world')
>>> result.group()
'hello'

Searching with search():

>>> import re
>>> res = re.search(r'hello','hello world')
>>> res.group()
'hello'

The difference between match and search:

>>> import re
>>> res = re.match(r'python','I love python')
>>> res.group()									  # error: match() found nothing at the start and returned None
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    res.group()
AttributeError: 'NoneType' object has no attribute 'group'
>>> res = re.search(r'python','I love python')			# works: search() scans the whole string
>>> res.group()
'python'

In short: match() only matches at the start of the string, while search() scans the whole string.
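
Because match() and search() return None when nothing matches, calling group() directly can fail, as the traceback above shows. A minimal Python 3 sketch of the usual guard:

```python
import re

text = "I love python"

m = re.match(r"python", text)    # anchored at position 0: no match here
s = re.search(r"python", text)   # scans the whole string

print(m)                         # None
if s:                            # test for None before calling group()
    print(s.group(), s.span())   # python (7, 13)
```
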
Other functions:

>>> re.findall(r'\d+','12ab34cd56ef78gh90')						# find every run of digits
['12', '34', '56', '78', '90']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)			# split on runs of the letters a-f, ignoring case
['0', '3', '9']
>>> it = re.finditer(r'\d+','12ab34cd56ef78gh90')				# like findall(), but returns an iterator of match objects
>>> for m in it:
	print m.group()
12
34
56
78
90
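
The list of re functions above also includes sub, subn, and escape, which the examples do not show. A minimal Python 3 sketch of all three on a short sample string:

```python
import re

text = "12ab34cd56"

replaced = re.sub(r"\d+", "#", text)     # replace every run of digits
print(replaced)                          # #ab#cd#

counted = re.subn(r"\d+", "#", text)     # same, plus the number of replacements
print(counted)                           # ('#ab#cd#', 3)

# escape() backslash-escapes regex metacharacters so the text matches literally.
literal = re.compile(re.escape("a.b"))
print(bool(literal.search("xa.by")))     # True
print(bool(literal.search("axb")))       # False: "." is no longer a wildcard
```

re.purge(), the last function in the list, simply clears the module's internal cache of compiled patterns.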

Putting it together: download all the images on a web page.

>>> import re
>>> import urllib2
>>> req = urllib2.urlopen('http://sports.sina.com.cn/nba/')
>>> buf = req.read()
>>> listurl = re.findall(r'http://[^\s"\'<>]+?\.jpg', buf)	# non-greedy; stops at quotes and whitespace
>>> for i, url in enumerate(listurl):
	f = open(str(i)+'.jpg','wb')		# images must be written in binary mode
	req = urllib2.urlopen(url)
	f.write(req.read())
	f.close()
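
A Python 3 version of the exercise, as a sketch: urllib2 became urllib.request in Python 3. The extraction step is split into a separate function so it can be checked offline against a sample HTML string. find_jpg_urls and download are hypothetical helper names. The pattern is an assumption that only handles absolute http(s) URLs ending in .jpg, not relative paths or other image types.

```python
import re
import urllib.request

# Non-greedy match that stops at quotes, whitespace, and angle brackets,
# so two URLs on the same line are not merged into one.
JPG_URL = re.compile(r"""https?://[^\s"'<>]+?\.jpg""", re.IGNORECASE)


def find_jpg_urls(html):
    """Return every absolute .jpg URL found in an HTML string."""
    return JPG_URL.findall(html)


def download(urls):
    """Save each URL as 0.jpg, 1.jpg, ... in binary mode (needs network access)."""
    for i, url in enumerate(urls):
        with urllib.request.urlopen(url) as resp, open("%d.jpg" % i, "wb") as f:
            f.write(resp.read())


sample = '<img src="http://example.com/a.jpg"><img src="https://example.com/b/c.JPG">'
print(find_jpg_urls(sample))  # ['http://example.com/a.jpg', 'https://example.com/b/c.JPG']
```

Decoding is left out: pass the page as a str, for example urllib.request.urlopen(page).read().decode("utf-8", "replace"), before calling find_jpg_urls().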