Designing a Code Statistics Tool in Python

2023-09-14 12:48:05

Problem

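Design a program that counts the lines of code in a project. It should report the number of files, code lines, comment lines and blank lines. Make it reasonably flexible, so that projects in different languages can be counted by passing different arguments, for example: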

# --type specifies the file type
python counter.py --type python

Output:

files: 10
code_lines: 200
comments: 100
blanks: 20

Analysis

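This design problem looks simple but is a little tricky in practice. We can make it smaller: once we can correctly count the lines of a single file, counting a whole directory is easy. The hardest part is multi-line comments. Taking Python as an example, comment lines come in the following forms. Strictly speaking, triple-quoted strings are string literals such as docstrings, but here they are treated as comments.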

1. Single-line comments starting with a hash sign

# single-line comment

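2. Multi-line comment delimiters opening and closing on the same line

"""This is a multi-line comment"""
'''This is also a multi-line comment'''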
3. Multi-line comment delimiters on separate lines

"""
All 3 of these lines are comments
"""

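Our approach parses the file line by line. Multi-line comments need an extra flag, in_multi_comment, which records whether the current line lies inside a multi-line comment. It defaults to False. It is set to True when a multi-line comment starts and set back to False at the next delimiter. Every line from the opening delimiter up to the closing delimiter counts as a comment line.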
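The toggle idea can be sketched on an in-memory list of lines, labelling each line instead of only counting it. This is a minimal sketch that assumes Python triple-quote delimiters. The name `classify` is hypothetical. Like the versions below, it only recognises a delimiter at the start of a line.

```python
DELIMS = ('"""', "'''")


def classify(lines):
    """Return one label per line: 'blank', 'comment' or 'code'."""
    in_multi_comment = False  # True while between an opening and a closing delimiter
    labels = []
    for raw in lines:
        line = raw.strip()
        if line == "" and not in_multi_comment:
            labels.append("blank")
        elif line.startswith("#"):
            labels.append("comment")
        elif any(line.startswith(d) and line.endswith(d) and len(line) > len(d)
                 for d in DELIMS):
            labels.append("comment")  # delimiters open and close on one line
        elif line.startswith(DELIMS):
            in_multi_comment = not in_multi_comment  # opening or closing delimiter
            labels.append("comment")
        elif in_multi_comment:
            labels.append("comment")  # any line inside the block, even a blank one
        else:
            labels.append("code")
    return labels


sample = ['x = 1', '', '# note', '"""one-liner"""', '"""', 'inside', '', '"""', 'y = 2']
print(classify(sample))
```

Because the flag is inverted on every delimiter, the second bare `"""` closes the block. The blank line between the two delimiters is therefore labelled as a comment, not a blank.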

Key concepts

How to read a file correctly, and the common string methods for processing each line once it has been read.
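The following sample lines show a few of the string methods the counter relies on. `strip()` removes surrounding whitespace, including the trailing newline. `startswith` and `endswith` also accept a tuple of prefixes, which is a compact way to test for either triple-quote style. `io.StringIO` stands in for an open file here. A real file opened with an explicit `encoding` iterates the same way.

```python
import io

raw = '    """docstring"""\n'

line = raw.strip()  # drops the leading spaces and the trailing newline
print(repr(line))  # '"""docstring"""'

# A tuple tests several prefixes or suffixes at once
print(line.startswith(('"""', "'''")))  # True
print(line.endswith(('"""', "'''")))  # True

# A line holding only whitespace becomes the empty string
print(repr("  \t \n".strip()))  # ''

# Iterating over a file object yields one line at a time, newline included,
# so the whole file never has to be loaded at once with readlines()
f = io.StringIO("a = 1\n\n# c\n")
print([ln.strip() for ln in f])  # ['a = 1', '', '# c']
```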

Simplified version

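We will build the program step by step. The first version is simplified: it counts a single Python file only and ignores multi-line comments, which anyone who has started learning Python can write. The key point is to call strip() on each line after reading it, which removes the surrounding spaces and the trailing newline.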

# -*- coding: utf-8 -*-
"""
Counts a .py file that contains only single-line comments
"""


def parse(path):
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()  # remove surrounding spaces and the newline
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))
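One design change worth considering is to separate the counting from the file I/O, so that the counter accepts any iterable of lines. This makes it easy to test without real files. The sketch below assumes that goal. `count_lines` is a hypothetical name, and `io.StringIO` stands in for an open file.

```python
import io


def count_lines(lines):
    """Single-line-comment counter that works on any iterable of lines."""
    comments = blanks = codes = 0
    for line in lines:
        line = line.strip()
        if line == "":
            blanks += 1
        elif line.startswith("#"):
            comments += 1
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


def parse(path):
    # The file-reading wrapper simply hands the open file to count_lines
    with open(path, encoding="utf-8") as f:
        return count_lines(f)


sample = io.StringIO("import os\n\n# list files\nprint(os.listdir('.'))\n")
print(count_lines(sample))  # {'comments': 1, 'blanks': 1, 'codes': 2}
```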

Multi-line comment version

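A counter that only understands single-line comments is of limited use. It becomes a real code counter only once it also handles multi-line comments.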

# -*- coding: utf-8 -*-
"""
Counts .py files that may contain multi-line comments
"""


def parse(path):
    in_multi_comment = False  # True while inside a multi-line comment
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Blank lines inside a multi-line comment count as comments
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment lines:
            # 1. single-line comments starting with #
            # 2. multi-line delimiters opening and closing on the same line
            # 3. lines between multi-line delimiters
            elif line.startswith("#") or \
                    (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                    (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                    (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                comments += 1
            # 4. opening and closing lines of a multi-line comment
            elif line.startswith('"""') or line.startswith("'''"):
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

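In case 4 above, the key step is to invert in_multi_comment whenever a multi-line delimiter is encountered, rather than simply setting it to True or False. The first """ sets it to True. The second """ is the closing delimiter, so inverting gives False. The third is an opening delimiter again and gives True, and so on.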
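The start-of-line check has a blind spot: a closing delimiter that follows text on the same line, such as `end of doc"""`, is never recognised. As a result, in_multi_comment stays True and every following line is counted as a comment. The version above would count `x = 1` in the sample below as a comment. One possible refinement, sketched here and not part of the original article, is to remember which delimiter opened the block. While inside the block, the code then looks for that delimiter anywhere in the line. `parse_lines` is a hypothetical name. It still assumes that a comment block begins at the start of a line, so a line such as `x = """` remains unhandled.

```python
DELIMS = ('"""', "'''")


def parse_lines(lines):
    opener = None  # delimiter that opened the current block, or None
    comments = blanks = codes = 0
    for raw in lines:
        line = raw.strip()
        if opener is not None:
            comments += 1  # every line inside a block, blank or not, is a comment
            if opener in line:  # closing delimiter found anywhere on the line
                opener = None
        elif line == "":
            blanks += 1
        elif line.startswith("#"):
            comments += 1
        elif line.startswith(DELIMS):
            comments += 1
            delim = line[:3]
            if delim not in line[3:]:  # not closed on the same line
                opener = delim
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


sample = ['"""Start of doc', 'more text', 'end of doc"""', 'x = 1']
print(parse_lines(sample))  # {'comments': 3, 'blanks': 0, 'codes': 1}
```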

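Does every other language then need its own parsing function? Look closely and the four cases reduce to four conditions. Most languages have both single-line and multi-line comments, and only the symbols differ.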

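# -*- coding: utf-8 -*-
# Comment symbols for each language
CONF = {"py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
        "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"}}


def parse(path, extension):
    start_comment = CONF.get(extension).get("start_comment")
    end_comment = CONF.get(extension).get("end_comment")
    single = CONF.get(extension).get("single")
    in_multi_comment = False
    comments, blanks, codes = 0, 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # cond2: opening and closing delimiters on the same line
            cond2 = False
            for index, item in enumerate(start_comment):
                cond2 = (line.startswith(item) and line.endswith(end_comment[index])
                         and len(line) > len(item))
                if cond2:
                    break
            # cond3: the line starts with a closing delimiter
            cond3 = any(line.startswith(item) for item in end_comment)
            # cond4: the line starts with any opening or closing delimiter
            cond4 = any(line.startswith(item) for item in start_comment + end_comment)
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment lines:
            # 1. single-line comments (# in Python, // in Java)
            # 2. multi-line delimiters opening and closing on the same line
            # 3. lines between multi-line delimiters
            elif line.startswith(single) or cond2 or (in_multi_comment and not cond3):
                comments += 1
            # 4. opening and closing lines of a multi-line comment spread over several lines
            elif cond4:
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}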

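All it takes is a configuration constant that lists each language's single-line and multi-line comment symbols, mapped onto the conditions cond1 to cond4. Here cond1 is the single-line check, written inline. The remaining task is parsing multiple files, which os.walk handles.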
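One way to pick the right CONF entry for each file is to derive the key from the file extension with `os.path.splitext`. The `EXT_TO_LANG` table below is an assumption for illustration, since the article's CONF only defines `py` and `java`. Files whose extension is not listed return None and can be skipped.

```python
import os

CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}

# Hypothetical mapping from file extension to a CONF key
EXT_TO_LANG = {".py": "py", ".java": "java"}


def language_of(path):
    """Return the CONF key for path, or None if its language is not configured."""
    ext = os.path.splitext(path)[1].lower()
    return EXT_TO_LANG.get(ext)


for name in ["src/app.py", "Main.java", "README.md"]:
    lang = language_of(name)
    print(name, "->", lang, CONF[lang]["single"] if lang else "(skipped)")
```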

import os


def counter(path):
    """
    Count a whole directory or a single file
    :param path: path of a directory or a file
    :return: dict with the comments, blanks and codes totals
    """
    if os.path.isdir(path):
        comments, blanks, codes = 0, 0, 0
        for root, dirs, files in os.walk(path):
            for f in files:
                file_path = os.path.join(root, f)
                stats = parse(file_path)  # the single-file parser defined earlier
                comments += stats.get("comments")
                blanks += stats.get("blanks")
                codes += stats.get("codes")
        return {"comments": comments, "blanks": blanks, "codes": codes}
    else:
        return parse(path)
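The problem statement also asks for the number of files. In addition, walking a whole tree will reach files such as images or binaries that `parse` cannot read. The sketch below counts files and filters them by extension. It assumes a hypothetical `exts` parameter and includes a stand-in `parse` modelled on the simplified single-line version, which you would swap for whichever parser you use.

```python
import os


def parse(path):
    # Stand-in single-line-comment parser; replace with the full version
    comments = blanks = codes = 0
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            line = line.strip()
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


def counter(path, exts=(".py",)):
    """Total the stats of every file under path whose extension is in exts."""
    totals = {"files": 0, "code_lines": 0, "comments": 0, "blanks": 0}
    if os.path.isfile(path):
        walker = [(os.path.dirname(path), [], [os.path.basename(path)])]
    else:
        walker = os.walk(path)
    for root, dirs, files in walker:
        for name in files:
            if not name.endswith(exts):
                continue  # skip files of other types
            stats = parse(os.path.join(root, name))
            totals["files"] += 1
            totals["code_lines"] += stats["codes"]
            totals["comments"] += stats["comments"]
            totals["blanks"] += stats["blanks"]
    return totals
```

Printing each key as `f"{key}: {value}"` then gives the layout from the problem statement.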

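Of course, plenty of work remains to make this program complete. This includes command-line argument parsing and restricting the count to a single language according to the given argument.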
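The command-line side can be sketched with the standard `argparse` module. The `TYPES` table mapping `--type` values to extensions is a hypothetical assumption. The chosen extensions would be passed to a directory walker as the set of file types to count. `format_report` renders totals in the `key: value` layout from the problem statement.

```python
import argparse

# Hypothetical mapping from --type values to file extensions
TYPES = {"python": (".py",), "java": (".java",)}


def build_parser():
    parser = argparse.ArgumentParser(
        description="Count files, code lines, comment lines and blank lines.")
    parser.add_argument("path", nargs="?", default=".",
                        help="file or directory to scan")
    parser.add_argument("--type", choices=sorted(TYPES), default="python",
                        help="language of the files to count")
    return parser


def format_report(totals):
    """Render totals as one 'key: value' line per entry."""
    return "\n".join(f"{key}: {value}" for key, value in totals.items())


args = build_parser().parse_args(["--type", "java", "src"])
print(args.type, args.path, TYPES[args.type])
print(format_report({"files": 10, "code_lines": 200, "comments": 100, "blanks": 20}))
```

In a real script, `parse_args()` would be called with no arguments so that it reads `sys.argv`.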

Addendum:

Implementing a code line counting tool in Python

We often want to count the lines of code in a project, but a reasonably full-featured counter is not entirely trivial. This section shows how to implement one in Python.

Approach:

First collect all the files, then count the lines in each file, and finally add the counts together.

Features:

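Counts the lines of each file;
Counts the total number of lines;
Measures the running time;
Counts only the specified file types, so unwanted types are excluded;
Recursively counts the files in a folder, including its subfolders;
Excludes blank lines.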
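The same features can also be written with `pathlib` instead of `os.walk`. This is an alternative sketch, not the article's code. `Path.rglob("*")` already recurses into subfolders, and `time.perf_counter()` measures the elapsed time. `include` and `exclude` are hypothetical parameters holding extensions with their leading dot. Setting `include=None` counts every type not in `exclude`.

```python
import time
from pathlib import Path


def count_nonblank(path):
    """Number of lines in path that contain something other than whitespace."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for line in f if line.strip())


def count_tree(base, include=(".php", ".py"), exclude=()):
    """Return ({file: non-blank lines}, total) for the matching files under base."""
    per_file = {}
    for p in sorted(Path(base).rglob("*")):
        if not p.is_file():
            continue
        if include is not None and p.suffix not in include:
            continue  # not one of the requested types
        if p.suffix in exclude:
            continue  # explicitly excluded type
        per_file[str(p)] = count_nonblank(p)
    return per_file, sum(per_file.values())


def main(base="."):
    start = time.perf_counter()
    per_file, total = count_tree(base)
    for name, n in per_file.items():
        print(f"{name}---- {n}")
    print("total lines:", total)
    print("Done! Cost Time: %0.2f second" % (time.perf_counter() - start))
```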

# coding=utf-8
import os
import time

basedir = '/root/script'
filelists = []
# The file types to count
whitelist = ['php', 'py']


# Walk the files; os.walk already descends into every subfolder
def getFile(basedir):
    for parent, dirnames, filenames in os.walk(basedir):
        # for dirname in dirnames:
        #     getFile(os.path.join(parent, dirname))  # recursion, not needed with os.walk
        for filename in filenames:
            ext = filename.split('.')[-1]
            # Only count the specified file types, skipping log and cache files
            if ext in whitelist:
                filelists.append(os.path.join(parent, filename))


# Count the non-blank lines of one file
def countLine(fname):
    count = 0
    # errors='ignore' keeps a stray non-UTF-8 byte from aborting the run
    with open(fname, encoding='utf-8', errors='ignore') as f:
        for file_line in f:
            if file_line.strip() != '':  # skip blank and whitespace-only lines
                count += 1
    print(fname + '----', count)
    return count


if __name__ == '__main__':
    startTime = time.perf_counter()
    getFile(basedir)
    totalline = 0
    for filelist in filelists:
        totalline = totalline + countLine(filelist)
    print('total lines:', totalline)
    print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))

Result:

[root@pythontab script]# python countCodeLine.py
/root/script/test/gametest.php---- 16
/root/script/smtp.php---- 284
/root/script/gametest.php---- 16
/root/script/countCodeLine.py---- 33
/root/script/sendmail.php---- 17
/root/script/test/gametest.php---- 16
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#

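Only PHP and Python files are counted, which is very convenient. Note that /root/script/test/gametest.php appears twice in this output, so the total of 382 counts it twice (16 + 284 + 16 + 33 + 17 + 16 = 382). os.walk on its own visits each file once. Re-enabling the commented-out recursive call in getFile would make files in subfolders be counted more than once.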

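Summary

That concludes this introduction to designing a code statistics tool in Python. I hope it helps. If you have any questions, please leave a comment and I will reply promptly. Many thanks for supporting the 毛票票 website!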
