Question
I need to identify which files in a directory are binary and which are text.
I tried using mimetypes, but it isn't a good fit in my case: it can't identify the MIME type of every file, and some of the files here are unusual ones. I just need to know whether each file is binary or text. Simple? But I couldn't find a solution...
Thanks
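For context on why mimetypes falls short here: the standard library's mimetypes.guess_type() works purely from the file name's extension and never reads the file's bytes, so an unfamiliar extension (or no extension at all) comes back as None. The file names below are made up for illustration.

```python
import mimetypes

# guess_type() maps the file *name* (its extension) to a MIME type;
# it never opens the file or inspects its contents.
for name in ["notes.txt", "blob.xyz123", "Makefile"]:
    print(name, mimetypes.guess_type(name))
```

Unknown extensions give (None, None), and a binary file that happens to be named something.txt would still be reported as text/plain, which is why a content-based check is needed.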
Recommended answer
Thanks, everybody. I found a solution that suits my problem: I found this code at http://www.51sjk.com/Upload/Articles/1/0/334/334211_20221025103623978.jpg and changed just a small piece of it to fit my needs.
It works fine.
    def istext(filename):
        # Read only the first 512 bytes, in binary mode so any byte value is allowed
        with open(filename, "rb") as f:
            s = f.read(512)

        # Printable ASCII plus common whitespace/control characters found in text
        text_characters = bytes(range(32, 127)) + b"\n\r\t\b"

        if not s:
            # Empty files are considered text
            return True
        if b"\0" in s:
            # Files with null bytes are likely binary
            return False

        # Get the non-text characters: translate(None, ...) deletes every
        # text character, so whatever remains is non-text
        t = s.translate(None, text_characters)

        # If more than 30% non-text characters, then
        # this is considered a binary file
        if len(t) / len(s) > 0.30:
            return False
        return True
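Since the original goal is to sort the files in a directory, here is a minimal Python 3 sketch that applies the same heuristic (sample the first 512 bytes, a NUL byte means binary, more than 30% non-text bytes means binary) to every regular file in one directory. The names looks_like_text and classify_directory are hypothetical helpers written for this example, not part of any library.

```python
import os

# Printable ASCII plus common whitespace/control characters (same set as above)
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\b"


def looks_like_text(path, blocksize=512, threshold=0.30):
    """Hypothetical helper: True if the first `blocksize` bytes look like text."""
    with open(path, "rb") as f:
        chunk = f.read(blocksize)
    if not chunk:
        return True  # empty files count as text
    if b"\0" in chunk:
        return False  # NUL bytes almost never appear in text files
    nontext = chunk.translate(None, TEXT_CHARS)  # delete text bytes, keep the rest
    return len(nontext) / len(chunk) <= threshold


def classify_directory(directory):
    """Hypothetical helper: map each regular file name to 'text' or 'binary'.

    Subdirectories are skipped; this does not recurse.
    """
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            result[entry.name] = "text" if looks_like_text(entry.path) else "binary"
    return result
```

Calling classify_directory(".") classifies the files in the current directory. One caveat of this design: every byte of 128 or above counts as non-text, so a UTF-8 file that is mostly non-ASCII (Chinese, for example, where each character takes three such bytes) will be reported as binary. If that matters for your files, you may want to try decoding the sample as UTF-8 first, allowing for a character cut off at the end of the 512-byte sample.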