Problem description
I would like to retrieve the data inside a compressed gz file stored on an FTP server, without writing the file to local storage.
So far I have:
from ftplib import FTP
import gzip

ftp = FTP('ftp.server.com')
ftp.login()
ftp.cwd('/a/folder/')

fileName = 'aFile.gz'
localfile = open(fileName,'wb')
ftp.retrbinary('RETR '+fileName, localfile.write, 1024)

f = gzip.open(localfile,'rb')
data = f.read()
This, however, writes the file "localfile" to the current storage.
I tried to change this to:
from ftplib import FTP
import zlib

ftp = FTP('ftp.server.com')
ftp.login()
ftp.cwd('/a/folder/')

fileName = 'aFile.gz'
data = ftp.retrbinary('RETR '+fileName, zlib.decompress, 1024)
but ftp.retrbinary does not return the output of its callback. Is there a way to do this?
Recommended answer
A simple implementation is to:
download the file to an in-memory file-like object, such as BytesIO;
pass that to the fileobj parameter of the GzipFile constructor.
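import gzip
import shutil
from ftplib import FTP
from io import BytesIO

ftp = FTP('ftp.example.com')
ftp.login('username', 'password')

# Download the compressed file into an in-memory buffer
flo = BytesIO()
ftp.retrbinary('RETR /remote/path/archive.tar.gz', flo.write)
flo.seek(0)  # rewind so that GzipFile reads from the start

# Decompress from the buffer; the name "gz" avoids shadowing the gzip module
with open('archive.tar', 'wb') as fout, gzip.GzipFile(fileobj=flo) as gz:
    shutil.copyfileobj(gz, fout)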
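If the decompressed data is wanted in memory rather than in a local file, as the question asks, the same BytesIO approach can be wrapped in a small helper. This is only a sketch: `fetch_gzip_bytes` and its `retrieve` parameter are hypothetical names, not part of ftplib. `retrieve` is assumed to be any callable that accepts a write callback and feeds it the raw bytes.

```python
import gzip
from io import BytesIO


def fetch_gzip_bytes(retrieve):
    """Return the decompressed content of a gzip stream.

    `retrieve` (hypothetical) is called with one argument, a callback
    that must be invoked with each downloaded chunk of bytes.
    """
    buffer = BytesIO()
    retrieve(buffer.write)   # collect the compressed bytes in memory
    buffer.seek(0)           # rewind before reading
    with gzip.GzipFile(fileobj=buffer) as gz:
        return gz.read()
```

With an open connection it might be called as `data = fetch_gzip_bytes(lambda cb: ftp.retrbinary('RETR ' + fileName, cb))`. Nothing is written to disk, but both the compressed and the decompressed data are held in memory.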
The above loads the whole .gz file into memory, which can be inefficient for large files. A smarter implementation would stream the data instead, but that would probably require implementing a custom file-like object.
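One possible way to stream, sketched under assumptions, is to decompress each chunk as it arrives using zlib's incremental decompressor, rather than building a full file-like object. `GzipStreamDecoder`, `feed`, `close` and `demo` are hypothetical names introduced for illustration. The sketch assumes a single-member gzip file and does not handle concatenated members.

```python
import gzip
import zlib


class GzipStreamDecoder:
    """Decompress gzip data chunk by chunk and pass the output to `sink`."""

    def __init__(self, sink):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._sink = sink

    def feed(self, chunk):
        # Suitable as a retrbinary callback: receives one compressed chunk
        data = self._decomp.decompress(chunk)
        if data:
            self._sink(data)

    def close(self):
        # Emit anything still buffered inside the decompressor
        tail = self._decomp.flush()
        if tail:
            self._sink(tail)


def demo():
    """Simulate a chunked download without an FTP server."""
    payload = b"hello gzip over ftp\n" * 1000
    compressed = gzip.compress(payload)
    parts = []
    decoder = GzipStreamDecoder(parts.append)
    for i in range(0, len(compressed), 1024):
        decoder.feed(compressed[i:i + 1024])
    decoder.close()
    return b"".join(parts) == payload
```

Against a real server, the decoder would be plugged in as `ftp.retrbinary('RETR ' + fileName, decoder.feed)` followed by `decoder.close()`. Only one compressed chunk is held at a time; what happens to the decompressed output depends on the sink that is passed in.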
See also: Get files names inside a zip file on FTP server without downloading whole archive.