The most straightforward way to read a file in Python is the built-in `open()`:

```python
f = open('./i-am-a-file', 'rb')
for line in f.readlines():
    print(line)
f.close()
```
If the file might not exist, check for it first:

```python
import os

file_path = './i-am-a-very-large-file'
if os.path.isfile(file_path):
    f = open(file_path, 'rb')
    for line in f.readlines():
        print(line)
    f.close()
```
There is a problem with this code: if an exception is raised while reading, `f.close()` will never be called, and the file stays open until the interpreter exits. That is bad practice and can cause unexpected issues. For example, if a long-running Python program reads a temp file without closing it explicitly, and the OS (such as Windows) protects the temp file while it is being read, that temp file cannot be deleted until the program ends. The fix is to use `with` to wrap the file operation, so that the file is automatically closed whether the operation succeeds or fails:
```python
import os

file_path = './i-am-a-very-large-file'
if os.path.isfile(file_path):
    with open(file_path, 'rb') as f:
        for line in f.readlines():
            print(line)
```
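To see what `with` buys you, here is a rough equivalent written out by hand; a minimal sketch of the standard `try`/`finally` idiom it replaces (my illustration, not code from the original post):

```python
import os

file_path = './i-am-a-very-large-file'
if os.path.isfile(file_path):
    f = open(file_path, 'rb')
    try:
        for line in f.readlines():
            print(line)
    finally:
        # runs whether the loop finishes or raises,
        # so the file is always closed
        f.close()
```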
The file is closed properly now, but there is a second problem: `readlines()` reads the whole file into memory at once. To see how bad that gets, let's generate a 4 GB file of random bytes:

```python
import os

size = 1024 * 1024 * 1024 * 4  # 4 GB
with open('i-am-a-very-large-file', 'wb') as f:
    f.write(os.urandom(size))
```
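Note that this helper script itself materializes all 4 GB at once, because `os.urandom(size)` builds the entire byte string in memory before writing. A chunked variant avoids that; a minimal sketch (my addition, with an arbitrary 1 MB chunk size):

```python
import os

size = 1024 * 1024 * 1024 * 4  # 4 GB total
CHUNK = 1024 * 1024            # 1 MB per write (arbitrary choice)

with open('i-am-a-very-large-file', 'wb') as f:
    for _ in range(size // CHUNK):
        # only CHUNK bytes live in memory at any moment
        f.write(os.urandom(CHUNK))
```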
Running the earlier script against this file, the `python` process uses 5,154,136 KB of memory, which is about 5.19 GB, just for reading this file! You can clearly see the steep climb in the memory diagram. (FYI, I have a total of 24 GB of memory.) We can fix this with `yield`:
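If you want to reproduce the measurement, one option is to ask the OS for the process's resident set size. This is my sketch, not the method used in the original post, and it assumes the third-party `psutil` package:

```python
import os
import psutil  # third-party: pip install psutil

file_path = './i-am-a-very-large-file'
with open(file_path, 'rb') as f:
    lines = f.readlines()  # loads the entire file into memory

rss = psutil.Process(os.getpid()).memory_info().rss
print(f'resident memory: {rss / 1024 / 1024:.1f} MB')
```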
```python
import os

def read_file(f_path):
    BLOCK_SIZE = 1024
    if os.path.isfile(f_path):
        with open(f_path, 'rb') as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if block:
                    yield block
                else:
                    return

file_path = './i-am-a-very-large-file'
for line in read_file(file_path):
    # note: each item is a 1024-byte block, not a text line
    print(line)
```
Notice the `yield` keyword in our solution. To understand how `yield` works, you need the concept of a generator; for a very clear and concise explanation, check out "What does the 'yield' keyword do?" on StackOverflow. In short, `yield` makes `read_file` a generator function. When `read_file` is called, it runs until `yield block`, hands back the first block of bytes, and pauses until the next time it is resumed. So only one block of the file is read each time the generator returned by `read_file(file_path)` is advanced (here, by `for line in read_file(file_path)`), and each step consumes only enough memory for a single block.

By the way, if all you need is a text file line by line, file objects are themselves lazy iterators, so the same effect is even simpler:

```python
with open(path) as file:
    for line in file:
        print(line)
```
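To make the generator behavior concrete, here is a tiny standalone demo (my illustration, not from the original post):

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i  # pause here; resume on the next request
        i += 1

gen = count_up_to(3)
print(next(gen))  # 1 -- the body runs only up to the first yield
print(next(gen))  # 2 -- resumes exactly where it paused
for value in gen:
    print(value)  # 3 -- whatever values remain
```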
Finally, if you want lower-level control over buffering, you can wrap the file descriptor in `io.FileIO` and `io.BufferedReader` yourself and read fixed-size blocks:

```python
import io

file_path = './i-am-a-very-large-file'
with open(file_path, 'rb') as f:
    BLOCK_SIZE = 1024
    fi = io.FileIO(f.fileno())
    fb = io.BufferedReader(fi)
    while True:
        block = fb.read(BLOCK_SIZE)  # read one block at a time
        if block:
            print(block)
        else:
            break
```
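The same block-reading loop can also be written with the two-argument form of `iter()`, which calls a function repeatedly until it returns a sentinel; a minimal sketch (my addition, equivalent in behavior to the loop above):

```python
from functools import partial

file_path = './i-am-a-very-large-file'
BLOCK_SIZE = 1024

with open(file_path, 'rb') as f:
    # iter(callable, sentinel): call f.read(BLOCK_SIZE) repeatedly,
    # stopping when it returns b'' (end of file)
    for block in iter(partial(f.read, BLOCK_SIZE), b''):
        print(block)
```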