FILE HANDLING IN PYTHON
A file is a container that stores information. Files take different forms depending on what the user needs: data files, text files, program executables, and so on. The computer stores and processes all of them as sequences of 0s and 1s.
Every file contains three main parts:
- Header: This contains information about the file i.e. file name, file type, file size etc.
- Data: This is the actual information/content stored in the file.
- End of file: This is a special character that marks the end of the file.
Also, in a file, a newline character marks the end of a line and the start of the next one. In Python, reading and writing files is relatively simple, thanks to the built-in open() function.
open() returns a file object, and is most commonly used with two positional arguments and one keyword argument:
syntax:
open(file_name, mode, encoding=None)
- file_name is a string that specifies the name of the file to be opened.
- mode is an optional string that specifies the mode in which the file is opened. The mode can be:
- 'r': for reading (default)
with open('file.txt', 'r') as f:
    data = f.read()
    print(data)
Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
- 'r+': opens a file for both reading and writing
with open('file.txt', 'r+') as f:
    data = f.read()
    print(data)
    f.seek(0)
    f.write("Modified data")
- 'w': for writing (overwrites the file or creates it if it does not exist)
with open('file.txt', 'w') as f:
    f.write("Some data to be written")
- 'a': for appending to the file (creates the file if it does not exist)
with open('file.txt', 'a') as f:
    f.write("Additional data")
- 'x': for creating and writing to a new file (raises an error if the file already exists). This is useful when you want to ensure that a file you are creating does not overwrite an existing file.
try:
    with open('file.txt', 'x') as f:
        f.write("Some data to be written")
except FileExistsError:
    print("File already exists.")
- 'b': for reading/writing binary files.
with open('file.txt', 'rb+') as f:
    data = f.read()
    f.seek(0)
    f.write(data)
- 't': for reading/writing text files (default)
with open('file.txt', 'rt', encoding='utf-8') as f:
    data = f.read()
    print(data)
- encoding specifies the character encoding of the file. This determines how the file's contents are interpreted as characters. If the encoding is not specified, Python uses a platform-dependent default, which is UTF-8 on most Linux and macOS systems. Other encodings include ASCII, UTF-16 and Windows-1252.
A string is a sequence of Unicode characters, but a file on disk is a sequence of bytes. When reading, Python converts the bytes into a string by decoding them with a specific character encoding; when writing, it encodes the string back into bytes.
f = open('workfile', 'w', encoding="utf-8")
When working with text files, it's also possible to run into encoding errors (UnicodeDecodeError) when reading a file if its character encoding is not correctly specified. For example, if a file is encoded in UTF-8 and you open it as Windows-1252, you may get a UnicodeDecodeError or, worse, silently garbled text.
The default encoding is platform-dependent, so it is safest to pass encoding explicitly.
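A minimal sketch of how the choice of encoding changes what comes back from a file. The scratch file name greeting.txt and the sample word are assumptions made for illustration, and the file lives in a temporary directory so nothing else on disk is touched.

```python
import os
import tempfile

text = "caf\u00e9"  # "café": the last character needs two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")  # hypothetical scratch file

    # Write with an explicit encoding so the bytes on disk are known.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the same encoding round-trips the text.
    with open(path, "r", encoding="utf-8") as f:
        same = f.read()

    # A mismatched single-byte encoding does not fail for these bytes,
    # but it silently produces the wrong characters.
    with open(path, "r", encoding="cp1252") as f:
        garbled = f.read()

    # An encoding that cannot represent the bytes raises an error.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

print(same == text, len(garbled), ascii_failed)
```

The garbled result is one character longer than the original because the two UTF-8 bytes of the last character are decoded as two separate Windows-1252 characters.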
In the 'r+' example above, f.read() reads the contents of the file, which are stored in a variable, and print(data) displays them.
f.seek(0) then moves the file position back to the beginning of the file so that writing starts at the first byte. Because read() leaves the position at the end of the file, any data written without this call would be added after the existing content.
f.write("Modified data") then writes the string "Modified data" to the file. The new data overwrites the existing content starting at the current position; anything beyond the written length is left in place unless you call f.truncate().
It's also possible to open a file for both reading and writing in binary mode by using 'rb+'.
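The sketch below illustrates the point about overwriting in 'r+' mode: a short write replaces only the characters it covers, and truncate() is one way to drop whatever follows. The file name notes.txt and its contents are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical scratch file

    with open(path, "w", encoding="utf-8") as f:
        f.write("0123456789")

    # Overwrite in place: only the first three characters change.
    with open(path, "r+", encoding="utf-8") as f:
        f.read()        # the position is now at the end of the file
        f.seek(0)       # go back to the beginning
        f.write("abc")
    with open(path, "r", encoding="utf-8") as f:
        overwritten = f.read()

    # Overwrite and then cut the file off at the current position.
    with open(path, "r+", encoding="utf-8") as f:
        f.write("new")
        f.truncate()
    with open(path, "r", encoding="utf-8") as f:
        truncated = f.read()

print(overwritten)  # abc3456789
print(truncated)    # new
```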
It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks.
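To make the comparison concrete, here is a rough sketch of the same read written both ways. The file name hello.txt is an assumption for the example; both versions leave the file closed afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")  # hypothetical scratch file
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    # Manual version: close() runs in finally, even if read() raises.
    f = open(path, "r", encoding="utf-8")
    try:
        manual = f.read()
    finally:
        f.close()
    closed_manually = f.closed

    # with version: the file is closed when the block is exited.
    with open(path, "r", encoding="utf-8") as g:
        managed = g.read()
    closed_by_with = g.closed

print(manual, managed, closed_manually, closed_by_with)
```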
If you are not using the with keyword, then you should call f.close() to close the file and immediately free up any system resources used by it.
N.B.: Calling f.write() without using the with keyword or calling f.close() might result in the arguments of f.write() not being completely written to the disk, even if the program exits successfully.
After a file is closed, either by the with keyword or by f.close(), attempts to use the file object will fail.
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.
In Python, a file object is an object that represents an open file. Once you have a file object, you can use various methods to interact with the file. Here are some common file object methods:
read(size):
with open('file.txt', 'r') as f:
    data = f.read(10)  # reads the first 10 characters
    print(data)
The read(10) call reads the first 10 characters of the file (the file is open in text mode) and stores them in the data variable, which is then printed using the print() function. If size is not specified, the method reads the whole file.
In text mode, the read() method counts characters, not bytes. English characters require one byte each in UTF-8, but characters from other languages, such as Chinese, need several bytes each. Seeking to a byte offset that falls in the middle of a multi-byte character and then reading will fail with a UnicodeDecodeError.
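A small sketch of the difference between counting characters and counting bytes. The sample text and the file name sample.txt are assumptions; the text starts with two Chinese characters, each of which takes three bytes in UTF-8.

```python
import os
import tempfile

text = "\u4f60\u597d, world"  # two Chinese characters followed by ASCII text

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical scratch file
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Text mode: read(2) returns two whole characters.
    with open(path, "r", encoding="utf-8") as f:
        two_chars = f.read(2)

    # Binary mode: read(2) returns two bytes, which is only part of a character.
    with open(path, "rb") as f:
        two_bytes = f.read(2)

    size_in_bytes = os.path.getsize(path)

# Decoding a fragment of a multi-byte character fails.
try:
    two_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(len(text), size_in_bytes, len(two_chars), len(two_bytes), decode_failed)
```

The text is 9 characters long but occupies 13 bytes on disk, which is why byte offsets and character counts cannot be used interchangeably.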
readline():
with open('file.txt', 'r') as f:
    line = f.readline()
    print(line)
In this example, the readline() method reads one line of the file and stores it in the line variable, which is then printed using the print() function.
Text files can use several different characters to mark the end of a line, and operating systems follow different conventions. Some use a carriage return character \r, others use a line feed character \n, and some use both characters \r\n at the end of every line.
Python handles line endings automatically by default. When reading in text mode, Python recognizes whichever kind of line ending the file uses and translates it to \n, so it will all Just Work.
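The automatic handling can be observed with a sketch like the following, which writes Windows-style line endings as raw bytes and reads them back twice. The file name is hypothetical. Passing newline="" to open() is the documented way to switch the translation off.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "windows_style.txt")  # hypothetical scratch file

    # Write the bytes directly so the line endings on disk are exactly \r\n.
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\r\n")

    # Default text mode: every line ending is translated to \n.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()

    # newline="" keeps the line endings exactly as they are stored.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

print(repr(translated))    # 'one\ntwo\n'
print(repr(untranslated))  # 'one\r\ntwo\r\n'
```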
readlines():
with open('file.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        print(line)
In this example, the readlines() method reads all lines of the file and stores them in the lines variable as a list. The loop then iterates over the list and prints each line using the print() function. Each line keeps its trailing newline character.
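As an alternative sketch, a file object can also be iterated over directly, which reads one line at a time instead of loading every line into a list first. The file name and its three lines are assumptions for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical scratch file
    with open(path, "w", encoding="utf-8") as f:
        f.write("line1\nline2\nline3\n")

    # readlines() returns every line at once, newlines included.
    with open(path, "r", encoding="utf-8") as f:
        all_lines = f.readlines()

    # Iterating over the file object yields the same lines one by one.
    stripped = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            stripped.append(line.rstrip("\n"))  # drop the trailing newline

print(all_lines)  # ['line1\n', 'line2\n', 'line3\n']
print(stripped)   # ['line1', 'line2', 'line3']
```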
write(string):
with open('file.txt', 'w') as f:
    f.write("Some data to be written")
In this example, the write() method is used to write the string "Some data to be written" to the file. The open('file.txt', 'w') call opens the file for writing. The method returns the number of characters written.
Other types of objects need to be converted – either to a string (in text mode) or a bytes object (in binary mode) – before writing them:
value = ('the answer', 42)
s = str(value)  # convert the tuple to a string
with open('file.txt', 'w') as f:
    num = f.write(s)  # write() returns the number of characters written
print(num)  # 18
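For the binary-mode case, a sketch along the same lines: the string has to be encoded to bytes before it can be written, and passing a str directly is rejected. The file name answer.bin and the choice of UTF-8 are assumptions.

```python
import os
import tempfile

value = ("the answer", 42)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "answer.bin")  # hypothetical scratch file
    with open(path, "wb") as f:
        # A str is not accepted by a file opened in binary mode.
        try:
            f.write(str(value))
            str_rejected = False
        except TypeError:
            str_rejected = True

        # Encoding the string yields a bytes object that can be written.
        count = f.write(str(value).encode("utf-8"))

    with open(path, "rb") as f:
        stored = f.read()

print(str_rejected, count, stored)
```

In binary mode write() returns the number of bytes written, which here matches the character count because the text is plain ASCII.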
writelines(list):
data = ["line1\n", "line2\n", "line3\n"]
with open('file.txt', 'w') as f:
    f.writelines(data)
In this example, the writelines() method is used to write a list of strings to a file in a single call.
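One detail worth a quick sketch: writelines() does not add line endings of its own, so the strings must already contain them. The file name and the sample words below are made up for the example.

```python
import os
import tempfile

words = ["alpha", "beta", "gamma"]  # note: no trailing newlines

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")  # hypothetical scratch file

    # Written as-is, the strings run together on one line.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(words)
    with open(path, "r", encoding="utf-8") as f:
        joined = f.read()

    # Adding "\n" to each item gives one word per line.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in words)
    with open(path, "r", encoding="utf-8") as f:
        one_per_line = f.read()

print(repr(joined))        # 'alphabetagamma'
print(repr(one_per_line))  # 'alpha\nbeta\ngamma\n'
```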
seek(offset[, whence]):
with open('file.txt', 'r') as f:
    print(f.read(10))  # read the first 10 characters
    f.seek(10)         # set the position to byte offset 10
    print(f.read(10))  # read the next 10 characters
    f.seek(0, 0)       # set the position to offset 0 from the beginning of the file
    print(f.read(5))   # read the first 5 characters again
f.seek(10) sets the file's current position.
The position is computed by adding offset to a reference point; the reference point is selected by the optional whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point. In text files, only seeks relative to the beginning of the file are allowed (apart from seek(0, 2), which jumps to the very end), so the following call needs a file opened in binary mode:
f.seek(-3, 2)  # go to the 3rd byte before the end
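A short sketch of the three whence values on a file opened in binary mode, where all of them are allowed. The file name digits.bin and its ten-byte content are assumptions. os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are the named constants for 0, 1 and 2.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "digits.bin")  # hypothetical scratch file
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        # Relative to the end: 3 bytes before the end of a 10-byte file.
        pos_from_end = f.seek(-3, os.SEEK_END)
        tail = f.read()

        # Relative to the beginning (the default), then to the current position.
        f.seek(2, os.SEEK_SET)
        pos_relative = f.seek(3, os.SEEK_CUR)
        one_byte = f.read(1)

print(pos_from_end, tail)      # 7 b'789'
print(pos_relative, one_byte)  # 5 b'5'
```

seek() returns the new absolute position, which makes it easy to check where a relative seek ended up.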
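tell():
with open('file.txt', 'r') as f:
    f.seek(10)
    print(f.tell())
f.seek(10) sets the file's current position to 10, and f.tell() returns the file's current position, in this case 10. This integer represents a number of bytes from the beginning of the file.
The seek() and tell() methods count bytes, whereas read() in text mode counts characters. In text mode the Python documentation treats the value returned by tell() as an opaque number, so it is safest to pass seek() only 0 or a value previously returned by tell().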
close():
f = open('file.txt', 'r')
data = f.read()
f.close()
The file is opened using the open() function and the file object is stored in the f variable. The file is then read using the read() method, and the close() method closes it. Once the file is closed, its methods can no longer be used to read from or write to it.
flush():
with open('file.txt', 'w') as f:
    f.write('some data')
    f.flush()
In this example, the flush() method is used to ensure that any data remaining in the buffer is written out to the file. flush() is useful when you want the data to be written immediately, for example when another program is reading the file, or when you are writing a large file and don't want to wait for the buffer to fill up. It can also be used to make sure the buffer is clear before continuing with the next commands. Calling it is not usually necessary inside a with block, since the file is automatically flushed and closed when the block is exited.
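The effect of buffering can be sketched by opening a second, independent reader on the same file before and after the flush() call. This is an illustration under assumptions: the file name is made up, and what the reader sees before the flush depends on the buffer size, although with a small write like this one it is usually nothing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.txt")  # hypothetical scratch file

    with open(path, "w", encoding="utf-8") as f:
        f.write("some data")

        # The data is typically still in the writer's buffer at this point.
        with open(path, "r", encoding="utf-8") as reader:
            before_flush = reader.read()

        f.flush()  # push the buffered data out to the file

        # A fresh reader now sees the written text.
        with open(path, "r", encoding="utf-8") as reader:
            after_flush = reader.read()

print(repr(before_flush))
print(repr(after_flush))
```

flush() hands the data to the operating system; if the data must reach the physical disk, os.fsync(f.fileno()) can be called after it.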
In addition to the methods, file objects also have several attributes that provide information about the file. Here are some common file object attributes:
- name: The name of the file.
- mode: The mode in which the file was opened.
- closed: Whether or not the file is closed.
- encoding: The encoding used to open the file.
- newlines: The line endings encountered so far while reading the file. It is None if no newlines have been encountered.
Here's an example of how to use some of these attributes:
with open('file.txt', 'r', encoding='utf-8') as f:
    print("File name:", f.name)          # 'file.txt'
    print("File mode:", f.mode)          # 'r'
    print("File closed:", f.closed)      # False
    print("File encoding:", f.encoding)  # 'utf-8'
    print("File newlines:", f.newlines)
It's important to note that the newlines attribute is only available in text mode. It holds '\r', '\n' or '\r\n' (or a tuple of them if the file mixes conventions), depending on the line endings read so far. File objects opened in binary mode do not have this attribute.
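A brief sketch of how these attributes change over the life of a file object. The file name crlf.txt and its contents are assumptions; the file is written with \r\n line endings so that newlines has something to report.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")  # hypothetical scratch file
    with open(path, "wb") as f:
        f.write(b"first\r\nsecond\r\n")

    with open(path, "r", encoding="utf-8") as f:
        mode = f.mode
        encoding = f.encoding
        newlines_before = f.newlines  # nothing has been read yet
        f.read()
        newlines_after = f.newlines   # line endings seen while reading
        closed_inside = f.closed

    closed_outside = f.closed         # the with block has closed the file

print(mode, encoding, closed_inside, closed_outside)
print(repr(newlines_before), repr(newlines_after))
```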
Binary files are files that contain non-text data, such as images, audio, or executable files. These types of files are not meant to be read or edited by humans, but they can be read and written by programs.
To read and write binary files in Python, you can use the open() function in binary mode by setting the mode parameter to 'rb' for reading or 'wb' for writing.
A binary stream object has no encoding attribute, because no decoding takes place. Since the file is opened in binary mode, the read() method takes the number of bytes to read, not the number of characters.
with open('image.jpg', 'rb') as binary_file:
    data = binary_file.read()
# do something with the data
with open('image.jpg', 'wb') as binary_file:
    binary_file.write(data)
Additionally, you can use libraries such as Pillow, OpenCV and SciPy to handle images, audio and other types of binary files. These libraries provide high-level functions to read and write specific binary formats and to perform advanced operations on them.
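Reading a binary file in fixed-size pieces can be sketched as follows. The payload, the file name and the tiny chunk size of 4 bytes are all assumptions chosen to keep the example small; the first 8 bytes are simply the PNG signature used as sample data, not a real image.

```python
import os
import tempfile

# Sample bytes: the 8-byte PNG signature followed by 16 arbitrary bytes.
payload = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # hypothetical scratch file

    with open(path, "wb") as binary_file:
        written = binary_file.write(payload)  # number of bytes written

    with open(path, "rb") as binary_file:
        header = binary_file.read(8)  # exactly 8 bytes, returned as a bytes object

        # Read the rest in small chunks until read() returns b"".
        chunks = []
        while True:
            chunk = binary_file.read(4)
            if not chunk:
                break
            chunks.append(chunk)

print(written, header, len(chunks))
```

Reading in chunks like this keeps memory use bounded for large files; a realistic chunk size would be far larger than 4 bytes.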
The os module in Python provides a way to interact with the operating system, including working with files and directories. The module provides functions for working with file paths, creating and deleting directories, renaming and deleting files, and other file-related tasks.
Here are a few examples of how you can use the os module to work with files:
- os.rename(src, dst): renames the file or directory src to dst.
- os.remove(path): deletes the file at the specified path.
- os.mkdir(path): creates a new directory at the specified path.
- os.rmdir(path): deletes the directory at the specified path.
- os.listdir(path): returns a list of the files and directories in the specified directory.
- os.path.exists(path): returns True if the specified path exists and False otherwise.
import os

os.rename('old_file.txt', 'new_file.txt')  # rename a file
os.remove('file_to_delete.txt')            # delete a file
os.mkdir('new_directory')                  # create a directory
os.rmdir('directory_to_delete')            # delete a directory (it must be empty)

files_and_dirs = os.listdir('.')  # list the current directory
for file_or_dir in files_and_dirs:
    print(file_or_dir)

if os.path.exists('existing_file.txt'):
    print('File exists')
else:
    print('File does not exist')
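The calls above assume that the named files and directories already exist. A self-contained sketch that creates everything it needs inside a temporary directory, and so can be run anywhere, might look like this; the file and directory names are reused from the example above purely for illustration.

```python
import os
import tempfile

base = tempfile.mkdtemp()  # scratch directory for the demonstration
old_path = os.path.join(base, "old_file.txt")
new_path = os.path.join(base, "new_file.txt")
sub_dir = os.path.join(base, "new_directory")

# Create a file to work with, then rename it.
with open(old_path, "w", encoding="utf-8") as f:
    f.write("temporary content")
os.rename(old_path, new_path)

# Create a directory and list what the scratch directory now contains.
os.mkdir(sub_dir)
listing = sorted(os.listdir(base))

# Clean up: rmdir() only removes empty directories, so delete the file first.
os.rmdir(sub_dir)
os.remove(new_path)
exists_after_remove = os.path.exists(new_path)
os.rmdir(base)

print(listing)              # ['new_directory', 'new_file.txt']
print(exists_after_remove)  # False
```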