A file object in Python can be created using the built-in open function.
Besides the file name and the mode, this function accepts several other parameters that are worth a look.
The Built-In Open Function
The built-in open function opens a file and returns the created file object. Using this file object, you can subsequently perform the required operations on the file.
open(filename, [mode, buffering, encoding, errors, newline])
The two most important parameters are the file name (or path) of the file to be opened and the mode in which the file is to be opened. A string must be passed for the mode parameter. All valid values and their meanings are listed in this table.
The mode parameter is optional and is assumed to be "r" if omitted.
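The following is a minimal sketch of the default mode, not part of the original text. The file name example.txt and the temporary directory are assumptions chosen only so that the snippet can run anywhere.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# "w" creates the file (or truncates an existing one) and opens it for writing.
with open(path, "w") as f:
    f.write("first line\n")

# No mode is passed here, so the file is opened as if "r" had been given.
with open(path) as f:
    default_mode = f.mode
    content = f.read()

print(default_mode)  # r
```

Because the second call omits the mode, any attempt to write to that file object would fail; the file is open for reading only.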
The four additional optional parameters, buffering, encoding, errors, and newline, are not usually needed. Nevertheless, we want to provide a brief summary of their meaning.
The fourth parameter, encoding, can be used to specify the encoding in which the file is to be read or written. The encoding determines how special characters beyond the ASCII character set are stored. Specifying an encoding makes no sense when opening a file in binary mode: Python raises a ValueError exception in this case, so the parameter must be omitted.
The fifth parameter, errors, determines how to deal with characters that can't be encoded or decoded in the specified encoding. If the "ignore" value is passed for errors, such characters are silently skipped. With the "strict" value, which is also the default if you don’t specify the parameter, a ValueError exception is raised instead.
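To illustrate the interplay of encoding and errors, here is a short sketch. The file name umlauts.txt and the sample word are assumptions made for the example; the file is deliberately read with the wrong encoding to provoke a decoding error.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "umlauts.txt")

# Write a non-ASCII character using UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write("Käse")

# Reading with the wrong encoding and the default errors="strict" fails.
# The concrete exception is UnicodeDecodeError, a subclass of ValueError.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__

# With errors="ignore", the undecodable bytes are dropped silently.
with open(path, encoding="ascii", errors="ignore") as f:
    ignored = f.read()

print(error_name)  # UnicodeDecodeError
print(ignored)     # Kse
```

Note that "ignore" loses data without any warning, so it is usually better suited to quick inspection than to processing files you care about.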
The buffering parameter controls the internal buffer size, and newline specifies the characters to be recognized or used as new line characters when reading or writing the file.
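The effect of newline is easiest to see by comparing the raw bytes of a file with what text mode returns. The sketch below assumes a hypothetical file named lines.txt in a temporary directory and only demonstrates three of the possible newline values.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

# When writing, every "\n" is translated to the string given as newline.
with open(path, "w", newline="\r\n") as f:
    f.write("a\nb\n")

# Binary mode shows the bytes that actually ended up in the file.
with open(path, "rb") as f:
    raw = f.read()

# newline="" recognizes line endings when reading but returns them untranslated.
with open(path, newline="") as f:
    untranslated = f.read()

# The default (newline=None) translates all line endings to "\n" when reading.
with open(path) as f:
    translated = f.read()

print(raw)                 # b'a\r\nb\r\n'
print(repr(untranslated))  # 'a\r\nb\r\n'
print(repr(translated))    # 'a\nb\n'
```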
Attributes and Methods of a File Object
The parameters specified when opening can be read again via the name, encoding, errors, and mode attributes of the resulting file object. The related newlines attribute reports the line endings encountered so far while reading the file.
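As a quick illustration of these attributes, the following sketch opens a file and reads the values back. The file name attrs.txt and the chosen parameter values are assumptions for the example.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "attrs.txt")

with open(path, "w", encoding="utf-8", errors="replace") as f:
    # The attributes mirror what was passed to open.
    name = f.name
    mode = f.mode
    encoding = f.encoding
    errors = f.errors

print(mode)      # w
print(encoding)  # utf-8
print(errors)    # replace
```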
The table below briefly summarizes the most important methods of a file object.
Changing the Write/Read Position
Files can be read or written in a sequential manner. Due to the special nature of files, it’s possible to change the write or read position at will. For this purpose, you can use the seek and tell methods of the file object.
seek(offset, [whence])
The seek method of a file object sets the read/write position within the file. It is the counterpart of the tell method, which returns the current read/write position.
Note: The seek method has no effect on writing in "a" mode, because data is always appended to the end of the file. In "a+" mode, the read/write position can be changed so that you can read from any point in the file, but it’s reset to the end of the file before every write operation.
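The behavior described in this note can be checked with a small experiment. The sketch below assumes a hypothetical file named append.txt; it seeks to the beginning in "a+" mode, reads, seeks again, and then writes.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "append.txt")

with open(path, "w") as f:
    f.write("abc")

with open(path, "a+") as f:
    f.seek(0)          # move to the beginning for reading
    head = f.read(1)   # reads the first character
    f.seek(0)          # try to move the write position to the beginning
    f.write("XYZ")     # the data is appended to the end anyway

with open(path) as f:
    result = f.read()

print(head)    # a
print(result)  # abcXYZ
```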
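If the file has been opened in binary mode, the offset parameter is counted in bytes from the beginning of the file. The optional whence parameter changes this reference point: a value of 0 (the default) counts from the beginning of the file, 1 counts from the current read/write position, and 2 counts from the end of the file.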
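The three reference points for offset can be tried out on a small binary file. This is a sketch with a hypothetical file named digits.bin; the constants os.SEEK_CUR and os.SEEK_END are the named equivalents of the whence values 1 and 2.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")

with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)                 # absolute: 3 bytes from the beginning
    from_start = f.read(1)    # position is now 4

    f.seek(2, os.SEEK_CUR)    # relative: skip 2 bytes from position 4
    from_current = f.read(1)  # the byte at offset 6

    f.seek(-1, os.SEEK_END)   # relative to the end: the last byte
    from_end = f.read(1)

    position = f.tell()       # end of the file

print(from_start, from_current, from_end, position)  # b'3' b'6' b'9' 10
```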
You won’t be able to use seek as freely if the file is opened in text mode. Here, only 0 (the beginning of the file) and return values of the tell method should be used as offset. Deviating values can cause undefined behavior.
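In text mode, a safe pattern is to remember a position with tell and return to it later with seek. The sketch below assumes a hypothetical file named bookmark.txt; the value returned by tell is treated as an opaque bookmark and not as a character count.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "bookmark.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("äöü\nsecond line\n")

with open(path, encoding="utf-8") as f:
    f.readline()            # skip the first line
    bookmark = f.tell()     # opaque position at the start of line two
    second = f.readline()
    f.seek(bookmark)        # return to the remembered position
    again = f.readline()

print(repr(second))  # 'second line\n'
print(repr(again))   # 'second line\n'
```

The first line contains multi-byte characters on purpose: the bookmark does not have to match the number of characters read so far, which is why computing text-mode offsets by hand is unreliable.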
In the following example, the seek method is used to determine the width, height, and color depth of a bitmap graphic:
from struct import unpack

with open("coffee.bmp", "rb") as f:
    # Skip to offset 18, where the width and height are stored.
    f.seek(18)
    width, height = unpack("ii", f.read(8))
    # Skip two bytes relative to the current position (whence=1).
    f.seek(2, 1)
    bpp = unpack("H", f.read(2))[0]

print("Width:", width, "px")
print("Height:", height, "px")
print("Color depth:", bpp, "bpp")
From the specification of the bitmap file format, we can see that the information we’re looking for is located at offsets 18, 22, and 28, in the form of two consecutive four-byte values and one two-byte value. We therefore open the coffee.bmp file for reading in binary mode and skip the first 18 bytes using the seek method. At this point, we can use read to read the width and height of the graphic. The values returned by read are bytes strings and must therefore be converted into numbers for our purposes. To do that, we use the unpack function from the struct module of the standard library. The format string "ii" required by unpack states that the bytes string passed should be interpreted as two consecutive signed 32-bit integers.
After reading the width and height, we skip two more bytes from the current read position (the whence parameter is set to 1 in the seek call) and can then read the two bytes containing the color depth of the image. Finally, we output the values we read, which can be checked for correctness with a graphics program. For example:
Width: 800 px
Height: 600 px
Color depth: 24 bpp
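If no bitmap file is at hand, the same technique can be tried on a header built in memory. The following sketch is an addition and makes several assumptions: the file name header_only.bmp, the sample dimensions, and the resolution values are made up, and the result contains only the two headers without pixel data, so it isn't a displayable image. It also uses the "<" prefix in the format strings to request little-endian byte order explicitly, which is how the bitmap format stores its values; the plain "ii" and "H" formats from the example above use the byte order of the machine instead.

```python
import os
import tempfile
from struct import pack, unpack

# Hypothetical sample values for this sketch.
WIDTH, HEIGHT, BPP = 800, 600, 24

# 14-byte file header: signature, file size, two reserved fields,
# and the offset at which pixel data would start.
file_header = b"BM" + pack("<IHHI", 54, 0, 0, 54)

# 40-byte info header: header size, width, height, planes, color depth,
# compression, image size, horizontal and vertical resolution, and two
# palette-related counters.
info_header = pack("<IiiHHIIiiII", 40, WIDTH, HEIGHT, 1, BPP,
                   0, 0, 2835, 2835, 0, 0)

path = os.path.join(tempfile.mkdtemp(), "header_only.bmp")
with open(path, "wb") as f:
    f.write(file_header + info_header)

with open(path, "rb") as f:
    f.seek(18)                                # width and height
    width, height = unpack("<ii", f.read(8))
    f.seek(2, os.SEEK_CUR)                    # skip the two-byte planes field
    bpp = unpack("<H", f.read(2))[0]

print(width, height, bpp)  # 800 600 24
```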
Editor’s note: This post has been adapted from a section of the book Python 3: The Comprehensive Guide by Johannes Ernesti and Peter Kaiser.