Summary of Automate the Boring Stuff with Python

This book is about how to automate work with Python and where it can be applied. It is aimed at non-programmers who want to increase productivity and reduce repetitive work.

As a future programmer, why do I recommend a book aimed at non-programmers? In my view, code is a tool, and a tool exists to solve problems and meet needs. Yet much of the time when we learn to code, we learn the what without asking the why. That is inefficient, and worse, it leaves us unsure what the material is for or whether it is useful at all, which drains the motivation to keep learning. This book starts from needs instead: it takes a need as the goal and shows what you can do about it. You also don't have to read the application chapters in order; once you have the basic syntax, you can jump to whichever chapter matches your need.

The structure of this book is divided into three parts:

  1. The beginning explains why programming can improve efficiency and why Python is chosen.
  2. Learning Python.
  3. Specific use of Python in daily life/work.

As I have already learned Python, I skipped the first two parts; the notes below cover the third.

Reading and Writing Files

Files

  • During program execution, data can be stored in variables, but if you want to persistently store data, it needs to be stored in files.
  • A file is identified by its path plus its file name.
  • Windows separates path components with \, while Linux/Mac use /. To keep the code portable, build paths with os.path.join.
  • os.path.getsize gets the file size.
  • os.listdir gets the contents of the folder.
  • Files can be divided into text files and binary files according to their content.
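The path functions listed above can be tried out in a throwaway directory. This is a minimal sketch; the file name notes.txt is made up for illustration.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    # os.path.join inserts the right separator for the current OS.
    path = os.path.join(folder, "notes.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    size = os.path.getsize(path)   # size in bytes
    entries = os.listdir(folder)   # names only, not full paths

print(size, entries)
```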

Reading and Writing

  • Process
    1. open(path, mode='r') opens the file and returns a File object.
    2. Read from or write to the File object with read or write.
    3. close closes the file object.
  • A file can be opened in read (r), write (w), or append (a) mode.
  • read returns the entire file content as a string; readline returns one line per call.
  • Variables can be saved to disk and read back with the shelve module.
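A short sketch of the open/read/write cycle and of shelve follows. It uses the with statement, which closes the file automatically instead of calling close by hand; the file names and the stored list are invented for the example.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "log.txt")  # hypothetical file name

    # 'w' creates or overwrites the file, 'a' appends to its end.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # read() returns everything as one string.
    with open(path, encoding="utf-8") as f:
        whole = f.read()

    # readline() returns one line per call; readlines() returns the rest as a list.
    with open(path, encoding="utf-8") as f:
        first = f.readline()
        rest = f.readlines()

    # shelve stores Python values under string keys, like a dict kept on disk.
    shelf_path = os.path.join(folder, "mydata")
    with shelve.open(shelf_path) as shelf:
        shelf["cats"] = ["Zophie", "Pooka"]
    with shelve.open(shelf_path) as shelf:
        cats = shelf["cats"]
```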

Examples

Randomly Generate Test Files

A geography teacher wants to give 35 students a multiple-choice quiz on the capitals of the 50 US states. The output is 35 quiz files and 35 matching answer-key files. Every quiz must be different: the questions appear in a different order, and so do the answer options for each question.

  1. Store the quiz data as a dict, either in a file or directly in the code.
  2. Create the quiz files and answer files, and write the header information.
  3. Shuffle the order of the states, look up the correct capital for each state, remove it from the full list of capitals, and randomly pick 3 of the remaining capitals as wrong answers.
  4. Write the states and options from step 3 to the quiz files, and the correct answers to the answer files.
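The shuffling and sampling in step 3 can be sketched as below. This is an illustration under assumptions, not the book's code: it uses a five-state stand-in data set, the helper names make_quiz and render are made up, and it returns the file contents as strings rather than writing the 35 files.

```python
import random

# Tiny stand-in data set; the real exercise uses all 50 states.
CAPITALS = {
    "Alaska": "Juneau",
    "California": "Sacramento",
    "Colorado": "Denver",
    "Florida": "Tallahassee",
    "Texas": "Austin",
}

def make_quiz(capitals, rng):
    """Return (state, options, correct_letter) tuples in random order."""
    states = list(capitals)
    rng.shuffle(states)
    quiz = []
    for state in states:
        correct = capitals[state]
        # Remove the right answer, then draw 3 distinct wrong ones.
        wrong = [c for c in capitals.values() if c != correct]
        options = rng.sample(wrong, 3) + [correct]
        rng.shuffle(options)
        quiz.append((state, options, "ABCD"[options.index(correct)]))
    return quiz

def render(quiz, number):
    """Build the text of one quiz file and of its answer key."""
    q_lines = [f"State Capitals Quiz (Form {number})", ""]
    a_lines = []
    for i, (state, options, letter) in enumerate(quiz, start=1):
        q_lines.append(f"{i}. What is the capital of {state}?")
        q_lines += [f"    {'ABCD'[j]}. {opt}" for j, opt in enumerate(options)]
        a_lines.append(f"{i}. {letter}")
    return "\n".join(q_lines), "\n".join(a_lines)

quiz = make_quiz(CAPITALS, random.Random(0))
questions_text, answers_text = render(quiz, 1)
print(questions_text)
```

Calling make_quiz once per student with a different random generator would give each of the 35 forms its own question and option order.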

Multiple Clipboards

Each new copy overwrites the previous clipboard content, which is inconvenient when you need to reuse several copied snippets. A multi-clipboard program records past clipboard content so that it can be pasted again later.

  1. Each time the program runs, load the previously saved data through shelve.
  2. Determine the function to be executed based on the input parameters.
  3. To record the clipboard, call pyperclip.paste to get the current clipboard content and store it in the saved data.
  4. To display saved content, convert the stored list or single entry to a string and copy it to the clipboard with pyperclip.copy.
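A rough sketch of this dispatch logic follows. To keep it runnable without a real clipboard, the paste and copy functions are passed in as parameters; in an actual script they would presumably be pyperclip.paste and pyperclip.copy. The function name run_mcb and the command words save and list are assumptions made for this example.

```python
import os
import shelve
import tempfile

def run_mcb(shelf_path, args, paste, copy):
    """args mimics sys.argv[1:]: ['save', keyword], ['list'], or [keyword]."""
    with shelve.open(shelf_path) as shelf:
        if len(args) == 2 and args[0] == "save":
            shelf[args[1]] = paste()            # record the current clipboard
        elif len(args) == 1 and args[0] == "list":
            copy(str(sorted(shelf.keys())))     # copy the list of keywords
        elif len(args) == 1 and args[0] in shelf:
            copy(shelf[args[0]])                # restore one saved entry

# Fake clipboard so the sketch runs anywhere.
clipboard = {"text": "hello world"}

def fake_paste():
    return clipboard["text"]

def fake_copy(text):
    clipboard["text"] = text

with tempfile.TemporaryDirectory() as folder:
    shelf_path = os.path.join(folder, "mcb")
    run_mcb(shelf_path, ["save", "greeting"], fake_paste, fake_copy)

    clipboard["text"] = "something else"        # the clipboard gets overwritten
    run_mcb(shelf_path, ["greeting"], fake_paste, fake_copy)
    restored = clipboard["text"]

    run_mcb(shelf_path, ["list"], fake_paste, fake_copy)
    listing = clipboard["text"]
```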

Organizing Files

Batch processing of files: traversing, copying, renaming, moving, and compressing, mainly with the shutil module.

  • Copy:
    • shutil.copy
    • shutil.copytree
  • Move:
    • shutil.move
  • Delete:
    • os.unlink for a single file
    • os.rmdir for empty directories
    • shutil.rmtree for a directory and everything in it
  • Traversal: os.walk yields a (current_folder, sub_folders, files) tuple for every folder in the tree.
  • Compression: zipfile module
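The copy, move, delete, and traversal calls can be exercised together in a temporary directory. A minimal sketch with made-up file and folder names; note that shutil.rmtree is the standard-library call for deleting a directory together with its contents.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: src/a.txt and src/sub/b.txt
    src = os.path.join(root, "src")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "a.txt"), "w") as f:
        f.write("a")
    with open(os.path.join(src, "sub", "b.txt"), "w") as f:
        f.write("b")

    backup = os.path.join(root, "backup")
    shutil.copytree(src, backup)                         # copy a whole folder tree
    shutil.copy(os.path.join(src, "a.txt"),
                os.path.join(root, "a_copy.txt"))        # copy one file
    shutil.move(os.path.join(root, "a_copy.txt"),
                os.path.join(root, "moved.txt"))         # move (here: rename)

    os.unlink(os.path.join(root, "moved.txt"))           # delete one file
    shutil.rmtree(src)                                   # delete folder and contents

    # os.walk visits every folder under backup.
    found = []
    for current_folder, sub_folders, files in os.walk(backup):
        for name in files:
            full = os.path.join(current_folder, name)
            found.append(os.path.relpath(full, backup))
    found.sort()
    remaining = sorted(os.listdir(root))

print(found, remaining)
```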

Examples

File Name Conversion

Change the American-style date (MM-DD-YYYY) in the names of thousands of files to European-style date (DD-MM-YYYY).

  1. Construct a regular expression to identify the date.
  2. Traverse the file names returned by os.listdir, match each one against the regular expression, and split out the date parts.
  3. If a name matches, rename it with shutil.move.
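These steps might look like the sketch below. The regular expression is a simplified assumption (two-digit month and day, four-digit year), and the helper names are invented. One caveat: a name that is already in DD-MM-YYYY form with a day of 12 or less looks like a valid American date and would be swapped again.

```python
import os
import re
import shutil
import tempfile

# prefix, month (01-12), day (01-31), four-digit year, suffix
DATE_RE = re.compile(r"^(.*?)(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-(\d{4})(.*)$")

def european_name(filename):
    """Return the name with MM-DD-YYYY swapped to DD-MM-YYYY, or None if no date."""
    m = DATE_RE.match(filename)
    if m is None:
        return None
    before, month, day, year, after = m.groups()
    return f"{before}{day}-{month}-{year}{after}"

def rename_dates(folder):
    """Rename every matching file in folder; return (old, new) name pairs."""
    renamed = []
    for name in sorted(os.listdir(folder)):
        new_name = european_name(name)
        if new_name is None or new_name == name:
            continue
        shutil.move(os.path.join(folder, name), os.path.join(folder, new_name))
        renamed.append((name, new_name))
    return renamed

with tempfile.TemporaryDirectory() as folder:
    for name in ("spam_12-31-2018.txt", "readme.txt"):
        open(os.path.join(folder, name), "w").close()
    renamed = rename_dates(folder)
    result = sorted(os.listdir(folder))

print(renamed)
```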

Related tasks:

  • Add a prefix to many file names in one batch.

Folder Compression

Compress the specified folder and all files and folders under it.

  1. Find a path + file name for the archive that is not already in use.
  2. Create a compressed file.
  3. Traverse the directory, write all files and folders into the compressed file.
  4. Close the compressed file.
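The four steps can be sketched with zipfile and os.walk. The numbering scheme (folder_1.zip, folder_2.zip, and so on) and the function names are assumptions for this example. The archive is written next to the folder rather than inside it, so the walk never picks up the archive itself.

```python
import os
import tempfile
import zipfile

def next_zip_path(folder):
    """Step 1: return folder_1.zip, folder_2.zip, ... whichever is free first."""
    base = os.path.abspath(folder)
    number = 1
    while os.path.exists(f"{base}_{number}.zip"):
        number += 1
    return f"{base}_{number}.zip"

def backup_to_zip(folder):
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    zip_path = next_zip_path(folder)
    # Steps 2 and 4: the with block creates the archive and closes it at the end.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Step 3: add every folder and file, with paths relative to the parent.
        for current_folder, sub_folders, files in os.walk(folder):
            archive.write(current_folder, os.path.relpath(current_folder, parent))
            for name in files:
                full = os.path.join(current_folder, name)
                archive.write(full, os.path.relpath(full, parent))
    return zip_path

with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "project")  # hypothetical folder to back up
    os.makedirs(os.path.join(project, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(project, rel), "w") as f:
            f.write("data")

    first = os.path.basename(backup_to_zip(project))
    second = os.path.basename(backup_to_zip(project))
    with zipfile.ZipFile(os.path.join(root, first)) as archive:
        names = sorted(archive.namelist())

print(first, second, names)
```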

Web Scraping

Modules involved:

  • webbrowser
    • open to open a webpage
  • requests
    • get to obtain resources
    • raise_for_status to raise an exception if the request failed
    • iter_content to iterate over the content in chunks
  • Beautiful Soup to parse HTML documents
    • select
  • Selenium

Examples

Batch Open Web Pages

Open Google Maps pages in batches for the given addresses.

  1. Build a URL from each given address.
  2. Use webbrowser.open to open that URL.
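A small sketch of both steps follows. The URL pattern https://www.google.com/maps/place/ followed by the address is an assumption here, and the function names are made up. The addresses are URL-encoded so that spaces and commas survive. The loop that opens browser tabs is defined but not called, so running the snippet does not launch anything.

```python
import webbrowser
from urllib.parse import quote_plus

def maps_url(address):
    """Step 1: turn a street address into a Google Maps URL (assumed pattern)."""
    return "https://www.google.com/maps/place/" + quote_plus(address)

def open_all(addresses):
    """Step 2: open one browser tab per address."""
    for address in addresses:
        webbrowser.open(maps_url(address))

print(maps_url("870 Valencia St"))
```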

Search Keywords

Search Google based on keywords, and open each page of search results.

  1. Use requests.get to request the search results page for the keywords.
  2. Use Beautiful Soup to parse the page, find all the hyperlinks of the search results.
  3. Use webbrowser.open to open the obtained hyperlinks.

Similar uses:

  • Open all product pages of the shopping page.
  • Open all review pages of the product.
  • Get all results of searching for pictures.

Download Web Comics

Download XKCD comics: after downloading one comic, move on to the previous one, and stop once the first comic has been downloaded.

  1. Fetch the XKCD page, extract the comic image, and write it to a file.
  2. Find the link behind the previous page button and fetch the previous comic's page.
  3. Repeat steps 1-2 until the previous page button cannot be found.

Operating Excel

Use the third-party module openpyxl to read and write Excel files.

  • load_workbook
  • get_sheet_by_name (deprecated in newer openpyxl versions, where workbook[name] is used instead)
  • cell(row=1, column=2)
  • save

Examples

Read Excel and Count Data

Group the rows of the Excel file by category, count the rows in each group, and sum the values of each group. Excel's built-in functions can already do this, and more simply, so I skipped this example.

Update the Specified Column in the Excel File

Read the Excel file and traverse each row; if the row belongs to one of the specified products, update its price.

Operating PDF

  • Open PDF file and extract text.
  • PDF document decryption - input password.
  • Document encryption.
  • Create a PDF file, add a page.
  • PDF page merging - add watermark.

Examples

Merge Specific Pages of Multiple PDF Documents

Take every PDF document in the directory, skip the first page of each, and merge the remaining pages into a new PDF document, processing the files in alphabetical order.

  1. Get all file names in the current directory, and add the ones ending with .pdf to a list.
  2. Sort the list by file name.
  3. Open each PDF document in order, read each document from the second page, and add it to the new document.
  4. Save the new document.

Operating CSV and JSON

CSV

Through the csv module, you can read and write CSV files. Reading yields rows, much like a two-dimensional array, and rows are written with writerow.

Examples

Read the data of all csv files in the directory and remove the header information.

  1. Get all file names in the directory and traverse them. Skip non-csv files.
  2. Open the csv file, read the csv data, skip the first row, and add the remaining rows to the array.
  3. Write the rows in the array to the new file in csv format.
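These three steps can be sketched with csv.reader and csv.writer. The function name remove_headers, the folder names, and the sample data are invented for the example. The cleaned files go to a separate output folder so that the originals stay untouched.

```python
import csv
import os
import tempfile

def remove_headers(src_folder, dst_folder):
    """Copy every .csv file from src_folder to dst_folder without its first row."""
    os.makedirs(dst_folder, exist_ok=True)
    processed = []
    for name in sorted(os.listdir(src_folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip non-CSV files
        with open(os.path.join(src_folder, name), newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))[1:]  # drop the header row
        with open(os.path.join(dst_folder, name), "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        processed.append(name)
    return processed

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "in")
    dst = os.path.join(root, "out")
    os.makedirs(src)
    with open(os.path.join(src, "data.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["fruit", "count"], ["apple", "3"], ["pear", "5"]])
    with open(os.path.join(src, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not a csv file")

    processed = remove_headers(src, dst)
    with open(os.path.join(dst, "data.csv"), newline="", encoding="utf-8") as f:
        out_rows = list(csv.reader(f))

print(processed, out_rows)
```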

JSON

JSON is a commonly used data format, mostly for exchanging data with APIs, though it shows up less in everyday use. The json module converts JSON-formatted strings into Python values such as dict (json.loads) and back into strings (json.dumps).
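As a quick illustration of that conversion, with a made-up JSON string:

```python
import json

# JSON text: true/false/null map to Python's True/False/None.
text = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'

data = json.loads(text)   # JSON string -> dict
back = json.dumps(data)   # dict -> JSON string

print(data["name"], data["isCat"], data["felineIQ"])
```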

Get Real-time Weather Data

Through the weather service interface provided online, get the real-time weather conditions of the specified location.

  1. Get the location input through the command line.
  2. Send an HTTP request to the specified weather service API with the location as a parameter.
  3. Parse the returned json data and output it.

Operating Email

Examples

Based on the member data in an Excel sheet, send reminder emails to the members who have not paid this month's membership fee.

  1. Open the Excel file and read the member data; check the payment status in the last column, and if the member has not paid, record the member's name and email address.
  2. Log in to the mail server through the smtplib module, and send the emails one by one based on the information recorded in the previous step.
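A hedged sketch of the reminder logic, given the goal of reaching members who have not yet paid. The rows are hard-coded here in place of the Excel sheet, and the addresses, function names, and message wording are all invented. send_all shows how the messages could be handed to smtplib, but it is not called, since it needs a real server and credentials.

```python
import smtplib
from email.message import EmailMessage

def unpaid_members(rows):
    """rows: (name, email, status) tuples, as they might be read from the sheet."""
    return [(name, email) for name, email, status in rows if status != "paid"]

def build_reminder(sender, name, email, month):
    """Create one reminder message; nothing is sent here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = email
    msg["Subject"] = f"{month} dues unpaid"
    msg.set_content(
        f"Dear {name},\n\nRecords show that you have not paid dues for {month}."
    )
    return msg

def send_all(messages, host, port, user, password):
    """Log in once and send every message (not called in this sketch)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        for msg in messages:
            smtp.send_message(msg)

rows = [
    ("Alice", "alice@example.com", "paid"),
    ("Bob", "bob@example.com", ""),
]
unpaid = unpaid_members(rows)
messages = [build_reminder("club@example.com", n, e, "June") for n, e in unpaid]
print(messages[0]["To"], messages[0]["Subject"])
```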

Automating Mouse and Keyboard Operations

The third-party module pyautogui can drive the graphical interface and perform these operations:

  • Control the mouse to move at a specified speed.
  • Get mouse position.
  • Mouse interaction operations
    • Click.
    • Drag.
    • Scroll.
  • Screenshot.
  • Image recognition.
  • Press/release the specified key/combination key on the keyboard.