
Scripting exercises - sorting files

Damn it, I made a mess of my files! Can you sort the following picture collections for me please? I don't really care what ordering system you use but thousands of files in one directory is not practical. There are four different folders with files to sort and each one is a separate assignment so at the end I expect four folders with sorted files (and subdirectories).

The files are archived and can be downloaded here. I advise you to keep a copy of the archive so you can run your scripts multiple times until you achieve the outcome you want.

The batches

Simple filenames

The first batch of files has a very straightforward filename, for example FUJI_20120103_171310.jpg. The pictures span multiple years but are all from one single device (FUJI). Sort however you want, but $YEAR/$MONTH might be a good start.
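As a starting point, here is a minimal sketch of how the example filename could be turned into a $YEAR/$MONTH folder. It assumes every name in the batch follows the same DEVICE_YYYYMMDD_HHMMSS.jpg pattern as the example; the function name target_dir is made up for this illustration.

```python
from datetime import datetime
from pathlib import Path


def target_dir(filename: str) -> Path:
    """Return the relative $YEAR/$MONTH folder for a batch-one filename."""
    stem = Path(filename).stem                    # "FUJI_20120103_171310"
    _device, date_part, time_part = stem.split("_")
    # Assumption: the date is always YYYYMMDD and the time HHMMSS.
    taken = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return Path(taken.strftime("%Y")) / taken.strftime("%m")


print(target_dir("FUJI_20120103_171310.jpg"))
```

The function only computes where a file should go. Keeping the actual move in a separate step makes it easy to print the plan first and check it before touching any files.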

Multiple cameras and formats

The second batch has pictures from multiple cameras as well as multiple file extensions. You can sort in multiple ways, for example $YEAR/$CAMERA/$MONTH or $CAMERA/$YEAR/$MONTH. The choice is yours.
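One possible approach is to split each name into camera, timestamp and extension first and only then decide on the folder layout. The sketch below assumes the second batch keeps the CAMERA_YYYYMMDD_HHMMSS.ext pattern of the first batch, which you should verify against the real files. The camera name NIKON and both function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def split_name(filename: str):
    """Split a filename into (camera, datetime, lowercase extension)."""
    path = Path(filename)
    # Assumption: exactly three underscore-separated parts in the stem.
    camera, date_part, time_part = path.stem.split("_")
    taken = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")
    return camera, taken, path.suffix.lower()


def year_camera_month(filename: str) -> Path:
    """Build the relative $YEAR/$CAMERA/$MONTH folder for one file."""
    camera, taken, _extension = split_name(filename)
    return Path(str(taken.year)) / camera / f"{taken.month:02d}"


# "NIKON" is a made-up camera name, used only for illustration.
print(year_camera_month("NIKON_20130514_090000.PNG"))
```

Switching to $CAMERA/$YEAR/$MONTH only means reordering the parts in the return line, because the parsing is kept in its own function.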

Messy filenames

The third batch is very messy and has not only multiple cameras and formats but also multiple date structures. This one will require some hefty debugging to parse the datetime strings!
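A common way to deal with several date layouts is to try a list of candidate formats one by one until strptime accepts one. The formats below are guesses for illustration, not the ones actually used in the batch; inspect the filenames and adjust the list. The name parse_timestamp is hypothetical.

```python
from datetime import datetime

# Assumption: these layouts are examples only; replace them with the
# layouts you actually find in the third batch.
CANDIDATE_FORMATS = [
    "%Y%m%d_%H%M%S",       # 20120103_171310
    "%Y-%m-%d_%H-%M-%S",   # 2012-01-03_17-13-10
    "%d-%m-%Y_%H%M%S",     # 03-01-2012_171310
]


def parse_timestamp(text: str):
    """Return a datetime for the first format that fits, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout does not fit, try the next one
    return None


for sample in ["20120103_171310", "2012-01-03_17-13-10", "no date here"]:
    print(sample, "->", parse_timestamp(sample))
```

Returning None instead of raising lets the calling loop collect the names that could not be parsed, which is a handy list to have while debugging.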

Recovery files

The fourth batch is pretty messed up. No dates or logic can be found in the filenames, but luckily jpg files can contain metadata about the files. This challenge will require you to search for and install extra Python3 libraries to access this metadata. Installing will be done via pip3, which comes with PyCharm. Have a look at this library and figure out how to use it. It might be handy to install imagemagick via sudo apt install imagemagick. This gives you the ability to inspect metadata on the Linux command line via identify -verbose BJtpWU7n7WCeOL2B84Vz.jpg. I left the file extension in place on purpose to make it a bit easier, but in the last exercise you'll have to live without it, so maybe try not to rely on the extension to prepare yourself.
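Whichever library you end up using, EXIF timestamps are stored as text in the form YYYY:MM:DD HH:MM:SS, with colons in the date part. The sketch below only covers turning such a string into a folder; how you obtain the string depends on the library you pick and is left out on purpose. The function name folder_from_exif is made up.

```python
from datetime import datetime
from pathlib import Path

# EXIF date/time fields use colons in the date part as well.
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def folder_from_exif(raw_value: str) -> Path:
    """Turn an EXIF timestamp string into a relative $YEAR/$MONTH folder."""
    taken = datetime.strptime(raw_value.strip(), EXIF_FORMAT)
    return Path(taken.strftime("%Y")) / taken.strftime("%m")


# The value here is typed in by hand; in your script it would come
# from the metadata library of your choice.
print(folder_from_exif("2012:01:03 17:13:10"))
```

Not every recovered file is guaranteed to carry a timestamp, so it is worth deciding up front where files without one should end up.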

Music metadata

This batch is similar to the previous batch, but instead of photos it's a bunch of mp3s. As with photos, mp3s have metadata in them to help with sorting in programs like iTunes. There are multiple libraries out there to read and write these tags with Python3, but none come installed by default. I can recommend eyed3 to parse the metadata. Here it makes little sense to sort by date, so maybe sort by $ARTIST/$ALBUM. As an added bonus you could rename the mp3s by their track number instead of the cryptic string they have.
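Once the tags are read, building the destination is plain string and path work. The sketch below assumes you already have the artist, album and track number as ordinary Python values; reading them with eyed3 is not shown. Both function names are hypothetical, and the fallback folder name Unknown is just one possible choice.

```python
from pathlib import Path


def clean(part) -> str:
    """Make a tag value safe to use as a single folder name."""
    # A "/" inside a tag would otherwise create an extra directory level.
    cleaned = (part or "").replace("/", "_").strip()
    return cleaned or "Unknown"  # assumption: fallback for missing tags


def mp3_destination(artist, album, track_num: int) -> Path:
    """Build $ARTIST/$ALBUM/<track number>.mp3 as a relative path."""
    return Path(clean(artist)) / clean(album) / f"{track_num:02d}.mp3"


print(mp3_destination("AC/DC", "Back in Black", 3))
print(mp3_destination(None, "", 12))
```

Zero-padding the track number keeps the files in playing order when a file manager sorts them by name.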

Mixed filetypes

This one is very tricky. It's a mix of photos and mp3s, and the extensions are missing! You'll have to figure out how to determine the filetype before reading the metadata, or take a deep dive into try/except statements in Python3.
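One way to determine the filetype without an extension is to look at the first few bytes of the file. JPEG files start with the bytes FF D8 FF, and mp3 files usually start either with an ID3 tag or directly with an audio frame whose first eleven bits are all set. The sketch below is a rough heuristic under those assumptions, not a complete file type detector; the function names are made up.

```python
def guess_type(header: bytes) -> str:
    """Guess 'jpeg', 'mp3' or 'unknown' from the first bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"                      # JPEG start-of-image marker
    if header.startswith(b"ID3"):
        return "mp3"                       # ID3v2 tag in front of the audio
    # An mp3 frame without a tag: the first eleven bits are all ones.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def guess_file_type(path) -> str:
    """Read only the first four bytes of the file and guess its type."""
    with open(path, "rb") as handle:
        return guess_type(handle.read(4))


print(guess_type(b"\xff\xd8\xff\xe0"))   # bytes typed in by hand
print(guess_type(b"ID3\x03"))
```

The JPEG check comes first on purpose, because both formats can start with the byte FF. Files reported as unknown are good candidates for the try/except route.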

Some hints and tips

While we just discovered the creation of our own objects in Python3, you don't need your own classes to complete these exercises. I would advise you to create multiple functions and use a lot of print() calls to help you make sense of your for file in files: loops. You can slow down the loops with time.sleep(1) if the cycling feels too quick to you. You can make one script for each batch, or reuse the same script but create different functions for the four batches, whatever is easiest for you. Ask your classmates for help when you're stuck. By explaining your problem to someone else you often come up with a solution yourself.

Now some links and phrases to google:

  • to manipulate dates in Python3, your best bet is the datetime library
  • definitely have a look at the strptime method of a datetime object
  • paths can be manipulated easily with two different libraries, os or pathlib
  • files can be moved with multiple libraries as well, for example shutil, os or pathlib
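To tie the hints together, here is a small sketch that moves a file into a target folder using pathlib for the paths and shutil for the move. It runs inside a temporary directory with an empty stand-in file, so nothing on your disk is touched; the function name move_into is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def move_into(source: Path, target_dir: Path) -> Path:
    """Create target_dir if needed and move source into it."""
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / source.name
    shutil.move(str(source), str(destination))
    return destination


with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    picture = root / "FUJI_20120103_171310.jpg"
    picture.write_bytes(b"")               # empty stand-in for a real photo
    moved = move_into(picture, root / "2012" / "01")
    print(moved.relative_to(root))
```

While you are still experimenting, you could swap shutil.move for shutil.copy2 so the original mess stays intact between runs.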