# Scripting exercises - sorting files

Damn it, I made a mess of my files!
Can you sort the following picture collections for me please?
I don't really care what ordering system you use, but thousands of files in one directory is **not** practical.
There are four different folders with files to sort and each one is a **separate** assignment, so at the end I expect **four** folders with **sorted** files (and subdirectories).

The files are archived and can be downloaded [here](./assets/files.tar.gz).
I advise you to keep a copy of the archive so you can run your scripts multiple times until you achieve a desirable outcome.

## The batches

### Simple filenames

The filenames in the first batch are very straightforward, for example `FUJI_20120103_171310.jpg`.
The pictures span multiple years but all come from a single device (FUJI).
Sort however you want, but `$YEAR/$MONTH` might be a good start.

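A minimal sketch of how a filename like the one above could be turned into a `$YEAR/$MONTH` folder. The function name `target_dir_for` is made up for this example, and it assumes every file follows the `FUJI_YYYYMMDD_HHMMSS.jpg` pattern exactly.

```python
from datetime import datetime
from pathlib import Path


def target_dir_for(filename: str) -> Path:
    """Return the YEAR/MONTH folder for a name like FUJI_20120103_171310.jpg."""
    stem = Path(filename).stem  # "FUJI_20120103_171310"
    _camera, date_part, time_part = stem.split("_")
    # Parsing instead of slicing means an impossible date raises a ValueError.
    taken = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return Path(f"{taken.year}") / f"{taken.month:02d}"


print(target_dir_for("FUJI_20120103_171310.jpg").as_posix())  # 2012/01
```

Zero-padding the month (`01` instead of `1`) keeps the folders in calendar order when they are listed alphabetically.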
### Multiple cameras and formats

The second batch has pictures from multiple cameras as well as multiple file extensions.
You can sort in multiple ways, for example `$YEAR/$CAMERA/$MONTH` or `$CAMERA/$YEAR/$MONTH`.
The choice is yours.

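Both layouts can come out of one function. The sketch below assumes the names in this batch keep the `CAMERA_YYYYMMDD_HHMMSS` shape of the first batch, with only the camera prefix and the extension changing, so check that against the real files first. The filename `NIKON_20150824_093001.PNG` and the helper names are invented for illustration.

```python
from datetime import datetime
from pathlib import Path


def split_name(filename: str) -> tuple[str, datetime]:
    """Split CAMERA_YYYYMMDD_HHMMSS.ext into the camera name and a datetime."""
    # Split from the right so a camera name that contains "_" stays in one piece.
    camera, date_part, time_part = Path(filename).stem.rsplit("_", 2)
    taken = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")
    return camera, taken


def target_dir_for(filename: str, camera_first: bool = False) -> Path:
    """Return CAMERA/YEAR/MONTH or YEAR/CAMERA/MONTH, ignoring the extension."""
    camera, taken = split_name(filename)
    year, month = f"{taken:%Y}", f"{taken:%m}"
    if camera_first:
        return Path(camera) / year / month
    return Path(year) / camera / month


print(target_dir_for("NIKON_20150824_093001.PNG").as_posix())
print(target_dir_for("NIKON_20150824_093001.PNG", camera_first=True).as_posix())
```

`Path(...).stem` drops whatever extension the file has, so `.jpg`, `.PNG` and friends all go through the same code.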
### Messy filenames

The third batch is very messy and has not only multiple cameras and formats but also multiple date structures.
This one will require some hefty debugging to parse the datetime strings!

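One common way to cope with several date structures is to keep a list of candidate formats and try them one by one. The three formats below are guesses made up for this sketch, not the ones actually used in the batch, so replace them with whatever you discover while debugging.

```python
from datetime import datetime
from typing import Optional

# Hypothetical patterns: swap in the ones you really find in the filenames.
CANDIDATE_FORMATS = [
    "%Y%m%d_%H%M%S",      # 20120103_171310
    "%Y-%m-%d_%H-%M-%S",  # 2012-01-03_17-13-10
    "%d-%m-%Y_%H%M%S",    # 03-01-2012_171310
]


def parse_datetime(text: str) -> Optional[datetime]:
    """Try every known format in turn; return None when nothing matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    return None


for sample in ["20120103_171310", "03-01-2012_171310", "no date here"]:
    print(sample, "->", parse_datetime(sample))
```

Returning `None` instead of crashing lets the calling loop print the filenames it could not parse, which tells you which format is still missing from the list.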
### Recovery files

The fourth batch is pretty messed up.
No dates or logic can be found in the filenames, but luckily jpg files can contain **metadata** about the files.
This challenge will require you to search for and install extra Python3 libraries to access this metadata.
Installation is done via `pip3`, which comes with PyCharm.
Have a look at this [library](https://github.com/TNThieding/exif) and figure out how to use it.
It might be handy to install `imagemagick` via `sudo apt install imagemagick`.
This gives you the ability to inspect metadata on the Linux command line via `identify -verbose BJtpWU7n7WCeOL2B84Vz.jpg`.
I left the file extension in place on purpose to make it a bit easier, but in the last exercise you'll have to live without it, so maybe try not to rely on the extension to prepare yourself.

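A rough sketch of reading the capture date with the `exif` library linked above (install it with `pip3 install exif`). It assumes the pictures carry a `datetime_original` tag, which not every file will have, and the function names are made up for this example. EXIF stores timestamps as `YYYY:MM:DD HH:MM:SS`, hence the unusual format string.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF uses colons in the date part too


def parse_exif_datetime(raw: str) -> datetime:
    """Convert an EXIF timestamp string into a datetime object."""
    return datetime.strptime(raw, EXIF_FORMAT)


def taken_at(path: Path) -> Optional[datetime]:
    """Return when the picture was taken, or None without a usable EXIF date."""
    # Imported here so the rest of the script still runs before exif is installed.
    from exif import Image

    with path.open("rb") as handle:
        image = Image(handle)
    if not image.has_exif:
        return None
    # A missing tag raises AttributeError, so getattr() with a default is safer.
    raw = getattr(image, "datetime_original", None)
    return parse_exif_datetime(raw) if raw else None


print(parse_exif_datetime("2012:01:03 17:13:10"))  # 2012-01-03 17:13:10
```

Once `taken_at()` gives you a datetime, the folder logic from the earlier batches can be reused unchanged.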
### Music metadata

This batch is similar to the previous one, but instead of photos it's a bunch of mp3s.
As with photos, mp3s have metadata in them to help with sorting in programs like iTunes.
There are multiple libraries out there to read and write these tags with Python3, but none come installed by default.
I can recommend [eyed3](https://eyed3.readthedocs.io/en/latest/) to parse the metadata.
Here it makes little sense to sort by date, so maybe sort by `$ARTIST/$ALBUM`.
As an added bonus you could rename the mp3s by their track number instead of the cryptic string they have.

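A sketch of how the tags could map to `$ARTIST/$ALBUM` plus a track-number filename, using `eyed3` (install it with `pip3 install eyed3`). The helper names, the `NN - TITLE.mp3` naming scheme and the "Unknown" fallback are choices made for this example, and it assumes `tag.track_num` behaves like a `(number, total)` pair as in recent eyed3 releases.

```python
from pathlib import Path
from typing import Optional


def safe(name: str) -> str:
    """Make a tag value usable as one folder or file name."""
    # A "/" inside a tag would otherwise create an extra directory level.
    return name.replace("/", "-").strip() or "Unknown"


def target_path(artist: str, album: str, track: Optional[int], title: str) -> Path:
    """Build ARTIST/ALBUM/NN - TITLE.mp3 from tag values."""
    number = f"{track:02d}" if track is not None else "00"
    return Path(safe(artist)) / safe(album) / f"{number} - {safe(title)}.mp3"


def target_for_file(path: Path) -> Optional[Path]:
    """Read the tags of one mp3 and return where it should go."""
    # Imported here so the rest of the script still runs before eyed3 is installed.
    import eyed3

    audio = eyed3.load(path)
    if audio is None or audio.tag is None:
        return None  # not an mp3, or no tags to sort by
    tag = audio.tag
    track = tag.track_num[0]  # first element is the track number, may be None
    return target_path(tag.artist or "", tag.album or "", track, tag.title or path.stem)


print(target_path("Some Artist", "Some Album", 3, "Intro").as_posix())
```

Keeping `target_path()` free of any eyed3 calls means you can test the naming logic with plain strings before touching a single mp3.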
### Mixed filetypes

This one is very tricky.
It's a mix of photos and mp3s and the extensions are missing!
You'll have to figure out how to determine the filetype before reading the metadata *or* have a deep dive into `try except` statements in Python3.

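One way to determine the filetype without an extension is to look at the first few bytes of the file, the so-called magic numbers. The sketch below only knows the two types in this batch: JPEG files start with the bytes `FF D8 FF`, and mp3 files usually start either with an `ID3` tag or directly with an MPEG audio frame. The function names are invented for this example, and the frame check is a heuristic rather than a full validation.

```python
from pathlib import Path


def sniff_bytes(head: bytes) -> str:
    """Guess "jpg", "mp3" or "unknown" from the first bytes of a file."""
    if head.startswith(b"\xff\xd8\xff"):
        return "jpg"  # JPEG start-of-image marker followed by another marker
    if head.startswith(b"ID3"):
        return "mp3"  # mp3 with an ID3v2 tag at the front
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame: the first 11 bits are all set
    return "unknown"


def sniff_filetype(path: Path) -> str:
    """Read just enough of the file to make a guess."""
    with path.open("rb") as handle:
        return sniff_bytes(handle.read(3))


print(sniff_bytes(b"\xff\xd8\xff\xe0"))  # jpg
print(sniff_bytes(b"ID3\x04"))           # mp3
```

The JPEG check has to come first: a JPEG also starts with `FF`, but its second byte `D8` does not have the top three bits set, so the order here is a safety net rather than a necessity. The `try except` route from the text works too, this is just the "look before you leap" alternative.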
## Some hints and tips

While we *just* discovered the creation of our own objects in Python3, you don't *need* your own classes to complete these exercises.
I would advise you to create multiple **functions** and use a lot of `print()` calls to help you make sense of your `for file in files:` loops.
You can slow down the loops with `time.sleep(1)` if the cycling feels too quick for you.
You can make one script for each batch, or reuse the same script but create different functions for the four batches, whatever is easiest for you.
Ask for **help** from your classmates when you're stuck.
By explaining your problem to someone else you often come up with a solution.

Now some links and phrases to google:

* to **manipulate dates** in Python3, your best bet is the [datetime](https://docs.python.org/3/library/datetime.html) library
* definitely have a look at the [strptime](https://stackabuse.com/converting-strings-to-datetime-in-python) **method** of the datetime class
* paths can be manipulated easily with **two different** [libraries](https://www.reddit.com/r/Python/comments/l45ojr/ospath_vs_pathlib/), os **or** pathlib
* files can be **moved** with multiple [libraries](https://stackoverflow.com/questions/8858008/how-to-move-a-file) as well
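
The hints above could come together roughly like this: small functions, a `print()` per file, `pathlib` for the paths and `shutil.move` for the actual move. Everything named here (`move_file`, `sort_batch`, the `dry_run` switch) is invented for this sketch, and grouping by first letter is only a placeholder rule to be replaced with your own date or tag logic.

```python
import shutil
from pathlib import Path


def move_file(source: Path, target_dir: Path, dry_run: bool = True) -> Path:
    """Move source into target_dir, creating the folder when needed."""
    destination = target_dir / source.name
    print(f"{source} -> {destination}")
    if not dry_run:
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
    return destination


def sort_batch(batch_dir: Path, sorted_dir: Path, dry_run: bool = True) -> None:
    """Walk over one batch and send every file to its new folder."""
    # sorted() takes a snapshot of the listing before anything gets moved.
    for file in sorted(batch_dir.iterdir()):
        if not file.is_file():
            continue
        # Placeholder rule: group by the first letter of the filename.
        move_file(file, sorted_dir / file.name[0].upper(), dry_run=dry_run)
```

Calling `sort_batch(Path("batch1"), Path("sorted1"))` with the default `dry_run=True` only prints what would happen, which is a cheap way to check your logic before any file leaves its folder. The directory names in that call are examples, not the ones in the archive.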