# Scripting exercises - sorting files
Damn it, I made a mess of my files!
Can you sort the following picture collections for me please?
I don't really care what ordering system you use, but thousands of files in one directory is **not** practical.
There are four different folders with files to sort, and each one is a **separate** assignment. At the end I expect **four** folders with **sorted** files (and subdirectories).
The files are archived and can be downloaded [here](./assets/files.tar).
I advise you to keep a copy of the archive so you can run your scripts multiple times until you achieve a desirable outcome.
## The batches
### Simple filenames
The first batch of files has a very straightforward filename pattern, for example `FUJI_20120103_171310.jpg`.
The pictures span multiple years but are all from one single device (FUJI).
Sort however you want but `$YEAR/$MONTH` might be a good start.
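
To get you going, here is a minimal sketch of how a name like `FUJI_20120103_171310.jpg` could be turned into a `$YEAR/$MONTH` folder. The function name `year_month_dir` is made up for this example, and the sketch assumes every file in the batch follows the `CAMERA_YYYYMMDD_HHMMSS` pattern. It only computes the folder, it does not move anything yet.

```python
from datetime import datetime
from pathlib import Path


def year_month_dir(filename: str) -> Path:
    """Return the YEAR/MONTH folder for a name like FUJI_20120103_171310.jpg."""
    stem = Path(filename).stem  # "FUJI_20120103_171310"
    # Assumption: exactly three parts separated by underscores.
    _camera, date_part, time_part = stem.split("_")
    taken = datetime.strptime(f"{date_part}_{time_part}", "%Y%m%d_%H%M%S")
    # Zero-pad the month so the folders sort nicely: 01, 02, ... 12.
    return Path(f"{taken.year:04d}") / f"{taken.month:02d}"


print(year_month_dir("FUJI_20120103_171310.jpg"))
```

Parsing the date with `strptime` instead of slicing the string has the advantage that an impossible date (month 13, for example) raises a `ValueError` instead of silently creating a weird folder.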
### Multiple cameras and formats
The second batch has pictures from multiple cameras as well as multiple file extensions.
You can sort in multiple ways, for example `$YEAR/$CAMERA/$MONTH` or `$CAMERA/$YEAR/$MONTH`.
The choice is yours.
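
A possible starting point for the `$YEAR/$CAMERA/$MONTH` layout is sketched below. It assumes the second batch still uses the `CAMERA_YYYYMMDD_HHMMSS.ext` pattern from the first batch, which you should verify yourself. The function name `camera_dir` and the `NIKON` filename are invented for illustration.

```python
from datetime import datetime
from pathlib import Path


def camera_dir(filename: str) -> Path:
    """Build YEAR/CAMERA/MONTH for names shaped like CAMERA_YYYYMMDD_HHMMSS.ext."""
    path = Path(filename)
    # rsplit from the right so a camera name containing "_" stays in one piece.
    camera, date_part, _time_part = path.stem.rsplit("_", 2)
    taken = datetime.strptime(date_part, "%Y%m%d")
    return Path(str(taken.year)) / camera / f"{taken.month:02d}"


# The extension does not matter here because only the stem is parsed.
for name in ["FUJI_20120103_171310.jpg", "NIKON_20130517_090000.png"]:
    print(name, "->", camera_dir(name))
```

Swapping the order of the parts in the `return` line is all it takes to get `$CAMERA/$YEAR/$MONTH` instead.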
### Messy filenames
The third batch is very messy and has not only multiple cameras and formats but also multiple date structures.
This one will require some hefty debugging!
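
One way to deal with several date structures is to try a list of formats one after the other until one fits. The sketch below shows the idea. The formats in `CANDIDATE_FORMATS` are guesses made up for this example, not the ones actually used in the batch, so inspect the filenames and replace them.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats; replace them with the ones you find in the batch.
CANDIDATE_FORMATS = ["%Y%m%d_%H%M%S", "%Y-%m-%d_%H-%M-%S", "%d.%m.%Y"]


def parse_date(text: str) -> Optional[datetime]:
    """Try each known format in turn; return None when nothing matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not fit, try the next one
    return None


for sample in ["20120103_171310", "2012-01-03_17-13-10", "03.01.2012", "holiday"]:
    print(sample, "->", parse_date(sample))
```

Printing every string that comes back as `None` is a quick way to discover which date structures you have not covered yet. Be careful with formats that can match the same string in two ways, such as day-first versus month-first.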
### Recovery files
The fourth batch is pretty messed up.
No dates or logic can be found in the filenames, but luckily JPG files can contain **metadata** about the picture.
This challenge will require you to find and install extra Python3 libraries to access this metadata.
Installing will be done via `pip3`, which comes with PyCharm.
Have a look at this [library](https://github.com/TNThieding/exif) and figure out how to use it.
It might be handy to install `imagemagick` via `sudo apt install imagemagick`.
This gives you the ability to inspect metadata on the Linux command line via `identify -verbose BJtpWU7n7WCeOL2B84Vz.jpg`.
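
As a rough sketch of where this could lead, the example below reads the capture date with the `exif` library linked above (installed with `pip3 install exif`). It assumes the pictures carry a `datetime_original` tag, which is not guaranteed for every file, so check with `identify -verbose` first. The function names `parse_exif_date` and `date_taken` are made up for this example.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# EXIF stores dates with colons in the date part, e.g. "2012:01:03 17:13:10".
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_date(value: str) -> datetime:
    """Turn an EXIF date string into a datetime object."""
    return datetime.strptime(value, EXIF_DATE_FORMAT)


def date_taken(path: Path) -> Optional[datetime]:
    """Return the capture date stored in the file, or None if it is missing."""
    # Imported here so the rest of the script still runs without the library.
    from exif import Image  # third-party: pip3 install exif

    with path.open("rb") as handle:
        image = Image(handle)
    if not image.has_exif:
        return None
    # Assumption: the date lives in the datetime_original tag.
    value = image.get("datetime_original")
    return parse_exif_date(value) if value else None


print(parse_exif_date("2012:01:03 17:13:10"))
```

Files for which `date_taken` returns `None` need a fallback, for example a separate `unknown` folder.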
## Some hints and tips
While we *just* discovered how to create our own objects in Python3, you don't *need* your own classes to complete these exercises.
I would advise you to create multiple **functions** and use a lot of `print()` calls to help you make sense of your `for file in files:` loops.
You can slow down the loops with `time.sleep(1)` if the cycling feels too quick to you.
You can make one script for each batch, or reuse the same script but create different functions for the four batches, whatever is easiest for you.
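
The sketch below shows what such a debugging loop could look like. It only prints what it sees and never moves a file, so it is safe to run as often as you like. The function name `preview` and its `delay` parameter are inventions for this example.

```python
import time
from pathlib import Path


def preview(folder: Path, delay: float = 1.0) -> int:
    """Print every file in folder without moving anything; return the count."""
    count = 0
    for file in sorted(folder.iterdir()):
        if not file.is_file():
            continue  # skip subdirectories
        print(f"{count:04d} name={file.name} suffix={file.suffix}")
        time.sleep(delay)  # slow the loop down so you can read along
        count += 1
    return count
```

You would call it as `preview(Path("some_folder"))`, where `some_folder` stands for wherever you extracted a batch. Pass `delay=0` once you trust the output.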
Ask for **help** from your classmates when you're stuck.
By explaining your problem to someone else you often come up with a solution.
Now some links and phrases to google:
* to **manipulate dates** in Python3, your best bet is the [datetime](https://docs.python.org/3/library/datetime.html) library
* definitely have a look at the [strptime](https://stackabuse.com/converting-strings-to-datetime-in-python) **method** of the datetime class
* paths can be manipulated easily with **two different** [libraries](https://www.reddit.com/r/Python/comments/l45ojr/ospath_vs_pathlib/), `os.path` **or** `pathlib`
* files can be **moved** with multiple [libraries](https://stackoverflow.com/questions/8858008/how-to-move-a-file) as well
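
To tie the last two hints together, here is a small sketch that combines `pathlib` for building the destination and `shutil.move` for the actual move. It is one option among several, not the required approach. The helper name `move_into` is made up, and the demonstration runs inside a temporary directory so none of your real pictures are touched.

```python
import shutil
import tempfile
from pathlib import Path


def move_into(file: Path, destination_dir: Path) -> Path:
    """Move file into destination_dir, creating the folder tree if needed."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / file.name
    shutil.move(str(file), str(target))
    return target


# Demonstration in a throwaway directory with an empty stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    picture = root / "FUJI_20120103_171310.jpg"
    picture.touch()
    moved = move_into(picture, root / "2012" / "01")
    print(moved.relative_to(root))
    moved_ok = moved.exists() and not picture.exists()
```

Because moving is destructive, this is where the copy of the archive comes in handy: if a run goes wrong you can simply extract it again and retry.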