Once a project grows beyond a trivial size, you'll want to split the program code into different files, with each file holding the procedures and the like for one part of the project. You'll even want to split the files across different directories for more levels of grouping.
That's well and good, but how does Python find those files?
That is what this post will answer.
To give some context for this discussion, below is a picture of the organisation of files in the cipher-tools repository on GitHub. I have one directory for the ciphers, one for the various helpers (such as language models and text prettification), and one for tests.
├── 2017
│ ├── 1a.ciphertext
│ └── 2017-challenge1.ipynb
├── caesar-break.ods
├── caesar_break_parameter_trials.csv
├── caesar_break_parameter_trials.ipynb
├── caesar_break_parameter_trials.py
├── cipher
│ ├── affine.py
│ ├── caesar.py
│ └── keyword_cipher.py
├── logger.py
├── main.py
├── run_tests
├── support
│ ├── count_1l.txt
│ ├── count_1w.txt
│ ├── count_2l.txt
│ ├── count_2w.txt
│ ├── count_3l.txt
│ ├── count_big.txt
│ ├── language_models.py
│ ├── lettercount.py
│ ├── norms.py
│ ├── segment.py
│ ├── shakespeare.txt
│ ├── sherlock-holmes.txt
│ ├── text_prettify.py
│ ├── utilities.py
│ ├── war-and-peace.txt
│ └── words.txt
└── test
  ├── test_affine.py
  └── test_doctests.py
import is how Python loads code from an additional file into the current session. When Python is asked to import some_module, it looks for a file called some_module.py. Where it looks is controlled by a variable called sys.path, found in the standard sys module. This is a list of directories where Python looks for modules. In an interactive session, the first item is the empty string, making Python look in the current directory; when Python runs a script, the first item is the directory containing that script instead. The other directories are other places on the computer where installed modules are supposed to live[1].
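To see the search path on your own machine, a short sketch like the one below lists each entry of sys.path in order. The function name describe_search_path is made up for this example, and the output differs between computers and Python versions, so none is shown here.

```python
import sys


def describe_search_path():
    """Return (position, directory) pairs in the order Python searches them."""
    return list(enumerate(sys.path))


# Print the search path, first-searched entry first.
# repr() makes an empty-string entry visible as ''.
for position, directory in describe_search_path():
    print(position, repr(directory))
```

Running this from different directories, or from a script rather than an interactive session, is a quick way to check what the first entry is in each case.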
So, when Python is asked to import some_module, it first looks in the current directory for some_module.py, then (on my computer) for /usr/lib/python3.6/some_module.py, then /usr/local/lib/python3.6/dist-packages/some_module.py, and so on.
If the import statement looks like import cipher.caesar, Python knows that the file caesar.py will be in a directory called cipher. It will first look for a subdirectory of the current directory called cipher and the file caesar.py within it (i.e. ./cipher/caesar.py), then for /usr/lib/python3.6/cipher/caesar.py, then /usr/local/lib/python3.6/dist-packages/cipher/caesar.py, and so on.
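The search described above can be sketched in a few lines. This is a deliberately simplified model, not how Python's import machinery is actually written: the helper name candidate_files is invented for this example, and real imports also check already-loaded modules, packages, and compiled extensions. The last lines use importlib.util.find_spec, from the standard library, to ask Python where it would really find a module.

```python
import importlib.util
import os
import sys


def candidate_files(module_name, search_path=None):
    """List the .py files a simple directory-by-directory search would try.

    Hypothetical helper for illustration only.
    """
    if search_path is None:
        search_path = sys.path
    # 'cipher.caesar' becomes 'cipher/caesar.py' (using the OS separator).
    relative = os.path.join(*module_name.split(".")) + ".py"
    # An empty entry stands for the current directory.
    return [os.path.join(directory or ".", relative) for directory in search_path]


# The two example directories here are taken from the paths mentioned above.
for path in candidate_files("cipher.caesar", ["", "/usr/lib/python3.6"]):
    print(path)

# Ask Python itself where a standard-library module would be loaded from.
spec = importlib.util.find_spec("json")
print(spec.origin)
```

The first existing file in the candidate list is, roughly, the one that gets imported, which is why the order of sys.path matters.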
That works well if you're always loading self-built modules from the root directory of your project. But, in the example directory tree above, I keep each year's National Cipher Challenge files in a separate directory of the project. That means I need to persuade Python to first go up a directory before searching for the cipher tool files I've written.
This little bit of magic does that:
import os, sys, inspect, collections

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)

from cipher.caesar import *
from cipher.affine import *
from support.utilities import *
from support.text_prettify import *
from support.language_models import *
The currentdir line finds the directory of the program that is running.
The parentdir line finds that directory's parent.
The sys.path.insert call puts the parent directory at the front of sys.path.
That means that, when Python processes the from ... import lines that follow, each import first looks in the parent directory, then the current directory, and then the rest of the directories which were already in sys.path.
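The same idea can be wrapped in a small helper. This is only a sketch under a couple of assumptions: the function name add_parent_to_search_path is made up for this example, and it takes the starting directory as an argument (here the working directory) rather than discovering it through inspect as the snippet above does.

```python
import sys
from pathlib import Path


def add_parent_to_search_path(start_dir):
    """Put the parent of start_dir at the front of sys.path.

    Hypothetical helper for illustration; returns the directory it added.
    """
    parent = str(Path(start_dir).resolve().parent)
    # Avoid piling up duplicate entries when a notebook cell is re-run.
    if parent in sys.path:
        sys.path.remove(parent)
    sys.path.insert(0, parent)
    return parent


# Assumption: the working directory is a subdirectory of the project
# (such as 2017), so its parent is the root holding cipher/ and support/.
project_root = add_parent_to_search_path(Path.cwd())
print(project_root)
```

After this runs, imports such as from cipher.caesar import * are searched for in the project root first, just as with the original snippet.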
And that's how to organise your code.
Code
There's not much "code" for this article, but you can see an example of this organisation in the cipher-tools repository on GitHub.
Acknowledgements
Photo by Philip Swinburn on Unsplash
[1] On my computer, sys.path includes directories such as /usr/lib/python3.6, /usr/local/lib/python3.6/dist-packages, /usr/lib/python3/dist-packages, and /usr/lib/python3.6/dist-packages.