- Debugging, profiling and testing
- Idiom
- ipython
- Pro tip: dotenv
- String formatting/interpolation
- Compiling and FFI
- Packaging and environments
- Asynchrony in python
- functional-style programming
- Watching files for changes
- Python 2 vs 3
- IDEs
- Typing
- Command line parsers
- Miscellaneous stuff I always need to look up
- Progress meters
- Misc recommendations
Guido van Rossum and Python, by Sir Frederick Leighton
A Swiss army knife of coding tools. Good matrix library, general scientific tools, statistics tools, web server, art tools, but, most usefully, interoperation with everything else: it wraps C, C++ and Fortran. It includes HTTP clients, parsers, API libraries, and all the other fruits of a thriving diverse community with a common trade language. Fast enough, easy to debug, garbage-collected. If some bit is too slow, you call into a compiled language, otherwise, you relax. A good default if you’d rather get stuff done than write code.
I typically do my stats and graphs in R. My user interface is javascript, my linear algebra library is fortran, and I use julia for my other scientific calculations. Python is the thread that stitches this Frankensteinish monster together.
Of course, it could be better. clojure is more elegant, scala is easier to parallelise, julia prioritises science more highly, MATLAB is what your academic advisor is obsessed with…
But in terms of using a well-supported, satisfactory and unexciting language that goes on your computer right now, requires you to reinvent few wheels, is transferable across number crunching, web development, UIs, text processing, graphics and sundry other domains, and does not require heavy licensing costs, this one is a good starting point. See Hillel Wayne, The hard part of learning a language, which lists the main friction points that various languages have, and which I read as a cynical or low-key endorsement of the acceptable mediocrity of this particular language. What a pitch! Wow! Python for the win! Now, let’s look closer and see all the horrid things that are wrong with it.
Debugging, profiling and testing
Idiom
Python people set great value on mastering python idiom.
For example, python idiom for accessing a nullable member is a and a.x
Python idiom for ‘python person’ is pythonista.
Python idiom for ‘python idiom’ is pythonic.
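To make the nullable-member idiom concrete, here is a minimal sketch. The `Node` class and both helper functions are hypothetical names invented for illustration. It also shows one caveat of `a and a.x`: it short-circuits on any falsy `a`, not only on `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:  # hypothetical example class
    x: int


def x_short_circuit(a: Optional[Node]):
    # The idiom from the text: evaluates to `a` itself when `a` is falsy,
    # otherwise to `a.x`.
    return a and a.x


def x_explicit(a: Optional[Node]):
    # A more explicit spelling that only special-cases None.
    return a.x if a is not None else None


print(x_short_circuit(Node(3)))  # 3
print(x_short_circuit(None))     # None
print(x_short_circuit([]))       # [] (falsy non-None values leak through)
print(x_explicit(Node(0)))       # 0
```

The explicit form is wordier but says what it means, which is arguably also pythonic.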
Much has been written on this.
But I think the best way to learn is by negative examples.
Here, may I proffer to you John Burkardt’s Hypersphere integrals as an example of the least idiomatic python code I have seen?
ipython
The interactive python upgrade. The python-specific part of jupyter, which can also run without jupyter. Long story. But think of it as a REPL, a CLI-style execution environment, that is a little friendlier and has colourisation and autocomplete and such.
ipython config
IPython config is by default located in
$(ipython locate)/profile_default/
and scripts placed in its startup/ subdirectory are run when IPython launches.
Autocomplete breaks
The ecosystem that supports tab-completion is fragile and lackadaisical.
Usually if autocomplete is broken it is because the sensitive dependencies of jedi are managed by cowboys.
Recent example: IPython 7.19.0 autocomplete breaks with jedi <=0.17.0 and parso >=0.8.0.
The fix for that particular version was
conda install jedi==0.17.2
Pretty display of objects
Check out the rich display protocol
which allows you to render objects as arbitrary graphics.
Some examples of how to use that are helpful.
How to use this? The display api docs explain that you should implement methods on your objects such as, e.g., _repr_svg_.
This was what I used to create latex_fragment
which can display arbitrary latex inline.
This is how you use it:
# Imports these methods rely on
import matplotlib.pyplot as plt
from IPython.core.pylabtools import print_figure

# The methods below live inside a class that stores its samples in
# self.data, initialises self._png_data = None and defines _repr_latex_.

    def _figure_data(self, format):
        fig, ax = plt.subplots()
        ax.plot(self.data, 'o')
        ax.set_title(self._repr_latex_())
        data = print_figure(fig, format)
        # We MUST close the figure, otherwise
        # IPython’s display machinery
        # will pick it up and send it as output,
        # resulting in double display
        plt.close(fig)
        return data

    # Here we define the special repr methods
    # that provide the IPython display protocol
    # Note that for the two figures, we cache
    # the figure data once computed.
    def _repr_png_(self):
        if self._png_data is None:
            self._png_data = self._figure_data('png')
        return self._png_data
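For a dependency-free illustration of the same protocol, here is a minimal sketch using `_repr_html_`, one of the hooks IPython looks for. The `Table` class is a hypothetical name made up for this example. In a notebook, evaluating `table` as the last expression of a cell should render the markup, while a plain terminal falls back to `__repr__`.

```python
import html


class Table:  # hypothetical example class
    """Holds rows of cells and knows how to render itself as HTML."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Plain-text fallback for terminals
        return f"Table({self.rows!r})"

    def _repr_html_(self):
        # Rich display hook: return markup as a string
        body = "".join(
            "<tr>"
            + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
            + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"


table = Table([("a", 1), ("b", 2)])
markup = table._repr_html_()
print(markup)
```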
For a non-graphical non-fancy terminal, you probably simply want nice formatting of big data structures.
from pprint import pprint, pformat
# display it
pprint(obj)
# get a formatted representation
pretty = pformat(obj)
What? You want to write your own pretty-printer, with correct indentation? Use tiles.
Pro tip: dotenv
dotenv allows easy configuration through OS environment variables or text files in the parent directory. You should probably use this. PRO-TIP: there are lots of packages with similar names. Make sure you install this one:
pip install python-dotenv
# or
conda install -c conda-forge python-dotenv
Then you can be indifferent to whether settings came from a config file on the filesystem or from an environment variable.
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
# Code of your application, which uses environment variables (e.g. from `os.environ` or
# `os.getenv`) as if they came from the actual environment.
String formatting/interpolation
Non obvious hacks
What a nightmare is that manual for the string formatting. While all the information you need is in there, it is arranged in perverse inversion of some mixture of the frequency and the priority with which you use it. See Marcus Kazmierczak’s cookbook instead.
Highlights:
## float precision
>>> print("{:.2f}".format(3.1415926))
3.14
## left padding
>>> print("{:0>2d}".format(5))
05
## power tip which the manual does not make clear:
## variable formatting of variable formatting
>>> pi = 3.1415926
>>> precision = 4
>>> print( "{:.{}f}".format( pi, precision ) )
3.1416
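The nesting trick generalises: width and precision can both be variables, and the same format spec works inside an f-string (python 3.6+). A small sketch, with variable names chosen purely for illustration:

```python
value = 3.1415926
width, precision = 10, 4

# Nested replacement fields are auto-numbered in order of appearance:
# value, then width, then precision.
with_format = "{:>{}.{}f}".format(value, width, precision)

# The same spec in an f-string, where the nested fields are named directly.
with_fstring = f"{value:>{width}.{precision}f}"

print(repr(with_format))   # '    3.1416'
print(repr(with_fstring))  # '    3.1416'
```

Note that the output is rounded, not truncated.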
f-strings
f-strings make things somewhat easier for python 3.6+ because you don’t need to mess around with naming things for the .format(foo=foo) call:
>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
Timestamps
Why is a now timestamp in UTC not the first line in every academic research workbook/paper/data analysis? Because it’s tedious to look up the different bits.
Here you go:
from datetime import datetime, timezone
# timezone-aware; datetime.utcnow() is deprecated since python 3.12
datetime.now(timezone.utc).isoformat(timespec='seconds')
Rendering HTML output
You have a quick and dirty chunk o’ HTML you need to output? You aren’t writing some damnable webapp with a nested hierarchy of template rendering and CSS integration into some design framework? You just want to pump out some markup?
I recommend yattag which is fast, simple, good and has a 1-page manual. It works real good.
Rendering markdown as HTML
from markdown import markdown
html_string = markdown("""
# Title
Paragraph. *emphasis*.
""")
Compiling and FFI
See Accelerating python.
Packaging and environments
Asynchrony in python
See asynchronous python.
functional-style programming
The expressive world of functional programming is often conceptually useful, and it can sometimes even be made performant in python.
toolz and the accelerated version cytoolz provide many useful functional abstractions which look fairly natural.
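To give a flavour of the style without installing anything, here is a stdlib-only sketch built on functools and itertools. This is not toolz code: `compose_left` and `take` are hand-rolled stand-ins written for this example, although toolz ships helpers with comparable names that do similar jobs.

```python
from functools import reduce
from itertools import islice


def compose_left(*funcs):
    """Chain functions left to right: compose_left(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def take(n, iterable):
    """Return the first n items of an iterable as a list."""
    return list(islice(iterable, n))


# A lazy pipeline: square, keep the even values, take the first three.
pipeline = compose_left(
    lambda xs: (x * x for x in xs),
    lambda xs: (x for x in xs if x % 2 == 0),
    lambda xs: take(3, xs),
)

result = pipeline(range(100))
print(result)  # [0, 4, 16]
```

Because the intermediate stages are generators, only as many inputs are consumed as the final `take` requires.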
Watching files for changes
Does this inotify solution work for non-Linux? Because macOS uses FSEvents and Windows uses I-don’t-even-know.
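If portability matters more than latency, one fallback is plain polling of modification times, which behaves the same on Linux, macOS and Windows. The following is a minimal stdlib sketch of that idea, not an inotify wrapper. The helper names `snapshot` and `diff_snapshots` are invented for this example, and the demo fakes a modification with `os.utime` so that it does not depend on clock resolution.

```python
import os
import tempfile


def snapshot(directory):
    """Map each regular file in `directory` to its modification time (ns)."""
    with os.scandir(directory) as entries:
        return {e.path: e.stat().st_mtime_ns for e in entries if e.is_file()}


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, modified) paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.txt")
    with open(first, "w") as fh:
        fh.write("hello")
    before = snapshot(tmp)

    # Simulate an edit by pushing the mtime ten seconds forward
    st = os.stat(first)
    os.utime(first, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    # ... and create a second file
    with open(os.path.join(tmp, "b.txt"), "w") as fh:
        fh.write("new")

    after = snapshot(tmp)
    added, removed, modified = diff_snapshots(before, after)

print([os.path.basename(p) for p in added])     # ['b.txt']
print([os.path.basename(p) for p in modified])  # ['a.txt']
```

A real watcher would call `snapshot` in a loop with a sleep between iterations; the trade-off is wasted work and delayed notification compared with OS-level events.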
Python 2 vs 3
Are you old? New to python 3?
Sebastian Raschka, The key differences between Python 2.7.x and Python 3.x with examples
Neat python 3 stuff
Alex Rogozhnikov, Python 3 with pleasure highlights some tricks which landed in the python 3 lineage. There are a lot of those.
Useful for me is a friendlier python struct-like thing, the data class, which Geir Arne Hjelle explains.
You can override module accessors.
Asynchrony is less awful (although still kind of awful).
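As a taste of the data class item above, here is a minimal sketch (python 3.7+). The `Experiment` class and its fields are hypothetical, invented for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:  # hypothetical example class
    name: str
    learning_rate: float = 1e-3
    # Mutable defaults must go through default_factory
    tags: list = field(default_factory=list)


run = Experiment("baseline", tags=["debug"])

# __init__, __repr__ and __eq__ are generated from the annotations
print(run)
print(run == Experiment("baseline", 0.001, ["debug"]))  # True
print(asdict(run))
```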
Python 2 and 3 compatibility
TL;DR: I am not an employee of a giant enterprise-type business with a gigantic legacy code base, and so I don’t use python 2. My code is not python 2 compatible. Python 3 is more productive, and no-one is paying me to be less productive right now. Python 2 code is usually easy to port to python 3. It is possible to write code which is compatible with python 2 and 3, but then I would miss out on some of the work that has gone into making python 3 easier and better, and waste time porting elegant easy python 3 things to hard boring python 2 things.
🏗 six versus future.
IDEs
I meant to list some, but in fact I only ever use VS Codium so that is the only current advice I would be able to give.
Typing
Python 3.6+ includes type hinting, and projects such as mypy support static analysis using type hints. There were not many tutorials on the details of this at time of writing. Here’s one.
Short version: you go from this:
def fib(n):
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a+b
to this:
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a+b
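One thing worth seeing for yourself is that the hints are metadata and are not enforced at runtime. A small sketch using the annotated `fib` from above; the call with a float is a deliberate misuse added for illustration, which a static checker such as mypy would be expected to flag while the interpreter runs it happily.

```python
from typing import Iterator, get_type_hints


def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b


# Normal use
values = list(fib(10))
print(values)  # [0, 1, 1, 2, 3, 5, 8]

# The annotations are available for introspection...
hints = get_type_hints(fib)
print(hints["n"])  # <class 'int'>

# ...but nothing stops a mistyped call at runtime.
sloppy = list(fib(10.5))
print(sloppy == values)  # True
```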
A version with actual corporate backing is pyright, which uses type checking to provide usability improvements in VS Codium. Might be useful.
Command line parsers
Another bike-shedding danger zone is command-line parsing, leading to the need to spend too much time parsing command line parsers rather than parsing command lines.
argparse is built into the python stdlib and is adequate, so why not just use that and avoid other dependencies? Answer: a dependency you already have is likely to have introduced another CLI parsing library, because of bike-shedding.
Google framework abseil has a python CLI adapter whose selling points are that it:
- Works across google ML apps
- integrates C++ arguments somehow?
- allows distributed definition of arguments rather than centralized.
- has some logging and testing features uneasily bolted into the same library together
Why bother? Because AFAICT it is a dependency of jax and Tensorflow so if you do machine learning, this is pre-installed and you may as well keep it when copy-pasting Google sample code.
“Hydra is a framework for elegantly configuring complex applications”
Builds CLIs with autocomplete and other fun stuff.
click
is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It’s the “Command Line Interface Creation Kit”. It’s highly configurable but comes with sensible defaults out of the box. […]
- arbitrary nesting of commands
- automatic help page generation
- supports lazy loading of subcommands at runtime
Aims to offer an alternative to the built-in argparse, which they regard as excessively magical. Its special feature is setuptools integration enabling installation of command-line tools from your current ipython virtualenv.
invoke
provides a clean, high level API for running shell commands and defining/organizing task functions from a tasks.py file […] it offers advanced features as well — namespacing, task aliasing, before/after hooks, parallel execution and more.
argh was/is a popular extension to argparse:
Argh is fully compatible with argparse. You can mix Argh-agnostic and Argh-aware code. Just keep in mind that the dispatcher does some extra work that a custom dispatcher may not do.
clip.py comes with a passive-aggressive app name (+1) and is all about wrapping generic python commands in command-line applications easily, much like click.
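For comparison with all of the above, here is what the stdlib argparse baseline looks like. This is a minimal sketch: the program name, arguments and defaults are hypothetical, and the argument list is passed explicitly so the example runs outside a real command line.

```python
import argparse


def build_parser():
    # Hypothetical CLI, for illustration only
    parser = argparse.ArgumentParser(
        prog="train", description="Fit a model to a data file."
    )
    parser.add_argument("data", help="path to the input file")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of passes over the data")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress information")
    return parser


# In a script you would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["data.csv", "--epochs", "3"])
print(args.data, args.epochs, args.verbose)  # data.csv 3 False
```

Help text and usage errors come for free, which is most of what the alternatives are competing on.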
Miscellaneous stuff I always need to look up
Progress meters
tqdm seems to be a de facto standard.
Also works in jupyter. Install it:
> pip install ipywidgets
> jupyter nbextension enable --py widgetsnbextension
> jupyter labextension install @jupyter-widgets/jupyterlab-manager
Use it:
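%%capture
# tqdm.auto picks the notebook widget in jupyter and the console bar elsewhere;
# the older tqdm_notebook import is deprecated
from tqdm.auto import tqdm
tqdm.pandas()  # registers progress_apply on pandas objects
# Only this last bit is needed outside jupyter
with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
        pbar.update(1)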