wcEcoli Python style guide

Jerry Morrison edited this page Apr 30, 2018 · 13 revisions

I condensed the PEP8 and Google style guides here, along with our decisions. Attach 👍🏾, 👎🏾, and ❓ annotations. -- Jerry

Style Guides

Style guides make recommendations among programming alternatives like imports, docstrings, names, and formatting. The point is to reduce the likelihood of some bugs, increase code readability and familiarity, and make it easier for programmers to collaborate and merge code. But don't overdo consistency.

For each guideline, we could decide to:

  1. Set an expectation. Point it out in code reviews.
  2. Soft target. Don't sweat it in code reviews.
  3. Don't adopt it.

and we can plan to:

  1. Adopt it for new code.
  2. Plan to change existing code gradually or rapidly.

PEP8 Style Guidelines -- with a few adjustments

  • 👍🏾 Indentation: Stick with TABs in this project.

    • Set your editor's TAB stops to 4 spaces.

    • Python assumes TAB stops are at 8 spaces unless told otherwise.

    • Python 3 disallows mixing the use of tab and space indentation.

    • Set a nightly build script or a check-in script to flag SPACE indentation. It could use python -t sourcefile.py, or tabnanny which is less lenient but still allows mixtures that Python allows, or just search for any SPACE in indentation (although it's normal to use TABs followed by some SPACEs, esp. for odd indentation to line up with a ( or for half-TAB indentation when tab stops are 8 spaces):

      find . -name "*.py" -o -name "*.pyx" | xargs grep '^\s* '
      

      although this will include indentation in multi-line comments and strings and indentation that aligns a continuation line with the previous line's ( or [.

    • (PEP8 recommends 4 spaces per indentation level but that's primarily for shared code.)

  • Use ASCII text in Python 2; UTF-8 in Python 3. Otherwise the file needs an encoding declaration.

  • 👍🏾 The line length soft target is 79 columns; harder target at 99 columns; no hard limit. The same limits apply to comments and docstrings.

    • A standard line length aids editor window layout and diff displays, but bio simulations might have many long names. It's annoying to stay within a hard limit but very useful to have a shared target.

    • (PEP8 recommends 79 columns, but 72 for comments and docstrings.)

    • A shell script to check for very long lines in source files:

      find . -name '*.py' -exec awk '{ if (length($0) > max) max = length($0) } END { if (max > 199) print max, FILENAME }' {} \;
      
  • 👍🏾 Don't use implicit relative imports (e.g. import sibling where sibling is in the same directory) because it can import the wrong file (e.g. import random), it can import the same module twice (really?), and it doesn't work in Python 3.

    Instead use absolute imports (preferred) or explicit relative imports:

    from __future__ import absolute_import  # prevents implicit relative imports
    from . import sibling
    from path.to.mypkg import sibling
    from .sibling import example
    
  • 👍🏾 Put imports at the top of the file.

    Occasionally there are good reasons to break this rule, like import pdb.

    Plan to fix cases like analysis plots that have imports nested within classes or functions.

  • 👍🏾 Import separate modules on separate lines.

  • 👍🏾 Avoid wildcard imports (from <module> import *).

    • Never import * within a class or a function. That generates slow code and it won't compile in Python 3.
  • Use if x is None: or if x is not None: rather than == or !=, and likewise for other singletons like enum values (see pip enum34). It states a simpler aim. It's faster, esp. if it avoids calling a custom __eq__() method, and it might avoid exceptions or incorrect results in __eq__(None).

  • 👍🏾 Prefer to use Python's implied line continuation inside parentheses, brackets and braces over a backslash for line continuation.

  • Write docstrings for all public modules, functions, classes, and methods.

    This is a nice goal.

    👍🏾 (Travis) - might be worth it to go back and try to add some for functions that already exist but this should essentially be required for any new pushes.

    👍🏾 (John) - Yes, this is very important. Even just a sentence would make much of the code comprehensible. I think we should also reevaluate what is 'public' or 'private' - I tried to use _leading_underscores sensibly but the current framework was one of my first big OOP endeavors, and oftentimes the pattern was broken.

  • Comments that contradict the code are worse than no comments.

  • 👍🏾 Line continuation

    PEP8 alternative:

    foo = long_function_name(var_one, var_two,
                             var_three, var_four)
    

    preferred:

    def long_function_name(
            var_one, var_two, var_three,
            var_four):
        print(var_one)
    

    preferred:

    def long_function_name(
            var_one, var_two, var_three,
            var_four
            ):
        print(var_one)
    

    PyCharm is configurable but it implements this indentation style by default, and using the Refactor command to rename "long_function_name" will suitably adjust the indentation of the continuation lines.

    ❓ (Travis) - I think the second is preferable because sometimes if we line up with the start of the parenthesis it limits the amount of space we have to work with and creates too many new lines.

    👍🏾 (John) - Agreed, I prefer the latter. Often I'll try to group related arguments on one line.

  • Prefer to put a line break before a binary operator, but after is also OK.

  • 👍🏾 Put at least one space before an inline comment, then #␣ (that's one space after the #).

    ❓ (John) - I don't quite get the logic on this one [two spaces before the #], except that without code formatting a # with one space on each side would look like an operator. Inline comments are already so hard to fit on a line that adding one more character sounds irritating.

  • Two blank lines between top-level definitions, one blank line between method definitions. Use other blank lines sparingly.

    ❓ (John) - I'm pretty aggressive with whitespace, it helps me with readability.

  • Import order: standard library imports, blank line, third party imports, blank line, local imports.

  • Order:

    """Module docstring."""
    from __future__ import ...
    __all__ = ['a', 'b', 'c']  # and other "dunder" settings like __version__ and __author__
    imports
    code
    
  • """Doc strings in triple quotes.""" whether it's one line or many; whether """ or '''. The """ that ends a multiline docstring should be on a line by itself.

  • 👍🏾 Spacing like this (see the PEP8 doc for more info):

    # Put no spaces immediately within `()`, `[]`, or `{}`.
    spam(ham[1], {eggs: 2, salsa: 10})
    
    # Put a space between `,` `;` or `:` and any following item.
    demo = (0,) + (2, 3)
    if x == 4:
        print x, y; x, y = y, x
    
    # Put no space in a simple slice expression, but parentheses to clarify complicated slice precedence
    # or construct a slice object or put subexpressions into variables.
    ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
    ham[(lower+offset):(upper+offset)]
    
    # Put no space in function application or object indexing.
    spam(1) < spam(2)
    dct['key'] += lst[index]
    
    # Don't line up the `=` on multiple lines of assignment statements.
    x = 1
    long_variable = (3, 10)
    
    # Spaces around keyword `=` are OK, unlike in PEP8, which recommends them only
    # when there's a Python 3 parameter annotation.
    c = magic(real=1.0, imag=10.5)
    c = magic(real = 1.0, imag = 10.5)
    def munge(input: AnyStr, sep: AnyStr = None, limit=1000): ...
    
    # Use spaces or parentheses to help convey precedence.
    # Put zero or one space on both sides of a binary operator (except indentation).
    hypot2 = x*x + y*y
    

    Avoid trailing whitespace. A backslash followed by a space and a newline does not count as a line continuation marker.

    ❓ (John) - I prefer more spaces to less. As with line-breaks I tend to use white space to group ideas together, so smaller ideas in a larger expression will sometimes get no spaces. I don't think I will ever become accustomed to having no spaces around keyword arguments.

  • Avoid compound statements on one line, like this:

    if foo == 'blah': do_something()
    
  • Comments should be complete sentences. The first word should be capitalized unless it's an identifier that begins with a lower case letter.

  • Style names like this:

    ClassName
    ExceptionName # usually ends with "Error"
    
    GLOBAL_CONSTANT_NAME
    
    function_name, method_name
    decorator_name
    
    local_var_name, global_var_name, instance_var_name, function_parameter_name
    camelCase  # OK to match the existing style
    
    __mangled_class_attribute_name
    _internal_name
    
    module_name
    package  # underscores are discouraged
    
    • Public names (like a class used as a decorator) follow conventions for usage rather than implementation.
    • Use a trailing "_" to avoid conflicting with a Python keyword or builtin name, like yield_, complex_, and max_.
    • Don't invent __double_leading_and_trailing_underscore__ special names.
    • Always use self for the first argument of an instance method and cls for the first argument of a class method.
    • Don't use l, O, or I for single character variable names.
    • Don't make exceptions for scientific conventions like Kcat and math conventions like matrix M, and any name is better than a single letter.
    • Avoid using properties for expensive operations. The attribute notation suggests it's cheap.
    • Use the verb to distinguish methods like get_value() from compute_value().
  • Documented interfaces are considered public, unless the documentation says they're provisional or internal. Undocumented interfaces are assumed to be internal.

    ❓ (John) - I'm a bit fuzzy on what is considered an 'interface' in Python.

  • The __all__ attribute is useful for introspection.
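
To illustrate the "is None" guideline in the list above, here's a minimal sketch. Quantity and describe are hypothetical names made up for this example, not wcEcoli code. The custom __eq__() assumes its operand has an amount attribute, so comparing with == None raises an exception, while the identity test never calls __eq__() at all.

```python
class Quantity(object):
    """Hypothetical value class; its __eq__() expects another Quantity."""

    def __init__(self, amount):
        self.amount = amount

    def __eq__(self, other):
        # Raises AttributeError when `other` is None.
        return self.amount == other.amount

    def __ne__(self, other):
        return not self == other


def describe(quantity=None):
    """Return a short description, treating None as a missing value."""
    # The identity test does not call Quantity.__eq__().
    if quantity is None:
        return 'missing'
    return 'amount=%s' % quantity.amount
```

Calling describe() returns 'missing' without ever touching __eq__(), whereas Quantity(3) == None fails inside the comparison method.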
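
The indentation guideline in the list above suggests a check-in script that flags SPACE indentation. Here's a rough Python sketch of the same test as the grep pattern shown there. The function name is an assumption for illustration, and like the grep it also flags SPACEs used for alignment after TABs and inside multi-line strings.

```python
import re

# Leading whitespace that contains at least one SPACE, like grep '^\s* '.
SPACE_IN_INDENT = re.compile(r'^[\t ]* ')


def lines_with_space_indentation(source_text):
    """Return the 1-based numbers of lines whose indentation has a SPACE."""
    return [
        number
        for number, line in enumerate(source_text.splitlines(), start=1)
        if SPACE_IN_INDENT.match(line)
        ]
```

A check-in script could read each *.py file, call this function, and report the file name along with any line numbers it returns.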

Programming tips:

  • if x is not None is more readable than if not x is None.

  • When implementing ordering operations with rich comparisons, it's best to implement all six operations or use the functools.total_ordering() decorator to fill them out.

  • Use def f(x): return 2*x instead of f = lambda x: 2*x for more helpful stack traces.

  • Derive exceptions from Exception rather than BaseException unless catching it is almost always the wrong thing to do.

  • When designing and raising exceptions aim to answer the question "What went wrong?" rather than only indicating "A problem occurred."

  • In Python 2, use raise ValueError('message') instead of raise ValueError, 'message' (which is not legal in Python 3).

  • Use the bare except: clause only when printing/logging the traceback.

  • Use the form except Exception as exc: to bind the exception name.

  • Limit a try clause to a narrow range of code so it doesn't bury totally unexpected exceptions.

  • Use a with statement or try/finally to ensure cleanup gets done. For a file-like object that doesn't support the with statement, use with contextlib.closing(urllib.urlopen("https://www.python.org/")):.

  • In a function, make either all or none of the return statements return an explicit value.

    • 👍🏾 Furthermore, have a consistent return type. Make a class instance, tuple, namedtuple, or dictionary to handle a union of different cases.
    • 👍🏾 Any sort of failure should raise an explicit exception.
  • Use string methods instead of the string module. They're faster and have the same API as Unicode strings.

  • String .startswith() and .endswith() are less error prone than string slicing.

  • Use e.g. isinstance(obj, int) instead of type(obj) is type(1) to check an object's type. Use isinstance(obj, basestring) to accept both str and unicode.

    • 👍🏾 Better yet, avoid checking types except to catch common errors. It's cleaner to call different functions for distinct input patterns or use O-O dispatch.
  • Use separator.join(pieces), e.g. ''.join(pieces), rather than looping over a_string += stuff to combine strings. It takes linear rather than n^2 time.
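
For the rich comparison tip in the list above, here's a minimal sketch of functools.total_ordering(). Version is a hypothetical class invented for this example. It defines only __eq__() and __lt__() (plus __ne__() and __hash__(), which the decorator does not supply), and the decorator fills in <=, >, and >=.

```python
import functools


@functools.total_ordering
class Version(object):
    """Hypothetical (major, minor) version, ordered by both fields."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so spell it out.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())
```

With this in place, Version(1, 2) < Version(1, 10) and Version(2, 0) >= Version(1, 9) both work even though only one ordering method was written by hand.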
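
The exception and cleanup tips in the list above fit together roughly like this. Connection and read_count are hypothetical stand-ins made up for this sketch (say, for a file-like object that has close() but doesn't support the with statement). The try clause wraps only the one call that's expected to fail, and the error message says what went wrong.

```python
import contextlib


class Connection(object):
    """Hypothetical resource with close() but no with-statement support."""

    def __init__(self):
        self.closed = False

    def read(self):
        return '42'

    def close(self):
        self.closed = True


def read_count(connection):
    """Read an integer count from the connection, then close it."""
    # closing() calls connection.close() on exit, even if read() raises.
    with contextlib.closing(connection) as conn:
        text = conn.read()

    # Keep the try clause narrow: only int() is expected to fail here.
    try:
        return int(text)
    except ValueError as exc:
        raise ValueError(
            'expected an integer count, got %r: %s' % (text, exc))
```

Every return path yields an int, and a malformed count raises an explicit exception instead of returning a special value.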
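
Here's a small sketch of the string tips in the list above: join() to combine pieces and startswith() in place of slicing. Both function names are assumptions for illustration only.

```python
def format_row(values):
    """Return the string forms of the values joined by TABs."""
    # One join() call instead of repeated `row += piece` in a loop.
    return '\t'.join(str(value) for value in values)


def is_comment_line(line):
    """Return True if the line's first non-blank character is '#'."""
    # No slice bounds to get wrong, unlike line.lstrip()[:1] == '#'.
    return line.lstrip().startswith('#')
```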

From Google's Python Style Guide

See https://google.github.io/styleguide/pyguide.html

  • Use pylint. [And/or PyCharm inspections.]

  • Import packages and modules only, not names from modules.

  • Use full pathnames to import a module. Avoid relative imports, to help prevent importing a package twice.

  • Avoid global variables.

  • Use the operator module e.g. operator.mul over lambda x, y: x * y. There's also operator.itemgetter(*items), operator.attrgetter(*attrs), and operator.methodcaller(name[, args...]).

  • Don't use mutable objects as default values in a function or method definition.

  • Use Python falsy tests for empty sequences and 0, e.g. if sequence: rather than if len(sequence): or if len(sequence) > 0:, but not for testing if a value is (not) None.

    • But don't write if value: to test for a non-empty string. That can be confusing.
  • Avoid features such as metaclasses, access to bytecode, on-the-fly compilation, dynamic inheritance, object reparenting, import hacks, reflection, modification of system internals, etc. [at least without compelling reasons].

  • Use extra parentheses instead of backslash line continuation.

  • Don't use parentheses in return statements or conditional statements except for implied line continuation or tuples.

    • 👍🏾 Prefer explicit tuple parentheses, definitely for 1-element tuples: write (x,) = func(), not x, = func(), and for (i, x) in enumerate(...):.
    • [PyCharm recommends return x, y over return (x, y).]

    👎🏾 (John) - I personally have started using the latter since the output of the function will be a tuple, and the parentheses make that explicit. I also always use parentheses when tuple unpacking.

  • Write a doc string as a summary sentence, a blank line, then the rest.

  • A function must have a docstring, unless it's: not externally visible, short, and obvious. Explain what you need to know to call it.

  • Classes should have doc strings.

  • Don't put code at the top level that you don't want executed when the file is imported, unit tested, or pydoc'd.
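
The rule above against mutable default values is easier to see in code. In this minimal sketch (both function names are made up for illustration), the default list in append_bad() is created once when the function is defined, so every call that omits items shares it.

```python
def append_bad(item, items=[]):
    """Append to items; the shared default list leaks between calls."""
    items.append(item)
    return items


def append_good(item, items=None):
    """Append to items, making a fresh list when none is passed."""
    if items is None:
        items = []
    items.append(item)
    return items
```

Two calls to append_bad(1) and append_bad(2) leave the second result holding both items, while append_good() starts from an empty list each time.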
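
Here's a short sketch of the operator module functions named in the list above, used where a lambda might otherwise appear. The helper names and the sample data shape (name, count pairs) are assumptions for illustration.

```python
import functools
import operator


def product(numbers):
    """Multiply the numbers together; an empty sequence gives 1."""
    # operator.mul replaces lambda x, y: x * y.
    return functools.reduce(operator.mul, numbers, 1)


def sort_by_count(rows):
    """Sort (name, count) pairs by their count."""
    # operator.itemgetter(1) replaces lambda row: row[1].
    return sorted(rows, key=operator.itemgetter(1))


def strip_all(strings):
    """Return the strings with surrounding whitespace removed."""
    # operator.methodcaller('strip') replaces lambda s: s.strip().
    return list(map(operator.methodcaller('strip'), strings))
```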

Other Recommendations

  • It's common to write floating point literals like 1.0 or 0.1 rather than 1. or .1 for clarity, but NumPy uses 1. and 0.1 so I guess we should follow suit.

    👍🏾 (John) - I strongly prefer 1.0 and 0.1; dangling periods look strange.

  • 👍🏾 Adopt from __future__ import division at the top of every file regardless of whether or not it is doing any division. Unintended integer division is an extremely common (and typically silent) error in Python 2. Use // for explicit integer division.

    • 👍🏾 Adopt from __future__ import absolute_import.
    • Sooner or later, adopt from __future__ import print_function.
    • These will help facilitate an eventual Python 3 transition.
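
To make the division recommendation above concrete, here's a minimal sketch. It assumes true division is in effect, which means Python 3, or Python 2 with from __future__ import division as the module's first statement. The function names are hypothetical.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    # True division: mean([3, 4]) is 3.5. Without the __future__ import,
    # Python 2 would silently truncate this to 3 for integer inputs.
    return sum(values) / len(values)


def full_batches(item_count, batch_size):
    """Return how many complete batches fit in item_count items."""
    # Floor division states explicitly that an integer quotient is wanted.
    return item_count // batch_size
```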