title: The why, when, and how of functions
author: Bob Carpenter, CCM
date: November 2020
output: revealjs::revealjs_presentation

Programming is hard

Programs quickly spin out of control

  • users want more features
  • research code grows organically idea to idea
  • production code is constrained on all sides
  • code gets duplicated
  • debugging feels hopeless due to scale and combinatorics

Code goes stale

  • REPL code (Python, R, Julia) fails due to lack of context
  • external dependencies change (especially in R)
  • can't be understood by others or you in the future

Only one solution to control fear & dread

  • first, admit we have a problem
  • then, improve a couple of simple coding practices
  • today, we'll focus on writing readable code

Naming is hard

Heard on the street

  • There are only two hard things in Computer Science: cache invalidation and naming things.
    --Phil Karlton
     
  • There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-one errors.
    --Leon Bambrick
     
  • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
    -- Jamie Zawinski

Write readable code

The bad and the good

  • Bad:
double yx_32;  // transfer constant in ml/s
  • Better:
double transfer_constant;  // in ml/s
  • Best:
double transfer_ml_per_s;
  • Look ma, no doc.

Design top down, code bottom up

Code should be designed for users

  • Functional specification is the what
  • Technical specification is the how
  • Design functionally top down from user goals
  • Flesh out user interfaces in the user's shoes
  • Work down to the algorithms needed to code it

Build code bottom up for developers

  • Find small reusable functional units
  • Build them, document their interfaces, and test.
  • Only then proceed to the next level up.
  • Work on manageable units with trustworthy foundations.

Code changes over time

  • But we can't anticipate well where it's going
  • Do not overengineer in anticipation of the future
  • Instead, write simple reusable chunks of code
  • All this takes practice and judgement

Design, Documentation, and Testing

The Trinity of Code Development

  • After 30+ minutes of struggling with tikz/markdown in reveal.js, I gave up.
      FEATURES
       /    \
      /      \
QUALITY ---- TIME

  • But that's not what concerns us today.

The Trinity of Function Development

  • Code, doc, and testing are inextricably linked
 DOCUMENTATION
     /    \
    /      \
CODE ------ TESTS
  • They are different aspects of the same thing

    • documentation provides the functional specification---what the code does
    • code implements the specification---how it does it
    • tests verify the code does the right thing
  • Can reify this into "test-driven development"

Exercises in Specification

Let's work on some functions

  • I'll give you a simple mathematical function
  • You think about how it should behave (i.e., the doc)

Sum

  • sum(x, y)

    • x, y are scalars
  • sum(v)

    • v is a sequence of scalars

Did you define ...

  • argument types?
  • return type and value?
  • behavior with NaN inputs? infinite inputs?
  • behavior with zero-length sequence inputs?
  • behavior when there's overflow?
  • behavior with integers or complex numbers?
  • size of number types?

Sum, revisited

/**
 * Return the sum of the specified vector of double-precision
 * real-valued scalars.
 *
 * If the vector is empty, return 0.  If any element of the argument is
 * NaN, the result is NaN.  If one or more of the inputs is +infinity
 * and the rest are finite, the result is +infinity.  If one or more
 * of the inputs is -infinity and the rest are finite, the result is
 * -infinity.  If both +infinity and -infinity show up as elements of
 * the argument, the result is NaN.  If the sum exceeds the capacity
 * of a double-precision float, the result is +infinity (or -infinity
 * for a negative sum).
 *
 * @param v a vector of scalars
 * @return the sum of the elements of the vector
 */
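A minimal sketch of code and tests that match the documentation above, assuming the vector is a `std::vector<double>`, IEEE 754 doubles, and plain left-to-right addition. The names `sum` and `test_sum` are illustrative and not from the original slides. Each assertion checks one sentence of the doc.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sum of the elements of v; returns 0 for an empty vector.
// Special values propagate according to IEEE 754 addition.
double sum(const std::vector<double>& v) {
  double total = 0.0;
  for (double x : v) total += x;
  return total;
}

// One assertion per sentence of the documentation.
void test_sum() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan_value = std::numeric_limits<double>::quiet_NaN();
  const double max_value = std::numeric_limits<double>::max();

  assert(sum(std::vector<double>{}) == 0.0);        // empty input
  assert(sum({1.0, 2.5, -0.5}) == 3.0);             // ordinary finite input
  assert(std::isnan(sum({1.0, nan_value, 2.0})));   // NaN propagates
  assert(sum({1.0, inf, 2.0}) == inf);              // +infinity with finite values
  assert(sum({1.0, -inf, 2.0}) == -inf);            // -infinity with finite values
  assert(std::isnan(sum({inf, -inf})));             // both infinities give NaN
  assert(sum({max_value, max_value}) == inf);       // overflow gives +infinity
}
```

Writing the tests this way tends to expose gaps in the doc: any behavior you want to assert but cannot find a sentence for is a missing piece of the specification.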

Or maybe

  • Just say it follows IEEE 754 arithmetic and let users look that up.

  • What about varying floating point behavior?

    • what we wrote is guaranteed by the spec
    • but the actual value is only given up to some digits of precision
  • Maybe there should be integer, complex, real-matrix, and complex-matrix signatures.

Mean

  • mean(v)
    • v is a sequence of scalars

Did you...

  • Appropriately define the boundary condition for size-zero input?
    • really helps to get boundaries right to make code flow nicely
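One possible boundary choice, sketched under the assumption of IEEE 754 doubles: let the mean of an empty sequence be NaN, which falls out of dividing a zero sum by a zero length with no special case. Throwing an exception is an equally defensible choice; whichever is chosen belongs in the doc. The names `mean` and `test_mean` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean of the elements of v; returns NaN if v is empty.
double mean(const std::vector<double>& v) {
  double total = 0.0;
  for (double x : v) total += x;
  // For empty v this is 0.0 / 0.0, which is NaN under IEEE 754.
  return total / static_cast<double>(v.size());
}

void test_mean() {
  assert(mean({2.0, 4.0, 9.0}) == 5.0);             // ordinary input
  assert(mean({7.5}) == 7.5);                       // size-one input
  assert(std::isnan(mean(std::vector<double>{})));  // size-zero input
}
```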

Variance

  • var(v)
    • v is a sequence of scalars

Did you...

  • Make sure to document whether the function implements the maximum likelihood (divide sum of squares by length) or unbiased (divide by length - 1) estimator?
  • Make sure to document size 0 and 1 inputs?
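A sketch of one way to make both decisions explicit, assuming the caller picks the estimator through a hypothetical `Estimator` enum and that undefined cases return NaN. None of these names come from the original slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

enum class Estimator { maximum_likelihood, unbiased };

// Variance of the elements of v.
// maximum_likelihood: sum of squared deviations divided by N; NaN if N == 0.
// unbiased: sum of squared deviations divided by N - 1; NaN if N < 2.
double variance(const std::vector<double>& v, Estimator estimator) {
  const std::size_t n = v.size();
  const std::size_t min_size = (estimator == Estimator::unbiased) ? 2 : 1;
  // Explicit guard: also prevents n - 1 from wrapping around when n == 0.
  if (n < min_size) return std::numeric_limits<double>::quiet_NaN();

  double total = 0.0;
  for (double x : v) total += x;
  const double mean = total / static_cast<double>(n);

  double sum_sq = 0.0;
  for (double x : v) sum_sq += (x - mean) * (x - mean);

  const std::size_t denom = (estimator == Estimator::unbiased) ? n - 1 : n;
  return sum_sq / static_cast<double>(denom);
}

void test_variance() {
  // Mean is 2.5 and the sum of squared deviations is 5.
  const std::vector<double> v = {1.0, 2.0, 3.0, 4.0};
  assert(variance(v, Estimator::maximum_likelihood) == 1.25);
  assert(std::abs(variance(v, Estimator::unbiased) - 5.0 / 3.0) < 1e-12);

  const std::vector<double> one = {3.0};
  assert(variance(one, Estimator::maximum_likelihood) == 0.0);
  assert(std::isnan(variance(one, Estimator::unbiased)));

  const std::vector<double> empty;
  assert(std::isnan(variance(empty, Estimator::maximum_likelihood)));
  assert(std::isnan(variance(empty, Estimator::unbiased)));
}
```

The size guard is where the boundary documentation pays off: without it, `n - 1` on an unsigned size of zero silently wraps to a huge value.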

Don't Get Carried Away

Only abstract as much as you need

  • Level of code can't exceed the capabilities of the developers
  • It's easy to stray too far into the land of abstraction
  • Only put in as much work as the current project demands
  • Don't over-engineer for a future you're only imagining

Example: Accumulators

std::unordered_map<std::string, std::string> id_to_seq_;


int64_t total_bases() const {
  int64_t tot = 0;                               // initialize to 0
  for (const auto& id_seq : id_to_seq_)          // visit each element
    tot += id_seq.second.size();                 // update value
  return tot;                                    // return value
}


int64_t total_bases() const {
  return std::accumulate(                        // return value
      id_to_seq_.begin(),
      id_to_seq_.end(),                          // visit each element
      int64_t{0},                                // initialize to 0 (as int64_t, not int)
      [](const auto& tot, const auto& id_seq) {
        return tot + id_seq.second.size();       // update value
      }
    );
}
  • Generates same optimized code (see: godbolt.org).
  • If accumulate() looks fun, look up: "foldl" (not a typo), "monad", and "functional programming".
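For anyone who wants to compile the two versions side by side, here is a self-contained sketch. The class name `SequenceStore`, the `add` method, and the `_loop` / `_accumulate` suffixes are hypothetical scaffolding; only `id_to_seq_` and the body of `total_bases()` follow the slide. It assumes C++14 or later for the generic lambda.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <unordered_map>

// Hypothetical wrapper class so the member functions have somewhere to live.
class SequenceStore {
 public:
  void add(const std::string& id, const std::string& seq) {
    id_to_seq_[id] = seq;
  }

  // Explicit loop version.
  std::int64_t total_bases_loop() const {
    std::int64_t tot = 0;
    for (const auto& id_seq : id_to_seq_)
      tot += static_cast<std::int64_t>(id_seq.second.size());
    return tot;
  }

  // std::accumulate version; the type of the initial value
  // determines the type of the running total.
  std::int64_t total_bases_accumulate() const {
    return std::accumulate(
        id_to_seq_.begin(), id_to_seq_.end(), std::int64_t{0},
        [](std::int64_t tot, const auto& id_seq) {
          return tot + static_cast<std::int64_t>(id_seq.second.size());
        });
  }

 private:
  std::unordered_map<std::string, std::string> id_to_seq_;
};
```

Note that `std::accumulate` takes its accumulator type from the initial value, so a bare `0` would accumulate in `int` even though the function returns a 64-bit integer.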