---
title: "**The why, when, and how of functions**"
author: "**Bob Carpenter**<br />CCM"
date: "<small>November 2020</small>"
output: revealjs::revealjs_presentation
---
- users want more features
- research code grows organically, from idea to idea
- production code is constrained on all sides
- code gets duplicated
- debugging feels hopeless due to scale and combinatorics
- REPL code (Python, R, Julia) fails due to lack of context
- external dependencies change (especially in R)
- code can't be understood by others, or by you in the future
- first, admit we have a problem
- then, improve a couple of simple coding practices
- today, we'll focus on writing readable code
- There are only two hard things in Computer Science: cache
invalidation and naming things.
--Phil Karlton
- There are 2 hard problems in computer science: cache invalidation,
naming things, and off-by-one errors.
--Leon Bambrick
- Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
-- Jamie Zawinski
- Bad: `double yx_32;  // transfer constant in ml/s`
- Better: `double transfer_constant;  // in ml/s`
- Best: `double transfer_ml_per_s;`
- Look ma, no doc.
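The same idea extends to function signatures: with units in the parameter names, a call site documents itself. A minimal sketch; `volume_transferred_ml` and its parameters are hypothetical names chosen for illustration.

```cpp
// Hypothetical example: units live in the names, so no comment is needed
// to explain what the arguments mean.
double volume_transferred_ml(double transfer_ml_per_s, double duration_s) {
  // (ml / s) * s = ml, which matches the unit in the function's name
  return transfer_ml_per_s * duration_s;
}
```

A call like `volume_transferred_ml(2.5, 4.0)` reads as 2.5 ml/s for 4 s, which is 10 ml.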
- Functional specification is the what
- Technical specification is the how
- Design functionally top down from user goals
- Flesh out user interfaces in the user's shoes
- Work down to the algorithms needed to code it
- Find small reusable functional units
- Build them, document their interfaces, and test them.
- Only then proceed to the next level up.
- Work on manageable units with trustworthy foundations.
- But we can't anticipate well where the code is going
- Do not overengineer in anticipation of the future
- Instead, write simple reusable chunks of code
- All this takes practice and judgement
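As a small illustration of building one level at a time: a low-level unit is documented and tested on its own, and the next level up is written only in terms of it. The functions `dot` and `squared_norm` are hypothetical examples, not code from the talk.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Level 1: a small reusable unit, documented and tested on its own.
// Returns the dot product of x and y; throws std::invalid_argument
// if the sizes differ.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  if (x.size() != y.size())
    throw std::invalid_argument("dot: size mismatch");
  double result = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    result += x[i] * y[i];
  return result;
}

// Level 2: built only from the already-tested unit above.
// Returns the squared Euclidean norm of x.
double squared_norm(const std::vector<double>& x) {
  return dot(x, x);
}
```

Because `dot` is trusted, `squared_norm` needs only a few tests of its own.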
- After 30+ minutes of struggling with TikZ/Markdown in reveal.js, I gave up.
Triangle diagram: FEATURES at the apex, QUALITY and TIME at the base.
- But that's not what concerns us today.
- Code, doc, and testing are inextricably linked
Triangle diagram: DOCUMENTATION at the apex, CODE and TESTS at the base.
- They are different aspects of the same thing:
- documentation provides the functional specification: what the code does
- code implements the specification: how it does it
- tests verify that the code does the right thing
- This can be reified into "test-driven development": write the tests from the specification before writing the code.
- I'll give you a simple mathematical function
- You think about how it should behave (i.e., the doc)
- `sum(x, y)`, where `x` and `y` are scalars
- `sum(v)`, where `v` is a sequence of scalars
- argument types?
- return type and value?
- behavior with NaN inputs? infinite inputs?
- behavior with zero-length sequence inputs?
- behavior when there's overflow?
- behavior with integers or complex numbers?
- size of number types?

    /**
     * Return the sum of the specified vector of double-precision
     * real-valued scalars.
     *
     * If the vector is empty, return 0. If any element of the argument
     * is NaN, the result is NaN. If one or more of the inputs is
     * +infinity and the rest are finite, the result is +infinity. If
     * one or more of the inputs is -infinity and the rest are finite,
     * the result is -infinity. If both +infinity and -infinity appear
     * as elements of the argument, the result is NaN. If the sum
     * overflows the range of a double-precision float, the result is
     * +infinity, or -infinity if the sum is negative.
     *
     * @param v a vector of scalars
     * @return the sum of the elements of the vector
     */

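A minimal sketch of an implementation matching that doc comment, with tests derived directly from the doc in test-driven style (plain `assert`, no particular test framework assumed). It assumes IEEE 754 doubles and default floating-point settings (no `-ffast-math`). A plain left-to-right loop gets nearly every documented case for free, including negative overflow giving -infinity. One corner case it misses: if finite terms overflow to -infinity before a +infinity element is reached, the loop returns NaN rather than the documented +infinity.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// A plain left-to-right loop over IEEE 754 doubles.
double sum(const std::vector<double>& v) {
  double total = 0.0;             // the sum of an empty vector is 0
  for (double x : v) total += x;  // IEEE 754 addition propagates NaN and infinities
  return total;
}

// Tests written straight from the doc comment.
void test_sum() {
  const double inf = std::numeric_limits<double>::infinity();
  const double not_a_number = std::numeric_limits<double>::quiet_NaN();
  const double big = std::numeric_limits<double>::max();
  assert(sum({}) == 0.0);                // empty -> 0
  assert(sum({1.0, 2.0, 3.5}) == 6.5);
  double r = sum({1.0, not_a_number});
  assert(r != r);                        // any NaN -> NaN
  assert(sum({inf, 1.0}) == inf);        // +inf with finite rest -> +inf
  assert(sum({-inf, 1.0}) == -inf);      // -inf with finite rest -> -inf
  r = sum({inf, -inf});
  assert(r != r);                        // +inf and -inf together -> NaN
  assert(sum({big, big}) == inf);        // positive overflow -> +inf
  assert(sum({-big, -big}) == -inf);     // negative overflow -> -inf
}
```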
- Alternatively, just say it follows IEEE 754 arithmetic and let users look that up.
- What about varying floating-point behavior?
- the special-value behavior we documented is guaranteed by the IEEE 754 spec
- but the actual value is only specified up to some digits of precision
- Maybe there should also be signatures for integers, complex numbers, real matrices, and complex matrices.
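The "some digits of precision" caveat comes from floating-point addition not being associative: the same elements summed in a different order can give a different double. A small sketch with two hypothetical helpers that differ only in summation order, assuming IEEE 754 doubles with the default round-to-nearest mode:

```cpp
#include <vector>

// Sum from the first element to the last.
double left_to_right(const std::vector<double>& v) {
  double total = 0.0;
  for (auto it = v.begin(); it != v.end(); ++it) total += *it;
  return total;
}

// Sum from the last element to the first.
double right_to_left(const std::vector<double>& v) {
  double total = 0.0;
  for (auto it = v.rbegin(); it != v.rend(); ++it) total += *it;
  return total;
}
```

For `{1.0, 1e16, -1e16}`, left to right gives 0.0 (the 1.0 is rounded away when added to 1e16), while right to left gives 1.0.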
`mean(v)`, where `v` is a sequence of scalars
- How should the boundary condition of a size-zero input be defined?
- getting boundaries right really helps make the code flow nicely
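One way to settle the size-zero question, sketched under an assumed convention: return NaN for an empty input. Throwing an exception would be an equally valid choice; what matters is that the doc states it.

```cpp
#include <limits>
#include <vector>

/**
 * Return the arithmetic mean of the elements of v.
 *
 * Assumed convention: if v is empty, the result is NaN.
 *
 * @param v a vector of scalars
 * @return the mean of the elements of v, or NaN if v is empty
 */
double mean(const std::vector<double>& v) {
  if (v.empty())  // documented boundary case, handled explicitly
    return std::numeric_limits<double>::quiet_NaN();
  double total = 0.0;
  for (double x : v) total += x;
  return total / static_cast<double>(v.size());
}
```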
`var(v)`, where `v` is a sequence of scalars
- Make sure to document whether the function implements the maximum likelihood estimator (divide the sum of squared deviations by the length) or the unbiased estimator (divide by the length minus 1).
- Make sure to document the behavior for inputs of size 0 and 1.
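A minimal sketch of a variance function whose doc answers both questions. The name `variance`, the `unbiased` flag, and the NaN conventions for sizes 0 and 1 are assumptions for illustration; it uses a simple two-pass algorithm (mean first, then squared deviations).

```cpp
#include <cstddef>
#include <limits>
#include <vector>

/**
 * Return the variance of the elements of v.
 *
 * If unbiased is false, return the maximum likelihood estimate, which
 * divides the sum of squared deviations from the mean by N = v.size().
 * If unbiased is true, divide by N - 1 instead.
 *
 * Assumed boundary conventions: the result is NaN if N == 0, and also
 * if N == 1 with unbiased == true. With N == 1 and unbiased == false,
 * the result is 0.
 */
double variance(const std::vector<double>& v, bool unbiased) {
  const std::size_t n = v.size();
  if (n == 0 || (unbiased && n == 1))
    return std::numeric_limits<double>::quiet_NaN();
  double mean = 0.0;  // first pass: the mean
  for (double x : v) mean += x;
  mean /= static_cast<double>(n);
  double sum_sq = 0.0;  // second pass: squared deviations from the mean
  for (double x : v) sum_sq += (x - mean) * (x - mean);
  return sum_sq / static_cast<double>(unbiased ? n - 1 : n);
}
```

For `{1.0, 2.0, 3.0, 4.0}` the sum of squared deviations is 5, so the maximum likelihood estimate is 1.25 and the unbiased one is 5/3.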
- Level of code can't exceed the capabilities of the developers
- It's easy to stray too far into the land of abstraction
- Only put in as much work as the current project demands
- Don't over-engineer for a future you're only imagining
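
    #include <cstdint>
    #include <numeric>
    #include <string>
    #include <unordered_map>

    // Class member: maps sequence IDs to sequences.
    std::unordered_map<std::string, std::string> id_to_seq_;

    // Version 1: hand-written loop.
    int64_t total_bases() const {
      int64_t tot = 0;                       // initialize to 0
      for (const auto& id_seq : id_to_seq_)  // visit each element
        tot += id_seq.second.size();         // update value
      return tot;                            // return value
    }

    // Version 2: the same function written with std::accumulate.
    int64_t total_bases() const {
      return std::accumulate(                    // return value
          id_to_seq_.begin(), id_to_seq_.end(),  // visit each element
          int64_t{0},                            // initialize to 0 (as int64_t, not int)
          [](int64_t tot, const auto& id_seq) {
            return tot + static_cast<int64_t>(id_seq.second.size());  // update value
          });
    }
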
- Both versions generate the same optimized code (check on godbolt.org).
- If `accumulate()` looks fun, look up "foldl" (not a typo), "monad", and "functional programming".
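`std::accumulate` is a left fold: it combines the initial value with the elements one at a time, from the first element to the last. A non-commutative operation such as subtraction makes the order visible. A small sketch; `fold_left_minus` and `fold_right_minus` are hypothetical names, and the right fold is built here from reverse iterators with the lambda's arguments flipped.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Left fold ("foldl"): for {1, 2, 3} and initial value 0,
// computes ((0 - 1) - 2) - 3.
int fold_left_minus(const std::vector<int>& v) {
  return std::accumulate(v.begin(), v.end(), 0, std::minus<int>());
}

// Right fold ("foldr"): for {1, 2, 3} and initial value 0,
// computes 1 - (2 - (3 - 0)), by walking backwards and
// putting the element on the left of the subtraction.
int fold_right_minus(const std::vector<int>& v) {
  return std::accumulate(v.rbegin(), v.rend(), 0,
                         [](int acc, int x) { return x - acc; });
}
```

For `{1, 2, 3}`, the left fold gives -6 and the right fold gives 2.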