Skip to content

Latest commit

 

History

History
527 lines (362 loc) · 13.8 KB

main.md

File metadata and controls

527 lines (362 loc) · 13.8 KB

Sciware

Functions

https://github.com/flatironinstitute/learn-sciware-dev/tree/master/12_Functions

Rules of Engagement

Goal:

Activities where participants all actively work to foster an environment which encourages participation across experience levels, coding language fluency, technology choices*, and scientific disciplines.

*though sometimes we try to expand your options

Rules of Engagement

  • Avoid discussions between a few people on a narrow topic
  • Provide time for people who haven't spoken to speak/ask questions
  • Provide time for experts to share wisdom and discuss
  • Work together to make discussions accessible to novices
(These will always be a work in progress and will be updated, clarified, or expanded as needed.)

Zoom Specific

  • If comfortable, please keep video on so we can all see each other's faces.
  • Ok to break in for quick, clarifying questions.
  • Use Raise Hand feature for new topics or for more in-depth questions.
  • Please stay muted if not speaking. (Host may mute you.)
  • We are recording. Link will be posted on #sciware Slack.

Future Sessions

  • Sciware "Office Hours" follow-up discussion in 2 weeks (Nov 19)
  • Future sessions planned:
    • Intro to R
    • Data storage file formats (hdf5, numpy, parquet, sqlite, ...)
  • Suggest topics and vote on options in #sciware Slack

Today's Agenda

  • Intro: what is a function? (Alex Barnett, CCM)
  • Designing good functions (Bob Carpenter, CCM)
  • Packaging a Jabberwocky (Alex Barnett, CCM)
  • Group exercises!
  • Case studies and discussion (Joakim Anden, formerly CCM, Nils Wentzell, CCQ, and others)
<style type="text/css"> .reveal p { text-align: left; } </style>

Intro: what is a function?

What is a function?

Say you have python script/notebook including the snippet...

S = 0; T = 0
for i in range(10) :
  S += a[i]*a[i]
for i in range(20) :
  T += b[i]*b[i]
result = S-T

Seems to sum the squares of array a, then similar for b, then subtract them.

Good to package the self-contained repeated task as a function:
def sumsquares(a):
    """Sum the squares of the elements of a NumPy array."""
    return sum(a*a)
Then entire snippet becomes much simpler, and easier to read:
result = sumsquares(a)-sumsquares(b)
plus other users, and future you, get a useful tool: a function :)

Functions are sad without tests

def sumsquares(a):
    """Sum the squares of the elements of a NumPy array."""
    return sum(a*a)

A ("pure") function has inputs (array a), outputs (return value)

  • refers to no global variables
  • has no "state" (internal memory). This makes it testable.
Never trust a func you wrote but didn't test! A simple tester:

import numpy as np
if sumsquares(np.array((3.0,4.0))) == 25.0:
    print('pass')
else:
    print('fail')
Q: Other than checking it does the right thing, why else are test "driver" codes essential?
shows a human how to use your function

Standing on the shoulders of giants

We rely on other people's functions all the time, eg y=sin(x) (low-level math lib), coeffs=numpy.polyfit(y,x,degree). Good design work went into these functions, as docs show:

Note interface: simple required args; advanced ones are optional

You can be a giant too, by writing (then distributing) your own functions!

Remember "good function" = function body + docs + tester(s) + usage example(s)

Session goals & topics

I. Get us to package most of our code as functions:

  • breaks tasks in to separately testable components
  • clean, modular code, leaving scripts/notebooks short and readable

II. Write functions with good interfaces, and critique them:

  • naming, inputs, outputs, their types, optional args
  • testers, documentation
  • these are just as important as the algorithm ("meat") inside the function Otherwise no-one will use your functions... and that includes future you! :(

Call it: "good interface design", in sense of API (application programmer interface). Can include wrapping existing code so it can be called from another language (eg Python y=sin(x) actually wraps libm library)

[Not to be confused with: "interfacing" in sense of UX/UI (user experience), human-computer interfaces, package management (apt etc)...]

Any questions about what we're talking about or goals ?

The why, when, and how of functions

slides

Packaging a Jabberwocky



(why Jabberwocky? Because all technical terms in science codes are nonsense poems to other users outside that area)

Creating a good function from a script

You heard your friend wrote code once that included a cool way to find a place \(x\) where some 1D function \(f(x)=0\). You ask and they say, sure, here you go:

# script proving my gimble is frumious!
# (recall technical words are gibberish to other users)
for brillig in np.arange(0, pi, 0.1):
    mimsy = cos(brillig)
    wabe_guess = 1.5
    while True:
        wabe_newguess = wabe_guess - runcible(mimsy, wabe_guess) /
                                      runciblederivative(mimsy, wabe_guess)
        if abs(wabe_newguess - wabe_guess) < 1e-9:
            break
        wabe_guess = wabe_newguess        
    print(bandersnatch(brillig, wabe_newguess))

You laugh, but this is a common collaboration experience!

How useful is this to you? Must waste time understanding the whole script, some outer loop. Is runcible a 1D function? It has 2 args. Where’s the bit that finds the zero of a 1D function? What variable contains the answer??

Split out a root-finding function

Ok, in real world at this point you'd read around, eventually (!) find scipy.optimize.newton, and be done. Say internet is broken. Need to make own func...

Ah, you realize it's doing Newton's iteration \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\] so that the sequence \(x_0, x_1, \ldots\) tends to a root. mimsy is some parameter we ignore. wabe is \(x\), which is a simpler and more universal variable name!

Let's wrap in interface for roots of a general function, not just `runcible`!

We steal the algorithm of the script, just do better packaging.

First draft function

To root-find a general func, need pass that func in, and a (separate?) func for its derivative:

def rootfind1d(f, dfdx, x0):
    """Find a nearby root of a 1D scalar function.
    Inputs: f - a 1D scalar function
            dfdx - its derivative function
            x0 - an initial guess to the root
    Returned value: approximate root x, meaning f(x)=0.
    Notes: uses the Newton-Raphson iteration.
    """
    while True:
        xnew = x0 - f(x0)/dfdx(x0)
        if abs(xnew-x0) < 1e-9:
            break
        x0 = xnew
    return xnew

Look it has docs! We suggest you even write the docs before you write the body of the function, focuses the mind.

Q: are we done with this basic draft?

No, recall it is incomplete without a test routine, like this:

    from math import *
    f = lambda x: sin(x)
    dfdx = lambda x: cos(x)
    x = rootfind1d(f, dfdx, 2.0)
    if abs(x-pi)<1e-9:
        print('pass')
    else:
        print('fail')

Very niiice!

Note: universal simple math language and symbols (all jargon gone).

Now why didn't your friend hand you this in the first place?? If they had been a good structured coder, they would have :)

And So Can You*.

* Stephen Colbert

Improvements

The tolerance check 1e-9 was particular to frumious gimbles. Think about your user (or future self): they'll want to change it.

Q: how best do this?


def rootfind1d(f, dfdx, x0, tol=1e-9):
    """Find a nearby root of a 1D scalar function.
    Inputs: f - a 1D scalar function
            dfdx - its derivative function
            x0 - an initial guess to the root
            tol (optional) - desired absolute tolerance in the root
    Returned value: approximate root x, meaning f(x)=0.
    Notes: uses the Newton-Raphson iteration.
    """
    while True:
        xnew = x0 - f(x0)/dfdx(x0)
        if abs(xnew-x0) < tol:
            break
        x0 = xnew
    return xnew

Optional args are a great way make it easy on the basic user while the power user can tweak various algorithm parameters from sensible defaults.

Q: how else could you improve this function or its tests?

A: one of the group exercises!

Exercises (breakouts)

Choose an example to work on in a small group, from the files in the repo: https://github.com/flatironinstitute/learn-sciware-dev/tree/master/12_Functions/exercise

There are several different languages to choose from.

Discussion, advanced topics

Naming

Comments are good, but names are better.

def get_values2(x, y):
    temp = np.mean(np.abs(x - y) ** 2)
    return temp
vs.

def mean_squared_error(x, x_ref):
    mse = np.mean(np.abs(x - x_ref) ** 2)
    return mse
Comments can go stale and require maintenance.
Comments also can't be tested (generally).

High- and low-level APIs

More advanced libraries or packages can have multiple levels of APIs.

High level calls low level.

fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                           int sign, unsigned flags);
typedef struct {
     int n;
     int is;
     int os;
} fftw_iodim;

fftw_plan fftw_plan_guru_dft(
     int rank, const fftw_iodim *dims,
     int howmany_rank, const fftw_iodim *howmany_dims,
     fftw_complex *in, fftw_complex *out,
     int sign, unsigned flags);

High- and low-level APIs (cont.)

FINUFFT: simple call vs. plan interface


f = finufft.nufft2d1(x, y, c, (N1, N2))

plan = finufft.Plan(nufft_type, (N1, N2), n_trans=c.shape[0])
plan.setpts(x, y)
f = plan.execute(c)

Default arguments

Why do we need them?
When do we need them?
What is a “sensible default”?
How do we order them?
Before:

iflag = 1
eps = 1e-6
f = np.zeros([N1, N2], dtype=np.complex128, order='F')
ret = finufftpy.nufft2d1(x, y, c, iflag, eps, N1, N2, f)
After:

f = finufft.nufft2d1(x, y, c, (N1, N2))
How do we implement them in languages that do not support them natively?
Provide -1, NULL?

Example: HDF5 (Hierarchical data format)

  • Binary data format to store large datasets
  • Organize Data in a directory structure
  • Seamless compression and decompression

Storing a C++ vector

std::vector<double> vec = {1.0, 2.0};

// Create a new file using the default properties.
hid_t file = H5Fcreate ("vec.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

// Create a dataspace describing the dimensions
hsize_t dims[1] = {vec.size()};
hid_t space = H5Screate_simple(1 /*ndims*/, dims, NULL);

// Create a dataset and save it to file
hid_t dset = H5Dcreate(file, "vec", H5T_IEEE_F64LE, space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
herr_t status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, vec.data());

// Close and release resources.
H5Dclose (dset); H5Sclose (space); H5Fclose (file);

The output

$ h5dump vec.h5

HDF5 "vec.h5" {
GROUP "/" {
   DATASET "vec" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 1, 2
      }
   }
}
}

What would we like to write?

std::vector<double> vec = {1.0, 2.0};

// Create a new file with write access
h5::file file("vec.h5", 'w');

// Store the vector to dataset "vec"
h5::write(file, "vec", vec);
// Create a new file with read access
h5::file file("vec.h5", 'r');

// Read the vector from dataset "vec"
std::vector<double> vec;
h5::read(file, "vec", vec);

The User Interface

namespace h5 {
  template <typename T>
  void write(group g, std::string key, T const& data);

  template <typename T>
  void read(group g, std::string key, T& data);

  template <typename T>
  T read(group g, std::string key);
}

Find library at github.com/TRIQS/h5

Feedback Survey

http://bit.ly/sciware-functions-2020