Add data frame RFC #3
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,252 @@ | ||
- Feature Name: data_frames | ||
- Start Date: 2020-08-05 | ||
- RFC PR: [nushell/rfcs#0003](https://github.com/nushell/rfcs/pull/3) | ||
- Nushell Issue: [nushell/nushell#0000](https://github.com/nushell/nushell/issues/0000) | ||
|
||
# Summary | ||
|
||
[summary]: #summary | ||
|
||
This RFC merges the Row and Table Value types into a single new value type: Frame. Data frames take inspiration from data processing systems like R and Pandas. Data frames will play the fundamental role of modelling data in Nu and will have enough descriptive power to describe all forms of structure, including streaming tables, lists, and objects. | ||
|
||
# Motivation | ||
|
||
[motivation]: #motivation | ||
|
||
The current system has a few unexpected limitations: | ||
Review comment: The inlining of nested tables is a limitation right now too, correct? If the nested table is incredibly large, we could easily run out of memory since it doesn't get streamed. Arguably we could solve this without data frames, but it seems like what's being proposed here will potentially solve that problem?
Review comment: I'll add that to the list. Yes, this protocol lets us stream inner tables also, so you could get the initial structure, and remember where the inner tables are, then read the contents of those inner tables from the stream. |
||
|
||
- The top-level rows represent a table of rows, but it's unclear how to represent a top-level list of strings vs a stream of strings. | ||
- A similar ambiguity exists between an "object" (a data structure denoted by key/value pairs) and a table of one row | ||
- Inner-tables are modelled differently than top-level tables, leading to confusion | ||
- There is no way to currently represent a matrix | ||
Review comment: Arguably, this could be a matrix:
Not saying it'd be easy to work with, but I think it's representable 🙂
Review comment: lol, true. I guess a real matrix vs a list of lists. I could call that out |
||
- As rows are streamed instead of tables, it's not possible to predict how to display this information. This is currently mitigated by buffering some number of rows and treating them as one "table" | ||
- Likewise, since rows are streamed instead of tables, it's unclear what the user should expect if they request a column that is not present, as this column may appear in the following row instead. | ||
- When table data is serialized, there is a large amount of duplication, as columns are repeated with each row sent. | ||
- Additionally, there is currently no way to represent rows using a row literal. We propose a frame literal that will fill this role. | ||
|
||
# Guide-level explanation | ||
|
||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Data frame representation: | ||
|
||
```rust | ||
struct DataFrame { | ||
Review comment: Since I'm not familiar with nu's current representations, repeating that here for a quick comparison could help. |
||
headers: Option<Vec<String>>, | ||
rows: Vec<Vec<Value>>, | ||
Review comment: Pandas dataframes are stored as lists of column, each of which is an array for column-based arithmetic efficiency. Maybe more insightful is a document of a hypothetical pandas 2.0 design if pandas was rewritten.
Review comment: Cool, thanks for the heads up! Will definitely check it out |
||
partial_frame_id: Option<Uuid>, | ||
is_object: bool, | ||
} | ||
``` | ||
|
||
## Tables | ||
|
||
A self-contained table would look like this: | ||
|
||
```rust | ||
let frame = DataFrame { | ||
headers: Some(vec!["name", "age"]), | ||
rows: vec![ | ||
vec![Value::from("Bob"), Value::from(30)], | ||
vec![Value::from("Sally"), Value::from(43)] | ||
], | ||
partial_frame_id: None, | ||
is_object: false, | ||
}; | ||
``` | ||
|
||
The above code could be created using this Nu syntax: | ||
|
||
```sh | ||
[name: [Bob, Sally], age: [30, 43]] | ||
Review comment: How would you express this in nu syntax, and/or the above in rust syntax, if you were to state the shape without data? For instance, to say that Windows Alternatively, is there a constructor that says this df is 5 rows and 6 columns? At some point, I expect we'll have variables that can hold a dataframe. It's hard for me to visualize how this will work in a streaming environment where things are built up and torn down in a pipeline.
Review comment: @fdncred - for the first question, I think you're asking "how do you write types in Nu?" We'll probably need a separate RFC for that, as types will be their own topic. Or maybe you're asking how we handle matrices and how this differs from a list of lists?
Review comment: I was asking about initializing a dataframe with a predetermined shape as One could think of making dataframes with 2 columns and 3 rows as an empty dataframe except with column names, and then, as the pipeline progresses, update the information in those rows. In order to do this, some type of initialization of the df would have to take place. Maybe the term is dataframe literal. I think this is what you've created here
Review comment: @fdncred - ah, I think I got it. There isn't a way to fill in a dataframe, though we could think of creating some API around that like we do for TaggedDictBuilder and related. Not sure what you mean by populate it later in the pipeline. Since we're passing values through, you'd create a new value. But maybe these helpers would be able to take in a shape and let you fill it in? Seems doable.
Review comment: Yes, take in a shape and fill it. This may not be exactly
Review comment: I think @fdncred's questions from below are a better fit here:
The data frame keeps each row separate, but the proposed table syntax groups by column. That's surprising and maybe not enough. Maybe the column names can use the argument syntax from
single line:
Review comment: Potentially, yeah. Something I'm not sure of is whether we should be row-major or column-major inside of the data frame. In practice, we probably filter by column more than row, so grouping column values together internally might make the most sense. If so, perhaps we reflect that in the syntax. This feels like something we'll need to actually experiment with to see how it feels in Nu.
Review comment: I can imagine that there's going to be two ways (syntax) to specify tables in columns, both row-major and column-major, while the internal representation should be more predictable. But yeah, some experiments make sense. Since I want to learn Rust, I might try to build a tiny "table" parser myself. Nothing to wait for 😅 |
||
``` | ||
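The review thread above notes that the proposed literal groups values by column while the `DataFrame` struct stores rows. A minimal sketch of how a column-grouped literal could be turned into the row-major layout follows. The `Value` enum and the `from_columns` helper are hypothetical stand-ins for illustration, not part of Nu or of this RFC.

```rust
/// Hypothetical stand-in for Nu's `Value`; only two variants are modelled.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// The frame layout from this RFC, minus the streaming fields.
#[derive(Debug, PartialEq)]
struct DataFrame {
    headers: Option<Vec<String>>,
    rows: Vec<Vec<Value>>,
}

/// Build a row-major frame from `(column name, column values)` pairs.
/// Returns `None` if the columns do not all have the same length.
fn from_columns(columns: Vec<(String, Vec<Value>)>) -> Option<DataFrame> {
    let len = columns.first().map_or(0, |(_, values)| values.len());
    if columns.iter().any(|(_, values)| values.len() != len) {
        return None;
    }
    let headers: Vec<String> = columns.iter().map(|(name, _)| name.clone()).collect();
    // Row `i` takes the i-th value of every column, in column order.
    let rows: Vec<Vec<Value>> = (0..len)
        .map(|i| {
            columns
                .iter()
                .map(|(_, values)| values[i].clone())
                .collect::<Vec<Value>>()
        })
        .collect();
    Some(DataFrame { headers: Some(headers), rows })
}
```

Rejecting ragged columns is one possible policy; padding short columns with an empty value would be another, and the RFC does not pick one.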
|
||
## Lists and matrices | ||
|
||
A self-contained list would look like this: | ||
|
||
```rust | ||
let list = DataFrame { | ||
headers: None, | ||
rows: vec![ | ||
vec![Value::from(1), Value::from(2), Value::from(3)] | ||
], | ||
partial_frame_id: None, | ||
is_object: false, | ||
Review comment: What if [EDIT]
Review comment: I believe so. |
||
}; | ||
``` | ||
|
||
The above code could be created using this Nu syntax: | ||
|
||
```sh | ||
[1, 2, 3] | ||
``` | ||
|
||
## Objects (aka hash tables) | ||
Review comment: aka dictionary, map? 'hash table' sounds rather implementation specific to me |
||
|
||
A self-contained object would look like this: | ||
|
||
```rust | ||
let obj = DataFrame { | ||
headers: Some(vec!["name", "level"]), | ||
rows: vec![ | ||
vec![Value::from("Thomas"), Value::from(12)] | ||
], | ||
partial_frame_id: None, | ||
is_object: true, | ||
}; | ||
``` | ||
|
||
**Note:** we use the boolean in the table rather than enumeration because all processing on the frame remains uniform regardless of if the frame is a single row with headers vs an object. This simplifies algorithms to only have to work with the data directly, and we can later represent this data and/or serialize this data in a way that maintains the user's model. | ||
Review comment: What 'table' is this referring to?
Review comment: Should be 'data frame'. I'm trying to say here that using a boolean rather than making an |
||
|
||
The above code could be created using this Nu syntax: | ||
|
||
```sh | ||
[name: Thomas, level: 12] | ||
Review comment: If I have 100 rows of data, do I have to repeat the column names for each row? It may be nice to consider something like
Review comment: I give an example above for how to write a dataframe. This example is about "objects", or hash tables, so we only have one value per column.
Review comment: I think we must be talking past each other because I really understand what you're saying and I think you didn't understand what I was saying. I'm just showing a possible way of creating a dataframe literal without repeating the column names for every row. I define the column names one time with
Review comment: Sorry, you're right. I totally missed what the example was saying. Yeah, we could do some kind of tagging like that to differentiate the headers from the rows. If we go this route, how would it look when there aren't header values?
Review comment: With no header values, I think we'd just use indexes like If we want to leave a column blank and did not previously define the columns, using example above, I'd do something like this |
||
``` | ||
|
||
## Streaming | ||
|
||
An important part of Nu is being able to work with potentially endless streams of data. As is often the case when working with external commands, there's no guarantee that the output will terminate. | ||
fdncred marked this conversation as resolved.
|
||
|
||
We need to be able to represent the results of processing a stream of unprocessed data as a stream of data frames. | ||
|
||
To be able to output a data frame as a stream, we need to know two key elements: that the data frame is incomplete as-is, and a unique identifier that allows us to later stream additional data for this frame. | ||
|
||
To accomplish this streaming, we also introduce an `EndFrame`. Frame and EndFrame work together to allow a frame to be streamed as a multi-part frame, ending once the corresponding EndFrame has been read. | ||
|
||
As an example, let's say we were processing some content and wanted to output the first row and later the second row of this table: | ||
|
||
| tag | length | | ||
| ---- | ------ | | ||
| head | 1024 | | ||
| body | 8192 | | ||
|
||
To do this, we would send the two separate frames, both marked as partial frames. | ||
Review comment: This could add, ", ending the stream with an |
||
|
||
```rust | ||
let frame_id = Uuid::new_v4(); | ||
output.send(UntaggedValue::DataFrame { | ||
headers: Some(vec!["tag", "length"]), | ||
rows: vec![ | ||
vec![ Value::from("head"), Value::from(1024)] | ||
], | ||
partial_frame_id: Some(frame_id), | ||
Review comment: If we stream nested frames, will different ids be interleaved? Will the onus be on commands to track that? Will there be helper methods/structs for dealing with that? Maybe a light discussion on that.
Review comment: Yeah, we'd probably want some helper methods. Will have to think about that more. |
||
is_object: false, | ||
}.into_value()); | ||
|
||
// ... time passes | ||
|
||
output.send(UntaggedValue::DataFrame { | ||
headers: Some(vec!["tag", "length"]), | ||
rows: vec![ | ||
vec![ Value::from("body"), Value::from(8192)] | ||
], | ||
partial_frame_id: Some(frame_id), | ||
is_object: false, | ||
}.into_value()); | ||
|
||
output.send(UntaggedValue::EndFrame(frame_id).into_value()); | ||
``` | ||
Review comment: It might be good to have some code sample in the above section showing what it would look like on the receiving end. I'm particularly interested in the more complex cases, like frames being nested in frames. |
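As a rough illustration of the receiving end, the sketch below buffers partial frames by id and releases one merged, self-contained frame when the matching `EndFrame` arrives. It is a sketch under stated assumptions, not the RFC's implementation: `FrameId` is a plain `u64` standing in for `Uuid`, `Message` stands in for the two new `UntaggedValue` variants, and `FrameAssembler` is a hypothetical helper. Nested frames are not handled.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for a frame id; the RFC uses `Uuid`.
type FrameId = u64;

/// Hypothetical stand-in for Nu's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, PartialEq)]
struct DataFrame {
    headers: Option<Vec<String>>,
    rows: Vec<Vec<Value>>,
    partial_frame_id: Option<FrameId>,
    is_object: bool,
}

/// Stand-in for the two stream items this RFC introduces.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Frame(DataFrame),
    EndFrame(FrameId),
}

/// Buffers partial frames by id and releases a merged frame on `EndFrame`.
#[derive(Default)]
struct FrameAssembler {
    pending: HashMap<FrameId, DataFrame>,
}

impl FrameAssembler {
    /// Feed one message; returns a self-contained frame once one is complete.
    fn push(&mut self, message: Message) -> Option<DataFrame> {
        match message {
            Message::Frame(frame) => match frame.partial_frame_id {
                // Frames without an id are already self-contained.
                None => Some(frame),
                Some(id) => {
                    match self.pending.entry(id) {
                        // Later parts only contribute their rows.
                        Entry::Occupied(mut slot) => slot.get_mut().rows.extend(frame.rows),
                        // The first part also supplies headers and `is_object`.
                        Entry::Vacant(slot) => {
                            slot.insert(frame);
                        }
                    }
                    None
                }
            },
            // An unknown id yields `None` rather than an error in this sketch.
            Message::EndFrame(id) => self.pending.remove(&id).map(|mut frame| {
                frame.partial_frame_id = None;
                frame
            }),
        }
    }
}
```

Keying the buffer by id means interleaved streams with different ids can be assembled independently, which is one way the question about interleaving raised in the review could be approached.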
||
|
||
# Reference-level explanation | ||
|
||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
Much of the implementation is part of the explanation in the previous section. In this section, we'll explore more of the impact of making this change. | ||
|
||
`UntaggedValue` will have `Table` and `Row` replaced with `DataFrame` (possibly just called `Frame` for brevity) and `EndFrame`. | ||
|
||
All commands that operate on Row today will need to be updated to work with the data frame instead. | ||
|
||
Rather than operating on a single row, commands will need to be updated to handle a frame at a time. Here, the mapping should largely be the same, though an additional inner loop to process over the rows will be necessary. This processing can be done serially or in parallel and may be done synchronously or asynchronously. | ||
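A minimal sketch of that inner loop, assuming a simplified `DataFrame` without the streaming fields and a hypothetical `map_rows` helper: an existing per-row function is lifted to work on a whole frame. The headers are carried over unchanged, which assumes the per-row function preserves the column layout.

```rust
/// Hypothetical stand-in for Nu's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Simplified frame: headers plus row-major data.
#[derive(Debug, Clone, PartialEq)]
struct DataFrame {
    headers: Option<Vec<String>>,
    rows: Vec<Vec<Value>>,
}

/// Lift a per-row function to a per-frame one by looping over the rows.
fn map_rows<F>(frame: DataFrame, per_row: F) -> DataFrame
where
    F: FnMut(Vec<Value>) -> Vec<Value>,
{
    DataFrame {
        headers: frame.headers,
        rows: frame.rows.into_iter().map(per_row).collect(),
    }
}
```

This version is serial and synchronous; a parallel variant would need the per-row function to be safe to call from several threads.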
|
||
Commands that worked over inner tables should be able to migrate to data frames, as there is a strong overlap in functionality. | ||
|
||
Commands that filter will work similarly as before, and may opt to output streams which are flattened by the output stream. This allows them to optionally return no frame if no rows in the frame met the requirements of the filter. | ||
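One way to read "optionally return no frame" is a filter that yields `Option<DataFrame>`, so a frame with no surviving rows simply disappears from the output stream. The sketch below assumes a simplified `DataFrame` and a hypothetical `filter_frame` helper.

```rust
/// Hypothetical stand-in for Nu's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Simplified frame: headers plus row-major data.
#[derive(Debug, Clone, PartialEq)]
struct DataFrame {
    headers: Option<Vec<String>>,
    rows: Vec<Vec<Value>>,
}

/// Keep only the rows accepted by `keep`; return `None` when nothing survives.
fn filter_frame<P>(frame: DataFrame, mut keep: P) -> Option<DataFrame>
where
    P: FnMut(&[Value]) -> bool,
{
    let rows: Vec<Vec<Value>> = frame.rows.into_iter().filter(|row| keep(row)).collect();
    if rows.is_empty() {
        None
    } else {
        Some(DataFrame { headers: frame.headers, rows })
    }
}
```

Because `Option` is itself iterable, a stream of such results can be flattened with `Iterator::flatten`, which matches the "flattened by the output stream" wording above.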
|
||
# Drawbacks | ||
|
||
[drawbacks]: #drawbacks | ||
|
||
Some drawbacks come to mind: | ||
Review comment: These are mostly drawbacks on implementing data frames. Are there no drawbacks or limitations to frames themselves, similar to the drawbacks you listed for rows/tables above?
Review comment: Yeah, I should list those out. Some that come to mind:
Review comment: Just thought of a potentially rather large one. Let's say we go column-based internally, how do we handle data coming from sources like JSON?
Today, the above gives you a table just fine, and you can immediately start using that. It'd be a shame if we landed on a design where it stopped working. |
||
|
||
- This is a large, non-trivial amount of work. Getting this landed, updating the commands to use the new model, and thoroughly testing will take time. | ||
Review comment: Any plan for transitioning slowly, or does this work have to be done as a single unit? No need to document the plan here, but the ability to iterate on this is not immediately obvious to me, so it may be worth describing if (and perhaps how) that would be possible.
Review comment: One thing we could do is to document how to transition code from one style to another. We could also support both Row and Frame for a time, allowing people to transition off the old protocol while we roll onto the new.
Review comment: I think that'd be the way to go in the future. We would need backwards-incompatible protocol changes an RFC process with a clear timeline. We'll also need to figure out how to communicate that to the nushell community 🙂
fdncred marked this conversation as resolved.
|
||
- This will break most, if not all, third-party plugins | ||
Review comment: Hopefully we don't have to do this again anytime soon, but I'm thinking this would be a good opportunity for us to think about deprecations in our protocol. How do we version the plugins, and know what version of the protocol they want? How long do we keep old
Review comment: We'll definitely want to add that to the plugin protocol. I don't think it's currently part of it. |
||
|
||
# Rationale and alternatives | ||
|
||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
## Everything is a frame | ||
|
||
One alternative is to require everything to live inside of a frame. There are some advantages here: this is seemingly more uniform, but at the risk of overloading the data frame concept. | ||
Review comment: I don't understand this paragraph. What "everything" isn't included in the proposal, for this to be an alternative?
Review comment: Here "everything" would mean all of the data primitives. In practice, this largely changes what data type would be streamed between commands. Commands would interact with each other firstly with a data frame, so that each step would start with a frame first. I'm not sure if, in practice, this buys us much simplification, but I wanted to at least mention it. |
||
|
||
# Prior art | ||
Review comment: Since streaming seems to be a big motivator for this, I wonder if there's other prior art regarding streams.
Review comment: As a total outsider, I'd take a look at Apache Arrow here. A lot of their messaging/docs are focused on efficient columnar storage (which I assume is not relevant here), but they have two features that are probably interesting for Nu to learn from:
Review comment: @alanhdu - thanks for the tip, I'll definitely check these out.
Review comment: High Level API Docs on Apache Arrow for Rust... https://docs.rs/arrow/1.0.1/arrow/ Seems to have most of the relevant stuff needed
Review comment: https://github.com/nevi-me/rust-dataframe/blob/master/notes/update-01__04-04-2020.md Some more thoughts on dataframes in rust using arrow and a dataframe package |
||
|
||
[prior-art]: #prior-art | ||
|
||
Other data processing systems and languages have a data frame concept. The R language and the `pandas` library for Python use it as a model for working with data in tabular format. | ||
|
||
## R data frame | ||
|
||
```r | ||
|
||
# Create the data frame. | ||
emp.data <- data.frame( | ||
emp_id = c (1:5), | ||
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), | ||
salary = c(623.3,515.2,611.0,729.0,843.25), | ||
|
||
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", | ||
"2015-03-27")), | ||
stringsAsFactors = FALSE | ||
) | ||
# Print the data frame. | ||
print(emp.data) | ||
``` | ||
|
||
which outputs: | ||
|
||
``` | ||
emp_id emp_name salary start_date | ||
1 1 Rick 623.30 2012-01-01 | ||
2 2 Dan 515.20 2013-09-23 | ||
3 3 Michelle 611.00 2014-11-15 | ||
4 4 Ryan 729.00 2014-05-11 | ||
5 5 Gary 843.25 2015-03-27 | ||
``` | ||
|
||
## Pandas data frame | ||
|
||
Below is an example of the pandas data frame: | ||
|
||
Review comment: If anyone is interested, this is where pandas defines the DataFrame class. Lots of code here but interesting. https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py |
||
```python | ||
>>> d = {'col1': [1, 2], 'col2': [3, 4]} | ||
>>> df = pd.DataFrame(data=d) | ||
>>> df | ||
col1 col2 | ||
0 1 3 | ||
1 2 4 | ||
``` | ||
|
||
# Unresolved questions | ||
|
||
[unresolved-questions]: #unresolved-questions | ||
|
||
- Are there syntactic ambiguities with the proposed syntax? This will require that we support parsing data frames, which includes colons and commas at the end of bare words. | ||
Review comment: Arguably that could break things for some users, but the idea of not being 1.0 yet is that we're still trying to figure things out. I doubt it will break much. That being said, I've been wanting to put together a description of our grammar. Calling out what you're adding and what would break in a grammar would make this super clear 🙂
Review comment: I'll add a section about this. |
||
- How do we want to handle partial inner data frames? That is, a data frame that is inside of another data frame. | ||
Review comment: This seems like a big one (e.g., [EDIT]
Review comment: Right, it seems like something we could add later as needed. The protocol would support it, but we'd want to have more API surface to deal with it. |
||
- How do we handle non-data frames in between data frames? Do all partial data frames have to stream out until complete? | ||
Review comment: Ideally, no, but we'd probably have to relay information back through the stream to allow that. Probably an RFC on its own 🙂
Review comment: What do you consider a non-data frame? As far as I can tell, this proposal doesn't define it.
Review comment: I'll use a better term here. I meant "data types that aren't data frames", like strings, numbers, etc. |
||
|
||
# Future possibilities | ||
fdncred marked this conversation as resolved.
|
||
|
||
[future-possibilities]: #future-possibilities | ||
|
||
We would like to be able to extend Data Frames further to be able to handle sending snapshots of data at the current time. This allows us to stream updates to existing tables, allowing viewers to animate as data is updated. | ||
|
||
We may also elect to add type information to the columns, so that we can maintain a more rigorous internal representation. | ||
Review comment: In that case, maybe the headers should be more than an optional list of strings, so that further information can be added there later. In JavaScript/JSON the solution is to start with a list of objects, instead of a list of strings, so that more properties can be added to the object later. I guess that can be applied here, too. Maybe in Rust that would be a
Review comment: Agreed. I was hoping we could evolve in that direction rather than trying to figure it out with this RFC. One thing we could do (which I proposed recently) is to create an experimental implementation for data frames and try to add support to a few commands. See how it works in practice, and if it turns out we almost always have the type information there because the source knows it, we can just add it. For example, |
||
|
||
Frames also allow us to store values in an unboxed way if we can ensure all the values in a column match, and that this holds for all columns in the frame. | ||
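To make the unboxed idea concrete, here is a small sketch of per-column packing: a column whose values all share one type is stored as a typed vector, and anything else falls back to boxed values. The `Column` enum and `pack_column` function are hypothetical names for illustration, and `Value` is a two-variant stand-in for Nu's value type.

```rust
/// Hypothetical stand-in for Nu's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical column storage: typed vectors when every value agrees,
/// falling back to boxed `Value`s otherwise.
#[derive(Debug, PartialEq)]
enum Column {
    Ints(Vec<i64>),
    Strs(Vec<String>),
    Mixed(Vec<Value>),
}

/// Pack a column, choosing the unboxed representation when possible.
/// An empty column is packed as `Ints` because the first check is vacuously true.
fn pack_column(values: Vec<Value>) -> Column {
    if values.iter().all(|v| matches!(v, Value::Int(_))) {
        let ints = values
            .into_iter()
            .filter_map(|v| match v {
                Value::Int(n) => Some(n),
                _ => None,
            })
            .collect();
        Column::Ints(ints)
    } else if values.iter().all(|v| matches!(v, Value::Str(_))) {
        let strs = values
            .into_iter()
            .filter_map(|v| match v {
                Value::Str(s) => Some(s),
                _ => None,
            })
            .collect();
        Column::Strs(strs)
    } else {
        Column::Mixed(values)
    }
}
```

This packs each column independently, which is slightly more permissive than the paragraph above: a frame could mix packed and boxed columns instead of requiring every column to be uniform.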
|
||
Commands that collect a stream into a list could potentially have the optional to merge together all partial data frames into self-contained data frames for further processing. | ||
Review comment: typo? optional => option
Merging partial frames (when the end frame is received) into a single data frame makes sense to me. Though I don't understand the distinction with "self-contained data frames" - how are those different to partial frames? Why would it still be multiple frames, not a single one?
Review comment: This is a way of saying "a data frame that isn't partial", so all of its data is in that one frame. It would only be the single one, yeah. |
Review comment: I'm completely unfamiliar with R or pandas, and I've never heard the term 'data frame'. Maybe a little more detail or an example could make this summary more accessible?
Review comment: Yeah, will definitely fill that out. The way I'm using the term here is that it's a 2-dimensional block of data. There are some columns, and these are uniform across all the rows in the block. I think technically data frames are a bit more configurable than that, but I wanted to start with a slightly more restricted definition and adjust from there.