
OpenRefine Reproducibility Demo

These four demos test the reproducibility of OpenRefine workflows.
We use the first 5 rows of Menu.csv from the New York Public Library as our demo dataset.
We use OpenRefine 3.1, and all of the experiments are run on macOS.
The CSV file used here is comma-separated.
This is the original dataset:

,id,name,sponsor,event,venue,place,physical_description,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,HOTEL EASTMAN,BREAKFAST,COMMERCIAL,"HOT SPRINGS, AR",CARD; 4.75X7.5;,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,REPUBLICAN HOUSE,[DINNER],COMMERCIAL,"MILWAUKEE, [WI];",CARD; ILLUS; COL; 7.0X9.0;,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,NORDDEUTSCHER LLOYD BREMEN,FRUHSTUCK/BREAKFAST;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,CARD; ILLU; COL; 5.5X8.0;,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,NORDDEUTSCHER LLOYD BREMEN,LUNCH;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,CARD; ILLU; COL; 5.5X8.0;,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,NORDDEUTSCHER LLOYD BREMEN,DINNER;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER; ILLU; COL; 5.5X7.5;,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33

Demo 0

OpenRefine supports this demo natively, so it should always work.
Part 1:
1). Create new OpenRefine project (P1) importing test data set (T).

Create Project steps:

  • Click on “Create Project”, use “Choose Files” to pick the file from the local computer, then click on “Next”.
  • OpenRefine shows a “Configure Parsing Options” page with default settings, where users can adjust options such as “Project name”, “Character encoding”, and “Columns are separated by”. Here we change the project name to “demo0_part1_partMenu”, then click on “Create Project” to set up the new OpenRefine project.
  • This is the interface of a new OpenRefine project. On the right, users perform data cleaning operations on the table, and every operation is recorded in the “Undo/Redo” sidebar on the left as it is applied.
    Users can undo or redo data cleaning steps by simply clicking on the step they want to restore.

2). Perform a few data cleaning operations, both generalizable and non-generalizable.
Six steps in all are recorded in the Undo/Redo operation history sidebar.

3). View the operation history (H1).
4). Undo all data cleaning steps, then redo all the operations.
Click on the first step, “0. Create project”, to undo all data cleaning steps and return the project to its initial state.
Then click on the final step to redo all the operations.
5). Export the cleaned data set C1.
Click on the “Export” button and choose “Comma-separated value”, which generates a CSV file.

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33

6). Export the project and save it as an archive A, a tarball.
Click on “Export” and choose “Export project”; OpenRefine generates a tar.gz archive.
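Before importing the archive elsewhere, it can help to confirm that the export is a readable gzip-compressed tarball. The following is a minimal Python sketch and not part of the original demo; the function name `list_archive_members` and the file name in the usage note are hypothetical.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the member names stored in a gzip-compressed tarball."""
    # "r:gz" opens the file for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For example, `list_archive_members("archive_A.tar.gz")` would return the list of paths stored inside the exported archive A, assuming it was saved under that (hypothetical) name.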
Part 2:
1). Create a new OpenRefine project (P2), importing the exported archive (A).
Import Project steps:
Click on “Import Project” on the left, then choose the archive file and, optionally, rename the project.
2). View the operation history (H2) and check that it looks like H1
The operation history on the left is H2, and H1 is on the right. We can see that H2 is the same as H1.
3). Undo all data cleaning steps, then redo all the operations.
4). Export the cleaned data set C2.
This is C2:

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33

5). Show that C1 and C2 are the same.

wirelessprv-10-194-219-248:demo_all barbaralee$  diff demo0_part1/demo0_part1_partMenu.csv demo0_part2/demo0_part2.csv

wirelessprv-10-194-219-248:demo_all barbaralee$

We use diff to compare C1 and C2, and it produces no output. Thus, C1 and C2 are the same.
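The same check can be scripted when diff is not available. This is a minimal Python sketch using the standard-library `filecmp` module; the function name `same_contents` is hypothetical, and the paths in the usage note are the ones from the diff command above.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True when the two files are byte-for-byte identical."""
    # shallow=False forces a comparison of the file contents instead of
    # relying only on the os.stat() signatures (type, size, mtime).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling `same_contents("demo0_part1/demo0_part1_partMenu.csv", "demo0_part2/demo0_part2.csv")` should return `True` whenever diff prints nothing for the same pair of files.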

Demo 1a

This demo shows that OpenRefine recipes suffice when all operations are generalizable.

Part 1:
1). Create new OpenRefine project (P3) importing test data set (T).
(Follow the instructions in Demo 0 part 1)
2). Perform a few data cleaning operations where all operations are generalizable.
There are 5 steps in all.
3). Export the operation history and save it as a recipe R.
Click on “Extract…”, then copy the JSON-format contents from the box on the right and paste them into a file.
This is the recipe R:

[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column id using expression value.toNumber()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "id",
    "expression": "value.toNumber()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column sponsor using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "sponsor",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column event using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "event",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column venue using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "venue",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-split",
    "description": "Split column physical_description by separator",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "physical_description",
    "guessCellType": true,
    "removeOriginalColumn": true,
    "mode": "separator",
    "separator": ";",
    "regex": false,
    "maxColumns": 0
  }
]
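Because the recipe is plain JSON, it can be inspected with a few lines of code. The sketch below is an illustration only: `summarize_recipe` is a hypothetical helper, and `recipe_text` is an abbreviated excerpt of R (two of the five operations, with several keys omitted) rather than the full recipe.

```python
import json


def summarize_recipe(recipe_text):
    """Return (op, columnName) pairs for each operation in a recipe."""
    operations = json.loads(recipe_text)
    # columnName is present in the operations shown above; .get() keeps
    # the sketch from failing on operations that lack that key.
    return [(op["op"], op.get("columnName")) for op in operations]


# Abbreviated excerpt of recipe R, for illustration only.
recipe_text = """
[
  {"op": "core/text-transform", "columnName": "sponsor",
   "expression": "value.toLowercase()"},
  {"op": "core/column-split", "columnName": "physical_description",
   "separator": ";"}
]
"""

for op_name, column in summarize_recipe(recipe_text):
    print(op_name, column)
```

Running the same function on the full recipe file would list one pair per recorded operation, which gives a quick way to count the operations in a recipe.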

4). Export the cleaned data set (C3).
This is the output csv file C3:

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33


Part 2:
1). Create a new OpenRefine project (P4) importing test data set (T). (Follow the instructions in Demo 0 part 1)
2). Execute recipe R through the OpenRefine interface.
Click on the “Apply…” button, paste the contents of R into the box, then click on “Perform Operations”.
3). Export the cleaned data set C4.
This is the output csv file C4:

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33

4). Show that C3 and C4 are the same (data cleaning was reproduced).
Here C3 is saved as demo1a_part1_partMenu.csv and C4 as demo1a_part2_partMenu.csv. We use diff to compare these two files.

wirelessprv-10-194-219-248:demo_all barbaralee$ diff demo1a_part1/demo1a_part1_partMenu.csv demo1a_part2/demo1a_part2_partMenu.csv

wirelessprv-10-194-219-248:demo_all barbaralee$

diff produces no output, which shows that there is no difference between C3 and C4.
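One way to see why this recipe reproduces is that each operation in R is a rule applied to a whole column, so it gives the same result on any copy of the data. The Python sketch below is a rough analogue of the lowercase and split steps on the first row of T; it is not how OpenRefine implements them, the helper names are hypothetical, and it ignores the `toNumber()` step, cell-type guessing, and OpenRefine's handling of empty cells.

```python
def lowercase_columns(row, columns):
    """Lowercase the string values of the named columns."""
    return {
        name: value.lower() if name in columns else value
        for name, value in row.items()
    }


def split_column(row, column, separator=";"):
    """Replace one column with numbered columns, one per separated part."""
    result = {}
    for name, value in row.items():
        if name == column:
            for index, part in enumerate(value.split(separator), start=1):
                result[f"{column} {index}"] = part
        else:
            result[name] = value
    return result


# Selected fields from the first row of the original dataset T.
row = {
    "sponsor": "HOTEL EASTMAN",
    "event": "BREAKFAST",
    "venue": "COMMERCIAL",
    "physical_description": "CARD; 4.75X7.5;",
}
cleaned = split_column(
    lowercase_columns(row, {"sponsor", "event", "venue"}),
    "physical_description",
)
print(cleaned)
```

In this sketch a row only gets as many numbered columns as it has parts, whereas the exported CSV has five `physical_description` columns for every row because the widest row determines the column count.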

Demo 1b

This demo shows that OpenRefine recipes do not suffice when some operations are not generalizable.

Part 1:
1). Create a new OpenRefine project (P5) importing test data set (T). (Follow the instructions in Demo 0 part 1)
2). Perform a few data cleaning operations where one operation is non-generalizable.
3). View the operation history (H1).
There are six steps in all; the 4th step is a non-generalizable operation, “Edit single cell on row 2, column event”.
4). Export the operation history and save it as a recipe R by copying and pasting it into a file.
When we check this JSON file, only five operations are recorded in it: the single-cell edit is missing.
This is the recipe R:

[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column id using expression value.toNumber()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "id",
    "expression": "value.toNumber()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column sponsor using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "sponsor",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column event using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "event",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column venue using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "venue",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-split",
    "description": "Split column physical_description by separator",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "physical_description",
    "guessCellType": true,
    "removeOriginalColumn": true,
    "mode": "separator",
    "separator": ";",
    "regex": false,
    "maxColumns": 0
  }
]

5). Export the cleaned data set C1.
This is the output csv file C1:

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33


Part 2:
1). Create a new OpenRefine project (P6) importing test data set (T). (Follow the instructions in Demo 0 part 1)
2). Execute recipe R through the OpenRefine interface.
Click on “Apply…”, paste the contents of R into the box, then click on “Perform Operations”.
3). View the operation history (H2) and note that H2 lacks the non-generalizable step from H1.
Step 4, “Edit single cell on row 2, column event”, which is recorded in H1, does not appear in H2.
4). Export the cleaned data set C2.
This is the output csv file C2:

Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES:  ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33

5). Show that C1 and C2 are different (data cleaning not reproduced).
Here C1 is saved as demo1b_part1.csv and C2 as demo1b_part2_partMenu.csv.

wirelessprv-10-194-219-248:demo_all barbaralee$ diff demo1b_part1/demo1b_part1.csv demo1b_part2/demo1b_part2_partMenu.csv

3c3
< 1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
---
> 1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34

As the diff shows, there is one difference between C1 and C2: it is on line 3 of the files (the row with index 1), in the event column. In C1 the value is “dinner;”, whereas in C2 it is “[dinner]”. This difference is caused precisely by the missing non-generalizable operation, Step 4 “Edit single cell on row 2, column event”.
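diff reports only which line changed, so finding the affected column means comparing the two lines field by field. The following Python sketch does this with the standard-library `csv` module; the helper name `differing_columns` is hypothetical, and the header and rows are truncated after the place column to keep the example short.

```python
import csv


def differing_columns(header_line, line_a, line_b):
    """Return (column, value_a, value_b) for every field that differs."""
    # csv.reader handles quoted fields that contain commas.
    header, row_a, row_b = csv.reader([header_line, line_a, line_b])
    return [
        (name, a, b)
        for name, a, b in zip(header, row_a, row_b)
        if a != b
    ]


# Truncated copies of the header and of line 3 from C1 and C2.
header_line = "Column,id,name,sponsor,event,venue,place"
line_c1 = '1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];"'
line_c2 = '1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];"'

print(differing_columns(header_line, line_c1, line_c2))
# prints [('event', 'dinner;', '[dinner]')]
```

The single reported tuple names the event column, matching the cell that the non-generalizable edit changed.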