These four demos test the reproducibility of data cleaning in OpenRefine.
We use the first five rows of Menu.csv from the New York Public Library as our demo dataset.
We use OpenRefine 3.1, and all experiments are run on macOS.
The CSV file is comma-separated.
This is the original dataset:
,id,name,sponsor,event,venue,place,physical_description,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,HOTEL EASTMAN,BREAKFAST,COMMERCIAL,"HOT SPRINGS, AR",CARD; 4.75X7.5;,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,REPUBLICAN HOUSE,[DINNER],COMMERCIAL,"MILWAUKEE, [WI];",CARD; ILLUS; COL; 7.0X9.0;,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,NORDDEUTSCHER LLOYD BREMEN,FRUHSTUCK/BREAKFAST;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,CARD; ILLU; COL; 5.5X8.0;,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,NORDDEUTSCHER LLOYD BREMEN,LUNCH;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,CARD; ILLU; COL; 5.5X8.0;,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,NORDDEUTSCHER LLOYD BREMEN,DINNER;,COMMERCIAL,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER; ILLU; COL; 5.5X7.5;,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
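The source does not say how the five-row subset was cut from Menu.csv. As a minimal sketch only, assuming Menu.csv sits in the working directory, Python's standard csv module can copy the header and the first five data rows; `head_csv` and the output file name are hypothetical. Unlike the dataset shown above, this sketch does not add the unnamed leading index column.

```python
import csv


def head_csv(src_path: str, dst_path: str, n: int = 5) -> int:
    """Copy the header and the first n data rows of src_path to dst_path.

    Returns the number of data rows written.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src:
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src)  # keeps quoted fields containing commas intact
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # header row
            for row in reader:
                if written == n:
                    break
                writer.writerow(row)
                written += 1
    return written


# Hypothetical usage; both file names are assumptions:
# head_csv("Menu.csv", "partMenu.csv", n=5)
```

Reading through csv.reader, rather than splitting each line on commas, keeps fields such as "HOT SPRINGS, AR" in a single column.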
Demo 0: This demo uses a workflow that OpenRefine supports natively, so it should always work.
Part 1:
1). Create a new OpenRefine project (P1) by importing the test data set (T).
Create Project steps:
- Click “Create Project”, use “Choose Files” to select the file on the local computer, then click “Next”.
- OpenRefine opens a “Configure Parsing Options” page with default settings, which users can adjust, including “Project name”, “Character encoding”, and “Columns are separated by”. Here we change the project name to “demo0_part1_partMenu” and click “Create Project” to set up the new project.
- This is the interface of the new OpenRefine project. Users perform data cleaning operations on the table on the right, and each operation is recorded at the same time in the “Undo/Redo” sidebar on the left.
Users can undo or redo data cleaning steps by clicking on the step they want to return to.
2). Perform a few data cleaning operations, both generalizable and non-generalizable.
Here, six steps in total are recorded in the Undo/Redo operation history sidebar.
3). View the operation history (H1).
4). Undo all data cleaning steps, then redo all the operations.
Click on the first step, “0. Create project”, to undo all data cleaning steps and return the project to its initial state.
Then click on the final step to redo all the operations.
5). Export the cleaned data set C1.
Click the “Export” button and choose “Comma-separated value” to generate a CSV file. This is C1:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
6). Export the project and save it as an archive (A), a tarball.
Click “Export” and choose “Export project”; OpenRefine generates a compressed .tar.gz archive.
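To check what the exported tarball contains before importing it in Part 2, Python's standard tarfile module can list its members without extracting anything. This is a sketch: `list_archive` is a hypothetical helper, the archive name in the usage comment is an assumption (use the file OpenRefine actually produced), and no member names are listed here because they depend on OpenRefine's export format.

```python
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the member names of a gzip-compressed tar archive."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(path, mode="r:gz") as tar:
        return tar.getnames()


# Hypothetical usage; the archive name is an assumption:
# for name in list_archive("demo0_part1_partMenu.tar.gz"):
#     print(name)
```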
Part 2:
1). Create a new OpenRefine project (P2) by importing the exported archive (A).
Import Project steps:
Click “Import Project” on the left, choose the archive file, and optionally rename the project.
2). View the operation history (H2) and check that it matches H1.
Comparing H2 with H1 side by side shows that the two histories are identical.
3). Undo all data cleaning steps, then redo all the operations.
4). Export the cleaned data set C2.
This is C2:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
5). Show that C1 and C2 are the same.
wirelessprv-10-194-219-248:demo_all barbaralee$ diff demo0_part1/demo0_part1_partMenu.csv demo0_part2/demo0_part2.csv
wirelessprv-10-194-219-248:demo_all barbaralee$
We compare C1 and C2 with diff, which prints nothing. Thus, C1 and C2 are identical.
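An empty diff output means the two files are identical; diff also signals this through its exit status (0 when the files are the same, 1 when they differ), which `echo $?` shows right after the command. The same check can be scripted with Python's standard filecmp module. `same_file` is a hypothetical helper name; the paths in the usage comment are the ones from the diff command in this demo.

```python
import filecmp


def same_file(path_a: str, path_b: str) -> bool:
    """Return True if the two files have identical contents."""
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # matching os.stat() signatures (file type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)


# Usage with the C1 and C2 files from this demo:
# same_file("demo0_part1/demo0_part1_partMenu.csv", "demo0_part2/demo0_part2.csv")
```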
Demo 1a: This demo shows that OpenRefine recipes suffice when all operations are generalizable.
Part 1:
1). Create a new OpenRefine project (P3) by importing the test data set (T).
(Follow the instructions in Demo 0, Part 1.)
2). Perform a few data cleaning operations in which all operations are generalizable.
There are five steps in total.
3). Export the operation history and save it as a recipe (R).
Click “Extract…”, then copy the JSON shown in the right-hand panel and paste it into a file.
This is the recipe R:
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column id using expression value.toNumber()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "id",
    "expression": "value.toNumber()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column sponsor using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "sponsor",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column event using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "event",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column venue using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "venue",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-split",
    "description": "Split column physical_description by separator",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "physical_description",
    "guessCellType": true,
    "removeOriginalColumn": true,
    "mode": "separator",
    "separator": ";",
    "regex": false,
    "maxColumns": 0
  }
]
4). Export the cleaned data set (C3).
This is the output CSV file C3:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
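What makes every operation in recipe R generalizable is that each one names a column and an expression, never a particular row, so it can be replayed on any table with those columns. The following is a hypothetical, much-simplified Python interpreter for the two operation types that appear in R, written only to illustrate that point; it is not how OpenRefine executes recipes. The names `to_number`, `EXPRESSIONS` and `apply_recipe` are illustrative, the two GREL expressions are approximated by Python functions, and fields such as engineConfig, guessCellType and repeat are ignored.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def to_number(value):
    """Rough stand-in for GREL value.toNumber() with onError "keep-original"."""
    for parse in (int, float):
        try:
            return parse(value)
        except (TypeError, ValueError):
            pass
    return value  # not numeric: keep the original value


# Only the two expressions used in recipe R are supported.
EXPRESSIONS: Dict[str, Callable] = {
    "value.toNumber()": to_number,
    "value.toLowercase()": lambda v: v.lower() if isinstance(v, str) else v,
}


def apply_recipe(rows: List[Row], columns: List[str], recipe: List[dict]) -> List[str]:
    """Apply recipe operations to rows in place; return the updated column order."""
    for op in recipe:
        name = op["columnName"]
        if op["op"] == "core/text-transform":
            transform = EXPRESSIONS[op["expression"]]
            for row in rows:
                row[name] = transform(row[name])
        elif op["op"] == "core/column-split" and op["mode"] == "separator":
            # Split every cell, then pad to the widest result (maxColumns 0 = no limit).
            parts = [str(row[name]).split(op["separator"]) for row in rows]
            width = max(len(p) for p in parts)
            new_cols = [f"{name} {i}" for i in range(1, width + 1)]
            for row, p in zip(rows, parts):
                row.update(zip(new_cols, p + [""] * (width - len(p))))
                if op["removeOriginalColumn"]:
                    del row[name]
            i = columns.index(name)
            if op["removeOriginalColumn"]:
                columns[i:i + 1] = new_cols
            else:
                columns[i + 1:i + 1] = new_cols
        else:
            raise NotImplementedError(op["op"])
    return columns
```

Applied to the rows of T, this sketch turns the event value "[DINNER]" into "[dinner]" and spreads "CARD; 4.75X7.5;" over physical_description 1 to 5 with the unused cells left empty, which agrees with C3 for those columns.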
Part 2:
1). Create a new OpenRefine project (P4) by importing the test data set (T). (Follow the instructions in Demo 0, Part 1.)
2). Execute recipe R through the OpenRefine (OR) interface.
Click the “Apply…” button, paste the contents of R into the text box, and click “Perform Operations”.
3). Export the cleaned data set C4.
This is the output CSV file C4:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
4). Show that C3 and C4 are the same (data cleaning was reproduced).
Here C3 is saved as demo1a_part1_partMenu.csv and C4 as demo1a_part2_partMenu.csv. We use diff to compare the two files.
wirelessprv-10-194-219-248:demo_all barbaralee$ diff demo1a_part1/demo1a_part1_partMenu.csv demo1a_part2/demo1a_part2_partMenu.csv
wirelessprv-10-194-219-248:demo_all barbaralee$
diff prints nothing, so there is no difference between C3 and C4.
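C4 matches C3 because every operation in R addresses its column by name, so R can be replayed on any project that has those columns. A hypothetical pre-flight check along those lines, sketched in Python (it is not an OpenRefine feature), reads R and the header of a CSV file and reports the columns R refers to that the file lacks. It only inspects each operation's columnName field, so a column created by an earlier operation in the recipe (for example a split output) would be reported as missing.

```python
import csv
import json
from typing import List


def missing_columns(recipe_path: str, csv_path: str) -> List[str]:
    """Columns named in a recipe's "columnName" fields but absent from the CSV header."""
    with open(recipe_path, encoding="utf-8") as f:
        recipe = json.load(f)  # a JSON array of operation objects
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    referenced = [op["columnName"] for op in recipe if "columnName" in op]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(referenced) if name not in header]


# Hypothetical usage; the recipe file name is an assumption:
# missing_columns("recipe_R.json", "partMenu.csv")  # [] means every named column exists
```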
Demo 1b: This demo shows that OpenRefine recipes do not suffice when some operations are not generalizable.
Part 1:
1). Create a new OpenRefine project (P5) by importing the test data set (T). (Follow the instructions in Demo 0, Part 1.)
2). Perform a few data cleaning operations in which one operation is non-generalizable.
3). View the operation history (H1).
There are six steps in total; the fourth step, “Edit single cell on row 2, column event”, is non-generalizable.
4). Export the operation history and save it as a recipe (R) by copying and pasting it into a file.
When we inspect this JSON file, only five operations are recorded in it: the single-cell edit from step 4 is missing.
This is the recipe R:
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column id using expression value.toNumber()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "id",
    "expression": "value.toNumber()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column sponsor using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "sponsor",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column event using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "event",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column venue using expression value.toLowercase()",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "venue",
    "expression": "value.toLowercase()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  },
  {
    "op": "core/column-split",
    "description": "Split column physical_description by separator",
    "engineConfig": {
      "facets": [],
      "mode": "row-based"
    },
    "columnName": "physical_description",
    "guessCellType": true,
    "removeOriginalColumn": true,
    "mode": "separator",
    "separator": ";",
    "regex": false,
    "maxColumns": 0
  }
]
5). Export the cleaned data set C1.
This is the output CSV file C1:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
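This demo suggests checking H1 for non-generalizable steps before relying on an extracted recipe. The sketch below is a heuristic based only on what is observed here: it assumes the single-cell edit is the only step type left out of the extracted recipe, and every description except step 4 is a hypothetical placeholder, since only step 4 is quoted verbatim above. `NON_GENERALIZABLE_PREFIXES` and `steps_missing_from_recipe` are illustrative names.

```python
from typing import List, Tuple

# Assumption: only this step type was observed to be left out of the recipe.
NON_GENERALIZABLE_PREFIXES = ("Edit single cell",)


def steps_missing_from_recipe(history: List[str]) -> List[Tuple[int, str]]:
    """Return (step number, description) for steps expected to be absent from a recipe."""
    return [
        (number, step)
        for number, step in enumerate(history, start=1)
        if step.startswith(NON_GENERALIZABLE_PREFIXES)
    ]


# H1 from this demo; every entry except step 4 is a placeholder.
h1 = [
    "step 1 (placeholder)",
    "step 2 (placeholder)",
    "step 3 (placeholder)",
    "Edit single cell on row 2, column event",
    "step 5 (placeholder)",
    "step 6 (placeholder)",
]
flagged = steps_missing_from_recipe(h1)
print(flagged)                 # [(4, 'Edit single cell on row 2, column event')]
print(len(h1) - len(flagged))  # 5, the number of operations found in R
```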
Part 2:
1). Create a new OpenRefine project (P6) by importing the test data set (T). (Follow the instructions in Demo 0, Part 1.)
2). Execute recipe R through the OpenRefine (OR) interface.
Click the “Apply…” button, paste the contents of R into the text box, and click “Perform Operations”.
3). View the operation history (H2) and note that H2 lacks the non-generalizable steps from H1.
Step 4 of H1, “Edit single cell on row 2, column event”, does not appear in H2.
4). Export the cleaned data set C2.
This is the output CSV file C2:
Column,id,name,sponsor,event,venue,place,physical_description 1,physical_description 2,physical_description 3,physical_description 4,physical_description 5,occasion,notes,call_number,keywords,language,date,location,location_type,currency,currency_symbol,status,page_count,dish_count
0,12463,,hotel eastman,breakfast,commercial,"HOT SPRINGS, AR",CARD, 4.75X7.5,,,,EASTER;,,1900-2822,,,1900-04-15,Hotel Eastman,,,,complete,2,67
1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
2,12465,,norddeutscher lloyd bremen,fruhstuck/breakfast;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, STEAMSHIP AND SAILING VESSEL;",1900-2827,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,84
3,12466,,norddeutscher lloyd bremen,lunch;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,CARD, ILLU, COL, 5.5X8.0,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH SAILING VESSEL;",1900-2828,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,2,63
4,12467,,norddeutscher lloyd bremen,dinner;,commercial,DAMPFER KAISER WILHELM DER GROSSE;,FOLDER, ILLU, COL, 5.5X7.5,,,"MENU IN GERMAN AND ENGLISH; ILLUS, HARBOR SCENE WITH ROCKS AND LIGHTHOUSE; STEAMSHIP AND SAILING VESSELS; CONCERT PROGRAM; DATES: ON GERMAN SIDE OF MENU ""MONTAG, DEN 16 APRIL 1900""; ON ENGLISH SIDE OF MENU ""MONDAY, APRIL 15TH, 1900"";",1900-2829,,,1900-04-16,Norddeutscher Lloyd Bremen,,,,complete,4,33
5). Show that C1 and C2 are different (data cleaning not reproduced).
Here C1 is saved as demo1b_part1.csv and C2 as demo1b_part2_partMenu.csv.
wirelessprv-10-194-219-248:demo_all barbaralee$ diff demo1b_part1/demo1b_part1.csv demo1b_part2/demo1b_part2_partMenu.csv
3c3
< 1,12464,,republican house,dinner;,commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
> 1,12464,,republican house,[dinner],commercial,"MILWAUKEE, [WI];",CARD, ILLUS, COL, 7.0X9.0,,EASTER;,"WEDGEWOOD BLUE CARD; WHITE EMBOSSED GREEK KEY BORDER; ""EASTER SUNDAY"" EMBOSSED IN WHITE; VIOLET COLORED SPRAY OF FLOWERS IN UPPER LEFT CORNER;",1900-2825,,,1900-04-15,Republican House,,,,complete,2,34
As the output shows, C1 and C2 differ in exactly one cell: line 3 of the files (row 2 in OpenRefine), in the event column. In C1 the value is “dinner;”, whereas in C2 it is “[dinner]”. This difference is caused precisely by the missing non-generalizable operation, Step 4 “Edit single cell on row 2, column event”.
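diff reports differences line by line, so locating the changed cell still means comparing the two long lines by eye. A small Python sketch using the standard csv module can report differences cell by cell instead. `cell_diffs` is a hypothetical helper; it assumes both files share the same header and row count and that no field spans several lines (true for these files), so that its line numbers match diff's.

```python
import csv
from typing import List, Tuple


def cell_diffs(path_a: str, path_b: str) -> List[Tuple[int, str, str, str]]:
    """Return (line number, column name, value in A, value in B) for each differing cell.

    Line numbers count the header as line 1, as diff does.
    """
    with open(path_a, newline="", encoding="utf-8") as fa:
        rows_a = list(csv.reader(fa))
    with open(path_b, newline="", encoding="utf-8") as fb:
        rows_b = list(csv.reader(fb))
    header = rows_a[0]
    diffs = []
    for line_no, (row_a, row_b) in enumerate(zip(rows_a, rows_b), start=1):
        for column, value_a, value_b in zip(header, row_a, row_b):
            if value_a != value_b:
                diffs.append((line_no, column, value_a, value_b))
    return diffs


# Usage with the C1 and C2 files from this demo:
# cell_diffs("demo1b_part1/demo1b_part1.csv", "demo1b_part2/demo1b_part2_partMenu.csv")
```

On the two files compared above, this should report a single entry: line 3, column event, "dinner;" versus "[dinner]".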