Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading a sparse set from a 2D table #7

Open
4er4er4er opened this issue May 21, 2020 · 11 comments
Open

Reading a sparse set from a 2D table #7

4er4er4er opened this issue May 21, 2020 · 11 comments

Comments

@4er4er4er
Copy link

Use AMPL .dat format files, it is possible to read a subset of pairs and an associated parameter from the same table. For example, with

set NUTR;
set FOOD;
set LINKS within {NUTR, FOOD};
param amt {LINKS} >= 0;

and data

set NUTR := A B1 B2 C ;
set FOOD := BEEF CHK FISH HAM MCH MTL SPG TUR ;

param :LINKS: amt (tr):
           A    C   B1   B2 :=
   BEEF   60   20    .   15
   CHK     .    .   20   20
   FISH    .    .   15    .
   HAM    40   40   35    .
   MCH    15   35   15   15
   MTL    70   30   15   15
   SPG    25   50   25   15
   TUR    60   20   15    . ;

neither the values marked . or the corresponding members of LINKS are included in the data that is read:

ampl: display LINKS, amt;
set LINKS :=
(A,BEEF)    (B2,CHK)    (B1,HAM)    (B2,MCH)    (B2,MTL)    (B2,SPG)
(C,BEEF)    (B1,FISH)   (A,MCH)     (A,MTL)     (A,SPG)     (A,TUR)
(B2,BEEF)   (A,HAM)     (C,MCH)     (C,MTL)     (C,SPG)     (C,TUR)
(B1,CHK)    (C,HAM)     (B1,MCH)    (B1,MTL)    (B1,SPG)    (B1,TUR);

amt [*,*] (tr)
:      A    B1   B2   C     :=
BEEF   60   .    15   20
CHK    .    20   20   .
FISH   .    15   .    .
HAM    40   35   .    40
MCH    15   15   15   35
MTL    70   15   15   30
SPG    25   25   15   50
TUR    60   15   .    20
;

Reading from spreadsheet data should work the same, with the role of . played by a blank cell. But currently the entire set {NUTR,FOOD} is read rather than just the pairs corresponding to cells that are non-blank. In this example, reading

image

with table definition

table dietAmts IN "amplxl" "dietIssue.xlsx" "Amounts" "2D":
   LINKS <- [NUTR, FOOD], amt;

gives the same values but the wrong LINKS set:

display LINKS, amt;
set LINKS :=
(A,BEEF)    (B2,CHK)    (A,HAM)     (B2,MCH)    (A,SPG)     (B2,TUR)
(B1,BEEF)   (C,CHK)     (B1,HAM)    (C,MCH)     (B1,SPG)    (C,TUR)
(B2,BEEF)   (A,FISH)    (B2,HAM)    (A,MTL)     (B2,SPG)
(C,BEEF)    (B1,FISH)   (C,HAM)     (B1,MTL)    (C,SPG)
(A,CHK)     (B2,FISH)   (A,MCH)     (B2,MTL)    (A,TUR)
(B1,CHK)    (C,FISH)    (B1,MCH)    (C,MTL)     (B1,TUR);

amt [*,*] (tr)
:      A    B1   B2   C     :=
BEEF   60   .    15   20
CHK    .    20   20   .
FISH   .    15   .    .
HAM    40   35   .    40
MCH    15   15   15   35
MTL    70   15   15   30
SPG    25   25   15   50
TUR    60   15   .    20
;
@nfbvs
Copy link
Contributor

nfbvs commented May 22, 2020

There seems to be a difference between the .dat reader and the table handler reader. I was able to reproduce the issue with a relational table in amplxl, odbc and the .tab handler.
To reproduce the issue with the .tab handler copy the following data to a file named amounts.tab

ampl.tab 2 1
NUTR	FOOD	amt
A	BEEF	60
A	CHK	.
A	FISH	.
A	HAM	40
A	MCH	15
A	MTL	70
A	SPG	25
A	TUR	60
B1	BEEF	.
B1	CHK	20
B1	FISH	15
B1	HAM	35
B1	MCH	15
B1	MTL	15
B1	SPG	25
B1	TUR	15
B2	BEEF	15
B2	CHK	20
B2	FISH	.
B2	HAM	.
B2	MCH	15
B2	MTL	15
B2	SPG	15
B2	TUR	.
C	BEEF	20
C	CHK	.
C	FISH	.
C	HAM	40
C	MCH	35
C	MTL	30
C	SPG	50
C	TUR	20

and the following to a .run script

reset;
set NUTR := {"A", "B1", "B2", "C"};
set FOOD := {"BEEF", "CHK", "FISH", "HAM", "MCH", "MTL", "SPG", "TUR"};
set LINKS within {NUTR, FOOD};
param amt {LINKS} >= 0;

table amounts IN:
	LINKS <- [NUTR, FOOD], amt;

read table amounts;
display LINKS, amt;

the display in AMPL shows

set LINKS :=
(A,BEEF)    (A,SPG)     (B1,MCH)    (B2,FISH)   (C,BEEF)    (C,SPG)
(A,CHK)     (A,TUR)     (B1,MTL)    (B2,HAM)    (C,CHK)     (C,TUR)
(A,FISH)    (B1,BEEF)   (B1,SPG)    (B2,MCH)    (C,FISH)
(A,HAM)     (B1,CHK)    (B1,TUR)    (B2,MTL)    (C,HAM)
(A,MCH)     (B1,FISH)   (B2,BEEF)   (B2,SPG)    (C,MCH)
(A,MTL)     (B1,HAM)    (B2,CHK)    (B2,TUR)    (C,MTL);

amt [*,*] (tr)
:      A    B1   B2   C     :=
BEEF   60   .    15   20
CHK    .    20   20   .
FISH   .    15   .    .
HAM    40   35   .    40
MCH    15   15   15   35
MTL    70   15   15   30
SPG    25   25   15   50
TUR    60   15   .    20
;

@mingodadampl
Copy link

Based on Nicolau's comment I looked at the escrow/acl/tables.c and it seems that in the Add_Rows function we are inserting the the keys before check the rest of values.
The behavior Robert expect is not present in any table handler, we can rework the code to achieve that behavior and it'll break (change behavior) for existing ampl scripts.

@mingodadampl
Copy link

After a while I'm revisiting my comment because it was not correct, on Nicolau's example it's not equivalent with Robert example, to be equivalent it should be like this:

ampl.tab 2 1
NUTR	FOOD	amt
A	BEEF	60
A	HAM	40
A	MCH	15
A	MTL	70
A	SPG	25
A	TUR	60
B1	CHK	20
B1	FISH	15
B1	HAM	35
B1	MCH	15
B1	MTL	15
B1	SPG	25
B1	TUR	15
B2	BEEF	15
B2	CHK	20
B2	MCH	15
B2	MTL	15
B2	SPG	15
C	BEEF	20
C	HAM	40
C	MCH	35
C	MTL	30
C	SPG	50
C	TUR	20

And then running it will get this output (that somehow match Robert example):

set LINKS :=
(A,BEEF)    (A,SPG)     (B1,HAM)    (B1,TUR)    (B2,MTL)    (C,MCH)
(A,HAM)     (A,TUR)     (B1,MCH)    (B2,BEEF)   (B2,SPG)    (C,MTL)
(A,MCH)     (B1,CHK)    (B1,MTL)    (B2,CHK)    (C,BEEF)    (C,SPG)
(A,MTL)     (B1,FISH)   (B1,SPG)    (B2,MCH)    (C,HAM)     (C,TUR);

amt [*,*] (tr)
:      A    B1   B2   C     :=
BEEF   60   .    15   20
CHK    .    20   20   .
FISH   .    15   .    .
HAM    40   35   .    40
MCH    15   15   15   35
MTL    70   15   15   30
SPG    25   25   15   50
TUR    60   15   .    20
;

It seems that if there isn't data in a cell we should not include it in a call to AddRow.

@mingodadampl
Copy link

With this modification to initial Robert example we can see the equivalent "LINKS' table:

set NUTR;
set FOOD;
set LINKS within {NUTR, FOOD};
param amt {LINKS} >= 0;

data test-empty.dat;

table amounts_dat OUT:
	[NUTR, FOOD], amt;
write table amounts_dat;

table links_dat OUT:
	{(n,f) in LINKS} -> [n ~NUTR, f ~FOOD];
write table links_dat;

display LINKS, amt;
ampl.tab 2 0
NUTR	FOOD
A	BEEF
C	BEEF
B2	BEEF
B1	CHK
B2	CHK
B1	FISH
A	HAM
C	HAM
B1	HAM
A	MCH
C	MCH
B1	MCH
B2	MCH
A	MTL
C	MTL
B1	MTL
B2	MTL
A	SPG
C	SPG
B1	SPG
B2	SPG
A	TUR
C	TUR
B1	TUR

@mingodadampl
Copy link

I'm still in doubt about my second comment, it seems that there is no way from the table handler to know if it's updating a set like ampl can when reading dat files.
I tried modify escrow/acl/tables.c::Add_Rows but could not get it to work as expected so far.
I'm again inclined to say the problem is inside ampl not the table handler.

@mingodadampl
Copy link

Finally I could find a possible fix for this issue with this commit https://github.com/ampl/escrow/commit/f85bf5c67cedf8d39cbbce5fe10e33e16f354f0c, I was correct on my first comment but I didn't understood the code at that time to be able to move the insertion on the indexed set after checking for '.' values.

@nfbvs
Copy link
Contributor

nfbvs commented May 24, 2020

Can you test the fix with the following files?
dat_data.dat

param: dat_keys: dat_vals:=
A 1
B 2
C .
D 4
;

tab_data.tab

ampl.tab 1 1
K	V
A	1
B	2
C	.
D	4

example.run

reset;

# data loaded from .dat
set dat_keys;
param dat_vals{dat_keys};

data dat_data.dat;

# data loaded from .tab
set tab_keys;
param tab_vals{tab_keys};

table tab_data IN:
	tab_keys <- [K], tab_vals ~ V;

read table tab_data;

# compare
display dat_keys, dat_vals;
display tab_keys, tab_vals;

expected output:

set dat_keys := A B C D;

dat_vals [*] :=
A  1
B  2
D  4
;

set tab_keys := A B C D;

tab_vals [*] :=
A  1
B  2
D  4
;

@mingodadampl
Copy link

Good catch !
Thanks for the sample script I extended the initial fix here https://github.com/ampl/escrow/commit/a8aed66684fcfd74abdc63c8358a6756f602e5d7 , I could not identify where the "escrow/acl/read.c" manage this case and came with this not elegant/clean fix.

It now produces the expected output for both of Nicolau's examples, I could not reproduce Robert example because I could not create the xlsx file that could be accepted by the table handler definition.

A peer review is welcome !

@nfbvs
Copy link
Contributor

nfbvs commented May 27, 2020

The behavior seems to be different for different dimensions, for example with the followinf script

reset;

set A := {"a", "b", "c", "d"};
set B within A;
param C{B};

data data.dat;

display A, B, C;

reading the data

param: B: C:=
a 1
b 2
c .
d 4
;

AMPL does not skip the missing values and displays

set A := a b c d;

set B := a b c d;

C [*] :=
a  1
b  2
d  4
;

@mingodadampl
Copy link

There is also a confuse or not clear way to explain this behavior where we have a set declared with dimen and check (what really change in the data format is having a 2D table or not, independent of the within clause):

set NUTR;
set FOOD;
set LINKS dimen 2;
check LINKS within {NUTR, FOOD};
param dat_vals {LINKS};

data dat_data2.dat;
display dat_vals, LINKS;
param :LINKS: dat_vals (tr):
           A    C   B1   B2 :=
   BEEF   60   20    .   15
   CHK     .    .   20   20
   FISH    .    .   15    .
   HAM    40   40   35    .
   MCH    15   35   15   15
   MTL    70   30   15   15
   SPG    25   50   25   15
   TUR    60   20   15    . ;

@4er4er4er
Copy link
Author

It seems that the rules for reading set and parameter values together from a .dat table are as follows:

  • For 1D tables, a member is read from every row of the table, regardless of . entries.
  • For 2D tables, a member is generated for every entry in the table, skipping . entries.

I believe that these rules were chosen to correspond to the cases that are most likely to be useful. It's possible to think up situations where the opposites of the above rules would be desirable, but they are less likely cases and can be handled in other ways.

Given that these are indeed the rules for .dat tables, I propose to use analogous rules for amplxl tables in spreadsheets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants