add foreign key support for users id and posts id #8
Thanks for the feature add!
I am completely behind adding foreign key constraints for performance reasons. However, I have some reservations about changing the underlying data, especially about dropping data which may have special meaning for people.
I would be happy to accept the PR in two parts: adding the Foreign Keys which are safe, and then we can discuss the data-altering ones separately.
What do you think?
sql/PostLinks_fk.sql
@@ -0,0 +1,5 @@
-- impossible to enforce so set NULL
This seems a bit too harsh.
I think we should err on the side of caution, let the data remain in the DB, and not add the foreign key constraints here. Another option is to add an explicit --hard-foreign-keys option, which would add these foreign key constraints while dropping data.
What do you think?
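A minimal sketch of the split suggested above, in Python to match load_into_pg.py. The helper name foreign_key_sql and its parameters are hypothetical: by default it emits nothing for a column that may hold orphaned references, and only in a --hard-foreign-keys-style mode does it clean up the orphans (here by setting them to NULL, as the PR's SQL does) before adding the constraint.

```python
def foreign_key_sql(table, column, ref_table, ref_column='id', hard=False):
    """Return SQL for one foreign key on a column that may hold orphaned ids.

    Hypothetical helper. Soft mode (the default) leaves the data alone and adds
    no constraint; hard mode nulls out orphaned references first, then adds it.
    """
    if not hard:
        return []
    constraint = 'fk_{}_{}'.format(table.lower(), column.lower())
    return [
        # ref_column is assumed to be a primary key, so the subquery has no
        # NULLs and NOT IN means "not present in the referenced table".
        'UPDATE {t} SET {c} = NULL WHERE {c} NOT IN (SELECT {rc} FROM {rt});'.format(
            t=table, c=column, rc=ref_column, rt=ref_table),
        'ALTER TABLE {t} ADD CONSTRAINT {n} FOREIGN KEY ({c}) REFERENCES {rt} ({rc});'.format(
            t=table, n=constraint, c=column, rt=ref_table, rc=ref_column),
    ]


for stmt in foreign_key_sql('Votes', 'postid', 'Posts', hard=True):
    print(stmt)
```

Keeping the soft path as an empty list means the caller can run the result unconditionally; the data-altering statements only appear when the user opts in.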
@@ -0,0 +1,2 @@
-- dummy query
SELECT 1;
Is this here because Postgres complains if we try to run just a file with a comment?
Yes, that is the reason.
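An alternative to the SELECT 1 placeholder, sketched under the assumption that the loader reads each SQL file into a string before executing it: skip files that contain nothing but -- comments. The helper name has_executable_sql is hypothetical, and the check is deliberately simple; it ignores /* */ block comments and -- sequences inside string literals.

```python
def has_executable_sql(script):
    """Return True if the script has anything besides '--' comments and whitespace.

    Simplified check: block comments and '--' inside string literals are not handled.
    """
    for line in script.splitlines():
        # Drop everything after the first '--' on the line.
        code = line.split('--', 1)[0].strip()
        if code:
            return True
    return False


print(has_executable_sql('-- dummy query\n'))
print(has_executable_sql('-- dummy query\nSELECT 1;'))
```

The loader would then call cur.execute(script) only when has_executable_sql(script) is true, so comment-only files need no dummy statement.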
sql/Votes_fk.sql
@@ -0,0 +1,4 @@
ALTER TABLE Votes ADD CONSTRAINT fk_votes_userid FOREIGN KEY (userid) REFERENCES users (id);
-- impossible to enforce so set NULL
UPDATE Votes SET postid=NULL WHERE postid NOT IN (SELECT DISTINCT id FROM Posts);
Same comment as before about erring on the side of caution and keeping extra data in the DB.
sql/Votes_pre.sql
@@ -1,7 +1,7 @@
DROP TABLE IF EXISTS Votes CASCADE;
CREATE TABLE Votes (
Id int PRIMARY KEY ,
PostId int not NULL ,
PostId int , -- not NULL ,
I'd rather not add the foreign key constraints if they require dropping the NOT NULL constraint on these fields.
One will not be updating the tables anyway after they have been created once. So the primary reason to have foreign key constraints is performance (to make EXISTS queries faster, at least), correct?
In that case, I'd rather leave them to the user's discretion while preserving as many sanity checks as we can (e.g. NOT NULL constraints) that the underlying real data satisfies.
Foreign keys are useful for the planner to optimise queries, but constraints are mainly important for enforcing the business integrity of the data. A NULL value in a foreign key column means the linked entry has been deleted. I would think that data is more 'sane' when there are no links to non-existent data, but that is just a point of view :-) I chose to nullify some values to keep as much of the information as possible. The drawback is that I had to drop the NOT NULL constraint. Another option that keeps the NOT NULL constraint is to delete the offending rows before creating the foreign keys. That would be more damaging to the data, so a switch such as '--hard-foreign-keys' would definitely be a good idea. @musically-ut Tell me what you prefer and I can make the changes.
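The two clean-up strategies discussed above can be compared on a toy Votes/Posts pair. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, trims the tables down to the relevant columns, and uses hypothetical helper names. Nulling out keeps every row but needs PostId to be nullable; deleting keeps NOT NULL possible but loses whole rows.

```python
import sqlite3

# Condition matching votes whose PostId points at a post that does not exist.
ORPHANS = "PostId NOT IN (SELECT Id FROM Posts)"


def load_sample(conn):
    # Trimmed-down tables; PostId is nullable so it can be set to NULL.
    conn.executescript("""
        CREATE TABLE Posts (Id INTEGER PRIMARY KEY);
        CREATE TABLE Votes (Id INTEGER PRIMARY KEY, PostId INTEGER);
        INSERT INTO Posts (Id) VALUES (1), (2);
        INSERT INTO Votes (Id, PostId) VALUES (1, 1), (2, 2), (3, 99);
    """)


def nullify_orphans(conn):
    # Keeps every vote but drops the dangling link (PostId must be nullable).
    return conn.execute("UPDATE Votes SET PostId = NULL WHERE " + ORPHANS).rowcount


def delete_orphans(conn):
    # Keeps PostId non-null, at the cost of losing the whole vote row.
    return conn.execute("DELETE FROM Votes WHERE " + ORPHANS).rowcount


for strategy in (nullify_orphans, delete_orphans):
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    changed = strategy(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM Votes").fetchone()[0]
    print(strategy.__name__, changed, remaining)
# nullify_orphans 1 3
# delete_orphans 1 2
```

Both strategies touch the same single orphaned vote; the difference is whether the row survives with a NULL link or disappears entirely.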
Okay, in this case, let's split it into two separate PRs: one with After this, I think #9 would be a great feature to add! Thanks again!
OK, I'll look into creating and using these options.
Branch force-pushed from f5948ea to 8456f4b.
Branch force-pushed from 8456f4b to b583f1f.
@musically-ut I removed the data update. The constraints are set as What do you think of this version?
load_into_pg.py
if len(valuesStr) > 0:
cmd = 'INSERT INTO ' + table + \
' VALUES\n' + valuesStr + ';'
cur.execute(cmd)
conn.commit()
six.print_('Table processing took {:.1f} seconds'.format(time.time() - start_time))
six.print_('Table processing took {1:.1f} seconds'.format(table, time.time() - start_time))
Why not 'Processing table {} took ...'?
I might have intended to log the table name. Well spotted!
I added it.
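The message suggested in the review, as a small sketch. The function name table_timing_message is hypothetical, and it returns the string instead of calling six.print_ so the result can be checked. Both placeholders are consumed, unlike the '{1:.1f}' version, where index 1 picks the elapsed time and table is silently ignored.

```python
import time


def table_timing_message(table, start_time):
    # Use both values: the table name and the elapsed seconds since start_time.
    elapsed = time.time() - start_time
    return 'Processing table {} took {:.1f} seconds'.format(table, elapsed)


start_time = time.time()
# ... load the table here ...
print(table_timing_message('Votes', start_time))
```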
LGTM! Thanks!
Great! Let's go for the third PR...
This option uses the "--foreign-keys" switch.
WARNING: when using the foreign keys option, some entries in Votes and PostLinks might be updated to enforce data integrity.
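One way the --foreign-keys switch could be wired up, sketched with argparse. The positional table argument, the helper scripts_to_run, and the assumption that the constraints live in a separate sql/<Table>_fk.sql script that runs only after the data is loaded are illustrative, based on the file names in this PR rather than on the merged code.

```python
import argparse


def build_parser():
    # Hypothetical CLI: only the --foreign-keys flag comes from this PR.
    parser = argparse.ArgumentParser(description='Load a table into PostgreSQL (sketch).')
    parser.add_argument('table', help='table to load, e.g. Votes')
    parser.add_argument('--foreign-keys', action='store_true',
                        help='add foreign key constraints after loading (may update data)')
    return parser


def scripts_to_run(table, foreign_keys):
    # Assumed layout: <Table>_pre.sql creates the table before the rows are
    # inserted; <Table>_fk.sql adds the constraints once the data is loaded.
    scripts = ['sql/{}_pre.sql'.format(table)]
    if foreign_keys:
        scripts.append('sql/{}_fk.sql'.format(table))
    return scripts


args = build_parser().parse_args(['Votes', '--foreign-keys'])
print(scripts_to_run(args.table, args.foreign_keys))
```

argparse turns --foreign-keys into the attribute args.foreign_keys, and store_true keeps the default run free of any data-altering constraint script.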