Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add foreign key support for users id and posts id #8

Merged
merged 3 commits into from
Jan 23, 2019

Conversation

madtibo
Copy link
Contributor

@madtibo madtibo commented Aug 16, 2018

This option is using the "--foreign-keys" switch

WARNING: when using the foreign keys option, some entries in votes and postlinks might be updated to enforce data integrity

Copy link
Collaborator

@musically-ut musically-ut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature add!

I am completely behind adding foreign key constraints for performance reason. However, I have some reservations about changing the underlying data, esp. about dropping data which may have special meaning for people.

I would be happy to accept the PR in two parts: adding the Foreign Keys which are safe, and then we can discuss the data-altering ones separately.

What do you think?

@@ -0,0 +1,5 @@
-- impossible to enforce so set NULL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit too harsh.

I think we should err on the side of caution and let data remain in the DB and not add the foreign key constraints here. Another option is to add explicit option for --hard-foreign-keys which will add these foreign key constraints while dropping data.

What do you think?

@@ -0,0 +1,2 @@
-- dummy query
SELECT 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this here because Postgres complains if we try to run just a file with a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is the reason.

sql/Votes_fk.sql Outdated
@@ -0,0 +1,4 @@
ALTER TABLE Votes ADD CONSTRAINT fk_votes_userid FOREIGN KEY (userid) REFERENCES users (id);
-- impossible to enforce so set NULL
UPDATE Votes SET postid=NULL WHERE postid NOT IN (SELECT DISTINCT id FROM Posts);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before about erring in the side of caution and keeping extra data in the DB.

@@ -1,7 +1,7 @@
DROP TABLE IF EXISTS Votes CASCADE;
CREATE TABLE Votes (
Id int PRIMARY KEY ,
PostId int not NULL ,
PostId int , -- not NULL ,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not add the foreign key constraints if they require dropping the not NULL on these fields.

One will not be updating the tables anyway after they have been created once. So the primary reason to have foreign key constraints is for performance (to make EXISTS queries faster, at least), correct?

In that case, I'd rather leave them up to the discretion of the user while preserving as many 'sanity' checks as we can (e.g. not NULL constraints) which are provided by the underlying real-data.

@madtibo
Copy link
Contributor Author

madtibo commented Aug 16, 2018

The foreign key are useful for the planer to optimise queries. But constraints are mainly important to enforce the data business integrity. A NULL value for a foreign key means the linked entry has been deleted. I would think that data are more 'sane' when there is no link to non-existent data. But it is just a point of view :-)

I chose to nullify some values to keep the maximum of the information. But the drawback is that I had to drop the NOT NULL constraint.

An other option to keep the not null constraint is to delete the offending rows before creating the foreign keys. That would be more damaging on the data so definitely, a change to '--hard-foreign-keys' would be a good idea.

@musically-ut Tell me what you prefer, I can make the changes.

@musically-ut
Copy link
Collaborator

Okay, in this case, let's split it into two separate PRs: one with --foreign-keys (safe) and another with --hard-foreign-keys. Then in this version, we'll only add foreign-key constraints which are safe to add.

After this, I think #9 would be a great feature to add!

Thanks again!

@madtibo
Copy link
Contributor Author

madtibo commented Sep 3, 2018

OK, I'll look into it to create and use these options.

…n-keys" switch

WARNING: when using the foreign keys option, some entries in votes and postlinks might be updated to enforce data integrity
@madtibo madtibo force-pushed the foreign_key_support branch 2 times, most recently from f5948ea to 8456f4b Compare January 22, 2019 14:58
@madtibo madtibo force-pushed the foreign_key_support branch from 8456f4b to b583f1f Compare January 22, 2019 15:11
@madtibo
Copy link
Contributor Author

madtibo commented Jan 22, 2019

@musically-ut I removed the data update. The constraint are set as not valid, which means the constraints will not be verified for current data. They will be enforced on new inserts and updates.
I put some comments on how to enforce the constraints validation in the Votes_fk.sql and PostLinks_fk.sql files.

What do you think of this version?

load_into_pg.py Outdated
if len(valuesStr) > 0:
cmd = 'INSERT INTO ' + table + \
' VALUES\n' + valuesStr + ';'
cur.execute(cmd)
conn.commit()
six.print_('Table processing took {:.1f} seconds'.format(time.time() - start_time))
six.print_('Table processing took {1:.1f} seconds'.format(table, time.time() - start_time))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 'Processing table {} took ...' ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have had the idea to log it. Well spotted!
I added it.

@musically-ut musically-ut merged commit cbb39fb into Networks-Learning:master Jan 23, 2019
@musically-ut
Copy link
Collaborator

LGTM! Thanks!

@madtibo
Copy link
Contributor Author

madtibo commented Jan 23, 2019

Great! Let's go for the third PR...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants