-
-
Notifications
You must be signed in to change notification settings - Fork 128
Custom Domains for Territories #1958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…customDomain resolver
…itory domains editing
… middleware slight readability
…alues; show DNS configs on territory edit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only looked at the middleware and the stuff it calls so far.
Makes sense to me what's going on (not much fortunately haha) but left some comments!
middleware.js
Outdated
} | ||
|
||
// If we're on a custom domain, handle that next | ||
const host = request.headers.get('host') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also check for X-Forwarded-Host
here or is that not needed because it's not a reverse proxy that's going to call us but a browser because DNS will resolve to our IP address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think that if we give instructions to point at our load balancer or CDN, X-Forwarded-Host
may be needed to know the starting domain (which is what we need). For best practice, I think we should check X-Forwarded-Host
and if that's not present (i.e. there's not a reverse proxy, browsers don't create this header) then we'll fallback to the Host
header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I think we can go ahead with X-Forwarded-Host
to check for reverse proxies and then fallback to Host
if that header doesn't exist. I'll leave this comment unresolved to remind myself to do further checks on reverse proxies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting review now since we'll meet in ~15 minutes
We can go over my comments in the meeting, you don't need to read them before the meeting
] | ||
} | ||
|
||
const certificate = await acm.requestCertificate(params).promise() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For how long are we requesting/issuing a certificate?
Certificates should expire when their territory expires or have to be renewed every year.
I tried to find how this is determined in the AWS docs but I couldn't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super question, I didn't think of that, the way I described it to you during the meeting got me confused so I'll explain better here:
ACM issues certificates with a fixed validity of 13 months or 395 days, to have some kind of control here we could send a DeleteCertificate request to ACM after the territory period of grace ends without successful renewal. I'll annotate this to implement it.
But! If we don't ever delete a certificate (good territory owner), after 13 months, ACM will automatically renew it so we don't have to do anything.
{verification?.ssl?.state === 'PENDING' && ( | ||
<> | ||
<h5>Step 2: Prepare your domain for SSL</h5> | ||
<p>We issued an SSL certificate for your domain. To validate it, add the following CNAME record:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state says PENDING
but here you mention we already issued it. That's confusing. Do you mean with PENDING
that we're waiting for the user or for the issuer?
I think if it's pending for the issuer, we shouldn't tell them to do anything if they can only wait.
If we're waiting for the user to do something, this might need another PENDING
state.
I also don't know what "validating a SSL certificate" means. Do they have to validate it in some way if they don't trust the one we issued?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mapped ACM's statuses to something we can digest better. But it's clear that we might want to differentiate DNS and SSL states.
In the case of SSL validation, PENDING
does mean PENDING_VALIDATION
in ACM's language, and will stay in this status until the user has put the correct DNS validation records issued by ACM.
ACM statuses: PENDING_VALIDATION | ISSUED | INACTIVE | EXPIRED | VALIDATION_TIMED_OUT | REVOKED | FAILED
ACM DNS validation requires DNS records to be set to validate the domain before enabling a SSL certificate. if the domain is, e.g., forum.pizza.com
we will prompt the user to add:
CNAME record: _a79865eb4cd1a6ab990a45779b4e0b96.forum.pizza.com.
Value: _424c7224e9b0146f9a8808af955727d0.acm-validations.aws.
^info gathered from ACM docs
I didn't handle that much of ACM edge cases because we don't have the ability to test with Localstack its behavior, so I left it for last
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another partial review. Mostly UX stuff, but I am confused regarding CNAME+TXT.
According to the DNS spec, this won't work, see comment.
Domain mappings: - use direct db calls with cachedFetcher to map domains Middleware: - pass main domain, custom domain, subName to the customDomainMiddleware; - retrieve main domain's hostname from env vars; - only get subName from domain mappings cache (domain is the key); - only check cache if we're not on the main domain; correct function declarations; - light cleanup Schema: - use Citext for domain names as they're case-insensitive
Copy - Uses CopyButton from Form to copy DNS record values Domain Verification UI/UX - Can trigger a re-verification by re-submitting the domain - slight cleanup Validation hints = Better validation hint
components/territory-domains.js
Outdated
const dnsRecord = (host, value) => ( | ||
<div className='d-flex align-items-center gap-2'> | ||
<span className={`${styles.record}`}> | ||
<small className='fw-bold text-muted d-flex align-items-center gap-1 position-relative'> | ||
host | ||
<CopyButton | ||
value={host} | ||
append={ | ||
<ClipboardLine | ||
className={`${styles.clipboard}`} | ||
style={{ width: '1rem', height: '1rem' }} | ||
/> | ||
} | ||
/> | ||
</small> | ||
<pre>{host}</pre> | ||
</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the columns!
The clipboard on hover also looks cool but unfortunately, this won't be obvious on mobile that this is how you can copy since there's no cursor. In general, any feature that requires the hover
state is limited to desktop. So hover
should only be used for stuff that is just nice-to-have. A copy button is pretty important for UX though.
The same reasoning applies to DNS pending
etc. I think the timestamp it shows is when the last check was made (this also needs to be clarified) which is pretty useful, but it's limited to desktop.
Why not use the existing CopyInput
that we also use for the QR code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! I didn't think of mobile in this instance. I didn't want to bloat every field with a clipboard button, but by not doing so mobile can't actually use the feature. I'll try to find a way to make it permanent but less obtrusive!
You're also right on the last check popover, I'll find an appropriate place for it.
I didn't use CopyInput because of its big design that doesn't match the one for the various fields, but let's see if I can still use it by modifying it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
</span> | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To include a copy button on DomainGuidelines I exported what we already have, adding an append prop that perhaps needs to be properly named in cleanup phase
edit: realized that GitHub crapped my selection, this references the CopyButton
component in form.js
…s TXT record to be checked; light cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got all DNS and SSL verification to work!
But I couldn't test the custom domain because I think my DNS setup is weird:
$ dig snx.ekzy.is
; <<>> DiG 9.18.33 <<>> snx.ekzy.is
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35942
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;snx.ekzy.is. IN A
;; ANSWER SECTION:
snx.ekzy.is. 3600 IN CNAME sn.ekzy.is.
sn.ekzy.is. 3600 IN CNAME ekzy.is.
ekzy.is. 3600 IN A 172.105.95.124
;; Query time: 482 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Fri Apr 25 02:46:17 CEST 2025
;; MSG SIZE rcvd: 87
https://sn.ekzy.is/ has the following nginx config on my server running at the given IP address:
nginx config
upstream sn {
server 127.0.0.1:3333;
}
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
server_name sn.ekzy.is;
listen 80;
listen [::]:80;
include letsencrypt.conf;
return 301 https://sn.ekzy.is$request_uri;
}
server {
server_name sn.ekzy.is;
listen 443;
listen [::]:443 ssl;
ssl_certificate /etc/letsencrypt/live/sn.ekzy.is/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/sn.ekzy.is/privkey.pem;
client_header_timeout 30;
client_body_timeout 30;
keepalive_timeout 30;
location / {
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_read_timeout 30s;
proxy_send_timeout 30s;
proxy_pass http://sn$request_uri;
}
location /_next/webpack-hmr {
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Prefix /;
proxy_pass http://sn$request_uri;
}
include letsencrypt.conf;
}
I then run ssh -fnNR 3333:localhost:3000 sn.ekzy.is
to tunnel to SN running on my laptop.
But since I don't have a nginx config for snx.ekzy.is
, it then shows me my default nginx site which is ekzy.is
.
But I think it would work if sn.ekzy.is
wouldn't have a CNAME record itself so I think we don't need to worry about this case.
Just need to figure out how I can test this further.
Requesting changes mostly because of the schema stuff.
{sub && !customDomain && | ||
<div className='w-100'> | ||
<AccordianItem | ||
header={<div style={{ fontWeight: 'bold', fontSize: '92%' }}>advanced</div>} | ||
body={<TerritoryDomains sub={sub} />} | ||
/> | ||
</div>} | ||
{sub && customDomain && <Link className='text-muted w-100' href={`${process.env.NEXT_PUBLIC_URL}/~${sub.name}/edit`}>domain settings on stacker.news</Link>} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that it's weird to see a dropdown after the save button on territory edit, but I put it there because while the options
dropdown will refer to that save button, the advanced
dropdown won't.
It's a UI/UX dilemma, I don't necessarily think that people will confuse save buttons but at the same time there will be lots of save buttons. I think I have to see it before deciding, I'll try in a bit ^^
if (domain) { | ||
const existing = await models.customDomain.findUnique({ where: { subName } }) | ||
if (existing && existing.domain === domain && existing.status !== 'HOLD') { | ||
throw new GqlInputError('domain already set') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be an error for some reason? If I try to save the domain with the same value, I would expect it to just work™ but this throws.
Next to not sure if this has to be an error, the client only shows the generic error message "failed to update domain."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a way to prevent users to trigger useless verifications that could lead to an offline state, changing TXT values etc.
It's possible to allow domain re-saving but this singlehandedly prevents mistakes!
About the error: I think I forgot to update the toasts after the latest commits, fixing it asap!
components/territory-domains.js
Outdated
const dnsRecord = (host, value) => ( | ||
<div className='d-flex align-items-center gap-2'> | ||
<span className={`${styles.record}`}> | ||
<small className='fw-bold text-muted d-flex align-items-center gap-1 position-relative'> | ||
host | ||
<CopyButton | ||
value={host} | ||
append={ | ||
<ClipboardLine | ||
className={`${styles.clipboard}`} | ||
style={{ width: '1rem', height: '1rem' }} | ||
/> | ||
} | ||
/> | ||
</small> | ||
<pre>{host}</pre> | ||
</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the columns!
The clipboard on hover also looks cool but unfortunately, this won't be obvious on mobile that this is how you can copy since there's no cursor. In general, any feature that requires the hover
state is limited to desktop. So hover
should only be used for stuff that is just nice-to-have. A copy button is pretty important for UX though.
The same reasoning applies to DNS pending
etc. I think the timestamp it shows is when the last check was made (this also needs to be clarified) which is pretty useful, but it's limited to desktop.
Why not use the existing CopyInput
that we also use for the QR code?
api/resolvers/domain.js
Outdated
// schedule domain verification in 5 seconds, apply exponential backoff and keep it for 2 days | ||
// 12 retries, 42 seconds of delay between retries will fit 48 hours of trying for DNS propagation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this.
There are 12 retries and 42 seconds of delay between retries. How does this fit 48 hours? Does this depend on the exponential backoff?
I think it should be easier to know when this job will run so it's also predictable for the user.
Is it an expensive job? Can we not run it every 5 minutes with no backoff or something for 2 days?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrycount, retrydelay, retrybackoff are all part of the pgboss' exponential backoff system.
By retrying 12 times, with a delay of 42 seconds each and with exponential backoff enabled, it basically does:
42 * 2^12
which equates to 172.032
seconds, roughly 48 hours.
The main reason why I did this is because we use exponential backoff on other jobs too and wanted to give some kind of continuity with the rest of the code.
But another reason was the fact that I was afraid of running an expensive job by resolving records for domains that could also not be properly set at all.
I mean, I don't think it's actually an expensive job, before I was checking EVERY domain at the same time every 5 minutes, which was undoubtedly worse.
I think that a mix of both can be good: still pgboss job scheduling at domain upsert but check it individually every 5 minutes for a maximum of 48 hours, if it fails we will mark it as HOLD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to a domain verification job triggered at domain update, that renews every time the domain is still in PENDING
state. If during the verification process we're still in PENDING
and 48 hours has passed since the domain has been created, it puts the domain in HOLD
and stops further verification processes.
status String @default("PENDING") | ||
failedAttempts Int @default(0) | ||
lastVerifiedAt DateTime? | ||
verification Json @default("{}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get rid of status
, failedAttempts
, lastVerifiedAt
by having another table that stores each verification attempt that links to a domain via foreign keys and then resolving these columns via JOINS.
This way, we also know what happened during each verification attempt and not just the latest one.
I hope it won't require too many changes to do this but I think it's pretty important to keep our schema as normalized as possible. @huumn is currently working on fixing existing issues like this in our current schema in #2084.
Before you do this though, we should consult @huumn to confirm the schema I described is the correct one so you don't have to change it again after his review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I kinda forgot about this schema after a long time and It shouldn't require lots of changes at all to normalize it, I'll wait before implementing it then ^^
@@ -177,6 +177,7 @@ AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE | |||
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY | |||
PERSISTENCE=1 | |||
SKIP_SSL_CERT_DOWNLOAD=1 | |||
LOCALSTACK_ENDPOINT=http://localhost:4566 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be this:
LOCALSTACK_ENDPOINT=http://localhost:4566 | |
LOCALSTACK_ENDPOINT=http://aws:4566 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought I pushed it during a cleanup commit, so sorry, it remained in local.
I'll push it with the next commit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I deleted my comment because I wanted to make it part of my review lol
Oh, I thought I pushed it during a cleanup commit, so sorry, it remained in local.
Cool, no worries!
error: uses the error thrown by the domains resolver hover: clipboards and verification time are shown by default instead of being hidden by hover cleanup: clearer dnsRecord subcomponent
- job will be scheduled after 30 seconds and will be maintained for 48 hours - step-by-step verification process (to be modularized) - don't throw an error if it's not critical - if verification process results in a still PENDING domain we schedule it again after 5 minutes - if the status is still NOT ACTIVE (e.g. PENDING) we put the status on HOLD to avoid re-scheduling
- deletes certificates, if any, when a domain is in HOLD - better initial domain structure - also deletes related jobs on domain deletion - fix ACM validation values gathering - put domain in HOLD after 3 consecutive critical errors - independent verification steps - fix typo on domain update via domain verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed 4bfda81 because I didnt' realize that git fetch
didn't fetch the new commits like 5130057. I assume that's the case because they have dates in the future:
Please make sure that your commits are not created in the future so git
isn't confused.
I looked at all the code for authentication now.
If I understood the authentication flow for custom domains right, this is how it works:
- Go to /login on custom domain
- Middleware for custom domain redirects us to stacker.news/login and sets
domain
query parameter
Lines 38 to 53 in 4bfda81
// Auth sync redirects with domain and optional callbackUrl and multiAuth params | |
if (pathname === '/login' || pathname === '/signup') { | |
const redirectUrl = new URL(pathname, mainDomain) | |
redirectUrl.searchParams.set('domain', domain) | |
if (query.get('callbackUrl')) { | |
redirectUrl.searchParams.set('callbackUrl', query.get('callbackUrl')) | |
} | |
if (query.get('multiAuth')) { | |
redirectUrl.searchParams.set('multiAuth', query.get('multiAuth')) | |
} | |
const redirectResp = NextResponse.redirect(redirectUrl, { | |
headers: requestHeaders | |
}) | |
// apply referrer cookies to the redirect | |
return applyReferrerCookies(redirectResp, referrerResp) | |
} |
- User authenticates themselves on stacker.news as usual
- When we have a session for stacker.news, we get redirected to /api/auth/sync with
redirectUrl
set to the custom domain insidecallbackUrl
Lines 33 to 58 in 4bfda81
// If we're coming from a custom domain, set as callbackUrl the auth sync endpoint | |
if (domain) { | |
const params = new URLSearchParams() | |
params.set('redirectUrl', encodeURIComponent(callbackUrl)) | |
if (multiAuth) { // take care of multiAuth if requested | |
params.set('multiAuth', multiAuth) | |
} | |
callbackUrl = '/api/auth/sync?' + params.toString() | |
} | |
if (session && callbackUrl && !multiAuth) { | |
// in the case of auth linking we want to pass the error back to settings | |
// in the case of multi auth, don't redirect if there is already a session | |
if (error) { | |
const url = new URL(callbackUrl, process.env.NEXT_PUBLIC_URL) | |
url.searchParams.set('error', error) | |
callbackUrl = url.pathname + url.search | |
} | |
return { | |
redirect: { | |
destination: callbackUrl, | |
permanent: false | |
} | |
} | |
} |
- /api/auth/sync then creates a token in the db and redirects us back to the root of the custom domain (via
redirectUrl
) withtype=sync
andtoken
as a query parameter
stacker.news/pages/api/auth/sync.js
Lines 6 to 68 in 4bfda81
export default async function handler (req, res) { | |
const { redirectUrl, multiAuth } = req.query | |
if (!redirectUrl) { | |
return res.status(400).json({ status: 'ERROR', reason: 'missing redirectUrl parameter' }) | |
} | |
// redirectUrl parse | |
const decodedRedirectUrl = decodeURIComponent(redirectUrl) | |
let customDomain | |
try { | |
customDomain = new URL(decodedRedirectUrl) | |
// not cached because we're handling sensitive data | |
const domain = await models.customDomain.findUnique({ where: { domain: customDomain.host, status: 'ACTIVE' } }) | |
if (!domain) { | |
return res.status(400).json({ status: 'ERROR', reason: 'custom domain not found' }) | |
} | |
} catch (error) { | |
return res.status(400).json({ status: 'ERROR', reason: 'invalid redirectUrl parameter' }) | |
} | |
const mainDomain = process.env.NEXT_PUBLIC_MAIN_DOMAIN | |
// get the user's session | |
const session = await getServerSession(req, res, getAuthOptions(req, res)) | |
if (!session) { | |
// redirect to the login page, middleware will handle the rest | |
return res.redirect(mainDomain + '/login?callbackUrl=' + encodeURIComponent(decodedRedirectUrl)) | |
} | |
try { | |
const token = generateRandomString(32) | |
// create a sync token | |
await models.verificationToken.create({ | |
data: { | |
identifier: `sync:${session.user.id}`, | |
token, | |
expires: new Date(Date.now() + 1 * 60 * 1000) // 1 minute | |
} | |
}) | |
if (process.env.NODE_ENV === 'production') { | |
res.setHeader('Set-Cookie', [ | |
'SameSite=Lax; Secure; HttpOnly' | |
]) | |
} | |
// domain provider will handle this sync request | |
const customDomainCallback = new URL('/?type=sync', decodedRedirectUrl) | |
customDomainCallback.searchParams.set('token', token) | |
customDomainCallback.searchParams.set('callbackUrl', decodedRedirectUrl) | |
if (multiAuth) { | |
customDomainCallback.searchParams.set('multiAuth', multiAuth) | |
} | |
// TODO: security headers? | |
// redirect to the custom domain callback | |
return res.redirect(customDomainCallback.toString()) | |
} catch (error) { | |
console.error('Error generating token:', error) | |
return res.status(500).json({ error: 'Failed to generate token' }) | |
} | |
} |
- This is then caught in a
useEffect
ofDomainProvider
and callssignIn
:
stacker.news/components/territory-domains.js
Lines 38 to 49 in 4bfda81
// temporary auth sync | |
useEffect(() => { | |
if (router.query.type === 'sync') { | |
console.log('signing in with sync', router.query) | |
signIn('sync', { | |
token: router.query.token, | |
multiAuth: router.query.multiAuth, | |
redirect: false | |
}) | |
router.push(router.query.callbackUrl) // next auth redirect only supports the main domain | |
} | |
}, [router.query.type]) |
- The provider for
sync
checks that the token exists and did not expire. If so, we return the user object which issues the cookies for the custom domain.
stacker.news/pages/api/auth/[...nextauth].js
Lines 226 to 250 in 4bfda81
// Authentication Provider for syncing a user's session to a custom domain | |
const syncAuth = async (token, req, res) => { | |
try { | |
const verificationToken = await prisma.verificationToken.findUnique({ where: { token } }) | |
if (!verificationToken) return null | |
// has to be a sync token | |
if (!verificationToken.identifier.startsWith('sync:')) return null | |
// sync has user id | |
const userId = parseInt(verificationToken.identifier.split(':')[1], 10) | |
if (!userId) return null | |
// delete the token to prevent reuse | |
await prisma.verificationToken.delete({ | |
where: { id: verificationToken.id } | |
}) | |
if (new Date() > verificationToken.expires) return null | |
return await prisma.user.findUnique({ where: { id: userId } }) | |
} catch (error) { | |
console.error('auth sync error:', error) | |
return null | |
} | |
} |
I wonder if we can simplify this flow if we never redirect to stacker.news but issue cookies for (only) the custom domain in step 3?
Since the browser sees the custom domain (and isn't aware that it's connecting to stacker.news), our backend should be able to set cookies for this custom domain.
Afaict, the only issue with that is how to tell that next-auth
since it by default issues cookies for the domain set by NEXTAUTH_URL
. Maybe we can configure the cookie domain via the advanced cookie options?
The three redirects until we call signIn
makes this a little hard to follow and thus hard to be confident that there isn't anything to exploit in here like open redirects or worse, but so far, it looks good to me. Will take a closer look to find security issues tomorrow, but I understand now how the authentication works in detail.
update: I just realized that in step 7, next-auth
already issues cookies for the custom domain without any additional configuration. Why are all these redirects needed?
|
||
// Auth sync redirects with domain and optional callbackUrl and multiAuth params | ||
if (pathname === '/login' || pathname === '/signup') { | ||
const redirectUrl = new URL(pathname, mainDomain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this throws invalid URL
because mainDomain
is only the host
I fixed it to continue the review like this:
diff --git a/middleware.js b/middleware.js
index 8badffd9..a368c47f 100644
--- a/middleware.js
+++ b/middleware.js
@@ -231,8 +231,8 @@ export async function middleware (request) {
// If we're on a custom domain, handle that next
const domain = request.headers.get('x-forwarded-host') || request.headers.get('host')
// get the main domain from the env vars
- const mainDomain = new URL(process.env.NEXT_PUBLIC_URL).host
- if (domain !== mainDomain) {
+ const mainDomain = new URL(process.env.NEXT_PUBLIC_URL)
+ if (domain !== mainDomain.host) {
// get the subName from the domain mappings cache
const cachedDomainMappings = await getDomainMappingsCache()
const domainMapping = cachedDomainMappings?.[domain?.toLowerCase()]
if (process.env.NODE_ENV === 'production') { | ||
res.setHeader('Set-Cookie', [ | ||
'SameSite=Lax; Secure; HttpOnly' | ||
]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhh, this looks wrong
data: { | ||
identifier: `sync:${session.user.id}`, | ||
token, | ||
expires: new Date(Date.now() + 1 * 60 * 1000) // 1 minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually use datePivot
from lib/time.js for stuff like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is caused by this API handler, but I see API handler should not return a value, received object.
during login.
app | 2025-04-30T04:31:26.730315000Z GET /login?domain=snx.ekzy.is&callbackUrl=https%3A%2F%2Fsnx.ekzy.is%2F 307 in 55ms
app | 2025-04-30T04:31:27.305190000Z API handler should not return a value, received object.
app | 2025-04-30T04:31:27.305349000Z GET /api/auth/sync?redirectUrl=https%253A%252F%252Fsnx.ekzy.is%252F 307 in 8ms
app | 2025-04-30T04:31:28.263846000Z GET /?type=sync&token=3nvx7ndr6vuyedhe6z9f4fq5l7dzrf38&callbackUrl=https%3A%2F%2Fsnx.ekzy.is%2F 200 in 58ms
CredentialsProvider({ | ||
id: 'sync', | ||
name: 'Sync', | ||
credentials: { | ||
token: { label: 'token', type: 'text' } | ||
}, | ||
authorize: async ({ token }, req) => { | ||
return await syncAuth(token, req, res) | ||
} | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, this seems to get called twice:
app | 2025-04-30T04:31:38.175024000Z POST /api/auth/callback/sync 200 in 21ms
app | 2025-04-30T04:31:38.744352000Z POST /api/auth/callback/sync 401 in 5ms
Description
Closes #1942
Introduces custom domains for territories, featuring one-click sign-in and customizable branding
Additional Context
Several parts are at the core of custom domains functionality.
Middleware
We use the main middleware to handle redirects and rewrites for custom domains.
As usual we run
referrerMiddleware
to get referral cookies but we pass it on to custom domainsTo recognize an allowed custom domain we check req.headers.host and verify it against a cached
/api/domains
endpoint, if there's a match we'll proceed to the next step:customDomainMiddleware
with the referral cookiesThis middleware handles various paths differently
/login
/signup
The custom domain,
callbackUrl
,multiAuth
flag, gets applied to the path to be later used./
orTERRITORY_PATHS
As we got redirected to a path without
/~subName
we still need to know what territory are we on for paths that requires it, so this step applies a hidden rewrite to/~subName
/~subName
To minimize changes to the codebase, every path that makes us of
/~subName
gets redirected to a path without it, applyingsearchParams
and running then the/
rewrite we just talked about.Remember the referral cookies we passed at the start? For every redirect or rewrite, we apply those referral cookies, maintaining the usual behavior.
And we apply security headers, too.
Auth Sync
To provide a seamless experience between custom domains and SN, all auth actions gets handled by stacker.news.
During the middleware stage we recognize if we're going to a login or signup page and pass our custom domain.
We can then build a callbackUrl to
/api/auth/sync
containing the custom domain as final callback./api/auth/sync
If the login is successful or we're already logged in, we create a verification token that lasts a minute for the user they want to login with.
We then redirect back to the custom domain we passed via
/login
/signup
, with the token and the callback using the?type=sync
parameter.a TEST
DomainProvider
will catch this parameter and will start a next authsignIn
A
CredentialsProvider
called Sync will verify this token and provide the user object if everything goes smoothly.We're in!
Setup and Verification UI/UX
On edits, a new collapsed section called advanced pops-up

Here we can submit our custom domain
Upon submitting, DNS verification starts automatically and informs the user of the steps they need to follow

Entering this part of the page or submitting a domain starts a poll that stops whenever the domain is fully VERIFIED (dnsState and sslState).
And on verified DNS we trigger SSL verification

We're all set!

DNS and SSL verification
We saw how it appears on the user side, let's dive in the worker that's behind these verifications
NOTE: The verification process has changed from routine worker to a boss.send based approach
verifyDomainDNS
We check both
TXT
andCNAME
records that we asked the user to set, specifically if the domain they provided has aCNAME
that points atstacker.news
and aTXT
with our randomly generated stringAdding to this verification, we use a DNS Resolver set from the
ENV
or we fallback to CloudFlare's1.1.1.1
, for consistency.If both records are set correctly, assuming everything goes smoothly, we set
dnsState
toVERIFIED
and go onto the next step.Issue and verify SSL certificates
If dnsState is VERIFIED, we can start our SSL issuance and verification process.
We contact ACM to issue a domain certificate for the domain the user has provided, eg.
forum.pizza.com
If we got a
certificateArn
from ACM then we can already check its status to see if we got the validation values with it. If so, let's put them in our database so the user can set point a CNAME record to ACM's validation values.Since we need to wait for the user to complete this step, we continue to check the
sslState
by contacting ACM.If ACM declares the domain as valid we can then set
sslState
toVERIFIED
and the user will be notified.We are all set!
Custom Branding
On full positive domain verification a new section pops up: Custom Branding
This sub-feature gives the possibility to choose colors, logo, text and description of the community, further distinguishing a community powered by SN from SN itself.
TODO: better styling
It works by saving these settings to the CustomBranding model which is connected to the Sub model instead of CustomDomain to make it independent from any domain change.
TODO: better way
At the moment, colors are applied via a BrandingProvider that fetches them during SSR.
Navigation and UI
Custom domains won't have the territory picker or informations about the territory they're in, leaving these to SN.
Subject to change
Default branding:

This is handled by the
DomainProvider
, a context that recognizes a custom domain and applies the related subName via SSR and client-side, also gets the changed styles via an SSR fetch to the customBranding model, using the subName that we obtained before.SEO and colors are applied dynamically, overwriting SN's style with the territory owner's choices.
This is a title bars comparison, the difference here is that we don't show the territory name and

stacker news
, instead we just show the title chosen by the territory owner.TBD
Checklist
Are your changes backwards compatible? Please answer below:
Custom Domains are bolted on top of SN codebase, it shouldn't interfere, at all, with stacker.news usual behavior
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
5, iterative testing
For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, remains the SCSS style application that has a slight delay
Did you introduce any new environment variables? If so, call them out explicitly here:
DNS_RESOLVER
-> with a fallback to1.1.1.1
inworker/domainVerification.js
LOCALSTACK_ENDPOINT
-> for test atmDOMAIN_DEBUG
-> for custom domain logger, for test atmMVP Done Tasks
APIs
SSL
Custom branding
Territory Edit
Domains
Navigation and UI
Middleware
Auth Sync
Take care of
isCustomDomain
to be lostAfter MVP TODOs
Wallet Sync
APIs
More
Take care of
a.stacker.news
andm.stacker.news
setCustomDomain
Change requests progress