package main
import "fmt"
func main() {
fmt.Printf("Hello, world.\n")
}
# Find the entry point
ENTRY_TEST=$(readelf -h test | grep Entry | awk '{print $4}')
# Start gdb and set the breakpoint at entry
gdb -ex "break *${TEST_ENTRY}" ./test
- runtime/rt0_linux_amd64.s:_rt0_amd64_linux
- On entry, stack contains argc, argv (and aux info) and is guaranteed to be aligned on 16 bytes, refer x86_64 Linux ABI
- Get the argc, argv from stack and put them into rsi and rdi respectively
- Jump to runtime/rt0_linux_amd64.s:main
- Jump to runtime/asm_amd64.s:runtime.rt0_go
- Make space for args in the stack, align by 16 bytes
- Copy the argc, argv into the stack
- Setup stackguard (?)
- Find processor information (SSE, AVX etc)
- Call _cgo_init if available
- Setup TLS sys_linux_amd64.s:runtime.settls
- Verify TLS works
- Call src/runtime/runtime1.go:check()
- Call src/runtime/runtime1.go:args()
- Call src/runtime/os_linux.go:sysargs()
- Process values out of AUXV passed by kernel: _AT_NULL, _AT_PAGESZ and_AT_RANDOM
- Call src/runtime/vdso_linux_amd64.go:archauxv()
- Process _AT_SYSINFO_EHDR and get VDSO info out of it
- Call src/runtime/vdso_linux_amd64.go:vdso_init_from_sysinfo_ehdr()
- Get load offset and dyanamic table offsets (_PT_LOAD and _PT_DYNAMIC)
- Out of dynamic table, get the follow tags/entries: _DT_STRTAB, _DT_SYMTAB, _DT_HASH, _DT_VERSYM and _DT_VERDEF
- Call src/runtime/vdso_linux_amd64.go:vdso_parse_symbols()
- Get symbol info for: __vdso_time, __vdso_gettimeofday, __vdso_gettimeofday_sym and __vdso_clock_gettime
- Call runtime.osinit()
- Call runtime.getproccount()
- Compute number of processors available by calling sched_getaffinity()
- Call runtime.getproccount()
- Call runtime.schedinit()
- Call runtime.tracebackinit()
- Call runtime.moduledataverify()
- Verify all modules
- Modules are found via firstmoduledata linker symbol
- Call src/runtime/os_linux.go:sysargs()
- Call runtime.stackinit()
- Call runtime.mallocinit()
- Call runtime.mcommoninit()
- Call runtime.alginit()
- Call runtime.typelinksinit()
- Call runtime.itabsinit()
- Call runtime.msigsave()
- Call runtime.goargs()
- Call runtime.goenvs()
- Call runtime.parsedebugvars()
- Call runtime.gcinit()
- Call setGCPercent to set the GC trigger percentage (based on GOGC environment variable)
- Compute number of Ps (honoring GOMAXPROCS env variable and _MaxGomaxprocs)
- Call src/runtime/proc.go:procresize()
- Create Ps
- Use runtime.allp[0] for the current execution context (i.e., acquire P for current M)
- Set all Ps to Idle state
- Call src/runtime/proc.go:newproc(): Create new goroutine to
start the program. src/runtime/proc.go:main() is the goroutine
entry point.
- Call src/runtime/proc.go:newproc1()
- Allocate new stack
- Copy args into the new stack
- Set the new g’s stack’s top point to
src/runtime/asm_amd64.s:goexit()
- This sets up things as if goexit called this goroutine, so control will return to this on exit from the goroutine
- Set the new g to runnable state
- Call src/runtime/proc.go:runqput() to put the newg in runqueue
- Call src/runtime/proc.go:newproc1()
- Call src/runtime/proc.go:mstart()
- Call src/runtime/proc.go:mstart1()
- Call src/runtime/os_linux.go:minit()
- Setup signal stack
- Set signal mask
- Set signal handlers (pointing to src/runtime/sys_linux_amd64.s:sigtramp())
- sigtramp() calls src/runtime/signal_sigtramp.go:sigtrampgo()
- Call src/runtime/os_linux.go:minit()
- Call src/runtime/proc.go:schedule()
- Find a g to run
- Call src/runtime/proc.go:execute()
- Call src/runtime/asm_amd64.s:gogo()
- Call src/runtime/proc.go:main(). By retrieving the entry
point from gobuf and jumping to it.
- Start sysmon src/runtime/proc.go:sysmon() in new ‘M’
(i.e., in new thread)
- Forever
- usleep(between 10us and 10ms)
- If sched.gcwaiting or sched.npidle == gomaxprocs
- sched.sysmonwait = 1
- maxsleep = min(forcegcperiod/2, scavengelimit/2)
- forcegcperiod == 120 seconds and scavengelimit == 300 seconds
- notetsleep(&sched.sysmonnote, maxsleep)
- notetsleep is an onetime event waiter, with nanosecond timeout
- sched.sysmonwait = 0
- noteclear(&sched.sysmonnote)
- Call netpoll() if we haven’t polled in last 10ms
- if there are any events, inject the corresponding goroutines into global runqueue
- Call src/runtime/proc.go:retake() to retake P’s blocked in syscalls
- For each P in the system (i.e., for each gomaxprocs)
- if P.status == _Psyscall
- if blocked for more than 20us
- Call src/runtime/proc.go:handoffp()
- This will handoff this P to m (by calling startm(), which may create new M’s if required (== new OS thread))
- else if running for too long, preempt it
- If GC wasn’t run in last 2 minutes, force a GC (by injecting GC goroutine forcegchelper() into global runque)
- Scavenge heap periodically (every 52/2 mins)
- This will give back unused heap memory back to OS (by doing madvise(.., _MADV_FREE))
- Forever
- Call src/runtime/proc.go:init()
- Which starts forcegchelper() goroutine
- Call src/runtime/mgc.go:gcenable()
- Start bgsweep() goroutine
- Set memstats.enablegc = true
- Call main_init()
- Call main_main()
- This is the user’s main function
- Call exit(0)
- Start sysmon src/runtime/proc.go:sysmon() in new ‘M’
(i.e., in new thread)
- Call src/runtime/proc.go:mstart1()
- Jump to runtime/asm_amd64.s:runtime.rt0_go
Since Go is using cooperative preemption, the goroutines have preemption points that will do the self stop/preemption. This may trip us under not-so-common circumstances if our code doesn’t have those preemption points. For example, the following program will not stop (tested in go1.7.3):
package main
import (
"fmt"
"runtime"
"time"
)
func testfunc1() {
num := 0
for i := 1; i < 10000; i++ {
num = num + i
}
}
func busyFunc() {
for {
testfunc1()
}
}
func main() {
runtime.GOMAXPROCS(1)
go busyFunc()
time.Sleep(1 * time.Second)
fmt.Println(" Hello world")
}
As we saw in the above startup sequence, even though the sysmon can detect that goroutine has been busy for long, it will only indicate the intent to preempt by setting a flag in the respective goroutine, which the respective goroutine checks at (compile time inserted) premption points (like function prologue).
The compiler doesn’t generate function prologue if it is a leaf function or if the stack is small.
To “verify” that, for example, if you use the below modified testfunc1() function, the premeption check will be generated (since it is no longer a small stack)
func testfunc1() {
var a [128]byte
a[0] = 0
num := 0
for i := 1; i < 10000; i++ {
num = num + i
}
}
Or add a division, which will insert a runtime.panicdivide call (to detect divide by zero), which will make this non-leaf function.
func testfunc1() {
num := 0
for i := 1; i < 10000; i++ {
num = num + i/i
}
}
Though note that as of go1.8, runtime.panicdivide will not generate a function prologue