Saman wants a "story" to motivate the thesis. Tell the story of developing two
clients: an instruction counting client and a memory alignment client.
How about an opcodemix client? The trivial implementation is actually easy to
optimize: for every instr, insert a call that increments the array element
indexed by the opcode enum argument. Inline, fold lea, bam. Con: requires
re-running benchmarks.
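A minimal sketch of the trivial opcodemix routine, in plain C rather than against the DR API. The enum, the counter array, and run_block are hypothetical stand-ins for illustration; a real client would use the decoder's opcode values and insert the call through DR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode enum; a real client would use the decoder's values. */
enum opcode { OP_MOV, OP_ADD, OP_LEA, OP_JMP, OP_COUNT };

/* One counter per opcode. */
static uint64_t opcode_counts[OP_COUNT];

/* The trivial instrumentation routine: called once per executed instr. */
static void count_opcode(enum opcode op)
{
    assert(op < OP_COUNT);
    opcode_counts[op]++;
}

/* Stand-in for executing an instrumented basic block: the routine is
 * called once for each instruction in the block. */
static void run_block(const enum opcode *instrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        count_opcode(instrs[i]);
}
```

Because the opcode is known at instrumentation time, an inlined copy of count_opcode with a constant argument should reduce to a single increment of a fixed memory location, which is what makes this client easy to optimize.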
- What do I need in background?
- What does DR offer a tool writer?
- BB-by-BB exec model
- App context vs. DR context
- Roughly how to switch contexts
- Discuss scratch space and OS TLS concerns
Possible outline:
- Introduction.
- Tell story of tools, why they are good, examples of what they can do. Tell
story of writing instruction count client.
- Background on DynamoRIO
- Benefits for tool writers:
- Provides abstract representation of machine instructions as bbs are executed,
allows tool to insert its own instructions as instrumentation.
- Maintains control for tool, even in face of SMC, JITs, execution from
stack, complex Windows desktop apps using OS resources, etc.
- DR provides transparency, helps hide tool operations from application,
e.g. the tool must use an alternate stack.
- Execution model
- Two contexts: application context and dynamorio context, separate stacks
- Basic block interpretation: at app start, from first instr to next CTI
- Terminating CTI is mangled to return control to DR
- Modified basic block is encoded into a "fragment" in the "code cache"
- DR switches to app context in fragment, fragment switches to DR context
when finished.
- DR examines app context, determines app's target PC, interprets next basic
block.
- Finally, basic blocks with direct CTIs are "linked" to eliminate any
context switches from code paths executed more than once.
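The execution model above can be sketched as a toy dispatcher. This is a simulation under stated assumptions, not DR's implementation: the three-block application, the fragment struct, and the counters are all hypothetical, and "executing" a fragment is reduced to following its exit target.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 3
#define EXIT_PC (-1)

/* Toy application: block i ends in a direct CTI to app_target[i]. */
static const int app_target[NUM_BLOCKS] = { 1, 2, EXIT_PC };

struct fragment {
    int built;             /* has this block been copied into the cache? */
    int target_pc;         /* app PC that the terminating CTI jumps to */
    struct fragment *link; /* direct link to the target fragment, if any */
};

static struct fragment code_cache[NUM_BLOCKS];
static int blocks_built;  /* basic blocks interpreted so far */
static int cache_entries; /* switches from DR context into the cache */

/* Look up a fragment, building and linking it on a miss. */
static struct fragment *lookup_or_build(int pc)
{
    assert(pc >= 0 && pc < NUM_BLOCKS);
    struct fragment *f = &code_cache[pc];
    if (f->built)
        return f;
    f->built = 1;
    f->target_pc = app_target[pc];
    f->link = NULL;
    blocks_built++;
    /* Outgoing link: the target is already in the cache. */
    if (f->target_pc != EXIT_PC && code_cache[f->target_pc].built)
        f->link = &code_cache[f->target_pc];
    /* Incoming links: cached fragments whose CTI targets this block. */
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (code_cache[i].built && code_cache[i].target_pc == pc)
            code_cache[i].link = f;
    return f;
}

/* Dispatcher: enter the cache, run until an unlinked exit, repeat. */
static void run_app(int start_pc)
{
    int pc = start_pc;
    while (pc != EXIT_PC) {
        struct fragment *f = lookup_or_build(pc);
        cache_entries++;        /* DR context -> app context */
        while (f->link != NULL) /* linked exits stay in the cache */
            f = f->link;
        pc = f->target_pc;      /* unlinked exit: back to DR context */
    }
}
```

In this toy, the first run of the three-block chain enters the cache three times, once per newly built block. A second run enters once and follows links the rest of the way, which is the effect linking is meant to have on paths executed more than once.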
- Inlining
- Describe instruction count
- Clean calls
- Inlined calls
- Coalesced calls: Remove register save/restore pairs between calls
- Use TLS
- RLE/DSE
- Avoid aflags
- Fold leas
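A rough illustration of where the inlining steps above are headed for instruction count, written as plain C rather than generated x86. The names are hypothetical, and the "coalesced" version only models the end result: per-instruction calls to a one-line routine collapse into one update per basic block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t global_count;

/* Routine the client asks to have called before every instruction. */
static void inc_count(void)
{
    global_count++;
}

/* Clean-call model: one call per instruction in the block, each of which
 * would be wrapped in a full context save and restore. */
static void run_bb_clean_calls(size_t n_instrs)
{
    for (size_t i = 0; i < n_instrs; i++)
        inc_count();
}

/* Model of the result after inlining, coalescing, and folding: the
 * increments within one block merge into a single add of the block size. */
static void run_bb_coalesced(size_t n_instrs)
{
    global_count += n_instrs;
}
```

Both versions leave the same count. On x86, lea performs an addition without writing the arithmetic flags, which is presumably why avoiding aflags and folding leas appear together in the list.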
- Partial inlining
- alignment & memtrace: describe structure of common fast path, allude to
similar technique in our fasttrack race detector.
- Clean Calls with Arguments: argument materialization
- Decoding the routine in the presence of control flow
- Handling Fastpath: describe detection of and isolation of fastpath
- Handling Slowpath: describe expansion of slowpath
- Handling Side Effects: describe how side effects are deferred in memtrace
example
- Constant Folding (immediate arguments)
- Complex lea Folding
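A sketch of the fast path / slow path shape that partial inlining targets, using the alignment client as the example. Function names and the counter are hypothetical; the point is only that the common case is a short check and the rare case is an out-of-line call.

```c
#include <assert.h>
#include <stdint.h>

static unsigned unaligned_accesses;

/* Slow path: rare and arbitrarily complex, so it stays out of line. */
static void report_unaligned(uintptr_t addr, uintptr_t size)
{
    (void)addr;
    (void)size;
    unaligned_accesses++;
}

/* Client routine called on every memory access. size is assumed to be a
 * power of two. Only the check and early return need to be inlined. */
static void check_alignment(uintptr_t addr, uintptr_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    if ((addr & (size - 1)) == 0)
        return;                   /* fast path: access is aligned */
    report_unaligned(addr, size); /* slow path: full call */
}
```

When size is an immediate at the call site, size - 1 folds to a constant mask, which ties in with the constant folding item above.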
- System Overview
- Steps and phases
- Call-site insertion:
- Get or compute callee info from cache, discussed next
- Make call-site info struct with args and callee info
- Code is expanded later, after whole bb optimization
- Callee analysis:
- decode
- analyze stack usage
- analyze for partial inline
- optimize, give enumerated list of optimizations
- add to cache
- Return to call-site analysis
- Whole BB Optimization
- Schedule calls together, within register liveness constraints
- Expand to pseudo-save, inline code, pseudo-restore
- Delete reciprocal ops
- Expand pseudo-save and pseudo-restore ops
- Optimize whole bb again, be careful to only touch meta-instrs
- Call-site expansion:
- Clone cached ilist
- Materialize live args in ilist
- Re-optimize with args, especially fold immediates
- Analyze register usage, folding eliminates usages
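One step of the whole-bb optimization, deleting reciprocal ops, can be sketched as a small peephole pass. The op representation here is a hypothetical simplification: it assumes calls have already been scheduled together and expanded to pseudo-save, inline code, pseudo-restore.

```c
#include <assert.h>
#include <stddef.h>

enum kind { PSEUDO_SAVE, PSEUDO_RESTORE, INLINED_CODE, APP_INSTR };

struct op {
    enum kind kind;
    int reg; /* register saved or restored; ignored for other kinds */
};

/* Delete adjacent "restore r; save r" pairs in place: the restore is
 * immediately undone by the save, so both are redundant. Returns the
 * new number of ops. */
static size_t delete_reciprocal_ops(struct op *ops, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n &&
            ops[i].kind == PSEUDO_RESTORE &&
            ops[i + 1].kind == PSEUDO_SAVE &&
            ops[i].reg == ops[i + 1].reg) {
            i++; /* skip the save as well */
            continue;
        }
        ops[out++] = ops[i];
    }
    return out;
}
```

Two back-to-back inlined calls shrink from six ops to four. If an application instruction sits between them, nothing is deleted, which is why scheduling calls together comes first.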
- Performance
- inscount
- alignment
- memtrace
- comparison with Pin?
- Future or related work?
- Alternative: Predicate Function: discuss Pin's approach of InsertIfCall +
InsertThenCall, inline IfCall if possible. Requires client code
restructuring, but doesn't suffer from above issues.
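A sketch of the client restructuring the predicate approach asks for, in plain C with no Pin calls. The function names are hypothetical; the assumed semantics are that the "if" routine returns nonzero to request the "then" routine.

```c
#include <assert.h>
#include <stdint.h>

static unsigned then_calls;

/* "If" routine: a short predicate that is a good inlining candidate. */
static int if_unaligned(uintptr_t addr, uintptr_t size)
{
    return (addr & (size - 1)) != 0;
}

/* "Then" routine: runs only when the predicate returned nonzero. */
static void then_record(uintptr_t addr)
{
    (void)addr;
    then_calls++;
}

/* What the instrumented access is equivalent to once the tool has
 * registered the pair. size is assumed to be a power of two. */
static void instrumented_access(uintptr_t addr, uintptr_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    if (if_unaligned(addr, size))
        then_record(addr);
}
```

The tool writer splits the routine by hand, so the framework never has to detect the fast path itself. That is the tradeoff against partial inlining.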
- Contributions
- Inlining
- Partial inlining
- x86 instruction optimization suite