Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add linktype for ETW (Event Trace For Windows) #978

Closed
wants to merge 1 commit into from

Conversation

wiresharkyyh
Copy link

No description provided.

@guyharris
Copy link
Member

Where is a precise specification for the binary format of ETW messages? We need that to assign a LINKTYPE_ value.

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 5, 2020

Where is a precise specification for the binary format of ETW messages? We need that to assign a LINKTYPE_ value.

@guyharris, EVENT_RECORD is the binary format of ETW message, I also list some other links in case you are interested. Thanks

EVENT_RECORD: https://docs.microsoft.com/en-us/windows/win32/api/evntcons/ns-evntcons-event_record
Other structures: https://docs.microsoft.com/en-us/windows/win32/api/_etw/#structures
Event trace portal: https://docs.microsoft.com/en-us/windows/win32/etw/event-tracing-portal

@guyharris
Copy link
Member

So each packet begins with an EVENT_HEADER structure, with all fields aligned on a natural boundary (2 bytes for USHORT, 4 bytes for ULONG, 8 bytes for LARGE_INTEGER and ULONG64, etc.), followed by an ETW_BUFFER_CONTEXT structure, followed by a 2-byte ExtendedDataCount value, followed by a 2-byte UserDataLength value. What follows that? An EVENT_RECORD structure contains pointers, but those won't be in the file.

@wiresharkyyh
Copy link
Author

An EVENT_RECORD structure contains pointers, but those won't be in the file.

EVENT_RECORD is not saved in the file exactly the same as its binary format. Microsoft doesn't publish how to persist those information in file. But the APIs provided by Microsoft can load it back to memory.

@guyharris
Copy link
Member

EVENT_RECORD is not saved in the file exactly the same as its binary format. Microsoft doesn't publish how to persist those information in file. But the APIs provided by Microsoft can load it back to memory.

That's not good enough for a LINKTYPE_ value; a LINKTYPE_ value has to be sufficiently well defined so that somebody can independently write code to read it and dissect it.

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 5, 2020

That's not good enough for a LINKTYPE_ value; a LINKTYPE_ value has to be sufficiently well defined so that somebody can independently write code to read it and dissect it.

Is it ok to defines a new data structure like below and publish it on https://www.tcpdump.org/linktypes.html so everyone can write code for DLT_ETW? Please kindly advice if there is better way. Thanks

typedef struct _WTAP_ETL_RECORD {
EVENT_HEADER EventHeader; // Event header
ETW_BUFFER_CONTEXT BufferContext; // Buffer context
ULONG UserDataOffset;
ULONG UserDataLength;
ULONG MessageOffset;
ULONG MessageLength;
ULONG ProviderNameOffset;
ULONG ProviderLength;
} WTAP_ETL_RECORD;

@guyharris
Copy link
Member

So you're saying that every event contains:

  • an event header containing:
    -- 2 bytes of Size;
    -- 2 bytes of HeaderType;
    -- 2 bytes of Flags;
    -- 2 bytes of EventProperty;
    -- 4 bytes of ThreadId;
    -- 4 bytes ProcessId;
    -- 8 bytes of TimeStamp;
    -- 16 bytes of ProviderId;
    -- an event descriptor containing:
    --- 2 bytes of Id;
    --- 1 byte of Version;
    --- 1 byte of Channel;
    --- 1 byte of Level;
    --- 1 byte of Opcode;
    --- 2 bytes of Task;
    --- 8 bytes of Keyword;
    -- 8 bytes of CPU time information;
    -- 16 bytes of ActivityId;
  • an ETW buffer context containing:
    -- 2 bytes of processor identifier;
    -- 2 bytes of LoggerId;
  • 4 bytes of UserDataOffset;
  • 4 bytes of UserDataLength;
  • 4 bytes of MessageOffset;
  • 4 bytes of MessageLength;
  • 4 bytes of ProviderNameOffset;
  • 4 bytes of ProviderLength;

and nothing else, so that each event is exactly 108 bytes long, and has only that information (no user data, no message, no provider name)?

@wiresharkyyh
Copy link
Author

The event will include the 108 bytes, user data, message and provider name after the static 108 bytes. The whole event size is sizeof(WTAP_ETL_RECORD) + UserDataLength + MessageLength +ProviderLength

@guyharris
Copy link
Member

So does the event consist of:

  • the 108-byte fixed-length header;
  • UserDataLength bytes of user data;
  • MessageLength bytes of message;
  • ProviderLength bytes of provider name?

Or is there padding, or something else, between those items, so that UserDataOffset, MessageOffset, and ProviderNameOffset are necessary? (If they appear in that fashion, with no padding, UserDataOffset will be 108, MessageOffset will be 108+UserDataLength, and ProviderNameOffset will be 108+UserDataLength+MessageLength, so those fields wouldn't be needed.)

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 5, 2020

There may be padding automatically appended after the static 108 bytes by compiler. The dynamic UserData, Message and Provider don't require padding. That is why the offset is needed. It may only need 1 offset instead of 3

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 5, 2020

Technically, the padding added by compiler can be eliminated like "FIELD_OFFSET(WTAP_ETL_RECORD, ProviderLength) + sizeof(ULONG)", but I believe an offset field can make it more convenient to access those dynamic data (User data, message or provider).

@guyharris, let me know what you prefer

@infrastation
Copy link
Member

Which of these fields are defined in byte order terms? Is ETW ever intended to be stored in a file? If yes, is there existing software to read and write these files already?

@wiresharkyyh
Copy link
Author

Which of these fields are defined in byte order terms? Is ETW ever intended to be stored in a file? If yes, is there existing software to read and write these files already?

@infrastation , those field is intended to be stored in a file in byte order. I am working on Wireshark to read and write file. The change is in review and almost ready to merge. The only left unresolved issue is to ask for a new DLT_ value for this data type. Here is the PR
https://gitlab.com/wireshark/wireshark/-/merge_requests/468

@guyharris
Copy link
Member

Which of these fields are defined in byte order terms? Is ETW ever intended to be stored in a file? If yes, is there existing software to read and write these files already?

@infrastation , those field is intended to be stored in a file in byte order.

Every numerical value with 2 or more bytes is stored in a byte order. Is that little-endian or big-endian byte order? (Given that this is from Windows, I'm guessing that's little-endian.)

I am working on Wireshark to read and write file. The change is in review and almost ready to merge. The only left unresolved issue is to ask for a new DLT_ value for this data type. Here is the PR
https://gitlab.com/wireshark/wireshark/-/merge_requests/468

Then you can just write a full specification for what etl.c writes to a pcapng file.

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 5, 2020

Then you can just write a full specification for what etl.c writes to a pcapng file.

Yes, it is all little-endian. So I will write a readme to describe the struct and check it in along with etl.c. @guyharris, can it meet the specification request? Are you ok to have 3 offset for 3 dynamic data? I can change it to only use 1 offset if you prefer since it can save 8 bytes for each event

@guyharris
Copy link
Member

guyharris commented Nov 6, 2020

Technically, the padding added by compiler can be eliminated like "FIELD_OFFSET(WTAP_ETL_RECORD, ProviderLength) + sizeof(ULONG)",

The format of the events must be compiler-independent and processor-independent - the offsets of those fields must be exactly the same whether you're on a 32-bit machine or a 64-bit machine, and exactly the same whether the compiler aligns 64-bit quantities on 64-bit boundaries or just on 32-bit boundaries.

The header is 108 bytes long, so what comes immediately after it is not aligned on an 8-byte boundary.

If the user data might have to be aligned on an 8-byte boundary, you could add 4 bytes of padding at the end of the header, so it's 112 bytes long; that's a multiple of 8.

However, if the message might also have to be aligned on an 8-byte boundary, that's not sufficient. You could either 1) have a separate offset field for the message data or 2) simply specify that, following the user data, there's always enough padding to put the message on an 8-byte boundary.

The same applies to the provider name, although if that's a UTF-8 string, there's no alignment requirement. If it's a UTF-16 string, you could either have another separate offset field for the provider name or 2) simply specify that, following the message, there's always enough padding to put the message on a 2-byte boundary.

but I believe an offset field can make it more convenient to access those dynamic data (User data, message or provider).

It requires that code to read the file check the offset and length fields to ensure that the header, user data, message, and provider name:

  1. don't overlap;
  2. don't go past the end of the event;

so it's not clear that it would necessarily be more convenient.

@guyharris
Copy link
Member

Yes, it is all little-endian. So I will write a readme to describe the struct and check it in along with etl.c. @guyharris, can it meet the specification request?

I'd have to read it to see if it's sufficiently complete. It should allow somebody to write code in tcpdump/Wireshark/Scapy/etc. to read those events from a pcap or pcapng file and process them, solely from the information in the specification and from information in documents to which the specification refers.

@wiresharkyyh
Copy link
Author

I'd have to read it to see if it's sufficiently complete. It should allow somebody to write code in tcpdump/Wireshark/Scapy/etc. to read those events from a pcap or pcapng file and process them, solely from the information in the specification and from information in documents to which the specification refers.

True, what I mean to say is you are ok with the way that publish the specification along with the code

@guyharris
Copy link
Member

guyharris commented Nov 6, 2020

True, what I mean to say is you are ok with the way that publish the specification along with the code

A better way would probably be to publish it as a pull request to the tcpdump Web site repository, adding a file in the "linktypes" directory.

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 6, 2020

so it's not clear that it would necessarily be more convenient.

@guyharris, Thanks for your feedback. After considering your suggestion, I change the the binary block a little, and it looks like below. If you are ok with it I will include the specification in "linktypes" along with this PR
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ /
/ EVENT_HEADER /
/ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 80 bytes
| ETW_BUFFER_CONTEXT |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UserDataLength |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MessageLength |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ProviderLength |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ UserData /
/ variable length, padded to 32 bits /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ Message /
/ variable length, padded to 32 bits /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ ProviderName /
/ variable length, padded to 32 bits /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

@guyharris
Copy link
Member

So presumably you'll:

  • fill in the "EVENT_HEADER" section, either directly indicating what all the fields are or pointing to Microsoft documentation that indicates what all the fields are;
  • fill in the "ETW_BUFFER_CONTEXT" section, either directly indicating what all the fields are or pointing to Microsoft documentation that indicates what all the fields are;
  • indicate what the data is in the "UserData" section, either directly indicating what all the fields are or pointing to Microsoft documentation that indicates what's there;
  • indicate what the data is in the "Message" section, either directly indicating what all the fields are or pointing to Microsoft documentation that indicates what's there;
  • indicate what the data is in the "ProviderName" section, either directly indicating what all the fields are or pointing to Microsoft documentation that indicates what's there;
  • note that all multi-byte numerical fields are little-endian, unless you point to Microsoft documentation that indicates that explicitly.

@wiresharkyyh
Copy link
Author

  • note that all multi-byte numerical fields are little-endian, unless you point to Microsoft documentation that indicates that explicitly.

Yes, I will mention all you listed

@wiresharkyyh
Copy link
Author

@guyharris, the specification PR is the-tcpdump-group/tcpdump-htdocs#18. Please let me know if anything need be updated. Thanks

mcr added a commit to the-tcpdump-group/tcpdump-htdocs that referenced this pull request Nov 13, 2020
@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 16, 2020

@infrastation, @guyharris, and members, could you please merge this PR since the specification PR has been merged? The Wireshark PRs are waiting for the DLT_ETW value to move on.

@wiresharkyyh
Copy link
Author

kindly ping... Can anyone help to merge this change?

@citronneur
Copy link

Hello,
If I can help don't hesitate to ask me.
Have a nice day,
Sylvain

@wiresharkyyh
Copy link
Author

If I can help don't hesitate to ask me.

@citronneur, could you please merge this change? Thanks very much

@citronneur
Copy link

Hello,

Do you mean merge in my PR?

@wiresharkyyh
Copy link
Author

Do you mean merge in my PR?

Yes, the PR #978. Could you please merge it?

@citronneur
Copy link

It's done!

@infrastation
Copy link
Member

@wiresharkyyh, please mind that your pull request had issues and you did not see it, and the person that merged your pull request prematurely didn't see it. Please stop rushing the developers for a while.

@citronneur citronneur mentioned this pull request Nov 26, 2020
@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 28, 2020

@citronneur Sorry, I am not aware you ever crated #934. If your PR #934 intends to use DLT_ETW, it need be updated to be compliant to spec the-tcpdump-group/tcpdump-htdocs#18.
What I mean to say is to merge this PR to master since it only add new DLT_ETW value and its spec has been approved.

@wiresharkyyh
Copy link
Author

@wiresharkyyh, please mind that your pull request had issues and you did not see it, and the person that merged your pull request prematurely didn't see it. Please stop rushing the developers for a while.

@infrastation, sorry about that, I mis-interpreted @citronneur 's comments since I am not aware he authored PR #934. What I mean to say is to merge this PR to master since it only add new DLT_ETW value and its spec has been approved.

@wiresharkyyh
Copy link
Author

wiresharkyyh commented Nov 28, 2020

@wiresharkyyh, please mind that your pull request had issues and you did not see it, and the person that merged your pull request prematurely didn't see it. Please stop rushing the developers for a while.

@infrastation
Regarding the issue of my pull request, could you please help to take a look at it since the failure seems not related to my change?

_winflexbison
By installing you accept licenses for the packages.
winflexbison not installed. An error occurred during installation:
The operation has timed out
winflexbison package files install completed. Performing other installation steps.
The install of winflexbison was NOT successful.
winflexbison not installed. An error occurred during installation:
The operation has timed out

Chocolatey installed 0/1 packages. 1 packages failed.
See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
Failures

  • winflexbison (exited 1) - winflexbison not installed. An error occurred during installation:
    The operation has timed out
    Command exited with code 1_

@fxlb
Copy link
Member

fxlb commented Jul 10, 2021

Was done with commit eb505c6.

@fxlb fxlb closed this Jul 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants