---
title: Data and analytics
description: Understanding and navigating the crypto data stack
lang: en
---

## Introduction {#introduction}

As you get into crypto, you're likely to see stats, charts, and APIs across dozens of different websites (and GitHub repos). This is a stark contrast to traditional tech or finance, where there is often a single trusted source (e.g. Bloomberg). This is due to the open-data nature of the industry, where all data can be freely downloaded from client nodes.

However, that data is stored in hex/binary form and is hardly legible or usable. As an ecosystem, we've had to work together to share queries, tools, tables, and labor to make crypto data accessible for everyone. Each year, we see giant leaps in how we can combine onchain/offchain data in analytics and applications.
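
To make "hardly legible" concrete, here is a minimal Python sketch of what decoding raw node output involves. The hex value and the 6-decimal assumption below are illustrative, not taken from any specific contract:

```python
# A raw ERC-20 transfer amount as it arrives from a node: a 32-byte,
# big-endian hex string (56 leading zero nibbles, then the value).
raw = "0x" + "0" * 56 + "77359400"

# Step 1: hex -> integer. Still not human-readable.
amount = int(raw, 16)

# Step 2: apply metadata that lives outside the chain data itself,
# such as the token's decimals (6 here, an illustrative choice).
DECIMALS = 6
human = amount / 10**DECIMALS

print(amount)  # 2000000000
print(human)   # 2000.0
```

Note that the second step already requires shared, offchain context (token metadata), which is exactly the kind of work the ecosystem pools together.
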

In this overview page, we'll cover all the parts of the crypto data stack (last updated in 2024).

## Crypto data landscape {#crypto-data-landscape}

Below is a diagram of the five parts of the crypto data stack, with some top example product logos called out. You can learn more about these products in the [tool descriptions page](/developers/docs/data-and-analytics/tool-descriptions/).

![Diagram showing the web3 data stack as represented in 2024](/public/images/data/data_landscape.jpg)

### Web2 counterparts {#web2-counterparts}

If you are new to web3 and this looks confusing to you, think of these web2 parallels:

- Indexing: Amplitude, Stripe, general logging services
- Explore: Grafana, Datadog
- Query: Metabase, Hex
- Define and store: dbt, Snowflake, Databricks
- …and back onchain: this is just reverse ETL

### Index: Read raw data from the blockchain {#index}

Blockchains run on nodes. Nodes run "client" software: codebases that implement the EVM in some fashion. These clients expose a set of RPC (API) endpoints, some of which are standard and some of which are custom to support better/faster data querying. Data from RPC endpoints is a direct reflection of the blockchain state; there are no external-party changes or transformations here. Nodes-as-a-service providers run tons of nodes and offer them collectively through an API product, creating a stable data service.
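
As a sketch of what talking to one of these RPC endpoints looks like, here is the shape of a standard JSON-RPC request, plus the hex decoding every response requires. The method names are from the standard execution-client RPC; the node URL itself would come from your own node or a provider:

```python
import json

# Build a JSON-RPC 2.0 request body, the format standard Ethereum
# clients accept over HTTP POST.
def rpc_payload(method: str, params: list, request_id: int = 1) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )

# Everything a node returns is hex-encoded: quantities, hashes, byte strings.
def hex_to_int(quantity: str) -> int:
    """Decode a hex quantity such as a block number or wei balance."""
    return int(quantity, 16)

print(rpc_payload("eth_getBlockByNumber", ["latest", False]))
print(hex_to_int("0x112a880"))  # 18000000
```
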

In 2023, we saw a ton of change in the indexing layer, with two new subcategories emerging:

**Forks-as-a-Service**

- Fork any contract and add events and calculations, and then pull this data from a new “forked” RPC/data service.
- Some of the main providers for this are [shadow.xyz](https://www.shadow.xyz/), [ghostlogs](https://ghostlogs.xyz/), and [smlXL](https://smlxl.io/).
- ilemi shared his thoughts on the [shortcomings and difficulties](https://twitter.com/andrewhong5297/status/1732230186966413484) of this approach.

**Rollups-as-a-Service (RaaS)**

- The big theme of 2023 was rollups, with Coinbase kicking things off by launching a rollup (Base) on the Optimism Stack (OP Stack).
- Teams are building products specifically for running the nodes and sequencer(s) for your own rollup. We’ve already seen dozens of rollups launch.
- New startups like [Conduit](https://conduit.xyz/), [Caldera](https://caldera.xyz/), and [Astria](https://www.astria.org/) are offering full-stack rollup services. Quicknode and Alchemy have launched similar RaaS offerings.

Alchemy and Quicknode have expanded further into crypto-native infra and data engineering infra. Alchemy launched [account abstraction](https://www.alchemy.com/bundler) and [subgraph services](https://www.alchemy.com/subgraphs). Quicknode has been busy with [alerts](https://www.quicknode.com/quickalerts), [data streaming](https://www.quicknode.com/stream), and [rollup services](https://blog.quicknode.com/introducing-quicknode-custom-chains).

We should see the first “intents” clients/services soon. Intents are part of the modular stack: essentially, transactions handled outside the mempool that have extra preferences attached. UniswapX and Cowswap both operate limit-order intent pools, and both are expected to release clients. Account abstraction bundlers like [stackup](https://www.stackup.fi/) and [biconomy](https://www.biconomy.io/) should venture into intents as well. It’s unclear if data providers like Alchemy will index these “intents” clients, or if it will be like MEV, where we have specialized providers like [Blocknative](https://www.blocknative.com/) and [Bloxroute](https://bloxroute.com/).

Another up-and-coming type of provider is the “all-in-one” service, which combines indexing, querying, and defining. There are a few products here, such as [indexing.co](https://www.indexing.co/) and [spec.dev](https://spec.dev/); we have not included them in the landscape since they are still nascent.

### Explore: Quickly look into addresses, transactions, protocols, and chains {#explore}

It’s common to jump between well-crafted data dashboards and blockchain explorers iteratively, to help you identify a trend or build out a cohesive data pipeline.

Outside of Etherscan, we now have a plethora of explorers to choose from:

- More intuitive explorers like [parsec](https://parsec.fi/), [arkham](https://platform.arkhamintelligence.com/), and [onceupon](https://www.onceupon.gg/home), showcasing more metadata and charts for transactions and addresses
- In-depth explorers like [evm.storage](https://evm.storage/eth/18906826/0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48#map), showcasing storage slots and memory stacks
- Cross-chain explorers like [Routescan](https://routescan.io/) (superscan), [Dora](https://www.ondora.xyz/), and [Onceupon](https://www.onceupon.gg/home)
- ZK proof explorers like [modulus](https://explorer.modulus.xyz/batch/157/inference/61974), [succinct](https://alpha.succinct.xyz/), and [axiom](https://explorer.axiom.xyz/v1/mainnet)
- Bridge explorers like [socketscan](https://www.socketscan.io/) and [wormhole](https://wormhole.com/explorer/)
- MEV explorers like [Eigenphi](https://eigenphi.io/mev/ethereum/txr) for transactions, [mevboost.pics](https://mevboost.pics/) for bundles, and [beaconcha.in](https://beaconcha.in/) for blocks
- Nansen’s 2.0 of their token and wallet tracking product, with new features like “smart segments”
The dashboard layer hasn’t changed much. It’s still the wild west here. If you spend a day on Twitter, you’ll see charts from dozens of different platforms covering similar data but all with slight differences or twists. Verification and attribution are becoming a bigger issue, especially now with the large growth in both teams and chains.

Marketing-specific address explorers are on the rise. Teams like Spindl and Bello will lead the way here. Cross-chain explorers (and pre-chain ones, like MEV/intents) will see expansion in development.

Across platforms, wallets are still poorly labelled and tracked, and it’s getting worse with intents/account abstraction now. We don’t mean static labels like “Coinbase” but instead more dynamic ones like “Experienced Contract Deployer”. Some teams are trying to tackle this, such as [syve.ai](https://www.syve.ai/). It will also improve naturally alongside web3 social, which is growing mainly on [Farcaster](https://farcaster.xyz/).
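
As a toy illustration of what a dynamic label could mean under the hood, here is a hedged Python sketch. The thresholds and label names are invented for illustration, not taken from any real labeling product:

```python
# Hypothetical sketch: deriving a behavioral wallet label from activity
# stats, instead of a static label like "Coinbase". All thresholds and
# label names below are made up for illustration.
from dataclasses import dataclass

@dataclass
class WalletStats:
    contracts_deployed: int
    first_tx_days_ago: int
    tx_count: int

def dynamic_label(stats: WalletStats) -> str:
    """Pick a behavioral label based on the wallet's onchain history."""
    if stats.contracts_deployed >= 10:
        return "Experienced Contract Deployer"
    if stats.first_tx_days_ago > 365 and stats.tx_count > 1000:
        return "Power User"
    if stats.tx_count == 0:
        return "Fresh Wallet"
    return "Casual User"

print(dynamic_label(WalletStats(contracts_deployed=12, first_tx_days_ago=700, tx_count=5000)))
# Experienced Contract Deployer
```

In practice, computing even a simple heuristic like this requires aggregating a wallet's full history, which is why labeling depends on the indexing and query layers described in this page.
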


### Query: Raw, decoded, and abstracted data that can be queried {#query}

Most SQL query engines are cloud-based, so you can use an in-browser IDE to query against raw and aggregated data (like nft.trades or dex.trades). They also allow for community-defined tables, such as NFT wash-trading filters. All these products come with their own APIs for grabbing results from your queries.
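
Fetching results over one of these APIs usually looks something like the sketch below, modeled on the shape of Dune's v1 API. The query ID and API key are placeholders, and other platforms use different paths and headers; check your platform's API docs:

```python
# Hedged sketch: fetching saved-query results over HTTP, Dune-style.
# The query ID and API key below are placeholders.
API_BASE = "https://api.dune.com/api/v1"

def results_request(query_id: int, api_key: str) -> tuple[str, dict]:
    """Build the URL and auth header for fetching saved-query results."""
    url = f"{API_BASE}/query/{query_id}/results"
    headers = {"X-Dune-API-Key": api_key}
    return url, headers

url, headers = results_request(1234567, "YOUR_API_KEY")
print(url)
# Fetch with any HTTP client, e.g.:
#   resp = requests.get(url, headers=headers, timeout=30)
#   rows = resp.json()["result"]["rows"]
```
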

GraphQL APIs here let you define your own schemas (in TypeScript or SQL) and then generate a GraphQL endpoint by running the full blockchain history through your schema.
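
Querying such an endpoint is just an HTTP POST with a JSON body. The entity and field names in this sketch are hypothetical; a real subgraph defines its own schema:

```python
import json

# Hypothetical entity/field names for illustration; a real subgraph's
# schema determines what you can actually query.
query = """
{
  trades(first: 5, orderBy: timestamp, orderDirection: desc) {
    id
    amountUSD
    timestamp
  }
}
"""

# A GraphQL request body is just JSON with a "query" key, POSTed to
# the endpoint your indexing provider gives you.
body = json.dumps({"query": query})
print(body)
```
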

For predefined APIs (where you query prebuilt schemas), there are a ton of niche data providers not included in the chart above, covering domains like mempool, NFT, governance, orderbook, prices, and more.

Holistically, every platform has become a lot more efficient with their infra (meaning your queries run faster). Most platforms have explored advanced methods of getting data out, like ODBC connectors, data streams, S3 Parquet sharing, BigQuery/Snowflake direct transfers, etc.

Changes to existing products (as of 2024):

1. Query engines like [Dune](https://dune.com/) and [Flipside](https://flipsidecrypto.xyz/) have accepted there is more data than can possibly be ingested in custom data pipelines, and have launched products that allow the user to bring in that data instead. Flipside launched LiveQuery (query an API in SQL) and Dune launched uploads/syncs (upload a CSV or API, or sync your database to Dune directly).
2. [The Graph](https://thegraph.com/) has had to significantly beef up its infra to avoid losing market share to centralized subgraph players like Goldsky and Satsuma (Alchemy). It has partnered closely with [StreamingFast](https://www.streamingfast.io/), separating out the “reader” and “relayer” of data, and also introduced [substreams](https://thegraph.com/docs/en/substreams/), which allow you to write Rust-based subgraphs across chains.

No provider here is truly set up for the rollup world yet, either in terms of scaling ingestion or fixing cross-chain schemas/transformations. And by not ready, we mean not ready for the case of 500 rollups launching in a week. Dune has launched a [rollup ingestion product](https://dune.com/product/dune-catalyst) to start making this easier, especially if you use an existing RaaS provider like Conduit.

LLM query/dashboard products like Dune AI will start to gain stronger traction in certain domains, such as wallet analysis or token analysis. Labels datasets will play a strong part in enabling this.

### Define and Store: Create and store aggregations that rely on heavy data transformations {#define-and-store}

Raw data is great, but to get to better metrics, you need to be able to standardize and aggregate data across contracts of different protocols. Once you have aggregations, you can create new metrics and labels that enhance everyone's analysis. We've only included products that have active contribution from both inside and outside the platform’s team, and are publicly accessible.

The collaborative layer of data definition has not really evolved over the last year. Product teams and engineering are barely keeping up as is. To give a sense of growth rate here, in the month of December 2023:

- DeFiLlama adapters saw [230 pull requests from 150 contributors across 559 files](https://github.com/DefiLlama/DefiLlama-Adapters/pulse/monthly) out of about 3700 total files.
- Dune spellbook saw [127 pull requests from 42 contributors across 779 files](https://github.com/duneanalytics/spellbook/pulse/monthly) out of about 3700 total files.

You can probably consider every 2-3 files to be a new protocol/table, so 200+ tables are shifting around every month, and that number is only increasing with new chains and startups. If you’ve never worked with GitHub or data tables before, that is a significant amount of work to manage.

### Back onchain: Putting the data back into contracts using zero knowledge (ZK) tech {#back-onchain}

This layer is completely new, and still very nascent. The idea is that you can prove some data or computation was correctly collected using a “ZK circuit”, and then post that proof onchain (sometimes with outputs, depending on the application). You can get a sense of how this works by trying to create a simple identity proof yourself. This [presentation](https://www.youtube.com/watch?v=EVUELLiDjDA) also gives a good summary of these “ZK coprocessors”.

For historic data access and compute, the main players for now are [Herodotus](https://herodotus.dev/), [Axiom](https://www.axiom.xyz/), [Succinct](https://alpha.succinct.xyz/), [Lagrange](https://lagrange.dev/), and [Space and Time](https://www.spaceandtime.io/). They each have slightly different approaches to their prover and ZK stack, which impacts the type of data and calculations that you can verify and post using each tool.

ZK machine learning is the idea that you can prove that a certain inference came from a certain model that was trained on certain data. [EZKL](https://github.com/zkonduit/ezkl) is the backbone of most ZK machine learning stacks right now, other than Modulus, which is building its own custom ML model prover system. Ritual, Giza, and Spectral all use EZKL for the model-proof portion of their stacks for now.

ZK machine learning is not as simple as deploy-and-run (you can’t just host on Kubeflow), because you now need provers as part of the stack. Products like [Gensyn](https://www.gensyn.ai/), [Blockless](https://blockless.network/), and other AVS providers are working on forming a prover/compute marketplace.

To get a good sense of where ZK fits into the future, and some of the surrounding technologies enabling it, read these three articles:
- [“Modular Stack” for dummies](https://read.cryptodatabytes.com/p/the-future-of-transactions-for-dummies)
- [The Era of Soft Composability](https://read.cryptodatabytes.com/p/fbb45562-4541-4ea0-a65d-7f1a5c92bf12)
- [The Future of Digital Identity](https://dcbuilder.mirror.xyz/myIlus8pl6SbyuUR4ufGf9OYRps8hCGqeZHNYce3i94)

## Further reading {#further-reading}

- [Graph Network Overview](https://thegraph.com/docs/en/about/network/)
- [Graph Query Playground](https://thegraph.com/explorer/subgraph/graphprotocol/graph-network-mainnet?version=current)
- [API code examples on Etherscan](https://etherscan.io/apis#contracts)
- [Beaconcha.in Beacon Chain explorer](https://beaconcha.in)
- [Dune Basics](https://docs.dune.com/#dune-basics)
- [SubQuery Ethereum Quick Start Guide](https://academy.subquery.network/indexer/quickstart/quickstart_chains/ethereum-gravatar.html)
- [Andrew Hong's 2024 Data Landscape Guide](https://read.cryptodatabytes.com/p/2024-annual-guide-web3-data-tools)
- [Bytexplorers, an onchain data community that learns and earns together](https://read.cryptodatabytes.com/p/join-the-bytexplorers)
- [Learn SQL and Ethereum on Dune](https://read.cryptodatabytes.com/p/a-basic-wizard-guide-to-dune-sql)