Add Copy spreadsheet with SAX sample #336

Open: wants to merge 3 commits into base: main
158 changes: 158 additions & 0 deletions docs/spreadsheet/how-to-copy-a-worksheet-with-sax.md
@@ -0,0 +1,158 @@
---
api_name:
- Microsoft.Office.DocumentFormat.OpenXML.Packaging
api_type:
- schema
ms.assetid: 2ad4855c-1c83-4dab-b93f-2bae13fac644
title: 'How to: Copy a Worksheet Using SAX (Simple API for XML)'
ms.suite: office

ms.author: o365devx
author: o365devx
ms.topic: conceptual
ms.date: 04/01/2025
ms.localizationpriority: high
---
# Copy a Worksheet Using SAX (Simple API for XML)

This topic shows how to use the Open XML SDK for Office to programmatically copy a large worksheet
using SAX (Simple API for XML). For more information about the basic structure of a `SpreadsheetML`
document, see [Structure of a SpreadsheetML document](structure-of-a-spreadsheetml-document.md).

------------------------------------
## Why Use the SAX Approach?

The Open XML SDK provides two ways to parse Office Open XML files: the Document Object Model (DOM) and
the Simple API for XML (SAX). The DOM approach is designed to make it easy to query and parse Open XML
files by using strongly typed classes. However, the DOM approach requires loading entire Open XML parts into
memory, which can lead to slower processing and `Out of Memory` exceptions when working with very large parts.
The SAX approach reads the XML in an Open XML part one element at a time without loading the entire part
into memory. This gives noncached, forward-only access to the XML data, which makes it a better choice when reading
very large parts, such as a <xref:DocumentFormat.OpenXml.Packaging.WorksheetPart> with hundreds of thousands of rows.
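
As a rough illustration of forward-only reading, the sketch below counts the rows in the first worksheet without building a DOM for the part. It assumes the `DocumentFormat.OpenXml` NuGet package is referenced; `CountRows` is a hypothetical helper name and is not part of the sample in this topic.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper (not part of the sample): counts the rows of the first
// worksheet by streaming through the part instead of loading it as a DOM tree.
static long CountRows(string path)
{
    // Open read-only; nothing is modified
    using SpreadsheetDocument document = SpreadsheetDocument.Open(path, false);
    WorksheetPart? worksheetPart = document.WorkbookPart?.WorksheetParts.FirstOrDefault();

    if (worksheetPart is null)
    {
        return 0;
    }

    long rows = 0;

    using OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

    // Each call to Read advances to the next element; earlier elements are not kept
    while (reader.Read())
    {
        // Count only start elements so that a row is not counted twice
        if (reader.ElementType == typeof(Row) && reader.IsStartElement)
        {
            rows++;
        }
    }

    return rows;
}
```

Calling `CountRows(path)` on a large workbook should only ever hold the current element in memory, which is the property the rest of this topic relies on.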

## Using the DOM Approach

Using the DOM approach, we can take advantage of the Open XML SDK's strongly typed classes. The first step
is to access the package's `WorksheetPart` and make sure that it is not null.

### [C#](#tab/cs-1)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet1)]

### [Visual Basic](#tab/vb-1)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet1)]
***

Once it is determined that the `WorksheetPart` to be copied is not null, add a new `WorksheetPart` to copy it to.
Then clone the `WorksheetPart`'s <xref:DocumentFormat.OpenXml.Spreadsheet.Worksheet> and assign the cloned
`Worksheet` to the new `WorksheetPart`'s `Worksheet` property.

### [C#](#tab/cs-2)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet2)]

### [Visual Basic](#tab/vb-2)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet2)]
***

At this point, the new `WorksheetPart` has been added, but a new <xref:DocumentFormat.OpenXml.Spreadsheet.Sheet>
element must be added as a child of the `WorkbookPart`'s <xref:DocumentFormat.OpenXml.Spreadsheet.Sheets>
element for it to display. To do this, first find the new `WorksheetPart`'s Id and
create a new sheet Id by adding one to the number of `Sheets` children, then append a new `Sheet`
child to the `Sheets` element. With this, the copied worksheet is added to the file.

### [C#](#tab/cs-3)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet3)]

### [Visual Basic](#tab/vb-3)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet3)]
***
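
Counting the `Sheets` children and adding one works when the existing sheet Ids are contiguous, but it can produce a duplicate Id if a sheet was deleted earlier (for example, if Ids 1 and 3 remain, the count is 2 and the new Id would also be 3). A minimal sketch of one alternative, which is not what the sample does, is to take the highest existing Id plus one. `NextSheetId` is a hypothetical helper name, written against plain `uint` values so that it runs without the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the sample): the next free sheet Id is the
// highest existing Id plus one, or 1 when there are no sheets yet.
static uint NextSheetId(IEnumerable<uint> existingIds) =>
    existingIds.DefaultIfEmpty(0u).Max() + 1;

// With the Open XML SDK, the existing Ids could be read from the Sheets element:
//   sheets.Elements<Sheet>().Select(s => s.SheetId?.Value ?? 0u)
Console.WriteLine(NextSheetId(new uint[] { 1, 3 }));   // prints 4; count + 1 would give 3, a duplicate
Console.WriteLine(NextSheetId(Array.Empty<uint>()));   // prints 1
```

Whether this matters depends on the workbooks being processed; for files that never had a sheet removed, both calculations give the same result.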

## Using the SAX Approach

The SAX approach also works on parts, so the first step is the same:
access the package's <xref:DocumentFormat.OpenXml.Packaging.WorksheetPart> and make sure
that it is not null.

### [C#](#tab/cs-4)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet4)]

### [Visual Basic](#tab/vb-4)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet4)]
***

With SAX, the <xref:DocumentFormat.OpenXml.OpenXmlElement.Clone*>
method is not available. Instead, start by adding a new `WorksheetPart` to the `WorkbookPart`.

### [C#](#tab/cs-5)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet5)]

### [Visual Basic](#tab/vb-5)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet5)]
***

Then create an instance of the <xref:DocumentFormat.OpenXml.OpenXmlPartReader> with the
original worksheet part and an instance of the <xref:DocumentFormat.OpenXml.OpenXmlPartWriter>
with the newly created worksheet part.

### [C#](#tab/cs-6)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet6)]

### [Visual Basic](#tab/vb-6)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet6)]
***

Then read the elements one by one with the <xref:DocumentFormat.OpenXml.OpenXmlPartReader.Read*>
method. If the element is a <xref:DocumentFormat.OpenXml.Spreadsheet.CellValue>, its inner text
must be written explicitly: read it with the <xref:DocumentFormat.OpenXml.OpenXmlPartReader.GetText*>
method and write it with `WriteString`, because the <xref:DocumentFormat.OpenXml.OpenXmlPartWriter.WriteStartElement*>
method does not write the inner text of an element. For other elements, calling `WriteStartElement`
is enough, because their inner text is not needed in this sample.

### [C#](#tab/cs-7)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet7)]

### [Visual Basic](#tab/vb-7)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet7)]
***
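
The sample only copies inner text for `CellValue` elements. If a worksheet also contains other text-bearing elements, such as formulas, one possible variation is to write the text of any element for which `GetText` returns a non-empty string. The sketch below assumes the `DocumentFormat.OpenXml` NuGet package is referenced and relies on `GetText` returning an empty string for elements that are not leaf text elements; `CopyWorksheetPart` is a hypothetical helper name and is not part of the sample.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper (not part of the sample): streams every element from one
// worksheet part to another, including the inner text of any leaf text element.
static void CopyWorksheetPart(WorksheetPart source, WorksheetPart target)
{
    using OpenXmlReader reader = OpenXmlReader.Create(source);
    using OpenXmlWriter writer = OpenXmlWriter.Create(target);

    // Write the XML declaration
    writer.WriteStartDocument();

    while (reader.Read())
    {
        if (reader.IsStartElement)
        {
            // Writes the element name and its attributes, but not its inner text
            writer.WriteStartElement(reader);

            // Assumption: GetText is empty for elements that are not leaf text elements
            string text = reader.GetText();

            if (text.Length > 0)
            {
                writer.WriteString(text);
            }
        }
        else if (reader.IsEndElement)
        {
            writer.WriteEndElement();
        }
    }
}
```

This trades the explicit `CellValue` check for a more general one, so it is worth verifying against representative workbooks before relying on it.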

At this point, the worksheet part has been copied to the newly added part, but as with the DOM
approach, we still need to add a `Sheet` to the `Workbook`'s `Sheets` element. Because
the SAX approach gives noncached, **forward-only** access to XML data, it is only possible to
prepend element children, which in this case would add the new worksheet to the beginning instead
of the end, changing the order of the worksheets. The DOM approach is therefore
used here, because we want to append, not prepend, the new `Sheet`. And because the `WorkbookPart` is
not usually a large part, the performance gains from SAX would be minimal.

### [C#](#tab/cs-8)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet8)]

### [Visual Basic](#tab/vb-8)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet8)]
***

## Sample Code

Below is the sample code for both the DOM and SAX approaches to copying the data from one sheet
to a new one and adding it to the spreadsheet document. While the DOM approach is simpler
and in many cases the preferred choice, with very large documents the SAX approach is better
because it is faster and can prevent `Out of Memory` exceptions. To see the difference,
create a spreadsheet document with many (10,000+) rows and compare the execution times reported by the
<xref:System.Diagnostics.Stopwatch>. Increase the
number of rows to 100,000+ to see even more significant performance gains.
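
If no large workbook is at hand, one way to generate a test file is to write it with <xref:DocumentFormat.OpenXml.OpenXmlWriter>. The following is a minimal sketch, assuming the `DocumentFormat.OpenXml` NuGet package is referenced; `CreateLargeWorkbook`, the sheet name, and the cell values are all made up for illustration.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper (not part of the sample): creates a workbook with one
// worksheet containing rowCount rows of numeric cells.
static void CreateLargeWorkbook(string path, uint rowCount, int columnCount = 10)
{
    using SpreadsheetDocument document = SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook);
    WorkbookPart workbookPart = document.AddWorkbookPart();
    WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

    // Stream the rows so that the generator itself does not build a large DOM
    using (OpenXmlWriter writer = OpenXmlWriter.Create(worksheetPart))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());

        for (uint row = 1; row <= rowCount; row++)
        {
            writer.WriteStartElement(new Row { RowIndex = row });

            for (int column = 0; column < columnCount; column++)
            {
                // Arbitrary numeric content; cells without a data type are read as numbers
                string value = (row * 100 + column).ToString();
                writer.WriteElement(new Cell { CellValue = new CellValue(value) });
            }

            writer.WriteEndElement(); // Row
        }

        writer.WriteEndElement(); // SheetData
        writer.WriteEndElement(); // Worksheet
    }

    // The workbook part is small, so the DOM is fine here
    workbookPart.Workbook = new Workbook(
        new Sheets(
            new Sheet { Name = "Data", SheetId = 1, Id = workbookPart.GetIdOfPart(worksheetPart) }));
}
```

For example, `CreateLargeWorkbook(path, 100_000)` would produce a file that can be passed to both copy methods. Use a separate copy of the file for each method so that both runs start from the same input.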

### DOM Approach

### [C#](#tab/cs-0)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet0)]

### [Visual Basic](#tab/vb-0)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet0)]
***

### SAX Approach

### [C#](#tab/cs-99)
[!code-csharp[](../../samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs#snippet99)]

### [Visual Basic](#tab/vb-99)
[!code-vb[](../../samples/spreadsheet/copy_worksheet_with_sax/vb/Program.vb#snippet99)]
***
14 changes: 14 additions & 0 deletions samples/samples.sln
@@ -320,6 +320,10 @@ Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "working_with_tables_vb", "w
EndProject
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "insert_a_picture_vb", "word\insert_a_picture\vb\insert_a_picture_vb.vbproj", "{6170C4E1-A109-435A-BF59-026C85B3BD9C}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "copy_worksheet_with_sax_cs", "spreadsheet\copy_worksheet_with_sax\cs\copy_worksheet_with_sax_cs.csproj", "{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}"
EndProject
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "copy_worksheet_with_sax_vb", "spreadsheet\copy_worksheet_with_sax\vb\copy_worksheet_with_sax_vb.vbproj", "{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -938,6 +942,14 @@ Global
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Debug|Any CPU.Build.0 = Debug|Any CPU
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.Build.0 = Release|Any CPU
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Debug|Any CPU.Build.0 = Debug|Any CPU
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Release|Any CPU.ActiveCfg = Release|Any CPU
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D}.Release|Any CPU.Build.0 = Release|Any CPU
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Debug|Any CPU.Build.0 = Debug|Any CPU
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Release|Any CPU.ActiveCfg = Release|Any CPU
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -1095,6 +1107,8 @@ Global
{A43A75AB-D6B6-4D31-99F7-6951AFEF502D} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{4EB1FCC9-E1E2-4D2A-ACF9-A3A31AA947A5} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{6170C4E1-A109-435A-BF59-026C85B3BD9C} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{0AA6B9DD-2A2C-0E96-1052-6F4AC44B3F5D} = {7ACDC26B-C774-4004-8553-87E862D1E71F}
{2DD90EFB-7F2A-497B-A0F4-EE5F62A49BA4} = {7ACDC26B-C774-4004-8553-87E862D1E71F}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {721B3030-08D7-4412-9087-D1CFBB3F5046}
144 changes: 144 additions & 0 deletions samples/spreadsheet/copy_worksheet_with_sax/cs/Program.cs
@@ -0,0 +1,144 @@


using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Diagnostics;

CopySheetDOM(args[0]);
CopySheetSAX(args[1]);

// <Snippet0>
void CopySheetDOM(string path)
{
Console.WriteLine("Starting DOM method");

Stopwatch sw = new();
sw.Start();
// <Snippet1>
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, true))
{
// Get the first sheet
WorksheetPart? worksheetPart = spreadsheetDocument.WorkbookPart?.WorksheetParts?.FirstOrDefault();

if (worksheetPart is not null)
// </Snippet1>
{
// <Snippet2>
// Add a new WorksheetPart
WorksheetPart newWorksheetPart = spreadsheetDocument.WorkbookPart!.AddNewPart<WorksheetPart>();

// Make a copy of the original worksheet
Worksheet newWorksheet = (Worksheet)worksheetPart.Worksheet.Clone();

// Add the new worksheet to the new worksheet part
newWorksheetPart.Worksheet = newWorksheet;
// </Snippet2>

Sheets? sheets = spreadsheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>();

if (sheets is null)
{
// Keep a reference to the new Sheets element so it can be used below
sheets = new Sheets();
spreadsheetDocument.WorkbookPart.Workbook.AddChild(sheets);
}

// <Snippet3>
// Find the new WorksheetPart's Id and create a new sheet id
string id = spreadsheetDocument.WorkbookPart.GetIdOfPart(newWorksheetPart);
uint newSheetId = (uint)(sheets!.ChildElements.Count + 1);

// Append a new Sheet with the WorksheetPart's Id and sheet id to the Sheets element
sheets.AppendChild(new Sheet() { Name = "My New Sheet", SheetId = newSheetId, Id = id });
// </Snippet3>
}
}

sw.Stop();

Console.WriteLine($"DOM method took {sw.Elapsed.TotalSeconds} seconds");
}
// </Snippet0>

// <Snippet99>
void CopySheetSAX(string path)
{
Console.WriteLine("Starting SAX method");

Stopwatch sw = new();
sw.Start();
// <Snippet4>
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, true))
{
// Get the first sheet
WorksheetPart? worksheetPart = spreadsheetDocument.WorkbookPart?.WorksheetParts?.FirstOrDefault();

if (worksheetPart is not null)
// </Snippet4>
{
// <Snippet5>
WorksheetPart newWorksheetPart = spreadsheetDocument.WorkbookPart!.AddNewPart<WorksheetPart>();
// </Snippet5>

// <Snippet6>
using (OpenXmlReader reader = OpenXmlPartReader.Create(worksheetPart))
using (OpenXmlWriter writer = OpenXmlPartWriter.Create(newWorksheetPart))
// </Snippet6>
{
// <Snippet7>
// Write the XML declaration with the version "1.0".
writer.WriteStartDocument();

// Read the elements from the original worksheet part
while (reader.Read())
{
// If the ElementType is CellValue it's necessary to explicitly add the inner text of the element
// or the CellValue element will be empty
if (reader.ElementType == typeof(CellValue))
{
if (reader.IsStartElement)
{
writer.WriteStartElement(reader);
writer.WriteString(reader.GetText());
}
else if (reader.IsEndElement)
{
writer.WriteEndElement();
}
}
// For other elements write the start and end elements
else
{
if (reader.IsStartElement)
{
writer.WriteStartElement(reader);
}
else if (reader.IsEndElement)
{
writer.WriteEndElement();
}
}
}
// </Snippet7>
}

// <Snippet8>
Sheets? sheets = spreadsheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>();

if (sheets is null)
{
// Keep a reference to the new Sheets element so it can be used below
sheets = new Sheets();
spreadsheetDocument.WorkbookPart.Workbook.AddChild(sheets);
}

string id = spreadsheetDocument.WorkbookPart.GetIdOfPart(newWorksheetPart);
uint newSheetId = (uint)(sheets!.ChildElements.Count + 1);

sheets.AppendChild(new Sheet() { Name = "My New Sheet", SheetId = newSheetId, Id = id });
// </Snippet8>

}
}

// Stop the timer after the document is disposed so that saving is included, as in CopySheetDOM
sw.Stop();

Console.WriteLine($"SAX method took {sw.Elapsed.TotalSeconds} seconds");
}
// </Snippet99>
@@ -0,0 +1,10 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

</Project>