Skip to content

SAX Approach Replace Example #566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AlexHedley opened this issue Mar 6, 2019 · 15 comments
Closed

SAX Approach Replace Example #566

AlexHedley opened this issue Mar 6, 2019 · 15 comments

Comments

@AlexHedley
Copy link
Contributor

I'm looking for an example of using the SAX approach (instead of DOM) to open a large file and perform a replace on a given value, then save the file.

I think this would be a useful addition to the Documentation.

How to: Search and replace text in a document part (Open XML SDK)
https://docs.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

This uses a Stream.

How to: Parse and read a large spreadsheet document (Open XML SDK)
https://docs.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet

// The SAX approach.
static void ReadExcelFileSAX(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
        string text;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(CellValue))
            {
                text = reader.GetText();
                Console.Write(text + " ");
            }
        }
        Console.WriteLine();
        Console.ReadKey();
    }
}

If I read in the "text" and wish to replace this:

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
    Document document = wordDoc.MainDocumentPart.Document;
    OpenXmlReader reader = OpenXmlReader.Create(document);
    while (reader.Read())
    {
        //OpenXmlElement element = reader.LoadCurrentElement();
        //text = element.InnerText;
        text = reader.GetText();
    }
}

OpenXmlWriter expects an OpenXmlPart or Stream when you create it.

OpenXmlWriter writer = OpenXmlWriter.Create(#);
writer.WriteStartElement(reader);
writer.WriteElement(#);
writer.WriteEndElement();
writer.Close();

What is the supported approach/method for this?

Using the DOM approach on large files can cause memory exceptions.


There are a number of blog posts documenting how to use the OpenXmlWriter but this is for creating new files or adding new elements to an existing file, not updating existing data.

Parsing and Reading Large Excel Files with the Open XML SDK
http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link]
https://web.archive.org/web/20151205145806/http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx

Writing Large Excel Files with the Open XML SDK
http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx [Dead Link]
https://web.archive.org/web/20160216062257/http://blogs.msdn.com/b/brian_jones/archive/2010/06/22/writing-large-excel-files-with-the-open-xml-sdk.aspx

Performance issue while reading/writing large excel files using OpenXML SDK
http://tech-turf.blogspot.com/2015/10/performance-issue-while-readingwriting.html

How to read and write Excel cells with OpenXML and C#
http://fczaja.blogspot.com/2013/05/how-to-read-and-write-excel-cells-with.html

How to properly use OpenXmlWriter to write large Excel files
http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

@github-actions
Copy link

Stale issue message

@AlexHedley
Copy link
Contributor Author

Hi @twsouthwick I’ve see you comment on other issues for this repo, would it be possible for a comment on this?

Also is there a reason why this would be auto closed with no interactions?

I know other questions have been suggested to moved to Stack Overflow, could instead the new Discussions tab be added and it moved to there instead, if you don’t think this is an Issue,

Thanks

@twsouthwick
Copy link
Member

Sorry for the auto close. I enabled a bot that went through and closed issues that had no comments on it. Happy to reopen and take a look.

@twsouthwick twsouthwick reopened this May 22, 2020
@AlexHedley
Copy link
Contributor Author

@twsouthwick if you have some time that would be great, thanks.

@twsouthwick twsouthwick reopened this Jul 13, 2020
@sorensenmatias
Copy link
Contributor

+1 for this - I am currently faced with a presentation containing lots of vector graphics that ends up allocating 15.000.000 objects in memory using the DOM approach. This basically makes our product unusable. I would also love to hear about workaround to avoid loading everything to memory before it is possible to do manipulation on elements.

@twsouthwick
Copy link
Member

I'm going to close this in favor of #1193 since that has a proposal for this approach. Please reopen if you think it's different.

@AlexHedley
Copy link
Contributor Author

@twsouthwick I'm not sure how reading the last element would allow for replacing an item anywhere in the document, unless I'm missing something?

@twsouthwick
Copy link
Member

It may not solve your need - the title had SAX in it and was trying to consolidate efforts and issues. Sounds like different scenarios at play

@twsouthwick twsouthwick reopened this Feb 3, 2023
@rsdelapaz
Copy link

Finally, as I couldn't find a workaround for this bug, I gave a chance to the LargeXlsx library (https://github.com/salvois/LargeXlsx) and after running several tests in a development environment, I deployed the application to a production environment where I always received the OutOfMemory exception on Azure, and since then it has not occurred again despite generating Excel files with hundreds of thousands of records. The library is easy to use and the documentation is good.

In summary, due to the need to solve this problem, I had to look for alternatives, and fortunately, they were successful. If you have any questions, do not hesitate to ask. Regards.

@AlfredHellstern
Copy link
Contributor

@mikeebowen can you come up with a SAX example?

@mikeebowen mikeebowen pinned this issue Apr 3, 2025
@mikeebowen
Copy link
Collaborator

open-xml-docs 336 adds a sample to replace text in a Word document using the SAX approach. This issue can be closed when it is merged

@AlexHedley
Copy link
Contributor Author

Does OfficeDev/open-xml-docs#337 output a link to the deployed page?

@mikeebowen
Copy link
Collaborator

The PR has been merged, but the updated docs haven't been deployed. When the published doc is live, I'll add the link here and close this issue

@mikeebowen
Copy link
Collaborator

Closing this, because the new documentation is live one learn.microsoft:

@AlexHedley
Copy link
Contributor Author

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants