Open XML SDK: Merging Documents
(This post courtesy Natalia Efimsteva)
Office Open XML (OpenXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The Office Open XML specification has been standardized by ECMA (ECMA-376) [wiki]. Open XML is the native format for MS Office 2007/2010.
Open XML lets you manipulate MS Office files in whatever way you need. For example, you can create .docx files programmatically on the server side, which wasn’t recommended for binary MS Office formats like .doc.
The Open XML SDK 2.0 for Microsoft Office is built on top of the System.IO.Packaging API and provides strongly typed part classes to manipulate Open XML documents. The SDK also uses the .NET Framework Language-Integrated Query (LINQ) technology to provide strongly typed object access to the XML content inside parts of Open XML documents.
The Open XML SDK 2.0 simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The Open XML Application Programming Interface (API) encapsulates many common tasks that developers perform on Open XML packages, so you can perform complex operations with just a few lines of code.
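As a rough illustration of the strongly typed classes and LINQ support described above, the following sketch opens a document read-only and prints the text of every non-empty paragraph. The file name Doc1.docx and the class name are assumptions made for the example; this is a minimal sketch, not a complete application.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class ListParagraphs
{
    static void Main()
    {
        // "Doc1.docx" is an assumed file name; false = open read-only.
        using (WordprocessingDocument doc = WordprocessingDocument.Open("Doc1.docx", false))
        {
            // Strongly typed access: MainDocumentPart -> Document -> Body.
            Body body = doc.MainDocumentPart.Document.Body;

            // LINQ over typed elements instead of raw XML nodes.
            var texts = body.Descendants<Paragraph>()
                            .Select(p => p.InnerText)
                            .Where(t => t.Trim().Length > 0);

            foreach (string text in texts)
            {
                Console.WriteLine(text);
            }
        }
    }
}
```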
Now let’s discuss a frequently asked question: how to merge Open XML documents programmatically. It’s not a very complicated task, but there are a few things to think about.
First of all, let’s look at the internal .docx structure. Below is an unzipped view:
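The same package structure can also be listed in code. The sketch below uses the System.IO.Packaging API (which the SDK is built on) to print each part's URI and content type. The file name Doc1.docx is an assumption, and the exact set of parts depends on the document.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase assembly on the .NET Framework

class ListPackageParts
{
    static void Main()
    {
        // "Doc1.docx" is an assumed file name; the package is opened read-only.
        using (Package package = Package.Open("Doc1.docx", FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // Typical entries include /word/document.xml and /word/styles.xml.
                Console.WriteLine("{0}  [{1}]", part.Uri, part.ContentType);
            }
        }
    }
}
```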
The Open XML SDK 2.0 includes a great tool, Document Explorer, which lets us view the XML markup as well as the .NET code that constructs this markup:
So when we merge documents, we need to merge not only the content (text) but also the document's styles and other formatting settings.
The Open XML SDK operates on Open XML elements such as paragraphs rather than on objects that are logical from the user's point of view, such as pages, content, and so on.
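To make this concrete: there is no "page" object to query, but page-level settings can be read from the section properties elements in the document body. The sketch below lists each section's orientation and how many header references it carries. The file name and the output format are assumptions made for the example.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class ListSections
{
    static void Main()
    {
        // "Doc1.docx" is an assumed file name.
        using (WordprocessingDocument doc = WordprocessingDocument.Open("Doc1.docx", false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            int index = 0;

            // Section properties appear inside paragraph properties and as the last child of the body.
            foreach (SectionProperties sectPr in body.Descendants<SectionProperties>())
            {
                PageSize size = sectPr.GetFirstChild<PageSize>();

                // The orient attribute is optional; when absent the page is portrait.
                string orientation = (size != null && size.Orient != null)
                    ? size.Orient.Value.ToString()
                    : "portrait (default)";

                int headerCount = sectPr.Elements<HeaderReference>().Count();

                Console.WriteLine("Section {0}: orientation = {1}, header references = {2}",
                    index++, orientation, headerCount);
            }
        }
    }
}
```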
But we have a tool that can make our life easier: DocumentBuilder from PowerTools for Open XML. Another way is to use altChunk. This element specifies a location within a document for the insertion of the contents of a specified file containing external content to be imported into the main WordprocessingML document. The differences between these two approaches are described in the post “Comparison of altChunk to the DocumentBuilder Class”. The rest of this post covers the DocumentBuilder approach.
Using the DocumentBuilder utility is really very simple:
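For comparison, the altChunk alternative can be sketched with the Open XML SDK alone. The example below copies Doc1.docx, embeds Doc2.docx in the copy as an alternative format import part, and places an altChunk element at the end of the body. The file names, the output name MergedWithAltChunk.docx, and the relationship id are assumptions made for illustration. Note that the embedded content is only imported when a consuming application such as Word opens the file.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class AltChunkMerge
{
    static void Main()
    {
        // Work on a copy so that Doc1.docx itself is not modified (assumed file names).
        File.Copy("Doc1.docx", "MergedWithAltChunk.docx", true);

        using (WordprocessingDocument target =
            WordprocessingDocument.Open("MergedWithAltChunk.docx", true))
        {
            const string altChunkId = "AltChunkId1"; // hypothetical relationship id
            MainDocumentPart mainPart = target.MainDocumentPart;

            // Embed the second document as an alternative format import part.
            AlternativeFormatImportPart chunkPart = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream source = File.OpenRead("Doc2.docx"))
            {
                chunkPart.FeedData(source);
            }

            // The altChunk element marks where the embedded content should be imported.
            AltChunk altChunk = new AltChunk { Id = altChunkId };
            Body body = mainPart.Document.Body;

            // Keep the final section properties as the last child of the body.
            SectionProperties finalSectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (finalSectPr != null)
            {
                body.InsertBefore(altChunk, finalSectPr);
            }
            else
            {
                body.AppendChild(altChunk);
            }

            mainPart.Document.Save();
        }
    }
}
```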
    // Open both source documents read-only.
    using (WordprocessingDocument part1 = WordprocessingDocument.Open(@"Doc1.docx", false))
    using (WordprocessingDocument part2 = WordprocessingDocument.Open(@"Doc2.docx", false))
    {
        List<Source> sources = new List<Source>();
        // The second argument (keepSections) controls whether section properties are kept.
        sources.Add(new Source(part1, true));
        sources.Add(new Source(part2, true));
        DocumentBuilder.BuildDocument(sources, "MergedDoc.docx");
    }
The most interesting part is the second argument of the Source class constructor. Using the keepSections argument appropriately allows you to precisely control which sets of section properties (visual formatting, in other words) are moved from the source documents into the destination document. For more information, please see the post “How to Control Sections when using OpenXml.PowerTools.DocumentBuilder”.
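Later releases of PowerTools for Open XML changed the Source constructor to take a WmlDocument instead of a WordprocessingDocument. A minimal sketch of the same merge against such a release follows. It assumes the OpenXmlPowerTools namespace and the WmlDocument(string) constructor found in the later open source releases, so check the version you actually reference.

```csharp
using System.Collections.Generic;
using OpenXmlPowerTools; // assumed namespace of the later PowerTools releases

class MergeWithWmlDocument
{
    static void Main()
    {
        // keepSections = true keeps each source document's section properties.
        List<Source> sources = new List<Source>
        {
            new Source(new WmlDocument("Doc1.docx"), true),
            new Source(new WmlDocument("Doc2.docx"), true)
        };

        // Writes the merged result to MergedDoc.docx.
        DocumentBuilder.BuildDocument(sources, "MergedDoc.docx");
    }
}
```

Passing false for keepSections on a source should leave that source's section formatting out of the result, which is the control described above.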
We have two documents:
Doc1.docx
Doc2.docx
DocumentBuilder does all of the work of merging the two documents for you, preserving:
- formatting
- page numbers (including Link Sections)
- headers and footers
- orientation
- and so on.
That’s magic!
Additional Resources
- Microsoft Office Developer Center
- Open XML SDK 2.0 for Microsoft Office
- Open XML Developer Site
- XML in Office
Comments
Anonymous
February 22, 2012
Is there a way to keep each merged document content inside a content control?
Anonymous
April 15, 2012
The code doesn't work anymore; new Source now takes a WmlDocument and not a WordprocessingDocument. Is that a change in OpenXmlPowerTools?
- Anonymous
October 04, 2017
You can also use the filename for the document.
Anonymous
December 05, 2012
Hello, Using your article I created an application which transforms the output of SSRS reports (more than one) into an assembled Word document. The test documents generated so far open perfectly in MS Word 2007 and 2010. Everything works great! Thank you so much. Now there is a not-so-good part to this story. Some or most of these documents fail to open in MS Office 2002 or older versions (with the latest compatibility packs). But if the generated documents are opened and saved with MS Office 2007/2010 and then opened in MS Office 2002 or XP, they open without any problem. To achieve this (opening documents using MS Office 2010 and saving them) I have created a drop-box application which listens to a shared folder (drop box) for new documents, then opens them and saves them to a delivery location that the user can access. This application uses "Microsoft.Office.Interop.Word" to launch Word and open the document. The application works for documents which don't have "altChunk", and it fails for the documents which my document assembly application generates. Can you please shed some light on how this can be troubleshot or resolved? Any help and pointers are appreciated. Regards, AJ
Anonymous
November 11, 2013
Hi, is it possible to use the Open XML SDK for GUI-style processing, such as finding a range in the active document? It would be helpful if I could do everything the Office API provides without using the Office API.