Azure Data Factory Copy Activity - MongoDB Upsert Failing to Update Existing Documents

Ramakrishna Vaitla 0 Reputation points
2025-03-13T09:59:04.5833333+00:00

I'm using Azure Data Factory's Copy Data activity to load data from a CSV file into a MongoDB collection. My goal is to use the "upsert" operation to either insert new documents or update existing ones based on the _id field. However, I'm encountering an issue where the Copy activity consistently inserts new documents, even when documents with the same _id already exist in the MongoDB collection.

Scenario:

  • Source: CSV file in Azure Blob Storage (or other supported source).
  • Sink: MongoDB Atlas
  • Operation: Upsert
  • Problem: Instead of updating existing documents with matching _id values, the Copy activity inserts new documents with the same _id. This results in duplicate documents.

CSV Data Example:

  _id,name,gender,age
67c88f641b4dd8cef2342e9d,Sam,M,25
67c88f641b4dd8cef2342e9d,Tam,F,23

In this example, I expect the second row to update the document with _id: 67c88f641b4dd8cef2342e9d, but instead, a new document is inserted with type string for _id in mongoDB instead of ObjectId.

Key Considerations:

  • _id Data Type: The _id field in my MongoDB collection is of type ObjectId. The _id field in my source CSV is a string representation of the ObjectId.
  • Mapping: I've configured the Copy activity's mapping to correctly map the CSV's _id column to the MongoDB document's _id field.
  • Sink Configuration: The Copy activity is configured to use the "Upsert" write behavior.

Copy Activity Configuration Details:

  • Write Behavior: Upsert
  • Mapping:
  {
"mappings": [
  {
    "source": {
      "name": "_id",
      "type": "String"
    },
    "sink": {
      "path": "$['_id']"
    }
  },
  {
    "source": {
      "name": "name",
      "type": "String"
    },
    "sink": {
      "path": "$['name']"
    }
  },
  {
    "source": {
      "name": "gender",
      "type": "String"
    },
    "sink": {
      "path": "$['gender']"
    }
  },
  {
    "source": {
      "name": "age",
      "type": "int16"
    },
    "sink": {
      "path": "$['age']"
    }
  }
]
}

Troubleshooting Steps Taken:

  • Verified the connection to both the source and sink datasets.
  • Confirmed that the _id field is correctly mapped in the Copy activity (I couldn't find any thing other than string in respect to it's counterpart in MongoDB which. is ObjectId.
  • Reviewed the Copy activity's monitoring output, which indicates that rows are being "written" (inserted) rather than "updated."
  • Tried using different data types for the _id field in the mapping (e.g., String).

Questions:

  1. Is there a known issue with the Azure Data Factory Copy activity's upsert functionality when dealing with ObjectId fields in MongoDB?
  2. Does the Copy activity automatically handle the conversion of a string representation of an ObjectId from the CSV source to an actual ObjectId in MongoDB? If not, how can I perform this conversion within the Copy activity or a preceding activity?
  3. Are there any specific settings or configurations in the Copy activity or the MongoDB sink dataset that I might be missing that are crucial for the upsert operation to work correctly with ObjectId fields?
  4. Is there a better approach to achieve this, perhaps using a different activity or a different data flow pattern?
  5. Are there any limitations to the upsert operation in ADF when copying data to MongoDB?

I would appreciate any guidance or suggestions on how to resolve this issue and ensure that the Copy activity correctly updates existing documents in my MongoDB collection based on the _id field. Thank you!

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,343 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Ramakrishna Vaitla 0 Reputation points
    2025-03-13T15:46:56.6666667+00:00

    I played around and I found out answer from https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db-mongodb-api?source=recommendations&tabs=data-factory#schema-mapping.

    I had to map _id to $['_id']['$oid'] in mapping to make the upsert work.

    1 person found this answer helpful.

  2. Chandra Boorla 9,995 Reputation points Microsoft External Staff
    2025-03-13T18:19:26.1466667+00:00

    @Ramakrishna Vaitla

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others ", I'll repost your solution in case you'd like to accept the answer.

    Issue:

    Azure Data Factory Copy Activity - MongoDB Upsert Failing to Update Existing Documents

    Solution:

    "To ensure that the upsert operation functioned correctly, I had to map the _id field in the source data to the corresponding JSON path $['_id']['$oid'] in the mapping configuration. This adjustment was necessary to properly extract and utilize the ObjectId value from the nested structure."

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.