Partilhar via


SQLIOSim - Outstanding I/O Warnings

I have encountered some questions pertaining to the SQLIOSim outstanding IO warnings and I am posting some details.

The Display monitor thread is responsible for updating the display (progress outputs) and checking for long IOs (CheckForLongIO/ReportLongIo). Each time it wakes (from 1 to 10 second range depending on state) it fires the check.

Each logical CPU (scheduler) in the simulator has an IO list. The CheckForLongIO loops over each logical CPU and checks the pending IO list of each logical scheduler.

When the IO is posted (async) we grab the current time and add the IO to the logical scheduler specific list. When the check is performed each IO in the list is checked with HasOverlappedIoCompleted . This is a WIN32 macro that checks the status stored in the OVERLAPPED structure of the IO to see if the system has reported the IO as complete. When FALSE is returned and the duration of the IO is > 15 seconds the condition is flagged as a ‘stalled’ IO as the OS and sub-system have not completed the specific IO.

The number of these outstanding IOs is accumulated and reported after all logical CPUs have been checked.

The following output shows 2 logical CPUs being used by the simulator.

03/10/07

15:35:58

3264

System

CPUCount = 2, Affinity = 0, IOAffinity = 0, MaxMemory = 630 MB, StopOnError = Yes, TestCycles = 1, TestCycleDuration = 300 s, CacheHitRatio = 100.00%

At 15:36:46 you have 5 IO requests that exceeded 15 seconds. One second later another 72 requests exceeded the 15 second mark. (The same IO is not reported again, once flagged as long it won’t be counted on secondary passes.)

What this does not show is how many IOs have completed. SQLIOSim could have completed 1000s of IOs during this 15 second window but some of the requests are being stalled.

03/10/07

15:36:45

3764

Display Monitor

Error: 0x00000000 Error Text: Description: 5 IO requests are outstanding for more than 15 sec.

03/10/07

15:36:46

3764

Display Monitor

Error: 0x00000000 Error Text: Description: 72 IO requests are outstanding for more than 15 sec.

The files are open with NO buffering so they should not involve system cache components. You should review filter drivers and the like. Likely something like elevator seek or something of this nature is at play when you see the outstanding IO warnings.

 

03/10/07

15:35:58

3264

System

NoBuffering = Yes, WriteThrough = Yes, UseScatterGather = Yes, ForceReadAhead = No, MaxOutstandingIO = 0, TargetIODuration = 100 ms

LOGICALCONFIG::PrintConfigurationToLog

e:yukonsosbranchsqlntdbmsstorengutilsqliosimlogicalconfig.cpp

228

 

03/10/07

15:35:58

3264

System

AllowIOBursts = Yes, DeleteFilesAtStartup = Yes, DeleteFilesAtShutdown = No, StampFiles = No

     

The following error text is reported for each, individual file no more than every 30 seconds, indicating the number of total IOs identified as taking longer than 15 seconds.

The following indicates that 826 IO requests for the file sqliosim.mdx exceeded 15 seconds to complete in the last 30 second processing window. The last IO that is currently STATUS_PENDING (256) is at the specified offset on the specified handle.

 

03/10/07

15:37:16

3764

Display Monitor

Comments

  • Anonymous
    June 28, 2007
    One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter

  • Anonymous
    July 18, 2007
    I had run the test for HP MSA drive wiht one local drive. But for MSA drvie it is printing following errors:- 19/07 12:43:56 11688 CreateFileStream Error: 0x80070032 Error Text: The request is not supported. Description: Unable to get disk cache info for Z:

  • Anonymous
    October 08, 2007
    The utility attempts to check the caching state.   When it can't it logs an error.  This should be a warning and will be updated in future versions. [RDORR Jan 24, 2008] This is correct and is already updated in the SQL 2008 CTP builds.

  • Anonymous
    March 04, 2009
    "[RDORR Jan 24, 2008] This is correct and is already updated in the SQL 2008 CTP builds." In SQL2008 on Windows 2008, I still get a similar error message from "CreateFileStream":    Error: 0x80070001    Error Text: Incorrect function.    Description: Unable to get disk cache info for C: Version: 2007.100.1600.22 Is it safe to say that it is just a warning? not an error?

  • Anonymous
    June 09, 2009
    According to this post, "The CheckForLongIO calls ReportLongIO but ReportLongIO has the 30 second reporting throttle." In my case it looks like it is in the reverse order. Can you please help me to understand is this is a potential problem. 10:47:09 Description: 1 long IO requests are outstanding for file E:sqliosim.mdx. Handle 0x598. Offset of latest long IO 0xab5f40000, duration 15366 ms CLogicalFile::ReportLongIO 10:47:09 Description: 298 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:11 Description: 385 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:13 Description: 282 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:14 Description: 915 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:15 Description: 830 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:16 Description: 723 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:17 Description: 1446 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO 10:47:18 Description: 101 IO requests are outstanding for more than 15 sec. CLogicalCPUs::CheckForLongIO

  • Anonymous
    August 24, 2010
    wonderful blog thanks for sharing such useful post. http://www.billergeeks.com

  • Anonymous
    October 01, 2010
    Thanks for sharing this information, its quite helpful. www.crystal-limousine.com

  • Anonymous
    May 16, 2013
    The comment has been removed

  • Anonymous
    November 25, 2018
    I have a basic query:In case of underlying device not being accessible for few seconds.How does sqliosim behaves.I see Cachehit Ratio =1000 (10%) being set.My expectation, it should hold the IO for some time before it gives an Error dump.But in actual, i am seeing an sqliosim error dumo immediately.1st Dump Snippet:Data mismatch between the expected disk data and the read buffer:File: E:\sqliosim.mdxOffset: 0x1A6D0000Expected FileId: 0x0Received FileId: 0x0Expected PageId: 0xD368Received PageId: 0x2DD70000 (does not match expected)Expected CheckSum: 0xCDE3D5F4Received CheckSum: 0xD05B4ED0 (does not match expected)Calculated CheckSum: 0xD05B4ED0Expected Buffer Length: 0x2000Received Buffer Length: 0x2000Synchronous read was not successful after 15 attempts++++++Difference data available at the end of dump+++++++Do anyone have an idea, whether its a Storage problem or a tool problem.I ran it in Windows 2008,12,2016 Server edition.Will appreciate a quick response.Thanks,Tanay

  • Anonymous
    November 25, 2018
    Any idea, how sqliosim would handle if underlying device is offline for few seconds