

ODBC and MFC

Michael Patterson -- patterso@sprynet.com
Wednesday, November 20, 1996

Environment: VC++ 1.52, Win 3.1, 16-bit


Hello,
I have two MFC-related ODBC questions:


[Q1]  I have a question about the fastest way
      for bulk inserts into an Access datasource.

I think/assume the MFC class, CRecordset, repeatedly calls
'SQLExecute' during the AddNew/Update process.  Doing that
for a couple of hundred records seems to take a lot of time.

My assumption, until rollback or commit:
{
   ::SQLPrepare(hstmt,
      (UCHAR FAR*)"INSERT INTO MY_DATASOURCE VALUES (?)",
      SQL_NTS);
   //
   ::SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
      SQL_C_SSHORT, SQL_SMALLINT, 5, 0, &iMyData, 0, &cbMyData);
   //
   // cRecords: however many rows there are to insert
   while (iMyData < cRecords)
   {
      iMyData = iMyData + 1;
      ::SQLExecute(hstmt);
   }
}

I do know about the ODBC function 'SQLParamOptions', which
will allow you to do bulk inserts.  But the driver that I am
using does not support that function.  [I don't know how much
faster bulk inserts would be using that function.]

My record size for that data source is 20 bytes.  And like
I said before, it's just taking too long to insert records.
Any help would be great.

[Q2]  I also have one other question.  How do I enable my MFC
      application to use the ODBC 'Setup' & 'Installer' DLL functions?


Thanks.
Mike





wei@cmgroup.com
Friday, November 22, 1996

[Mini-digest: 2 responses]


> [Q1]  I have a question about the fastest way
>       for bulk inserts into an Access datasource.

> I think/assume the MFC class, CRecordset, repeatedly calls
> 'SQLExecute' during the AddNew/Update process.  Doing that
> for a couple of hundred records, seems to take a lot of time.

The speed problem is not necessarily caused by calling SQLExecute multiple
times. Access is a file-based database, so every insert involves disk access.
The speed is governed by disk access time which, as we know, is slow. I'm not
very familiar with Access, but if it supports transactions, using one will
speed things up a lot: even if you call AddNew/Update many times, the disk
is accessed only once (assuming you're not inserting a huge amount of data
at a time).
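[The transaction advice holds for any file-based engine. The 1996 ODBC/Access
setup can't be reproduced here, so as a rough, runnable stand-in, this sketch
uses Python's built-in sqlite3 (another file-based database) to contrast
per-statement commits with one commit for the whole batch; all names and row
counts below are illustrative, not from the thread.]

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(rows, batch):
    """Time `rows` single-row INSERTs, with and without one big transaction."""
    path = os.path.join(tempfile.mkdtemp(), "test.db")
    con = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    con.execute("CREATE TABLE t (n INTEGER)")
    start = time.perf_counter()
    if batch:
        con.execute("BEGIN")      # one transaction around the whole batch
    for n in range(rows):
        # in autocommit mode, each INSERT is a self-contained transaction
        con.execute("INSERT INTO t (n) VALUES (?)", (n,))
    if batch:
        con.execute("COMMIT")     # a single commit covers all the rows
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    con.close()
    return elapsed, count

t_auto, n_auto = timed_inserts(200, batch=False)   # 200 separate commits
t_batch, n_batch = timed_inserts(200, batch=True)  # 1 commit
print(f"per-statement: {t_auto:.4f}s  batched: {t_batch:.4f}s")
```

On a real disk the batched run is typically much faster, because the
per-commit journal bookkeeping happens once instead of once per row.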


-----From: hrubin@nyx.net (Howard Rubin)

Michael Patterson said:
[Q1]  I have a question about the fastest way
      for bulk inserts into an Access datasource.

According to MSDN, the fastest way to insert into an Access
database is to use SQLSetPos(). If you have any questions after
reading the MSDN article, and I certainly did, email me at work
(hrubin@disc.com) and I'll send you a working sample app. that
I wrote after several calls to Microsoft Tech support ODBC group.
- Howard Rubin
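[SQLSetPos with SQL_ADD, like SQLParamOptions, hands the driver many rows in
one call instead of one SQLExecute round-trip per row. There is no 16-bit ODBC
driver to run here, so as a hedged modern analog, Python's sqlite3
`executemany` shows the same "one call, many rows" shape; the table and data
are made up for illustration.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")

rows = [("Ann", 34), ("Bob", 51), ("Cho", 28)]  # illustrative data

# One call carries the whole batch, inside one transaction, instead of
# issuing a separate INSERT statement per row from an application loop.
with con:  # the context manager commits on success, rolls back on error
    con.executemany("INSERT INTO people (name, age) VALUES (?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count)  # → 3
```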



Mike Blaszczak -- mikeblas@nwlink.com
Sunday, November 24, 1996

At 18:57 11/22/96 -0500, wei@cmgroup.com wrote:

>> [Q1]  I have a question about the fastest way
>>       for bulk inserts into an Access datasource.
>
>> I think/assume the MFC class, CRecordset, repeatedly calls
>> 'SQLExecute' during the AddNew/Update process.  Doing that
>> for a couple of hundred records, seems to take a lot of time.
>
>The speed problem is not necessarily caused by calling SQLExecute multiple
>times. Access is a file-based database, so every insert involves disk access.

This isn't true.  Access uses the Jet engine, and the Jet engine
aggressively caches and spools database accesses--inside of
transactions or not.

>The speed is governed by disk access time which, as we know,
>is slow.

Speed of an application is really controlled by the algorithm used to
implement the application's behaviour. Implementing questionable code
kills your performance; implementing smooth code raises your performance.
These aspects of what you write govern your application's performance
with far more weight than the database platform you use or which compiler
options you switch on or off.

>I'm not very familiar with Access, but if it supports transactions,
>using one will speed things up a lot: even if you call AddNew/Update
>many times, the disk is accessed only once (assuming you're not
>inserting a huge amount of data at a time).

There is not a one-to-one relationship between disk writes in Access
and the use of transactions in Access.  Access might do a disk hit
inside of a transaction or outside of a transaction.  Access might
read or write at any time while it is working on your query.



.B ekiM
http://www.nwlink.com/~mikeblas/
I'm afraid I've become some sort of speed freak.
These words are my own. I do not speak on behalf of Microsoft.




Michael Patterson -- patterso@sprynet.com
Monday, November 25, 1996

At 09:38 AM 11/24/96 -0800, Mike Blaszczak  wrote:
>At 18:57 11/22/96 -0500, wei@cmgroup.com wrote:
>
>>> [Q1]  I have a question about the fastest way
>>>       for bulk inserts into an Access datasource.
>>
>>> I think/assume the MFC class, CRecordset, repeatedly calls
>>> 'SQLExecute' during the AddNew/Update process.  Doing that
>>> for a couple of hundred records, seems to take a lot of time.
>>
>>The speed problem is not necessarily caused by calling SQLExecute multiple
>>times. Access is a file-based database, so every insert involves disk access.
>
>This isn't true.  Access uses the Jet engine, and the Jet engine
>aggressively caches and spools database accesses--inside of
>transactions or not.

{{{
In my case, are you absolutely certain that each time I did a SQLExecute,
there wasn't a disk access?  It appeared to be doing that.  After the
suggestions from Mr. Rubin and Mr. Sheng, I set the driver to manual
commit mode, used SQLSetPos, and then finally committed or rolled back the
transaction - and it increased the speed dramatically.  By definition,
when the driver is in auto-commit mode, isn't each statement "committed
immediately" after it is executed?  If I understand it correctly, Mr. Sheng
was correct - at least in my case.


>>The speed is governed by disk access time which, as we know,
>>is slow.
>
>Speed of an application is really controlled by the algorithm used to
>implement the application's behaviour. Implementing questionable code
>kills your performance; implementing smooth code raises your performance.
>These aspects of what you write govern your application's performance
>with far more weight than the database platform you use or which compiler
>options you switch on or off.

{{{
If your definition of algorithm includes not using the 'best fit'
combination of ODBC functions, you're absolutely right.  That's why
I posted the question.  After correctly implementing the ODBC functions,
I noticed a dramatic improvement [even much better than I expected].


>>I'm not very familiar with Access, but if it supports transactions,
>>using one will speed things up a lot: even if you call AddNew/Update
>>many times, the disk is accessed only once (assuming you're not
>>inserting a huge amount of data at a time).
>
>There is not a one-to-one relationship between disk writes in Access
>and the use of transactions in Access.  Access might do a disk hit
>inside of a transaction or outside of a transaction.  Access might
>read or write at any time while it is working on your query.

{{{
I still need to learn a LOT about ODBC, but my error was in not using the
proper ODBC functions.  I also left the driver in auto-commit mode, which
I assume caused each statement to be committed immediately after it was
executed.
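[The manual-commit pattern described here -- turn off auto-commit, do all the
work, then commit or roll back as a single unit -- can be sketched with any
transactional engine. A small runnable stand-in using Python's sqlite3, with
made-up table and values:]

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # we commit by hand
con.execute("CREATE TABLE t (x INTEGER)")

con.execute("BEGIN")                         # auto-commit is now off
con.execute("INSERT INTO t (x) VALUES (1)")  # staged, not yet durable
con.execute("INSERT INTO t (x) VALUES (2)")
con.execute("ROLLBACK")                      # discard the whole batch

count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # → 0  (a COMMIT here would have kept both rows)
```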

Mr. Rubin sent me the following:
**************************************************************************
>From Q126131:

Insert, Delete, and Update Performance
--------------------------------------

In general, Inserts, Deletes, and Updates can be done in one of six ways,
as listed below in order of increasing performance. To make the comparison
more concrete, 100 inserts were done using each of these methods into a
Microsoft Access version 2.0 table that had five columns of text data.

NOTE: These results do not represent exhaustive benchmark testing, so they
should not be treated as such. They are provided to illustrate the relative
performance of each of these methods.

1. 100 SQLExecDirect() inserts with no transaction        5457ms
2. 100 SQLExecDirect() inserts with transaction           4756ms
3. 100 SQLPrepare()/SQLExecute() inserts no transaction   3515ms
4. 100 SQLPrepare()/SQLExecute() inserts  transaction     2994ms
5. 100 SQLSetPos()/SQL_ADD inserts with no transaction     831ms
6. 100 SQLSetPos()/SQL_ADD inserts with transaction        721ms

****************************************************************************

Can I assume that using a transaction is faster, when doing bulk 'inserts'?




Nick Irias -- cabago@netcom.com
Tuesday, November 26, 1996

[Mini-digest: 2 responses]

If you are trying to learn more about ODBC, one option is to use DrDeeBee
(I don't work for them), a utility that is shipped with Intersolv ODBC
drivers.  It will produce a log file of all ODBC calls with arguments and
return codes.  It is a useful debugging tool and is worth the cost of a
single Intersolv ODBC driver license even if you don't need the ODBC driver.
I don't know if it is available separately.

I have attached a sample log of a connect / interrogate-driver / disconnect
session as an example (I edited the log to hide the userid and password).
The log file is a blow-by-blow of every ODBC call that my app made.


SQLAllocEnv
	0x01000000
	SQL_SUCCESS
SQLAllocConnect
	0x01000000
	0x01010000
	SQL_SUCCESS
SQLGetInfo
	0x01010000
	SQL_DRIVER_ODBC_VER
	[5]02.50
	6
	5
	SQL_SUCCESS
SQLDriverConnect
	0x01010000
	0x00000000
	[37]DSN=paleo7;SRVR=vm7;UID=*****;PWD=*******
	SQL_NTS
	[37]DSN=paleo7;SRVR=vm7;UID=*****;PWD=*******
	255
	37
	SQL_DRIVER_NOPROMPT
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	512
	0
	SQL_NO_DATA_FOUND
SQLGetFunctions
	0x01010000
	0
	FALSE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	FALSE
	TRUE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	FALSE
	FALSE
	FALSE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	TRUE
	FALSE
	TRUE
	FALSE
	FALSE
	TRUE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	FALSE
	SQL_SUCCESS
SQLGetInfo
	0x01010000
	SQL_CURSOR_COMMIT_BEHAVIOR
	SQL_CB_PRESERVE
	2
	NULL
	SQL_SUCCESS
SQLGetInfo
	0x01010000
	SQL_CURSOR_ROLLBACK_BEHAVIOR
	SQL_CB_PRESERVE
	2
	NULL
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DATA_SOURCE_NAME
	[6]paleo7
	255
	6
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DRIVER_NAME
	[11]IVOR709.DLL
	255
	11
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DRIVER_VER
	[10]02.12.0000
	255
	10
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DBMS_NAME
	[6]Oracle
	255
	6
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DBMS_VER
	[10]07.00.0000
	255
	10
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_SERVER_NAME
	[3]vm7
	255
	3
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_DATABASE_NAME
	[0]
	255
	0
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLGetInfo
	0x01010000
	SQL_USER_NAME
	[3]CPS
	255
	3
	SQL_SUCCESS
SQLError
	NULL
	0x01010000
	NULL
	[5]00000
	0
	[0]
	511
	0
	SQL_NO_DATA_FOUND
SQLDisconnect
	0x01010000
	SQL_SUCCESS
SQLFreeConnect
	0x01010000
	SQL_SUCCESS
SQLFreeEnv
	0x01000000
	SQL_SUCCESS
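[A statement-level trace like the DrDeeBee log above can be improvised on
modern stacks. As a hedged analog, Python's sqlite3 exposes
`set_trace_callback`, which fires once per executed statement; the statements
below are illustrative.]

```python
import sqlite3

trace_log = []  # collected statements, like a miniature ODBC trace file

con = sqlite3.connect(":memory:", isolation_level=None)
con.set_trace_callback(trace_log.append)  # record every statement executed

con.execute("CREATE TABLE t (x INTEGER)")
con.execute("INSERT INTO t (x) VALUES (1)")
con.execute("SELECT x FROM t")

for stmt in trace_log:
    print(stmt)
```

Reading a trace like this shows exactly which calls your app (or a wrapper
library such as MFC's CRecordset) is issuing on your behalf.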

-----From: Mike Blaszczak 

At 18:51 11/25/96 -0800, Michael Patterson wrote:

>>>The speed problem is not necessarily caused by calling SQLExecute multiple
>>>times. Access is a file-based database, so every insert involves disk access.
>>
>>This isn't true.  Access uses the Jet engine, and the Jet engine
>>aggressively caches and spools database accesses--inside of
>>transactions or not.
>
>{{{
>In my case, are you absolutely certain that each time I did a SQLExecute,
>there wasn't a disk access? 

Nope.  I made no such claim: I was refuting the generalization that "every
insert [into an Access DB] involves disk access".  That's simply not true,
and it really doesn't even have much to do with transactions.  I don't
think you posted enough information about your situation for anyone
to draw any conclusion about when your drive light was coming on, and I
certainly didn't do that.

>By definition,
>when the driver is in auto-commit mode, isn't each statement "committed
>immediately" after it is executed? 

Yes, it is.  But "committed" doesn't necessarily mean it has been
written to disk.  It means that you can't roll back the transaction
anymore.  It means that the locks that you grew during the transaction
have been released.

>Can I assume that using a transaction is faster, when doing bulk 'inserts'?

That really depends on what, exactly, you mean by "using a transaction".

First, a database transaction has no relationship with physical I/O.  There
are situations where ending a transaction won't cause writes.  The most obvious
is the use of a read-only transaction: if you do this:

        SELECT something;       // the database locks what you've selected
        // sit and think about it
        COMMIT TRANSACTION;     // the database releases your locks

There are no disk writes.

Second, it's completely up to the database what relationships there might be
between a transaction involving particular SQL and physical I/O.  That
relationship might further be complicated by the underlying operating system.

If you want to insert 10 rows to a table, you might do this ten times:

        INSERT ROW #1;          // the server acquires locks on the data and
                                // index pages.
                                // the server creates a transaction log.
                                // the server processes your insert. (*)
                                // the server makes a note of what
                                // changed in its transaction log.
        COMMIT TRANSACTION;     // the server deletes the transaction log.
                                // the server releases the locks.
                                // the server finalizes the insert. (+)

        INSERT ROW #2;          // the server acquires locks on the data
                                // and index pages.
                                // the server creates a transaction log.

                   // and so on

You can see that, with this repetition, the DBMS has to do lots and lots of
stuff over and over again.  Opening the log, acquiring and releasing locks,
and releasing the log is an expensive proposition.

The server might do a disk hit on _any_ of these lines: large database objects
might spill their locking tables out of memory and onto disk.  If that happens,
just testing to acquire a lock can cause a disk hit.

The line marked (*) may or may not alter the underlying table.  It is very
likely that it will just dork with something in memory, and later use that
information in memory when it starts stuffing the data to disk, on the
line marked (+).  For some DBMSs, even the line marked (+) might only let the
data sit around in memory to be spooled off to disk by some other thread,
assuming nobody else deletes that same row before it gets written, say.

If you insert 10 rows to a table, a better way might be to do this:

        INSERT ROW #1;          // the server acquires locks on the
                                // data and index pages.
                                // the server processes your insert.
                                // the server creates a transaction log.
                                // the server makes a note of what changed
                                // in its transaction log.
        INSERT ROW #2;          // the server processes your insert.
                                // the server makes a note of what changed
                                // in its transaction log.

        // and so on

        COMMIT TRANSACTION;     // the server deletes the transaction log.
                                // the server releases the locks.
                                // the server finalizes the insert.

This is clearly faster because you're not going to redundantly do all
the log management, and you're not worrying about setting and clearing
locks quite so frequently.

_Maybe_ the server will do disk I/O less frequently, but there is no
correlation between committing a transaction and physical I/O. But
turning off per-statement auto commits causes the DBMS to do less
management, and that's what makes things faster. It _might_ do less 
disk I/O, but that's not a certainty... even when we can identify the
specific DBMS in question.

Certainly, avoiding redundant work where you can is something that
improves an algorithm. It's important to think about what's going on
under the API when you pick out your algorithm.

Getting more familiar with ODBC will help you understand these things,
but I think it might be better to learn some theory first.  "Fundamentals
of Database Systems", by Elmasri and Navathe (Benjamin Cummings, 1989,
ISBN 0-8053-0145-3) is an excellent textbook on these architectural
features of database systems.  Coming to understand what this book
discusses will give you an excellent understanding of what's going on--
and that'll help you figure out what features of the system to look for
when you want to try to find what's so darned slow.



.B ekiM
http://www.nwlink.com/~mikeblas/
I'm afraid I've become some sort of speed freak.
These words are my own. I do not speak on behalf of Microsoft.




