Friday, March 12, 2010

General SQL Optimization Tips and Tricks

SQL and You

SQL can get complicated.  You are simple.  That's just the way it is. 
You like issuing commands like...




select * from orders

...when all you need is a few columns.  But the SQL Server will
dutifully fetch all the data you requested, no matter the cost to the
performance of the server, the network, and your fellow database users.



The above query might not be too bad, but when you see what an order
looks like...





...you figure out you need to join in some additional data to make a
meaningful report.

so you join to the customer table, and your query becomes:


select o.*, b.billto_firstname+' '+b.billto_lastname as CustName from orders o

join buyerdata b on b.OrderID = o.OrderID

...giving us a concatenated name...



Ok, time goes on and you add joins to categorize the orders by state,
product type, salesman, etc. etc. etc. until your SQL gets into a state
we like to call gnarly.  Another thing that happens is that your
SQL is written to make life easier for you, not for the database
server.  Soon, your query gives you the perfect data, and your
report/web site/whatever looks amazing - after minutes of crunching to
gather the data.



This article is written to help you find and correct some common things
that make queries easy for us, but difficult for your database server. 
This process in general, is called query optimization and
companies pay big bucks for people who are experts at this.  Imagine a
business with 800 users entering data on their desktops.  Some are on
the phone with vendors or clients, and system performance is mission
critical.  Suddenly you launch off the un-optimized query for your
weekly TPS report, and the entire database starts chugging like The Little Engine That Could.  This means hundreds of your business
clients all start hearing words like "Geez, hang on - this screen is
loading...".  You can see the business need here.



Even if your business is small, why tax your systems needlessly if you
can take a few steps to write better SQL that gives you the data you
need faster?  I promise - if your small business grows big, that query
that generates a report out of your 1MB database in 3 seconds now will
probably take 30 seconds when you reach 10MB and 5 minutes at 100MB. 



Tip 1:  Never use Select *

Select * is great for when you just want to get a feel for what data is
in a table.  But it's the least efficient thing you can do in a
production query.  Think about it.  First, the parser has to access the
data dictionary and fetch the names and types of all the columns in the
table (or multiple, joined tables).  Then your '*' is replaced with all
the names of the columns - which (by the way) are not guaranteed to be
in any particular order under ANSI SQL specs.  If you are writing a
query for production use, one that will potentially execute thousands of
times a day, take the time to tell it which columns to return.



Tip 2: Indexes

Indexes are designed to help queries. If you're summarizing sales by
agent, then adding a secondary index to the agent column in your sales
table will save you a lot of time.  Technically the time is being spent,
but it's being used in tiny slices whenever you change data in the
sales table.  That triggers the maintenance of the index, which can then
be searched much faster than the whole table.



Tip 3: Case Insensitive Searches

Consider:


Select fname, lname, email from Customers where lastname='McHenry'

This query might work, unless you're looking for all people who entered
any of these: McHenry, mchenry, Mchenry, MCHENRY, McHENRY.  ok - so you
want to just find the customers with a case insensitive search.  So you
do this:


Select fname, lname, email from Customers where lowercase(lastname)=lowercase(@searchname)

What happens now? The query is executed not against the field "lastname"
(and any indexes applied to it), it is instead executed against the
function lowercase(), which has no index.  There are two
common workable solutions to this issue.   Neither of the solutions is
to make sure all data is entered using strict capitalization
standards
. No one wants to be the Capitalization Nazi!


  1. Use a function index.  Oracle supports these. you can actually make
    an index on "lowercase(field)" or "getmonth(datefield)".  This is not a
    widespread feature, but is really useful in these cases.















  2. Add a lower or upper column in your data to use for joining and
    searching.  If you have an ASP.NET membership provider database, note
    that the aspnet_users table has a column named LoweredUserName
    If this approach is good enough for Microsoft, it's good enough for me.


Tip 4: Data Warehousing

Sometimes you just *must* have that report that performs a 23 table join
against a 40 million record database.  There's just no way you can put
out your TPS reports without it.  What many companies do is perform
certain data mining and harvesting tasks once per day, rolling up and
summarizing certain data into a "data warehouse" for offline reporting
needs.  This data is often de-normalized to make faster queries and it's
usually stored on a different database server and even a different
network to prevent reporting from slowing down the workday.

The idea is that you create a query that performs certain common joins
and relations... rolls up hourly/daily/monthly or whatever kind of
totals, creates various types of crosstabs, and stores the rolled up
data in a reporting database.

These summaries are run during off hours (or infrequently during the
workday) to prevent the database servers from clogging with work while
you're open for business.

Managers and their lackeys can hammer away at the reporting database all
day long and your main operation could not care less.



Tip 5: Joins vs. Sub-Selects

Sometimes when you just want a single lookup from a table (say decoding a
County ID to a name) , it's faster to not join in the county table. 
Some SQL servers can detect this kind of lookup and perform this
optimization for you.  Note...




select U.UserName, RR.RoleName from aspnet_Users U

join aspnet_UsersInRoles R on R.UserId=U.UserId

join aspnet_Roles RR on RR.RoleId=R.RoleId

We have joined the aspnet_UsersInRoles table to get the ROLEID of the
user's roles, and joined the aspnet_Roles table to get the name of the
roles for the user.



Note the difference here:




select U.UserName,

(select RR.RoleName from aspnet_Roles RR where RR.RoleId=R.RoleId) as Role

from aspnet_Users U

join aspnet_UsersInRoles R on R.UserId=U.UserId

See how we're using the sub-select to pick up one field?  The thing
about this tip is that sometimes the former way is faster and sometimes
the latter.  You have to experiment.



Tip 6: Explain Plan/Query Optimizers

Ok, most SQL Servers have a module called a Query Optimizer.  This
module can take your weak, human SQL and tweak it before execution to
improve performance.  But there is also usually a UI that you can use to
performance tune your queries.  I know Oracle has this, and have used
it many times.  It breaks your query up into chunks called a query
plan
, and assigns an estimated cost to each part of the
plan.  You can then alter the text of your query to improve the overall
performance.



Tip 7:  Query Hints

On the rare occasions when your SQL is tricking the optimizer into
taking the wrong approach to your query, Oracle and SQL Server have the
ability for you to add query hints to your SQL.  These hints can
tell the optimizer to use a specific kind of cursor, or prefer a
different index, and alter the query plan used during execution.

The syntax differs greatly between the various SQL servers
(Oracle/MSSQL/etc.), and will be shown only as an example here.


  1. SQL
    Server

  2. More SQL
    Server

  3. Oracle


Tip 8: Views

Sometimes creating a view can help optimize your query.  The idea is to
optimize the SQL for certain common transactions and save it as a view,
making the need to reinvent the wheel significantly lower.  But there is
also such a thing as a materialized view. This is a type of view
that is maintained like an index, as the live data is stored in the
main tables.  A normal view does not store it's own data, but a
materialized view does.  The view's data can then be queried without
having to do whatever complex joins were needed to generate the data. 
Of course the trade-off here is storage space.

How to Auto-generate Order Line Item numbers for bulk uploads

 I had a problem where I had 17000 line items to insert into 9000 orders. The system required line item numbers, preferably numbered 1throug...