Chapter 8
SQL Changes
After leaving the SQL portion of VFP virtually unchanged for a long time, Microsoft began to introduce additional features and increased compliance with the ANSI-92 standard with VFP 8. VFP 9 continues that process, offering additional functionality for SELECT, INSERT, UPDATE, and DELETE, as well as a number of other improvements.
VFP’s SQL sublanguage lets you manipulate data without worrying about record pointers or work areas. VFP 9 includes a number of enhancements to this part of the language. Many of the limits related to queries have been removed, the use of subqueries has been expanded, and the UPDATE and DELETE commands accept join conditions to specify the records to modify or remove. VFP 9 also offers some performance improvements, an easier way to test performance, and the ability to query against a buffered table.
VFP 9 has no limits
The most basic change to queries in VFP 9 is the elimination of a number of restrictions. In earlier versions, for example, the total number of joins and subqueries in a single query was limited to nine; in VFP 9, there’s no limit. Table 1 shows the limits related to queries removed or raised in VFP 9.
Description |
Limit in VFP 8 (and earlier) |
Limit in VFP 9 |
Total number of joins and subqueries |
9 |
No limit |
Number of UNIONs |
9 |
No limit |
Number of tables and aliases referenced |
30 |
No limit |
Number of items listed in IN clause |
24 |
Based on SYS(3055) setting |
Nesting level for subqueries |
1 |
No limit |
For everyday queries, most developers never run into these limits, but each of them can pose problems in certain situations.
Lots of tables
With fully normalized data, the old limits on joins, as
well as those on the number of tables and aliases referenced, can make it
difficult to pull together all the information for a particular entity. For
example, consider the Northwind database that comes with VFP 8
and 9. Collecting all the data related to each order, including customer
information, shipper information, supplier information and so forth, involves
11 tables. Listing 1 shows such
a query.
The Developer Downloads for this chapter, available from www.hentzenwerke.com,
include the query in Listing 1 as
ManyTables.PRG.
SELECT ProductName, CategoryName, OrderDate, ;
Customers.CompanyName AS CustomerName, ;
OrderDetails.Quantity, ;
OrderDetails.UnitPrice, ;
Suppliers.CompanyName AS SupplierName, ;
Employees.LastName, Employees.FirstName, ;
Territories.TerritoryDescription, ;
Region.RegionDescription, ;
Shippers.CompanyName AS ShipperName;
FROM Orders ;
LEFT JOIN Customers ;
ON Orders.CustomerID = Customers.CustomerID ;
LEFT JOIN OrderDetails ;
ON Orders.OrderID = OrderDetails.OrderID ;
LEFT JOIN Products ;
ON OrderDetails.ProductID = Products.ProductID ;
LEFT JOIN Categories ;
ON Products.CategoryID = Categories.CategoryID ;
LEFT JOIN Suppliers ;
ON Products.SupplierID = Suppliers.SupplierID ;
LEFT JOIN Shippers ;
ON Orders.ShipVia = Shippers.ShipperID ;
LEFT JOIN Employees ;
ON Orders.EmployeeID = Employees.EmployeeID ;
LEFT JOIN EmployeeTerritories ;
ON Employees.EmployeeID = EmployeeTerritories.EmployeeID ;
LEFT JOIN Territories ;
ON EmployeeTerritories.TerritoryID = Territories.TerritoryID ;
LEFT JOIN Region ;
ON Territories.RegionID = Region.RegionID ;
ORDER BY Products.ProductID, Orders.OrderDate ;
INTO CURSOR Unfolded
This query lists 11 different aliases and performs 10 joins; VFP 8 and earlier versions can’t run it because it generates error 1805, “SQL: Too many subqueries.” VFP 9 executes it without a problem. (While this particular example is a little contrived, flattening a normalized database is a common problem and can involve many, many joins.)
Nearly unlimited IN list
In earlier versions of VFP, the IN (list of items) operator was limited to 24 items in the list. While VFP 9 doesn’t entirely remove the limit, it does give you control over it through the SYS(3055) function. Even without manipulating SYS(3055), the limit is significantly higher than in earlier versions. In our tests, we could include 154 items before we had to raise SYS(3055).
The IN operator can be used both to filter based on a list of items and to filter based on subquery results. The change here affects only the use of IN with a specific list of items.
When you control the query yourself, the limit on the IN clause isn’t generally a problem. You can usually find another approach to avoid a large IN clause. One solution is to store the list of values to a cursor and do a join with that cursor. For example, this query:
SELECT cFirst, cLast ;
FROM Person ;
WHERE UPPER(cLast) IN ("BLACK", "BROWN", "GREEN", "SILVER", "WHITE")
could be replaced with:
CREATE CURSOR Names (cName C(25))
INSERT INTO Names VALUES ("BLACK")
INSERT INTO Names VALUES ("BROWN")
INSERT INTO Names VALUES ("GREEN")
INSERT INTO Names VALUES ("SILVER")
INSERT INTO Names VALUES ("WHITE")
SELECT cFirst, cLast ;
FROM Names ;
JOIN Person ;
ON UPPER(cLast) = RTRIM(cName)
However, you don’t always have the chance to write this kind of code. In particular, other applications that access VFP data through OLE DB may generate queries that use the IN clause and offer no chance to code around it.
SYS(3055) was added in VFP 6 to solve a problem in updating views. It controls the allocation of space for evaluating FOR and WHERE clauses. The default value is 320, and it can be increased in increments of 8. In our tests, each increase of 8 in the SYS(3055) value allowed 4 more items in the list for IN.
Nested Subqueries
The ability to use subqueries (a query within a query) makes it possible to get some results with a single query that would otherwise require multiple queries. Perhaps the most common query involving a subquery is finding all the records in one table that are not in another. For example, this query (using the TasTrade database that comes with VFP) gets a list of companies from the Customer table that placed no orders:
SELECT Company_Name ;
FROM Customer ;
WHERE Customer_ID NOT IN ;
(SELECT Customer_ID FROM Orders) ;
INTO CURSOR NoSales
Another common use for a subquery is to perform aggregation and then allow the main query to retrieve additional information about the aggregate results. For example, you can use a subquery to get a list of the products included in each customer's most recent order, as in Listing 2. The subquery in this example is correlated, meaning it uses a field from a table not listed in the subquery itself, only in the main query. In the example, OrdLast.Customer_ID is used in the WHERE clause of the subquery, but OrdLast is an alias for Orders defined in the main query.
The Developer Downloads for this chapter, available from www.hentzenwerke.com, include the query in Listing 2 as CustProducts.PRG and the query in Listing 3 as ProductFirstNotLast.PRG.
SELECT CustLast.Customer_ID, Product_ID ;
FROM Order_Line_Items OLILast;
JOIN Orders OrdLast;
ON OLILast.Order_ID = OrdLast.Order_ID ;
JOIN Customer CustLast;
ON OrdLast.Customer_ID = ;
CustLast.Customer_ID ;
WHERE OrdLast.Order_Date = ;
(SELECT MAX(Order_Date) ;
FROM Orders Ord ;
WHERE OrdLast.Customer_ID = ;
Ord.Customer_ID ) ;
INTO CURSOR CustProducts
In VFP 8 and earlier, subqueries cannot be nested. That is, the WHERE clause of a subquery can’t use another subquery. VFP 9 allows nested subqueries, increasing the number of questions you can answer with a single query.
Suppose you want to find out what products a company included in its first order, but not its most recent. While you could use the cursor created in Listing 2 in another query, in VFP 9, you can do the whole job with a single query, shown in Listing 3.
SELECT Customer.Company_Name, Product_ID ;
FROM Order_Line_Items ;
JOIN Orders ;
ON Order_Line_Items.Order_ID = Orders.Order_ID ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
WHERE Orders.Order_Date = ;
(SELECT MIN(Order_Date) ;
FROM Orders Ord ;
WHERE Orders.Customer_ID=Ord.Customer_ID );
AND Product_ID NOT IN ;
(SELECT Product_ID ;
FROM Order_Line_Items OLILast;
JOIN Orders OrdLast;
ON OLILast.Order_ID = OrdLast.Order_ID ;
JOIN Customer CustLast;
ON OrdLast.Customer_ID = ;
CustLast.Customer_ID ;
WHERE OrdLast.Order_Date = ;
(SELECT MAX(Order_Date) ;
FROM Orders Ord ;
WHERE OrdLast.Customer_ID = ;
Ord.Customer_ID ) );
INTO CURSOR FirstNotLast
More flexible subqueries
The previous section looked at one way subqueries have improved in VFP 9. In fact, the VFP team did a lot of work with subqueries for this version. In VFP 8 and earlier, subqueries could be used only in the WHERE clause of a SQL command, that is, only in filtering data. In addition to removing the limit on nesting, VFP 9 allows you to use subqueries in the field list and the FROM clause of a query and in the SET clause of the SQL UPDATE command. VFP 9 also allows the GROUP BY clause in correlated subqueries (subqueries that refer to one or more fields from tables in the main query, like the subquery in Listing 2). Finally, VFP 9 allows you to use the TOP clause in subqueries, as long as those subqueries are not correlated.
Derived tables—Subqueries in the FROM clause
The result of a subquery in the FROM clause is called a derived table. One use for a derived table is retrieving additional data when you use an aggregate function.
Continuing to use the TasTrade database that comes with VFP, consider the problem of finding out about each customer’s most recent order. If all you want is the customer id and the order date, it’s easy:
SELECT Customer_ID, MAX(Order_Date) ;
FROM Orders ;
GROUP BY 1 ;
INTO CURSOR MostRecentOrder
This query is simple and works in every version of VFP. Suppose, though, you want not just the customer id, but additional information about the order, such as the company name, the contact person for the company, and the name of the shipper. Prior to VFP 9, you need to use a correlated subquery, two queries in sequence, or a very ugly subquery. Listing 4 shows the solution using a correlated subquery, Listing 5 shows the two queries in sequence technique, and Listing 6 shows the ugly subquery approach. What makes the subquery in Listing 6 so ugly is the need to combine the customer id and the most recent date into a single field for comparison.
The queries shown in Listing 4,
Listing 5, and Listing 6 are included
in the Developer Downloads for this chapter, available from www.hentzenwerke.com,
as MostRecentCorrelated.PRG, MostRecentTwoQueries.PRG, and MostRecentSubquery,
respectively.
SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;
Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;
FROM Orders ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
JOIN Shippers ;
ON Orders.Shipper_ID = shippers.Shipper_ID ;
WHERE Orders.Order_Date = ;
(SELECT MAX(Order_Date) ;
FROM Orders Ord WHERE Orders.Customer_ID=Ord.Customer_ID );
ORDER BY Cust_Name ;
INTO CURSOR MostRecentOrders
SELECT Customer_ID, MAX(Order_Date) as Order_Date ;
FROM Orders ;
GROUP BY 1 ;
INTO CURSOR RecentOrder
SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;
Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;
FROM Orders ;
JOIN RecentOrder ;
ON Orders.Customer_ID = RecentOrder.Customer_ID ;
AND Orders.Order_Date = RecentOrder.Order_Date ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
JOIN Shippers ;
ON Orders.Shipper_ID = shippers.Shipper_ID ;
ORDER BY Cust_Name ;
INTO CURSOR MostRecentOrders
SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;
Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;
FROM Orders ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
JOIN Shippers ;
ON Orders.Shipper_ID = shippers.Shipper_ID ;
WHERE Orders.Customer_ID + DTOS(Orders.Order_Date) IN ;
(SELECT Customer_ID + DTOS(MAX(Order_Date)) ;
FROM Orders GROUP BY Customer_ID );
ORDER BY Cust_Name ;
INTO CURSOR MostRecentOrders
VFP 9 gives you a better alternative. You can perform the subquery on the fly right in the join clause. Putting the subquery into the join clause means it doesn’t have to be correlated; in some cases, that means it will run faster. The query in Listing 7 uses this approach. Here, the subquery has moved into the JOIN clause, and doesn’t need to combine the customer id and most recent date. Instead, the ON portion of the JOIN compares the two fields separately, much like the two query solution in Listing 5.
The query in Listing 7 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as MostRecentOrderDetails.PRG.
SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;
Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;
FROM Orders ;
JOIN (SELECT Customer_ID, MAX(Order_Date) as Order_Date ;
FROM Orders CheckOrderDate ;
GROUP BY 1) RecentOrder ;
ON Orders.Customer_ID = RecentOrder.Customer_ID ;
AND Orders.Order_Date = RecentOrder.Order_Date ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
JOIN Shippers ;
ON Orders.Shipper_ID = shippers.Shipper_ID ;
ORDER BY Cust_Name ;
INTO CURSOR MostRecentOrders
Be aware that a subquery in the FROM clause can’t be correlated, which means it may not refer to fields of tables used in the main query, only to fields of the tables it lists. This is because all subqueries in the FROM clause are computed before the joins are performed, thus it’s not yet clear what records are in the result.
Computing fields with a subquery
In addition to supporting derived tables, VFP 9 lets you put subqueries in the field list of a query. That is, you can use a subquery to compute a field to appear in the result. A subquery used this way must return a single field and no more than a single record. If there are no records in the result for a particular record, that field is assigned the null value.
Why would you do this? Why not include the expression in the main query and add any necessary tables? As with derived tables, this approach is handy when grouping is involved. Suppose you want to find the total value of the orders placed for each customer in a particular year. Along with that, you want a great deal of customer information, including the address, phone number, and fax number.
Clearly to compute the total value of orders for a customer, you need to group data from Order_Line_Items by customer. You can extract multiple fields from Customer along the way as long as you add them to the GROUP BY clause. (Prior to VFP 8, you could include additional fields, even without putting them in the GROUP BY clause. In VFP 8, you can do so by issuing SET ENGINEBEHAVIOR 70.) Listing 8 shows a query that retrieves the customer id, company name, address, phone, and fax information along with the total value of that customer’s orders.
The query in Listing 8 is
included in the Developer Downloads for
this chapter, available from www.hentzenwerke.com, as
CustomerTotalGrouped.PRG.
SELECT Customer.Customer_ID, Customer.Company_Name, ;
Customer.Address, Customer.City, Customer.Region, ;
Customer.Postal_Code, Customer.Phone, Customer.Fax, ;
SUM(quantity*unit_price) AS yTotal ;
FROM Customer ;
LEFT JOIN Orders ;
JOIN Order_Line_Items;
ON Orders.Order_ID = Order_Line_Items.Order_ID ;
ON Customer.Customer_ID = Orders.Customer_ID ;
AND BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8;
INTO CURSOR CustomerTotal
However, adding so many fields to the GROUP BY clause slows the query down. One alternative that works in this situation is to remove the fields from the GROUP BY clause and wrap them in MAX() or MIN(). As with grouping, because these fields are the same for all records in the group, using MAX() or MIN() doesn’t change the results. This version, shown in Listing 9, is somewhat faster than listing all the fields in the GROUP BY clause.
The query in Listing 9 is included as CustomerTotalMax.PRG in the Developer Downloads for this chapter, available from www.hentzenwerke.com.
SELECT Customer.Customer_ID, MAX(Customer.Company_Name), ;
MAX(Customer.Address), MAX(Customer.City), MAX(Customer.Region), ;
MAX(Customer.Postal_Code), MAX(Customer.Phone), MAX(Customer.Fax), ;
SUM(quantity*unit_price) AS yTotal ;
FROM Customer ;
LEFT JOIN Orders ;
JOIN Order_Line_Items;
ON Orders.Order_ID = Order_Line_Items.Order_ID ;
ON Customer.Customer_ID = Orders.Customer_ID ;
AND BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;
GROUP BY 1;
INTO CURSOR CustomerTotal
However, the ability to use a subquery in the field list provides an even simpler and more efficient solution to this problem. You can compute the total value of the orders in a subquery, allowing the main query to refer only to the parent table. Not only do you eliminate the extra fields in GROUP BY and the extra calls to aggregate functions, but you can also eliminate the outer join. Listing 10 shows the query.
The query in Listing 10 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CustomerTotal.PRG.
SELECT Customer.Customer_ID, Customer.Company_Name, ;
Customer.Address, Customer.City, Customer.Region, ;
Customer.Postal_Code, Customer.Phone, Customer.Fax, ;
(SELECT SUM(quantity*unit_price) ;
FROM Orders ;
JOIN Order_Line_Items;
ON Orders.Order_ID = Order_Line_Items.Order_ID ;
WHERE BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;
AND Customer.Customer_ID=Orders.Customer_ID ) as yTotal ;
FROM Customer ;
INTO CURSOR CustomerTotal
In our tests, using the queries shown, the subquery version (Listing 10) was about 12% faster than the version with calls to MAX() (Listing 9), which in turn was about 12% faster than listing all the parent fields in the GROUP BY clause (Listing 8). The more fields from the parent in the field list, the greater the advantage of the subquery version.
Computing replacements in UPDATE
The third new place you can use subqueries is the SET clause of the UPDATE command. That is, you can use a subquery to compute the value to which a field is to be set. However, when you use this approach, the UPDATE command cannot include a subquery in the WHERE clause. In addition, you’re limited to a single subquery in the SET clause, so you can’t use this approach to compute the values of multiple fields.
For this example, imagine that you have a data warehouse (SalesByProduct) for TasTrade data that tells you how many of each product were sold and the dollar amount of those sales. It’s designed to hold data for a single month and you want to update it at the end of the month.
To update the data, use the following UPDATE commands. Set nMonth and nYear to the month and year of the data you’re collecting before running the code in Listing 11. (“Correlated Updates,” later in this chapter, provides a better solution to this problem.)
Code to create the data warehouse as a cursor is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CreateWarehouse.PRG. The code in Listing 11 is SubqueryInSet.PRG.
UPDATE SalesByProduct ;
SET TotalSales = (;
SELECT NVL(SUM(quantity*unit_price),$0) ;
FROM Order_Line_Items ;
JOIN Orders ;
ON Order_Line_Items.Order_ID = Orders.Order_ID ;
WHERE MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear;
AND Order_Line_Items.Product_ID = SalesByProduct.Product_ID)
UPDATE SalesByProduct ;
SET UnitsSold = (;
SELECT CAST(NVL(SUM(quantity),0) AS N(12)) ;
FROM Order_Line_Items ;
JOIN Orders ;
ON Order_Line_Items.Order_ID = Orders.Order_ID ;
WHERE MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear;
AND Order_Line_Items.Product_ID = SalesByProduct.Product_ID)
The second UPDATE in
the example uses the new CAST() function that lets you change data types on the
fly. See Chapter 14, “Language Improvements,” for details.
Correlated subqueries and grouping
A correlated subquery is one that refers to one or more fields of the main query, generally in the subquery’s WHERE clause. VFP 8 prohibits the GROUP BY clause in correlated subqueries. Because correlation can give you the same effect as grouping in many cases (see Listing 4, for example), you may not have run into this limit.
But there are a few situations where the ability to use GROUP BY in a correlated subquery makes it easier to get the desired results. Fortunately, VFP 9 permits grouping in a correlated subquery.
For example, consider the case where you want a list of customers who placed at least one order totaling more than a specified value and had that order shipped somewhere other than their address of record. (You might be checking into suspicious transactions.) In VFP 8 and earlier, extracting this information requires two queries in sequence: the first collects information about orders over the specified amount, while the second compares the shipping address of those records to the customer address and extracts customer information. Listing 12 shows one way to do this, with the threshold value set to $4000.
The code in Listing 12 is included in the Developer Downloads
for this chapter, available from www.hentzenwerke.com, as
SuspiciousTwoQueries.PRG. The query in Listing 13
is called SuspiciousOrders.PRG.
SELECT Orders.Customer_ID, Orders.Ship_to_Address, ;
Orders.Order_ID, Orders.Order_Date ;
FROM Orders ;
JOIN Order_line_items ;
ON Orders.Order_ID=Order_Line_Items.Order_ID ;
GROUP BY 1, 2, 3, 4 ;
HAVING SUM(Quantity*Unit_Price)> 4000 ;
INTO CURSOR BigOrders
SELECT Company_Name, Order_ID, Order_Date ;
FROM Customer ;
JOIN BigOrders ;
ON Customer.Customer_ID = BigOrders.Customer_ID ;
WHERE BigOrders.Ship_to_Address <> Customer.Address ;
INTO CURSOR Suspicions
In VFP 9, the ability to group in correlated subqueries means you can find this result with one, more readable query. Listing 13 shows the one-query version.
SELECT Company_Name, Ord.Order_ID, Ord.Order_Date ;
FROM Customer ;
JOIN Orders Ord;
ON Customer.Customer_ID = Ord.Customer_ID ;
WHERE Ord.Order_ID in (;
SELECT Orders.Order_ID;
FROM Orders ;
JOIN Order_Line_Items ;
ON Orders.Order_ID=Order_Line_Items.Order_ID ;
AND Orders.Customer_ID=Customer.Customer_ID ;
AND Orders.Ship_to_Address <> Customer.Address ;
GROUP BY Orders.Order_ID ;
HAVING SUM(Quantity*Unit_Price)> 4000 ) ;
INTO CURSOR Suspicions
As with a number of other situations where a single query replaces two queries, the version in Listing 13 is faster than the one in Listing 12.
Using TOP n in subqueries
The TOP n clause of SELECT lets you return only the first n records (or first n% of the records) of the result set. While the MIN() and MAX() functions let you choose the single smallest or largest value in a given field, TOP n lets you choose multiple items. You can use it to see things like the 10 most recent orders or the 30 most expensive products. For example, this query finds the 10% of products with the fewest overall sales.
SELECT TOP 10 PERCENT Product_ID, ;
SUM(Quantity) AS nSales ;
FROM Order_Line_Items ;
GROUP BY Product_ID ;
ORDER BY nSales ;
INTO CURSOR LowSales
In VFP 8 and earlier, you can’t use the TOP n clause in subqueries. VFP 9 changes that, permitting TOP n in subqueries unless the subquery is correlated. When you use TOP n in a subquery, you must include an ORDER BY clause in that subquery.
Imagine that you’re considering discontinuing the items extracted by the query above (the lowest-selling items). You might want to contact customers who have purchased those items to make sure it won’t be a problem for them. The query in Listing 14 uses the previous query as a subquery to create that list of customers.
The code in Listing 14 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as Discontinue.PRG.
SELECT DISTINCT Company_Name, English_Name ;
FROM Customer ;
JOIN Orders ;
ON Customer.Customer_ID = Orders.Customer_ID ;
JOIN Order_Line_Items ;
ON Orders.Order_ID = Order_Line_Items.Order_ID ;
JOIN ( ;
SELECT TOP 10 PERCENT Product_ID, ;
SUM(Quantity) AS nSales ;
FROM Order_Line_Items ;
GROUP BY Product_ID ;
ORDER BY nSales );
AS LowSales ;
ON Order_Line_Items.Product_ID = ;
LowSales.Product_ID ;
JOIN Products ;
ON LowSales.Product_ID = Products.Product_ID ;
ORDER BY English_Name, Company_Name ;
INTO CURSOR BoughtLowSellers
Correlated updates
In addition to supporting subqueries in the SET clause, the SQL UPDATE command in VFP 9 has a new FROM clause that allows you to draw the update data from another table. This gives what you might call correlated updates.
The example in Listing 11 has one serious drawback. You have to use a separate UPDATE command for each field you want to change. Using the FROM clause, you can achieve the same result with a query followed by an UPDATE command. The code in Listing 15 computes the new values, stores them in a cursor, and then references that cursor in the UPDATE command.
The code in Listing 15 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CorrelatedUpdate.PRG. The code in Listing 16 is CorrelatedUpdateSubquery.PRG in the Developer Downloads.
SELECT Order_Line_Items.Product_ID, ;
SUM(Quantity*Order_Line_Items.Unit_Price) as TotalSales, ;
SUM(Quantity) AS UnitsSold ;
FROM Order_Line_Items ;
JOIN Orders ;
ON Order_Line_Items.Order_ID = Orders.Order_ID ;
AND MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear ;
GROUP BY 1 ;
INTO CURSOR MonthlySales
UPDATE SalesByProduct ;
SET SalesByProduct.TotalSales = NVL(MonthlySales.TotalSales, $0), ;
SalesByProduct.UnitsSold = NVL(MonthlySales.UnitsSold, 0) ;
FROM SalesByProduct ;
LEFT JOIN MonthlySales ;
ON SalesByProduct.Product_ID = MonthlySales.Product_ID
Along with specifying that the values come from another table, you can actually perform joins in the FROM clause to put together the list of values. In Listing 15, the outer join ensures that the records for products not sold in the specified month are set to 0.
In fact, the FROM clause of UPDATE supports subqueries (derived tables), so you can do this entire operation in a single UPDATE command, as shown in Listing 16.
UPDATE SalesByProduct ;
SET SalesByProduct.TotalSales = NVL(MonthlySales.TotalSales, $0), ;
SalesByProduct.UnitsSold = NVL(MonthlySales.UnitsSold, 0) ;
FROM SalesByProduct ;
LEFT JOIN (;
SELECT Order_Line_Items.Product_ID, ;
SUM(Quantity*Order_Line_Items.Unit_Price) as TotalSales, ;
SUM(Quantity) AS UnitsSold ;
FROM Order_Line_Items ;
JOIN Orders ;
ON Order_Line_Items.Order_ID = Orders.Order_ID ;
AND (MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear) ;
GROUP BY 1) AS MonthlySales ;
ON SalesByProduct.Product_ID = MonthlySales.Product_ID
Correlated DELETEs
In VFP 8 and earlier, the SQL DELETE command lets you list only one table. While you can use subqueries in the WHERE clause, deleting records based on information in other tables can be tricky. VFP 9 allows you to list multiple tables in a DELETE’s FROM clause, joining them according to the usual rules. This provides a much cleaner way to perform correlated deletion, deletion from one table based on data in one or more other tables.
The syntax for a correlated DELETE is a little confusing. If the FROM clause of DELETE contains more than one table, you must specify the target table for the deletion between DELETE and FROM:
DELETE [Target] FROM Table1 [JOIN Table2 …]
Use the local alias of the target table between DELETE and FROM. This may be the name of the table, but if you assign a local alias to the table in the FROM clause, use that instead. (Note that the same rules apply for UPDATE, when the table being updated is also included in the FROM clause, as in Listing 16.)
The TasTrade database doesn’t lend itself to an example of
this sort of deletion; it’s designed with the assumption that records are
marked inactive rather than deleted. Assume, though, you have Products and
Suppliers tables like those in TasTrade, with the primary key of Supplier used
as a foreign key in Products. Suppose there’s a problem getting products from
The code in Listing 17 is
included as DeleteProductSubquery.PRG,
while Listing 18 is included as
DeleteProduct.PRG in the Developer Downloads for this chapter. Both programs
include code to create and populate the tables.
DELETE FROM Products ;
WHERE Supplier_ID IN ( ;
SELECT Supplier_ID ;
FROM Supplier ;
WHERE UPPER(Supplier.Country) = "AUSTRALIA" )
DELETE Products ;
FROM Products ;
JOIN Supplier ;
ON Products.Supplier_ID = Supplier.Supplier_ID ;
WHERE
UPPER(Supplier.Country) = "AUSTRALIA"
A more perfect
The UNION clause of SELECT lets you combine the results
of several queries into a single result set. In VFP 8, the rules for
Use names in ORDER BY with
In earlier versions of VFP, when you use the UNION
clause to combine multiple queries into a single result, the ORDER BY clause
can list only the positions of the fields in the field list. You can’t refer to
fields by name, even if the field has the same name in every query in the
In VFP 9, you can use field names in the ORDER BY clause of a
UNIONed query. The field names you use are the ones in the result set. Be aware
that when the names of corresponding fields in the UNION are different, the
result draws the field name from the last query in the
Listing 19 is a simple example that uses this capability. Even with this straightforward query, using the field names increases readability considerably.
The query in Listing 19 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as AllCompanies.PRG.
SELECT Company_Name, Address, City, Region, Postal_Code, Country ;
FROM Customer ;
UNION ;
SELECT Company_Name, Address, City, Region, Postal_Code, Country ;
FROM Supplier ;
ORDER BY Country, City ;
INTO CURSOR AllCompanies
Insert data from UNIONed result
VFP 8 introduced the ability to populate a table or cursor directly from a query result with the addition of the INSERT INTO … SELECT syntax. This made it possible to compute results and add them in a single step.
VFP 9 adds another capability to that syntax: the query used can include the UNION clause. This means you can consolidate data and add it to a table or cursor in one step.
For example, suppose you have a data warehouse for TasTrade, containing the annual sales for each employee by product as well as an annual total for each employee. (Note that this is a different data warehouse than the one described in “Computing Replacements in UPDATE,” earlier in this chapter.)
You can compute the sales for each product by each employee
in a specified year with a single query; similarly, you can compute the totals
for each employee for a year with one query. However, collecting both the
product-specific and the total data requires either two queries or a query
involving a
The command in Listing 20 is included in this chapter’s Developer Downloads, available from www.hentzenwerke.com, as WarehouseUnion.PRG. The program includes code to create the warehouse. (Here, it’s a cursor; in production code, of course, you’d use a table.)
INSERT INTO Warehouse ;
SELECT CrossProd.Product_ID, ;
CrossProd.Employee_ID, ;
m.nYear as nYear, ;
NVL(nUnitsSold, 0), ;
NVL(nTotalSales, $0);
FROM (SELECT Employee.Employee_ID, ;
Products.Product_ID ;
FROM Employee, Products) AS CrossProd ;
LEFT JOIN ( ;
SELECT Product_ID, Employee_ID, ;
SUM(Quantity) AS nUnitsSold, ;
SUM(Quantity * Unit_Price) AS nTotalSales ;
FROM Orders ;
JOIN Order_Line_Items ;
ON Orders.Order_ID = ;
Order_Line_Items.Order_ID ;
WHERE YEAR(Order_Date) = m.nYear ;
GROUP BY Product_ID, Employee_ID ) ;
AS AnnualSales ;
ON CrossProd.Employee_ID = ;
AnnualSales.Employee_ID ;
AND CrossProd.Product_ID = AnnualSales.Product_ID ;
UNION ;
SELECT "Total" AS Product_ID, Employee.Employee_ID, ;
m.nYear AS nYear, ;
CAST(NVL(SUM(Quantity),0) as N(12)) ;
AS nUnitsSold, ;
NVL(SUM(Quantity * Unit_Price), $0) ;
AS nTotalSales ;
FROM Orders ;
JOIN Order_Line_Items ;
ON Orders.Order_ID = Order_Line_Items.Order_ID ;
AND YEAR(Order_Date) = m.nYear ;
RIGHT JOIN Employee ;
ON Orders.Employee_ID = Employee.Employee_ID ;
GROUP BY Employee.Employee_ID ;
ORDER BY 2, 1
No parentheses with
UNION
Although it
wasn’t really syntactically correct, earlier versions of VFP didn’t object if
queries in a UNION were enclosed in parentheses. In VFP 9, a single query in a
UNION can be surrounded by parentheses, but putting parentheses around multiple
queries in a UNION raises a new error, error 2196. Listing 21 shows a query that works in VFP 8, but fails
in VFP 9, due to the new rule.
The query in Listing
21
is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as
UnionParens.PRG. When you run it in VFP 9, it generates an error.
SELECT Company_Name, Address, City, Region, Postal_Code, Country ;
FROM Customer ;
UNION ;
(SELECT Company_Name, Address, City, Region, Postal_Code, Country ;
FROM Supplier ;
UNION ;
SELECT Company_Name, "", "", "", "", "" FROM Shippers )
According to the Fox
team, using parentheses around multiple UNIONs can cause incorrect results.
Combining DISTINCT and ORDER BY
VFP allows you to order query results by any field from the source tables; fields in the ORDER BY list don’t have to be in the field list. In VFP 9, this is no longer true for queries that use SELECT DISTINCT. For example, this query executes in VFP 8, but it raises error 1808 (“SQL: ORDER BY clause is invalid.”) in VFP 9:
SELECT Distinct Customer_ID;
FROM Orders ;
ORDER BY Order_Date
This behavior is
affected by SET ENGINEBEHAVIOR. (See “Turn off new behavior” later in this
chapter.)
Optimization changes
VFP 9 includes several changes to improve the performance of your queries, as well as a new function that makes testing optimization easier.
Fully optimize LIKE with “%”
The LIKE operator lets you compare strings. If a condition in a SQL command includes cField LIKE cString, the specified field is compared to the specified character string on a character by character basis. Unlike the = operator, if cString is shorter than cField, the two do not match, unless a wildcard character is used. The LIKE operator supports two wildcard characters—use “_” to represent a single unknown character and “%” to represent 0 or more unknown characters. For example, you can find all the customers in TasTrade whose names begin with the letter “P” using this query:
SELECT Customer_ID, Company_Name ;
FROM Customer ;
WHERE UPPER(Company_Name) LIKE "P%" ;
INTO CURSOR PCompanies
Earlier versions of VFP could not fully optimize that query. LIKE “string%” expressions could only be partially optimized. VFP 9 fully optimizes such expressions. (Full optimization applies only when the % wildcard is at the end of the character string.)
Our tests showed mixed results regarding the effect of this optimization. For many queries, versions using LIKE and = were equally fast in both VFP 8 and VFP 9. However, in one situation, the case when VFP takes a shortcut and simply filters the original table, optimization of LIKE made a significant difference. VFP takes this shortcut with any query that involves a single table, has no calculated fields, and is fully optimizable. Filtering the source table rather than creating an actual file on disk saves considerable time. Optimization of LIKE means the VFP engine can take this approach with some additional queries. (You can turn off this shortcut by including the NOFILTER clause in the query.)
Better speed for TOP n
When you use the TOP n or TOP n PERCENT clause to return only a subset of the records that otherwise match the query conditions, VFP has to figure out which records are at the top of the list. When you specify the TOP n of a large set, that process can take considerable time. VFP 9 improves performance in that situation.
In our tests, we didn’t see a difference until we worked with a very large table. Choosing the TOP 20 out of a table of nearly 75,000 records showed no difference. When we looked for the TOP 20 in a table of over a million records, however, VFP 9 finished in about one-third the time of VFP 8.
Along with making TOP n calculations faster, the behavior of
TOP n queries has changed slightly. In
earlier versions of VFP, a query with a TOP n clause could return more than n
records due to ties in the data. VFP 9 never returns more than the exact number
of records specified by the TOP n clause. (See “Turning off new behavior” later
in this chapter for the exception to this rule.) When there are ties, it
appears VFP chooses records in physical order from the group with the same
value.
These changes are among
those affected by the new SET ENGINEBEHAVIOR TO 90 setting. See “Turn off new
behavior” later in this chapter for details.
Improved performance
with OR
In earlier
versions, when a query used the OR operator to combine conditions involving
different tables, the result couldn’t always be optimized. Changes to the VFP 9
SQL engine mean that such conditions should be optimized if the individual
conditions are optimizable.
In our testing, not
every query using OR with conditions based on different tables showed
improvement, but we were able to see the difference in some cases. For example,
this query (using the Northwind data) ran nearly four times as fast in VFP 9:
SELECT Orders.OrderId, ProductId ;
FROM Orders ;
JOIN OrderDetails ;
ON Orders.OrderId=OrderDetails.OrderId ;
WHERE ;
(Orders.OrderDate=DATE(1997,9,1) AND OrderDetails.Quantity>2);
OR ;
(Orders.OrderDate=DATE(1997,9,2) AND OrderDetails.Quantity>3);
INTO CURSOR Result
You may also see some
improvements where OR is used with subqueries.
Filtering and temporary
indexes
When you join two
tables in a query, VFP picks an index tag to match up corresponding records. If
the fields in question aren’t indexed, or the existing tag won’t help much (for
example, when the join involves a small table and a much larger table, and
there’s only a tag for the smaller table), the engine creates an index on the
fly. In the output from SYS(3054), the join shows as “using temp index.”
VFP 9 speeds up queries
that build a temporary index and have a non-optimizable filter on the same
table (the one for which the index is built). The effect is most noticeable
with large tables and in cases where the non-optimizable filter eliminates many
records. We suspect that the engine is filtering before building the temporary
index, which speeds up both building the index and using it.
In our tests, none of
the sample tables that come with VFP were large enough to see this effect. We
were able to demonstrate the improvement using a table with about 75,000
records. Our test query joined the table to itself on an unindexed field, using
a filter condition that selected only about 250 records. In VFP 9, the query
took under a second. In VFP 8, the same query took almost 50 seconds.
Correlating faster
You may never run
into another area where performance has been improved. If you have a query with
a correlated subquery, and the query also includes a filter on the table from
the main query that’s used in the subquery, VFP 9 performs measurably better
than VFP 8.
We tested with the
following query using data from TasTrade and found VFP 9 about four times as
fast:
SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;
Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;
FROM Orders ;
JOIN Customer ;
ON Orders.Customer_ID = Customer.Customer_ID ;
JOIN Shippers ;
ON Orders.Shipper_ID = shippers.Shipper_ID ;
WHERE Orders.Discount>0 AND ;
Orders.Order_Date = (SELECT MAX(Order_Date) ;
FROM Orders Ord ;
WHERE Orders.Customer_ID=Ord.Customer_ID );
ORDER BY Cust_Name ;
INTO CURSOR MostRecentOrders
Logging optimization results
The SYS(3054) function was introduced in VFP 5. It gives you information about how FoxPro is optimizing a query. It’s been improved several times and now offers a great deal of data about the optimization process. However, it’s still hard to use SYS(3054) to gather information about query performance through an entire program or application.
Enter SYS(3092). This new function lets you direct SYS(3054) output to a log file. By itself, SYS(3054) can send output only to the active window or a variable. With SYS(3092), you can collect data about a whole series of queries and examine it at your leisure.
The syntax is:
cLogFile = SYS(3092 [, cFileName [, lAdditive ] ] )
The cFileName parameter
specifies the name (including path) of the log file. Use lAdditive to specify
whether an existing file is overwritten. The default is to overwrite an
existing file.
To turn off logging and
make the log file available for reading, pass the empty string as the cFileName
parameter.
The function returns
the name of the active log file. Note that the new log file is set before the
value is returned, so to save the name of an old log file before changing it,
you must call the function once with no parameters, and then call it again,
passing the new value.
When you turn on
logging with SYS(3092), the output from SYS(3054) is still echoed to the active
window or stored to a specified variable.
Once you establish a
log file with SYS(3092), use SYS(3054) as you normally would
and run the queries you want to test. When you’re done testing, reset SYS(3054)
and then issue SYS(3092, "") to stop logging. You can then examine
the log file to see your optimization results.
The information in the
log file is most useful if you pass either 2 or 12 as the second parameter to
SYS(3054). Added in VFP 7, those settings include the query itself in the
output before reporting on optimization.
SELECT from buffered
tables
Since VFP 3,
FoxPro developers have been frustrated by the behavior of SELECT with
buffering. When a buffered table is used in a query, VFP uses the actual table
on disk, not
the open buffered version. This means query results don’t reflect uncommitted
changes to
the data.
This behavior follows
naturally from the normal behavior of queries. In general, whether a table is
open or not, when it’s listed in the FROM clause of a query, the VFP engine
opens it again in a new work area.
In some situations, it
would be really handy to be able to pull data from a buffered table with a
query. In VFP 8 and earlier, you have to turn to Xbase commands (such as
CALCULATE) instead.
VFP 9 gives you the
option of looking at the buffered data. Add the new WITH (Buffering=.T.) clause
to a query and it uses an available buffer rather than the table on
disk. Listing 22
shows a query that counts the number of customers in each country using
buffered data.
SELECT Country, CNT(*) ;
FROM Customer WITH (Buffering = .T.) ;
GROUP BY Country ;
INTO CURSOR BufferedCount
The Developer Downloads for
this chapter, available from www.hentzenwerke.com,
include QueryWithBuffering.PRG, which demonstrates the effect of the WITH
clause.
The WITH clause applies
to a single table. If the query lists multiple tables for which you want to use
buffered data, include a WITH statement for each.
One big warning. If you’re
using row buffering, the query commits the changes to the current row. This
actually makes sense as the query moves the record pointer in the buffered
table. In earlier versions, where queries operated against the data on disk,
the record pointer in the buffer didn’t move, but when you query the buffer
itself, the record pointer does move.
You can also control
the behavior of queries with buffered tables globally. The new SET SQLBUFFERING
command lets you specify whether queries draw from disk or from buffers by
default. The WITH (Buffering = lExpr) clause overrides the current setting for
a particular table and query. SET SQLBUFFERING is scoped to the data session.
Use SET(“SQLBUFFERING”) to query the current setting.
Turn off new behavior
The significant
changes to VFP’s SQL engine introduced in VFP 8 caused problems for some
existing applications. Rather than force developers to change working code or
be stuck in VFP 7, the Fox team added the SET ENGINEBEHAVIOR command, which
allows you to turn off the VFP 8 changes. While it’s not a good idea to use it
all the time, the command provides a flexible solution for existing
applications.
Most of the changes to
the SQL engine in VFP 9 are unlikely to cause compatibility problems. However,
there are a few items that may be an issue for some applications, so the Fox
team added a new setting to SET ENGINEBEHAVIOR.
As noted in “Better
speed for TOP n” earlier in the chapter, in VFP 9, a TOP n query now returns
exactly n records; in the case of ties, it may discard some of the tied
results.
When a query includes
one of the aggregate functions (CNT(), SUM(), AVG(), MIN() or MAX()), but has
no GROUP BY clause, VFP 9 always returns a single record. If no records meet
the join and filter conditions, the result record has the null value for all
fields. In earlier versions, such a query returned an empty result.
Finally, all fields
listed in the ORDER BY clause of a query using SELECT DISTINCT must be included
in the field list of the query.
To turn off these
behaviors, SET ENGINEBEHAVIOR to 80 or 70. The default is SET ENGINEBEHAVIOR
90, which enables the new behaviors.
The bottom line
Many of the SQL changes in VFP 9 increase compatibility with the SQL-92 standard. They also provide more tools for manipulating your data as needed. While some of the changes aren’t likely to have a strong impact on your day-to-day work, you will probably find that others come in handy over and over. The ability to extract some data with a single query, where previously two queries were needed, not only tends to speed up the code, but also makes it easier to define views for those tasks.
Although we are already finding many uses for derived tables,
we suspect our
favorite SQL change may well turn out to be SYS(3092), which turns on logging
of optimization results.
Updates and corrections for this chapter can be found on Hentzenwerke’s website, www.hentzenwerke.com. Click “Catalog” and navigate to the page for this book.