Chapter N

Chapter 8
SQL Changes

After leaving the SQL portion of VFP virtually unchanged for a long time, Microsoft began to introduce additional features and increased compliance with the ANSI-92 standard with VFP 8. VFP 9 continues that process, offering additional functionality for SELECT, INSERT, UPDATE, and DELETE, as well as a number of other improvements.

VFP’s SQL sublanguage lets you manipulate data without worrying about record pointers or work areas. VFP 9 includes a number of enhancements to this part of the language. Many of the limits related to queries have been removed, the use of subqueries has been expanded, and the UPDATE and DELETE commands accept join conditions to specify the records to modify or remove. VFP 9 also offers some performance improvements, an easier way to test performance, and the ability to query against a buffered table.

VFP 9 has no limits

The most basic change to queries in VFP 9 is the elimination of a number of restrictions. In earlier versions, for example, the total number of joins and subqueries in a single query was limited to nine; in VFP 9, there’s no limit. Table 1 shows the limits related to queries removed or raised in VFP 9.

Table 1. SQL queries were limited in a number of ways in earlier versions of Visual FoxPro. Many of those limits were lifted in VFP 9.

Description	Limit in VFP 8 (and earlier)	Limit in VFP 9
Total number of joins and subqueries	9	No limit
Number of UNIONs	9	No limit
Number of tables and aliases referenced	30	No limit
Number of items listed in IN clause	24	Based on SYS(3055) setting
Nesting level for subqueries	1	No limit

For everyday queries, most developers never run into these limits, but each of them can pose problems in certain situations.

Lots of tables

With fully normalized data, the old limits on joins, as well as those on the number of tables and aliases referenced, can make it difficult to pull together all the information for a particular entity. For example, consider the Northwind database that comes with VFP 8
and 9. Collecting all the data related to each order, including customer information, shipper information, supplier information and so forth, involves 11 tables. Listing 1 shows such
a query.

The Developer Downloads for this chapter, available from www.hentzenwerke.com, include the query in Listing 1 as ManyTables.PRG.

Listing 1. Flattening a normalized database can involve many joins and tables.

SELECT ProductName, CategoryName, OrderDate, ;

Customers.CompanyName AS CustomerName, ;

OrderDetails.Quantity, ;

OrderDetails.UnitPrice, ;

Suppliers.CompanyName AS SupplierName, ;

Employees.LastName, Employees.FirstName, ;

Territories.TerritoryDescription, ;

Region.RegionDescription, ;

Shippers.CompanyName AS ShipperName;

FROM Orders ;

LEFT JOIN Customers ;

ON Orders.CustomerID = Customers.CustomerID ;

LEFT JOIN OrderDetails ;

ON Orders.OrderID = OrderDetails.OrderID ;

LEFT JOIN Products ;

ON OrderDetails.ProductID = Products.ProductID ;

LEFT JOIN Categories ;

ON Products.CategoryID = Categories.CategoryID ;

LEFT JOIN Suppliers ;

ON Products.SupplierID = Suppliers.SupplierID ;

LEFT JOIN Shippers ;

ON Orders.ShipVia = Shippers.ShipperID ;

LEFT JOIN Employees ;

ON Orders.EmployeeID = Employees.EmployeeID ;

LEFT JOIN EmployeeTerritories ;

ON Employees.EmployeeID = EmployeeTerritories.EmployeeID ;

LEFT JOIN Territories ;

ON EmployeeTerritories.TerritoryID = Territories.TerritoryID ;

LEFT JOIN Region ;

ON Territories.RegionID = Region.RegionID ;

ORDER BY Products.ProductID, Orders.OrderDate ;

INTO CURSOR Unfolded

This query lists 11 different aliases and performs 10 joins; VFP 8 and earlier versions can’t run it because it generates error 1805, “SQL: Too many subqueries.” VFP 9 executes it without a problem. (While this particular example is a little contrived, flattening a normalized database is a common problem and can involve many, many joins.)

Nearly unlimited IN list

In earlier versions of VFP, the IN (list of items) operator was limited to 24 items in the list. While VFP 9 doesn’t entirely remove the limit, it does give you control over it through the SYS(3055) function. Even without manipulating SYS(3055), the limit is significantly higher than in earlier versions. In our tests, we could include 154 items before we had to raise SYS(3055).

The IN operator can be used both to filter based on a list of items and to filter based on subquery results. The change here affects only the use of IN with a specific list of items.

When you control the query yourself, the limit on the IN clause isn’t generally a problem. You can usually find another approach to avoid a large IN clause. One solution is to store the list of values to a cursor and do a join with that cursor. For example, this query:

SELECT cFirst, cLast ;

FROM Person ;

WHERE UPPER(cLast) IN ("BLACK", "BROWN", "GREEN", "SILVER", "WHITE")

could be replaced with:

CREATE CURSOR Names (cName C(25))

INSERT INTO Names VALUES ("BLACK")

INSERT INTO Names VALUES ("BROWN")

INSERT INTO Names VALUES ("GREEN")

INSERT INTO Names VALUES ("SILVER")

INSERT INTO Names VALUES ("WHITE")

SELECT cFirst, cLast ;

FROM Names ;

JOIN Person ;

ON UPPER(cLast) = RTRIM(cName)

However, you don’t always have the chance to write this kind of code. In particular, other applications that access VFP data through OLE DB may generate queries that use the IN clause and offer no chance to code around it.

SYS(3055) was added in VFP 6 to solve a problem in updating views. It controls the allocation of space for evaluating FOR and WHERE clauses. The default value is 320, and it can be increased in increments of 8. In our tests, each increase of 8 in the SYS(3055) value allowed 4 more items in the list for IN.

Nested Subqueries

The ability to use subqueries (a query within a query) makes it possible to get some results with a single query that would otherwise require multiple queries. Perhaps the most common query involving a subquery is finding all the records in one table that are not in another. For example, this query (using the TasTrade database that comes with VFP) gets a list of companies from the Customer table that placed no orders:

SELECT Company_Name ;

FROM Customer ;

WHERE Customer_ID NOT IN ;

(SELECT Customer_ID FROM Orders) ;

INTO CURSOR NoSales

Another common use for a subquery is to perform aggregation and then allow the main query to retrieve additional information about the aggregate results. For example, you can use a subquery to get a list of the products included in each customer's most recent order, as in Listing 2. The subquery in this example is correlated, meaning it uses a field from a table not listed in the subquery itself, only in the main query. In the example, OrdLast.Customer_ID is used in the WHERE clause of the subquery, but OrdLast is an alias for Orders defined in the main query.

The Developer Downloads for this chapter, available from www.hentzenwerke.com, include the query in Listing 2 as CustProducts.PRG and the query in Listing 3 as ProductFirstNotLast.PRG.

Listing 2. The subquery here finds each customer’s most recent order, and then uses those results to get the list of products included in that order.

SELECT CustLast.Customer_ID, Product_ID ;

FROM Order_Line_Items OLILast;

JOIN Orders OrdLast;

ON OLILast.Order_ID = OrdLast.Order_ID ;

JOIN Customer CustLast;

ON OrdLast.Customer_ID = ;

CustLast.Customer_ID ;

WHERE OrdLast.Order_Date = ;

(SELECT MAX(Order_Date) ;

FROM Orders Ord ;

WHERE OrdLast.Customer_ID = ;

Ord.Customer_ID ) ;

INTO CURSOR CustProducts

In VFP 8 and earlier, subqueries cannot be nested. That is, the WHERE clause of a subquery can’t use another subquery. VFP 9 allows nested subqueries, increasing the number of questions you can answer with a single query.

Suppose you want to find out what products a company included in its first order, but not its most recent. While you could use the cursor created in Listing 2 in another query, in VFP 9, you can do the whole job with a single query, shown in Listing 3.

Listing 3. Nesting subqueries lets you solve more problems with a single query rather than a series of queries.

SELECT Customer.Company_Name, Product_ID ;

FROM Order_Line_Items ;

JOIN Orders ;

ON Order_Line_Items.Order_ID = Orders.Order_ID ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

WHERE Orders.Order_Date = ;

(SELECT MIN(Order_Date) ;

FROM Orders Ord ;

WHERE Orders.Customer_ID=Ord.Customer_ID );

AND Product_ID NOT IN ;

(SELECT Product_ID ;

FROM Order_Line_Items OLILast;

JOIN Orders OrdLast;

ON OLILast.Order_ID = OrdLast.Order_ID ;

JOIN Customer CustLast;

ON OrdLast.Customer_ID = ;

CustLast.Customer_ID ;

WHERE OrdLast.Order_Date = ;

(SELECT MAX(Order_Date) ;

FROM Orders Ord ;

WHERE OrdLast.Customer_ID = ;

Ord.Customer_ID ) );

INTO CURSOR FirstNotLast

More flexible subqueries

The previous section looked at one way subqueries have improved in VFP 9. In fact, the VFP team did a lot of work with subqueries for this version. In VFP 8 and earlier, subqueries could be used only in the WHERE clause of a SQL command, that is, only in filtering data. In addition to removing the limit on nesting, VFP 9 allows you to use subqueries in the field list and the FROM clause of a query and in the SET clause of the SQL UPDATE command. VFP 9 also allows the GROUP BY clause in correlated subqueries (subqueries that refer to one or more fields from tables in the main query, like the subquery in Listing 2). Finally, VFP 9 allows you to use the TOP clause in subqueries, as long as those subqueries are not correlated.

Derived tables—Subqueries in the FROM clause

The result of a subquery in the FROM clause is called a derived table. One use for a derived table is retrieving additional data when you use an aggregate function.

Continuing to use the TasTrade database that comes with VFP, consider the problem of finding out about each customer’s most recent order. If all you want is the customer id and the order date, it’s easy:

SELECT Customer_ID, MAX(Order_Date) ;

FROM Orders ;

GROUP BY 1 ;

INTO CURSOR MostRecentOrder

This query is simple and works in every version of VFP. Suppose, though, you want not just the customer id, but additional information about the order, such as the company name, the contact person for the company, and the name of the shipper. Prior to VFP 9, you need to use a correlated subquery, two queries in sequence, or a very ugly subquery. Listing 4 shows the solution using a correlated subquery, Listing 5 shows the two queries in sequence technique, and Listing 6 shows the ugly subquery approach. What makes the subquery in Listing 6 so ugly is the need to combine the customer id and the most recent date into a single field for comparison.

The queries shown in Listing 4, Listing 5, and Listing 6 are included
in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as MostRecentCorrelated.PRG, MostRecentTwoQueries.PRG, and MostRecentSubquery, respectively.

Listing 4. Finding data associated with an aggregate is one reason to use a correlated subquery.

SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;

Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;

FROM Orders ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

JOIN Shippers ;

ON Orders.Shipper_ID = shippers.Shipper_ID ;

WHERE Orders.Order_Date = ;

(SELECT MAX(Order_Date) ;

FROM Orders Ord WHERE Orders.Customer_ID=Ord.Customer_ID );

ORDER BY Cust_Name ;

INTO CURSOR MostRecentOrders

Listing 5. Another way to find the data associated with an aggregate result is to use two queries in sequence. The first query does the grouping, and then the second query joins with that result.

SELECT Customer_ID, MAX(Order_Date) as Order_Date ;

FROM Orders ;

GROUP BY 1 ;

INTO CURSOR RecentOrder

SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;

Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;

FROM Orders ;

JOIN RecentOrder ;

ON Orders.Customer_ID = RecentOrder.Customer_ID ;

AND Orders.Order_Date = RecentOrder.Order_Date ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

JOIN Shippers ;

ON Orders.Shipper_ID = shippers.Shipper_ID ;

ORDER BY Cust_Name ;

INTO CURSOR MostRecentOrders

Listing 6. Yet another way to find data associated with an aggregate result is to use a subquery in the WHERE clause. The resulting query is hard to maintain because of the strange expression used to match data to the subquery result.

SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;

Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;

FROM Orders ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

JOIN Shippers ;

ON Orders.Shipper_ID = shippers.Shipper_ID ;

WHERE Orders.Customer_ID + DTOS(Orders.Order_Date) IN ;

(SELECT Customer_ID + DTOS(MAX(Order_Date)) ;

FROM Orders GROUP BY Customer_ID );

ORDER BY Cust_Name ;

INTO CURSOR MostRecentOrders

VFP 9 gives you a better alternative. You can perform the subquery on the fly right in the join clause. Putting the subquery into the join clause means it doesn’t have to be correlated; in some cases, that means it will run faster. The query in Listing 7 uses this approach. Here, the subquery has moved into the JOIN clause, and doesn’t need to combine the customer id and most recent date. Instead, the ON portion of the JOIN compares the two fields separately, much like the two query solution in Listing 5.

The query in Listing 7 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as MostRecentOrderDetails.PRG.

Listing 7. This query uses a subquery in the FROM clause (a derived table) to solve the problem of finding data associated with aggregates.

SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;

Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;

FROM Orders ;

JOIN (SELECT Customer_ID, MAX(Order_Date) as Order_Date ;

FROM Orders CheckOrderDate ;

GROUP BY 1) RecentOrder ;

ON Orders.Customer_ID = RecentOrder.Customer_ID ;

AND Orders.Order_Date = RecentOrder.Order_Date ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

JOIN Shippers ;

ON Orders.Shipper_ID = shippers.Shipper_ID ;

ORDER BY Cust_Name ;

INTO CURSOR MostRecentOrders

Be aware that a subquery in the FROM clause can’t be correlated, which means it may not refer to fields of tables used in the main query, only to fields of the tables it lists. This is because all subqueries in the FROM clause are computed before the joins are performed, thus it’s not yet clear what records are in the result.

Computing fields with a subquery

In addition to supporting derived tables, VFP 9 lets you put subqueries in the field list of a query. That is, you can use a subquery to compute a field to appear in the result. A subquery used this way must return a single field and no more than a single record. If there are no records in the result for a particular record, that field is assigned the null value.

Why would you do this? Why not include the expression in the main query and add any necessary tables? As with derived tables, this approach is handy when grouping is involved. Suppose you want to find the total value of the orders placed for each customer in a particular year. Along with that, you want a great deal of customer information, including the address, phone number, and fax number.

Clearly to compute the total value of orders for a customer, you need to group data from Order_Line_Items by customer. You can extract multiple fields from Customer along the way as long as you add them to the GROUP BY clause. (Prior to VFP 8, you could include additional fields, even without putting them in the GROUP BY clause. In VFP 8, you can do so by issuing SET ENGINEBEHAVIOR 70.) Listing 8 shows a query that retrieves the customer id, company name, address, phone, and fax information along with the total value of that customer’s orders.

The query in Listing 8 is included in the Developer Downloads for
this chapter, available from www.hentzenwerke.com, as CustomerTotalGrouped.PRG.

Listing 8. In a query aggregating a field or fields of a child table, you can add fields from a parent table as long as you add them to the GROUP BY clause. Don’t forget to define nYear before running this query.

SELECT Customer.Customer_ID, Customer.Company_Name, ;

Customer.Address, Customer.City, Customer.Region, ;

Customer.Postal_Code, Customer.Phone, Customer.Fax, ;

SUM(quantity*unit_price) AS yTotal ;

FROM Customer ;

LEFT JOIN Orders ;

JOIN Order_Line_Items;

ON Orders.Order_ID = Order_Line_Items.Order_ID ;

ON Customer.Customer_ID = Orders.Customer_ID ;

AND BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;

GROUP BY 1, 2, 3, 4, 5, 6, 7, 8;

INTO CURSOR CustomerTotal

However, adding so many fields to the GROUP BY clause slows the query down. One alternative that works in this situation is to remove the fields from the GROUP BY clause and wrap them in MAX() or MIN(). As with grouping, because these fields are the same for all records in the group, using MAX() or MIN() doesn’t change the results. This version, shown in Listing 9, is somewhat faster than listing all the fields in the GROUP BY clause.

The query in Listing 9 is included as CustomerTotalMax.PRG in the Developer Downloads for this chapter, available from www.hentzenwerke.com.

Listing 9. Another approach to retrieving parent data in a grouped query is to wrap the extra fields in MAX() or MIN(). Again, don’t forget to define nYear before running this query.

SELECT Customer.Customer_ID, MAX(Customer.Company_Name), ;

MAX(Customer.Address), MAX(Customer.City), MAX(Customer.Region), ;

MAX(Customer.Postal_Code), MAX(Customer.Phone), MAX(Customer.Fax), ;

SUM(quantity*unit_price) AS yTotal ;

FROM Customer ;

LEFT JOIN Orders ;

JOIN Order_Line_Items;

ON Orders.Order_ID = Order_Line_Items.Order_ID ;

ON Customer.Customer_ID = Orders.Customer_ID ;

AND BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;

GROUP BY 1;

INTO CURSOR CustomerTotal

However, the ability to use a subquery in the field list provides an even simpler and more efficient solution to this problem. You can compute the total value of the orders in a subquery, allowing the main query to refer only to the parent table. Not only do you eliminate the extra fields in GROUP BY and the extra calls to aggregate functions, but you can also eliminate the outer join. Listing 10 shows the query.

The query in Listing 10 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CustomerTotal.PRG.

Listing 10. Using a subquery in the field list simplifies the problem of showing additional fields from the parent table when data is aggregated based on a child table. Be sure to give nYear a value before running this query.

SELECT Customer.Customer_ID, Customer.Company_Name, ;

Customer.Address, Customer.City, Customer.Region, ;

Customer.Postal_Code, Customer.Phone, Customer.Fax, ;

(SELECT SUM(quantity*unit_price) ;

FROM Orders ;

JOIN Order_Line_Items;

ON Orders.Order_ID = Order_Line_Items.Order_ID ;

WHERE BETWEEN(Order_Date,DATE(m.nYear,1,1),DATE(m.nYear,12,31)) ;

AND Customer.Customer_ID=Orders.Customer_ID ) as yTotal ;

FROM Customer ;

INTO CURSOR CustomerTotal

In our tests, using the queries shown, the subquery version (Listing 10) was about 12% faster than the version with calls to MAX() (Listing 9), which in turn was about 12% faster than listing all the parent fields in the GROUP BY clause (Listing 8). The more fields from the parent in the field list, the greater the advantage of the subquery version.

Computing replacements in UPDATE

The third new place you can use subqueries is the SET clause of the UPDATE command. That is, you can use a subquery to compute the value to which a field is to be set. However, when you use this approach, the UPDATE command cannot include a subquery in the WHERE clause. In addition, you’re limited to a single subquery in the SET clause, so you can’t use this approach to compute the values of multiple fields.

For this example, imagine that you have a data warehouse (SalesByProduct) for TasTrade data that tells you how many of each product were sold and the dollar amount of those sales. It’s designed to hold data for a single month and you want to update it at the end of the month.

To update the data, use the following UPDATE commands. Set nMonth and nYear to the month and year of the data you’re collecting before running the code in Listing 11. (“Correlated Updates,” later in this chapter, provides a better solution to this problem.)

Code to create the data warehouse as a cursor is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CreateWarehouse.PRG. The code in Listing 11 is SubqueryInSet.PRG.

Listing 11. You can use a subquery in the SET portion of the SQL UPDATE command to calculate the new field values on the fly.

UPDATE SalesByProduct ;

SET TotalSales = (;

SELECT NVL(SUM(quantity*unit_price),$0) ;

FROM Order_Line_Items ;

JOIN Orders ;

ON Order_Line_Items.Order_ID = Orders.Order_ID ;

WHERE MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear;

AND Order_Line_Items.Product_ID = SalesByProduct.Product_ID)

UPDATE SalesByProduct ;

SET UnitsSold = (;

SELECT CAST(NVL(SUM(quantity),0) AS N(12)) ;

FROM Order_Line_Items ;

JOIN Orders ;

ON Order_Line_Items.Order_ID = Orders.Order_ID ;

WHERE MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear;

AND Order_Line_Items.Product_ID = SalesByProduct.Product_ID)

The second UPDATE in the example uses the new CAST() function that lets you change data types on the fly. See Chapter 14, “Language Improvements,” for details.

Correlated subqueries and grouping

A correlated subquery is one that refers to one or more fields of the main query, generally in the subquery’s WHERE clause. VFP 8 prohibits the GROUP BY clause in correlated subqueries. Because correlation can give you the same effect as grouping in many cases (see Listing 4, for example), you may not have run into this limit.

But there are a few situations where the ability to use GROUP BY in a correlated subquery makes it easier to get the desired results. Fortunately, VFP 9 permits grouping in a correlated subquery.

For example, consider the case where you want a list of customers who placed at least one order totaling more than a specified value and had that order shipped somewhere other than their address of record. (You might be checking into suspicious transactions.) In VFP 8 and earlier, extracting this information requires two queries in sequence: the first collects information about orders over the specified amount, while the second compares the shipping address of those records to the customer address and extracts customer information. Listing 12 shows one way to do this, with the threshold value set to $4000.

The code in Listing 12 is included in the Developer Downloads
for this chapter, available from www.hentzenwerke.com, as SuspiciousTwoQueries.PRG. The query in Listing 13 is called SuspiciousOrders.PRG.

Listing 12. In VFP 8 and earlier, it takes two queries to get a list of large orders shipped somewhere other than their home office, along with customer information.

SELECT Orders.Customer_ID, Orders.Ship_to_Address, ;

Orders.Order_ID, Orders.Order_Date ;

FROM Orders ;

JOIN Order_line_items ;

ON Orders.Order_ID=Order_Line_Items.Order_ID ;

GROUP BY 1, 2, 3, 4 ;

HAVING SUM(Quantity*Unit_Price)> 4000 ;

INTO CURSOR BigOrders

SELECT Company_Name, Order_ID, Order_Date ;

FROM Customer ;

JOIN BigOrders ;

ON Customer.Customer_ID = BigOrders.Customer_ID ;

WHERE BigOrders.Ship_to_Address <> Customer.Address ;

INTO CURSOR Suspicions

In VFP 9, the ability to group in correlated subqueries means you can find this result with one, more readable query. Listing 13 shows the one-query version.

Listing 13. Using GROUP BY in a correlated subquery in VFP makes it possible to find customers with suspicious orders in a single query.

SELECT Company_Name, Ord.Order_ID, Ord.Order_Date ;

FROM Customer ;

JOIN Orders Ord;

ON Customer.Customer_ID = Ord.Customer_ID ;

WHERE Ord.Order_ID in (;

SELECT Orders.Order_ID;

FROM Orders ;

JOIN Order_Line_Items ;

ON Orders.Order_ID=Order_Line_Items.Order_ID ;

AND Orders.Customer_ID=Customer.Customer_ID ;

AND Orders.Ship_to_Address <> Customer.Address ;

GROUP BY Orders.Order_ID ;

HAVING SUM(Quantity*Unit_Price)> 4000 ) ;

INTO CURSOR Suspicions

As with a number of other situations where a single query replaces two queries, the version in Listing 13 is faster than the one in Listing 12.

Using TOP n in subqueries

The TOP n clause of SELECT lets you return only the first n records (or first n% of the records) of the result set. While the MIN() and MAX() functions let you choose the single smallest or largest value in a given field, TOP n lets you choose multiple items. You can use it to see things like the 10 most recent orders or the 30 most expensive products. For example, this query finds the 10% of products with the fewest overall sales.

SELECT TOP 10 PERCENT Product_ID, ;

SUM(Quantity) AS nSales ;

FROM Order_Line_Items ;

GROUP BY Product_ID ;

ORDER BY nSales ;

INTO CURSOR LowSales

In VFP 8 and earlier, you can’t use the TOP n clause in subqueries. VFP 9 changes that, permitting TOP n in subqueries unless the subquery is correlated. When you use TOP n in a subquery, you must include an ORDER BY clause in that subquery.

Imagine that you’re considering discontinuing the items extracted by the query above (the lowest-selling items). You might want to contact customers who have purchased those items to make sure it won’t be a problem for them. The query in Listing 14 uses the previous query as a subquery to create that list of customers.

The code in Listing 14 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as Discontinue.PRG.

Listing 14. The subquery here finds the lowest-selling 10% of products. The main query uses that information to get a list of customers who purchased those products.

SELECT DISTINCT Company_Name, English_Name ;

FROM Customer ;

JOIN Orders ;

ON Customer.Customer_ID = Orders.Customer_ID ;

JOIN Order_Line_Items ;

ON Orders.Order_ID = Order_Line_Items.Order_ID ;

JOIN ( ;

SELECT TOP 10 PERCENT Product_ID, ;

SUM(Quantity) AS nSales ;

FROM Order_Line_Items ;

GROUP BY Product_ID ;

ORDER BY nSales );

AS LowSales ;

ON Order_Line_Items.Product_ID = ;

LowSales.Product_ID ;

JOIN Products ;

ON LowSales.Product_ID = Products.Product_ID ;

ORDER BY English_Name, Company_Name ;

INTO CURSOR BoughtLowSellers

Correlated updates

In addition to supporting subqueries in the SET clause, the SQL UPDATE command in VFP 9 has a new FROM clause that allows you to draw the update data from another table. This gives what you might call correlated updates.

The example in Listing 11 has one serious drawback. You have to use a separate UPDATE command for each field you want to change. Using the FROM clause, you can achieve the same result with a query followed by an UPDATE command. The code in Listing 15 computes the new values, stores them in a cursor, and then references that cursor in the UPDATE command.

The code in Listing 15 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as CorrelatedUpdate.PRG. The code in Listing 16 is CorrelatedUpdateSubquery.PRG in the Developer Downloads.

Listing 15. The new FROM clause in SQL UPDATE lets you draw replacement values from another table.

SELECT Order_Line_Items.Product_ID, ;

SUM(Quantity*Order_Line_Items.Unit_Price) as TotalSales, ;

SUM(Quantity) AS UnitsSold ;

FROM Order_Line_Items ;

JOIN Orders ;

ON Order_Line_Items.Order_ID = Orders.Order_ID ;

AND MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear ;

GROUP BY 1 ;

INTO CURSOR MonthlySales

UPDATE SalesByProduct ;

SET SalesByProduct.TotalSales = NVL(MonthlySales.TotalSales, $0), ;

SalesByProduct.UnitsSold = NVL(MonthlySales.UnitsSold, 0) ;

FROM SalesByProduct ;

LEFT JOIN MonthlySales ;

ON SalesByProduct.Product_ID = MonthlySales.Product_ID

Along with specifying that the values come from another table, you can actually perform joins in the FROM clause to put together the list of values. In Listing 15, the outer join ensures that the records for products not sold in the specified month are set to 0.

In fact, the FROM clause of UPDATE supports subqueries (derived tables), so you can do this entire operation in a single UPDATE command, as shown in Listing 16.

Listing 16. Instead of running a query ahead of time to compute the results, you can use a derived table in the FROM clause of an UPDATE command.

UPDATE SalesByProduct ;

SET SalesByProduct.TotalSales = NVL(MonthlySales.TotalSales, $0), ;

SalesByProduct.UnitsSold = NVL(MonthlySales.UnitsSold, 0) ;

FROM SalesByProduct ;

LEFT JOIN (;

SELECT Order_Line_Items.Product_ID, ;

SUM(Quantity*Order_Line_Items.Unit_Price) as TotalSales, ;

SUM(Quantity) AS UnitsSold ;

FROM Order_Line_Items ;

JOIN Orders ;

ON Order_Line_Items.Order_ID = Orders.Order_ID ;

AND (MONTH(Order_Date) = nMonth AND YEAR(Order_Date) = nYear) ;

GROUP BY 1) AS MonthlySales ;

ON SalesByProduct.Product_ID = MonthlySales.Product_ID

Correlated DELETEs

In VFP 8 and earlier, the SQL DELETE command lets you list only one table. While you can use subqueries in the WHERE clause, deleting records based on information in other tables can be tricky. VFP 9 allows you to list multiple tables in a DELETE’s FROM clause, joining them according to the usual rules. This provides a much cleaner way to perform correlated deletion, deletion from one table based on data in one or more other tables.

The syntax for a correlated DELETE is a little confusing. If the FROM clause of DELETE contains more than one table, you must specify the target table for the deletion between DELETE and FROM:

DELETE [Target] FROM Table1 [JOIN Table2 …]

Use the local alias of the target table between DELETE and FROM. This may be the name of the table, but if you assign a local alias to the table in the FROM clause, use that instead. (Note that the same rules apply for UPDATE, when the table being updated is also included in the FROM clause, as in Listing 16.)

The TasTrade database doesn’t lend itself to an example of this sort of deletion; it’s designed with the assumption that records are marked inactive rather than deleted. Assume, though, you have Products and Suppliers tables like those in TasTrade, with the primary key of Supplier used as a foreign key in Products. Suppose there’s a problem getting products from Australia and you decide to eliminate from your product list all products that come from suppliers in Australia. To perform this deletion in VFP 8, you use a subquery as in Listing 17. In VFP 9, you can use a JOIN condition instead. Listing 18 shows a DELETE command to remove those products from the Products table.

The code in Listing 17 is included as DeleteProductSubquery.PRG,
while Listing 18 is included as DeleteProduct.PRG in the Developer Downloads for this chapter. Both programs include code to create and populate the tables.

Listing 17. In VFP 8 and earlier, deleting from one table based on data in another requires a subquery.

DELETE FROM Products ;

WHERE Supplier_ID IN ( ;

SELECT Supplier_ID ;

FROM Supplier ;

WHERE UPPER(Supplier.Country) = "AUSTRALIA" )

Listing 18. The DELETE command now supports multiple tables in the FROM clause. The join conditions determine which records are deleted.

DELETE Products ;

FROM Products ;

JOIN Supplier ;

ON Products.Supplier_ID = Supplier.Supplier_ID ;

WHERE UPPER(Supplier.Country) = "AUSTRALIA"

A more perfect UNION

The UNION clause of SELECT lets you combine the results of several queries into a single result set. In VFP 8, the rules for UNION were loosened, making the clause easier to use. VFP 9 offers two more improvements to UNION and one restriction.

Use names in ORDER BY with UNION

In earlier versions of VFP, when you use the UNION clause to combine multiple queries into a single result, the ORDER BY clause can list only the positions of the fields in the field list. You can’t refer to fields by name, even if the field has the same name in every query in the UNION. This makes such queries hard to read and hard to maintain, because the ORDER BY list must be corrected if the field list changes.

In VFP 9, you can use field names in the ORDER BY clause of a UNIONed query. The field names you use are the ones in the result set. Be aware that when the names of corresponding fields in the UNION are different, the result draws the field name from the last query in the UNION. (A better approach than depending on the order of the queries is to use the AS clause to ensure corresponding fields have the same name in each query in the UNION.)

Listing 19 is a simple example that uses this capability. Even with this straightforward query, using the field names increases readability considerably.

The query in Listing 19 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as AllCompanies.PRG.

Listing 19. Field names in ORDER BY—VFP 9 allows you to use the field names of the result set in the ORDER BY clause of a UNIONed query.

SELECT Company_Name, Address, City, Region, Postal_Code, Country ;

FROM Customer ;

UNION ;

SELECT Company_Name, Address, City, Region, Postal_Code, Country ;

FROM Supplier ;

ORDER BY Country, City ;

INTO CURSOR AllCompanies

Insert data from UNIONed result

VFP 8 introduced the ability to populate a table or cursor directly from a query result with the addition of the INSERT INTO … SELECT syntax. This made it possible to compute results and add them in a single step.

VFP 9 adds another capability to that syntax: the query used can include the UNION clause. This means you can consolidate data and add it to a table or cursor in one step.

For example, suppose you have a data warehouse for TasTrade, containing the annual sales for each employee by product as well as an annual total for each employee. (Note that this is a different data warehouse than the one described in “Computing Replacements in UPDATE,” earlier in this chapter.)

You can compute the sales for each product by each employee in a specified year with a single query; similarly, you can compute the totals for each employee for a year with one query. However, collecting both the product-specific and the total data requires either two queries or a query involving a UNION. If you want to add a year’s worth of data to the warehouse, you can do it with the INSERT in Listing 20.

The command in Listing 20 is included in this chapter’s Developer Downloads, available from www.hentzenwerke.com, as WarehouseUnion.PRG. The program includes code to create the warehouse. (Here, it’s a cursor; in production code, of course, you’d use a table.)

Listing 20. VFP 8 added the ability to INSERT directly from a query result; VFP 9 extends it to queries involving a UNION. Be sure to assign a value to the variable nYear before running this example.

INSERT INTO Warehouse ;

SELECT CrossProd.Product_ID, ;

CrossProd.Employee_ID, ;

m.nYear as nYear, ;

NVL(nUnitsSold, 0), ;

NVL(nTotalSales, $0);

FROM (SELECT Employee.Employee_ID, ;

Products.Product_ID ;

FROM Employee, Products) AS CrossProd ;

LEFT JOIN ( ;

SELECT Product_ID, Employee_ID, ;

SUM(Quantity) AS nUnitsSold, ;

SUM(Quantity * Unit_Price) AS nTotalSales ;

FROM Orders ;

JOIN Order_Line_Items ;

ON Orders.Order_ID = ;

Order_Line_Items.Order_ID ;

WHERE YEAR(Order_Date) = m.nYear ;

GROUP BY Product_ID, Employee_ID ) ;

AS AnnualSales ;

ON CrossProd.Employee_ID = ;

AnnualSales.Employee_ID ;

AND CrossProd.Product_ID = AnnualSales.Product_ID ;

UNION ;

SELECT "Total" AS Product_ID, Employee.Employee_ID, ;

m.nYear AS nYear, ;

CAST(NVL(SUM(Quantity),0) as N(12)) ;

AS nUnitsSold, ;

NVL(SUM(Quantity * Unit_Price), $0) ;

AS nTotalSales ;

FROM Orders ;

JOIN Order_Line_Items ;

ON Orders.Order_ID = Order_Line_Items.Order_ID ;

AND YEAR(Order_Date) = m.nYear ;

RIGHT JOIN Employee ;

ON Orders.Employee_ID = Employee.Employee_ID ;

GROUP BY Employee.Employee_ID ;

ORDER BY 2, 1

No parentheses with UNION

Although it wasn’t really syntactically correct, earlier versions of VFP didn’t object if queries in a UNION were enclosed in parentheses. In VFP 9, a single query in a UNION can be surrounded by parentheses, but putting parentheses around multiple queries in a UNION raises a new error, error 2196. Listing 21 shows a query that works in VFP 8, but fails in VFP 9, due to the new rule.

The query in Listing 21 is included in the Developer Downloads for this chapter, available from www.hentzenwerke.com, as UnionParens.PRG. When you run it in VFP 9, it generates an error.

Listing 21. Putting parentheses around multiple queries in a UNION raises a
new error.

SELECT Company_Name, Address, City, Region, Postal_Code, Country ;

FROM Customer ;

UNION ;

(SELECT Company_Name, Address, City, Region, Postal_Code, Country ;

FROM Supplier ;

UNION ;

SELECT Company_Name, "", "", "", "", "" FROM Shippers )

According to the Fox team, using parentheses around multiple UNIONs can cause incorrect results.

Combining DISTINCT and ORDER BY

VFP allows you to order query results by any field from the source tables; fields in the ORDER BY list don’t have to be in the field list. In VFP 9, this is no longer true for queries that use SELECT DISTINCT. For example, this query executes in VFP 8, but it raises error 1808 (“SQL: ORDER BY clause is invalid.”) in VFP 9:

SELECT Distinct Customer_ID;

FROM Orders ;

ORDER BY Order_Date

This behavior is affected by SET ENGINEBEHAVIOR. (See “Turn off new behavior” later in this chapter.)

Optimization changes

VFP 9 includes several changes to improve the performance of your queries, as well as a new function that makes testing optimization easier.

Fully optimize LIKE with “%”

The LIKE operator lets you compare strings. If a condition in a SQL command includes cField LIKE cString, the specified field is compared to the specified character string on a character by character basis. Unlike the = operator, if cString is shorter than cField, the two do not match, unless a wildcard character is used. The LIKE operator supports two wildcard characters—use “_” to represent a single unknown character and “%” to represent 0 or more unknown characters. For example, you can find all the customers in TasTrade whose names begin with the letter “P” using this query:

SELECT Customer_ID, Company_Name ;

FROM Customer ;

WHERE UPPER(Company_Name) LIKE "P%" ;

INTO CURSOR PCompanies

Earlier versions of VFP could not fully optimize that query. LIKE “string%” expressions could only be partially optimized. VFP 9 fully optimizes such expressions. (Full optimization applies only when the % wildcard is at the end of the character string.)

Our tests showed mixed results regarding the effect of this optimization. For many queries, versions using LIKE and = were equally fast in both VFP 8 and VFP 9. However, in one situation, the case when VFP takes a shortcut and simply filters the original table, optimization of LIKE made a significant difference. VFP takes this shortcut with any query that involves a single table, has no calculated fields, and is fully optimizable. Filtering the source table rather than creating an actual file on disk saves considerable time. Optimization of LIKE means the VFP engine can take this approach with some additional queries. (You can turn off this shortcut by including the NOFILTER clause in the query.)

Better speed for TOP n

When you use the TOP n or TOP n PERCENT clause to return only a subset of the records that otherwise match the query conditions, VFP has to figure out which records are at the top of the list. When you specify the TOP n of a large set, that process can take considerable time. VFP 9 improves performance in that situation.

In our tests, we didn’t see a difference until we worked with a very large table. Choosing the TOP 20 out of a table of nearly 75,000 records showed no difference. When we looked for the TOP 20 in a table of over a million records, however, VFP 9 finished in about one-third the time of VFP 8.

Along with making TOP n calculations faster, the behavior of TOP n queries has changed slightly. In earlier versions of VFP, a query with a TOP n clause could return more than n records due to ties in the data. VFP 9 never returns more than the exact number of records specified by the TOP n clause. (See “Turning off new behavior” later in this chapter for the exception to this rule.) When there are ties, it appears VFP chooses records in physical order from the group with the same value.

These changes are among those affected by the new SET ENGINEBEHAVIOR TO 90 setting. See “Turn off new behavior” later in this chapter for details.

Improved performance with OR

In earlier versions, when a query used the OR operator to combine conditions involving different tables, the result couldn’t always be optimized. Changes to the VFP 9 SQL engine mean that such conditions should be optimized if the individual conditions are optimizable.

In our testing, not every query using OR with conditions based on different tables showed improvement, but we were able to see the difference in some cases. For example, this query (using the Northwind data) ran nearly four times as fast in VFP 9:

SELECT Orders.OrderId, ProductId ;

FROM Orders ;

JOIN OrderDetails ;

ON Orders.OrderId=OrderDetails.OrderId ;

WHERE ;

(Orders.OrderDate=DATE(1997,9,1) AND OrderDetails.Quantity>2);

OR ;

(Orders.OrderDate=DATE(1997,9,2) AND OrderDetails.Quantity>3);

INTO CURSOR Result

You may also see some improvements where OR is used with subqueries.

Filtering and temporary indexes

When you join two tables in a query, VFP picks an index tag to match up corresponding records. If the fields in question aren’t indexed, or the existing tag won’t help much (for example, when the join involves a small table and a much larger table, and there’s only a tag for the smaller table), the engine creates an index on the fly. In the output from SYS(3054), the join shows as “using temp index.”

VFP 9 speeds up queries that build a temporary index and have a non-optimizable filter on the same table (the one for which the index is built). The effect is most noticeable with large tables and in cases where the non-optimizable filter eliminates many records. We suspect that the engine is filtering before building the temporary index, which speeds up both building the index and using it.

In our tests, none of the sample tables that come with VFP were large enough to see this effect. We were able to demonstrate the improvement using a table with about 75,000 records. Our test query joined the table to itself on an unindexed field, using a filter condition that selected only about 250 records. In VFP 9, the query took under a second. In VFP 8, the same query took almost 50 seconds.

Correlating faster

You may never run into another area where performance has been improved. If you have a query with a correlated subquery, and the query also includes a filter on the table from the main query that’s used in the subquery, VFP 9 performs measurably better than VFP 8.

We tested with the following query using data from TasTrade and found VFP 9 about four times as fast:

SELECT Orders.Order_ID, Customer.Company_Name as Cust_Name, ;

Shippers.Company_Name AS Ship_Name, Orders.Order_Date ;

FROM Orders ;

JOIN Customer ;

ON Orders.Customer_ID = Customer.Customer_ID ;

JOIN Shippers ;

ON Orders.Shipper_ID = shippers.Shipper_ID ;

WHERE Orders.Discount>0 AND ;

Orders.Order_Date = (SELECT MAX(Order_Date) ;

FROM Orders Ord ;

WHERE Orders.Customer_ID=Ord.Customer_ID );

ORDER BY Cust_Name ;

INTO CURSOR MostRecentOrders

Logging optimization results

The SYS(3054) function was introduced in VFP 5. It gives you information about how FoxPro is optimizing a query. It’s been improved several times and now offers a great deal of data about the optimization process. However, it’s still hard to use SYS(3054) to gather information about query performance through an entire program or application.

Enter SYS(3092). This new function lets you direct SYS(3054) output to a log file. By itself, SYS(3054) can send output only to the active window or a variable. With SYS(3092), you can collect data about a whole series of queries and examine it at your leisure.

The syntax is:

cLogFile = SYS(3092 [, cFileName [, lAdditive ] ] )

The cFileName parameter specifies the name (including path) of the log file. Use lAdditive to specify whether an existing file is overwritten. The default is to overwrite an existing file.

To turn off logging and make the log file available for reading, pass the empty string as the cFileName parameter.

The function returns the name of the active log file. Note that the new log file is set before the value is returned, so to save the name of an old log file before changing it, you must call the function once with no parameters, and then call it again, passing the new value.

When you turn on logging with SYS(3092), the output from SYS(3054) is still echoed to the active window or stored to a specified variable.

Once you establish a log file with SYS(3092), use SYS(3054) as you normally would
and run the queries you want to test. When you’re done testing, reset SYS(3054) and then issue SYS(3092, "") to stop logging. You can then examine the log file to see your optimization results.

The information in the log file is most useful if you pass either 2 or 12 as the second parameter to SYS(3054). Added in VFP 7, those settings include the query itself in the output before reporting on optimization.

SELECT from buffered tables

Since VFP 3, FoxPro developers have been frustrated by the behavior of SELECT with buffering. When a buffered table is used in a query, VFP uses the actual table on disk, not
the open buffered version. This means query results don’t reflect uncommitted changes to
the data.

This behavior follows naturally from the normal behavior of queries. In general, whether a table is open or not, when it’s listed in the FROM clause of a query, the VFP engine opens it again in a new work area.

In some situations, it would be really handy to be able to pull data from a buffered table with a query. In VFP 8 and earlier, you have to turn to Xbase commands (such as CALCULATE) instead.

VFP 9 gives you the option of looking at the buffered data. Add the new WITH (Buffering=.T.) clause to a query and it uses an available buffer rather than the table on
disk. Listing 22 shows a query that counts the number of customers in each country using buffered data.

Listing 22. You can query buffered data using the new WITH (Buffering=.T.) clause.

SELECT Country, CNT(*) ;

FROM Customer WITH (Buffering = .T.) ;

GROUP BY Country ;

INTO CURSOR BufferedCount

The Developer Downloads for this chapter, available from www.hentzenwerke.com, include QueryWithBuffering.PRG, which demonstrates the effect of the WITH clause.

The WITH clause applies to a single table. If the query lists multiple tables for which you want to use buffered data, include a WITH statement for each.

One big warning. If you’re using row buffering, the query commits the changes to the current row. This actually makes sense as the query moves the record pointer in the buffered table. In earlier versions, where queries operated against the data on disk, the record pointer in the buffer didn’t move, but when you query the buffer itself, the record pointer does move.

You can also control the behavior of queries with buffered tables globally. The new SET SQLBUFFERING command lets you specify whether queries draw from disk or from buffers by default. The WITH (Buffering = lExpr) clause overrides the current setting for a particular table and query. SET SQLBUFFERING is scoped to the data session. Use SET(“SQLBUFFERING”) to query the current setting.

Turn off new behavior

The significant changes to VFP’s SQL engine introduced in VFP 8 caused problems for some existing applications. Rather than force developers to change working code or be stuck in VFP 7, the Fox team added the SET ENGINEBEHAVIOR command, which allows you to turn off the VFP 8 changes. While it’s not a good idea to use it all the time, the command provides a flexible solution for existing applications.

Most of the changes to the SQL engine in VFP 9 are unlikely to cause compatibility problems. However, there are a few items that may be an issue for some applications, so the Fox team added a new setting to SET ENGINEBEHAVIOR.

As noted in “Better speed for TOP n” earlier in the chapter, in VFP 9, a TOP n query now returns exactly n records; in the case of ties, it may discard some of the tied results.

When a query includes one of the aggregate functions (CNT(), SUM(), AVG(), MIN() or MAX()), but has no GROUP BY clause, VFP 9 always returns a single record. If no records meet the join and filter conditions, the result record has the null value for all fields. In earlier versions, such a query returned an empty result.

Finally, all fields listed in the ORDER BY clause of a query using SELECT DISTINCT must be included in the field list of the query.

To turn off these behaviors, SET ENGINEBEHAVIOR to 80 or 70. The default is SET ENGINEBEHAVIOR 90, which enables the new behaviors.

The bottom line

Many of the SQL changes in VFP 9 increase compatibility with the SQL-92 standard. They also provide more tools for manipulating your data as needed. While some of the changes aren’t likely to have a strong impact on your day-to-day work, you will probably find that others come in handy over and over. The ability to extract some data with a single query, where previously two queries were needed, not only tends to speed up the code, but also makes it easier to define views for those tasks.

Although we are already finding many uses for derived tables, we suspect our
favorite SQL change may well turn out to be SYS(3092), which turns on logging of optimization results.

Updates and corrections for this chapter can be found on Hentzenwerke’s website, www.hentzenwerke.com. Click “Catalog” and navigate to the page for this book.