Thursday, April 29, 2010

Everyone knows "Joel on Software", doesn't it?

Well, this is a post is out of the things directly related to Oracle technologies, but a thing about of software development itself. Joel Spolsky, if it says shortly, is author of Joel on Software, if it says a bit wider then Joel has written more then 1000 articles and a number of books about software development, management, business, and the Internet. I recommend to read his top articles like "Things You Should Never Do" or "Painless Functional Specifications" and many more articles, answering the daily questions we ask while developing or managing the development.

Tuesday, April 27, 2010

DWH Core Perfomance Fundamentals

I've added blog Structured Data: Oracle Database Performance And Scalability Blog in my blog list. There are a lot of cool stuff, dont be lazy to check it out. Here is a series of articles regarding DWH core perfomance fundamentals:

The Core Performance Fundamentals Of Oracle Data Warehousing – Introduction

Monday, April 26, 2010

ANSI join syntax vs. non-ANSI join syntax

Recently I've participated in discussion about the SQL ANSI join syntax vs. non-ANSI. In my opinion, non-ANSI join syntax is much more simpler to read and 99% of the statements could be effectivlly written using Oracle syntax. The only case when i use ANSI syntax is FULL OUTER JOIN, because it's simple to use, to understand and to write. One of the additional arguments of using ANSI syntax was the following statement:

 

SQL> with A as
2 (select 1 COL1, 'A' COL2
3 from DUAL
4 union all
5 select 2, 'B'
6 from DUAL
7 union all
8 select 3, 'C' from DUAL),
9 B as
10 (select 1 COL1, 'D' COL2
11 from DUAL
12 union all
13 select 2, 'E' from DUAL)
14 select A.COL2, B.COL2 from A left outer join
15 B on (A.COL1 = B.COL1 and B.COL2 in ('D', 'E'))
16 /

COL2 COL2
---- ----
A D
B E
C

SQL>




as you can see LEFT OUTER JOIN is beign using with IN clause which is not allowed in non-ANSI syntax:


 

SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and b.col2 in ('D','E')
14 /

COL2 COL2
---- ----
A D
B E

SQL>
SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and b.col2(+) in ('D','E')
14 /

with
a as (select 1 col1, 'A' col2 from dual
union all
select 2, 'B' from dual
union all
select 3, 'C' from dual),
b as (select 1 col1, 'D' col2 from dual
union all
select 2, 'E' from dual)
select a.col2, b.col2
from a, b
where a.col1 = b.col1 (+)
and b.col2(+) in ('D','E')

ORA-01719: outer join operator (+) not allowed in operand of OR or IN

SQL>



Yes, it's not allowed, but could we re-write the query with non-ANSI syntax without much overhead? Yes, we are:

 

SQL> with
2 a as (select 1 col1, 'A' col2 from dual
3 union all
4 select 2, 'B' from dual
5 union all
6 select 3, 'C' from dual),
7 b as (select 1 col1, 'D' col2 from dual
8 union all
9 select 2, 'E' from dual)
10 select a.col2, b.col2
11 from a, b
12 where a.col1 = b.col1 (+)
13 and (b.col2 in ('D','E') or b.col2 is null)
14 /

COL2 COL2
---- ----
A D
B E
C

SQL>



let's check the execution plans for ANSI syntax and non-ANSI:

 

SQL> select * from v$version
2 /

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production




ANSI:

 


---------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 |
| 1 | NESTED LOOPS OUTER| | 1 | 6 | 3 |00:00:00.01 |
| 2 | VIEW | | 1 | 3 | 3 |00:00:00.01 |
| 3 | UNION-ALL | | 1 | | 3 |00:00:00.01 |
| 4 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 7 | VIEW | | 3 | 2 | 2 |00:00:00.01 |
|* 8 | VIEW | | 3 | 2 | 2 |00:00:00.01 |
| 9 | UNION-ALL | | 3 | | 6 |00:00:00.01 |
| 10 | FAST DUAL | | 3 | 1 | 3 |00:00:00.01 |
| 11 | FAST DUAL | | 3 | 1 | 3 |00:00:00.01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("A"."COL1"="B"."COL1")



non-ANSI syntax:

 

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | | | |
|* 1 | FILTER | | 1 | | 3 |00:00:00.01 | | | |
|* 2 | HASH JOIN OUTER| | 1 | 1 | 3 |00:00:00.01 | 1348K| 1348K| 907K (0)|
| 3 | VIEW | | 1 | 3 | 3 |00:00:00.01 | | | |
| 4 | UNION-ALL | | 1 | | 3 |00:00:00.01 | | | |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 7 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 8 | VIEW | | 1 | 2 | 2 |00:00:00.01 | | | |
| 9 | UNION-ALL | | 1 | | 2 |00:00:00.01 | | | |
| 10 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 11 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter((INTERNAL_FUNCTION("B"."COL2") OR "B"."COL2" IS NULL))
2 - access("A"."COL1"="B"."COL1")



What we see is that both execution plans has 11 operations, but the main different is in JOIN operations. In case ANSI we have NESTED LOOPS OUTER which loops 3 times through B for each record in A. In case non-ANSI syntax we have HASH JOIN OUTER join which executed only once but with additional overhead with filter predicate.

As result i could state that non-ANSI example looks much better especially in case with much data in source, let's check:

ANSI:

 

with
a as (select level col1, dbms_random.string(1, 3) col2 from dual connect by level <= 1000),
b as (select level col1, dbms_random.string(1, 3) col2 from dual connect by level <= 100)
select A.COL2, B.COL2 from A left outer join
B on (A.COL1 = B.COL1 and B.COL2 in ('D', 'E'))
/

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.47 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.47 |
| 2 | NESTED LOOPS OUTER | | 1 | 1 | 1000 |00:00:01.46 |
| 3 | VIEW | | 1 | 1 | 1000 |00:00:00.01 |
| 4 | CONNECT BY WITHOUT FILTERING | | 1 | | 1000 |00:00:00.01 |
| 5 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 |
| 6 | VIEW | | 1000 | 1 | 0 |00:00:01.46 |
|* 7 | VIEW | | 1000 | 1 | 0 |00:00:01.46 |
| 8 | CONNECT BY WITHOUT FILTERING| | 1000 | | 100K|00:00:00.11 |
| 9 | FAST DUAL | | 1000 | 1 | 1000 |00:00:00.01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

7 - filter(("A"."COL1"="B"."COL1" AND INTERNAL_FUNCTION("B"."COL2")))



non-ANSI:

 

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | | | |
|* 2 | FILTER | | 1 | | 900 |00:00:00.01 | | | |
|* 3 | HASH JOIN OUTER | | 1 | 1 | 1000 |00:00:00.01 | 1452K| 1452K| 1264K (0)|
| 4 | VIEW | | 1 | 1 | 1000 |00:00:00.01 | | | |
| 5 | CONNECT BY WITHOUT FILTERING| | 1 | | 1000 |00:00:00.01 | | | |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
| 7 | VIEW | | 1 | 1 | 100 |00:00:00.01 | | | |
| 8 | CONNECT BY WITHOUT FILTERING| | 1 | | 100 |00:00:00.01 | | | |
| 9 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter((INTERNAL_FUNCTION("B"."COL2") OR "B"."COL2" IS NULL))
3 - access("A"."COL1"="B"."COL1")



As result we have elapsed time for NESTED LOOPS OUTER operation equal to 00:00:01.46 in ANSI case and 00:00:00.01 for HASH JOIN OUTER in non-ANSI case. It's important to note that in ANSI case we have 100000(100K) records in step ID = 8 which lead to use more resourses and consequently more time for execution.

Any way the example doesn't show that ANSI join syntax is something to be avoid, but shows that ANSI / non-ANSI join syntax may be treated differently by SQL engine.

Monday, April 12, 2010

"the pointless" triggers can cause "the problems"

I've just surfed via Christian Antognini blog and found the fact of perfomance issue(CPU overhead) with triggers on Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 (64-bit) Solaris 10 (SPARC) due to Oracle bug. In the example, for reason of integrity constraint each table has its own trigger even it doesn't support any integrity (check example by the link above). "Firing blank rounds" cause increasing of CPU usage.

Out of bugs i would add a comment:
In addition, DML operations inside the triggers could lead to large number of row-level exclusive locks for large number of tables (trigger's shower). More over it's usually very hard to predict how many triggers will be fired due to some system operations when triggers realize some bussiness rules and their execution depends on the exact case.

Friday, April 2, 2010

Installing Oracle Weblogic Server cluster (2 nodes) on Linux RHEL 5

Here is brief description how to install Oracle WebLogic Servers as cluster under Linux RHEL 5 assuming that there are 2 nodes in.

NODE1:

While installing WebLogic:

1) Create domain (say WLDM)
2) In optional configuration check as the following:

 Select Optional Configuration:
------------------------------

1|Administration Server [x]
2|Managed Servers, Clusters and Machines [x]
3|RDBMS Security Store [ ]



3) Do necessary settings for Admin Console
4) DO necessary settings for each Managed server. Please note that you need to set up settings for the second node Managed server as well before it will be installed.

 

Configure Managed Servers:
--------------------------

Add or delete configuration information for Managed Servers. A typical production environment has one or more Managed Servers. Each Managed Server is an
instance of WebLogic Server used to host enterprise applications.

| Name* | Listen address* | Listen port | SSL listen port | SSL enabled |
_|__________|_________________|_____________|_________________|_____________|
1| MS1 | 10.10.10.133 | 7003 | 7503 | true |
2->| MS2 | 10.10.10.134 | 7003 | 7503 | true |



5) Configure Cluster.


 
Configure Clusters:
-------------------

Add or delete configuration information for clusters. A cluster consists of multiple server instances working together to provide increased scalability
and reliability.

| Name* | Cluster messaging mode | Multicast address | Multicast port | Cluster address |
_|___________|________________________|___________________|________________|_________________|
1->| WLCL | unicast | N/A | N/A | |




6) Assign Managed Servers to Clusters:

 

Assign Servers to Clusters:
---------------------------

Assign Managed Servers to a cluster in the domain.

Cluster
|_____WLCL [1]
|_____MS1
|_____MS2




7) Configure machines and assign machines to Managed servers. Please note there are two types of machines differed by non-Unix/Unix. You need to fill the only type for each of the node otherwise you will get duplication by machines.

8) Finish installing process by exact installing Domain right after all the setting commited.

 **** Domain Created Successfully! ****


NODE2:

Start installing Weblogic as you did in node1

1) Create domain the same as we pointed in node1
2) In optional configureation check as follows:

 

Select Optional Configuration:
------------------------------

1|Administration Server [ ]
2|Managed Servers, Clusters and Machines [x]
3|RDBMS Security Store [ ]


3) Do necessary setting for exact MS2 Managed server that we did in node1:

 

Configure Managed Servers:
--------------------------

Add or delete configuration information for Managed Servers. A typical production environment has one or more Managed Servers. Each Managed Server is an
instance of WebLogic Server used to host enterprise applications.

| Name* | Listen address* | Listen port | SSL listen port | SSL enabled |
_|__________|_____________________|_____________|_________________|_____________|
1->| MS2 | 10.10.10.134 | 7003 | 7503 | true |



4) Skip cluster and machine configureation settings
5) Finish installing process by exact installing Domain right after all the setting commited.

 **** Domain Created Successfully! ****



Now you can start node1 Admin Console and Managed server with Managed server at node2. Please note that Admin Console required to be start first once it controls the cluster.

Monday, March 22, 2010

Oracle BI Publisher + Digital signature (Digital ID)

Oracle BI Publisher has an feature related to digital signature in PDF documents.

Here is some requirements to do in order to use the feature:

1. You need up and running Oracle BI Scheduler. In order to run scheduler you need to create specific schema manually and deploy the database objects via Oracle BI Publisher.

2. It's required to add Oracle Security Developer Tools JARs to your application server (particullary, JARs which usually goes, for instance, with Oracle Client. Without JARs you're not able to use digital signature which cause stuck of scheduler job in Running state.

3. Create manually or use existing PFX (Personal Information Exchange) file with private and public key ((here is good explanation how to create PFX file manually: http://blogs.msdn.com/maximelamure/archive/2007/01/24/create-your-own-pfx-file-for-clickonce.aspx) and upload it as Digital ID in Oracle BI Publisher Admin Console. Give appropriate rights to user groups which are allowed to sign documents.

4. Create report in BI Publisher by uploading template with signature area. You can also manually add signature area in BI Publisher in report configure page.


5. Create a job to generate the report in BI Publisher Scheduler.

6. Once report succcessfully cooked you could download PDF with your digital ID.

Tuesday, March 2, 2010

Balancing and proxying. Apache 2.2 + mod_proxy_balancer + Clustered Oracle WebLogic Managed servers

Recently i was working around the task aimed to realize simple loadbalancing between to clustered Oracle WebLogic servers and proxing facilities to pass external request to our intranet for specific Oracle BI application (analytics). In the front-end tier I've used Apache 2.2 with mod_proxy_balancer module on Linux RHLE 5. Additionally WebLogic servers were in intranet whereis web server looking to external network. So below the setting that i've added into Apache's httpd.conf web server side to make external access possible with loadbalancing by requests.

Assume that we have servers like below:

Level 1: Internet - Apache 2.2 HTTP server + mod_proxy_balancer (www.mydomain.com)
Level 2: Intranet - Oracle WebLogic Managed server 1 (clustered) (intradomain1.com - 10.10.10.133)
Level 2: Intranet - Oracle WebLogic Managed server 2 (clustered) (intradomain2.com - 10.10.10.134)


httpd.conf:

....

  
#Switching off facilities of forwarding proxy server
ProxyRequests Off

#Actual balancing settings with two members of balancer group. Balancer is accessed by balancer://lb link.
#ProxySet lbmethod sets balancer load-balance method as byrequest to perform weighted request counting.
#loadfactor is normalized weighted load applied to the worker. I set it to 50/50 to equal balancing between nodes.
#route is a value appended to session id. Usually used with inside balancers, for instance, when we need to used predefined requests direction using subsequent balancers.

<Proxy>
BalancerMember http://10.10.10.133:7003 loadfactor=50 route=r1
BalancerMember http://10.10.10.134:7003 loadfactor=50 route=r2
ProxySet lbmethod=byrequest
</Proxy>

#Allow everyone to access any proxied content
<Proxy *>
Order deny,allow
Allow from all
</Proxy>

#Every request to http://www.mydomain.com/analytics goes through Apache balancer to pass into intranet Weblogic application servers.
ProxyPass /analytics balancer://lb/analytics/

#Reverse intranet servers requests back
ProxyPassReverse / balancer://lb/analytics/
ProxyPassReverse / http://10.10.10.133/analytics/
ProxyPassReverse / http://10.10.10.134/analytics/

#Set location accessable by http://www.mydomain.com/lbm to see actual balancing state.
<Location>
SetHandler balancer-manager
</Location>



ProxyPass directive allows remote servers to be mapped into the space of the local server; the local server does not act as a proxy in the conventional sense, but appears to be a mirror of the remote server. The local server is often called a reverse proxy or gateway. The path is the name of a local virtual path; url is a partial URL for the remote server and cannot include a query string.

ProxyPassReverse
directive lets Apache adjust the URL in the Location, Content-Location and URI headers on HTTP redirect responses. This is essential when Apache is used as a reverse proxy (or gateway) to avoid by-passing the reverse proxy because of HTTP redirects on the backend servers which stay behind the reverse proxy.

Only the HTTP response headers specifically mentioned above will be rewritten. Apache will not rewrite other response headers, nor will it rewrite URL references inside HTML pages. This means that if the proxied content contains absolute URL references, they will by-pass the proxy. A third-party module that will look inside the HTML and rewrite URL references is Nick Kew's mod_proxy_html
.

For details please check documentation:
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html


Please note with setting i've mentioned if there is no session's replication between application servers you could have a problems with application access or its work. The one of the other possible solution is to use session ID as identifier of the backend node e.g preventing of switching nodes in the same session (it's called sticky session). I'm going to describe in future posts how to make replication and how to use session ID to identify required destination.