For the Oracle and MSSQL DBA role at RHB Bank, here are few interview questions from the panel with sample answers related to the JD's focus on HA, performance, compliance, and banking ops. These draw from standard DBA best practices.
## Technical Basics
- **What are the key differences between Oracle and MSSQL architectures?**
Oracle uses SGA/PGA for memory (shared server processes) and tablespaces for storage, while MSSQL relies on buffer pool and filegroups. I'd leverage Oracle's multitenant CDB/PDB for isolation in banking vs. MSSQL's contained databases for similar security.
- **How do you install a new RDBMS instance per standards?**
For Oracle: Run DBCA with custom templates, set init.ora params like db_block_size, create tablespaces, and validate via opatch lsinventory. For MSSQL: Use setup.exe with /Q for silent install, configure max server memory, and enable TDE early for compliance.
- **Explain RAW disk management and capacity planning.**
RAW devices bypass filesystem for direct I/O (e.g., Oracle ASM). I'd use V$DATAFILE for growth trends, forecast via AWR/Perfmon, and plan ASM diskgroups with 20-30% headroom for banking transaction spikes.
## High Availability & Patching
- **How do you set up and monitor HA like Data Guard or Always On?**
Oracle Data Guard: Configure primary/standby with broker, enable fast-start failover, monitor via DGMGRL show configuration. MSSQL Always On: Create AG via SSMS, sync async, use dashboard for lag checks. Test switchovers quarterly for RHB's 24x7 needs.
- **Walk through applying patches with minimal downtime.**
Oracle: Stage PSU via opatch apply, rollback-enabled, during maintenance window; use rolling upgrade for RAC. MSSQL: Slipstream CU with SP, use cluster-aware updating. Always test in dev, verify with health checks post-reboot.
- **How do you optimize database hot spots and I/O?**
Identify via Oracle V$SESSION_WAIT (direct path read) or MSSQL sys.dm_io_virtual_file_stats. Move hot objects to faster storage (SSD ASM), tune db_file_multiblock_read_count, and use partitioning for large banking tables.
## Performance & Monitoring
- **How do you analyze stats and metrics proactively?**
Oracle: Generate AWR during peak, check wait events/top SQL. MSSQL: Query DMVs like sys.dm_exec_query_stats. Automate baselines, alert on >80% CPU/IO, tune indexes to avoid contention in transaction-heavy apps.
- **Troubleshoot a slow banking app like Accounts Payable.**
Step 1: Check alert logs/ERRORLOG for ORA-XXXX or severity errors. Step 2: Run tkprof/execution plans. Step 3: Validate stats/indexes, bounce if needed. Escalate to Oracle Support with SR if unresolved.
## Security & Compliance
- **Ensure compliance with BNM, ISO 20000, and Group Security?**
Implement TDE for data-at-rest, roles over users, audit SYS via unified auditing. Align backups to ITSM checklists, conduct vuln scans quarterly, and document for BNM audits—critical for RHB's financial data.
- **Handle disaster recovery backups?**
Oracle RMAN multilevel incremental to tape/NAS, validate with RESTORE...PREVIEW. MSSQL: Native full/diff/log to URL (S3), test point-in-time recovery. Target RPO<1hr, RTO<4hrs via cross-region replicas.
## Behavioral & Scenario-Based
- **Resolve a critical RDBMS error under pressure?**
Once, ORA-01555 snapshot too old in a payroll run: Flushed undo stats, increased retention, rebuilt index. Escalated to Oracle MOS, resolved in 2hrs—no data loss. Documented RCA for team rotation.
- **2 AM page: Core banking app down—what's first 10 mins?**
1-min: Acknowledge, check OEM/SSMS alerts. 2-5 mins: Validate connectivity (tnsping/sqlcmd), logs for crashes. 6-8 mins: Restart listener/service if hung. 9-10 mins: Failover to standby, notify ops. Root cause next.
- **How do you handle documentation for contracts?**
Use Confluence for SOPs (e.g., patching runbooks), version via Git. Include diagrams (Visio for HA flows), train via walkthroughs. Ensures smooth handover for 12-month renewable roles at RHB.
## Certifications & Experience
- **ITIL V3 in DBA ops?**
Applied Incident/Change Management: CAB approval for patches, root cause in Problem Mgmt. ITIL ensures ITSM alignment for ISO 20000 compliance in banking SLAs.
- **Enterprise experience in finance?**
Managed 50+ instances (Oracle 19c/Linux, MSSQL 2019/Windows) for a bank: Handled 10TB growth yearly, tuned for 1M TPS peaks, with zero downtime via rolling ops.
Sure here are few more questions:
Advanced Architecture
What is Oracle RAC vs. single-instance, and when for RHB's multi-site setup?
RAC uses shared storage with multiple nodes via cluster interconnect for scalability; single-instance is simpler but single-point failure. Recommend RAC for RHB's high-availability banking transactions across Selangor sites, using SCAN listeners for load balancing.
MSSQL filegroups vs. Oracle tablespaces for financial tables?
Filegroups group files logically (e.g., PRIMARY for indexes on SSD); tablespaces manage physical extents. For banking, place hot ledgers in fast filegroups/tablespaces with partitioning to handle 1M+ TPS without I/O bottlenecks.
Oracle CDB/PDB vs. MSSQL contained DBs for security?
CDB/PDB isolates pluggable DBs within one container; contained DBs embed users/permissions. Use PDBs for RHB's multi-tenant compliance, applying common roles for BNM audit while PDBs isolate client schemas.
Troubleshooting & Errors
Diagnose ORA-600 vs. MSSQL severity 17+ in production?
ORA-600: Check alert.log/trace, query V$DIAG_ALERT_EXT, upload to MOS for arg analysis. MSSQL: Review ERRORLOG, run DBCC CHECKDB. Stabilize with restart if needed, then block recovery—resolved payroll outage in 45 mins once.
RAW device expansion without downtime?
Oracle ASM: Add disks to diskgroup online (asmcmd admkdg), rebalance. MSSQL: Add files to filegroup dynamically. Forecast via V$DATAFILE_SIZE, plan 20% buffer for RHB's growth, test in non-prod first.
Database corruption recovery?
Oracle: RMAN VALIDATE + BLOCKRECOVER from backup. MSSQL: DBCC CHECKDB WITH REPAIR_ALLOW_DATA_LOSS as last resort, restore to alternate filegroup. Target RTO<4hrs with validated DR drills quarterly.
Monitoring & Tools
Third-party tools for 24x7 alerting?
OEM Cloud Control for Oracle (alerts on waits), SSMS + SCOM for MSSQL. Integrated Dynatrace for end-to-end; scripted thresholds (CPU>80%) paged rotation team—achieved 99.99% uptime in prior bank role.
Automated stats gathering?
Oracle: DBMS_STATS.GATHER_SCHEMA_STATS with incremental prefs. MSSQL: AUTO_UPDATE_STATS async. Schedule nightly windows, auto-tune async for banking peaks to prevent stale plans causing contention.
Alert log parsing for proactive detection?
Use adrci/show alert -tail, grep ORA- or LF_SYNC waits. Script logminer for clustered RAC; set up Splunk forwarding. Caught undo issues early, avoiding 2AM pages in 24x7 rotation.
Cloud & Modernization
Oracle RDS or Azure SQL Managed Instance experience?
Migrated Oracle to RDS with DMS, automated snapshots for HA. Azure SQL MI for MSSQL lift-shift, managed patching. Hybrid-ready for RHB: Blue-green deploys, cost 30% savings vs. on-prem.
On-prem to Exadata/Azure migration zero data loss?
Oracle: Data Guard switchover to Exadata. MSSQL: Log shipping + AG sync to Azure. GoldenGate for live sync during cutover, validate rowcounts post-migration—zero downtime for 5TB finance DB.
Containerizing DBs for dev/test?
Docker Oracle-XE for dev, Kubernetes StatefulSets with MSSQL persistence. Use for RHB pipelines: Helm charts for spin-up, GitOps for configs—cut test env setup from days to minutes.
Operations & Teamwork
OS-level tuning for DBAs?
Linux: ASMLib for Oracle ASM, hugepages for SGA. Windows: IO affinity masks for MSSQL NUMA. Tuned Linux ulimits/io_uring for 20% I/O gain in transaction logs.
Vendor escalation process?
Log SR via MOS/Portal with traces/diags. Severity 1 for prod impact, provide 10046 traces. Escalate to TAM if SLA breach—resolved ORA-600 in 4hrs vs. 24hr standard.
DBA success metrics?
Uptime 99.99%, MTTR<30min, backup success 100%, growth forecast accuracy ±10%. Tracked via custom dashboard, presented quarterly to ops leads.
Scenario Deep Dives
CPU spikes from backups in Always On?
Offload full backups to secondary replica (READ_ONLY routing), throttle via MAXDOP/backup compression. Configured readable secondary, reduced primary load 40% during peaks.
'Log file sync' waits during peak hour?
Commit ratio too high—batch commits. 2) Slow storage—SSD redo logs. 3) LGWR overload—add CPUs/async. Fixed in finance app, cut waits from 500ms to 10ms.
Handover docs to prevent repeated incidents?
Confluence runbooks + post-incident RCAs in Jira. Include decision trees, scripts in Git. Weekly rotation handovers—reduced repeat issues 70% in team of 5.
No comments:
Post a Comment