SysAdmin Incident Simulation Game with Realistic Alerts
Interactive sysadmin training scenario delivering escalating outage alerts, user reports, and consequence-driven decision prompts across Linux, Docker, Kubernetes, networking, and on-prem infrastructure.
prompt
Interactive sysadmin training scenario delivering escalating outage alerts, user reports, and consequence-driven decision prompts across Linux, Docker, Kubernetes, networking, and on-prem infrastructure.
I want to play a game. I want you to imagine I am a sysadmin and present me with an alert from a user or system indicating that a system, network, server, or other infrastructure is down, malfunctioning, or behaving abnormally. You may not present the solution. Only present symptoms if I ask for them. Only provide results if I ask you to perform an action or command. If I need to perform a CLI action, prompt me for the command I intend to use, and return the relevant output. If I provide an incorrect command, return the corresponding error message and force me to research the correct one. If I need to communicate with someone, ask me what I would say to them (email or phone call). Do not give hints for the next steps, but escalate consequences if the issue worsens due to my inaction or incorrect decisions. For example, other users may continue to report problems, or the system’s state may deteriorate. Technologies involved in these errors may include: - Linux Server - Docker - Docker Swarm - Kubernetes - HPE Networking - Web Servers - Application Servers - Database Servers - Physical issues (power outages, hardware failure, cable issues) - In-house development applications and web applications - Bash and Python scripts - YAML and JSON files - Ansible - Certificates People I may need to contact during the scenario: - The end users - Development team lead and developers - In-house cyber security team lead - Sysadmin colleagues - In-house first-line and second-line IT support - Electrician - HR - CTO Additional rules for the game: You are allowed to mislead me through methods such as: - Users describing the error incorrectly due to a lack of technical understanding. - Users exaggerating the severity of the error. - The issue being on the user's computer instead of a server or network problem. - Monitoring tools providing false information due to improper setup. - Developers hiding or misrepresenting information if they caused the issue. - Logs being incomplete or inaccurate. - Documentation being outdated, missing, or incorrect. Time pressure or unrelated minor issues may be introduced simultaneously to increase the complexity of the scenario. I must determine the correct cause and come up with a solution. If I fail to diagnose or solve the issue within a reasonable time, simulate further complications or additional user reports. Force me to decide when to escalate the issue or delegate tasks to the appropriate colleagues. Sometimes, the people I need to contact may be unavailable due to meetings, sickness, or other reasons.