Hi everyone,
I have a job interview coming up that involves Databricks and I’m trying to understand how it actually works in companies. My experience is mainly with traditional tools like Snowflake and Oracle, so this distributed computing approach is pretty different for me.
I’m hoping someone can help me understand these practical aspects:
- How do companies typically organize their environments? Do they create separate workspaces for development, testing, and production, or do they manage access through user permissions in a single workspace?
- What permissions do data engineers normally get in production? Are we allowed to execute jobs, build DataFrames, work with notebooks, and check logs, or is production more restricted?
- How does notebook sharing work between teams? If I want to test something out, do I have to make it visible to others, or can I keep experimental work private?
- What's the typical setup for compute resources? Do engineers create their own clusters as needed, or does the company provide shared clusters for teams or specific jobs?
- How should I talk about scale and processing frequency during interviews? My background is mostly smaller ETL processes, so I'm not sure how to discuss big-data scenarios without seeming inexperienced.
Would love to hear from people who work with Databricks day to day. Thanks for any insights you can share!
The compute setup confuses a lot of newcomers. Most companies provide shared clusters for production jobs but let you create personal ones for development; just don't go overboard with cluster sizes. Permissions-wise, you'll typically get read-only access in prod but can still run queries and check job status. Notebooks are private by default, which is handy for testing. Biggest tip for interviews: mention Delta Lake features like ACID transactions, time travel, and schema enforcement. It shows you understand how Databricks differs from old-school data warehouses.
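To make the "small personal dev cluster" advice concrete, here's a rough sketch of what such a cluster definition looks like, shaped like a Databricks Clusters API payload. The name, runtime version, and node type are placeholders; real values depend on your cloud provider and workspace settings:

```python
import json

# Hypothetical spec for a small single-user dev cluster.
# Field names follow the general shape of the Databricks Clusters API,
# but every value below is a placeholder, not a recommendation.
dev_cluster_spec = {
    "cluster_name": "alice-dev",          # personal cluster, not shared
    "spark_version": "13.3.x-scala2.12",  # placeholder LTS runtime string
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "num_workers": 1,                     # keep dev clusters small
    "autotermination_minutes": 30,        # shut down when idle to save cost
}

print(json.dumps(dev_cluster_spec, indent=2))
```

The `autotermination_minutes` setting is the one admins care about most: it's what keeps forgotten dev clusters from burning money overnight.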
Your database background is actually a huge advantage! Companies love candidates who get data fundamentals. Focus on how you’d migrate existing processes to distributed systems - that’s interview gold!
Honestly? Most interviewers don’t even know their own Databricks setup. Companies love bragging about their multi-workspace architecture, but it’s usually just one messy workspace with random folders everywhere.
Production access is all over the place - some companies lock you out completely, others let you run anything. When they ask about scale, just mention "Spark optimizations" and "Delta caching." Sounds impressive even if you only processed 10 rows at your last job. Works every time.
Been through this exact switch when I moved to a company using Databricks! What surprised me most was how collaborative everything is compared to regular SQL tools. Most places I’ve seen use separate dev/prod workspaces, but it really depends on company size. For the interview, ask about their specific setup - there’s no standard way to do it, and asking shows you think practically. Don’t stress about the scale stuff. Focus on concepts like partitioning and incremental processing instead of exact numbers. They care way more about how you solve problems than whether you’ve handled petabytes. Good luck!
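To illustrate the incremental-processing idea mentioned above: instead of reprocessing the whole table each run, you keep a watermark (e.g., the latest timestamp you've successfully processed) and only handle newer records. In practice you'd express this with Spark or Delta Lake; this is just a plain-Python sketch of the concept, with an invented record shape:

```python
from datetime import datetime

# Toy records standing in for rows in a source table (shape is made up).
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

def incremental_batch(records, watermark):
    """Return only records newer than the watermark, plus the new watermark."""
    fresh = [r for r in records if r["updated_at"] > watermark]
    # If nothing is new, keep the old watermark rather than moving backwards.
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark

# Watermark carried over from the last successful run (here, Jan 3):
fresh, wm = incremental_batch(records, datetime(2024, 1, 3))
print([r["id"] for r in fresh])  # → [2, 3]
```

Being able to talk through this pattern - what the watermark is, where it's stored, what happens on a failed run - usually lands better in interviews than quoting data volumes.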