Engineering Best Practices #
This document outlines foundational best practices for how we work as an engineering team. It’s a living document and will evolve as we learn and grow. Our goal is to build high-quality software efficiently, foster a collaborative environment, and ensure our systems are robust and maintainable.
How We Prioritize Work #
We prioritize tasks to ensure we’re moving fast where it matters most while staying unblocked and responsive. Effective prioritization helps us deliver value quickly and consistently.
Priority Order (High-Level) #
- P00 - Critical Production Issues / Blockers – Drop everything. These are bleeding issues affecting customers or halting development for multiple team members. Resolve ASAP.
- POC Items – Proofs of concept are key to winning new business and validating new strategic directions. Make them fast and successful, focusing on the core objectives.
- Urgent Customer Requests / High-Impact Bug Fixes (P0) – Fixing what’s broken for customers, addressing security vulnerabilities, or resolving issues with significant user impact.
- Roadmap Items & Committed Projects – Planned features and improvements driven by customer pain points, strategic goals, and long-term plans.
- Feature Requests & Other Bug Fixes (P1-P2) – Valuable additions or less critical bugs. These typically have fuzzier timelines and are prioritized accordingly.
- Technical Debt & Improvements – Proactive work to improve code quality, system performance, developer tooling, and reduce future risks. This should be continuously integrated into sprints/cycles.
Priority Labels #
- P00 (Critical) – System down, critical security vulnerability, major data loss/corruption, or a blocker preventing multiple team members from working. Requires immediate, all-hands-on-deck attention.
- P0 (Urgent) – Severe bug significantly impacting user experience or core functionality for many users, time-sensitive deliverables, or critical tasks blocking project milestones. Needs to be addressed within days (or hours, if specified).
- P1 (High) – Important bug affecting some users or non-critical functionality, or important features/tasks that should be addressed in the current or next cycle.
- P2 (Medium) – Minor bug, small feature enhancement, or task that would be good to do. Can be deferred if necessary.
- P3 (Low) – Cosmetic issues, minor optimizations, or ideas for future consideration.
Unblocking Is a Priority #
- If someone is blocked on you, make unblocking them your top priority before returning to your own P0/P1 tasks.
- If you are blocked, surface the blocker clearly and early. Proactively seek help or information to get unblocked. Document the blocker in the relevant task/ticket.
- Most project delays happen due to blockers, not just task complexity—resolving these swiftly is key.
- Communicate proactively if priorities shift or if estimated timelines for high-priority tasks are longer than expected.
Timeline Management #
We operate on timelines driven by customer expectations, market opportunities, and internal planning. Demo dates and deadlines keep us on track and accountable.
Internal vs. External Demos/Deadlines #
- External timelines/demos are customer-facing or have significant business implications. These are typically hard deadlines and require careful planning and risk management.
- Internal demos/checkpoints are smaller milestones to track progress, gather feedback, and make course corrections. They help ensure alignment and identify issues early.
How to Reliably Deliver #
- Understand Requirements Clearly: Ensure you understand the “what” and “why” before diving into the “how.” Ask clarifying questions.
- Break Down Large Tasks: Decompose epics and large stories into smaller, manageable, and estimable tasks.
- Estimate Scope Realistically: Provide honest estimates, and include buffers for unforeseen issues. Re-estimate if scope changes.
- Identify and Manage Dependencies: Coordinate with teammates and other teams for any dependencies. Highlight risks associated with dependencies.
- De-risk Early: Tackle the riskiest or most uncertain parts of a feature first (e.g., new technology, complex integrations).
- Plan for Overhead: Account for:
- Code reviews (both giving and receiving)
- Writing tests (unit, integration, e2e)
- Integration with others’ work
- Deployment and potential rollbacks
- Time to fix bugs or adjust based on feedback
- Documentation
- Define “Done” Clearly: Ensure every task has a clear definition of done, including testing, documentation, and any necessary approvals.
- Communicate Progress and Impediments: Regularly update task statuses and proactively communicate any delays or issues that might impact timelines.
- Manage Scope Creep: If new requirements emerge mid-cycle, assess their impact on the timeline and communicate clearly. Decide consciously whether to include them or defer.
Fixing Bugs #
Bug fixes are most effective when they are thoughtful, well-understood, and prevent regressions.
Understand Before Acting #
- Reproduce the issue reliably: Capture steps to reproduce, environment details, and expected vs. actual behavior.
- Diagnose the root cause: Use logs, metrics, debugging tools, and code analysis. Don’t just treat symptoms.
- Verify understanding: Discuss complex bugs with a teammate if unsure.
- Avoid guesswork—seek clarity before writing a fix.
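For example, a tiny repro script can make "expected vs. actual" concrete and rerunnable for whoever picks up the bug. This is only a sketch; the `parse_price` module and the specific bug are hypothetical:

```python
# Hypothetical repro script; module, function, and the bug itself are
# illustrative stand-ins for whatever you are investigating.
import platform
import sys

from myapp.pricing import parse_price  # hypothetical code under investigation

# Record environment details alongside the repro steps.
print(f"python={sys.version.split()[0]} platform={platform.platform()}")

# Steps to reproduce, copied from the ticket.
actual = parse_price("19.90")
expected = "19.90"
print(f"expected={expected!r} actual={actual!r}")

# Fails while the bug is present, which confirms the repro is reliable.
assert str(actual) == expected, "bug reproduced: trailing zero is dropped"
```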
Performance Issues #
- Performance is often limited by waiting (I/O, network, locks) or doing too much work (inefficient algorithms, excessive computation).
- Determine which is the bottleneck using profiling tools and metrics.
- Investigate database queries, API latency, CPU/memory usage, etc.
- Measure before and after optimizing: Ensure your changes actually improve performance and don’t introduce regressions.
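When it is unclear whether the bottleneck is waiting or working, a quick profiling pass usually settles it. A minimal Python sketch using the standard-library profiler, assuming a hypothetical `handle_request` code path:

```python
# Sketch only: handle_request is a hypothetical stand-in for the slow code path.
import cProfile
import pstats
import time

from myapp.api import handle_request  # hypothetical


def profile_once() -> None:
    profiler = cProfile.Profile()
    profiler.enable()
    handle_request(user_id=42)
    profiler.disable()
    # High cumulative time in driver/socket calls points to "waiting" (I/O,
    # network, locks); time concentrated in our own functions points to
    # "doing too much work."
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)


def measure(label: str, fn, *args) -> None:
    # Simple wall-clock measurement to compare before and after an optimization.
    start = time.perf_counter()
    fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```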
Logging for Insight #
- If something is unclear, add meaningful logs to shed light on execution flow, variable states, and errors.
- Good logging today saves hours of debugging tomorrow. Ensure logs are structured and contain relevant context.
- Be mindful of logging sensitive information.
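As a rough illustration, log lines with stable keys and identifiers (never secrets) make execution flow easy to follow and correlate. The function, event names, and fields below are hypothetical:

```python
# Sketch only: the logger name, event names, and fields are hypothetical.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("checkout")


def charge_order(order_id: str, amount_cents: int) -> None:
    # Stable key=value pairs make logs easy to search and correlate.
    logger.info("charge_started order_id=%s amount_cents=%d", order_id, amount_cents)
    try:
        ...  # call the payment provider here
    except Exception:
        # logger.exception captures the stack trace plus the same context.
        logger.exception("charge_failed order_id=%s", order_id)
        raise
    # Log identifiers, never sensitive data such as card numbers.
    logger.info("charge_succeeded order_id=%s", order_id)
```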
The Fix Itself #
- Write a regression test: Once a bug is fixed, add an automated test that reproduces the bug to ensure it doesn’t reappear. This is crucial; see the sketch after this list.
- Consider the blast radius: Understand the potential impact of your fix on other parts of the system. Test accordingly.
- Keep fixes focused: Address the specific bug. Avoid unrelated refactoring or changes in the same commit/PR unless directly related.
- Log the fix: Ensure your commit message and any related tickets clearly explain the bug, the root cause, and the implemented fix.
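A regression test for an illustrative bug might look like the sketch below; the `parse_iso_date` helper and the ticket number are hypothetical, and the second test is a small blast-radius check:

```python
# Sketch only: parse_iso_date and the ticket number are hypothetical.
import pytest

from myapp.dates import parse_iso_date  # hypothetical function that had the bug


def test_parse_iso_date_accepts_leap_day_regression_1234():
    # Bug 1234 (illustrative): leap-day dates raised ValueError before the fix.
    result = parse_iso_date("2024-02-29")
    assert (result.year, result.month, result.day) == (2024, 2, 29)


def test_parse_iso_date_still_rejects_invalid_dates():
    # Blast-radius check: the fix must not loosen validation elsewhere.
    with pytest.raises(ValueError):
        parse_iso_date("2023-02-29")
```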
Testing #
Comprehensive testing is fundamental to software quality, enabling us to iterate quickly and with confidence.
Types of Tests #
- Unit Tests: Test individual functions, methods, or classes in isolation. They should be fast and numerous.
- Integration Tests: Test the interaction between components, services, or modules (e.g., service to database, inter-service communication).
- End-to-End (E2E) Tests: Test entire user flows and system behavior from the user’s perspective. These are typically slower and more complex.
- Performance Tests: Measure responsiveness, stability, and scalability under load.
- Security Tests: Identify vulnerabilities in the application.
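To make the distinction concrete, here is a hedged sketch contrasting a unit test with an integration test; `apply_discount`, `OrdersRepository`, the `integration` marker, and the `db_connection` fixture are all hypothetical project conventions:

```python
# Sketch only: apply_discount, OrdersRepository, the "integration" marker,
# and the db_connection fixture are hypothetical project conventions.
import pytest

from myapp.pricing import apply_discount        # pure function, no I/O
from myapp.repository import OrdersRepository   # talks to a database


def test_apply_discount_unit():
    # Unit test: isolated, runs in microseconds, easy to have in large numbers.
    assert apply_discount(price_cents=1000, percent=10) == 900


@pytest.mark.integration  # project-defined marker for slower tests
def test_orders_repository_roundtrip(db_connection):
    # Integration test: exercises the component against a real or
    # containerized database provided by the db_connection fixture.
    repo = OrdersRepository(db_connection)
    order_id = repo.create(customer_id=1, total_cents=900)
    assert repo.get(order_id).total_cents == 900
```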
Best Practices #
- Write Tests as You Code: Ideally, write tests before or alongside your feature code (test-driven or behavior-driven development are options).
- Aim for High, Meaningful Coverage: Strive for good test coverage, but focus on testing critical paths and complex logic rather than just hitting a percentage.
- Tests as Documentation: Well-written tests serve as executable documentation, showing how components are intended to be used.
- Automate Testing: Integrate tests into the CI/CD pipeline to catch regressions early and automatically. Failed tests should break the build.
- Keep Tests Independent and Fast: Tests should be able to run in any order and execute quickly to provide fast feedback.
- Test Edge Cases and Error Conditions: Don’t just test the “happy path.”
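For instance, parametrized cases plus an explicit error-condition check cover edge cases cheaply; the `split_full_name` helper below is hypothetical:

```python
# Sketch only: split_full_name is a hypothetical helper.
import pytest

from myapp.names import split_full_name  # hypothetical


@pytest.mark.parametrize(
    "full_name, expected",
    [
        ("Ada Lovelace", ("Ada", "Lovelace")),        # happy path
        ("  Ada   Lovelace  ", ("Ada", "Lovelace")),  # messy whitespace
        ("Prince", ("Prince", "")),                   # single-word name
    ],
)
def test_split_full_name_edge_cases(full_name, expected):
    assert split_full_name(full_name) == expected


def test_split_full_name_rejects_empty_input():
    # Error condition, not just the happy path.
    with pytest.raises(ValueError):
        split_full_name("")
```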
Code Reviews #
Software engineering is collaborative. Code reviews help us maintain quality, share knowledge, catch issues early, and improve as engineers.
For the Author #
- Keep PRs/MRs Small and Focused: Reviewing large changesets is difficult and error-prone. Aim for changes that address a single concern.
- Write Clear Descriptions: Explain the “what” and “why” of your changes. Link to relevant tickets or design docs.
- Self-Review First: Before submitting, review your own code as if you were the reviewer. You’ll often catch mistakes yourself.
- Guide the Reviewer: Point out areas needing specific attention or parts that are complex.
- Respond to Comments Constructively: Be open to feedback. Discuss and address comments thoughtfully.
For the Reviewer #
- Review Promptly: Your review unblocks others. Aim for a reasonable turnaround time (e.g., within a few business hours or by EOD). Communicate if you can’t review quickly.
- Understand the Context: Read the PR description, linked tickets, and design docs. Understand what the code is trying to achieve.
- Review for Correctness and Clarity:
- Look for obvious bugs, logic errors, typos, style issues.
- Is the code easy to understand and maintain?
- Are there sufficient and meaningful tests?
- Does it solve the actual problem effectively and efficiently?
- Does it match the design doc or agreed-upon approach?
- Are assumptions clear and reasonable?
- Are there any performance or security implications?
- Does it introduce unnecessary complexity?
- Ask Questions: If something is unclear, ask. Don’t assume.
- Be Constructive and Respectful: Focus on the code, not the author. Offer suggestions, explain your reasoning, and be kind. Frame feedback as suggestions or questions.
- Think Long-Term: Would you be comfortable maintaining this code in the future? If not, explain why.
- Avoid Rubber-Stamping: Approval should reflect genuine understanding and confidence in the change.
- Balance Thoroughness with Pragmatism: Provide comprehensive feedback, but also be mindful of project timelines. Focus on the most important issues.
Reviewing code is one of the most impactful ways to improve product quality, spread team knowledge, and mentor others.
Design Documents (Tech Design) #
Design documents (Tech Design Documents - TDDs) are essential for clarifying thinking, soliciting feedback, aligning stakeholders, and documenting decisions before significant code is written—when changes are cheap and easy. They are a critical tool for ensuring robust, scalable, and maintainable solutions, and they form the blueprint for implementation.
Why Write a Design Doc? #
- Clarify your own thinking: Systematically explore the problem space and different solution pathways.
- Promote rigorous engineering: Ensure solutions are well-reasoned, consider edge cases, and account for non-functional requirements.
- Get early feedback: Catch problems, uncover unknowns, identify risks, and improve the design collaboratively before implementation. This is when changes are least costly.
- Ensure shared understanding & alignment: Build a common understanding of the problem, the proposed solution, its implications, and the plan of action across the team and with stakeholders before significant effort is invested.
- Align with long-term strategy: Ensure individual solutions contribute positively to the overall architecture and technical vision.
- Document trade-offs, assumptions, and key decisions: Provide crucial context for future reference, new team members, and operational support.
- Communicate effectively: Clearly articulate the problem, proposed solution, and impact to stakeholders, collaborators, and your future self.
- Serve as an Implementation Proposal: A strong design doc naturally evolves into an implementation plan, outlining the path forward.
The Design Thinking Process: A Guideline #
The goal of this process is to ensure your design document tells a clear, logical story, making the implicit explicit, so that a technically competent reader can follow your reasoning from problem to solution. Before diving into writing the document itself, follow this thinking process:
- Investigate & Understand Deeply (The “What Is”):
- Thoroughly investigate the problem or opportunity. Reproduce bugs, gather data, review existing systems and documentation.
- Collect all relevant information and context. This phase is crucial for grounding your design in reality.
- Define Clear Requirements & Constraints (The “Must-Haves” and “Can’t-Haves”):
- Requirements: What problem are we solving (the “Why”)? Who is it for? What are user stories/use cases? What are measurable success criteria? What is explicitly not in scope?
- Constraints: What technical limitations (e.g., existing infrastructure, APIs, performance targets, chosen tech) and business limitations (e.g., deadlines, budget) must be respected?
- Envision the Ideal State (The “North Star”):
- Based solely on the requirements and constraints, what would the optimal, most robust long-term solution look like?
- Describe its key attributes and architecture without yet being limited by the current system’s implementation details. This establishes a clear target.
- Analyze the Current State (The “As-Is”):
- Document the existing system’s behavior, architecture, and limitations relevant to the problem.
- Identify specific pain points or areas for improvement. This section should provide enough context for the reader to understand the starting point.
- Identify Gaps & Opportunities:
- What are the key differences between the Current State and the Ideal State? This highlights the scope of work.
- Explore Solutions & Alternatives Critically (The “How”):
- Brainstorm multiple viable approaches to bridge the gap or solve the immediate problem.
- For each significant alternative (even if one seems obvious), critically analyze pros, cons, risks, assumptions, and trade-offs. Briefly explaining why other common alternatives were not chosen can strengthen the justification for your recommended approach.
- Propose an Incremental Path (The “Roadmap”):
- Recommend a path forward, starting with an immediate, valuable step (e.g., the urgent bug fix, MVP for a feature) that is directionally aligned with the Ideal State.
- Outline subsequent phases if needed to reach the full Ideal State.
- Document & Iterate with Feedback:
- Capture this thinking in the design document. Share early and often. Use the document as a tool for collaboration and refinement. Ensure your reasoning is evident throughout the document.
When to Write a Design Doc #
- For any non-trivial feature, change, or bug fix where the solution isn’t immediately obvious or has potential side effects.
- When introducing new services, systems, or significant architectural modifications.
- When a change has wide-ranging impact, many dependencies, or touches critical systems.
- When there are multiple viable solutions requiring discussion, trade-off analysis, and a documented decision.
- If unsure, err on the side of writing one, even if brief, or discuss with your tech lead. For very minor, low-risk changes with obvious solutions, a full TDD may be overkill; clear PR descriptions and robust testing might suffice.
Design Doc Template (Suggestion) #
This template is a comprehensive suggestion. Adapt it based on the project’s complexity and scope. Not every section will need exhaustive detail for every doc. The key is to ensure a logical flow of information that allows the reader to understand the context, requirements, decision-making process, proposed solution, and implementation plan.
- Title, Author(s), Reviewer(s), Date, Status (e.g., Draft, In Review, Approved, Implemented, Deprecated)
1. Overview & Context
- Abstract/Summary: Brief overview of the problem, the proposed solution’s core idea, and key impact. (1-2 paragraphs)
- Problem Statement: Detailed description of the issue, its impact, and why it needs to be solved. Ensure this is clear and compelling.
- Goals / Objectives: What this design aims to achieve. Quantifiable where possible.
- Non-Goals: What is explicitly out of scope.
- Relevant Requirements: Key functional and non-functional requirements guiding this design. (Summarize or link from initial investigation)
- Relevant Constraints: Key technical or business constraints. (Summarize or link)
2. Proposed Solution & Alternatives
- Current State Analysis (Brief): Summary of the existing situation relevant to this design, providing necessary context.
- Ideal State Vision (Brief): Quick reminder of the “North Star” this work aligns with, if applicable.
- Proposed Solution(s) / Approaches Explored:
- For each viable approach considered (aim for 2-3 for significant decisions):
- High-level architecture / core idea.
- Pros.
- Cons.
- Key Assumptions & Risks (include “pre-mortem” thinking: “How could this go wrong?”).
- Estimated complexity / feasibility.
- Alignment with Ideal State / long-term strategy.
- Recommended Approach & Rationale: Clearly state the chosen solution and justify why it’s preferred over alternatives, based on the trade-off analysis. Make your reasoning explicit.
- Incremental Path / Staging (if applicable):
- Immediate Fix / Phase 1: Detailed design for the first implementable part.
- Future Phases: High-level outline of subsequent steps to reach the full vision.
3. Detailed Design (of Recommended Immediate Fix / Phase 1)
- Detailed architecture for key components.
- Data models, schema changes, API contracts (if applicable).
- Key algorithms, logic flows.
- Diagrams (architecture, sequence, state, data flow, etc. – use what clarifies). Ensure diagrams are well-explained.
4. Impact & Considerations
- Impact Analysis:
- Impact on other systems, teams, or user experience.
- Data migration or backfill strategy.
- Non-Functional Requirements (Address explicitly for chosen solution):
- Performance, Scalability, and Reliability (e.g., expected load, latency targets, error handling, fault tolerance).
- Security and Privacy (e.g., data protection, authentication, authorization, threat modeling).
- Operational Considerations (monitoring, logging, alerting, deployment strategy, rollback plan, on-call impact, diagnosability).
- Testing Strategy: How will this be tested (unit, integration, E2E, performance, security, etc.)? Key test scenarios.
5. Planning & Follow-up (Implementation Plan)
- Implementation Plan / Milestones: Key steps for execution (high-level, or more detailed if the design is sufficiently mature). Break down the work into manageable tasks or phases.
- Identify dependencies between tasks or on other teams.
- Consider a rough timeline or effort estimation if possible at this stage.
- Define key checkpoints or deliverables.
- Rollout Plan: How will this feature/change be deployed to production? (e.g., phased rollout, feature flag, A/B test, all at once).
- Monitoring & Verification Plan: How will you confirm the change is working as expected post-deployment and that it meets the goals?
- Open Questions / Unresolved Issues: Items needing further investigation or decisions.
- Future Considerations / Potential Next Steps: (Beyond planned phases).
- Decision Log: Record significant decisions made during the design process and their rationale. (This item remains a bullet as it’s typically a single, distinct section rather than part of the numbered flow).
Guidance Notes #
- Living Document: Design docs should be updated as assumptions change, new information arises, or decisions evolve during implementation.
- Seek Feedback Actively: Solicit reviews from relevant stakeholders, domain experts, and peers early and throughout the process. Be open to iterating on the design.
- Clarity, Cohesion, and Self-Sufficiency:
- Strive for clarity. Use diagrams effectively. Be thorough but avoid unnecessary jargon or verbosity.
- Ensure a cohesive narrative: the document should logically guide the reader from the problem statement through the analysis and trade-offs to the proposed solution and implementation plan. Each section should build upon the previous ones.
- Aim for self-sufficiency: The document should provide enough context and explanation that it can be understood without requiring significant verbal clarification from the author. Make implicit assumptions and reasoning explicit.
- Audience Awareness: Consider who will be reading and reviewing the document (e.g., fellow engineers, product managers, leadership) and tailor the level of detail and language accordingly, while maintaining technical rigor.
- Embrace Iteration and Evolving Clarity: It’s okay if you don’t have every detail of a complex, multi-phase solution perfectly mapped out from the start. Clearly understanding the Current State and having a well-defined Ideal State (North Star) allows the team to define and confidently execute valuable initial steps or phases. Subsequent phases can be outlined at a higher level and remain somewhat “fuzzy.” The clarity gained from implementing and learning from earlier phases will inform and refine the approach to later ones. The design document should reflect this, focusing on detailed plans for immediate work while outlining a directional path for the future.
- Tailor as Needed: This template is a comprehensive suggestion. Adapt it based on the project’s complexity and scope. For smaller changes, a more lightweight version might be appropriate, but the core principles of clear problem definition, solution proposal, and impact consideration should still apply.
Version Control (Git) Best Practices #
Effective use of version control is crucial for collaboration, tracking history, and managing codebases.
Commits #
- Atomic Commits: Each commit should represent a single logical change. Avoid bundling unrelated changes.
- Meaningful Commit Messages:
- Follow a consistent format (e.g., Conventional Commits: `feat: add user login endpoint`).
- The subject line should be concise (e.g., <50-70 chars).
- The body (if needed) should explain the “what” and “why” of the change, not just the “how.”
- Commit Often: Commit your work frequently in small, logical chunks. This makes it easier to revert changes and understand history.
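As an illustration, a Conventional Commits-style message with a concise subject and a body that explains the what and why might look like this (the change it describes is made up):

```
fix(auth): reject expired refresh tokens

Expired refresh tokens were being accepted because the expiry check
compared naive local timestamps. Compare timestamps in UTC so sessions
end when they should. Adds a regression test for the expired-token path.
```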
Branches #
- Feature Branches: Develop new features and bug fixes on separate branches (e.g., `preetam/product/123-new-user-profile` or `preetam/fix/123-login-bug`).
- Keep Branches Short-Lived: Merge feature branches back into the main development line (`main`) as soon as they are complete and reviewed.
- Update Regularly: Keep your feature branches up-to-date with the main development branch by rebasing (preferred for cleaner history before merging) or merging regularly to avoid large, complex merge conflicts.
- Branch Naming Conventions: Follow a team-agreed naming convention for branches.
Merging #
- Ensure Tests Pass: All automated tests should pass before merging a branch.
- Code Review Approval: Ensure the code has been reviewed and approved according to team guidelines.
Communication & Collaboration #
Effective communication is the bedrock of a successful engineering team.
- Be Proactive: Don’t wait for problems to escalate. Communicate risks, blockers, and updates early and often.
- Choose the Right Channel:
- Async (Slack, Comments in Tickets/PRs): For non-urgent updates, questions, and discussions that can be picked up later.
- Sync (Meetings, Huddles, Calls): For urgent issues, complex discussions, brainstorming, and decision-making that benefits from real-time interaction.
- Be Clear and Concise: Get to the point. Provide necessary context but avoid jargon where possible, especially with non-technical stakeholders.
- Document Decisions: Record important decisions, their rationale, and action items, especially those made in meetings. Use wikis, tickets, or design docs.
- Assume Good Intent: Approach discussions with a positive and collaborative mindset.
- Knowledge Sharing: Actively share knowledge through pairing, documentation, internal talks (brown bags), and mentoring.