Sometimes there is no TL;DR. A software engineer must experience, understand, and iterate through the process over and over again to internalize the core values. There is no shortcut. Hard-earned skills determine the future, so there will be no quick summary here.
This picture captures how I feel every time I have to go through the basics again and again, and again. Sometimes it's frustrating; sometimes I feel like I have reached a dead end. I question myself a lot: Was I just doing it wrong? Was it even necessary? Should I instead be following the flock and collecting whatever is trending in my portfolio?
In retrospect, every time I went through this iterative process I learned something more, something new that I had not completely understood the first time, because some dots were missing, dots I didn't even know existed and that I needed in order to complete the puzzle. So I guess "experience" is being able to know that those dots exist (and it takes time to connect them), and "expertise" is being able to find the right connections.
This article is about some of the core values that have helped me, and others, in my journey to become a better software engineer.
Here is the list of the characteristics of a successful software engineer.
“Quality begins with the intent.” — W. Edwards Deming
Every software engineer wants to build the highest quality into their software, but only a few work their way toward achieving it, because many either do not know the quality attributes (or do not take them seriously) or do not understand the trade-offs that have to be made while designing the system. Furthermore, the definition of quality depends on an understanding of the product, which in turn depends on the domain knowledge of the developer or the team.
Since resources are limited (including time, money, and the availability of qualified people), trade-offs are made, consciously or unconsciously, by the engineering team as it selects the qualities of the deliverables according to its understanding and technical knowledge. Hence, it is very important to understand those qualities, as they drive what we build.
Figure: The product quality model defined in ISO/IEC 25010
As part of best engineering practice, engineers must understand the quality attributes and implement them as required in their design and development process.
Some of the activities they can do to benefit from the use of the quality models include:
- Identifying software and system requirements.
- Validating the comprehensiveness of a requirements definition.
- Identifying software and system design objectives.
- Identifying quality control criteria as part of quality assurance.
- Identifying acceptance criteria for a software product.
- Establishing measures of quality characteristics in support of these activities.
Software development is not a straightforward process: there are interdependent parts that may also depend on earlier processes or on external entities, which directly or indirectly influence the next steps. Working in a cross-skilled, diverse team therefore benefits a developer's understanding of systems and their professional growth.
Engineering culture is one of the most important aspects of any team; it helps each and every team member grow professionally and personally, and the quality of the team is reflected in the system. Therefore, a good engineering culture is a must for the better future of the organization.
Worships Computer Science Basics
“I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” — Linus Torvalds
“Algorithms + Data Structures = Programs” — Niklaus Wirth
Knowledge of computer science basics, such as hardware, data structures, and algorithms, is the most important asset for anyone who wants to become a good software engineer. We design systems to run on specific hardware and to communicate with external resources over a network. Therefore, in order to build quality software, we must understand these core units as deeply as we can.
An everyday application developer does not need to understand hardware as deeply as a kernel developer or an embedded-device programmer. Still, understanding the different types of computer memory (RAM, ROM), storage devices (HDD, SSD), and how CPUs process and communicate is useful when designing more complex solutions that implement algorithms for parallel and concurrent processing on multi-core CPUs.
Figure: Block diagram of a basic computer with uni-processor CPU. Black lines indicate data flow, whereas red lines indicate control flow. Arrows indicate the direction of flow.
90% of Systems Problem Solving = Proper Usage Of Data Structure — Personal Experience
A data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
Data structures serve as the basis for Abstract Data Types (ADT). ADT defines the logical form of the data type. The data structure implements the physical form of the data type.
Different types of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, relational databases commonly use B-tree indexes for data retrieval, while compiler implementations usually use hash tables to look up identifiers. Data structures thus provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design.
Data structures can be used to organize the storage and retrieval of information stored in both main memory and secondary memory. They are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by a pointer — a bit string, representing a memory address, that can be itself stored in memory and manipulated by the program. Thus, the array and record data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in XOR linking).
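To make the two addressing principles concrete, here is a minimal Python sketch; object references stand in for raw pointers:

```python
# Arrays compute an element's location from its index (address
# arithmetic), while linked structures store a reference to the
# next element inside each node.

class Node:
    """One element of a singly linked list; `next` is Python's
    stand-in for a stored pointer."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def linked_list_get(head, index):
    """Reach element `index` by following references: O(n)."""
    node = head
    for _ in range(index):
        node = node.next
    return node.value

array = [10, 20, 30]                    # contiguous storage, O(1) access
head = Node(10, Node(20, Node(30)))     # addresses stored in the nodes

print(array[2])                  # 30, location computed from the index
print(linked_list_get(head, 2))  # 30, found by walking the references
```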
Choosing The Right Data Structure
- Do you need random access?
- Do you perform a lot of insertions? How about deletions?
- Do you allow duplicates?
- Are you searching for elements frequently?
- Does your data need to be ordered?
- Would you need to traverse the elements?
- How big is your data?
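The answers map naturally onto concrete structures. A rough Python illustration (the choices shown are examples, not prescriptions):

```python
emails = ["a@x.com", "b@x.com", "a@x.com"]

# Frequent membership tests, no duplicates, order irrelevant: a set
# gives O(1) average lookup.
seen = set(emails)
print("a@x.com" in seen)   # True

# Random access by position and ordered traversal: a list gives
# O(1) indexing and preserves insertion order.
print(emails[1])           # b@x.com

# Frequent lookups by key: a dict gives O(1) average retrieval.
counts = {}
for e in emails:
    counts[e] = counts.get(e, 0) + 1
print(counts["a@x.com"])   # 2
```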
Figure: Big-O of various data structures
Big-O measures how quickly an algorithm's running time grows as its input grows; note that it is an upper bound on the run time, not an exact count.
Common Big-O classes are listed below:
- O(1), constant: hash table lookup
- O(log n), logarithmic: binary search
- O(n), linear: scanning a list
- O(n log n), linearithmic: merge sort
- O(n²), quadratic: comparing all pairs of elements
- O(2ⁿ), exponential: enumerating all subsets
- O(n!), factorial: enumerating all permutations
An algorithm is a list of instructions for solving a problem: a set of mathematical instructions or rules that, especially if given to a computer, will help to calculate an answer to a problem. — General definitions
Many developers come up with simple pseudo code and convert it to code without prior knowledge of problem-domain categories, generally using simple brute-force methods, because they are unable to classify the problem and are unaware of other methodologies that might be useful to identify and solve it. Therefore, it is necessary to level up algorithm-design skills by improving problem-domain knowledge.
One of the crucial aspects of problem solving is understanding the problem and determining its nature. Understanding at least the P and NP problem categories helps in solving general problems. NP-Complete and NP-Hard problems are rarely solved exactly; in practice, even the hardest cases are tackled with approximations and heuristics.
P (Polynomial) Problems
P problems are problems that an algorithm can solve in a polynomial amount of time, i.e. where the Big-O is a polynomial (e.g. O(1), O(n), O(n²)). These are problems that would be considered 'easy' to solve, and thus do not generally have immense run times. Many algorithms complete in polynomial time, such as:
- All basic mathematical operations: addition, subtraction, multiplication, division
- Testing for primality
- Hash table lookup, string operations, sorting problems
- Shortest path algorithms: Dijkstra, Bellman-Ford, Floyd-Warshall
- Linear and Binary Search Algorithms for a given set of numbers
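As a concrete example from this list, a minimal binary search sketch; each step halves the search space, so it runs in logarithmic (and hence polynomial) time:

```python
def binary_search(sorted_items, target):
    """Return the index of `target` in `sorted_items`, or -1 if absent.
    The search space halves every iteration: O(log n)."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 7))   # 3
print(binary_search([2, 3, 5, 7, 11, 13], 4))   # -1
```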
NP (Non-deterministic Polynomial) Problems
NP problems were a little harder for me to understand, but here is the gist: for these problems, no polynomial-time solving algorithm is known; finding a solution may take something like O(2ⁿ) or much worse. However, a given candidate solution can be checked in polynomial time. Integer Factorization and Graph Isomorphism are two examples of NP problems.
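That defining property, a candidate solution being quick to check even when it is hard to find, is easy to demonstrate. A minimal verifier sketch for Subset Sum (an NP-Complete problem, used here only to illustrate polynomial-time verification):

```python
def verify_subset_sum(numbers, target, certificate):
    """Check a proposed solution (the 'certificate') for Subset Sum.
    Finding a satisfying subset may take exponential time, but
    verifying one is a simple linear scan: the hallmark of NP."""
    return (all(x in numbers for x in certificate)
            and sum(certificate) == target)

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [4, 5]))   # True: 4 + 5 == 9
print(verify_subset_sum(numbers, 9, [3, 4]))   # False: sums to 7
```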
Traveling Salesman, Knapsack, and Graph Coloring are examples of NP-Complete problems (and, in their optimization forms, of NP-Hard problems).
Standard Steps In Algorithms Development Process
- Problem definition
- Development of a model
- Specification of the algorithm
- Designing an algorithm
- Checking the correctness of the algorithm
- Analysis of algorithm
- Implementation of algorithm
- Program testing
- Documentation preparation
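As a toy walk-through of these steps, take the problem "compute the greatest common divisor": the model and algorithm come from Euclid, the implementation is a few lines, and program testing checks its correctness:

```python
def gcd(a, b):
    """Euclid's algorithm: gcd(a, b) == gcd(b, a mod b) until b is 0."""
    while b:
        a, b = b, a % b
    return a

# Program testing: verify the implementation against known cases.
assert gcd(54, 24) == 6
assert gcd(17, 5) == 1
assert gcd(42, 0) == 42
print("all gcd checks passed")
```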
Embraces Agile As A Discipline
Remember, agile requires significantly greater discipline!
“Individuals And Interactions Over Processes And Tools” — The Agile Manifesto
The most important word in the line above is "interactions". Systems are defined by the way people interact. These interactions drive the behavior of the system. If you want to change behavior, then change the interactions. This is why the Manifesto talks about "business people and developers must work together daily throughout the project", "conveying information in face-to-face conversation", and "self-organizing teams".
Figure: Agile process overview
It is also why Agile transformation results in conversations about cross-functional teams, the DevOps movement, and organizational design. These all focus on improving the business by improving the "interactions" between individuals.
Processes and tools force certain interaction models. These can either be helpful, and enable collaboration, or create barriers to it.
To improve the system we must look at the interactions. Between roles, specialisms, teams, stakeholders, users, customers. Experiment with ways to improve them. This will improve our outcomes.
Figure: Human communication complexity as a reference to modern distributed systems networks
Therefore, it is very important for the engineer to understand Agile Methodology as a people-focused, results-focused approach to software development that respects our rapidly changing world. It’s centered around adaptive planning, self-organization, and short delivery times. It’s flexible, fast, and aims for continuous improvements in quality, using tools like Scrum and eXtreme Programming.
It works by first admitting that the old “waterfall” method of software development leaves a lot to be desired. The process of “plan, design, build, test, deliver,” works okay for making cars or buildings but not as well for creating software systems. In a business environment where hardware, demand, and competition are all swiftly-changing variables, agile works by walking the fine line between too much process and not enough.
Figure: Agile vs Traditional Waterfall Development Style
Writes Clean Code
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” — Martin Fowler
“Code is more often read than written” — Guido van Rossum
I cannot emphasize the value of writing human-readable code any better than this excerpt from the book "Clean Code" by Robert C. Martin:
“As the mess builds, the productivity of the team continues to decrease, asymptotically approaching zero. As productivity decreases, management does the only thing they can; they add more staff to the project in hopes of increasing productivity. But that new staff is not versed in the design of the system. They don’t know the difference between a change that matches the design intent and a change that thwarts the design intent. Furthermore, they, and everyone else on the team, are under horrendous pressure to increase productivity. So they all make more and more messes, driving the productivity ever further toward zero.
A programmer without “code-sense” can look at a messy module and recognize the mess but will have no idea what to do about it. A programmer with “code-sense” will look at a messy module and see options and variations. The “code-sense” will help that programmer choose the best variation and guide him or her to plot a sequence of behavior preserving transformations to get from here to there.”
Some Schools Of Thought
I like my code to be elegant and efficient. The logic should be straightforward to make it hard for bugs to hide, the dependencies minimal to ease maintenance, error handling complete according to an articulated strategy, and performance close to optimal so as not to tempt people to make the code messy with unprincipled optimizations. Clean code does one thing well. — Bjarne Stroustrup
Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control. — Grady Booch.
Implements Test Driven Development (TDD)
“Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.” — Michael C. Feathers
TDD is a programming philosophy that improves the quality of code by sharpening developers' critical thinking through writing tests first. But to write better tests, we must also understand and improve the building blocks: functions and unit tests.
Functions are the building blocks of any system. We have to take care in how we write our functions so that we can be sure their behavior is always as expected: no surprises and no magic!
General guideline to write quality functions:
- A function should be small, 8–10 lines max.
- A function should do just one thing.
- A function name should use descriptive names to accurately describe what it does.
- A function should take fewer arguments. This also helps in designing the data structure.
- A function should have no side effects. A function or expression is said to have a side effect if it modifies some state variable value outside its local environment, that is to say it has an observable effect besides returning a value to the invoker of the operation.
- A function should not use flag arguments. A flag argument is a kind of function argument that tells the function to carry out a different operation depending on its value. Split method into several independent methods that can be called from the client without the flag.
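As a minimal sketch of the last guideline, here is a flag argument split into two intention-revealing functions (the names are invented for illustration):

```python
import json

# Before: the flag forces one function to do two things, and call
# sites like format_report(data, True) hide the caller's intent.
def format_report(data, as_json):
    if as_json:
        return json.dumps(data)
    return str(data)

# After: two small, single-purpose functions with descriptive names
# that the client can call directly, no flag needed.
def format_report_as_json(data):
    return json.dumps(data)

def format_report_as_text(data):
    return str(data)

print(format_report_as_json({"id": 1}))   # {"id": 1}
print(format_report_as_text({"id": 1}))   # {'id': 1}
```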
A unit test is a test that verifies the behavior of the smallest part of the system or component. Unit tests encourage good design, give rapid feedback, and seem to help teams avoid a lot of trouble.
A test is NOT a unit test if:
- It talks to the database.
- It communicates across the network.
- It touches the file system.
- It can’t run correctly at the same time as any of your other unit tests.
- You have to do special things to your environment (such as editing config files) to run it.
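A minimal example that satisfies these criteria: pure in-memory logic, touching no database, network, or file system (the function under test is invented for illustration):

```python
import unittest

def apply_discount(price, percent):
    """Pure function under test: no I/O and no shared state."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_applies_percentage(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()
```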
Test Driven Development (TDD)
TDD is a software development process that relies on the repetition of a very short development cycle: requirements are turned into very specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.
The Three Laws of TDD:
- First Law: You may not write production code until you have written a failing unit test.
- Second Law: You may not write more of a unit test than is sufficient to fail, and not compiling is failing.
- Third Law: You may not write more production code than is sufficient to pass the currently failing test.
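A sketch of one turn through the laws, with hypothetical names:

```python
# First Law: write a failing unit test before any production code.
def test_slugify_replaces_spaces():
    assert slugify("hello world") == "hello-world"
    # Running this first fails: slugify does not exist yet
    # (and per the Second Law, failing to compile counts as failing).

# Third Law: write just enough production code to make it pass.
def slugify(title):
    return title.lower().replace(" ", "-")
```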
The extension of TDD in agile methodologies is Behavior Driven Development (BDD), which represents user stories and gives TDD the specific format of "Given, When, Then". To improve TDD skills, the "Red-Green-Refactor" method described by James Shore can be very useful.
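The same idea in BDD's "Given, When, Then" shape might look like this sketch (the shopping-cart behavior is invented for illustration):

```python
def test_adding_an_item_updates_the_cart_total():
    # Given an empty shopping cart
    cart = {"items": [], "total": 0.0}

    # When the customer adds a priced item
    cart["items"].append({"name": "book", "price": 12.5})
    cart["total"] = sum(item["price"] for item in cart["items"])

    # Then the cart total reflects the item's price
    assert cart["total"] == 12.5
```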
Takes Care Of The Documentation
“Documentation is a love letter that you write to your future self.” — Damian Conway
“Working software over comprehensive documentation” is one of the Agile Manifesto’s four value statements, but well-maintained documentation can be a great catalyst for some of the other agile principles, like “Simplicity”, “Business people and developers work together daily”, “Attention to technical excellence and good design”, or “The team regularly reflects on how to become more effective”. We definitely value working software more than comprehensive documentation, but the right level of documentation is an invaluable artifact. It helps increase a system’s life expectancy by providing the right level of quality information, and it slows the software corrosion that turns a system into a legacy. There are many questions to answer: How much documentation is necessary? When do we do it? What are the types of documentation? How do we maintain it? What tools are available?
Documentation should be done from the start of the system development process and kept up to date as changes are made. The form of documentation depends on the nature of the work: for architectural design, visual documentation is easier; for code, comments are preferred.
Code changes are much easier to rationalize but architectural decisions are hard because architectural changes are not necessarily reflected in code. To stress this point, in his 2011 blog post about architecture decision records, Michael Nygard writes: “One of the hardest things to track during the life of a project is the motivation behind certain decisions.” But, he adds, unless you understand that rationale, you can only either accept the decision blindly or reject it blindly. Writing down the reasoning, context, and consequences of a decision right after you take it can hugely improve future decision-making.
“Your most important architecture decisions might be the ones you didn’t know you made. Conscious decision making is a major step towards better architecture. Unconscious decisions often come in the form of assumptions. Assumptions are risky because they lead to non-requirements, those requirements that exist but weren’t documented anywhere. Tacit assumptions and unconscious decisions both lead to missed expectations or surprises down the road.” — Gregor Hohpe at The Architect Elevator
To document architecture, UML has predominantly been used as the system and architectural documentation tool. With the need to document non-functional choices, the idea of the Architectural Decision (AD) emerged in the last decade. An AD is a software design choice that addresses a functional or non-functional requirement that is architecturally significant. An Architecturally Significant Requirement (ASR) is a requirement that has a measurable effect on a software system’s architecture and quality. An Architectural Decision Record (ADR) captures a single AD, as is often done when writing personal notes or meeting minutes; the collection of ADRs created and maintained in a project constitutes its decision log. All of this falls under the topic of Architectural Knowledge Management (AKM). (Source)
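Nygard's ADR format is deliberately lightweight: one short file per decision covering status, context, decision, and consequences. A hypothetical example:

```
# ADR 7: Use PostgreSQL as the primary data store

Status: Accepted
Context: We need transactional guarantees and mature tooling; the
         team has little operational experience with NoSQL stores.
Decision: All services persist their state in PostgreSQL.
Consequences: Simpler operations and strong consistency now, at the
              cost of revisiting this record if write volume ever
              outgrows a single primary.
```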
The larger the scale of the system, the better its documentation needs to be. Along with working tests, documentation updates must be tracked together with the code changes in a version control system like Git. Before any merge, just as the code is reviewed, the documentation must also be reviewed. It is recommended to track the code and its necessary documentation in the same repository.
Strict documentation and its management are even more important in open source projects, as without them it will be extremely hard for contributors to collaborate.
“Code is more often read than written.” — Guido van Rossum
“Code tells you how; Comments tell you why.” — Jeff Atwood
Reviews Code Regularly
“Computer programming is an exact science in that all properties of the program and all consequences of executing it in any given environment can, in principle, be found out from the text of the program itself by purely deductive reasoning.” — Tony Hoare
Reviews are an opportunity for others to look at our documents, design, code, and software architecture, and for you to inspect others’ work. They facilitate knowledge interchange, but their primary goal is to increase software quality: they help us spot faults before they become real disasters. Code review is therefore a key practice for improving the quality of the system and increasing the usability of the product. Correct key performance indicators (KPIs) should be set, and the process should be automated as much as possible. Static and dynamic analysis tools should run automatically, even before the commit is made on the workstation, and on every pull request. A standard code review process (including developer tooling such as the IDE) should be followed in order to improve the quality of code through reviews as well.
“Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don’t know if our code is getting better or worse.” — Michael C. Feathers
A lot of developers (I mean a huge number) are working in legacy code or are contributing to it randomly, knowingly or unknowingly, because they still do not write clean code, let alone follow the best engineering practices. Professionally, I was (in a way) raised by many legacy systems (thanks to the nomads), and I still feel there are more such untamed monoliths lingering, waiting for me. So, like me, a lot of developers will inevitably have to review one of these legendary legacies, full of code bloat, dead code, and balls of mud, at some point in their career. Therefore, they have to prepare early on to face them, to improve them, and eventually to not create them.
Undying legacy systems have a lot to teach; I have learned a great deal from them. But many developers who are inexperienced in dealing with such mega-structures feel overwhelmed by their sheer size and either end up in a state of analysis-paralysis or become part of the cargo cult. I started my career with such anxious moments, but I was lucky enough to work with experienced team members and many good resources. In particular, "Working Effectively with Legacy Code" by Michael C. Feathers has been of great value when facing such challenges throughout my career.
Below I have quoted a few passages from that book that best describe legacy code.
“What do you think about when you hear the term legacy code? If you are at all like me, you think of tangled, unintelligible structure, code that you have to change but don’t really understand. You think of sleepless nights trying to add in features that should be easy to add, and you think of demoralization, the sense that everyone on the team is so sick of a code base that it seems beyond care, the sort of code that you just wish would die. Part of you feels bad for even thinking about making it better. It seems unworthy of your efforts. That definition of legacy code has nothing to do with who wrote it. Code can degrade in many ways, and many of them have nothing to do with whether the code came from another team. In the industry, legacy code is often used as a slang term for difficult-to-change code that we don’t understand. But over years of working with teams, helping them get past serious code problems, I’ve arrived at a different definition. To me, legacy code is simply code without tests.”
There are four primary reasons to change software:
- Adding a feature
- Fixing a bug
- Improving the design
- Optimizing resource usage
Behavior is the most important thing about software. It is what users depend on. Users like it when we add behavior (provided it is what they really wanted), but if we change or remove behavior they depend on (introduce bugs), they stop trusting us.
Requirements change. Designs that cannot tolerate changing requirements are poor designs to begin with. It is the goal of every competent software developer to create designs that tolerate change. This seems to be an intractably hard problem to solve. So hard, in fact, that nearly every system ever produced suffers from slow, debilitating rot. The rot is so pervasive that we’ve come up with a special name for rotten programs. We call them: Legacy Code. — Robert C. Martin
Preserving existing behavior is one of the largest challenges in software development. Even when we are changing primary features, we often have very large areas of behavior that we have to preserve. To mitigate risk, we have to ask three questions:
1. What changes do we have to make?
2. How will we know that we’ve done them correctly?
3. How will we know that we haven’t broken anything?
How much change can you afford if changes are risky? Some teams mandate “If it’s not broke, don’t fix it”. In short, we can follow these steps while refactoring legacy code:
- Identify change points (Seams)
- Break dependencies
- Write the tests
- Make your changes
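A tiny Python sketch of the first three steps: the hard-wired dependency below is the change point, injecting it breaks the dependency and creates a seam, and a test then pins down existing behavior before any change is made (all names are invented):

```python
# Before: the function constructs its own database client, so there
# is no seam; tests cannot run without real infrastructure.
# (ProductionDatabase is a hypothetical class.)
def monthly_total():
    db = ProductionDatabase()
    return sum(row["amount"] for row in db.fetch_orders())

# After: the dependency is injected; anything with a fetch_orders()
# method fits through the seam.
def monthly_total(db):
    return sum(row["amount"] for row in db.fetch_orders())

class FakeOrders:
    def fetch_orders(self):
        return [{"amount": 10.0}, {"amount": 5.5}]

# Write the test, then make changes knowing behavior is preserved.
def test_monthly_total():
    assert monthly_total(FakeOrders()) == 15.5
```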
An important part of code review is refactoring. I have seen refactoring be so opinionated that everyone has their own version of how to do it. To standardize refactoring, I recommend "Refactoring: Improving the Design of Existing Code" by Martin Fowler.
“One approach to evaluating software architecture is reasoning about the quality attributes the architecture exhibits. The general tone of these definitions is that you need to make high-level decisions about the system you’re going to build:
- What style are you going to use? What is the structure?
- How is it going to function? How do structural components of the architecture work together?
- How does it meet the needs of all the stakeholders?” — Software Architecture Review Guidelines by Alexander Nowak
Befriends Design Principles, Patterns and Architecture
Understanding the terms: design principles, design patterns and architecture can sometimes be confusing. Let’s look at it in simpler terms.
Design principles provide high level guidelines to design better software applications and they do not provide any implementation and are not bound to any programming language.
Some widely used principles are:
Do One Thing And Do It Well — Unix Philosophy
Keep It Simple Stupid (KISS)
Don’t Repeat Yourself (DRY)
You Aren’t Gonna Need It (YAGNI)
SOLID (Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, Dependency Inversion)
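A contrived Python sketch of two of these principles, DRY and Single Responsibility:

```python
# Violates DRY (duplicated tax math) and Single Responsibility
# (one function computes, formats, and prints).
def print_invoice(amount):
    print(f"Invoice total: {amount + amount * 0.19:.2f}")

def print_receipt(amount):
    print(f"Receipt total: {amount + amount * 0.19:.2f}")

# DRY: the tax rule now lives in exactly one place.
TAX_RATE = 0.19

def with_tax(amount):
    """One responsibility: compute the taxed total."""
    return amount + amount * TAX_RATE

def format_line(label, amount):
    """One responsibility: presentation only."""
    return f"{label} total: {with_tax(amount):.2f}"

print(format_line("Invoice", 100.0))   # Invoice total: 119.00
print(format_line("Receipt", 50.0))    # Receipt total: 59.50
```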
Design patterns provide low-level solution implementations (components of a subsystem with their relationships and collaborations with each other) for commonly occurring problems, and suggest specific implementations for specific object-oriented programming problems.
Design patterns may be viewed as a structured approach to programming that acts as an intermediate between the levels of a programming paradigm and a concrete algorithm. For example: Object-oriented design patterns typically show relationships and interactions between classes or objects, without specifying the final application classes or objects that are involved. The book “Design Patterns: Elements of Reusable Object-Oriented Software” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides defines three types of design patterns: creational, structural and behavioral.
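For example, here is a compact sketch of the Strategy pattern, one of the book's behavioral patterns: interchangeable algorithms hide behind a common interface (the shipping example is invented):

```python
from abc import ABC, abstractmethod

class ShippingStrategy(ABC):
    """Common interface every concrete strategy implements."""
    @abstractmethod
    def cost(self, weight_kg: float) -> float: ...

class FlatRate(ShippingStrategy):
    def cost(self, weight_kg):
        return 5.0

class PerKilo(ShippingStrategy):
    def cost(self, weight_kg):
        return 1.2 * weight_kg

class Order:
    """Context: delegates the varying behavior to a strategy object,
    so new shipping policies need no change to Order itself."""
    def __init__(self, weight_kg, shipping: ShippingStrategy):
        self.weight_kg = weight_kg
        self.shipping = shipping

    def shipping_cost(self):
        return self.shipping.cost(self.weight_kg)

print(Order(10, FlatRate()).shipping_cost())   # 5.0
print(Order(10, PerKilo()).shipping_cost())    # 12.0
```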
The book “Object-Oriented Analysis and Design with Applications” by Grady Booch is a good help for relating and differentiating the core ideas, with simple explanations and examples. His principal Principles of Hierarchy, Abstraction, Modularization, and Encapsulation (PHAME) capture the overall understanding.
Architecture is concerned with a still higher level: the subsystems of an application with their relationships and collaborations with each other, the relation of one application to another, and the entire solution.
Primarily, there are three levels of architecture, created by the Technical Architect (Software Architect, System Architect), the Solution Architect, and the Enterprise Architect. Let’s understand them in brief.
Technical architects are more technology-focused and less strategy-focused, so they have a more hands-on approach, defining best-practice standards to follow; examples include the Java Architect, Infrastructure Architect, and Security Architect.
Solution architects typically straddle the middle position when it comes to strategy versus technology focus and organizational versus project scope. A Solution architect is responsible for designing a high level solution to a specific set of business requirements, within the framework laid down by the enterprise architecture team. This solution may span multiple applications.
Enterprise architects are responsible for strategic thinking, roadmaps, principles, and governance of the entire enterprise. They usually have a close relationship with the business, vendors, and senior IT management, and therefore handle the entire enterprise, with a main interest in describing the company in terms of its business entities, their properties, and the relations between them and the external environment. One of the principal concerns of enterprise architecture is the lifecycle of the applications and which technologies are used by each one. At the same time, enterprise architects ensure that the company as a whole has integrity and consistency.
Figure: Layered view of a system’s understanding
The interesting thing is that learning all these levels and their differences helps engineers dive deeper into systems thinking, enabling them to understand the trade-offs and add high value toward the desired qualities of the system.
Figure: Tools for Systems Thinking
Pair Programs From Time To Time
Pair programming is like a double-edged sword: done effectively, it does wonders; otherwise, it can be a cause of chaos. When practiced in the presence of an experienced technical leader, it can do magic and transform a novice into a reliable software engineer. Most developers are so used to thinking alone and coding alone that, because of this predominant behavior, pair programming is perceived as difficult. It definitely is not; usually the team is simply not well enough equipped to do it properly. Therefore, it should be practiced time and again to boost technical and personal growth.
Lives In The Cloud Native Landscape
A successful software engineer understands the Cloud Native landscape. The Twelve-Factor App is the manuscript that helps us build in the cloud. In short, it is a methodology that every successful engineer follows while building cloud-native distributed applications.
The sweet version of the manuscript:
Code: Manage all code in version control systems (like Git or Mercurial). The code base comprehensively dictates what is deployed.
Dependencies: Dependencies should be managed entirely and explicitly by the code base, either vendored (stored with the code) or version-pinned in a format that a package manager can install from.
Config: Separate configuration parameters from the application and define them in the deployment environment instead of baking them into the application itself (a code sketch of this factor follows the list).
Backing Services: Local and remote services are both abstracted as network-accessible resources with connection details set in configuration.
Build, Release, Run: The build stage of your application should be completely separate from your application release and operations processes. The build stage creates a deployment artifact from source code, the release stage combines the artifact and configuration, and the run stage executes the release.
Processes: Applications are implemented as processes that should not rely on storing state locally. State should be offloaded to a backing service as described in the fourth factor.
Port binding: Applications should natively bind to a port and listen for connections. Routing and request forwarding should be handled externally.
Concurrency: Applications should rely on scaling through the process model. Running multiple copies of an application concurrently, potentially across multiple servers, allows scaling without adjusting application code.
Disposable: Processes should be able to start quickly and stop gracefully without serious side effects.
Dev/Prod Parity: Your testing, staging, and production environments should match closely and be kept in sync. Differences between environments are opportunities for incompatibilities and untested configurations to appear.
Logs: Applications should stream logs to standard output so external services can decide how to best handle them.
Admin Processes: One-off administration processes should be run against specific releases and shipped with the main process code.
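To make the Config factor concrete, a minimal Python sketch that reads deploy-specific values from the environment rather than from the code base (the variable names are examples):

```python
import os

# Config lives in the deployment environment, not in the code.
# The same build artifact runs in dev, staging, and production;
# only these environment variables change between deploys.
DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost/dev")
CACHE_TTL_SECONDS = int(os.environ.get("CACHE_TTL_SECONDS", "300"))

print(f"connecting to {DATABASE_URL} (cache TTL {CACHE_TTL_SECONDS}s)")
```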
Last but not least, a successful software engineer loves this profession! Loves life, and finds a work-life balance.