Hardware Architecture for Pick and Place with 3D Camera
1. Objective — What You Are Building and Why
In this lesson, you will understand the hardware architecture and the software architecture of a real industrial pick-and-place application with:
- a 6-axis robot
- a 3D camera
- an edge AI device
- a PC running ROS2
- and optionally a PLC
This lesson is extremely important because before writing perception code, inverse kinematics code, or bin-picking logic, you must understand:
who talks to whom, through which cable, on which device, and for what purpose
In simulation, everything was easy because everything was inside the same virtual world.
In the real field, that is no longer true.
Now you have:
- a real sensor
- a real robot
- a real controller
- a real compute architecture
- real network communication
- real safety constraints
So this lesson is about building the correct mental map of the whole system.
We will explain the architecture in 3 steps.
2. Hardware Architecture — Step by Step
Step 1 — PC + Jetson + RealSense
This is the first building block of the system.
Physical connections
- the RealSense camera is connected to the Jetson Orin Nano through a USB cable
- the Jetson Orin Nano is connected to the PC through an Ethernet cable
Why do we do this?
Because in the real application, the camera is not just there for visualization.
It is the sensing device that gives us the information we need to understand:
- where the object is
- what the object orientation is
- how that object is positioned in the workspace
To do this, we need AI inference.
That is why we choose the Jetson Orin Nano as an edge inference device.
Why the Jetson exists in this architecture
In simulation, you did not need a dedicated inference device.
But in the real world, if you want to detect and localize objects from RGB-D images, you usually need algorithms such as:
- object detection
- segmentation
- later even pose estimation
And these algorithms often rely on convolutional neural networks, such as YOLO-based pipelines.
So the Jetson exists because it is the compute device that can process camera data efficiently and run inference closer to the sensor.
That is the reason for Step 1.
Information flow in Step 1
At this stage, the logic is:
- the RealSense captures RGB and depth data
- the Jetson receives those data streams
- the Jetson runs the perception/inference pipeline
- the PC, which runs the ROS2 orchestration, asks the Jetson for the result
So conceptually:
- the camera is the input sensor
- the Jetson is the perception and AI device
- the PC is the ROS2 orchestrator
The PC does not need to do all the low-level image computation itself.
It delegates that job to the Jetson.
That is a clean architecture.
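The perception output of Step 1 can be made concrete with a small sketch: the Jetson turns a detected pixel plus its depth reading into a 3D point in the camera frame using the pinhole model. The intrinsics and pixel values below are illustrative numbers, not real RealSense calibration data.

```python
# Back-project a detected pixel (u, v) with depth `depth_m` into a 3D point
# in the camera frame, using the pinhole camera model.
# fx, fy = focal lengths in pixels; cx, cy = principal point (all made up here).

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Return the camera-frame (x, y, z) in meters for a pixel at a given depth."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    z = depth_m
    return (x, y, z)

# Example: object detected at pixel (400, 260), 0.85 m from the camera
point_cam = deproject(400, 260, 0.85, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(point_cam)
```

This is exactly the kind of result the PC asks the Jetson for: not raw images, but a compact "the object is here" answer in the camera frame.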
Very simple mental model for Step 1
Think of it like this:
- RealSense = the eyes
- Jetson = the visual brain
- PC = the application manager
That is the easiest way to remember it.
Step 2 — Add the robot controller through an industrial switch
Now we move to the second stage.
At this point, it is not enough to have only:
- PC
- Jetson
- camera
Now we want the perception result to actually drive a real robot.
To do that, the robot controller must be part of the same networked system.
Physical connections in Step 2
We use an industrial Ethernet switch.
The switch receives:
- one Ethernet cable from the PC
- one Ethernet cable from the Jetson Orin Nano
- one Ethernet cable from the robot controller
Now all three devices are on the same network infrastructure.
Why do we need the switch?
Because now the PC must communicate with two different systems:
- the Jetson, to get perception results
- the robot controller, to send robot motion execution commands
So the switch is the network backbone of the application.
It is the piece that lets all the industrial devices live on the same communication layer.
Information flow in Step 2
This is the most important part of the lesson.
The logic now becomes:
A. Perception side
- the camera acquires the scene
- the Jetson processes the scene
- the Jetson computes where the object is, with respect to the camera frame
B. Orchestration side
- the PC, running ROS2, requests that information from the Jetson
- the PC receives the object pose or detection result
- the PC computes transforms and inverse kinematics
- the PC computes the desired end-effector pose with respect to the robot base
C. Robot side
- the PC sends the motion request through the robot driver layer / hardware interface
- the controller executes the motion on the real robot
So the PC is really the central orchestrator of the whole application.
It is not doing every computation itself, but it is coordinating all the actors.
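The orchestration side (B) can be sketched with a homogeneous transform: the PC takes the object position reported by the Jetson in the camera frame and expresses it in the robot base frame. The camera mounting pose below is a made-up example (camera 0.5 m in front of the base, 0.8 m up, looking straight down); a real cell would get this from calibration.

```python
# Convert a point from the camera frame to the robot base frame with a
# 4x4 homogeneous transform (row-major nested lists, no external libraries).

def transform_point(T, p):
    """Apply homogeneous transform T to 3D point p, returning (x, y, z)."""
    x, y, z = p
    return tuple(
        T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
        for i in range(3)
    )

# Hypothetical base->camera transform: camera at (0.5, 0.0, 0.8) in the base
# frame, with its optical axis pointing down (camera z maps to base -z).
T_base_cam = [
    [1.0,  0.0,  0.0, 0.5],
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 0.8],
    [0.0,  0.0,  0.0, 1.0],
]

p_cam = (0.11, 0.03, 0.75)           # object position in the camera frame
p_base = transform_point(T_base_cam, p_cam)
print(p_base)                         # object position in the robot base frame
```

Once the pose is in the base frame, it can be handed to the inverse kinematics or MoveIt layer to compute the actual joint motion.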
Very simple mental model for Step 2
You can think of the architecture like this:
- Jetson tells the PC where the object is
- PC decides how the robot should move
- controller makes the robot execute that motion
That is the core logic.
Step 3 — Add the PLC
Now we move to the final stage.
In many industrial applications, the robot system is not a stand-alone demo cell.
It is part of a bigger machine.
For example:
- an existing packaging machine
- a bottling machine
- an assembly machine
- a machine tending station
In these cases, there is usually already a PLC controlling the overall machine logic.
So we add the PLC to the architecture.
Physical connection in Step 3
The PLC is also connected to the same industrial switch through Ethernet.
Now the switch connects:
- PC
- Jetson
- robot controller
- PLC
Why the PLC matters
Because in a real machine, you often do not want ROS2 deciding by itself when to run the pick-and-place cycle.
Instead, the PLC is the machine supervisor.
So the PLC may tell the ROS2 application things like:
- start a cycle
- wait
- object available
- machine ready
- robot zone free
- pick sequence allowed
So in this last stage, the PLC becomes the trigger source for the whole pipeline.
Information flow in Step 3
Now the complete system works like this:
- the PLC gives the command to start the pick-and-place cycle
- the PC / ROS2 orchestrator receives the trigger
- the PC asks the Jetson to process the scene
- the Jetson returns the object location
- the PC computes transforms and inverse kinematics
- the PC sends robot execution commands to the controller
- the robot performs the pick-and-place motion
- optionally, ROS2 reports status back to the PLC
This is very useful when integrating robotics into an already existing industrial line.
So the PLC is not replaced.
It remains the high-level machine supervisor, while ROS2 manages the robotics and perception intelligence.
That is often the correct industrial architecture.
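The Step 3 handshake can be sketched as a gated cycle: ROS2 runs the pick-and-place sequence only when the PLC flags allow it. The flag names and the in-memory "PLC" dictionary below are illustrative; in a real cell these bits would arrive over an industrial protocol or a ROS2 bridge.

```python
# One pick-and-place cycle gated by PLC permissions.
# `perceive` stands in for the Jetson request, `move` for the robot execution.

def run_cycle(plc, perceive, move):
    """Run one cycle if the PLC allows it; return a status string for the PLC."""
    if not (plc["machine_ready"] and plc["robot_zone_free"]):
        return "waiting"              # the PLC has not authorized the cycle
    pose = perceive()                 # ask the Jetson for the object pose
    if pose is None:
        return "no_object"            # nothing to pick this cycle
    move(pose)                        # plan + execute on the robot controller
    return "done"                     # status reported back to the PLC

# Tiny usage example with stub perception and motion
plc_state = {"machine_ready": True, "robot_zone_free": True}
status = run_cycle(plc_state,
                   perceive=lambda: (0.61, -0.03, 0.05),
                   move=lambda pose: None)
print(status)
```

The key point is visible in the structure: the PLC gates the cycle, the Jetson answers the perception question, and the controller executes, while ROS2 on the PC sequences the three.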
3. Software Architecture — How the software is split
Now that the hardware architecture is clear, let’s explain the software architecture.
This is just as important.
Because even if the cables are correct, poorly organized software makes the system hard to maintain, hard to debug, and hard to scale.
PC side — ROS2 orchestrator
The PC runs the main ROS2 application.
This is where the high-level orchestration happens.
On the PC, you typically have:
- the ROS2 nodes that request perception results
- the transform logic
- the inverse kinematics or MoveIt logic
- the robot hardware interface / bridge
- the motion sequencing logic
- later the full pick-and-place pipeline
So the PC is the main application layer.
It is the conductor of the orchestra.
Jetson side — split responsibilities with Docker
On the Jetson, we want a cleaner architecture.
Instead of putting everything into one big environment, we split responsibilities.
A good architecture is to use Docker containers.
Why?
Because containers make the Jetson software:
- reproducible
- isolated
- portable
- easier to debug
Container 1 — Camera streaming
The first container handles:
- the RealSense driver / wrapper
- RGB-D streaming
- camera topics
- point cloud / depth publishing
So this container is responsible for exposing the sensor data correctly.
Its job is:
get raw data out of the camera and publish them reliably
Container 2 — Perception / inference
The second container handles:
- object detection
- CNN inference
- YOLO or other models
- later more advanced perception logic
Its job is:
consume the camera data and produce perception results
So one container is for streaming, the other is for intelligence.
That is a very healthy separation.
Why split streaming and inference?
Because they are two different responsibilities.
If you mix everything together, then:
- debugging becomes harder
- updates become riskier
- resource management becomes worse
If the streaming fails, you want to know it is the streaming container.
If inference fails, you want to know it is the inference container.
This split gives you modularity.
And modularity is one of the main lessons of ROS and good robotics architecture.
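The two-container split on the Jetson could look like the docker-compose sketch below. The image names, tags, and settings are placeholders, not official images; they only illustrate the separation of responsibilities.

```yaml
# Illustrative docker-compose layout for the Jetson (names are hypothetical).
services:
  camera:
    image: realsense-streaming:latest   # container 1: driver + RGB-D topics
    network_mode: host                  # lets DDS discovery cross container boundaries
    privileged: true                    # simplest way to reach the USB camera
    environment:
      - ROS_DOMAIN_ID=42                # must match the PC
  perception:
    image: yolo-inference:latest        # container 2: detection / inference
    network_mode: host
    runtime: nvidia                     # expose the GPU for CNN inference
    environment:
      - ROS_DOMAIN_ID=42
    depends_on:
      - camera
```

With this layout, restarting or updating the inference container never interrupts the camera stream, and a streaming failure is immediately distinguishable from an inference failure.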
ROS domain consistency
One critical practical point:
all the ROS2 systems that must communicate together need to be configured correctly.
That means, in particular, that the ROS Domain ID (the ROS_DOMAIN_ID environment variable) must be the same on the devices that are supposed to talk to each other.
So if:
- the PC is on one ROS domain
- the Jetson is on another ROS domain
then the nodes will simply not discover each other, because DDS discovery is partitioned by domain.
So one of the practical deployment checks is:
make sure the PC and the Jetson use the same ROS domain when they must exchange ROS2 data
This is one of those details that can make the whole system look broken even when the code is actually fine.
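A small deployment check can catch this early. The sketch below, run on both the PC and the Jetson, compares the machine's ROS_DOMAIN_ID against the value the cell is supposed to use; the expected value 42 is just an example.

```python
# Check that this machine's ROS_DOMAIN_ID matches the value agreed for the cell.
import os

def check_ros_domain(expected=42):
    """Return True if ROS_DOMAIN_ID matches `expected` (the DDS default is 0)."""
    domain = int(os.environ.get("ROS_DOMAIN_ID", "0"))
    if domain != expected:
        print(f"ROS_DOMAIN_ID is {domain}, expected {expected}: "
              "nodes on this machine may be invisible to the others")
        return False
    return True

os.environ["ROS_DOMAIN_ID"] = "42"    # what a correctly configured machine exports
print(check_ros_domain())
```

Running this on every device before launching the stack turns a silent discovery failure into an explicit error message.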
4. Putting it all together — Final architecture overview
Let’s now summarize the complete architecture in the simplest possible way.
Hardware
- RealSense → attached to Jetson through USB
- Jetson → connected to switch
- PC → connected to switch
- robot controller → connected to switch
- PLC → connected to switch in the final industrial setup
Software
- PC → ROS2 orchestrator, transforms, planning, robot execution
- Jetson container 1 → camera streaming
- Jetson container 2 → inference / object detection
- controller → robot execution
- PLC → external machine trigger and coordination
Information logic
- camera sees scene
- Jetson understands scene
- PC decides the motion
- robot executes
- PLC supervises the machine cycle
That is the whole architecture.
5. Key Takeaways
After this lesson, you should clearly understand how the real pick-and-place system is structured.
You understood why the Jetson exists
It is not just another computer.
It is the edge device chosen to run inference close to the camera.
You understood the role of the PC
The PC is the ROS2 orchestrator.
It coordinates perception results, transforms, planning, and execution.
You understood the role of the industrial switch
It makes the PC, Jetson, controller, and PLC part of the same communication backbone.
You understood the role of the PLC
In a real industrial machine, the PLC often stays the overall machine supervisor and triggers the ROS2 robotic pipeline.
You understood the software split
The Jetson should preferably split:
- camera streaming
- inference
into separate containers.
That makes the architecture cleaner and more scalable.
Most importantly
You now have the mental model of how the hardware and software pieces communicate together.
That is the real victory of this lesson.
Because once this architecture is clear, the next lessons on:
- perception
- object detection
- transforms
- pick logic
- PLC integration
will make much more sense.
Without architecture, the code feels random.
With architecture, every node has a purpose.