Hardware Architecture for Pick and Place with 3D Camera

1. Objective — What You Are Building and Why

In this lesson, you will learn the hardware architecture and the software architecture of a real industrial pick-and-place application built with:

  • a 6-axis robot
  • a 3D camera
  • an edge AI device
  • a PC running ROS2
  • and optionally a PLC

This lesson is extremely important because before writing perception code, inverse kinematics code, or bin-picking logic, you must understand:

who talks to whom, through which cable, on which device, and for what purpose

In simulation, everything was easy because everything was inside the same virtual world.

In the real field, that is no longer true.

Now you have:

  • a real sensor
  • a real robot
  • a real controller
  • a real compute architecture
  • real network communication
  • real safety constraints

So this lesson is about building the correct mental map of the whole system.

We will explain the architecture in 3 steps.

2. Hardware Architecture — Step by Step

Step 1 — PC + Jetson + RealSense

This is the first building block of the system.

Physical connections

  • the RealSense camera is connected to the Jetson Orin Nano through a USB cable
  • the Jetson Orin Nano is connected to the PC through an Ethernet cable

Why do we do this?

Because in the real application, the camera does not just visualize data.

It becomes the sensing device that gives us the information we need to understand:

  • where the object is
  • what the object orientation is
  • how that object is positioned in the workspace

To do this, we need AI inference.

That is why we choose the Jetson Orin Nano as an edge inference device.

Why the Jetson exists in this architecture

In simulation, you did not need a dedicated inference device.

But in the real world, if you want to detect and localize objects from RGB-D images, you usually need algorithms such as:

  • object detection
  • segmentation
  • later even pose estimation

And these algorithms often rely on convolutional neural networks, such as YOLO-based pipelines.

So the Jetson exists because it is the compute device that can process camera data efficiently and run inference closer to the sensor.

That is the reason for Step 1.

Information flow in Step 1

At this stage, the logic is:

  1. the RealSense captures RGB and depth data
  2. the Jetson receives those data streams
  3. the Jetson runs the perception/inference pipeline
  4. the PC, which runs the ROS2 orchestration, asks the Jetson for the result

So conceptually:

  • the camera is the input sensor
  • the Jetson is the perception and AI device
  • the PC is the ROS2 orchestrator

The PC does not need to do all the low-level image computation itself.

It delegates that job to the Jetson.

That is a clean architecture.

Very simple mental model for Step 1

Think of it like this:

  • RealSense = the eyes
  • Jetson = the visual brain
  • PC = the application manager

That is the easiest way to remember it.
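To make the Step 1 flow concrete, here is a minimal PC-side sketch of "asking the Jetson for the result". It assumes the Jetson publishes its detection output as a JSON payload (for example on a ROS2 std_msgs/String topic); the field names and the message format are illustrative assumptions, not a fixed standard.

```python
import json

# Hypothetical payload the Jetson might publish after running inference on
# one RGB-D frame. Field names are an assumption for this sketch.
payload = json.dumps({
    "detections": [
        {"label": "box", "confidence": 0.91, "xyz_camera": [0.12, -0.04, 0.55]},
        {"label": "box", "confidence": 0.48, "xyz_camera": [0.30, 0.10, 0.60]},
    ]
})

def best_detection(raw: str, min_confidence: float = 0.5):
    """PC-side helper: pick the most confident detection above a threshold."""
    detections = json.loads(raw)["detections"]
    good = [d for d in detections if d["confidence"] >= min_confidence]
    return max(good, key=lambda d: d["confidence"]) if good else None

target = best_detection(payload)
print(target["label"], target["xyz_camera"])
```

The point of the sketch is the division of labor: the Jetson produces the result, and the PC only consumes a small, structured summary of it.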

Step 2 — Add the robot controller through an industrial switch

Now we move to the second stage.

At this point, it is not enough to have only:

  • PC
  • Jetson
  • camera

Now we want the perception result to actually drive a real robot.

To do that, the robot controller must be part of the same networked system.

Physical connections in Step 2

We use an industrial Ethernet switch.

The switch receives:

  • one Ethernet cable from the PC
  • one Ethernet cable from the Jetson Orin Nano
  • one Ethernet cable from the robot controller

Now all three devices are on the same network infrastructure.

Why do we need the switch?

Because now the PC must communicate with two different systems:

  1. the Jetson, to get perception results
  2. the robot controller, to send robot motion execution commands

So the switch is the network backbone of the application.

It is the piece that lets all the industrial devices live on the same communication layer.

Information flow in Step 2

This is the most important part of the lesson.

The logic now becomes:

A. Perception side

  1. the camera acquires the scene
  2. the Jetson processes the scene
  3. the Jetson computes where the object is, with respect to the camera frame

B. Orchestration side

  1. the PC, running ROS2, requests that information from the Jetson
  2. the PC receives the object pose or detection result
  3. the PC computes transforms and inverse kinematics
  4. the PC computes the desired end-effector pose with respect to the robot base

C. Robot side

  1. the PC sends the motion request through the robot driver layer / hardware interface
  2. the controller executes the motion on the real robot

So the PC is really the central orchestrator of the whole application.

It is not doing every computation itself, but it is coordinating all the actors.

Very simple mental model for Step 2

You can think of the architecture like this:

  • Jetson tells the PC where the object is
  • PC decides how the robot should move
  • controller makes the robot execute that motion

That is the core logic.
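The key computation on the orchestration side is expressing the object pose, which the Jetson reports in the camera frame, in the robot base frame. A minimal sketch with homogeneous transforms is below; the camera-to-base transform used here is a made-up example (in a real cell it comes from calibration, or from tf2 on the ROS2 side), not real numbers.

```python
import numpy as np

# Assumed, illustrative camera-to-base transform: camera mounted 0.4 m in
# front of the robot base, 0.8 m up, looking straight down (180° about x).
T_base_camera = np.eye(4)
T_base_camera[:3, 3] = [0.40, 0.00, 0.80]
T_base_camera[:3, :3] = np.array([[1,  0,  0],
                                  [0, -1,  0],
                                  [0,  0, -1]])

def camera_to_base(p_camera, T=T_base_camera):
    """Express a point seen in the camera frame in the robot base frame."""
    p = np.append(np.asarray(p_camera, dtype=float), 1.0)  # homogeneous coords
    return (T @ p)[:3]

# Object detected at (0.12, -0.04, 0.55) in the camera frame.
p_base = camera_to_base([0.12, -0.04, 0.55])
print(p_base)  # the target position the IK / planning step would receive
```

On the PC this computation is usually hidden inside tf2, but the math is exactly this: one matrix multiplication per frame change.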

Step 3 — Add the PLC

Now we move to the final stage.

In many industrial applications, the robot system is not a stand-alone demo cell.

It is part of a bigger machine.

For example:

  • an existing packaging machine
  • a bottling machine
  • an assembly machine
  • a machine tending station

In these cases, there is usually already a PLC controlling the overall machine logic.

So we add the PLC to the architecture.

Physical connection in Step 3

The PLC is also connected to the same industrial switch through Ethernet.

Now the switch connects:

  • PC
  • Jetson
  • robot controller
  • PLC

Why the PLC matters

Because in a real machine, you often do not want ROS2 deciding by itself when to run the pick-and-place cycle.

Instead, the PLC is the machine supervisor.

So the PLC may tell the ROS2 application things like:

  • start a cycle
  • wait
  • object available
  • machine ready
  • robot zone free
  • pick sequence allowed

So in this last stage, the PLC becomes the trigger source for the whole pipeline.

Information flow in Step 3

Now the complete system works like this:

  1. the PLC gives the command to start the pick-and-place cycle
  2. the PC / ROS2 orchestrator receives the trigger
  3. the PC asks the Jetson to process the scene
  4. the Jetson returns the object location
  5. the PC computes transforms and inverse kinematics
  6. the PC sends robot execution commands to the controller
  7. the robot performs the pick-and-place motion
  8. optionally, ROS2 reports status back to the PLC

This is very useful when integrating robotics into an already existing industrial line.

So the PLC is not replaced.

It remains the high-level machine supervisor, while ROS2 manages the robotics and perception intelligence.

That is often the correct industrial architecture.
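On the ROS2 side, "the PLC gives the command" usually means polling or subscribing to a trigger signal. Here is a small, protocol-agnostic sketch: `read_trigger` is any callable that reads the PLC signal (in a real cell it would wrap, for example, a Modbus TCP coil read or an OPC UA node read; the exact protocol depends on your PLC and is an assumption here).

```python
import time

def wait_for_cycle_start(read_trigger, poll_period=0.1, timeout=5.0):
    """Block until the PLC raises the 'start cycle' signal, or time out.

    `read_trigger` is any callable returning True/False; the fieldbus
    details (Modbus, OPC UA, ...) live behind it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_trigger():
            return True
        time.sleep(poll_period)
    return False

# Stand-in for the PLC: the trigger goes high on the third poll.
reads = iter([False, False, True])
started = wait_for_cycle_start(lambda: next(reads), poll_period=0.01)
print("cycle started:", started)
```

Keeping the protocol behind a simple callable like this means the orchestration logic does not change when the fieldbus does.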

3. Software Architecture — How the software is split

Now that the hardware architecture is clear, let’s explain the software architecture.

This is just as important.

Because even if the cables are correct, if the software is not organized well, the system becomes hard to maintain, hard to debug, and hard to scale.

PC side — ROS2 orchestrator

The PC runs the main ROS2 application.

This is where the high-level orchestration happens.

On the PC, you typically have:

  • the ROS2 nodes that request perception results
  • the transform logic
  • the inverse kinematics or MoveIt logic
  • the robot hardware interface / bridge
  • the motion sequencing logic
  • later the full pick-and-place pipeline

So the PC is the main application layer.

It is the conductor of the orchestra.
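In practice, the PC-side orchestration often ends up as a simple state machine over the cycle steps described above. A minimal sketch in plain Python follows; the state names are illustrative, and a real implementation would typically live inside ROS2 nodes, actions, or a behavior-tree library rather than a bare dictionary.

```python
from enum import Enum, auto

class CycleState(Enum):
    WAIT_TRIGGER = auto()   # PLC has not started a cycle yet
    PERCEIVE = auto()       # ask the Jetson for an object pose
    PLAN = auto()           # transforms + inverse kinematics
    EXECUTE = auto()        # send the motion to the robot controller
    REPORT = auto()         # optionally report status back to the PLC

# The happy-path order of one pick-and-place cycle.
NEXT = {
    CycleState.WAIT_TRIGGER: CycleState.PERCEIVE,
    CycleState.PERCEIVE: CycleState.PLAN,
    CycleState.PLAN: CycleState.EXECUTE,
    CycleState.EXECUTE: CycleState.REPORT,
    CycleState.REPORT: CycleState.WAIT_TRIGGER,
}

def run_one_cycle(start=CycleState.WAIT_TRIGGER):
    """Walk one full cycle and return the visited states, in order."""
    visited, state = [start], NEXT[start]
    while state is not start:
        visited.append(state)
        state = NEXT[state]
    return visited

print([s.name for s in run_one_cycle()])
```

Making the cycle explicit like this is what keeps the orchestrator debuggable: at any moment you can say exactly which step the application is in.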

Jetson side — split responsibilities with Docker

On the Jetson, we want a cleaner architecture.

Instead of putting everything into one big environment, we split responsibilities.

A good architecture is to use Docker containers.

Why?

Because containers make the Jetson software:

  • reproducible
  • isolated
  • portable
  • easier to debug

Container 1 — Camera streaming

The first container handles:

  • the RealSense driver / wrapper
  • RGB-D streaming
  • camera topics
  • point cloud / depth publishing

So this container is responsible for exposing the sensor data correctly.

Its job is:

get raw data out of the camera and publish them reliably

Container 2 — Perception / inference

The second container handles:

  • object detection
  • CNN inference
  • YOLO or other models
  • later more advanced perception logic

Its job is:

consume the camera data and produce perception results

So one container is for streaming, the other is for intelligence.

That is a very healthy separation.
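Inside the perception container, a good chunk of the work is not the model itself but the post-processing of its output. Here is a small sketch of that layer; the `(label, confidence, bbox)` tuples are an assumed, simplified stand-in for the output of a YOLO-style model.

```python
def filter_detections(raw_detections, min_confidence=0.5, classes=None):
    """Post-process raw model output inside the perception container.

    `raw_detections` is a list of (label, confidence, bbox) tuples, a
    simplified stand-in for a YOLO-style model's output.
    """
    kept = []
    for label, conf, bbox in raw_detections:
        if conf < min_confidence:
            continue
        if classes is not None and label not in classes:
            continue
        kept.append((label, conf, bbox))
    # Most confident first: the orchestrator usually picks the top result.
    return sorted(kept, key=lambda d: d[1], reverse=True)

raw = [("box", 0.91, (10, 20, 80, 90)),
       ("box", 0.32, (200, 40, 260, 110)),
       ("tape", 0.77, (120, 60, 150, 95))]
print(filter_detections(raw, classes={"box"}))
```

Because this logic lives in the inference container, you can tune thresholds and class filters without ever touching the streaming container.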

Why split streaming and inference?

Because they are two different responsibilities.

If you mix everything together, then:

  • debugging becomes harder
  • updates become riskier
  • resource management becomes worse

If the streaming fails, you want to know it is the streaming container. If inference fails, you want to know it is the inference container.

This split gives you modularity.

And modularity is one of the main lessons of ROS and good robotics architecture.

ROS domain consistency

One critical practical point:

all the ROS2 systems that must communicate together need to be configured correctly.

That means, in particular, the ROS Domain ID must be consistent between the devices that are supposed to talk to each other.

So if:

  • the PC is on one ROS domain
  • the Jetson is on another ROS domain

then the nodes will not discover each other.

So one of the practical deployment checks is:

make sure the PC and the Jetson use the same ROS domain when they must exchange ROS2 data

This is one of those details that can make the whole system look broken even when the code is actually fine.
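A quick way to verify this before debugging anything else is a small check you can run on both the PC and the Jetson. ROS2 reads the domain from the ROS_DOMAIN_ID environment variable and defaults to 0 when it is unset; the script below only inspects the environment, so it runs anywhere.

```python
import os

def effective_domain_id() -> int:
    """Return the ROS2 domain this machine will use.

    ROS2 reads the ROS_DOMAIN_ID environment variable; when it is unset,
    the default domain is 0.
    """
    raw = os.environ.get("ROS_DOMAIN_ID")
    return int(raw) if raw else 0

# Run this on both the PC and the Jetson: the two numbers must match,
# otherwise the nodes live in separate DDS discovery spaces.
print("ROS_DOMAIN_ID in effect:", effective_domain_id())
```

If the two machines print different numbers, fix the environment (e.g. in the shell profile and in the container environment) before touching any ROS2 code.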

4. Putting it all together — Final architecture overview

Let’s now summarize the complete architecture in the simplest possible way.

Hardware

  • RealSense → attached to Jetson through USB
  • Jetson → connected to switch
  • PC → connected to switch
  • robot controller → connected to switch
  • PLC → connected to switch in the final industrial setup

Software

  • PC → ROS2 orchestrator, transforms, planning, robot execution
  • Jetson container 1 → camera streaming
  • Jetson container 2 → inference / object detection
  • controller → robot execution
  • PLC → external machine trigger and coordination

Information logic

  • camera sees scene
  • Jetson understands scene
  • PC decides the motion
  • robot executes
  • PLC supervises the machine cycle

That is the whole architecture.

5. Key Takeaways

After this lesson, you should clearly understand how the real pick-and-place system is structured.

You understood why the Jetson exists

It is not just another computer.

It is the edge device chosen to run inference close to the camera.

You understood the role of the PC

The PC is the ROS2 orchestrator.

It coordinates perception results, transforms, planning, and execution.

You understood the role of the industrial switch

It makes the PC, Jetson, controller, and PLC part of the same communication backbone.

You understood the role of the PLC

In a real industrial machine, the PLC often stays the overall machine supervisor and triggers the ROS2 robotic pipeline.

You understood the software split

The Jetson should preferably split:

  • camera streaming
  • inference

into separate containers.

That makes the architecture cleaner and more scalable.

Most importantly

You now have the mental model of how the hardware and software pieces communicate together.

That is the real victory of this lesson.

Because once this architecture is clear, the next lessons on:

  • perception
  • object detection
  • transforms
  • pick logic
  • PLC integration

will make much more sense.

Without architecture, the code feels random.

With architecture, every node has a purpose.
