Hardware Architecture for Pick and Place with 3D Camera

1. Objective — What You Are Building and Why

In this lesson, you will learn the hardware architecture and the software architecture of a real industrial pick-and-place application built with:

  • a 6-axis robot
  • a 3D camera
  • an edge AI device
  • a PC running ROS2
  • and optionally a PLC

This lesson is extremely important because before writing perception code, inverse kinematics code, or bin-picking logic, you must understand:

who talks to whom, through which cable, on which device, and for what purpose

In simulation, everything was easy because everything was inside the same virtual world.

In the real field, that is no longer true.

Now you have:

  • a real sensor
  • a real robot
  • a real controller
  • a real compute architecture
  • real network communication
  • real safety constraints

So this lesson is about building the correct mental map of the whole system.

We will explain the architecture in 3 steps.

2. Hardware Architecture — Step by Step

Step 1 — PC + Jetson + RealSense

This is the first building block of the system.

Physical connections

  • the RealSense camera is connected to the Jetson Orin Nano through a USB cable
  • the Jetson Orin Nano is connected to the PC through an Ethernet cable

Why do we do this?

Because in the real application, the camera does not just visualize data.

It becomes the sensing device that gives us the information we need to understand:

  • where the object is
  • what the object orientation is
  • how that object is positioned in the workspace

To do this, we need AI inference.

That is why we choose the Jetson Orin Nano as an edge inference device.

Why the Jetson exists in this architecture

In simulation, you did not need a dedicated inference device.

But in the real world, if you want to detect and localize objects from RGB-D images, you usually need algorithms such as:

  • object detection
  • segmentation
  • later even pose estimation

And these algorithms often rely on convolutional neural networks, such as YOLO-based pipelines.

So the Jetson exists because it is the compute device that can process camera data efficiently and run inference closer to the sensor.

That is the reason for Step 1.

Information flow in Step 1

At this stage, the logic is:

  1. the RealSense captures RGB and depth data
  2. the Jetson receives those data streams
  3. the Jetson runs the perception/inference pipeline
  4. the PC, which runs the ROS2 orchestration, asks the Jetson for the result

So conceptually:

  • the camera is the input sensor
  • the Jetson is the perception and AI device
  • the PC is the ROS2 orchestrator

The PC does not need to do all the low-level image computation itself.

It delegates that job to the Jetson.

That is a clean architecture.

Very simple mental model for Step 1

Think of it like this:

  • RealSense = the eyes
  • Jetson = the visual brain
  • PC = the application manager

That is the easiest way to remember it.
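To make the Step 1 flow concrete, here is a minimal PC-side sketch of "asking the Jetson for the result". It assumes the Jetson publishes its detection output as a JSON payload (for example on a ROS2 std_msgs/String topic); the field names and the message format are illustrative assumptions, not a fixed standard.

```python
import json

# Hypothetical payload the Jetson might publish after running inference on
# one RGB-D frame. Field names are an assumption for this sketch.
payload = json.dumps({
    "detections": [
        {"label": "box", "confidence": 0.91, "xyz_camera": [0.12, -0.04, 0.55]},
        {"label": "box", "confidence": 0.48, "xyz_camera": [0.30, 0.10, 0.60]},
    ]
})

def best_detection(raw: str, min_confidence: float = 0.5):
    """PC-side helper: pick the most confident detection above a threshold."""
    detections = json.loads(raw)["detections"]
    good = [d for d in detections if d["confidence"] >= min_confidence]
    return max(good, key=lambda d: d["confidence"]) if good else None

target = best_detection(payload)
print(target["label"], target["xyz_camera"])
```

The point of the sketch is the division of labor: the Jetson produces the result, and the PC only consumes a small, structured summary of it.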

Step 2 — Add the robot controller through an industrial switch

Now we move to the second stage.

At this point, it is not enough to have only:

  • PC
  • Jetson
  • camera

Now we want the perception result to actually drive a real robot.

To do that, the robot controller must be part of the same networked system.

Physical connections in Step 2

We use an industrial Ethernet switch.

The switch receives:

  • one Ethernet cable from the PC
  • one Ethernet cable from the Jetson Orin Nano
  • one Ethernet cable from the robot controller

Now all three devices are on the same network infrastructure.

Why do we need the switch?

Because now the PC must communicate with two different systems:

  1. the Jetson, to get perception results
  2. the robot controller, to send robot motion execution commands

So the switch is the network backbone of the application.

It is the piece that lets all the industrial devices live on the same communication layer.

Information flow in Step 2

This is the most important part of the lesson.

The logic now becomes:

A. Perception side

  1. the camera acquires the scene
  2. the Jetson processes the scene
  3. the Jetson computes where the object is, with respect to the camera frame

B. Orchestration side

  1. the PC, running ROS2, requests that information from the Jetson
  2. the PC receives the object pose or detection result
  3. the PC computes transforms and inverse kinematics
  4. the PC computes the desired end-effector pose with respect to the robot base

C. Robot side

  1. the PC sends the motion request through the robot driver layer / hardware interface
  2. the controller executes the motion on the real robot

So the PC is really the central orchestrator of the whole application.

It is not doing every computation itself, but it is coordinating all the actors.

Very simple mental model for Step 2

You can think of the architecture like this:

  • Jetson tells the PC where the object is
  • PC decides how the robot should move
  • controller makes the robot execute that motion

That is the core logic.
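The key computation on the orchestration side is expressing the object pose, which the Jetson reports in the camera frame, in the robot base frame. A minimal sketch with homogeneous transforms is below; the camera-to-base transform used here is a made-up example (in a real cell it comes from calibration, or from tf2 on the ROS2 side), not real numbers.

```python
import numpy as np

# Assumed, illustrative camera-to-base transform: camera mounted 0.4 m in
# front of the robot base, 0.8 m up, looking straight down (180° about x).
T_base_camera = np.eye(4)
T_base_camera[:3, 3] = [0.40, 0.00, 0.80]
T_base_camera[:3, :3] = np.array([[1,  0,  0],
                                  [0, -1,  0],
                                  [0,  0, -1]])

def camera_to_base(p_camera, T=T_base_camera):
    """Express a point seen in the camera frame in the robot base frame."""
    p = np.append(np.asarray(p_camera, dtype=float), 1.0)  # homogeneous coords
    return (T @ p)[:3]

# Object detected at (0.12, -0.04, 0.55) in the camera frame.
p_base = camera_to_base([0.12, -0.04, 0.55])
print(p_base)  # the target position the IK / planning step would receive
```

On the PC this computation is usually hidden inside tf2, but the math is exactly this: one matrix multiplication per frame change.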

Step 3 — Add the PLC

Now we move to the final stage.

In many industrial applications, the robot system is not a stand-alone demo cell.

It is part of a bigger machine.

For example:

  • an existing packaging machine
  • a bottling machine
  • an assembly machine
  • a machine tending station

In these cases, there is usually already a PLC controlling the overall machine logic.

So we add the PLC to the architecture.

Physical connection in Step 3

The PLC is also connected to the same industrial switch through Ethernet.

Now the switch connects:

  • PC
  • Jetson
  • robot controller
  • PLC

Why the PLC matters

Because in a real machine, you often do not want ROS2 deciding by itself when to run the pick-and-place cycle.

Instead, the PLC is the machine supervisor.

So the PLC may tell the ROS2 application things like:

  • start a cycle
  • wait
  • object available
  • machine ready
  • robot zone free
  • pick sequence allowed

So in this last stage, the PLC becomes the trigger source for the whole pipeline.

Information flow in Step 3

Now the complete system works like this:

  1. the PLC gives the command to start the pick-and-place cycle
  2. the PC / ROS2 orchestrator receives the trigger
  3. the PC asks the Jetson to process the scene
  4. the Jetson returns the object location
  5. the PC computes transforms and inverse kinematics
  6. the PC sends robot execution commands to the controller
  7. the robot performs the pick-and-place motion
  8. optionally, ROS2 reports status back to the PLC

This is very useful when integrating robotics into an already existing industrial line.

So the PLC is not replaced.

It remains the high-level machine supervisor, while ROS2 manages the robotics and perception intelligence.

That is often the correct industrial architecture.
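On the ROS2 side, "the PLC gives the command" usually means polling or subscribing to a trigger signal. Here is a small, protocol-agnostic sketch: `read_trigger` is any callable that reads the PLC signal (in a real cell it would wrap, for example, a Modbus TCP coil read or an OPC UA node read; the exact protocol depends on your PLC and is an assumption here).

```python
import time

def wait_for_cycle_start(read_trigger, poll_period=0.1, timeout=5.0):
    """Block until the PLC raises the 'start cycle' signal, or time out.

    `read_trigger` is any callable returning True/False; the fieldbus
    details (Modbus, OPC UA, ...) live behind it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_trigger():
            return True
        time.sleep(poll_period)
    return False

# Stand-in for the PLC: the trigger goes high on the third poll.
reads = iter([False, False, True])
started = wait_for_cycle_start(lambda: next(reads), poll_period=0.01)
print("cycle started:", started)
```

Keeping the protocol behind a simple callable like this means the orchestration logic does not change when the fieldbus does.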

3. Software Architecture — How the software is split

Now that the hardware architecture is clear, let’s explain the software architecture.

This is just as important.

Because even if the cables are correct, if the software is not organized well, the system becomes hard to maintain, hard to debug, and hard to scale.

PC side — ROS2 orchestrator

The PC runs the main ROS2 application.

This is where the high-level orchestration happens.

On the PC, you typically have:

  • the ROS2 nodes that request perception results
  • the transform logic
  • the inverse kinematics or MoveIt logic
  • the robot hardware interface / bridge
  • the motion sequencing logic
  • later the full pick-and-place pipeline

So the PC is the main application layer.

It is the conductor of the orchestra.
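In practice, the PC-side orchestration often ends up as a simple state machine over the cycle steps described above. A minimal sketch in plain Python follows; the state names are illustrative, and a real implementation would typically live inside ROS2 nodes, actions, or a behavior-tree library rather than a bare dictionary.

```python
from enum import Enum, auto

class CycleState(Enum):
    WAIT_TRIGGER = auto()   # PLC has not started a cycle yet
    PERCEIVE = auto()       # ask the Jetson for an object pose
    PLAN = auto()           # transforms + inverse kinematics
    EXECUTE = auto()        # send the motion to the robot controller
    REPORT = auto()         # optionally report status back to the PLC

# The happy-path order of one pick-and-place cycle.
NEXT = {
    CycleState.WAIT_TRIGGER: CycleState.PERCEIVE,
    CycleState.PERCEIVE: CycleState.PLAN,
    CycleState.PLAN: CycleState.EXECUTE,
    CycleState.EXECUTE: CycleState.REPORT,
    CycleState.REPORT: CycleState.WAIT_TRIGGER,
}

def run_one_cycle(start=CycleState.WAIT_TRIGGER):
    """Walk one full cycle and return the visited states, in order."""
    visited, state = [start], NEXT[start]
    while state is not start:
        visited.append(state)
        state = NEXT[state]
    return visited

print([s.name for s in run_one_cycle()])
```

Making the cycle explicit like this is what keeps the orchestrator debuggable: at any moment you can say exactly which step the application is in.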

Jetson side — split responsibilities with Docker

On the Jetson, we want a cleaner architecture.

Instead of putting everything into one big environment, we split responsibilities.

A good architecture is to use Docker containers.

Why?

Because containers make the Jetson software:

  • reproducible
  • isolated
  • portable
  • easier to debug

Container 1 — Camera streaming

The first container handles:

  • the RealSense driver / wrapper
  • RGB-D streaming
  • camera topics
  • point cloud / depth publishing

So this container is responsible for exposing the sensor data correctly.

Its job is:

get raw data out of the camera and publish them reliably

Container 2 — Perception / inference

The second container handles:

  • object detection
  • CNN inference
  • YOLO or other models
  • later more advanced perception logic

Its job is:

consume the camera data and produce perception results

So one container is for streaming, the other is for intelligence.

That is a very healthy separation.
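Inside the perception container, a good chunk of the work is not the model itself but the post-processing of its output. Here is a small sketch of that layer; the `(label, confidence, bbox)` tuples are an assumed, simplified stand-in for the output of a YOLO-style model.

```python
def filter_detections(raw_detections, min_confidence=0.5, classes=None):
    """Post-process raw model output inside the perception container.

    `raw_detections` is a list of (label, confidence, bbox) tuples, a
    simplified stand-in for a YOLO-style model's output.
    """
    kept = []
    for label, conf, bbox in raw_detections:
        if conf < min_confidence:
            continue
        if classes is not None and label not in classes:
            continue
        kept.append((label, conf, bbox))
    # Most confident first: the orchestrator usually picks the top result.
    return sorted(kept, key=lambda d: d[1], reverse=True)

raw = [("box", 0.91, (10, 20, 80, 90)),
       ("box", 0.32, (200, 40, 260, 110)),
       ("tape", 0.77, (120, 60, 150, 95))]
print(filter_detections(raw, classes={"box"}))
```

Because this logic lives in the inference container, you can tune thresholds and class filters without ever touching the streaming container.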

Why split streaming and inference?

Because they are two different responsibilities.

If you mix everything together, then:

  • debugging becomes harder
  • updates become riskier
  • resource management becomes worse

If the streaming fails, you want to know it is the streaming container. If inference fails, you want to know it is the inference container.

This split gives you modularity.

And modularity is one of the main lessons of ROS and good robotics architecture.

ROS domain consistency

One critical practical point:

all the ROS2 systems that must communicate together need to be configured correctly.

That means, in particular, the ROS Domain ID must be consistent between the devices that are supposed to talk to each other.

So if:

  • the PC is on one ROS domain
  • the Jetson is on another ROS domain

then the nodes will not discover each other.

So one of the practical deployment checks is:

make sure the PC and the Jetson use the same ROS domain when they must exchange ROS2 data

This is one of those details that can make the whole system look broken even when the code is actually fine.
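A quick way to verify this before debugging anything else is a small check you can run on both the PC and the Jetson. ROS2 reads the domain from the ROS_DOMAIN_ID environment variable and defaults to 0 when it is unset; the script below only inspects the environment, so it runs anywhere.

```python
import os

def effective_domain_id() -> int:
    """Return the ROS2 domain this machine will use.

    ROS2 reads the ROS_DOMAIN_ID environment variable; when it is unset,
    the default domain is 0.
    """
    raw = os.environ.get("ROS_DOMAIN_ID")
    return int(raw) if raw else 0

# Run this on both the PC and the Jetson: the two numbers must match,
# otherwise the nodes live in separate DDS discovery spaces.
print("ROS_DOMAIN_ID in effect:", effective_domain_id())
```

If the two machines print different numbers, fix the environment (e.g. in the shell profile and in the container environment) before touching any ROS2 code.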

4. Putting it all together — Final architecture overview

Let’s now summarize the complete architecture in the simplest possible way.

Hardware

  • RealSense → attached to Jetson through USB
  • Jetson → connected to switch
  • PC → connected to switch
  • robot controller → connected to switch
  • PLC → connected to switch in the final industrial setup

Software

  • PC → ROS2 orchestrator, transforms, planning, robot execution
  • Jetson container 1 → camera streaming
  • Jetson container 2 → inference / object detection
  • controller → robot execution
  • PLC → external machine trigger and coordination

Information logic

  • camera sees scene
  • Jetson understands scene
  • PC decides the motion
  • robot executes
  • PLC supervises the machine cycle

That is the whole architecture.

5. Key Takeaways

After this lesson, you should clearly understand how the real pick-and-place system is structured.

You understood why the Jetson exists

It is not just another computer.

It is the edge device chosen to run inference close to the camera.

You understood the role of the PC

The PC is the ROS2 orchestrator.

It coordinates perception results, transforms, planning, and execution.

You understood the role of the industrial switch

It makes the PC, Jetson, controller, and PLC part of the same communication backbone.

You understood the role of the PLC

In a real industrial machine, the PLC often stays the overall machine supervisor and triggers the ROS2 robotic pipeline.

You understood the software split

The Jetson should preferably split:

  • camera streaming
  • inference

into separate containers.

That makes the architecture cleaner and more scalable.

Most importantly

You now have the mental model of how the hardware and software pieces communicate together.

That is the real victory of this lesson.

Because once this architecture is clear, the next lessons on:

  • perception
  • object detection
  • transforms
  • pick logic
  • PLC integration

will make much more sense.

Without architecture, the code feels random.

With architecture, every node has a purpose.
